Notes on Eigenvalues and Singular Values

Spring 2022

1 Eigenvalues

Given a square n × n matrix A, an eigenvalue is a scalar (real or complex number) λ satisfying

Copyright By cscodehelp代写 加微信 cscodehelp

Ax = λx forsomenonzerovectorxcalledaneigenvector.1 Thisisequivalenttowriting

(λI − A)x = 0

so, since x ̸= 0, A − λI must be singular, and hence

det(λI − A) = 0.

From the (complicated!) definition of determinant, it follows that det(λI−A) is a polynomial in the variable λ with degree n, and this is called the char- acteristic polynomial. By the fundamental theorem of algebra (a nontrivial result), it follows that the characteristic polynomial has n roots which we denote λ1, . . . , λn, but these may not be distinct (different from each other). For example, the identity matrix I has characteristic polynomial (λ−1)n and so all its eigenvalues are equal to one. Note that if A is real, the eigenvalues may not all be real, but those that are not real must occur in complex con- jugate pairs λ = α±βi. It does not matter what order we use for numbering the λj. Although in principle we could compute eigenvalues by finding the roots of the characteristic polynomial, in practice there are much better al- gorithms, and in any case there is no general formula for finding the roots of a polynomial of degree 5 or more,2 so whatever we do we will have to use some kind of approximation algorithm.

1If x is an eigenvector, so is αx for any nonzero scalar α.

2This was an open question for centuries that was finally resolved in the 19th century.

If all the eigenvalues are distinct, each λj corresponds to an eigenvector xj (the null vector of λj I − A), which is unique except for scalar multiplication, and in this case the eigenvectors xj , j = 1, . . . , n, are linearly independent.3 So, in this case, the n × n matrix

X = [x1,x2,…,xn]

is nonsingular. By the eigenvalue-eigenvector definition, we have

AX = XΛ, where Λ = diag(λ1,…,λn),

the diagonal matrix of eigenvalues, so since X is nonsingular we can premut-

liply both sides by X−1, or postmultiply both sides by X−1, to obtain X−1AX = Λ and A = XΛX−1,

the eigenvalue decomposition or spectral decomposition of A. We say that X defines a similarity transformation that diagonalizes A, displaying its eigenvalues in the diagonal matrix Λ. However, if the eigenvalues are not distinct, this may not be possible. For example, the Jordan block

0 1 J=00

has eigenvalues λ1 = λ2 = 0, and there is only one linearly independent eigenvector, namely x = [1 0]T , or any scalar multiple of this vector. We say that J is not diagonalizable.

If A is real symmetric (A = AT ) then all eigenvalues are real and, regard- less of whether they are distinct or not, there is always a set of n eigenvec- tors, say qj , j = 1, . . . , n, which are not only linearly independent, but also orthonormal, that is, with qiT qj = 0 if i ̸= j and qiT qi = 1.4 So, the matrix

Q = [q1,…,qn]

3SupposeAx=λx,Ay=μy,withλ̸=μ,andy=αx,withα̸=0. ThenA(αx)= μ(αx), so Ax = μx, which is not possible since Ax = λx and λ ̸= μ. This argument can be extended to show that the set of all n eigenvectors is linearly independent.

4The proof that the eigenvalues must be real when A = AT is not difficult but we do not give it here. However, let’s show why eigenvectors for distinct eigenvalues of A = AT must be orthogonal. Suppose Ax = λx, Ay = μy, with λ ̸= μ. Then (1) yTAx = yT(λx) = λyTx and (2) xTAy = xT(μy) = μxTy. Also, (3) the scalar yTAx = (yTAx)T =xTATy=xTAy. Combining(1),(2)and(3),wehaveλyTx=μxTy=μyTx, so, since λ ̸= μ, we must have yT x = 0, i.e., x and y are orthogonal. This argument can be extended to show that there are n mutually orthogonal eigenvectors when A = AT .

is an orthogonal matrix with inverse QT . We have

AQ = QΛ, where Λ = diag(λ1,…,λn),

the diagonal matrix of eigenvalues, and hence

QT AQ = Λ and A = QΛQT .

Thus, Q defines an orthogonal similarity transformation that diagonalizes A.5 Orthogonal matrices have very nice properties and lead to numerical algorithms with optimal stability properties.

In general, when A is nonsymmetric, it does not have orthogonal eigenvec- tors. However, there is a very nice property called the Schur decomposition.6 Assuming A is real, the Schur decomposition is

where Q is orthogonal and U is quasi-upper triangular, which means upper triangular except that there may be 2×2 blocks along the diagonal, with one subdiagonal entry per block. Each real eigenvalue appears on the diagonal of U, as a 1 × 1 block, and each complex conjugate pair of eigenvalues of A consists of the eigenvalues of a 2 × 2 diagonal block. The columns of Q are called Schur vectors, but these are generally not eigenvectors.7 This prop- erty is exploited by algorithms for computing eigenvalues of nonsymmetric matrices.8

The main Matlab function for computing eigenvalues is eig. See also functions roots and schur.

5The same property holds for complex Hermitian matrices (A = A∗, where the su- perscript ∗ denotes complex conjugate transpose), and in fact for all normal matrices (satisfying AA∗ = A∗A): then the qi are complex and we must write qi∗qj = 0 if i ̸= j and qi∗qi = 1 and we say that Q is unitary instead of orthogonal, with Q−1 = Q∗.

6The proof of this is more complicated but can be found in many books, such as Numerical Linear Algebra by Trefethen and Bau.

7If A is complex, so the eigenvalues do not occur in complex conjugate pairs, then Q is unitary, with Q∗Q = I, and U is an upper triangular complex matrix, with no 2 × 2 blocks.

8Including the “QR algorithm,” described at a high level by A&G and in more detail by Trefethen and Bau.

2 Singular Values

Let A be an m×n real9 matrix, with m ≥ n. The key idea of the singular value decomposition (SVD) is that multiplication by A maps the unit sphere in Rn to a “hyper-ellipse” in Rm:

Multiplication by A takes the unit sphere in Rn to a hyper-ellipse in Rm. From Numerical Linear Algebra by Trefethen and Bau, SIAM.

This means that there is a set vj,j = 1,…,n of orthonormal vectors in Rn (the “right singular vectors”) such that

Avj = σjuj, j = 1,…,n

where uj,j = 1,…,n is a set of orthonormal vectors in Rm (the “left singular vectors”), and σj are nonnegative real numbers (“the singular values”).10

We assume for convenience that

σ1 ≥σ2 ≥···≥σn ≥0.

So, we can write

σ2 A[v1,v2,…,vn] = [u1,u2,…,un] .. ,

9Everything applies to the complex case too, just by changing the transpose operations to “complex conjugate transpose” and “orthogonal matrices” to “unitary matrices”.

10The proof of this fundamental fact is not too difficult but not trivial either. A good reference for this is the book by Trefethen and Bau. The derivation of the SVD now follows from this fact.

where V = [v1,v2 …,vn], Uˆ = [u1,u2 …,un] and Σˆ = diag(σ1,σ2 …,σn) (note the “hats” on U and on Σ, but not on V). The n×n matrix V is square with orthonormal columns, so V is an “orthogonal” matrix, with V −1 = V T , and hence we can multiply both sides of this equation on the right by V T to obtain the“reduced” form of the singular value decomposition:

A = Uˆ Σˆ V T .

Matlab calls this the “economy size” SVD and it can be computed by [Uhat,Sigmahat,V]=svd(A,0). The words “reduced” and “economy size” are used because the matrix Uˆ has only n columns of length m ≥ n, so it is not square if m > n. Note that its n columns are orthonormal with UˆTUˆ = I. We can introduce an additional set of m−n orthonormal vectors, all orthogonal to u1, . . . , un, so now we have a square matrix

U = [u1,…,un,un+1,…,um]

that is an orthogonal matrix, so U −1 = U T . Also, define the m × n matrix

Σˆ Σ=0,

where we have appended another m−n zero rows to Σˆ to obtain Σ, a matrix with the same dimension as A. Then we have

ˆ Σˆ ˆˆ UΣ = U,un+1,…,um 0 = UΣ,

so using the equation AV = UˆΣˆ given above, we get AV =UΣ.

Finally, multiplying on the right by V −1 = V T , we have A = UΣV T ,

the “full” SVD, which is computed by [U,Sigma,V]=svd(A). Note that the reduced and full SVD are the same for square matrices (the case m = n).

Another useful way to interpret the SVD is that A can be written as the following sum of rank-one matrices:

A = σ i u i v iT .

The SVD tells us many things about a matrix. Here are some of them:

Rank. From the equation A = UΣVT, since U and V are m × m and n × n orthogonal matrices respectively, it follows that the number of linearly independent rows of A, and the number of linearly independent columns of A, are the same, namely, the number of nonzero singular values of A. This numberiscalledtherank ofA. Let’ssaytherankofAisr,wheren≥r≥0. Then we have

σ1 ≥σ2 ≥···≥σr >σr+1 =···=σn =0.

If r = n, there are no singular values equal to zero, and we say A has “full rank”, or “full column rank”: its columns are linearly independent, so Ax = 0 implies x = 0. (Of course the rows cannot be linearly independent if m > n).

Range. The range of A is the set of all vectors y such that y = Az for some z ∈ Rn. Since A = UΣVT, we have Az = UΣVTz. Whatever z is, ΣVTz is a linear combination of the first r columns of Σ, since the rest of them are zero, so UΣVTz is a linear combination of u1,…,ur. So, u1,…ur form an orthonormal basis for the range of A.

Null space. The null space of A is the set of all vectors z such that Az = 0. SinceA=UΣVT,thismeansAz=UΣVTz=0. Letw=VTz,soz=Vw, and then Az = U ΣV T z = (U Σ)w. This vector is a linear combination of the columns σ1u1, . . . , σrur, since the rest of the columns of UΣ are zero. So, for this linear combination to be zero, we need the first r components of w to be zero. So, z is a linear combination of vr+1,…,vn, which therefore form an orthonormal basis for the null space of A.

Nearest low rank matrix. If A = UΣVT has full rank, so σn > 0, the nearest11 rank-deficient matrix (that is, with rank less than n), or nearest singular matrix if m = n, is obtained by replacing σn in Σ by zero, and the

11Using either the 2-norm or the Frobenius norm to define “nearest”, i.e., the matrix B minimizing ∥A − B∥2 or ∥A − B∥F , over all rank-deficient matrices, or over all matrices with rank s.

nearest rank s matrix is obtained by replacing σs+1, . . . , σn by zero. Thus,

the nearest rank s matrix is

s σiuiviT.

The proof of this can be found in many books including Trefethen and Bau.

Two-norm. The definition of ∥A∥2 is

∥A∥2 = max∥Ax∥= max∥UΣVTx∥= max∥ΣVTx∥= max∥Σy∥=σ1,

∥x∥=1 ∥x∥=1 ∥x∥=1 ∥y∥=1

where the vector norm is the 2-norm, so the matrix 2-norm of A is its largest singular value. Another way to see the same thing is that pre- and post- multiplication by orthogonal matrices preserves the 2-norm, so

∥A∥2 = ∥UΣVT∥2 = ∥Σ∥2 = σ1.

Inverse of a square matrix. When A is square and nonsingular, with

A = UΣV T , we have

A−1 =(UΣVT)−1 =VΣ−1UT =n 1viuTi .

Condition number in the two-norm. When A is square and nonsingular,

its 2-norm condition number is

κ(A) = ∥A∥2∥A−1∥2 = σ1

because the 2-norm of A is σ1 and, from the formula for the inverse given

above, the 2-norm of A−1 is ∥Σ−1∥ = σ−1, the largest of the reciprocals of 2n

the singular values of A. If A is singular, so that σn = 0, we sometimes say that the condition number of A is ∞. If A is not square, the usual definition κ(A) = ∥A∥2∥A−1∥2 does not make sense, but we can still define the two-norm condition number to be κ(A) = σ1/σn.

Condition number of the normal equations using the two-norm. We already observed in the QR notes that, for an m × n matrix A, with m ≥ n, using A = QˆRˆ, we have AT A = RˆT Rˆ. We can explore this further using the reduced singular value decomposition A = Uˆ Σˆ V T . We have

ATA=VΣˆUˆTUˆΣˆVT =VΣˆ2VT. 7

This means that the singular values (and also the eigenvalues) of AT A are σi2, i = 1,…,n, so its two-norm condition is

This explains why solving least squares problems via QR is sometimes much

more accurate than using the normal equations.

Solving the least squares problem using the SVD. In fact, we can alternatively solve the least squares problem using the SVD. Using the full SVD, A = UΣV T , we have for the two-norm that

∥Ax − b∥ = ∥UT (Ax − b)∥ = ∥ΣV T x − UT b∥. Lettingy=VTx∈Rn andz=UTb∈Rm,thisbecomes

Σˆ ∥Ax−b∥=∥Σy−z∥= 0 y−z.

If A has linearly independent columns, i.e., A has rank n, as we usually assume for least squares problems, then σn > 0, and we can solve the least squaresproblembysettingyi =zi/σi,i=1,…,n,andthensettingx=Vy; the remaining residual is ∥[zn+1, . . . , zm]∥. This is just as accurate a method as using QR, but it is much more work to compute the SVD than to compute the QR factorization. The big advantage of using the SVD is that even if the rank of A is r with r < n, we can solve the “rank-deficient” least squares problem by setting yi = zi/σi only for i = 1, . . . , r, and setting the remaining components of y to zero. We can even solve an “approximately” rank-deficient problem by “truncating” or “thresholding” some of the small singular values to zero, as discussed in Ascher and Greif p. 203; this results in a larger residual, but reduces the norm of the solution x.
Relationship Between Eigenvalues and SVD. Several important rela- tionships between eigenvalues and singular values are explored in questions in the homework.
Relationship Between SVD and QR. Unlike the SVD, the QR decom- position cannot be used directly in the rank deficient case. One reason is that, although Rˆ having a zero on the diagonal tells us that Rˆ and hence A is rank deficient, knowing how many zeros are on the diagonal of Rˆ does not tell us the rank of Rˆ and hence does not tell us the rank of A. There is
κ(AT A) = 1 = [κ(A)]2.
a variant of QR called the “full orthogonal decomposition” that does reveal the rank, but it is not used much because the SVD is so convenient.
Note also that the QR factorization can be computed in a finite number of transformations, via Householder’s method (or Gram-Schmidt), which would give the exact result if there were no rounding errors. In this sense QR is like LU. We know this is not possible for SVD or eigenvalue computations, for the reason given on p. 1 of these notes.
Another thing that QR has in common with LU is that it can exploit spar- sity, but in this case the preferred computational method is to use Givens ro- tations instead of Householder reflectors, since they can eliminate one nonzero at a time.
程序代写 CS代考 加微信: cscodehelp QQ: 2235208643 Email: kyit630461@163.com