Talk:Singular value decomposition
I think there are errors in the statement of the theorem: first, U should be an m-by-m matrix (not an m-by-n matrix)... a unitary matrix is necessarily a square matrix. And Σ should be m-by-n with entries only on the diagonal (it is neither n-by-n nor a diagonal matrix).
best regards, Sebastien.
- You're absolutely right on the dimensions, thanks. It got changed about a month ago, and sadly nobody noticed. The rectangular matrix Σ is sometimes said to be a diagonal matrix, even though it is rectangular, since its off-diagonal entries are zero. However, I saw that Horn & Johnson does not do this, so I also changed that for clarity. -- Jitse Niesen (talk) 22:50, 5 September 2005 (UTC)
I removed this:
- An important extension of the singular-value decomposition theorem says that if M is a symmetric square matrix then one may take G = H, and in the case in which n=m=r (the full-rank case) and all of the singular values are different one must take G = H. That is the spectral theorem.
This is incorrect: a symmetric matrix can have negative eigenvalues, and then the spectral theorem decomposition wouldn't be a singular-value decomposition.
Something is also fishy with the term "singular vector": they are not uniquely determined by M; different singular-value decompositions of M can yield different singular vectors. What is the clean formulation of this situation? AxelBoldt 00:17 Dec 13, 2002 (UTC)
Of all the textbooks and articles on matrix theory I have ever read, this is the one and only article that uses the term singular vector!!! Wshun
Well, Google gives about 3000 hits for the phrase "singular vector", and most of them seem to refer to the columns/rows of the matrices in the singular value decomposition. So the term is apparently in use, it's just that the definition is not quite clear. I restored some of the removed material about them, and also mentioned the fact that singular value decompositions always exist. AxelBoldt 19:30 Apr 6, 2003 (UTC)
- Singular vector is standard terminology. It's good enough for Strang, Golub, and Hansen, so it's good enough for me. Billlion 14:21, 3 Sep 2004 (UTC)
Why redirect the article Singular value to Singular value decomposition? It is as bizarre as redirecting prime number to the prime number theorem, or calculus to the fundamental theorem of calculus. It would be better to reverse the direction. In my opinion, the two articles should not be combined! Wshun
- I would be happy for the content of s-number to go into this one; I think the article should cover the operator case. Billlion 08:23, 17 Sep 2004 (UTC)
[edit] Response to Wshun
Singular vector is indeed standard terminology. See Gilbert Strang's books. Michael Hardy 14:54, 3 Sep 2004 (UTC)
[edit] Algorithms please
How does one compute the SVD in practice? Lupin 13:19, 5 Feb 2005 (UTC)
- See the article
- 'LAPACK users manual (http://www.netlib.org/lapack/lug/node53.html) gives details of subroutines to calculate the SVD.'
- The numerical algorithms are explained and referenced there. In Matlab/Octave it is easy: just use the svd command; in Mathematica, use SingularValueDecomposition. Billlion 19:45, 5 Feb 2005 (UTC)
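- For a quick start outside Matlab, here is a minimal NumPy sketch (the example matrix is made up; numpy.linalg.svd wraps the LAPACK SVD drivers that the manual above describes):
 import numpy as np

 M = np.random.rand(5, 3)                     # any m-by-n matrix
 U, s, Vt = np.linalg.svd(M)                  # U is 5x5, s holds the singular values, Vt is 3x3
 Sigma = np.zeros(M.shape)
 Sigma[:len(s), :len(s)] = np.diag(s)         # rebuild the rectangular "diagonal" Sigma
 assert np.allclose(M, U @ Sigma @ Vt)        # M = U Sigma V*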
- I tried to understand the methods used, but on the very first algorithm they use ("reduced bidiagonal form") I can find very little explanatory material anywhere. Yet I have noticed that most of the source code I have come across tends to use this undocumented technique. --ANONYMOUS COWARD0xC0DE 02:12, 27 November 2006 (UTC)
-
- I'm not sure what "reduced bidiagonal form" is; are you talking about computing the SVD by reducing the matrix to bidiagonal form? One reference for the Golub–Kahan algorithm is Golub and Van Loan, Matrix Computations, Johns Hopkins University Press, Section 8.6, "Computing the SVD". This is a standard reference for numerical linear algebra. -- Jitse Niesen (talk) 03:17, 27 November 2006 (UTC)
- Yes, that's exactly what I am talking about. Thank you for the reference; I just ordered it at my library (ISBN:0801854148). --ANONYMOUS COWARD0xC0DE 05:23, 28 November 2006 (UTC)
-
- First you get the eigenvectors of M^T M *the smaller matrix* and assign them to V, and likewise the eigenvectors of M M^T to U. In this setting U and V are each orthogonal, however U Σ V^T is not even close to M. So instead you calculate U via U = M V Σ^{-1}, where Σ holds the square roots of the eigenvalues of M^T M on its diagonal, followed by inverting each element on the diagonal giving Σ^{-1}. That's the simplest way to solve an M-by-N matrix where M >= N, and I believe if N >= M you can use V = M^T U Σ^{-1}, though I haven't tried it yet. However the U I get is not a full orthogonal matrix in my example, and I don't know how to correct that. My method seems to be the same as that used by the $50 .net matrix library: for example, if I input my 3-by-2 matrix M into their free trial interface, it cuts off the offending columns in U, so the dimensions of U are 3-by-2 instead of 3-by-3, in violation of the orthogonality property. Thus I am left with just one question: how do I assign appropriate values to the remaining columns of the matrix (the exact values in those columns are irrelevant, as the corresponding eigenvalues are zero) to fulfil the orthogonality property? --ANONYMOUS COWARD0xC0DE 23:09, 27 November 2006 (UTC)
-
- Dear AC, the singular vectors u you get by calculating the eigenvectors of M M* may each be +1 or −1 times the singular vectors u_v corresponding to the eigenvectors v of M* M. (Can you see that this is so?)
-
- More generally, if there are degenerate singular vectors v, i.e. more than one sharing the same singular value, then an SV u that you get from M M* may be any unit linear combination of the vectors u_v that correspond to the vectors v that share that singular value. (See the section in the article on whether the SVD is uniquely determined.)
-
- The way round this, if you are calculating the SVD by hand, is to calculate the vectors v, then multiply each by M; then σ_v = the magnitude of M v, and u_v = (1/σ_v) M v. That will ensure you get left and right singular vectors that match. Jheald 09:48, 4 December 2006 (UTC)
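- In code, that recipe might look like the following NumPy sketch, assuming M is tall (m >= n) and only the non-zero singular values are wanted (the example matrix and the tolerance are just for illustration):
 import numpy as np

 M = np.random.rand(3, 2)
 evals, V = np.linalg.eigh(M.T @ M)           # right singular vectors from M* M
 order = np.argsort(evals)[::-1]              # largest eigenvalue first
 V = V[:, order]
 sigma = np.sqrt(np.clip(evals[order], 0.0, None))
 # u_v = (1/sigma_v) M v gives left singular vectors whose signs match V
 U_r = np.column_stack([M @ V[:, i] / sigma[i]
                        for i in range(len(sigma)) if sigma[i] > 1e-12])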
-
- For the null space of M you can use any set of orthonormal u vectors that correspond to zero eigenvalues of M M*; you can find v vectors (if any more are needed) by finding the eigenvectors that correspond to zero eigenvalues of M* M. Alternatively, you can find either set of zero singular vectors by orthogonalising random vectors with respect to the vectors you've already got (u or v respectively), until you've successfully found a full basis. Jheald 10:07, 4 December 2006 (UTC)
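- And the basis completion itself can be done by orthogonalising random vectors against the columns you already have, e.g. (a sketch; U_r is assumed to be the m-by-r matrix of left singular vectors found so far, as in the sketch above):
 import numpy as np

 def complete_to_orthogonal(U_r):
     """Extend an m-by-r matrix with orthonormal columns to a full m-by-m
     orthogonal matrix by orthogonalising random vectors against it."""
     m, r = U_r.shape
     Q, _ = np.linalg.qr(np.hstack([U_r, np.random.rand(m, m - r)]))
     return np.hstack([U_r, Q[:, r:]])        # keep U_r, append the new orthonormal columns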
[edit] Singular values, singular vectors, and their relation to the SVD
The first math expression in this section appears in the code but not the page, at least on my screen. I don't know enough about the expression mark-up to fix it. proteus71 17:22, 2 Nov 2005 (UTC)
- Could you please check whether it is still not appearing? It appears on my screen. It might have something to do with the servers being slow today (probably because they're overloaded). -- Jitse Niesen (talk) 22:30, 2 November 2005 (UTC)
[edit] Freedom in U and V
I removed the following text:
- "If M is real, and the singular values are all distinct, then unique orthogonal matrices U and VT can be obtained. Otherwise U and V can be transformed by any unitary matrix S that commutes with Σ, such that U' = U S and (V')* = S* V*."
It is always possible to replace U and V by −U and −V, respectively. For the second part to be true, we need n = m.
Obviously, there is something to be said about the freedom in U and V. There is some discussion about it further down, but there are probably things to add. -- Jitse Niesen (talk) 00:06, 14 January 2006 (UTC)
[edit] Recent changes
In the intro, I changed 'is analogous' (to eigenvalue decomposition) to 'is similar', and added some text which makes it clearer that the two decompositions are not too similar. Also, I dropped the square characterization of symmetric and Hermitian matrices since such matrices are always square.
In the 'Singular values, singular vectors, and their relation to the SVD' section I changed "if" to "if and only if", since this is a definition of what a singular value is. Also, we must require that u and v are normalized in order to talk about a singular value. Added some text about the relation between the theorem and the singular values and vectors defined here. Added a definition of what a degenerate singular value is. Tried to make it clearer when and in what way the SVD is unique or not.
The discussion on the relation with eigenvalue decomposition is expanded and given a section of its own.
Added a discussion on the relation between SVD and the range and null space of M. Changed singular value = sqrt(eigenvalue) to singular value squared = eigenvalue.
Added a discussion on range and null space in "applications" section.
Moved small section on rank to applications, since it is an application and it follows from the range and null space stuff.
--KYN 20:28, 16 January 2006 (UTC)
- Great stuff. I tried to further clarify the uniqueness question; hope that's okay with you. Two further comments: firstly, it is quite common for mathematicians to use "if" in definitions where it should logically be "if and only if" (see, for instance, If and only if#Definitions); secondly, out of curiosity, why must u and v be normalized? I checked a couple of books and they all defined the singular vectors as columns of the matrices U and V, which means that they are indeed of unit length, but I don't see a particular reason to demand that the singular vectors are normalized. -- Jitse Niesen (talk) 22:07, 17 January 2006 (UTC)
-
- You want the singular vectors to be normalised because (i) that's what makes them useful as unit basis vectors; and (ii) it's only if you normalise the singular vectors that you get the singular values coming out as well-defined gain factors in the Σ part of the decomposition. -- Jheald 22:23, 17 January 2006 (UTC).
- Could you please explain (ii)? I don't see it. Sorry to be such a nuisance, but I'm just trying to understand why eigenvectors are not normalized while singular vectors are. -- Jitse Niesen (talk) 22:44, 17 January 2006 (UTC)
-
- Eigenvectors don't need to be normalised if the only property that you're interested in is A v = λ v. That is often enough the case that you can work quite happily with un-normalised eigenvectors. But if you're using the eigenvectors for a spectral decomposition, A = V Λ V*, then those column vectors in V of course do need to be normalised.
-
- The difference with SVD is that it is essentially always the property M = U Σ V* that makes SVD practically useful, and for that the singular vectors need to be normalised; so it is essentially only normalised singular vectors that are ever of interest. It is of course true that M (k v) would still equal σ (k u), but this relation is not what makes the SVD useful. -- Jheald 23:27, 17 January 2006 (UTC).
-
- I agree, and add that my copy of Golub and Van Loan, Matrix Computations (2nd ed, page 71), states that u and v are "unit 2-norm vectors". To make the above point even more clear: assume that M v = σ u and M* u = σ v for normalized u and v. Now look at the same relations with u replaced by 2u and σ by σ': M v = 2σ' u and M* (2u) = σ' v. Identifying with the first relations gives (1) σ = 2σ' and (2) σ = σ'/2, which implies σ = σ' = 0! Consequently, we need to impose some restrictions on u and v in order to make an interpretation of the two relations. This fact is not obvious from the outset, and is maybe worth mentioning somewhere in the article? --KYN 10:08, 18 January 2006 (UTC)
- In reply to Jheald, I kind of see your point, but the relation Mv = σu is important for the geometric interpretation of the SVD, which was historically the motivation. However, I'll accept this as the reason for the normalization condition.
- In reply to KYN, what you showed is that if you replace u by 2u and σ is not zero, the relations M v = σ u and M* u = σ v cannot be satisfied, unless you replace v by 2v. This is also what Jheald means with his/her last sentence: you have to scale u and v by the same factor. If you do that, σ does not change. -- Jitse Niesen (talk) 12:23, 18 January 2006 (UTC)
[edit] In "Statement of the theorem" section
it says in the bullet list that U and V "describe" the rows or columns of M. This sounds a bit fuzzy. What exactly do they describe, and in what way? --KYN 22:45, 16 January 2006 (UTC)
[edit] Usage of ∗-symbol
Hi, I have just seen that the ∗-symbol, especially "<sup>∗</sup>" as used in the paragraph "Relation to eigenvalue decomposition", is not rendered correctly in Opera8 and IE (because of the font I've chosen?) while being correctly shown in FF. Is there a reason not to just use a simple asterisk instead of the ∗? I mean, the asterisk is used everywhere else for this purpose. --(Tobi)134.109.148.28 14:53, 21 March 2006 (UTC)
- I think the ∗ symbol looks a bit nicer, but it's not essential. And because it does not display correctly on some systems, it has to go, so I removed it. Thanks for your comment. -- Jitse Niesen (talk) 05:18, 22 March 2006 (UTC)
[edit] How to compute compact or truncated SVD?
Is there a way to compute SVD eigenvectors in a sequence? That is to say, first compute the eigenvector for the largest eigenvalue, then compute the eigenvector for the second largest eigenvalue, etc. 04 March 2006
- Yes, this is possible. The idea is that the singular values of A are related to the eigenvalues of A* A, so you can apply an iterative eigenvalue algorithm like the Lanczos iteration to find the eigenvalues of the symmetric matrix A* A. I'm afraid I can't tell you the details though. One problem is that you might lose accuracy when forming the product A* A. I note that Matlab uses the augmented matrix [0 A; A* 0] instead. -- Jitse Niesen (talk) 04:43, 5 April 2006 (UTC)
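- To make the idea concrete, here is a rough SciPy sketch of the augmented-matrix trick (the matrix A and the value of k are made up; eigsh is ARPACK's Lanczos-type symmetric eigensolver, and scipy.sparse.linalg.svds packages a Lanczos-based partial SVD along these lines):
 import numpy as np
 from scipy.sparse.linalg import eigsh

 A = np.random.rand(200, 80)
 m, n = A.shape
 k = 5                                        # how many singular triplets we want

 # symmetric augmented matrix; its eigenvalues are +/- the singular values of A
 B = np.block([[np.zeros((m, m)), A],
               [A.T,              np.zeros((n, n))]])
 vals, vecs = eigsh(B, k=k, which='LA')       # k largest (positive) eigenvalues via Lanczos
 order = np.argsort(vals)[::-1]
 sigmas = vals[order]                         # largest singular values of A, biggest first
 U_k = np.sqrt(2) * vecs[:m, order]           # matching left singular vectors
 V_k = np.sqrt(2) * vecs[m:, order]           # matching right singular vectors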
Q: Thanks for the answer. But as I understand it, the Lanczos iteration can only find the eigenvector corresponding to the largest eigenvalue. After the largest eigenvector has been computed, how is the eigenvector corresponding to the second largest eigenvalue computed? I would appreciate it if you could give a specific reference. Apr 05 2006
- Sorry, I forgot that you're after the vectors. It seems that Lanczos can do this, at least this is what the theorem in Section VI.6.3 in Y. Saad, Numerical methods for large eigenvalue problems, indicates. I know only very little about large-scale numerical linear algebra, so I can only recommend you to look in that book (which is online at Saad's web site). Another interesting reference is Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980. I haven't read it, but it's a whole book so it's bound to touch on computing the eigenvectors, though it might be a bit out-dated. -- Jitse Niesen (talk) 02:06, 6 April 2006 (UTC)
-
- One way to get your head around the Lanczos method is to imagine using the Conjugate gradient method to minimise the potential function φ(x) = (1/2) x^T A x − b^T x for some known positive definite symmetric matrix A, starting from an initial position x = x_0, and then iteratively getting closer to the solution x = A^{-1} b.
-
- As a reminder, at each step of conjugate gradient we minimise the function in a particular search direction; then at the minimum point for that search direction, we calculate the negative of the gradient, which is just b − A x_n. This gives a new direction, orthogonal to the subspace we have just minimised over. However, although the negative of the gradient gives the direction of steepest descent, it can be shown that it is actually suboptimal in its component in the direction we have just come from: the optimal search direction is the conjugate direction, which is a linear combination of the new gradient and the direction we examined immediately previously.
-
- By searching along the conjugate direction, rather than the steepest descent direction, we don't botch up the minimisation w.r.t. the previous search directions, so there is none of the "zig-zagging" that you get using pure gradient descent; instead, by the end of the nth step, the Conjugate gradient algorithm should have accurately minimised the quadratic function in the space spanned by [A x_0, A^2 x_0, A^3 x_0, ..., A^n x_0] added to the initial guess x_0.
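- As a concrete reference point, here is a minimal NumPy sketch of the conjugate gradient update just described, for the quadratic φ(x) = (1/2) x^T A x − b^T x; no preconditioning or convergence test, purely to show the conjugate-direction step:
 import numpy as np

 def conjugate_gradient(A, b, x0, n_steps):
     x = x0.copy()
     r = b - A @ x                            # negative gradient at the current point
     d = r.copy()                             # first search direction = steepest descent
     for _ in range(n_steps):
         Ad = A @ d
         alpha = (r @ r) / (d @ Ad)           # exact line minimisation along d
         x = x + alpha * d
         r_new = r - alpha * Ad               # new negative gradient
         beta = (r_new @ r_new) / (r @ r)
         d = r_new + beta * d                 # conjugate direction, not the raw gradient
         r = r_new
     return x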
-
- The connection with the Lanczos algorithm is this: suppose we want to give an estimate of the matrix A, just from that sequence of gradients we have calculated. The conjugate gradient property is equivalent to saying that the truncation of A in the Krylov space [A x_0, A^2 x_0, ..., A^n x_0] can be represented by U T U^T, where T is a symmetric tridiagonal matrix (i.e. a diagonal and one off-diagonal, repeated above and below), and U is the (normalised) set of (already orthogonal) gradient directions that we have progressively calculated.
-
- It is quick to eigen-decompose T into V Λ V^T as it is (i) not very big (usually we will stop after a number of iterations n which is very much smaller than the full rank of A), and (ii) already tridiagonal.
-
- So this quite readily gives the eigen-decomposition of the truncation of A in this Krylov space.
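- For anyone who wants to experiment, here is a bare-bones NumPy sketch of the Lanczos build-up of T just described; it does no reorthogonalisation, so it will eventually show the loss-of-orthogonality problem mentioned below (the test matrix and the number of steps are arbitrary):
 import numpy as np

 def lanczos(A, x0, steps):
     m = A.shape[0]
     Q = np.zeros((m, steps))                 # orthonormal Krylov basis
     alpha = np.zeros(steps)                  # diagonal of T
     beta = np.zeros(steps - 1)               # off-diagonal of T
     q, q_prev, b = x0 / np.linalg.norm(x0), np.zeros(m), 0.0
     for j in range(steps):
         Q[:, j] = q
         w = A @ q - b * q_prev               # three-term recurrence
         alpha[j] = q @ w
         w -= alpha[j] * q
         if j < steps - 1:
             b = np.linalg.norm(w)
             beta[j] = b
             q_prev, q = q, w / b
     T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
     return Q, T

 A = np.random.rand(200, 200); A = A + A.T    # symmetric test matrix
 Q, T = lanczos(A, np.random.rand(200), 30)
 theta, S = np.linalg.eigh(T)                 # cheap: T is small and tridiagonal
 ritz_vectors = Q @ S                         # approximate extreme eigenvectors of A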
-
- Note that this is not the same as the first n eigenvalues and eigenvectors of A. However, because of the wonders of exponential increase, it should give you quite a good handle on, say, the first n/4 eigenvalues and eigenvectors of A, even if they are hardly present in your initial seed vector x_0. (The more different the largest eigenvalues are from the rest of the pack, the more quickly more of their eigenvectors will become well approximated.)
-
- For more details on Conjugate gradient, and how it relates to the Lanczos method, see eg Golub and van Loan (3e) section 10.2 (pp. 520-532). A more direct presentation of the Lanczos algorithm by itself is given earlier in chapter 9 (pp.470-490).
-
- Note that in real-world numerics (as opposed to idealised infinite precision numerics), issues arise, essentially because the minimisation can't be done perfectly, so the new gradient vectors aren't automatically perfectly orthogonalised to the foregoing ones, and if left untreated this creates a seed which can allow duplicate appearances of particular eigenvectors to build up. The numerical analysis of how this occurs has been well known since the early 70s; various fixes (of various levels of computational expense) are known, and discussed in the book.
-
- It's well worth playing with simple codes to get your own feel for what's going on; but if you need to apply it really seriously, for that reason it may well be worth searching out pre-canned routines which have been tuned into the ground both for accuracy and for efficiency.
-
- Hope this helps,
-
- All best -- Jheald 20:55, 6 April 2006 (UTC).
-
- BTW: At the moment the WP articles on Lanczos algorithm, Gradient descent, and Conjugate gradient method could all use some work IMO. In particular
- the Gradient descent article should make much clearer what is wrong with Gradient descent as a method, and why Conjugate gradient fixes it.
- the CG article isn't so bad so far as it goes, but should make it much clearer that CG is a method for general nonlinear minimisation, not just for solving linear systems; more clearly explain (with a picture?) why it fixes the problems of gradient descent; and explain how the approximation to the curvature (Hessian) matrix can be built up through successive CG minimisations.
- the Lanczos method article could use a lot more expansion, particularly regarding explanation, motivation, some of the numerical issues that arise, and how to address them.
- I can't offer any time of my own at the moment to fix them, sadly; but it would be rather good if somebody could have a go. -- Jheald 21:13, 6 April 2006 (UTC).
-
- BTW (2): The Lanczos method is particularly good if either (i) you're only interested in the largest k eigenvectors, where k << n the size of the matrix; and/or (ii) if it is unusually fast to apply the matrix (or operator) A to a vector (e.g. if A is sparse; or if it is (say) a Fourier operator).
-
- In cases where you just want a quicker SVD, e.g. because you know the matrix is very non-square (m >> n), and you're not interested in the null-space singular vectors, then, as the article says, you can save a lot of time by doing a QR decomposition first, to give an m × n orthogonal Q matrix and an n × n triangular matrix R, and then calculating the much smaller SVD of R (an approach called R-SVD). Also note that often (including in R-SVD) it is not worth explicitly calculating the coefficients of the orthogonal matrices: don't calculate them at all if you don't need them; or, if you're only going to apply them to a few vectors, apply the elementary rotations directly to those vectors as the rotations are calculated, and never explicitly form the orthogonal matrices at all.
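- A short NumPy illustration of the R-SVD idea just described (the sizes are invented; np.linalg.qr returns the thin Q by default):
 import numpy as np

 M = np.random.rand(10000, 50)                # very non-square: m >> n
 Q, R = np.linalg.qr(M)                       # Q is m-by-n with orthonormal columns, R is n-by-n
 Ur, s, Vt = np.linalg.svd(R)                 # the much smaller n-by-n SVD
 U = Q @ Ur                                   # thin left singular vectors of M
 assert np.allclose(M, U @ np.diag(s) @ Vt)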
-
- Most "canned" SVD routines should give you some or all of these options. Golub and van Loan (3e) gives an idea of how much the required flop counts can be reduced, depending on how much of the SVD you don't actually need, on their page 254. -- Jheald 21:34, 6 April 2006 (UTC).
[edit] Possible errata in "Relation to eigenvalue decomposition"
I think instead of "the squares of the non-zero singular values of M" it should say "the squares of the modulus of the non-zero singular values of M". Don't you think so?
Singular values are by definition non-negative, therefore real. Mct mht 17:40, 20 June 2006 (UTC)
[edit] Dense article
I find this article overly dense. When I was trying to understand the SVD, it was unhelpful. For example, compare with http://www.uwlax.edu/faculty/will/svd/svd/index.html, where the intuition behind the matrix decomposition is presented using a traditional graphic form.