Talk:Levenberg-Marquardt algorithm

From Wikipedia, the free encyclopedia

Contents

[edit] Possible replacement article?

Could I please recommend that people interested in this article look at Sam Roweis' document on Levenberg Marquardt at <http://www.cs.toronto.edu/~roweis/notes/lm.pdf>. It seems to be a very complete discription. If it were simply pasted in (pdf2html?) it would answer the questions below and add detail. This would have to be checked with the author, of course, which I will be happy to do. I am new to this, so my etiquette is probably short of the mark, but I hope this is useful.

I suggest that if you think there is valuable insights in Roweis' article you might use the information there to improve the existing wikipedia article. The questions below are not fundamentally important, but the article can be improved. Certainly no case for wholesale replacement.Billlion 09:49, 5 September 2006 (UTC)

[edit] correction

Should

(JTJ + λ)q = -JTf

be

(JTJ + λ I)q = -JTf. ?

The preceding unsigned comment was added by 129.13.70.88 (talk • contribs) on 08:18, 29 August 2005.

Both formulas have the same meaning, namely JTJq + λq = −JTf. As far as I am concerned, it is a matter of taste which you prefer, but the second is perhaps clearer. -- Jitse Niesen (talk) 20:46, 30 August 2005 (UTC)

[edit] Transpose

Where f and p are introduced:

fT=(f1, ..., fm), and pT=(p1, ..., pn).

they are presented with superscript T, which I think indicates that they are transposed. But f extends over 1..m, and p extends over 1..n, so I don't understand why both are transposed. Can someone please explain what I am missing here? Thanks. --CarlManaster 22:49, 15 December 2005 (UTC)

I think the reasoning is that (f1, ..., fm) is a row vector. However, we want f to be a column vector. So the transpose is used to convert from rows to columns. It is perhaps clearer to write
f=(f1, ..., fm)T, and p=(p1, ..., pn)T.
To be honest, many mathematical texts do not bother to distinguish between row and column vectors, hoping that the reader can deduce from the context which one is meant.
Perhaps you know this, and your question is: why do we want to use a column vector? Well, f has to be a column vector because fTf further down the article should be a inner product, and p has to be a column vector because it's added to q, which is a column vector because otherwise the product Jq does not make sense. -- Jitse Niesen (talk) 23:19, 15 December 2005 (UTC)

[edit] Improvement by Marqardt

What's about the improvement Marquardt made to the algorithm? He replaced I by diag[H], i.e. the diagonal of the (approximated) Hessian, to incorporate some local curvature estimation. This makes the algorithm go further in directions of smaller gradient to get out of narrow valleys on the error surface.

last entry by G.k. 13:46, 27 April 2006 (UTC)

I believe this is correct. It also has the important practical benefit of making λ dimensionless, so λ=1 is (I think) a sensible general starting point.