Talk:Conjugate gradient method

From Wikipedia, the free encyclopedia

1 Needs example
2 First few lines are messy
3 Stretching, preconditioning
4 Removal of "Nonlinear conjugate gradient" section
5 Comment
6 Another version of algorithm
7 Thanks
8 Good information, but could be clearer
9 Bold for vectors
10 Philefou's Octave Implementation
11 description of iterative method
12 Notation and clarification.
13 Re-render equations

[edit] Needs example

This article needs an example usage case, preferably with picture. —Preceding unsigned comment added by 18.243.0.128 (talk) 17:09, 30 March 2008 (UTC)

[edit] First few lines are messy

I think that it is more correct to say: Conjugate gradient is a line search method for optimizing a function of several variables. It is often applied to solve linear systems of equations. (save the details for later)

This article is too condensed. Could you please give some more explications? Where do the alphas come from? What does mean: "This suggests taking the first basis vector p1 to be the gradient of f at x = x0, which equals -b" ? Sorry, I dont understand what you are talking about.

Ulrich

[edit] Stretching, preconditioning

From reading Shewchuk's PDF (linked from this article), I'm beginning to understand this. I'm still a bit unclear about what the conjugacy condition does compared with what preconditioning does. As I understand it, the problem with steepest descent is that long valleys lead to zigzagging. It seems like picking conjugate orthogonal directions essentially stretches the energy space so that all valleys are circular rather than long and thin (see Shewchuk's figure 22 on p. 23). Is that accurate? (If so, it seems like a much simpler way of explaining the solution.) The thing is, this sounds a lot like what preconditioning is supposed to accomplish. That is, we want to use M so that $M - 1 A$ has similar eigenvalues. Could someone clarify? Thanks. —Ben FrantzDale 19:11, 17 January 2007 (UTC)

[edit] Removal of "Nonlinear conjugate gradient" section

I would like to explain this edit I made. While it is true that Wikipedia material evolves in time, and starting with at least something is better than nothing, and people will eventually fix and expand on things, nevertheless, I find it a bad idea to insert content which you deliberatly know to need a lot more work.

I would still understand creating a poor quality new article. However, I would be opposed to taking a reasonably well written article as this one, and adding to it material you do not consider of the same quality. Such material will (in my experience) linger for ages without anybody touching it, resulting in a decreased overall quality of the article.

I suggest to BenFrantzDale to either fork it off to a new article, or actually make the effort of making sure the material is already good enough. Comments? Oleg Alexandrov (talk) 03:27, 18 January 2007 (UTC)

Fair enough. I started nonlinear conjugate gradient method. If it gets good enough it might be worth merging with this one or it might not. —Ben FrantzDale 03:36, 18 January 2007 (UTC)

OK. Actually I think Nonlinear conjugate gradient method has enough info to stand on its own. Although it would be nice to be expanded I think. Oleg Alexandrov (talk) 04:17, 18 January 2007 (UTC)

[edit] Comment

In your algorithm, the formula to calculate pk differs from Jonathan Richard Shewchuk paper. The index of r should be k instead of k-1. Mmmh, sorry, it seams to be correct! ;-) —The preceding unsigned comment was added by 171.66.40.105 (talk • contribs) 01:45, 21 February 2007 (UTC)

[edit] Another version of algorithm

I'm using such algorithm. IMHO it's clearer, and easier to understand.

$r_1 := b - A x_1 \,$

$p_1 := r_1 \,$

$k := 1 \,$

repeat until $r_k \,$ is "sufficiently small":

$k := k + 1 \,$

$\alpha_k := \frac{r_k^\top r_k}{p_k^\top A p_k} \,$

$r_{k+1} := r_k - \alpha_k A p_k \,$

$\beta_k := \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k} \,$

$p_{k+1} := r_{k+1} + \beta_k p_k \,$

$x_{k+1} := x_k - \alpha_k p_k \,$

end repeat

The result is $x_k \,$

—Preceding unsigned comment added by 149.156.83.157 (talk • contribs) 10:11, 30 March 2007

I agree, that's a bit clearer so I changed it in the article. I changed it to check for the remainder immediately after you computed it, which I think is more logical. Thanks for your comment! -- Jitse Niesen (talk) 13:50, 30 March 2007 (UTC)

I don't know for me it is more logical in while() expresion. When implementing CG, You can save additional vector (4, instead of 5) when written in reversed order, but it's only technical problem, not so important. y

Yes, the algorithm is not optimized and it doesn't need to be. It seems we disagree about the logical place for the test. Do change it if you feel strongly about it, because I don't really care. Incidentally, profuse thanks (if it was you) for fixing my sign error. -- Jitse Niesen (talk) 04:07, 31 March 2007 (UTC)

[edit] Thanks

I had a conversation two days ago with the CTO of a small high-tech company in Silicon Valley (the specific details are not relevant here). At some point we needed to see how exactly the conjugate gradient works. He said (literally) "Wikipedia must have an article about that." And sure enough, the article was there. It felt gratifying to have a personal experience of how much Wikipedia is used. In particular, Jitse, thanks for this very nice article. The time you spent on it was well-worth it. :) Oleg Alexandrov (talk) 19:44, 31 March 2007 (UTC)

That's very good to hear. Fortunately it wasn't an expert; that would be embarrassing as I think that the presentation can be much improved. I suppose the check with my fee is in the mail? ;) -- Jitse Niesen (talk) 04:51, 1 April 2007 (UTC)

[edit] Good information, but could be clearer

I came across the conjugate gradient method when programming it for a class in numerics for solving elliptic PDEs. I used the information here in conjunction with our own notes, and found it useful for coding the algorithm. I think the information could be made a bit clearer and and the formatting and presentation of this article improved a little. While everyone will probably already know this, maybe A^T could be made clearer as A Transpose (just to make it quite clear) at the beginning. The diagram's information could be linked to the text and vice-versa. Currently, the diagram seems to stand on its own. (Starscreaming Crackerjack 09:57, 18 August 2007 (UTC))

[edit] Bold for vectors

I found this page useful, but it would help immensely if the vectors and matrices were in bold font. I sat staring for a while at a value that was treated like a vector but looked like a scalar summation. —Preceding unsigned comment added by 138.64.2.77 (talk) 20:16, 11 September 2007 (UTC)

[edit] Philefou's Octave Implementation

In the past I was using Octave and needed the lsqr function, which is currently unavailable in Octave (http://www.gnu.org/software/octave/projects.html). I was planning to implement it with the conjugate gradient method, but I see Philefou added Octave code on January 14th, 2008. Since he/she has already written the code, I highly recommend he/she submit it to the Octave team. It should probably conform to this specification: http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/ref/lsqr.html If Philefou reads this and decides not to contribute the code, please leave an appropriate comment and I (or maybe someone else) will write an implementation of lsqr. 168.7.230.201 (talk) 18:36, 21 January 2008 (UTC)

[edit] description of iterative method

The description of the iterative method is confusing, particularly the formula defining p_{k+1}. I changed it to what it sounds like it needs to be in order to make the previous paragraphs have a lick of sense, but I'm still not sure it's right. Can someone fix this? (I guess I will eventually if no one else does, but I don't have my books on hand...)--141.211.61.254 (talk) 23:54, 4 May 2008 (UTC)

[edit] Notation and clarification.

Could someone explain the beta? The article explains that $α k$ is the kth component of the solution in the p basis (i.e., it is how far to go at the kth step). But what about beta? We have

$\beta_k := \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k}=\frac{\|r_{k+1}\|^2}{\|r_k\|^2}$

and it gets used to find the next p according to

$p_{k+1} := r_{k+1} + \beta_k p_k \,$

The preceding section says that we'll update p by

$p_{k+1} = r_{k} - \frac{p_k^\top A r_{k}}{p_k^\top A p_k} p_k$

and

r k + 1 = r k - α A p k

r k = r k + 1 + α A p k

$p_{k+1} = r_{k+1} + \alpha A p_k - \frac{p_k^\top A r_{k}}{p_k^\top A p_k} p_k = r_{k+1} + \alpha A p_k - \frac{\langle p_k, r_{k}\rangle_A}{\langle p_k, p_k\rangle_A} p_k$

How does this algebra work out and what does the beta mean? Maybe it's the end of the day and my algebra is just failing me. Also, the Octave code should have similar variable names to the mathematics. —Ben FrantzDale (talk) 22:56, 15 May 2008 (UTC)

[edit] Re-render equations

Some equation images are not rendered correctly. You can see this on horizontal bars in some equations, it is blurry whereas it should be sharp. Or was that a joke, because it's about gradients? (see here, the second equation has the top symbols rendered blurry, whereas the first eq has a distinct line)