Talk:Gradient descent

From Wikipedia, the free encyclopedia

[edit] Search in trial direction

I was under the impression that gradient descent did a line search in the trial direction (direction of steepest gradient). Can anyone else confirm this? njh 04:45, 25 November 2005 (UTC)

I guess you are saying that given a direction, one should find the optimal step in that direction. That is I guess also gradient descent, but from what I know, in practice it is very hard to do that. So one just takes some reasonable step in one direction, and then from there takes a new direction, as this article illustrates. Did I understand your question right? Oleg Alexandrov (talk) 05:59, 25 November 2005 (UTC)
Usually one finds a trial direction, then uses a one dimensional minimisation algorithm like brent's method along the vector thus defined. This is generally a better idea than making arbitrary small steps as you only need to compute the gradient at the minimum in each direction (of course it still suffers from zigzaging). Conjugate gradient method extends this to ensure that zigzaging doesn't happen. njh 08:37, 25 November 2005 (UTC)
I myself used gradient descent only for infinite-dimensional optimization, when the variable of optimization is not a vector, but a function. Then it is really a pain to find the optimal step. :) And you don't need to use an arbitrarily small step size as you say, you can make it adaptive, so you can try to jump a bit more at each step than at the previous one, and scale back if you jump too much. Of course, in this way you are not guaranteed to find the true minimum, but those infinite-dimensional problems often times have an infinite number of local minima to start with, so it is not as if you lost something. Oleg Alexandrov (talk) 15:54, 25 November 2005 (UTC)

[edit] Someone needs to correct the definition

Doesn't sound logical to me: "algorythm that approaches a local minimumof a function" mean that the algorythm itself is approaching something. What?

I am interested in what I can do (or what I can find) with this algorythm.

Inyuki 21:46, 2 November 2006 (UTC)

You are right. That poor wording has been there from the very beginning. I rewrote the article at some point but failed to notice that.
This algorithm is used to find local minima of functions. Oleg Alexandrov (talk) 04:40, 3 November 2006 (UTC)

[edit] Understanding the algorythm

Excusme, maybe someone could tell me, if I got it right, I tried to write it in procedural code here: http://howto.wikia.com/wiki/Gradient_descent

Thank you.

Inyuki 23:33, 7 November 2006 (UTC)

Looks okay to me, for the very basic algorithm. One problem is that you're unlikely to land on a point where the gradient is exactly zero. Terminate the loop when the gradient is "small enough" (choosing what this means is the hard part). -- Jitse Niesen (talk) 01:10, 8 November 2006 (UTC)
In a many applications, the minimum derivative is a user-input variable. --Cowbert 05:04, 29 December 2006 (UTC)