Nonlinear conjugate gradient method

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function $f(x)$:

\displaystyle f(x)=\|Ax-b\|^2

The minimum of $f$ is obtained when the gradient is 0:

\nabla_x f=2 A^\top(Ax-b)=0

Whereas the linear conjugate gradient method seeks a solution to the linear equation $A^\top Ax=A^\top b$, the nonlinear conjugate gradient method is generally used to find a local minimum of a nonlinear function using its gradient $\nabla_x f$ alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.
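As a small numerical check of the quadratic case, the following sketch evaluates the gradient $2A^\top(Ax-b)$ at the solution of the normal equations $A^\top Ax=A^\top b$ and confirms that it vanishes up to roundoff. The matrix `A`, the vector `b` and the helper names `matvec` and `grad_f` are illustrative assumptions, not part of the method itself.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def grad_f(A, b, x):
    """Gradient 2 A^T (A x - b) of f(x) = ||A x - b||^2."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]   # residual A x - b
    At = [list(col) for col in zip(*A)]                # transpose of A
    return [2.0 * g for g in matvec(At, r)]

# Illustrative data (assumed): 3 equations, 2 unknowns.
A = [[2.0, 0.0], [1.0, 3.0], [0.0, 1.0]]
b = [1.0, 2.0, 3.0]

# Form the normal equations A^T A x = A^T b.
At = [list(col) for col in zip(*A)]
AtA = [matvec(At, row) for row in At]   # entry (i, j) is column_i . column_j
Atb = matvec(At, b)

# Solve the 2x2 system by Cramer's rule (adequate for this tiny example).
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x_star = [
    (Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det,
    (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det,
]

# The gradient at the normal-equations solution is zero up to roundoff.
print(x_star, grad_f(A, b, x_star))
```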

Given a function $\displaystyle f(x)$ of $N$ variables to minimize, its gradient $\nabla_x f$ indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:

\Delta x_0=-\nabla_x f (x_0)

with an adjustable step length $\displaystyle \alpha$ and performs a line search in this direction until it reaches the minimum of $\displaystyle f$ :

\displaystyle \alpha_0:= \arg \min_\alpha f(x_0+\alpha \Delta x_0)

\displaystyle x_1=x_0+\alpha_0 \Delta x_0
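For the quadratic $f(x)=\|Ax-b\|^2$ introduced above, this first line search can be worked out in closed form, which may help make the step concrete. Writing $r_0=Ax_0-b$ (a shorthand introduced here only for this example), so that $\Delta x_0=-2A^\top r_0$, the function along the search direction is a parabola in $\alpha$:

f(x_0+\alpha \Delta x_0)=\|r_0\|^2+2\alpha\, r_0^\top A\,\Delta x_0+\alpha^2\|A\,\Delta x_0\|^2

Setting its derivative with respect to $\alpha$ to zero and using $r_0^\top A\,\Delta x_0=(A^\top r_0)^\top \Delta x_0=-\tfrac{1}{2}\Delta x_0^\top \Delta x_0$ gives

\alpha_0=-\frac{r_0^\top A\,\Delta x_0}{\|A\,\Delta x_0\|^2}=\frac{\Delta x_0^\top \Delta x_0}{2\,\|A\,\Delta x_0\|^2}

For a general nonlinear $f$ no such formula is available, and $\alpha_0$ has to be found by a numerical one-dimensional search.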

After this first iteration in the steepest direction $\displaystyle \Delta x_0$ , the following steps constitute one iteration of moving along a subsequent conjugate direction $\displaystyle s_n$ , where $\displaystyle s_0=\Delta x_0$ :

1. Calculate the steepest direction: $\Delta x_n=-\nabla_x f (x_n)$,
2. Compute $\beta_n$ according to one of the formulas below,
3. Update the conjugate direction: $s_n=\Delta x_n+\beta_n s_{n-1}$,
4. Perform a line search: optimize $\alpha_n=\arg \min_{\alpha} f(x_n+\alpha s_n)$,
5. Update the position: $x_{n+1}=x_{n}+\alpha_{n} s_{n}$.
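The iteration above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the line search is a golden-section search that assumes $f$ is unimodal along each search direction, $\beta_n$ is taken as $\max\{0,\beta^{PR}\}$, the direction is additionally reset every $N$ iterations, and the names `line_search` and `nonlinear_cg` as well as the convex test function are made up for this example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def line_search(phi, hi=1.0, tol=1e-9):
    """Golden-section search for argmin phi(alpha) over alpha >= 0.

    Assumes phi is unimodal (e.g. convex) along the search direction.
    """
    while phi(2 * hi) < phi(hi):           # grow the bracket while phi still decreases
        hi *= 2
    lo, hi = 0.0, 2 * hi
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if phi(c) < phi(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=200):
    def move(x, s):
        """Line search along s from x, then update the position."""
        alpha = line_search(lambda a: f([xi + a * si for xi, si in zip(x, s)]))
        return [xi + alpha * si for xi, si in zip(x, s)]

    n_vars = len(x0)
    dx_prev = [-g for g in grad(x0)]       # steepest direction at x_0
    s = dx_prev                            # s_0 is the steepest descent direction
    x = move(x0, s)                        # first iteration: plain steepest descent
    for n in range(1, max_iter):
        dx = [-g for g in grad(x)]         # steepest direction at x_n
        if math.sqrt(dot(dx, dx)) < tol:   # tolerance criterion on the gradient norm
            break
        diff = [a - b for a, b in zip(dx, dx_prev)]
        beta = max(0.0, dot(dx, diff) / dot(dx_prev, dx_prev))  # max{0, Polak-Ribiere}
        if n % n_vars == 0:
            beta = 0.0                     # periodic reset to steepest descent
        s = [d + beta * si for d, si in zip(dx, s)]             # conjugate direction
        x = move(x, s)
        dx_prev = dx
    return x

# Assumed convex test function with its minimum at (1, -0.5).
def f(p):
    u, v = p[0] - 1.0, p[1] + 0.5
    return u ** 4 + u ** 2 + 2.0 * v ** 2 + u * v

def grad(p):
    u, v = p[0] - 1.0, p[1] + 0.5
    return [4.0 * u ** 3 + 2.0 * u + v, 4.0 * v + u]

x_min = nonlinear_cg(f, grad, [3.0, 1.0])
print(x_min)
```

A golden-section search is only one possible choice; practical implementations often use an inexact line search instead, and the stopping rule shown here (gradient norm below a tolerance) is just one of the tolerance criteria that could be used.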

With a pure quadratic function the minimum is reached within $N$ iterations (excepting roundoff error), but a non-quadratic function will make slower progress. Subsequent search directions lose conjugacy, so the search direction must be reset to the steepest descent direction at least every $N$ iterations, or sooner if progress stops. However, resetting every iteration turns the method into steepest descent. The algorithm stops when it finds the minimum, which is declared when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.

Within a linear approximation, the parameters $\displaystyle \alpha$ and $\displaystyle \beta$ are the same as in the linear conjugate gradient method but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys where the steepest descent method slows down and follows a criss-cross pattern.
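Both claims, termination within $N$ iterations on a pure quadratic and the slow criss-cross progress of steepest descent in a narrow valley, can be illustrated on a two-variable example. The sketch below uses $f(x)=\|Ax-b\|^2$ with an assumed ill-conditioned diagonal `A`, the Fletcher–Reeves formula for $\beta_n$, and the exact step length $\alpha_n=\Delta x_n^\top s_n/(2\|As_n\|^2)$, which follows from minimizing the parabola $f(x_n+\alpha s_n)$ in $\alpha$. The starting point is chosen deliberately so that steepest descent zigzags, and all names are illustrative.

```python
import math

# Assumed ill-conditioned quadratic f(x) = ||A x - b||^2 with minimum at (1, 1).
A = [[1.0, 0.0], [0.0, 10.0]]
b = [1.0, 10.0]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def neg_grad(x):
    """Steepest direction -2 A^T (A x - b)."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return [-2.0 * dot(col, r) for col in zip(*A)]   # columns of A are rows of A^T

def exact_step(dx, s):
    """Exact minimizer of f(x + alpha s) for this quadratic."""
    As = matvec(A, s)
    return dot(dx, s) / (2.0 * dot(As, As))

def iterations(use_cg, x, tol=1e-8, max_iter=10000):
    """Count iterations until the gradient norm drops below tol."""
    dx = neg_grad(x)
    s = dx
    for n in range(max_iter):
        if math.sqrt(dot(dx, dx)) < tol:
            return n
        alpha = exact_step(dx, s)
        x = [xi + alpha * si for xi, si in zip(x, s)]
        dx_new = neg_grad(x)
        # Fletcher-Reeves for conjugate gradient; beta = 0 gives steepest descent.
        beta = dot(dx_new, dx_new) / dot(dx, dx) if use_cg else 0.0
        s = [d + beta * si for d, si in zip(dx_new, s)]
        dx = dx_new
    return max_iter

x0 = [-9.0, 0.9]                       # assumed start that makes steepest descent zigzag
cg_iters = iterations(True, x0)        # conjugate gradient
sd_iters = iterations(False, x0)       # steepest descent (reset every iteration)
print(cg_iters, sd_iters)
```

With these assumptions the conjugate gradient run should need only $N=2$ iterations, while the steepest descent run needs far more because each step reduces the gradient norm by only about 2%.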

Four of the best-known formulas for $\beta_n$ are named after their developers:

Fletcher–Reeves:

\beta_{n}^{FR} = \frac{\Delta x_n^\top \Delta x_n} {\Delta x_{n-1}^\top \Delta x_{n-1}}

Polak–Ribière:

\beta_{n}^{PR} = \frac{\Delta x_n^\top (\Delta x_n-\Delta x_{n-1})} {\Delta x_{n-1}^\top \Delta x_{n-1}}

Hestenes–Stiefel:

\beta_n^{HS} = -\frac{\Delta x_n^\top (\Delta x_n-\Delta x_{n-1})} {s_{n-1}^\top (\Delta x_n-\Delta x_{n-1})}

Dai–Yuan:

\beta_{n}^{DY} = -\frac{\Delta x_n^\top \Delta x_n} {s_{n-1}^\top (\Delta x_n-\Delta x_{n-1})}

These formulas are equivalent for a quadratic function minimized with exact line searches, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is $\beta=\max\{0,\,\beta^{PR}\}$, which provides a direction reset automatically.
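The equivalence on a quadratic can be checked numerically. The following sketch writes the four formulas as a single hypothetical helper `betas` and evaluates them during the first two iterations on an assumed three-variable quadratic $f(x)=\|Ax-b\|^2$, using the exact step length $\alpha_n=\Delta x_n^\top s_n/(2\|As_n\|^2)$ that is specific to this quadratic. The data and names are illustrative only.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def betas(dx, dx_prev, s_prev):
    """The four formulas in terms of the current and previous steepest directions."""
    y = [p - q for p, q in zip(dx, dx_prev)]          # difference of steepest directions
    return {
        "FR": dot(dx, dx) / dot(dx_prev, dx_prev),
        "PR": dot(dx, y) / dot(dx_prev, dx_prev),
        "HS": -dot(dx, y) / dot(s_prev, y),
        "DY": -dot(dx, dx) / dot(s_prev, y),
    }

# Assumed quadratic test problem f(x) = ||A x - b||^2 in three variables.
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 0.0, 3.0]]
b = [1.0, 2.0, 3.0]

def matvec(M, v):
    return [dot(row, v) for row in M]

def neg_grad(x):
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return [-2.0 * dot(col, r) for col in zip(*A)]    # -2 A^T (A x - b)

def exact_step(dx, s):
    As = matvec(A, s)
    return dot(dx, s) / (2.0 * dot(As, As))           # exact line search for this f

x = [0.0, 0.0, 0.0]
dx = neg_grad(x)
s = dx
history = []
for n in (1, 2):
    alpha = exact_step(dx, s)
    x = [xi + alpha * si for xi, si in zip(x, s)]
    dx_new = neg_grad(x)
    b_n = betas(dx_new, dx, s)
    history.append(b_n)
    s = [d + b_n["FR"] * si for d, si in zip(dx_new, s)]
    dx = dx_new

for b_n in history:
    print(b_n)    # the four values agree up to roundoff on this quadratic
```

On a non-quadratic function, or with an inexact line search, the four values generally differ, which is why the choice of formula matters in practice; the variant $\max\{0,\beta^{PR}\}$ would simply be `max(0.0, b_n["PR"])` in this notation.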

Newton-based methods (the Newton–Raphson algorithm and quasi-Newton methods such as BFGS) tend to converge in fewer iterations, although each iteration typically requires more computation than a conjugate gradient iteration: Newton's method requires computing the Hessian (matrix of second derivatives) in addition to the gradient, while quasi-Newton methods build and update an approximation to it from successive gradients. Quasi-Newton methods also require more memory to operate (see also the limited-memory L-BFGS method).

External links

An Introduction to the Conjugate Gradient Method Without the Agonizing Pain by Jonathan Richard Shewchuk.
A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property by Y. H. Dai and Y. Yuan.

This article is issued from Wikipedia - version of the Tuesday, December 08, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
