Wolfe conditions


In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969.[1][2]

In these methods the idea is to find

\min_{\mathbf{x}} f(\mathbf{x})

for some smooth f:\mathbb{R}^n \to \mathbb{R}. Each step often involves approximately solving the subproblem

\min_{\alpha} f(\mathbf{x}_k + \alpha\mathbf{p}_k)

where \mathbf{x}_k is the current best guess, \mathbf{p}_k \in \mathbb{R}^n is a search direction, and \alpha \in \mathbb{R} is the step length.

Inexact line searches provide an efficient way of computing an acceptable step length \alpha that reduces the objective function 'sufficiently', rather than minimizing the objective function over \alpha \in \mathbb{R}^{+} exactly. A line search algorithm can use the Wolfe conditions as a requirement for any guessed \alpha, before finding a new search direction \mathbf{p}_k.

Armijo rule and curvature

Denote a univariate function \phi restricted to the direction \mathbf{p}_k as \phi(\alpha) = f(\mathbf{x}_k + \alpha\mathbf{p}_k). A step length \alpha_k is said to satisfy the Wolfe conditions if the following two inequalities hold:

i) f(\mathbf{x}_k + \alpha_k\mathbf{p}_k) \leq f(\mathbf{x}_k) + c_1\alpha_k\,\mathbf{p}_k^{\mathrm{T}}\nabla f(\mathbf{x}_k),
ii) \mathbf{p}_k^{\mathrm{T}}\nabla f(\mathbf{x}_k + \alpha_k\mathbf{p}_k) \geq c_2\,\mathbf{p}_k^{\mathrm{T}}\nabla f(\mathbf{x}_k),

with 0 < c_1 < c_2 < 1. (In examining condition (ii), recall that to ensure that \mathbf{p}_k is a descent direction, we have \mathbf{p}_k^{\mathrm{T}}\nabla f(\mathbf{x}_k) < 0.)

c_1 is usually chosen to be quite small while c_2 is much larger; Nocedal and Wright give example values of c_1 = 10^{-4}[3] and c_2 = 0.9 for Newton or quasi-Newton methods and c_2 = 0.1 for the nonlinear conjugate gradient method. Inequality i) is known as the Armijo rule[4] and ii) as the curvature condition; i) ensures that the step length \alpha_k decreases f 'sufficiently', and ii) ensures that the slope has been reduced sufficiently.
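
The two tests translate directly into code. The following is a minimal sketch in Python with NumPy, not from the original article; the names wolfe_conditions_hold, f and grad_f, and the toy quadratic in the usage example are illustrative assumptions.

import numpy as np

def wolfe_conditions_hold(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    # Check the (weak) Wolfe conditions i) and ii) for a trial step length alpha.
    # f is the objective R^n -> R, grad_f its gradient, x the current iterate x_k,
    # and p a descent direction p_k, so p @ grad_f(x) < 0 is assumed.
    g0 = grad_f(x)
    slope0 = p @ g0                                        # phi'(0), negative for a descent direction
    x_new = x + alpha * p
    armijo = f(x_new) <= f(x) + c1 * alpha * slope0        # condition i), Armijo rule
    curvature = p @ grad_f(x_new) >= c2 * slope0           # condition ii), curvature condition
    return armijo and curvature

# Usage on a simple quadratic, taking the steepest-descent direction:
f = lambda x: 0.5 * x @ x
grad_f = lambda x: x
x = np.array([1.0, -2.0])
p = -grad_f(x)
print(wolfe_conditions_hold(f, grad_f, x, p, alpha=0.5))   # True: this step satisfies i) and ii)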

Strong Wolfe condition on curvature

The Wolfe conditions, however, can result in a value for the step length that is not close to a minimizer of \phi . If we modify the curvature condition to the following,

iia) \big|\mathbf{p}_k^{\mathrm{T}}\nabla f(\mathbf{x}_k + \alpha_k\mathbf{p}_k)\big| \leq c_2\big|\mathbf{p}_k^{\mathrm{T}}\nabla f(\mathbf{x}_k)\big|

then i) and iia) together form the so-called strong Wolfe conditions, and force \alpha _{k} to lie close to a critical point of \phi .
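
In code, only the curvature test changes. Continuing the hedged Python sketch above (strong_wolfe_hold is again an illustrative name, not from the article):

def strong_wolfe_hold(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    # Conditions i) and iia): the curvature test now bounds the magnitude of the new slope.
    g0 = grad_f(x)
    slope0 = p @ g0
    x_new = x + alpha * p
    armijo = f(x_new) <= f(x) + c1 * alpha * slope0                 # condition i)
    strong_curvature = abs(p @ grad_f(x_new)) <= c2 * abs(slope0)   # condition iia)
    return armijo and strong_curvature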

The principal reason for imposing the Wolfe conditions in an optimization algorithm where \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha\mathbf{p}_k is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between \mathbf{p}_k and the gradient,

\cos\theta_k = \frac{\nabla f(\mathbf{x}_k)^{\mathrm{T}}\mathbf{p}_k}{\|\nabla f(\mathbf{x}_k)\|\,\|\mathbf{p}_k\|}

is bounded away from zero and conditions i) and ii) hold, then \nabla f(\mathbf{x}_k) \rightarrow 0.
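
The standard route to this result is Zoutendijk's condition, which is not stated in the article; a sketch under the usual additional assumptions that f is bounded below and \nabla f is Lipschitz continuous:

\sum_{k \geq 0} \cos^2\theta_k\,\|\nabla f(\mathbf{x}_k)\|^2 < \infty .

If |\cos\theta_k| \geq \delta > 0 for all k, the terms of this convergent series can only tend to zero if \|\nabla f(\mathbf{x}_k)\| \rightarrow 0, which gives the stated convergence of the gradient.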

An additional motivation, in the case of a quasi-Newton method, is that if \mathbf{p}_k = -B_k^{-1}\nabla f(\mathbf{x}_k), where the matrix B_k is updated by the BFGS or DFP formula, then if B_k is positive definite, condition ii) implies that B_{k+1} is also positive definite.
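
A brief sketch of why, writing \mathbf{s}_k = \alpha_k\mathbf{p}_k and \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k) (notation introduced here for illustration, not used elsewhere in the article): condition ii) gives

\mathbf{s}_k^{\mathrm{T}}\mathbf{y}_k = \alpha_k\,\mathbf{p}_k^{\mathrm{T}}\big(\nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)\big) \geq \alpha_k(c_2 - 1)\,\mathbf{p}_k^{\mathrm{T}}\nabla f(\mathbf{x}_k) > 0,

since c_2 < 1 and \mathbf{p}_k^{\mathrm{T}}\nabla f(\mathbf{x}_k) < 0; this curvature inequality \mathbf{s}_k^{\mathrm{T}}\mathbf{y}_k > 0 is precisely what the BFGS and DFP updates require to preserve positive definiteness.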

References

  • "Line Search Methods". Numerical Optimization. Springer Series in Operations Research and Financial Engineering. 2006. pp. 30–32. doi:10.1007/978-0-387-40065-5_3. ISBN 978-0-387-30303-1. 
  • "Quasi-Newton Methods". Numerical Optimization. Springer Series in Operations Research and Financial Engineering. 2006. pp. 135–163. doi:10.1007/978-0-387-40065-5_6. ISBN 978-0-387-30303-1. 
  1. Wolfe, P. (1969). "Convergence Conditions for Ascent Methods". SIAM Review 11 (2): 226–235. doi:10.1137/1011036. JSTOR 2028111.
  2. Wolfe, P. (1971). "Convergence Conditions for Ascent Methods. II: Some Corrections". SIAM Review 13 (2): 185–188. doi:10.1137/1013035.
  3. Nocedal, Jorge; Wright, Stephen (1999). Numerical Optimization. 
  4. Armijo, Larry (1966). "Minimization of functions having Lipschitz continuous first partial derivatives". Pacific J. Math. 16 (1): 1–3. 