BFGS method

In mathematics, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is an iterative method for solving unconstrained nonlinear optimization problems.

The BFGS method is derived from Newton's method in optimization, a class of hill-climbing optimization techniques that seek a stationary point of a function, where the gradient is zero. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point.

In quasi-Newton methods, the Hessian matrix of second derivatives of the function to be minimized does not need to be computed at any stage. Instead, the Hessian approximation is updated by analyzing successive gradient vectors. Quasi-Newton methods are a generalization of the secant method for finding a root of the first derivative to multidimensional problems. In more than one dimension the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution. The BFGS method is one of the most successful members of this class[citation needed].

Rationale

The search direction \mathbf{p}_k at stage k is given by the solution of the analogue of the Newton equation

 B_k \mathbf{p}_k = - \nabla f(\mathbf{x}_k).

A line search in the direction \mathbf{p}_k is then used to find the next point \mathbf{x}_{k+1}.

Instead of computing the full Hessian matrix at the point \mathbf{x}_{k+1} as B_{k+1}, the approximate Hessian at stage k is updated by the addition of two matrices:

B_{k+1} = B_k + U_k + V_k.

Both U_k and V_k are rank-one matrices, but they are constructed from different bases. Rank one here means that each can be written as an outer product of two vectors,

C = \mathbf{a}\mathbf{b}^T.

Equivalently, U_k and V_k together form a rank-two update matrix, which is robust against the scaling problems that often affect gradient descent searches. The quasi-Newton condition imposed on this update, as in Broyden's method (the multidimensional analogue of the secant method), is

B_{k+1} (\mathbf{x}_{k+1}-\mathbf{x}_k ) = \nabla f(\mathbf{x}_{k+1}) -\nabla f(\mathbf{x}_k ).
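As an illustrative check (a minimal NumPy sketch, not part of the article; the arrays B, s and y are synthetic), the rank-two update used by BFGS, given explicitly in step 4 of the Algorithm section below, can be verified numerically to satisfy this quasi-Newton condition:

    import numpy as np

    # Sketch with synthetic data: check that the BFGS rank-two update
    #   B_{k+1} = B_k + y y^T / (y^T s) - (B_k s)(B_k s)^T / (s^T B_k s)
    # satisfies the quasi-Newton (secant) condition  B_{k+1} s = y.
    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))
    B = A @ A.T + n * np.eye(n)        # an arbitrary symmetric positive-definite B_k
    s = rng.standard_normal(n)         # plays the role of x_{k+1} - x_k
    y = rng.standard_normal(n)         # plays the role of the gradient difference
    if y @ s <= 0:                     # enforce the curvature condition y^T s > 0 for the demo
        y = -y

    Bs = B @ s
    B_next = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
    print(np.allclose(B_next @ s, y))  # True: the update reproduces the condition above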

Algorithm

From an initial guess \mathbf{x}_0 and an approximate Hessian matrix B_0, the following steps are repeated until \mathbf{x}_k converges to the solution.

  1. Obtain a search direction \mathbf{p}_k by solving B_k \mathbf{p}_k = -\nabla f(\mathbf{x}_k).
  2. Perform a line search to find the optimal step size \alpha_k in the direction found in the first step, then set \mathbf{s}_k = \alpha_k \mathbf{p}_k and update \mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k.
  3. Set \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k).
  4. Update B_{k+1} = B_k + \frac{\mathbf{y}_k \mathbf{y}_k^{\top}}{\mathbf{y}_k^{\top} \mathbf{s}_k} - \frac{B_k \mathbf{s}_k (B_k \mathbf{s}_k)^{\top}}{\mathbf{s}_k^{\top} B_k \mathbf{s}_k}.

f(\mathbf{x}) denotes the objective function to be minimized. Convergence can be checked by observing the norm of the gradient, \left\|\nabla f(\mathbf{x}_k)\right\|. In practice, B_0 can be initialized with B_0 = I, so that the first step is equivalent to a gradient descent step; subsequent steps are progressively refined by B_k, the approximation to the Hessian.
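For concreteness, the four steps can be written as a short NumPy sketch (an illustration under stated assumptions, not a reference implementation: \alpha_k comes from a simple backtracking line search rather than an exact one, and the update in step 4 is skipped whenever the curvature condition \mathbf{y}_k^{\top}\mathbf{s}_k > 0 fails):

    import numpy as np

    def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
        # Minimal sketch of the four steps above; not robust production code.
        x = np.asarray(x0, dtype=float)
        B = np.eye(x.size)                    # B_0 = I: the first step is a gradient descent step
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:       # convergence test on the gradient norm
                break
            p = np.linalg.solve(B, -g)        # step 1: solve B_k p_k = -grad f(x_k)
            alpha = 1.0                       # step 2: backtracking (Armijo) line search
            while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p) and alpha > 1e-12:
                alpha *= 0.5
            s = alpha * p                     # s_k = x_{k+1} - x_k
            x = x + s
            y = grad(x) - g                   # step 3: gradient difference y_k
            if y @ s > 1e-12:                 # step 4: rank-two update of B_k
                Bs = B @ s
                B += np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
        return x

    # Usage example: minimize the Rosenbrock function, whose minimizer is (1, 1).
    f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
    grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                               200 * (x[1] - x[0]**2)])
    print(bfgs(f, grad, [-1.2, 1.0]))

Skipping the update when \mathbf{y}_k^{\top}\mathbf{s}_k \le 0 keeps B_k positive definite, so each \mathbf{p}_k remains a descent direction.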

The first step of the algorithm is carried out using the inverse of the matrix B_k, which is usually obtained efficiently by applying the Sherman–Morrison formula to step 4 of the algorithm, giving

B_{k+1}^{-1} = B_k^{-1} + \frac{(\mathbf{s}_k^{\top}\mathbf{y}_k + \mathbf{y}_k^{\top} B_k^{-1} \mathbf{y}_k)(\mathbf{s}_k \mathbf{s}_k^{\top})}{(\mathbf{s}_k^{\top} \mathbf{y}_k)^2} - \frac{B_k^{-1} \mathbf{y}_k \mathbf{s}_k^{\top} + \mathbf{s}_k \mathbf{y}_k^{\top} B_k^{-1}}{\mathbf{s}_k^{\top} \mathbf{y}_k}.
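In code, this means the linear solve in step 1 can be replaced by a matrix-vector product with an explicitly maintained inverse, written H_k = B_k^{-1} below purely for illustration (the symbol H_k is not used in the article). A sketch of that arrangement:

    import numpy as np

    def update_inverse_hessian(H, s, y):
        # Sketch: apply the inverse update above, where H approximates B_k^{-1}
        # (assumed symmetric), s = x_{k+1} - x_k and y is the gradient difference.
        sy = s @ y
        Hy = H @ y
        return (H
                + (sy + y @ Hy) * np.outer(s, s) / sy**2
                - (np.outer(Hy, s) + np.outer(s, Hy)) / sy)

    # With H maintained this way, step 1 becomes p = -H @ grad_f(x), an O(n^2)
    # operation per iteration instead of the O(n^3) linear solve against B_k.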

In statistical estimation problems, where f is for example a negative log-likelihood, credible intervals or confidence intervals for the solution can be estimated from the inverse of the final Hessian matrix.
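As a hedged sketch of that use (assuming f is a negative log-likelihood, so the final Hessian approximates the observed information matrix; the function name below is hypothetical):

    import numpy as np

    def approximate_confidence_intervals(x_min, H_inv, z=1.96):
        # x_min: the minimizer returned by BFGS; H_inv: the final inverse Hessian approximation.
        # Standard errors are the square roots of the diagonal of the inverse Hessian,
        # giving roughly 95% intervals for z = 1.96.
        se = np.sqrt(np.diag(H_inv))
        return np.column_stack((x_min - z * se, x_min + z * se))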

Bibliography

  • Broyden, C. G. (1970). "The Convergence of a Class of Double-rank Minimization Algorithms". Journal of the Institute of Mathematics and Its Applications 6: 76-90.
  • Fletcher, R. (1970). "A New Approach to Variable Metric Algorithms". Computer Journal 13: 317-322.
  • Goldfarb, D. (1970). "A Family of Variable Metric Updates Derived by Variational Means". Mathematics of Computation 24: 23-26.
  • Shanno, D. F. (1970). "Conditioning of Quasi-Newton Methods for Function Minimization". Mathematics of Computation 24: 647-656.
  • Avriel, Mordecai (2003). Nonlinear Programming: Analysis and Methods. Dover Publishing. ISBN 0-486-43227-0.