Lagrange multipliers


Fig. 1. Drawn in green is the locus of points satisfying the constraint g(x,y) = c. Drawn in blue are contours of f. Arrows represent the gradient, which points in a direction normal to the contour.

In mathematical optimization problems, the method of Lagrange multipliers, named after Joseph-Louis Lagrange, finds the local extrema of a function of several variables subject to one or more constraints. It reduces a problem in n variables with k constraints to a solvable problem in n + k variables with no constraints. The method introduces a new unknown scalar variable, the Lagrange multiplier, for each constraint, and forms a linear combination in which the multipliers appear as coefficients.

The method is justified by standard arguments from partial differentiation, using either total differentials or their close relatives, the chain rules. The objective is to find conditions under which the derivative of a function with respect to the independent variables vanishes for some set of inputs, with the constraints regarded as defining an implicit function.


Introduction

Consider a two-dimensional case. Suppose we have a function, f(x,y), to maximize subject to

g\left( x,y \right) = c,

where c is a constant. We can visualize contours of f given by

f \left( x, y \right)=d_n

for various values of d_n, and the contour of g given by g(x,y) = c.

Suppose we walk along the contour line with g = c. In general the contour lines of f and g are distinct, so the contour line for g = c will generally cross the contour lines of f. This is equivalent to saying that while moving along the contour line for g = c, the value of f can vary. Only where the contour line for g = c touches a contour line of f tangentially, so that the lines touch but do not cross, does the value of f neither increase nor decrease. This occurs at the constrained local extrema and the constrained inflection points of f.

A familiar example can be obtained from weather maps, with their contour lines for temperature and pressure: the constrained extrema will occur where the superposed maps show touching lines (isopleths).

Geometrically, the tangency condition says that the gradients of f and g are parallel vectors at the constrained extremum, since the gradients are always normal to the contour lines. Introducing an unknown scalar λ, we solve

\nabla \Big[f \left(x, y \right) + \lambda \left(g \left(x, y \right) - c \right) \Big] = 0

for λ ≠ 0.

Once values of λ are determined, we are back to the original number of variables and can proceed to find extrema of the new unconstrained function

F \left( x , y \right) = f \left( x , y \right) + \lambda \left( g \left( x , y \right) - c \right)

in traditional ways. That is, F(x, y) = f(x, y) for all (x, y) satisfying the constraint, because g(x, y) − c vanishes there, while the extrema of F found this way are constrained to lie on g(x, y) = c.
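To make this recipe concrete, here is a minimal sketch in Python with the SymPy library (a tooling assumption; the problem of maximizing f(x, y) = x + y on the unit circle is a hypothetical example, not one from this article):

# A minimal sketch: solve grad[ f + lam*(g - c) ] = 0 with SymPy.
# Illustrative choice: f = x + y, constraint g = x^2 + y^2 with c = 1.
from sympy import symbols, diff, solve

x, y, lam = symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2
F = f + lam * (g - 1)  # the unconstrained function F(x, y)

# Stationarity in x and y, plus the constraint recovered from lam.
eqs = [diff(F, v) for v in (x, y, lam)]
print(solve(eqs, [x, y, lam], dict=True))
# Two solutions, (1/sqrt(2), 1/sqrt(2)) and (-1/sqrt(2), -1/sqrt(2));
# f attains its constrained maximum at the first.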

The method of Lagrange multipliers

Let f be a function defined on R^n, and let the constraints be given by g_k(x) = 0 (perhaps by moving the constant to the left-hand side, as in g_k(x) − c = 0). Now, define the Lagrangian, Λ, as

\Lambda(\mathbf x, \boldsymbol \lambda) = f + \sum_k \lambda_k g_k.

Observe that both the optimization criterion and the constraints g_k are compactly encoded as stationary points of the Lagrangian:

\nabla_{\mathbf x} \Lambda = 0 \Leftrightarrow \nabla_{\mathbf x} f = - \sum_k \lambda_k \nabla_{\mathbf x} g_k,

and

\nabla_{\boldsymbol \lambda} \Lambda = 0 \Leftrightarrow g_k = 0.
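As a hedged illustration of this construction, the following SymPy sketch (the three-variable objective and the two constraints are illustrative assumptions) builds Λ and sets its full gradient to zero, enforcing both displayed conditions at once:

# Sketch: stationary points of the Lagrangian encode both
# grad f = -sum_k lam_k * grad g_k and the constraints g_k = 0.
from sympy import symbols, diff, solve

x, y, z, l1, l2 = symbols('x y z l1 l2', real=True)
f = x**2 + y**2 + z**2            # illustrative objective
gs = [x + y + z - 1, x - y]       # illustrative constraints g_k = 0
lams = [l1, l2]

Lam = f + sum(l * g for l, g in zip(lams, gs))
eqs = [diff(Lam, v) for v in (x, y, z, l1, l2)]
print(solve(eqs, [x, y, z, l1, l2], dict=True))
# Single stationary point x = y = z = 1/3 (the constrained minimum),
# with multipliers l1 = -2/3 and l2 = 0.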

Often the Lagrange multipliers have an interpretation as some salient quantity of interest. To see why this might be the case, observe that:

\frac{\partial \Lambda}{\partial {g_k}} = \lambda_k.

Thus, λ_k is the rate of change of the quantity being optimized as a function of the constraint variable. As examples, in Lagrangian mechanics the equations of motion are derived by finding stationary points of the action, the time integral of the difference between kinetic and potential energy. Thus, the force on a particle due to a scalar potential, F = −∇V, can be interpreted as a Lagrange multiplier determining the change in action (transfer of potential to kinetic energy) following a variation in the particle's constrained trajectory. In economics, the optimal profit to a player is calculated subject to a constrained space of actions, where a Lagrange multiplier is the value of relaxing a given constraint (e.g. through bribery or other means).
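This rate-of-change interpretation can be checked symbolically. The sketch below (SymPy again, reusing the hypothetical circle example with the constraint level c left symbolic) confirms that the optimal value f*(c) satisfies df*/dc = −λ; the minus sign reflects the convention F = f + λ(g − c) used in the introduction:

# Sketch: the multiplier as a shadow price. Maximize f = x + y
# subject to x^2 + y^2 = c; the optimal value is f*(c) = sqrt(2c).
from sympy import symbols, diff, solve, sqrt, simplify

x, y, lam = symbols('x y lam', real=True)
c = symbols('c', positive=True)
F = x + y + lam * (x**2 + y**2 - c)
sols = solve([diff(F, v) for v in (x, y, lam)], [x, y, lam], dict=True)

# The maximizer is the branch with lam < 0 under this convention.
best = [s for s in sols if s[lam].is_negative][0]
fstar = sqrt(2 * c)                          # optimal value f*(c)
print(simplify(diff(fstar, c) + best[lam]))  # -> 0, i.e. df*/dc = -lam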

The method of Lagrange multipliers is generalized by the Karush-Kuhn-Tucker conditions.

Example

Simple example

Suppose we want to find the maximum of

f(x, y) = x^2 y

with the condition that the x and y coordinates lie on the circle around the origin with radius √3, that is,

x^2 + y^2 = 3.

As there is just a single condition, we will use only one multiplier, say λ.

Use the constraint to define a function g(x, y):

g(x, y) = x^2 + y^2 - 3.

The function g is identically zero on the circle of radius √3. So any multiple of g(x, y) may be added to f(x, y), leaving f(x, y) unchanged on the circle. Let

\Phi(x, y, \lambda) = f(x, y) + \lambda g(x, y) = x^2 y + \lambda (x^2 + y^2 - 3).

The critical values of Φ occur when its gradient is zero. The partial derivatives are

\begin{align}
\frac{\partial \Phi}{\partial x}       &= 2 x y + 2 \lambda x &&= 0, \qquad \text{(i)} \\
\frac{\partial \Phi}{\partial y}       &= x^2 + 2 \lambda y   &&= 0, \qquad \text{(ii)} \\
\frac{\partial \Phi}{\partial \lambda} &= x^2 + y^2 - 3       &&= 0. \qquad \text{(iii)}
\end{align}

Equation (iii) is just the original constraint. Equation (i) implies x = 0 or λ = −y. If x = 0, then equation (iii) forces y = ±√3 and equation (ii) gives λ = 0; at the points (0, ±√3) the objective takes the value f = 0. Otherwise, substituting λ = −y into equation (ii) gives

x^2 - 2y^2 = 0.

Then x^2 = 2y^2. Substituting into equation (iii) and solving for y gives

y = \pm 1.

Together with the two points from the case x = 0, there are six critical points in all; the four with x ≠ 0 are

(\sqrt{2},1); \quad (-\sqrt{2},1); \quad (\sqrt{2},-1); \quad (-\sqrt{2},-1).

Evaluating the objective at these points (on the constraint Φ = f, since g vanishes there), we find f(0, ±√3) = 0 and

\Phi(\sqrt{2},1) = 2; \quad \Phi(-\sqrt{2},1) = 2; \quad \Phi(\sqrt{2},-1) = -2; \quad \Phi(-\sqrt{2},-1) = -2.

Therefore, the objective function attains a maximum at

(\sqrt{2},1) \quad\text{and}\quad (-\sqrt{2},1),

and a minimum at (\sqrt{2},-1) and (-\sqrt{2},-1); the two critical points with x = 0 give neither the constrained maximum nor the minimum.
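The hand computation above can be replayed mechanically. A SymPy sketch (a tooling assumption) recovers the same six critical points and critical values:

# Sketch: solve the system (i)-(iii) for the example f = x^2 * y
# on the circle x^2 + y^2 = 3.
from sympy import symbols, diff, solve

x, y, lam = symbols('x y lam', real=True)
Phi = x**2 * y + lam * (x**2 + y**2 - 3)
sols = solve([diff(Phi, v) for v in (x, y, lam)], [x, y, lam], dict=True)

for s in sols:
    print((s[x], s[y]), 'f =', (x**2 * y).subs(s))
# Six critical points: (+-sqrt(2), +-1) with f = +-2,
# and (0, +-sqrt(3)) with f = 0.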

Another example

Suppose we wish to find the discrete probability distribution p_1, p_2, …, p_n with maximal information entropy. The entropy is

f(p_1,p_2,\ldots,p_n) = -\sum_{k=1}^n p_k\log_2 p_k.

Of course, the sum of these probabilities equals 1, so our constraint is g(p) = 1 with

g(p_1,p_2,\ldots,p_n)=\sum_{k=1}^n p_k.

We can use Lagrange multipliers to find the point of maximum entropy as a function of the probabilities. For all k from 1 to n, we require that

\frac{\partial}{\partial p_k}(f+\lambda (g-1))=0,

which gives

\frac{\partial}{\partial p_k}\left(-\sum_{i=1}^n p_i \log_2 p_i + \lambda \left(\sum_{i=1}^n p_i - 1\right) \right) = 0.

Carrying out the differentiation of these n equations, we get

-\left(\frac{1}{\ln 2}+\log_2 p_k \right)  + \lambda = 0.

This shows that all p_k are equal (each depends only on λ). By using the constraint ∑_k p_k = 1, we find

p_k = \frac{1}{n}.

Hence, the uniform distribution is the distribution with the greatest entropy.
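For a small concrete case, the stationarity conditions can also be solved mechanically. The sketch below (SymPy, n = 3, with natural-log entropy as a convenience assumption, since changing the base of the logarithm rescales only the multiplier, not the maximizer) reproduces p_k = 1/3:

# Sketch: maximum-entropy distribution for n = 3.
from sympy import symbols, log, diff, solve

p1, p2, p3 = symbols('p1 p2 p3', positive=True)
lam = symbols('lam', real=True)
f = -(p1*log(p1) + p2*log(p2) + p3*log(p3))   # natural-log entropy
L = f + lam * (p1 + p2 + p3 - 1)

# Stationarity gives -log(p_k) - 1 + lam = 0, the same for every k,
# so all p_k are equal, as argued above.
pk = solve(diff(L, p1), p1)[0]        # p_k = exp(lam - 1)
lam_val = solve(3*pk - 1, lam)[0]     # impose sum p_k = 1
print(pk.subs(lam, lam_val))          # -> 1/3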

Economics

Constrained optimization plays a central role in economics. For example, the choice problem for a consumer is represented as one of maximizing a utility function subject to a budget constraint. The Lagrange multiplier has an economic interpretation as the shadow price associated with the constraint, in this case the marginal utility of income.
