Chain rule

From Wikipedia, the free encyclopedia

In calculus, the chain rule is a formula for the derivative of the composite of two functions.

In intuitive terms, if a variable, y, depends on a second variable, u, which in turn depends on a third variable, x, then the rate of change of y with respect to x can be computed as the rate of change of y with respect to u multiplied by the rate of change of u with respect to x.

1 Definition
2 Examples
3 Chain rule for several variables
4 Proof of the chain rule
5 The fundamental chain rule
6 Tensors and the chain rule
7 Higher derivatives
8 See also

Definition

The chain rule states that

$(f \circ g)'(x) = (f(g(x)))' = f'(g(x)) g'(x),\,$

which in short form is written as $(f \circ g)' = (f'\circ g)\cdot g'$.

Alternatively, in Leibniz notation, the chain rule is

$\frac {df}{dx} = \frac {df} {dg} \frac {dg}{dx}.$

In integration, the counterpart to the chain rule is the substitution rule.
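The formula $(f \circ g)'(x) = f'(g(x))\,g'(x)$ can be checked numerically. The sketch below is only an illustration: the functions `f` and `g`, the sample point `x0`, and the helper `central_difference` are assumptions chosen for the example, not part of the statement of the rule.

```python
import math

def central_difference(func, x, step=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + step) - func(x - step)) / (2 * step)

# Hypothetical example functions, chosen only for illustration.
def g(x):
    return x ** 2 + 1            # inner function

def g_prime(x):
    return 2 * x

def f(u):
    return math.exp(u)           # outer function

def f_prime(u):
    return math.exp(u)

def composite(x):
    return f(g(x))

x0 = 0.7
chain_rule_value = f_prime(g(x0)) * g_prime(x0)      # f'(g(x)) * g'(x)
numeric_value = central_difference(composite, x0)    # direct estimate of (f o g)'(x)

print(chain_rule_value, numeric_value)
```

The two printed numbers should agree up to the error of the finite-difference approximation.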

Examples

Example I

Suppose that one is climbing a mountain at a rate of 0.5 kilometres per hour. The temperature is lower at higher elevations; suppose it decreases at a rate of 6 °C per kilometre. Multiplying 6 °C per kilometre by 0.5 kilometres per hour gives 3 °C per hour, the rate at which the climber experiences the temperature falling. This calculation is a typical application of the chain rule.
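The arithmetic of this example can be written out as a short sketch. The linear models, the starting elevation of 1 km, and the sea-level temperature of 20 °C are assumptions made only so that the functions can be evaluated; the two rates are the ones given in the example, with the temperature rate taken as negative because it decreases with height.

```python
# Rates from the example; the sign convention is an assumption.
CLIMB_RATE_KM_PER_H = 0.5      # d(elevation) / d(time)
LAPSE_RATE_C_PER_KM = -6.0     # d(temperature) / d(elevation), cooler higher up

def elevation_km(hours, start_km=1.0):
    """Elevation after `hours` of climbing (hypothetical starting height)."""
    return start_km + CLIMB_RATE_KM_PER_H * hours

def temperature_c(km, sea_level_c=20.0):
    """Temperature at a given elevation (hypothetical sea-level temperature)."""
    return sea_level_c + LAPSE_RATE_C_PER_KM * km

# Chain rule: dT/dt = (dT/dh) * (dh/dt)
temperature_rate = LAPSE_RATE_C_PER_KM * CLIMB_RATE_KM_PER_H

# Direct check: temperature change over the first hour of the climb.
observed_change = temperature_c(elevation_km(1.0)) - temperature_c(elevation_km(0.0))

print(temperature_rate, observed_change)
```

Because both models are linear, the product of the two rates equals the observed change over one hour exactly.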

Example II

Consider $f(x) = (x^2 + 1)^3$. We have $f(x) = h(g(x))$ where $g(x) = x^2 + 1$ and $h(x) = x^3$. Thus,

$f'(x) = 3(x^2 + 1)^2(2x) = 6x(x^2 + 1)^2.$

In order to differentiate the trigonometric function

$f(x) = \sin(x^2),\,$

one can write $f(x) = h(g(x))$ with $h(x) = \sin x$ and $g(x) = x^2$. The chain rule then yields

$f'(x) = 2x \cos(x^2) \,$

since $h'(g(x)) = \cos(x^2)$ and $g'(x) = 2x$.
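Both derivatives in this example can be compared against a finite-difference estimate. This is a minimal sketch; the helper `central_difference`, the function names, and the sample points are assumptions introduced for the check.

```python
import math

def central_difference(func, x, step=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + step) - func(x - step)) / (2 * step)

def poly(x):
    return (x ** 2 + 1) ** 3

def poly_prime(x):
    return 6 * x * (x ** 2 + 1) ** 2      # result derived with the chain rule

def wave(x):
    return math.sin(x ** 2)

def wave_prime(x):
    return 2 * x * math.cos(x ** 2)       # result derived with the chain rule

sample_points = [-1.5, 0.0, 0.5, 2.0]     # arbitrary test points

max_poly_error = max(
    abs(central_difference(poly, x) - poly_prime(x)) for x in sample_points
)
max_wave_error = max(
    abs(central_difference(wave, x) - wave_prime(x)) for x in sample_points
)

print(max_poly_error, max_wave_error)
```

Both errors should be small compared with the derivative values themselves.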

Example III

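To differentiate $\arctan(\sin x)$, start from the derivative of the arctangent:

$\frac{d}{dx}\arctan x = \frac{1}{1+x^2}$

By the chain rule, for any differentiable function $f$,

$\frac{d}{dx}\arctan f(x) = \frac{f'(x)}{1+(f(x))^2}$

Taking $f(x) = \sin x$ gives

$\frac{d}{dx}\arctan(\sin x) = \frac{\cos x}{1+\sin^2 x}$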
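The general pattern for $\arctan f(x)$ can be sketched as a small helper that takes the inner function and its derivative. The names `arctan_composite_prime` and `central_difference` and the sample point are hypothetical, introduced only for this illustration.

```python
import math

def arctan_composite_prime(inner, inner_prime):
    """Return x -> d/dx arctan(inner(x)), using f'(x) / (1 + f(x)^2)."""
    def derivative(x):
        return inner_prime(x) / (1 + inner(x) ** 2)
    return derivative

def central_difference(func, x, step=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + step) - func(x - step)) / (2 * step)

# Inner function sin, with derivative cos, as in the example.
arctan_sin_prime = arctan_composite_prime(math.sin, math.cos)

x0 = 1.1                                                  # arbitrary sample point
formula_value = arctan_sin_prime(x0)
closed_form = math.cos(x0) / (1 + math.sin(x0) ** 2)      # final formula of the example
numeric_value = central_difference(lambda x: math.atan(math.sin(x)), x0)

print(formula_value, closed_form, numeric_value)
```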

Chain rule for several variables

The chain rule also works for functions of more than one variable. If $z = f(x, y)$ is differentiable, where $x = g(t)$ and $y = h(t)$ are differentiable with respect to $t$, then

${dz \over dt}={\partial f \over \partial x}{dx \over dt}+{\partial f \over \partial y}{dy \over dt}$
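As an illustration of this formula, the sketch below uses the hypothetical choices $f(x, y) = x^2 y$, $g(t) = \cos t$ and $h(t) = \sin t$ and compares the chain-rule value of $dz/dt$ with a finite-difference estimate. None of these functions come from the text; they are assumptions for the example.

```python
import math

# Hypothetical outer function and its partial derivatives.
def f(x, y):
    return x ** 2 * y

def f_x(x, y):
    return 2 * x * y

def f_y(x, y):
    return x ** 2

# Hypothetical inner functions of t and their derivatives.
def g(t):
    return math.cos(t)

def g_prime(t):
    return -math.sin(t)

def h(t):
    return math.sin(t)

def h_prime(t):
    return math.cos(t)

def z(t):
    return f(g(t), h(t))

def dz_dt(t):
    """dz/dt = f_x * dx/dt + f_y * dy/dt, evaluated along the curve (g(t), h(t))."""
    x, y = g(t), h(t)
    return f_x(x, y) * g_prime(t) + f_y(x, y) * h_prime(t)

t0 = 0.4
step = 1e-6
chain_value = dz_dt(t0)
numeric_value = (z(t0 + step) - z(t0 - step)) / (2 * step)

print(chain_value, numeric_value)
```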

Suppose that $z = f(u, v)$, where $u = h(x, y)$ and $v = g(x, y)$ are each functions of two variables, and that all of these functions are differentiable. Then the chain rule reads:

${\partial z \over \partial x}={\partial z \over \partial u}{\partial u \over \partial x}+{\partial z \over \partial v}{\partial v \over \partial x}$

${\partial z \over \partial y}={\partial z \over \partial u}{\partial u \over \partial y}+{\partial z \over \partial v}{\partial v \over \partial y}$

If we regard $\vec r = (u,v)$ above as a vector function of $x$ and $y$, the same result can be written in vector notation as the dot product of the gradient of $f$ and a partial derivative of $\vec r$:

$\frac{\partial f}{\partial x}=\vec \nabla f \cdot \frac{\partial \vec r}{\partial x}$

More generally, for functions from vectors to vectors, the chain rule says that the Jacobian matrix of a composite function is the product of the Jacobian matrices of the two functions:

$\frac{\partial(z_1,\ldots,z_m)}{\partial(x_1,\ldots,x_p)} = \frac{\partial(z_1,\ldots,z_m)}{\partial(y_1,\ldots,y_n)} \frac{\partial(y_1,\ldots,y_n)}{\partial(x_1,\ldots,x_p)}$
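The Jacobian form can be checked on a small case with $p = 2$, $n = 2$, $m = 3$. The maps `inner` and `outer`, their hand-written Jacobians, and the helpers `matmul` and `numeric_jacobian` are all assumptions made for this sketch, using plain Python lists rather than a matrix library.

```python
import math

def inner(x):
    """Hypothetical map R^2 -> R^2: y = (x0 * x1, x0 + x1)."""
    return [x[0] * x[1], x[0] + x[1]]

def inner_jacobian(x):
    return [[x[1], x[0]],
            [1.0, 1.0]]

def outer(y):
    """Hypothetical map R^2 -> R^3: z = (sin y0, y0 * y1, y1^2)."""
    return [math.sin(y[0]), y[0] * y[1], y[1] ** 2]

def outer_jacobian(y):
    return [[math.cos(y[0]), 0.0],
            [y[1], y[0]],
            [0.0, 2 * y[1]]]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def numeric_jacobian(func, point, step=1e-6):
    """Finite-difference Jacobian, built column by column and then transposed."""
    columns = []
    for j in range(len(point)):
        forward, backward = list(point), list(point)
        forward[j] += step
        backward[j] -= step
        columns.append([(p - m) / (2 * step)
                        for p, m in zip(func(forward), func(backward))])
    return [list(row) for row in zip(*columns)]

x0 = [0.5, -1.2]
product = matmul(outer_jacobian(inner(x0)), inner_jacobian(x0))   # 3x2 matrix
numeric = numeric_jacobian(lambda x: outer(inner(x)), x0)

max_error = max(abs(product[i][j] - numeric[i][j])
                for i in range(3) for j in range(2))
print(max_error)
```

Note that the outer Jacobian is evaluated at `inner(x0)`, not at `x0`, which mirrors $f'(g(x))$ in the single-variable rule.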

Proof of the chain rule

Let f and g be functions and let x be a number such that f is differentiable at g(x) and g is differentiable at x. Then by the definition of differentiability,

$g(x+\delta)-g(x)= \delta g'(x) + \epsilon(\delta)\delta \,$ where $\epsilon(\delta) \to 0 \,$ as $\delta\to 0.$

Similarly,

$f(g(x)+\alpha) - f(g(x)) = \alpha f'(g(x)) + \eta(\alpha)\alpha \,$ where $\eta(\alpha) \to 0 \,$ as $\alpha\to 0. \,$

Now

$f(g(x+\delta))-f(g(x)) = f(g(x) + \delta g'(x)+\epsilon(\delta)\delta) - f(g(x)) = \alpha_\delta f'(g(x)) + \eta(\alpha_\delta)\alpha_\delta$

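where $\alpha_\delta = \delta g'(x) + \epsilon(\delta)\delta$. Observe that as $\delta\to 0$, $\frac{\alpha_\delta}{\delta}\to g'(x)$ and $\alpha_\delta \to 0$, and thus $\eta(\alpha_\delta)\to 0$ (setting $\eta(0) = 0$ covers the case $\alpha_\delta = 0$). Dividing the expression above by $\delta$ gives $\frac{\alpha_\delta}{\delta} f'(g(x)) + \eta(\alpha_\delta)\frac{\alpha_\delta}{\delta}$, whose second term tends to zero. Therefore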

$\frac{f(g(x+\delta))-f(g(x))}{\delta} \to g'(x)f'(g(x))\mbox{ as } \delta \to 0.$
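The limit in the last line can be observed numerically by shrinking $\delta$. This sketch assumes the hypothetical functions $g(x) = x^2$ and $f(u) = \sin u$ and the point $x = 1$; it illustrates the convergence and is not a substitute for the proof.

```python
import math

def g(x):
    return x ** 2            # hypothetical inner function

def f(u):
    return math.sin(u)       # hypothetical outer function

x0 = 1.0
limit = 2 * x0 * math.cos(x0 ** 2)     # g'(x) * f'(g(x)) for these choices

errors = []
for k in range(1, 6):
    delta = 10.0 ** (-k)
    quotient = (f(g(x0 + delta)) - f(g(x0))) / delta
    errors.append(abs(quotient - limit))

for k, error in enumerate(errors, start=1):
    print(f"delta = 1e-{k}: |quotient - limit| = {error:.3e}")
```

Each reduction of $\delta$ should bring the difference quotient closer to $g'(x)f'(g(x))$.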

The fundamental chain rule

The chain rule is a fundamental property of all definitions of derivative and is therefore valid in much more general contexts. For instance, if E, F and G are Banach spaces (which include Euclidean spaces) and f : E → F and g : F → G are functions, and if x is an element of E such that f is differentiable at x and g is differentiable at f(x), then the derivative (the Fréchet derivative) of the composition $g \circ f$ at the point x is given by

$\mbox{D}_x\left(g \circ f\right) = \mbox{D}_{f\left(x\right)}\left(g\right) \circ \mbox{D}_x\left(f\right).$

Note that the derivatives here are linear maps and not numbers. If the linear maps are represented as matrices (namely Jacobians), the composition on the right-hand side turns into a matrix multiplication.
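The view of derivatives as linear maps can be sketched in the finite-dimensional case by representing each derivative as a function acting on tangent vectors, so that the chain rule becomes ordinary function composition. The maps `f` and `g`, the point, and the direction below are hypothetical choices with E = F = R² and G = R.

```python
import math

def f(point):
    """Hypothetical map f : R^2 -> R^2."""
    x, y = point
    return (x * y, x + y)

def df(point):
    """Derivative of f at `point`, returned as a linear map on tangent vectors."""
    x, y = point
    def linear_map(v):
        vx, vy = v
        return (y * vx + x * vy, vx + vy)
    return linear_map

def g(point):
    """Hypothetical map g : R^2 -> R."""
    u, w = point
    return math.sin(u) + w ** 2

def dg(point):
    """Derivative of g at `point`, returned as a linear map on tangent vectors."""
    u, w = point
    def linear_map(v):
        vu, vw = v
        return math.cos(u) * vu + 2 * w * vw
    return linear_map

def d_composite(point):
    """D_x(g o f) = D_{f(x)}(g) o D_x(f), as a composition of linear maps."""
    inner_map = df(point)
    outer_map = dg(f(point))
    return lambda v: outer_map(inner_map(v))

point = (0.5, -1.2)
direction = (0.3, 0.8)
chain_value = d_composite(point)(direction)

# Finite-difference directional derivative of g o f for comparison.
step = 1e-6
plus = (point[0] + step * direction[0], point[1] + step * direction[1])
minus = (point[0] - step * direction[0], point[1] - step * direction[1])
numeric_value = (g(f(plus)) - g(f(minus))) / (2 * step)

print(chain_value, numeric_value)
```

No matrix is ever formed here; the tangent vector is simply pushed through `df` and then through `dg`.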

A particularly clear formulation of the chain rule can be achieved in the most general setting: let M, N and P be $C^k$ manifolds (or even Banach manifolds) and let

f : M → N and g : N → P

be differentiable maps. The derivative of f, denoted by df, is then a map from the tangent bundle of M to the tangent bundle of N, and we may write

$\mbox{d}\left(g \circ f\right) = \mbox{d}g \circ \mbox{d}f.$

In this way, the formation of derivatives and tangent bundles is seen as a functor on the category of $C^\infty$ manifolds with $C^\infty$ maps as morphisms.

Tensors and the chain rule

See tensor field for an advanced explanation of the fundamental role the chain rule plays in the geometric nature of tensors.

Higher derivatives

Faà di Bruno's formula generalizes the chain rule to higher derivatives. The first few derivatives are

$\frac{df}{dx} = \frac{df}{dg}\frac{dg}{dx}$

$\frac{d^2 f}{d x^2} = \frac{d^2 f}{d g^2}\left(\frac{dg}{dx}\right)^2 + \frac{df}{dg}\frac{d^2 g}{dx^2}$

$\frac{d^3 f}{d x^3} = \frac{d^3 f}{d g^3} \left(\frac{dg}{dx}\right)^3 + 3 \frac{d^2 f}{d g^2} \frac{dg}{dx} \frac{d^2 g}{d x^2} + \frac{df}{dg} \frac{d^3 g}{d x^3}$

$\frac{d^4 f}{d x^4} =\frac{d^4 f}{dg^4} \left(\frac{dg}{dx}\right)^4 + 6 \frac{d^3 f}{d g^3} \left(\frac{dg}{dx}\right)^2 \frac{d^2 g}{d x^2} + \frac{d^2 f}{d g^2} \left\{ 4 \frac{dg}{dx} \frac{d^3 g}{dx^3} + 3\left(\frac{d^2 g}{dx^2}\right)^2\right\} + \frac{df}{dg}\frac{d^4 g}{dx^4}$
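The second-derivative formula above can be checked on a concrete case. The sketch assumes the hypothetical choices $f(g) = \sin g$ and $g(x) = x^2$, so the composite is $\sin(x^2)$, and compares the formula with a second-order finite difference; the step size and sample point are arbitrary.

```python
import math

def composite(x):
    return math.sin(x ** 2)       # f(g(x)) with f = sin, g(x) = x^2

def second_derivative_by_formula(x):
    g_value = x ** 2
    dg = 2 * x                    # dg/dx
    d2g = 2.0                     # d^2 g / dx^2
    df = math.cos(g_value)        # df/dg
    d2f = -math.sin(g_value)      # d^2 f / dg^2
    # d^2 f/dx^2 = (d^2 f/dg^2) (dg/dx)^2 + (df/dg) (d^2 g/dx^2)
    return d2f * dg ** 2 + df * d2g

def second_difference(func, x, step=1e-4):
    """Symmetric finite-difference estimate of func''(x)."""
    return (func(x + step) - 2 * func(x) + func(x - step)) / step ** 2

x0 = 0.9
formula_value = second_derivative_by_formula(x0)
numeric_value = second_difference(composite, x0)

print(formula_value, numeric_value)
```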

Category: Differential calculus
