Automatic differentiation
In mathematics and computer algebra, automatic differentiation, or AD, sometimes alternatively called algorithmic differentiation, is a method to numerically evaluate the derivative of a function specified by a computer program. Two classical ways of doing this are:
- symbolically differentiate the function as an expression and evaluate it at the point; or
- use finite differences.
The drawbacks of symbolic differentiation are low speed and the difficulty of converting a computer program into a single expression. Moreover, many functions grow quite complex as higher derivatives are calculated. Two important drawbacks of finite differences are round-off errors in the discretization process and cancellation. Both classical methods have problems with calculating higher derivatives, where the complexity and errors increase. Automatic differentiation solves all of these problems.
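As an illustration of the finite-difference drawback, here is a minimal Python sketch (the test function sin, the point x0 = 1.0, and the step sizes are illustrative choices, not from the article). Shrinking the step first reduces the truncation error, but past a point round-off and cancellation in the subtraction dominate and the error grows again:

```python
import math

def forward_difference(f, x, h):
    # One-sided finite difference: truncation error O(h), but
    # subtractive cancellation dominates for very small h.
    return (f(x + h) - f(x)) / h

x0 = 1.0
exact = math.cos(x0)  # d/dx sin(x) = cos(x)

for h in (1e-2, 1e-5, 1e-8, 1e-12):
    approx = forward_difference(math.sin, x0, h)
    print(f"h={h:.0e}  error={abs(approx - exact):.2e}")
```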
There are two modes of AD, called forward accumulation and reverse accumulation. Forward accumulation is easy to implement on computers but becomes less efficient when derivatives with respect to many input variables are needed. Reverse accumulation allows the efficient calculation of gradients of functions with many inputs.
Forward accumulation AD
A new arithmetic is introduced, in which the objects are ordered pairs ⟨u, u′⟩. The new arithmetic uses ordinary arithmetic on the first element of the pair and a first-order differentiation arithmetic, applying the chain rule, on the second element. The basic arithmetic operations and some standard functions are defined by

⟨u, u′⟩ + ⟨v, v′⟩ = ⟨u + v, u′ + v′⟩
⟨u, u′⟩ − ⟨v, v′⟩ = ⟨u − v, u′ − v′⟩
⟨u, u′⟩ · ⟨v, v′⟩ = ⟨u·v, u′·v + u·v′⟩
⟨u, u′⟩ / ⟨v, v′⟩ = ⟨u/v, (u′·v − u·v′)/v²⟩   (v ≠ 0)
sin ⟨u, u′⟩ = ⟨sin u, u′·cos u⟩
exp ⟨u, u′⟩ = ⟨exp u, u′·exp u⟩

and in general, for a primitive function g,

g(⟨u, u′⟩, ⟨v, v′⟩) = ⟨g(u, v), g⁽¹⁾(u, v)·u′ + g⁽²⁾(u, v)·v′⟩,

where g⁽¹⁾ and g⁽²⁾ are the derivatives of g with respect to its first and second arguments, respectively. Also, when a binary basic arithmetic operation is applied to mixed arguments, say the ordered pair ⟨u, u′⟩ and the real number c, the real number is first lifted to ⟨c, 0⟩.
The derivative of a function f at the point x₀ is now found by calculating f(⟨x₀, 1⟩) using the above arithmetic, which gives ⟨f(x₀), f′(x₀)⟩ as the result.
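A minimal Python sketch of this pair arithmetic, assuming a hypothetical Dual class and an illustrative test function (neither is part of the article). Seeding the input with ⟨x₀, 1⟩ and evaluating the program with the overloaded operators yields the pair ⟨f(x₀), f′(x₀)⟩:

```python
import math

class Dual:
    """Ordered pair <value, derivative> with first-order chain-rule arithmetic."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # A plain real number c is lifted to <c, 0>.
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __radd__ = __add__
    __rmul__ = __mul__

def sin(u):
    # sin<u, u'> = <sin u, u' * cos u>
    u = Dual.lift(u)
    return Dual(math.sin(u.value), u.deriv * math.cos(u.value))

# Example function f(x) = x * sin(x) + 2; evaluating f(<x0, 1>) gives <f(x0), f'(x0)>.
def f(x):
    return x * sin(x) + 2

y = f(Dual(1.5, 1.0))
print(y.value, y.deriv)   # f(1.5) and f'(1.5) = sin(1.5) + 1.5*cos(1.5)
```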
Multivariate functions f : ℝⁿ → ℝᵐ can be handled with the same efficiency and mechanisms as univariate functions by adopting a directional derivative operator: the directional derivative of f at x₀ ∈ ℝⁿ in the direction x₀′ ∈ ℝⁿ (the Jacobian of f at x₀ applied to x₀′) is found by calculating f(⟨x₀, x₀′⟩) = ⟨y₀, y₀′⟩ using the same arithmetic as above.
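A small self-contained sketch of this seeding (the function f and the direction d are illustrative assumptions): each input xᵢ is paired with the corresponding component dᵢ of the direction, and the second components of the pairs propagate the directional derivative ∇f(x₀)·d:

```python
import math

def f_pair(x1, x2):
    """Evaluate f(x1, x2) = x1*x2 + sin(x1) on <value, tangent> pairs.

    Each argument is a (value, tangent) tuple; the tangent components are
    propagated with the same chain-rule arithmetic as in the scalar case.
    """
    (u, du), (v, dv) = x1, x2
    w, dw = u * v, du * v + u * dv          # w = u * v
    s, ds = math.sin(u), du * math.cos(u)   # s = sin(u)
    return w + s, dw + ds                    # f = w + s

x0 = (2.0, 3.0)   # point of evaluation
d = (1.0, -1.0)   # direction

value, directional = f_pair((x0[0], d[0]), (x0[1], d[1]))
# gradient of f is (x2 + cos(x1), x1); directional derivative = gradient . d
print(value, directional)
```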
Higher order forward accumulation AD
The above arithmetic can be generalized, in the natural way, to second order and higher derivatives. However, the arithmetic rules quickly grow very complicated. Instead, truncated Taylor series arithmetic is used. This is possible because the Taylor summands in a Taylor series of a function are products of known coefficients and derivatives of the function.
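As a hedged sketch of truncated Taylor series arithmetic (the truncation order, function names, and example function are illustrative choices, not from the article): each quantity carries its Taylor coefficients up to a fixed order, and elementary operations combine coefficient vectors, for example by a truncated Cauchy product for multiplication and a standard recurrence for exp:

```python
import math

ORDER = 4  # keep Taylor coefficients c0..c3 (illustrative truncation order)

def taylor_variable(x0):
    # Taylor coefficients of the identity map t -> x0 + t, expanded about t = 0.
    return [x0, 1.0] + [0.0] * (ORDER - 2)

def taylor_mul(a, b):
    # Truncated Cauchy product: c_k = sum_{i=0..k} a_i * b_{k-i}.
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(ORDER)]

def taylor_exp(a):
    # Coefficients of exp(a(t)) via e_k = (1/k) * sum_{j=1..k} j * a_j * e_{k-j}.
    e = [math.exp(a[0])] + [0.0] * (ORDER - 1)
    for k in range(1, ORDER):
        e[k] = sum(j * a[j] * e[k - j] for j in range(1, k + 1)) / k
    return e

# g(x) = exp(x*x); the k-th Taylor coefficient equals g^(k)(x0) / k!.
x = taylor_variable(0.5)
g = taylor_exp(taylor_mul(x, x))
derivatives = [math.factorial(k) * c for k, c in enumerate(g)]
print(derivatives)  # g(0.5), g'(0.5), g''(0.5), g'''(0.5)
```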
Reverse accumulation AD
The data flow graph of a computation can be manipulated to calculate the gradient of its original calculation. This is done by adding an adjoint node for each primal node, connected by adjoint edges which parallel the primal edges but flow in the opposite direction. The nodes in the adjoint graph represent multiplication by the derivatives of the functions calculated by the nodes in the primal. For instance, addition in the primal causes fanout in the adjoint; fanout in the primal causes addition in the adjoint; a unary function y = f(x) in the primal causes x̄ = f′(x)·ȳ in the adjoint (where x̄ and ȳ denote the adjoint values associated with x and y); etc.
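A compact Python sketch of reverse accumulation on a small expression graph (the Node class and backward function are illustrative assumptions, not the article's notation): the forward pass records each node's parents together with the local derivatives on the connecting edges, and the reverse pass propagates adjoints along those edges in the opposite direction, so fanout in the primal becomes addition of adjoint contributions:

```python
import math

class Node:
    """A node in the data-flow graph: its value, its adjoint, and the adjoint
    edges back to its parents (each edge carries the local partial derivative)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.adjoint = 0.0
        self.parents = tuple(parents)   # (parent_node, local_derivative) pairs

    def __mul__(self, other):
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

def sin(u):
    return Node(math.sin(u.value), [(u, math.cos(u.value))])

def backward(output):
    """Reverse pass: visit nodes in reverse topological order so each node's
    adjoint is complete before it is pushed to its parents. Fanout in the
    primal therefore becomes addition (+=) of adjoint contributions."""
    order, seen = [], set()
    def visit(node):                     # depth-first topological sort
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(output)

    output.adjoint = 1.0
    for node in reversed(order):
        for parent, local_deriv in node.parents:
            parent.adjoint += node.adjoint * local_deriv

# f(x1, x2) = x1 * x2 + sin(x1); x1 fans out in the primal,
# so its adjoint accumulates two contributions.
x1, x2 = Node(2.0), Node(3.0)
y = x1 * x2 + sin(x1)
backward(y)
print(x1.adjoint, x2.adjoint)   # gradient: (x2 + cos(x1), x1) = (3 + cos(2), 2)
```

A single reverse pass of this kind yields the derivatives of the output with respect to all inputs at once, which is why reverse accumulation is the efficient choice for gradients.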
Backpropagation of errors in multilayer perceptrons, a technique used in machine learning, is a special case of reverse mode AD.
Software
Many software libraries and tools implement automatic differentiation for a variety of programming languages.