Cybenko Theorem


Formulated by Cybenko in 1989, the theorem states:

Let \varphi be any continuous sigmoid-type function [e.g., \varphi(\xi) = 1/(1+e^{-\xi})]. Then, given any continuous real-valued function f on [0,1]^n (or any other compact subset of R^n) and any \epsilon > 0, there exist an integer N, vectors \mathbf{w}_1, \mathbf{w}_2, ..., \mathbf{w}_N, \mathbf{\alpha}, and \mathbf{\theta}, and a parameterized function G(\mathbf{\cdot},\mathbf{w},\mathbf{\alpha},\mathbf{\theta}): [0,1]^n \rightarrow R such that

|G(\mathbf{x},\mathbf{w},\mathbf{\alpha},\mathbf{\theta}) - f(\mathbf{x})| < \epsilon for all \mathbf{x} \in [0,1]^n

where

G(\mathbf{x},\mathbf{w},\mathbf{\alpha},\mathbf{\theta}) = \sum_{j=1}^N\alpha_j\varphi(\mathbf{w}_j^T\mathbf{x} + \theta_j)

and \mathbf{w}_j \in R^n, \alpha_j, \theta_j \in R, \mathbf{w} = (\mathbf{w}_1, \mathbf{w}_2, ..., \mathbf{w}_N), \mathbf{\alpha} = (\alpha_1, \alpha_2, ..., \alpha_N), and \mathbf{\theta} = (\theta_1, \theta_2, ..., \theta_N).
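
For concreteness, the sum defining G is a single hidden layer of sigmoid units followed by a weighted sum. The following NumPy sketch evaluates it at one point of [0,1]^n; the function names, array shapes, and example parameter values are illustrative choices, not part of the theorem.

    import numpy as np

    def sigmoid(xi):
        # Logistic sigmoid, one admissible choice of the sigmoid-type function phi.
        return 1.0 / (1.0 + np.exp(-xi))

    def G(x, w, alpha, theta):
        # x     : shape (n,)   -- a point in [0,1]^n
        # w     : shape (N, n) -- rows are the weight vectors w_j
        # alpha : shape (N,)   -- output coefficients alpha_j
        # theta : shape (N,)   -- biases theta_j
        # Returns sum_{j=1}^N alpha_j * sigmoid(w_j^T x + theta_j).
        return float(alpha @ sigmoid(w @ x + theta))

    # Example: N = 3 hidden neurons on [0,1]^2 (n = 2), with arbitrary parameters.
    w = np.array([[2.0, -1.0], [0.5, 3.0], [-4.0, 1.5]])
    alpha = np.array([1.0, -0.5, 2.0])
    theta = np.array([0.0, 1.0, -2.0])
    print(G(np.array([0.3, 0.7]), w, alpha, theta))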

This theorem says that a feedforward neural network with a single hidden layer is capable of approximating any continuous multivariate function to any desired degree of accuracy, and that failure to approximate a given function arises from a poor choice of \mathbf{w}_1, \mathbf{w}_2, ..., \mathbf{w}_N, \mathbf{\alpha}, and \mathbf{\theta} or from an insufficient number of hidden neurons.
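
The theorem only asserts that suitable parameters exist; it does not say how to find them. As a rough illustration of the claim in the univariate case (n = 1), and not Cybenko's argument, the sketch below picks the w_j and \theta_j at random, fits the \alpha_j by least squares on a grid over [0,1], and reports how the worst-case error on that grid shrinks as the number of hidden neurons N grows. The target function, grid size, and random scales are arbitrary choices made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(xi):
        return 1.0 / (1.0 + np.exp(-xi))

    def f(x):
        # A continuous target function on [0,1].
        return np.sin(2.0 * np.pi * x)

    x = np.linspace(0.0, 1.0, 500)   # dense grid standing in for [0,1]
    fx = f(x)

    for N in (2, 8, 32, 128):        # number of hidden neurons
        # Draw w_j and theta_j at random, then fit the alpha_j by least squares.
        w = rng.normal(scale=20.0, size=N)
        theta = rng.uniform(-20.0, 20.0, size=N)
        Phi = sigmoid(np.outer(x, w) + theta)   # column j is sigmoid(w_j * x + theta_j)
        alpha, *_ = np.linalg.lstsq(Phi, fx, rcond=None)
        G = Phi @ alpha
        print(f"N = {N:4d}   max |G - f| on the grid = {np.max(np.abs(G - fx)):.4f}")

With enough hidden neurons the fitted G tracks f closely on the grid, consistent with the theorem; note that the theorem itself guarantees uniform closeness on all of the compact domain, not merely on a finite grid.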
