Rademacher complexity

From Wikipedia, the free encyclopedia

In statistics and machine learning, Rademacher complexity measures richness of a class of real-valued functions with respect to a probability distribution.

Let \mathcal{F} be a class of real-valued functions defined on a domain space Z. The empirical Rademacher complexity of \mathcal{F} on a sample S=(z_1, z_2, \dots, z_m) \in Z^m is defined as


\widehat \mathcal{R}_S(\mathcal{F}) 
= 
\frac{2}{m} \ \operatorname{E} \left(  \sup_{f \in \mathcal{F}} \left| \sum_{i=1}^m \sigma_i f(z_i) \right| \right)

where \sigma_1, \sigma_2, \dots, \sigma_m are independent random variables such that \Pr(\sigma_i = +1) = \Pr(\sigma_i = -1) = 1/2 for any i=1,2,\dots,m. The random variables \sigma_1, \sigma_2, \dots, \sigma_m are referred to as Rademacher variables.

Let P be a probability distribution over Z. The Rademacher complexity of the function class \mathcal{F} with respect to P for sample size m is


\mathcal{R}_m(\mathcal{F}) 
= 
\operatorname{E} \left( \widehat \mathcal{R}_S(\mathcal{F}) \right)

where the above expectation is taken over an identically independently distributed (i.i.d.) sample S=(z_1, z_2, \dots, z_m) generated according to P.

One can show, for example, that there exists a constant C, such that any class of {0,1}-indicator functions with Vapnik-Chervonenkis dimension d has Rademacher complexity upper-bounded by C\sqrt{\frac{d}{m}}.

[edit] References

Peter L. Bartlett, Shahar Mendelson (2002) Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research 3 463-482