Directional statistics

From Wikipedia, the free encyclopedia

Directional statistics is the subdiscipline of statistics that deals with directions (unit vectors in Rⁿ), axes (lines through the origin in Rⁿ) or rotations in Rⁿ. More generally, directional statistics deals with observations on compact Riemannian manifolds.

The overall shape of a protein can be parameterized as a sequence of points on the unit sphere. Shown are two views of the spherical histogram of such points for a large collection of protein structures. The statistical treatment of such data is in the realm of directional statistics.[1]

The fact that 0 degrees and 360 degrees are identical angles, so that, for example, 180 degrees is not a sensible mean of 2 degrees and 358 degrees, is one illustration that special statistical methods are required for the analysis of some types of data (in this case, angular data). Other examples of data that may be regarded as directional include days of the week, months of the year, compass directions, dihedral angles in molecules, orientations, rotations and so on.


von Mises distribution

The circular analogue of the Gaussian (normal) distribution of conventional statistics is the von Mises distribution. This distribution on the circle is probably the best-known distribution in directional statistics. It is widely used to model angular data, where an angle is represented by a two-dimensional unit vector (that is, a point on the unit circle).
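The article does not state the density, but the standard form of the von Mises density with mean direction μ and concentration κ ≥ 0 is f(θ; μ, κ) = exp(κ cos(θ − μ)) / (2π I₀(κ)), where I₀ is the modified Bessel function of the first kind of order zero. The following is a minimal Python sketch that evaluates it with the standard library only; the names `bessel_i0` and `von_mises_pdf`, and the truncated power series used for I₀, are illustrative assumptions rather than any library's API.

```python
import math


def bessel_i0(kappa: float, terms: int = 60) -> float:
    """I0(kappa) from the power series: sum over m of ((kappa/2)**m / m!)**2.

    Adequate for moderate kappa (roughly kappa <= 50); for large kappa a
    library routine such as numpy.i0 is a better choice.
    """
    half = kappa / 2.0
    term = 1.0  # m = 0 term
    total = term
    for m in range(1, terms):
        term *= (half / m) ** 2  # ratio of consecutive series terms
        total += term
    return total


def von_mises_pdf(theta: float, mu: float, kappa: float) -> float:
    """Von Mises density exp(kappa*cos(theta - mu)) / (2*pi*I0(kappa))."""
    if kappa < 0:
        raise ValueError("kappa must be non-negative")
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * bessel_i0(kappa))


if __name__ == "__main__":
    # Density at the mean direction for a moderately concentrated distribution.
    print(von_mises_pdf(0.0, 0.0, 2.0))
```

With κ = 0 the density reduces to the uniform density 1/(2π) on the circle; as κ grows the mass concentrates around μ, and for large κ the distribution is close to a normal distribution with variance 1/κ.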

Wrapped distributions

Any probability density function p(x) on the line can be "wrapped" around the circumference of a circle of unit radius.[2] That is, the pdf of the wrapped variable


\theta = x_w = x \bmod 2\pi \in (-\pi,\pi]

is


p_w(\theta)=\sum_{k=-\infty}^{\infty}{p(\theta+2\pi k)}
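Numerically, the infinite sum can only be approximated. A minimal Python sketch, assuming that truncating it to |k| ≤ k_max is accurate enough (which holds for light-tailed densities such as the normal density used here; wrapping a normal density gives the wrapped normal distribution). The names `wrapped_pdf`, `normal_pdf` and `k_max` are hypothetical, chosen for illustration.

```python
import math
from typing import Callable


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution N(mu, sigma**2) on the real line."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def wrapped_pdf(p: Callable[[float], float], theta: float, k_max: int = 20) -> float:
    """Truncated wrapped density: sum of p(theta + 2*pi*k) for |k| <= k_max."""
    return sum(p(theta + 2 * math.pi * k) for k in range(-k_max, k_max + 1))


if __name__ == "__main__":
    # Wrapped normal with sigma = 1.5, evaluated at theta = 0.3 rad.
    print(wrapped_pdf(lambda x: normal_pdf(x, 0.0, 1.5), 0.3))
```

For a narrow density (small σ) the neighbouring copies p(θ ± 2π), p(θ ± 4π), … are negligible and the wrapped density is essentially p itself; for wide densities more terms matter and k_max should be increased.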

This concept can be extended to the multivariate setting by replacing the single sum with F sums, one for each of the F dimensions of the feature space:


p_w(\vec\theta)=\sum_{k_1=-\infty}^{\infty}\cdots\sum_{k_F=-\infty}^{\infty}{p(\vec\theta+2\pi k_1\mathbf{e}_1+\dots+2\pi k_F\mathbf{e}_F)}

where \mathbf{e}_k=(0,\dots,0,1,0,\dots,0)^{\mathsf{T}} is the kth Euclidean basis vector.
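The F nested sums amount to summing over every integer vector (k_1, …, k_F). A minimal Python sketch, again assuming truncation to |k_i| ≤ k_max in each coordinate, so that (2k_max + 1)^F terms are evaluated and the cost grows quickly with F. The names `wrapped_pdf_nd` and `iso_normal_2d` are hypothetical.

```python
import itertools
import math
from typing import Callable, Sequence


def wrapped_pdf_nd(
    p: Callable[[Sequence[float]], float],
    theta: Sequence[float],
    k_max: int = 5,
) -> float:
    """Truncated F-dimensional wrapped density.

    Sums p(theta + 2*pi*(k_1 e_1 + ... + k_F e_F)) over all integer vectors
    (k_1, ..., k_F) with |k_i| <= k_max, i.e. (2*k_max + 1)**F terms.
    """
    total = 0.0
    for ks in itertools.product(range(-k_max, k_max + 1), repeat=len(theta)):
        # Adding 2*pi*k_i*e_i shifts only coordinate i.
        shifted = [t + 2 * math.pi * k for t, k in zip(theta, ks)]
        total += p(shifted)
    return total


def iso_normal_2d(x: Sequence[float], sigma: float = 1.0) -> float:
    """Isotropic bivariate normal density with mean 0 and covariance sigma**2 * I."""
    r2 = x[0] ** 2 + x[1] ** 2
    return math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


if __name__ == "__main__":
    print(wrapped_pdf_nd(iso_normal_2d, [0.3, -1.0]))
```

Because an isotropic normal density factorizes over coordinates, its wrapped version is the product of two one-dimensional wrapped normals, which gives a convenient check on the nested sum.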

Distributions on higher dimensional manifolds

Three point sets sampled from different Kent distributions on the sphere.

There also exist distributions on the two-dimensional sphere (such as the Kent distribution[3]), the N-dimensional sphere (the von Mises–Fisher distribution[4]) and the torus (the bivariate von Mises distribution[5]).

The matrix von Mises–Fisher distribution is a distribution on the Stiefel manifold and can be used to construct probability distributions over rotation matrices.[6]

The Bingham distribution is a distribution over axes in N dimensions or, equivalently, over points on the (N-1)-dimensional sphere with antipodal points identified.[7] For example, if N = 2, the axes are undirected lines through the origin in the plane. In this case, each axis cuts the unit circle in the plane (which is the one-dimensional sphere) at two points that are each other's antipodes. For N = 4, the Bingham distribution is a distribution over the space of unit quaternions with each quaternion q identified with its antipode −q. Since every unit quaternion corresponds to a rotation matrix, and q and −q correspond to the same rotation, the Bingham distribution for N = 4 can be used to construct probability distributions over the space of rotations, just like the matrix von Mises–Fisher distribution.

These distributions are used, for example, in geology,[8] crystallography[9] and bioinformatics.[10]
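For the N = 4 case of the Bingham distribution described above, the link between unit quaternions and rotations can be made concrete with the standard quaternion-to-rotation-matrix formula. Every entry of the matrix is quadratic in the quaternion's components, so q and −q give the same rotation, which is why an antipodally symmetric distribution is a natural fit. A minimal Python sketch, assuming the scalar-first (w, x, y, z) ordering and the Hamilton convention; the function name `quaternion_to_rotation` is hypothetical.

```python
import math

Matrix = list[list[float]]


def quaternion_to_rotation(q: tuple[float, float, float, float]) -> Matrix:
    """3x3 rotation matrix of the quaternion q = (w, x, y, z), normalized first."""
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    # Each entry is a quadratic form in (w, x, y, z), so q and -q map to the same matrix.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


if __name__ == "__main__":
    half = math.pi / 6  # half of a 60-degree rotation about the x-axis
    q = (math.cos(half), math.sin(half), 0.0, 0.0)
    minus_q = tuple(-c for c in q)
    print(quaternion_to_rotation(q) == quaternion_to_rotation(minus_q))  # True
```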

Example: the mean of a series of angles

A simple way to calculate the mean of a series of angles (in the interval [0°, 360°)) is to calculate the mean of the cosines and the mean of the sines of the angles, and then obtain the mean angle from the inverse tangent of their ratio. Consider the following three angles as an example: 10, 20 and 30 degrees. Intuitively, calculating the mean would involve adding these three angles together and dividing by 3, which in this case indeed gives the correct mean angle of 20 degrees. Rotating this system clockwise through 15 degrees turns the three angles into 355 degrees, 5 degrees and 15 degrees. The naive mean is now (355 + 5 + 15)/3 = 125 degrees, which is wrong: the mean should be 5 degrees. The true mean \bar\theta can be calculated as follows, using the mean sine \bar s and the mean cosine \bar c \neq 0:


\bar s = \frac{1}{3} \left( \sin (355^\circ) + \sin (5^\circ) + \sin (15^\circ) \right) 
=  \frac{1}{3} \left( -0.087 + 0.087 + 0.259 \right) 
\approx 0.086

\bar c = \frac{1}{3} \left(  \cos (355^\circ) + \cos (5^\circ) + \cos (15^\circ) \right) 
=  \frac{1}{3} \left( 0.996 + 0.996 + 0.966 \right) 
\approx 0.986

\bar \theta =
\begin{cases}
\arctan \left( \frac{\bar s}{\bar c} \right) & \bar s > 0,\ \bar c > 0 \\
\arctan \left( \frac{\bar s}{\bar c} \right) + 180^\circ & \bar c < 0 \\
\arctan \left( \frac{\bar s}{\bar c} \right) + 360^\circ & \bar s < 0,\ \bar c > 0
\end{cases}

= \arctan \left( \frac{0.086}{0.986} \right)
= \arctan (0.087) \approx 5^\circ.
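The three-case rule above, reduced modulo 360°, is what the two-argument arctangent computes. A minimal Python sketch using `math.atan2`, which selects the quadrant from the signs of \bar s and \bar c and also covers \bar c = 0; this is a swapped-in equivalent of the case analysis, not the article's own formulation, and `circular_mean_deg` is a hypothetical name. If \bar s and \bar c are both (numerically) zero, as for 0°, 120° and 240°, no mean direction exists and the sketch raises an error.

```python
import math
from typing import Iterable


def circular_mean_deg(angles_deg: Iterable[float], tol: float = 1e-12) -> float:
    """Mean direction of angles given in degrees, returned in [0, 360).

    math.atan2(s, c) chooses the quadrant from the signs of s and c, which
    reproduces the three-case arctan rule (after reduction modulo 360) and
    also works when the mean cosine is 0.
    """
    angles = [math.radians(a) for a in angles_deg]
    if not angles:
        raise ValueError("need at least one angle")
    s = sum(math.sin(a) for a in angles) / len(angles)  # mean sine
    c = sum(math.cos(a) for a in angles) / len(angles)  # mean cosine
    if math.hypot(s, c) < tol:
        # Mean resultant vector is (numerically) zero: no mean direction exists.
        raise ValueError("mean direction is undefined")
    mean = math.degrees(math.atan2(s, c)) % 360.0
    return 0.0 if mean == 360.0 else mean  # -tiny % 360.0 can round up to 360.0


if __name__ == "__main__":
    print(round(circular_mean_deg([355, 5, 15]), 6))  # 5.0
    print(sum([355, 5, 15]) / 3)  # naive mean: 125.0
```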

References

  1. Hamelryck, T., Kent, J., Krogh, A. (2006) Sampling realistic protein conformations using local structural bias. PLoS Comput. Biol., 2(9): e131. Public Library of Science (PLoS). Retrieved on 2008-02-01.
  2. Bahlmann, C. (2006) Directional features in online handwriting recognition. Pattern Recognition, 39.
  3. Kent, J. (1982) The Fisher–Bingham distribution on the sphere. J. Royal Stat. Soc. B, 44, 71–80.
  4. Fisher, R.A. (1953) Dispersion on a sphere. Proc. Roy. Soc. London Ser. A, 217, 295–305.
  5. Mardia, K.V., Taylor, C.C., Subramaniam, G.K. (2007) Protein Bioinformatics and Mixtures of Bivariate von Mises Distributions for Angular Data. Biometrics, 63, 505–512.
  6. Downs (1972) Orientational statistics. Biometrika, 59, 665–676.
  7. Bingham, C. (1974) An Antipodally Symmetric Distribution on the Sphere. Ann. Statist., 2, 1201–1225.
  8. Peel, D., Whiten, W.J., McLachlan, G.J. (2001) Fitting mixtures of Kent distributions to aid in joint set identification. J. Am. Stat. Ass., 96, 56–63.
  9. Krieger Lassen, N.C., Juul Jensen, D., Conradsen, K. (1994) On the statistical analysis of orientation data. Acta Cryst., A50, 741–748.
  10. Kent, J.T., Hamelryck, T. (2005) Using the Fisher–Bingham distribution in stochastic models for protein structure. In S. Barber, P.D. Baxter, K.V. Mardia, R.E. Walls (Eds.), Quantitative Biology, Shape Analysis, and Wavelets, pp. 57–60. Leeds: Leeds University Press.

Books on directional statistics

  • Batschelet, E. Circular Statistics in Biology, Academic Press, London, 1981. ISBN 0-12-081050-6.
  • Fisher, N.I. Statistical Analysis of Circular Data, Cambridge University Press, 1993. ISBN 0-521-35018-2.
  • Fisher, N.I., Lewis, T., Embleton, B.J.J. Statistical Analysis of Spherical Data, Cambridge University Press, 1993. ISBN 0-521-45699-1.
  • Mardia, K.V., Jupp, P. Directional Statistics (2nd edition), John Wiley and Sons Ltd., 2000. ISBN 0-471-95333-4.
