Directional statistics

From Wikipedia, the free encyclopedia

Directional statistics is the subdiscipline of statistics that deals with directions (unit vectors in Rⁿ), axes (lines through the origin in Rⁿ) or rotations in Rⁿ. More generally, directional statistics deals with observations on compact Riemannian manifolds.

The overall shape of a protein can be parameterized as a sequence of points on the unit sphere. Shown are two views of the spherical histogram of such points for a large collection of protein structures. The statistical treatment of such data is in the realm of directional statistics.^[1]

The fact that 0 degrees and 360 degrees are identical angles, so that for example 180 degrees is not a sensible mean of 2 degrees and 358 degrees, provides one illustration that special statistical methods are required for the analysis of some types of data (in this case, angular data). Other examples of data that may be regarded as directional include statistics involving days of the week, months of the year, compass directions, dihedral angles in molecules, orientations, rotations and so on.

1 von Mises distribution
2 Wrapped distributions
3 Distributions on higher dimensional manifolds
4 Example: the mean of a series of angles
5 See also
6 References
7 Books on directional statistics
8 External links

von Mises distribution

The equivalent in circular statistics of the Gaussian or normal distribution in conventional statistics is the von Mises distribution. This distribution on the circle is probably the best-known distribution in the field of directional statistics. It is widely used to model angular data, where an angle is represented by a two-dimensional unit vector (that is, a point on the circle).
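As a rough illustration, the sketch below evaluates the standard von Mises density, $\exp(\kappa\cos(\theta-\mu)) / (2\pi I_0(\kappa))$, and draws samples with Python's standard-library `random.vonmisesvariate`. The helper names `bessel_i0`, `von_mises_pdf` and `circular_mean`, and the parameter values, are assumptions made for this example. The truncated power series for $I_0$ is only meant for moderate values of $\kappa$.

```python
import math
import random


def bessel_i0(kappa, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total = 0.0
    term = 1.0  # k = 0 term of sum_k (kappa^2 / 4)^k / (k!)^2
    for k in range(terms):
        total += term
        # Ratio of successive terms is (kappa / 2)^2 / (k + 1)^2.
        term *= (kappa / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """von Mises density with mean direction mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def circular_mean(angles):
    """Direction of the summed unit vectors, in radians in (-pi, pi]."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


if __name__ == "__main__":
    random.seed(0)  # fixed seed so the demonstration is repeatable
    mu, kappa = 1.0, 4.0  # illustrative values, not taken from the article
    samples = [random.vonmisesvariate(mu, kappa) for _ in range(10000)]
    print("density at the mean direction:", von_mises_pdf(mu, mu, kappa))
    print("circular mean of the samples:", circular_mean(samples))
```

With a concentration of zero the density reduces to the uniform density $1/(2\pi)$ on the circle; larger concentrations give a sharper peak around the mean direction.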

Wrapped distributions

Any probability density function $p(x)$ on the line can be "wrapped" around the circumference of a circle of unit radius.^[2] That is, the pdf of the wrapped variable

$\theta = x_w = x \bmod 2\pi \ \in (-\pi,\pi]$

is

$p_w(\theta)=\sum_{k=-\infty}^{\infty}{p(\theta+2\pi k)}.$
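The wrapping sum can be approximated numerically by truncating it to a finite range of $k$. The sketch below does this for a normal density. The function names (`normal_pdf`, `wrap_angle`, `wrapped_pdf`) and the cutoff `k_max` are assumptions chosen for illustration, and the truncation is only reasonable when the density has light tails relative to `k_max`.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution on the real line."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def wrap_angle(x):
    """Map a real number to the interval (-pi, pi]."""
    theta = math.fmod(x, 2.0 * math.pi)  # lies strictly between -2*pi and 2*pi
    if theta > math.pi:
        theta -= 2.0 * math.pi
    elif theta <= -math.pi:
        theta += 2.0 * math.pi
    return theta


def wrapped_pdf(theta, pdf, k_max=20):
    """Truncated version of p_w(theta) = sum over k of pdf(theta + 2*pi*k)."""
    return sum(pdf(theta + 2.0 * math.pi * k) for k in range(-k_max, k_max + 1))


if __name__ == "__main__":
    # Wrapped normal with sigma = 2, evaluated at the wrapped value of x = 7.
    theta = wrap_angle(7.0)
    print(theta, wrapped_pdf(theta, lambda x: normal_pdf(x, 0.0, 2.0)))
```

Up to truncation error, the wrapped density is periodic with period $2\pi$ and integrates to one over $(-\pi,\pi]$.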

This concept can be extended to the multivariate context by replacing the single sum with $F$ sums, one for each dimension of the feature space:

$p_w(\vec\theta)=\sum_{k_1=-\infty}^{\infty}\cdots\sum_{k_F=-\infty}^{\infty}{p(\vec\theta+2\pi k_1\mathbf{e}_1+\dots+2\pi k_F\mathbf{e}_F)}$

where $\mathbf{e}_k=(0,\dots,0,1,0,\dots,0)^{\mathsf{T}}$ is the $k$-th Euclidean basis vector.
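A minimal sketch of the multivariate wrapping sum, truncated to a finite range in every dimension, might look as follows. The names `wrapped_pdf_nd` and `independent_normal_pdf`, the cutoff `k_max`, and the choice of an independent normal density are all assumptions for this example. The cost grows as $(2 k_{\max}+1)^F$, so the approach is only practical for small $F$.

```python
import itertools
import math


def wrapped_pdf_nd(theta, pdf, k_max=5):
    """Truncated F-fold sum of pdf(theta + 2*pi*k) over integer vectors k."""
    dims = len(theta)
    total = 0.0
    # Each k is one integer vector (k_1, ..., k_F) with entries in [-k_max, k_max].
    for k in itertools.product(range(-k_max, k_max + 1), repeat=dims):
        shifted = [t + 2.0 * math.pi * ki for t, ki in zip(theta, k)]
        total += pdf(shifted)
    return total


def independent_normal_pdf(x, sigma=1.0):
    """Product of zero-mean normal densities sharing one scale (illustrative choice)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.prod(math.exp(-0.5 * (xi / sigma) ** 2) / norm for xi in x)


if __name__ == "__main__":
    # Bivariate example: a density on the torus built from two wrapped normals.
    print(wrapped_pdf_nd([0.5, -1.0], independent_normal_pdf))
```

For a density with independent components, the multivariate wrapped density should factor into the product of the univariate wrapped densities, which gives a simple consistency check.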

Distributions on higher dimensional manifolds

Three point sets sampled from different Kent distributions on the sphere.

There also exist distributions on the two-dimensional sphere (such as the Kent distribution^[3]), the N-dimensional sphere (the von Mises-Fisher distribution^[4]) or the torus (the bivariate von Mises distribution^[5]).

The Matrix-von Mises-Fisher distribution is a distribution on the Stiefel manifold, and can be used to construct probability distributions over rotation matrices.^[6]

The Bingham distribution is a distribution over axes in N dimensions, or equivalently, over points on the (N-1)-dimensional sphere with the antipodes identified.^[7] For example, if N=2, the axes are undirected lines through the origin in the plane. In this case, each axis cuts the unit circle in the plane (which is the one-dimensional sphere) at two points that are each other's antipodes. For N=4, the Bingham distribution is a distribution over the space of unit quaternions. Since a unit quaternion corresponds to a rotation matrix, the Bingham distribution for N=4 can be used to construct probability distributions over the space of rotations, just like the Matrix-von Mises-Fisher distribution.
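The reason an antipodally symmetric distribution suits rotations is that a unit quaternion $q$ and its antipode $-q$ describe the same rotation. The sketch below illustrates this with the standard quaternion-to-matrix conversion; the function name `quaternion_to_matrix` and the component order $(w, x, y, z)$ are conventions assumed for this example.

```python
import math


def quaternion_to_matrix(q):
    """3x3 rotation matrix for a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


if __name__ == "__main__":
    # Rotation by 90 degrees about the z axis, and its antipode on the 3-sphere.
    half = math.radians(90.0) / 2.0
    q = (math.cos(half), 0.0, 0.0, math.sin(half))
    minus_q = tuple(-component for component in q)
    # Every matrix entry is quadratic in q, so the sign of q cancels.
    print(quaternion_to_matrix(q) == quaternion_to_matrix(minus_q))
```

Because both points map to one rotation, a density on unit quaternions that is meant to describe rotations should assign equal values to $q$ and $-q$, which is the antipodal symmetry that the Bingham distribution has.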

These distributions are used, for example, in geology,^[8] crystallography^[9] and bioinformatics.^[10]

Example: the mean of a series of angles

A simple way to calculate the mean of a series of angles (in the interval [0°, 360°)) is to calculate the mean of the cosines and the mean of the sines of the angles, and then obtain the mean angle from the inverse tangent of their ratio. Consider the following three angles as an example: 10, 20, and 30 degrees. Intuitively, calculating the mean would involve adding these three angles together and dividing by 3, which in this case does give the correct mean angle of 20 degrees. By rotating this system anticlockwise through 15 degrees, the three angles become 355 degrees, 5 degrees and 15 degrees. The naive mean is now 125 degrees, which is the wrong answer, as it should be 5 degrees. The true mean $\scriptstyle\bar \theta$ can be calculated in the following way, using the mean sine $\scriptstyle\bar s$ and the mean cosine $\scriptstyle\bar c \not = 0$ :

$\bar s = \frac{1}{3} \left( \sin (355^\circ) + \sin (5^\circ) + \sin (15^\circ) \right) = \frac{1}{3} \left( -0.087 + 0.087 + 0.259 \right) \approx 0.086$

$\bar c = \frac{1}{3} \left( \cos (355^\circ) + \cos (5^\circ) + \cos (15^\circ) \right) = \frac{1}{3} \left( 0.996 + 0.996 + 0.966 \right) \approx 0.986$

$\bar \theta = \left. \begin{cases} \arctan \left( \frac{\bar s}{ \bar c} \right) & \bar s > 0 ,\ \bar c > 0 \\ \arctan \left( \frac{\bar s}{ \bar c} \right) + 180^\circ & \bar c < 0 \\ \arctan \left (\frac{\bar s}{\bar c} \right)+360^\circ & \bar s <0 ,\ \bar c >0 \end{cases} \right\} = \arctan \left( \frac{0.086}{0.986} \right) = \arctan (0.087) = 5^\circ.$
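The same calculation can be sketched in code. The version below uses the two-argument arctangent `math.atan2` in place of the case analysis, since it selects the quadrant from the signs of the mean sine and mean cosine. The function name `mean_angle_degrees` is an assumption for this example, and the result is not meaningful when the mean sine and mean cosine are both zero.

```python
import math


def mean_angle_degrees(angles):
    """Mean direction of angles given in degrees, returned in [0, 360)."""
    n = len(angles)
    mean_sin = sum(math.sin(math.radians(a)) for a in angles) / n
    mean_cos = sum(math.cos(math.radians(a)) for a in angles) / n
    # atan2 handles the quadrant; the modulo maps negative results into [0, 360).
    return math.degrees(math.atan2(mean_sin, mean_cos)) % 360.0


if __name__ == "__main__":
    print(mean_angle_degrees([355.0, 5.0, 15.0]))  # the rotated example above
    print(mean_angle_degrees([10.0, 20.0, 30.0]))  # the original example
```

For the rotated example this gives approximately 5 degrees, where the arithmetic mean gave 125 degrees, and for the original angles it agrees with the arithmetic mean of 20 degrees up to rounding.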

See also

Yamartino method

References

1. Hamelryck, T., Kent, J., Krogh, A. (2006) Sampling realistic protein conformations using local structural bias. PLoS Comput. Biol., 2(9): e131. Public Library of Science (PLoS). Retrieved on 2008-02-01.
2. Bahlmann, C. (2006) Directional features in online handwriting recognition. Pattern Recognition, 39.
3. Kent, J. (1982) The Fisher–Bingham distribution on the sphere. J. Royal Stat. Soc., 44, 71–80.
4. Fisher, RA. (1953) Dispersion on a sphere. Proc. Roy. Soc. London Ser. A., 217, 295–305.
5. Mardia, KV., Taylor, CC., Subramaniam, GK. (2007) Protein Bioinformatics and Mixtures of Bivariate von Mises Distributions for Angular Data. Biometrics, 63, 505–512.
6. Downs (1972) Orientational statistics. Biometrika, 59, 665–676.
7. Bingham, C. (1974) An Antipodally Symmetric Distribution on the Sphere. Ann. Statist., 2, 1201–1225.
8. Peel, D., Whiten, WJ., McLachlan, GJ. (2001) Fitting mixtures of Kent distributions to aid in joint set identification. J. Am. Stat. Ass., 96, 56–63.
9. Krieger Lassen, N. C., Juul Jensen, D., Conradsen, K. (1994) On the statistical analysis of orientation data. Acta Cryst., A50, 741–748.
10. Kent, J.T., Hamelryck, T. (2005) Using the Fisher–Bingham distribution in stochastic models for protein structure. In S. Barber, P.D. Baxter, K.V. Mardia, & R.E. Walls (Eds.), Quantitative Biology, Shape Analysis, and Wavelets, pp. 57–60. Leeds, Leeds University Press.

Books on directional statistics

Batschelet, E. Circular statistics in biology, Academic Press, London, 1981. ISBN 0-12-081050-6.
Fisher, NI., Statistical Analysis of Circular Data, Cambridge University Press, 1993. ISBN 0-521-35018-2
Fisher, NI., Lewis, T., Embleton, BJJ. Statistical Analysis of Spherical Data, Cambridge University Press, 1993. ISBN 0-521-45699-1
Mardia, KV. and Jupp P., Directional Statistics (2nd edition), John Wiley and Sons Ltd., 2000. ISBN 0-471-95333-4

External links

Kanti Mardia, University of Leeds
Peter Jupp, University of St Andrews
Christopher Bingham, University of Minnesota

