Circular statistics

From Wikipedia, the free encyclopedia

Circular or directional statistics is the subdiscipline of statistics that deals with directions (unit vectors in \Bbb{R}^{n}), axes (lines through the origin in \Bbb{R}^{n}) or rotations in \Bbb{R}^{n}.

The fact that 0 degrees and 360 degrees are identical angles, so that for example 180 degrees is not a sensible mean of 2 degrees and 358 degrees, provides one illustration that special statistical methods are required for the analysis of circular data.

Other examples of data that may be regarded as directional include statistics involving days of the week, months of the year, compass directions, dihedral angles in molecules, orientations, rotations and so on.

The fundamental insight is that such data are often best handled not as numbers, but as unit vectors. To average N times of day, treat each time as a unit vector whose angle is the corresponding fraction of a full circle, sum the vectors, and divide by N to obtain a mean vector with both direction and magnitude. The direction gives the mean time, and the magnitude lies between 0 and 1. The more evenly the times are spread around the clock, the shorter the mean vector, whereas a magnitude close to 1 indicates that the times cluster around a common direction. The von Mises distribution plays the role in circular statistics that the Gaussian or normal distribution plays in conventional statistics.
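The averaging of times of day described above can be sketched in a few lines of Python. This is only an illustration: the function name `mean_time_of_day` and the choice of hours as the input unit are assumptions made for this example, not part of any standard library.

```python
import math

def mean_time_of_day(hours):
    """Return (mean_hour, mean_vector_length) for times given in hours on a 24-hour clock."""
    # A full day corresponds to a full circle, so each time becomes an angle.
    angles = [2.0 * math.pi * h / 24.0 for h in hours]
    n = len(angles)
    # Sum the unit vectors and divide by N to get the mean vector.
    x = sum(math.cos(a) for a in angles) / n
    y = sum(math.sin(a) for a in angles) / n
    # Length of the mean vector: near 0 for evenly spread times, near 1 for clustered times.
    r = math.hypot(x, y)
    # Direction of the mean vector, converted back to hours and wrapped to the 24-hour clock.
    mean_hour = (math.atan2(y, x) * 24.0 / (2.0 * math.pi)) % 24.0
    return mean_hour, r

# Two times either side of midnight: 22:00 and 01:00.
late_night = mean_time_of_day([22.0, 1.0])
# Four times spread evenly around the clock.
spread_out = mean_time_of_day([0.0, 6.0, 12.0, 18.0])
```

For 22:00 and 01:00 the mean direction is 23:30, whereas the arithmetic mean of 22 and 1 would give 11:30. For the four evenly spread times the mean vector has essentially zero length, so its direction carries no information.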

Higher-dimensional distributions

There also exist distributions on the two-dimensional sphere in three-dimensional space (such as the 5-parameter Fisher-Bingham distribution, also known as the Kent distribution), on the N-dimensional sphere (the von Mises-Fisher distribution) and on the torus. These distributions are used, for example, in geology and bioinformatics. The matrix-Fisher distribution is a distribution on the Stiefel manifold, and can be used to construct probability distributions over rotation matrices.

Example: the mean of a series of angles

A simple way to calculate the mean of a series of angles (in the interval [0^\circ, 360^\circ)) is to compute the mean cosine and the mean sine of the angles, and then obtain the mean angle from the inverse tangent of their ratio. Consider the following three angles as an example: 10, 20, and 30 degrees. Intuitively, calculating the mean would involve adding these three angles together and dividing by 3, which in this case does give the correct mean angle of 20 degrees.

Subtracting 15 degrees from each angle (a clockwise rotation of the system, if angles are measured anticlockwise) gives 355 degrees, 5 degrees and 15 degrees. The naive mean is now (355 + 5 + 15)/3 = 125 degrees, which is the wrong answer: the mean should have shifted by the same 15 degrees, to 5 degrees.

The true mean \bar \theta can be calculated in the following way, using the mean sine \bar s and the mean cosine \bar c \neq 0:

\bar s = \frac{1}{3} \left( \sin (355^\circ) + \sin (5^\circ) + \sin (15^\circ) \right) = \frac{1}{3} \left( -0.087 + 0.087 + 0.259 \right) \approx 0.086
\bar c = \frac{1}{3} \left( \cos (355^\circ) + \cos (5^\circ) + \cos (15^\circ) \right) = \frac{1}{3} \left( 0.996 + 0.996 + 0.966 \right) \approx 0.986
\bar \theta = \begin{cases} \arctan \left( \frac{\bar s}{\bar c} \right) & \bar s \ge 0 , \bar c > 0 \\ \arctan \left( \frac{\bar s}{\bar c} \right) + 180^\circ & \bar c < 0 \\ \arctan \left( \frac{\bar s}{\bar c} \right) + 360^\circ & \bar s < 0 , \bar c > 0 \end{cases}

Here \bar s and \bar c are both positive, so the first case applies:

\bar \theta = \arctan \left( \frac{0.086}{0.986} \right) = \arctan (0.087) \approx 5^\circ
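The worked example above can be checked with a short Python sketch. The function name `circular_mean_deg` is an assumption made for this illustration. Instead of the case analysis on the signs of \bar s and \bar c, it uses the two-argument arctangent `math.atan2`, which picks the correct quadrant directly and also copes with \bar c = 0.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of a list of angles in degrees, wrapped to the 0-360 degree range."""
    n = len(angles_deg)
    # Mean sine and mean cosine of the angles.
    s_bar = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c_bar = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    # atan2 returns an angle in (-180, 180] degrees; the modulo maps it onto 0-360.
    return math.degrees(math.atan2(s_bar, c_bar)) % 360.0

angles = [355.0, 5.0, 15.0]
naive = sum(angles) / len(angles)      # arithmetic mean: 125 degrees
circular = circular_mean_deg(angles)   # circular mean: 5 degrees, up to rounding
```

When the mean sine and mean cosine are both zero (for example, for 0 and 180 degrees) the mean direction is undefined; `math.atan2(0.0, 0.0)` returns 0 in that case, so a caller that cares should check the length of the mean vector first.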

