Correlation dimension

From Wikipedia, the free encyclopedia

In chaos theory, the correlation dimension (denoted by ν) is a measure of the dimensionality of the space occupied by a set of random points. For example, if we have a set of random points on the real number line between 0 and 1, the correlation dimension will be ν = 1, while if they are distributed on, say, a triangle embedded in 3-space (or m-space), the correlation dimension will be ν = 2. This is what we would intuitively expect from a measure of dimension. The real utility of the correlation dimension is in determining the (possibly fractional) dimensions of fractal objects. There are other methods of measuring dimension (e.g. the Hausdorff dimension, the box-counting dimension, and the information dimension), but the correlation dimension has the advantage of being straightforward and quick to calculate, and it often agrees with other calculations of dimension.

For any set of N points in an m-dimensional space

\vec x(i)=[x_{1}(i),x_{2}(i),\ldots,x_{m}(i)]

where i=1,2,\ldots,N, the correlation integral C(\varepsilon) is calculated by:

C(\varepsilon)=\lim_{N \rightarrow \infty} \frac{g}{N^2}

where g is the total number of pairs of points which have a distance between them that is less than or equal to distance \varepsilon (a graphical representation of such close pairs is the recurrence plot). As the number of points tends to infinity, and the distance between them tends to zero, the correlation integral, for small values of \varepsilon, will take the form:

C(\varepsilon) \sim \varepsilon^\nu

If the number of points is sufficiently large and the points are evenly distributed, a log-log plot of the correlation integral versus \varepsilon will yield an estimate of ν: the slope of the straight-line region at small \varepsilon. This idea can be qualitatively understood by realizing that for higher-dimensional objects there are more ways for points to be close to each other, and so the number of pairs close to each other rises more rapidly with \varepsilon in higher dimensions.
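
As an illustrative sketch (not taken from the references), the estimate described above can be reproduced numerically: count the close pairs for several small values of \varepsilon, then fit the slope of log C(\varepsilon) against log \varepsilon. The function names, the sample size, and the range of \varepsilon are assumptions chosen for this example. Ordered pairs are counted here, which changes C(\varepsilon) only by a constant factor and leaves the slope unchanged.

```python
import numpy as np


def correlation_integral(points, eps):
    """Return g / N^2, where g counts ordered pairs (i != j) no farther apart than eps."""
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]  # treat a 1-D series as N points in 1-space
    n = len(points)
    # Pairwise Euclidean distances via broadcasting: result has shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Remove the n self-pairs on the diagonal, which always have distance 0.
    g = np.count_nonzero(dist <= eps) - n
    return g / n ** 2


def estimate_nu(points, eps_values):
    """Least-squares slope of log C(eps) against log eps."""
    c = np.array([correlation_integral(points, e) for e in eps_values])
    slope, _intercept = np.polyfit(np.log(eps_values), np.log(c), 1)
    return slope


rng = np.random.default_rng(0)
eps_values = np.logspace(np.log10(0.02), np.log10(0.1), 8)  # assumed "small" range

# Random points on the real line between 0 and 1.
line = rng.random(1200)

# Random points on a flat patch embedded in 3-space (a stand-in for the triangle).
u, v = rng.random(1200), rng.random(1200)
plane = np.column_stack([u, v, u + v])

nu_line = estimate_nu(line, eps_values)
nu_plane = estimate_nu(plane, eps_values)
print(f"line:  nu ~ {nu_line:.2f}")
print(f"plane: nu ~ {nu_plane:.2f}")
```

With these settings the fitted slopes should come out near 1 for the line and near 2 for the patch, with some bias from the finite sample and from edge effects at the larger values of \varepsilon.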

Grassberger and Procaccia (1983) is the main reference for this technique; it gives the results of such estimates for a number of fractal objects and compares the values to other measures of fractal dimension. The technique can be used to distinguish between chaotic and truly random behavior. For example, in the "Sun in Time" article, the method was used to show that the number of sunspots on the sun, after accounting for the known cycles such as the daily and 11-year cycles, is very likely not random noise, but rather chaotic noise, with a low-dimensional fractal attractor.

References

  • P. Grassberger and I. Procaccia (1983). "Measuring the strangeness of strange attractors". Physica 9D: 189–208.
  • Sonett, C., Giampapa, M., and Matthews, M. (Eds.) (1992). The Sun in Time. University of Arizona Press. ISBN 0-8165-1297-3. 

Point correlation dimension (PD2i) and PD2i parameters.

To make a correlation integral, the data are discretized (digitized) into a series of sequential data points (Pi), the total length (N) of which is, as a common rule, exponentially greater than the suspected dimension (e.g., N > 10^D2). Multidimensional vectors must then be made from the data series and subtracted from one another to form the correlation integral. A small number of jumps over adjacent data points (jumps of a fixed size) are taken sequentially through the serial data to select data points, the values of which are then used as coordinates of the m-dimensional vectors (Vi). Note that the values of i run through nearly the whole data length, from i = 1 to almost N (N minus a margin set by the size of m). A second, identical set of vectors (Vj) is then made, and, for a fixed value of i (e.g., i = 1, the first data point), all vector-difference lengths |Vj − Vi| are determined. Some of these differences will be large, but many (most) will be small. Then the next correlation integral (|Vj − Vi| with i = 2) is made, and so on. Each of the sequential correlation integrals, from i = 1 to i = N − m, is a function of time, because i is a function of time.

To make the single D2 correlation integral, all possible vector-difference lengths (i = 1 to N − m) are pooled together, so the D2 algorithm is not time dependent. For the pointwise correlation dimension (D2i), which is time dependent, only those vector-difference lengths for each fixed value of i are used to make the correlation integral.

If one records from a sine-wave generator that is suddenly replaced by, e.g., a Lorenz chaotic generator, then a nonstationarity will occur in the data stream. Formally, the statistical properties of the data will be different the moment after the replacement.

The PD2i algorithm is used to address data nonstationarity, because this problem invariably arises in long epochs of biological data when the system uncontrollably changes state (e.g., when shifting from sleeping to waking, or from quiescence to alerting). In nonstationary data, the vectors made from "points" of data that are stationary with respect to the "point" the reference vector is in (a "point" is a small strip of data, m × the jump size in data points long) will contribute uncontaminated vector-difference lengths only to the small log R part of the scaling region.
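
The difference between the pooled (D2) and per-reference-vector (D2i) correlation integrals can be sketched as follows. This is only an illustration under stated assumptions, not the PD2i algorithm of the cited paper: the names `delay_embed`, `pointwise_integrals`, and `pooled_integral`, the jump size `tau`, the radius `r`, and the test signal are all hypothetical, and uniform noise stands in for the Lorenz generator to keep the example short.

```python
import numpy as np


def delay_embed(series, m, tau):
    """Build vectors V_i = (P_i, P_{i+tau}, ..., P_{i+(m-1)*tau}) from the series."""
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this m and tau")
    # Row i holds the indices i, i + tau, ..., i + (m - 1) * tau.
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(m)[None, :]
    return series[idx]


def difference_lengths(vectors):
    """Matrix of vector-difference lengths |V_j - V_i| for all i, j."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pointwise_integrals(lengths, r):
    """One value per fixed i: fraction of j != i with |V_j - V_i| <= r."""
    n = len(lengths)
    return (np.count_nonzero(lengths <= r, axis=1) - 1) / (n - 1)


def pooled_integral(lengths, r):
    """Single time-independent value: all difference lengths pooled together."""
    return pointwise_integrals(lengths, r).mean()


# Nonstationary test signal: a sine wave abruptly replaced by uniform noise.
rng = np.random.default_rng(0)
t = np.arange(400)
series = np.concatenate([np.sin(2 * np.pi * t / 50), rng.uniform(-1, 1, 400)])

vectors = delay_embed(series, m=3, tau=5)
lengths = difference_lengths(vectors)
pointwise = pointwise_integrals(lengths, r=0.2)
pooled = pooled_integral(lengths, r=0.2)

print("mean pointwise value, sine part: ", pointwise[:300].mean())
print("mean pointwise value, noise part:", pointwise[-300:].mean())
print("pooled value:                    ", pooled)
```

The per-vector values change when the reference vector moves from the sine segment into the noise segment, while the pooled value averages over both and carries no time information.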

James E. Skinner, Brian A. Nester, and William C. Dalsey (2000). "Nonlinear dynamics of heart rate variability during experimental hemorrhage in ketamine-anesthetized rats". Am J Physiol Heart Circ Physiol 279: 1669–1678.