Surrogate data testing

Surrogate data testing[1] (or the method of surrogate data) is a statistical proof-by-contradiction technique, similar to parametric bootstrapping, used to detect non-linearity in a time series. The technique involves specifying a null hypothesis H_0 describing a linear process and then generating several surrogate data sets according to H_0 using Monte Carlo methods. A discriminating statistic is then calculated for the original time series and for each of the surrogate data sets. If the value of the statistic for the original series differs significantly from the values obtained for the surrogate sets, the null hypothesis is rejected and non-linearity is assumed.
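The procedure can be sketched in Python as follows. This is a minimal illustration, not the implementation from any of the cited papers: the function names, the random-phase generator used to realize H_0, the time-reversal asymmetry statistic, and the rank-based two-sided p-value are all assumptions chosen for the example.

```python
import numpy as np


def random_phase_surrogate(x, rng):
    """Surrogate with the same Fourier moduli as x but uniformly random phases."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    new_spectrum = np.abs(spectrum) * np.exp(1j * phases)
    new_spectrum[0] = spectrum[0]  # keep the mean unchanged
    if n % 2 == 0:
        new_spectrum[-1] = spectrum[-1]  # the Nyquist term must stay real
    return np.fft.irfft(new_spectrum, n=n)


def time_reversal_asymmetry(x, lag=1):
    """Illustrative discriminating statistic (an assumption, not from the source)."""
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


def surrogate_test(x, statistic, n_surrogates=99, seed=0):
    """Return the original statistic, the surrogate statistics, and a p-value."""
    rng = np.random.default_rng(seed)
    t0 = statistic(x)
    ts = np.array([statistic(random_phase_surrogate(x, rng))
                   for _ in range(n_surrogates)])
    # Two-sided rank-based p-value: how many surrogates lie at least as far
    # from the surrogate mean as the original series does.
    center = ts.mean()
    extreme = np.sum(np.abs(ts - center) >= np.abs(t0 - center))
    p_value = (1 + extreme) / (n_surrogates + 1)
    return t0, ts, p_value
```

With 99 surrogates the smallest attainable p-value is 0.01; H_0 would be rejected at a chosen significance level when the p-value falls below that level.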

The particular surrogate data method to be used is directly related to the null hypothesis. Usually the null hypothesis is similar to the following: the data are a realization of a stationary linear system whose output has possibly been measured by a monotonically increasing, possibly nonlinear (but static) function.[1] Here linear means that each value depends linearly on past values, or on present and past values of some independent identically distributed (i.i.d.) process, usually also Gaussian. This is equivalent to saying that the process is of ARMA type. In the case of flows (continuous-time systems), linearity means that the system can be expressed by a linear differential equation. In this hypothesis, the static measurement function is one that depends only on the present value of its argument, not on past ones.
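In symbols, this null hypothesis can be written as below. The notation is introduced here for illustration and is not taken from the cited references: x_t is the underlying linear process, e_t the i.i.d. Gaussian driving noise, a_i and b_j the ARMA coefficients with orders p and q, s_t the observed series, and h the static measurement function.

```latex
\begin{align}
  x_t &= \sum_{i=1}^{p} a_i \, x_{t-i} + \sum_{j=0}^{q} b_j \, e_{t-j},
      \qquad e_t \sim \text{i.i.d. } \mathcal{N}(0, \sigma^2), \\
  s_t &= h(x_t), \qquad h \text{ monotonically increasing.}
\end{align}
```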

Many algorithms to generate surrogate data have been proposed. They are usually classified into two groups:[2]

  • Typical realizations: data series are generated as outputs of a model fitted to the original data.
  • Constrained realizations: data series are created directly from the original data, generally by some suitable transformation of it.

Constrained-realization methods do not depend on a particular model or on any parameters, so they are non-parametric. They are usually based on preserving the linear structure of the original series (for instance, by preserving the autocorrelation function or, equivalently, the periodogram, an estimate of the power spectrum). Among constrained-realization methods, the most widely used (which could therefore be called the classical methods) are:

  1. Algorithm 0, or RS (for Random Shuffle):[1][3] New data are created simply by random permutations of the original series. The permutations guarantee the same amplitude distribution as the original series but destroy any temporal correlation. This method is associated with the null hypothesis that the data are uncorrelated noise (possibly Gaussian and measured by a static nonlinear function).
  2. Algorithm 1, or RP (for Random Phases; also known as FT, for Fourier Transform):[1][4] To preserve the linear correlation (the periodogram) of the series, surrogate data are created by taking the inverse Fourier transform of the moduli of the Fourier transform of the original data combined with new (uniformly random) phases. If the surrogates must be real, the Fourier phases must be antisymmetric with respect to the central value of the transformed data, so that the phase at frequency −f is the negative of the phase at f.
  3. Algorithm 2, or AAFT (for Amplitude Adjusted Fourier Transform):[1][2] This method approximately combines the advantages of the two previous ones: it tries to preserve both the linear structure and the amplitude distribution. It consists of these steps:
    • Rescaling the data to a Gaussian distribution (Gaussianization).
    • Performing an RP transformation of the rescaled data.
    • Finally, applying the inverse of the first transformation (de-Gaussianization).
    The drawback of this method is that the last step somewhat changes the linear structure.
  4. Iterative algorithm 2, or IAAFT (for Iterative Amplitude Adjusted Fourier Transform):[5] This algorithm is an iterative version of AAFT. The steps are repeated until the autocorrelation function is sufficiently similar to the original or until the amplitudes no longer change.
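
The four classical methods can be sketched with NumPy as follows. This is a hedged illustration rather than a reference implementation: the function names, the rank-ordering helper, the random-permutation starting point for IAAFT, and the stopping rule (rank order no longer changes, or a hypothetical max_iter is reached) are assumptions made for the example.

```python
import numpy as np


def random_shuffle(x, rng):
    """Algorithm 0 (RS): permute the samples; the amplitude distribution is kept."""
    return rng.permutation(x)


def random_phases(x, rng):
    """Algorithm 1 (RP): keep the Fourier moduli, draw new uniform phases."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    new_spectrum = np.abs(spectrum) * np.exp(1j * phases)
    new_spectrum[0] = spectrum[0]  # keep the mean unchanged
    if n % 2 == 0:
        new_spectrum[-1] = spectrum[-1]  # the Nyquist term must stay real
    # irfft assumes the antisymmetric phases needed for a real-valued result.
    return np.fft.irfft(new_spectrum, n=n)


def rank_remap(values, template):
    """Arrange the sorted `values` in the rank order of `template`."""
    ranks = np.argsort(np.argsort(template))
    return np.sort(values)[ranks]


def aaft(x, rng):
    """Algorithm 2 (AAFT): Gaussianize, randomize phases, de-Gaussianize."""
    gaussian = rank_remap(rng.standard_normal(len(x)), x)  # Gaussianization
    randomized = random_phases(gaussian, rng)              # RP step
    return rank_remap(x, randomized)                       # de-Gaussianization


def iaaft(x, rng, max_iter=100):
    """IAAFT: alternate between imposing the spectrum and the amplitudes."""
    n = len(x)
    target_moduli = np.abs(np.fft.rfft(x))
    sorted_x = np.sort(x)
    s = rng.permutation(x)  # assumed starting point: a random shuffle
    previous_ranks = None
    for _ in range(max_iter):
        # Impose the original Fourier moduli while keeping the current phases.
        current_phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_moduli * np.exp(1j * current_phases), n=n)
        # Restore the original amplitude distribution by rank ordering.
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
        if previous_ranks is not None and np.array_equal(ranks, previous_ranks):
            break  # the amplitudes no longer change
        previous_ranks = ranks
    return s
```

In this sketch RS, AAFT and IAAFT return exact permutations of the original values, whereas RP reproduces the Fourier moduli exactly but generally changes the amplitude distribution.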

Many other surrogate data methods have been proposed: some are based on optimizations that achieve an autocorrelation close to the original one,[6][7][8] some are based on the wavelet transform,[9][10][11] and some can deal with certain types of non-stationary data.[12][13]

References

  1. J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, J. Doyne Farmer (1992). Physica D 58: 77. doi:10.1016/0167-2789(92)90102-S.
  2. J. Theiler, D. Prichard (1996). Physica D 94 (4). doi:10.1016/0167-2789(96)00050-4.
  3. J.A. Scheinkman, B. LeBaron (1989). J. Business 62: 311. https://ideas.repec.org/a/ucp/jnlbus/v62y1989i3p311-37.html
  4. A.R. Osborne, A.D. Kirwan Jr., A. Provenzale, L. Bergamasco (1986). Physica D 23: 75. doi:10.1016/0167-2789(86)90113-2.
  5. T. Schreiber, A. Schmitz (1996). "Improved Surrogate Data for Nonlinearity Tests". Phys. Rev. Lett. 77 (4): 635–638. doi:10.1103/PhysRevLett.77.635. PMID 10062864.
  6. T. Schreiber, A. Schmitz (2000). Physica D 142: 346. doi:10.1016/S0167-2789(00)00043-9.
  7. T. Schreiber (1998). "Constrained Randomization of Time Series Data". Phys. Rev. Lett. 80 (10): 2105–2108. doi:10.1103/PhysRevLett.80.2105.
  8. R. Engbert (2002). Chaos Solitons Fractals 13 (1). doi:10.1016/S0960-0779(00)00236-8.
  9. M. Breakspear, M. Brammer, P.A. Robinson (2003). Physica D 182 (1). doi:10.1016/S0167-2789(03)00136-2.
  10. C.J. Keylock (2006). Phys. Rev. E 73: 036707. doi:10.1103/PhysRevE.73.036707.
  11. C.J. Keylock (2007). Physica D 225 (2). doi:10.1016/j.physd.2006.10.012.
  12. T. Nakamura, M. Small, Y. Hirata (2006). Phys. Rev. E 74: 026205. doi:10.1103/PhysRevE.74.026205.
  13. J.H. Lucio, R. Valdés, L.R. Rodríguez (2012). Phys. Rev. E 85: 056202. doi:10.1103/PhysRevE.85.056202.
This article is issued from Wikipedia - version of the Saturday, March 21, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.