Cross-correlation

In signal processing, cross-correlation is a measure of similarity of two series as a function of the lag of one relative to the other. This is also known as a sliding dot product or sliding inner-product. It is commonly used for searching a long signal for a shorter, known feature. It has applications in pattern recognition, single particle analysis, electron tomography, averaging, cryptanalysis, and neurophysiology.

For continuous functions f and g, the cross-correlation is defined as:

(f \star g)(\tau)\ \stackrel{\mathrm{def}}{=} \int_{-\infty}^{\infty} f^*(t)\ g(t+\tau)\,dt,

where $f^*$ denotes the complex conjugate of $f$ and $\tau$ is the lag.

Similarly, for discrete functions, the cross-correlation is defined as:

(f \star g)[n]\ \stackrel{\mathrm{def}}{=} \sum_{m=-\infty}^{\infty} f^*[m]\ g[m+n].
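For finite sequences the discrete sum can be evaluated directly. The following is a minimal sketch, assuming both sequences are zero outside their stored range; the function name `cross_correlate` is illustrative and not taken from any library.

```python
def cross_correlate(f, g):
    """Return (lags, values) with values[i] = sum_m conj(f[m]) * g[m + lags[i]].

    Assumption: f and g are zero outside their index range, so only lags
    from -(len(f) - 1) to len(g) - 1 can give a non-zero result.
    """
    lags = list(range(-(len(f) - 1), len(g)))
    values = []
    for n in lags:
        total = 0
        for m, fm in enumerate(f):
            if 0 <= m + n < len(g):
                # Python ints, floats and complex numbers all provide conjugate().
                total += fm.conjugate() * g[m + n]
        values.append(total)
    return lags, values


f = [1, 2, 3]
g = [0, 1, 2, 3, 0]  # the same shape as f, delayed by one sample
lags, values = cross_correlate(f, g)
print(lags)    # [-2, -1, 0, 1, 2, 3, 4]
print(values)  # [0, 3, 8, 14, 8, 3, 0]
```

The largest value occurs at lag 1, the shift that aligns the two sequences.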

Visual comparison of convolution, cross-correlation and autocorrelation.

The cross-correlation is similar in nature to the convolution of two functions.

In an autocorrelation, which is the cross-correlation of a signal with itself, there will always be a peak at a lag of zero, and its size will be the signal energy.
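The zero-lag peak follows from the definition above together with the Cauchy–Schwarz inequality. As a short derivation, assuming $f$ is square-integrable, the value at zero lag is the integral of $|f|^2$ (the energy of the signal), and no other lag can exceed it in magnitude:

(f \star f)(0) = \int_{-\infty}^{\infty} f^*(t)\ f(t)\,dt = \int_{-\infty}^{\infty} |f(t)|^2\,dt,

|(f \star f)(\tau)| \le \left(\int_{-\infty}^{\infty} |f(t)|^2\,dt\right)^{1/2} \left(\int_{-\infty}^{\infty} |f(t+\tau)|^2\,dt\right)^{1/2} = (f \star f)(0).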

In probability and statistics, the term cross-correlations refers to the correlations between the entries of two random vectors X and Y, while the autocorrelations of a random vector X are the correlations between the entries of X itself, which form the correlation matrix of X. This is analogous to the distinction between the autocovariance of a random vector and the cross-covariance of two random vectors. A further distinction is that in probability and statistics the definition of correlation always includes a standardising factor, so that correlations take values between −1 and +1.

If $X$ and $Y$ are two independent random variables with probability density functions $f$ and $g$, respectively, then the probability density of the difference $Y - X$ is formally given by the cross-correlation (in the signal-processing sense) $f \star g$; however, this terminology is not used in probability and statistics. In contrast, the convolution $f * g$ (equivalent to the cross-correlation of $f(-t)$ and $g(t)$) gives the probability density function of the sum $X + Y$.
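The same relationship holds for discrete distributions, which makes it easy to check by hand. The sketch below uses probability mass functions of two fair dice rather than densities (an assumption made for illustration); `difference_pmf` and `sum_pmf` are hypothetical helper names.

```python
from fractions import Fraction


def difference_pmf(f, g):
    """PMF of Y - X for independent X ~ f and Y ~ g (dicts: outcome -> probability).

    The value at n is sum_m f[m] * g[m + n], the discrete cross-correlation.
    """
    out = {}
    for m, fm in f.items():
        for k, gk in g.items():
            out[k - m] = out.get(k - m, 0) + fm * gk
    return out


def sum_pmf(f, g):
    """PMF of X + Y: the value at n is sum_m f[m] * g[n - m], the convolution."""
    out = {}
    for m, fm in f.items():
        for k, gk in g.items():
            out[m + k] = out.get(m + k, 0) + fm * gk
    return out


die = {face: Fraction(1, 6) for face in range(1, 7)}
diff = difference_pmf(die, die)
total = sum_pmf(die, die)
print(diff[0], diff[5])    # 1/6 1/36
print(total[7], total[2])  # 1/6 1/36
```

Exact fractions are used so that the probabilities can be compared without rounding error.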

Explanation

As an example, consider two real-valued functions $f$ and $g$ differing only by an unknown shift along the x-axis. One can use the cross-correlation to find how much $g$ must be shifted along the x-axis to make it identical to $f$. The formula essentially slides the $g$ function along the x-axis, calculating the integral of their product at each position. When the functions match, the value of $(f\star g)$ is maximized. This is because when peaks (positive areas) are aligned, they make a large contribution to the integral. Similarly, when troughs (negative areas) align, they also make a positive contribution to the integral because the product of two negative numbers is positive.

With complex-valued functions $f$ and $g$, taking the conjugate of $f$ ensures that aligned peaks (or aligned troughs) with imaginary components will contribute positively to the integral.

In econometrics, lagged cross-correlation is sometimes referred to as cross-autocorrelation.^[1]

Properties

The cross-correlation of functions f(t) and g(t) is equivalent to the convolution of f^*(−t) and g(t). That is:

f\star g = f^*(-t)*g.
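This identity can be checked numerically for finite sequences. The sketch below assumes sequences that are zero outside their stored range; `convolve` and `cross_correlate` are illustrative names, and the test values are arbitrary.

```python
def convolve(a, b):
    """Full discrete convolution: out[i + j] accumulates a[i] * b[j]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cross_correlate(f, g):
    """Values of sum_m conj(f[m]) * g[m + n] for n = -(len(f) - 1) .. len(g) - 1."""
    offset = len(f) - 1
    out = [0] * (len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        for k, gk in enumerate(g):
            out[k - m + offset] += fm.conjugate() * gk
    return out


f = [1 + 2j, 3 - 1j, 0.5j]
g = [2, -1j, 1 + 1j, 4]

# f^*(-t): reverse the sequence and conjugate every entry.
f_flipped_conj = [value.conjugate() for value in reversed(f)]

lhs = cross_correlate(f, g)
rhs = convolve(f_flipped_conj, g)
print(all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs)))  # True
```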

If f is a Hermitian function, then $f\star g = f*g.$

If both f and g are Hermitian, then $f \star g = g \star f$.

$(f\star g)\star(f\star g)=(f\star f)\star (g\star g).$

Analogous to the convolution theorem, the cross-correlation satisfies

\mathcal{F}\{f\star g\}=(\mathcal{F}\{f\})^* \cdot \mathcal{F}\{g\},

where $\mathcal{F}$ denotes the Fourier transform, and an asterisk again indicates the complex conjugate. Coupled with fast Fourier transform algorithms, this property is often exploited for the efficient numerical computation of cross-correlations (see circular cross-correlation).
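For sampled, periodic data the property takes the form of a circular cross-correlation computed with the discrete Fourier transform. The sketch below is a check of that identity under stated assumptions: it uses a naive O(N²) DFT as a stand-in for a fast Fourier transform, and all function names are illustrative.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (an FFT would give the same values faster)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_xcorr_direct(f, g):
    """sum_m conj(f[m]) * g[(m + lag) mod N] for lag = 0 .. N - 1."""
    n = len(f)
    return [sum(complex(f[m]).conjugate() * g[(m + lag) % n] for m in range(n))
            for lag in range(n)]


def circular_xcorr_fourier(f, g):
    """Inverse transform of conj(F) * G, following the property above."""
    F, G = dft(f), dft(g)
    return dft([Fk.conjugate() * Gk for Fk, Gk in zip(F, G)], inverse=True)


f = [1, 2, 3, 4]
g = [4, 1, 2, 3]  # f circularly delayed by one sample
direct = circular_xcorr_direct(f, g)
via_fourier = circular_xcorr_fourier(f, g)
print([round(v.real, 6) for v in via_fourier])  # [24.0, 30.0, 24.0, 22.0]
```

Because the DFT treats the sequences as periodic, zero-padding is needed when a linear (non-circular) cross-correlation is wanted.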

The cross-correlation is related to the spectral density (see Wiener–Khinchin theorem).

The cross-correlation of a convolution of f and h with a function g is the convolution of the cross-correlation of f and g with the conjugated, time-reversed kernel $h^*(-t)$, which reduces to $h(-t)$ for real-valued h:

(f * h) \star g = h^*(-t)*(f \star g).

Time series analysis

In time series analysis, as applied in statistics and signal processing, the cross-correlation between two time series describes the normalized cross-covariance function.

Let $(X_t,Y_t)$ represent a pair of stochastic processes that are jointly wide-sense stationary. Then the cross-covariance and the cross-correlation are given by

cross-covariance	$\gamma_{XY}(\tau) = \operatorname{E}[(X_t - \mu_X)(Y_{t+\tau} - \mu_Y)],$
cross-correlation	$\rho_{XY}(\tau) = \operatorname{E}[ (X_t-\mu_X)\,(Y_{t+\tau}-\mu_Y)]/(\sigma_{X} \sigma_{Y}),$

where $\mu_X$ and $\sigma_X$ are the mean and standard deviation of the process $(X_t)$, which are constant over time due to stationarity, and similarly $\mu_Y$ and $\sigma_Y$ for $(Y_t)$. $\operatorname{E}[\ ]$ denotes the expected value. That the cross-covariance and cross-correlation are independent of t is precisely the additional information (beyond being individually wide-sense stationary) conveyed by the requirement that $(X_t,Y_t)$ are jointly wide-sense stationary.

The cross-correlation of a pair of jointly wide-sense stationary stochastic processes can be estimated by averaging the product of samples measured from one process and samples measured from the other (and its time shifts). The samples included in the average can be an arbitrary subset of all the samples in the signal (e.g., samples within a finite time window or a sub-sampling of one of the signals). For a large number of samples, the average converges to the true cross-correlation.
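A sample estimate of $\rho_{XY}(\tau)$ can be sketched as follows. This is one common convention rather than the only one: it assumes two equally long records, uses sample means and standard deviations in place of the true ones, and divides by the record length n at every lag. The function name and the synthetic data are illustrative.

```python
import math
import random


def sample_cross_correlation(x, y, lag):
    """Estimate rho_XY(lag) from two equally long sample sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    # Sum (X_t - mean_x)(Y_{t+lag} - mean_y) over the t for which both samples
    # exist; dividing by n rather than by the number of terms is a convention.
    cov = sum((x[t] - mean_x) * (y[t + lag] - mean_y)
              for t in range(n) if 0 <= t + lag < n) / n
    return cov / (sd_x * sd_y)


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.0, 0.0, 0.0] + x[:-3]  # y_t = x_{t-3}: y trails x by three samples

estimates = {lag: sample_cross_correlation(x, y, lag) for lag in range(-5, 6)}
best_lag = max(estimates, key=estimates.get)
print(best_lag)  # 3
```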

Time delay analysis

Cross-correlations are useful for determining the time delay between two signals, e.g. for determining time delays for the propagation of acoustic signals across a microphone array.^[2]^[3] After calculating the cross-correlation between the two signals, the maximum (or minimum if the signals are negatively correlated) of the cross-correlation function indicates the point in time where the signals are best aligned, i.e. the time delay between the two signals is determined by the argument of the maximum, or arg max of the cross-correlation, as in

\tau_\mathrm{delay}=\underset{t}{\operatorname{arg\,max}}((f \star g)(t))
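For sampled, real-valued signals the arg max can be found by scanning all lags. The sketch below assumes finite sequences that are zero outside their stored range; `estimate_delay` is an illustrative name, and the brute-force scan is chosen for clarity rather than speed.

```python
def estimate_delay(f, g):
    """Return the lag n that maximizes sum_m f[m] * g[m + n] (real-valued inputs).

    If several lags tie for the maximum, the smallest such lag is returned.
    """
    best_lag, best_value = 0, float("-inf")
    for lag in range(-(len(f) - 1), len(g)):
        value = sum(f[m] * g[m + lag]
                    for m in range(len(f)) if 0 <= m + lag < len(g))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


pulse = [0, 1, 3, 1, 0]
f = pulse + [0] * 10
g = [0] * 4 + pulse + [0] * 6  # the same pulse, arriving four samples later

print(estimate_delay(f, g))  # 4
print(estimate_delay(g, f))  # -4
```

Swapping the arguments flips the sign of the estimated delay, which reflects the convention that the lag is applied to the second signal.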

Normalized cross-correlation

For image-processing applications in which the brightness of the image and template can vary due to lighting and exposure conditions, the images can first be normalized. This is typically done at every step by subtracting the mean and dividing by the standard deviation. That is, the normalized cross-correlation of a template $t(x,y)$ with a subimage $f(x,y)$ is

\frac{1}{n} \sum_{x,y}\frac{(f(x,y) - \overline{f})(t(x,y) - \overline{t})}{\sigma_f \sigma_t}

where $n$ is the number of pixels in $t(x,y)$ and $f(x,y)$, $\overline{f}$ is the average of f and $\sigma_f$ is the standard deviation of f (and likewise $\overline{t}$ and $\sigma_t$ for t). In functional analysis terms, this can be thought of as the dot product of two normalized vectors. That is, if

F(x,y) = f(x,y) - \overline{f}

and

T(x,y) = t(x,y) - \overline{t}

then the above sum is equal to

\left\langle\frac{F}{\|F\|},\frac{T}{\|T\|}\right\rangle

where $\langle\cdot,\cdot\rangle$ is the inner product and $\|\cdot\|$ is the L² norm. Thus, if f and t are real matrices, their normalized cross-correlation equals the cosine of the angle between the unit vectors $F/\|F\|$ and $T/\|T\|$, and is therefore 1 if and only if F equals T multiplied by a positive scalar.
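The inner-product form translates directly into code. The sketch below assumes two equally sized real-valued patches stored as lists of rows, neither of which is constant (a constant patch has zero norm after mean removal); the function name is illustrative.

```python
import math


def normalized_cross_correlation(f, t):
    """Cosine of the angle between the mean-removed patches f and t."""
    fv = [v for row in f for v in row]
    tv = [v for row in t for v in row]
    mean_f = sum(fv) / len(fv)
    mean_t = sum(tv) / len(tv)
    F = [v - mean_f for v in fv]
    T = [v - mean_t for v in tv]
    norm_f = math.sqrt(sum(v * v for v in F))
    norm_t = math.sqrt(sum(v * v for v in T))
    return sum(a * b for a, b in zip(F, T)) / (norm_f * norm_t)


template = [[1, 2], [3, 4]]
brighter = [[2 * v + 10 for v in row] for row in template]  # gain and offset change
inverted = [[-v for v in row] for row in template]          # contrast reversed

print(round(normalized_cross_correlation(template, brighter), 6))  # 1.0
print(round(normalized_cross_correlation(template, inverted), 6))  # -1.0
```

A change of gain and offset leaves the score at 1, which is why the normalization makes template matching robust to lighting and exposure differences.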

Normalized correlation is one of the methods used for template matching, a process used for finding instances of a pattern or object within an image. It is also the 2-dimensional version of the Pearson product-moment correlation coefficient.

Nonlinear systems

Caution must be applied when using cross-correlation for nonlinear systems. In certain circumstances, which depend on the properties of the input, cross-correlation between the input and output of a system with nonlinear dynamics can be completely blind to certain nonlinear effects.^[4] This problem arises because some quadratic moments can equal zero and this can incorrectly suggest that there is little "correlation" (in the sense of statistical dependence) between two signals, when in fact the two signals are strongly related by nonlinear dynamics.
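A simple way to see this effect is a memoryless squaring system driven by an input that is symmetric about zero. This toy case is an assumption chosen for illustration and is not an example from the cited reference: the output is completely determined by the input, yet their covariance vanishes.

```python
def covariance(x, y):
    """Sample covariance of two equally long sequences (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


x = [-3, -2, -1, 0, 1, 2, 3]  # input, symmetric about zero
y = [v * v for v in x]         # output of the nonlinear system y = x^2

print(covariance(x, y))                    # 0.0
print(covariance([v * v for v in x], y))   # 12.0
```

Correlating the output with the squared input recovers the dependence, which suggests why higher-order moments are needed to detect such relationships.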

References

[1] Campbell; Lo; MacKinlay (1996). The Econometrics of Financial Markets. NJ: Princeton University Press. ISBN 0691043019.
[2] Rhudy, Matthew; Brian Bucci; Jeffrey Vipperman; Jeffrey Allanach; Bruce Abraham (November 2009). "Microphone Array Analysis Methods Using Cross-Correlations". Proceedings of 2009 ASME International Mechanical Engineering Congress, Lake Buena Vista, FL. doi:10.1115/IMECE2009-10798.
[3] Rhudy, Matthew (November 2009). "Real Time Implementation of a Military Impulse Classifier". University of Pittsburgh, Master's Thesis.
[4] Billings, S. A. (2013). Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains. Wiley. ISBN 978-1-118-53556-1.

This article is issued from Wikipedia - version of the Monday, November 09, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.