Talk:Autocorrelation

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

Please add new talk topics at the bottom.

1 Old stuff not previously in a section
2 avoiding autocorrelation
3 Which autocorrelation are we talking about?
4 Reducing the complexity using the DFT
5 Error?
6 Re-editing this article to make it more coherent
7 Proposed conventions
8 Autocorrelation of a periodic function
9 Time series
10 Partial Autocorrelation
11 Not user-friendly
12 infinite variance?
13 Confidence bounds

Old stuff not previously in a section

There needs to be a layman's definition of this, with a real-world, applicable example.

I have been a little hesitant to edit these articles, partly because there are so many different notations, and I don't know which to use.

The autocorrelation function can be written as:

$R(\tau) \ R_f(t) \ \rho_f(t) \ R_{ff}(\tau)$

$\hat r_x(l) \ R_{xx}(k) \ \rho_k$

and time series can be written:

$x_n \ x[n] \ x(n)$

Does Wikipedia have any standards for this?

I think in general, using more specific, accurate notation tends to muddle the understanding for someone first being introduced to a subject, which is also the type of person probably reading Wikipedia.

In other words,

$R(\tau) = \int f(t)f(t+\tau) \, dt$

is preferable to

$R_f(\tau) = \lim_{T \to \infty} {1 \over 2T} \int_{-T}^T f^* (t)f(t+\tau) \, dt$

How much detail should we use? Should we use the simplified version to introduce the topic, and then show the more detailed versions?

Also, this article started out in the world of probability, and I have converted it to signal processing. The two should both be included in the same article. - Omegatron 16:56, May 25, 2004 (UTC)

From the article:

The autocorrelation definition then becomes

$R(j) = \sum_n x_n x_{n-j}$

which is the definition of autocovariance.

I have some doubts about this claim. The autocorrelation is a function (of j) while the autocovariance is a number. I think what is true is that in this case the autocorrelation at j = 0 is the autocovariance.

Secondly, I have not seen definitions of the autocorrelation where the mean is subtracted. Can someone confirm whether this is common practice? -- Pgabolde 16:29, 17 Dec 2004 (UTC)

The web and the autocovariance article both seem to think that autocovariance is a function.

I believe there are many variations on the autocorrelation formula, with different weightings, normalizations, etc. and they are all still called autocorrelation. This (kind of a "vertical offset"?) seems to be just another variation. - Omegatron 18:54, Dec 17, 2004 (UTC)

I can confirm that this is common practice in a wide variety of fields. To a mathematician, the autocorrelation with the mean subtracted and divided by the variance is the standard definition (it has the useful property of being in the range [-1,1]). See, for example, Priestley's classic "Spectral Analysis and Time Series" (1982), London/New York: Academic Press.

Autocovariance is a function, not a number -- I do not know where the idea that it is a single number has come from. I can think of no field where that would be the case.

--Richard Clegg 14:46, 6 Feb 2005 (UTC)

This is rather tricky --- to me, what you have defined is autocovariance, not autocorrelation, although I'm aware these are sometimes used interchangeably and many people use the formula given on this page. Also, should we not formally define the autocorrelation in terms of expectation? For a process X_t (either discrete or continuous),

$R(k) = \frac{E [(X_t - \mu)(X_{t+k} - \mu)]}{\sigma^2}$

where μ is the mean E[X] and σ² is the variance. This is nicer mathematically since it is normalised to the range [-1,1]. It should also be noted that the autocovariance (indeed the mean and variance) are not necessarily defined unless the process is weakly stationary. Another reason for such an edit would be to bring this page into line with the definition for correlation.

Richard Clegg

I have tried to reconcile the differing definitions of autocorrelation in different disciplines. I hope nobody thinks it is arrogant of me to put the mathematical definition first.

Richard Clegg

Is o a common symbol for convolution? I've never seen it before... - Omegatron 14:45, Feb 6, 2005 (UTC)

Maybe you meant a circle?

$R_f(\tau) = f^*(-\tau) \circ f(\tau) = \int_{-\infty}^{\infty} f(t+\tau)f^*(t)\, dt$

Not that I have seen that before, either... - Omegatron 14:51, Feb 6, 2005 (UTC)

I did mean the circle but wasn't sure how to get that in the non math environment -- would be grateful if you could change it (thanks). It is relatively commonly used -- although not as commonly as the * but the * was already used in the equation to designate complex conjugate hence, I hoped to avoid confusion.

--Richard Clegg 14:57, 6 Feb 2005 (UTC)

Yeah, I figured. Added. -Omegatron 15:43, Feb 6, 2005 (UTC)

avoiding autocorrelation

Generalized least squares (GLS) regression can be applied to the data to avoid violating the OLS assumption of no autocorrelation.

Which autocorrelation are we talking about?

The article has two sections that talk about "autocorrelation" without specifying whether they refer to the statistics definition or the signal processing definition. --Smack (talk) 22:35, 27 May 2005 (UTC)

Are they not the same thing? - Omegatron 23:29, May 27, 2005 (UTC)

What bothers me is the use of the two different signal processing definitions without specifying which is which. Several of the "properties", for example the autocorrelation of white noise and the Wiener-Khinchin theorem, hold only for the second definition (the limit, as T tends to infinity, of 1/T times the integral). I find this confusing. --Assemblany 07:17, 28 March 2007 (UTC)

Can you be more specific about what problems you see, in what sections? Smack's remarks from 2 years ago don't have much relation to the current article. Dicklyon 16:28, 28 March 2007 (UTC)

Reducing the complexity using the DFT

It is possible to reduce the complexity of the autocorrelation from $O(n^2)$ to $O(n \log n)$ using the Wiener-Khinchin theorem. This theorem is mentioned in the article, but this interesting property is not.

I was thinking about adding this sentence right after the description of the Wiener-Khinchin theorem:

which allows the discrete autocorrelation of zero-centered signals to be computed using the discrete Fourier transform, reducing the complexity from $O(n^2)$ to $O(n \log n)$:

$R = \mathcal{F}^{-1}(|\mathcal{F}(x)|^2)\,$

where $\mathcal{F}^{-1}$ is the inverse discrete Fourier transform, $\mathcal{F}$ is the discrete Fourier transform and $|\cdot|$ is the complex modulus.

But I don't much like the style. Does anyone have a suggestion? --Nova77 01:49, 17 December 2005 (UTC)
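As a minimal sketch of the speed-up discussed here (Python with NumPy assumed; the helper names `autocorr_direct` and `autocorr_fft` are hypothetical), the raw autocorrelation $R(k) = \sum_n x_n x_{n+k}$ can be computed either by the $O(n^2)$ double loop or through the DFT. Two details matter: the Wiener-Khinchin relation pairs the autocorrelation with the power spectrum, i.e. the squared modulus $|\mathcal{F}(x)|^2$, and the signal must be zero-padded to at least $2n-1$ samples, otherwise the DFT yields a circular rather than an ordinary (linear) correlation.

```python
import numpy as np

def autocorr_direct(x):
    """O(n^2) raw autocorrelation R(k) = sum_n x[n] x[n+k], for k = 0..n-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(n)])

def autocorr_fft(x):
    """O(n log n) version: inverse DFT of the power spectrum |DFT(x)|^2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Zero-pad to a power of two >= 2n - 1 so the circular correlation
    # computed by the DFT equals the linear one for lags 0..n-1.
    nfft = 1 << (2 * n - 1).bit_length()
    spectrum = np.fft.rfft(x, nfft)
    r = np.fft.irfft(spectrum * np.conj(spectrum), nfft)
    return r[:n]  # non-negative lags only; R is symmetric for real x

x = np.random.default_rng(0).standard_normal(1000)
x -= x.mean()  # zero-centre the signal, as in the proposal
print(np.allclose(autocorr_direct(x), autocorr_fft(x)))  # True
```

For example, `autocorr_fft([1.0, 2.0, 3.0])` gives approximately `[14, 8, 3]`, the same as the direct sums $1+4+9$, $1\cdot2+2\cdot3$ and $1\cdot3$.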

Error?

Shouldn't the text below the first image of "The Blue Danube" say Fourier transform or something similar? It is certainly not the "original signal".

What makes you certain it's not the original signal? I'm not an audio engineer, but my first guess would be that it is the original signal -- it's the right length. I just ask because I have no particular reason to believe it's not the original signal. It's certainly not the Fourier transform -- it's in the time domain, not the frequency domain, for a start. --Richard Clegg 10:27, 31 May 2006 (UTC)

That autocorrelation function also makes no sense, which is why I removed the whole thing. If there's a relationship between those two figures, or between either figure and the Blue Danube music, it's certainly not apparent. Neither looks like what it says it is. Dicklyon 16:11, 1 June 2006 (UTC)

Re-editing this article to make it more coherent

I think this article has got a bit "wooly" due to it being edited by people from different backgrounds. I know that a lot of different definitions of autocorrelation are used and different fields have different ideas about it but I think we're missing the commonality here and hence the article is very confusing. Does anyone have any suggestions as to how to make this more coherent? --Richard Clegg 09:48, 1 June 2006 (UTC)

I agree. I worked around to it when trying to make LTI system theory correct; lots of things got touched. I'm still trying to sort out what the various definitions of autocorrelation and autocovariance are in different fields. The article was originally written with the expectation-based approach, which seems to be most common, but the formulas for computing an ACF from a given signal, or estimating the ACF of a process from a sample, are more of the integral or sum form; the statistician would say those are estimators, not definitions. Sorry for the wool. I hope someone can help with a rewrite that takes correctness into account better than the old one did. Dicklyon 16:07, 1 June 2006 (UTC)

Thanks. I think you have helped the article. Perhaps I should try to write something which goes between the expectation based formula and the commonly used estimators. The problem is I would do that from a stats perspective which is not necessarily helpful. I imagine a lot of people who want to use the ACF are in engineering or science. --Richard Clegg 09:58, 2 June 2006 (UTC)

Proposed conventions

I propose the following conventions to help distinguish the various definitions. I have to say that these conventions are a summary of what I've seen in the engineering (optical) and signal processing fields, but I think they would also be good for the "mathematician's world". I'm posting them here to stimulate constructive discussion and, eventually, to reach a common agreement.

For two stochastic variables X and Y

Correlation:

$R(X,Y) = E (X Y)\,$

Covariance:

$\mathrm{cov}(X,Y) = E((X-\mu_{X})(Y-\mu_{Y})) = E(X Y) - \mu_{X} \mu_{Y}\,$

Correlation coefficient

$\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_{X} \sigma_{Y}} = \frac{E(XY)-E(X)E(Y)}{\sqrt{E(X^2)-E^2(X)}~\sqrt{E(Y^2)-E^2(Y)}}$

See Correlation.

For a stochastic continuous process X(t)

Auto-correlation:

$R_X(t_1,t_2) = R(X(t_1), X(t_2)) = E( X(t_1) X(t_2) )\,$

Auto-covariance:

$C_X(t_1,t_2) = \mathrm{cov}(X(t_1), X(t_2)) = E( (X(t_1)-\mu_{X}(t_1)) \cdot (X(t_2)-\mu_{X}(t_2)) ) \,$

Degree of correlation (also known in optics as "degree of coherence"):

$\rho_{X}(t_1,t_2) = \frac{\mathrm{cov}(X(t_1),X(t_2))}{\sigma_{X}(t_1) \sigma_{X}(t_2)} = \frac{ E(X(t_1)X(t_2)) - E(X(t_1)) \cdot E(X(t_2)) } { \sqrt{E(X^2(t_1)) - E^2(X(t_1))} ~ \sqrt{ E(X^2(t_2)) - E^2(X(t_2)) } }$

If the process is second-order stationary

Auto-correlation:

$R_X(\tau) = R_{X}(t, t - \tau) = E( X(t) X(t -\tau) )\,$

Auto-covariance:

$C_X(\tau) = \mathrm{cov}(X(t), X(t-\tau)) = R_{X}(\tau) - \mu_{X}^2\,$

Degree of correlation (also known in optics as "degree of coherence"):

$\rho_{X}(\tau) = \frac{\mathrm{cov}(X(t),X(t-\tau))}{\sigma_{X}^2} = \frac{ R_{X}(\tau) - \mu_{X}^2 } { \sigma_{X}^2 }$

All the above are directly extensible to the discrete case.
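To make the stationary-case conventions concrete in the discrete case, here is a minimal sketch (Python with NumPy assumed) that estimates $R_X(\tau)$, $C_X(\tau)$ and $\rho_X(\tau)$ from one long sequence by time averaging. Replacing the expectation by a time average is only justified under an ergodicity assumption. The helper `stationary_estimates` and the AR(1) test signal with mean 5 are hypothetical illustrations, not part of the proposal. For a finite sample, $C_X(\tau) = R_X(\tau) - \mu_X^2$ holds only approximately, because the two overlapping segments have slightly different means.

```python
import numpy as np

def stationary_estimates(x, max_lag):
    """Time-average estimates of R_X(tau), C_X(tau) and rho_X(tau), tau = 0..max_lag,
    for a real sequence assumed second-order stationary and ergodic."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu = x.mean()
    xc = x - mu
    lags = range(max_lag + 1)
    # Average of X(t) X(t - tau) over the n - tau available pairs.
    R = np.array([np.mean(x[tau:] * x[: n - tau]) for tau in lags])
    # Same average after removing the estimated mean mu_X.
    C = np.array([np.mean(xc[tau:] * xc[: n - tau]) for tau in lags])
    rho = C / C[0]  # C_X(0) estimates the variance sigma_X^2
    return mu, R, C, rho

# Hypothetical test signal: AR(1) with coefficient 0.8, shifted to mean 5.
rng = np.random.default_rng(1)
a, n = 0.8, 50_000
e = rng.standard_normal(n)
y = np.empty(n)
y[0] = e[0] / np.sqrt(1 - a**2)  # start in the stationary distribution
for t in range(1, n):
    y[t] = a * y[t - 1] + e[t]
x = 5.0 + y

mu, R, C, rho = stationary_estimates(x, 5)
print(np.round(C, 3))
print(np.round(R - mu**2, 3))  # approximately equal to C for large n
print(np.round(rho, 3))        # roughly 0.8**tau for this test signal
```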

I'd also add one consideration. Expressing the formulas in terms of the expectation E() has the advantage of showing the logical correspondence with deterministic signals, for which all the above quantities can also be defined, except that the averaging is done directly in the time domain without using the probability density function.

~ TheNoise

A response

That looks like a fine, coherent set of terminology. But let's look at exactly where the article stands and what you are proposing to change.

First, in the statistics section, the definitions are discrete-time, and they differ in which name goes with removal of the mean and which without. I'm no expert in that field, but if that is how they use the terms and symbols, then we shouldn't prescribe something different for the sake of consistency with engineering; rather, we should describe things in the terms that the field uses, while pointing out the differences to avoid confusion. Otherwise, a stat guy will come along and change it all back to his way.

Second, in engineering I've usually seen, and always preferred, the double-subscript notation, which makes the "auto-" and "cross-" symbols mutually consistent. If you drop back to a single subscript, it is no longer recognizable as a special case of the same definition. But we are only weakly linked (in the lead and one other place) to cross-correlation, which should be played up and made consistent if possible.

I checked some books and found the "Cov" is usually capitalized (in my small sample), and it and "E" are set in roman type (in \mathrm{}).

I would be in favor of keeping equations as simple as possible, but not leaving out anything important. For example, define the mean and standard deviation in the context of the Expectation notation, and don't use the expanded form of the definitions here (and put more words around them to say what the equations say):

$\mu_{X} = \mathrm{E}(X)\,$

$\mathrm{Cov}(X,Y) = \mathrm{E}((X-\mu_{X})(Y-\mu_{Y})) \,$

$\sigma_X = \sqrt{\mathrm{Cov}(X, X)}$

$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_{X} \sigma_{Y}}$

The problem we have here is that there is no consensus in the literature. To me an article about "autocorrelation" *must* be about what autocorrelation means in the real world. Unfortunately, it is used inconsistently and this article has to reflect that. If we were writing a paper, thesis or article, we would be free to simply write our definitions and use them consistently. Here, I think we have by necessity to do something different and to explain what is likely to be meant by autocorrelation when someone sees it in the literature. This means we must acknowledge that the word has a number of meanings. For what it is worth I have seen cov and Cov but never COV. I have seen E in roman type, in italics and in \mathbb (the latter being my preference when I write papers).

While I have sympathy with the idea of trying to make a consistent set of definitions, I think we must by necessity do something different. --Richard Clegg 18:21, 9 September 2006 (UTC)

I think I sort of half agree. Certainly we must reflect what the real world terminology is. But we can do that best by adopting one or a few internally consistent sets of notation that we think are most common, and then mentioning the differences. The biggest differences are the definitional differences between the stat and eng fields, and those sections need to each be made compatible with their fields; internally, though, they should be consistent. As to cov, Cov, E, etc., we just need to pick a style. The one I mentioned was based on a quick check of a few books, but I'm flexible if someone shows that some other conventions are more common (not just personally preferred). Dicklyon 19:24, 9 September 2006 (UTC)

I agree with Dicklyon here. Even if there are multiple real-world conventions, we should keep one and then mention the differences that can be encountered. There is one problem of definitions: the auto-covariance is sometimes defined as the degree of correlation (using the terminology of the proposed convention). But if we define both $R_X$ and $\rho_X$, we can distinguish the two entities and also say that $R_X$ is sometimes defined as $\rho_X$. IMHO that is clearer than defining $R_X$ in two different ways and then saying that one of the two is also known as the degree of correlation. Moreover, it seems to me that the big ambiguity is more verbal than symbolic. In fact, the correlation coefficient (usually indicated with $\rho$) is also called the correlation (thus causing ambiguity), while the "raw", un-normalized correlation is denoted $R$. We can at least agree on this convention of symbols and then explain in words that these entities go by different names. Regarding typographical conventions, we should simply try to be coherent across related articles. For example, I prefer to use braces for the $E\{\cdot\}\,$ operator. This would make the formulas a bit more difficult to edit (we would have to write \{ and \}), but it would increase readability, IMHO. For the $E\,$ itself, I don't mind whether we use the roman $\mathrm{E}\,$ or the bold $\mathbf{E}\,$, but this formatting would also require more editing. For the covariance (cov) I simply used the convention of the correlation page (although I forgot to set it in roman font). ~ TheNoise 14:12, 10 September 2006 (UTC)

I am a graduate student in engineering and have been poring over a dozen texts that all seem to define autocorrelation a bit differently and, frankly, sloppily. So I am both somewhat pleased that there is so much discussion about this very topic here, and also dismayed that it has not been made clearer. In my opinion, the clearest treatment is in Bendat and Piersol's Random Data text. I think their definitions and notation are very similar to what was suggested above. A clear distinction is made between the autocorrelation and the correlation coefficient (the latter normalized by the variance), which is not at all clear in the current Wikipedia entry. I will note, however, that B&P define autocorrelation as an expected value, and therefore normalized by the length of the signal, while most signal processing texts (and indeed MATLAB) include no such normalization. This has been the cause of much confusion.--Vschmidt (talk) 14:59, 29 April 2008 (UTC)
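The normalization confusion can be illustrated with a short sketch (Python with NumPy assumed; the helper `acf_variants` is hypothetical). It computes the same lagged products under four scalings: a plain sum (the unscaled kind of output MATLAB's `xcorr` returns with its default `'none'` option), division by $N$, division by the number of overlapping terms $N-k$ (these two roughly correspond to `xcorr`'s `'biased'` and `'unbiased'` options), and scaling so that lag 0 equals 1. The last one is not the Bendat and Piersol correlation coefficient unless the mean has been removed first, which this sketch deliberately does not do.

```python
import numpy as np

def acf_variants(x, max_lag):
    """Lagged products sum_n x[n] x[n+k] (k = 0..max_lag) under different scalings."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    raw = np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)])
    lags = np.arange(max_lag + 1)
    return {
        "raw": raw,                    # plain sum, no normalization
        "biased": raw / n,             # every lag divided by N
        "unbiased": raw / (n - lags),  # divided by the number of overlapping terms
        "coefficient": raw / raw[0],   # lag 0 scaled to 1 (mean NOT removed here)
    }

for name, values in acf_variants([1.0, 2.0, 3.0, 4.0], 2).items():
    print(name, np.round(values, 3))
```

For `[1, 2, 3, 4]` and lags 0 to 2, the raw sums are 30, 20 and 11, so the biased estimate is 7.5, 5 and 2.75, while the unbiased one is 7.5, about 6.667 and 5.5. The shape of the function is the same in every case; only the scaling, and hence the interpretation, differs.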

Autocorrelation of a periodic function

There's a statement on the page that the autocorrelation of a periodic function with itself is again periodic with the same period. That can't be correct. The integral doesn't converge.

This is a problem with the variety of different definitions on this page. Using the definition in the statistics section it is an invalid question since the periodic function is not second-order stationary so you must use the two parameter ACF. The signal processing definition assumes the signal has an integral which converges so that most (all) periodic signals would not be so integrable. In spirit though, the assertion is correct. --Richard Clegg 17:36, 5 October 2006 (UTC)

I think you got that a bit wrong. The statistics definition works fine, as the expected value of a finite product. The integral definition is problematic, but it too can be OK if you allow interpretation of the integral in terms of delta functions. Bottom line: if the autocorrelation exists, it is periodic. You might prefer conditions that say it doesn't exist when the definitions don't lead to finite values, though. Dicklyon 17:47, 5 October 2006 (UTC)

I see what you mean about stationarity, though. In some cases the expectation will be OK, like if the phase is random and the process is ergodic. The integral method, however, averages over phases, so it doesn't care so strictly about stationarity; but it leads to delta functions. Isn't math wonderfully nasty? Dicklyon 17:52, 5 October 2006 (UTC)

The statistics definition with one parameter is only valid for a second order stationary process. If the process is not second order stationary then E[X(t)X(t+T)] will be a function of t as well as T. I'm not sure what you mean by "if the phase is random" in the case of the autocorrelation of a periodic function. --Richard Clegg 22:00, 5 October 2006 (UTC)

What I mean is that if the phase is a random variable, and each sample function from the process has a different phase, with a uniform distribution over all phases, then the periodic process is second-order stationary. That is, the expected value of the product of two points with a certain time difference includes an averaging over all phases, so doesn't depend on the two points, just their difference. Right? Dicklyon 22:33, 5 October 2006 (UTC)

Hmm... I can see your point. It would be "in the spirit" of ACF but would lead to some strange quirks in the mathematics. I'm happier saying "undefined" but it seems like there are enough definitions of ACF out there that one will fit. --Richard Clegg 23:18, 5 October 2006 (UTC)

Yes, and another one that would work is the limit as T goes to infinity of 1/T times the integral of the (expected) product of a segment of length T times a shift of itself. For an ergodic stationary process it will give the same result, and for the periodic signal it has no difficulty. Since the process is ergodic you can do it on one sample function instead of relying on an expected value. I haven't seen that as a definition per se, but maybe it is, somewhere. Dicklyon 00:56, 6 October 2006 (UTC)

Here's a good page that explains what I was just talking about, the ergodic hypothesis applied to ACF: [1] Dicklyon 01:00, 6 October 2006 (UTC)

And here's one that defines the ACF of an ergodic process in the way I described above: [2] Dicklyon 01:02, 6 October 2006 (UTC)

But you were talking about a non-stationary and hence non-ergodic process.--Richard Clegg 01:21, 6 October 2006 (UTC)

A periodic process can be stationary and ergodic, or not, depending on whether its phase is uniformly distributed over its period. But even if that's not the case, the limit of the integral will converge to something that corresponds to what is usually meant by the ACF of such a function. That is, sin(t) is not stationary, but sin(t+phi) for an appropriately distributed random variable phi, which is a constant in any given sample function from the process, is stationary and ergodic, if I understand this stuff right; I took a course on ergodic processes from Bob Gray about 30 years ago, but it went in one ear and out both. Dicklyon 02:12, 6 October 2006 (UTC)

Ah... got you. So it is periodic but you are unsure at what point in its phase it begins. Clever, that case had not occurred to me. Such a process could be both stationary and ergodic, you are absolutely correct. I hadn't appreciated what you meant about phase even though you raised it early on. Hmm... I wonder if we could include the sin(t+phi) as an illustrative example somehow? --Richard Clegg 08:13, 6 October 2006 (UTC)

I think it opens a can of worms that I'm not so sure about. Let's not do it unless we can find a text with such a treatment, so that whatever we say is verifiable. Instead, I put in the limit definition that works, even if the phase isn't random. Dicklyon 15:02, 6 October 2006 (UTC)
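A numerical sketch of the two points made in this thread, assuming $f(t) = \sin t$ (Python with NumPy; the helpers `time_average_acf` and `random_phase_acf` are hypothetical). First, the limit definition $\lim_{T\to\infty} \frac{1}{T}\int_0^T f(t) f(t+\tau)\,dt$ gives $\tfrac12\cos\tau$, which is periodic with the same period $2\pi$. Second, the random-phase process $X(t) = \sin(t+\phi)$, with $\phi$ uniform on $[0, 2\pi)$, has ensemble autocorrelation $E[X(t)X(t+\tau)] = \tfrac12\cos\tau$ regardless of $t$, i.e. it is second-order stationary. To keep the example deterministic, the uniform phase is replaced by equally spaced phases and the integral by a Riemann sum over a whole number of periods.

```python
import numpy as np

def time_average_acf(f, tau, T, n):
    """Approximate (1/T) * integral_0^T f(t) f(t + tau) dt with an n-point Riemann sum."""
    t = np.arange(n) * (T / n)
    return np.mean(f(t) * f(t + tau))

def random_phase_acf(tau, t0, m=64):
    """E[X(t0) X(t0 + tau)] for X(t) = sin(t + phi), where the uniform phase phi
    is replaced by m equally spaced phases on [0, 2*pi)."""
    phi = 2 * np.pi * np.arange(m) / m
    return np.mean(np.sin(t0 + phi) * np.sin(t0 + tau + phi))

taus = np.linspace(0.0, 4 * np.pi, 9)
T = 100 * 2 * np.pi  # a whole number of periods of sin
time_avg = np.array([time_average_acf(np.sin, tau, T, 100_000) for tau in taus])
ensemble = np.array([random_phase_acf(tau, t0=1.234) for tau in taus])

print(np.allclose(time_avg, 0.5 * np.cos(taus)))  # True
print(np.allclose(ensemble, 0.5 * np.cos(taus)))  # True
```

The choice of `t0` in `random_phase_acf` does not change the result, which is the stationarity argument made above; for the fixed-phase signal, only the time average gives a sensible answer.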

Time series

According to this page,

A time series is a sequence of observations which are ordered in time (or space). ... There are two kinds of time series data:

1. Continuous, where we have an observation at every instant of time, e.g. lie detectors, electrocardiograms. We denote this using observation X at time t, X(t).

2. Discrete, where we have an observation at (usually regularly) spaced intervals. We denote this as $X_t$.

If I interpret this correctly, the statistical definition in our article does not apply to time series, since a sequence of observations is a set of definite data, not a random process, even if it is assumed to have been produced by a random process. So the expectation operator does not apply. Furthermore, we had "discrete time series and processes" which seems like an inappropriate and non-parallel way to divide things up. So I changed these things. Please react, preferably with citations that clear up exactly how this issue is usually treated in statistics. Dicklyon 15:51, 18 October 2006 (UTC)
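To illustrate the distinction between a definition and an estimator, here is a minimal sketch (Python with NumPy assumed; `sample_acf` and the simulated data are hypothetical) of one common sample autocorrelation, computed purely from the observed values with no expectation operator: $r_k = \sum_t (x_t-\bar x)(x_{t+k}-\bar x) \big/ \sum_t (x_t-\bar x)^2$. If the observations are one long realization of an AR(1) process with coefficient 0.7, whose expectation-based autocorrelation is $0.7^k$, the estimate should come out close to that value.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag, computed from data only."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()      # remove the sample mean x-bar
    denom = np.dot(d, d)  # sum of squared deviations
    return np.array([np.dot(d[: n - k], d[k:]) / denom for k in range(max_lag + 1)])

# Hypothetical data: one realization of an AR(1) process x_t = 0.7 x_{t-1} + e_t.
rng = np.random.default_rng(42)
phi, n = 0.7, 100_000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - phi**2)  # start in the stationary distribution
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

print(np.round(sample_acf(x, 3), 3))  # close to [1, 0.7, 0.49, 0.343]
```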

Partial Autocorrelation

I believe this should be a section within this page. Canking 18:33, 25 November 2006 (UTC)

I just looked up what that is, and I think it rates a separate article. Dicklyon 05:02, 26 November 2006 (UTC)

Unfortunately I don't have the expertise to write the article, sorry. Perhaps somebody else will write it Canking 10:55, 8 December 2006 (UTC)

Not user-friendly

The definitions are not user-friendly (in particular, not very accessible to the layman); the article seems geared more toward intermediate to advanced mathematics students. AppleJuggler 03:17, 23 January 2007 (UTC)

What would a non-mathematical layman do with a definition of autocorrelation? If we produced a simpler definition that he could understand, but which was not actually correct, would that be an improvement? Dicklyon 06:24, 23 January 2007 (UTC)

Good point. AppleJuggler 07:06, 6 February 2007 (UTC)

Eaton and Eaton's "LabTutor" uses an easy-to-understand example of wind velocity measurements on a gusty day. If we measure two samples one millisecond apart, we expect them to be the same, and indeed the autocovariance value (which this article equates with autocorrelation even though, technically speaking, it isn't) is close to 1. However, with a time delay of five minutes between the two measurements there is no similarity, so the autocovariance is close to 0. The autocovariance function then plots the autocovariance as a function of the time interval between the two measurements. The book goes on to show a sample autocovariance plot, explaining that from it you can deduce the time scale of the wind. I think such an example would facilitate the layman's understanding of autocorrelation/autocovariance. I am not sure to what extent I am allowed to cite or even quote my source, so I am reluctant to insert this into the main article. Jonemo (talk) 12:15, 15 April 2008 (UTC)

Please, please make this article user-friendly. I'm a 3rd-year student and I'm having difficulty understanding this, partly because there is excessive use of unnecessary 'tough-sounding' terminology that hides the real meaning, and the structure needs improvement. Thanks a lot and great job, guys! 13 Feb 2008

Why don't you find a book or other source whose explanation you find more user-friendly, and mention it here as an example of how you think it could be done better. Dicklyon (talk) 06:42, 13 February 2008 (UTC)

infinite variance?

I'm a little confused by the edit summary just left. The very definition of Gaussian white noise is that the marginal distribution of the variates at a given time is Gaussian. If they have a Gaussian distribution, they have finite variance. How do you go from a finite spectrum to an infinite marginal variance? Lunch 19:44, 19 May 2007 (UTC)

See these books. Your definition may be correct and consistent with infinite variance, if by "variates" you mean numbers that you can get by integrals of the process times some kernel. But a sample is not an integral. For a continuous-time white-noise process, the variance of "samples", if they could be defined, would be infinite, like the variance of the process, which is the variance per unit bandwidth times an infinite bandwidth. If you define sampling as the limit, as the kernel width goes to zero, of the integral of a kernel (a distribution) times the process value, then that limit does not exist for a white-noise process. So you can't sample it. Dicklyon 19:59, 19 May 2007 (UTC)
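A worked equation may help with the infinite-variance point. Assume, for illustration, zero-mean white noise with a flat two-sided power spectral density $N_0/2$ that is first band-limited to $|f| < B$. By the Wiener-Khinchin theorem its autocorrelation is the inverse Fourier transform of the spectrum, and its variance $R(0)$ is the area under the spectrum:

$\sigma^2 = R(0) = \int_{-B}^{B} \frac{N_0}{2}\, df = N_0 B \to \infty \quad \text{as } B \to \infty$

$R(\tau) = \int_{-B}^{B} \frac{N_0}{2}\, e^{i 2\pi f \tau}\, df = N_0 B\, \mathrm{sinc}(2B\tau) \to \frac{N_0}{2}\,\delta(\tau) \quad \text{as } B \to \infty$

with the normalized $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. The variance per unit bandwidth stays finite, but letting the bandwidth grow without bound makes the pointwise variance diverge while the autocorrelation tends to a delta function.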

Confidence bounds

It would be nice if someone explained the meaning of the straight blue lines on the ACF plot (confidence bounds). —Preceding unsigned comment added by 85.178.233.98 (talk) 13:27, 10 December 2007 (UTC)
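A hedged answer, as a sketch: in many ACF plots (R's `acf()` plot is one example, where they are drawn as dashed blue lines by default) the horizontal lines are approximate 95% bounds $\pm 1.96/\sqrt{N}$ for the sample autocorrelation under the null hypothesis that the series is white noise. Bars that cross them are usually read as significantly non-zero at that lag. Some packages draw wider, lag-dependent bounds instead, so the documentation of the tool that produced the plot is the final word. The Python/NumPy code below is illustrative only; `sample_acf`, `white_noise_bound` and the simulated series are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k (mean removed, lag 0 scaled to 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: n - k], d[k:]) / denom for k in range(max_lag + 1)])

def white_noise_bound(n, z=1.96):
    """Approximate 95% bound +/- z / sqrt(n) for the sample ACF of white noise."""
    return z / np.sqrt(n)

rng = np.random.default_rng(7)
x = rng.standard_normal(400)       # hypothetical white-noise series, N = 400
r = sample_acf(x, 20)
bound = white_noise_bound(len(x))  # 1.96 / sqrt(400) = 0.098
outside = np.flatnonzero(np.abs(r[1:]) > bound) + 1  # lags 1..20 crossing the lines
print(f"bound = {bound:.3f}; lags outside: {outside.tolist()}")
```

Even for genuine white noise, roughly 1 lag in 20 is expected to cross the 95% lines by chance, so an isolated crossing is weak evidence of autocorrelation.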