Spurious correlation

From Wikipedia, the free encyclopedia

Statistics

In statistics, a spurious correlation is a correlation that appears in the data because of a statistical fluke rather than a true relationship between the variables. (A true correlation should not be confused with causation: correlation does not prove causation, and a genuine correlation between two factors may reflect a causal link between them or the influence of one or more unknown factors affecting both.) One measure of goodness of fit is the R² statistic. Spurious correlation is most likely when the sample size is small; in that case the R² value can be misleading, that is, there is a high likelihood that the fit occurred purely by chance.
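The effect of sample size can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it draws two independent standard normal samples (so any correlation is pure chance), computes R² as the squared Pearson correlation (which equals R² for a straight-line fit with an intercept), and counts how often R² reaches a threshold. The function names, the 0.5 threshold, the trial count, and the seed are all arbitrary choices made for this example.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def chance_of_high_r2(sample_size, threshold=0.5, trials=2000, seed=0):
    """Fraction of trials in which two INDEPENDENT samples reach R^2 >= threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        if r_squared(xs, ys) >= threshold:
            hits += 1
    return hits / trials


# Compare a very small, a small, and a moderate sample size.
for n in (5, 10, 50):
    print(f"n={n:2d}  share of chance fits with R^2 >= 0.5: {chance_of_high_r2(n):.3f}")
```

With only a handful of observations, a noticeable share of unrelated samples should clear the threshold, while with a moderate sample size this should almost never happen.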



Psychology

In psychology, a spurious correlation in its simplest form is a misleading correlation between two variables that is produced by the operation of a third, causal variable.

In other words, suppose we find a correlation between A and B. There are then three possible causal relationships:

A causes B,

B causes A,

or

C causes both A and B.


The last of these is a spurious correlation.
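One way to see why a common cause C produces a correlation between A and B is a simple linear model. The following is an illustrative sketch rather than part of the definition: the coefficients α and β and the noise terms ε_A and ε_B are names introduced here, and the noise terms are assumed to be uncorrelated with C and with each other.

```latex
\begin{align*}
A &= \alpha C + \varepsilon_A, \qquad B = \beta C + \varepsilon_B, \\
\operatorname{Cov}(A, B) &= \alpha\beta \operatorname{Var}(C), \\
\rho_{AB} &= \frac{\alpha\beta\,\sigma_C^2}
  {\sqrt{\left(\alpha^2\sigma_C^2 + \sigma_{\varepsilon_A}^2\right)
         \left(\beta^2\sigma_C^2 + \sigma_{\varepsilon_B}^2\right)}}.
\end{align*}
```

Under these assumptions the correlation between A and B is nonzero whenever α and β are both nonzero, even though neither A nor B appears in the other's equation.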


This leads to a statement commonly made by professors: "Correlation isn't causation."


More generally, a spurious correlation is an apparent, although false, association between two (or more) variables that is caused by some other variable.

Determining whether a correlation is spurious takes effort. It involves a process known as control: holding constant all relevant variables except one in order to see its effect clearly.
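In observational data, where variables cannot literally be held constant, one common statistical stand-in for control is to remove the part of each variable that is explained by the suspected third variable and then correlate what is left (a partial correlation). The sketch below assumes a hypothetical linear setup in which C drives both A and B; the coefficients, sample size, seed, and function names are invented for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, xs):
    """What remains of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=1):
    """Hypothetical data: C causes both A and B; A and B never affect each other."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]
    a = [2.0 * ci + rng.gauss(0, 1) for ci in c]
    b = [1.5 * ci + rng.gauss(0, 1) for ci in c]
    return a, b, c


a, b, c = simulate()
raw = pearson(a, b)                                      # ignores C
controlled = pearson(residuals(a, c), residuals(b, c))   # C's linear effect removed
print(f"correlation of A and B:              {raw:.3f}")
print(f"correlation after controlling for C: {controlled:.3f}")
```

In this setup the raw correlation should be strong, while the controlled correlation should be close to zero, which is the signature of a correlation produced entirely by the third variable. Residualizing on a straight line only removes linear effects of C, so it is a simplification of control in the experimental sense.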