Talk:Data dredging

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Philosophy, which collaborates on articles related to philosophy. To participate, you can edit this article or visit the project page for more details.
This article has not yet received a rating on the quality scale.
This article has been rated as mid-importance on the importance scale.

Support merge - Data dredging is probably the best title (comment by John Quiggin, forgot to sign).

Do not merge - Bias through incorrect data snooping is essentially different from the problem created by testing a hypothesis with the same data set that suggested it. For example, data-snooping bias may occur when dealing with a highly fluctuating data set in which every removal of a data point produces a new extreme, and so on. (Pc100935 11:59, 18 December 2006 (UTC))

Splitting a data set into parts A and B and then using part B to test a hypothesis formulated from part A is not recommended, since the two parts can be highly correlated. Best practice is to formulate a hypothesis before looking at the data and then use the data to test it. If a hypothesis is based on existing data, it should be tested only by collecting new, independent data. (Pc100935 11:59, 18 December 2006 (UTC))
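A minimal simulation sketch (not from the discussion above) of why a hypothesis picked out of a data set should be tested on new, independent data. It assumes pure Gaussian noise, an outcome y with k = 50 candidate predictors and n = 100 observations, and approximates p-values with Fisher's z-transform. All function names and parameters are hypothetical choices made for illustration. Because every variable is noise, any "significant" correlation is a false positive.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z-transform)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def noise(rng, n):
    """n independent standard-normal draws (stand-in for collected data)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def one_trial(rng, n=100, k=50):
    """Return (p on the same data, p on fresh data) for the 'best' predictor."""
    y = noise(rng, n)
    predictors = [noise(rng, n) for _ in range(k)]
    # Dredging step: pick whichever predictor happens to correlate most with y.
    best = max(predictors, key=lambda x: abs(pearson_r(x, y)))
    p_same = p_value(pearson_r(best, y), n)
    # Proper test: collect new, independent data for that same predictor and y.
    # Under this noise-only assumption, new data are simply new draws.
    p_fresh = p_value(pearson_r(noise(rng, n), noise(rng, n)), n)
    return p_same, p_fresh


def false_positive_rates(trials=200, alpha=0.05, seed=0):
    """Fraction of trials declared 'significant' under each testing strategy."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(trials)]
    same = sum(p < alpha for p, _ in results) / trials
    fresh = sum(p < alpha for _, p in results) / trials
    return same, fresh


if __name__ == "__main__":
    same, fresh = false_positive_rates()
    print(f"'significant' when tested on the same data: {same:.0%}")
    print(f"'significant' when tested on new data:      {fresh:.0%}")
```

Under these assumptions, the first rate should land far above the nominal 5% (roughly 1 − 0.95^50, about 92%, since 50 independent null tests are effectively being run). The second rate should stay near 5%. The sketch does not model the correlated-halves issue raised above. It only contrasts reuse of the same data with genuinely new data.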