Talk:Regression toward the mean

From Wikipedia, the free encyclopedia

I'm not sure this page explains "regression to the mean" very well.

I agree; it's lousy. Michael Hardy 23:26, 2 Feb 2004 (UTC)
The first time I read it, I thought it was lousy. The second time I read it, it was closer to mediocre.

F. Galton's use of the terms "reversion" and "regression" described a certain, specific biological phenomenon, and it is connected with the stability of an autoregressive process: if there is not regression to the mean, the variance of the process increases over time. There is no reason to think that the same or a similar phenomenon occurs in, say, scores of students, and appealing to a general "principle of regression to the mean" is unwarranted.
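A minimal sketch of the autoregressive point, assuming the simplest case: an AR(1) process X_t = phi * X_{t-1} + e_t with independent noise of variance sigma2. The names phi and sigma2 and the AR(1) form are assumptions for illustration, not Galton's own model. With |phi| < 1 each step pulls the value back toward the mean and the variance settles at sigma2 / (1 - phi^2). With phi = 1 there is no pull back, and the variance grows without bound.

```python
def ar1_variances(phi: float, sigma2: float, steps: int, var0: float = 0.0) -> list[float]:
    """Exact variance of an AR(1) process X_t = phi * X_{t-1} + e_t.

    Uses Var(X_t) = phi**2 * Var(X_{t-1}) + sigma2, which holds because
    the noise e_t is independent of X_{t-1}.
    """
    variances = [var0]
    for _ in range(steps):
        variances.append(phi ** 2 * variances[-1] + sigma2)
    return variances


if __name__ == "__main__":
    # phi < 1: regression toward the mean keeps the variance bounded.
    stable = ar1_variances(phi=0.5, sigma2=1.0, steps=50)
    print("phi=0.5, Var after 50 steps:", round(stable[-1], 6),
          "limit:", 1.0 / (1 - 0.5 ** 2))
    # phi = 1: no regression; the process is a random walk and Var grows linearly.
    walk = ar1_variances(phi=1.0, sigma2=1.0, steps=50)
    print("phi=1.0, Var after 50 steps:", walk[-1])
```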

I completely disagree with this one; there is indeed such a general principle. Michael Hardy 23:26, 2 Feb 2004 (UTC)

I guess I could be convinced of the existence of such a principle, but something more than anecdotes is needed to establish that.

Absolutely. A rationale needs to be given. Michael Hardy 23:26, 2 Feb 2004 (UTC)

Regression to the mean is just like normality of natural populations: maybe it's there, maybe it isn't; the only way to tell is to study a lot of examples.

No; it's not just empirical; there is a perfectly good rationale.

I'll revise this page in a week or two if I don't hear otherwise; the page should summarize Galton's findings, connect the biological phenomenon with autoregressive stability, and mention other (substantiated) examples. Wile E. Heresiarch 15:00, 2 Feb 2004 (UTC)

I don't think regression toward the mean should be taken to mean only what Galton wrote about; it's far more general. I'm really surprised that someone who's edited a lot of statistics articles here does not know that there is a reason why regression toward the mean is widespread, and what the reason is. I'll return to this article within a few days. Michael Hardy 23:26, 2 Feb 2004 (UTC)


In response to Michael Hardy's comments above --

  1. Perhaps I overstated the case. Yes, there is a class of distributions which show regression to the mean. (I'm not sure how big it is, but it includes the normal distribution, which counts for a lot!) However, if I'm not mistaken there are examples that don't, and these are by no means exotic.
  2. There is a terminology problem here -- it's not right to speak of a "principle of r.t.t.m." as the article does, since r.t.t.m. is a demonstrated property (i.e., a theorem) of certain distributions. "Principle" suggests that it is extra-mathematical, as in "likelihood principle". Maybe we can just drop "principle".
  3. I had just come over from the Galton page, & so that's why I had Galton impressed on my mind; this article should mention him but need not focus on his concept of regression, as pointed out above.

regards & happy editing, Wile E. Heresiarch 22:57, 3 Feb 2004 (UTC)

It's nothing to do with Normality - it applies to all distributions.

Johnbibby 22:11, 12 December 2006 (UTC)

--

The opening sentence "of related measurements, the second is expected to be closer to the mean than the first" is obviously wrong. Jdannan 08:17, 15 December 2005 (UTC)


Small change to the historical background note.


Principle of Regression

I agree that the "principle" cannot hold for all distributions, but only a certain class of them, which includes the normal distributions. I think R. A. Fisher found an extension to the case where the conditional distribution is Gaussian but the joint distribution need not be. In any case, in the section on "Mathematical Derivation", it should be made clear that the specific *linear* regression form E[Y|X]=rX is valid only when Y and X are jointly Gaussian. Of course there are some other examples such as when Y and X are jointly stable but that is another can of worms. The overall question might be rephrased: given two random variables X and Y of 0 mean and the same variance, for what distributions is |E[Y|X]| < |X| almost surely?
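A small simulation of the jointly Gaussian case described above, with X and Y both of mean 0 and variance 1. It estimates E[Y|X] near a few points by averaging Y over a narrow window of X and compares the result with rX. The value r = 0.6, the window half-width and the seed are arbitrary assumptions. This is a numerical illustration, not a proof, and it says nothing about non-Gaussian joint distributions.

```python
import math
import random
import statistics

rng = random.Random(1)
r = 0.6          # assumed correlation, for illustration only
n = 200_000

# Jointly Gaussian pair with mean 0, variance 1 and correlation r:
# Y = r*X + sqrt(1 - r^2)*Z with X, Z independent standard normals.
pairs = []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)
    pairs.append((x, r * x + math.sqrt(1 - r * r) * z))


def conditional_mean(x0: float, half_width: float = 0.05) -> float:
    """Estimate E[Y | X near x0] by averaging Y over a narrow window of X."""
    ys = [y for x, y in pairs if abs(x - x0) < half_width]
    return statistics.fmean(ys)


for x0 in (-2.0, -1.0, 1.0, 2.0):
    est = conditional_mean(x0)
    print(f"x0 = {x0:+.1f}: estimated E[Y|X] = {est:+.3f}, r*x0 = {r * x0:+.3f}")
```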

I will make some small edits to the "mathematical derivation" section.

Intelligence

Linda Gottfredson points out that 40% of mothers with an IQ of 75 or less have children whose IQ is also under 75, as opposed to 7% of normal or bright mothers.

Fortunately, because of regression to the mean, their children will tend to be brighter than they are, but 4 in 10 still have IQs below 75. (Why g matters, page 40)

What do we know about IQ or g and regression toward the mean? Elabro 18:55, 5 December 2005 (UTC)

Your question seems to contain its own answer. Taking everything at face value, and brushing aside all the arguments (whether g exists, whether it means anything, whether Spearman's methodology was sound, whether imprecise measurements of g should be used to make decisions about people's lives, etc.) what the numbers you cite mean is simply that IQ measurements are mixtures of something that is inherited and something that is not inherited.
Intelligence, as measured by IQ score, is just about 50% heritable.
In this case, regression has to do not with the child but with the mother. The lower the mother's IQ measurement, the further it is from the mean. The further it is from the mean, the more likely it is that it was the result not of something inherited but of some other factor, one that won't be passed on to the child, who will therefore be expected to have higher intelligence than the mother.
This isn't obvious at first glance but it is just plain statistics. Our article on regression doesn't have any diagrams, and one is needed here. Dpbsmith (talk) 20:26, 5 December 2005 (UTC)
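A toy version of the argument above. As a deliberate oversimplification, it assumes that a measured IQ is the sum of an inherited component and an independent non-inherited component, each contributing half the variance (the 50% figure quoted above), and that the child receives the mother's inherited component plus fresh non-inherited noise. It ignores fathers, genetic recombination and measurement error; every name and parameter is hypothetical. Under these assumptions the expected child score is 100 + 0.5 * (mother - 100), so a mother at 70 is expected to have a child at 85.

```python
import random
import statistics

MEAN, SD, H2 = 100.0, 15.0, 0.5   # hypothetical: IQ scale and 50% heritability


def expected_child_iq(mother_iq: float, h2: float = H2, mean: float = MEAN) -> float:
    """Conditional mean of the child's score under the toy Gaussian model."""
    return mean + h2 * (mother_iq - mean)


def simulate(n: int = 100_000, seed: int = 2) -> list[tuple[float, float]]:
    """Return (mother, child) score pairs under the toy model."""
    rng = random.Random(seed)
    sd_g = SD * H2 ** 0.5          # spread of the inherited component
    sd_e = SD * (1 - H2) ** 0.5    # spread of the non-inherited component
    pairs = []
    for _ in range(n):
        g = rng.gauss(0.0, sd_g)
        mother = MEAN + g + rng.gauss(0.0, sd_e)
        child = MEAN + g + rng.gauss(0.0, sd_e)   # same g, fresh non-inherited part
        pairs.append((mother, child))
    return pairs


pairs = simulate()
low = [(m, c) for m, c in pairs if m <= 75]
print("mothers with IQ <= 75:", len(low))
print("their mean IQ:       ", round(statistics.fmean(m for m, _ in low), 1))
print("children's mean IQ:  ", round(statistics.fmean(c for _, c in low), 1))
print("formula for a mother at 70:", expected_child_iq(70))
```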
Thanks for explaining that. It's clear to me now, and I hope we can also make it clear to the reader.
By the way, I'm studying "inheritance" and "heritage" and looking for factors (such as genes) that one cannot control, as well as factors (such as parenting techniques, choice of neighborhood and school) that one can control - and how these factors affect the academic achievement of children. This is because I'm interested in Educational reform, a topic that Wikipedia has long neglected. Elabro 22:10, 5 December 2005 (UTC)

Massachusetts test scores

HenryGB has twice removed a reference supporting the paragraph that gives MCAS "improvement" scores as a good example of the regression fallacy. He cites http://groups.google.com/group/sci.stat.edu/tree/browse_frm/thread/c1086922ef405246/60bb528144835a38?rnum=21&hl=en&_done=%2Fgroup%2Fsci.sta which I haven't had a chance to review. At the very least, it is extremely inappropriate to remove the reference supporting a statement without also removing the statement.

We need to decide whether this is a clear case of something that is not regression, in which case it doesn't belong in the article; or whether it's the usual case of a somewhat murky situation involving real-world data that isn't statistically pure, in a politically charged area, where different factions put a different spin on the data. If it's the latter, then it should go back with qualifying statements showing that not everyone agrees this is an actual example of regression. As I say, I haven't read his reference yet, so I don't know yet which I think. I gotta say that when I saw the headlines in the Globe about how shocked parents in wealthy towns were that their schools had scored much lower than some troubled urban schools on these "improvement" scores, the first thing that went through my mind was "regression." Dpbsmith (talk) 12:04, 31 March 2006 (UTC)
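A hedged illustration of why "improvement" scores invite the regression explanation. This is a toy simulation with hypothetical numbers, not MCAS data. No school's true quality changes between years, yet the schools that scored highest in year 1 appear to decline and the lowest appear to improve. Whether the real MCAS figures behave this way is exactly the open question here; the sketch shows only that the pattern can arise from year-to-year noise alone.

```python
import random
import statistics

rng = random.Random(3)
N_SCHOOLS = 1000
TRUE_SD, NOISE_SD = 10.0, 8.0   # hypothetical spreads, not MCAS figures

# Each school has a fixed true quality that does NOT change between years;
# each year's score is that quality plus independent year-to-year noise.
quality = [rng.gauss(0.0, TRUE_SD) for _ in range(N_SCHOOLS)]
year1 = [q + rng.gauss(0.0, NOISE_SD) for q in quality]
year2 = [q + rng.gauss(0.0, NOISE_SD) for q in quality]
change = [b - a for a, b in zip(year1, year2)]

# Rank schools by their year-1 score and compare the bottom and top fifths.
order = sorted(range(N_SCHOOLS), key=lambda i: year1[i])
k = N_SCHOOLS // 5
bottom, top = order[:k], order[-k:]
bottom_gain = statistics.fmean(change[i] for i in bottom)
top_gain = statistics.fmean(change[i] for i in top)
print("mean 'improvement', bottom fifth:", round(bottom_gain, 2))
print("mean 'improvement', top fifth:   ", round(top_gain, 2))
```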

Poorly written

The introduction is poorly written and fairly confusing.