Talk:Point-biserial correlation coefficient
From Wikipedia, the free encyclopedia
This is wrong. The point biserial correlation coefficient is still important today in the field of Psychometrics. —Preceding unsigned comment added by 128.97.86.17 (talk • contribs) 08:57, 21 June 2006
I would like to suggest deleting the "external information" link, since the formula for r_pb on that page is wrong. That is, it makes exactly the mistake I am warning about. Kmir78 01:50, 18 January 2007 (UTC)
Do you have a source with the correct formula? MrArt 05:43, 18 January 2007 (UTC)
Glass and Hopkins (Statistical Methods in Education and Psychology (3rd Edition)) have the correct formula, but I could also easily put a derivation from the normal formula in here. I don't know any online source. Kmir78 04:48, 19 January 2007 (UTC)
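For reference, the derivation mentioned above might run roughly as follows. This is only a sketch, not taken from Glass and Hopkins: it assumes X is coded 0/1, writes N_0 = n - N_1 for the number of observations with X = 0, M_1 and M_0 for the group means of Y, and S_Y for the sample standard deviation of Y with n - 1 in the denominator.

```latex
\begin{align*}
\bar{x} &= \frac{N_1}{n}, \qquad
\sum_i (x_i-\bar{x})^2 = N_1 - \frac{N_1^2}{n} = \frac{N_1 N_0}{n}, \\
\sum_i (x_i-\bar{x})(y_i-\bar{y}) &= N_1 (M_1 - \bar{y})
  = \frac{N_1 N_0}{n}\,(M_1 - M_0)
  \quad\text{since } \bar{y} = \frac{N_1 M_1 + N_0 M_0}{n}, \\
\sum_i (y_i-\bar{y})^2 &= (n-1)\,S_Y^2, \\
r_{XY} &= \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}
               {\sqrt{\sum_i (x_i-\bar{x})^2 \,\sum_i (y_i-\bar{y})^2}}
        = \frac{M_1 - M_0}{S_Y}\sqrt{\frac{N_1 N_0}{n(n-1)}}.
\end{align*}
```

If S_Y is instead computed with n in the denominator, the third line becomes n S_Y^2 and the factor under the root becomes N_1 N_0 / n^2.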
Wrong formula?
It is not easy to just say that a formula is wrong if one doesn't know its meaning. One should at least distinguish between population and sample. In the population the coefficient is a parameter; in the sample it is an estimator of this parameter.
Parameter:
- $\rho = \frac{\mu_1 - \mu_0}{\sigma_Y}\sqrt{p(1-p)},$
with σY the standard deviation of Y,
- μx = E(Y | X = x)
and
- p = P(X = 1)
Estimator
- $r = \frac{M_1 - M_0}{S_Y}\sqrt{\frac{N_1(n-N_1)}{n(n-1)}},$
where M1 and M0 are the sample means of Y for X = 1 and X = 0, N1 is the number of observations with X = 1, and SY is the usual sample standard deviation "with n-1 in the denominator".
If one takes for SY the sample standard deviation "with n in the denominator", the formula reads:
- $r = \frac{M_1 - M_0}{S_Y}\sqrt{\frac{N_1(n-N_1)}{n^2}}.$
Even in the case where SY is the usual sample standard deviation "with n-1 in the denominator" the formula:
gives a good estimator of ρ. Nijdam 14:07, 24 January 2007 (UTC)
Asymptotically, both formulas will yield the same result. But as far as I know, rpb is supposed to equal rXY in the sample. rXY is independent of the denominator n or n − 1 and therefore, rpb with 'some n' stuck in there somewhere instead of n − 1 will not equal rXY. Kmir78 06:40, 28 January 2007 (UTC)
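The point about matching denominators can be checked numerically. The following is a minimal sketch on made-up data (the values of `x` and `y` and the helper names `pearson_r` and `r_pb` are purely illustrative): it compares the ordinary Pearson correlation with r_pb computed with matching denominators (n-1 in both places, or n in both places) and with a mixed version (SY with n-1, but n² under the root).

```python
import math

# Hypothetical data: x is the dichotomous variable, y the continuous one.
x = [0, 0, 0, 1, 1, 1, 1]
y = [2.0, 3.5, 1.0, 4.0, 5.5, 6.0, 4.5]


def pearson_r(x, y):
    """Ordinary product-moment correlation; no n or n-1 appears at all."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_pb(x, y, sd_ddof, factor_ddof):
    """Point-biserial formula.

    sd_ddof:     0 -> S_Y uses n,   1 -> S_Y uses n-1
    factor_ddof: 0 -> root uses n*n, 1 -> root uses n*(n-1)
    """
    n = len(y)
    y1 = [b for a, b in zip(x, y) if a == 1]
    y0 = [b for a, b in zip(x, y) if a == 0]
    n1, n0 = len(y1), len(y0)
    m1, m0 = sum(y1) / n1, sum(y0) / n0
    my = sum(y) / n
    s_y = math.sqrt(sum((b - my) ** 2 for b in y) / (n - sd_ddof))
    return (m1 - m0) / s_y * math.sqrt(n1 * n0 / (n * (n - factor_ddof)))


r = pearson_r(x, y)
print("Pearson r_XY        :", r)
print("r_pb, n-1 and n-1   :", r_pb(x, y, 1, 1))  # matches r_XY
print("r_pb, n and n       :", r_pb(x, y, 0, 0))  # matches r_XY
print("r_pb, mixed n-1 / n :", r_pb(x, y, 1, 0))  # smaller by sqrt((n-1)/n)
```

The two matched versions agree with rXY up to rounding, while the mixed version is off by a factor of sqrt((n-1)/n), which vanishes only as n grows.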
significance
Information on how to determine the significance of a pb correlation would be useful. 208.253.150.100 02:02, 23 April 2007 (UTC)
I have included a note on assessing the significance of a pb correlation by relating it to Student's t distribution.
On the question of using n-1 rather than n as a denominator, the former should be used to estimate the population standard deviation and the latter to calculate the sample standard deviation. The pb coefficient however depends on the ratio between the standard deviations of X and Y. Clearly, if n-1 is used in both cases (and I don't see how you could justify using it for one but not the other), they just cancel out and the formula remains unchanged. In what sense then can either version of the formula be regarded as "wrong"?
I don't have a copy of the Glass and Hopkins book. What is the "right" formula which they give? Ted7815 (talk) 10:04, 28 March 2008 (UTC)
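On the significance question, the relation to Student's t is presumably the usual one, t = r·sqrt((n-2)/(1-r²)) with n-2 degrees of freedom. The sketch below, again on made-up data and with illustrative function names, computes that statistic from r_pb and compares it with the pooled-variance two-sample t statistic for the same two groups; the two should coincide.

```python
import math

# Hypothetical scores for the two groups defined by the dichotomous variable.
y0 = [2.0, 3.5, 1.0]        # observations with X = 0
y1 = [4.0, 5.5, 6.0, 4.5]   # observations with X = 1


def point_biserial(y0, y1):
    """r_pb using S_Y with n-1 and the matching n(n-1) under the root."""
    n0, n1 = len(y0), len(y1)
    n = n0 + n1
    m0, m1 = sum(y0) / n0, sum(y1) / n1
    grand = (sum(y0) + sum(y1)) / n
    s_y = math.sqrt(sum((v - grand) ** 2 for v in y0 + y1) / (n - 1))
    return (m1 - m0) / s_y * math.sqrt(n1 * n0 / (n * (n - 1)))


def t_from_r(r, n):
    """t statistic with n-2 degrees of freedom derived from a correlation."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def pooled_t(y0, y1):
    """Independent-samples t statistic with pooled variance."""
    n0, n1 = len(y0), len(y1)
    m0, m1 = sum(y0) / n0, sum(y1) / n1
    ss0 = sum((v - m0) ** 2 for v in y0)
    ss1 = sum((v - m1) ** 2 for v in y1)
    sp2 = (ss0 + ss1) / (n0 + n1 - 2)  # pooled variance estimate
    return (m1 - m0) / math.sqrt(sp2 * (1 / n0 + 1 / n1))


r = point_biserial(y0, y1)
n = len(y0) + len(y1)
print("r_pb          :", r)
print("t from r_pb   :", t_from_r(r, n))
print("pooled-var. t :", pooled_t(y0, y1))
```

The resulting t value would then be referred to a t table (or a statistics package) with n-2 degrees of freedom to obtain a p-value.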