Talk:Student's t-test
[edit] Calculations
I don't suppose anyone wants to add HOW TO DO a t-test??
- That seems to be a deficiency of a fairly large number of statistics pages. The trouble seems to be that they're getting written by people who've gotten good grades in statistics courses in which the topics are covered, but whose ability does not exceed what that would imply. Maybe I'll be back.... Michael Hardy 22:04, 7 June 2006 (UTC)
- If I have time to learn TeX, maybe I'll do it. I know the calculations, it's just a matter of getting Wikipedia to display it properly. Chris53516 16:17, 19 September 2006 (UTC)
- I uploaded some crappy images of the calculations. I don't have time to mess with TeX, so someone that's a little more TeX-savvy (*snicker*) can do it. Chris53516 16:42, 19 September 2006 (UTC)
- User:Michael Hardy converted two of my crappy graphics to TeX, and I used his conversion to do the last. So there you have it, calculations for the t-test. Chris53516 18:21, 19 September 2006 (UTC)
- Great. Now, could someone explicit the formulla? I assume that N is the sample size, s the standard deviation, but what is the df1/dft? ... OK, I found the meaning of df. I find the notation a bit confusing; it looks a lot like the derivative of a function... is dft the degrees of freedom of the global population?
- What do you mean: "could someone explicit the formulla (sic)" (emphasis added)? N is the sample size of group 1 or group 2, depending on which number is there; s is the standard deviation; and df is degrees of freedom. There is a degrees-of-freedom value for each group and for the total. The degrees of freedom for each group is calculated by taking the sample size and subtracting one. The total degrees of freedom is calculated by adding the two groups' degrees of freedom, or by subtracting 2 from the total sample size. I will change the formula to reflect this and remove the degrees of freedom. Chris53516 13:56, 11 October 2006 (UTC)
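- For anyone who landed here looking for the actual formula: the calculation described above, written in the N1/N2, s1/s2 notation used in this discussion, is the standard textbook form for the equal-variance, independent two-sample case (a sketch of that standard form, not necessarily character-for-character what the article displayed):
:<math>t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}}, \qquad s_p^2 = \frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2},</math>
with df1 = N1 − 1, df2 = N2 − 1 and dft = df1 + df2 = N1 + N2 − 2, which is the degrees-of-freedom bookkeeping Chris53516 describes.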
[edit] independent samples
Should 'assumptions' include the idea that we assume all samples are independent? This seems like a major omission.
[edit] history unclear
"but was forced to use a pen name by his employer who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown not only to fellow statisticians but to his employer - the company insisted on the pseudonym so that it could turn a blind eye to the breach of its rules." What breach? Why didn't the company know? If it didn't know, how is it insisting on a pseudonym?
[edit] Welch (or Satterthwaite) approximation?
"As the variance of each group is different, the Welch (or Satterthwaite) approximation to the degrees of freedom is used in the test"...
Huh?
--Dan|(talk) 15:00, 19 September 2006 (UTC)
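- In case it helps anyone else puzzled by that sentence: the Welch (or Satterthwaite) approximation replaces the pooled degrees of freedom with an estimate built from the two sample variances. Its usual textbook form (standard notation, not a quote from the article) is
:<math>\nu \approx \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2}{\frac{\left(s_1^2/N_1\right)^2}{N_1 - 1} + \frac{\left(s_2^2/N_2\right)^2}{N_2 - 1}},</math>
and the resulting ν is used as the degrees of freedom when the two group variances are not assumed equal.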
[edit] Table?
This article doesn't mention the t-table which appears to be necessary to make sense of the t value. Also, what's the formula used to compute such tables? —Ben FrantzDale 15:07, 12 October 2006 (UTC)
- I'm not sure which table you are referring to or what you mean by "make sense of the t value". Perhaps you mean the table for determining whether t is statistically significant or not. That would be a statistical significance matter, not a matter of just the t-test. Besides, that table is pretty big, and for the basic meaning and calculation of t, it isn't necessary. Chris53516 15:24, 12 October 2006 (UTC)
- I forgot. The calculation behind such tables is calculus, and would be rather cumbersome here. It would belong at the statistical significance article anyway. That, and I don't know the calculus behind p. Chris53516 15:26, 12 October 2006 (UTC)
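- For the record, the "calculus behind p" is just the tail area of Student's t-distribution. For a two-sided test with ν degrees of freedom (a standard result, sketched here rather than taken from the article):
:<math>p = 2 \int_{|t|}^{\infty} \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{u^2}{\nu}\right)^{-\frac{\nu+1}{2}} du,</math>
which is the integral that printed t-tables and statistics software evaluate for you.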
- Duah, Student's t-distribution has the answer to my question. —Ben FrantzDale 14:55, 13 October 2006 (UTC)
- Glad to be of not-so-much help. :) Chris53516 15:11, 13 October 2006 (UTC)
[edit] Are the calculations right?
The article says:
But if you ignore the -1 and -2, say for the biased estimator or if there are lots of samples, then s simplifies to
This seems backwards. The external links all divide the standard deviation by its corresponding sample size, which is what I was expecting. So I'd guess there's a typo and the article should have:
Can anyone confirm this?
Bleachpuppy 22:14, 17 November 2006 (UTC)
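- To make the disagreement concrete, the denominator Bleachpuppy seems to have expected (my reading of the comment: each variance divided by its own sample size, as in the unequal-variances form) is
:<math>\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}},</math>
whereas the formula in the article pools the two variances first, as Michael Hardy explains below.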
- I think it's right as it stands, but I don't have time to check very carefully. When you multiply s1² by N1 − 1, you just get the sum of squares of deviations from the sample mean in the first sample. Similarly with "2" instead of "1". So the sum in the numerator is the sum of squares due to error for the two samples combined. Then you divide that sum of squares by its number of degrees of freedom, which is N1 + N2 − 2. All pretty standard stuff. Michael Hardy 23:23, 17 November 2006 (UTC)
- ... and I think that just about does it; i.e. I've checked carefully. Michael Hardy 23:29, 17 November 2006 (UTC)
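- Spelling out one step of that reasoning (a sketch, not quoted from the article): by the definition of the unbiased sample variance,
:<math>(N_1 - 1)s_1^2 = \sum_{i=1}^{N_1} \left(x_{1i} - \bar{x}_1\right)^2,</math>
so adding the corresponding quantity for group 2 and dividing by N1 + N2 − 2 gives the pooled variance shown earlier on this page.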
- Please provide a citation or derivation. I think Bleachpuppy is right that the subscripts have been switched. Suppose N1 = 30 and N2 = 10^9, a very large number, and s1 and s2 are of moderate and comparable size (i.e. N2 is a very large number in comparison to any of the other numbers involved). In this case, x̄2 is in effect known almost perfectly, so the formula should reduce to a close approximation of the t-test for the case where sample 1 is being compared to a fixed null-hypothesis mean μ, which in this case is closely estimated by x̄2. In other words, it should be approximately equal to:
- But apparently the formula as written does not reduce to this; instead it reduces to approximately:
- This is claiming that this statistical test depends critically on σ2. But since N2 is a very large number in this example, σ2 should be pretty much irrelevant; we know x̄2 with great precision regardless of the value of σ2, as long as σ2 is not also a very large number. And the test should depend on the value of σ1 but does not. --Coppertwig 12:45, 19 November 2006 (UTC)
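- In symbols (my reconstruction of the two expressions being contrasted here): with N2 enormous, the test should reduce approximately to the one-sample form
:<math>t \approx \frac{\bar{x}_1 - \bar{x}_2}{s_1/\sqrt{N_1}},</math>
whereas the pooled formula, whose pooled variance is dominated by s2² when N2 is huge, reduces to approximately
:<math>t \approx \frac{\bar{x}_1 - \bar{x}_2}{s_2/\sqrt{N_1}}.</math>
That is the σ1-versus-σ2 swap being objected to, and it is only harmless if the two population variances are assumed equal, which turns out to be the resolution below.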
- All I have with me right now is an intro to stat textbook: Jaccard & Becker, 1997. Statistics for the behavioral sciences. On page 265, it verifies the original formula. I have many more advanced books in my office, but I won't be there until tomorrow. -Nicktalk 21:02, 19 November 2006 (UTC)
- P.S. none of the external links really have any useful information on them (they especially lack formulas). Everything that I've come across on the web uses the formula as currently listed in the article. -Nicktalk 21:29, 19 November 2006 (UTC)
- The original formula is also confirmed by Hays (1994) Statistics p. 326. -Nicktalk 19:36, 20 November 2006 (UTC)
- OK! I see what's wrong!! The formula is a correct formula. However, the article does not state to what problem that formula is a solution! I assumed that the variances of the two populations could differ from each other. Apparently that formula is correct if you're looking at a problem where you know the variance of the two distributions is the same, even though you don't know what the value of the variance is. I'll put that into the article. --Coppertwig 03:33, 21 November 2006 (UTC)
I know these calculations are correct; I simply didn't have my textbook for a citation. Keep in mind that much of the time we strive to have an equal sample size between the groups, which makes the calculation of t much easier. I will clarify this in the text. – Chris53516 (Talk) 14:28, 21 November 2006 (UTC)
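- For what it is worth, the equal-sample-size simplification mentioned above works out (standard algebra, not quoted from the article) to
:<math>t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}, \qquad df = 2n - 2,</math>
when N1 = N2 = n, since the pooled variance is then just the average of the two sample variances.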
[edit] Extra 2?
Where the text reads, "Where s2 is the grand standard deviation..." I can't tell what that two is referring to. It doesn't appear in the formula above or as a reference. 198.60.114.249 23:29, 14 December 2006 (UTC)
- The equation you're looking for can be found at standard deviation. It was not included in this page because it would be redundant. However, I will add a link to it in the text you read. — Chris53516 (Talk) 02:38, 15 December 2006 (UTC)
- Thanks Chris! 198.60.114.249 07:23, 15 December 2006 (UTC)