Talk:Student's t-test
[edit] Calculations
I don't suppose anyone wants to add HOW TO DO a t-test??
- That seems to be a deficiency of a fairly large number of statistics pages. The trouble seems to be that they're getting written by people who've gotten good grades in statistics courses in which the topics are covered, but whose ability does not exceed what that would imply. Maybe I'll be back.... Michael Hardy 22:04, 7 June 2006 (UTC)
- If I have time to learn TeX, maybe I'll do it. I know the calculations, it's just a matter of getting Wikipedia to display it properly. Chris53516 16:17, 19 September 2006 (UTC)
- I uploaded some crappy images of the calculations. I don't have time to mess with TeX, so someone that's a little more TeX-savvy (*snicker*) can do it. Chris53516 16:42, 19 September 2006 (UTC)
- User:Michael Hardy converted two of my crappy graphics to TeX, and I used his conversion to do the last. So there you have it, calculations for the t-test. Chris53516 18:21, 19 September 2006 (UTC)
- Great. Now, could someone explicit the formulla? I assume that N is the sample size, s the standard deviation, but what is the df1/dft? ... OK, I found the meaning of df. I find the notation a bit confusing; it looks a lot like the derivative of a function... is dft the degrees of freedom of the global population?
- What do you mean: "could someone explicit the formulla (sic)" (emphasis added)? N is the sample size of group 1 or group 2, depending on which number is there; s is the standard deviation; and df is degrees of freedom. There is a degrees-of-freedom value for each group and for the total. The degrees of freedom for each group is calculated by taking the sample size and subtracting one. The total degrees of freedom is calculated by adding the two groups' degrees of freedom, or by subtracting 2 from the total sample size. I will change the formula to reflect this and remove the degrees of freedom. Chris53516 13:56, 11 October 2006 (UTC)
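- For anyone who landed here looking for the actual formula: the calculation described above, written in the N1/N2, s1/s2 notation used in this discussion, is the standard textbook form for the equal-variance, independent two-sample case (a sketch of that standard form, not necessarily character-for-character what the article displayed):
:<math>t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}}, \qquad s_p^2 = \frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2},</math>
with df1 = N1 − 1, df2 = N2 − 1 and dft = df1 + df2 = N1 + N2 − 2, which is the degrees-of-freedom bookkeeping Chris53516 describes.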
[edit] independent samples
Should 'assumptions' include the idea that we assume all samples are independent? This seems like a major omission.
[edit] history unclear
"but was forced to use a pen name by his employer who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown not only to fellow statisticians but to his employer - the company insisted on the pseudonym so that it could turn a blind eye to the breach of its rules." What breach? Why didn't the company know? If it didn't know, how is it insisting on a pseudonym?
[edit] Welch (or Satterthwaite) approximation?
"As the variance of each group is different, the Welch (or Satterthwaite) approximation to the degrees of freedom is used in the test"...
Huh?
--Dan|(talk) 15:00, 19 September 2006 (UTC)
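- In case it helps anyone else puzzled by that sentence: the Welch (or Satterthwaite) approximation replaces the pooled degrees of freedom with an estimate built from the two sample variances. Its usual textbook form (standard notation, not a quote from the article) is
:<math>\nu \approx \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2}{\frac{\left(s_1^2/N_1\right)^2}{N_1 - 1} + \frac{\left(s_2^2/N_2\right)^2}{N_2 - 1}},</math>
and the resulting ν is used as the degrees of freedom when the two group variances are not assumed equal.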
[edit] Table?
This article doesn't mention the t-table which appears to be necessary to make sense of the t value. Also, what's the formula used to compute such tables? —Ben FrantzDale 15:07, 12 October 2006 (UTC)
- I'm not sure which table you are referring to or what you mean by "make sense of the t value". Perhaps you mean the table for determining whether t is statistically significant or not. That would be a statistical significance matter, not a matter of just the t-test. Besides, that table is pretty big, and for the basic meaning and calculation of t, it isn't necessary. Chris53516 15:24, 12 October 2006 (UTC)
- I forgot. The calculation behind such tables is calculus, and would be rather cumbersome here. It would belong at the statistical significance article anyway. That, and I don't know the calculus behind p. Chris53516 15:26, 12 October 2006 (UTC)
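- For the record, the "calculus behind p" is just the tail area of Student's t-distribution. For a two-sided test with ν degrees of freedom (a standard result, sketched here rather than taken from the article):
:<math>p = 2 \int_{|t|}^{\infty} \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{u^2}{\nu}\right)^{-\frac{\nu+1}{2}} du,</math>
which is the integral that printed t-tables and statistics software evaluate for you.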
- Duah, Student's t-distribution has the answer to my question. —Ben FrantzDale 14:55, 13 October 2006 (UTC)
- Glad to be of not-so-much help. :) Chris53516 15:11, 13 October 2006 (UTC)
[edit] Are the calculations right?
The article says:
But if you ignore the -1 and -2, say for the biased estimator or if there are lots of samples, then s simplifies to
This seems backwards. The external links all divide the standard deviation by its corresponding sample size, which is what I was expecting. So I'd guess there's a typo and the article should have:
Can anyone confirm this?
Bleachpuppy 22:14, 17 November 2006 (UTC)
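- To make the disagreement concrete, the denominator Bleachpuppy seems to have expected (my reading of the comment: each variance divided by its own sample size, as in the unequal-variances form) is
:<math>\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}},</math>
whereas the formula in the article pools the two variances first, as Michael Hardy explains below.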
- I think it's right as it stands, but I don't have time to check very carefully. When you multiply s1² by N1 − 1, you just get the sum of squares of deviations from the sample mean in the first sample. Similarly with "2" instead of "1". So the sum in the numerator is the sum of squares due to error for the two samples combined. Then you divide that sum of squares by its number of degrees of freedom, which is N1 + N2 − 2. All pretty standard stuff. Michael Hardy 23:23, 17 November 2006 (UTC)
- ... and I think that just about does it; i.e. I've checked carefully. Michael Hardy 23:29, 17 November 2006 (UTC)
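- Spelling out one step of that reasoning (a sketch, not quoted from the article): by the definition of the unbiased sample variance,
:<math>(N_1 - 1)s_1^2 = \sum_{i=1}^{N_1} \left(x_{1i} - \bar{x}_1\right)^2,</math>
so adding the corresponding quantity for group 2 and dividing by N1 + N2 − 2 gives the pooled variance shown earlier on this page.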
- Please provide a citation or derivation. I think Bleachpuppy is right that the subscripts have been switched. Suppose N1 = 30 and N2 = 10^9, a very large number, and s1 and s2 are of moderate and comparable size (i.e. N2 is a very large number in comparison to any of the other numbers involved). In this case, x̄2 is in effect known almost perfectly, so the formula should reduce to a close approximation of the t-test for the case where sample 1 is being compared to a fixed null-hypothesis mean μ, which in this case is closely estimated by x̄2. In other words, it should be approximately equal to:
- But apparently the formula as written does not reduce to this; instead it reduces to approximately:
- This is claiming that this statistical test depends critically on σ2. But since N2 is a very large number in this example, σ2 should be pretty much irrelevant; we know x̄2 with great precision regardless of the value of σ2, as long as σ2 is not also a very large number. And the test should depend on the value of σ1 but does not. --Coppertwig 12:45, 19 November 2006 (UTC)
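- In symbols (my reconstruction of the two expressions being contrasted here): with N2 enormous, the test should reduce approximately to the one-sample form
:<math>t \approx \frac{\bar{x}_1 - \bar{x}_2}{s_1/\sqrt{N_1}},</math>
whereas the pooled formula, whose pooled variance is dominated by s2² when N2 is huge, reduces to approximately
:<math>t \approx \frac{\bar{x}_1 - \bar{x}_2}{s_2/\sqrt{N_1}}.</math>
That is the σ1-versus-σ2 swap being objected to, and it is only harmless if the two population variances are assumed equal, which turns out to be the resolution below.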
- All I have with me right now is an intro to stat textbook: Jaccard & Becker, 1997. Statistics for the behavioral sciences. On page 265, it verifies the original formula. I have many more advanced books in my office, but I won't be there until tomorrow. -Nicktalk 21:02, 19 November 2006 (UTC)
- P.S. none of the external links really have any useful information on them (they especially lack formulas). Everything that I've come across on the web uses the formula as currently listed in the article. -Nicktalk 21:29, 19 November 2006 (UTC)
- The original formula is also confirmed by Hays (1994) Statistics p. 326. -Nicktalk 19:36, 20 November 2006 (UTC)
- OK! I see what's wrong!! The formula is a correct formula. However, the article does not state to what problem that formula is a solution! I assumed that the variances of the two populations could differ from each other. Apparently that formula is correct if you're looking at a problem where you know the variance of the two distributions is the same, even though you don't know what the value of the variance is. I'll put that into the article. --Coppertwig 03:33, 21 November 2006 (UTC)
I know these calculations are correct; I simply didn't have my textbook for a citation. Keep in mind that much of the time we strive to have an equal sample size between the groups, which makes the calculation of t much easier. I will clarify this in the text. – Chris53516 (Talk) 14:28, 21 November 2006 (UTC)
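- For what it is worth, the equal-sample-size simplification mentioned above works out (standard algebra, not quoted from the article) to
:<math>t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}, \qquad df = 2n - 2,</math>
when N1 = N2 = n, since the pooled variance is then just the average of the two sample variances.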
[edit] Extra 2?
Where the text reads, "Where s2 is the grand standard deviation..." I can't tell what that two is referring to. It doesn't appear in the formula above or as a reference. 198.60.114.249 23:29, 14 December 2006 (UTC)
- The equation you're looking for can be found at standard deviation. It was not included in this page because it would be redundant. However, I will add a link to it in the text you read. — Chris53516 (Talk) 02:38, 15 December 2006 (UTC)
- Thanks Chris! 198.60.114.249 07:23, 15 December 2006 (UTC)