Student's t-test

From Wikipedia, the free encyclopedia

A t test is any statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true.

History

The t statistic was introduced by William Sealy Gosset for cheaply monitoring the quality of beer brews. "Student" was his pen name. Gosset was a statistician for the Guinness brewery in Dublin, Ireland, and was hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness's industrial processes. Gosset published the t test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown not only to fellow statisticians but also to his employer; the company insisted on the pseudonym so that it could turn a blind eye to the breach of its rules.

Today, the t test is more generally applied to assess the confidence that can be placed in judgments made from small samples.

Use

Among the most frequently used t tests are:

  • A test of the null hypothesis that the means of two normally distributed populations are equal. Given two data sets, each characterized by its mean, standard deviation and number of data points, we can use some kind of t test to determine whether the means are distinct, provided that the underlying distributions can be assumed to be normal. All such tests are usually called Student's t tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t test. There are different versions of the t test depending on whether the two samples are
    • independent of each other (e.g., individuals randomly assigned into two groups), or
    • paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention, or IQ test scores of a husband and wife).
If the absolute value of the calculated t statistic exceeds the critical value for the chosen significance level (usually 0.05), or equivalently if the P value falls below that level, then the null hypothesis that the two groups do not differ is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.
  • A test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
  • A test of whether the slope of a regression line differs significantly from 0.

Once a t value is determined, a P value can be found using a table of values from Student's t-distribution.
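Where a printed table is not at hand, the same lookup can be approximated numerically. The following is a minimal sketch, assuming only the Python standard library; the function names `t_pdf` and `two_tailed_p` are illustrative, and the tail area is obtained by Simpson's rule rather than by any particular package's method.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, steps=10_000):
    """Two-tailed P value, P(|T| >= |t_value|), by Simpson's rule.

    steps must be even; the integral covers the interval [0, |t_value|].
    """
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3      # P(0 <= T <= |t_value|)
    return 1 - 2 * central_half

# 2.571 is the tabulated two-tailed 5% critical value for 5 degrees of freedom
p = two_tailed_p(2.571, 5)
print(round(p, 3))
```

For 5 degrees of freedom and t = 2.571 the result should be close to 0.05, which is what a t table gives for that entry.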

Assumptions

  • normal distribution of data, tested by using either the Shapiro-Wilk or Kolmogorov-Smirnov test.
  • equality of variances, tested by using either the F test, the more robust Levene's test, Bartlett's test, or the Brown & Forsythe test
  • Samples may be independent or dependent, depending on the hypothesis and the type of samples:
    • Independent samples are usually two, randomly selected groups
    • Dependent samples are either two groups matched on some variable (for example, age) or are the same people being tested twice (called repeated measures)

It may be statistically conservative not to make the assumption of equality of the population variances.[citation needed] Modern statistical packages make the test equally easy to do with or without it. Since all calculations are done subject to the null hypothesis, it may be very difficult to come up with a reasonable null hypothesis that accounts for equal means in the presence of unequal variances. In the usual case, the null hypothesis is that the different treatments have no effect, which makes unequal variances untenable. In this case, one should forgo the unequal-variance variant, however easy the statistical packages make it to use. See also Behrens-Fisher problem.
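For reference, the unequal-variance variant (Welch's t test, named earlier in this article) can be sketched as follows. This is an illustration only: the function name `welch_t` and the data are made up, and the degrees of freedom use the standard Welch-Satterthwaite approximation, which this article does not itself state.

```python
import math
import statistics

def welch_t(group1, group2):
    """Welch's t statistic and approximate degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    # Squared standard error of each mean, using the n - 1 sample variance
    v1 = statistics.variance(group1) / n1
    v2 = statistics.variance(group2) / n2
    t_value = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (generally not an integer)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_value, df

# Illustrative data, not taken from any study
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2]
group_b = [4.1, 4.4, 5.0, 4.6]
t_value, df = welch_t(group_a, group_b)
print(round(t_value, 2), round(df, 1))
```

The approximate degrees of freedom never exceed n1 + n2 − 2, so the Welch variant is never less cautious than the equal-variance test about how much information the samples carry.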

Determining type

For novices, the most difficult issue is often whether the samples are independent or dependent. Independent samples typically consist of two groups with no relationship. Dependent samples typically consist of a matched sample (or a "paired" sample) or one group that has been tested twice (repeated measures).

Dependent t-tests are also used for matched samples, where two groups are matched on a particular variable. For example, if we examined the heights of men and women in a relationship, the two groups are matched on relationship status. This would call for a dependent t-test because it is a paired sample (one man paired with one woman). Alternatively, we might recruit 100 men and 100 women, with no relationship between any particular man and any particular woman; in this case we would use an independent samples test.

Another example of a matched sample would be to take two groups of students, match each student in one group with a student in the other group based on an achievement test result, then examine how much each student reads. An example pair might be two students who scored 90 and 91, or two students who scored 45 and 40, on the same test. The hypothesis would be that students who did well on the test may or may not read more. Alternatively, we might recruit students with low scores and students with high scores in two groups and assess their reading amounts independently.

An example of a repeated measures t-test would be if one group were pre- and post-tested. (This example occurs in education quite frequently.) If a teacher wanted to examine the effect of a new set of textbooks on student achievement, (s)he could test the class at the beginning of the year (pretest) and at the end of the year (posttest). A dependent t-test would be used, treating the pretest and posttest as matched variables (matched by student).

Calculations

Independent t-test

Equal sample sizes

This equation is only used when the two sample sizes (that is, the n or number of participants of each group) are equal.

t = {\overline{X}_1 - \overline{X}_2 \over {s_{\overline{X}_1 - \overline{X}_2}}}\ \mathrm{where}\ s_{\overline{X}_1 - \overline{X}_2} = \sqrt{s_{\overline{X}_1}^2 + s_{\overline{X}_2}^2}

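Here s_{\overline{X}_1} and s_{\overline{X}_2} are the standard errors of the two group means (each group's sample standard deviation divided by \sqrt{n}), with 1 = group one and 2 = group two. The denominator is the standard error of the difference between two means; for equal group sizes it is the same as the value obtained from the grand standard deviation (or pooled sample standard deviation). Alternatively, some researchers use the control group standard deviation for a more conservative estimate.[citation needed]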
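As a rough illustration of the equal-sample-size formula, the sketch below computes t from two groups of the same size. The function name `t_equal_n` and the data are hypothetical, and each group's standard error is taken as its n − 1 sample standard deviation divided by the square root of n.

```python
import math
import statistics

def t_equal_n(group1, group2):
    """t statistic for two independent groups of equal size."""
    if len(group1) != len(group2):
        raise ValueError("this formula assumes equal sample sizes")
    n = len(group1)
    # Squared standard error of each group mean
    se1_sq = statistics.variance(group1) / n
    se2_sq = statistics.variance(group2) / n
    se_diff = math.sqrt(se1_sq + se2_sq)   # standard error of the difference
    return (statistics.mean(group1) - statistics.mean(group2)) / se_diff

# Illustrative data, not taken from any study
treated = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.1, 4.4, 5.0, 4.6, 4.9]
t_value = t_equal_n(treated, control)
print(round(t_value, 3))
```

Swapping the two groups only changes the sign of t, which is why two-tailed tests work with its absolute value.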

Unequal sample sizes

This equation applies when the two sample sizes are unequal; with equal sample sizes it reduces to the formula for equal sample sizes. It is assumed that the two distributions have the same variance. (When this assumption is violated, Welch's t test, mentioned under Use, is used instead.) The t statistic to test whether the means are different can be calculated as follows:

t = {\overline{X}_1 - \overline{X}_2 \over s_{\overline{X}_1 - \overline{X}_2}} \ \mathrm{where}\ s_{\overline{X}_1 - \overline{X}_2} = \sqrt{{({n}_1 - 1) s_1^2 + ({n}_2 - 1) s_2^2  \over {n}_1 + {n}_2 - 2}\left({1 \over n_1} + {1 \over n_2}\right)}

Where s^2 is the unbiased estimator of the variance, n = number of participants, 1 = group one, 2 = group two. n − 1 is the number of degrees of freedom for either group, and the total sample size minus 2 is the total number of degrees of freedom.

The statistical significance level associated with the t value calculated in this way is the probability that, under the null hypothesis of equal means, the absolute value of t could be that large or larger just by chance. In other words, it is a two-tailed test, testing whether the means are different when either one might be the larger (see Press et al., 1999, p. 616).
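The pooled-variance calculation for unequal sample sizes can be sketched in a few lines. This is an illustration under the equal-variance assumption stated above; the function name `pooled_t` and the data are made up.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    s1_sq = statistics.variance(group1)   # unbiased estimator, divisor n1 - 1
    s2_sq = statistics.variance(group2)   # unbiased estimator, divisor n2 - 1
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_value = (statistics.mean(group1) - statistics.mean(group2)) / se_diff
    return t_value, df

# Illustrative data with unequal group sizes
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2]
group_b = [4.1, 4.4, 5.0, 4.6]
t_value, df = pooled_t(group_a, group_b)
print(round(t_value, 2), df)
```

The returned degrees of freedom are what one would use to look up the two-tailed significance level in a t table.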

Dependent t-test

This equation is used when the samples are dependent; that is, when there is only one sample that has been tested twice (repeated measures) or when there are two samples that have been matched or "paired".

t = {\overline{X}_D \cdot \sqrt{N} \over s_D}

For this equation, the differences between all pairs must be calculated. The pairs are either one person's pretest and posttest scores or one person in a group matched to another person in another group (see the tables below). The average (\overline{X}_D) and standard deviation (s_D) of those differences are used in the equation, and N is the number of pairs.

Example of repeated measures

Number  Name      Test 1  Test 2
1       Mike      35%     67%
2       Melanie   50%     46%
3       Melissa   90%     86%
4       Mitchell  78%     90%

Example of matched pairs

Pair  Name   Age  Test
1     Jon    35   250
1     Jane   36   340
2     Jimmy  22   460
2     Jessy  21   200
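To make the dependent-samples formula concrete, the sketch below applies it to the Test 1 and Test 2 scores from the repeated measures example. The function name `paired_t` is illustrative, and s_D is assumed to be the sample standard deviation with an n − 1 divisor, which the text does not spell out.

```python
import math
import statistics

def paired_t(first, second):
    """Dependent t statistic: mean difference * sqrt(N) / sd of differences."""
    diffs = [b - a for a, b in zip(first, second)]
    n_pairs = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)      # assumes the n - 1 divisor
    return mean_d * math.sqrt(n_pairs) / sd_d, n_pairs - 1

# Scores (in percent) for Mike, Melanie, Melissa and Mitchell
test1 = [35, 50, 90, 78]
test2 = [67, 46, 86, 90]
t_value, df = paired_t(test1, test2)
print(round(t_value, 2), df)
```

The differences are 32, −4, −4 and 12, so the mean difference is 9 on 3 degrees of freedom; a single large gain among four students is not enough to give a large t.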

Confidence intervals using a small sample size

Consider a normally distributed population. To estimate the population's variance, take a sample of size n and calculate the sample's variance, s^2 (computed with divisor n). An unbiased estimator of the population's variance is

\widehat{\sigma}^2 = {n \over n-1}s^2

Clearly for small values of n this estimation is inaccurate. Hence for samples of small size instead of calculating the z value for the number of standard deviations from the mean

z = { \overline{x} - \mu \over {\sigma \over \sqrt{n} } }

and using probabilities based on the normal distribution, calculate the t value

t = { \overline{x} - \mu \over { s_{n-1} \over \sqrt{n} } }

The probability that the t value is within a particular interval may be found using the t distribution. The sample's degrees of freedom are the number of data that need to be known before the rest of the data can be calculated.

e.g.

A random sample of items has the weights

30.02, 29.99, 30.11, 29.97, 30.01, 29.99

Calculate a 95% confidence interval for the population's mean weight.

Assume the population ~ N(μ, σ^2)

The sample's mean weight is 30.015 with a sample standard deviation (s_{n-1}) of 0.0497. With the mean and the first five weights it is possible to calculate the sixth weight. Consequently there are five degrees of freedom.

The t distribution tells us that, for five degrees of freedom, the probability that t > 2.571 is 0.025. Also, the probability that t < −2.571 is 0.025. Using the formula for t with t = ± 2.571, a 95% confidence interval for the population's mean may be found by making μ the subject of the equation.

i.e.

30.015 - 2.571{0.0497 \over \sqrt{6}} < \mu < 30.015 + 2.571{0.0497 \over \sqrt{6}}
(29.96 < μ < 30.07)
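The worked example can be checked with a short script. This is a minimal sketch: the critical value 2.571 is copied from the text rather than computed, the standard deviation is the n − 1 version required by the t formula (about 0.0497 for these six weights), and the variable names are illustrative.

```python
import math
import statistics

weights = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99]

n = len(weights)
mean = statistics.mean(weights)
s = statistics.stdev(weights)     # s_{n-1}: divisor n - 1
t_crit = 2.571                    # from the text: 5 df, 0.025 in each tail

# Rearranging t = (mean - mu) / (s / sqrt(n)) for mu at t = +/- t_crit
half_width = t_crit * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Using the divisor-n standard deviation instead would make the interval narrower than the t distribution justifies.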

Significance

To determine or calculate significance, see Student's t-distribution. The t-test user has to choose between a one-tailed test and a two-tailed test of significance.

Alternatives to the t test

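If a non-parametric alternative to the t test is wanted, the usual choices are:

  • for independent samples, the Mann-Whitney U test
  • for related samples, the Wilcoxon signed-rank test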
