Sample size

From Wikipedia, the free encyclopedia

Sample size, usually designated N, is the number of repeated measurements in a statistical sample. These measurements are used to estimate a parameter, a descriptive quantity of some population, and N determines the precision of that estimate: a larger N gives smaller error bounds. A typical statement is that one can be 95% confident that the true parameter lies within ±B of the estimate, where B is an error bound that decreases as N increases. Such a bounded estimate is called a confidence interval for that parameter.

For example, the simplest of these rules of thumb is the one for estimating a proportion in a population: the maximum error bound B of a 95% confidence interval for an unknown proportion is 1/√N. Thus N = 100 gives B = 10%, N = 400 gives B = 5%, N = 1000 gives B ≈ 3%, and N = 10000 gives B = 1%. These figures are often quoted in news reports of opinion polls and other sample surveys.
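The rule of thumb B = 1/√N is simple enough to tabulate directly. The short Python sketch below reproduces the poll-size figures quoted above; the function name max_margin_proportion is a hypothetical label chosen for illustration, not a standard library function.

```python
import math

def max_margin_proportion(n):
    """Rule-of-thumb maximum half-width B of a 95% confidence interval for a proportion."""
    # B = 1/sqrt(N), the worst case over all possible true proportions.
    return 1 / math.sqrt(n)

# Reproduce the figures quoted for common poll sizes.
for n in (100, 400, 1000, 10000):
    print(f"N={n:>5}: B = {max_margin_proportion(n):.1%}")
```

Quadrupling N only halves B, which is why moving from a 5% to a 1% margin requires going from 400 to 10000 respondents.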

For sufficiently large N (usually at least 30), the general 95% confidence interval for a population mean, or "expected value", is the sample mean ± B, where B = 2√(V/N) and V is the variance of the sampled variable. Conversely, N = 4V/B². (Note: if the mean is modeled as a function of other variables through a model with P parameters, including any intercept, and those parameters must first be estimated from the same sample used to estimate the mean, then the sample size should be N + P.)
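Both directions of this formula can be sketched in a few lines of Python. The sketch assumes, beyond the text, that the unknown variance V is replaced by the sample variance when computing an interval from data; the helper names mean_confidence_interval and required_sample_size are hypothetical, and the optional n_params argument implements the N + P adjustment described in the note.

```python
import math
import statistics

def mean_confidence_interval(data):
    """Approximate 95% confidence interval for a population mean: sample mean ± 2*sqrt(V/N).

    Assumption: V is unknown, so it is estimated by the sample variance
    (statistics.variance uses the N - 1 denominator).
    """
    n = len(data)
    mean = statistics.fmean(data)
    b = 2 * math.sqrt(statistics.variance(data) / n)
    return mean - b, mean + b

def required_sample_size(variance, bound, n_params=0):
    """Sample size N = 4V/B**2 for a 95% bound B, plus P model parameters if any.

    The tiny tolerance stops floating-point noise (e.g. 100.00000000000001)
    from rounding up to a spurious extra observation.
    """
    return math.ceil(4 * variance / bound**2 - 1e-9) + n_params

# Usage: 40 observations, above the usual N >= 30 threshold.
data = list(range(1, 41))
low, high = mean_confidence_interval(data)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
print(required_sample_size(variance=25.0, bound=1.0))              # 4*25/1**2 -> 100
print(required_sample_size(variance=25.0, bound=1.0, n_params=3))  # 100 + 3 -> 103
```

Rounding up with math.ceil reflects that a sample size must be a whole number and that rounding down would give a bound slightly larger than the target B.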

The rule of thumb for the maximum B for a proportion can be derived as follows. The number of successes X in the sample has a binomial distribution, and the estimator of the proportion, X/N, is the sample mean of N Bernoulli variables, each with variance p(1 - p), which is at most 0.25 (attained when p = 0.5). For sufficiently large N, the central limit theorem says that the distribution of X/N is closely approximated by a normal distribution, and a normal distribution contains about 95% of its values within 2 standard deviations of its mean (more precisely, within 1.96 standard deviations). One then imagines those bounds shifted from around the population proportion to around its estimator. Because it uses the maximum variance, this 95% error bound, twice the standard error of X/N, does not depend on the yet-to-be-observed X and can be used to choose N before sampling: B = 2√(0.25/N) = 1/√N. Conversely, N = 1/B².
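The derivation can be checked numerically. The Python sketch below (all function names are hypothetical) confirms that twice the worst-case standard error equals 1/√N, inverts the rule to N = 1/B², and computes the exact binomial probability that X/N falls within 1/√N of the true proportion p. This exact-coverage check is an addition for illustration, not part of the rule itself: under these assumptions the coverage should come out near 95% at p = 0.5 and higher for other p, which is what makes 1/√N a maximum (conservative) bound.

```python
import math

def bernoulli_variance(p):
    """Variance p(1 - p) of one Bernoulli trial; its maximum, 0.25, is at p = 0.5."""
    return p * (1 - p)

def max_margin(n):
    """Worst-case 95% bound: twice the standard error of X/N using variance 0.25."""
    return 2 * math.sqrt(bernoulli_variance(0.5) / n)  # equals 1/sqrt(N)

def sample_size_for_margin(b):
    """Invert B = 1/sqrt(N) to N = 1/B**2, rounded up (tolerance absorbs float noise)."""
    return math.ceil(1 / b**2 - 1e-9)

def exact_coverage(n, p):
    """Exact P(|X/N - p| <= 1/sqrt(N)) for X ~ Binomial(N, p), with 0 < p < 1.

    Binomial probabilities are computed in log space (via math.lgamma) so that
    large N does not overflow. The interval test |k/N - p| <= B is done on the
    count scale, |k - N*p| <= N*B, to limit rounding at the interval edges.
    """
    b = 1 / math.sqrt(n)
    total = 0.0
    for k in range(n + 1):
        if abs(k - n * p) <= n * b:
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log1p(-p))
            total += math.exp(log_pmf)
    return total

for p in (0.5, 0.3, 0.1):
    print(f"N=400, p={p}: coverage of ±1/sqrt(N) = {exact_coverage(400, p):.4f}")
print(sample_size_for_margin(0.05))  # 1/0.05**2 -> 400
```

The exact coverage at p = 0.5 differs slightly from 95% because the binomial distribution is discrete and the rule uses 2 rather than 1.96 standard deviations.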

See also