Random sample
From Wikipedia, the free encyclopedia
A sample is a subset chosen from a population for investigation. A random sample is one chosen by a method involving an unpredictable component. Random sampling can also refer to taking a number of independent observations from the same probability distribution, without involving any real population. A probability sample is one in which each item has a known probability of being in the sample.
The sample will usually not be completely representative of the population from which it was drawn; this random variation in the results is known as sampling error. In the case of random samples, mathematical theory is available to assess the sampling error. Thus, estimates obtained from random samples can be accompanied by measures of the uncertainty associated with the estimate. This can take the form of a standard error or, if the sample is large enough for the central limit theorem to apply approximately, a confidence interval.
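As a rough illustration of the last point, the sketch below computes a sample mean, its standard error, and a normal-approximation 95% confidence interval. The function names, the simulated population, and the multiplier z = 1.96 are assumptions made for this example; the interval is only reasonable when the sample is large enough for the normal approximation to hold.

```python
import math
import random
import statistics


def mean_with_standard_error(sample):
    """Return (sample mean, standard error of the mean)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, standard_error


def normal_confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the mean, assuming a large sample."""
    mean, standard_error = mean_with_standard_error(sample)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical population, simulated only so the example is self-contained.
rng = random.Random(0)
population = [rng.gauss(100.0, 15.0) for _ in range(10_000)]

# Draw a simple random sample of 200 units without replacement.
sample = rng.sample(population, 200)
mean, standard_error = mean_with_standard_error(sample)
low, high = normal_confidence_interval(sample)
print(f"mean={mean:.2f}  SE={standard_error:.2f}  95% CI=({low:.2f}, {high:.2f})")
```

A different seed gives a different sample and therefore a different estimate; that spread is the sampling error the standard error is meant to quantify.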
Types of random sample
- A simple random sample is selected so that every possible sample of a given size has an equal chance of being selected from the population.
- A self-weighting sample, also known as an EPSEM (Equal Probability of Selection Method) sample, is one in which every individual, or object, in the population of interest has an equal probability of being selected for the sample. Simple random samples are self-weighting.
- Stratified sampling involves selecting independent samples from a number of subpopulations, groups, or strata within the population. Great gains in efficiency are sometimes possible from judicious stratification.
- Cluster sampling involves selecting the sample units in groups. For example, a sample of telephone calls may be collected by first taking a collection of telephone lines and collecting all the calls on the sampled lines. The analysis of cluster samples must take into account the intra-cluster correlation, which reflects the fact that units in the same cluster are likely to be more similar than two units picked at random.
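The designs listed above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a survey-grade implementation: the function names, the dictionary layout for strata and clusters, and the toy telephone data are all hypothetical.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be returned."""
    return rng.sample(population, n)


def stratified_sample(strata, sizes, rng):
    """Draw an independent simple random sample inside each stratum.

    strata: dict mapping stratum name -> list of units
    sizes:  dict mapping stratum name -> number of units to draw
    """
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit inside them.

    clusters: dict mapping cluster id -> list of units
    """
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    return {cluster_id: list(clusters[cluster_id]) for cluster_id in chosen_ids}


rng = random.Random(1)

# Toy data: telephone lines (clusters), each with its own calls (units).
calls_by_line = {
    "line_A": ["call_1", "call_2", "call_3"],
    "line_B": ["call_4"],
    "line_C": ["call_5", "call_6"],
    "line_D": ["call_7", "call_8", "call_9", "call_10"],
}
sampled_lines = cluster_sample(calls_by_line, 2, rng)
print(sampled_lines)
```

Note that `cluster_sample` returns every call on each selected line, which is why calls from the same line should not be analysed as if they were independent draws.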
Methods of producing random samples
- Random number table
- Mathematical algorithms for pseudorandom number generators
- Physical randomisation devices such as coins, playing cards or sophisticated devices such as ERNIE
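To give a flavour of the second method, the sketch below implements a linear congruential generator, one of the simplest pseudorandom number algorithms, and uses it to draw a sample with a partial Fisher-Yates shuffle. The class and function names are made up for this example, and the constants are one commonly quoted 32-bit parameter set. Generators of this kind are shown for illustration only; production work would normally rely on a well-tested library generator.

```python
class LinearCongruentialGenerator:
    """Toy pseudorandom generator: state <- (a * state + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_float(self):
        """Advance the state and return a value in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


def draw_sample(items, n, generator):
    """Pick n distinct items using a partial Fisher-Yates shuffle."""
    pool = list(items)
    for i in range(n):
        # Choose a position uniformly from the not-yet-fixed tail pool[i:].
        j = i + int(generator.next_float() * (len(pool) - i))
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:n]


# The same seed always reproduces the same "random" sample.
first = draw_sample(range(1, 101), 5, LinearCongruentialGenerator(seed=42))
second = draw_sample(range(1, 101), 5, LinearCongruentialGenerator(seed=42))
print(first, first == second)
```

The reproducibility shown on the last lines is the defining feature of a pseudorandom generator: the output is fully determined by the seed, unlike a physical randomisation device.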
An example application
The CEO of a company which provides call centers is considering the introduction of new software that she hopes will reduce average call handling times. She designs an experiment to estimate the reduction in mean call handling time associated with the new software. At one of her call centers, a sample of 50 call agents will use the new software and the remaining 150 staff will use the existing software. She knows that if she simply asks the center manager to choose the staff to operate the new software, he will likely choose the most intelligent and cooperative agents. The results of the trial would thus be subject to substantial bias in favor of the new software. To avoid this problem, she allocates the agents randomly by putting the names of the agents in a column in a spreadsheet. She then creates a second column consisting of random numbers from the spreadsheet's random number generator. By sorting with the second column as the sort key, she puts the staff names in random order and selects the first 50 names. These will be the staff using the new software.
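The spreadsheet procedure described above translates directly into code. The following sketch mirrors it step by step: attach a random number to each name, sort by that number, and take the first 50. The agent names, the seed, and the function name are placeholders invented for this example.

```python
import random


def allocate_by_random_key(names, n_treatment, rng):
    """Split names into (treatment, control) by sorting on random keys."""
    # Second "column": one random number per name.
    keyed = [(rng.random(), name) for name in names]
    # Sort using the random number as the sort key.
    keyed.sort(key=lambda pair: pair[0])
    ordered = [name for _, name in keyed]
    # The first n_treatment names get the new software.
    return ordered[:n_treatment], ordered[n_treatment:]


# Placeholder list of 200 agents; real names would come from staff records.
agents = [f"agent_{i:03d}" for i in range(1, 201)]
new_software, existing_software = allocate_by_random_key(
    agents, 50, random.Random(2024)
)
print(len(new_software), len(existing_software))
```

Fixing the seed makes the allocation reproducible for auditing; Python's `random.sample(agents, 50)` would select a treatment group of the same kind in a single call.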