Nonprobability sampling
Sampling is the use of a subset of the population to represent the whole population. Probability sampling, or random sampling, is a sampling technique in which the probability of obtaining any particular sample can be calculated. Nonprobability sampling does not meet this criterion and should be used with caution: nonprobability sampling techniques cannot be used to draw inferences from the sample to the general population.
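As a small worked example of the criterion, consider simple random sampling without replacement, one common probability design chosen here purely for illustration. The population of 10 labeled units and the sample size of 3 are assumptions, not figures from any study. Because every possible sample is equally likely under this design, the probability of any particular sample, and of any one unit being included, can be computed exactly:

```python
from fractions import Fraction
from itertools import combinations
from math import comb

population = list(range(10))  # hypothetical population of 10 labeled units
sample_size = 3               # hypothetical sample size

# Under simple random sampling without replacement, each of the
# C(N, n) possible samples is equally likely.
n_samples = comb(len(population), sample_size)
p_particular_sample = Fraction(1, n_samples)

# Inclusion probability of unit 0: the share of all possible samples that contain it.
containing_unit_0 = sum(1 for s in combinations(population, sample_size) if 0 in s)
p_inclusion = Fraction(containing_unit_0, n_samples)

print(n_samples, p_particular_sample, p_inclusion)  # 120 1/120 3/10
```

No comparable calculation is possible for a nonprobability sample, because the mechanism that selects units is unknown.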
The advantage of nonprobability sampling is its lower cost compared to probability sampling. However, one can say much less on the basis of a nonprobability sample than on the basis of a probability sample. Of course, research practice appears to belie this claim, because many analysts draw generalizations (e.g., propose new theory, propose policy) from analyses of nonprobability sampled data. One must ask, however, whether those published works are publishable because tradition makes them so, or because there really are justifiable grounds for drawing generalizations from studies based on nonprobability samples.
Some embrace the latter claim, and assert that while probability methods are suitable for large-scale studies concerned with representativeness, nonprobability approaches are more suitable for in-depth qualitative research in which the focus is often to understand complex social phenomena (e.g., Marshall 1996; Small 2009). These assertions raise an interesting question: how can one understand a complex social phenomenon by drawing only the most convenient expressions of that phenomenon into consideration? What assumption about homogeneity in the world must one make to justify such assertions? Alas, research indicates only one situation in which a nonprobability sample can be appropriate: if one is interested only in the specific cases studied (for example, if one is interested in the Battle of Gettysburg), one does not need to draw a probability sample from similar cases (Lucas 2014a).
Still, some use nonprobability sampling. Examples of nonprobability sampling include:
- Convenience, haphazard or accidental sampling - Members of the population are chosen based on their relative ease of access. Sampling friends, co-workers, or shoppers at a single mall are all examples of convenience sampling. Such samples are biased because researchers may unconsciously approach some kinds of respondents and avoid others (Lucas 2014a), and respondents who volunteer for a study may differ in unknown but important ways from others (Wiederman 1999).
- Snowball sampling - The first respondent refers a friend. The friend also refers a friend, and so on. Such samples are biased because they give people with more social connections an unknown but higher chance of selection (Berg 2006).
- Judgmental sampling or purposive sampling - The researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when only a limited number of people have expertise in the area being researched. Such samples are biased because prominent experts may differ from other, equally expert but less prominent persons.
- Deviant case - The researcher obtains cases that substantially differ from the dominant pattern (a special type of purposive sample). The resulting estimates of the deviant-case situation can be biased because the researcher may select visible deviants (e.g., students who act out in class) and thus miss other deviants (e.g., students who silently deviate from the dominant pattern).
- Case study - The research is limited to one group, often with a similar characteristic or of small size. Such work can be biased for generalizing to other cases, but may not be biased for understanding the one case studied.
- Ad hoc quotas - A quota is established (e.g., 65% women) and researchers are free to choose any respondent they wish as long as the quota is met. Researchers may approach only people they are comfortable addressing, which biases the research.
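The snowball sampling bias listed above can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration: the network of 300 people, the 30 well-connected "hubs", the tie probabilities, and the simplification that each respondent refers exactly one friend chosen at random (so the same person may be referred more than once). The helper names `build_network` and `referral_chain` are hypothetical.

```python
import random
from statistics import mean

def build_network(n_people, n_hubs, p_hub, p_other, rng):
    """Return friend lists for a hypothetical friendship network."""
    friends = {i: set() for i in range(n_people)}
    for i in range(n_people):
        for j in range(i + 1, n_people):
            # Assumption: a tie is more likely when either person is a hub.
            p = p_hub if (i < n_hubs or j < n_hubs) else p_other
            if rng.random() < p:
                friends[i].add(j)
                friends[j].add(i)
    return {i: sorted(f) for i, f in friends.items()}

def referral_chain(friends, length, rng):
    """Start from a random person; each respondent refers one random friend."""
    current = rng.choice([i for i in friends if friends[i]])
    referred = []
    for _ in range(length):
        current = rng.choice(friends[current])
        referred.append(current)
    return referred

rng = random.Random(0)  # fixed seed so the sketch is reproducible
friends = build_network(300, 30, 0.2, 0.02, rng)

# Average number of ties in the whole population.
population_mean_ties = mean(len(f) for f in friends.values())

# Average number of ties among people reached through 200 referral chains.
snowball_sample = [p for _ in range(200) for p in referral_chain(friends, 5, rng)]
snowball_mean_ties = mean(len(friends[p]) for p in snowball_sample)

print(f"population mean ties: {population_mean_ties:.1f}")
print(f"snowball mean ties:   {snowball_mean_ties:.1f}")
```

Under these assumptions, people with many ties are referred far more often than their share of the population, so the referred group has more connections on average than the population does. In a real study the network is unobserved, which is why the selection probabilities remain unknown.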
Even studies intended to use probability sampling sometimes end up using nonprobability samples because of characteristics of the sampling method. For example, a sample of people in the paid labor force, when used to analyze the effect of education on earnings, is a nonprobability sample of the persons who could be in the paid labor force. Because the education people obtain could determine their likelihood of being in the paid labor force, the labor-force sample is technically a nonprobability sample for the question at issue. In such cases results are biased.
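A minimal simulation sketch of one way such bias can arise follows. It rests on assumptions that go beyond the text: earnings are a linear function of education plus random noise, and a person joins the paid labor force only when their potential earnings exceed a fixed reservation level, so that participation depends on education through earnings. All numeric values and the `ols_slope` helper are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)   # fixed seed so the sketch is reproducible
TRUE_EFFECT = 2.0        # assumed earnings gain per year of education
RESERVATION = 40.0       # assumed: people work only if potential earnings exceed this

education, earnings = [], []
for _ in range(20000):
    edu = rng.uniform(8, 20)                                 # years of education
    potential = 10.0 + TRUE_EFFECT * edu + rng.gauss(0, 10)  # potential earnings
    education.append(edu)
    earnings.append(potential)

# Keep only the people who, under the assumption above, are in the paid labor force.
in_labor_force = [y > RESERVATION for y in earnings]
edu_lf = [x for x, keep in zip(education, in_labor_force) if keep]
earn_lf = [y for y, keep in zip(earnings, in_labor_force) if keep]

slope_everyone = ols_slope(education, earnings)
slope_labor_force = ols_slope(edu_lf, earn_lf)

print(f"slope using everyone:         {slope_everyone:.2f}")
print(f"slope using labor force only: {slope_labor_force:.2f}")
```

Under these assumptions the slope estimated from the labor-force sample understates the effect built into the simulation, because low-education people appear in the sample only when their earnings happen to be unusually high.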
The statistical model one uses can also render the data a nonprobability sample. For example, Lucas (2014b) notes that several published studies that use multilevel modeling have been based on samples that are probability samples in general, but nonprobability samples for one or more of the levels of analysis in the study. Evidence indicates that in such cases the bias is poorly behaved, such that inferences from such analyses are unjustified.
These problems occur in the academic literature, but they may be more common in non-academic research. For example, in public opinion polling by private companies (or other organizations unable to require response), the sample can be self-selected rather than random. This often introduces an important type of error: self-selection bias. This error sometimes makes it unlikely that the sample will accurately represent the broader population. More importantly, this error makes it impossible to establish that the sample represents the broader population. Volunteering for the sample may be determined by characteristics such as submissiveness or availability. The samples in such surveys should be treated as nonprobability samples of the population, and the validity of the findings based on them is unknown and cannot be established.
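Self-selection bias can be sketched with a short simulation. The figures are assumptions chosen only for illustration: a population of 100,000 in which 40% support some proposal, supporters volunteering with probability 0.30, and opponents with probability 0.10.

```python
import random

rng = random.Random(0)       # fixed seed so the sketch is reproducible
TRUE_SUPPORT = 0.40          # assumed share of supporters in the population
P_RESPOND_SUPPORTER = 0.30   # assumed chance that a supporter volunteers
P_RESPOND_OPPONENT = 0.10    # assumed chance that an opponent volunteers

# True = supports the proposal, False = opposes it.
population = [rng.random() < TRUE_SUPPORT for _ in range(100_000)]
true_share = sum(population) / len(population)

# Probability sample: every person has the same known chance of selection.
random_sample = rng.sample(population, 1000)
random_estimate = sum(random_sample) / len(random_sample)

# Self-selected sample: the chance of responding depends on the opinion itself.
volunteers = [
    supports
    for supports in population
    if rng.random() < (P_RESPOND_SUPPORTER if supports else P_RESPOND_OPPONENT)
]
volunteer_estimate = sum(volunteers) / len(volunteers)

print(f"true share:          {true_share:.3f}")
print(f"random sample:       {random_estimate:.3f}")
print(f"self-selected group: {volunteer_estimate:.3f}")
```

The self-selected group is many times larger than the random sample, yet under these assumptions it overstates support while the random sample stays close to the true share. In a real poll the response probabilities are unknown, so the size and direction of the error cannot be recovered from the volunteer data alone.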
See also
- Sampling (statistics)
- Cluster sampling
- Judgment sample
- Multistage sampling
- Quota sampling
- Simple random sample
- Systematic sampling
- Stratified sampling
References
- Berg, Sven. (2006). "Snowball Sampling–I," pp. 7817–7821 in Encyclopedia of Statistical Sciences, edited by Samuel Kotz, Campbell Read, N. Balakrishnan, and Brani Vidakovic. Hoboken, NJ: John Wiley and Sons, Inc.
- Lucas, Samuel R. (2014a). "Beyond the Existence Proof: Ontological Conditions, Epistemological Implications, and In-Depth Interview Research." Quality & Quantity 48: 387–408. doi:10.1007/s11135-012-9775-3
- Lucas, Samuel R. (2014b). "An Inconvenient Dataset: Bias and Inappropriate Inference in the Multilevel Model." Quality & Quantity 48: 1619–1649. doi:10.1007/s11135-013-9865-x
- Marshall, Martin N. (1996). "Sampling for Qualitative Research." Family Practice 13: 522–526. doi:10.1093/fampra/13.6.522
- Small, Mario L. (2009). "‘How many cases do I need?’ On science and the logic of case selection in field-based research." Ethnography 10: 5–38. doi:10.1177/1466138108099586
- Wiederman, Michael W. (1999). "Volunteer bias in sexuality research using college student participants." Journal of Sex Research 36: 59–66. doi:10.1080/00224499909551968