Validity (statistics)
From Wikipedia, the free encyclopedia
- This article discusses validity in social sciences; for the term in logic see validity.
In statistics a valid measure is one which is measuring what it is supposed to measure. Validity implies reliability (consistency). A valid measure must be reliable, but a reliable measure need not be valid. Validity refers to getting results that accurately reflect the concept being measured.
In psychology, validity is the ability of a test to measure what it was designed to measure, and the degree to which the results of an experimental method lead to clear-cut conclusions (internal validity) and how far those conclusions can be generalized (external validity).
Validity can be assessed in a number of ways, though there are just two distinct "types" of validity: the validity of an experiment and the validity of an assessment method (e.g. structured interview, personality questionnaire, etc.). Validity is, first and foremost, a logical exercise rather than a computational endeavor. Establishing validity essentially means supporting the claim that the test measures or predicts the construct it purports to measure or predict. At the heart of any validity discussion must be the idea of construct validity, which will be discussed below. Another area of validity that must be considered is the validity of the criterion. This means taking a certain criterion (e.g. the personality trait "extraversion") and correlating your measure with a criterion measure known to be valid (e.g. the extraversion scale of the "Big Five"). When the criterion measure is collected at the same time as the measure being validated, the goal is to establish concurrent validity; when the criterion is collected later, the goal is to establish predictive validity. Similar to criterion validity is construct validity, where an investigator examines whether a measure is related to other variables as the underlying theory predicts.
Content validity estimates how well your test covers the domain you want to measure. Face validity is an estimate of how well a test appears to measure a certain criterion; it does not guarantee that the test actually measures phenomena in that domain.
According to classical test theory, predictive or concurrent validity (the correlation between the predictor and the predicted) cannot exceed the square root of the correlation between two versions of the same measure; that is, reliability limits validity.
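The ceiling that reliability places on validity can be illustrated with a small simulation. This is only a sketch under stated assumptions: the noise levels, sample size, seed, variable names, and the helper `pearson` are all invented for illustration. Two parallel forms of a test share one true score, and a criterion depends on the same true score. The bound is a population result, so in a finite sample it holds only up to sampling error.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed (assumption) so the run is repeatable
n = 5000                # hypothetical number of test takers

# Each observed score is a true score plus independent measurement error.
true_score = [rng.gauss(0, 1) for _ in range(n)]
form_a = [t + rng.gauss(0, 1) for t in true_score]       # test, version A
form_b = [t + rng.gauss(0, 1) for t in true_score]       # test, version B
criterion = [t + rng.gauss(0, 0.5) for t in true_score]  # criterion measure

reliability = pearson(form_a, form_b)   # correlation between the two versions
validity = pearson(form_a, criterion)   # correlation of predictor and criterion
ceiling = math.sqrt(reliability)        # upper bound on validity

print(f"reliability       = {reliability:.3f}")
print(f"validity          = {validity:.3f}")
print(f"sqrt(reliability) = {ceiling:.3f}")
```

Making the measurement error in `form_a` and `form_b` larger lowers the reliability, and the observed validity coefficient falls with it even though the underlying trait and criterion are unchanged.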
Internal validity
Internal validity is an estimate of how far your measurement rests on clean experimental technique, so that you can make clear-cut inferences about cause-and-effect relations. If you choose an experimental design with random assignment of subjects or, where that isn't possible, counterbalance for interfering variables, you get an experiment with high internal validity.
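Random assignment can be sketched in a few lines. The function name `randomly_assign`, the subject IDs, the condition labels, and the seed are hypothetical, chosen only for illustration. Subjects are shuffled and then dealt round-robin into the conditions, so group membership cannot depend on any characteristic of the subject.

```python
import random


def randomly_assign(subjects, conditions, seed=None):
    """Shuffle the subjects, then deal them round-robin into the conditions."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, subject in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(subject)
    return groups


subjects = [f"S{i:02d}" for i in range(1, 21)]  # hypothetical subject IDs
groups = randomly_assign(subjects, ["treatment", "control"], seed=42)

for condition, members in groups.items():
    print(condition, sorted(members))
```

Dealing round-robin after the shuffle keeps the group sizes equal (or within one of each other), which simple coin-flipping per subject would not guarantee.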
External validity
The issue of external validity concerns the extent to which one may safely generalize the conclusions derived from a statistical evaluation.
Ecological validity
This issue is closely related to external validity and covers the question of how far your experimental findings mirror what you can observe in the real world (ecology = the science of the interaction between an organism and its environment). Typically in science, you have two domains of research: passive-observational and active-experimental. The purpose of experimental designs is to test causality, so that you can infer that A causes B or B causes A. But sometimes ethical and/or methodological restrictions prevent you from conducting an experiment (e.g. how does isolation influence a child's cognitive functioning?). Then you can still do research, but it's not causal, it's correlational: A occurs together with B. Both techniques have their strengths and weaknesses. To get an experimental design you have to control for all interfering variables. That's why you conduct your experiment in a laboratory setting. While gaining internal validity (excluding interfering variables by keeping them constant) you lose ecological validity, because you establish an artificial lab setting. On the other hand, with observational research you can't control for interfering variables (low internal validity), but you can measure in the natural (ecological) environment, that is, at the place where behavior occurs.
Population validity
Construct validity
Intentional validity
Representation validity or translation validity
Face validity
Face validity is very closely related to content validity, though it should not be confused with it. Content validity depends on a theoretical basis for judging whether a test assesses all domains of a certain criterion (e.g. does assessing addition skills yield a good measure of mathematical skills? To answer this you have to know what different kinds of arithmetic skills mathematical skills include), whereas face validity relates to whether a test appears to be a good measure or not. This judgment is made on the "face" of the test, so it can also be made by an amateur.
Content validity
This is a non-statistical type of validity that involves “the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured” (Anastasi & Urbina, 1997, p. 114).
A test has content validity built into it by careful selection of which items to include (Anastasi & Urbina, 1997). Items are chosen so that they comply with the test specification, which is drawn up through a thorough examination of the subject domain. Foxcroft et al. (2004, p. 49) note that the content validity of a test can be improved by using a panel of experts to review the test specifications and the selection of items. The experts can review the items and comment on whether they cover a representative sample of the behaviour domain.
Observation validity
Predictive validity
Criterion validity
Concurrent validity
Convergent validity
Discriminant validity
Statistical conclusion
Factors jeopardizing internal and external validity
Campbell and Stanley (1963) define internal validity as the basic requirement for an experiment to be interpretable: did the experiment make a difference in this instance? External validity addresses the question of generalizability: to whom can we generalize this experiment's findings?
Eight extraneous variables can interfere with internal validity:
1. History, the specific events occurring between the first and second measurements in addition to the experimental variables.
2. Maturation, processes within the participants as a function of the passage of time (not specific to particular events), e.g., growing older, hungrier, more tired, and so on.
3. Testing, the effects of taking a test upon the scores of a second testing.
4. Instrumentation, changes in calibration of a measurement tool or changes in the observers or scorers may produce changes in the obtained measurements.
5. Statistical regression, operating where groups have been selected on the basis of their extreme scores.
6. Selection, biases resulting from differential selection of respondents for the comparison groups.
7. Experimental mortality, or differential loss of respondents from the comparison groups.
8. Selection-maturation interaction and similar interactions, e.g., in multiple-group quasi-experimental designs.
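Statistical regression (item 5 above) is easy to miss because it produces an apparent change with no treatment at all. The following is a minimal sketch under assumed numbers: the score scale, error size, 10% cutoff, seed, and variable names are all hypothetical. A group is selected for extreme pretest scores and then simply retested.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed (assumption) so the run is repeatable
n = 10000               # hypothetical number of participants

# Stable ability plus independent measurement error on each testing occasion.
ability = [rng.gauss(100, 10) for _ in range(n)]
pretest = [a + rng.gauss(0, 10) for a in ability]
posttest = [a + rng.gauss(0, 10) for a in ability]  # no treatment in between

# Select the participants with the most extreme (top 10%) pretest scores.
cutoff = sorted(pretest)[int(0.9 * n)]
selected = [i for i in range(n) if pretest[i] >= cutoff]

pre_mean = statistics.fmean([pretest[i] for i in selected])
post_mean = statistics.fmean([posttest[i] for i in selected])
overall_mean = statistics.fmean(posttest)

print(f"selected group, pretest mean  = {pre_mean:.1f}")
print(f"selected group, posttest mean = {post_mean:.1f}")
print(f"everyone, posttest mean       = {overall_mean:.1f}")
```

The selected group's retest mean moves back toward the overall mean although nothing was done to them. In a real study without a comparison group, that shift could be mistaken for a treatment effect.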
Four factors jeopardizing external validity or representativeness are:
9. Reactive or interaction effect of testing, a pretest might increase or decrease a participant's sensitivity or responsiveness to the experimental variable.
10. Interaction effects of selection biases and the experimental variable.
11. Reactive effects of experimental arrangements, which would preclude generalization about the effect of the experimental variable upon persons being exposed to it in non-experimental settings.
12. Multiple-treatment interference, where effects of earlier treatments are not erasable.
See also
- Validity (logic)
- External Validity
- Internal validity
- Ecological validity
- Construct Validity
- Discriminant Validity
- Concurrent Validity