Cohen's kappa

Cohen's kappa coefficient is a statistical measure of inter-rater agreement. It is generally thought to be a more robust measure than a simple percent-agreement calculation, since κ takes into account the agreement occurring by chance. Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories.

The equation for κ is:

\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}

where Pr(a) is the relative observed agreement among the raters, and Pr(e) is the hypothetical probability of chance agreement, calculated from each rater's observed frequencies for the various categories. If the raters are in complete agreement then κ = 1. If there is no agreement among the raters other than what would be expected by chance, then κ ≤ 0.
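
As a hypothetical worked example (the numbers are purely illustrative and not drawn from the references below), suppose two raters each classify the same 50 items as either "yes" or "no". They agree that 20 items are "yes" and that 15 items are "no"; overall the first rater says "yes" 25 times and the second rater 30 times. The observed agreement is

\Pr(a) = \frac{20 + 15}{50} = 0.7

The first rater says "yes" with relative frequency 0.5 and the second with relative frequency 0.6, so the probability that both would say "yes" by chance is 0.5 × 0.6 = 0.3 and the probability that both would say "no" by chance is 0.5 × 0.4 = 0.2, giving

\Pr(e) = 0.3 + 0.2 = 0.5

and therefore

\kappa = \frac{0.7 - 0.5}{1 - 0.5} = 0.4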

The seminal paper introducing kappa as a new technique was published by Jacob Cohen in the journal Educational and Psychological Measurement in 1960.

Note that Cohen's kappa measures agreement between two raters only. For a similar measure of agreement (Fleiss' kappa) used when there are more than two raters, see Fleiss (1981).

Significance

Landis and Koch[1] gave the following table for interpreting κ values. This table is, however, by no means universally accepted; Landis and Koch supplied no evidence to support it, basing it instead on personal opinion. It has been noted that these guidelines may be more harmful than helpful,[2] as the number of categories and subjects will affect the magnitude of the value; the kappa will be higher when there are fewer categories.[3]

κ            Interpretation
< 0          No agreement
0.00–0.20    Slight agreement
0.21–0.40    Fair agreement
0.41–0.60    Moderate agreement
0.61–0.80    Substantial agreement
0.81–1.00    Almost perfect agreement
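
The computation, together with the table above, can be illustrated with a minimal self-contained Python sketch. The functions cohen_kappa and landis_koch_label below are illustrative names rather than part of any particular library; the sketch assumes the two raters' judgements are given as parallel lists of category labels, and the data reproduce the hypothetical example given earlier.

    from collections import Counter

    def cohen_kappa(labels_a, labels_b):
        """Cohen's kappa for two raters who each label the same items."""
        n = len(labels_a)
        # Pr(a): relative observed agreement.
        pr_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Pr(e): chance agreement from each rater's marginal category frequencies.
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        categories = set(labels_a) | set(labels_b)
        pr_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
        return (pr_a - pr_e) / (1 - pr_e)

    def landis_koch_label(kappa):
        """Descriptive band from the Landis and Koch table above."""
        if kappa < 0:
            return "No agreement"
        bands = [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
                 (0.80, "Substantial"), (1.00, "Almost perfect")]
        for upper, label in bands:
            if kappa <= upper:
                return label + " agreement"

    # Ratings matching the worked example: 20 yes/yes, 5 yes/no, 10 no/yes
    # and 15 no/no items, giving kappa = 0.4 up to floating-point rounding.
    rater1 = ["yes"] * 25 + ["no"] * 25
    rater2 = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
    k = cohen_kappa(rater1, rater2)
    print(round(k, 3), landis_koch_label(k))  # 0.4 Fair agreement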

References

  • Cohen, J. (1960) "A coefficient of agreement for nominal scales" in Educational and Psychological Measurement, Vol. 20, No. 1, pp. 37–46.
  • Fleiss, J. L. (1981) Statistical methods for rates and proportions, 2nd ed. (New York: John Wiley) pp. 212–236 (Chapter 13: The measurement of interrater agreement).

Notes

  1. Landis, J. R. and Koch, G. G. (1977), pp. 159–174.
  2. Gwet, K. (2001).
  3. Sim, J. and Wright, C. C. (2005), pp. 257–268.

References

  • Fleiss, J. L. (1971) "Measuring nominal scale agreement among many raters" in Psychological Bulletin, Vol. 76, No. 5, pp. 378–382.
  • Fleiss, J. L. (1981) Statistical methods for rates and proportions, 2nd ed. (New York: John Wiley) pp. 38–46.
  • Fleiss, J. L. and Cohen, J. (1973) "The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability" in Educational and Psychological Measurement, Vol. 33, pp. 613–619.
  • Gwet, K. (2001) Statistical Tables for Inter-Rater Agreement. (Gaithersburg: StatAxis Publishing).
  • Landis, J. R. and Koch, G. G. (1977) "The measurement of observer agreement for categorical data" in Biometrics, Vol. 33, pp. 159–174.
  • Scott, W. (1955) "Reliability of content analysis: The case of nominal scale coding" in Public Opinion Quarterly, Vol. 17, pp. 321–325.
  • Sim, J. and Wright, C. C. (2005) "The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements" in Physical Therapy, Vol. 85, pp. 257–268.
