Top-coded


In econometrics and statistics, a top-coded dataset is one in which values of a variable above a chosen upper bound are censored: they are recorded only as being at or above that bound, and their exact values are not reported. This is often done to preserve the anonymity of survey participants. For example, if a survey reported a respondent with wealth of $51 billion, that respondent would not be anonymous, because people would know it is Bill Gates.
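
As a rough formalization of this definition (the symbols here are illustrative assumptions, not notation from a particular source): let y_i^* be the true value for respondent i and c the top-coding threshold. The released value y_i and a top-coding indicator d_i can then be written as:

```latex
y_i = \min\left(y_i^{*},\, c\right), \qquad d_i = \mathbf{1}\left\{\, y_i^{*} \ge c \,\right\}
```

When d_i = 1, the analyst knows only that the true value is at least c.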

Example: Top-coding of income

id  age  income  note
1   26   24778   exact value
2   32   26750   exact value
3   45   26780   exact value
4   32   30000+  top-coded
5   45   30000+  top-coded
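
The table above could be produced from raw survey data along the following lines. This is a minimal sketch: the threshold of 30000 is read off the table, the rule "at or above the threshold" is an assumption about what "30000+" means, and the raw incomes for respondents 4 and 5 are hypothetical values invented for illustration (the real values are, by construction, not in the released data).

```python
THRESHOLD = 30000  # assumed top-coding threshold, matching the table


def top_code(value, threshold=THRESHOLD):
    """Return (released_value, is_top_coded) for a single raw value."""
    if value >= threshold:
        return threshold, True
    return value, False


# (id, age, raw income); incomes for ids 4 and 5 are hypothetical
raw = [
    (1, 26, 24778),
    (2, 32, 26750),
    (3, 45, 26780),
    (4, 32, 41250),
    (5, 45, 98000),
]

# Released rows: (id, age, released income, top-coded flag)
released = [(pid, age, *top_code(income)) for pid, age, income in raw]

for pid, age, income, flagged in released:
    label = f"{income}+" if flagged else str(income)
    note = "top-coded" if flagged else "exact value"
    print(pid, age, label, note)
```

Keeping an explicit flag alongside the released value lets later analysis distinguish a true income of exactly 30000 from a top-coded one.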

Implications for ordinary least squares

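  • If the top-coded variable is used as a regressor with the threshold (30000 in the example above) substituted for every top-coded value, the OLS estimates are biased and inconsistent.
  • The top-coded observations can instead be omitted from the regression entirely. Provided the top-coded variable is a regressor and the relationship is the same in the omitted and included groups, OLS on the remaining observations is consistent and unbiased. This does not carry over to a top-coded dependent variable, because dropping those observations selects the sample on the outcome.
  • When the dependent variable is top-coded, the Tobit model (Tobin 1958) treats the top-coded values as censored. Estimated by maximum likelihood, it gives consistent estimates if the errors are normally distributed and homoskedastic; it is not robust to violations of those assumptions.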
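
The first two points can be illustrated with a small simulation. The sketch below is not taken from the cited reference; the data-generating process (a true slope of 2, normally distributed regressor and noise, a threshold of 30 in arbitrary units) and the helper `ols_slope` are assumptions chosen only for illustration. It compares the OLS slope on the uncensored regressor, on the regressor with the threshold substituted for top-coded values, and on the sample with top-coded observations dropped.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


random.seed(0)
THRESHOLD = 30.0   # assumed top-coding threshold for the regressor
TRUE_SLOPE = 2.0   # assumed true coefficient in the simulation

# Simulated data: y = 1 + 2x + noise, with x observed exactly
x = [random.gauss(25.0, 10.0) for _ in range(20000)]
y = [1.0 + TRUE_SLOPE * xi + random.gauss(0.0, 10.0) for xi in x]

# (a) Benchmark: OLS on the uncensored regressor (unavailable in practice)
slope_full = ols_slope(x, y)

# (b) Substitute the threshold for every top-coded value of x
x_sub = [min(xi, THRESHOLD) for xi in x]
slope_sub = ols_slope(x_sub, y)

# (c) Drop the top-coded observations entirely
kept = [(xi, yi) for xi, yi in zip(x, y) if xi < THRESHOLD]
slope_drop = ols_slope([k[0] for k in kept], [k[1] for k in kept])

print(f"full data:       {slope_full:.3f}")
print(f"threshold used:  {slope_sub:.3f}")
print(f"top-coded dropped: {slope_drop:.3f}")
```

Under these assumptions the substituted-threshold slope comes out noticeably above the true value of 2, while the slope from the reduced sample stays close to it, at the cost of discarding observations.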

References

  • Tobin, James (1958). "Estimation of relationships for limited dependent variables". Econometrica 26 (1): 24–36.

