De-anonymization

De-anonymization (also spelt as deanonymization) is a strategy in data mining in which anonymous data is cross-referenced with other sources of data to re-identify the anonymous data source.

More and more data are becoming publicly available over the Internet. Before release, these data are anonymized, for example by removing personally identifiable information (PII) such as names, addresses and social security numbers, to protect the privacy of the data sources. This assurance of privacy allows the government to legally share limited data sets with third parties without requiring written permission. Such data have proved very valuable for researchers, particularly in health care. However, as the Netflix contest dramatically revealed, so much data remains available even after anonymization that a specific individual’s identity can be rediscovered.
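The cross-referencing idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are invented, and the choice of ZIP code, birth date and sex as the linking fields (often called quasi-identifiers) is hypothetical rather than taken from any particular study.

```python
# Hypothetical "anonymized" release: names removed, other fields kept.
anonymized_records = [
    {"zip": "02139", "birth_date": "1970-01-15", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1982-06-30", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "94110", "birth_date": "1990-11-02", "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public source that still carries names (e.g. a public register).
public_records = [
    {"name": "Alice Example", "zip": "02139", "birth_date": "1970-01-15", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1982-06-30", "sex": "M"},
    {"name": "Brian Example", "zip": "02139", "birth_date": "1982-06-30", "sex": "M"},
    {"name": "Dana Example", "zip": "60601", "birth_date": "1975-03-03", "sex": "F"},
]

# Fields assumed to appear in both sources.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link_records(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Return (name, record) pairs where exactly one named person matches."""
    # Index the named source by its quasi-identifier values.
    index = {}
    for person in public:
        key = tuple(person[k] for k in keys)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for record in anonymized:
        candidates = index.get(tuple(record[k] for k in keys), [])
        # Only an unambiguous match counts as a re-identification.
        if len(candidates) == 1:
            matches.append((candidates[0], record))
    return matches


reidentified = link_records(anonymized_records, public_records)
for name, record in reidentified:
    print(name, "->", record["diagnosis"])
```

With this invented data only the first record is linked to a name: the second matches two people and is therefore ambiguous, and the third has no counterpart in the public source.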

The term became popular in 2006 when Arvind Narayanan and Vitaly Shmatikov applied their de-anonymization techniques to the data set released for a contest hosted by Netflix and successfully identified the records of a number of specific members.[1][2][3]
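Sparse data sets such as ratings histories lend themselves to matching on partial knowledge. The following is a simplified sketch of that general idea, not the algorithm published by Narayanan and Shmatikov: the data, the rarity weighting, the rating tolerance and the `min_gap` threshold are all assumptions chosen for illustration.

```python
import math

# Hypothetical anonymized release: record ID -> {item: rating}.
released = {
    "user_17": {"Popular Movie": 4, "Obscure Film A": 5, "Obscure Film B": 2},
    "user_42": {"Popular Movie": 4, "Obscure Film C": 3},
    "user_99": {"Popular Movie": 5, "Obscure Film A": 1},
}

# Hypothetical auxiliary knowledge about one person, e.g. from public reviews.
auxiliary = {"Popular Movie": 4, "Obscure Film A": 5, "Obscure Film B": 3}


def item_weights(records):
    """Weight each item so that rarely rated items count for more."""
    support = {}
    for ratings in records.values():
        for item in ratings:
            support[item] = support.get(item, 0) + 1
    return {item: 1.0 / math.log(1 + count) for item, count in support.items()}


def score(aux, ratings, weights, tolerance=1):
    """Sum the weights of auxiliary items whose rating roughly agrees."""
    return sum(
        weights[item]
        for item, rating in aux.items()
        if item in ratings and abs(ratings[item] - rating) <= tolerance
    )


def best_match(aux, records, min_gap=0.5):
    """Return the best-scoring record ID, or None if it does not stand out.

    Assumes at least two records, so that a runner-up exists.
    """
    weights = item_weights(records)
    ranked = sorted(
        ((score(aux, ratings, weights), record_id)
         for record_id, ratings in records.items()),
        reverse=True,
    )
    (top_score, top_id), (runner_up_score, _) = ranked[0], ranked[1]
    # Require a clear margin over the runner-up to avoid false matches.
    return top_id if top_score - runner_up_score >= min_gap else None


print(best_match(auxiliary, released))
print(best_match({"Popular Movie": 4}, released))
```

In this toy data the three approximate ratings single out `user_17`, while knowledge of the popular item alone matches every record equally well, so the second call returns `None`. Requiring a margin over the runner-up is a design choice that trades fewer matches for fewer false identifications.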

Some de-anonymizations

References

  1. Margaret Rouse. "de-anonymization (deanonymization)". WhatIs.com. Retrieved 19 January 2014.
  2. Arvind Narayanan and Vitaly Shmatikov. "Robust De-anonymization of Large Sparse Datasets". Retrieved 19 January 2014.
  3. Arvind Narayanan and Vitaly Shmatikov. "How To Break Anonymity of the Netflix Prize Dataset". arXiv.org. Retrieved 19 January 2014.
  4. Larry Hardesty. "How hard is it to 'de-anonymize' cellphone data?". MIT News. Retrieved 14 January 2015.
  5. Melissa Gymrek, Amy L. McGuire, David Golan, Eran Halperin, and Yaniv Erlich. "Identifying Personal Genomes by Surname Inference". Science. Vol. 339, no. 6117, pp. 321-324. 18 January 2013.
