De-anonymization

De-anonymization (also spelt deanonymization) is a strategy in data mining in which anonymous data is cross-referenced with other sources of data to re-identify the anonymous data source. The term became popular in 2006 when Arvind Narayanan and Vitaly Shmatikov entered a contest hosted by Netflix and applied their de-anonymization techniques to identify the Netflix records of a number of specific members.[1][2][3]

More and more data are becoming publicly available over the Internet. Before release, these data are anonymized, for example by removing personally identifiable information (PII) such as names, addresses and social security numbers, to protect the sources' privacy. This assurance of privacy allows the government to legally share limited data sets with third parties without requiring written permission. Such data have proved very valuable to researchers, particularly in health care. However, as the Netflix contest dramatically revealed, so much data remains available even after anonymization that a specific individual’s identity can be rediscovered.
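The cross-referencing idea described above can be illustrated with a small Python sketch. It is a deliberately simplified toy, not the algorithm Narayanan and Shmatikov used: it counts exact rating matches between an "anonymized" release and auxiliary records posted under real names, and accepts a link only when the best candidate clearly beats the runner-up. All names, film titles, ratings, thresholds, and function names here are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple

Ratings = Dict[str, int]

# Hypothetical "anonymized" release: real identities replaced by opaque tokens.
anonymized: Dict[str, Ratings] = {
    "user_a91": {"Film A": 5, "Film B": 2, "Film C": 4, "Film D": 1},
    "user_c07": {"Film A": 3, "Film E": 5, "Film F": 4},
    "user_f33": {"Film B": 2, "Film C": 4, "Film G": 5},
}

# Hypothetical auxiliary data: ratings published elsewhere under real names.
public: Dict[str, Ratings] = {
    "Alice Example": {"Film A": 5, "Film C": 4, "Film D": 1},
    "Bob Example": {"Film E": 5, "Film F": 4},
}


def overlap_score(known: Ratings, candidate: Ratings) -> int:
    """Count titles that appear in both records with the same rating."""
    return sum(1 for title, rating in known.items()
               if candidate.get(title) == rating)


def best_match(known: Ratings, release: Dict[str, Ratings],
               min_score: int = 2, min_margin: int = 1
               ) -> Optional[Tuple[str, int]]:
    """Return (token, score) of the best candidate, or None if the match
    is too weak or not clearly better than the second-best candidate."""
    scored = sorted(
        ((overlap_score(known, record), token)
         for token, record in release.items()),
        reverse=True,
    )
    if not scored:
        return None
    top_score, top_token = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0
    if top_score < min_score or top_score - runner_up < min_margin:
        return None
    return top_token, top_score


def link_records(aux: Dict[str, Ratings],
                 release: Dict[str, Ratings]) -> Dict[str, str]:
    """Map each named auxiliary record to the release token it links to."""
    links: Dict[str, str] = {}
    for name, known in aux.items():
        match = best_match(known, release)
        if match is not None:
            links[name] = match[0]
    return links


if __name__ == "__main__":
    for name, token in link_records(public, anonymized).items():
        print(f"{name} is probably {token}")
```

The margin check is the important design choice in this sketch: a record that matches several candidates equally well is left unlinked rather than guessed, so removing names alone offers little protection only when the remaining attributes are distinctive enough to single out one candidate.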

Examples of de-anonymization

References

  1. Margaret Rouse. "de-anonymization (deanonymization)". WhatIs.com. Retrieved 19 January 2014.
  2. Arvind Narayanan and Vitaly Shmatikov. "Robust De-anonymization of Large Sparse Datasets" (PDF). Retrieved 19 January 2014.
  3. Arvind Narayanan and Vitaly Shmatikov. "How To Break Anonymity of the Netflix Prize Dataset". arXiv.org. Retrieved 19 January 2014.
  4. Larry Hardesty. "How hard is it to 'de-anonymize' cellphone data?". MIT News. Retrieved 14 January 2015.
  5. Gymrek, Melissa; McGuire, A. L.; Golan, David; Halperin, Eran; Erlich, Yaniv (18 January 2013). "Identifying Personal Genomes by Surname Inference". Science. 339 (6117): 321–324. doi:10.1126/science.1229566.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.