Talk:Record linkage
From Wikipedia, the free encyclopedia
Record linkage and deduplication are NOT the same thing. The first is linking two or more datasets, the second is removing duplicate entries in a single dataset. It's helpful (often very important) to DEDUPLICATE before attempting a RECORD LINKAGE.
Using your definition, deduplication is a simple(r) instance of record linkage. In deduplication the "two" datasets are the same and have the same structure in terms of fields, something that is not always the case with record linkage. The two terms are often used interchangeably (and many other terms are also used to refer to the same concept, which is kind of ironic if you think about it). Ipeirotis 04:46, 1 February 2007 (UTC)
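To make the distinction in the two comments above concrete, here is a minimal Python sketch. It is only an illustration: the function names (`normalize`, `link`, `find_duplicates`), the field names, and the sample records are all hypothetical, and the exact-match-after-normalization comparison stands in for whatever matching rule a real system would use.

```python
from itertools import combinations, product

def normalize(name):
    """Crude comparison key (an assumption): lowercase, collapse whitespace."""
    return " ".join(name.lower().split())

def link(left, right, left_field, right_field):
    """Record linkage: index pairs (i, j) where left[i] matches right[j].

    The two datasets may use different field names.
    """
    return [
        (i, j)
        for (i, l), (j, r) in product(enumerate(left), enumerate(right))
        if normalize(l[left_field]) == normalize(r[right_field])
    ]

def find_duplicates(records, field):
    """Deduplication: the same comparison, applied within one dataset."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if normalize(a[field]) == normalize(b[field])
    ]

# Hypothetical sample data with different field names per dataset.
customers = [{"name": "Ann Lee"}, {"name": "ann  lee"}, {"name": "Bob Ray"}]
patients = [{"full_name": "BOB RAY"}, {"full_name": "Cy Dole"}]

cross_links = link(customers, patients, "name", "full_name")
duplicates = find_duplicates(customers, "name")

# Deduplication seen as linking a dataset with itself, ignoring
# self-matches and mirrored pairs.
self_links = [(i, j) for i, j in link(customers, customers, "name", "name") if i < j]
```

Under these assumptions, `self_links` and `duplicates` contain the same pairs, which is the sense in which deduplication can be read as a special case of record linkage.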
Separately, "deduplication" has taken on a life of its own in backup and archival storage, where it names a storage optimization technique and has a different meaning. This page may need a link that explains that usage and points to the definition currently titled Capacity Optimization. For examples of this usage, see:
http://searchsecurity.techtarget.com/tip/0,289483,sid5_gci1187934,00.html
http://www.networkworld.com/news/2006/091806-storage-deduplication.html
http://enterprisestorageforum.webopedia.com/TERM/d/data_deduplication.html
- I agree that "deduplication" is also used in the storage world. Big storage vendors (such as NetApp) use the term "deduplication" when referring to some of their storage optimization products. See [1], for example. In fact, I found this very WP page while searching for storage deduplication technologies. Gigglesworth 22:12, 26 July 2007 (UTC)
Why was the deduplication entry overwritten by something that could easily have been linked as a reference to the actual subject matter? What's been changed here is not reference material; it force-feeds just one of many ways of approaching this issue. I'd prefer it changed back to what we had previously, and for some of the slightly over-zealous editing to cease. The page has gone from a focused summary of the subject as a whole, with links and related terms, to something needlessly over-complicated; what's here isn't an improvement. If people would consider the user base as a whole when updating, it would be appreciated.