Similarity Enhanced Transfer

From Wikipedia, the free encyclopedia

Similarity-Enhanced Transfer (SET) is a technique for improving the speed at which peer-to-peer file sharing and content distribution systems can share data. SET works by identifying chunks of identical data in files that are an exact or near match to the requested file, and transferring those chunks to the client if the exact file is not present.

Method

SET uses a technique called handprinting,[1] which is based on earlier techniques known as "shingling" that have been used to filter junk e-mail, to find files that contain some of the data in the file that a file-sharing program has requested. The SET system computes a handprint for each file and can take chunks of data from files that are either identical or merely similar to the one being sought. The lower the similarity threshold that SET searches for, the more sources for that data are likely to be found. The extra overhead of locating these sources does not outweigh the benefit of using them to help saturate the recipient's available bandwidth;[1] indeed, exploiting similar sources can significantly improve download time.
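The handprinting idea can be illustrated with a minimal Python sketch. It is not the researchers' implementation: it assumes fixed-size chunks, SHA-1 chunk hashes, and a handprint made of the few smallest chunk hashes, and the names chunk_hashes, handprint, looks_similar and usable_chunks, along with the constants, are hypothetical. The chunking method and parameters used in the cited paper may differ.

```python
import hashlib

CHUNK_SIZE = 16      # assumed chunk size in bytes (tiny, for illustration only)
HANDPRINT_SIZE = 4   # assumed number of chunk hashes kept per handprint


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-1 hex digest."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def handprint(data: bytes, k: int = HANDPRINT_SIZE) -> set[str]:
    """Keep the k smallest distinct chunk hashes as a deterministic sample."""
    return set(sorted(set(chunk_hashes(data)))[:k])


def looks_similar(wanted: bytes, candidate: bytes) -> bool:
    """Treat two files as similar if their handprints share at least one hash."""
    return bool(handprint(wanted) & handprint(candidate))


def usable_chunks(wanted: bytes, candidate: bytes) -> set[str]:
    """Hashes of the chunks of the wanted file that the candidate could supply."""
    return set(chunk_hashes(wanted)) & set(chunk_hashes(candidate))
```

Because every file is sampled in the same sorted order, two files that share many chunks are likely to share a handprint entry, so a candidate source can be found by comparing a few hashes rather than every chunk. In this sketch, a smaller handprint is cheaper to compare but is more likely to miss files that share only a few chunks.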

In tests, SET improved the transfer time of an MP3 music file by 71%, and a 55 MB movie trailer transferred 30% faster when the researchers' techniques drew on movie trailers that were 47% similar. SET could help most with less popular files; it is not believed to improve transfer rates much for popular data, which a large number of people are already downloading. Experiments suggest that in other cases SET can help considerably.[1]

History

SET was developed by Professor David Andersen of Carnegie Mellon University, Ph.D. student Himabindu Pucha of Purdue University, and Dr. Michael Kaminsky of Intel Research Pittsburgh. Andersen believes that developers could use the technique immediately and apply it to the BitTorrent file sharing system.[2]

Application areas

SET could be used to improve the speed of:

References

  1. Himabindu Pucha, David G. Andersen, Michael Kaminsky (April 2007). Exploiting Similarity for Multi-Source Downloads Using File Handprints. Purdue Univ., Carnegie Mellon Univ., Intel Research Pittsburgh. Retrieved on 2007-04-15.
  2. "Speed boost plan for file-sharing", BBC News Online-Technology, BBC News, 2007-04-12. Retrieved on 2007-04-13.
