Plagiarism detection

With the advent of the Internet, it has never been easier for students to plagiarize the work of others, and many teachers are looking for efficient ways to fight plagiarism. Several approaches exist.

The use of search engines

An obvious first approach to detecting plagiarism is to use a search engine: search for keywords or key sentences taken from the suspect text, in the hope of finding similar texts on the Internet.

This method can be useful when a student has copied a whole article, but it quickly becomes ineffective when the plagiarist has used only parts of articles or has mixed several sources. It is also quite time-consuming.
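
Part of this manual search can nevertheless be automated. The following sketch, a hypothetical helper not taken from any particular tool, picks the most distinctive word sequences from a suspect text; these can then be pasted into any search engine as quoted, exact-match queries:

    import re
    from collections import Counter

    def candidate_queries(text, phrase_len=8, num_queries=5):
        """Pick distinctive phrases from a suspect text to use as search queries.

        Long, rare word sequences make better queries than single keywords.
        """
        words = re.findall(r"[a-z']+", text.lower())
        word_freq = Counter(words)
        # Slide a window over the text and rank each phrase by the rarity
        # of the words it contains (rarer words contribute more).
        scored = []
        for i in range(len(words) - phrase_len + 1):
            window = words[i:i + phrase_len]
            rarity = sum(1.0 / word_freq[w] for w in window)
            scored.append((rarity, " ".join(window)))
        scored.sort(reverse=True)
        # Return the most distinctive phrases, quoted for exact-match search.
        return ['"%s"' % phrase for _, phrase in scored[:num_queries]]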

Plagiarism detection software

As the phenomenon has grown, many software packages have been designed to facilitate plagiarism detection.

These packages range from basic comparison of two or more documents to more sophisticated tools that can locate plagiarized sources on the Internet. They handle a number of document formats, the main ones being Word, PDF, and HTML.
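
A minimal sketch of the basic two-document comparison, using word n-gram "shingles" and Jaccard set similarity, is shown below. This is one common approach, not the method of any specific package, and the file names in the example are placeholders:

    import re

    def shingles(text, n=4):
        """Break a document into its set of overlapping word n-grams."""
        words = re.findall(r"[a-z']+", text.lower())
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def similarity(doc_a, doc_b, n=4):
        """Jaccard similarity of the two documents' shingle sets, 0.0 to 1.0."""
        a, b = shingles(doc_a, n), shingles(doc_b, n)
        if not a or not b:
            return 0.0
        return len(a & b) / len(a | b)

    # A high score flags shared passages that deserve human review.
    score = similarity(open("essay1.txt").read(), open("essay2.txt").read())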

These packages fall into two categories:

  • those running on a remote server
  • those installed on the user's computer and working locally.

The first kind of plagiarism detection software is generally the more powerful, because it can draw on a large reference database of possible plagiarism sources. Moreover, each new document submitted for analysis can be added to the database, allowing it to grow.

However, some regard this feature as a violation of students' copyright.

A number of such packages, both commercial and free, are in common use.

Plagiarism detection algorithms

The Rabin–Karp algorithm finds a substring within a text using hashing. One of its main applications is plagiarism detection, where the sheer number of strings to search for makes single-pattern algorithms impractical. MOSS is based on this approach, with improvements such as winnowing, which selects a representative fingerprint from each window of hashes.
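
The sketch below shows both ideas in Python: a Rabin–Karp rolling hash over character k-grams, followed by a winnowing pass that keeps one fingerprint per window. It is a simplified illustration of the published winnowing scheme, not MOSS's actual code:

    def rolling_hashes(text, k=5, base=256, mod=(1 << 31) - 1):
        """Rabin-Karp: hash every k-gram of text in O(len(text)) total work
        by updating each hash from the previous one instead of rehashing."""
        if len(text) < k:
            return []
        h = 0
        for ch in text[:k]:
            h = (h * base + ord(ch)) % mod
        hashes = [h]
        high = pow(base, k - 1, mod)  # weight of the outgoing character
        for i in range(k, len(text)):
            h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
            hashes.append(h)
        return hashes

    def winnow(hashes, window=4):
        """Keep the minimum hash in each window of consecutive hashes,
        recording each selected fingerprint once with its position."""
        fingerprints = set()
        for i in range(max(1, len(hashes) - window + 1)):
            block = hashes[i:i + window]
            j = block.index(min(block))
            fingerprints.add((i + j, block[j]))
        return fingerprints

Two documents that share a fingerprint share at least one k-gram, so comparing fingerprint sets is a cheap way to flag candidate matches for closer inspection.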

Another class of plagiarism detection techniques employs distance-calculation algorithms that originated in bioinformatics, where they are used to match patterns in the human genome.
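
For instance, Smith–Waterman local alignment, a classic genome-matching algorithm, can be applied to word sequences to score the best-matching shared passage even when it has been lightly paraphrased. The sketch below is a generic textbook implementation, not drawn from any particular plagiarism tool:

    def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
        """Smith-Waterman over word tokens: score of the best-matching local
        region shared by sequences a and b, tolerating small insertions,
        deletions, and substitutions."""
        prev = [0] * (len(b) + 1)
        best = 0
        for i in range(1, len(a) + 1):
            curr = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                score = match if a[i - 1] == b[j - 1] else mismatch
                curr[j] = max(0,                    # local alignment may restart
                              prev[j - 1] + score,  # match or substitution
                              prev[j] + gap,        # gap in b
                              curr[j - 1] + gap)    # gap in a
                best = max(best, curr[j])
            prev = curr
        return best

    # "quick brown" is the shared passage; the score here is 2 matches * 2 = 4.
    score = local_alignment_score("the quick brown fox".split(),
                                  "a quick brown dog".split())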

There are now several Web-based tools to aid the detection of plagiarism and duplicate publication in the biomedical literature. One tool, developed in 2006 by researchers in Dr. Harold Garner's laboratory at the University of Texas Southwestern Medical Center at Dallas, is Déjà Vu, an open-access database containing several thousand instances of duplicate publication.
