Wikipedia:Spotting possible copyright violations

Shortcuts:
WP:SPCP
WP:SPCV

This is a guide to spotting violations of the Wikipedia copyright policy that are simple cut-and-pastes from other websites. Please remember to assume good faith and to avoid copyright paranoia when doing the important work of keeping Wikipedia compliant with the GFDL.

Signs that an article might be cut-and-pasted

There are a number of signs that an article might be cut-and-pasted. None of these are conclusive evidence, but more than one of these signs tends to be apparent in a cut-and-pasted article.

Indicative, but by no means conclusive signs:

  • The text is not wikified or is over-wikified, with every occurrence of a word or phrase made into a wiki link (as if search-and-replace had been used to insert the links)
  • The text was added all at once by one person in finished form with no spelling or other errors.
  • The writing style is "too good to be true"
  • The text has a strange tone of voice, such as an overly informal tone or a very slanted marketing voice with weasel words
  • The text contains non-standard characters such as Microsoft "smart quotes" (note that these may simply mean the text was drafted offline in Microsoft Word or another word processor)

Strong signs of copy and pasting:

  • Out of context phrases like "this site/page/book/whitepaper"
  • Isolated or out-of-context words or phrases such as "top", "go to top", "next page", or "click here" that were originally part of the navigation structure of the source website
  • Use of trademark signs (™,®) and similar typical signs of commercial text
  • The writing style is one that rarely occurs outside of a specific, invariably copyrighted, use, e.g., an advertisement or press release
  • The questionable contribution is from a user who has a history of violating copyright

Irrefutable evidence

  • Pages which exhibit the above characteristics, and include the original site's copyright notice, copied intact!
  • A copy of the page source, including links to other pages on the same server which would not occur on Wikipedia or a wiki (e.g., a link to /home/news/latest.html)
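Several of the textual signs listed above can be checked mechanically. The following is a minimal sketch in Python, not an existing Wikipedia tool: the `SIGNS` table and the `find_signs` function are hypothetical names, and the patterns are rough assumptions drawn from the phrases in the lists above. A match is only a hint that a page deserves a closer look, never proof of a violation.

```python
import re

# Hypothetical pattern table. The phrases come from the lists above;
# the names and the exact regular expressions are assumptions.
SIGNS = {
    "smart quotes": re.compile(r"[\u2018\u2019\u201c\u201d]"),
    "trademark sign": re.compile(r"[\u2122\u00ae]"),
    "navigation residue": re.compile(
        r"\b(?:go to top|next page|click here)\b", re.IGNORECASE
    ),
    "self-reference": re.compile(
        r"\bthis (?:site|page|book|whitepaper)\b", re.IGNORECASE
    ),
    "copyright notice": re.compile(
        r"\u00a9|\(c\)\s*\d{4}|\ball rights reserved\b", re.IGNORECASE
    ),
    # Links such as /home/news/latest.html that point at the source server.
    "server-relative link": re.compile(
        r"""(?:href|src)\s*=\s*["']/[^"']*""", re.IGNORECASE
    ),
}


def find_signs(text):
    """Return the sorted names of the signs whose pattern occurs in text."""
    return sorted(name for name, pattern in SIGNS.items() if pattern.search(text))


if __name__ == "__main__":
    sample = (
        "Welcome to this site! Click here for AcmeWidget\u2122. "
        '<a href="/home/news/latest.html">News</a>'
    )
    for sign in find_signs(sample):
        print(sign)
```

Because legitimate contributions can trigger any of these patterns (smart quotes from a word processor, for instance), the output is best treated as a prompt for the manual check described in the next section.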

Checking it out

Once alerted by one or more of these suspicious signs, you can check the article by copying a sentence, or a non-trivial sentence fragment, that is unlikely to occur by chance in many documents and pasting it into a search engine. You should then check the matching pages, if any, for further correspondence to the submitted article. Be aware that many sites mirror content from Wikipedia, so a search engine may find several sites with the exact content. Those sites should list Wikipedia as the source of the article.
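The search step above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: `pick_fragment` and `exact_phrase_url` are hypothetical helpers, `SEARCH_URL` is a placeholder rather than a real search engine address, and taking the longest sentence is only a crude stand-in for judging which fragment is "unlikely to be found by chance".

```python
import re
from urllib.parse import urlencode

# Placeholder endpoint (an assumption); substitute the search engine you use.
SEARCH_URL = "https://search.example/search"


def pick_fragment(text, max_words=12):
    """Return the first max_words words of the longest sentence in text."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    longest = max(sentences, key=lambda s: len(s.split()))
    return " ".join(longest.split()[:max_words])


def exact_phrase_url(fragment):
    """Build a query URL that searches for the fragment as a quoted phrase."""
    return SEARCH_URL + "?" + urlencode({"q": '"' + fragment + '"'})


if __name__ == "__main__":
    article = (
        "Short one. The quick brown fox jumps over the lazy dog "
        "near the old stone bridge at dawn. End."
    )
    print(exact_phrase_url(pick_fragment(article)))
```

Wrapping the fragment in double quotes asks for an exact-phrase match in most search engines. Hits still need to be read by hand, since some of them may be Wikipedia mirrors rather than the original source.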

For extra thoroughness, you may also want to check out the "groups" option in Google, to check that the article is not copied from Usenet.

Images taken from other websites are often uploaded here under their original filenames. Hence, if you suspect an image to be a copyright violation, you can try searching Google Images for the filename to check whether the same image appears on other websites. Even if the image was uploaded under a different name, a Google Images search for relevant search terms might help you find the original.

If you suspect that a page is a copyright infringement

It is not the job of rank-and-file Wikipedians to police every article for possible copyright infringement, but if you suspect one, you should at the very least bring up the issue on that page's talk page. Others can then examine the situation and take action if needed. The most helpful piece of information you can provide is a URL or other reference to what you believe may be the source of the text.

  • Remember: please don't bite the newbies. Many cut-and-paste contributors may not understand that what they are doing is wrong, and some may turn into valuable contributors if educated rather than punished. You can use the user's talk page to discuss your concerns with them. The {{nothanks}} template may be useful for this.
  • Some cases will be false alarms. For example, if the contributor was in fact the author of the text that is published elsewhere under different terms, that does not affect their right to post it here under the GFDL. Material from public domain resources is sometimes republished with unclear or misleading copyright notices which may obscure the origin. An article from another language's Wikipedia might be translated and published here (bringing with it seemingly suspicious anomalies, particularly if the contributor's understanding of English and/or wikification is limited). Also, sometimes you will find text elsewhere on the Web that was copied from Wikipedia. In these cases, it is a good idea to make a note in the talk page to discourage such false alarms in the future.
  • Please see the Wikipedia copyright policy document for what to do in difficult cases, such as where a user continues to post copyrighted material in spite of warnings.

See also: Wikipedia:Boilerplate request for permission, Wikipedia:Confirmation of permission
