Wikipedia:WikiProject Spam


Shortcut: WP:WPSPAM

As Wikipedia grows in popularity, the temptation to misuse its editability to bring attention to other websites becomes nearly irresistible. At one end of the spectrum are professional spammers seeking to drive traffic to commercial sites. At the other end are webmasters of simple community sites who just want more attention for their site. This potential for self-promotion on Wikipedia must be managed. Wikipedia is not a link repository. Wikipedia exists for the purpose of creating a collaboratively edited encyclopedia, not for any individual to promote a site that they have an interest in.

This problem is only going to get worse. As search engine optimization becomes more prevalent, many website operators will seek to use Wikipedia to increase the number of inbound links to their sites. To combat link spam on Wikipedia, the process needs to become more streamlined.

Currently, link spammers benefit greatly from the lack of cohesion in the spam-fighting process. It is possible to successfully sneak links into relatively unwatched articles. Such links may lie unexamined for months, gaining the appearance of legitimacy from having remained in the articles so long. When spam links are reverted, there is little communication among the editors involved. Spammers can return and re-add links once the article is being watched by other editors who do not know their history of agenda-driven editing. And spammers love to take advantage of the fact that Wikipedians assume good faith, luring us into discussing their links with them "on the merits" as if they had nothing but the good of Wikipedia at heart.

We propose the creation of a voluntary link-spam-fighting brigade. Our purpose will be to develop standards and processes for recognizing, hunting down, and eliminating link spam, to streamline communication between those who want to watch over articles to prevent it, and to send a message, through our actions and effectiveness, that link spammers are fighting a war they cannot win.

If you would like to participate, please add your name to the sign-up list. We also encourage you to join in editing this page so we can grow toward consensus about the best way to fight link spam. You are welcome to describe your own ongoing efforts to fight link spam on the talk page, so that we can quickly become aware of users who are acting with an agenda to promote an external site.


Removal how-to

Dealing with inappropriate links has several facets. This guide breaks the process into a number of steps. Most editors will want to complete the first step; editors interested in doing a more thorough job should follow through with the additional measures.

  1. Revert and warn the user: when new spam appears on your watchlist, the easiest way to remove it is to select the diff link, then select the edit link above the left-hand column, add an appropriate edit summary, preview the changes and finally save the page. If you come upon an article with spam, check the recent diffs (the "last" links) in the page history to see whether it was added recently in a way that damaged the article; if so, revert the changes to restore the article to its previous state. It is important to warn the user, which will likely stop the spamming or at least establish a history of problematic edits. To warn the user, go back to your watchlist or the page history, select the Talk link associated with the editor, and add {{subst:uw-spam1}} ~~~~ to that page. If the user already has one spam warning, add {{subst:uw-spam2}} ~~~~; if two, {{subst:uw-spam3}} ~~~~; if three, {{subst:uw-spam4}} ~~~~. At this point the task is done, but to see whether the user added the same link to other articles, go on to step 2.
  2. Check the user's contributions: a user will often add the same link to multiple articles, which is often confirmation that the user is not editing in good faith. To check for this type of activity, select the contribs link (or, for anonymous users, the IP address link) from your watchlist or an article's history. This shows all the other edits the user recently made, and selecting each diff link shows whether the same link has been added to other articles. If inappropriate links are found, revert them as in step 1; the user only needs to be warned once, unless they have spammed again since the last warning.
  3. Check for similar links: a crafty spammer hides spamming by using multiple accounts. This step involves finding all of the articles that contain a link to a particular site. If a link to www.example.com was discovered and removed in steps 1 and 2, the next step is to use the linksearch tool to find all articles that contain such links. You may enter www.example.com in the search box, but consider entering *.example.com instead, because this finds not only www.example.com but also ads.example.com and any other subdomain that might have been used. The linksearch tool is found in the Special pages list, which is linked from the toolbox of every page.
  4. Identify the spammer: the search in step 3 reveals which articles contain the links but not which editor added them. To find out, go to the article history and expand it to show 500 entries. Check whether the link is present in the oldest revision in the list. If so, select the previous (older) 500 changes and check again, repeating until you find the block of changes in which the link appeared. To find the exact edit where the link was added, check the revision in the middle of that block. If the link is not present there, it was added in the newer edits above; otherwise it was added in the older edits below. Check the middle of the appropriate half in the same manner. Because each check halves the range, this divide-and-conquer method finds the exact point of insertion quickly. The edit summary often includes /* External links */, which can help pinpoint the edit. Once the edit is found, go back to step 1 and start cleaning up after this editor.
  5. Persistent spammers: if an active spammer continues adding links after a {{subst:uw-spam4}} warning, report this user to the administrators at the intervention against vandalism page.
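The divide-and-conquer search in step 4 is an ordinary binary search over the page history. The sketch below is a minimal illustration, assuming the revision texts have already been fetched into a list ordered oldest to newest (fetching them is out of scope) and that the link was added once and never removed and re-added; `find_insertion` is a hypothetical helper, not part of any Wikipedia tool.

```python
from typing import Optional, Sequence


def find_insertion(revisions: Sequence[str], link: str) -> Optional[int]:
    """Return the index of the revision that first contains `link`.

    `revisions` holds revision texts ordered oldest to newest. Assumes the link,
    once added, was never removed and re-added (otherwise the search may land on
    any absent-to-present transition, not necessarily the first one).
    Returns None if the newest revision does not contain the link.
    """
    if not revisions or link not in revisions[-1]:
        return None
    lo, hi = 0, len(revisions) - 1  # revisions[hi] is known to contain the link
    while lo < hi:
        mid = (lo + hi) // 2
        if link in revisions[mid]:
            hi = mid  # present: the link was added at mid or earlier
        else:
            lo = mid + 1  # absent: the link was added after mid
    return lo
```

Each probe halves the remaining range, so a block of 500 revisions needs at most nine checks, which is why the manual method in step 4 is quick even on long histories.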



To-do

The To-do list

Regular sources of spam-removal tasks

Some articles that need external links trimmed

  • I'm not feeling very bold, but would a classic "drop-the-bomb" be appropriate on Genealogy software? It looks like something from Consumer Reports (Without any real substance though)! 68.39.174.238 18:39, 28 January 2007 (UTC)
    • I just offered to do just that on the talk page. Hopefully, one of the contributors to the article will be motivated to clean it up. --Selket 08:22, 3 February 2007 (UTC)
I think a lot of the problem is/was the useless "explanatory" text that was added to the links. The "alphabetical list", however, is still in need of work. 68.39.174.238 13:41, 10 February 2007 (UTC)
Book sources looks OK for now. If you think otherwise, let's discuss. DGG 05:05, 22 January 2007 (UTC)
"Suppliers" MAY be removable as SPAM (The nonsearchable/deeplinkable ones especially), but the rest seem legit. 68.39.174.238 01:50, 8 February 2007 (UTC)
Please note that these two articles are outside article space, and other rules may apply. These pages are linkfarms, but with a specific target and thought. If the ISBN search capability is removed per WP:NOT#REPOSITORY (not a directory), article space will see more spam, which has to be cleaned out. Now both are bad, but at least the list pages outside article space are manageable. --Dirk Beetstra T C 10:34, 10 February 2007 (UTC)
  • I don't want to be the one to do it, but Theology is in pretty bad shape. Selket Talk 07:20, 7 February 2007 (UTC)
I have done some, will you review them to see if I took out something illegitimate? 68.39.174.238 01:53, 8 February 2007 (UTC)
I just took a pretty sharp ax to it again. Have a look; I think some more could still go. --Selket Talk 23:21, 11 February 2007 (UTC)
Down to two, I took out the "conservative Calvinist" — Unless we can link to a broad and representative selection of all theologies (Which I suspect would be an even worse linklist) I don't think we should give preferential linking to any part of it. 68.39.174.238 03:51, 12 February 2007 (UTC)

Recently cleaned articles:

  • Video game music - Was tagged with {{Cleanup-spam}} tag - have cleaned up links and untagged. Have also added a few references (i.e. where external links were more appropriate as references). -- Rehnn83 Talk 09:41, 3 April 2007 (UTC)
  • Widow spider - Was tagged with {{Cleanup-spam}} tag - have cleaned up links and untagged -- Rehnn83 Talk 08:38, 3 April 2007 (UTC)

Some sites that need investigating

  • I may be off-side, so I want someone experienced with how people creatively spam on Wikipedia to evaluate this: Template:McGrawHillAnimation. It looks to me like a company, www.maxanim.com (search for links), is inserting a GoogleAds dummy page between the science-related wiki pages and the desired flash animations from McGraw-Hill. The trouble is, the animations contain helpful scientific content, but we no longer have the direct links available on the science-related wiki pages, only this template, which links through MaxAnimations (i.e. the 'spammer', if it is one, replaces something like http://highered.mcgraw-hill.com/olc/dl/120078/micro50.swf with {{McGrawHillAnimation|genetics|microarray}}). Jethero 18:33, 1 April 2007 (UTC)
I've commented out the template code to disable all these things. The landing page has nothing but ads. This is a totally unacceptable user experience. See Template:McGrawHillAnimation. The user who installed a lot of these, User:Arcadian, seems to be a legitimate contributor. I think we've been had by Maxanim. They posted link bait and then switched it out for ads. Jehochman (Talk/Contrib) 13:53, 5 April 2007 (UTC)
  • www.lodgephoto.com (search for links): They have lots of nice photos, sorted by location, which are probably useful and interesting to people wanting to see photos of those locations. And yet, they are posting here because they want to sell their photos. Or maybe they just like taking photos, and selling them is how they fund themselves. It's a borderline case, but I think it would be a slippery slope to leave them as is. As nice as the photos are, it's too commercial for my liking. PS. Check out the new template! Regards, Ben Aveling 02:47, 4 February 2007 (UTC)
  • exoticindiaart.com - basically selling crafts
I checked out many of these links and they seem to be legitimate references, not spam or even hawking anything. DUBJAY04 19:17, 13 January 2007 (UTC)
Keep going down the list. The last 40 or 50 of the 138 entries are products/items for sale (sometimes sold). But many of the pages on this site have been linked to from 5 or 6 wikipedia pages. With 138 different links to this site, my AGF exhausts. Regards, Ben Aveling 21:19, 6 February 2007 (UTC)
  • www.eoearth.org Encyclopedia of Earth; this is another encyclopedia website launched several months ago that has articles similar to WP. A group of users (mostly anonymous) are adding external links en masse to the corresponding pages in their encyclopedia on WP (nearly 100 at last count). Many have been warned User talk:128.197.34.220, User talk:KonaScout. They appear to be using WP to promote their new website. A number of other editors and I have removed some of these links, but there are plenty others out there. Calltech 23:08, 21 November 2006 (UTC)
  • Another anonymous user User_talk:69.182.174.152 joined in today to add a number of links to eoearth.org. Removed these links and placed a message on talk page but I'm sure they'll simply use another IP or identifier. Calltech 00:55, 27 November 2006 (UTC)
Removed a few more today. -- Satori Son 19:36, 13 December 2006 (UTC)
  • This is a decent site. I would keep these links if they are references or if they link to an article which is significantly better than Wikipedia's. Otherwise, we can delete them. These links shouldn't be deleted simply because they link to a competitor to Wikipedia. Unfortunately, this site uses Creative Commons, not the GFDL, so we can't simply put their articles on Wikipedia. Andrew_pmk | Talk 02:04, 3 January 2007 (UTC)
  • Unfortunately, this is about several users who were systematically adding links to EoE, using WP as a promotional platform to this site. Bulk additions of links violate WP guidelines and adding links to a site where there is an affiliation is a conflict of interest. All links to EoE were not removed, just those added in the manner described above. The site has not been blacklisted; it is simply being watched to ensure the link spamming does not continue. Calltech 11:44, 5 January 2007 (UTC)
  • Dozens of extlinks to David Pietrusza's site which has linkdirs on various subjects. I've removed some of them but am not sure of the best way to handle this. Also I removed several dozen inappropriate links to dorothyparker.com mostly promoting "walking tours" of Dorothy Parker's old literary hangouts, that were in many articles related to Parker's literary circle. I left in a few which were outside article space or arguably met WP:EL guidelines, but the owner of that site (K72ndst (talk contribs)) restored a bunch of them and there was a reversion contest (he's backed off for now), so someone might want to keep an eye on it (linksearch). Note that dorothyparker.com is not Parker's personal site (she is dead). The owner claims it's an "official" site but this strikes me as dubious--her entire estate went to the NAACP. I removed the link from Parker's biographical page and (after K72ndst reverted the removal) I removed it again and left K72ndst a talk message asking him to supply documentation before restoring the link. 67.117.130.181 04:49, 6 December 2006 (UTC)
  • www.emedicine.com seems to be popping up everywhere. The unobtrusive ads are not bad in and of themselves, but it fails WP:EL#Links_normally_to_be_avoided #1 in most articles it is linked from. -Selket Talk 08:05, 12 February 2007 (UTC)
  • www.chabad.org The Chabad-Lubavitch organisation has some provable authority and chabad.org does appear to have an editorial policy so is suitable as an attributable source for their views, but the Lubavitchers are a very small group within Judaism, certainly well short of the level of influence that would justify nearly 650 links in mainspace. These need a careful review and some pretty ruthless pruning. Guy (Help!) 23:13, 8 March 2007 (UTC)
  • Amazon.com. Yup, you read that right. There are literally thousands of links to amazon.com, almost all of which should not be there. Either we should be using the ISBN syntax, or they are links to book cover images, which are being used as references for trivial facts (which is original research). The major problem is that these links can be subverted with referral IDs. Guy (Help!) 12:52, 18 March 2007 (UTC)
    • Yes, they shouldn't be here. But often not spam as much as people who have no idea how to write a citation and link to the amazon page for the book instead. Septentrionalis PMAnderson 20:24, 30 March 2007 (UTC)

Users to check out

I checked out the link and it did violate WP:EL. Removed and left warning with user. -- Satori Son 07:39, 13 December 2006 (UTC)
  • This anonymous user has almost 100% spam contributions for FHM and the IP itself originates from the FHM office in NYC. -- Tomlouie | talk 17:29, 15 August 2006 (UTC)
  • Special:Contributions/Tangelise - all "contributions" are for promoting FBi Radio. Camillus (talk) 13:24, 25 October 2006 (UTC)
  • User:193.122.103.201, the chemistry lab SPAMmer! 68.39.174.238 15:05, 12 January 2007 (UTC)
  • User:Cada2 linking to http://www.magistermusicae.com/magister-musicae/frontpage.html in pages he is creating, many of which seem probably inclusion worthy, but the link probably isn't. GRBerry 19:14, 9 February 2007 (UTC)
Not only that, he's doing the same thing on the Spanish Wikipedia as Cada (talk • contribs • page moves • block log; local account: User:Cada), where it has been shown that his additions are copyright violations! 68.39.174.238 22:02, 9 February 2007 (UTC)
  • Special:Contributions/Davemckay - This user has been editing pages and creating pages in order to link to his website. For example, he created The Untimely Meditations in order to link to his own website's hosted version of a translation taken from the living translator's website, in possible violation of that translator's copyright. Apparently user needs basically all of his edits reverted (and presumably new stubs deleted?), and a stern spam warning on his talk page. I need some help from you WikiSpam folks if this is going to happen soon. Wareh 17:09, 20 February 2007 (UTC)
I just now finally finished cleaning out a huge quantity of linkspam. I got rid of linkspam from Davemckay that came from four IP addresses and two user accounts. See my contributions between then and now for the whole long list; see Davemckay's talk page for the IP addresses and other user account. Wareh 03:03, 23 February 2007 (UTC)

Watchlists

Lists of popular articles:

These are also frequently vandalized.

Technology articles are often prone to spam, as are lists, both stand-alone and embedded.

Here are lists to watch for recent changes:

Informal watchlists:

Standards

The number one rule for Project members is this code of honor: "I will never insert links to my own sites into Wikipedia's article space." Not only is Conflict of interest a generally accepted guideline among editors, but many of us who run websites are too invested in their success (however we define it) to judge impartially whether they belong in an article. Moreover, we are actively reverting self-promotional links added by other editors, some of whom view those additions as sincere attempts to serve various communities. It is easier to gain the respect of these people if we hold ourselves to the highest possible standard and avoid any appearance of double standards or hypocrisy.

Tag 'em to stop 'em

Suspicious edits automatically deserve a {{subst:uw-spam1}} tag on the user's talk page, with spam or {{uw-spam1}} in the edit summary. This is important! First, it drives home the message that spam is not welcome here; second, it warns us of repeat offenders. If they come back months later, there will be a record of their behavior. Placing the warning tag takes little more effort than removing the spam itself, and can really help prevent the spam from returning. Successive violations of the spam policy can be met with {{subst:uw-spam2}}, {{subst:uw-spam3}} and then {{subst:uw-spam4}} on the user's talk page. If a violation occurs after the fourth warning, report the offending user at the Administrator intervention against vandalism page.
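The escalation order is simple enough to write down as a lookup. This is a minimal sketch for anyone scripting their own reminders: `next_step` and `SPAM_WARNINGS` are hypothetical names, and counting a user's prior spam warnings is assumed to be done by hand from their talk page.

```python
# Hypothetical helper: the template names come from the text above; the
# function itself only illustrates the escalation order.
SPAM_WARNINGS = ["uw-spam1", "uw-spam2", "uw-spam3", "uw-spam4"]


def next_step(prior_warnings: int) -> str:
    """Return the wikitext to add to the user's talk page, or a reminder to report."""
    if prior_warnings < 0:
        raise ValueError("prior_warnings must be zero or more")
    if prior_warnings < len(SPAM_WARNINGS):
        # Concatenation avoids str.format's special handling of {{ and }}.
        return "{{subst:" + SPAM_WARNINGS[prior_warnings] + "}} ~~~~"
    return "Report at Administrator intervention against vandalism"
```

For example, `next_step(0)` gives `{{subst:uw-spam1}} ~~~~`, and any count of four or more falls through to the reporting reminder.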

How to identify spam and spammers

  1. User is anonymous (an IP address)
  2. User:page and/or User_talk:page are red links
  3. No edit summary (other than, perhaps /* External links */)
  4. User has made only one edit, which consisted of inserting a link
  5. User has made multiple edits to related articles
  6. The majority of user's edits are to external links sections
  7. The link is a site that has Google/Yahoo ads (AdSense/SM).
  8. Edits are marked "minor"
  9. Link is trying to sell a product or service. You can use Microsoft's Detecting Online Commercial Intention Tool to help you with the determination.
  10. User adds links to the top of a section, above far more relevant sites
  11. User replaces an existing link or part of an existing link.
  12. The syntax of the added link does not match the syntax used in the rest of the list
  13. User adds links to inappropriate sections of articles ("References", "See also", "For more information")
  14. User adds links that have been previously removed, without discussing on the talk page.
  15. Following a link takes you to a site that does not mention the specific topic of the page containing the link.
  16. Link is unrelated, or only marginally related to the article. For example, link on a biography to a specific page on a genealogy site describing the person's genealogy, but not the person.
  17. User adds links to other Wikipedia articles where he/she has already placed spam links.
  18. User includes within the link description, "hosted on example.com" with a separate link to example.com.
  19. Link is mangled, or it took many edits to get the syntax right. The spammer may be new to Wikipedia and not be familiar with Wikipedia syntax for external links.
  20. Text of the link goes beyond describing the contents to actively encouraging you to read it. For example, including text such as, "Read more about [subject] in [this fascinating article]"
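A few of the signals above can be checked mechanically once an edit's details are in hand. The sketch below covers only items 1, 3, 4 and 8 and simply reports which ones match; the `Edit` record and its fields are hypothetical, and no threshold is implied, since the list is a set of hints for a human reviewer rather than a verdict.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass
class Edit:
    user: str               # account name or IP address
    summary: str            # edit summary as saved
    minor: bool             # whether the edit was marked minor
    user_edit_count: int    # total edits by this user
    added_links: List[str] = field(default_factory=list)  # external links the edit added


def is_ip(user: str) -> bool:
    """True if the user name is an IPv4 or IPv6 address (an anonymous editor)."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def matched_signals(edit: Edit) -> List[str]:
    """Return the checklist items (a subset: 1, 3, 4 and 8) that this edit matches."""
    signals = []
    if is_ip(edit.user):
        signals.append("1: anonymous user")
    if edit.summary.strip() in ("", "/* External links */"):
        signals.append("3: no edit summary")
    if edit.user_edit_count == 1 and edit.added_links:
        signals.append("4: only edit inserts a link")
    if edit.minor:
        signals.append("8: marked minor")
    return signals
```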

Common spammer strawmen

Spammers will offer arguments like the following. These are strawman arguments, for the reasons listed.

  • "But you have links to commercial sites in the list."
    • Spamming is about promoting your own site or a site you love, not about commercial sites at all. Links to commercial sites are often appropriate. Links to sites for the purpose of using Wikipedia to promote your site are not.
  • "But you have links to other sites that people have added for self-promotion."
    • Those need to go, too. The fact that we haven't gotten around to it, yet, does not mean that we have some obligation to have your site.
  • "But you have a link to site Y, and my site is just like that."
    • We don't need to link to every site in existence that meets a certain criterion. Sometimes we just need one site representative of a category. (See also the comments about linking to web directories instead, so that Wikipedia does not become a web directory.)
  • "But these links have been here for a long time."
    • There are no binding decisions on Wikipedia, especially when the decision was never discussed on the talk page. Just because nobody noticed your spam a long time ago does not mean you now have a "right" to keep it in.
  • "My link is very unique."
    • It is more likely that the link they have added has no more information than the Wikipedia article itself.
  • "My site is non-commercial, so it's not spamming." (Similarly "nonprofit", "charitable", "opposes cruelty to puppies", etc.)
    • It doesn't matter--being noncommercial (etc.) doesn't confer a license to spam even when it's true, and these sites are often trying to sell something even if the business is organized as a nonprofit.

Assuming good faith

Assuming good faith is an important policy of Wikipedia, but does not require that you assume good intentions when there is evidence to the contrary. Link spamming behavior fits a definite profile. When editors meet this profile, they are engaging in activity which is detrimental to Wikipedia, no matter how sincere they may have been in their edits. We should develop responses to those who engage in this behavior which encourage them to reform into productive Wikipedians, but we should waste no time in protecting Wikipedia from the damaging behavior through reverts and blocks where necessary.

Regular clean-out of undiscussed links

In some articles, several editors go in every few days and remove any undiscussed external links. Call it quick and easy "house cleaning." To encourage sincere links, they leave this edit summary:

Regular clean-out of undiscussed links. Please come to Talk page if you want a link not to be cleaned out regularly.

One could easily start this strategy in any article by adding {{subst:Discuss links here}} to its talk page. The plan is to discourage people whose sole intention is self-promotion.

Also, add commented-out warnings to the External links section of the articles, themselves:

<!-- ATTENTION! Please do not add links without discussion and consensus on the talk page. Undiscussed links will be removed. -->

For this purpose the Template:NoMoreLinks has been created.
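The clean-out itself amounts to keeping only links that have been agreed on the talk page. This is a minimal sketch, assuming the External links section is a bulleted list of bracketed links and that the set of discussed URLs is maintained by hand; `clean_out` is a hypothetical helper, bare URLs are not handled, and an editor should still review the result before saving.

```python
import re
from typing import Set

# A bracketed external link such as [http://www.example.com Example site];
# group 1 captures the URL. Bare URLs are not handled in this sketch.
EXT_LINK = re.compile(r"\[(https?://[^\s\]]+)[^\]]*\]")


def clean_out(section: str, discussed: Set[str]) -> str:
    """Drop bullet lines whose external links have not all been agreed on the talk page."""
    kept = []
    for line in section.splitlines():
        urls = EXT_LINK.findall(line)
        is_bullet = line.lstrip().startswith("*")
        if is_bullet and urls and not all(url in discussed for url in urls):
            continue  # undiscussed link: leave it out
        kept.append(line)  # comments, headings, text and discussed links stay
    return "\n".join(kept)
```

Non-bullet lines, including the commented-out warning, pass through unchanged, so the notice stays in place after each clean-out.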

The strategy is used in the following articles:

This strategy is also helpful to deal with POV and conspiracy links:

What to do with linkfarms

  • Wikipedia policy states that Wikipedia is not a web directory of anything. Sometimes the easiest and best approach is to replace the link farm with a reference to a web directory, such as the Open Directory Project ({{dmoz}}) or the Yahoo! Directory ({{yahoo directory}}). For example, see my edits to Model United Nations and Online shopping directory. It works! Check the date: until now, no one has added links to these two pages. --Perfecto 03:39, 16 January 2006 (UTC)
  • One good thing spammers do is find us overlapping product or company lists in several articles (which they create themselves, sadly). For example, one of them helped me find overlapping link farms in Friendster, Social network service and Social network. I found several sites linked in all three! The solution is to put these farms together in one article, and then say "For links to so-and-so sites, see:" on the rest. After this, I find, they start leaving the other articles alone. --Perfecto 22:34, 8 February 2006 (UTC)
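Spotting overlapping link farms is a matter of counting which sites appear in more than one article. The sketch below assumes each article's external links have already been collected into a list; `shared_hosts` is a hypothetical helper, the URLs in any test data are placeholders, and it compares exact host names, so www.example.com and example.com are counted separately.

```python
from collections import Counter
from typing import Dict, List
from urllib.parse import urlsplit


def shared_hosts(articles: Dict[str, List[str]], min_articles: int = 2) -> Dict[str, int]:
    """Map each host linked from at least `min_articles` articles to its article count."""
    counts: Counter = Counter()
    for urls in articles.values():
        hosts = {urlsplit(url).hostname for url in urls}  # hostname is lower-cased, or None
        hosts.discard(None)
        counts.update(hosts)  # each host counted once per article
    return {host: n for host, n in counts.items() if n >= min_articles}
```

Hosts that come back with a count equal to the number of articles are the ones linked from all of them, which are good candidates for consolidating into a single article.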

The Campaign

We would want a concerted viral marketing strategy involving

and a dash of mentions in help pages, FAQs and fixup templates.

Guidelines and policies

Templates

Spam warnings
Article tags
Policy & Project

{{subst:spam}} and related

These templates should be substituted ({{subst:Uw-spam}}, etc) as per WP:SUBST.

{{Cleanup-spam}}

{{Cleanup-spam}}, which I began, might be useful. See Wikipedia:Spam for more details. -- Perfecto 04:03, 6 December 2005 (UTC)

{{subst:WPSPAM-invite}}

Saw someone revert or remove linkspam? Invite the comrade here with {{subst:WPSPAM-invite}} placed on their User talk page. -- Perfecto 04:27, 30 December 2005 (UTC)

A souped-up alternative: {{subst:WPSPAM-invite-n}}. Λυδαcιτγ 23:00, 18 February 2007 (UTC)

Standardised edit summary

HorsePunchKid suggests a standardised edit summary to raise awareness both of the problem and this particular effort:

Removed link spam. Wikipedia is [[WP:NOT|NOT]] a link directory. Join [[Wikipedia:WikiProject Spam]] to help!

Perfecto uses the following:

Removed [[WP:EL|external link]] [[WP:SPAM|spam]]. ([[WP:WPSPAM|You can help!]])

--Aude suggests:

Removed [[WP:EL|external link]] added by [[User talk:69.159.82.252|69.159.82.252]]. Wikipedia is [[WP:NOT|NOT]] a link directory. ([[WP:WPSPAM|WikiProject Spam]])
Substitute the IP address or username as appropriate.

TheJabberwʘck suggests (for users of popups):

Reverted [[WP:EL|external link]] addition by [[Special:Contributions/<user>|<user>]] to version %s, using [[:en:Wikipedia:Tools/Navigation_popups|popups]]. Wikipedia is [[WP:NOT|NOT]] a link directory. ([[WP:WPSPAM|you can help!]])

These edit summaries help drive a concerted viral marketing strategy.
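Filling in the user name by hand is easy to get wrong when reverting many edits. This is a minimal sketch that turns Aude's summary into a template with a single placeholder; `REMOVAL_SUMMARY` and `removal_summary` are hypothetical names, and the link to this project is written with a single pipe.

```python
# Aude's summary text, with the user name turned into a placeholder.
REMOVAL_SUMMARY = (
    "Removed [[WP:EL|external link]] added by [[User talk:{user}|{user}]]. "
    "Wikipedia is [[WP:NOT|NOT]] a link directory. ([[WP:WPSPAM|WikiProject Spam]])"
)


def removal_summary(user: str) -> str:
    """Fill in the IP address or user name of the editor who added the link."""
    return REMOVAL_SUMMARY.format(user=user)
```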

Recognition

The coveted Spamstar of Glory is awarded to those who show strong contributions to tracking down and stopping spammers as well as cleaning up their links. Introduced on November 8, 2006 by A. B., it originally consisted of a nicely Photoshopped can of Hormel spam superimposed on a barnstar. Later, due to concerns about infringing on Hormel's trademark, the award was changed to the current design, adapted from The RickK Anti-Vandalism Barnstar.

The Spamstar of Glory
Presented to {{{1}}} for diligence in fighting spam on Wikipedia

List of the proud few awarded this distinctive honor to date

Participants

See the list of participants. You can sign up and help us fight spam on Wikipedia!
As of March 2007 we have over 200 participants.

Userbox

Participants may add this to their userpage instead of signing up.

Code: {{User WikiProject Spam}}
Results in: This user is a member of WikiProject Spam.

If you prefer not to use userboxes, you may add yourself directly to Category:WikiProject Spam Members by placing the following code on your user page: [[Category:WikiProject Spam Members|{{PAGENAME}}]]

Tools

  • Special:Linksearch - find all external links to a particular site, useful when a spam link is added by many different IP addresses or accounts.
  • To combat repeat offenders, you may request to have links added to the Wikimedia-wide spam blacklist.
  • Watch the link addition feed in #wikipedia-en-spam. There is a bot on there that reports all newly added links and keeps track of serial spammers.
  • Daily digests of the logs from the linkwatcher, to see how many times each link was added and by whom: User:Veinor/Link count (today's page: here, or here if the previous is a redlink).
  • If no one is around to add something to the spam blacklist, contact User:Shadow1, User:Eagle 101 or User:Nick and request that the URL be added to Shadowbot's blacklist for automated reversion. You can normally reach all of us at #wikipedia-spam-t on freenode.
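When working through a batch of URLs offline, it can help to apply the same kind of wildcard used in Special:Linksearch. This sketch treats `*.example.com` as matching example.com itself and any subdomain, and a pattern without `*.` as matching that exact host; it approximates, and does not claim to reproduce, how Special:Linksearch interprets patterns. `matches_pattern` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def matches_pattern(url: str, pattern: str) -> bool:
    """Check a URL's host against a pattern such as "*.example.com" or "www.example.com"."""
    host = (urlsplit(url).hostname or "").lower()  # hostname is None for malformed URLs
    pattern = pattern.lower()
    if pattern.startswith("*."):
        base = pattern[2:]
        # The leading "." stops notexample.com from matching *.example.com.
        return host == base or host.endswith("." + base)
    return host == pattern
```

For example, `[u for u in urls if matches_pattern(u, "*.example.com")]` keeps ads.example.com and www.example.com links while leaving notexample.com alone.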