User talk:Lupin/badwords

From Wikipedia, the free encyclopedia

A curious question

I may, perhaps, be harder to offend than the average American, but how is "all the pies" considered a "bad word"? :) - JustinWick 08:34, 31 January 2006 (UTC)

See Who Ate All the Pies? - it's a "classic" playground football insult. Lupin|talk|popups 16:19, 31 January 2006 (UTC)
Wow, an informative response! Thanks, I learned something! - JustinWick 05:41, 12 February 2006 (UTC)

Suggestion

You should add more variations of the bad words. I can think of some you may have missed. Evan Robidoux 09:11, 24 February 2006 (UTC)

What are you thinking of? I can add them in. -Mysekurity 09:25, 24 February 2006 (UTC)
  • Image:Human_feces.jpg
  • suks
  • kill, especially with exclamation points.
  • Variations of the word "die," especially with exclamation marks (e.g.: "Die!")

That's all I can think of right now. Evan Robidoux 09:42, 24 February 2006 (UTC)

Another suggestion

Terms you've missed are permutations of a, s, d, and f. On a QWERTY keyboard, if you mash the keys, most people end up writing "asdasdasdf" or similar. Vandal edits usually give an edit summary of mashed keys.-- Alfakim --  talk  18:02, 13 April 2006 (UTC)
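
One way to turn this suggestion into a single entry is a character class with a minimum run length, written in the list as /[asdf]{6,}/. The Perl sketch below only illustrates the idea: the run length of 6 is an arbitrary assumption chosen so that short real words such as "fads" or "sass" are not flagged, the helper name looks_mashed is hypothetical, and the \b...\b wrapping with /i mimics the treatment of entries described in the Regexp section below rather than reproducing the tool's code.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains a whole "word" made only of
# the home-row keys a, s, d and f, at least 6 characters long.
# The \b...\b wrapping and /i mimic how badwords entries are said to be
# treated; the threshold of 6 is an assumption.
sub looks_mashed {
    my ($text) = @_;
    return $text =~ /\b(?:[asdf]{6,})\b/i ? 1 : 0;
}

for my $sample ('asdasdasdf', 'ASDFASDF', 'fads', 'salads') {
    printf "%-12s %s\n", $sample, looks_mashed($sample) ? 'flagged' : 'ok';
}
```

Raising the minimum run length trades missed short mashes for fewer false positives on real words.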

Actually, vandals usually give no edit summary, or only a section edit summary. This is probably because most of them are new users who haven't noticed the summary box.--Reverting 02:49, 6 June 2006 (UTC)

Regexp

Is it possible for this to support regexps? It seems to me that a good number of these entries could also match good-faith edits (see this diff, where the word vegan was added...). -Mysekurity[m!] 21:17, 27 April 2006 (UTC)

Yes, I've had a go at this. Note that ( is replaced with (?: - the idea is that all paren groups are transformed into non-capturing parens so that it doesn't mess up script internals. This means that backreferences aren't possible and also that you should avoid opening parens apart from using them for grouping at the moment. Also, each regexp is treated as if it's surrounded by word boundary markers, it is made case-insensitive, and flags aren't supported. To add a regexp, surround it by forward slashes and add it to badwords. I haven't tested this much, so let me know how you fare... Lupin|talk|popups 02:13, 28 April 2006 (UTC)
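
The transformation described above can be sketched in Perl (the language of the findbad.pl script further down this page). This is not the tool's actual code, which runs in the browser; compile_entry is a hypothetical name, and the paren rule here is a simplified assumption that skips escaped "\(" and groups already starting with "(?", where the description says every ( is rewritten.

```perl
use strict;
use warnings;

# Sketch: turn a badwords entry written as /.../ into a compiled,
# case-insensitive regexp with word boundaries on both sides and every
# grouping paren made non-capturing.
sub compile_entry {
    my ($line) = @_;
    return unless $line =~ m{^/(.*)/$};   # only /.../ lines are regexps
    my $body = $1;
    # Rewrite "(" as "(?:", skipping escaped "\(" and existing "(?..." groups.
    $body =~ s/(?<!\\)\((?!\?)/(?:/g;
    return qr/\b(?:$body)\b/i;
}

my $re = compile_entry('/(all )?the pies/');
for my $text ('Who ate all the pies?', 'ALL THE PIES', 'the piesmith') {
    printf "%-24s %s\n", $text, $text =~ $re ? 'match' : 'no match';
}
```

The last sample shows the effect of the word-boundary wrapping: "the pies" inside "the piesmith" is not matched.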

More Bad Words

This is just a suggestion, I didn't add any of these.
  • REDIRECT--Maybe this will work against WoW, or redirect vandals.
  • chicks--as in "I like hot chicks."
  • stupid--"article is stupid"--I'm surprised you don't already have this.
Also, many vandals like to type in ALL CAPS, so maybe you can do something about this.

I disagree with REDIRECT, as it will give a huge number of false-positives for every time someone moves a page, or creates a redirect. It's broad words like this that make the tool much less useful. I'm going to remove it. -Mysekurity[m!] 01:20, 10 May 2006 (UTC)

Wang

How is Wang a bad word? It is a common Chinese family name. Andrew_pmk | Talk 02:37, 2 May 2006 (UTC)

It is also slang for penis, along with about a million other words to refer to genitalia (there's a certain stigma attached to private parts, as I understand). This is the type of situation where I think regexps (see above) would work well. Unfortunately, I'm not too good with them, so if you have any suggestions based on Lupin's post above, feel free to tell me and I'll change the page. -Mysekurity[m!] 02:45, 8 May 2006 (UTC)

I couldn't think of a title for this...

What about ____ on Wheels? And they aren't all bad words. Just words vandals like to use.-Gangsta-Easter-Bunny 20:09, 5 May 2006 (UTC)

It's already there (see "On Wheels"). -Mysekurity[m!] 01:19, 10 May 2006 (UTC)

Case sensitive?

Are the "badwords" listed here case sensitive? By that I mean: will a word, say "bitch", still be detected if it is written "BITCH", for example, without a separate entry for an all-caps version of the word having to exist?--Conrad Devonshire Talk 01:39, 9 May 2006 (UTC)

They're all case-insensitive, so the answer to your second question is "yes". Lupin|talk|popups 02:32, 17 May 2006 (UTC)
Here's the thing, though: I've seen more than a few vandal edits where the entire edit was done in all caps. Is there any way we can filter for an all-caps edit? Fbarton 00:19, 8 December 2006 (UTC)
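
Since the list matches words, an all-caps check would have to be a separate test on the added text. A minimal Perl sketch, assuming a hypothetical mostly_caps helper, ASCII letters only, and arbitrary thresholds (at least 10 letters, 80% of them uppercase) so that short acronyms such as NASA are not flagged:

```perl
use strict;
use warnings;

# Hypothetical all-caps check: flag added text when it has at least
# $min_letters letters and at least $ratio of them are uppercase.
# Both thresholds are assumptions, not values used by any existing tool.
sub mostly_caps {
    my ($text, $min_letters, $ratio) = @_;
    $min_letters //= 10;
    $ratio       //= 0.8;
    my $letters = () = $text =~ /[A-Za-z]/g;   # count ASCII letters
    my $upper   = () = $text =~ /[A-Z]/g;      # count uppercase ASCII letters
    return 0 if $letters < $min_letters;
    return $upper / $letters >= $ratio ? 1 : 0;
}

print mostly_caps('THIS ARTICLE IS STUPID'), "\n";    # prints 1
print mostly_caps('The NATO summit in 2006'), "\n";   # prints 0
```

A ratio rather than "every letter is uppercase" tolerates the odd lowercase word in an otherwise shouted edit.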

Removed "fist"

I decided to remove "fist" from the list, but if anyone disagrees with this decision, feel free to undo it.--Conrad Devonshire Talk 21:37, 9 May 2006 (UTC)

Removed "woody"

I have decided to remove "woody" from the list of vandal terms.--Conrad Devonshire Talk 01:37, 17 May 2006 (UTC)

Moravia?

Why is "Moravia" on the list of vandal terms?--Conrad Devonshire Talk 21:54, 28 May 2006 (UTC)

No idea :) Here's the diff. Lupin|talk|popups 01:39, 30 May 2006 (UTC)

Linkspam

I've added three links to the list. I don't think they should be banned from Wikipedia outright, but they have been added a lot recently and I'd like to keep an eye on them. If this is not the kind of thing we want on this list, feel free to remove them. Tom Harrison Talk 14:50, 3 June 2006 (UTC)

Badwords fork

Rather than ask for consensus every time I wanted to remove a false positive, I've split off my own badwords list which is slightly more optimized. Anyone who is interested is welcome to use it: http://en.wikipedia.org/wiki/User:Can%27t_sleep%2C_clown_will_eat_me/badwords -- Can't sleep, clown will eat me 02:32, 5 June 2006 (UTC)

Forking is fine of course, but I'd rather people were bold and changed the page as they saw false positives or missing bad words crop up instead of trying to come to some sort of consensus in advance. If there's controversy there can be discussion, but I don't want anyone to think that there's a requirement to discuss before making changes. Lupin|talk|popups 06:51, 6 June 2006 (UTC)

gabenwell.com and churnedfortaste.com

I have added these two sites to the list. If you see a link to either of them posted, DO NOT CLICK IT. It will cause a window with an offensive image to appear, will attempt to open tons of Outlook Express and Instant Messenger windows, and will try to send e-mail to the GNAA. They were posted by now-blocked user Churnedfortaste. Another mirror of this site, hentai.net, has also been spammed according to the Spam Blacklist, but has since been blacklisted.--Conrad Devonshire Talk 03:06, 11 July 2006 (UTC)

  • These two have now been added to the spam blacklist.--Conrad Devonshire Talk 21:30, 11 July 2006 (UTC)

Ho

Could someone please remove "ho" from the list? I looked for it myself, but couldn't find it.--The Count of Monte Cristo Parley 10:13, 1 August 2006 (UTC)

Done. I couldn't find it either, so I wrote a script which I've included below for reference. Lupin|talk|popups 01:17, 2 August 2006 (UTC)
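#!/usr/bin/env perl
# usage: findbad.pl testword < badwords
# Prints each /regexp/ line of the badwords list that matches testword.
use strict;
use warnings;

my $test = $ARGV[0];
die "usage: $0 testword < badwords\n" unless defined $test;
while (<STDIN>) {
  # Only lines of the form /.../ are regexps; capture the pattern between the slashes.
  next unless m!^/(.*)/$!;
  my $re = $1;
  # Match case-insensitively, as the tool does.
  if ($test =~ /$re/i) {
    print "$.: $_";   # $. is the current input line number
  }
}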

Triple

I have removed "triple", as it was giving lots of false-positives, and I can't imagine any bad use of it. -Goldom ‽‽‽ 11:50, 5 August 2006 (UTC)

Apparently, I haven't, cause it's still showing up. Not sure what I actually did there, in that case, so I reverted in case it was something bad. If someone else could remove it properly, unless there's a reason to keep it, that'd be great. -Goldom ‽‽‽ 11:54, 5 August 2006 (UTC)
The motivation was that Colbert vandals are saying that various populations have tripled, apparently. I have removed the line, though, and have added instructions on getting the change to take hold at the top of the page. Lupin|talk|popups 13:57, 5 August 2006 (UTC)

TTT

Why is TTT flagged as a bad word? -- Selmo 04:33, 18 August 2006 (UTC)

nigger

What do you think of the idea of adding nigger(s) to the black list? I saw it twice tonight Lucasbfr 02:18, 20 August 2006 (UTC)

I'm sorry, racial slurs are terrible things, etc, but that's a fairly amusing (hopefully unintentional) pun. Yes, I am that insensitive.- JustinWick 09:32, 7 December 2006 (UTC)
It's K, I was thinking the same thing. -Patstuarttalk|edits 10:05, 7 December 2006 (UTC)

queer

I've been using your tool (which I LOVE), and a few times "queer" came up because the TV show "Queer Eye for the Straight Guy" was mentioned. Is it possible to make that an exception to the scan for that word? Lauren 18:56, 20 August 2006 (UTC)
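
One way to express such an exception is a negative lookahead, so that queer only matches when it is not followed by "eye". The Perl sketch below shows the idea. Note that if the tool literally rewrites every ( as (?:, as described in the Regexp section, the opening paren of a lookahead like (?! may be rewritten too, so this may not work as a badwords entry without a change to the script.

```perl
use strict;
use warnings;

# "queer" as a whole word, unless it is followed by whitespace and the
# word "eye" (as in the show title). /i makes both parts case-insensitive.
my $queer = qr/\bqueer(?!\s+eye\b)\b/i;

for my $text ('Queer Eye for the Straight Guy', 'you are queer', 'QUEER EYE') {
    printf "%-32s %s\n", $text, $text =~ $queer ? 'flagged' : 'ok';
}
```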

Regular expression idioms

Wherever a space appears in a regular expression, it could be replaced with \s+ to match one or more whitespace characters (or \s* to also match none). Also useful: (e?s|[e']?d|in[g']?|ers?)? to catch verb paradigms such as pick, picks, picked, picker, pick'd, picking, pickin', and so on. Peter O. (Talk) 02:53, 23 August 2006 (UTC)
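
Both idioms can be checked quickly in Perl. In this sketch the stem "pick" and the phrase "on wheels" are only examples, the groups are written non-capturing to match how the tool is said to rewrite parens, and the \b...\b wrapping with /i is an assumption modelled on Lupin's description in the Regexp section.

```perl
use strict;
use warnings;

# Stem "pick" plus the optional verb-ending group, wrapped in word boundaries.
my $pick = qr/\bpick(?:e?s|[e']?d|in[g']?|ers?)?\b/i;

# The space in "on wheels" replaced by \s+ so that runs of whitespace match.
my $on_wheels = qr/\bon\s+wheels\b/i;

for my $w (qw(pick picks picked picker pick'd picking pickin' picket pickled)) {
    printf "%-8s %s\n", $w, $w =~ $pick ? 'match' : 'no match';
}
print "on   wheels: ", ("on   wheels" =~ $on_wheels ? 'match' : 'no match'), "\n";
```

The word boundaries are what keep unrelated words such as picket and pickled from matching.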

Noxious SPAMmer

Since "datasheet4u.com" has done NOTHING but SPAM datasheet, could someone add this to the list to prevent sneaky insertions (It's already on the SPAM blacklist, but they just don't link it instead)? Thanx. 68.39.174.238 23:26, 5 September 2006 (UTC)

Regex

How come these two rules I made, to match vandalism that often involves more than two ?'s or !'s, don't seem to work? What is wrong with them, and what's the right way of matching multiple question marks and multiple exclamation marks?

/!{2,}/

/\?{2,}/

Sir Vicious 01:34, 1 November 2006 (UTC)

Regular expressions are awful. They never do what you expect them to do (or what documentation says they should do); they work differently on each system, and what's more, the huge amount of the aforementioned documentation never seems to solve the problem. -Patstuart(talk)(contribs) 03:04, 1 November 2006 (UTC)
Thanks for the comment. So, are there better ways of matching them? I've tried /!!+/ too but it did not seem to have the desired effect, it matched a single "!" too, weird. Sir Vicious 03:50, 1 November 2006 (UTC)
Come to think of it, maybe I don't need to use regexp at all, I can just match ?? and !!, any case where more than 2 marks is used will also automatically be matched. Sir Vicious 03:59, 1 November 2006 (UTC)
I've tried some stuff in the sandbox; it's picking up Niger (I added that as a regex actually, to pick up nigar), but it's not picking up n00b, which is also on the list and which I could have sworn it would pick up. *Sigh*. Patstuart(talk)(contribs) 04:08, 1 November 2006 (UTC)
Ha! As I typed this, look at this edit: [1]. And I thought picking up niger was bad! Patstuart(talk)(contribs) 04:09, 1 November 2006 (UTC)
Hehe, yes, there is always an idiot out there who can't even vandalize right =) Sir Vicious 04:13, 1 November 2006 (UTC)
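
A plausible explanation, going by Lupin's description in the Regexp section that every entry is treated as if surrounded by word-boundary markers: \b only matches between a word character (letter, digit, underscore) and a non-word character, so a \b on each side of a run of ! needs word characters on both sides of the run. The Perl sketch below models that wrapping (an assumption, not the tool's code) and shows the wrapped /!{2,}/ failing on typical text while the bare pattern matches; the same applies to /\?{2,}/.

```perl
use strict;
use warnings;

my $bare    = qr/!{2,}/;
my $wrapped = qr/\b(?:!{2,})\b/i;   # how /!{2,}/ would look after wrapping

for my $text ('This article sucks!!', 'Why?? !!! ok', 'a!!b') {
    printf "%-22s bare=%s wrapped=%s\n", $text,
        ($text =~ $bare    ? 'match' : 'no'),
        ($text =~ $wrapped ? 'match' : 'no');
}
```

Only 'a!!b', where letters sit on both sides of the run, satisfies the wrapped pattern.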

Possible or impossible

I don't know if this would be possible, but I've seen a lot of vandalism today where the user put their own username into an article. I found them through the badwords filter, but I wonder how much "Graffiti" we're missing because of this. Is there a way to check if the added text is equivalent to the editor's username? Fbarton 19:01, 8 December 2006 (UTC)
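
This would probably have to be a separate check rather than a badwords entry, since the list is shared by everyone. A minimal Perl sketch, assuming the tool had both the added text and the editor's username available; mentions_username is a hypothetical name. quotemeta escapes any regexp metacharacters in the name, spaces and underscores are treated as equivalent (as they are in MediaWiki usernames), and lookarounds are used instead of \b so names ending in punctuation still work.

```perl
use strict;
use warnings;

# Hypothetical check: does the text added by an edit contain the editor's
# username as a standalone word (case-insensitive)?
sub mentions_username {
    my ($added, $username) = @_;
    my $name = quotemeta $username;   # spaces become "\ ", dots "\." etc.
    $name =~ s/\\ |_/[ _]/g;          # let space and underscore match each other
    return $added =~ /(?<!\w)$name(?!\w)/i ? 1 : 0;
}

print mentions_username('JOHNNY123 WAS HERE', 'Johnny123'), "\n";
```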

Innovative vandalism

Just came across this. Not sure how to add <nowiki> and </nowiki> to this list. —Dylan Lake 02:00, 13 December 2006 (UTC)