User talk:HBC NameWatcherBot/Blacklist

Archive 1

Same letter repetition regex

Assuming, as the bot is written in perl, that it uses perl regexen, couldn't:

a{12}|b{12}|c{12}|d{12}|e{12}|f{12}|g{12}|h{12}|i{12}|j{12}|k{12}|l{12}|m{12}: REGEX,NOTE(same-letter string > 11 chars)
n{12}|o{12}|p{12}|q{12}|r{12}|s{12}|t{12}|u{12}|v{12}|w{12}|x{12}|y{12}|z{12}: REGEX,NOTE(same-letter string > 11 chars)

be replaced with

([a-z])\1{11}: REGEX,NOTE(same-letter string > 11 chars)

and be a bit more efficient and certainly more maintainable? SamBC(talk) 12:15, 19 February 2008 (UTC)

Will that actually work? If so, that seems like a really good idea. I wasn't aware you could use matched patterns recursively in the regex itself: if so, it's a really powerful feature, although it may well make them no longer classical regular expressions, since I suspect it may increase the set of grammars it can match by allowing recursion. -- The Anome (talk) 17:16, 19 February 2008 (UTC)

Indeed, these are Perl regular expressions, similar to POSIX extended regular expressions, neither of which is really much like a classical regular expression. They match much more than basic regular grammars. SamBC(talk) 17:50, 19 February 2008 (UTC)

Update: according to this [1], it should work. I'll try it. -- The Anome (talk) 17:22, 19 February 2008 (UTC)

Well, the source code is here: User:H/HBC_MCP/NameWatcher. If that does not make sense, then just try creating a username that matches and see if it works. _(1 == 2)^Until 18:52, 21 February 2008 (UTC)
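
For illustration, a minimal sketch of the equivalence discussed in this thread. It uses Python's `re` module rather than Perl (an assumption of convenience; backreference syntax is the same for this pattern), and the sample names and the helper `has_same_letter_run` are made up for the example, not taken from the bot.

```python
import re
import string

# The two long alternations from the blacklist, rebuilt as a single pattern:
# a{12}|b{12}|...|z{12}
long_form = re.compile("|".join(f"{c}{{12}}" for c in string.ascii_lowercase))

# The proposed replacement: capture one letter, then require 11 more copies of it.
short_form = re.compile(r"([a-z])\1{11}")


def has_same_letter_run(name: str) -> bool:
    """True if name contains the same lowercase letter 12 or more times in a row."""
    return short_form.search(name) is not None


# Made-up sample names; both patterns should agree on every one of them.
samples = ["a" * 12, "xx" + "q" * 13 + "yy", "a" * 11, "abababababab", "z" * 11 + "y"]
for name in samples:
    assert (long_form.search(name) is not None) == has_same_letter_run(name)
```

The sketch only checks lowercase letters; whether the bot lowercases names or matches case-insensitively is not stated in this thread.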

Muhammad

I'm considering re-adding this rule:

m[ou]hamm?[ae]d: REGEX,LOW_CONFIDENCE,WAIT_TILL_EDIT,NOTE(Muhammad is a common Muslim name and mention of religious figures is not automatically a violation of WP:U)

Given the offensive nature of some of the usernames which have been registered containing variants of Mohammed, it seems worth tolerating a few false positives to be removed by hand, rather than allowing highly offensive religious slurs to be missed by the filter. An analogous case would be the rule matching "Jesus", which is certainly a common name in many Catholic countries. -- The Anome (talk) 00:48, 20 February 2008 (UTC)

Sorry, I reverted this edit without seeing this thread on the talk page. My primary reason for removing this from the blacklist is that the various transliterations of Muhammad make up the most common first name on Earth. It is more common than John, Smith, Jones etc. put together (I think, [citation needed]).

That being said, I have always thought we need a reporting page outside of WP:UAA where keywords that have a large number of false positives, but still find a lot that would otherwise be missed can be reported to. The Muhammad pattern would fit well on such a page. The bot currently supports this functionality. Thoughts? _(1 == 2)^Until 18:44, 21 February 2008 (UTC)

Inclusion of "kaffir" (and variants)

I appreciate the inclusion of ethnic slurs, but in this case there is a substantial non-infringing use; kaffir is a variety of lime. If this rule stays in, it at least needs a note of this. SamBC(talk) 12:49, 20 February 2008 (UTC)

Actually, looking at Kaffir, it appears that the racial-slur meaning is only one of a great many, including not-slur terms for other ethnic or tribal groups. SamBC(talk) 13:03, 20 February 2008 (UTC)

"honky"

Why is "honky" now blacklisted? "Honky-tonk" is certainly used nowadays to refer to a type of piano without reference to the older meaning of "honky-tonk" (a sort of brothel, IIRC), and even that doesn't seem worth blacklisting. Does it mean something else I've missed? SamBC(talk) 12:51, 20 February 2008 (UTC)

It's a racial slur. I'll whitelist "honky-tonk". -- The Anome (talk) 12:58, 20 February 2008 (UTC)

"hater"

Didn't we have some bad experience quite recently with the number of false positives generated by "hate" variants? I also remember there being a regex that matched hate variants but only as whole words, but I can't see it in the list now. SamBC(talk) 13:04, 20 February 2008 (UTC)

That's a good idea. I'll add \b's to make it a word match. -- The Anome (talk) 13:21, 20 February 2008 (UTC)

now changed to ;\b(hate)[rs]?\b:REGEX -- The Anome (talk) 13:24, 20 February 2008 (UTC)
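
As a rough sketch of what the added \b anchors change, here is the pattern tried in Python's `re` (an assumption of convenience; \b behaves the same way in Perl for these cases). The unanchored pattern `hate[rs]?` is a guess at the earlier rule, and the sample names are invented.

```python
import re

# Guess at the earlier, unanchored rule (not quoted anywhere in this thread).
unanchored = re.compile(r"hate[rs]?")
# The rule as changed above, matching "hate", "hater" or "hates" as a whole word.
whole_word = re.compile(r"\b(hate)[rs]?\b")

# Invented, already lowercased sample names.
names = ["i hate you", "hater 99", "whatever", "chateau fan", "hateful"]

for name in names:
    loose = unanchored.search(name) is not None
    strict = whole_word.search(name) is not None
    print(f"{name!r}: unanchored={loose}, whole_word={strict}")
```

One caveat: the underscore counts as a word character, so a name such as "hater_99" would not be caught by the \b version.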

Apologies

My recent attempt at "improving" a regex caused the bot to start marking every username: this is entirely my fault, for which I apologize. (I was distracted by User:EPIC MASTER's sockpuppet outbreak.) Thanks to Bongwarrior for catching it. -- The Anome (talk) 21:15, 21 February 2008 (UTC)

Update: I think I know what was wrong with this pattern. The pattern looked like this: [0-9 .;@'#~!"£$%^&*()-_=+{}]{12}. I think the error was in the ")-_" bit, which matched all ASCII codes between 0x29 ")" and 0x5F "_", thus matching all ASCII capital letters and digits, as well as a lot of punctuation characters. I think [-0-9 .;@'#~!"£$%^&*()_=+{}]{12} would have worked better, but I'm not going to risk it just now. -- The Anome (talk) 14:42, 27 February 2008 (UTC)
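
The range problem described above can be reproduced in a few lines. This is a sketch in Python's `re`, not the bot's Perl, and the test strings are invented; the two character classes are copied from the comment above.

```python
import re

# The class as originally written: ")-_" is read as a range from 0x29 to 0x5F.
BUGGY_CLASS = r"""[0-9 .;@'#~!"£$%^&*()-_=+{}]"""
# The suggested fix: the hyphen comes first, so it is a literal character.
FIXED_CLASS = r"""[-0-9 .;@'#~!"£$%^&*()_=+{}]"""

buggy = re.compile(BUGGY_CLASS + "{12}")
fixed = re.compile(FIXED_CLASS + "{12}")

# Printable ASCII characters that only the buggy class accepts.
extra = [
    chr(c)
    for c in range(0x20, 0x7F)
    if re.fullmatch(BUGGY_CLASS, chr(c)) and not re.fullmatch(FIXED_CLASS, chr(c))
]
print("accepted only by the buggy class:", "".join(extra))

print(bool(buggy.search("ABCDEFGHIJKL")))  # twelve capitals
print(bool(fixed.search("ABCDEFGHIJKL")))
print(bool(fixed.search("!!!!!!!!!!!!")))  # twelve punctuation marks
```

The extra characters include every ASCII capital letter, which is consistent with the explanation given above.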

Update 2: I think the following regex should match all ASCII punctuation characters and digits correctly, without using any of the characters forbidden by the formatting of this blacklist: [ !-@[-`{-~]. Why, you might ask, don't I simply say [^0-9A-Za-z]? The reason is that I believe processing is occurring bytewise in UTF-8, and this would have the undesirable property of catching all names in non-Latin scripts. -- The Anome (talk) 14:53, 27 February 2008 (UTC)

Given as it's Perl, matching should be happening character-wise in whatever encoding the script has been set to use; as usernames here are in UTF-8 and the author seems to know what s/he's doing, I would assume that the script has been set to use UTF-8. So matches should be character-wise in UTF-8. It's possibly worth checking this and actually trying to match using Unicode character-type descriptors, which should allow you to just say, effectively, "match punctuation". Another trick you may have missed is that the hyphen won't be interpreted as part of a range if it's at the end of the character class. SamBC(talk) 09:27, 28 February 2008 (UTC)
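
A small sketch of the points raised in the last two comments, using Python strings (which are matched character-wise) rather than the bot's Perl. Python's `re` has no Unicode property classes, so `unicodedata.category` stands in for a "match punctuation" descriptor here; the helper name and the sample username are made up.

```python
import re
import unicodedata

# The negated class that was considered and rejected above.
negated = re.compile(r"[^0-9A-Za-z]")
# The explicit ASCII ranges proposed instead ("[" is escaped for Python's re).
ascii_ranges = re.compile(r"[ !-@\[-`{-~]")


def is_punct_or_symbol(ch: str) -> bool:
    """Unicode-aware stand-in for a property class: categories P* and S*."""
    return unicodedata.category(ch)[0] in "PS"


name = "Мария"  # made-up Cyrillic username
print(negated.search(name) is not None)            # negated class hits a Cyrillic letter
print(ascii_ranges.search(name) is not None)       # explicit ASCII ranges do not
print(any(is_punct_or_symbol(ch) for ch in name))  # neither does the category check

# A hyphen placed last in a character class is a literal, not a range operator.
trailing_hyphen = re.compile(r"[a-z_-]+")
print(trailing_hyphen.fullmatch("a-b_c") is not None)
```

Note that in this sketch the negated class catches the Cyrillic name even with character-wise matching, so the explicit ranges or a category-based test look like the safer options either way.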

Bollocks

Apparently this is more offensive in the U.K. (It doesn't seem offensive at all to me.) Should it remain on the blacklist though? Grand master ka 23:26, 21 February 2008 (UTC)

Yes, I believe it's still regarded as quite offensive in the UK, and at one time was apparently regarded as so offensive that the album Never Mind the Bollocks, Here's the Sex Pistols was the subject of a (failed) obscenity prosecution. -- The Anome (talk) 00:26, 22 February 2008 (UTC)

A lot of things had (mostly failed) obscenity prosecutions at the time, and IIRC the Sex Pistols successfully claimed that it was an old word meaning "rubbish". Of course, it also means something else. However, I have concerns about it being in the blacklist. No usernames referencing an album title that a court explicitly decided wasn't obscene? SamBC(talk) 11:37, 22 February 2008 (UTC)

I agree, it's not obscene. However, it is aggressive, and still likely to cause offence. -- The Anome (talk) 11:04, 23 February 2008 (UTC)

No, I'm sorry, this is crazy... would "The Dog's Bollocks" not be an acceptable username then? That just means something really really good. The word can be used in substantial ways that are not offensive, and can we please not base things on what we imagine might be offensive to someone?

N.b.: I'm British. SamBC(talk) 13:00, 23 February 2008 (UTC)

That's why names caught by this bot are submitted for human consideration, as many words can be used in both offensive and inoffensive contexts. The list is compiled on the basis that it's worth an occasional false positive to catch usernames with deliberate offensive intent with greater probability; however, pattern-based detectors will always have to balance between type I errors and type II errors (see receiver operating curve for a detailed discussion). If we find that this pattern generates an excessive number of false positives and wastes admins' time without sufficient reason, we can always remove it, as has happened to many other patterns before. -- The Anome (talk) 13:36, 23 February 2008 (UTC)

We should try our best to limit false positives. The bot can report less clear-cut patterns to an alternate page. _(1 == 2)^Until 06:52, 24 February 2008 (UTC)

User:EPIC MASTER

Anti-Scientology sockpuppeteer User:EPIC MASTER has pre-announced a sockpuppetry campaign on (a previous revision of) their userpage. See [2] for their own log of previously-blocked usernames. Similar usernames created today: ALPHA-MYTH, ALPHA-GHOST, GAMMA-YELLOW, GAMMA-GREEN, GAMMA-BLUE and GAMMA-RED. They seem to have returned, and created more sockpuppet accounts with names similar to those mooted on their userpage. I've created a regex to try to catch any more usernames along these lines. -- The Anome (talk) 17:42, 24 February 2008 (UTC)

EPIC MASTER seems to have gone away. These should probably be removed soon. -- The Anome (talk) 13:41, 3 March 2008 (UTC)

LOL etc.

The use of "LOL", "LOLZ", "LULZ", and similar terms in usernames (particularly when repeated eg. "lololol") is often associated with vandalism accounts. However, they are not offensive per se. Should they be added to the blacklist? -- The Anome (talk) 13:39, 3 March 2008 (UTC)

I've an idea!

How about something like "X is a", where X is a username? 81.149.250.228 (talk) 07:55, 20 March 2008 (UTC)

Typing patterns

I've re-added the typing pattern regexps to the blacklist. Whilst there is nothing offensive about typing-pattern usernames, they fail to meet the username policy requirements in two ways:

confusion: typing-pattern names like sdsdfjs, sdlfjssdj, sdfjsdlj (just for example) are difficult to remember or distinguish from one another; similarly with (for example) lololololololol and lololololol or eeeeeeeeeeeeeeeeeeee vs. eeeeeeeeeeeeeeeeeeeeee.
disruption: long experience has shown that users with typing-pattern usernames generally do not intend to use their accounts for constructive purposes

-- The Anome (talk) 00:17, 3 April 2008 (UTC)

These patterns are catching usernames such as "poiupoiupoiu" which are not confusing at all, and users are considering them to be examples and reporting even less confusing names such as "z9z9z9". Also, there has been little support on WT:U for the idea that strings such as "eeeeeeeeeeeeee" merit blocking.
The second reason you give -- that the people who get blocked this way are likely going to be disruptive -- has all kinds of things wrong with it. It assumes bad faith. It blocks people who haven't disrupted anything on the assumption that they're going to be disruptive. And in the end, if you give them a username block based on that bot report (remember that UAA is only for username blocks), then you are lying about the reason they're being blocked.

Wikipedia already has ways to block people for being disruptive. The username policy isn't it. Bots need to support the existing policy, not define new policy. rspeer / ɹəədsɹ 01:01, 3 April 2008 (UTC)

The bot does not define any kind of policy at all. It merely flags usernames for human attention. Experience shows that a username of "sdfglsdfgsdjgsljd" is just as clear a sign of disruptive intent (which is a username policy criterion) as "fuckyermom999", and we have actually been blocking such usernames for a long time. As you say, "Wikipedia already has ways to block people for being disruptive". This is one of them. -- The Anome (talk) 01:40, 3 April 2008 (UTC)

No. UAA is for placing username blocks, not disruption blocks. You're suggesting (partially with your words here, and partially through the bot) that we should username block people when their names suggest they might be disruptive.

A username block says "Your name is unacceptable. If you have the patience, please try again with another name." If there's nothing actually wrong with their username, but we username block them for disruption, then the username block is a lie. If we username block them before disruption, then it's a lie and an assumption of bad faith.

It's okay to username block "fuckyermom999" because the name is disruptive. That's different from the person behind it being potentially disruptive, something which has nothing to do with username blocking at all. rspeer / ɹəədsɹ 02:24, 3 April 2008 (UTC)

I like the solution of adding comments to the regexps, to remind admins that they should use their own judgment when blocking usernames which match these patterns, rather than just blocking blindly. (Technical note: I've removed the commas within the comments, which break the bot syntax rules.) -- The Anome (talk) 10:48, 3 April 2008 (UTC)

Okay, thanks. Glad that's resolved. (Funny thing: I was looking at those messages as they started to show up on the bot list, and I was wondering if I had really typed those long sentences with no commas. I wondered how tired I was when I wrote them. But now I understand.) rspeer / ɹəədsɹ 05:05, 4 April 2008 (UTC)

"screw you" regex edit

The addition of [^p] is good, apart from the fact that, as written, it requires there to be some character after the "screw you" part. You probably want a negative lookahead assertion instead... I can't remember what they look like off the top of my head, but if you like, I'll look it up later. SamBC(talk) 10:34, 4 April 2008 (UTC)

Good catch, I have used the whitelist to accomplish the same thing. Thanks. _(1 == 2)^Until 13:45, 4 April 2008 (UTC)
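
For reference, a sketch of the difference between the consuming `[^p]` and a negative lookahead. The actual blacklist entry is not quoted in this thread, so both patterns below are hypothetical reconstructions, written for Python's `re` (the `(?!...)` lookahead syntax is the same in Perl).

```python
import re

# Hypothetical reconstruction of the rule as edited: [^p] must consume a character.
consuming = re.compile(r"screw ?you[^p]")
# Hypothetical alternative: (?!p) only asserts that the next character is not "p".
lookahead = re.compile(r"screw ?you(?!p)")

# Invented sample names.
for name in ["screw you", "screw you!", "screwyoup"]:
    print(
        f"{name!r}: consuming={consuming.search(name) is not None}, "
        f"lookahead={lookahead.search(name) is not None}"
    )
```

The first sample shows the problem raised above: with `[^p]`, a name that ends right after "screw you" is missed, while the lookahead version still catches it.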

"traitor" and "treason"

I really don't think these are strong enough words to warrant blacklisting. --Conti|✉ 16:56, 5 April 2008 (UTC)

Not generally, no. But their usage is sometimes an indication of a particular sockpuppet account, and it has proven useful on a few occasions. --Bongwarrior (talk) 17:53, 5 April 2008 (UTC)

Was User:Traitor buster a sockpuppet, too, then? I'm still a bit confused about this whole thing, actually, since it seems that pretty much any username reported by this bot gets blocked more or less automatically. Which means that usernames that contain "traitor" or "treason" get blocked, too, regardless of any sockpuppetry or not. Anyways, we could add "WAIT_TILL_EDIT" to those two names, at least. --Conti|✉ 14:36, 6 April 2008 (UTC)

First off, names on this list do not get blocked automatically. Secondly, unless there is a specific sock puppet that uses those terms they should be left out. And if specific sock puppets do use that term it should be clarified in a label. If it is to catch sock puppets, then WAIT_TILL_EDIT would be counterproductive; a LABEL() explaining it would do better.

Conti, if you know of any admins that automatically block names that the bot reports without considering policy, just let me know about it by providing some diffs. _(1 == 2)^Until 14:41, 6 April 2008 (UTC)

Well, automatically in the sense of "Gets blocked by an admin as soon as it's reported by the bot", but I might be wrong there. Why was the user I mentioned above blocked, for example? No contribs, no deleted contribs, either, no mention of sockpuppetry, and the name itself doesn't really look that bad to me. The only reason I can see right now why that user was blocked is because it was reported by this bot. The bot's blacklist also has a "Sock puppets or impersonation" section, so if those names are used by sockpuppets, they should probably be moved. And not blocked, unless they actually are sockpuppets. :) --Conti|✉ 15:25, 6 April 2008 (UTC)

Looks to me like a convincing argument could be made that the name implied an intent to disrupt, by "busting" any other users that the user in question believed to be "traitors". However, that ought to have been handled by discussion, I think, as the name could also have been chosen in good faith. There are some admins, or some admins at some times, who take the bot's reports as automatically valid, which is saddening, but I haven't made a study to figure out who they are, I just see it sometimes and sigh, as every attempt to bring it up before has ended up with big arguments. SamBC(talk) 15:29, 6 April 2008 (UTC)

So, does anyone disagree with me removing the two entries? Or at least add "WAIT_TILL_EDIT" to it? --Conti|✉ 18:30, 8 April 2008 (UTC)

I would disagree with removing it. The percentage of false positives seems to be pretty low, and I would hope most admins would have enough sense to realize that "traitor" and "treason" aren't username violations by themselves. I would be hesitant to add "WAIT_TILL_EDIT" to it, because this sockpuppet has done this numerous times before, and he knows he's going to be blocked, so he tries to tag as many articles as he can before he gets caught. Delaying the report will increase the damage. Having said that, it would be advisable to wait for edits before actually blocking, unless the username obviously matches the pattern. --Bongwarrior (talk) 20:48, 8 April 2008 (UTC)

Well, if this is a sockpuppet issue (is s/he using both names?), it should be moved into the right section at least. "SOCK_PUPPET" should be added, too. A note that the two names aren't username violations by themselves sounds useful, too. Most of the names listed here are, after all, so it's easy to think that all names/terms listed here are clear username violations. --Conti|✉ 20:55, 8 April 2008 (UTC)

Agreed, both of those are good ideas. --Bongwarrior (talk) 21:03, 8 April 2008 (UTC)

Alright then. Now I just need the name of the sockpuppeteer who uses these two names and I'll move the terms and add the flags. --Conti|✉ 22:43, 8 April 2008 (UTC)

Sounds like a good plan. _(1 == 2)^Until 23:39, 8 April 2008 (UTC)

adjacent keys

qwertyu|yuiop|dfgh|fghj|ghjk|hjkl|asdasd|asdf|zxcvb|cvbnm: REGEX,NOTE(probable random-typing username. Although this kind of name is flagged by the bot it may not merit a username block. Please be sure that any blocks you place are supported by the username policy.),LABEL(adjacent keys)

Do we really need this? _(1 == 2)^Until 04:20, 8 April 2008 (UTC)

Not at all. Valid blocks based on these rules were few and far between before the policy change, and now they will never be valid except by a strange coincidence. rspeer / ɹəədsɹ 04:41, 8 April 2008 (UTC)

Not needed. The vast majority of names flagged by this regex are harmless. --Bongwarrior (talk) 20:48, 8 April 2008 (UTC)

I have removed the whole section. _(1 == 2)^Until 21:55, 8 April 2008 (UTC)

Wait till edit

It seems to me we should be using this flag a lot more. Most of these patterns are not obviously disruptive (e.g. "Smelly") and really don't need to even be reported unless the user actually edits. Mango juice^talk 14:01, 9 April 2008 (UTC)

I'd agree, although I'd also like to note that there are definitely some patterns that don't want to wait until an edit. SamBC(talk) 14:26, 9 April 2008 (UTC)

I agree on both counts. Something you could say in a schoolyard and not get in trouble from your teacher like "smelly" is a good candidate. Unless it is used as personal attack then I don't see it likely to be a violation. ;is\s?smelly:REGEX would be more likely to get real hits. _(1 == 2)^Until 15:12, 9 April 2008 (UTC)

Ok, I added WAIT_TILL_EDIT to a whole bunch of rules. I don't feel it's necessary for us to get into specifics; anyone should feel free to remove the flag from anything they think might be important to look at before edits; I probably won't object over individual lines. For all the ones I added the flag to, I believe that the worst cases are ones where I could still take or leave blocking the user if they haven't edited, and the typical case would be one where I wouldn't block if there aren't any edits to judge from. Mango juice^talk 16:46, 9 April 2008 (UTC)

I agree for the most part with your choices, however I have removed the WAIT_TILL_EDIT from "molest" for two reasons: 1) It is unlikely to give many false positives (while it does have innocent uses, the term is not often used outside of references to abuse due to its taint, imho), and 2) it is the sort of term that is likely to involve extreme unpleasantness and we don't want that user editing before we block the name. Otherwise good choices. _(1 == 2)^Until 17:29, 9 April 2008 (UTC)

Sounds fine to me. Mango juice^talk 17:31, 9 April 2008 (UTC)

I added WAIT_TILL_EDIT to one more rule: the one for bots. I did this not because this type isn't serious, but because there are probably some cases where a bot is actually legitimate and it seems to me that if we go blocking them before they edit, it might cause difficulties. (By the time they start editing, everything should be made clear.) Mango juice^talk 19:37, 15 April 2008 (UTC)

Swallows

I'm sorry, but how is "swallows?" a valid pattern? On its own, it could be perfectly innocent; surely a better pattern can be come up with that won't get as many false positives? SamBC(talk) 23:43, 18 April 2008 (UTC)

Stalin's Enema

Not contesting the fact that that name is problematic, but I don't see that there's any reason to assume that any name including either "stalin" or "enema" is likely to be a problem. I thus disagree with their inclusion on the blacklist. SamBC(talk) 12:41, 28 April 2008 (UTC)

While I can understand why we would not blacklist the word Stalin, I am pretty sure that we want to be notified when someone uses the word "enema" in their name. _(1 == 2)^Until 14:36, 28 April 2008 (UTC)

Well, my immediate thought as to a reasonable use of the term would be to reference an album title, "Enema Of The State". That's just off the top of my head. SamBC(talk) 15:28, 28 April 2008 (UTC)

LBHS, lemonbay, mantaray

Firstly, I've never seen any username containing LBHS, lemonbay, or mantaray which edited harmoniously, but I have seen various usernames containing these threads that have been sockpuppets of a particular account that vandalized articles. The vandal is listed at WP:Long term abuse, so why not have the username bots search for that person's sockpuppets? GO-PCHS-NJROTC (talk) 21:13, 28 April 2008 (UTC)

Trademark symbol, Registered trademark symbol and Copyright symbol

Should we change:

™
   NOTE(Trademark symbol)
®
   NOTE(Registered trademark symbol)
©
   NOTE(Copyright symbol)

to

™
   NOTE(Trademark symbol - please ensure that this is being used in a legal manner, rather than for stylistic purposes)
®
   NOTE(Registered trademark symbol - please ensure that this is being used in a legal manner, rather than for stylistic purposes)
©
   NOTE(Copyright symbol - please ensure that this is being used in a legal manner, rather than for stylistic purposes)

?...... Dendodge .. Talk^Help 20:09, 15 May 2008 (UTC)

My short answer would be "no". Usernames should identify users; trademarks, on the other hand, identify products. A second, more technical point, that I was not aware of myself until I was bitten by it: commas in comments, notes, labels, etc. confuse the parser for the blocking patterns, so they should be avoided. -- The Anome (talk) 11:59, 17 May 2008 (UTC)
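
To show why a comma inside a NOTE() could trip up a parser, here is a purely hypothetical sketch in Python. The bot's real parser is not shown on this page; `naive_split_flags` and both sample entries are invented, and the sketch simply assumes that the flags after the colon are split on commas.

```python
def naive_split_flags(entry: str) -> list:
    """Hypothetical parser: pattern before the first ':', flags split on ','."""
    _pattern, _, flags = entry.partition(":")
    return [flag.strip() for flag in flags.split(",")]


# Invented entries in the style of the blacklist lines quoted on this page.
with_comma = "asdf: REGEX,NOTE(random typing, may not merit a block),LABEL(adjacent keys)"
without_comma = "asdf: REGEX,NOTE(random typing; may not merit a block),LABEL(adjacent keys)"

print(naive_split_flags(with_comma))     # the NOTE is cut into two broken pieces
print(naive_split_flags(without_comma))  # three clean flags
```

Under that assumption, the comma inside the note produces two fragments, neither of which is a valid flag, which would match the symptom described above.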

"Colective" and probable variants

We have a vandal who seems highly prone to using the word "colective" in their usernames when planning to circumvent semiprots on the user talkpages they target. I'm putting forward that "colective" and any other variant misspellings of collective, or even the word itself, should be blacklisted to prevent this. treelo _talk 19:25, 17 May 2008 (UTC)

Drugs

Mentioning drugs, such as cocaine, crack, pot, etc. could be disruptive, so shouldn't NameWatcherBot look for those too? GO-PCHS-NJROTC (talk) 16:04, 21 May 2008 (UTC)

In the past at WP:RFCN and WT:U it has been determined that mention of drugs is not automatically disruptive. Go from country to country and you will find that attitudes on drugs vary considerably. While consensus can change, I for one still think that the demonizing of drugs is not representative of the world view. _{1 != 2} 16:09, 21 May 2008 (UTC)

Besides, usernames like User:Bongwarrior and User:HighInBC have not shown themselves to be disruptive. I am sure there are many more such drug related names that have not been disruptive. _{1 != 2} 16:11, 21 May 2008 (UTC)

WAIT_TILL_EDIT

(In regards to this edit) Why on Earth would we want names with the words "twat", "shit", and "whore" to edit before being reported? I think any name with these words needs to be blocked BEFORE they start editing. Why wait? _{1 != 2} 14:11, 8 June 2008 (UTC)

Hello? _{1 != 2} 15:47, 10 June 2008 (UTC)

No argument from me! —Krellis (Talk) 16:09, 10 June 2008 (UTC)

Requesting edit to protected page: This edit needs to be undone. Attempts to discuss have led only to agreement. _{1 != 2} 15:54, 11 June 2008 (UTC) {{editprotected}}

Done Happy‑melon 16:01, 11 June 2008 (UTC)