Wikipedia:Bots/Requests for approval/ClueBot V
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Request Expired.
ClueBot V
Automatic or Manually Assisted: Automatic, unsupervised.
Programming Language(s): PHP, my classes.
Function Summary: Analyze new pages and add appropriate tags.
Edit period(s) (e.g. Continuous, daily, one time run): Continuous
Edit rate requested: No more than 3 edits per new page. (maxlag=2) (1: tag, 2: possibly notify, and 3: possibly log the action, though I will probably drop the log step shortly after it is established)
Already has a bot flag (Y/N): N
Function Details: Analyze new pages and add appropriate tags to the new page. See User:ClueBot V/NPP for more information on the kinds of tags it will add and when it will add them. That page is the log of what ClueBot V would do if it were running in read-write mode. For the db tags, it uses heuristics similar to that of ClueBot, but more refined for new pages.
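As an illustration only, here is a minimal PHP sketch of what new-page tagging heuristics of this kind might look like; the function name, thresholds, and phrase list are assumptions chosen for clarity and are not ClueBot V's actual code.

<?php
// Hypothetical sketch of new-page tag selection; the thresholds and phrase
// list are illustrative assumptions, not ClueBot V's actual heuristics.

function chooseTags(string $title, string $text): array
{
    $tags  = array();
    $plain = trim(strip_tags($text));
    $words = str_word_count($plain);

    // Obvious attack phrases (see the discussion of {{db-attack}} below).
    if (preg_match('/\bFUCK (YOU|WIKIPEDIA)\b/i', $plain)) {
        $tags[] = '{{db-attack}}';
    } elseif ($words === 0) {
        $tags[] = '{{db-empty}}';
    } elseif ($words < 15 && stripos($plain, $title) === false) {
        // Very short and never mentions its own title: possibly no context.
        $tags[] = '{{db-nocontext}}';
    }

    // Non-deletion cleanup tags.
    if (strpos($text, '[[Category:') === false) {
        $tags[] = '{{uncategorized}}';
    }
    if (strpos($text, '[[') === false) {
        $tags[] = '{{wikify}}';
    }
    if (stripos($text, '<ref') === false && stripos($text, '{{cite') === false) {
        $tags[] = '{{unreferenced}}';
    }
    return $tags;
}

// Example: a one-line page with no links, categories, or references ends up
// with the same sort of tag list seen in the User:ClueBot V/NPP log.
print_r(chooseTags('Example band', 'they are the best band ever'));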
Discussion
- Normally I'd be cautious of bots that tag new pages for deletion. However, looking over the log, the bot should have few (if any) false positives. Also, Cobi knows what he is doing, and if there are any issues they'll be resolved. Mønobi 05:34, 18 February 2008 (UTC)
- I have all faith in Cobi and ClueBot, but I just feel CSD is one of the things that bots shouldn't handle. I'd say go ahead for the rest of the tags, but skip the speedy deletion notices. my $.02 Q T C 05:48, 18 February 2008 (UTC)
- On a personal note, I have no problems whatsoever with it - I often considered tagging new pages under AVB but never found the time. A bot does have pretty good accuracy at this sort of thing - on a random sampling of edits, it's no more controversial than what I would have done as a human. People won't be comfortable with the idea of bots speedy-tagging stuff, but then again, they weren't happy when the first anti-vandalism bots kicked in either. It's not a vote, and I don't want to singlehandedly make a decision on this one, but I'd like to see a trial and see how people accept it. If someone else in BAG wants to chime in, I'd appreciate it. -- Tawker (talk) 07:34, 18 February 2008 (UTC)
- I'd support it - I think it's a really good idea - but announcing it two days after mine? Eeek. Ale_Jrbtalk 10:02, 18 February 2008 (UTC)
- Well, I had been planning it for quite a while, and I just finally got around to finishing up the code that I had lying around. I don't see any reason why both of ours can't run at the same time, though. :) -- Cobi(t|c|b) 17:28, 18 February 2008 (UTC)
- Heh. Either way, as I said, I think the idea is good. Ale_Jrbtalk 18:09, 18 February 2008 (UTC)
- Cobi has really reliable bots, so I see no problem with Approved for trial (3 days). Soxred93 | talk bot 18:45, 18 February 2008 (UTC)
- Indeed he does, but this needs more community input. We had a proposal like this a while ago which I believe fell down at some point, and we would do well to review that and community opinion before moving into trials. Cobi: could you post to VP and AN (etc) mentioning this BRFA please? Martinp23 19:59, 18 February 2008 (UTC)
(undent) If there was a restriction on the types of CSDs it could do, I think it might be less controversial. For instance, if a page has no string of characters that matches any term in the English language, it's a speedy under either G1 or A2. On the other hand, a valid article about a recently convicted felon, who is notable, might get scooped up in a G10 attack. I'd say that all the CSDs, except G4, G6, G10, G12, and A7 (whatever happened to A6?), would be good for this bot to work on. MBisanz talk 21:43, 18 February 2008 (UTC)
- I can agree with that, and it seems Cobi has implemented it, but don't we already have a bot that tags copyright infringements (G12)? –Crazytales talk 21:55, 18 February 2008 (UTC)
- You should look at User:ClueBot V/NPP to see what it tags already. No, it doesn't tag for a lot of CSDs, just ones that a bot could pick up on easily. -- Cobi(t|c|b) 21:48, 18 February 2008 (UTC)
- Yea, the only one I saw there that I could object to is "Club bot (+16) by User:BoomanthaGreat — {{db-attack}}, {{db-nocontext}}, {{uncategorized}}, {{wikify}}, {{unreferenced}}", for including a db-attack tag. Would it hurt the expected efficiency a great deal to leave that tag out? Also, over what time period was that sample collected? MBisanz talk 22:12, 18 February 2008 (UTC)
- Well, the bot uses {{db-attack}} on pages that say something along the lines of (sorry for the profanity): "FUCK YOU" or "FUCK WIKIPEDIA" or a number of other common phrases like that. I can't really see an instance where something like that would seriously not be deleted. -- Cobi(t|c|b) 22:17, 18 February 2008 (UTC)
- Ok, then I don't mind attack being included. I was afraid its heuristics would pick up a news article that placed undue weight, for instance, on a murder conviction. —Preceding unsigned comment added by MBisanz (talk • contribs)
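(A minimal sketch of the kind of fixed-phrase matching Cobi describes above; the normalisation steps and the phrase list are illustrative assumptions rather than the bot's actual rules.)

<?php
// Illustrative fixed-phrase check for {{db-attack}}-style pages.  The
// phrase list and normalisation are assumptions, not ClueBot V's rules.

function looksLikeAttackPage(string $wikitext): bool
{
    // Normalise: drop markup, collapse whitespace, compare case-insensitively.
    $plain = strtolower(trim(preg_replace('/\s+/', ' ', strip_tags($wikitext))));

    $phrases = array('fuck you', 'fuck wikipedia');
    foreach ($phrases as $phrase) {
        if (strpos($plain, $phrase) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(looksLikeAttackPage("FUCK\nWIKIPEDIA"));  // bool(true)
var_dump(looksLikeAttackPage('A normal stub.'));   // bool(false)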
- I'm concerned by the {{db-nocontext}} tags, which don't seem like something a bot can reliably assess. Primera B 2007, for example, is not a stellar article but doesn't look like a speedy to me. rspeer / ɹəədsɹ 23:06, 18 February 2008 (UTC)
- Well, that article has been improved since ClueBot V looked at it ... When ClueBot V looked at it, it was:
"The Torneo Primera B of 2006 was full of expectations, since Centauros Villavicencio were relegated from the Colombian Professional Football league"
- One sentence, no period, and it seems to be the opinion of the author, not a fact. Although it has improved now, the log entries should be read with the caveat that pages may have been modified since ClueBot V saw them. -- Cobi(t|c|b) 23:37, 18 February 2008 (UTC)
- Okay, based on that explanation, I don't actually have a problem with it tagging {{db-nocontext}}. My only objection would be that it seems to jump on in-progress articles, but that's just an issue of timing that could be easily tweaked. I support this bot task. rspeer / ɹəədsɹ 19:53, 19 February 2008 (UTC)
- Support I think this bot is a great idea, and the examples of tagging show no problems to me.--Sunny910910 (talk|Contributions|Guest) 03:16, 19 February 2008 (UTC)
- Support, subject to removing {{db-nocontext}} from the list, as the bot seems to have trouble with that one. עוד מישהו Od Mishehu 16:23, 19 February 2008 (UTC)
- Support also. I can't see the deleted pages in the sample list, but of the ones which still exist, their initial revisions seem to have been very well assessed. • Anakin (talk) 17:04, 19 February 2008 (UTC)
- Support. I'd rather see this go ahead, and then make tweaks, and - worst case - decide to severely cut back, than to have us lack the boldness to actually try it and see what happens. We can posit all sorts of problems, but if we don't actually test this out, thoroughly - and by "test", I mean that the bot runs in real production mode for a while - then we're never going to know all that this bot can do to reduce the amount of (relatively routine) work that human editors have to do. -- John Broughton (♫♫) 23:14, 19 February 2008 (UTC)
- Oppose. At least the way it currently runs. Of the two pages I've seen tagged {{db-nocontext}} by this bot so far, one should not have been tagged for speedy at all, and the other one should instead have been tagged {{db-attack}}. When I tried to insert that tag myself, I was slowed down by an edit conflict with the bot.
- Additionally, my list of contributions allows me to follow up on tags I insert, to make sure that no one who removes tags from an article he created himself goes unnoticed. I do not see the bot doing that kind of follow-up.
- I can see how the bot can be useful if it keeps inserting the cleanup tags as it currently does, but it should wait at least 5 minutes after article creation before inserting any tag (to avoid edit conflicts with human patrollers). As for insertion of speedy deletion tags by a bot, here's an idea: why not automatically tag a recreation of a recently speedied (or otherwise deleted) page? --Blanchardb-Me•MyEars•MyMouth-timed 23:40, 19 February 2008 (UTC)
- Definitely, the bot should not bother with speedy tagging (unless it makes sure its speedy tags can only be removed by admins or else reinserted) and it should wait its turn (I had three edit conflicts with the bot since my entry above). --Blanchardb-Me•MyEars•MyMouth-timed 00:14, 20 February 2008 (UTC)
- Stop this bot from speedy altogether. Speedy requires judgement. Placing db-empty tags on articles within seconds of first editing makes it impossible for newbies to write articles in their natural way. Warning tags, sure, but not speedies. It is completely destructive to orderly process. We have enough difficulties with speedy when humans do it. DGG (talk) 01:24, 20 February 2008 (UTC)
- Speedy properly requires two human beings, not just one. In some cases a human admin will delete immediately when it is really obvious, but in general this should not be bypassed. The effect of this bot is to make speedy by a single admin routine. I don't think there is, would be, or should be any consensus on this. DGG (talk) 01:27, 20 February 2008 (UTC)
- Hey, I'm a dyed-in-the-wool deletionist, but having a bot tag articles for speedy deletion is a bad, bad idea. There is no way to apply any speedy without some judgement, which a bot does not have. UnitedStatesian (talk) 01:55, 20 February 2008 (UTC)
- Stop per DGG. Nobody (bot or human) should tag short articles within minutes of creation. A speedy deletion template is not just a notice to an admin to maybe delete a page; it's a notice to a new user that their page is being rejected. Can be very discouraging to new users. This bot could place a much less intimidating tag, just a little one saying "very short article" or something; it should wait an hour after page creation before tagging articles for being too short at all; and later a human can replace the "very short article" tag with a speedy tag. Insufficient discussion has taken place for letting a bot put speedy tags on articles. Articles are not to be tagged as too short within minutes of creation -- see WP:CSD: "Contributors sometimes create articles over several edits, so try to avoid deleting a page too soon after its creation if it appears incomplete." and "Before nominating an article for speedy deletion, consider whether it could be improved, reduced to a stub, merged or redirected elsewhere or be handled with some other action short of deletion." --Coppertwig (talk) 01:57, 20 February 2008 (UTC)
- I may have over-reacted a bit. Cluebot does a lot of good work in other areas. I can see how this can be useful for attack pages. However, in the log, Newton Tracey was incorrectly tagged. Also, it must not speedy-tag articles for being too short if they've just been created a few minutes before. Some very short articles may be valid articles, too.
- I was about to propose a method of doing "slow speedies". This could be useful here. The code in Template:db-t3/date could be used to change the look of the template after, say, an hour. At first it could be set to not look like a big, intimidating speedy template. (Actually, I'm not sure that would work. The user should not be warned when they've just created a short page, but on the other hand they should be warned before their page is deleted; and I don't know how the mechanism to adjust the category is activated with db-t3.) --Coppertwig (talk) 03:52, 20 February 2008 (UTC)
- Given the last three entries, I move to have the bot stopped without waiting for the end of the three-day trial. --Blanchardb-Me•MyEars•MyMouth-timed 02:01, 20 February 2008 (UTC)
- Limited support for a limited trial. Recommend (a) starting on a narrow range, leaving nocontext and other tags requiring judgment to a later time; (b) waiting a period of time before tagging, except for obvious vandalism articles - there is no point tagging an article the minute it's created, while it may still be in progress; (c) adding a special notice alerting the admin that this is a bot and asking the admin to double-check it's accurate and report any false positives; and (d) collecting the reports before we give permanent permission or expand the range. Really suggest starting slow on this one and building up rather than coming out with everything blazing and really annoying a lot of people if it doesn't work. Best, --Shirahadasha (talk) 02:33, 20 February 2008 (UTC)
- Stop the bot. It's applying tags that make no sense, e.g., tagging this as db-test, this as db-nocontext. Obviously, a bot doesn't have the judgment to tell whether something is db-nocontext or not. Spacepotato (talk) 02:40, 20 February 2008 (UTC)
- Edits like this are exactly what I had in mind when I mentioned follow-up. If a bot does the tagging, nobody is following the page, and an edit like that goes unnoticed. You'd think trolls wouldn't notice? --Blanchardb-Me•MyEars•MyMouth-timed 02:49, 20 February 2008 (UTC)
- Hey, how about applying some non-bitey tag other than a speedy delete? It could be a "this article will soon be evaluated by a human to see if it is a candidate for speedy deletion" or a silent template that does nothing but add a clean-up category, so people can go through later with AWB or some fast method and add deletion tags or let the articles pass at the rate of 10-20 new articles per minute. That sounds like an extra step but it gets the job done, still saves an enormous amount of time, and if the article is truly terrible for zero content or context an administrator can just go ahead and delete without even adding a tag. My concern is that there are too many errors in this small sample, probably because "nonsense" and "context" are semantic concepts that require an understanding of meaning, not something you can do heuristically based on word counts and the like. Unless your bot has considerable human-like intelligence it's either going to miss most of the articles that should be tagged, or tag too many that are not speediable, thereby biting newbies and forcing non-operators to chase behind the bot with a broom cleaning up the mistakes. Just to spot check, in the first half of the list or so the deletion tags are in error in at least several cases, including Newton Tracey, Fowler Hollow Run (even the very first version here), here, here, here, and the wikify tags seem to be inappropriate at Nancy Rosalie Milio, Ph.D. and here (some users deleted the tag). That's seven bad tags in a small sample size. Wikidemo (talk) 03:07, 20 February 2008 (UTC)
- Please, tell me what's so "bitey" about general maintenance. "It is a very short article lacking sufficient context to identify the subject of the article" and "It is an article with no content whatsoever" are hardly "bitey". If you want to "bite" a new user you'd tell them to go fuck themselves. Telling them their article isn't up to par with Wikipedia policies/standards isn't bitey. Mønobi 03:35, 20 February 2008 (UTC)
- New users often create articles incrementally. Tagging a page which is in the middle of being constructed is unhelpful. Also, to a new user, a big pink box filled with official-sounding bureaucratic verbiage is intimidating and frightening. Spacepotato (talk) 03:36, 20 February 2008 (UTC)
- I'm sure it'd be easier for Cobi to add a timer :) . That should help with articles being created incrementally. Mønobi 03:56, 20 February 2008 (UTC)
- I disagree with Monobi about what is bitey. Tagging an article for speedy deletion for being too short is bitey. See Ggggggggggggggg12 for a case where a valuable contributor was permanently lost to the project as a result of their page being speedied for being too short. --Coppertwig (talk) 03:42, 20 February 2008 (UTC)
Review
- Okay, so there have been some complaints. I guess it's expected. However, it would be helpful if those users could state what the problems are so they could help Cobi redefine what he needs to look for and what to ignore. Simply saying "Stop the bot!" doesn't help at all. Mønobi 03:08, 20 February 2008 (UTC)
- Without knowing what algorithm the bot uses, it's hard to know what's wrong. However, if the bot thinks that all short articles are db-a1, this is obviously incorrect. Short but valid stubs like "Earth is the third planet from the Sun", "Helium is the second chemical element", or "Fowlers Hollow Run is a small stream in Pennsylvania" (this last one actually occurred and was tagged by the bot) are not eligible for db-nocontext. Spacepotato (talk) 03:34, 20 February 2008 (UTC)
Possible remedies
I have been thinking, would it appease many here if I took the db-nocontext and db-test tags out and replaced them with a category that humans can go through and decide whether or not to tag it?
What I've gathered here is:
- Db-nocontext has too many false positives.
- Db-test, while quite strict, has caused some false positives.
- The speedy deletion notices are bitey when not applied properly.
- There needs to be a delay for adding cleanup tags.
Proposed remedies:
- Replace Db-nocontext with Category:Bot-identified short articles.
- Replace Db-test with Category:Bot-identified possible test pages.
- See above two remedies.
- Delay 1 hour before adding cleanup tags (if still necessary), however, still tag really obvious vandalism on sight.
Please comment on these and, if you see a way these can be improved, let me know.
Thanks. -- Cobi(t|c|b) 04:02, 20 February 2008 (UTC)
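To make the proposed remedies concrete, here is a rough PHP sketch of remedies 1 and 4 (a tracking category instead of a speedy tag, applied only after a delay and only if the page is still unchanged). fetchCurrentRevision() and saveAppend() are assumed API wrappers that are not shown, and the word-count threshold is an assumption.

<?php
// Rough sketch of remedies 1 and 4 above; not ClueBot V's actual code.
// fetchCurrentRevision() and saveAppend() are assumed API wrappers.

const DELAY_SECONDS = 3600;  // remedy 4: wait an hour before any cleanup edit

function enqueue(array &$queue, string $title, int $revid): void
{
    $queue[$title] = array('seen' => time(), 'revid' => $revid);
}

function processQueue(array &$queue): void
{
    foreach (array_keys($queue) as $title) {
        $info = $queue[$title];
        if (time() - $info['seen'] < DELAY_SECONDS) {
            continue;                          // not old enough yet
        }
        unset($queue[$title]);
        $page = fetchCurrentRevision($title);  // assumed helper
        if ($page === null || $page['revid'] !== $info['revid']) {
            continue;                          // deleted or edited since; leave it to humans
        }
        if (str_word_count(strip_tags($page['text'])) < 15) {
            // Remedy 1: a tracking category instead of a speedy tag.
            saveAppend($title,                 // assumed helper
                "\n[[Category:Bot-identified short articles]]",
                'Adding to Category:Bot-identified short articles');
        }
    }
}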
- What kind of follow-up do you propose could be done on remedies 1 and 2?
- If an obvious vandalism page is tagged with no admin available to perform the actual deletion within 5 minutes, the creator removes the tag and nobody notices because no one has that page on his list of contributions for follow-up, then what? Would the bot reinsert the maliciously removed tag by itself? Or would the page remain lingering unnoticed for days or maybe months? --Blanchardb-Me•MyEars•MyMouth-timed 04:24, 20 February 2008 (UTC)
- Well, I don't like the idea of a bot edit-warring over a tag, whether or not the removing user should be removing the tag. I generally take the stance that bots should consider reverts by humans as always right - if a human undoes what a bot does, the bot shouldn't redo it. I could have the bot add another category along with the speedy tag, or have it post to a noticeboard-type page in its userspace when the author removes a speedy tag. If there really is consensus for the bot to aggressively enforce the speedy tag, I can make it do that, too. -- Cobi(t|c|b) 07:31, 20 February 2008 (UTC)
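A sketch of the noticeboard idea Cobi mentions, under the "reverts by humans are always right" rule; getPageText() and appendToPage() are assumed API wrappers, and the noticeboard title is only a placeholder.

<?php
// Sketch of "never edit-war": if anyone removed the bot's tag, do not
// restore it; just note it on a noticeboard page in the bot's userspace.
// getPageText() and appendToPage() are assumed helpers; the noticeboard
// title is a placeholder.

const NOTICEBOARD = 'User:ClueBot V/Removed speedy tags';

function checkTaggedPage(string $title, string $tag): void
{
    $text = getPageText($title);   // assumed helper
    if ($text === null) {
        return;                    // page was deleted: nothing to do
    }
    if (strpos($text, $tag) === false) {
        // A human removed the tag: report it rather than re-adding it.
        appendToPage(NOTICEBOARD,  // assumed helper
            "\n* [[" . $title . "]] - " . $tag . " removed, please review.",
            'Reporting a removed speedy tag');
    }
}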
- Obviously, there would be two more categories to be patrolled. I think that would not be a bad idea. One of the problems of NP patrolling is how to catch things later if they aren't caught immediately -- we have no built-in way of keeping track, though some of us improvise various attempts at this. This is just the sort of thing a bot would be good for. I would suggest, for example, that the bot look for empty articles 24 hours after they were first written, and then place the bot-identified tags. This will catch the ones missed by the humans. Bots are good at that; they don't carelessly skip over articles when they're tired. DGG (talk) 05:14, 20 February 2008 (UTC)
- I am not sure what the point of waiting a day before adding it to a category is. While relatively easy, what is the harm in putting it in a category on creation? A category is much less intrusive than a tag, and even if it is speedied, when it is deleted, it isn't like there is any cleanup for the category. However, if that is the consensus, I can easily do that. -- Cobi(t|c|b) 07:31, 20 February 2008 (UTC)
- Categories will still be visible to new users who have their first page "tagged" in minutes, before they have gotten very far at all. I think it should wait a decent length of time, as new pages will often seem to be substandard, and we ask human taggers to wait in that situation. That doesn't apply to attack-spotting, of course. SamBC(talk) 10:36, 20 February 2008 (UTC)
If anyone thinks it would help, I can write a template which morphs from a polite {{ambox}}-style cleanup tag to a CSD template after an arbitrary time period fairly easily. Happy‑melon 09:58, 20 February 2008 (UTC)
- I think it would be helpful, and had been thinking of doing it myself. For very short articles, for example, I think the relatively innocuous template should say something like "remove this when you make the article longer" (as opposed to the usual "do not delete the speedy template on articles you create yourself.") I was thinking of making a template called "db-slow-a1" and/or "db-slow-a3", using code from db-t3. It would be useful for humans as well as bots to use; we were discussing something like this a few weeks ago at WT:CSD ("Speed of speedies") but didn't realize at that time that such an easy solution existed. If you would go ahead and do something like what you describe I would appreciate it. It could morph after, say, an hour. I suspect it may only actually morph the next time someone views the page, but I argue that that doesn't matter much. --Coppertwig (talk) 13:11, 20 February 2008 (UTC)
- The bot could tag short pages with {{stub}}. Category:Stubs is patrolled regularly by experienced editors who should know the speedy deletion criteria, and if it doesn't meet A1 or A3 then it can be put in the appropriate stub category. We do already have a bot which tags very short uncategorised pages as stubs. Hut 8.5 13:27, 20 February 2008 (UTC)
Here's how I see what would happen. In the case of a true positive {{db-nocontext}} caught by the bot, a human patroller would catch it too and tag it independently of what the bot does about it, effectively making the bot useless. Remember, these are people actively looking for {{db-bio}}'s, who cannot tag without seeing the actual article and who will spot true {{db-nocontext}}'s.
My concern with the bot tagging vandalism pages, aside from possible edit conflicts, is that the vandal could easily undo everything the bot does if he knows about the bot's existence (and it's only a matter of time before a significant number of them do). The difference between new pages patrolled by ClueBot V and the pages that are patrolled by the classic ClueBot is that the latter are, for the most part, on several legitimate editors' watchlists, while new pages are on no one's watchlist. If the bot "discreetly" puts vandalism pages in a special category, what is it that would keep vandals from taking them out of there?
As I stated above, the only speedy deletion criteria that should be handled by bots in any way are G12 (already handled by CorenSearchBot) and G4 (as those are already under someone's radar before they are even created). --Blanchardb-Me•MyEars•MyMouth-timed 11:32, 20 February 2008 (UTC)
- Cobi: I apologize for the tone of my first post on this page. I think you're doing great work. Re delaying one hour: yes, I was just going to suggest that myself.
- If you implement your above suggestions, Cobi, I think that would address my concerns.
- I still think it's better to wait an hour before adding the "bot-identified short article" category, but I don't feel strongly about this as the wording is relatively innocuous. (Why not wait an hour for everything except attack pages and copyvios?)
- Note that very short articles about geographic features are not speediable and not deletable at all, apparently; see discussion of Blofeld of SPECTRE's articles at New pages patrol (also a related AN/I discussion). Therefore I suggest that you exempt from your short-article tagging any articles containing phrases such as "is a river", "is a town", "is a lake", "is a village", etc. I also suggest exempting (from speedy categories other than attack) articles containing the word "school", if you haven't already. ("If controversial, as with schools, list the article at Articles for deletion instead." WP:CSD#A7.)
- Does Cluebot V check the article history? What if someone posts a valid, long article, and someone else then replaces it with a short article or attack page -- would Cluebot V speedy-tag it?
- Just curious: is ClueBot V going to mark some of the pages as patrolled?
- Thanks for listening to our concerns, and thanks for writing an excellent bot. :-) --Coppertwig (talk) 13:42, 20 February 2008 (UTC)
As for proposed remedy #4, I would recommend a 24 hour wait before tagging a page "no content" or "test" or whatnot. There's no rush to delete these, and we want to avoid biting newbies who start with next to nothing, and expand over the next several hours. 24 hours seems reasonable to me for new articles (though not for changes to existing articles). I'm also curious as to your answer to the question below. – Quadell (talk) (random) 15:31, 20 February 2008 (UTC)
- I think a good idea would be to have the bot use a special tag, designed for bots. My idea on what it would mean is kind of halfway between a speedy and a prod. Soxred93 | talk bot 17:30, 20 February 2008 (UTC)
- I was thinking tentatively along similar lines--a bot placing prod tags, with a special designation. But here again, it would take people actively checking these. And prod tags are removable by anyone, even the author. And I see another problem with any of this--people whose articles are marked for deletion by a bot are going to get very angry indeed. I have someone on my page complaining today that the article was marked by a bot when it was just someone going systematically and using a standard form. As it is, people think the process here is hopelessly impersonal. I think that tagging for further examination is fine, if the notice is friendly enough. But before we have a bot placing deletion tags, I think it would need a general discussion and general consensus--not just here, not just at deletion policy. It's a basic change in the way we do things, and I think the ordinary WPedian would think it a change for the worse. The newcomers certainly would. DGG (talk) 19:40, 21 February 2008 (UTC)
History concerns
If this has already been discussed, I apologize, but I couldn't find it above. My concern is that this bot does not seem to evaluate the history of the page. Too often an apparent speedy candidate is recent vandalism overlaying a page that may have been thin but was not speedy-eligible. Does this bot properly confirm that not just the current content but every prior version equally qualified for speedy deletion?
I also have concerns about some of the filters being used to define the trigger. A page with no recognizable English text may be vandalism - or it may be an import of a page from one of our sister wikis that's still pending translation. Those pages can be eligible for deletion after they've been properly translated or transwiki'd but not just at the point of creation.
All in all, I am deeply skeptical of the wisdom of using a bot to evaluate speedies. This is incredibly hard for trained humans. I don't know if it will ever be feasible for a bot. Rossami (talk) 14:29, 20 February 2008 (UTC)
- ClueBot looks for recent changes. ClueBot V looks for new pages. ClueBot V will never analyze an article if it has more than one revision. On the IRC recent change feed (which is near impossible for humans to keep up with, but quite nice for bots), each entry has some flags. One of the flags is N which means that this is a new page creation. ClueBot V only looks at entries marked with that flag. ClueBot only looks at entries not marked with that flag. So to conclude, if it is a page blanking or major reduction, it will be reverted by ClueBot. If it is a new page creation, it will be analyzed by ClueBot V. -- Cobi(t|c|b) 20:46, 20 February 2008 (UTC)
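For readers unfamiliar with the feed, here is a rough sketch of filtering it for the N flag; the line format assumed below (colour codes stripped, space-separated fields) is a simplification and may differ from the live feed in detail.

<?php
// Sketch of filtering the IRC recent-changes feed for new pages.  The
// regular expression assumes a simplified line of the form
//   [[Title]] FLAGS url * user * (size) comment
// after IRC colour codes have been stripped; the real feed may differ.

function stripIrcColours(string $line): string
{
    // Remove mIRC colour and formatting control codes.
    return preg_replace('/\x03\d{0,2}(,\d{1,2})?|[\x02\x0f\x16\x1f]/', '', $line);
}

function parseRcLine(string $line): ?array
{
    $line = stripIrcColours($line);
    if (!preg_match('/^\[\[(.+?)\]\] ([A-Z!]*) /', $line, $m)) {
        return null;
    }
    return array('title' => $m[1], 'flags' => $m[2]);
}

$entry = parseRcLine('[[Example article]] N http://example.org/diff * SomeUser * (+512) created page');
if ($entry !== null && strpos($entry['flags'], 'N') !== false) {
    // New-page creation: hand the title to the new-page analysis code.
    echo 'New page: ' . $entry['title'] . "\n";
}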
- What I would like a bot to do is to catch the ones that have been missed by the humans. If it looked at articles 24 hours old which had no apparent content, sent a reminder to the author that something more than that was needed to call it an article, and put them in a category to be checked by humans, this would be useful. In fact, it would be a good opportunity to start making appropriate time allowances for other sorts of improvable articles. But the most a bot should place on an article is an alert, worded in a positive way--as suggested above. Once we have established the principle that a bot does not place speedy tags and, except for copyvio, does not give warnings, but rather friendly notices, then it would be useful. There's no harm in the first notice about even vandalism being friendly either--if anything more is needed the humans around here are quite capable of doing it. What they are not capable of doing is what a computer program can do--making sure that nothing slips through unlooked at. DGG (talk) 17:26, 20 February 2008 (UTC)
Suggested Changes
- Please feel free to add entries, and I will let you know if it is feasible or not
- Replace Db-nocontext with Category:Bot-identified short articles, and wait 24 hours before adding the category (if still necessary).
- Replace Db-test with Category:Bot-identified possible test pages, and wait 24 hours before adding the category (if still necessary).
- Delay 1 hour before adding cleanup tags (if still necessary), however, still tag really obvious vandalism on sight.
- Create some custom notice templates that are friendlier for the bot to use. (Anyone: Feel free to create them in ClueBot V's userspace)
- Create a noticeboard in the bot's userspace for the bot to post when a user removes speedy tags placed by the bot.
- My suggestions are:
- For other speedy deletion criteria (attack, vandalism), wait 6 hours (not 24) before tagging. Why wait? Because of the bot creator's understandable reluctance to engage in an edit war. Creators of such pages are notorious among newpage patrollers for their predisposition to remove speedy tags. Why 6 and not 24 hours? Chances are that after 12 hours the creator is offline and therefore unable to keep track of speedy deletion tags. Goes with DGG's suggestion that the bot should only tag pages that were missed by humans.
- Add G4 Repost to the list of criteria the bot looks for. --Blanchardb-Me•MyEars•MyMouth-timed 21:35, 20 February 2008 (UTC)
Discussion
- I would still modify 1 & 2 to not tag by placing an actual speedy on the article. Those categories would presumably get active patrol instead, which would do it better--or is the proposal to put some other tag on after the 24 hours?
- I'm not happy about repost, because judging that a repost is identical is not trivial, and, in fact, often disputed: again, tagging possible reposts but not going further would be a good idea.
- I like the 6-hour delay. Most of the vandalism will be removed manually before that, and we will therefore have an opportunity for newcomers to practice their skills by learning when to tag; there are many people who positively enjoy it. DGG (talk) 21:50, 20 February 2008 (UTC)
- In response to your first point, I meant that it wouldn't tag with speedies at all, but it would put the category on after 24 hours ... I guess I need to word it a little better. In response to your second point, I completely agree. Unless the bot is going to be detecting exact copy & paste, it shouldn't be doing this. And storing all of the new pages' current content for several hours would take a lot of resources on my end anyway. And there is no way to get the content of the deleted revisions other than storing them when they are created on my end, or taking this bot to RfA (no, I am not going to do that). -- Cobi(t|c|b) 21:55, 20 February 2008 (UTC)
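(For what it's worth, one way the storage cost mentioned above might be reduced is to keep only a normalised hash of each new page's text rather than the full text; this only catches exact or near-exact copy & paste. A rough sketch follows, with the normalisation and the in-memory storage as assumptions; a real bot would need to persist and prune this data.)

<?php
// Rough sketch of exact-repost detection by hash: store a normalised hash
// of each new page instead of its full text.  The normalisation, retention
// policy, and in-memory array are assumptions for illustration only.

function normalisedHash(string $text): string
{
    $text = strtolower(trim(preg_replace('/\s+/', ' ', strip_tags($text))));
    return sha1($text);
}

// $seen maps title => list of hashes of versions previously created there.
function isExactRepost(array &$seen, string $title, string $text): bool
{
    $hash = normalisedHash($text);
    if (isset($seen[$title]) && in_array($hash, $seen[$title], true)) {
        return true;   // identical (after normalisation) to an earlier creation
    }
    $seen[$title][] = $hash;
    return false;
}

$seen = array();
isExactRepost($seen, 'Example', 'Some deleted article text.');             // false
var_dump(isExactRepost($seen, 'Example', "Some  deleted\narticle text.")); // bool(true)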
- I understand your comment about G4. I mentioned this because it is perhaps the hardest speedy criterion to spot for a human patroller. I guess if G4 tagging is to be done by a bot, it would have to be a bot that does only that. --Blanchardb-Me•MyEars•MyMouth-timed 01:14, 21 February 2008 (UTC)
- Idea: Use custom speedy tags so people know when Cluebot has tagged it. I'll make some in my userspace and come back when I'm done. CWii(Talk|Contribs) 12:22, 21 February 2008 (UTC)
- User:Compwhizii/Speedys/Db-meta - There, I like that. What does everyone else think? CWii(Talk|Contribs) 12:48, 21 February 2008 (UTC)
- For pages that are tagged for being too short or lacking context, it doesn't make sense to say "do not remove this tag from pages you created yourself." I'm planning to suggest a change to the standard speedy templates along those lines. It should add "unless you also make the article longer" or something. For attack pages etc., though, it can still say "do not remove this tag".
- I think the wording could be made softer. How about "has been tentatively tagged", for example? It may be a good idea to use a different colour from the standard speedy templates. Remember that while admins are supposed to check things before deleting speedy-tagged articles, some of them at times have gone through speedy backlogs rapidly with little or no checking. I have no idea how common this is, but speedy-tagging articles should not be done lightly. Making it a different colour would help alert admins that there's something different about this one.
- By the way, here's another clue for ClueBot if you haven't thought of it already, Cobi: a very short article containing the word "genus" or the phrase "of the genus" is probably a species stub, and should (in my opinion at least) not be speedied but could be tagged as a species stub or bot-identified possible species stub or something. There may be other keywords for species stubs, e.g. "species": I'm not sure.
- Just to clarify: Cobi, if ClueBot V finds a very short article and comes back to tag it an hour later, does it check whether the article has been edited meanwhile? What if, for example, the article has been lengthened, then reverted to the very short version? --Coppertwig (talk) 13:30, 21 February 2008 (UTC)
(several e/c) I assume you meant to link to WP:BOT not WP:BPT, Compwhizii!! I like this idea, but rather than create an entirely new set of CSD templates, I suggest we just add parameters to the existing ones. I have added a |bot= parameter to {{db-meta}}, so it can incorporate the bot notices - of course the precise nature of the change is negotiable. I've also made a change to {{db-g10}} to show how it works: if people like it, it can be easily applied to the other templates:
{{db-g10|bot=ClueBot V}}
Produces the standard {{db-g10}} attack-page speedy deletion notice, prefaced with "This page was tagged by ClueBot V" and with an added instruction for administrators to check links, history (last), and logs before deletion, as the page was tagged by a bot, and not to quote any disparaging content in the deletion log entry.
Comments? Happy‑melon 13:39, 21 February 2008 (UTC)
- (By the way, as you're aware, happy-melon, the wording of db-meta is soon planned to be changed, to delete the "The reason given is:" part; see Template:Speedybase, discussions at Wikipedia talk:Criteria for speedy deletion/Archive 28#Suggested new wordings for CSD templates general, articles, images and other. Probably quite compatible with these other changes.) --Coppertwig (talk) 13:57, 21 February 2008 (UTC)
- Can the above example be serious--a bot automatically placing tags for the most sensitive category of all, G10, incorporating BLP? DGG (talk) 19:44, 21 February 2008 (UTC)
- Read the discussion above. As Cobi says, "Well, the bot uses {{db-attack}} on pages that say something along the lines of (sorry for the profanity): "FUCK YOU" or "FUCK WIKIPEDIA" or a number of other common phrases like that. I can't really see an instance where something like that would seriously not be deleted. -- Cobi" As for my opinion of the bot, I think that, given the improvements listed above and the use of its own easily identifiable templates, it should be allowed. It's providing the sort of safety net we currently lack, ensuring that nothing non-controversially speedily deletable slips through the cracks.--Dycedarg ж 22:06, 21 February 2008 (UTC)
- I've never seen an article of that type last more than a few minutes on WP. We humans handle this very well. I suppose a checkup with a very limited heuristic is possible, but I see no evidence that there is actually a problem. DGG (talk) 01:26, 22 February 2008 (UTC)
- Just because vandalism of this nature is currently kept in check by the army of dedicated RCPatrollers doesn't mean they wouldn't welcome a helping hand from an untiring assistant. I just picked {{db-g10}} because it has been discussed above: if people like it I'll make the necessary changes to all the CSD templates. What adding |bot=ClueBot V actually does can be changed centrally at {{db-meta}}, so if anyone has any suggestions for a different effect feel free to air them. Happy‑melon 12:54, 22 February 2008 (UTC)
- Before making suggestions on the look I'd like to understand better how it's going to be used. For articles being tagged for being too short, is the proposal to put such a template on immediately, or after an hour? After an hour would be OK (if it checks whether the page has been edited meanwhile). Again, for attack pages and copyvios, immediately is OK, too. --Coppertwig (talk) 20:49, 23 February 2008 (UTC)
One thing the bot should take into account is users going into a (sometimes bot-assisted) mass-creation of stub articles. In the past few weeks, the people at WikiProject French communes have been extremely busy mass-creating stubs, to the point that at times they literally dominate the New Pages list. And all of their articles are valid stubs. I do not think it would be constructive to tag each and every one of these articles even with a cleanup tag (that would just overwhelm the cleanup department). Personally, when I am on newpage patrol and I see such a mass-creator at work, I ignore his entries. There could be a way for the bot to identify non-admin mass-creators so that they are left alone once identified. --Blanchardb-Me•MyEars•MyMouth-timed 21:02, 23 February 2008 (UTC)
- Easy. The bot can keep track of how many articles it's seen that are created by each user. Whenever it sees a large number of articles by one user, it can stop tagging them and issue some sort of alert message to some log or something (or hope that other people will just notice the large number of pages being created and be smart enough to do something about them.) Such sets of articles need to be considered and decided on as a group, not tagged individually. --Coppertwig (talk) 21:41, 23 February 2008 (UTC)
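A rough sketch of the mass-creation safeguard described above; the threshold and time window are assumptions, and a real implementation would also want some way of reporting the user for human review.

<?php
// Sketch of a mass-creation safeguard: count recent page creations per user
// and stop tagging once a user crosses a threshold.  The threshold and
// window are assumptions for illustration.

const MASS_CREATE_THRESHOLD = 20;    // this many creations ...
const MASS_CREATE_WINDOW    = 3600;  // ... within the last hour

function isMassCreator(array &$creations, string $user): bool
{
    $now = time();
    $creations[$user][] = $now;
    // Keep only creations inside the window.
    $creations[$user] = array_values(array_filter(
        $creations[$user],
        function ($t) use ($now) { return $now - $t <= MASS_CREATE_WINDOW; }
    ));
    return count($creations[$user]) >= MASS_CREATE_THRESHOLD;
}

$creations = array();
if (isMassCreator($creations, 'ExampleUser')) {
    // Skip tagging and leave the user's pages for human patrollers.
    echo "ExampleUser appears to be mass-creating pages; skipping their pages.\n";
}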
- Comment I kinda agree with Blanchardb's suggested changes above, mainly those dealing with the delays in adding the tags and with adding tags which don't necessarily place pages in the normal speedy categories. Q T C 09:31, 1 March 2008 (UTC)
- Status of this request? --uǝʌǝsʎʇɹnoɟʇs(st47) 10:25, 18 March 2008 (UTC)
- Indeed, I'd like to get this going one direction or the other. Cobi, if the bot's functions have changed greatly from last trial, I'm willing to do another trial (it *sounds* like they have). One suggestion -- Cluebot should probably skip articles with stub tags... Let's get this one moving again. SQLQuery me! 04:21, 20 March 2008 (UTC)
A user has requested the attention of the operator. Once the user has seen this message and replied, please remove this tag. Status? — Werdna talk 13:55, 4 April 2008 (UTC)
Request Expired. - I'll resubmit later. -- Cobi(t|c|b) 06:17, 22 April 2008 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.