Wikipedia:Bots/Requests for approval/DieBucheBot
DieBucheBot
Operator: User:DieBuche
Automatic or Manually Assisted: Automatic, but supervised
Programming Language(s): pywikipedia framework
Function Summary: Performs image replacements, mainly swapping superseded pictures from Commons for their newer versions
Edit period(s) (e.g. Continuous, daily, one time run): 3-4 times a week
Edit rate requested: 4-5 edits per minute
Already has a bot flag (Y/N): No
Function Details: This bot replaces uses of pictures that have been marked as "superseded" on Commons [1] with the new picture.
Discussion
- Will this bot work on commons, on en-wiki, or on both? Is the idea here that you type in the name of a superseded image, and the name of the better image, and all pages using the old image on en-wiki will be updated to link to the other image? Gimmetrow 00:41, 20 March 2007 (UTC)
- It works only on en-wiki. I tell it which images have to be replaced; it doesn't find them automatically. --DieBuche 13:23, 20 March 2007 (UTC)
- How do you get the pages on which to replace it? All pages or only those in article namespace? If you're using python regexp to replace the image links, how do you plan to deal with variations in use of underscores and spaces? (There are a couple ways to handle this.) Gimmetrow 19:17, 21 March 2007 (UTC)
- I didn't think about this. Can you tell me how to do it? --DieBuche 19:58, 21 March 2007 (UTC)
- Well, the simple solution is to convert all underscores to spaces, then search for the image name with spaces. This is convenient (doesn't even need a regexp) but will change every underscore on the page. I'm pretty sure some tools do this, and it's usually OK. Or you could make a bit more complex regexp to only convert underscores between sets of double brackets. Gimmetrow 22:45, 21 March 2007 (UTC)
- This is overall a good idea, but there is no reason to change the entire page at all. Perhaps this is what you meant, but let me explain. It is a matter of retrieving the page, converting a local copy from underscores -> spaces and upper case -> lower case, finding the strings that match in that copy and remembering their indices. Next, perform the replacement at those indices on a different copy that has not had the conversions applied. Then submit the latter copy when finished doing the find and replace. The alternative is to do this with a complex regexp; it just depends on personal preference. -- RM 12:19, 22 March 2007 (UTC)
- "Remember their indices" is a little tricky if it's using regexp or replace. There are some tool(s) which leave the underscore->space replacement in the article. Gimmetrow 13:02, 22 March 2007 (UTC)
- Oh, I understand that. That's why I said regexp is a separate solution (and probably the best one) from the one I described. Doing article processing would require custom code to perform the scanning, finding, and replacing. My overall point is that the tools may not be good enough, since doing a global underscore->space replacement is a bad idea, as it could remove legitimate underscores. That just makes this a little more difficult to pull off correctly. But I won't approve a bot that has a known risk of adding errors, so the work has to be done to make it work correctly. -- RM 13:25, 22 March 2007 (UTC)
- I thought it could be done without regexp, but now I know that I have to use one. --DieBuche 15:55, 22 March 2007 (UTC)
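The index-based approach RM describes above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the bot's actual code: the helper names `normalize` and `replace_image` are hypothetical, the page text is passed in as a plain string rather than fetched through pywikipedia, and the sketch matches the file name anywhere in the text rather than only inside image links. The normalization is kept length-preserving so that positions found in the converted copy are valid in the untouched original.

```python
def normalize(text):
    """Length-preserving copy: underscores -> spaces, upper -> lower case."""
    out = []
    for ch in text:
        if ch == "_":
            out.append(" ")
        else:
            low = ch.lower()
            # A few characters change length when lowercased; keep those
            # unchanged so indices still line up with the original text.
            out.append(low if len(low) == 1 else ch)
    return "".join(out)


def replace_image(page_text, old_name, new_name):
    """Find old_name in a normalized copy, replace it in the original."""
    shadow = normalize(page_text)   # converted local copy, only searched
    needle = normalize(old_name)
    if not needle:
        return page_text
    pieces = []
    pos = 0
    while True:
        hit = shadow.find(needle, pos)
        if hit == -1:
            break
        # Copy the untouched original up to the match, then the new name.
        pieces.append(page_text[pos:hit])
        pieces.append(new_name)
        pos = hit + len(needle)
    pieces.append(page_text[pos:])
    return "".join(pieces)
```

Because only the matched spans are rewritten, underscores elsewhere on the page (for example in template parameters) are left alone, which is the risk RM raises about a global underscore-to-space conversion.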
- How about this: enter the image name with spaces, and generate a second form of the image name with underscores in place of the spaces. Then go through all pages which link to the image, and replace any instance of either the all-space or the all-underscore version with the updated image name. When all pages are done, have the bot tell you if any pages still link to the image, so you can do those by hand. There will be no errors in the articles, and only rarely will you need to do anything by hand. Would that work? Gimmetrow 19:38, 22 March 2007 (UTC)
- Sounds good --DieBuche 14:44, 23 March 2007 (UTC)
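Gimmetrow's proposal of replacing only the all-space and all-underscore forms, then reporting what is left, might look like the following minimal sketch. It makes several assumptions: `replace_exact_forms` is a hypothetical helper, the pages linking to the image are represented by a plain dict of title to wikitext instead of being fetched through pywikipedia, and "still links to the image" is approximated by a text check for mixed space/underscore spellings. A real run would more likely re-query the list of pages that link to the image.

```python
def replace_exact_forms(pages, old_name, new_name):
    """Replace the all-space and all-underscore forms of old_name.

    pages maps page title -> wikitext (a stand-in for the linking pages).
    Returns (updated_pages, titles_still_needing_manual_work).
    """
    spaced = old_name.replace("_", " ")
    underscored = old_name.replace(" ", "_")
    updated = {}
    leftover = []
    for title, text in pages.items():
        new_text = text.replace(spaced, new_name).replace(underscored, new_name)
        updated[title] = new_text
        # Approximate "still links": the old name is still present once
        # underscores are read as spaces (i.e. a mixed spelling was used).
        if spaced in new_text.replace("_", " "):
            leftover.append(title)
    return updated, leftover
```

Pages returned in the second list are the ones to fix by hand; nothing else on a page is altered, so no stray underscores are touched.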
- What if some are spaces and some are underscores? The best way to do it is this: get the page text, then perform a regex replacement on it, where the search pattern is built from the command-line input with every space replaced by [ _], the regex code for "space or underscore". For example, here's some C# code for what I'm saying (sorry, I'm not good with Python):
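Console.WriteLine("Enter image name:");
string regex = Console.ReadLine();
// Accept either case for the first letter, and a space or an underscore wherever the name has a space.
regex = "[Ii]mage:" + "[" + char.ToUpper(regex[0]) + char.ToLower(regex[0]) + "]" + regex.Remove(0, 1).Replace(" ", "[ _]");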
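Since the bot is written with the pywikipedia framework, a rough Python counterpart of the C# snippet above may be useful. It is a sketch under assumptions, not part of the original proposal: `image_pattern` and `replace_image_links` are hypothetical names, only the standard `re` module is used, and, unlike the C# version, the file name is passed through `re.escape` so that characters such as "." or "(" are matched literally.

```python
import re


def image_pattern(name):
    """Build a regex for an image link, tolerant of spaces vs underscores."""
    name = name.strip().replace("_", " ")
    if not name:
        raise ValueError("empty image name")
    first, rest = name[0], name[1:]
    upper, lower = first.upper(), first.lower()
    if upper != lower and len(upper) == 1 and len(lower) == 1:
        # Either case is accepted for the first letter only.
        head = "[" + re.escape(upper) + re.escape(lower) + "]"
    else:
        head = re.escape(first)
    # Escape each word, then let every gap match a space or an underscore.
    tail = "[ _]".join(re.escape(part) for part in rest.split(" "))
    return re.compile("[Ii]mage:" + head + tail)


def replace_image_links(text, old_name, new_name):
    """Replace every matching link target with Image:<new_name>."""
    # A function is used as the replacement so that backslashes in
    # new_name are not interpreted as regex escapes.
    return image_pattern(old_name).sub(lambda m: "Image:" + new_name, text)
```

For example, `replace_image_links(text, "Old map (1).png", "New map.svg")` would rewrite `[[image:old_map (1).png|thumb]]` but leave a different file such as `Old mapX(1).png` untouched.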
Question: Can you give an example of images that you will be replacing? Not all images currently carrying the superseded tag should actually be replaced (if they should, we could have just asked User:Orgullomoore to change User:CommonsDelinker's replace.py function). For example, not all PNGs that have been superseded by SVGs should be replaced (and even if they are replaced, they probably shouldn't be deleted), and running this could cause *a lot* of problems for Commons. I suggest you request bot approval on Commons as well, since this is technically representing the Commons community. Yonatan talk 14:01, 7 April 2007 (UTC)