Wikipedia:Bots/Requests for approval/HBC archive builderbot
From Wikipedia, the free encyclopedia
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
HBC archive builderbot
Operator: HighInBC
Automatic or Manually Assisted: Automatic, unsupervised
Programming Language(s): Perl
Function Summary: This bot can create archives retroactively using the history of a page.
Example: Finds old and new removals of username discussions at WP:RFCN, then builds and maintains an archive of links to them. See User:HBC archive builderbot/sandbox.
Edit period(s) (e.g. Continuous, daily, one time run): Twice daily. Each run takes less than 1 minute, except the first run, which downloads the history and takes about 10 minutes.
Edit rate requested: 1 post per job per run (no more than 6 posts per minute), 2 runs per day
Already has a bot flag (Y/N):
Function Details: This bot is written in Perl. It uses the Algorithm::Diff module to compare each revision of a page with the next one. If it detects that a heading was removed, it treats the removal as the archiving of a discussion. It uses the revision number, the edit summary, the user who made the edit, and the text of the heading to build an archive entry.
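The heading-removal check can be sketched as follows. This is a minimal illustration, not the bot's source (which is Perl, at User:HBC archive builderbot/source). It assumes Python's difflib.SequenceMatcher as a stand-in for Algorithm::Diff and a regular expression for level 2-6 wiki headings. The names Revision, ArchiveEntry, heading_of, removed_headings and build_archive are hypothetical.

```python
import difflib
import re
from dataclasses import dataclass

# Assumption: a section heading is a whole line like "== Foo ==" (levels 2-6).
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


@dataclass
class Revision:          # hypothetical container for one revision
    revid: int           # revision number
    user: str            # user who made the edit
    summary: str         # edit summary
    text: str            # full wikitext of the page at this revision


@dataclass
class ArchiveEntry:      # hypothetical archive record
    revid: int
    user: str
    summary: str
    heading: str


def heading_of(line):
    """Return the heading text if the line is a section heading, else None."""
    m = HEADING_RE.match(line)
    return m.group(2) if m else None


def removed_headings(old_text, new_text):
    """Headings found on lines that the line diff deletes or replaces."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    removed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag not in ("delete", "replace"):
            continue
        # A heading re-added verbatim in the same hunk is not counted as removed.
        readded = {h for h in map(heading_of, new_lines[j1:j2]) if h}
        for line in old_lines[i1:i2]:
            h = heading_of(line)
            if h and h not in readded:
                removed.append(h)
    return removed


def build_archive(revisions):
    """Compare each revision with the next; each removed heading becomes an entry."""
    entries = []
    for prev, cur in zip(revisions, revisions[1:]):
        for heading in removed_headings(prev.text, cur.text):
            entries.append(ArchiveEntry(cur.revid, cur.user, cur.summary, heading))
    return entries
```

Each entry is credited to the later revision, because that is the edit that removed the heading.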
The revision history itself is gathered using Special:Export and a caching system I wrote that ensures only new revisions are downloaded. The first run takes 10-15 minutes to populate a cache of about 2600 revisions (about 35 MB for WP:RFCN in this case). Subsequent runs take only moments because they load only the new revisions. In the future, I plan to have my caching routine gather what it can from a local copy of the most recent database dump, which should greatly reduce server load.
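Below is a sketch of the incremental cache idea, in which only revisions newer than the newest cached one are requested. It is not the bot's own caching code. The callable fetch_after is hypothetical and stands in for the Special:Export download. Storing the cache as a JSON file of revision dicts is also an assumption.

```python
import json
from pathlib import Path


def update_cache(cache_path, fetch_after):
    """Load the on-disk cache, download only newer revisions, save, and return.

    fetch_after(last_revid) is a HYPOTHETICAL callable standing in for the
    Special:Export download; it should return revisions newer than last_revid
    as dicts with at least a "revid" key.
    """
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else []
    last_revid = cache[-1]["revid"] if cache else 0
    # Keep only genuinely new revisions, ascending, so the cache stays sorted.
    new = sorted((r for r in fetch_after(last_revid) if r["revid"] > last_revid),
                 key=lambda r: r["revid"])
    cache.extend(new)
    path.write_text(json.dumps(cache))
    return cache, len(new)
```

On the first run `last_revid` is 0 and the whole history is fetched. On later runs only the revisions made since the previous run are fetched.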
In testing, I found that the diff module could analyze over 2600 diffs in less than 3 seconds, which is very fast.
The program will most likely run twice daily. A demonstration of this bot's output can be found at User:HBC archive builderbot/sandbox, and the current source code can be found at User:HBC archive builderbot/source.
Discussion
This bot is mostly written; I just have to alter it to write to the wiki instead of dumping the output. HighInBC (Need help? Ask me) 22:20, 2 February 2007 (UTC)
I have already discussed this with the community, and have their support for at least one application of this bot: Wikipedia_talk:Requests_for_comment#Archiving_username_discussion. HighInBC (Need help? Ask me) 00:54, 3 February 2007 (UTC)
I am just realizing that this routine is very powerful at generating archives of many pages, including user talk pages (User:HighInBC/sandbox). I may wish to offer this as a retroactive archiver for any page (in other words, not limit my scope to RFCN). I would most likely use an opt-in system similar to User:HighInBCBot. HighInBC (Need help? Ask me) 03:56, 3 February 2007 (UTC)
Is there any more information this request needs? Would it get a faster response if it helped WP:AIV somehow(chuckle)? HighInBC (Need help? Ask me) 17:10, 4 February 2007 (UTC)
- Can you be explicit about what the read-rate is? I have at least a folk memory of a 1/s limit someplace: this sounds as if you plan on it being significantly faster, at least in the initial run (or always?). Alai 10:23, 5 February 2007 (UTC)
The program will use Special:Export to load 100 revisions in one pass, then wait about 5 seconds before loading the next batch. It will, of course, load each revision only once. HighInBC (Need help? Ask me) 13:07, 5 February 2007 (UTC)
- Ah, I missed that, should have read more carefully. Thanks for clarifying (and supplying the exact rate). Alai 13:41, 5 February 2007 (UTC)
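The read throttle described above (100 revisions per Special:Export pass, then a pause of about 5 seconds) might be sketched like this. The callable fetch_batch(offset, limit) is hypothetical and wraps one export request. Treating a short batch as the end of the history is an assumption. The `sleep` parameter is injectable so the pacing can be checked without waiting.

```python
import time

BATCH_SIZE = 100     # revisions per Special:Export request (from the discussion)
PAUSE_SECONDS = 5    # wait between requests (from the discussion)


def fetch_history(fetch_batch, start_offset, sleep=time.sleep,
                  batch_size=BATCH_SIZE, pause=PAUSE_SECONDS):
    """Download a page history in fixed-size batches with a pause between them.

    fetch_batch(offset, limit) is a HYPOTHETICAL callable wrapping one
    Special:Export request; it returns (revisions, next_offset). A batch
    shorter than batch_size is taken to mean the history is exhausted.
    """
    revisions = []
    offset = start_offset
    first = True
    while True:
        if not first:
            sleep(pause)  # throttle: never issue two requests back to back
        first = False
        batch, offset = fetch_batch(offset, batch_size)
        revisions.extend(batch)
        if len(batch) < batch_size:
            return revisions
```

At this batch size, a history of about 2600 revisions such as WP:RFCN's needs roughly 26 requests, each separated from the last by the pause.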
In deference to server load, I could put a cap on how many revisions it loads each run. If it is running twice a day, it could limit itself to X revisions, then write a partial report if the number of revisions exceeds X. The next time it runs, it will get the next X revisions. If the bot downloads 1000 revisions (around 10-15 MB) each run, it will catch up with most pages in just a few days, then stay caught up by downloading the few revisions that appeared in the previous half day. HighInBC (Need help? Ask me) 20:50, 7 February 2007 (UTC)
This feature has been implemented: [1], [2]. HighInBC (Need help? Ask me) 21:53, 7 February 2007 (UTC)
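The per-run cap can be illustrated with a small sketch. The helper names plan_run and runs_to_catch_up are hypothetical and do not come from the bot's source. With X = 1000 and the roughly 2600-revision WP:RFCN history mentioned in the request, the backlog clears in 3 runs, or 1.5 days at two runs per day.

```python
def plan_run(pending, cap):
    """Split the pending revisions into this run's share and the remainder.

    Returns (this_run, remaining, partial); partial is True when revisions
    are left over, meaning only a partial report should be written.
    """
    this_run = pending[:cap]
    remaining = pending[cap:]
    return this_run, remaining, bool(remaining)


def runs_to_catch_up(backlog, cap):
    """Number of capped runs needed to download a backlog (ceiling division)."""
    return -(-backlog // cap)
```

Once the backlog is gone, each run falls under the cap and `partial` is False, which matches the "stay caught up" case described above.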
I am wondering what the status of this bot's approval is. Is there any information I can provide to expedite it? HighInBC (Need help? Ask me) 01:19, 10 February 2007 (UTC)
- I can't really quite figure out what this bot does exactly, but let's call it Approved for trial. Make 50-100 edits, and report back here with diffs. Hopefully the diffs will clarify things more than words :-) —METS501 (talk) 19:03, 10 February 2007 (UTC)
Approved. Sorry about this request getting "lost" for a while, but you've been running the bot anyway. This bot shall run with a flag. —METS501 (talk) 15:58, 17 March 2007 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.