User talk:HBC Archive Indexerbot

To-do list for User:HBC Archive Indexerbot:
  • Try to figure out how to split up huge indices to avoid timeouts
    • Multiple pages, transcluded onto one?
    • Probably need to have multiple actual pages - otherwise it's just unmanageable. Will probably need new target syntax. split=alpha, segments=A-C,D-G,H-L,etc?
  • Address scalability of cache (filename hashing or something)
  • Make cache more efficient by caching compiled objects (%index hash bits)
  • Allow indexing of talk page if opt-in is on a different page (like in a transcluded header)
  • Handle month/year archives (maybe an option to follow links on the main page to find the archives?)
    • Partially possible by specifying individual single-page masks
  • Option to include sub-headings (i.e. ===Example===) in the index
    • Vote for this from Jam
  • Capability to handle characters such as é in the page titles
  • Option required for something like %%subst%% that expands to subst: (to allow a template to contain ParserFunctions that subst on save)
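
As a rough illustration of the split=alpha idea in the to-do list above, the Python sketch below buckets section titles into letter ranges such as A-C and D-G. This is not how the bot works today; `parse_segments` and `bucket_titles` are hypothetical names, and the `segments` syntax is only the one floated in the to-do item.

```python
from collections import defaultdict

def parse_segments(spec):
    """Turn 'A-C,D-G' into [('A', 'C'), ('D', 'G')]."""
    ranges = []
    for part in spec.split(","):
        lo, hi = part.strip().upper().split("-")
        ranges.append((lo, hi))
    return ranges

def bucket_titles(titles, spec):
    """Group titles by the letter range their first character falls in.

    Titles that match no range (digits, punctuation, non-Latin letters)
    go into an 'other' bucket so nothing is silently dropped.
    """
    ranges = parse_segments(spec)
    buckets = defaultdict(list)
    for title in titles:
        first = title.strip()[:1].upper()
        for lo, hi in ranges:
            if first and lo <= first <= hi:
                buckets[f"{lo}-{hi}"].append(title)
                break
        else:
            buckets["other"].append(title)
    return dict(buckets)

# Hypothetical usage: each bucket would become its own index page.
pages = bucket_titles(
    ["Formatting Idea", "ANI", "Target", "missed index"],
    "A-C,D-G,H-L,M-Z",
)
```

Each bucket could then be written to a separate index page, which keeps any single page small enough to avoid the timeouts mentioned above.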

Formatting Comment

I know the formatting sucks and the sorting isn't working; I will fix that when I have time. If you have a suggestion for the formatting, go ahead and make it. As for the sorting, I thought I had it working, but I will have to fiddle some more later. HighInBC (Need help? Ask me) 16:52, 19 January 2007 (UTC)

Formatting Idea

How about a table for formatting? Without taking the time to edit every single entry, I did a couple of rows from the WWII index at User:Krellis/Sandbox. You'll note I added a feature request there, too - I'm not sure if it's really possible, but it would be neat to try to include an estimate of the number of replies in a given section. One way to try to approximate it would be the number of times "UTC" appears in the section - not perfect, by any means, but it might work. Or you could go by changes in indentation level. As long as you indicate somewhere that it's approximate, that probably wouldn't matter too much. Of course, I have no idea how difficult it would be to actually implement :) The table with alternating row backgrounds still might be at least somewhat of an improvement from the current style, since it will keep all of the links together in a row and help prevent items from running together. —Krellis 23:33, 2 February 2007 (UTC)

I was hoping that someone would suggest a better format; I will implement that. As for how many replies, this is difficult. I suppose I can count the timestamps... I will work on it. HighInBC (Need help? Ask me) 23:36, 2 February 2007 (UTC)
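
The timestamp-counting approach discussed in this thread might look roughly like the Python sketch below. It is only an approximation, as noted above: the regular expression assumes the standard signature format (for example "23:33, 2 February 2007 (UTC)"), and treating the first signed comment as the opening post rather than a reply is an assumption. `estimate_replies` is a hypothetical name, not the bot's actual code.

```python
import re

# Matches standard signature timestamps such as "23:33, 2 February 2007 (UTC)".
TIMESTAMP = re.compile(r"\d{1,2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")

def estimate_replies(section_text):
    """Approximate the reply count as (signed comments - 1), never below 0."""
    comments = len(TIMESTAMP.findall(section_text))
    return max(comments - 1, 0)

sample = (
    "How about a table for formatting? Krellis 23:33, 2 February 2007 (UTC)\n"
    "I will work on it. HighInBC 23:36, 2 February 2007 (UTC)\n"
)
reply_count = estimate_replies(sample)
```

Unsigned comments and non-standard timestamps would be missed, so any column built this way should be labelled as approximate.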

Target

Are there any specifications on what the target can be? I am trying to put it on Talk:Aang/Archive Topics. Would Aang/Archive Topics work too? If so, which one would be better? —Preceding unsigned comment added by Parent5446 (talk • contribs) 21:35, November 18, 2007 UTC

The bot doesn't care - either of those would be fine, as long as the opt-in exists. Talk:Aang/Archive Topics would be the more "standard" way of doing it, though - because the archive refers to talk topics, it makes sense for it to be in the talk space. In fact, I'm not sure subpages are allowed in the article namespace on enwiki, so the latter option may not actually be possible, and should almost certainly be avoided. —Krellis (Talk) 12:34, 19 November 2007 (UTC)

missed index

Hello. The HBCAIB didn't index some of my talk page articles. Can you please fix that for me? Thanks. Sincerely, Sir Intellegence - smartr tahn eaver!!!! 00:19, 30 November 2007 (UTC)

Ummmm... What I meant is that it didn't index all of the sections in my archive(s). Can you please fix that? Thanks for your help! Sincerely, Sir Intellegence - smartr tahn eaver!!!! 00:23, 11 December 2007 (UTC)

Sorry for the delay - it looks like your opt-in had the wrong number of leading zeroes - you're not using any leading zeroes in your actual archive naming, but you had specified 3 for that parameter. I've changed it to 0, so your archives should get indexed properly on the next run of the bot. —Krellis (Talk) 19:55, 31 December 2007 (UTC)
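
To show why a wrong leading-zeros setting makes the bot miss archives, here is a minimal Python sketch of mask expansion. It is not the bot's actual code: `expand_mask` is a hypothetical helper, and it assumes that `leading_zeros` is the number of zeros placed in front of a single-digit archive number and that a mask starting with "/" is relative to the talk page.

```python
def expand_mask(base_page, mask, number, leading_zeros=0):
    """Build the archive page title for a given archive number.

    Assumption: leading_zeros=3 turns archive 2 into "0002",
    while leading_zeros=0 leaves it as "2".
    """
    width = leading_zeros + 1
    name = mask.replace("<#>", str(number).zfill(width))
    if name.startswith("/"):
        # Assumed: a leading slash means "subpage of the talk page".
        name = base_page + name
    return name

# Hypothetical page name, for illustration only.
unpadded = expand_mask("User talk:Example", "/Archive <#>", 2, leading_zeros=0)
padded = expand_mask("User talk:Example", "/Archive <#>", 2, leading_zeros=3)
```

Under these assumptions the two calls produce different titles, so an opt-in that specifies 3 leading zeros would look for pages that do not exist when the real archives are unpadded.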

Sortable Numbers

When the bot creates sortable boxes (like one located here), make the "replies" section's number two-digit, i.e. 01, 05, 11, etc. It would make the sorting so much better. Thanks. Pbroks13 (talk) 05:15, 7 December 2007 (UTC)

Really? It looks to me like the sort works properly with single-digit numbers anyway. If I remember correctly, the sort order is intelligent about numerical sorts, so it should sort properly even without zero padding. Let me know if there's an example where it doesn't actually work - I just checked the example you gave and it looks like a correct numerical sort to me. —Krellis (Talk) 19:58, 31 December 2007 (UTC)

HBCAI not marking edits as bot edits

Hi there! First, let me just commend you on writing an incredibly useful bot! The ability to quickly survey current and past discussion is tremendously useful. • But, of course, I have a question: In my watchlist, most work done by robots is flagged as a "bot edit". There's a little black "b" next to the entry, and bot edits can be shown or hidden. HBCAI's edits don't appear to get marked that way. No "b", and they always show. Is this a known issue? • To try and help out, I went looking for information on how those edits get so marked. Unfortunately, I can't find anything. Not a word.  :( —DragonHawk (talk|hist) 18:58, 31 December 2007 (UTC)

P.S.: Ah-ha! I think I figured it out! listusers shows that User:HBC Archive Indexerbot is not flagged as a bot. All the helpers are, but not that one. • I dunno who to tell about this, though.  :) —DragonHawk (talk|hist) 19:03, 31 December 2007 (UTC)
Yeah, actually, I believe this was intentional - at the very bottom of the bot's request for approval you'll see that Mets501 notes in his approval message that the bot does not need a flag. I think there was some more detailed discussion of this somewhere, though I'm not sure where. As I recall, the thinking was that the bot doesn't do a huge number of edits, and so didn't need a flag, as it wouldn't be flooding recent changes. The bot does have over 100 archive indexes that it's maintaining, so one could possibly argue that it does more edits now than it used to do - I'd certainly have no problem with it getting a flag, it's just something we would need to poke the bot approvals group and/or a bureaucrat about. I'm not 100% sure of the best place to raise it, though - either WT:BAG or the bot owner's noticeboard seem most logical to me. —Krellis (Talk) 19:51, 31 December 2007 (UTC)
Ahhh, I see. I decided to mention it over at WP:BOWN (here). I guess it doesn't really matter to me that much -- I can always just unwatch the index pages -- but I still think it's worth reexamining. • Thanks again! —DragonHawk (talk|hist) 21:32, 31 December 2007 (UTC)

ANI

Since an index of the hundreds of archive pages of the administrators noticeboard and ANI is virtually unusable (my browser has trouble even attempting to sort the table) do you think a combined index could be made of the last half-dozen or so archives of AN and ANI? —Random832 17:37, 26 February 2008 (UTC)

Can you help me with the set-up?

I'm trying to set up the bot to index the (as yet non-existent) archives for WP:MUSEUMS. I created the target but I'm a little lost on masks and stuff. Do I have to do numerical archives or can it be monthly? I ask because I'm not sure how to set up numericals, but I can c/p the source for the monthly archive from my own talk, which I set up for MiszaBot. I'm aiming for a 30 day archive time. Thanks! TRAVELLINGCARI (My story • Tell me yours) 18:00, 21 March 2008 (UTC)

At the moment monthly archives are only supported manually - that is, each time a new archive page is created, you have to add it to the archive index opt-in code. If you take a look at the source of my talk, you'll see both archiving to numbered archives and the archive indexerbot opt-in:
{{User:MiszaBot/config
|maxarchivesize = 75K
|counter = 2
|algo = old(30d)
|archive = User talk:Krellis/Archive/Archive %(counter)d
}}

{{User:HBC Archive Indexerbot/OptIn
|target=./Archive
|mask=/Archive/Archive <#>
|leading_zeros=0
|indexhere=yes
|template=User:Krellis/archive template
}}
 
The MiszaBot config is very similar to what you're using, I expect, except that it's using a numeric counter instead of monthly - just set it to start at 1 and replace the path to the talk page, of course. Roughly the same would apply for the archive indexerbot opt-in (and you'd probably just remove the template= part entirely and just use the default template to start).
If you'd like, I can probably just set it up for you, as long as you don't mind numbered archives rather than named by month. Let me know! —Krellis (Talk) 20:11, 21 March 2008 (UTC)

Help regarding index numbering and interwiki links

For some reason the bot is linking to a non-existent article here, even though I have my archives on the page User_talk:Daedalus969/Archives/1. For now, I have manually fixed the links and undone the bot's edit, since it messes them up. What is wrong here? I can't get it to work.— dαlusquick link / Improve 04:47, 2 April 2008 (UTC)

Duplicate index entries from page moves?

I recently had the AIDS talk sub-pages moved into sequential archives (Talk:AIDS/NPOV dispute → Talk:AIDS/Archive 6, etc.). It looked to have been properly done, but now the indexer bot is creating duplicate entries for all pages that were part of the move (6 and up, current archive). Is there something that needs to be adjusted that is causing the misfire? -Optigan13 (talk) 04:47, 5 May 2008 (UTC)

Replied on my talk. —Krellis (Talk) 15:31, 9 June 2008 (UTC)

Sortable durations

In the current setup, the "sortable" table cannot possibly sort "durations" in any usable way. It has no easy way to tell that "9 minutes" is shorter than "5 hours and 29 minutes", which in turn is shorter than "2 days and 9 hours". My impression is that users don't care about topics with minutes-length durations, so I guess it would be best to use a constant zero-padded days+hours format, respectively: "00 days 00 hours", "00 days 05 hours", "02 days 09 hours". Could you please consider this? --Kubanczyk (talk) 10:42, 8 June 2008 (UTC)

Oh, an example of this is Wikipedia talk:Attribution/Archive index. --Kubanczyk (talk) 10:44, 8 June 2008 (UTC)

There is a %%durationsecs%% replacement string available when creating custom templates - this string will expand to the duration in seconds, which is safely sortable. You can use some of the advice at Help:Sorting for making an invisible sort key - specifically, making your duration column look something like this should do the trick: <span style="display: none;">%%durationsecs%%</span> %%duration%%. I don't have an example of it right now (though I just changed my own template to make sure it's working properly - I know I tested it when I added the feature some time back, though), but I'm pretty sure that should work. —Krellis (Talk) 15:34, 9 June 2008 (UTC)
Wow, thanks, nice trick. --Kubanczyk (talk) 18:02, 9 June 2008 (UTC)
No problem, and thanks for the updates on the docs! I should probably get the default template updated at some point to match mine so it's more useful. —Krellis (Talk) 19:17, 9 June 2008 (UTC)
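
For anyone wondering why the seconds value helps, the Python sketch below compares sorting the three durations from this thread by their display text and by their length in seconds. It only illustrates the idea behind the hidden sort key; `duration_cell` is a hypothetical helper that mimics the %%durationsecs%% and %%duration%% substitution, not the bot's own code.

```python
# Display text mapped to the duration in seconds (values computed by hand:
# 9 min = 540 s, 5 h 29 min = 19740 s, 2 d 9 h = 205200 s).
durations = {
    "9 minutes": 540,
    "5 hours and 29 minutes": 19740,
    "2 days and 9 hours": 205200,
}

# Sorting the display text compares characters, so "2 days" sorts first.
by_text = sorted(durations)

# Sorting on the seconds value gives the real shortest-to-longest order.
by_seconds = sorted(durations, key=durations.get)

def duration_cell(text, seconds):
    """Build a table cell with an invisible sort key ahead of the text."""
    return f'<span style="display: none;">{seconds}</span> {text}'

cell = duration_cell("5 hours and 29 minutes", durations["5 hours and 29 minutes"])
```

Because the hidden span comes first in the cell, the table sorts on the number while readers still see the friendly text.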