User talk:SQL/Reflist


Mind some help? Pairadox 08:12, 6 September 2007 (UTC)

Go for it :) (Sooner or later, someone will get a bot together to automagically fix these, I'm hoping... :) ) SQL(Query Me!) 08:34, 6 September 2007 (UTC)
Whew! I just did a couple hundred; someone else's turn :) Time for work, anyhow :) Thanks, guys, for all the help so far! :) SQL(Query Me!) 20:07, 7 September 2007 (UTC)


Strike or delete?

Would it be OK if we deleted entries instead of striking them when something is finished? The list will get shorter as we make progress, and if you are going to rerun the query periodically, it is OK if something accidentally falls off, as it will be picked up next time around. Jeepday (talk) 03:16, 7 September 2007 (UTC)

Sure, deleting would be fine! :) I just liked to strike, that way it seemed like I was making more progress :) SQL(Query Me!) 04:09, 7 September 2007 (UTC)
The lists are quite unmanageable because of their length, so removing the entries would be preferable. Rettetast 19:51, 7 September 2007 (UTC)
Indeed. Great work on the formatting of the lists! SQL(Query Me!) 20:07, 7 September 2007 (UTC)

Duplicates

There seem to be a lot of duplicates of list number 3 in number 4. Rettetast 20:18, 7 September 2007 (UTC)

Unfortunately, yes. Every dup is at a point where I had to restart the bot... I got something wrong in the code: upon restart, it would start over from the last entry checked, instead of the entry after that. Near the middle, I had to stop the process a few (dozen :) ) times to tweak the code... I got most of the dupes, but I must have missed a few... Good catch! :) Thank you! SQL(Query Me!) 05:11, 8 September 2007 (UTC)
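The restart bug above is an off-by-one when resuming: the bot began again at the last entry it had already checked, so that entry was listed twice. A minimal Python sketch of the fix, assuming the bot records the index of the last title it fully processed (the names process_titles, check, and last_checked are hypothetical, not the bot's actual code), resumes at the entry after it:

```python
from typing import Callable, List, Optional


def process_titles(
    titles: List[str],
    check: Callable[[str], bool],
    last_checked: Optional[int] = None,
) -> List[str]:
    """Return the titles that `check` flags, resuming after a restart.

    `last_checked` is the index of the last title fully processed before
    the bot stopped, or None on a fresh run.
    """
    # Resume at the entry *after* the last one checked; starting at
    # last_checked itself would re-check (and re-list) that entry.
    start = 0 if last_checked is None else last_checked + 1
    return [title for title in titles[start:] if check(title)]


titles = ["Apple", "Banana", "Cherry", "Date"]
# Suppose the bot stopped right after finishing "Banana" (index 1).
print(process_titles(titles, lambda t: True, last_checked=1))  # ['Cherry', 'Date']
```

Keeping the checkpoint as "last index finished" and adding one on restart means the two runs cover disjoint slices of the list, so no entry appears twice.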

Error

Your bot added this article to the list even though it shouldn't be on the list. I think the reason that it ended up on the list is that it has "<!-- Citation for population statistic. Include <ref>-tags -->" written in the source code. To fix this, you just have to get your bot to ignore the text in the comments. Basically, do this:

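import re
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)  # <ref>, <ref name=...>; not <references/>
def has_ref_outside_comments(source):
    # True means "add to list": some <ref> tag sits outside every <!-- ... --> comment
    start_search_index = 0
    while start_search_index < len(source):
        index_of_first_comment = source.find("<!--", start_search_index)
        first_ref = REF_TAG.search(source, start_search_index)
        if first_ref is None:
            return False  # no <ref> left: don't add to list
        if index_of_first_comment != -1 and index_of_first_comment < first_ref.start():
            # the comment comes first: skip past its closing "-->" and search again
            comment_end = source.find("-->", index_of_first_comment + 4)
            if comment_end == -1:
                return False  # unclosed comment hides the rest of the page
            start_search_index = comment_end + 3
        else:
            return True  # a real <ref> comes first: add to list
    return False  # reached the end of the source: don't add to list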

Also, feel free to add the task of going through the list on User:Jeffrey.Kleykamp/New User Study Guide.

Jeffrey.Kleykamp 01:23, 8 September 2007 (UTC)

Yeah, it's had a couple like that :) For the next run, I'll probably preg_match() out things in comments... Thanks, however, for the suggestion! :) SQL(Query Me!) 05:07, 8 September 2007 (UTC)
I just wanted to say that it sounds like you are going over each article individually (which is how my code above is organized). Now that I think about it, it seems like you could create a list of the "what links here" of both ref and reflist (and of the other versions of reflist, if they're different), then compare the results and add every article that isn't on the second list. This solution would also catch any doubles, since both lists would be sorted alphabetically. Jeffrey.Kleykamp 20:00, 8 September 2007 (UTC)
Ordinarily, yes; however, <ref> isn't a template, it's markup, so there's no "what links here" for it :( SQL(Query Me!) 20:28, 8 September 2007 (UTC)
You'd at least be able to skip articles that are on the "what links here" of reflist; that should speed things up. Jeffrey.Kleykamp 20:31, 8 September 2007 (UTC)
Hooray! I finally got this bug sorted. Gotta love regexes :) SQL(Query Me!) 06:36, 15 September 2007 (UTC)
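The two ideas in this thread can be combined into one filter, sketched here in Python as an illustration rather than the bot's actual code (the bot used PHP regexes; in PHP, removing matches rather than just finding them would be preg_replace()). First, titles already known to transclude {{reflist}} are skipped with a set difference, which stands in for walking the two alphabetically sorted lists side by side and gives the same result. Then comments are stripped with a non-greedy regular expression before looking for <ref>, so the population-statistic comment above no longer counts. The function names and sample pages are hypothetical, and treating an unclosed comment as running to the end of the page is an assumption about parser behaviour.

```python
import re
from typing import Dict, List, Set

# Non-greedy so separate comments are removed separately; DOTALL lets a
# comment span lines; "\Z" treats an unclosed comment as running to the end.
COMMENT = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)
# <ref>, <ref name="x">, <ref name="x" /> (but not <references />)
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)


def strip_comments(wikitext: str) -> str:
    """Remove every <!-- ... --> comment from the wikitext."""
    return COMMENT.sub("", wikitext)


def has_ref(wikitext: str) -> bool:
    """True if a <ref> tag appears outside comments."""
    return REF_TAG.search(strip_comments(wikitext)) is not None


def missing_reflist(pages: Dict[str, str], transcludes_reflist: Set[str]) -> List[str]:
    """Titles that use <ref> but are not known to transclude {{reflist}}."""
    # Skip pages already using {{reflist}} before looking at their text at all.
    to_check = set(pages) - transcludes_reflist
    return sorted(title for title in to_check if has_ref(pages[title]))


pages = {
    "Town A": "Population 1,234<!-- Citation for population statistic. Include <ref>-tags -->",
    "Town B": 'Population 5,678<ref name="census">Census</ref>',
    "Town C": "Population 9<ref>Survey</ref>\n{{reflist}}",
}
print(missing_reflist(pages, transcludes_reflist={"Town C"}))  # ['Town B']
```

Only "Town B" is flagged: "Town A" has its only <ref> inside a comment, and "Town C" is skipped without being scanned because it already uses {{reflist}}.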

Edit summary

I've started using the edit summary "By way of User:SQL/Reflist". Maybe others will get curious and start helping out. Pairadox 01:40, 8 September 2007 (UTC)

Cool! :) I've been using something like "Article has <ref> tags, but no reflist. Fixing per User:SQL/Reflist." I really like the one that was added to the main page here... I should start using that one :) BTW, thank you for your help!!!! SQL(Query Me!) 05:09, 8 September 2007 (UTC)

Bot available

Hi SQL

I've got a bot (User:Chem-awb) which runs on AWB. You can take a look at the edits I've done. I wanted to give it a try but am not sure how. If you can give me some pointers on how to automate the process, I'd be more than willing to help you clear your list. :) --Rifleman 82 09:45, 8 September 2007 (UTC)

Sorry, I didn't notice this one! :( Unfortunately, I have zero experience with AWB... I've just requested access, so I can see what might be possible.... SQL(Query Me!) 17:41, 11 September 2007 (UTC)

New DB Dumps

Wikimedia just put out a new DB dump... So I will probably be re-running the bot against that one soon... Maybe after we get more of these lists worked out, or whenever everyone thinks it should happen. As of this run, I'm filtering out pages not in namespace 0 (the article namespace), and I'm going to work on that comment bug :) SQL(Query Me!) 17:41, 11 September 2007 (UTC)
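Filtering a database dump down to namespace 0 could look like the Python sketch below, which streams the XML with xml.etree.ElementTree.iterparse so a multi-gigabyte file is never loaded whole. It assumes each <page> carries an <ns> element, as recent MediaWiki export schemas do; a dump without it would need the namespace inferred from the title prefix instead. Tags are matched by local name so the schema version in the xmlns URI does not matter. The function names and the tiny sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def article_pages(dump: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Stream a pages XML dump, yielding (title, wikitext) for namespace 0 only."""
    for _event, elem in ET.iterparse(dump, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        fields = {local_name(child.tag): child for child in elem}
        ns = fields.get("ns")
        if ns is not None and (ns.text or "").strip() == "0":
            title = fields.get("title")
            title_text = title.text if title is not None and title.text else ""
            text = ""
            revision = fields.get("revision")
            if revision is not None:
                for child in revision:
                    if local_name(child.tag) == "text":
                        text = child.text or ""
            yield title_text, text
        elem.clear()  # drop the finished page's children to limit memory use


# A tiny stand-in for a real dump file (schema version chosen for illustration).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Apple</title><ns>0</ns>
    <revision><text>Apple text</text></revision></page>
  <page><title>Talk:Apple</title><ns>1</ns>
    <revision><text>Talk text</text></revision></page>
</mediawiki>"""

for title, text in article_pages(io.BytesIO(SAMPLE)):
    print(title, "->", text)  # Apple -> Apple text
```

The talk page (ns 1) is skipped, so only the article reaches the <ref>/reflist checks.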

Better... Faster... Longer! :) Apparently, a lot can happen in a month. I'm on phase 2 of the new list, and we're teetering at about 19,000 right now, with a ceiling of ~22,000. That's with improved filtering, to yield fewer false positives. Nifty. SQL(Query Me!) 12:44, 12 September 2007 (UTC)
Heh. Slight bug :) I wrote a list sorter / parser / splitter.... It wound up duplicating every entry :) Fixed! 16,000 or so new articles listed! SQL(Query Me!) 19:54, 12 September 2007 (UTC)
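A list sorter/splitter of the kind described might be sketched in Python as below. Deduplicating with a set before sorting guards against exactly the bug above, where every entry came out twice. The function name and the default page size of 1,000 are assumptions for illustration, not what the bot actually used; sorting is Python's default order by code point, which may differ slightly from Wikipedia's own collation.

```python
from typing import Iterable, List


def split_into_pages(titles: Iterable[str], per_page: int = 1000) -> List[List[str]]:
    """Sort titles, drop duplicates, and cut them into chunks of `per_page`."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    # set() removes repeats, so even if the parser emits an entry twice,
    # it appears only once; sorted() orders by code point.
    unique_sorted = sorted(set(titles))
    return [unique_sorted[i:i + per_page] for i in range(0, len(unique_sorted), per_page)]


chunks = split_into_pages(["Cherry", "Apple", "Banana", "Apple", "Cherry"], per_page=2)
print(chunks)  # [['Apple', 'Banana'], ['Cherry']]
```

Each chunk can then be written out as one sub-page of the list.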

User:SmackBot

Are you going to run a new query after smackbot gets done to see if anything is left? Jeepday (talk) 05:54, 15 September 2007 (UTC)

That sounds like a great idea :) However, the Wikimedia Foundation only puts out Wikipedia database dumps once a month :( So, I'll probably be limited to running it monthly. SQL(Query Me!) 06:12, 15 September 2007 (UTC)
That will probably work. If smackbot is doing OK with it, you should only find a few, and the bot will probably have cleaned them up before you find them. If you do find work, you are welcome to bring it to Wikipedia:Unreferenced articles: either mention it on the talk page and/or make a child page, Wikipedia:Unreferenced articles/Reflist. The editors who work there have the skills and desire to address these kinds of issues. Jeepday (talk) 06:19, 15 September 2007 (UTC)
Oh, I see what you're saying! :) I'd misunderstood... I'll run the 2nd phase again, overnight... I think Smackbot's done with this batch :) SQL(Query Me!) 06:20, 15 September 2007 (UTC)

New list up / Question

Man, great work, User:Smackbot and everyone who's helped out with this project!!! :) The new list is only 564 entries.

I'm somewhat curious, however. In writing the code for the bot, I've also come across a LOT of pages that have a {{reflist}} or similar tag, but no <ref> tags. It would be relatively trivial to make a list of those too... What does everyone think? Should I go for it? Would it be useful as well? SQL(Query Me!) 06:11, 16 September 2007 (UTC)
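The inverse list proposed here (a reference list is present but there is nothing for it to display) could be produced with a single predicate, sketched in Python below. Which tags count as "similar" is an assumption: the pattern covers {{reflist}} with or without parameters and a self-closing <references /> tag, and it matches case-insensitively, which is looser than MediaWiki's own rules. Comments are stripped first so a commented-out tag does not count. The function and pattern names are hypothetical.

```python
import re

COMMENT = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)
# <ref>, <ref name="x">, <ref name="x" /> (but not <references />)
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)
# {{reflist}}, {{Reflist|2}}, {{reflist|colwidth=30em}}, <references/>, <references />
REFLIST = re.compile(r"\{\{\s*reflist\s*[|}]|<references\s*/>", re.IGNORECASE)


def reflist_without_refs(wikitext: str) -> bool:
    """True if the page has a reference list but no <ref> for it to show."""
    text = COMMENT.sub("", wikitext)
    return REFLIST.search(text) is not None and REF_TAG.search(text) is None


print(reflist_without_refs("==References==\n{{reflist}}"))       # True
print(reflist_without_refs("Text<ref>Source</ref>\n{{reflist}}"))  # False
```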

I just took a look, and it seems Rich is cleaning up those 564 with smackbot, so there's probably nothing to worry about. Jeepday (talk) 14:53, 16 September 2007 (UTC)
The pages with {{reflist}} or similar and no <ref> tags should all have at least one of the following. There are probably at least a hundred thousand of them.
  1. {{refimprove}} or {{unreferenced}} or similar
  2. an external link to a reference (not formatted to <ref>)
  3. listed book citations
A good many probably have the ==References== section with the {{reflist}} under it, so that when a reference gets added it will show up. Having {{reflist}} without <ref> is fine; it does not do anything until there is a <ref>. Most should get addressed as people work through the {{reflist}} or similar tags. Jeepday (talk) 14:53, 16 September 2007 (UTC)