Wikipedia talk:WikiProject Deletion sorting/Accuracy reports

Accuracy report

I had a little extra time today, so I went through the full Aug. 20 output and checked for errors. I found that the program had sorted correctly in about 110 cases and incorrectly in about 20 (give or take a few). This is very promising, but it also means there are significant hurdles to clear before the process can be fully automated (I would say a 10% error rate is the most that we could possibly accept, and we're currently at about 15%, i.e. roughly 20 errors in 130 decisions).

Most of the errors are due either to:

  • shakily-defined sortpages (for instance, I counted it as an error when it sorted a book under "Writing" as well as "Publications"), or to
  • keywords with multiple meanings (such as a professor at Washington University being sorted into "Washington").

The first we can deal with fairly easily (see above); the second requires either a) an incalculable amount of fine-tuning, or b) a much more sophisticated approach. I'm pinning my hopes on b), and am working on a corpus of sorted stubs for automated keyword extraction; however, I'm also working on a) as time permits.

Suggestions for improvements to the approach are welcome. -- Visviva 13:07, 21 August 2006 (UTC)
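
The two error sources described above can be illustrated with a minimal sketch of keyword-based sorting. This is not the actual sorting program; the `KEYWORDS` table, the sortpage name "Academics and educators", and the function `sort_article` are all hypothetical and serve only to show how overlapping sortpages and ambiguous keywords produce the errors mentioned.

```python
import re

# Hypothetical keyword table (sortpage name -> trigger keywords).
# The real keyword set is hand-built and is not reproduced here.
KEYWORDS = {
    "Washington": ["washington"],
    "Academics and educators": ["professor", "university"],
    "Publications": ["book", "novel"],
    "Writing": ["book", "author"],
}


def sort_article(text, keywords=KEYWORDS):
    """Return, in alphabetical order, every sortpage with a keyword in text."""
    # Lowercase the text and split it into whole words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(page for page, kws in keywords.items() if words & set(kws))


# Ambiguous keyword: "Washington" here names a university, not the state.
print(sort_article("John Doe is a professor at Washington University."))

# Shakily defined sortpages: one keyword triggers two overlapping pages.
print(sort_article("A self-published book about gardening."))
```

In this sketch the first call returns both "Academics and educators" and "Washington", and the second returns both "Publications" and "Writing". Plain keyword matching has no way to tell which placement is the intended one.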

Update: Progress! Following some tweaks to the searching routine and sortpage structure, I counted 116 reasonable placements and 15 unreasonable ones today (from the Aug. 21 AfDs). That's an error rate of about 11.5% (15 out of 131), which is close to the 10% threshold. 4 of the errors came from the eternally problematic "Lists" and "Words" sortpages -- if we left those out, we'd already have an error rate of under 10%.
A total of 8 errors were due to egregious flaws in the Wikipedia category structure, but such problems are probably inevitable. -- Visviva 12:17, 22 August 2006 (UTC)
The last two days have had error rates below 10%, though at the cost of reduced output (nearly half of the AfDs have gone unsorted). -- Visviva 17:42, 24 August 2006 (UTC)
Update Aug 26: Various improvements to the code which I won't burden this page with... The Aug 25 sort had (by my count) about 178 correct sortings, and 15 clearly incorrect sortings; 38 pages were left unsorted. That's better than 90% accuracy and around 75% overall coverage -- not too shabby. Now I just need to actually sort them all. Where's a bot when you need one... :-)
This program is only as smart as its keyword set. At present the keyword set is entirely hand-built, and accordingly clunky and incomplete. I'm hoping to bring in data from my corpus of stubs soon -- that should allow substantial improvements in accuracy and coverage.
(I should add that I'm counting a sorting as accurate if it falls under the rubric of "X-related" deletions -- in other words, a web applications company is website-related, although not actually a website. This is in line with Template:Deletionlist.) -- Visviva 03:22, 26 August 2006 (UTC)
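
As a rough illustration of how a corpus of already-sorted stubs might feed the keyword set, here is one possible approach. It is an assumption, not the method actually in use: keep a word for a sortpage only if it occurs often enough there and nearly all of its occurrences fall under that sortpage. The function name `extract_keywords`, the thresholds, and the toy corpus are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def extract_keywords(corpus, min_count=2, min_share=0.8):
    """Pick candidate keywords per sortpage from (sortpage, text) pairs.

    A word qualifies for a sortpage if it occurs at least min_count times
    there and at least min_share of all its occurrences are on that page.
    """
    per_page = defaultdict(Counter)  # sortpage -> word counts
    total = Counter()                # word counts over the whole corpus
    for page, text in corpus:
        tokens = re.findall(r"[a-z]+", text.lower())
        per_page[page].update(tokens)
        total.update(tokens)
    return {
        page: sorted(
            word for word, n in counts.items()
            if n >= min_count and n / total[word] >= min_share
        )
        for page, counts in per_page.items()
    }


# Toy corpus, invented for illustration only.
corpus = [
    ("Military", "The battalion fought in the war."),
    ("Military", "A battalion of the army."),
    ("Business", "The company sells software."),
    ("Business", "A software company in the city."),
]
print(extract_keywords(corpus))
```

On this toy corpus, common words such as "the" are dropped because they are spread across both sortpages, while "battalion", "company", and "software" survive. A share threshold like this also offers one way to filter out ambiguous keywords, since a word like "Washington" would presumably be spread across several sortpages.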
Update Aug 27: Without any ad-hoc changes, the sort of yesterday's AfDs showed only 11 clear errors out of 158 sorting decisions, for an accuracy of about 93%. Coverage was 97 out of 129, or about 75%. -- Visviva 05:54, 27 August 2006 (UTC)
August 28: With some ongoing tweaks to the keyword set and the code, the final tally was 5 errors (maybe I'm being too generous?) out of 170 decisions (~97%). Coverage was 99 out of 118 (~84%). However, the original run with buggy code and untested keywords had about 5 more errors (~93% acc.) and 5 fewer inclusions (~80%) ... time will tell which is closer to the mark. -- Visviva 06:12, 28 August 2006 (UTC)
August 29: (Aug. 28th AfDs) The initial run gives about 11 errors out of 161 decisions, and 102 inclusions out of 138 articles scanned -- about 93% accuracy and 74% inclusion. Despite continuing tweaks to the code, we seem to be facing something of a plateau here. Errors continue to be concentrated in the larger and messier topical lists, especially Business, Sexuality, and Military (these three alone account for 6 of the 11 errors). -- Visviva 08:05, 29 August 2006 (UTC)
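
For reference, the accuracy and coverage figures in the last three reports can be recomputed from the raw counts. The helper below is only a convenience sketch. The name `rates` and the run labels are made up here, and it assumes accuracy means correct decisions over all sorting decisions and coverage means sorted articles over articles scanned.

```python
def rates(errors, decisions, sorted_pages, scanned_pages):
    """Return (accuracy %, coverage %), each rounded to one decimal place."""
    accuracy = 100 * (decisions - errors) / decisions
    coverage = 100 * sorted_pages / scanned_pages
    return round(accuracy, 1), round(coverage, 1)


# Counts taken from the reports above, labelled by report date:
# (errors, decisions, articles sorted, articles scanned)
runs = {
    "Aug 27": (11, 158, 97, 129),
    "Aug 28": (5, 170, 99, 118),
    "Aug 29": (11, 161, 102, 138),
}

for label, counts in runs.items():
    accuracy, coverage = rates(*counts)
    print(f"{label}: accuracy {accuracy}%, coverage {coverage}%")
```

This gives 93.0% / 75.2%, 97.1% / 83.9%, and 93.2% / 73.9% respectively, in line with the rounded figures quoted in the reports.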