User:Rich Farmbrough/Disambig scripts
From Wikipedia, the free encyclopedia
[edit] Scripting
Note these scripts work with the SQL databse dumps, not with the XML dumps. Create this perl script dab.pl
#!/usr/bin/perl while (<>) { @lines=split /\[INSERT INTO \`cur\` VALUES \(|\d\'\),\(|\d\'\);\n/; foreach $line (@lines){ $line =~ m/\d+,(\d+),'(.+?[^\\])','(.+?[^\\])','/; $space=$1; $name=$2; $text=$3; if ($space==0) { if ($text =~ m/\{\{disambig\}\}/){ print $name, "\n"; } elsif ($text =~ m/\{\{msg:disambig\}\}/){ print $name, "\n"; } } } }
run
perl dab.pl ddddddddd_cur_table.sql > dab.txt
Where dddddd is the appropriate date (Takes a few minutes, I didn't time it.)
then create countdab.pl
#!/usr/bin/perl %dab=(); open (DAB,"dab.txt"); while (<DAB>){ chomp(); $dab{$_}=0; } $i=0; while (<>) { @lines=split /\[INSERT INTO \`cur\` VALUES \(|\d\'\),\(|\d\'\);\n/; foreach $line (@lines){ $line =~ m/\d+,(\d+),'(.+?[^\\])','(.+?[^\\])','/; if ($1==0){ $_=$3; @links= /\[\[(.*?)(?:\||\]\])/g; foreach $link (@links){ if ( exists $dab{$link} ) { $dab{$link}++; } } } } print STDERR ".",++$i; } $i=0; foreach $key (sort { $dab{$b} <=> $dab{$a} } keys %dab) { print "# [[", $key,"]] ([[Special:Whatlinkshere/",$key,"|links]] to ",$dab{$key}," articles)\n"; if ($i++>200) {last;} }
and run with something like
perl dabcount.pl dddddddddd_cur_table.sql > count.txt
where dddddddddd is the appropriate date. (takes about twenty minutes)
and you have your result. It's not perfect because it ignores nowiki, comments etc. but for a disambiguation league table it's good enough.