User talk:Dispenser/Link checker

From Wikipedia, the free encyclopedia

1 Readability page
2 suggestion for the click on link checker
3 monobook script
4 Ignore list
5 Readability tool
6 Featured Article Candidates
7 Suggestions
8 KeyError
9 Error checking Degrassi: The Next Generation
10 False deadlinks?

Readability page

What do the highlighted red words signify? (e.g. here) Harryboyles 06:17, 6 January 2008 (UTC)

They're for debugging the sentence counter, which just looks for periods. —Dispenser (talk) 08:17, 6 January 2008 (UTC)
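A period-based sentence counter can be sketched roughly as follows. This is only an illustration of the idea described above, not the tool's actual code; the function names `count_sentences_naive` and `words_ending_in_period` are hypothetical.

```python
import re


def count_sentences_naive(text):
    """Count sentences by counting periods and nothing else."""
    return text.count(".")


def words_ending_in_period(text):
    """Return the words that end in a period, i.e. the ones a debugging
    view could highlight as presumed sentence ends."""
    return re.findall(r"\S+\.", text)


sample = "Dr. Smith arrived. He left."
print(count_sentences_naive(sample))   # the abbreviation "Dr." is counted too
print(words_ending_in_period(sample))
```

Highlighting the matched words makes miscounts such as abbreviations easy to spot, which is presumably why it is useful for debugging.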

suggestion for the click on link checker

Add a separate column and/or a button labeled "details" that, when pressed, expands the link, as currently happens when clicking anywhere. —Preceding unsigned comment added by Nergaal (talk • contribs) 04:10, 16 January 2008 (UTC)

Thanks for the suggestion. I've been trying to keep the interface as clutter-free as possible, and I think the icon is the necessary hint. We'll have to see what the logs show (so far it's been 13 hits out of 100 utility hits with the message). —Dispenser (talk) 03:45, 18 January 2008 (UTC)

monobook script

Hi. The monobook script doesn't seem to be working for me. It doesn't appear in either Firefox or IE7. I am looking in the right place, right? At the toolbox on the left? Matthew | talk | Contribs 01:02, 12 February 2008 (UTC)

Done. I had been loading the function dynamically before. It should now show up below the search box, labeled "Check external links". — Dispenser 03:09, 12 February 2008 (UTC)

It does. Good job! -- Matthew | talk | Contribs 03:54, 12 February 2008 (UTC)

Ignore list

Sorry if this is mentioned somewhere, but I don't see it. What URLs would be on the ignore list or, more to the point, why are CNN links on the URL ignore list? Phydend (talk) 19:46, 18 February 2008 (UTC)

import re

ignorelist = [
    re.compile(r'.*[\./@]example\.(com|net|org)(/.*)?'), # reserved for documentation
    re.compile(r'.*[\./@]tools\.wikimedia\.(org|de)/.*'), # so we don't end up calling ourselves
    re.compile(r'.*[\./@]wikimedia\.org/.*'),            # Wikipedia media repository
    re.compile(r'.*[\./@]archive\.org(/.*)?'),           # prevent downloading of media
    re.compile(r'.*[\./@]cnn\.com(/.*)'),                # CNN has firewalled us
]

Basically, CNN had put a rule in their firewall config to drop all packets from the Toolserver. This caused requests to CNN to time out, which takes about 5 minutes. — Dispenser 23:19, 18 February 2008 (UTC)

Alright, that makes sense. Thanks for the quick response, I was just wondering. Phydend (talk) 01:21, 19 February 2008 (UTC)
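For readers wondering how such a list might be consulted, here is a minimal sketch. It assumes each URL is tested with `re.match` before any request is made; the thread does not show how the tool actually does this, and the helper name `is_ignored` is hypothetical. Only two of the patterns are repeated here to keep the example short.

```python
import re

# Two patterns in the style of the list quoted above, with dots escaped.
ignorelist = [
    re.compile(r'.*[\./@]example\.(com|net|org)(/.*)?'),  # reserved for documentation
    re.compile(r'.*[\./@]cnn\.com(/.*)'),                 # CNN has firewalled us
]


def is_ignored(url):
    """Return True if any ignore pattern matches from the start of the URL."""
    return any(pattern.match(url) for pattern in ignorelist)


for url in ("http://www.cnn.com/2008/story.html", "http://www.nytimes.com/"):
    # Skipping ignored hosts avoids waiting on a request that would only time out.
    print(url, "skipped" if is_ignored(url) else "checked")
```

Skipping a firewalled host up front is cheaper than letting every request to it run into the timeout.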

Readability tool

Sometimes the tool comes back with statistics fairly quickly (for example, Introduction to evolution, Bees and toxic chemicals, and Dog), but other times it seems to be slow, so slow that it might be broken (Evolution, for example). What is going on? Are those articles just too complicated? Is something else wrong?--Filll (talk) 01:23, 20 February 2008 (UTC)

Fixed. I had made an optimization in template removal that I shouldn't have. I do not put much credence in the tool and have ceased any serious development. The problems begin with the syllable counter, which doesn't use a dictionary or any known algorithm. The readability algorithms were based on their respective Wikipedia articles, which have errors, are simplified, and/or are incorrect. Additionally, the readability algorithms have a standard deviation of roughly 1½, i.e. they are accurate to within ±1.5 for 68% of people (one standard deviation). — Dispenser 05:58, 20 February 2008 (UTC)
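To illustrate why a syllable counter without a dictionary is unreliable, here is a minimal sketch of a common vowel-group heuristic. This is an assumption chosen for illustration, not the method the tool uses; the function name `count_syllables` is hypothetical.

```python
import re


def count_syllables(word):
    """Estimate syllables as the number of runs of consecutive vowels.

    No dictionary is consulted, so silent letters are miscounted.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


for word in ("cat", "readability", "make"):
    # "make" has one spoken syllable, but the silent "e" is counted as a second.
    print(word, count_syllables(word))
```

Readability formulas that depend on syllables per word inherit every such miscount, which is one reason to treat the scores with caution.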

Featured Article Candidates

Dispenser,

First, congratulations on a great tool; very useful.

Now the bad bit ;) Where does the tool get the FAC list from? I ask because it doesn't seem to be up-to-date for the list of all current candidates. Is it a manual job to update it? Cheers. Carré (talk) 11:47, 6 March 2008 (UTC)

It runs automatically starting at 5:00 UTC, using the category list created from the /config template. It uses the HTML output from the page and then runs a regex on it to get the pages from the linked headers. That part has been working for some time now. However, it seems as though there is a caching issue somewhere, as it continues to get a 1½-month-old version of the page. I've changed the address to the purge page in hopes that this will resolve the issue. It'll solve it in the short term; we will see 6 months from now whether the problem is really fixed. — Dispenser 04:01, 7 March 2008 (UTC)
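The approach described above (a regex over the rendered HTML, plus a purge request to bypass the cache) might look roughly like this. The HTML shape matched by `HEADER_LINK`, the helper names, and the default `base` address are all assumptions for illustration; only the MediaWiki `action=purge` parameter is taken as given.

```python
import re
from urllib.parse import unquote, urlencode

# Assumed shape of a linked section header; the real page markup may differ.
# Note: a header without a link would pick up the next link that follows it.
HEADER_LINK = re.compile(r'<h3[^>]*>.*?<a\s+href="/wiki/([^"#]+)"', re.DOTALL)


def candidate_titles(html):
    """Extract page titles from the links found in <h3> headers."""
    return [unquote(t).replace("_", " ") for t in HEADER_LINK.findall(html)]


def purge_url(title, base="https://en.wikipedia.org/w/index.php"):
    """Build an address that asks MediaWiki to purge its cached copy."""
    return base + "?" + urlencode({"title": title, "action": "purge"})


sample_html = (
    '<h3><span class="mw-headline">'
    '<a href="/wiki/Degrassi:_The_Next_Generation">Degrassi</a></span></h3>\n'
    '<h3><a href="/wiki/Caf%C3%A9">Café</a></h3>'
)
print(candidate_titles(sample_html))
print(purge_url("Wikipedia:Featured article candidates"))
```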

Suggestions

I don't really think a sortable table would help that much. A big pink notice that a site is one of the host sites like members.aol.com/geocities/etc. might be nice, but it's so easy now to get a domain name that it's easy to hide if you know what you're doing. Being generally clueless on the sort of programming that you can do with Wikipedia, would it be possible to highlight if the word "blog" was on the page? Or other words that should throw up a red flag? If this isn't possible or would be too hard, I totally understand. In the totally dreaming realm, I'd also love something that would go through the list of refs, see if they are using cite web, and check that they have publisher and last access date filled in, so that I can easily pull up a list of citations missing those two parameters. That's easily the one thing that gets dropped the most. Ealdgyth - Talk 15:45, 14 April 2008 (UTC)

I've changed how templates are handled so they're more flexible. It will display the {{cite web}} information in single {} and italics. Is this alright? The blog thing seems hard to do, since it isn't easy to associate a single link with a word that appears outside that link (e.g. the intro talks about somebody's blog, and it's in reference to link number 10). — Dispenser 03:48, 20 April 2008 (UTC)

By the way, it's working wonderfully for me now. I love the fact that I can see the root domain name, that helps SOOOO much! Thanks for all the work you do on this. It's very much appreciated. Ealdgyth - Talk 14:50, 18 May 2008 (UTC)
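The citation check wished for above (listing cite web uses that lack a publisher or access date) could be sketched as follows. This is not part of the tool as far as this thread shows; the helper names are hypothetical, the parameter names `publisher` and `accessdate` are assumed, and the simple regex does not handle templates nested inside a citation.

```python
import re

# Matches a {{cite web |...}} call up to its first closing braces.
CITE_WEB = re.compile(r"\{\{\s*cite web\s*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)


def parse_params(body):
    """Split 'name=value' pairs on pipes into a dict with lowercase names."""
    params = {}
    for part in body.split("|"):
        if "=" in part:
            name, value = part.split("=", 1)
            params[name.strip().lower()] = value.strip()
    return params


def missing_fields(wikitext, required=("publisher", "accessdate")):
    """Return (url, missing parameter names) for each incomplete citation."""
    report = []
    for match in CITE_WEB.finditer(wikitext):
        params = parse_params(match.group(1))
        missing = [name for name in required if not params.get(name)]
        if missing:
            report.append((params.get("url", "?"), missing))
    return report


sample = (
    "{{cite web |url=http://example.org/a |title=A |publisher=Example "
    "|accessdate=2008-04-14}} and {{cite web |url=http://example.org/b |title=B}}"
)
print(missing_fields(sample))
```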

KeyError

When trying to do a check on a page with an unusual character (such as é), the Python script gets confused and throws a KeyError. — Wackymacs (talk) 16:52, 14 April 2008 (UTC)

Fixed. Thanks! — Dispenser 03:48, 20 April 2008 (UTC)

Error checking Degrassi: The Next Generation

I've just noticed this error that was not occurring in the last few days. [1]

It repeatedly reports both GLAAD links as not working; however, when I clicked on them to check myself, they did work. Cheers! -- ṃ•α•Ł•ṭ•ʰ•Ə•Щ• @ 03:18, 29 April 2008 (UTC)

I am unable to duplicate your results. I have found other bugs, but both GLAAD links continue to pop up with rank 0. Perhaps it was a server issue, or it happened during the weekend development of the tool. — Dispenser 02:59, 30 April 2008 (UTC)

Yup. Just ran it 3 times, and it's all fine. Thanks for checking though. -- ṃ•α•Ł•ṭ•ʰ•Ə•Щ• @ 03:08, 30 April 2008 (UTC)

False deadlinks?

Someone recently ran the tool against an article and it reported a dead link [2]. I manually checked the link in question, and it is good. I also tried to run the tool, and got the same false reading of a dead link. The link is on the New York Times website; I'm wondering if they might be filtering traffic of this nature? Yngvarr (c) 12:53, 18 May 2008 (UTC)

I checked the page earlier today and the link in question did not show up with a red row. I suspect that you misinterpreted the «dead link» template notice, so I have changed it to the more traditional {{dead link}} format. I suspect that the user who made the edit in question had merely played around with the options. Another possibility is that the NYT site was down. I will have to add a history mechanism in the future. — Dispenser 05:29, 19 May 2008 (UTC)