Wikipedia:Dead external links
Like almost all large websites, Wikipedia suffers from link rot, in which external links go stale over time. As of the November 6, 2006 database dump, Wikipedia contained 2,578,134 external links, roughly 10% of which were broken in some manner.
Repairing
Dead links are unprofessional and should be fixed on a regular basis. You can try to find the current location of the resource using a Google search. Dead links to online newspaper articles can be converted to references to off-line sources. Do not simply remove dead links; they often contain valuable information.
If you cannot find a working replacement, tag the link with {{dead link}}, which notifies other editors that the link is dead and can optionally provide a link to the Internet Archive. See Wikipedia:Citing sources#What to do when a reference link "goes dead" and Wikipedia:Using the Wayback Machine.
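The tagging step can be sketched in Python as follows. This is only an illustration: the helper names wayback_url and tag_dead_link are hypothetical, the example URL is a placeholder, and the sketch assumes the Wayback Machine's usual address pattern of https://web.archive.org/web/ followed by a timestamp (or * to list all captures) and the original URL. It appends the bare {{dead link}} tag without any template parameters.

```python
# Assumption: the Wayback Machine serves captures under this prefix.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(dead_url: str, timestamp: str = "*") -> str:
    """Build a Wayback Machine address for a dead link.

    timestamp is a YYYYMMDDhhmmss string, or "*" to list every capture.
    """
    return f"{WAYBACK_PREFIX}{timestamp}/{dead_url}"


def tag_dead_link(link_wikitext: str) -> str:
    """Append the {{dead link}} tag after an external link in wikitext."""
    return link_wikitext + " {{dead link}}"


if __name__ == "__main__":
    # Placeholder link, not taken from any real article.
    print(wayback_url("http://example.com/article"))
    print(tag_dead_link("[http://example.com/article Example article]"))
```

Whether a capture actually exists for a given URL still has to be checked by hand or with a tool; the function only builds the address to look at.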
This page is intended to be a clearing house for all such external links. If you correct a broken link in the source article, please indicate so below to prevent duplication of effort. Using the following edit summary can also help raise awareness of the problem:
Fixed broken links to external websites; [[Wikipedia:Dead external links|you can help too!]]
Status codes
Although the sections below contain a short description of the status code in question, please see the list of HTTP status codes for a more complete description.
200
The 200 status code indicates that the link is correctly formed and retrievable. Although such links do not need correction, they are included here for completeness. Wikipedia currently contains 2,171,863 of these links. Because so many links resolve correctly, this list is not available for download.
300
Indicates that the server offered more than one representation of the content and needed more information from the bot to choose an appropriate one. Although such links are most likely correct, they should probably be double-checked. Wikipedia currently contains 143 of these links.
301
Indicates that the content has been moved permanently, and that the link inside Wikipedia should probably be updated to reflect the new location. This should not be done for every site, however, because some sites use 301 redirects for pages whose destination changes often. Wikipedia currently contains 84,303 of these links.
302, 303, 307
Indicates that the content has been temporarily moved, and that the client should continue to use the original link. Although these links should be correct in theory, such redirects are often used by link farms, so the links should probably be checked. Wikipedia currently contains 146,643 status 302 links, 1,567 status 303 links, and 88 status 307 links.
400
Indicates that the site in question could not understand the bot's request. Although these should diminish with future revisions of the bot, it may still be useful to test them (low priority). Wikipedia currently contains 1,604 of these links. Note: links with anchors and HTML entities should be ignored (see talk page).
401
The page required authorization, which the bot does not support. The Wikipedia page in question may include login information, but the bot has no way of knowing this. Such links should be fixed if the page does not contain login information. Wikipedia currently contains 672 status 401 links.
402
Although 402 is not an active status code, some servers returned it anyway. In theory, it indicates that the server requested payment from the client. Such links should be fixed. Wikipedia currently contains 4 of these links.
403
"Forbidden" - this generally indicates the server software itself cannot access the location where the file would be found, or that access to that location is not permitted from the internet under any circumstance - login or authorization information will not change things. Some for-pay reference sites, such as http://www.jstor.org/, might give partial access in the response (e.g. display the first page), which might still be useful. Often a symptom of link rot. Such links should be fixed. Wikipedia currently contains 7,984 status 403 links.
404, 410
The 404 error is the most common symptom of link rot; it indicates that the page was not found. The 410 status code is similar, but indicates that the file is permanently gone. Such links are required by policy to be repaired, perhaps with a link to the Internet Archive, or by finding the current location of the page if it has been moved without a forwarding redirect. Wikipedia currently contains 92,808 status 404 links and 229 status 410 links.
406
Occurs for a number of reasons and indicates that the client request was unacceptable in some manner. Such links should probably be fixed. Wikipedia currently contains 1,521 of these links.
409
Indicates a conflict with the current state of the resource that the client needs to resolve. Such links should probably be fixed. Wikipedia currently contains 1 such link.
423
Although not part of core HTTP (it is defined by the WebDAV extension), servers use it to indicate some sort of "Locked" error. Wikipedia currently contains 6 of these links.
425
Another non-active status code, returned by a single server, http://www.worldofspectrum.org/. The message it returned at the time was "Mirroring Denied", but those links now work. See also the Apache docs, which give a message of "No code", indicating a server misconfiguration.
5xx
Indicates there was some sort of internal server error. This could be the result of a malformed bot HTTP request, or numerous other reasons. Should be examined to determine whether the site is suffering from some sort of permanent problem with the link in question. Wikipedia currently contains 17,625 status 500 links, 22 status 501 links, 481 status 502 links, and 714 status 503 links.
NA - Unsupported protocol
Indicates that the link used a protocol such as IRC, Gopher, etc. that the bot is not capable of resolving. Such links should be checked to confirm that the protocol is correct (e.g. htttp://www.wikipedia.org instead of http). Wikipedia currently contains 331 of these links.
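A check of this kind can be sketched with Python's standard urllib.parse module. The function name scheme_problem and the set of supported schemes are assumptions made for illustration; the page does not say exactly which protocols the bot handles beyond HTTP.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: the checker only understands these schemes.
SUPPORTED_SCHEMES = {"http", "https"}


def scheme_problem(url: str) -> Optional[str]:
    """Return a short description of a scheme problem, or None if the scheme is supported."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in SUPPORTED_SCHEMES:
        return None
    if not scheme:
        return "missing scheme"
    return f"unsupported scheme: {scheme}"


if __name__ == "__main__":
    for link in ("http://www.wikipedia.org",
                 "htttp://www.wikipedia.org",
                 "irc://irc.freenode.net/wikipedia"):
        print(link, "->", scheme_problem(link))
```

A misspelled scheme such as htttp is reported the same way as a genuinely different protocol such as irc, so an editor still has to decide which of the two cases applies.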
NA - Unknown error
Indicates that the bot had some sort of difficulty resolving the link in question. Could be caused by a number of errors: DNS lookup failures, socket timeouts, etc. The default socket timeout was set to 30 seconds, which may be too low for some very slow sites. Should probably be tested. Wikipedia currently contains 48,600 of these links.
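The recommendations in the status code sections above can be condensed into a small lookup, sketched here in Python. The function name recommended_action and the short action labels are this sketch's own shorthand, not part of the bot; codes for which the page gives no explicit recommendation (such as 423 and 425), and both NA categories, fall through to a generic "check".

```python
# Follow-up per status code, paraphrased from the descriptions on this page.
# The labels are illustrative shorthand only.
ACTIONS = {
    "200": "ok",
    "300": "double-check",
    "301": "update",
    "302": "double-check",
    "303": "double-check",
    "307": "double-check",
    "400": "test (low priority)",
    "401": "fix",
    "402": "fix",
    "403": "fix",
    "404": "fix",
    "410": "fix",
    "406": "probably fix",
    "409": "probably fix",
}


def recommended_action(code: str) -> str:
    """Map an error code string (e.g. "404" or "NA") to a suggested follow-up."""
    code = code.strip()
    if code in ACTIONS:
        return ACTIONS[code]
    # Any 5xx code: examine whether the site has a permanent problem.
    if len(code) == 3 and code.isdigit() and code[0] == "5":
        return "examine"
    # NA entries and codes without an explicit recommendation.
    return "check"


if __name__ == "__main__":
    for sample in ("200", "301", "404", "503", "NA"):
        print(sample, "->", recommended_action(sample))
```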
Downloads
Below are links to download tab-separated text files (gzip compressed) containing the links. Each line has the form:
Article title, [tab], URL, [tab], further description (as in [http://www.wikipedia.org/ Wikipedia] links), [tab], error code, [tab], server response.
These files should probably be relocated to somewhere more permanent in the future.
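A file in this format can be read with Python's standard gzip and csv modules, as in the sketch below. The names LinkRecord, parse_rows and read_dump are hypothetical, the sample row is made up for illustration, and UTF-8 encoding is an assumption; the page does not state the encoding of the files.

```python
import csv
import gzip
from typing import Iterable, Iterator, NamedTuple


class LinkRecord(NamedTuple):
    title: str        # article title
    url: str          # external link
    description: str  # link text, if any
    code: str         # error code, e.g. "404" or "NA"
    response: str     # server response text


def parse_rows(lines: Iterable[str]) -> Iterator[LinkRecord]:
    """Yield one record per well-formed five-column line."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue  # skip malformed lines rather than guess at their meaning
        yield LinkRecord(*row)


def read_dump(path: str) -> Iterator[LinkRecord]:
    """Read a gzip-compressed dump; UTF-8 is assumed."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from parse_rows(handle)


if __name__ == "__main__":
    # Made-up sample row in the documented column order.
    sample = ["Example article\thttp://example.com/page\tExample site\t404\tNot Found\n"]
    for record in parse_rows(sample):
        print(record.title, record.code, record.url)
```

Quoting is switched off because the description column may contain quotation marks that are ordinary text rather than CSV quoting.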
200 (not available)
400 - 401 - 402 - 403 - 404 - 406 - 409 - 410 - 423 - 425
NA (Unsupported protocol) - NA (Unknown error)
The 404 errors have pages to themselves. These have now been updated to reflect the November 6, 2006 database update:
- misc, 2964 entries
- a, 5987 entries
- b, 4723 entries
- c, 6298 entries
- d, 4179 entries
- e, 3013 entries
- f, 2939 entries
- g, 3322 entries
- h, 3770 entries
- i, 2179 entries
- j, 4467 entries
- k, 2312 entries
- l, 6347 entries
- m, 6672 entries
- n, 3375 entries
- o, 1806 entries
- p, 4295 entries
- q, 224 entries
- r, 3808 entries
- s, 7540 entries
- t, 5535 entries
- u, 1592 entries
- v, 1195 entries
- w, 2686 entries
- x, 48 entries
- y, 481 entries
- z, 328 entries
Please indicate your correction status in the form "123: ABC - XYZ", e.g. "404: African Academy of Sciences - anonymous remailer".
See also
- User:Dispenser/Link checker, a tool running on the Toolserver that checks the status of external links and offers a link-editing interface with repair options.
- weblinkchecker.py, a script from the Python Wikipedia Bot collection that finds and reports external links that are no longer available.
- Wikipedia:External links
- Wikipedia:WikiProject Spam