Wikipedia:Dead external links

Shortcuts:
WP:LINKROT
WP:DEADLINK

Like almost all large websites, Wikipedia suffers from link rot: external links go stale over time. As of the November 6, 2006 database dump, Wikipedia contained 2,578,134 external links, roughly 10% of which were broken in some manner.

Repairing

Dead links are unprofessional and should be fixed on a regular basis. You can try to find the current location of the resource using a Google search. Dead links to online newspaper articles can be converted to references to off-line sources. Do not simply remove dead links; they often contain valuable information.

If you cannot find the resource, tag the link with {{dead link}}, which notifies other editors that the link is dead, and optionally provide a link to the Internet Archive. See Wikipedia:Citing sources#What to do when a reference link "goes dead" and Wikipedia:Using the Wayback Machine.
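As a rough illustration of the tagging step, the sketch below appends a {{dead link}} tag to a bracketed external link in wikitext and builds a candidate Wayback Machine address. The function names, the date parameter value, and the year used as a timestamp are assumptions made for illustration; check the template documentation and Wikipedia:Using the Wayback Machine before relying on either.

```python
def wayback_url(url, timestamp="2006"):
    """Build a Wayback Machine address for `url`.

    Assumes the archive's /web/<timestamp>/<url> address form; a partial
    timestamp such as a year asks for a snapshot near that date.
    """
    return f"https://web.archive.org/web/{timestamp}/{url}"


def tag_dead_link(wikitext, url, date="November 2006"):
    """Append a {{dead link}} tag after the first bracketed link to `url`.

    Hypothetical helper: it only handles links written as [url] or
    [url label], and returns the text unchanged if no such link is found.
    """
    start = wikitext.find("[" + url)
    after = start + 1 + len(url)  # index of the character following the URL
    if start == -1 or after >= len(wikitext) or wikitext[after] not in " ]":
        return wikitext
    end = wikitext.find("]", after)
    if end == -1:
        return wikitext
    tag = "{{dead link|date=" + date + "}}"
    return wikitext[: end + 1] + tag + wikitext[end + 1 :]
```

The tag is placed after the link rather than in place of it, in keeping with the advice not to remove dead links.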

This page is intended to be a clearing house for all such external links. If you correct a source article to fix a broken link, please indicate so below to prevent duplication of effort. Using the following edit summary can also help raise awareness of the problem:

Fixed broken links to external websites; [[Wikipedia:Dead external links|you can help too!]]

Status codes

Although the sections below contain a short description of the status code in question, please see the list of HTTP status codes for a more complete description.

200

The 200 status code indicates that the link is correctly formed, and retrievable. Although such links do not need correction, they are included here for completeness. Wikipedia currently contains 2,171,863 of these links. Due to the sheer number of links that correctly resolve, these are not available for download.

300

Indicates that the website requested more information from the bot so that it could make an appropriate presentation of the content. Although such links are most likely correct, they should probably be double checked. Wikipedia currently contains 143 of these links.

301

Indicates that the content has been moved permanently, and that the link inside Wikipedia should probably be updated to reflect the new location. However, not every such link should be changed, as some sites use 301 redirects for pages whose destination changes often. Wikipedia currently contains 84,303 of these links.

302, 303, 307

Indicates that the content has been temporarily moved, and that the client should continue to use the original link. Although these links should be correct in theory, they are often used by link farms, and should probably be checked. Wikipedia currently contains 146,643 status 302 links, 1,567 status 303 links, and 88 status 307 links.

400

Indicates that the site in question could not understand the bot's request. Although these should hopefully diminish with future revisions of the bot, it may still be useful to test them (low priority). Wikipedia currently contains 1,604 of these links. Note: links with anchors and HTML entities should be ignored (see talk page).

401

The page required authorization, which the bot does not support. The page in question may have included login information, but the bot has no way of knowing this. Such links should be fixed if the page does not contain login information. Wikipedia currently contains 672 status 401 links.

402

Although 402 is not an active status code, some servers returned it anyway. In theory, it indicates that the server requested payment from the client. Such links should be fixed. Wikipedia currently contains 4 of these links.

403

"Forbidden" - this generally indicates the server software itself cannot access the location where the file would be found, or that access to that location is not permitted from the internet under any circumstance - login or authorization information will not change things. Some for-pay reference sites, such as http://www.jstor.org/, might give partial access in the response (e.g. display the first page), which might still be useful. Often a symptom of link rot. Such links should be fixed. Wikipedia currently contains 7,984 status 403 links.

404, 410

The 404 error is the most common symptom of link rot, and it indicates that the page has not been found. The 410 status code is similar, but indicates that the file has permanently gone. Such links are required by policy to be repaired, perhaps with a link to the Internet Archive, or by finding the current location of the page if it has been moved without a forwarding redirect. Wikipedia currently contains 92,808 status 404 links and 229 status 410 links.

406

Occurs for a number of reasons and indicates that the client request was unacceptable in some manner. Should probably be fixed. Wikipedia currently contains 1,521 of these links.

409

Indicates some sort of error that the client needs to resolve. Should probably be fixed. Wikipedia currently contains 1 of these links.

423

Although not part of core HTTP (423 is defined by the WebDAV extension), servers use it to indicate some sort of "Locked" error. Wikipedia currently contains 6 of these links.

425

Another non-active status code, returned by a single server, http://www.worldofspectrum.org/. The message it returned at the time was "Mirroring Denied", but those links now work. See also the Apache docs, which give a message of "No code" for this status, indicating a server misconfiguration.

5xx

Indicates there was some sort of internal server error. This could be the result of a malformed bot HTTP request, or numerous other reasons. Should be examined to determine whether the site is suffering from some sort of permanent problem with the link in question. Wikipedia currently contains 17,625 status 500 links, 22 status 501 links, 481 status 502 links, and 714 status 503 links.
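The handling suggested for each numeric status code in the sections above can be summarised in a small lookup. This is only a sketch: the action labels ("ok", "update", "check", "fix") are hypothetical shorthand invented here, not terms used by the bot, and codes with no stated action are treated as "check".

```python
def recommended_action(status):
    """Map an HTTP status code to the handling suggested on this page."""
    if status == 200:
        return "ok"      # resolves correctly; nothing to do
    if status == 301:
        return "update"  # moved permanently, unless the site redirects often
    if status in (300, 302, 303, 307):
        return "check"   # probably correct, but worth a manual look
    if status in (400, 423, 425):
        return "check"   # low priority, or no action stated on this page
    if 400 <= status < 500:
        return "fix"     # 401, 402, 403, 404, 406, 409, 410 and similar
    return "check"       # 5xx and anything else: examine by hand


for code in (200, 301, 302, 404, 500):
    print(code, recommended_action(code))
```

Grouping by range keeps the function short, but it also means any unlisted 4xx code is treated as "fix", which may be too aggressive for codes this page does not discuss.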

NA - Unsupported protocol

Indicates that the link used a protocol such as IRC or Gopher that the bot is not capable of resolving. Such links should be checked to confirm that the protocol is correct (e.g. htttp://www.wikipedia.org instead of http://www.wikipedia.org). Wikipedia currently contains 331 of these links.

NA - Unknown error

Indicates that the bot had some sort of difficulty resolving the link in question. Could be caused by a number of errors: DNS lookup failures, socket timeouts, etc. The default socket timeout was set to 30 seconds, which may be too low for some very slow sites. Should probably be tested. Wikipedia currently contains 48,600 of these links.
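To re-test a single link by hand, something like the following sketch may help. It is not the bot's code: the function name, the use of a GET request, and the returned labels are assumptions, and only the 30-second default timeout is taken from the description above. Redirects are deliberately not followed, so a 301 or 302 is reported as such.

```python
import http.client
from urllib.parse import urlsplit


def check_link(url, timeout=30):
    """Return (code, detail), where code is an integer status or "NA"."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme == "http":
        conn_class = http.client.HTTPConnection
    elif scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        # IRC, Gopher, or a mistyped scheme such as "htttp"
        return ("NA", "Unsupported protocol")
    if not parts.hostname:
        return ("NA", "Unknown error")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn = conn_class(parts.hostname, parts.port, timeout=timeout)
        try:
            conn.request("GET", path)
            response = conn.getresponse()
            return (response.status, response.reason)
        finally:
            conn.close()
    except (OSError, http.client.HTTPException, ValueError):
        # DNS lookup failures, refused connections, and socket timeouts
        # all raise OSError subclasses; a malformed port raises ValueError.
        return ("NA", "Unknown error")
```

For a slow site, passing a larger timeout (for example `check_link(url, timeout=120)`) is a simple way to tell a genuine failure from a timeout.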

Downloads

Below are links to download tab-separated text files (gzip compressed) containing the links. Each line has the form:

Article title, [tab], URL, [tab], further description (as in [http://www.wikipedia.org/ Wikipedia] links), [tab], error code, [tab], server response.

These files should probably be relocated to somewhere more permanent in the future.
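The five-column layout described above can be read with a few lines of Python. This is a minimal sketch: the record and function names are hypothetical, UTF-8 encoding is assumed rather than documented, and lines that do not have exactly five fields are skipped.

```python
import csv
import gzip
from typing import Iterator, NamedTuple


class LinkRecord(NamedTuple):
    article: str      # article title
    url: str          # external link
    description: str  # link text, if any
    code: str         # status code, or "NA"
    response: str     # server response text


def read_link_file(path) -> Iterator[LinkRecord]:
    """Yield one LinkRecord per well-formed line of a gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quotation marks in descriptions as plain text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 5:
                continue  # does not match the described layout
            yield LinkRecord(*row)
```

The status code is kept as a string because the NA files do not carry a numeric code.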

200 (not available)

300 - 301 - 302 - 303 - 307

400 - 401 - 402 - 403 - 404 - 406 - 409 - 410 - 423 - 425

500 - 501 - 502 - 503

NA (Unsupported protocol) - NA (Unknown error)

The 404 errors have pages to themselves. These have now been updated to reflect the November 6, 2006 database update:

  • misc, 2964 entries
  • a, 5987 entries
  • b, 4723 entries
  • c, 6298 entries
  • d, 4179 entries
  • e, 3013 entries
  • f, 2939 entries
  • g, 3322 entries
  • h, 3770 entries
  • i, 2179 entries
  • j, 4467 entries
  • k, 2312 entries
  • l, 6347 entries
  • m, 6672 entries
  • n, 3375 entries
  • o, 1806 entries
  • p, 4295 entries
  • q, 224 entries
  • r, 3808 entries
  • s, 7540 entries
  • t, 5535 entries
  • u, 1592 entries
  • v, 1195 entries
  • w, 2686 entries
  • x, 48 entries
  • y, 481 entries
  • z, 328 entries

Please indicate your correction status in the form "123: ABC - XYZ", e.g. "404: African Academy of Sciences - anonymous remailer".
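For anyone keeping track of progress with a script, the status line can be produced and read back as sketched below. The reading of "ABC - XYZ" as the first and last article titles of the corrected range is an assumption based on the example, and both function names are hypothetical.

```python
import re

# Status code (three digits or "NA"), then two titles separated by " - ".
_STATUS_LINE = re.compile(r"^(\d{3}|NA): (.+?) - (.+)$")


def format_correction_status(code, first_article, last_article):
    """Build a status line such as "404: A - B"."""
    return f"{code}: {first_article} - {last_article}"


def parse_correction_status(line):
    """Return (code, first_article, last_article), or None if malformed.

    The split happens at the first " - ", so a first title that itself
    contains " - " would be read incorrectly.
    """
    match = _STATUS_LINE.match(line.strip())
    if match is None:
        return None
    return match.groups()
```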

See also

Status