Link rot

Link rot is the process by which links on a website gradually become irrelevant or broken over time, as the websites they point to disappear, change their content, or redirect to new locations.

The phrase also describes the effects of failing to update webpages, which become out of date, containing stale, useless information that clutters search engine results. This most often occurs on personal web pages and is prevalent on free web hosts such as GeoCities, where there is no financial incentive to fix link rot.

Prevalence

The 404 “not found” response is familiar to even the occasional Web user. A number of studies have examined the prevalence of link rot on the Web, in academic literature, and in digital libraries. Fetterly et al. (2003) found that about 0.5% of web pages disappeared each week. McCown et al. (2005) found that half of the URLs cited in D-Lib Magazine articles were no longer accessible 10 years after publication, and other studies have found link rot in academic literature to be even worse (Spinellis, 2003; Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital libraries and found that 3% of the objects were no longer accessible after one year.
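
If the 0.5% weekly disappearance rate were constant (an assumption; the study reports only the weekly figure), compounding would leave roughly 77% of pages alive after a year and under 10% after a decade. A minimal sketch of the arithmetic:

    # Implied survival under an assumed constant 0.5% weekly loss rate.
    weekly_loss = 0.005
    for weeks in (52, 260, 520):  # roughly 1, 5 and 10 years
        survival = (1 - weekly_loss) ** weeks
        print(f"{weeks:3d} weeks: {survival:.1%} of pages still reachable")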

Discovering

Detecting link rot for a given URL is difficult using automated methods. If a URL is accessed and returns an HTTP 200 (OK) response, it may be considered accessible, but the contents of the page may have changed, and the page may no longer be relevant. Some web servers also return a soft 404: a page returned with a 200 (OK) response (instead of a 404) even though the requested resource is no longer available. Bar-Yossef et al. (2004) developed a heuristic for automatically discovering soft 404s.
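
As an illustration, here is a minimal Python sketch of a soft-404 heuristic loosely in the spirit of Bar-Yossef et al.: request the target URL alongside a same-host path that almost certainly does not exist, and flag the target as a likely soft 404 when the two responses look alike. The random-path probe and the 0.9 similarity threshold are illustrative assumptions, not the published algorithm.

    import uuid
    from difflib import SequenceMatcher
    from urllib.error import HTTPError, URLError
    from urllib.parse import urljoin
    from urllib.request import urlopen

    def fetch(url):
        """Return (status, body), treating HTTP errors as plain statuses."""
        try:
            with urlopen(url, timeout=10) as resp:
                return resp.status, resp.read()
        except HTTPError as err:
            return err.code, b""
        except URLError:
            return None, b""  # unreachable host: hard rot, not a soft 404

    def is_soft_404(url, threshold=0.9):
        status, body = fetch(url)
        if status != 200:
            return False  # a real 404 (or other error) is not a *soft* 404
        # Probe the same host with a random path that should not exist.
        probe_url = urljoin(url, "/" + uuid.uuid4().hex)
        probe_status, probe_body = fetch(probe_url)
        if probe_status != 200:
            return False  # server issues real 404s, so its 200 is trustworthy
        # The server answers 200 even for junk paths: if the target's body
        # resembles the junk page, the target is probably a soft 404.
        return SequenceMatcher(None, body, probe_body).ratio() >= threshold

A link checker would combine this with the plain status check: any non-200 response, or a 200 whose body matches the junk-path probe, marks the link as rotten.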

Modern management

On Wikipedia and other wiki-based websites, broken external links remain a maintenance problem. Wikipedia color-codes internal links (blue for pages that exist, red for pages that do not), so a user can see whether the target exists before clicking. When referencing an old website or dated information, editors can link externally to pages through a web archiving service, providing a reliable permanent link.

Combating

Webmasters

Webmasters have developed a number of best practices for combating link rot:

  • Avoiding unmanaged hyperlink collections
  • Avoiding links to pages deep in a website ("deep linking")
  • Using hyperlink checking software or a Content Management System (CMS) that automatically checks links
  • Using permalinks
  • Using HTTP mechanisms (e.g. "301: Moved Permanently") to automatically refer browsers and crawlers to the new location of a URL (see the sketch after this list)
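
As a sketch of the last mechanism, Python's standard library can issue permanent redirects; the path mapping below is hypothetical:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical mapping from old paths to their new homes.
    MOVED = {"/old-article": "https://example.org/new-article"}

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path in MOVED:
                # 301 tells browsers and crawlers the move is permanent,
                # so they can update bookmarks and search indexes.
                self.send_response(301)
                self.send_header("Location", MOVED[self.path])
                self.end_headers()
            else:
                self.send_response(404)
                self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), RedirectHandler).serve_forever()

In practice this mapping usually lives in the web server's configuration (e.g. Apache or nginx rewrite rules) rather than in application code.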

Authors citing URLs

A number of studies have shown how widespread link rot is in academic literature (see Prevalence above). Authors of scholarly publications have also developed best practices for combating link rot in their work, such as archiving cited resources with the tools described below.

Tools

A number of tools have been developed that allow the general public to archive web resources that may disappear in the future:

  • WebCite, a tool specifically for scholarly authors, journal editors and publishers to permanently archive cited Internet references "on demand" and retrieve them later (Eysenbach and Trudel, 2005).
  • Archive-It, a subscription service that allows institutions to build, manage and search their own web archives.
  • hanzo:web, a personal web archiving service created by Hanzo Archives that can archive a single web resource, a cluster of web resources, or an entire website, as a one-off collection, a scheduled/repeated collection, or an RSS/Atom feed collection, or on demand via Hanzo's open API.
