LOCKSS

From Wikipedia, the free encyclopedia

Lots Of Copies Keep Stuff Safe ™

The LOCKSS project, under the auspices of Stanford University, develops and supports an open source system allowing libraries to collect, preserve and provide their readers with access to material published on the Web. The system attempts to replicate the way libraries do this for material published on paper. It was originally designed for scholarly journals, but is now also used for a range of other materials. Examples include the Solinet project to preserve theses and dissertations at eight universities, and the MetaArchive project preserving at-risk digital content about the culture and history of the American South.

Traditionally, academic libraries have retained issues of scholarly journals, either individually or collaboratively, giving their readers access to the content received even after the publisher has ceased publication or the subscription has been canceled. In the digital age, libraries often subscribe to journals that are available only digitally over the Internet. Although convenient at the time, this presents a problem for preservation. If the publisher ceases to publish, or the library cancels the subscription, the content that was previously paid for is no longer available.

The LOCKSS system allows a library, with permission from the publisher, to collect, preserve and disseminate to its own readers its own copy of material to which it has subscribed, as well as open access material (perhaps published under a Creative Commons license). Each library's system collects its copy using a specialized web crawler that verifies that the publisher has granted suitable permission. The system is format-agnostic, collecting whatever formats the publisher delivers via HTTP. Libraries that have collected the same material cooperate in a peer-to-peer network to ensure its preservation. Peers in the network vote on cryptographic hashes computed over the preserved content together with a nonce; a peer that is outvoted regards its copy as damaged and repairs it from the publisher or other peers. For details, see papers published at SOSP 2003 and USENIX 2005.
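
The voting step described above can be illustrated with a minimal sketch. It is not the LOCKSS implementation: the names vote and run_poll, the use of SHA-256, and the simple-majority rule are assumptions made for illustration, and the real protocol described in the cited papers is considerably more elaborate. The sketch shows only the core idea, which is that each peer hashes a fresh nonce together with its own copy, so that a peer cannot answer with a digest it stored earlier, and that a peer whose digest disagrees with the majority is flagged for repair.

```python
import hashlib
import os
from collections import Counter


def vote(nonce: bytes, content: bytes) -> str:
    """Return this peer's vote: a digest of the poll nonce plus its copy."""
    return hashlib.sha256(nonce + content).hexdigest()


def run_poll(copies: dict, nonce: bytes) -> tuple:
    """Tally one poll (hypothetical helper, not a LOCKSS API).

    copies maps a peer name to that peer's bytes for one unit of content.
    Returns the winning digest and the sorted names of outvoted peers.
    """
    votes = {peer: vote(nonce, data) for peer, data in copies.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count <= len(votes) / 2:
        # Assumption: without a strict majority the poll is inconclusive.
        raise ValueError("no majority; poll is inconclusive")
    outvoted = sorted(peer for peer, v in votes.items() if v != winner)
    return winner, outvoted


if __name__ == "__main__":
    copies = {
        "library_a": b"article text",
        "library_b": b"article text",
        "library_c": b"article t3xt",  # simulated damage
    }
    nonce = os.urandom(16)  # fresh for every poll
    _, outvoted = run_poll(copies, nonce)
    for peer in outvoted:
        # The outvoted peer treats its copy as damaged and repairs it.
        copies[peer] = copies["library_a"]
    print("repaired:", outvoted)
```

In this toy run library_c is outvoted and replaces its copy; a second poll with a new nonce would then find all three peers in agreement.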

The LOCKSS license used by most publishers allows a library's readers access to its own copy, but does not allow similar access to other libraries or unaffiliated readers; the system does not support file sharing. On request, a library may supply another library with content to effect a repair, but only if the requesting library proved in the past that it had a good copy by voting with the majority. If the reader's browser no longer supports the format in which the copy was collected, a format migration process can convert it to a current format. These limits on the use that may be made of preserved copies of copyright material have been effective in persuading copyright owners to grant the necessary permission.
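
The repair rule in this paragraph, under which content is supplied only to a library that has previously voted with the majority, can be sketched as follows. The class Peer and its methods record_agreement and supply_repair are hypothetical names chosen for illustration and are not taken from the LOCKSS code base; how real peers record and verify past agreement is described in the project's published papers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Peer:
    """A toy peer holding preserved content keyed by unit name."""

    name: str
    content: dict = field(default_factory=dict)
    # (requester, unit) pairs observed voting with the majority in the past.
    agreed: set = field(default_factory=set)

    def record_agreement(self, requester: str, unit: str) -> None:
        """Remember that requester once proved it held a good copy of unit."""
        self.agreed.add((requester, unit))

    def supply_repair(self, requester: str, unit: str) -> Optional[bytes]:
        """Return the content only to a requester with past agreement."""
        if (requester, unit) not in self.agreed:
            return None  # refusing keeps the system from acting as file sharing
        return self.content.get(unit)


if __name__ == "__main__":
    a = Peer("library_a", {"journal-vol-1": b"good copy"})
    print(a.supply_repair("library_b", "journal-vol-1"))  # refused
    a.record_agreement("library_b", "journal-vol-1")
    print(a.supply_repair("library_b", "journal-vol-1"))  # supplied
```

The check means a library can regain content it once held but cannot use the repair path to obtain content it never had.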

The LOCKSS approach of selective collection with permission from the publisher, distributed storage, and restricted dissemination contrasts with, for example, the Internet Archive's approach of omnivorous collection without permission from the publisher, centralized storage, and unrestricted dissemination. The LOCKSS system is far smaller, but it can preserve subscription materials to which the Internet Archive has no access.

Because each library administers its own LOCKSS peer and maintains its own copy of preserved material, and because libraries worldwide do so (see the list of participating libraries below), the system provides a much higher degree of replication than is usual in a fault-tolerant system. The voting process makes use of this high degree of replication to eliminate the need for backups to off-line media, and to provide robust defenses against attacks aimed at corrupting preserved content.

In addition to their role in preserving access, libraries have traditionally made it difficult to rewrite or suppress printed material. The existence of an indeterminate but large number of identical copies on a somewhat tamper-resistant medium under many independent administrations meant that attempts to alter or remove all copies would likely both fail and be detected. Web publishing, based on a single copy under a single administration, provides none of these safeguards against subversion. Web publishing is, therefore, a suitable tool for Winston Smith's job of rewriting history. By preserving many copies under diverse administration, by automatically auditing the copies at intervals against each other (and, in the future, against the publisher's copy), and by alerting libraries when changes are detected, the LOCKSS system attempts to restore many of these safeguards.

The source code for the entire LOCKSS system carries BSD-style open-source licenses and is available from SourceForge. LOCKSS is a trademark of Stanford University.
