Persistent Uniform Resource Locator
From Wikipedia, the free encyclopedia
On the Internet, a persistent uniform resource locator (PURL) is a Uniform Resource Locator (URL), i.e. a location-based Uniform Resource Identifier (URI), that does not directly describe the location of the resource to be retrieved. Instead it describes an intermediate, more persistent location which, when retrieved, results in a redirection (e.g. via an HTTP 302 status code) to the current location of the final resource.
PURLs are an interim measure, intended for use while Uniform Resource Names (URNs) are being mainstreamed, to solve the lack of persistence over time of URIs in location-based URI schemes like HTTP. Currently, persistence problems arise because it is practically impossible for every user to have their own domain name, and because re-registering domain names costs effort and money. As a result, WWW authors put their documents in rather arbitrary locations of questionable persistence (i.e. wherever they can get WWW space). Existing official PURLs (on purl.org) will probably be mapped to a URN namespace at a later date.
Principles of operation
The oldest PURL HTTP server can be reached at purl.oclc.org as well as at purl.org, purl.net, and purl.com.
PURLs are organized into "domains" that resemble directory paths, e.g. /net/scape is the "subdomain" scape of the "domain" net, and itself has "subdomains" like about. These "domains" are unrelated to Internet domains; their purpose is to define one or more "maintainers". The maintainers can grant write access to ALL or to other registered users, e.g. the "domain" net is open to ALL registered users.
The PURL server, or "resolver" in PURL terminology, first tries to match a request directly to a defined PURL. If the PURL exists, the reply is a redirect to the last URL associated with it, as specified by its maintainer. This can be another PURL, any HTTP URL, or in fact any URL. It is the job of the maintainer to guarantee that target URLs do in fact still exist.
Because PURLs are designed to be persistent, deleting them is not supported, but they can be disabled, e.g. http://purl.net/net was disabled. Otherwise PURLs work like redirects.
If the resolver gets no direct match for a given PURL, it tries to match the request against "partial redirects", working right to left by truncating components separated by "/". A partial redirect is a special kind of PURL: for a request /x/y/z/any/thing, the prefix /x/y/z/ may be defined and created as a partial redirect. If its target URL is /a/b/c/d/, the /x/y/z/any/thing request is redirected to /a/b/c/d/any/thing.
For partial redirects the longest match wins, which is the same as matching "right to left". If, say, /x/y/z/any/ is also defined with target /foo/, then the redirect goes to /foo/thing instead of /a/b/c/d/any/thing. The URL of a partial redirect does not necessarily end with a slash "/". Other examples for the query /x/y/z/any/thing and the partial redirect /x/y/z/any/:
- Target /a/b/c/some results in a redirection to /a/b/c/something.
- Target /a/b/c?bar= results in a redirection to /a/b/c?bar=thing.
As shown, partial redirects simply replace the longest known left-hand side match with the target. Because direct matches are evaluated first, it is possible to have ordinary PURLs "within" partial redirects, e.g. /x/y/z/any/but/this could be redirected to /a/b/c/elsewhere without affecting other /x/y/z/any/ queries matched by a partial redirect.
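The matching order described above can be sketched in a few lines of Python. This is only an illustration, not the code of any actual PURL server: the function name resolve and the two dictionaries are hypothetical, the paths are the placeholders used in this article, and the sketch assumes that the defined prefix of a partial redirect ends at a "/" boundary.

```python
from typing import Dict, Optional


def resolve(path: str,
            direct: Dict[str, str],
            partial: Dict[str, str]) -> Optional[str]:
    """Return the redirect target for path, or None if nothing matches.

    direct and partial are hypothetical lookup tables mapping a PURL
    path to its target URL; they stand in for the resolver's database.
    """
    # Direct matches are evaluated first.
    if path in direct:
        return direct[path]

    # Truncate components right to left, so the longest prefix is tried first.
    # Assumption: a partial redirect's defined prefix ends with "/".
    end = len(path)
    while True:
        cut = path.rfind("/", 0, end)
        if cut < 0:
            return None  # no prefix left to try
        prefix = path[:cut + 1]
        if prefix in partial:
            # Replace the matched left-hand side by the target.
            return partial[prefix] + path[cut + 1:]
        end = cut


# Placeholder data taken from the examples in the text.
direct = {"/x/y/z/any/but/this": "/a/b/c/elsewhere"}
partial = {"/x/y/z/": "/a/b/c/d/", "/x/y/z/any/": "/foo/"}

print(resolve("/x/y/z/any/thing", direct, partial))     # /foo/thing
print(resolve("/x/y/z/other", direct, partial))         # /a/b/c/d/other
print(resolve("/x/y/z/any/but/this", direct, partial))  # /a/b/c/elsewhere
```

Because the target is concatenated with the unmatched remainder as a plain string, a target such as /a/b/c/some or /a/b/c?bar= yields /a/b/c/something or /a/b/c?bar=thing, as in the list above.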
It is also possible to have different PURLs for /x/y/z and /x/y/z/ (note the trailing slash), where the latter would typically be a partial redirect. For an example, compare /net/scape and /net/scape/.
Popular HTTP servers silently add a "missing" trailing slash to URLs, or strip an extraneous trailing slash as needed, but the specification RFC 3986 allows the two forms to refer to different resources.
Many Wikis support PURLs in the net domain by shorthands like [[purlnet:scape]] for purlnet:scape as shown above, because purlnet is defined in a Meta Interwiki map.
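How such a shorthand might expand into a full URL can be sketched as follows. This is a guess at the mechanism, not the actual Interwiki implementation: the function name expand and the table INTERWIKI are hypothetical, and the base URL is an assumption derived from http://purl.net/net, one of the possible left-hand sides mentioned in this article.

```python
# Hypothetical interwiki table. The base URL is an assumption; the real
# Meta Interwiki map entry may use a different host or spelling.
INTERWIKI = {"purlnet": "http://purl.net/net/"}


def expand(link: str) -> str:
    """Expand a shorthand such as 'purlnet:scape' into a full URL."""
    prefix, sep, rest = link.partition(":")
    if not sep or prefix not in INTERWIKI:
        raise ValueError(f"unknown interwiki prefix in {link!r}")
    # Plain concatenation: the part after the colon becomes the PURL path.
    return INTERWIKI[prefix] + rest


print(expand("purlnet:scape"))     # http://purl.net/net/scape
print(expand("purlnet:rfc/4321"))  # http://purl.net/net/rfc/4321
```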
Notable redirects
This is an incomplete list of partial redirects in the net domain; readers are encouraged to add what they use. The various possible left-hand sides, like http://purl.net/net, are represented by the Interwiki prefix purlnet:, and working right-hand side examples are shown.
- purlnet:abuse/SWEN, purlnet:abuse
- Google groups search limited to net abuse
- purlnet:cp/1252, purlnet:cp
- ICU info about codepage 1252
- purlnet:eisa/40, purlnet:eisa
- Encyclopedia of Integer Sequences by A-number
- purlnet:en2de/en.wikipedia.org/wiki/Persistent_Uniform_Resource_Locator
- Crude Google en to de translation of a given URL
- purlnet:en2fr/en.wikipedia.org/wiki/Persistent_Uniform_Resource_Locator
- Like en2de, all pairs for de/en/fr might "work"
- purlnet:msgid/4zCix009Cv2acya@bionic35.bionic.zer.de, purlnet:msgid
- Google groups archive access by Message-ID
- purlnet:rfc/4321, purlnet:rfc
- Abstract and keywords for RFC 4321
- purlnet:ucode/feff, purlnet:ucode
- Letter database entry for Unicode code point U+FEFF