Persistent Uniform Resource Locator
From Wikipedia, the free encyclopedia
A persistent uniform resource locator (PURL) is a Uniform Resource Locator (URL) (i.e. location-based Uniform Resource Identifier or URI) that does not directly describe the location of the resource to be retrieved but instead describes an intermediate (more persistent) location which, when retrieved, results in redirection (e.g. via a 302 HTTP status code) to the current location of the final resource.
PURLs are an interim measure, used while Uniform Resource Names (URNs) are being mainstreamed, to solve the problem of transitory URIs in location-based URI schemes like HTTP. Persistence problems arise because it is practically impossible for every user to have their own domain name, and because re-registering domain names costs money and effort. As a result, WWW authors put their documents in rather arbitrary locations of questionable persistence (i.e. wherever they can get WWW space). Existing official PURLs (on Purl.Org) will probably be mapped to a URN namespace at a later date.
Principles of operation
The oldest PURL HTTP server can be reached at purl.oclc.org as well as at purl.org, purl.net, and purl.com.
PURLs are organized into "domains" that look like directory paths, e.g. /net/scape is the "subdomain" scape of "domain" net, and itself has "subdomains" like about. These "domains" are unrelated to Internet domains; their purpose is to define one or more "maintainers". The maintainers can grant write access to ALL or to other registered users, e.g. "domain" net is open to ALL registered users.
The PURL server, or "resolver" in PURL terminology, first tries to match a request directly to a defined PURL. If the PURL exists the reply is a redirect to the last URL associated with it as specified by its maintainer. This can be another PURL, any http-URL, or, in fact, any URL. It's the job of the maintainer to guarantee that target URLs do in fact still exist.
Because PURLs are designed to be persistent, deleting them is not supported. They can be disabled, e.g. http://purl.net/net was disabled; otherwise PURLs work like redirects.
If the resolver gets no direct match for a given PURL, it tries to match it right to left, truncating components separated by "/", against "partial redirects". This is a special kind of PURL: for a request /x/y/z/any/thing, the prefix /x/y/z/ is defined and created as a partial redirect. If its target URL is /a/b/c/d/, the /x/y/z/any/thing request is redirected to /a/b/c/d/any/thing.
For partial redirects the longest match wins, which is just the same as "right to left". So if, for example, /x/y/z/any/ is also defined with target /foo/, then the redirect would go to /foo/thing instead of /a/b/c/d/any/thing. The URL of a partial redirect does not necessarily end with a slash "/". Other examples for the query /x/y/z/any/thing and partial redirect /x/y/z/any/:
- Target /a/b/c/some results in a redirection to /a/b/c/something.
- Target /a/b/c?bar= results in a redirection to /a/b/c?bar=thing.
As shown, partial redirects simply replace the longest known left hand side match by the target. Because direct matches are evaluated first, it's possible to have ordinary PURLs "within" partial redirects, e.g. /x/y/z/any/but/this could be redirected to /a/b/c/elsewhere without affecting other /x/y/z/any/ queries matched by a partial redirect.
It's also possible to have different PURLs for /x/y/z and /x/y/z/ (note the trailing slash), where the latter would typically be a partial redirect. For an example, compare /net/scape and /net/scape/.
Popular HTTP servers silently add a "missing" trailing slash to URLs or strip an extraneous trailing slash as needed, although RFC 3986 allows the two forms to refer to different resources.
Many MediaWiki wikis support PURLs in the net domain through shorthands like [[purlnet:scape]] for purlnet:scape, as shown above, because purlnet is defined in a Meta Interwiki map.
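The matching order described above (direct match first, then the longest partial redirect found by truncating path components from the right) can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual resolver code: the function name `resolve` and the two in-memory dictionaries are assumptions made for the example, and the paths are the placeholder paths used in the text.

```python
from typing import Dict, Optional


def resolve(path: str,
            direct: Dict[str, str],
            partial: Dict[str, str]) -> Optional[str]:
    """Return the redirect target for `path`, or None if nothing matches.

    `direct` maps ordinary PURLs to targets; `partial` maps partial
    redirects to targets. Both tables are hypothetical stand-ins for
    whatever storage a real resolver uses.
    """
    # Direct matches are evaluated first.
    if path in direct:
        return direct[path]

    # Otherwise truncate components from the right, so the longest
    # prefix is tried first and therefore wins.
    cut = len(path)
    while True:
        cut = path.rfind("/", 0, cut)
        if cut < 0:
            return None
        # A partial redirect need not end with a slash, so try both forms.
        for prefix in (path[:cut + 1], path[:cut]):
            if prefix in partial:
                # Replace the matched left hand side by the target.
                return partial[prefix] + path[len(prefix):]


direct = {"/x/y/z/any/but/this": "/a/b/c/elsewhere"}
partial = {"/x/y/z/": "/a/b/c/d/", "/x/y/z/any/": "/foo/"}

print(resolve("/x/y/z/any/thing", direct, partial))     # /foo/thing
print(resolve("/x/y/z/other/page", direct, partial))    # /a/b/c/d/other/page
print(resolve("/x/y/z/any/but/this", direct, partial))  # /a/b/c/elsewhere
```

Because the target is simply concatenated with the unmatched remainder, a target such as /a/b/c/some or /a/b/c?bar= yields /a/b/c/something or /a/b/c?bar=thing for the same query, matching the examples in the text.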
Notable redirects
This is an incomplete list of partial redirects in the net domain. The various possible left hand sides like http://purl.net/net are represented by the Interwiki prefix purlnet:, and working right hand side examples are shown.
- purlnet:abuse/SWEN, purlnet:abuse
  - Google Groups search limited to net abuse
- purlnet:eisa/40, purlnet:eisa
  - Encyclopedia of Integer Sequences by A-number
- purlnet:en2de/en.wikipedia.org/wiki/Persistent_Uniform_Resource_Locator
  - Crude Google en to de translation of a given URL
- purlnet:en2fr/en.wikipedia.org/wiki/Persistent_Uniform_Resource_Locator
  - Like en2de, all pairs for de/en/fr might "work"
- purlnet:msgid/4zCix009Cv2acya@bionic35.bionic.zer.de, purlnet:msgid
  - Google Groups archive access by Message-ID
- purlnet:rfc/4321, purlnet:rfc
  - Abstract and keywords for RFC 4321
- purlnet:ucode/feff, purlnet:ucode
  - Letter database Unicode point U+FEFF
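The purlnet: shorthand used in this list can be expanded mechanically. The following is a minimal sketch, assuming that the prefix stands for the left hand side http://purl.net/net mentioned above; the function name `expand_purlnet` and the constant `PURLNET_BASE` are made up for this illustration, and a given wiki's Interwiki map may point at a different host name for the same server.

```python
# Assumed base URL, taken from the left hand side named in the text.
PURLNET_BASE = "http://purl.net/net/"
PREFIX = "purlnet:"


def expand_purlnet(link: str) -> str:
    """Expand an Interwiki-style 'purlnet:...' shorthand into a full PURL."""
    if not link.startswith(PREFIX):
        raise ValueError(f"not a purlnet shorthand: {link!r}")
    # Everything after the prefix is appended to the base unchanged, so a
    # partial redirect such as rfc/ receives the remainder (here 4321).
    return PURLNET_BASE + link[len(PREFIX):]


print(expand_purlnet("purlnet:rfc/4321"))  # http://purl.net/net/rfc/4321
print(expand_purlnet("purlnet:scape"))     # http://purl.net/net/scape
```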