Talk:Uniform Resource Identifier
From Wikipedia, the free encyclopedia
URI vs URL
What's the difference between URI and URL? -- Taku 09:14, Mar 11, 2004 (UTC)
- URLs are a subset of URIs, which are more general than just Internet resources. But you're right, it should probably be mentioned on the page. --Bth 09:28, 11 Mar 2004 (UTC)
- I removed the edit. An adequate discussion of URLs, with appropriate links, is already at the bottom of the article. For further info, see section 1.1.3 of the new version of RFC 2396 [1] (in development) - mjb 11:12, 12 Mar 2004 (UTC)
Whoever coined the distinctions between URL, URI, and URN needs to get a life. lysdexia 08:27, 24 Oct 2004 (UTC)
- Well, no one came up with the three acronyms all at once. My understanding is that early on, URLs proved to be too closely tied to addressing and retrieval according to specific network protocols (http, ftp, etc.); it was difficult to use them to just name things (give them identifiers). So someone came up with URNs. They're really just a special URL scheme that has no location/retrieval semantics.
- Well, that still kind of left things in a confusing state, because it meant that sometimes a URL was a name and sometimes it was a locator, but really even when it was a locator you could use it as if it were a name. The more you use URLs, the more these kinds of things become important. So along came URI, as a sort of grand unification theory. URIs are more than just a way of dealing with URLs and URNs generically, though; they formally draw a line between the idea of merely identifying something and actually retrieving (or even suggesting that it's possible to retrieve) a representation of it.
- Making these distinctions allows the definition of "resource" (the thing being identified or located) to be much more flexible—a big help in the world of RDF and knowledge management applications. In other words, if someone were designing it from scratch today, it would've been URI all along, and you'd never know about URL or URN. For the most part, URL/URN are obsolete terms, but we're kinda stuck with them, in large part due to the resistance of people who apparently have a life. ;) - mjb 23:04, 25 Oct 2004 (UTC)
- Isn't "obsolete" really a little strong? To me that would imply a term more or less abandoned in general use, whereas a Google search shows that occurrences of "URL" vastly outnumber (by more than 13 to 1) those of "URI". Loganberry (Talk) 04:58, 30 July 2005 (UTC)
- I agree; obsolete is unnecessary.
- URL is not only obsolete, but it never existed in the first place. Please report to the Ministry of Love for correctional therapy.
- A reasonable observation; I've noted it below, in the 'URI/URL/URN popular semantics' thread, and I made a change to the article today, to tone down the 'obsolete' bit. It now says:
- The contemporary point of view among the working group that oversees URIs is that the terms URL and URN are context-dependent aspects of URIs, and rarely need to be distinguished.[1] In technical publications, especially standards produced by the IETF and the W3C, the term URL has long been deprecated, as it is rarely necessary to distinguish between URLs and URIs. However, in nontechnical contexts and in software for the World Wide Web, the term URL remains ubiquitous. Additionally, the term web address, which has no formal definition, is often used in nontechnical publications as a synonym for URL or URI, although it generally refers only to 'http' and 'https' URIs. —mjb 22:19, 2 August 2006 (UTC)
I was pretty confused by the section on URI and its relation to URL and URN. Perhaps this section could be rewritten? Today I ran into the definitions given in ANSI/NISO Z39.29-2005 http://www.niso.org/standards/resources/Z39-29-2005.pdf , which were useful to me. They say:
- A URN is "name of an internet resource that has institutional persistence, that is, its exact location may change from time to time, but some agency will be able to find it. A URN is a form of URI. It looks like 'URN:[agency or directory]://[term]'. The user need only know the name of the resource "[term]", not its location on the internet."
- A URI is "The generic set of all names and addresses which are short strings that refer to intellectual objects (typically on the Internet). A URI typically describes 1) the mechanism used to access the resource, 2) the specific computer that the resource is housed in, and 3) the specific name of the resource (a file name) on the computer. The most common form of URI is the Web page address or URL. Character strings that identify File Transfer Protocol (FTP) addresses and e-mail addresses are also URIs." Jodi.A.Schneider 17:55, 10 September 2006 (UTC)
URI vs URL part deux
My understanding is that the term "URI" was the one that was deprecated, given the widespread acceptance of the term "URL". Was I wrong? Do we have a reference for the contention that "URL" is considered obsolete? --P3d0 03:02, 13 December 2005 (UTC)
- References and clarifications added today. —mjb 22:19, 2 August 2006 (UTC)
Fragments and RDF
RFC 3986 has changed the generic URI syntax to allow fragment identifiers not just in URI references, but on all URIs except those conforming to "absolute-URI". I think(?) this was done in part to deal with resource identification in RDF. I'd like to research this further and mention it in the article. -- mjb 00:42, 3 Feb 2005 (UTC)
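Editorial illustration (not part of the original comment): a minimal sketch of how a fragment identifier can be separated from the rest of a URI, using Python's standard `urllib.parse.urldefrag`. The example string is the one used in the examples table further down this page; the variable names are only illustrative.

```python
from urllib.parse import urldefrag

# Example string taken from the table in the "Proposed examples" thread.
uri = "http://www.example.org/book0395363411.htm#Sec1"

# urldefrag returns a (url, fragment) pair: everything before '#', and the fragment itself.
base, fragment = urldefrag(uri)

print(base)      # http://www.example.org/book0395363411.htm
print(fragment)  # Sec1
```

This only shows the syntactic split; it says nothing about how a fragment is interpreted, which depends on the media type of the retrieved representation.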
Introduction revision proposal
Correctly defining URI and Resource is tricky, but the current article introduction seems confusing on at least two points:
- It defines URI as an "Internet protocol element". This seems quite restrictive, since some kinds of URI (URNs) are not linked to an Internet protocol. The first use of a URI is (as its name says) to identify a resource. It is on that basis that the declarative semantics of RDF, RDFS, OWL ... rely. Some particular URIs (URLs) are also used as locators, which means they have functional semantics in an internet protocol (http, mail, ftp, ...). The distinction between declarative and functional semantics of URIs is important. So my suggestion is to stick to the latest definition as per RFC 3986 (p.4), which clearly makes this distinction.
"A URI is an identifier consisting of a sequence of characters [...] It enables uniform identification of resources via a separately defined extensible set of naming schemes. How that identification is accomplished, assigned, or enabled is delegated to each scheme specification."
- It links resource to Resource (computer science). It seems to me that the meaning of Resource here should be as per its RFC 3986 definition, ibid., p.4, and applicable to the "R" in URI, URL, URN and RDF as well. So the link should rather be to Resource (Web). Admittedly, that article is currently a stub. Expanding it is on my agenda (a tricky subject), and I will be back to this discussion when it's done. -- universimmedia 07:51, 20 June 2006 (UTC)
- It sounds like you're under the impression that protocol means network communication standard. That's the most common kind of protocol, but the term actually has a broader definition, and it is not inaccurate or restrictive to say that a URI is an Internet protocol element. However, I would agree that this isn't obvious to most readers. I also agree with the need to fork the 'resource' articles. I'm glad someone is working on it. —mjb 01:53, 1 August 2006 (UTC)
- It sounds like you've a correct interpretation of what I understand by protocol. But do you agree with the link in the introduction to protocol? Or maybe this article should undergo revision also? As for resource, I would be happy not to work alone on it. Your suggestions are welcome! universimmedia 09:16, 1 August 2006 (UTC)
- Yes, I agree with the link to protocol (computing) and the rest of your edits from June 22. I don't know if I have time to work on the resource (web) article just yet. You seem to have a grasp of the main issues. Good luck! :) -- mjb 22:19, 2 August 2006 (UTC)
Examples of URI, URL, URN
Semi-private discussion
I reverted recent edits by Krauss in which he added examples intended to illustrate the differences between URIs, URLs, and URNs. I am not opposed to offering such examples, but what was written was incorrect or misleading. 'www.wikipedia.org', for example, is a URI reference, but not a URI. It also cannot be used as a URN (it would have to begin with 'urn:foo:' where foo is a URN scheme name). A web browser might allow it to be input as if it were a URL, and then do cleanup on it or just make assumptions about what was intended, but the character string itself is not a URL. I also reverted speculation that such interfaces are the reason why people confuse URL, URI, etc.; it's plausible, perhaps even probable, but unverifiable. —mjb 07:09, 28 July 2006 (UTC)
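Editorial illustration (not part of the original comment): the syntactic half of the point above can be sketched with Python's standard `urllib.parse.urlsplit`. The helper name `has_scheme` is hypothetical, and the check is syntactic only; whether a string actually serves as a URI also depends on its role as an identifier.

```python
from urllib.parse import urlsplit

def has_scheme(text: str) -> bool:
    """Return True if the string begins with a scheme such as 'http:' or 'urn:'."""
    return urlsplit(text).scheme != ""

# A scheme-less string like 'www.wikipedia.org' can only be a relative reference.
for candidate in ("http://www.wikipedia.org", "www.wikipedia.org", "urn:isbn:0-395-36341-1"):
    print(candidate, has_scheme(candidate))
```

A browser address bar that accepts the scheme-less form is doing its own cleanup before it has anything it can treat as a URL.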
OK, you destroyed all my text. My language is Brazilian Portuguese (sorry for my English errors), and writing in English was difficult and time-consuming for me! But you were very fast (and my text can be reloaded), so I accept your suggestion...
- The text of the "Relationship to URL and URN" section is incomplete and not didactic: ask people and your friends whether they understand the URI/URL/URN difference!
- "Speculation about reason for confusion": ask Google, it is not only my speculation... but OK... the point is "what is true?" Do you have the truth? ... I think on a wiki the truth emerges from a "dynamic convergence process"... Consensus and convergence are the truth (and by destroying texts you may cancel the process).
- Incorrect examples: OK, sorry... in other articles collaborators correct the errors, and if the idea was good, it is preserved ("understanding mistakes by examples" was my suggestion; do you agree with the idea?)
- "Discuss examples on talk": thanks, let's discuss.
Krauss 29 July 2006 (UTC).
You've got two things going here:
- a push to add more examples to illustrate the relationship between URI, URL, URN; and
- a push to rewrite a couple of paragraphs that explain that relationship.
I don't see a strong need for more examples, but I am open to it, if it will really help. But what you haven't really done here is explain why you're seeking to rewrite the prose. What is wrong with the text that is currently in the article? What's missing from it? What crucial aspect did it fail to address?
See, I think it very clearly and succinctly explains the relationship and concepts, and nothing that you're trying to add or change, thus far, does anything to improve upon it. For example, you said something like "URNs are technical; URLs are not" (I'm paraphrasing). To me, all URIs are equally 'technical', but I think I do understand what you were trying to say: URNs are only found in technical contexts, whereas URLs are found in both technical and nontechnical publications, from common HTML code for web sites to billboards and magazine ads. It's a fair observation, but it's not a point that's crucial to establish an understanding of what a URL is and what a URN is. It's trivia.
The difference between URL and URN is trivial. Both function as resource IDs, and both can be dereferenced ('resolved') to obtain a representation of the resource they denote. The only difference is that the URL's scheme implicitly suggests a possible dereference mechanism that is (or should be) dictated by the spec that governs the scheme. The protocol suggested by a URL for dereferencing is just a suggestion. Nothing is preventing an application from reading a URI, be it a URL or URN, and associating it with a representation of the identified resource from its own cache. No network activity need take place. So an http URL and a urn:uuid URN are fundamentally the same; they just identify a resource. The http URL just contains some information that suggests a possible dereference procedure. Once you understand this, you should see why the IETF views the distinction between URLs and URNs as irrelevant. We should not be making the distinction into more than it is.
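Editorial illustration (not part of the original comment): a minimal sketch of the idea that an application may associate any URI, whether URL or URN, with a representation it already holds, without any network activity. The `local_cache` contents and the `dereference` helper are hypothetical.

```python
# Hypothetical local store mapping URI strings to representations already held.
local_cache = {
    "http://www.wikipedia.org/": b"<html>cached copy of the home page</html>",
    "urn:isbn:0-395-36341-1": b"catalogue record held locally",
}

def dereference(uri: str):
    """Return a locally held representation of the identified resource, or None."""
    # The scheme is never inspected: both kinds of URI are used purely as identifiers.
    return local_cache.get(uri)

print(dereference("http://www.wikipedia.org/"))
print(dereference("urn:isbn:0-395-36341-1"))
```

The lookup treats the 'http' URL and the 'urn' URN identically, which is the sense in which the two are fundamentally the same.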
- ok, let us use the KISS principle to write the article, not the W3C prolixity —krauss 1 August 2006.
The only reason people avoid URNs in nontechnical contexts is just because most of the time, the resources that people most often want to make reference to are things that must be obtained 'live' from a network, via specific protocols like HTTP. When that's the goal, then a URL is a natural choice, because it provides the protocol-specific details for representation retrieval (and, often, server interaction) within itself, and also because people who mention URLs feel safe making assumptions about the capabilities of URL resolvers that are built into web applications and operating systems, and about people's connectedness to Internet-based distributed domain name services. Using a URN would require a similar type of global resolver service to help with the dereferencing process, and no such service exists. This is not an intrinsic difference between URLs and URNs; it's just a circumstance fueled, in part, by momentum, misunderstanding, and bureaucracy. —mjb 01:37, 31 July 2006 (UTC)
- Ok, we can review our consensus (see added sec.). —krauss 1 August 2006.
Again, I've reverted your changes (except for a link) because the information you are adding to the article is redundant or is giving people advice, and because you are proceeding with changes that have not been agreed to here. Why are you so impatient? Please read what I wrote above, and answer the questions I asked: what crucial info is missing? —mjb 00:17, 1 August 2006 (UTC)
- Sorry. OK, we need time, and we add step by step, I agree... it was also a sandbox; I needed to see (and show you) what I am doing here. About what you wrote above, I am reading a good HTML text of RFC 3986... but you are the technical expert; I am only doing an overview. I am a "vulgar" wiki reader and a collaborator concerned with didactics (understandability and simplicity) for "vulgar readers". -- Krauss 1 August 2006.
Proposed examples
For readers to "understand by examples", we need good examples in the article.
Note 1: there are test kits: W3C-2004 kit, ... (more kits?)
Note 2: I suggest removing the URI reference citations. The URI reference needs the relative-ref concept. We are only differentiating URI/URN/URL; it is not didactic for the article to mix them with URI references (nor productive for us). "URI-reference = URI / relative-ref" (RFC 3986, sec. 4.1). In set terms: "the set of all strings that are valid URI-refs is the union of the absolute URI set and the relative URI set".
Examples | could be a (see note) | is not a
http://www.wikipedia.org | URL, URI (or URI reference) | URN
www.wikipedia.org | URL-like string acceptable by some web browser user interfaces as if it had been prefaced by 'http://', as in the preceding example | URL, URI (or URI reference), URN
urn:www.wikipedia.org | URN, URI (or URI reference) | URL
http://www.example.org/book0395363411.htm#Sec1 | URI reference | URI, URL, URN
We can put a lot of examples here and summarize/select for the article.
- I completely redid your examples for accuracy, and changed "is"/"is not" to "could be"/"could not be", because a string's URI/URL/URN-ness is not just a matter of syntax, but also of designation/role. However, see below; I don't think this chart is really necessary. —mjb 07:31, 29 July 2006 (UTC)
- The "not only syntax" condition is very important. Only now am I reading RFC 2141 ... oops, it says "intended to serve as persistent, location-independent"; persistence needs to be mentioned in the article. Perhaps the example table needs a column for the context, or we need a second table for "non-browser" usage contexts. The RFC also defines
<URN> ::= "urn:" <NID> ":" <NSS>
(the "urn:" prefix is mandatory in the URN syntax; see also the RFC's Appendix A) -- Krauss 29 July 2006 (UTC)
- Another point: the examples selected for the article need to show that "URL refers to the subset of URI" (RFC 2396, 1.2).
- I hope you understand what I mean about syntax and designation: a random string that matches the syntax of a URI is not necessarily a URI; to be a URI it must also have the role of being an identifier (of a resource).
- You mentioned RFC 2396, which is obsolete. STD 66, aka RFC 3986, is current. Please don't use RFC 2396 as a reference. I only mentioned RFC 2396 in the article where it was necessary to explain a difference between the two versions of the spec.
- Regarding persistence of URNs, which is another issue altogether, STD 66 says "This specification does not require that a URI persists in identifying the same resource over time, though that is a common goal of all URI schemes." So, the 'urn' scheme is not special in this regard, and persistence is a goal, not a requirement. This is basically an acknowledgment that the association between a resource and its ID is under the control of whoever is in charge of the resource, references to it, or access to it, and thus may change at any time. Persistence is essentially application-level, not syntax-level, so it's beyond the scope of the spec's authority.
- I really suggest you read everything in STD 66 very carefully. For example, did you notice this?
- An individual scheme does not have to be classified as being just one of "name" or "locator". Instances of URIs from any given scheme may have the characteristics of names or locators or both, often depending on the persistence and care in the assignment of identifiers by the naming authority, rather than on any quality of the scheme. Future specifications and related documentation should use the general term "URI" rather than the more restrictive terms "URL" and "URN". (reference: RFC 3305)
- RFC 3305 may be of interest to you, as well; it elaborates on that last sentence and provides richer explanations of the relationship between the terms.
- Lastly, in the example table, I don't want to have a separate column for 'in browsers' vs 'strict'. I think you're allowing your observations of web browser functionality to pollute your concept of what URIs are about. You must take care not to insinuate that web browsing is their purpose. Their purpose is, simply, resource identification – although obviously, the principal component of the WWW, hypertext/HTML, demands that a robust system of resource identification and dereferencing be established (SGML was rather nonspecific about it), so the development of the URI syntax and concepts received much momentum during HTML's formative years. It's just that you mustn't lead the reader to believe that what their browser does (e.g., accepting a malformed URL in its 'address bar'/'URL bar' widget) has something to do with what a URI is. URIs comprise an Internet protocol for resource identification. The World Wide Web is one of many uses of the Internet, and browsers are one of the tools that humans use to interface with the WWW. I mean, the WWW and its browsers are not the same thing as the Internet and its general protocols (URIs included), and we must try to maintain that separation. —mjb 01:05, 1 August 2006 (UTC)
- General: I cited Identifying, locating, and naming things on the Web (by D. Connolly) in the External links because Connolly has another point of view... and I added 2 sections here (Talk) because I think we need some basic consensus to continue the discussion. About the objective of this article: do you think it is to be didactic (understandability and simplicity) or to be technical (for computer science readers)? -- Krauss 1 August 2006.
- I like having those links. Ultimately, what we say about URIs must agree with the specs, but the writings of people who were/are involved in the development of the specs is definitely useful and relevant, especially when they're explaining esoteric topics for a more general audience. Thanks.
- Regarding the objective of the article: it's both. To some extent, simplicity must be sacrificed for correctness and accuracy – if a topic must be included, but can't be explained without getting "technical", then we have to bring the reader up to the technical level through examples and definitions. However, we have to avoid going overboard with holding their hand; this should not be an exhaustive tutorial, nor an in-depth study of every nuance of the specs and the ways in which URIs are used in the world.
- If you visit the mathematics articles, you will find many highly technical explanations that make no sense to the average high school graduate, often written (inappropriately for an encyclopedia, in my opinion) in a hand-waving, reader-addressing lecture style. But how do you explain post-calculus to someone who decided they were done with math after they got a C in algebra? You have to draw a line somewhere and say "if this is too technical, too bad".
- That said, if you would say what sentences in the article you feel are too technical, we could figure out ways to make them less jarring, either by changing them, or changing the text leading up to them. —mjb 21:17, 2 August 2006 (UTC)
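Editorial illustration (not part of the original thread): the RFC 2141 rule quoted earlier in this thread, `<URN> ::= "urn:" <NID> ":" <NSS>`, can be sketched as a shape check. This is a deliberate simplification: the NID part follows the RFC's 1 to 32 character limit, but the NSS part is only required to be non-empty and free of whitespace, which is an assumption made for brevity. The names `URN_SHAPE` and `split_urn` are hypothetical.

```python
import re

# "urn:" (case-insensitive), then NID (alphanumeric first, up to 32 characters,
# hyphens allowed after the first), then ":" and a non-empty NSS (simplified here).
URN_SHAPE = re.compile(r"urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(\S+)", re.IGNORECASE)

def split_urn(text: str):
    """Return (NID, NSS) if the string has the urn:<NID>:<NSS> shape, else None."""
    match = URN_SHAPE.fullmatch(text)
    return (match.group(1), match.group(2)) if match else None

print(split_urn("urn:isbn:0-395-36341-1"))    # ('isbn', '0-395-36341-1')
print(split_urn("urn:www.wikipedia.org"))     # None
print(split_urn("http://www.wikipedia.org"))  # None
```

Under this sketch, the table's 'urn:www.wikipedia.org' string is rejected because it has no NID followed by a second colon. As noted in the thread, passing such a check is still only a matter of syntax, not of designation.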
Speculation?
Can these paragraphs be used in the article, or are they speculation? |
draft vers. 1
A Uniform Resource Locator (URL) is a subset of the URI popular and usual protocols (with scheme names like Uniform Resource Name (URN) is for more technical use, and oftentimes people use the terms (URN and URL, or URN and URI) interchangeably, which is not entirely correct. A possible source of mistakes is because web browsers allow for default documents and do not require a scheme to retrieve a document. |
draft vers. 1 Discussion
I think it is OK. Krauss 29 July 2006 (UTC). We can only comment that "in web browsers it is more difficult to see the differences: they allow for default documents and do not require a scheme to retrieve a document".
|
draft vers. 2
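A Uniform Resource Locator (URL) is a subset of the URI popular protocols (with scheme names like http, ftp or mailto). Therefore all URLs are URIs. The term URL is technically deprecated, but is more widespread and historically important. For popular usage, to design http sites and web pages, the term web address can be used to replace the term URL; in other all usages, prefer the term URI. In the use of the term Uniform Resource Name (URN), some caution may be required to interchange with URL or URI, because web browsers allow for default documents and do not require a scheme to retrieve a document. The term URN refers to the subset of URI that are required to remain globally unique and persistent (even when the resource ceases to exist or becomes unavailable). |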
Vote and/or change the text.
I'm for vers. 2, but with a slight modification:
- "A Uniform Resource Locator (URL) is a subset of the URI popular protocols ..."
seems to me a misleading shortcut for:
- "A Uniform Resource Locator (URL) is a subset of URI associated with the popular protocols ..."
universimmedia 13:47, 31 July 2006 (UTC)
- OK... Universimmedia is a relevant collaborator. Mjb, do you need more OKs before these two paragraphs can be added (without reverting them)? -- Krauss
Start here (this is what is in the article):
- A URI can be classified as a locator or a name or both. A Uniform Resource Locator (URL) is a URI that, in addition to identifying a resource, provides means of acting upon or obtaining a representation of the resource by describing its primary access mechanism or network "location". For example, the URL http://www.wikipedia.org/ is a URI that identifies a resource (Wikipedia's home page) and implies that a representation of that resource (such as the home page's current HTML code, as encoded characters) is obtainable via HTTP from a network host named www.wikipedia.org. A Uniform Resource Name (URN) is a URI that identifies a resource by name in a particular namespace. A URN can be used to talk about a resource without implying its location or how to dereference it. For example, the URN urn:isbn:0-395-36341-1 is a URI that, like an International Standard Book Number (ISBN), allows one to talk about a book, but doesn't suggest where and how to obtain an actual copy of it.
- The contemporary point of view among the working group that oversees URIs is that the terms URL and URN are context-dependent aspects of URIs, and rarely need to be distinguished.[1] In technical publications, especially standards produced by the IETF and the W3C, the term URL has long been deprecated, as it is rarely necessary to distinguish between URLs and URIs. However, in nontechnical contexts and in software for the World Wide Web, the term URL remains ubiquitous. Additionally, the term web address, which has no formal definition, is often used in nontechnical publications as a synonym for URL or URI, although it generally refers only to 'http' and 'https' URIs.
You have not said what's wrong with this, and I think it's pretty good, but let's go ahead and analyze your replacement anyway:
- A Uniform Resource Locator (URL) is a subset of the URI popular protocols (with scheme names like http, ftp or mailto).
  - Wrong term: protocols (see URI scheme, which needs a lot of work).
  - Unfamiliar term: scheme (not introduced until the next section, on syntax, and then there's a whole article devoted to it).
  - A URI can be classified as a locator or a name or both was an important, overarching concept that explains why we're talking about URLs (locators) and URNs (names) in the sentences that follow. Why was it removed?
- Therefore all URLs are URIs.
  - Was this not implicit in the original phrase A Uniform Resource Locator (URL) is a URI that…?
- The term URL is technically deprecated, but is more widespread and historically important. For popular usage, to design http sites and web pages, the term web address can be used to replace the term URL; in other all usages, prefer the term URI.
  - In response to concerns about the article stating that URL is "obsolete", I think I've addressed this in the 2nd paragraph now, with the text in nontechnical contexts and in software for the World Wide Web, the term URL remains ubiquitous. Additionally, the term web address, which has no formal definition, is often used in nontechnical publications as a synonym for URL or URI, although it generally refers only to 'http' and 'https' URIs. (note this phrasing is careful to avoid introducing the word 'scheme' prematurely).
- In the use of the term Uniform Resource Name (URN), some caution may be required to interchange with URL or URI, because web browsers allow for default documents and do not require a scheme to retrieve a document.
  - Here you have begun using the term URN before you have defined it.
  - You are also giving advice to the reader, which is not what we are supposed to do in an encyclopedia (at least, not directly).
  - The sentence makes no sense. Why would it occur to anyone to "interchange URN with URL or URI", and what does that have to do with web browser interfaces and default documents?
- The term URN refers to the subset of URI that are required to remain globally unique and persistent (even when the resource ceases to exist or becomes unavailable).
  - I've addressed the issue of persistence elsewhere in this talk page. It is a red herring and should not be mentioned, or should be very heavily qualified.
Also, in my version, the 1st paragraph defines the terms URL and URN and relates them to URI, and provides examples to make it very clear for the average reader who we can assume has a vague familiarity with the WWW. The 2nd paragraph explains how the terms are used, where they're deprecated, etc. In your version, the 1st paragraph is devoted to defining and describing the usage of URL, and the 2nd paragraph is devoted to defining URNs and advising the reader to (I think) try not to mix them up(?). There is no information in your version that isn't in mine, and mine I think is much more precise and better organized. So, I don't like your version at all. —mjb 22:50, 2 August 2006 (UTC)
URI/URL/URN popular semantics
The term URL has about 1,190,000,000 occurrences (Google "web URL"); the term URI has about 83,200,000 (Google "web URI"). The term URL has no other significant connotations, whereas the term URI does (a region of India, a city in Italy, etc., indicating that the real number is lower).
We can do other experiments on Google, Altavista and controlled text corpora. In all of them the term URI occurs in only ~5% of the total URL occurrences. If we read a modern dictionary of any language (English, Portuguese, Spanish, etc.), many of them have the term URL, but not URI (dictionary editors do similar experiments to decide what is relevant).
The term URL was born in a technical context, but it is now a worldwide, "culturally embedded" term. It is now an independent term; it does not belong to a technical terminology (RFCs), but is a universal term of natural languages.
In languages, terms are spoken, read and written by all people, and they "decide" (statistically, through usage) which terms they want to use. Experts or "W3C-technical people" do not decide for the "vulgar people".
"Vulgar people" CORRECTLY understand the terms URL and web address; we here on the wiki will not change their understanding, and we also need to understand them. They read (and will read) this article to understand the terms URI and URN, not for "URI-doctrination" or for technical details about the terms and their semantics.
"Vulgar people" are the public (or a very significant part of it) of this article.
-- Krauss 1 August 2006.
It's ironic: "vulgar" in English has two meanings. When most (common) people hear/see the word "vulgar", they think it means "obscene". Only linguists (people involved in the technical study of language) use "vulgar" to mean "of the common people". :)
I think we agree that the article should mention that URL and web address are very common terms, and should also explain how the terms are related to URI. I felt that was adequately addressed by this paragraph:
- The contemporary point of view among the working group that oversees URIs is that the terms URL and URN are context-dependent aspects of URI and rarely need to be distinguished. Furthermore, the term URL is increasingly becoming obsolete, as it is rarely necessary to differentiate between URLs and URIs, in general. For popular URL schemes, the term web address is sometimes used instead of URL.
The first sentence is easily confirmed by reading RFC 3305, so I think it should stay.
The second sentence is, in hindsight, a bit of an overstatement. I concede on this point. We should work on it. It's true that it's rarely necessary to differentiate between URLs and URIs, but now I'd say that's the reason URL is not becoming obsolete! (at least in nontechnical publications) – people are not going to start using the broader term if they have no incentive. See my outline below, where I mentioned this point in more detail. I am not sure how to best phrase it for the article; I doubt people will really understand what 'dereference mechanism' means. :/
The third sentence is correct, but could be better qualified. When/where is the term used, and for which schemes, exactly? I mentioned this in the outline below, as well. —mjb 03:55, 2 August 2006 (UTC)
I've updated the 2nd and 3rd sentences in the article today. —mjb 22:19, 2 August 2006 (UTC)
URL isn't "obsolete"
Mjb, please see the comments by Loganberry and others (and the URL talk page): in the "vulgo semantics", URL isn't obsolete (!). Wikipedia needs to show this "other side" (not strictly technical) of the URL semantics.
- I did not say "URL" is obsolete, did I? —mjb 20:30, 1 August 2006 (UTC)
- That is my interpretation from reading this talk page and analyzing your positions and contributions. -- Krauss
- Hmm. Elsewhere in this Talk page, I used the word 'obsolete' when I scolded you for using RFC 2396 as a reference, because that's an outdated version of the URI syntax spec. It has nothing to do with 'URL' though. But it's true, I did say "for the most part, URL/URN are obsolete terms" in Oct 2004, and I did put in the article that the term URL was obsolete, though this was intended to indicate the position of the URI working group; it was not an observation of trends in publishing. I've updated the contentious sentence in the article to address this issue; hopefully it is satisfactory now. —mjb 22:19, 2 August 2006 (UTC)
[edit] About URL/URI/URN central (technical) concepts, we have consensus?
- URI concepts:
- The set of "all URI valid strings" is a union of URN set and URL set.
- It has a more general, appropriate and accurate technical definition than URL.
- URN
- Persistence, not-availability
- URL
- Availability
-- Krauss 1 August 2006.
- URI concepts:
- The set of "all valid URI strings" is, essentially, a union of URN set and URL set.
- URI is a more general and appropriate concept than URL for the purpose of resource identification.
- URN
- Persistence, nonavailability
- URL
- Availability
Please see the comments I made earlier about persistence. URNs are not special in this regard. As compared to URLs, URNs have the intent of being more persistent (by virtue of not being associated with a specific dereference mechanism), but there's nothing inherently persistent about them. And as stated in STD 66, persistence is a goal, but not a requirement, of all URIs, regardless of scheme.
'Availability' is also misleading. You can't assume anything about the availability of a resource just based on whether it's a URN or URL. Availability is not a feature of URLs. You're confusing features of widely implemented dereference mechanisms (resolvers) with features of URI schemes.
Try the version below. This outline is, I believe, everything you need to know in order to understand the core notion of what a URI is, and the relationship between a URI, URN, and URL. —mjb 03:31, 2 August 2006 (UTC)
- URI concepts:
- A URI is a resource ID. The definition of 'resource' is a separate topic.
- Agreed, but clarifying what a resource is can help to clarify how it can be identified. Such a definition, and history of the concept belongs to the Resource (Web) article. I wish people participating in this debate here could have a look at what I've written there so far, so that we come to a consensus of what belongs to here, and what belongs to there. So far there is quite a bit of overlap, but seems to me not too much contradiction with what mjb proposes here. universimmedia 07:34, 2 August 2006 (UTC)
- A URI is a character string that must conform to a certain general syntax (defined in STD 66), which may be further restricted by the syntax of a particular URI scheme (e.g., the 'mailto' scheme requires that the URI look like 'mailto:user@host').
- A URI can be dereferenced via any means available to its processor, regardless of scheme. Therefore, all URIs can be treated as resource names, regardless of scheme.
- Many URI schemes require that the URI contain information that enables the potential use of a particular dereference mechanism. URIs conforming to these schemes are called URLs, where the L stands for locator, meaning the URI can be treated not only as a name but also, potentially, as an address. For example, an 'http' URL contains the info needed in order to obtain a representation of the denoted resource via the HTTP protocol. A URL does not imply resource availability, nor does it require the use of a particular dereference mechanism.
- The class of URIs conforming to the 'urn' scheme, and (historically) any other schemes that don't imply and enable a particular dereference mechanism, are called URNs. The N stands for name, meaning that there is no information in the URI that allows it to be treated as an address; it can only be treated as a name. URNs are especially useful for denoting resources that aren't network-bound.
- URI is a more general and appropriate concept than URL for the purpose of resource identification.
- The set of "all valid URI strings" is, essentially, a union of URN set and URL set. However, these sets are no longer formally defined; the consolidated URI syntax has replaced separate URN and URL syntax definitions for many years, now.
- URNs, by virtue of not being tied to a particular dereference mechanism, are often thought of as being more "persistent" than URLs. This is akin to saying that a person's name is more reliable than their email address as a means of denoting that person over time. There is, however, no absolute persistence inherent to any URI, although some degree of persistence is required for any URI to be meaningful, so this is a goal of all URI schemes.
- The term URL has had a great longevity and ubiquity in mass-media publications and software for the World Wide Web, where there is a need to refer to resources that must be accessed, on demand, via common network protocols, and where URN registries and dereferencing mechanisms are not widely or consistently defined or implemented. Since the term URN is relatively rare in nontechnical contexts, there is little incentive to favor the term URI over URL in general usage. *(see note below)
- There is a subset of URLs commonly called web addresses. These are http and https URLs, mainly. Web address is not a formally defined term.
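To make the outline above a bit more concrete, here is a rough Python sketch of the informal URL/URN/URI classification it describes. It is only an illustration under stated assumptions: the function name `informal_class` and the set `LOCATOR_SCHEMES` are made up for this example, the set is deliberately incomplete, and, as noted in the outline, these classes are no longer formally defined.

```python
from urllib.parse import urlsplit

# Hypothetical, non-exhaustive set of schemes whose URIs carry enough
# information for a particular dereference mechanism ("locators").
LOCATOR_SCHEMES = {"http", "https", "ftp"}

def informal_class(uri: str) -> str:
    """Return the informal class suggested by the URI's scheme."""
    scheme = urlsplit(uri).scheme.lower()
    if not scheme:
        # Every URI has a scheme; without one this is at most a
        # relative URI reference, not a URI.
        return "not a URI (no scheme)"
    if scheme == "urn":
        return "URN"   # can only be treated as a name
    if scheme in LOCATOR_SCHEMES:
        return "URL"   # a name that can also be treated as an address
    return "URI"       # sketch makes no URL/URN claim for other schemes

for example in ("urn:isbn:0451450523", "http://example.org/a", "mailto:user@host"):
    print(example, "->", informal_class(example))
```

Note that the classification depends only on the scheme, which matches the point that neither availability nor persistence can be inferred from the string itself.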
[edit] URI reference diagram ideas
Krauss, I welcome the addition of useful diagrams, but your URI reference diagram and its caption are wrong (and the diagram has a horrible typo, "absute").
Look, a URI reference is not a URI, so your sets analogy doesn't work. A URI reference might look like a URI, but its role is to denote/refer to a resource indirectly, by representing/denoting/referring to a URI.
There is also no such thing as a "relative URI". Only a "relative URI reference". And all URIs are absolute, so "absolute URI" is redundant.
Understand that a URI reference is a reference to a URI, similar to the way a URI is a reference to a resource. You might think of the URI reference as being shorthand or code for a URI.
Types of URI references:
- absolute (identical to a URI)
- relative (a portion of a URI)
I think the relationship would be best illustrated with a diagram that looks sort of like this (feel free to make it pretty):
resource #1 (e.g., a document) --(may contain)--> URI reference --(denotes/refers to)--> URI --.
     ^                                                  |                                      |
     |                                                  |                                      |
(identifies/denotes/refers to)                          |                       (identifies/denotes/refers to)
     |                                          (is relative to)                               |
     |                                                  |                                     \|/
("base") URI <------------------------------------------'                                 resource #2
Notice how a document has a URI (its URI, also called its "base URI" when dealing with relative URI references) that identifies it, but it does not contain a URI that refers to another resource. Rather, it only contains a URI reference. There are two levels of indirection between the two resources: Resource #1 refers to resource #2 by way of a URI reference which refers to a URI, which in turn refers to resource #2.
A couple of additional notes: When a URI reference is 'absolute', it is still technically 'relative to' a base URI, even though the base URI does not factor into the resolution. Also, when resource #1 and resource #2 have the same URI, then the URI reference in resource #1 is a "same-document" reference, which means that if it is being dereferenced, then no action should be taken. So, for example, if an HTML document links to itself, following the link shouldn't result in fetching a new copy of the document.
—mjb 21:17, 2 August 2006 (UTC)
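As a hedged illustration of the relationships described above, the following Python sketch resolves URI references against a base URI using the standard library's `urllib.parse.urljoin`, and checks for "same-document" references. The base URI and the helper names `resolve` and `is_same_document` are hypothetical, chosen only for this example; the same-document test is a simplification that compares the resolved URI and the base URI with any fragment removed.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical base URI, i.e. the URI that identifies resource #1.
BASE = "http://example.org/docs/a.html"

def resolve(reference: str, base_uri: str = BASE) -> str:
    """Turn a URI reference (absolute or relative) into a URI."""
    return urljoin(base_uri, reference)

def is_same_document(reference: str, base_uri: str = BASE) -> bool:
    """True if the reference resolves to the base URI, ignoring fragments."""
    target, _fragment = urldefrag(resolve(reference, base_uri))
    base_without_fragment, _ = urldefrag(base_uri)
    return target == base_without_fragment

print(resolve("b.html"))                 # relative reference
print(resolve("http://other.example/"))  # absolute reference, base not used
print(is_same_document("#top"))          # fragment-only reference
```

An absolute reference comes back unchanged, which is the sense in which the base URI "does not factor into the resolution", while a fragment-only reference is a same-document reference and should not trigger a new retrieval.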
- Frankly, I don't think that diagrams are all that useful, since they:
- visualize extremely simple relations: a much more informative illustration would be a version of the diagram by mjb above;
- are inaccurate, as explained by mjb, and for explicitly depicting URL and URN sets as disjoint;
- make it unclear whether the superset in both diagrams has elements that are not in subsets (e.g. URIs that are neither URLs nor URNs);
- emphasize the concepts of URLs and URNs, which are correctly described in the text as marginal.
- Also, I believe there is some confusion in the article about the position of the fragment part. The current standard (STD 66) states that the fragment is an integral part of the URI (see URI scheme#Generic syntax). This was not so in the previous RFCs: the fragment was a part of the URI reference, but not the URI itself. I think the article should consistently reflect the current norm, while mentioning the previous definitions when introducing the terms, and in more detail in the history section.
- --Hrvoje Šimić 00:16, 4 August 2006 (UTC)
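For anyone who wants to see where the fragment sits in practice, here is a small Python sketch using the standard library's `urllib.parse.urldefrag`. It only shows how a parser separates the fragment from the rest of the string; it does not settle the terminology question, and the example URI and the helper name `split_fragment` are hypothetical.

```python
from urllib.parse import urldefrag

def split_fragment(uri: str) -> tuple[str, str]:
    """Split a URI into (everything before '#', fragment)."""
    without_fragment, fragment = urldefrag(uri)
    return without_fragment, fragment

# Hypothetical URI with a fragment component.
print(split_fragment("http://example.org/article#history"))
```

Under STD 66 the whole string, fragment included, is the URI; the part before the "#" is what would be used to retrieve a representation.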