Resource Description Framework

From Wikipedia, the free encyclopedia

The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model using XML. It has come to be used as a general method of modeling knowledge, through a variety of syntax formats (XML and non-XML).

The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as a triple of specially formatted strings: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue".
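As a rough illustration, the "sky" statement above could be held in a program as a three-part record. The sketch below uses Python; the example.org identifiers are made up for this illustration and are not part of any standard vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    object: str


# Hypothetical identifiers, chosen only for this illustration.
sky_is_blue = Triple(
    subject="http://example.org/things#sky",
    predicate="http://example.org/properties#hasColor",
    object="blue",
)

print(sky_is_blue.subject, sky_is_blue.predicate, sky_is_blue.object)
```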

This mechanism for describing resources is a major component in what is proposed by the W3C's Semantic Web activity: an evolutionary stage of the World Wide Web in which automated software can store, exchange, and use machine-readable information distributed throughout the web, in turn enabling users to deal with the information with greater efficiency and certainty. RDF's simple data model and ability to model disparate, abstract concepts have also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.

History

Work on RDF was initiated by Ramanathan V. Guha while at Apple Computer (as the Meta Content Framework, MCF) and continued, with contributions from Tim Bray, during Guha's tenure at Netscape Communications Corporation.

The W3C published a specification for RDF's data model and XML syntax as a Recommendation in 1999. Work then began on a new version that was published as a set of related specifications in 2004. Unlike most other W3C Recommendations, the new specifications completely replaced the old, rather than being assigned a version number like "RDF 2.0". Consequently, many implementations based on the 1999 Recommendation have yet to be updated, and many newcomers to RDF are unaware that the older specifications even exist.

The MIME media type application/rdf+xml was registered by RFC 3870, which recommends that RDF documents follow the new specifications.

Ontologies

A collection of RDF statements intrinsically represents a labeled, directed pseudo-graph. As such, an RDF-based data model is more naturally suited to certain kinds of knowledge representation than the relational model and other ontological models traditionally used in computing today. However, in practice, RDF data is often stored in relational database representations, sometimes called triple stores, and, as RDFS and OWL demonstrate, additional ontology languages can be built upon RDF.
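The graph reading of a set of statements can be sketched in a few lines of Python. Each subject and object is a node and each predicate is an edge label. The "ex:" identifiers are hypothetical, and the sample data is chosen to show why the structure is a pseudo-graph: two nodes may be joined by more than one edge, and an edge may loop back to its own node.

```python
from collections import defaultdict

# Hypothetical statements using made-up "ex:" identifiers.
statements = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksWith", "ex:bob"),  # parallel edge with a different label
    ("ex:alice", "ex:knows", "ex:alice"),    # loop back to the same node
]


def build_graph(triples):
    """Map each subject node to its outgoing (label, target) edges."""
    outgoing = defaultdict(list)
    for subject, predicate, obj in triples:
        outgoing[subject].append((predicate, obj))
    return outgoing


graph = build_graph(statements)
for label, target in graph["ex:alice"]:
    print(f"ex:alice --{label}--> {target}")
```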

Shared ontologies

Resource identification

The subject of an RDF statement is a resource, which may be named by a Uniform Resource Identifier (URI). Some resources are unnamed and are called blank nodes (bnodes) or anonymous nodes; they are not directly identifiable. The predicate is a resource as well, representing a relationship. The object is a resource or a Unicode string literal.
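The three kinds of term can be sketched as small Python types, together with a check on which kind may appear in which position. The class and function names are hypothetical. The check follows the 2004 RDF abstract syntax, which is slightly stricter than the summary above: a literal may appear only as the object, and the predicate must be named by a URI.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # a local label only; it gives the node no global name


@dataclass(frozen=True)
class Literal:
    text: str


Term = Union[URIRef, BlankNode, Literal]


def is_valid_triple(subject: Term, predicate: Term, obj: Term) -> bool:
    """Check that each position holds a kind of term allowed there."""
    return (
        isinstance(subject, (URIRef, BlankNode))
        and isinstance(predicate, URIRef)
        and isinstance(obj, (URIRef, BlankNode, Literal))
    )


print(is_valid_triple(BlankNode("b0"), URIRef("urn:example:name"), Literal("Jane")))
print(is_valid_triple(Literal("Jane"), URIRef("urn:example:name"), BlankNode("b0")))
```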

In Semantic Web applications, and in relatively popular applications of RDF like RSS and FOAF (Friend of a Friend), resources tend to be represented by URIs that intentionally denote actual, accessible data on the World Wide Web. But RDF, in general, is not limited to the description of Internet-based resources. In fact, the URI that names a resource does not have to be dereferenceable at all. For example, a URI that begins with "http:" and is used as the subject of an RDF statement does not necessarily have to represent a resource that is accessible via HTTP, nor does it need to represent a tangible, network-accessible resource — such a URI could denote the abstract notion of world peace, if desired.

Therefore, it is necessary for producers and consumers of RDF statements to be in agreement on the semantics of resource identifiers. Such agreement is not inherent to RDF itself, although there are some controlled vocabularies in common use, such as Dublin Core Metadata, which is partially mapped to a URI space for use in RDF.

Examples

Example 1: The postal abbreviation for New York

Certain concepts in RDF are taken from logic and linguistics, where subject-predicate and subject-predicate-object structures have meanings similar to, yet distinct from, the uses of those terms in RDF. This example demonstrates the difference:

In the English-language statement 'New York has the postal abbreviation NY', 'New York' would be the subject, 'has the postal abbreviation' the predicate, and 'NY' the object.

Encoded as an RDF triple, the subject and predicate would have to be resources named by URIs. The object could be a resource or a literal. For example, in the N-Triples form of RDF, the statement might look like:

<urn:states:New%20York> <http://purl.org/dc/terms/alternative> "NY" .

In this example, "urn:states:New%20York" is the URI for a resource that denotes the U.S. state New York, "http://purl.org/dc/terms/alternative" is the URI for a predicate (whose human-readable definition can be found at [1]), and "NY" is a literal string. Note that the URIs chosen here are not standard, and don't need to be, as long as their meaning is known to whatever is reading them.
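A line of this kind could be produced by a small Python helper such as the one sketched below. The function names are hypothetical, and the sketch covers only URI subjects and predicates with plain string literals or URI objects, escaping a handful of special characters. It is not a complete N-Triples writer.

```python
def escape_literal(text):
    """Escape the characters that cannot appear raw inside a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )


def to_ntriples_line(subject_uri, predicate_uri, obj, obj_is_literal=True):
    """Format one statement as a single N-Triples line."""
    if obj_is_literal:
        obj_text = '"' + escape_literal(obj) + '"'
    else:
        obj_text = "<" + obj + ">"
    return f"<{subject_uri}> <{predicate_uri}> {obj_text} ."


line = to_ntriples_line(
    "urn:states:New%20York",
    "http://purl.org/dc/terms/alternative",
    "NY",
)
print(line)
```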

N-Triples is just one of several standard serialization formats for RDF. The triple above can also be equivalently represented in the standard RDF/XML format as:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:terms="http://purl.org/dc/terms/">
        <rdf:Description rdf:about="urn:states:New%20York">
                <terms:alternative>NY</terms:alternative>
        </rdf:Description>
</rdf:RDF> 

However, because of the restrictions on the syntax of QNames (such as terms:alternative above), there are some RDF graphs that are not representable with RDF/XML.

Example 2: A Wikipedia article about Tony Benn

In a like manner, given that "http://en.wikipedia.org/wiki/Tony_Benn" identifies a particular resource (regardless of whether that URI could be traversed as a hyperlink, or whether the resource is actually the Wikipedia article about Tony Benn), to say that the title of this resource is "Tony Benn" and its publisher is "Wikipedia" would be two assertions that could be expressed as valid RDF statements. In the N-Triples form of RDF, these statements might look like the following:

<http://en.wikipedia.org/wiki/Tony_Benn> <http://purl.org/dc/elements/1.1/title> "Tony Benn" .
<http://en.wikipedia.org/wiki/Tony_Benn> <http://purl.org/dc/elements/1.1/publisher> "Wikipedia" .

And these statements might be expressed in RDF/XML as:

<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
                <dc:title>Tony Benn</dc:title>
                <dc:publisher>Wikipedia</dc:publisher>
        </rdf:Description>
</rdf:RDF>
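That the two serializations carry the same statements can be checked with a short Python sketch built on the standard library's XML parser. The function name is hypothetical, and the code handles only the flat shape used in this example: rdf:Description elements with an rdf:about attribute and literal-valued property elements. Real RDF/XML allows many more constructs and needs a full parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
                <dc:title>Tony Benn</dc:title>
                <dc:publisher>Wikipedia</dc:publisher>
        </rdf:Description>
</rdf:RDF>"""


def simple_rdfxml_to_triples(xml_text):
    """Read flat rdf:Description blocks into (subject, predicate, literal) tuples."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells a qualified name as "{namespace}local";
            # joining the two parts gives the predicate URI.
            namespace, local = prop.tag[1:].split("}", 1)
            triples.append((subject, namespace + local, prop.text or ""))
    return triples


for s, p, o in simple_rdfxml_to_triples(RDF_XML):
    print(f'<{s}> <{p}> "{o}" .')
```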

Of course, to an English-speaking person, the same information could be represented simply as:

"The title of this resource is 'Tony Benn'"

However, RDF puts the information in a formal way that a machine can process. The purpose of RDF is to provide an encoding and interpretation mechanism so that resources can be described in a way that particular software can understand; in other words, so that software can access and use data that it otherwise could not.

Both versions of the statements above are wordy because one requirement for an RDF resource (as a subject or a predicate) is that it be unique. The subject resource must be unique in an attempt to pinpoint the exact resource being described. The predicate needs to be unique in order to reduce the chance that the idea of Title or Publisher will be ambiguous to software working with the description. If the software recognizes http://purl.org/dc/elements/1.1/title (a specific definition for the concept of a title established by the Dublin Core Metadata Initiative), it will also know that this title is different from a land title or an honorary title or just the letters t-i-t-l-e put together.

Statement reification and context

The body of knowledge modeled by a collection of statements may be subjected to reification, in which each statement (that is, each subject-predicate-object triple taken as a whole) is assigned a URI and treated as a resource about which additional statements can be made, as in "Jane says that John is the author of document X". Reification is sometimes important in order to deduce a level of confidence or degree of usefulness for each statement.

In a reified RDF database, each original statement, being a resource itself, most likely has at least three additional statements made about it: one to assert that its subject is some resource, one to assert that its predicate is some resource, and one to assert that its object is some resource or literal. More statements about the original statement may also exist, depending on the application's needs.
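The expansion described above can be sketched in Python using the reification terms from the RDF namespace (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The "ex:" identifiers and the ex:assertedBy property are hypothetical. The sketch also emits an rdf:type statement marking the resource as a statement, so four statements are produced rather than three.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_uri, subject, predicate, obj):
    """Describe one statement as a resource, using the RDF reification terms."""
    return [
        (statement_uri, RDF_NS + "type", RDF_NS + "Statement"),
        (statement_uri, RDF_NS + "subject", subject),
        (statement_uri, RDF_NS + "predicate", predicate),
        (statement_uri, RDF_NS + "object", obj),
    ]


# Hypothetical "ex:" names for the statement, the document, and the people.
description = reify("ex:statement1", "ex:documentX", "ex:author", "ex:john")

# Once the statement has a name, a further statement can be made about it,
# here a made-up property recording that Jane is the one who says so.
description.append(("ex:statement1", "ex:assertedBy", "ex:jane"))

for triple in description:
    print(triple)
```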

Borrowing from concepts available in logic (and as illustrated in graphical notations such as conceptual graphs and topic maps), some RDF model implementations acknowledge that it is sometimes useful to group statements according to different criteria, called situations, contexts, or scopes, as discussed in articles by RDF specification co-editor Graham Klyne [2] [3]. For example, a statement can be associated with a context, named by a URI, in order to assert an "is true in" relationship. As another example, it is sometimes convenient to group statements by their source, which can be identified by a URI, such as the URI of a particular RDF/XML document. Then, when updates are made to the source, corresponding statements can be changed in the model, as well.

Implementation of scopes does not necessarily require fully reified statements. Some implementations allow a single scope identifier to be associated with a statement that has not itself been assigned a URI. [4] [5]
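One way to picture a scope identifier attached to unreified statements is a store that files each triple under the URI of its source, so that all statements from one source can be swapped out when that source changes. The Python sketch below is hypothetical throughout: the class, its methods, and the example.org documents are invented for illustration and do not describe any particular implementation.

```python
from collections import defaultdict


class ScopedStore:
    """Statements grouped by a scope identifier, such as a source document URI."""

    def __init__(self):
        self._by_scope = defaultdict(set)

    def add(self, scope, triple):
        self._by_scope[scope].add(triple)

    def replace_scope(self, scope, triples):
        """Swap in a fresh set of statements after the source has changed."""
        self._by_scope[scope] = set(triples)

    def triples(self, scope=None):
        """Return the statements in one scope, or in all scopes when scope is None."""
        if scope is not None:
            return set(self._by_scope.get(scope, ()))
        merged = set()
        for group in self._by_scope.values():
            merged |= group
        return merged


# Hypothetical source documents and statements.
store = ScopedStore()
store.add("http://example.org/a.rdf", ("ex:doc", "ex:author", "ex:john"))
store.add("http://example.org/b.rdf", ("ex:doc", "ex:title", "Document X"))

# The first source is re-read and now names a different author.
store.replace_scope("http://example.org/a.rdf", [("ex:doc", "ex:author", "ex:jane")])
print(sorted(store.triples()))
```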

In first-order logic, as facilitated by RDF without scopes, the only metalevel relation is negation, but the ability to generally state propositions about nested contexts allows RDF to comprise a metalanguage that can be used to define modal and higher-order logic.

Query and inference languages

Several query languages for RDF graphs have emerged. RDF query languages allow expressions to be written that can be evaluated against a collection of statements in order to produce, for example, a narrowed set of statements, resources, or object values, or to perform comparisons and operations on such items. RDF queries can be used by knowledge management applications as a basis for inference actions.
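The core idea of evaluating an expression against a collection of statements can be sketched as simple pattern matching in Python. This is not SPARQL or any other real query language. The match function is hypothetical, and a position left as None acts as a wildcard. The data reuses the Tony Benn statements from Example 2.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


DC = "http://purl.org/dc/elements/1.1/"
ARTICLE = "http://en.wikipedia.org/wiki/Tony_Benn"

statements = [
    (ARTICLE, DC + "title", "Tony Benn"),
    (ARTICLE, DC + "publisher", "Wikipedia"),
]

# Narrow the statements to those with the title predicate, then keep the object values.
titles = [o for (_, _, o) in match(statements, predicate=DC + "title")]
print(titles)
```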

Modeled loosely after SQL, the query language SPARQL is emerging as the de facto RDF query language. On the track towards W3C Recommendation status, it was released as a Candidate Recommendation in April 2006, but has been back at Working Draft status since October 2006 because of open issues.

Other notable RDF query and inference languages include:

  • RDQL, precursor to SPARQL, SQL-like
  • Versa, compact syntax (non–SQL-like), solely implemented in 4Suite (Python)
  • XUL has a template element in which to declare rules for matching data in RDF. XUL uses RDF extensively for databinding.
