RDFLib

Developer: Daniel Krech
Latest release: 2.3.1 / February 27, 2006
OS: Cross-platform
Use: Library
License: as-is
Website: http://rdflib.net/

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information. The library contains an RDF/XML parser/serializer that conforms to the RDF/XML Syntax Specification (Revised). It also contains both in-memory and persistent Graph backends. RDFLib was created by Daniel Krech, who continues to maintain it, and is developed by a number of contributors.

History and status

Overview

RDFLib and Python Idioms

RDFLib relies on a number of Python idioms, which makes them a convenient way to introduce the library to a Python programmer who has not used it before.

RDFLib graphs redefine certain built-in Python methods so that they behave in a predictable way. They emulate container types and are best thought of as a set of triples, each a 3-item tuple:

   [(subject,predicate,object),(subject1,predicate1,object1),... (subjectN,predicateN,objectN)]

Although the triples are written above as a list, an RDFLib graph is an unordered set, so it does not override the index-based methods __getitem__, __setitem__, and __delitem__.
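The container behaviour described above can be illustrated with a small stand-alone sketch. It is a toy model written for this article in Python 3, not RDFLib's implementation: the class name TinyGraph and the "ex:" strings standing in for RDF terms are assumptions made for the example.

```python
class TinyGraph:
    """Toy triple container (hypothetical, not RDFLib's Graph class)."""

    def __init__(self):
        self._triples = set()  # unordered and free of duplicates

    def add(self, triple):
        self._triples.add(triple)

    def __len__(self):
        return len(self._triples)

    def __contains__(self, triple):
        return triple in self._triples

    def __iter__(self):
        return iter(self._triples)


g = TinyGraph()
g.add(("ex:alice", "ex:knows", "ex:bob"))
g.add(("ex:alice", "ex:knows", "ex:bob"))  # same triple again, no effect

size = len(g)
has_triple = ("ex:alice", "ex:knows", "ex:bob") in g

try:
    g[0]  # no __getitem__ is defined, so positional access fails
    indexable = True
except TypeError:
    indexable = False
```

Because the backing structure is a set, adding the same triple twice leaves one copy, and there is no meaningful "first" triple to index.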

RDF Graph Terms

RDFLib's term classes, such as URIRef, BNode, and Literal, model RDF terms in a graph and inherit from a common Identifier class, which extends Python unicode. Instances of these classes are the nodes in an RDF graph.

Namespace Utilities

RDFLib provides mechanisms for managing namespaces. In particular, there is a Namespace class which takes (as its only argument) the base URI of the namespace. Fully qualified URIs in the namespace can be constructed by attribute or dictionary-style access on Namespace instances:

   >>> from rdflib import Namespace
   >>> fuxi = Namespace('http://metacognition.info/ontologies/FuXi.n3#')
   >>> fuxi.ruleBase
   u'http://metacognition.info/ontologies/FuXi.n3#ruleBase'
   >>> fuxi['ruleBase']
   u'http://metacognition.info/ontologies/FuXi.n3#ruleBase'
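One way such attribute and dictionary-style access can be built is sketched below. This is an illustrative Python 3 toy, not RDFLib's source: the class name TinyNamespace is hypothetical, and it returns plain strings rather than RDF term objects.

```python
class TinyNamespace(str):
    """Toy namespace: a base-URI string that builds full URIs on access."""

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        return str(self) + name

    def __getitem__(self, key):
        # Dictionary-style access appends the key to the base URI.
        return str(self) + key


fuxi = TinyNamespace("http://metacognition.info/ontologies/FuXi.n3#")
by_attribute = fuxi.ruleBase
by_key = fuxi["ruleBase"]
```

In a design like this, a local name that collides with an existing string method (for example title) resolves to the method rather than to a URI, so dictionary-style access is the safer form for such names.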

Graphs as Iterators

RDFLib graphs also override __iter__ in order to support iteration over the contained triples:

   for subject,predicate,obj_ in someGraph:
      assert (subject,predicate,obj_) in someGraph, "Iterator / Container Protocols are Broken!!"

Set Operations on RDFLib Graphs

__iadd__ and __isub__ are overridden to support adding and subtracting Graphs to/from each other (in place):

  • G1 += G2
  • G1 -= G2
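The in-place operators can be pictured as set union and set difference over the triples. The following Python 3 sketch uses a hypothetical TinyGraph class and "ex:" placeholder terms; it shows the operator protocol only and makes no claim about how RDFLib implements it.

```python
class TinyGraph:
    """Toy graph supporting in-place union and difference (hypothetical)."""

    def __init__(self, triples=()):
        self._triples = set(triples)

    def __iadd__(self, other):
        self._triples |= other._triples  # in-place union
        return self                      # augmented assignment rebinds to this

    def __isub__(self, other):
        self._triples -= other._triples  # in-place difference
        return self

    def __len__(self):
        return len(self._triples)

    def __iter__(self):
        return iter(self._triples)


a = ("ex:alice", "ex:knows", "ex:bob")
b = ("ex:bob", "ex:knows", "ex:carol")
c = ("ex:carol", "ex:knows", "ex:dave")

g1 = TinyGraph([a, b])
g2 = TinyGraph([b, c])

g1 += g2
size_after_add = len(g1)   # a, b and c, with b counted once

g1 -= g2
remaining = set(g1)        # only the triple that g2 does not contain
```

Both methods return self, which is what lets the augmented assignment keep g1 bound to the same object.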

Basic Triple Matching

RDFLib graphs support basic triple pattern matching with a triples((subject,predicate,object)) method. This method is a generator of the triples that match the pattern given by its argument. Each element of the pattern is an RDF term that restricts the triples returned; an element that is None is treated as a wildcard.
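The wildcard rule can be sketched as a plain generator. The function name match, the list of tuples, and the "ex:" strings are assumptions made for this Python 3 illustration; a real RDFLib store would typically answer such patterns from its own indexes rather than by scanning.

```python
def match(triples, pattern):
    """Yield every triple that agrees with the pattern; None matches anything."""
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip(pattern, triple)):
            yield triple


data = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:knows", "ex:carol"),
]

knows = list(match(data, (None, "ex:knows", None)))
about_alice = list(match(data, ("ex:alice", None, None)))
everything = list(match(data, (None, None, None)))
```

A pattern with all three positions set to None matches every triple, while a fully specified pattern amounts to a membership test.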

RDF Convenience APIs (RDF Collections / Containers)

Managing Triples

Adding Triples

Triples can be added by parsing them with the parse(source,publicID=None, format="xml") function. The first argument can be a source of many kinds, but the most common is the serialization of an RDF graph as a string (in one of several formats: RDF/XML, Notation 3, or NTriples). The format parameter is one of n3, xml, or ntriples. publicID is the name of the graph into which the RDF serialization will be parsed.

Triples can also be added with the add function: add((subject, predicate, object)).

Removing Triples

Similarly, triples can be removed with a call to remove: remove((subject, predicate, object)).

RDF Literal Support

RDFLib Literals essentially behave like Unicode strings with an XML Schema datatype or language attribute. The class provides a mechanism both to convert Python values (including standard-library types such as time, date, and datetime) into equivalent RDF Literals and, conversely, to convert Literals to their Python equivalents. Datatypes are taken into account to some extent when comparing Literal instances, through an override of __eq__. The mapping to and from Python values is achieved with the following dictionaries:

   PythonToXSD = {
       basestring : (None,None),
       float      : (None,XSD_NS+u'float'),
       int        : (None,XSD_NS+u'int'),
       long       : (None,XSD_NS+u'long'),    
       bool       : (None,XSD_NS+u'boolean'),
       date       : (lambda i:i.isoformat(),XSD_NS+u'date'),
       time       : (lambda i:i.isoformat(),XSD_NS+u'time'),
       datetime   : (lambda i:i.isoformat(),XSD_NS+u'dateTime'),
   }

PythonToXSD maps Python instances to WXS (W3C XML Schema) datatyped Literals.

   XSDToPython = {  
       XSD_NS+u'time'               : (None,_strToTime),
       XSD_NS+u'date'               : (None,_strToDate),
       XSD_NS+u'dateTime'           : (None,_strToDateTime),    
       XSD_NS+u'string'             : (None,None),
       XSD_NS+u'normalizedString'   : (None,None),
       XSD_NS+u'token'              : (None,None),
       XSD_NS+u'language'           : (None,None),
       XSD_NS+u'boolean'            : (None, lambda i:i.lower() in ['1','true']),
       XSD_NS+u'decimal'            : (float,None), 
       XSD_NS+u'integer'            : (long ,None),
       XSD_NS+u'nonPositiveInteger' : (int,None),
       XSD_NS+u'long'               : (long,None),
       XSD_NS+u'nonNegativeInteger' : (int, None),
       XSD_NS+u'negativeInteger'    : (int, None),
       XSD_NS+u'int'                : (int, None),
       XSD_NS+u'unsignedLong'       : (long, None),
       XSD_NS+u'positiveInteger'    : (int, None),
       XSD_NS+u'short'              : (int, None),
       XSD_NS+u'unsignedInt'        : (long, None),
       XSD_NS+u'byte'               : (int, None),
       XSD_NS+u'unsignedShort'      : (int, None),
       XSD_NS+u'unsignedByte'       : (int, None),
       XSD_NS+u'float'              : (float, None),
       XSD_NS+u'double'             : (float, None),
       XSD_NS+u'base64Binary'       : (base64.decodestring, None),
       XSD_NS+u'anyURI'             : (None,None),
   }

XSDToPython maps WXS datatyped Literals to Python values. This mapping is used by the toPython() method defined on all Literal instances.
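How a pair of tables like these can drive a round trip is sketched below in Python 3, using a small subset of the datatypes. The helper names to_literal and to_python, the tuple representation of a literal, and the one-function-per-entry layout are assumptions made for the illustration; they simplify the two-slot tuples shown above and are not RDFLib's API.

```python
from datetime import date, datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# Python type -> (function producing the lexical form, datatype URI)
PYTHON_TO_XSD = {
    str: (str, None),
    bool: (lambda v: "true" if v else "false", XSD + "boolean"),
    int: (str, XSD + "int"),
    date: (date.isoformat, XSD + "date"),
    datetime: (datetime.isoformat, XSD + "dateTime"),
}

# Datatype URI -> function parsing the lexical form back into Python
XSD_TO_PYTHON = {
    XSD + "boolean": lambda s: s.lower() in ("1", "true"),
    XSD + "int": int,
    XSD + "date": date.fromisoformat,
    XSD + "dateTime": datetime.fromisoformat,
}


def to_literal(value):
    """Return a (lexical form, datatype URI) pair for a Python value."""
    # Exact type lookup keeps bool apart from int and datetime apart from date.
    convert, datatype = PYTHON_TO_XSD[type(value)]
    return convert(value), datatype


def to_python(lexical, datatype):
    """Return the Python value for a literal; unknown datatypes stay strings."""
    convert = XSD_TO_PYTHON.get(datatype)
    return convert(lexical) if convert else lexical


release = date(2006, 2, 27)
lexical, datatype = to_literal(release)
round_trip = to_python(lexical, datatype)
```

Looking up by exact type rather than with isinstance matters here, because bool is a subclass of int and datetime is a subclass of date.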

SPARQL Querying

RDFLib supports a majority of the current SPARQL specification and includes a harness for the publicly available RDF DAWG test suite. Support for SPARQL is provided by two functions:

  • rdflib.sparql.bison.Parse(query, debug=False)
  • rdflib.sparql.bison.Evaluate(store, queryObj, passedBindings={}, DEBUG=False)

Parse reads a stream object containing a query in SPARQL syntax. It uses a Python/C parser generated by BisonGen, which builds a hierarchy of parsed objects. The resulting object can be passed to Evaluate, which evaluates the query against an RDFLib Store instance using the (optional) initial bindings.

Using Parse:

   from rdflib.sparql.bison import Parse
   from cStringIO import StringIO
   p = Parse(StringIO('.. SPARQL string ..'))
   print p

Here p is an instance of rdflib.sparql.bison.Query.Query, which can then be evaluated:

   from rdflib.sparql.bison import Evaluate
   rt = Evaluate(store,p,{ .. initial bindings ..})

Serialization (NTriples, N3, and RDF/XML)

Beyond the RDF Model

Named Graphs / Conjunctive Graphs

RDFLib defines the following kinds of Graphs:

  • Graph(store, identifier)
  • QuotedGraph(store, identifier)
  • ConjunctiveGraph(store, default_identifier=None)

A Conjunctive Graph is the most relevant collection of graphs that are considered to be the boundary for closed world assumptions. This boundary is equivalent to that of the store instance (which is itself uniquely identified and distinct from other instances of Store that signify other Conjunctive Graphs). It is equivalent to all the named graphs within it and is associated with a default graph, which is automatically assigned a BNode as its identifier if one is not given.
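The relationship between a store, its named graphs, and the conjunctive view can be sketched with a toy quad store. Everything in this Python 3 example is hypothetical (the Tiny* class names, the "_:b1" style blank-node labels, the "ex:" terms); it models only the idea that the conjunctive graph spans every named graph in one store and falls back to a generated identifier for its default graph.

```python
from itertools import count

_bnode_ids = count(1)


def fresh_bnode():
    """Return a new blank-node style label (illustrative format)."""
    return "_:b%d" % next(_bnode_ids)


class TinyStore:
    """Toy store holding (subject, predicate, object, graph identifier) quads."""

    def __init__(self):
        self.quads = set()


class TinyNamedGraph:
    """View of the triples stored under one graph identifier."""

    def __init__(self, store, identifier):
        self.store = store
        self.identifier = identifier

    def add(self, triple):
        self.store.quads.add(triple + (self.identifier,))

    def __iter__(self):
        return (q[:3] for q in self.store.quads if q[3] == self.identifier)


class TinyConjunctiveGraph:
    """View of every triple in the store, whichever named graph holds it."""

    def __init__(self, store, default_identifier=None):
        self.store = store
        # Without an explicit identifier the default graph gets a generated one.
        self.default = TinyNamedGraph(store, default_identifier or fresh_bnode())

    def add(self, triple):
        self.default.add(triple)

    def __iter__(self):
        return iter({q[:3] for q in self.store.quads})


store = TinyStore()
g1 = TinyNamedGraph(store, "ex:g1")
g2 = TinyNamedGraph(store, "ex:g2")
everything = TinyConjunctiveGraph(store)

t1 = ("ex:alice", "ex:knows", "ex:bob")
t2 = ("ex:bob", "ex:knows", "ex:carol")
t3 = ("ex:carol", "ex:knows", "ex:dave")
g1.add(t1)
g2.add(t2)
everything.add(t3)  # lands in the default graph
```

Iterating over g1 yields only t1, while iterating over everything yields all three triples, because both views read from the same store.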


Formulae

RDFLib graphs support an additional extension of RDF semantics for formulae. For the academically inclined, Graham Klyne's 'formal' extension (see external links) is probably a good read.

Formulae are represented formally by the QuotedGraph class and are disjoint from regular RDF graphs in that their statements are quoted.

Persistence

RDFLib provides an abstracted Store API for persistence of RDF and Notation 3. The Graph class works with instances of this API (passed as the first argument to its constructor) for triple-based management of an RDF store, including garbage collection, transaction management, update, pattern matching, removal, length, and database management (open / close / destroy). Additional persistence mechanisms can be supported by implementing this API for a different store. Currently supported databases:

Store instances can be created with the plugin.get function:

   from rdflib import plugin
   from rdflib.store import Store
   plugin.get('.. one of the supported Stores ..',Store)(identifier=.. id of conjunctive graph ..)
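The lookup-then-instantiate pattern in the snippet above can be modelled with a small registry. This Python 3 sketch is a toy: the register and get functions, the Store and MemoryStore classes, and the "Memory" name are all defined here for illustration and are not claimed to match RDFLib's plugin module.

```python
_registry = {}


def register(name, kind, cls):
    """Record cls as the plugin called name for the given kind."""
    _registry[(name, kind)] = cls


def get(name, kind):
    """Return the registered class; the caller instantiates it."""
    try:
        return _registry[(name, kind)]
    except KeyError:
        raise LookupError("no %s plugin named %r" % (kind.__name__, name))


class Store:
    """Toy base class standing in for a store API."""

    def __init__(self, identifier=None):
        self.identifier = identifier


class MemoryStore(Store):
    """Toy in-memory backend."""


register("Memory", Store, MemoryStore)

# Look the class up by name, then call it to create the instance.
store = get("Memory", Store)(identifier="ex:graphs")
```

Returning the class rather than an instance lets the caller choose the constructor arguments, which is why the lookup is followed by a second pair of parentheses.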

'Higher-order' Idioms

There is at least one high-level API that extends RDFLib graphs into other Pythonic idioms. For a more explicit Python binding, there is Sparta.

Support

There is a #redfoot IRC channel on Freenode for anyone who wants to chat about RDFLib or Redfoot. A mailing list and a Trac-based issue tracker are also available.

See also

External links