DBpedia

Developer(s): University of Leipzig, Freie Universität Berlin, OpenLink Software
Initial release: 23 January 2007
Stable release: DBpedia 3.7 / 11 September 2011[1]
Written in: Scala, Java, VSP
Operating system: Virtuoso Universal Server
Type: Semantic Web, Linked Data
License: GNU General Public License
Website: dbpedia.org

DBpedia is a project that extracts structured content from the information created as part of the Wikipedia project and makes it available on the World Wide Web.[2] DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets.[3] Tim Berners-Lee has described DBpedia as one of the more famous parts of the Linked Data project.[4]

Background

The project was started by people at the Free University of Berlin and the University of Leipzig, in collaboration with OpenLink Software,[5] and the first publicly available dataset was published in 2007. It is made available under free licences, allowing others to reuse the dataset.

Wikipedia articles consist mostly of free text, but also include structured information embedded in the articles, such as "infobox" tables, categorisation information, images, geo-coordinates and links to external Web pages. This structured information is extracted and put in a uniform dataset which can be queried.
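
The extraction idea can be illustrated with a minimal sketch that pulls `| key = value` fields out of infobox wikitext. This is not DBpedia's extraction code: the sample infobox, the `parse_infobox` function and its regular expression are assumptions made for illustration, and real infoboxes (nested templates, multi-line values, links) need far more careful handling.

```python
import re

# Hypothetical infobox wikitext, invented for this example.
SAMPLE = """{{Infobox person
| name = Example Person
| birthplace = Berlin
| occupation = Writer
}}"""

# Matches lines of the form "| key = value" and captures key and value.
FIELD = re.compile(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")


def parse_infobox(wikitext: str) -> dict:
    """Collect simple single-line 'key = value' fields from an infobox."""
    fields = {}
    for line in wikitext.splitlines():
        m = FIELD.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


if __name__ == "__main__":
    for key, value in parse_infobox(SAMPLE).items():
        print(key, "=", value)
```

Each extracted key and value pair can then be stored as one statement about the article's subject, which is what makes the resulting dataset uniform and queryable.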

Dataset

As of September 2011, the DBpedia dataset describes more than 3.64 million things, of which 1.83 million are classified in a consistent ontology, including 416,000 persons, 526,000 places, 106,000 music albums, 60,000 films, 17,500 video games, 169,000 organizations, 183,000 species and 5,400 diseases. The dataset features labels and abstracts for these 3.64 million things in up to 97 different languages; 2,724,000 links to images and 6,300,000 links to external web pages; 6,200,000 external links into other RDF datasets; 740,000 Wikipedia categories; and 18,100,000 YAGO2 categories. Information spread across multiple pages can be extracted from this dataset; for example, book authorship can be put together from the pages about the work or the author.

The DBpedia project uses the Resource Description Framework (RDF) to represent the extracted information. As of September 2011, the DBpedia dataset consists of over 1 billion pieces of information (RDF triples), of which 385 million were extracted from the English edition of Wikipedia and 665 million from other language editions.[6]
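
An RDF triple is a statement of the form subject, predicate, object. The sketch below models triples as plain Python tuples and matches them against a pattern, which is roughly what a single line of a query does. The `Triple` type, the `match` helper and the sample data are assumptions for illustration; the resource names are borrowed from the Tokyo Mew Mew example, and the genre value is a placeholder rather than real DBpedia data.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative sample only; ex:PlaceholderGenre is not a real DBpedia value.
TRIPLES = [
    Triple("db:Tokyo_Mew_Mew", "dbprop:illustrator", "db:Mia_Ikumi"),
    Triple("db:Koi_Cupid", "dbprop:author", "db:Mia_Ikumi"),
    Triple("db:Koi_Cupid", "dbprop:genre", "ex:PlaceholderGenre"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # Everything stated about works authored by Mia Ikumi.
    for t in match(TRIPLES, predicate="dbprop:author", obj="db:Mia_Ikumi"):
        print(t)
```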

One of the challenges in extracting information from Wikipedia is that the same concept can be expressed using different properties in templates, such as birthplace and placeofbirth. A query about where people were born would therefore have to search for both properties to get more complete results. The DBpedia Mapping Language was developed to address this: it maps these properties to an ontology and so reduces the number of synonyms. Because of the large diversity of infoboxes and properties in use on Wikipedia, the process of developing and improving these mappings has been opened to public contributions.[7]
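
The effect of such a mapping can be sketched as a lookup table from template properties to one canonical name. This is only an illustration of the idea, not the DBpedia Mapping Language itself: the `CANONICAL` table, the `normalise` function and the canonical name `birthPlace` are assumptions chosen for the example.

```python
# Hypothetical synonym table; the real mappings are community-maintained.
CANONICAL = {
    "birthplace": "birthPlace",
    "placeofbirth": "birthPlace",
}


def normalise(infobox: dict) -> dict:
    """Rename known synonym properties to a canonical name.

    Unknown properties are kept unchanged. If two synonyms of the same
    property are present, the first value encountered is kept.
    """
    result = {}
    for key, value in infobox.items():
        canonical = CANONICAL.get(key.strip().lower(), key)
        result.setdefault(canonical, value)
    return result


if __name__ == "__main__":
    print(normalise({"placeofbirth": "Berlin", "name": "Example Person"}))
```

After normalisation, a query only needs to ask for the canonical property, regardless of which template spelling the source article used.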

Example

DBpedia extracts factual information from Wikipedia pages, allowing users to answer questions whose information is spread across many different Wikipedia articles. Data is accessed using SPARQL, an SQL-like query language for RDF. For example, suppose you were interested in the Japanese shōjo manga series Tokyo Mew Mew and wanted to find the genres of other works written by its illustrator. DBpedia combines information from Wikipedia's entries on Tokyo Mew Mew, Mia Ikumi and works such as Super Doll Licca-chan and Koi Cupid. Since DBpedia normalises information into a single database, the following query can be asked without needing to know exactly which entry carries each fragment of information, and will list related genres:

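 PREFIX dbprop: <http://dbpedia.org/property/>
 PREFIX db: <http://dbpedia.org/resource/>
 SELECT ?who ?work ?genre WHERE {
   # ?who is the illustrator of Tokyo Mew Mew
   db:Tokyo_Mew_Mew dbprop:illustrator ?who .
   # ?work is any work authored by that person
   ?work dbprop:author ?who .
   # the genre is optional, so works without one are still listed
   OPTIONAL { ?work dbprop:genre ?genre } .
 }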
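
A query like this is typically sent to a SPARQL endpoint over HTTP, with the query text passed in a `query` parameter. The sketch below only builds the request URL and makes no network call. The endpoint address `http://dbpedia.org/sparql` is an assumption (it is not given in the text above), and `build_sparql_url` and `extract_query` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: address of the public endpoint; not stated in the article text.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """\
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX db: <http://dbpedia.org/resource/>
SELECT ?who ?work ?genre WHERE {
  db:Tokyo_Mew_Mew dbprop:illustrator ?who .
  ?work dbprop:author ?who .
  OPTIONAL { ?work dbprop:genre ?genre } .
}
"""


def build_sparql_url(endpoint: str, query: str) -> str:
    """Percent-encode the query into the `query` parameter of a GET URL."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Decode the query text back out of a URL built by build_sparql_url."""
    return parse_qs(urlsplit(url).query)["query"][0]


if __name__ == "__main__":
    print(build_sparql_url(ENDPOINT, QUERY))
```

The resulting URL could be fetched with any HTTP client; the result format is usually negotiated through the request's `Accept` header.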

Uses

DBpedia has a broad scope of entities covering different areas of human knowledge. This makes it a natural hub for connecting datasets, where external datasets can link to its concepts.[8] The DBpedia dataset is interlinked on the RDF level with various other Open Data datasets on the Web, which enables applications to enrich DBpedia data with data from these datasets. As of January 2011, there were more than 6.5 million interlinks between DBpedia and external datasets including Freebase, OpenCyc, UMBEL, GeoNames, MusicBrainz, CIA World Factbook, DBLP, Project Gutenberg, DBtune Jamendo, Eurostat, UniProt, Bio2RDF, and US Census data.[9][10] The Thomson Reuters initiative OpenCalais, the Linked Open Data project of the New York Times, the Zemanta API and DBpedia Spotlight also include links to DBpedia.[11][12][13] The BBC uses DBpedia to help organize its content.[14][15] Faviki uses DBpedia for semantic tagging.[16]

Amazon provides a DBpedia Public Data Set that can be integrated into Amazon Web Services applications.[17]

References

  1. ^ "DBpedia 3.7 released, including 15 localized Editions". DBpedia Blog. September 11, 2011. http://blog.dbpedia.org/2011/09/11/dbpedia-37-released-including-15-localized-editions/. 
  2. ^ Bizer, Christian; Lehmann, Jens; Kobilarov, Georgi; Auer, Sören; Becker, Christian; Cyganiak, Richard; Hellmann, Sebastian (September 2009). "DBpedia - A crystallization point for the Web of Data". Web Semantics: Science, Services and Agents on the World Wide Web 7 (3): 154–165. ISSN 1570-8268. http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Bizer-etal-DBpedia-CrystallizationPoint-JWS-Preprint.pdf. 
  3. ^ "Komplett verlinkt - Linked Data" (in German). 3sat. 2009-06-19. http://www.3sat.de/dynamic/sitegen/bin/sitegen.php?tab=2&source=/neues/sendungen/magazin/135119/index.html. Retrieved 2009-11-10. 
  4. ^ "Sir Tim Berners-Lee Talks with Talis about the Semantic Web". Talis. 7 February 2008. http://talis-podcasts.s3.amazonaws.com/twt20080207_TimBL.html. 
  5. ^ http://wiki.dbpedia.org/Team. Retrieved 2009-11-23. 
  6. ^ "DBpedia dataset". DBpedia. http://wiki.dbpedia.org/Datasets#h18-3. Retrieved 2008-09-26. 
  7. ^ "DBpedia Mappings". mappings.dbpedia.org. http://mappings.dbpedia.org/index.php/Main_Page. Retrieved 2010-04-03. 
  8. ^ Curry, E.; Freitas, A.; O’Riáin, S. (2010). "The Role of Community-Driven Data Curation for Enterprises". In Wood, D. Linking Enterprise Data. Boston, MA: Springer US. pp. 25–47. 
  9. ^ "Statistics on links between Data sets", SWEO Community Project: Linking Open Data on the Semantic Web (W3C), http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/LinkStatistics, retrieved 2009-11-24 
  10. ^ "Statistics on Data sets", SWEO Community Project: Linking Open Data on the Semantic Web (W3C), http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics, retrieved 2009-11-24 
  11. ^ "First 5,000 Tags Released to the Linked Data Cloud". open.blogs.nytimes.com. 2009-10-29. http://open.blogs.nytimes.com/2009/10/29/first-5000-tags-released-to-the-linked-data-cloud/. Retrieved 2009-11-10. 
  12. ^ "Life in the Linked Data Cloud". www.opencalais.com. http://www.opencalais.com/node/9501. Retrieved 2009-11-10. "Wikipedia has a Linked Data twin called DBpedia. DBpedia has the same structured information as Wikipedia – but translated into a machine-readable format." 
  13. ^ "Zemanta talks Linked Data with SDK and commercial API". blogs.zdnet.com. http://blogs.zdnet.com/semantic-web/?p=243. Retrieved 2009-11-10. "Zemanta fully supports the Linking Open Data initiative. It is the first API that returns disambiguated entities linked to dbPedia, Freebase, MusicBrainz, and Semantic Crunchbase." 
  14. ^ "European Semantic Web Conference 2009 - Georgi Kobilarov, Tom Scott, Yves Raimond, Silver Oliver, Chris Sizemore, Michael Smethurst, Christian Bizer and Robert Lee. Media meets Semantic Web - How the BBC uses DBpedia and Linked Data to make Connections". www.eswc2009.org. http://www.eswc2009.org/program-menu/accepted-in-use-track-papers/134-georgi-kobilarov-tom-scott-yves-raimond-silver-oliver-chris-sizemore-michael-smethurst-christian-bizer-and-robert-lee-media-meets-semantic-web-how-the-bbc-uses-dbpedia-and-linked-data-to-make-connections. Retrieved 2009-11-10. 
  15. ^ "BBC Learning - Open Lab - Reference". bbc.co.uk. http://backstage.bbc.co.uk/openlab/reference.php. Retrieved 2009-11-10. "Dbpedia is a database version of Wikipedia. It's used in a lot of projects for a wide range of different reasons. At the BBC we are using it for tagging content." 
  16. ^ "Semantic Tagging with Faviki". www.readwriteweb.com. http://www.readwriteweb.com/archives/semantic_tagging_with_faviki.php. 
  17. ^ "Amazon Web Services Developer Community : DBpedia". developer.amazonwebservices.com. http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2319&categoryID=249. Retrieved 2009-11-10. 
