BioMOBY is a registry of web services used in bioinformatics. It allows interoperability between biological data hosts and analytical services by annotating services with terms taken from standard ontologies.
The BioMOBY project began at the Model Organism Bring Your own Database Interface Conference (MOBY-DIC), held in Emma Lake, Saskatchewan on September 21, 2001. It stemmed from a conversation between Mark D Wilkinson and Suzanna Lewis during a Gene Ontology developers meeting [1] at the Carnegie Institute, Stanford, where the functionalities of the Genquire and Apollo genome annotation tools were being discussed and compared. Both systems critically needed, and lacked, a simple standard that would allow them to interact with the many data sources required to annotate a genome accurately.
The BioMOBY project was subsequently funded by Genome Prairie [2] (2002-2005) and Genome Alberta [3] (2005-present), in part through Genome Canada [4], a not-for-profit institution leading the Canadian X-omic initiatives.
There are two main branches of the BioMOBY project. One is a web-service-based approach, while the other utilizes Semantic Web technologies. This article will refer only to the Web Service specifications. The other branch of the project, Semantic Moby, is described in a separate entry.
The Moby project defines three ontologies that describe biological data types (Namespace), biological data formats (Object), and bioinformatics analysis types (Service). Most of the interoperable behaviours seen in Moby are achieved through the Object and Namespace ontologies.
The MOBY Namespace Ontology is derived from the Cross-Reference Abbreviations List of the Gene Ontology project. It is simply a list of abbreviations for the different types of identifiers that are used in bioinformatics. For example, GenBank uses "gi" identifiers to enumerate its sequence records; this identifier type is defined as "NCBI_gi" in the Namespace Ontology.
The MOBY Object Ontology consists of IS-A, HAS-A, and HAS relationships between data formats. For example, a DNASequence IS-A GenericSequence and HAS-A String representing the text of the sequence. All data in Moby must be represented as some type of MOBY Object. The Moby API defines an XML serialization of this ontology such that any given ontology node has a predictable XML structure.
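The IS-A and HAS-A relationships can be illustrated with a minimal Python sketch. It is a toy model, not the Moby API: the root type "Object", the member name "SequenceString", and the helper functions are assumptions made for illustration, and the HAS relationship is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectType:
    """One node of a toy object ontology (illustrative, not the Moby API)."""
    name: str
    is_a: Optional[str] = None                            # parent type, None for the root
    has_a: Dict[str, str] = field(default_factory=dict)   # member name -> member type


# Hypothetical fragment of an ontology; only the DNASequence/GenericSequence/String
# relationship comes from the text, everything else is assumed.
ONTOLOGY = {
    t.name: t
    for t in [
        ObjectType("Object"),
        ObjectType("String", is_a="Object"),
        ObjectType("GenericSequence", is_a="Object"),
        ObjectType("DNASequence", is_a="GenericSequence",
                   has_a={"SequenceString": "String"}),
    ]
}


def ancestors(name: str) -> List[str]:
    """Return the IS-A chain from `name` up to the root, starting with `name`."""
    chain = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = ONTOLOGY[current].is_a
    return chain


def is_compatible(candidate: str, required: str) -> bool:
    """True if `candidate` is `required` or one of its IS-A descendants."""
    return required in ancestors(candidate)


def members(name: str) -> Dict[str, str]:
    """Collect HAS-A members of `name`, including those inherited via IS-A."""
    merged: Dict[str, str] = {}
    for type_name in reversed(ancestors(name)):
        merged.update(ONTOLOGY[type_name].has_a)
    return merged
```

Because every type carries its full IS-A chain, a consumer that understands GenericSequence can accept a DNASequence without knowing that subtype in advance, which is the property that makes the structure of any node predictable.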
Thus, between these two ontologies, a service provider or a client program can receive a piece of Moby XML and immediately know both its structure and its "intent" (semantics).
The final core component of Moby is the MOBY Central web service registry. MOBY Central is aware of the Object, Namespace and Service ontologies, and thus can match consumers who have in-hand Moby data, with service providers who claim to consume that data-type (or some compatible ontological data-type) or to perform a particular operation on it. This "semantic matching" helps ensure that only relevant service providers are identified in a registry query, and moreover, ensures that the in-hand data can be passed to that service provider verbatim. As such, the interaction between a consumer and a service provider can be partially or fully automated, as shown in the Gbrowse Moby and Ahab clients respectively.
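The semantic matching described above can be sketched as a filter over registered services. This is a simplified illustration under stated assumptions, not MOBY Central's actual interface: the service names, the `Service` record, and `find_services` are hypothetical, and service types are compared by exact name here rather than through an ontology.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Toy IS-A table: child type -> parent type (None marks the root). Assumed for illustration.
IS_A = {
    "Object": None,
    "GenericSequence": "Object",
    "DNASequence": "GenericSequence",
}


@dataclass(frozen=True)
class Service:
    """A hypothetical registry entry."""
    name: str
    input_type: str              # object type the service claims to consume
    namespaces: FrozenSet[str]   # accepted namespaces; empty means any namespace
    service_type: str            # kind of operation performed


def is_a(candidate: Optional[str], required: str) -> bool:
    """Walk up the IS-A chain from `candidate` looking for `required`."""
    while candidate is not None:
        if candidate == required:
            return True
        candidate = IS_A[candidate]
    return False


def find_services(registry: List[Service], data_type: str, namespace: str,
                  service_type: Optional[str] = None) -> List[Service]:
    """Return services that can consume the in-hand data unchanged."""
    hits = []
    for service in registry:
        # The in-hand type must be the declared input type or a descendant of it.
        if not is_a(data_type, service.input_type):
            continue
        # A service that lists namespaces only accepts identifiers from those.
        if service.namespaces and namespace not in service.namespaces:
            continue
        # Optionally restrict the search to one kind of operation.
        if service_type is not None and service.service_type != service_type:
            continue
        hits.append(service)
    return hits


# Hypothetical registry contents.
REGISTRY = [
    Service("getSequenceLength", "GenericSequence", frozenset(), "Analysis"),
    Service("translateDNA", "DNASequence", frozenset({"NCBI_gi"}), "Translation"),
    Service("getRecordByID", "Object", frozenset({"NCBI_gi"}), "Retrieval"),
]
```

A client holding a DNASequence in the NCBI_gi namespace would match all three entries, while a client holding only a GenericSequence would not be offered `translateDNA`, since that service requires the more specific type.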
BioMOBY does not, for its core operations, utilize the RDF or OWL standards from the W3C. This is in part because neither of these standards was stable in 2001, when the project began, and in part because library support for these standards was not "commodity" in any of the most common languages (i.e. Perl and Java) at that time.
Nevertheless, the BioMOBY system exhibits what can only be described as Semantic Web-like behaviours. The BioMOBY Object Ontology controls the valid data structures in exactly the same way as an OWL ontology defines an RDF data instance. BioMOBY Web Services consume and generate BioMOBY XML, the structure of which is defined by the BioMOBY Object Ontology. As such, BioMOBY Web Services have been acting as prototypical Semantic Web Services since 2001, despite not using the eventual RDF/OWL standards.
As of 2006, however, BioMOBY does use the RDF/OWL standards to describe its Objects, Namespaces, Services, and Registry. Increasingly, these ontologies are being used with DL reasoners to govern the behaviour of all BioMOBY functions.
There are several client applications that can search and browse the BioMOBY registry of services. One of the most popular is the Taverna workbench, built as part of the MyGrid project. The first BioMOBY client was Gbrowse Moby, written in 2001 to allow access to the prototype version of BioMOBY services. Gbrowse Moby [5], in addition to being a BioMOBY browser, now works in tandem with the Taverna workbench to create SCUFL workflows that reflect the Gbrowse Moby browsing session and can then be run in a high-throughput environment. The Seahawk [6] applet can also export a session history as a Taverna workflow, which amounts to programming-by-example functionality.
The Ahab client is a fully automated data mining tool. Given a starting point, it will discover, and execute, every possible BioMOBY service and provide the results in a clickable interface.