EB-eye EBI's Search Engine

From Wikipedia, the free encyclopedia

The 'EB-eye' - EBI's Search Engine for biological databases

The European Bioinformatics Institute (EBI) is a non-profit academic organisation that forms part of the European Molecular Biology Laboratory (EMBL).

The EBI is a centre for research and services in bioinformatics. The Institute manages databases of biological data, including nucleic acid sequences, protein sequences and macromolecular structures.

The Mission of the EBI

  • To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
  • To contribute to the advancement of biology through basic investigator-driven research in bioinformatics
  • To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators
  • To help disseminate cutting-edge technologies to industry

What is the EB-eye Search?

The EB-eye is built on Apache Lucene, an open-source, high-performance, full-featured text search engine library written entirely in Java. It uses Lucene to index EBI databases supplied in various formats (e.g. flat files, XML dumps, OBO format) and provides very fast access to the EBI's data resources. Users can search globally across all EBI databases or restrict the search to selected resources through the Advanced Search.
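The fast access described above comes from indexing the text ahead of time. The following is a minimal Python sketch of an inverted index, the general data structure behind full-text search libraries such as Lucene. It is not Lucene's API and not the EB-eye's actual code: the record identifiers and texts are made up, and treating a multi-word query as "all terms must match" is an assumption of the sketch.

```python
from collections import defaultdict
import re

# Hypothetical records standing in for entries parsed from flat files or XML dumps.
RECORDS = {
    "REC1": "Insulin receptor precursor",
    "REC2": "Cellular tumor antigen p53",
    "REC3": "Insulin-like growth factor 1 receptor",
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    """Map each term to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for term in tokenize(text):
            index[term].add(rec_id)
    return index

def search(index, query):
    """Return the ids of records that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

index = build_index(RECORDS)
print(sorted(search(index, "insulin receptor")))
```

Because the index is built once, a query only has to look up its terms and intersect the resulting sets instead of scanning every record.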

Global Search

The global search box appears at the top of all EBI web pages. Type the query terms into the text box and press GO (or press Enter). The system then displays a summary page listing the knowledge domains and the number of matches found in each. The user can expand or collapse any or all domains by clicking the relevant '+' or '-' signs on the page. When a domain is expanded, each of its data resources is shown along with the number of matches found.
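The summary page can be thought of as match counts grouped first by knowledge domain and then by data resource. The sketch below illustrates that grouping in Python. The hits, the domain labels and the resource names (`resource_a` and so on) are hypothetical, and the text layout only imitates the '+' and '-' controls; it does not reproduce the real page.

```python
from collections import Counter, defaultdict

# Hypothetical hits: (knowledge domain, data resource, record id).
HITS = [
    ("Protein Sequences", "resource_a", "P1"),
    ("Protein Sequences", "resource_a", "P2"),
    ("Protein Sequences", "resource_b", "P3"),
    ("Nucleotide Sequences", "resource_c", "N1"),
]

def summarise(hits):
    """Count matches per data resource, grouped by knowledge domain."""
    summary = defaultdict(Counter)
    for domain, resource, _record_id in hits:
        summary[domain][resource] += 1
    return summary

def render(summary, expanded=()):
    """Return summary lines; domains listed in `expanded` also show their resources."""
    lines = []
    for domain in sorted(summary):
        total = sum(summary[domain].values())
        sign = "-" if domain in expanded else "+"
        lines.append(f"[{sign}] {domain}: {total}")
        if domain in expanded:
            for resource, count in sorted(summary[domain].items()):
                lines.append(f"    {resource}: {count}")
    return lines

summary = summarise(HITS)
for line in render(summary, expanded={"Protein Sequences"}):
    print(line)
```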

Examples

  • Insulin receptor
  • P53
  • External Services group
  • Bos taurus (cow) data at the EBI
  • escherichia NOT coli
  • C2H2 zinc finger family
  • DNA binding
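The query `escherichia NOT coli` in the examples uses a boolean operator to exclude a term. The Python sketch below shows one way such a query could be evaluated. It is an illustration only, not the EB-eye's or Lucene's parser: the documents are made up, and the rules that an upper-case `NOT` negates only the next term and that all other terms are required are assumptions of the sketch.

```python
import re

# Hypothetical documents used only to demonstrate the exclusion.
DOCS = {
    "D1": "Escherichia coli K-12 genome",
    "D2": "Escherichia fergusonii strain",
    "D3": "Bos taurus insulin receptor",
}

def terms(text):
    """Return the set of lower-cased alphanumeric terms in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def parse(query):
    """Split a query into required terms and terms preceded by NOT."""
    required, excluded = [], []
    negate = False
    for token in query.split():
        if token == "NOT":
            negate = True
            continue
        (excluded if negate else required).append(token.lower())
        negate = False
    return required, excluded

def search(docs, query):
    """Return ids of documents with every required term and no excluded term."""
    required, excluded = parse(query)
    hits = []
    for doc_id, text in docs.items():
        words = terms(text)
        if all(t in words for t in required) and not any(t in words for t in excluded):
            hits.append(doc_id)
    return sorted(hits)

print(search(DOCS, "escherichia NOT coli"))
```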


Advanced Search

Advanced searches are available from the 'Advanced Search' page. The page first displays four text boxes, one for each of the query types the system supports, and defaults to searching all the EBI data resources. The 'domain-specific search' option on this page takes the user to a wizard in which data resources can be selected individually and the fields to query can be chosen. When a domain contains multiple data resources, the user can select all of them or just one. Once the databases and fields have been selected, the four 'Advanced Search' text boxes reappear so that the user can type the query terms of interest. See also the 'Example Advanced Search' section on this page.

What can the user Search for?

Many, but not all, of the text fields of EBI data resources are indexed in the search engine. As a result, the same query may return different results here than in other search engines. As a rule, identifiers, names, descriptions, keywords and cross-references are indexed. More specific fields will be indexed at a later date as the quality of the data feeds improves.

Selecting a data resource in the Advanced Search brings the user to the field-selection dialog, which shows what has been indexed for that resource. For example, the following UniProt fields are available: id, accession numbers, dates of creation, last modification and last sequence modification, description and keywords.
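Restricting a query to chosen fields can be sketched as follows. The field names `id`, `accession`, `description` and `keywords` follow the UniProt list above, but the entries themselves are invented, and the case-insensitive substring match is a simplification chosen for this sketch rather than the behaviour of the real search engine.

```python
# Hypothetical entries; identifiers and accessions are invented for illustration.
ENTRIES = [
    {
        "id": "EXAMPLE1_HUMAN",
        "accession": "X00001",
        "description": "Insulin receptor",
        "keywords": "Receptor; Transferase",
    },
    {
        "id": "EXAMPLE2_BOVIN",
        "accession": "X00002",
        "description": "Zinc finger protein",
        "keywords": "DNA-binding; Zinc-finger",
    },
]

def search_fields(entries, term, fields):
    """Return ids of entries whose selected fields contain the term."""
    needle = term.lower()
    return [
        entry["id"]
        for entry in entries
        if any(needle in entry.get(field, "").lower() for field in fields)
    ]

# The same term can match or miss depending on which fields are selected.
print(search_fields(ENTRIES, "receptor", ["description"]))
print(search_fields(ENTRIES, "zinc", ["id", "accession"]))
```

The second call returns no entries because the term occurs only in fields that were not selected, which is also why an unindexed field can never produce a match.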

Examples

It is also possible to search using cross-references. The same Advanced Search field-selection dialog shows which cross-references are indexed.

Help & FAQ on EB-eye

Further pages describing the query syntax of this search engine are available on the EBI's web site.

Programmatic access to the EB-eye

A WSDL (Web Services Description Language) document for the service is now available.
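A WSDL document lists the operations a web service offers, so a client can inspect it before making any calls. The Python sketch below reads the operation names from a WSDL 1.1 document using only the standard library. The embedded document is a made-up stand-in: the service, port type and operation names are hypothetical and do not describe the actual EB-eye service.

```python
import xml.etree.ElementTree as ET

# Namespace used by WSDL 1.1 documents.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical WSDL fragment; every name in it is invented for illustration.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             name="ExampleSearchService">
  <portType name="ExampleSearchPort">
    <operation name="exampleListDomains"/>
    <operation name="exampleSearch"/>
  </portType>
</definitions>
"""

def list_operations(wsdl_text):
    """Return (port type name, operation name) pairs declared in the WSDL."""
    root = ET.fromstring(wsdl_text)
    operations = []
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        for operation in port_type.findall("wsdl:operation", WSDL_NS):
            operations.append((port_type.get("name"), operation.get("name")))
    return operations

for port_name, operation_name in list_operations(SAMPLE_WSDL):
    print(f"{port_name}.{operation_name}")
```

In practice a SOAP client library would normally consume the WSDL and generate the calls; parsing it by hand as above is only meant to show what the document contains.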

Other Lucene-based search engines in biology/bioinformatics

Lucene is a well-established library, and many bioinformatics centres have experimented with applying it to biological data and databases. A pioneering development in this field, headed by Dr. Don Gilbert at Indiana University, is LuceGene, a part of the GMOD (Generic Software Components for Model Organisms Databases) initiative. Another example is the search engine of the new UniProt web site, which is also based on Lucene and adds features such as sorting of large data sets, subqueries across data sets and group-by queries.


External links