PRASANTH VIRTUAL BIOINFO LAB: BIOINFORMATICS DATA INTEGRATION SYSTEMS/ SEQUENCE RETRIEVAL SYSTEMS

Sequence Retrieval System (SRS)

SRS is a generic bioinformatics data integration software system. Developed initially in the early 1990s as an academic project at the European Molecular Biology Laboratory (EMBL), the system has evolved into a commercial product and is currently sold under license as a stand-alone software product.

SRS uses proprietary parsing techniques, largely based on context-free grammars, to parse and index flat-file data. A similar system combined with DOM-based processing rules is used to parse and index XML-formatted data. A relational database connector can be used to integrate data stored in relational database systems. SRS provides a single common interface for accessing heterogeneous data sources and bypasses the complexities related to the actual format and storage mechanism of the data. SRS can exploit textual references between different databases and pull together data from disparate sources into a unified view.
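The idea of parsing and indexing flat-file data can be illustrated with a small sketch. This is not SRS's own parser or grammar language; it is a plain Python approximation that assumes an EMBL-style layout (a short line code, then a value, with `//` ending each entry). The sample entries, accession numbers, and function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical flat-file content in an EMBL-like layout (invented data).
SAMPLE = """\
ID   ENTRY1
AC   X00001;
DE   Example kinase sequence
DR   UNIPROT; P00001;
//
ID   ENTRY2
AC   X00002;
DE   Example transporter sequence
DR   UNIPROT; P00002;
//
"""

def parse_entries(text):
    """Split a flat file into entries; each entry maps a line code to its values."""
    entries = []
    current = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-entry marker
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
        elif line.strip():
            code, _, value = line.partition(" ")
            current[code].append(value.strip())
    if current:                            # entry without a trailing "//"
        entries.append(dict(current))
    return entries

def build_index(entries, field):
    """Build an inverted index: lowercase token -> set of entry IDs."""
    index = defaultdict(set)
    for entry in entries:
        entry_id = entry["ID"][0]
        for value in entry.get(field, []):
            for token in value.replace(";", " ").split():
                index[token.lower()].add(entry_id)
    return index

entries = parse_entries(SAMPLE)
de_index = build_index(entries, "DE")      # index of description words
xref_index = build_index(entries, "DR")    # index of textual cross-references
```

Looking up `de_index["kinase"]` returns the IDs of entries whose description mentions that word, and `xref_index` shows how a textual reference to another database could be followed to link entries across sources.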

SRS is designed from the ground up with extensibility and flexibility in mind, in order to cope with the ever-changing list of databases and formats in the bioinformatics world. SRS relies on a mix of database configuration via meta-definitions and hand-crafted parsers to integrate a wide range of database distributions. These meta-definitions are regularly updated and are also available for extension and modification to all users.

A number of similar commercial systems have been developed that replicate the basic functionality of SRS.

Entrez

The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal, that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. "Entrez" is the second-person plural (or formal) imperative of the French verb "entrer" (to enter) and serves as the invitation "Come in!".

Entrez is the text-based search and retrieval system used at NCBI for all of the major databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, OMIM, and many others. Entrez is at once an indexing and retrieval system, a collection of data from many sources, and an organizing principle for biomedical information.

Entrez Global Query is an integrated search and retrieval system that provides access to all databases simultaneously with a single query string and user interface. Entrez can efficiently retrieve related sequences, structures, and references. The Entrez system can provide views of gene and protein sequences and chromosome maps. Some textbooks are also available online through the Entrez system.
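As a rough illustration of sending one query string to several Entrez databases, the sketch below builds search URLs for NCBI's public E-utilities `esearch` endpoint. It only constructs the URLs and makes no network request. The choice of databases, the example query, and the helper name `esearch_urls` are assumptions for illustration, and this is not a description of how the Entrez web interface works internally.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_urls(query, databases, retmax=5):
    """Return one ESearch URL per database for the same query string."""
    urls = {}
    for db in databases:
        params = {"db": db, "term": query, "retmax": retmax}
        urls[db] = f"{EUTILS_ESEARCH}?{urlencode(params)}"
    return urls

# Hypothetical example: the same query against three Entrez databases.
urls = esearch_urls("BRCA1 AND human[organism]", ["pubmed", "protein", "nuccore"])
for db, url in urls.items():
    print(db, url)
```

Each URL could then be fetched with any HTTP client; the response lists the UIDs of matching records in that database.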

Entrez Nodes Represent Data

An Entrez “node” is a collection of data that is grouped together and indexed together. It is usually referred to as an Entrez database. In the first version of Entrez, there were three nodes: published articles, nucleotide sequences, and protein sequences. Each node represents specific data objects of the same type, e.g., protein sequences, each of which is given a unique ID (UID) within that logical Entrez Proteins node. Records in a node may come from a single source (e.g., all published articles are from PubMed) or from many sources (e.g., proteins are from translated GenBank sequences, SWISS-PROT, or PIR).
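The node concept can be pictured as a simple data structure: one collection per data type, a UID assigned within that collection, and a note of which source each record came from. The sketch below is a hypothetical model only; the class names, fields, and record titles are invented and do not reflect NCBI's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    uid: int      # unique within its node only
    source: str   # originating database, e.g. "SWISS-PROT"
    title: str

@dataclass
class Node:
    name: str
    records: dict = field(default_factory=dict)
    _next_uid: int = 1

    def add(self, source, title):
        """Store a record and return the UID assigned within this node."""
        uid = self._next_uid
        self._next_uid += 1
        self.records[uid] = Record(uid, source, title)
        return uid

    def sources(self):
        """Return the set of sources contributing records to this node."""
        return {record.source for record in self.records.values()}

# A protein node fed from several sources (invented example records).
proteins = Node("protein")
uid_a = proteins.add("SWISS-PROT", "example protein A")
uid_b = proteins.add("PIR", "example protein B")
uid_c = proteins.add("GenBank translation", "example protein C")

# An article node fed from a single source.
articles = Node("pubmed")
articles.add("PubMed", "example article")
```

Here `proteins.sources()` contains three sources while `articles.sources()` contains one, mirroring the single-source and multi-source cases described above.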

Ensembl

Ensembl is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integrating any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, as a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl’s aims are to continue to “widen” this biological integration to include other model organisms relevant to understanding human biology as they become available; to “deepen” this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have previously been elusive.
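Mapping data onto genomic features comes down to giving every item a coordinate range on a sequence and asking which items overlap a region of interest. The sketch below is a minimal illustration under that assumption; it is not Ensembl's API, and the feature names, coordinates, and the `overlapping` helper are invented. Coordinates are treated here as 1-based and inclusive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int   # 1-based, inclusive (assumed convention for this sketch)
    end: int     # inclusive
    kind: str
    name: str

def overlapping(features, chrom, start, end):
    """Return features on `chrom` that overlap the closed interval [start, end]."""
    return [
        f for f in features
        if f.chrom == chrom and f.start <= end and f.end >= start
    ]

# Invented features of different types placed on the same coordinate system.
features = [
    Feature("1", 1000, 5000, "gene", "geneA"),
    Feature("1", 1200, 1300, "external_annotation", "regionX"),
    Feature("2", 1000, 5000, "gene", "geneB"),
]

hits = overlapping(features, "1", 1250, 1400)
```

Because both the gene and the external annotation are expressed in the same coordinates, a single region query returns them together, which is the sense in which the genome sequence acts as the integrating framework.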


Friday, 15 April 2011
