Friday 15 April 2011

GENE/NUCLEOTIDE DATABASES

The primary Gene/Nucleotide databases are:
1) NCBI GenBank
2) EMBL
3) DDBJ

NCBI GenBank

Introduction

The GenBank sequence database is an annotated collection of all publicly available nucleotide sequences and their protein translations. It is produced at the National Center for Biotechnology Information (NCBI) as part of an international collaboration with the European Molecular Biology Laboratory (EMBL) Data Library at the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). GenBank was initially built and maintained at Los Alamos National Laboratory (LANL); in the early 1990s, this responsibility was awarded to NCBI. GenBank and its collaborators receive sequences produced in laboratories throughout the world from more than 100,000 distinct organisms. GenBank continues to grow at an exponential rate, doubling every 10 months.

About NCBI

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation. It houses genome sequencing data in GenBank, an index of biomedical research articles in PubMed, and full-text articles in PubMed Central, as well as other information relevant to biotechnology. All these databases are available online through the Entrez search engine.
NCBI is directed by David Lipman, one of the original authors of the BLAST sequence alignment program.
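
Entrez can also be queried from a script. The following is a minimal sketch, not an official client: it only builds the address of a request to NCBI's E-utilities `efetch` service and does not contact the server. The function name `build_efetch_url` is hypothetical, and the accession `U49845` is used purely as an example identifier.

```python
from urllib.parse import urlencode

# Address of the E-utilities "efetch" service (assumed unchanged at time of use).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Build a URL that asks Entrez for one nucleotide record as a GenBank flat file.

    Hypothetical helper for illustration; it performs no network access.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Print the request URL for an example accession.
    print(build_efetch_url("U49845"))
```

Opening the printed address in a browser, or passing it to `urllib.request.urlopen`, should return the record as text, assuming the service is reachable.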

International Collaboration

In the mid-1990s, the GenBank database became part of the International Nucleotide Sequence Database Collaboration (INSDC) with the EMBL database (European Bioinformatics Institute) and the Genome Sequence Database (GSDB). Subsequently, the GSDB was removed from the Collaboration (by the National Center for Genome Resources, Santa Fe, NM), and DDBJ joined the group. Each database has its own set of submission and retrieval tools, but the three databases exchange data daily so that all three databases should contain the same set of sequences.

Types of Sequences Accepted

NCBI’s GenBank database is a collection of publicly available annotated nucleotide sequences, including mRNA sequences with coding regions, segments of genomic DNA with a single gene or multiple genes, and ribosomal RNA gene clusters.

It also includes, but is not limited to:
  1. Expressed sequence tag (EST) data
  2. Sequence tagged site (STS) data
  3. Genome survey sequence (GSS) data
  4. High throughput genomic (HTG) data
  5. Whole genome shotgun (WGS) data, etc.

Data Exchange

GenBank exchanges data daily with its two partners in the International Nucleotide Sequence Database Collaboration (INSDC): the European Bioinformatics Institute (EBI) of the European Molecular Biology Laboratory (EMBL), and the DNA Data Bank of Japan (DDBJ). Nearly all sequence data are deposited into INSDC databases by the labs that generate the sequences, in part because journal publishers generally require deposition prior to publication so that an accession number can be included in the paper.
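
Because papers cite sequences by accession number, it can be handy to sanity-check the shape of an identifier before looking it up. The sketch below assumes the classic nucleotide accession layouts (one letter followed by five digits, or two letters followed by six digits, with an optional ".version" suffix). Other layouts exist, so treat the pattern and the function name `looks_like_accession` as illustrative assumptions rather than a complete validator.

```python
import re

# Assumed layouts: 1 letter + 5 digits, or 2 letters + 6 digits,
# optionally followed by a version suffix such as ".1".
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?")


def looks_like_accession(text):
    """Return True if text has the shape of a classic nucleotide accession.

    This checks the format only; it does not confirm that the record exists.
    """
    return ACCESSION_RE.fullmatch(text.strip().upper()) is not None


if __name__ == "__main__":
    for candidate in ["U49845", "AF123456.1", "12345", "ABC123"]:
        print(candidate, looks_like_accession(candidate))
```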

Redundancy of Data

GenBank is specifically intended to be an archive of primary sequence data. Thus, to be included, the sequencing must have been conducted by the submitter. Because GenBank is an archival database and includes all sequence data submitted, there are multiple entries for some loci. Just as the primary literature includes similar experiments conducted under slightly different conditions, GenBank may include many sequencing results for the same locus. These different submissions can reflect genetic variation between individuals or organisms, and analyzing the differences is one way of identifying single nucleotide polymorphisms (SNPs).
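
As a toy illustration of comparing two entries for the same locus, the sketch below lists the positions at which two sequences differ. It assumes the sequences are already aligned and of equal length. Real SNP discovery also needs proper alignment, quality checks, and evidence from many individuals, so the output is only a list of candidate sites. The function name and the example sequences are made up for illustration.

```python
def single_base_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based, as is usual for sequence coordinates.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1)
        if a != b
    ]


if __name__ == "__main__":
    # Two made-up entries for the same locus, differing at one base.
    entry_1 = "ATGCCGTA"
    entry_2 = "ATGCTGTA"
    print(single_base_differences(entry_1, entry_2))
```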

Submission Tools

Sequences can be submitted using BankIt and Sequin. (These tools are covered as a separate topic and will be discussed later in this material.)


EMBL

Introduction

The EMBL Nucleotide Sequence Database (EMBL-Bank) is a nucleotide sequence database maintained by the European Bioinformatics Institute (EBI). The European Molecular Biology Laboratory (EMBL) itself is a molecular biology research institution supported by 20 European countries, with Australia as an associate member state. EMBL was created in 1974 and is an intergovernmental organization funded by public research money from its member states. Research at EMBL is conducted by approximately 85 independent groups covering the spectrum of molecular biology. The EBI is a hub for bioinformatics research and services, developing and maintaining a large number of databases that are free of charge for the scientific community.

The European Bioinformatics Institute (EBI) is a centre for research and services in bioinformatics, and is part of the European Molecular Biology Laboratory (EMBL). It is located on the Wellcome Trust Genome Campus in Hinxton, United Kingdom. The roots of the EMBL-EBI lie in the EMBL Nucleotide Sequence Data Library (now known as EMBL-Bank), which was established in 1980 at the EMBL laboratories in Heidelberg, Germany, and was the world's first nucleotide sequence database. The original goal was to establish a central computer database of DNA sequences.

Data resources and tools at the EBI

EMBL-Bank, Genomes, Gene Expression, Literature, Sequence Similarity & Analysis, UniProt, Nucleotide Sequences, Molecular Interactions, Taxonomy, Pattern and Motif Searches, ArrayExpress, Protein Sequences, Reactions and Pathways, Ontologies, Structure Analysis, Ensembl, Macromolecular Structures, Protein Families, Text Mining, InterPro, Small Molecules, Enzymes, PDBe, SOAP & REST Web Services, Carbohydrate structures.


DDBJ

Introduction

The DNA Data Bank of Japan (DDBJ) is a DNA data bank located at the National Institute of Genetics (NIG) in Japan. It is a member of the International Nucleotide Sequence Database Collaboration (INSDC). It exchanges its data daily with the European Molecular Biology Laboratory at the European Bioinformatics Institute and with GenBank at the National Center for Biotechnology Information. Thus these three data banks contain the same data at any given time.

Data Exchange

DDBJ began data bank activities in 1986 at NIG and describes itself as the only nucleotide sequence data bank in Asia. Although DDBJ mainly receives its data from Japanese researchers, it accepts data from contributors in any other country. DDBJ is primarily funded by the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT).

Specialized Databases of DDBJ

Genome Information Broker (GIB) collects complete genome sequence data. GIB includes more than 50 bacterial genomes, the yeast genome, and the Arabidopsis genome. Human Genomics Studio (HGS) collects whole human genome sequences, assembles them, maps all available genes to the chromosomes, and compiles a complete human genome catalog.





