Monday, 25 April 2011

ORganic VIrtual Library (ORVIL) – A combinatorial library construction based on organic constituents and without scaffold hopping


ABSTRACT
Rapid construction of a virtual combinatorial library is a prerequisite for in silico library enumeration and design. ORganic VIrtual Library (ORVIL) is a Perl program that generates a combinatorial library using the 200 most frequently observed organic substituents. It is designed to explore the organic chemical space around a given query structure without altering the backbone of the molecule, which keeps molecular complexity to a minimum. Its particular features are ease of use, the portable SMILES format, and high-speed library construction. Benchmarking on Tamoxifen (a breast cancer drug) yielded a compound with an architecture similar to that of the known drug analogue Toremifene.
Keywords: virtual combinatorial library, organic substituents, molecular complexities, SMILES format
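
The idea of decorating a fixed backbone with a list of substituents can be sketched in a few lines. The sketch below is in Python rather than Perl and is not the ORVIL code. It assumes that each substitution site in the query SMILES is written as a branch containing a hypothetical "[*]" marker, and that each substituent SMILES starts at its attaching atom. The scaffold and the three substituents are illustrative only, and no chemical validation is performed.

```python
from itertools import product


def enumerate_library(scaffold, substituents, marker="[*]"):
    """Return every SMILES obtained by filling each marker with each substituent.

    The backbone text around the markers is never modified, so only the
    substituents vary between library members.
    """
    parts = scaffold.split(marker)   # backbone fragments around the sites
    n_sites = len(parts) - 1         # number of substitution sites
    library = []
    for combo in product(substituents, repeat=n_sites):
        pieces = [parts[0]]
        for substituent, tail in zip(combo, parts[1:]):
            pieces.append(substituent)
            pieces.append(tail)
        library.append("".join(pieces))
    return library


# Hypothetical query: a benzene ring with two marked sites (para positions).
scaffold = "c1cc([*])ccc1([*])"
# Hypothetical substituent list (methyl, hydroxyl, chloro).
substituents = ["C", "O", "Cl"]

library = enumerate_library(scaffold, substituents)
for smiles in library:
    print(smiles)
```

With s sites and n substituents the library holds n to the power s members, so a 200-substituent list grows quickly as sites are added.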

Sunday, 24 April 2011

THE BLESSINGS OF MANAKULA VINAYAGAR


Prasanth Virtual Bioinfo Lab

The blessings of Manakula Vinayagar are always showered on me, and I pray to my god to bless me and the people around me with success and prosperity forever. The image shown is of Shri Manakula Vinayagar, Pondicherry.

Saturday, 23 April 2011

COMPUTATIONAL STUDIES ON THE INTERACTION OF CORE HISTONE TAIL DOMAINS WITH CpG ISLAND


S. PRASANTH KUMAR 1, RAVI G. KAPOPARA 1, YOGESH T. JASRAI 1 AND RAKESH M. RAWAL 2.

1. Bioinformatics Laboratory, Department of Botany, University School of Sciences, Ahmedabad- 380 009.
2. Division of Medicinal Chemistry and Pharmacogenomics, Department of Cancer Biology, Gujarat Cancer & Research Institute (GCRI), Ahmedabad- 380 016.



Download the accepted galley proof here:
http://www.ijpbs.net/vol-3/issue-1/bio/B%20-%2068.pdf
Copyrights Reserved- International Journal of Pharma and Bio Sciences.



ABSTRACT:
In vitro studies have shown that core histone tail domains preferentially interact with linker DNA. In the present study, we examined these interactions computationally, using molecular docking and an isocontour-based electrostatic map approach, in order to identify the domains and regions of the H3 and H4 tails and of the DNA that contribute to the physical association. We also explored the interactions of linker DNA containing methylated CpG dinucleotides (CpG island) with normal and post-translationally modified histone tails. We report that these interactions are electrostatically unfavored if one of the biomolecular partners is methylated; negatively charged zones of the DNA and histone tails must therefore be absent nearby.

KEYWORDS: Core histone tail domain, linker DNA, CpG island, Molecular docking, Isocontour-based electrostatic potential map.

Here is a snapshot from the study:

COPYRIGHTED IMAGE Prasanth Virtual Bioinfo Lab
Copyrights 2011 Prasanth Virtual bioinfo Lab

Friday, 15 April 2011

UPLOADING SEQUENCES TO THE DATABASES/SEQUENCE SUBMISSIONS

Sequences can be submitted to NCBI GenBank using:
  1. Sequin
  2. BankIt

Sequin

Sequin is a stand-alone software tool developed by the National Center for Biotechnology Information (NCBI) for submitting and updating sequences to the GenBank, EMBL, and DDBJ databases. Sequin has the capacity to handle long sequences and sets of sequences (segmented entries, as well as population, phylogenetic, and mutation studies). It also allows sequence editing and updating, and provides complex annotation capabilities. In addition, Sequin contains a number of built-in validation functions for enhanced quality assurance.

File Formats Accepted

Sequin normally expects to read sequence files in FASTA format. Note that most sequence analysis software packages include FASTA or "raw" as one of the available output formats. Population studies, phylogenetic studies, mutation studies, and environmental samples may be entered in either FASTA format, or in PHYLIP, NEXUS, MACAW, or FASTA+GAP formats if you are submitting an alignment.
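
As a reminder of what a FASTA file looks like, here is a minimal Python sketch of a reader. It is not part of Sequin. It assumes the usual layout: each record starts with a ">" definition line, followed by one or more lines of sequence. The two records in the sample text are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (definition_line, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:                # close the last record
        records.append((header, "".join(chunks)))
    return records


# Made-up sample with two records; the second sequence is wrapped over two lines.
sample = """>seq1 example record one
ATGGCCATTG
>seq2 example record two
GGGTTTAAAC
CCATG
"""

records = parse_fasta(sample)
for definition, sequence in records:
    print(definition, len(sequence))
```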

Creating a Submission

Sequin is organized into a series of forms for entering submitting authors, entering organism and sequences, entering information such as strain, gene, and protein names, viewing the complete submission, and editing and annotating the submission. The goal is to go quickly from raw sequence data to an assembled record that can be viewed, edited, and submitted to your database of choice.

Submitting Authors Form: The pages in the Submitting Authors form ask you to provide the release date, a working title, names and contact information of submitting authors, and affiliation information.

Submission page: This page asks for a tentative title for a manuscript describing the sequence and will initially mark the manuscript as being unpublished. When the article is published, the database staff will update the sequence record with the new citation. This page also lets you indicate that a record should be held confidential by the database until a specified date, although the preferred policy is to release the record immediately into the public databases. The form also contains pages for the contact, the authors and the authors’ affiliation.

Sequence Format Form: Submission Type: choose one of the following.
• Single Sequence if you have a single contiguous mRNA or genomic DNA sequence.
• Segmented Sequence if you have a single collection of non-overlapping, non-contiguous sequences that cover a specified genetic region from a single source. A standard example is a set of genomic DNA sequences that encode exons from a gene along with fragments of their flanking introns.
• Gapped Sequence if you have a single non-contiguous mRNA or genomic DNA sequence. A gapped sequence contains specified gaps of known or unknown length where the exact nucleotide sequence has not been determined.

Sequence Format: FASTA, FASTA+GAP, NEXUS, PHYLIP, etc.

Next, fill in the Organism page and, optionally, the Annotation page before final submission. The program then supplies an automatic identifier, which is used for deposition in the database and for future correspondence.


BankIt

BankIt is a web-based tool developed by the National Center for Biotechnology Information (NCBI) for submitting and updating sequences to the GenBank database.


Creating a Submission

Contact Information: Name, address, phone number, fax number and email address of the submitter must be entered when registering and submitting for the first time
Release date information: Immediately after it is processed at NCBI or on a date the submitter specifies

Reference information:
• Sequence authors: names of the researchers who are credited with the sequence
• Publication information: Unpublished, In-Press, or Published; and applicable citation information (paper's title, authors, journal title, volume, issue, year, pages)

Submission Category and Type: Original sequencing or Third Party Annotation; single sequence, sequence set (phylogenetic, population, environmental, etc), or batch

Nucleotide sequence(s): Input (cut-and-paste) single or multiple sequences or Upload them as a FASTA file; FASTA files should include organisms in their definition lines
Sequences must be at least 200 nucleotides long (unless they are complete exons, non-coding RNAs (ncRNAs), microsatellites or ancient DNA)

Molecule type: what was sequenced? (genomic DNA, mRNA, genomic RNA, cRNA, etc)
Topology: linear or circular (circular must be complete, such as a complete plasmid)

Organism name, applicable source modifiers, location:
• Genus and species names (if not previously provided in FASTA file)
• If name is new or unrecognized, provide best known taxonomic lineage
• If genus and/or species names are not known, provide most specific name known (for example: Bacillus sp., Uncultured bacterium, Uncultured archaeon)
• Most complete name for any synthetic vector (for example: Cloning vector pAB234, Transfer vector p789Abc)
• Source modifiers include: strain, clone, isolate, specimen-voucher, isolation-source, country
• Location: organelle (mitochondrion, chloroplast, etc); map and/or chromosome

Features of the sequence: Upload files or use input forms to add all applicable features (for example: CDS, gene, rRNA, tRNA, microsatellite, exon, intron)
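
Two of the nucleotide sequence requirements listed above (a minimum length of 200 nucleotides and an organism in the FASTA definition line) are easy to pre-check before opening BankIt. The Python sketch below is only a local sanity check, not an NCBI tool. It assumes the organism is given with a bracketed "[organism=...]" modifier in the definition line, and it cannot tell whether a short sequence falls under one of the exceptions (complete exon, ncRNA, microsatellite, ancient DNA), so short sequences are reported for manual review rather than rejected.

```python
MIN_LENGTH = 200  # minimum length in nucleotides, as stated in the list above


def precheck(definition_line, sequence):
    """Return a list of human-readable warnings for one FASTA record."""
    warnings = []
    cleaned = "".join(sequence.split())   # drop line breaks and spaces
    if len(cleaned) < MIN_LENGTH:
        warnings.append(
            "sequence is %d nt long (< %d): check whether an exception applies"
            % (len(cleaned), MIN_LENGTH)
        )
    # Assumption: organism supplied as a bracketed modifier in the definition line.
    if "[organism=" not in definition_line.lower():
        warnings.append("no [organism=...] modifier found in the definition line")
    return warnings


# Made-up records for illustration.
good = precheck(">Seq1 [organism=Bacillus sp.]", "ACGT" * 50)
bad = precheck(">Seq2", "ACGTACGT")
print(good)
print(bad)
```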





PATTERN SEARCHING DATABASES

Patterns are regular expressions matching short sequence motifs, usually of biological meaning. These patterns serve as discriminators that help to identify a protein’s family, e.g. the zinc finger binding motif. Databases that derive patterns from protein superfamilies or families are known as Protein Pattern Databases.


Within a single conserved region (motif), the sequence information may be reduced to a consensus expression (a regular expression), often simply referred to as a pattern.

PROSITE

PROSITE is hosted by ExPASy. PROSITE is an annotated collection of motif descriptors dedicated to the identification of protein families and domains. The motif descriptors used in PROSITE are either patterns or profiles, which are derived from multiple alignments of homologous sequences. This gives these motif descriptors the notable advantage of identifying distant relationships between sequences that would have passed unnoticed based solely on pairwise sequence alignment.

The core of the PROSITE database is composed of two text files:

• PROSITE.DAT is a computer-readable file that contains all the information needed by programs that use PROSITE to scan sequence(s) for the occurrence of patterns or profiles. For each entry described, this file includes statistics on the number of hits obtained while scanning the SWISS-PROT protein database for a pattern or profile. Cross-references to the corresponding SWISS-PROT entries, as well as to matched sequences from the PDB 3D-structure database, are also provided.

• PROSITE.DOC contains textual information that fully documents each pattern or profile.

PROSITE patterns

In some cases the sequence of an unknown protein is too distantly related to any protein of known structure to detect its resemblance by pairwise sequence alignment. However, relationships can be revealed by the occurrence in its sequence of a particular cluster of residue types, which is variously known as a pattern, motif, signature or fingerprint.

These motifs, typically around 10 to 20 amino acids in length, arise because specific residues and regions thought or proved to be important to the biological function of a group of proteins are conserved in both structure and sequence during evolution. These biologically significant regions or residues are generally:
• Enzyme catalytic sites.
• Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin, etc.).
• Amino acids involved in binding a metal ion.
• Cysteines involved in disulphide bonds.
• Regions involved in binding a molecule (ADP/ATP, GDP/GTP, calcium, DNA, etc.).

As the sequence of biologically meaningful motifs is evolutionarily conserved, a multiple alignment of them can be reduced to a consensus expression called a regular expression or pattern. Each position of such a pattern can be occupied by any residue from a specified set of acceptable residues, and in addition can be repeated a variable number of times within a specified range. At strictly conserved positions only one particular amino acid is accepted, whereas at other positions several amino acids with similar physicochemical properties can be accepted. It is also possible to define which amino acid(s) is(are) incompatible with a given position, and conserved residues can be separated by gaps of variable lengths.
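
The rules above map almost one-to-one onto an ordinary regular expression. The Python sketch below converts a pattern written in PROSITE notation into a Python regex. The notation itself is not spelled out in the text above, so it is stated here as an assumption: elements are separated by "-", "x" means any residue, "[...]" lists accepted residues, "{...}" lists excluded residues, "(n)" or "(n,m)" gives a repeat count or range, and "<" or ">" anchors the pattern to the N- or C-terminus. The example pattern is written in the style of the PROSITE C2H2 zinc finger entry, and the test sequence is made up.

```python
import re

# One PROSITE element: optional '<', a residue / x / [set] / {excluded set},
# an optional repeat '(n)' or '(n,m)', and an optional '>'.
ELEMENT = re.compile(
    r"(<?)([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: %r" % element)
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


zinc_finger = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(zinc_finger)
print(regex)

# Made-up sequence that satisfies the pattern.
sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
hit = re.search(regex, sequence)
print(hit.start(), hit.end())
```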



 

BIOINFORMATICS DATA INTEGRATION SYSTEMS/ SEQUENCE RETRIEVAL SYSTEMS

Sequence Retrieval System (SRS)

SRS is a generic bioinformatics data integration software system. Developed initially in the early 1990s as an academic project at the European Molecular Biology Laboratory (EMBL), the system has evolved into a commercial product and is currently sold under license as a stand-alone software product.

SRS uses proprietary parsing techniques largely based on context-free grammars to parse and index flat-file data. A similar system combined with DOM-based processing rules is used to parse and index XML-formatted data. A relational database connector can be used to integrate data stored in relational database systems. SRS provides a unique common interface for accessing heterogeneous data sources and bypasses complexities related to the actual format and storage mechanism for the data. SRS can exploit textual references between different databases and pull together data from disparate sources into a unified view.

SRS is designed from the ground up with extensibility and flexibility in mind, in order to cope with the ever-changing list of databases and formats in the bioinformatics world. SRS relies on a mix of database configuration via meta-definitions and hand-crafted parsers to integrate a wide range of database distributions. These meta-definitions are regularly updated and are also available for extension and modification to all users.
A number of similar commercial systems have been developed that replicate the basic functionality of SRS.


Entrez

The Entrez Global Query Cross-Database Search System is a powerful federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. "Entrez" happens to be the second person plural (or formal) form of the French verb "entrer (to enter)", meaning the invitation "Come in!".

Entrez is the text-based search and retrieval system used at NCBI for all of the major databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, OMIM, and many others. Entrez is at once an indexing and retrieval system, a collection of data from many sources, and an organizing principle for biomedical information.

Entrez Global Query is an integrated search and retrieval system that provides access to all databases simultaneously with a single query string and user interface. Entrez can efficiently retrieve related sequences, structures, and references. The Entrez system can provide views of gene and protein sequences and chromosome maps. Some textbooks are also available online through the Entrez system.

Entrez Nodes Represent Data


An Entrez “node” is a collection of data that is grouped together and indexed together. It is usually referred to as an Entrez database. In the first version of Entrez, there were three nodes: published articles, nucleotide sequences, and protein sequences. Each node represents specific data objects of the same type, e.g., protein sequences, which are each given a unique ID (UID) within that logical Entrez Proteins node. Records in a node may come from a single source (e.g., all published articles are from PubMed) or many sources (e.g., proteins are from translated GenBank sequences, SWISS-PROT, or PIR).


Ensembl

Ensembl is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl’s aims are to continue to “widen” this biological integration to include other model organisms relevant to understanding human biology as they become available; to “deepen” this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.


PROTEIN SEQUENCE DATABASES

The primary protein sequence databases are:
1. UniProt
2. Swiss-Prot and TrEMBL (UniProt Consortium)
3. Protein Information Resource (PIR)

UniProt

Universal Protein Resource (UniProt) plays an increasingly important role by providing a stable, comprehensive, freely accessible central resource on protein sequences and functional annotation. UniProt is produced by the UniProt Consortium, formed in 2002 by the European Bioinformatics Institute (EBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB).

The UniProt Consortium and the individual activities

EBI, located at the Wellcome Trust Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB, located in Geneva, Switzerland, maintains the ExPASy (Expert Protein Analysis System) servers that are a central resource for proteomics tools and databases. PIR, hosted by the National Biomedical Research Foundation (NBRF) at the Georgetown University Medical Center in Washington, DC, USA, is heir to the oldest protein sequence database.

Major components of UniProt

The core activities include manual curation of protein sequences assisted by computational analysis, sequence archiving, development of a user-friendly UniProt web site, and the provision of additional value-added information through cross-references to other databases. UniProt comprises three major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase and the UniProt Reference Clusters. An additional component consisting of metagenomic and environmental sequences has recently been added to UniProt to ensure availability of such sequences in a timely fashion. UniProt is updated and distributed on a bi-weekly basis and can be accessed online for searches or download at http://www.uniprot.org.

UniProtKB

UniProtKB consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. The former contains manually annotated records with information extracted from literature and curator-evaluated computational analysis. To achieve accuracy, annotations are performed by biologists with specific expertise. Information including function,  catalytic activity, subcellular location, disease, structure and posttranslational modifications is annotated. An important part of the annotation process involves the merging of different reports for a single protein.

UniRef

The UniRef databases provide three clustered sets (UniRef100, 90 and 50) of sequences from UniProtKB and selected UniParc records in order to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences from view.

UniParc

UniParc is the main sequence storehouse and is a comprehensive repository that reflects the history of all protein sequences. UniParc houses all new and revised protein sequences from various sources to ensure that complete coverage is available at a single site. It includes not only UniProtKB but also translations from the EMBL-Bank/ DDBJ/GenBank Nucleotide Sequence Databases, the Ensembl database of animal genomes, the International Protein Index (IPI), the Protein Data Bank (PDB), NCBI’s Reference Sequence Collection (RefSeq), model organism databases FlyBase and WormBase and protein sequences from the European, American and Japanese Patent Offices. To avoid redundancy, sequences are handled as strings—all sequences 100% identical over the entire length are merged, regardless of source organism. New and updated sequences are loaded on a daily basis, cross-referenced to the source database accession number and provided with a sequence version that increments upon changes to the underlying sequence.
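
The "sequences handled as strings" rule can be illustrated with a small Python sketch. This is not UniParc's implementation: the archive identifiers ("ARC000001", ...), the database names and the accession numbers are all hypothetical. It only shows that identical sequence strings collapse into one archive entry that keeps a cross-reference to every source record, regardless of source organism.

```python
def build_archive(records):
    """Merge records with 100% identical sequences into single archive entries.

    records: iterable of (source_database, accession, sequence) tuples.
    Returns a dict keyed by sequence string.
    """
    archive = {}
    for database, accession, sequence in records:
        key = sequence.upper()
        if key not in archive:
            # Hypothetical identifier scheme, used for illustration only.
            archive[key] = {"id": "ARC%06d" % (len(archive) + 1), "xrefs": []}
        archive[key]["xrefs"].append((database, accession))
    return archive


# Made-up records: the first two share the same sequence.
records = [
    ("SourceDB_A", "A0001", "MKTAYIAKQR"),
    ("SourceDB_B", "B0417", "MKTAYIAKQR"),
    ("SourceDB_A", "A0002", "MSDNELLQ"),
]

archive = build_archive(records)
for sequence, entry in archive.items():
    print(entry["id"], sequence, entry["xrefs"])
```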


SWISS-PROT

SWISS-PROT is a protein sequence and knowledge database that is valued for its high-quality annotation, the usage of standardized nomenclature, direct links to specialized databases and minimal redundancy. The format of SWISS-PROT follows as closely as possible that of the EMBL Nucleotide Sequence Database for standardization purposes.

Core data and annotation

(The following paragraph can be written for explanation of SWISS-PROT format).

The core data, which is mandatory to each SWISS-PROT entry, consists principally of the amino acid sequence, the protein name (description), taxonomic data and citation information. If further information on the protein is available, the entries contain detailed annotation on items such as the function(s) of the protein, enzyme-specific information (catalytic activity, cofactors, metabolic pathway, regulation mechanisms), biologically relevant domains and sites, posttranslational modification(s), molecular weight determined by mass spectrometry, subcellular location(s) of the protein, tissue-specific expression, developmentally-specific expression of the protein, secondary structure, quaternary structure, splice isoform(s), polymorphism(s), similarities to other proteins, use of the protein in a biotechnological process, diseases associated with deficiencies in the protein, use of the protein as a pharmaceutical drug, sequence conflicts, etc. To acquire a maximum of up-to-date knowledge regarding a protein, information is not only obtained from publications reporting new sequence data, but also from review articles with an aim to revise periodically the annotations of families or groups of proteins.

Minimal redundancy

Many sequence databases contain, for a given protein sequence, separate entries which correspond to different literature reports. SWISS-PROT tries to merge all these data in order to minimize the redundancy of the database. Differences between sequencing reports due to splice variants, polymorphisms, disease-causing mutations, experimental sequence modifications or simply sequencing errors are indicated in the feature table of the corresponding SWISS-PROT entry. Splice isoforms may differ considerably from one another, with potentially less than 50% sequence similarity between isoforms.


TrEMBL: A computer-annotated supplement to SWISS-PROT

Why TrEMBL ?

Due to the increased data flow from genome projects to the sequence databases, the SWISS-PROT protein knowledgebase faced a number of challenges in its time- and labor-intensive way of manual database annotation. While it is necessary to maintain the high annotation quality as described above, it is also vital to make sequences available as quickly as possible. To address this, TrEMBL (translation of EMBL nucleotide sequence database) was introduced in 1996.

TrEMBL consists of computer-annotated entries derived from the translation of all coding sequences (CDS) in the nucleotide sequence databases, except for CDS already included in SWISS-PROT. It also contains protein sequences extracted from the literature and protein sequences submitted directly by the user community.
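
The basic step behind such entries, translating a coding sequence (CDS) into protein, can be sketched in Python as follows. This is only an illustration, not the TrEMBL pipeline: it assumes the standard genetic code, a CDS that starts in frame, and upper- or lower-case DNA letters, and it ignores organism-specific code tables and ambiguous bases.

```python
BASES = "TCAG"
# Standard genetic code, listed with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds):
    """Translate an in-frame CDS into a protein string, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for position in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[position:position + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up CDS: start codon, three codons, stop codon.
protein = translate_cds("ATGGCCAAATGGTAA")
print(protein)
```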

Sections of TrEMBL

It is subdivided into two sections: SP-TrEMBL contains sequences that will eventually be incorporated into SWISS-PROT, and REM-TrEMBL contains those that will not. REM-TrEMBL includes immunoglobulins and T-cell receptors, synthetic sequences, patent application sequences, fragments of fewer than 8 amino acids, and coding sequences where there is strong experimental evidence that the sequence does not code for a real protein.

In addition, there is a weekly update to TrEMBL called TrEMBLnew, produced from new nucleotide sequences deposited in the EMBL nucleotide sequence database. At each TrEMBL release, the TrEMBLnew entries are processed: any entries redundant against SWISS-PROT/TrEMBL are merged, and the remainder progress into TrEMBL.



Protein Information Resource (PIR)

The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource to support genomic and proteomic research, and scientific studies. PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers in the identification and interpretation of protein sequence information. In 2002, PIR, along with its international partners EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics), was awarded a grant from NIH to create UniProt, a single worldwide database of protein sequence and function, by unifying the PIR-PSD, Swiss-Prot, and TrEMBL databases.

PIR-PSD

The PIR, along with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), continues to enhance and distribute the PIR-International Protein Sequence Database (PSD), a non-redundant, expertly annotated, fully classified and extensively cross-referenced protein sequence database in the public domain. It contains about 250 000 protein sequences with comprehensive coverage across the entire taxonomic range, including sequences from all the publicly available complete genomes.

Architecture of PIR-PSD

a. Superfamily classification

A unique characteristic of the PIR-PSD is the superfamily/family classification that provides complete and non-overlapping clustering of proteins based on global (end-to-end) sequence similarity. Sequences in the same superfamily share common domain architecture (i.e. have the same number, order and types of domains) and do not differ excessively in overall length.

b. Bibliography submission and literature mapping

Linking protein data to the literature that describes or characterizes the proteins is crucial for increasing the amount of experimentally verified data and improving the quality of protein annotation. Attributing protein annotations to validated experimental sources provides an effective means of avoiding the propagation of errors that may have resulted from large-scale genome annotation.

iProClass (integrated Protein Classification)

The iProClass (integrated Protein Classification) database is designed to provide comprehensive descriptions of all proteins and to serve as a framework for data integration in a distributed networking environment. The database describes family relationships at both global (whole protein) and local (domain, motif, site) levels, as well as structural and functional classifications and features of proteins. The current version consists of more than 270 000 non-redundant PIR-PSD and SWISS-PROT proteins organized with more than 33 000 PIR superfamilies, 100 000 families, 3400 PIR homology and Pfam domains, 1300 ProClass/ProSite motifs, 280 PIR post-translational modification sites, and links to over 40 databases of protein families, structures, functions, genes, genomes, literature and taxonomy.

Directly linked to the iProClass sequence report are two additional PIR databases, ASDB and RESID. PIR-Annotation and Similarity Database (ASDB) lists pre-computed, biweekly updated FASTA neighbors of all PSD sequences with annotation information and graphical displays of sequence similarity matches. PIR-RESID documents over 280 post-translational modifications and links to PSD entries containing either experimentally determined or computationally predicted modifications with evidence tags.

PIR-NREF

As a major resource of protein information, one of PIR's primary aims is to provide a timely and comprehensive collection of all protein sequence data that keeps pace with the genome sequencing projects and contains source attribution and minimal redundancy. The PIR-NREF protein database includes sequences from PIR, SWISS-PROT, TrEMBL, RefSeq, GenPept, PDB and other protein databases. The NCBI taxonomy is used as the ontology for matching source organism names at the species or strain (if known) levels. The NREF report provides source attribution (containing protein IDs, accession numbers and protein names from underlying databases), in addition to taxonomy, amino acid sequence and composite literature data. The composite protein names, including synonyms, alternate names and even misspellings, can be used to assist the ontology development on protein names and the identification of mis-annotated proteins. Related sequences, including identical sequences from different organisms and closely related sequences within the same organism, are also listed. The database consisted of about 800 000 entries as of January 2002 and is updated biweekly.


GENE/NUCLEOTIDE DATABASES

The primary Gene/Nucleotide databases are:
1)      NCBI GenBank
2)      EMBL
3)      DDBJ

NCBI GenBank

Introduction

The GenBank sequence database is an annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced at National Center for Biotechnology Information (NCBI) as part of an international collaboration with the European Molecular Biology Laboratory (EMBL) Data Library from the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). Initially, GenBank was built and maintained at Los Alamos National Laboratory (LANL). In the early 1990s, this responsibility was awarded to NCBI. GenBank and its collaborators receive sequences produced in laboratories throughout the world from more than 100,000 distinct organisms. GenBank continues to grow at an exponential rate, doubling every 10 months.
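
To get a feel for what "doubling every 10 months" implies, the growth factor after t months is 2 raised to the power t/10. The short Python sketch below evaluates this. The 10-month doubling time is taken from the paragraph above; the time spans are arbitrary examples, and the projection holds only as long as the growth stays exponential.

```python
def growth_factor(months, doubling_time=10.0):
    """Factor by which the database grows after `months`, for a fixed doubling time."""
    return 2.0 ** (months / doubling_time)


# Arbitrary example time spans, in months.
for months in (10, 30, 60):
    print(months, growth_factor(months))
```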

About NCBI

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health. The NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation. The NCBI houses genome sequencing data in GenBank and an index of biomedical research articles in PubMed Central and PubMed, as well as other information relevant to biotechnology. All these databases are available online through the Entrez search engine.
NCBI is directed by David Lipman, one of the original authors of the BLAST sequence alignment program.

International Collaboration

In the mid-1990s, the GenBank database became part of the International Nucleotide Sequence Database Collaboration (INSDC) with the EMBL database (European Bioinformatics Institute) and the Genome Sequence Database (GSDB). Subsequently, the GSDB was removed from the Collaboration (by the National Center for Genome Resources, Santa Fe, NM), and DDBJ joined the group. Each database has its own set of submission and retrieval tools, but the three databases exchange data daily so that all three databases should contain the same set of sequences.

Types of Sequences Accepted

NCBI’s GenBank database is a collection of publicly available annotated nucleotide sequences, including mRNA sequences with coding regions, segments of genomic DNA with a single gene or multiple genes, and ribosomal RNA gene clusters.

Accepted data types include, but are not limited to:
  1. Expressed sequence tag (EST) data
  2. Sequence tagged site (STS) data
  3. Genome survey sequence (GSS) data
  4. High throughput genomic (HTG) data
  5. Whole genome shotgun (WGS) data, etc.

Data Exchange

GenBank exchanges data daily with its two partners in the International Nucleotide Sequence Database Collaboration (INSDC): the European Bioinformatics Institute (EBI) of the European Molecular Biology Laboratory (EMBL), and the DNA Data Bank of Japan (DDBJ). Nearly all sequence data are deposited into INSDC databases by the labs that generate the sequences, in part because journal publishers generally require deposition prior to publication so that an accession number can be included in the paper.

Non-Redundancy of Data

GenBank is specifically intended to be an archive of primary sequence data. Thus, to be included, the sequencing must have been conducted by the submitter. Because GenBank is an archival database and includes all sequence data submitted, there are multiple entries for some loci. Just as the primary literature includes similar experiments conducted under slightly different conditions, GenBank may include many sequencing results for the same loci. These different sequencing submissions can reflect genetic variations between individuals or organisms, and analyzing these differences is one way of identifying single nucleotide polymorphisms.

Submission Tools

Sequences can be submitted using BankIt or Sequin (these tools are covered as a separate topic later in this material).


EMBL

Introduction

The EMBL Nucleotide Sequence Database (EMBL-Bank) is a nucleotide sequence database maintained by the European Bioinformatics Institute (EBI). The European Molecular Biology Laboratory (EMBL) itself is a molecular biology research institution supported by 20 European countries, with Australia as an associate member state. EMBL was created in 1974 and is an intergovernmental organization funded by public research money from its member states. Research at EMBL is conducted by approximately 85 independent groups covering the spectrum of molecular biology. The EBI is a hub for bioinformatics research and services, developing and maintaining a large number of databases that are available free of charge to the scientific community.

The European Bioinformatics Institute (EBI) is a centre for research and services in bioinformatics, and is part of the European Molecular Biology Laboratory (EMBL). It is located on the Wellcome Trust Genome Campus in Hinxton, Great Britain. The roots of the EMBL-EBI lie in the EMBL Nucleotide Sequence Data Library (now known as EMBL-Bank), which was established in 1980 at the EMBL laboratories in Heidelberg, Germany, and was the world's first nucleotide sequence database. The original goal was to establish a central computer database of DNA sequences.

Data resources and tools at the EBI

EMBL-Bank, Genomes, Gene Expression, Literature, Sequence Similarity & Analysis, UniProt, Nucleotide Sequences, Molecular Interactions, Taxonomy, Pattern and Motif Searches, ArrayExpress, Protein Sequences, Reactions and Pathways, Ontologies, Structure Analysis, Ensembl, Macromolecular Structures, Protein Families, Text Mining, InterPro, Small Molecules, Enzymes, PDBe, SOAP & REST Web Services, Carbohydrate structures.


DDBJ

Introduction

The DNA Data Bank of Japan (DDBJ) is a nucleotide sequence database.[1] It is located at the National Institute of Genetics (NIG) in Japan and is a member of the International Nucleotide Sequence Database Collaboration (INSDC). It exchanges data daily with the European Molecular Biology Laboratory at the European Bioinformatics Institute and with GenBank at the National Center for Biotechnology Information. Thus the three databanks contain the same data at any given time.

Data Exchange

DDBJ began data bank activities in 1986 at NIG and describes itself as the only nucleotide sequence data bank in Asia. Although DDBJ receives its data mainly from Japanese researchers, it accepts data from contributors in any country. DDBJ is primarily funded by the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT).

Specialized Databases of DDBJ

Genome Information Broker (GIB) collects complete genome sequence data. GIB includes more than 50 bacterial genomes, the yeast genome and the Arabidopsis genome. Human Genomics Studio (HGS) collects whole human genome sequences, assembles all the sequences, maps all the available genes to the chromosomes, and compiles a complete human genome catalog.






BIOINFORMATICS

What Is Bioinformatics?
·        Bioinformatics is the unified discipline formed from the combination of biology, computer science, and information technology.
·        "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information." - Frank Tekaia

Docking and in silico Bioavailability Analysis of CDK6 Flavonol Inhibitors and its Analogues for Acute Lymphoblastic Leukemia

 
S.Prasanth Kumar

NOTE: THIS ARTICLE HAS BEEN ACCEPTED BY JCIB. FULL MANUSCRIPT IS COPYRIGHTED BY JCIB.

ABSTRACT
Acute lymphoblastic leukemia can be controlled by inhibition of the cyclin-dependent kinase 6-Retinoblastoma (CDK6-Rb) pathway. In this study, CDK6, a key regulator that allows progression through G1 and then the G1/S transition of the cell cycle, was used as the target. Docking studies using ArgusLab with known flavonol CDK6 inhibitors and computationally designed analogues showed that analog 5 adopted the most feasible position for interacting with the receptor, with an energy value of -12.3503 kcal/mol. Further, physicochemical properties were computed to evaluate the druglikeness of the molecules. All compounds under study passed this filtering process, and analog 5 possessed better values than the co-crystallized ligand. Second-generation CDK inhibitors in confirmed clinical trials target both CDK4 and CDK6; they inhibit only cell cycle progression and have no apoptosis-inducing activity towards myeloma cells. This study highlights the importance of novel molecules showing selective interaction with CDK6 and the greater potential of selective CDK6 inhibitors in cancer therapy.


Keywords: Acute lymphoblastic leukemia, CDK6-Rb pathway, G1/S phase, cell cycle, in silico bioavailability analysis, physicochemical properties.

Wednesday, 13 April 2011

BIOINFORMATICS PRESENTATIONS IN .PPT FORMAT

Dear Reader/Followers,

Need bioinformatics presentations in .ppt format? Download them from Slideshare under the account "prasanthperceptron" and please acknowledge me. Comments are invited and appreciated.

Regards.
S.Prasanth Kumar.

Screening of medicinal plant compounds against NS5B polymerase of hepatitis C virus (HCV) using molecular docking studies


P. Srinivasan*, A. Sudha, A. Shahul Hameed, S. Prasanth Kumar and M. Karthikeyan,
Department of Bioinformatics, Alagappa University, Karaikudi – 630 003
Received on: 15-09-2010; Revised on: 18-10-2010; Accepted on: 13-12-2010

ABSTRACT
Hepatitis C virus (HCV) is a major cause of acute hepatitis and chronic liver disease, including cirrhosis and liver cancer. Several medicinal plants and their derivatives have been tested experimentally in the treatment of this disease, but there is no single drug that is effective for all HCV genotypes. Currently available drugs show a low level (~40%) of sustained viral response in genotype 1 patients. Non-structural protein 5B (NS5B) polymerase is an RNA-dependent RNA polymerase that is involved in the synthesis and replication of the hepatitis C viral RNA and is used as a potential target for the inhibition of HCV. The NS5B polymerase (PDB ID: 3D5M) was docked with bioactive compounds selected from the medicinal plants Glycyrrhiza glabra, Bupleurum falcatum, Panax ginseng, Zingiber officinale and Phyllanthus niruri. In the present study, the compounds from these plants were analyzed for their inhibitory activity on NS5B polymerase of HCV using the Glide module. The results revealed that the bioactive compounds glycyrrhizin and isoliquiritin from the medicinal plant Glycyrrhiza glabra interacted actively with the enzyme NS5B polymerase, with high docking scores (glycyrrhizin: -9.664 and isoliquiritin: -8.225) and low binding energies (glycyrrhizin: -60.574 and isoliquiritin: -50.978), and formed more H-bond interactions than the co-crystallized ligand.

Key words: NS5B polymerase, Hepatitis C virus, Molecular docking, Medicinal plants, Glide



KINDLY ACCESS THIS ARTICLE IN PDF FORMAT BY CLICKING HERE http://jpronline.info/article/view/5479/2884

Virtual Quantification of Protein Stability using Applied Kinetic and Thermodynamic Parameters

S. Prasanth Kumar1 and M. Meenachi2
1Bioinformatics Laboratory, Department of Botany, University School of Sciences, Gujarat University, Ahmedabad - 380 009, Gujarat, India.
2Department of Bioinformatics, Achariya Arts and Science College, Villianur, Pondicherry - 605 110, India.


Abstract
Protein stability, the most important aspect of molecular dynamics and simulations, requires sophisticated molecular biology instrumentation to analyze its kinetic and thermodynamic background. Existing sequence- and structure-based programs on protein stability rely only on single point mutations and sequence optimality. The energy distribution conferred by each amino acid essentially paves the way for understanding protein stability. To the best of our knowledge, Protein Stability is the first program of its kind, developed to explore the energy requirement of each amino acid in the protein sequence derived from various applied kinetic and thermodynamic quantities. The algorithm is strongly dependent both on kinetic quantities, such as atomic solvation energies and solvent accessible surface area, and on thermodynamic quantities, viz. enthalpy, entropy, heat capacity, etc. The hydrophobicity pattern of the protein was considered an important component of protein stabilization. A program was developed to provide the energy distribution of a protein and its overall stability.

Keywords: Molecular modeling, kinetics, thermodynamics, protein stability.

PLEASE NOTE: FULL MANUSCRIPT IS COPYRIGHTED BY IIOAB JOURNAL

Epitope-based immunoinformatics and molecular docking studies of Nucleocapsid protein (NP) and Ovarian Tumor (OTU) domain of Crimean-Congo haemorrhagic fever virus (CCHFV)


Pappu Srinivasan*1, Sivakumar Prasanth Kumar1, Muthusamy Karthikeyan1, Jayaram Jeyakanthan1, Yogesh T. Jasrai2, Himanshu A. Pandya2, Rakesh M. Rawal3 and Saumya K. Patel2

1Department of Bioinformatics, Alagappa University, Tamil Nadu, India.
2Department of Botany, University School of Sciences, Gujarat University, Gujarat, India.
3Division of Medicinal Chemistry and Pharmacogenomics, Department of Cancer Biology, The Gujarat Cancer & Research Institute (GCRI), Gujarat, India.

Download Provisional Full Text Article here:

http://www.frontiersin.org/Journal/Abstract.aspx?s=1267&name=bioinformatics%20and%20computational%20biology&ART_DOI=10.3389/fgene.2011.00072

Citation: Srinivasan P, Prasanth Kumar S, Karthikeyan M, Jeyakanthan J, Jasrai YT, Pandya HA, Rawal RM and Patel SK (2011) Epitope-based immunoinformatics and molecular docking studies of Nucleocapsid protein (NP) and Ovarian Tumor (OTU) domain of Crimean-Congo haemorrhagic fever virus (CCHFV). 2:72. doi:10.3389/fgene.2011.00072


Abstract

Crimean-Congo hemorrhagic fever virus (CCHFV), a fatal human pathogen, is transmitted to humans by tick bite or by exposure to infected blood or tissues of infected livestock. The CCHFV genome consists of three RNA segments, namely S, M, and L. The unusually large viral L protein has an ovarian tumor (OTU) protease domain located at the N terminus. It is likely that the protein is autoproteolytically cleaved to generate the active viral L polymerase with additional functions. Identification of the epitope regions of the virus is important for diagnosis, phylogeny studies and drug discovery. Early diagnosis and treatment of CCHF infection are critical to the survival of patients and the control of the disease. In this study, we undertook different in silico approaches using molecular docking and immunoinformatic tools to predict epitopes that can be helpful for vaccine design. Small molecule ligands against the OTU domain and the protein-protein interaction between a viral and a host protein were studied using docking tools.
Keywords: CCHFV, Crimean-Congo Virus, OTU domain, Polymerase, in silico, ligand docking