PRASANTH VIRTUAL BIOINFO LAB: PATTERN SEARCHING DATABASES

Patterns are regular expressions that match short sequence motifs, usually ones with biological meaning. These patterns serve as discriminators that help to identify a protein’s family, e.g. a zinc finger motif. Databases that derive patterns from protein superfamilies or families are known as protein pattern databases.

Within a single conserved region (motif), the sequence information may be reduced to a consensus expression (a regular expression), often simply referred to as a pattern.

PROSITE

PROSITE is hosted on ExPASy. It is an annotated collection of motif descriptors dedicated to the identification of protein families and domains. The motif descriptors used in PROSITE are either patterns or profiles, both derived from multiple alignments of homologous sequences. This gives these descriptors the notable advantage of identifying distant relationships between sequences that would have passed unnoticed based solely on pairwise sequence alignment.

The core of the PROSITE database is composed of two text files:

• PROSITE.DAT is a computer-readable file that contains all the information needed by programs that use PROSITE to scan sequences for occurrences of patterns or profiles. For each entry, the file includes statistics on the number of hits obtained while scanning the SWISS-PROT protein database for that pattern or profile. Cross-references to the corresponding SWISS-PROT entries, as well as to matched sequences from the PDB 3D-structure database, are also provided.

• PROSITE.DOC contains textual information that fully documents each pattern or profile.
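To make the idea of a computer-readable entry concrete, here is a minimal Python sketch of reading such a file. It assumes the flat-file layout in which each line starts with a two-letter code (ID, AC, DE, PA, DO and so on), the data begins at column 6, and each entry ends with a line containing "//". The embedded record is an abbreviated illustration, and the function name parse_prosite_dat is hypothetical, so check the layout against the PROSITE user manual before relying on it.

```python
# Abbreviated sample record, modelled on the PROSITE.DAT flat-file layout
# (assumption: two-letter line code, data from column 6, "//" ends an entry).
SAMPLE = """
ID   ASN_GLYCOSYLATION; PATTERN.
AC   PS00001;
DE   N-glycosylation site.
PA   N-{P}-[ST]-{P}.
DO   PDOC00001;
//
"""


def parse_prosite_dat(text):
    """Return a list of entries; each entry maps a line code to its values."""
    entries = []
    current = {}
    for line in text.splitlines():
        if line.startswith("//"):
            # End of the current entry.
            if current:
                entries.append(current)
            current = {}
            continue
        code, value = line[:2], line[5:].strip()
        if not code.strip():
            continue  # skip blank lines
        # A code can occur on several lines, so collect values in a list.
        current.setdefault(code, []).append(value)
    return entries


if __name__ == "__main__":
    for entry in parse_prosite_dat(SAMPLE):
        accession = entry["AC"][0].rstrip(";")
        pattern = "".join(entry.get("PA", []))
        print(accession, pattern)
```

Values are kept as lists because a long pattern may continue over more than one PA line; joining the list restores the full pattern string.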

PROSITE patterns

In some cases the sequence of an unknown protein is too distantly related to any protein of known structure to detect its resemblance by pairwise sequence alignment. However, relationships can be revealed by the occurrence in its sequence of a particular cluster of residue types, which is variously known as a pattern, motif, signature or fingerprint.

These motifs, typically around 10 to 20 amino acids in length, arise because specific residues and regions thought or proved to be important to the biological function of a group of proteins are conserved in both structure and sequence during evolution. These biologically significant regions or residues are generally:

• Enzyme catalytic sites.

• Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin, etc.).

• Amino acids involved in binding a metal ion.

• Cysteines involved in disulphide bonds.

• Regions involved in binding a molecule (ADP/ATP, GDP/GTP, calcium, DNA, etc.).

Because the sequences of biologically meaningful motifs are evolutionarily conserved, a multiple alignment of them can be reduced to a consensus expression called a regular expression or pattern. Each position of such a pattern can be occupied by any residue from a specified set of acceptable residues, and can also be repeated a variable number of times within a specified range. At strictly conserved positions only one particular amino acid is accepted, whereas at other positions several amino acids with similar physicochemical properties can be accepted. It is also possible to specify which amino acids are incompatible with a given position, and conserved residues can be separated by gaps of variable length.
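The rules above map closely onto ordinary regular expressions. The following Python sketch translates a pattern written in PROSITE-style notation into a regex for the standard re module. It assumes the usual conventions: elements separated by "-", x for any residue, [..] for accepted residues, {..} for excluded residues, (n) or (n,m) for repeats, "<" and ">" for sequence ends, and a closing period. The function names prosite_to_regex and find_matches are hypothetical, and rarer constructs such as ">" inside square brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                  # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # [..] sets and single letters are already valid regex.
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (0-based start, matched text) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))        # N[^P][ST][^P]
    print(find_matches("N-{P}-[ST]-{P}.", "AANGSAA"))  # [(2, 'NGSA')]
```

Note that re.finditer reports only non-overlapping matches, so a scanner meant to list every possible occurrence would need a lookahead-based search instead.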

Friday, 15 April 2011
