668. BlockSearch: Elucidation of Unknown Protein Functions
by Rainer Fuchs, Glaxo Research Institute,
Research Triangle Park, North Carolina 27709
With the growth of large-scale automated
sequencing projects, researchers increasingly
encounter protein-coding sequences whose
functions are not known a priori. While the
databases against which such sequences can be
checked have grown steadily, there has been a
concomitant growth of redundancy and noise,
which can complicate the identification of more
distantly related proteins.
As a consequence, alternative methods for the
elucidation of unknown protein functions have been
developed. Possibly one of the best known is the
"motif" approach. These motifs are descriptions
of short, conserved sequence regions which are
characteristic of a protein family. If some part
of a new sequence matches one of these motifs,
this strongly suggests that the new protein is a
member of the family identified by the motif.
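A minimal sketch of the motif idea follows, assuming a simplified pattern
syntax loosely in the spirit of PROSITE patterns: an uppercase letter matches
that residue, `x` matches any residue, and a bracketed set such as `[DN]`
matches any one of the listed residues. The syntax and the function name
`motif_find` are hypothetical illustrations, not part of BlockSearch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Try to match `motif` starting at the first character of `s`.
   Returns 1 on a match, 0 otherwise. */
static int match_here(const char *s, const char *motif)
{
    while (*motif != '\0') {
        if (*s == '\0')
            return 0;                          /* ran out of sequence */
        if (*motif == '[') {
            const char *close = strchr(motif, ']');
            assert(close != NULL);             /* malformed motif */
            /* the residue must be one of the listed alternatives */
            if (memchr(motif + 1, *s, (size_t)(close - motif - 1)) == NULL)
                return 0;
            motif = close + 1;
        } else {
            /* 'x' is a wildcard; any other letter must match exactly */
            if (*motif != 'x' && *motif != *s)
                return 0;
            motif++;
        }
        s++;
    }
    return 1;
}

/* Return the 0-based offset of the first match of `motif` in `seq`,
   or -1 if the motif does not occur. */
int motif_find(const char *seq, const char *motif)
{
    for (size_t i = 0; seq[i] != '\0'; i++)
        if (match_here(seq + i, motif))
            return (int)i;
    return -1;
}
```

For example, the pattern `Cx[DN]xxC` matches `MKCANGGCL` at offset 2,
while `MKCAYGGCL` yields -1 because Y is not in the set `[DN]`.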
BlockSearch is another, more quantitative method
based on what is called "profile analysis." A
profile is essentially a frequency matrix derived
from an alignment of related sequences, in which
each column of the matrix gives the frequency of
occurrence of each amino acid at the
corresponding position of the alignment. High
similarity between some region of a new sequence
and a profile indicates a possible biological
relationship between this sequence and the family
characterized by the profile.
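To make the profile idea concrete, here is a minimal sketch that counts
residues column by column in an equal-length alignment and then slides the
resulting frequency matrix along a new sequence. Scoring a window by simply
summing the matching column frequencies, and ignoring gaps in the query, are
simplifying assumptions for illustration; `build_profile` and `best_window`
are hypothetical names, not BlockSearch's own code.

```c
#include <assert.h>
#include <string.h>

#define NAA 20                     /* number of standard amino acids */
static const char AMINO[] = "ACDEFGHIKLMNPQRSTVWY";

/* Column index of residue c, or -1 for gaps and unknown symbols. */
int aa_index(char c)
{
    const char *p = (c != '\0') ? strchr(AMINO, c) : NULL;
    return p ? (int)(p - AMINO) : -1;
}

/* Build a profile from n aligned sequences of equal length `width`:
   freq[pos][a] is the fraction of sequences with residue a at column pos. */
void build_profile(const char *aln[], int n, int width, double freq[][NAA])
{
    assert(n > 0 && width > 0);
    for (int pos = 0; pos < width; pos++) {
        int count[NAA] = {0};
        for (int s = 0; s < n; s++) {
            int a = aa_index(aln[s][pos]);
            if (a >= 0)            /* gaps ('-') are simply not counted */
                count[a]++;
        }
        for (int a = 0; a < NAA; a++)
            freq[pos][a] = (double)count[a] / n;
    }
}

/* Slide the profile along seq and return the start of the window with the
   highest summed column frequency (-1 if seq is shorter than width). */
int best_window(const char *seq, int width, double freq[][NAA],
                double *best_score)
{
    int len = (int)strlen(seq), best = -1;
    *best_score = -1.0;
    for (int start = 0; start + width <= len; start++) {
        double score = 0.0;
        for (int pos = 0; pos < width; pos++) {
            int a = aa_index(seq[start + pos]);
            if (a >= 0)
                score += freq[pos][a];
        }
        if (score > *best_score) {
            *best_score = score;
            best = start;
        }
    }
    return best;
}
```

With the four-sequence alignment ACDE, ACDE, ACNE, GCDE, column 0 receives
frequency 0.75 for A and 0.25 for G, and the best window in MMACDEMM starts
at offset 2 with a summed score of 3.5.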
Profiles are mainly based on global alignments
and allow for gaps; however, it has been
demonstrated that even short ungapped alignments
(so-called BLOCKS) can yield very effective
scoring matrices. For this approach to be useful,
one must have access to a large database of such
blocks. Such a database exists; it is identified
in the program documentation.
This system is written in very clean "C" and has
been successfully run on a number of Unix-based
platforms.
Lines of Code: 2500
C Compiler