QCPE

THIS INFORMATION IS OBSOLETE AND IS PROVIDED ONLY FOR ITS HISTORICAL VALUE



668. BlockSearch: Elucidation of Unknown Protein Functions

by Rainer Fuchs, Glaxo Research Institute, Research Triangle Park, North Carolina 27709

With the growth of large-scale automated sequencing projects, researchers increasingly encounter protein-coding sequences whose functions are not known a priori. While the databases against which such sequences can be checked have grown steadily, there has been a concomitant growth in redundancy and noise, which can make it harder to identify more distantly related proteins.

As a consequence, alternative methods for elucidating unknown protein functions have been developed. Possibly the best known is the "motif" approach. Motifs are descriptions of short, conserved sequence regions that are characteristic of a protein family. If some part of a new sequence matches one of these motifs, this strongly suggests that the new protein is a member of the family identified by the motif.
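To illustrate how motif matching works, here is a minimal C sketch. It assumes a hypothetical, simplified pattern notation: a capital letter matches that residue, `x` matches any residue, and a bracketed set such as `[ST]` matches any one of the listed residues. The notation is loosely modelled on PROSITE-style patterns but is not the syntax of any particular motif database, and the function names are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `pattern` matches seq starting at seq[pos], 0 otherwise.
 * Hypothetical notation: letter = that residue, 'x' = any residue,
 * "[ST]" = any one residue from the bracketed set. */
static int motif_match_at(const char *seq, size_t pos, const char *pattern)
{
    const char *p = pattern;
    const char *s = seq + pos;

    while (*p != '\0') {
        if (*s == '\0')
            return 0;                   /* sequence ended before the pattern */
        if (*p == 'x') {                /* wildcard position */
            p++;
        } else if (*p == '[') {         /* residue set, e.g. [ST] */
            const char *close = strchr(p, ']');
            assert(close != NULL);      /* malformed pattern: missing ']' */
            if (memchr(p + 1, *s, (size_t)(close - p - 1)) == NULL)
                return 0;
            p = close + 1;
        } else {                        /* literal residue */
            if (*p != *s)
                return 0;
            p++;
        }
        s++;                            /* each pattern element consumes one residue */
    }
    return 1;
}

/* Return the first position in seq where the motif matches, or -1. */
long motif_find(const char *seq, const char *pattern)
{
    size_t n = strlen(seq);
    for (size_t i = 0; i < n; i++)
        if (motif_match_at(seq, i, pattern))
            return (long)i;
    return -1;
}
```

For example, `motif_find("MKCASTGC", "Cx[ST]")` returns 2, because the cysteine at position 2 is followed by one arbitrary residue and then a serine. A motif match is all-or-nothing, which is the limitation that the more quantitative profile approach addresses.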

BlockSearch is a more quantitative method based on what is called "profile analysis." A profile is essentially a frequency matrix derived from an alignment of related sequences, in which each column reflects the frequency of occurrence of each amino acid at the corresponding position of the alignment. High similarity between some region of a new sequence and a profile indicates a possible biological relationship between that sequence and the family characterized by the profile.
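The frequency matrix described above can be sketched in C as follows. This is a minimal illustration that assumes an ungapped alignment of equal-length sequences over the 20 standard one-letter amino-acid codes. The function and array names are hypothetical and are not taken from BlockSearch.

```c
#include <assert.h>
#include <string.h>

#define NUM_AA 20
static const char AMINO_ACIDS[] = "ACDEFGHIKLMNPQRSTVWY";

/* Map a one-letter code to 0..19, or -1 for anything else (gaps, X, ...). */
static int aa_index(char c)
{
    const char *hit = strchr(AMINO_ACIDS, c);
    return (c != '\0' && hit != NULL) ? (int)(hit - AMINO_ACIDS) : -1;
}

/* Fill freq[col * NUM_AA + a] with the fraction of the nseq aligned
 * sequences that carry amino acid a in column col. Every sequence must
 * have length ncol. Non-standard characters count toward nseq but are
 * not assigned to any amino acid. */
void build_frequency_profile(const char *const *aln, size_t nseq,
                             size_t ncol, double *freq)
{
    assert(nseq > 0);
    for (size_t i = 0; i < ncol * NUM_AA; i++)
        freq[i] = 0.0;

    /* First accumulate raw counts per column ... */
    for (size_t s = 0; s < nseq; s++) {
        assert(strlen(aln[s]) == ncol);
        for (size_t c = 0; c < ncol; c++) {
            int a = aa_index(aln[s][c]);
            if (a >= 0)
                freq[c * NUM_AA + a] += 1.0;
        }
    }
    /* ... then convert counts to frequencies. */
    for (size_t i = 0; i < ncol * NUM_AA; i++)
        freq[i] /= (double)nseq;
}
```

For the four-sequence alignment `ACD`, `ACE`, `GCD`, `ACD`, column 0 gives A a frequency of 0.75 and G a frequency of 0.25, while column 1 gives C a frequency of 1.0. Storing the matrix as one flat array indexed by `col * NUM_AA + a` keeps the sketch free of dynamic allocation.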

Whereas profiles are mainly based on global alignments and allow for gaps, it has been demonstrated that even short ungapped alignments (so-called BLOCKS) can yield very effective scoring matrices. For this approach to be useful, one must have access to a large database of such blocks; such a database exists and is identified in the program documentation.

The system is written in very clean C and has been run successfully on a number of Unix-based platforms.

Lines of Code: 2500
Requires: C compiler



Modified: Fri Nov 20 02:29:31 2009 GMT