QCPE

THIS INFORMATION IS OBSOLETE AND IS PROVIDED ONLY FOR ITS HISTORICAL VALUE


668. BlockSearch: Elucidation of Unknown Protein Functions

by Rainer Fuchs, Glaxo Research Institute, Research Triangle Park, North Carolina 27709

With the growth of large-scale automated sequencing projects, researchers increasingly encounter protein-coding sequences whose functions are not known a priori. While the databases against which such sequences can be checked have grown steadily, there has been a concomitant growth of redundancy and noise, which can complicate the identification of more distantly related proteins.

As a consequence, alternative methods for the elucidation of unknown protein functions have been developed. Possibly one of the best known is the "motif" approach. These motifs are descriptions of short, conserved sequence regions which are characteristic of a protein family. If some part of a new sequence matches one of these motifs, it strongly suggests that the new protein is a member of the family identified by the motif.
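The motif idea can be illustrated with a minimal C sketch. It is not taken from BlockSearch or any motif database: the motif representation (one string of allowed residues per position, with "x" meaning any residue), the function name find_motif, and the example pattern are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example motif, in the style [AG]-x(4)-G-K-[ST]:
   one string of allowed residues per position, "x" accepts anything. */
const char *const EXAMPLE_MOTIF[] = { "AG", "x", "x", "x", "x", "G", "K", "ST" };
#define EXAMPLE_MOTIF_LEN 8

/* Does a single residue satisfy a single motif position? */
static int position_matches(const char *allowed, char residue)
{
    if (strcmp(allowed, "x") == 0)
        return 1;
    return residue != '\0' && strchr(allowed, residue) != NULL;
}

/* Return the 0-based offset of the first motif match in seq, or -1 if none. */
long find_motif(const char *seq, const char *const motif[], size_t motif_len)
{
    size_t seq_len, start, i;

    assert(seq != NULL && motif != NULL && motif_len > 0);
    seq_len = strlen(seq);

    /* Try every window of motif_len residues along the sequence. */
    for (start = 0; start + motif_len <= seq_len; start++) {
        for (i = 0; i < motif_len; i++) {
            if (!position_matches(motif[i], seq[start + i]))
                break;
        }
        if (i == motif_len)
            return (long)start;
    }
    return -1;
}
```

A match is all-or-nothing: a single disallowed residue rejects the window, which is why the profile approach described next is considered more quantitative.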

BlockSearch is based on another, more quantitative method known as "profile analysis." A profile is essentially a frequency matrix derived from an alignment of related sequences, in which each column of the matrix reflects the frequency of occurrence of each amino acid at the corresponding position of the alignment. High similarity of some region of a new sequence to a profile indicates a possible biological relationship between this sequence and the family characterized by the profile.
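The frequency matrix described above could be built roughly as follows. This is a sketch under stated assumptions, not the BlockSearch code: the 20-letter alphabet ordering, the names residue_index and build_profile, and the choice to skip gap or unknown characters are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_RESIDUES 20
static const char ALPHABET[] = "ACDEFGHIKLMNPQRSTVWY";

/* Map a one-letter amino acid code to 0..19, or -1 for anything else. */
int residue_index(char c)
{
    const char *p;

    if (c == '\0')
        return -1;
    p = strchr(ALPHABET, c);
    return p ? (int)(p - ALPHABET) : -1;
}

/* Fill freq[col][r] with the fraction of aligned sequences that have
   residue r in column col. Characters outside the alphabet (for example
   gap symbols) are skipped and do not count toward the column total. */
void build_profile(const char *const aligned[], size_t n_seqs,
                   size_t n_cols, double freq[][N_RESIDUES])
{
    size_t col, s;
    int r;

    for (col = 0; col < n_cols; col++) {
        size_t total = 0;

        for (r = 0; r < N_RESIDUES; r++)
            freq[col][r] = 0.0;

        for (s = 0; s < n_seqs; s++) {
            assert(strlen(aligned[s]) >= n_cols);
            r = residue_index(aligned[s][col]);
            if (r >= 0) {
                freq[col][r] += 1.0;
                total++;
            }
        }

        /* Convert raw counts to frequencies. */
        if (total > 0) {
            for (r = 0; r < N_RESIDUES; r++)
                freq[col][r] /= (double)total;
        }
    }
}
```

For the four aligned fragments GKS, GKT, AKS and GKS, the first column would hold 0.75 for G and 0.25 for A, and the second column 1.0 for K.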

Whereas profiles are mainly based on global alignments and allow for gaps, it has been demonstrated that even short ungapped alignments (so-called BLOCKS) can yield very effective scoring matrices. For this approach to be useful, one must have access to a large database of such blocks. This database exists and is identified in the program documentation.
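Because a block is ungapped, scoring a new sequence against it reduces to sliding a fixed-width window along the sequence and summing one matrix entry per position. The sketch below shows that idea only. How BlockSearch actually derives and applies its scoring matrices is not stated here, so the integer score matrix, its flat row-major layout, and the names BlockHit and best_block_hit are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_RESIDUES 20
static const char ALPHABET[] = "ACDEFGHIKLMNPQRSTVWY";

/* Map a one-letter amino acid code to 0..19, or -1 for anything else. */
static int residue_index(char c)
{
    const char *p;

    if (c == '\0')
        return -1;
    p = strchr(ALPHABET, c);
    return p ? (int)(p - ALPHABET) : -1;
}

/* Best placement of a block along a sequence; offset is -1 if the
   sequence is shorter than the block. */
typedef struct {
    long offset;
    int score;
} BlockHit;

/* matrix holds width rows of N_RESIDUES scores, stored row by row:
   matrix[i * N_RESIDUES + r] is the score of residue r at block position i. */
BlockHit best_block_hit(const char *seq, const int *matrix, size_t width)
{
    BlockHit best = { -1, 0 };
    size_t len, start, i;

    assert(seq != NULL && matrix != NULL && width > 0);
    len = strlen(seq);

    for (start = 0; start + width <= len; start++) {
        int score = 0;

        /* No gaps: block position i always pairs with residue start + i. */
        for (i = 0; i < width; i++) {
            int r = residue_index(seq[start + i]);
            if (r >= 0)
                score += matrix[i * N_RESIDUES + r];
        }
        /* Keep the first window that reaches the highest score. */
        if (best.offset < 0 || score > best.score) {
            best.offset = (long)start;
            best.score = score;
        }
    }
    return best;
}
```

In a real search this would be repeated for every block in the database, and the top-scoring hits would then be judged against the scores expected by chance.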

The program is written in clean C and has been run successfully on a number of Unix-based platforms.

Lines of Code: 2500
Requires: C compiler


Modified: Fri Nov 20 02:29:31 2009 GMT