QSAR - Canadian Bioinformatics Help Desk Newsletter -- March 4, 2004

From: ian-$-redpoll.pharmacy.ualberta.ca
Date: Thu, 4 Mar 2004 17:32:21 -0700






Bioinformatics Platform: A GenomePrairie Project

CBHD Newsletter
Issue 9 - March 4, 2004



Online version of this newsletter:
http://gchelpdesk.ualberta.ca/news/04mar04/cbhd_news_04mar04.php

Welcome to the ninth issue of the Canadian Bioinformatics Help Desk (CBHD) Newsletter. Back issues of our newsletter can be viewed at our newsletter archive site (http://gchelpdesk.ualberta.ca/news/news.php). Our circulation base has reached 1070 subscribers. In this issue's Bioinformatics Profile, we feature an article on Faster, Higher, Stronger Sequence Database Searches—homology detection with higher scores and stronger evidence. In our Commercial Software Spotlight we feature Phoretix 1D, a 1D gel image analysis tool, distributed in Canada by United Bioinformatica Inc. This biweekly newsletter is intended to keep Genome Canada researchers and other Help Desk users informed about new software, events, job postings, conferences, training opportunities, interviews, publications, awards, and other newsworthy items concerning bioinformatics, genomics, and proteomics. The CBHD newsletter is a mandated service of the Help Desk and we hope to provide enough useful content to keep you interested and informed. If you know of anyone who would be interested in receiving future issues of this newsletter or contributing content to the newsletter, please email us at ian---gchelpdesk.ualberta.ca. To subscribe to this newsletter, click here; to unsubscribe from this newsletter, send an email message to ian(a)gchelpdesk.ualberta.ca with the word "unsubscribe" in the subject line or body of the message.

1) Bioinformatics Profile

Sequence Database Searches: Faster, Higher, Stronger!
Homology Detection: Higher Scores, Stronger Evidence

Feature article contributed by Paul Gordon

The key to a successful database analysis of DNA and protein sequences is to maximize two search result characteristics: sensitivity and selectivity. Improved sensitivity means that fewer true positive matches, i.e. genuine functional domains, will be missed in the results. Improved selectivity means that fewer false positives, i.e. mistakenly identified functional domains, will be reported in the results. Software such as the BLAST suite of programs relies on assumptions about the nature of the sequence similarities to take computational shortcuts, and it does this fabulously well. The results from these searches can produce clear homology candidates. But what about those results that just give you weak hits and hypothetical proteins? To detect more distant homologies, or to find matches in error-prone sequences such as ESTs, these shortcuts cannot always be taken, and the search space grows exponentially. To identify the maximum number of functional domains correctly, one must use a whole range of sequence search tools. These tools include more sensitive pairwise methods such as Smith-Waterman searches and intron-spanning GeneBLASTs, plus Hidden Markov Models such as those found in Pfam and other InterPro domain family databases. Figure 1 illustrates the relative sensitivity and selectivity of various search methods.




Figure 1. Comparison of sensitivity and selectivity of various sequence search methods. Blue denotes a software method, red denotes a hardware accelerated method.
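
As a rough illustration of the two measures discussed above, the sketch below scores a list of search hits against a set of known true matches. The function name, the sample identifiers, and the choice to express selectivity as the fraction of reported hits that are correct are all assumptions made for this example; none of them comes from the tools mentioned in this article.

```python
def sensitivity_and_selectivity(reported_hits, true_matches):
    """Return (sensitivity, selectivity) for one database search.

    sensitivity: fraction of the true matches that were reported
    selectivity: fraction of the reported hits that are true matches
                 (an assumed definition for this sketch)
    """
    reported = set(reported_hits)
    truth = set(true_matches)
    true_positives = len(reported & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    selectivity = true_positives / len(reported) if reported else 0.0
    return sensitivity, selectivity


# Hypothetical domain identifiers, for illustration only.
truth = {"domA", "domB", "domC", "domD"}
fast_search = ["domA", "domB", "domX"]  # misses two domains, one false hit
sensitive_search = ["domA", "domB", "domC", "domD", "domX"]

print(sensitivity_and_selectivity(fast_search, truth))
print(sensitivity_and_selectivity(sensitive_search, truth))
```

In this made-up case the more sensitive search recovers every true domain while still reporting one false hit, which is the kind of trade-off Figure 1 compares across methods.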

These more sensitive and selective methods will generally yield higher scores (and therefore lower E-values), and produce stronger evidence in non-obvious cases, than the ubiquitous BLAST. In particular, the frameshift and Hidden Markov Model methods may find matches that elude the standard software methods because of BLAST's limitations.
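
The inverse relationship between alignment score and E-value can be sketched with the Karlin-Altschul formula, E = K * m * n * exp(-lambda * S), on which BLAST-style statistics are based. The values of K and lambda below are placeholders chosen only for illustration, as are the query and database lengths; real values depend on the scoring matrix and gap penalties in use.

```python
import math


def expected_hits(raw_score, query_len, db_len, lam=0.3, k=0.1):
    """Karlin-Altschul E-value: E = K * m * n * exp(-lambda * S).

    lam and k are illustrative placeholders, not values taken from
    any particular scoring matrix.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Hypothetical search: a 300-residue query against 1e9 database residues.
for score in (40, 80, 120):
    e_value = expected_hits(score, query_len=300, db_len=1_000_000_000)
    print(f"score {score:>3}: E-value {e_value:.2e}")
```

Because the score sits in a negative exponent, each increase in score shrinks the E-value exponentially, so a higher score is stronger evidence that a match did not arise by chance.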

Hardware Solutions: Faster Results

The University of Calgary, as part of the Genome Canada Bioinformatics Platform Project, hosts specialized systems for sequence database searches of all the types described above. These systems are the Paracel® GeneMatcher/BlastMachine and the TimeLogic™ Decypher®. Full descriptions of these machines are available here. Finding these more difficult matches requires much more computation, which is why we use special systems. Hardware acceleration makes it practical to apply these methods to large-scale datasets. In Figure 2, the plot of runtimes for comparable hardware and software methods illustrates the vast performance improvement achieved by the Paracel® (GeneMatcher and BlastMachine) and TimeLogic™ (Decypher®) systems for both the software and hardware methods.




Figure 2. Time-to-completion comparison of the original methods and the methods available, in batch mode, at the University of Calgary. For TBLASTX the improvement is 20-fold; for the other methods it is at least 100-fold.


Each system has its particular strengths, as summarized in the table below. With this knowledge, we have the ability to maximize throughput, as well as sensitivity and selectivity, by selecting the proper methods on the proper machines for the given input data.


Sequence types         Search Method                 Machine
DNA/protein            BLAST/PSI-BLAST               BlastMachine
                       TeraBLAST                     Decypher
ESTs vs. genomic       GeneBLAST                     Decypher
                       Smith-Waterman, semi-global   GeneMatcher
ESTs vs. protein       S-W Frame                     GeneMatcher
DNA/protein vs. HMMs   HMM Search                    Decypher
ESTs vs. HMMs          HMM Frame Search              Decypher
HMMs vs. genomic       GeneWise                      GeneMatcher

Figure 3. Most appropriate sensitive search type for given data.
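
For anyone scripting batch submissions, the recommendations in the table above could be captured as a simple lookup. The following is a minimal sketch: the dictionary and function names are hypothetical, and it assumes that table rows without a sequence type belong to the sequence type listed on the row above them.

```python
# Recommendations transcribed from the table above:
# sequence types -> list of (search method, machine) options.
RECOMMENDED_SEARCHES = {
    "DNA/protein": [
        ("BLAST/PSI-BLAST", "BlastMachine"),
        ("TeraBLAST", "Decypher"),
    ],
    "ESTs vs. genomic": [
        ("GeneBLAST", "Decypher"),
        ("Smith-Waterman, semi-global", "GeneMatcher"),
    ],
    "ESTs vs. protein": [("S-W Frame", "GeneMatcher")],
    "DNA/protein vs. HMMs": [("HMM Search", "Decypher")],
    "ESTs vs. HMMs": [("HMM Frame Search", "Decypher")],
    "HMMs vs. genomic": [("GeneWise", "GeneMatcher")],
}


def recommend(sequence_types):
    """Return the (method, machine) options for a kind of comparison.

    Raises KeyError for comparisons that the table does not cover.
    """
    return RECOMMENDED_SEARCHES[sequence_types]


for method, machine in recommend("ESTs vs. genomic"):
    print(f"{method} on {machine}")
```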


Availability

Unique Resources

In addition to being the only facility in Canada with this combination of resources, the University of Calgary facility hosts a unique in-house HMM database based on the popular NCBI Clusters of Orthologous Groups database. This database is the largest curated HMM set we know of at over 9000 prokaryotic and eukaryotic gene models. If you're still getting "hypothetical protein" from your PSI-BLAST responses, try searching COGSHMM on the Decypher system.

Casual Usage

Web interfaces for interactive submission by the general public exist for the Decypher and BlastMachine/GeneMatcher solutions.

Batch Jobs

More in-depth coverage of how to use these resources is available at the University of Calgary web site (http://magpie.ucalgary.ca/search_resources.xhtml). Funded through Genome Canada and WestGrid grants, these machines are accessible for high-throughput usage by Canadian academic researchers. For further details, please contact the Genome Canada Bioinformatics Platform Project Principal Investigator, Dr. Christoph Sensen, csensen . ucalgary.ca.


2) Software Spotlight

Featured Commercial Software: Phoretix 1D for 1D Gel Image Analysis

Feature article contributed by Russell Trischuk, UBI Application Scientist, r.trischuk[*]usask.ca

In the rapidly changing world of molecular biology, high-throughput analysis is a fact of life. As a result, there is a need for powerful, comprehensive, and user-friendly gel analysis software. The answer to this requirement is Phoretix 1D Advanced, by Nonlinear Dynamics. Phoretix 1D is a complete 1D gel analysis package capable of analyzing any gel (DNA or protein) separated in a single dimension, including PCR-based procedures (AFLP and RAPD), RFLP, SSCP, immunoblots, protein purification, and the analysis of post-translational modifications. Phoretix 1D is a highly automated, user-friendly, accurate software package that provides the user with an extensive list of analysis applications.

This software is capable of:

[Images: Multiple Lanes; Multi Tiered; Pixel Intensity; RF Lines Feature; Band ID-Matching Feature]


Phoretix 1D is also available in a professional package (Phoretix 1D Pro) that couples the power of Phoretix 1D with a database that allows the user to create multiple libraries enabling the analysis of large cross-gel studies and projects.

If you are involved in any 1D gel-based projects and require an accurate, reliable, and user-friendly tool for the analysis and management of your data, then Phoretix 1D Advanced or Professional should be your package of choice. For further information, trials, etc., please contact the Canadian distributor, UBI, at 866 202 2100, or by email at info^^ubi.ca



3) What's New?



01 Mar 2004 Proteome Analyst Subcellular Localization Paper - Dr. David Wishart, Director of the CBHD, along with other researchers in the Department of Computing Science at the University of Alberta, recently co-authored a paper that appeared in the March 1 issue of Bioinformatics [ABSTRACT] [PDF]. If you missed our Bioinformatics Profile article on the Proteome Analyst Subcellular Localization Prediction Server, please visit http://gchelpdesk.ualberta.ca/news/22jan04/cbhd_news_22jan04.php#profile

01 Mar 2004 Chicken Genome Assembled - The first draft of the chicken genome sequence has been deposited into free public databases around the world. This assembly of genomic sequence data from the Red Jungle Fowl (Gallus gallus), ancestor of domestic chickens, represents the first avian genome to be sequenced. To read the NIH News Advisory, please visit http://www.nhgri.nih.gov/11510730

25 Feb 2004 2004 Benjamin Franklin Award in Bioinformatics - The Bioinformatics Organization, Inc. (Bioinformatics.Org) will present the 2004 Benjamin Franklin Award in Bioinformatics to Lincoln D. Stein of Cold Spring Harbor Laboratory for his "creation of a great number of open-source bioinformatics programs and for championing open-source principles in many venues..." For the Bioinformatics.Org press release, please visit http://bioinformatics.org/forums/forum.php?forum_id=2516

19 Feb 2004 Genome War - James Shreeve has written a new book, The Genome War: How Craig Venter tried to capture the code of life and save the world, chronicling the ins and outs of daily life at Celera Genomics during the race to sequence the human genome. For a good introduction to this book, check out Kevin Davies' recent Around the Bases feature article from BioIT World.

18 Feb 2004 Search for Complex Genes - This Bio-IT World article describes some of the "tricks of the trade", old and new, that are being used by researchers to track down "genes for complex diseases" (http://www.bio-itworld.com/archive/021804/genes.html).

18 Feb 2004 Continued Need for Bioinformatics - Analysts at Navigant Consulting believe that the need for bioinformatics analytical software in the Pharmaceutical and Biotechnology sectors will not go away over the next five years. Source: http://businesswire.com

16 Feb 2004 Bioinformatics Software in Academia - There is an abundance of bioinformatics software in academia. However, some projects were never completed, and others contain outdated code. Many academics therefore face a dilemma: resurrect the legacy code, start from scratch, or abandon it altogether. Here is an interesting article from The Scientist on this subject (http://www.the-scientist.com/yr2004/feb/prof2_040216.html).



4) Upcoming Events

BIOINFORMATICS TRAINING

Applied Computational Genomics Course - The next ACGC will be held in Winnipeg, Manitoba, on June 12-20, 2004. Early bird registrations must be received before May 1, 2004. For more information, see last issue's Bioinformatics Profile article or visit the course web page.

Canadian Bioinformatics Workshops - In 2004, the CBW will be offering three remaining bioinformatics workshops: 1. Developing the Tools (Deadline: March 6, 2004), 2. Proteomics (Deadline: May 22, 2004), and 3. Genomics (Deadline: June 19, 2004). These courses may count toward a Certificate in Bioinformatics. For further details, please visit http://www.bioinformatics.ca/workshops.php

BioneQ's Courses and Workshops - BioneQ offers a variety of courses and workshops in bioinformatics. Here are some of the courses and workshops that they offer: LIMS Workshop, EST Clustering Workshop, Workshop on Analysis of Expression Data, BASE Demo Installation, and Biojava Bootcamp. For further details, please visit their web site at http://www.bioneq.qc.ca

Training Program in Bioinformatics for Health Research: A bioinformatics training program, leading to a post-graduate diploma, M.Sc., or Ph.D., is "offered through a partnership between the BC Cancer Agency, Simon Fraser University and the University of British Columbia