Canadian Bioinformatics Help Desk Newsletter -- March 4, 2004
CBHD Newsletter
Issue 9 - March 4, 2004
CONTENTS:
Welcome to the ninth issue of the Canadian Bioinformatics Help Desk (CBHD) Newsletter. Back issues of our newsletter can be viewed at our newsletter archive site (http://gchelpdesk.ualberta.ca/news/news.php). Our circulation base has reached 1070 subscribers. In this issue's Bioinformatics Profile, we feature an article on Faster, Higher, Stronger Sequence Database Searches: homology detection with higher scores and stronger evidence. In our Commercial Software Spotlight we feature Phoretix 1D, a 1D gel image analysis tool distributed in Canada by United Bioinformatica Inc. This biweekly newsletter is intended to keep Genome Canada researchers and other Help Desk users informed about new software, events, job postings, conferences, training opportunities, interviews, publications, awards, and other newsworthy items concerning bioinformatics, genomics, and proteomics. The CBHD newsletter is a mandated service of the Help Desk, and we hope to provide enough useful content to keep you interested and informed. If you know of anyone who would be interested in receiving future issues of this newsletter or contributing content to the newsletter, please email us at ian---gchelpdesk.ualberta.ca.
To subscribe to this newsletter, click here; to unsubscribe from this newsletter, send an email message to ian(a)gchelpdesk.ualberta.ca with the word "unsubscribe" in the subject line or body of the message.
1) Bioinformatics Profile
Sequence Database Searches: Faster, Higher, Stronger!
Homology Detection: Higher Scores, Stronger Evidence
Feature article contributed by Paul Gordon
The key to a successful database analysis of DNA and protein sequences is to maximize two search result characteristics: sensitivity and selectivity. Improved sensitivity means that fewer true positive matches (i.e. correctly identified functional domains) will be missed in the results. Improved selectivity means that fewer false positives (i.e. mistakenly identified functional domains) will appear in the results.
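The two characteristics can be made concrete by counting hits against a set of known domains. A minimal sketch, assuming the common counting definitions sensitivity = TP / (TP + FN) and, reading "selectivity" as the fraction of reported hits that are real, selectivity = TP / (TP + FP). The function name `search_quality` and the example counts are hypothetical.

```python
def search_quality(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (sensitivity, selectivity) for a set of search hits.

    Assumed definitions:
      sensitivity = TP / (TP + FN)  -> fraction of real domains that were found
      selectivity = TP / (TP + FP)  -> fraction of reported hits that are real
    """
    found = true_pos + false_neg      # all real domains present in the query
    reported = true_pos + false_pos   # all hits the search reported
    sensitivity = true_pos / found if found else 0.0
    selectivity = true_pos / reported if reported else 0.0
    return sensitivity, selectivity


# Hypothetical counts: 80 real domains found, 20 missed, 10 spurious hits.
sens, sel = search_quality(true_pos=80, false_pos=10, false_neg=20)
print(f"sensitivity={sens:.2f} selectivity={sel:.2f}")
```

A more sensitive method lowers the missed count (FN); a more selective one lowers the spurious count (FP).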
Software such as the BLAST suite of programs relies on assumptions about the nature of sequence similarities to take computational shortcuts, and it does this fabulously well. These searches can produce clear homology candidates. But what about results that give you only weak hits and hypothetical proteins? To detect more distant homologies, or to find matches in error-prone sequences such as ESTs, these shortcuts cannot always be taken, and the amount of computation required grows dramatically. To correctly identify the maximum number of functional domains, one must use a whole range of sequence search tools. These include more sensitive pairwise methods such as Smith-Waterman searches and intron-spanning GeneBLAST searches, plus Hidden Markov Models (HMMs) such as those found in Pfam and other InterPro domain family databases. Figure 1 illustrates the relative sensitivity and selectivity of various search methods.
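Smith-Waterman searches find the best-scoring local alignment by dynamic programming over every pair of residues, with no heuristic shortcuts. A minimal, unoptimized sketch of the score computation, assuming a simple match/mismatch scheme and a linear gap penalty (the values 2, -1 and -2 are illustrative only; real protein searches use substitution matrices such as BLOSUM62 and affine gap penalties). It is not the implementation used on any of the machines described here.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned against a gap
            left = H[i][j - 1] + gap  # b[j-1] aligned against a gap
            # Clamping at zero lets an alignment start anywhere: this is what makes it local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman("TTACGTAA", "GGACGTCC"))
```

Because the table has (len(a) + 1) x (len(b) + 1) cells, run time grows with the product of the sequence lengths, which is the cost that hardware acceleration targets when searching whole databases.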
Figure 1. Comparison of
sensitivity and selectivity of various sequence search methods. Blue
denotes a software method, red denotes a hardware accelerated method.
These more sensitive and selective methods will generally yield higher scores (and therefore lower E-values), and produce stronger evidence in non-obvious cases than the ubiquitous BLAST. In particular, the frameshift and Hidden Markov Model methods may find matches that elude the standard software methods because of these BLAST limitations:
- It is optimized for finding protein similarity and strong DNA homology, though you can try iterative PSI-BLAST for proteins
- Sequences with frameshifts (e.g. ESTs) will not be aligned properly
- Alignments do not span introns nicely
- Distant or short DNA homology (e.g. primers) may be missed
- Similarity between a sequence and a family of sequences may be a lot stronger than to any particular member of that family
- It will not do global alignments (e.g. to make sure an EST matches completely against a protein)
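Why a higher alignment score means stronger evidence follows from the Karlin-Altschul statistics that BLAST uses to report E-values: E = K * m * n * exp(-lambda * S), where S is the raw score, m and n are the query and database lengths, and K and lambda depend on the scoring system. A minimal sketch; the default `lam` and `k` values below are placeholders chosen for illustration, not parameters for any real substitution matrix or gap setting.

```python
import math


def expect_value(score: float, query_len: int, db_len: int,
                 lam: float = 0.3, k: float = 0.1) -> float:
    """Expected number of chance hits scoring at least `score` (Karlin-Altschul).

    `lam` and `k` are illustrative placeholders; real values depend on the
    scoring matrix and gap penalties.
    """
    return k * query_len * db_len * math.exp(-lam * score)


# Same hypothetical query (300 residues) and database (10 million residues):
# each increase in score shrinks the E-value exponentially.
for s in (40, 60, 80):
    print(s, f"{expect_value(s, 300, 10_000_000):.2e}")
```

A more sensitive method that finds a longer or better alignment for the same pair raises S, and the E-value drops exponentially, which is the stronger evidence the article refers to.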
Hardware Solutions: Faster Results
The University of Calgary, as part of the Genome Canada Bioinformatics
Platform Project, hosts specialized systems for sequence database
searches of all types described above. These systems are the Paracel®
GeneMatcher/BlastMachine and the TimeLogic™ Decypher®.
Full descriptions of these machines are available here. Finding these more difficult matches requires much more computation, so we use specialized systems. Hardware acceleration makes it practical to run these methods on large-scale datasets. In Figure 2, the plot of runtimes for comparable software and hardware-accelerated methods illustrates the vast performance improvement achieved by the Paracel® (GeneMatcher & BlastMachine) and TimeLogic™ (Decypher®) systems.
Figure 2. Time-to-completion comparison, in batch mode, of the original software methods and the hardware-accelerated methods available at the University of Calgary. For TBLASTX the improvement is 20-fold; for the other methods it is at least 100-fold.
Each system has its particular strengths, as summarized in the table below. With this knowledge, we can maximize throughput, as well as sensitivity and selectivity, by selecting the proper methods on the proper machines for the given input data.
| Sequence types | Search method | Machine |
| DNA/protein | BLAST/PSI-BLAST | BlastMachine |
| DNA/protein | TeraBLAST | Decypher |
| ESTs vs. genomic | GeneBLAST | Decypher |
| ESTs vs. genomic | Smith-Waterman, semi-global | GeneMatcher |
| ESTs vs. protein | S-W Frame | GeneMatcher |
| DNA/protein vs. HMMs | HMM Search | Decypher |
| ESTs vs. HMMs | HMM Frame Search | Decypher |
| HMMs vs. genomic | GeneWise | GeneMatcher |
Figure 3. Most
appropriate sensitive search type for given data.
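The selection logic in the Figure 3 table is a simple lookup from data type to candidate methods and machines. A minimal sketch that transcribes the table rows, assuming the TeraBLAST row shares the "DNA/protein" data type and the semi-global Smith-Waterman row shares "ESTs vs. genomic" with the row above it (the flattened original leaves those cells blank). The names `SEARCH_OPTIONS` and `options_for` are hypothetical.

```python
# Rows transcribed from the Figure 3 table: (sequence types, search method, machine).
SEARCH_OPTIONS = [
    ("DNA/protein", "BLAST/PSI-BLAST", "BlastMachine"),
    ("DNA/protein", "TeraBLAST", "Decypher"),
    ("ESTs vs. genomic", "GeneBLAST", "Decypher"),
    ("ESTs vs. genomic", "Smith-Waterman, semi-global", "GeneMatcher"),
    ("ESTs vs. protein", "S-W Frame", "GeneMatcher"),
    ("DNA/protein vs. HMMs", "HMM Search", "Decypher"),
    ("ESTs vs. HMMs", "HMM Frame Search", "Decypher"),
    ("HMMs vs. genomic", "GeneWise", "GeneMatcher"),
]


def options_for(sequence_types: str) -> list[tuple[str, str]]:
    """Return the (method, machine) pairs listed for the given data type."""
    return [(method, machine)
            for types, method, machine in SEARCH_OPTIONS
            if types == sequence_types]


print(options_for("ESTs vs. genomic"))
```

An unknown data type simply returns an empty list, so the caller can fall back to a standard software search.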
Availability
Unique Resources
In addition to being the only facility in Canada with this combination of resources, the University of Calgary facility hosts a unique in-house HMM database based on the popular NCBI Clusters of Orthologous Groups database. At over 9000 prokaryotic and eukaryotic gene models, this database is the largest curated HMM set we know of. If you're still getting "hypothetical protein" from your PSI-BLAST results, try searching COGSHMM on the Decypher system.
Casual Usage
Batch Jobs
2) Software Spotlight
Featured Commercial Software:
Phoretix 1D for 1D Gel Image Analysis
Feature article contributed by Russell Trischuk, UBI Application Scientist, r.trischuk[*]usask.ca
In the rapidly changing world of molecular biology, high-throughput analysis is a fact of life. As a result, there is a need for powerful, comprehensive, and user-friendly gel analysis software. The answer to this requirement is Phoretix 1D Advanced, by Nonlinear Dynamics. Phoretix 1D is a complete 1D gel analysis package capable of analyzing any gel (DNA or protein) separated in a single dimension, including PCR-based procedures (AFLP and RAPD), RFLP, SSCP, immunoblots, protein purification, and the analysis of post-translational modifications. Phoretix 1D is a highly automated, user-friendly, accurate software package that provides the user with an endless list of analysis applications.
This software is capable of:
- Analyzing large-format and multi-tiered gels, which have become common as a result of the high-throughput requirements in molecular biology labs today

- Accurate and reliable quantitation, using eight different user-defined methods of background subtraction, extensive normalization and quantity calibration strategies, and, unique to Phoretix 1D, Gaussian fitting as employed in 2D gel analysis

- A comprehensive, highly automated, flexible and accurate means of MW and pI calculation regardless of gel distortion and variation, as Phoretix 1D enables the user to calibrate gels based on flexible Rf lines that effectively account for distortions within gels and variation between gels

- Accurate identification and matching of bands within gels, allowing many different user-defined analyses, including the ability to match all lanes in a gel to one or more reference lanes, which makes editing of gel images easy and accurate; match data from all gels is automatically calculated and is available for presentation in a variety of formats (graphically, numerically or pictorially) based on the user's needs
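Phoretix 1D's calibration along flexible Rf lines is proprietary and is not described in this article. The sketch below shows only the textbook single-lane version of the underlying idea, assuming molecular weight is read from a semi-log standard curve: each band's Rf (migration distance divided by dye-front distance) is fitted linearly against log10(MW) of the marker bands. It does not model the within-gel distortion correction described above. The function names and ladder values are hypothetical.

```python
import math


def fit_standard_curve(rf_values, mw_values):
    """Least-squares fit of log10(MW) = slope * Rf + intercept for marker bands."""
    n = len(rf_values)
    ys = [math.log10(mw) for mw in mw_values]
    mean_x = sum(rf_values) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in rf_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rf_values, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(band_distance, dye_front_distance, slope, intercept):
    """Convert a band's migration distance to Rf, then read its MW off the curve."""
    rf = band_distance / dye_front_distance
    return 10 ** (slope * rf + intercept)


# Hypothetical protein ladder: Rf positions and sizes in kDa.
ladder_rf = [0.15, 0.30, 0.48, 0.66, 0.85]
ladder_kda = [116, 66, 45, 29, 14]
slope, intercept = fit_standard_curve(ladder_rf, ladder_kda)
# Unknown band that migrated 42 mm with the dye front at 80 mm.
print(f"{estimate_mw(42.0, 80.0, slope, intercept):.1f} kDa")
```

Fitting a single line per lane assumes the gel ran evenly; correcting for smiling or lane-to-lane variation is exactly what the per-gel Rf calibration described above addresses.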

Phoretix 1D is also available in a professional package (Phoretix 1D Pro) that couples the power of Phoretix 1D with a database, allowing the user to create multiple libraries for the analysis of large cross-gel studies and projects.
If you are involved in any 1D gel-based projects and require an accurate, reliable and user-friendly tool for the analysis and management of your data, then Phoretix 1D Advanced or Professional should be your package of choice. For further information, trials, etc., please contact the Canadian distributor, UBI, at 866 202 2100, or by email at info^^ubi.ca
3) What's New?
| 01 Mar 2004 |
Proteome Analyst Subcellular Localization Paper - Dr. David Wishart, Director of the CBHD, along with other researchers in the Department of Computing Science at the University of Alberta, recently co-authored a paper that appeared in the March 1 issue of Bioinformatics.
If you missed our Bioinformatics Profile article on the Proteome Analyst Subcellular Localization Prediction Server, please visit http://gchelpdesk.ualberta.ca/news/22jan04/cbhd_news_22jan04.php#profile
|
| 01 Mar 2004 |
Chicken Genome
Assembled - The
first draft of the chicken genome sequence has been deposited into free
public databases around the world. This assembly of genomic sequence
data from the Red Jungle Fowl (Gallus
gallus), ancestor of domestic chickens, represents the first
avian genome to be sequenced. To read the NIH News Advisory, please
visit http://www.nhgri.nih.gov/11510730
|
| 25 Feb 2004 |
2004 Benjamin
Franklin Award in Bioinformatics - The
Bioinformatics Organization, Inc. (Bioinformatics.Org) will present the
2004 Benjamin Franklin Award in Bioinformatics to Lincoln D. Stein of
Cold Spring Harbor Laboratory for his "creation
of a great number of open-source bioinformatics programs and for
championing open-source principals in many venues..." For the
Bioinformatics.Org press release, please visit http://bioinformatics.org/forums/forum.php?forum_id=2516
|
| 19 Feb 2004 |
Genome War -
James Shreeve has written a new book, The
Genome War: How Craig Venter tried to capture the code
of life and
save the world, chronicling the ins and outs of daily life at
Celera
Genomics during the race to sequence the human genome. For a good
introduction to this book, check out Kevin Davies' recent Around the Bases feature article from Bio-IT World.
|
| 18 Feb 2004 |
Search for Complex
Genes -
This Bio-IT World article describes some of the "tricks of the trade",
old and new, that are being used by researchers to track down "genes
for complex diseases" (http://www.bio-itworld.com/archive/021804/genes.html).
|
| 18 Feb 2004 |
Continued Need for
Bioinformatics -
Analysts at Navigant Consulting believe that the need for bioinformatics analytical software in the pharmaceutical and biotechnology sectors will not go away over the next five years. Source: http://businesswire.com
|
| 16 Feb 2004 |
Bioinformatics Software in Academia - There is an abundance of bioinformatics software in academia. However, some projects were never completed, and others contain outdated code. Many academics therefore face a dilemma: resurrect the legacy code, start from scratch, or abandon it altogether. Here is an interesting article from The Scientist on this subject (http://www.the-scientist.com/yr2004/feb/prof2_040216.html).
|
4) Upcoming Events
BIOINFORMATICS TRAINING
Applied
Computational Genomics Course - The next ACGC will
be held in
Winnipeg, Manitoba, on June 12-20,
2004. Early bird registrations must be received before May 1, 2004. For more
information, see last issue's Bioinformatics
Profile
article or visit the course
web page.
Canadian Bioinformatics Workshops: In 2004, the CBW will offer three remaining bioinformatics workshops: 1. Developing the Tools (Deadline: March 6, 2004), 2. Proteomics (Deadline: May 22, 2004), and 3. Genomics (Deadline: June 19, 2004). These courses may count toward a Certificate in Bioinformatics. For further details, please visit http://www.bioinformatics.ca/workshops.php
BioneQ's Courses
and Workshops - BioneQ offers a
variety of courses and workshops in bioinformatics. Here are some of
the courses and workshops that they offer: LIMS Workshop, EST
Clustering Workshop, Workshop on Analysis of Expression Data, BASE Demo
Installation, and Biojava Bootcamp. For further details, please visit
their web site at http://www.bioneq.qc.ca
Training
Program in Bioinformatics for Health Research: A bioinformatics
training program, leading to a post-graduate diploma, M.Sc., or Ph.D.,
is "offered through a partnership between the BC Cancer Agency, Simon
Fraser University and the University of British Columbia