QSAR - Canadian Bioinformatics Help Desk Newsletter -- March 4, 2004 from ian-$-redpoll.pharmacy.ualberta.ca on 2004-03-04 (QSAR and MS List)

From: ian-$-redpoll.pharmacy.ualberta.ca
Date: Thu, 4 Mar 2004 17:32:21 -0700

Canadian Bioinformatics Help Desk Newsletter -- March 4, 2004


CBHD Newsletter Issue 9 - March 4, 2004

CONTENTS:
Bioinformatics Profile: Faster Sequence Database Searches
Software Spotlight: Commercial - Phoretix 1D
What's New?
Upcoming Events: CPI '04 - Register now
Software Repository: Call for submissions; Deposit Software
Bioinformatics Jobs
CBHD Registration: Free Subscription

Online version of this newsletter:
http://gchelpdesk.ualberta.ca/news/04mar04/cbhd_news_04mar04.php

Welcome to the ninth issue of the Canadian Bioinformatics Help Desk (CBHD) Newsletter. Back issues of our newsletter can be viewed at our newsletter archive site (http://gchelpdesk.ualberta.ca/news/news.php). Our circulation base has reached 1070 subscribers.

In this issue's Bioinformatics Profile, we feature an article on Faster, Higher, Stronger Sequence Database Searches: homology detection with higher scores and stronger evidence. In our Commercial Software Spotlight we feature Phoretix 1D, a 1D gel image analysis tool, distributed in Canada by United Bioinformatica Inc.

This biweekly newsletter is intended to keep Genome Canada researchers and other Help Desk users informed about new software, events, job postings, conferences, training opportunities, interviews, publications, awards, and other newsworthy items concerning bioinformatics, genomics, and proteomics. The CBHD newsletter is a mandated service of the Help Desk and we hope to provide enough useful content to keep you interested and informed.

If you know of anyone who would be interested in receiving future issues of this newsletter or contributing content to the newsletter, please email us at ian---gchelpdesk.ualberta.ca. To subscribe to or unsubscribe from this newsletter, send an email message to ian(a)gchelpdesk.ualberta.ca with the word "subscribe" or "unsubscribe" in the subject line or body of the message.

1) Bioinformatics Profile

Sequence Database Searches: Faster, Higher, Stronger!
Homology Detection: Higher Scores, Stronger Evidence

Feature article contributed by Paul Gordon

The key to a successful database analysis of DNA and protein sequences is to maximize two search result characteristics: sensitivity and selectivity. Improved sensitivity means that fewer true positive matches, i.e. identified functional domains, will be missed in the results. Improved selectivity means that fewer false positives, i.e. mistakenly identified functional domains, will be identified in the results. Software such as the BLAST suite of programs relies on assumptions about the nature of the sequence similarities to take computational shortcuts, and it does this fabulously well. The results from these searches can produce clear homology candidates. But what about those results that just give you weak hits and hypothetical proteins? To detect more distant homologies, or to find matches in error-prone sequences such as ESTs, these shortcuts cannot always be taken, and the search space grows exponentially. To identify the maximum number of functional domains correctly, one must use a whole range of sequence search tools. These tools include more sensitive pairwise methods such as Smith-Waterman searches and intron-spanning GeneBLASTs, plus Hidden Markov Models such as those found in Pfam and other InterPro domain family databases. Figure 1 illustrates the relative sensitivity and selectivity of various search methods.

Figure 1. Comparison of sensitivity and selectivity of various sequence search methods. Blue denotes a software method, red denotes a hardware accelerated method.
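
The two measures described above can be made concrete with a small calculation. This is only a minimal sketch: the article does not give formulas, so it assumes sensitivity is the fraction of real matches that are reported and selectivity is the fraction of reported matches that are real. The function names and the counts are hypothetical.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real matches that the search actually reported."""
    total_real = true_positives + false_negatives
    return true_positives / total_real if total_real else 0.0


def selectivity(true_positives: int, false_positives: int) -> float:
    """Fraction of reported matches that are real (assumed definition)."""
    total_reported = true_positives + false_positives
    return true_positives / total_reported if total_reported else 0.0


# Hypothetical counts for one database search, for illustration only.
tp, fp, fn = 80, 10, 20
print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 80 / (80 + 20)
print(f"selectivity = {selectivity(tp, fp):.2f}")  # 80 / (80 + 10)
```

Under these assumed definitions, a method that misses fewer real domains raises the first number, and a method that reports fewer spurious domains raises the second.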

These more sensitive and selective methods will generally yield higher scores (and therefore lower E-values), and produce stronger evidence in non-obvious cases than the ubiquitous BLAST. In particular, the frameshift and Hidden Markov Model methods may find matches that elude the standard software methods because of these BLAST limitations:

It is optimized for finding protein similarity and strong DNA homology, though you can try Iterative PSI-BLAST for proteins
Sequences with frameshifts (e.g. ESTs) will not be aligned properly
Alignments do not span introns nicely
Distant or short DNA homology (e.g. primers) may be missed
Similarity between a sequence and a family of sequences may be a lot stronger than to any particular member of that family
It will not do global alignments (e.g. make sure an EST matches completely against a protein)
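
For readers unfamiliar with the Smith-Waterman method named earlier as a more sensitive pairwise alternative, the following is a minimal sketch of its scoring step. It is not the implementation used on any of the machines discussed in this article. The scoring values (match 2, mismatch -1, gap -2) and the linear gap penalty are assumptions chosen for illustration; real searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b."""
    # prev holds the previous row of the dynamic programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Or open a gap in one of the two sequences.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Flooring at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences: the shared core "ACGT" is found despite the flanks.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Every cell of the matrix is filled in, with none of the shortcuts a heuristic search takes, which is why the method is more sensitive and also far more expensive on large databases.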

Hardware Solutions: Faster Results

The University of Calgary, as part of the Genome Canada Bioinformatics Platform Project, hosts specialized systems for sequence database searches of all the types described above. These systems are the Paracel® GeneMatcher/BlastMachine and the TimeLogic™ Decypher®. Full descriptions of these machines are available here. Finding these more difficult matches requires much more computation, which is why we use specialized systems. Hardware acceleration makes these methods practical for large-scale datasets. In Figure 2, the plot of runtimes for comparable hardware and software methods illustrates the vast performance improvement achieved by the Paracel® (GeneMatcher & BlastMachine) and TimeLogic™ (Decypher®) systems for both the software and hardware methods.

Figure 2. Time-to-completion comparison of the original methods and the methods available, in batch mode, at the University of Calgary. For TBLASTX the improvement is 20-fold; for other methods it is at least 100-fold.

Each system has its particular strengths, as summarized in the table below. With this knowledge, we have the ability to maximize throughput, as well as sensitivity and selectivity, by selecting the proper methods on the proper machines for the given input data.

Sequence types	Search Method	Machine
DNA/protein	BLAST/PSI-BLAST	BlastMachine
DNA/protein	TeraBLAST	Decypher
ESTs vs. genomic	GeneBLAST	Decypher
ESTs vs. genomic	Smith-Waterman, semi-global	GeneMatcher
ESTs vs. protein	S-W Frame	GeneMatcher
DNA/protein vs. HMMs	HMM Search	Decypher
ESTs vs. HMMs	HMM Frame Search	Decypher
HMMs vs. genomic	GeneWise	GeneMatcher

Figure 3. Most appropriate sensitive search type for given data.
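
The table above can also be read as a lookup from the kind of data in hand to the candidate methods and machines. The sketch below simply transcribes the table rows into a Python structure. The names ROWS, build_index and options_for are hypothetical, and nothing here reflects an actual submission interface to these systems.

```python
from collections import defaultdict

# Rows transcribed from the table: (sequence types, search method, machine).
ROWS = [
    ("DNA/protein", "BLAST/PSI-BLAST", "BlastMachine"),
    ("DNA/protein", "TeraBLAST", "Decypher"),
    ("ESTs vs. genomic", "GeneBLAST", "Decypher"),
    ("ESTs vs. genomic", "Smith-Waterman, semi-global", "GeneMatcher"),
    ("ESTs vs. protein", "S-W Frame", "GeneMatcher"),
    ("DNA/protein vs. HMMs", "HMM Search", "Decypher"),
    ("ESTs vs. HMMs", "HMM Frame Search", "Decypher"),
    ("HMMs vs. genomic", "GeneWise", "GeneMatcher"),
]


def build_index(rows):
    """Group (method, machine) pairs by the sequence types they apply to."""
    index = defaultdict(list)
    for sequence_types, method, machine in rows:
        index[sequence_types].append((method, machine))
    return dict(index)


def options_for(sequence_types, index):
    """Return the candidate (method, machine) pairs, or an empty list."""
    return index.get(sequence_types, [])


INDEX = build_index(ROWS)
for method, machine in options_for("ESTs vs. genomic", INDEX):
    print(f"{method} on {machine}")
```

Where two rows share the same sequence types, both options are returned, leaving the trade-off between speed and sensitivity to the user.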

Availability

Unique Resources

In addition to being the only facility in Canada with this combination of resources, the University of Calgary facility hosts a unique in-house HMM database based on the popular NCBI Clusters of Orthologous Groups database. This database is the largest curated HMM set we know of at over 9000 prokaryotic and eukaryotic gene models. If you're still getting "hypothetical protein" from your PSI-BLAST responses, try searching COGSHMM on the Decypher system.

Casual Usage

Web interfaces for interactive submission by the general public exist for the Decypher and BlastMachine/GeneMatcher solutions.

Batch Jobs

More in-depth coverage of how to use these resources is available at the University of Calgary web site (http://magpie.ucalgary.ca/search_resources.xhtml). Funded through Genome Canada and WestGrid grants, these machines are accessible for high-throughput usage by Canadian academic researchers. For further details, please contact the Genome Canada Bioinformatics Platform Project Principal Investigator, Dr. Christoph Sensen, csensen . ucalgary.ca.

2) Software Spotlight

Featured Commercial Software: Phoretix 1D for 1D Gel Image Analysis

Feature article contributed by Russell Trischuk, UBI Application Scientist, r.trischuk[*]usask.ca

In the rapidly changing world of molecular biology, high-throughput analysis is a fact of life. As a result, there is a need for powerful, comprehensive, and user-friendly gel analysis software. The answer to this requirement is Phoretix 1D Advanced, by Nonlinear Dynamics. Phoretix 1D is a complete 1D gel analysis package capable of analyzing any gel (DNA or protein) separated in a single dimension, including PCR-based procedures (AFLP and RAPD), RFLP, SSCP, immunoblots, protein purification, and the analysis of post-translational modifications. Phoretix 1D is a highly automated, user-friendly, accurate software package that provides the user with an endless list of analysis applications.

This software is capable of:

Analyzing large format and multi-tiered gels that have become common as a result of the high-throughput requirements in molecular biology labs today

Multiple Lanes Image Multi Tiered Image

Accurate and reliable quantitation, utilizing eight different user defined methods of background subtraction, extensive normalization and quantity calibration strategies, and unique to Phoretix 1D, Gaussian fitting as employed in 2D gel analysis

A comprehensive, highly automated, flexible and accurate means of MW and pI calculation regardless of gel distortion and variation: Phoretix 1D enables the user to calibrate gels based on flexible Rf lines that effectively account for distortions within gels and variation between gels

RF Lines Feature

Accurate identification and matching of bands within gels allowing for many different user defined analytical opportunities including the ability to match all lanes in a gel to one or more reference lanes that allows easy and accurate editing of gel images; match data from all gels is automatically calculated and is available for presentation in a variety of formats (graphically, numerically or pictorially) based on the user's needs

Band ID-Matching Feature

Phoretix 1D is also available in a professional package (Phoretix 1D Pro) that couples the power of Phoretix 1D with a database that allows the user to create multiple libraries enabling the analysis of large cross-gel studies and projects.

If you are involved in any 1D gel-based projects and require an accurate, reliable and user friendly tool for the analysis and management of your data then Phoretix 1D Advanced or Professional should be your package of choice. For further information, trials, etc., please contact the Canadian distributor, UBI, at 866 202 2100, or by email at info^^ubi.ca

3) What's New?

01 Mar 2004	Proteome Analyst Subcellular Localization Paper - Dr. David Wishart, Director of the CBHD, along with other researchers in the Department of Computing Science at the University of Alberta, recently co-authored a paper that appeared in the March 1 issue of Bioinformatics. If you missed our Bioinformatics Profile article on the Proteome Analyst Subcellular Localization Prediction Server, please visit http://gchelpdesk.ualberta.ca/news/22jan04/cbhd_news_22jan04.php#profile
01 Mar 2004	Chicken Genome Assembled - The first draft of the chicken genome sequence has been deposited into free public databases around the world. This assembly of genomic sequence data from the Red Jungle Fowl (Gallus gallus), ancestor of domestic chickens, represents the first avian genome to be sequenced. To read the NIH News Advisory, please visit http://www.nhgri.nih.gov/11510730
25 Feb 2004	2004 Benjamin Franklin Award in Bioinformatics - The Bioinformatics Organization, Inc. (Bioinformatics.Org) will present the 2004 Benjamin Franklin Award in Bioinformatics to Lincoln D. Stein of Cold Spring Harbor Laboratory for his "creation of a great number of open-source bioinformatics programs and for championing open-source principals in many venues..." For the Bioinformatics.Org press release, please visit http://bioinformatics.org/forums/forum.php?forum_id=2516
19 Feb 2004	Genome War - James Shreeve has written a new book, The Genome War: How Craig Venter tried to capture the code of life and save the world, chronicling the ins and outs of daily life at Celera Genomics during the race to sequence the human genome. For a good introduction to this book, check out Kevin Davies' recent Around the Bases feature article from Bio-IT World.
18 Feb 2004	Search for Complex Genes - This Bio-IT World article describes some of the "tricks of the trade", old and new, that are being used by researchers to track down "genes for complex diseases" (http://www.bio-itworld.com/archive/021804/genes.html).
18 Feb 2004	Continued Need for Bioinformatics - Analysts at Navigant Consulting believe that the need for bioinformatics analytical software in the Pharmaceutical and Biotechnology sectors will not go away over the next five years. Source: http://businesswire.com
16 Feb 2004	Bioinformatics Software in Academia - There is an abundance of bioinformatics software in academia. However, some projects were never completed or contain outdated code. Many academics face a dilemma: resurrect the legacy code, start from scratch, or abandon it altogether. Here is an interesting article from The Scientist on this subject (http://www.the-scientist.com/yr2004/feb/prof2_040216.html).

4) Upcoming Events

BIOINFORMATICS TRAINING

Applied Computational Genomics Course - The next ACGC will be held in Winnipeg, Manitoba, on June 12-20, 2004. Early bird registrations must be received before May 1, 2004. For more information, see last issue's Bioinformatics Profile article or visit the course web page.

Canadian Bioinformatics Workshops: In 2004, the CBW will be offering three remaining bioinformatics workshops on: 1. Developing the Tools (Deadline: March 6, 2004), 2. Proteomics (Deadline: May 22, 2004), and 3. Genomics (Deadline: June 19, 2004). These courses may count toward a Certificate in Bioinformatics. For further details, please visit http://www.bioinformatics.ca/workshops.php

BioneQ's Courses and Workshops - BioneQ offers a variety of courses and workshops in bioinformatics. Here are some of the courses and workshops that they offer: LIMS Workshop, EST Clustering Workshop, Workshop on Analysis of Expression Data, BASE Demo Installation, and Biojava Bootcamp. For further details, please visit their web site at http://www.bioneq.qc.ca

Training Program in Bioinformatics for Health Research: A bioinformatics training program, leading to a post-graduate diploma, M.Sc., or Ph.D., is "offered through a partnership between the BC Cancer Agency, Simon Fraser University and the University of British Columbia." For more information, visit http://bioinformatics.bcgsc.ca

Cold Spring Harbor Laboratory (CSHL) is offering a special 2-day course as well as summer and fall courses. The deadlines for summer and fall courses are March 15, 2004 and July 15, 2004, respectively. For more information, please visit http://meetings.cshl.org/2004/2004courses.htm

BIOINFORMATICS MEETINGS

12-16 May 2004	2004 CSHL Meeting on The Biology of Genomes: This meeting will take place in Cold Spring Harbor on May 12-16, 2004. For further details, please visit http://meetings.cshl.org/2004/2004genome.htm
14-16 May 2004	CPI 2004, MONTREAL, CANADA: The Fourth International Conference of the Canadian Proteomics Initiative (CPI) will be held in Montreal, Canada, on May 14-16, 2004. The CPI 2004 tutorials will take place on May 17-18, 2004. The deadline for abstracts is March 15, 2004. Registration closes April 13, 2004. For more information, visit http://www.pence.ualberta.ca/CPI/index.php?home
20-22 May 2004	Biotech China 2004: "Biotech China 2004 is an international, multidisciplinary conference designed to offer critical perspectives on the current status and future of cutting-edge genomic technologies such as RNAi, systems biology, functional genomics, proteomics and microarray." The deadline for acceptance of oral presentations has been extended to March 10, 2004. For more information, please visit http://www.biotechcn.com/
31 Jul-4 Aug 2004	ISMB/ECCB 2004: "In 2004—for the first time ever—Intelligent Systems for Molecular Biology (ISMB) will be held jointly with the European Conference on Computational Biology (ECCB), in conjunction with Genes, Proteins and Computers VIII" on July 31-August 4, 2004, in Glasgow, UK. Registration opens March 1, 2004. The poster submission deadline is April 19, 2004. For a list of key dates, please visit http://www.iscb.org/ismbeccb2004/keydates.html. For further details, please visit http://www.iscb.org/ismbeccb2004/
16-20 Aug 2004	CSB2004: "The 3rd annual Computational Systems Bioinformatics conference, CSB2004, is being organized once again by the IEEE Computer Society Technical Committee on Bioinformatics under the theme—Systems Bioinformatics." This conference will be held in Stanford, California, USA. The deadline for the submission of papers is March 22, 2004. The poster submission deadline is May 17, 2004. Pre-conference tutorials will be held on August 16, 2004. Post-conference half-day workshops will be held on August 20, 2004. For more information, please visit the conference web page: http://conferences.computer.org/bioinformatics/
23 Aug 2004	ECAI 2004: The 16th European Conference on Artificial Intelligence (ECAI) will be held in Valencia, Spain. On August 23, 2004, there will be a workshop entitled, "Data Mining in Functional Genomics and Proteomics: Current Trends and Future Directions" (http://www.softwareresearch.ca/ecai-bio/index.html). The deadline for the submission of papers is March 31, 2004. For further details, please visit the conference web site at http://www.dsic.upv.es/ecai2004/

5) Help Desk Software Repository

The Help Desk software repository is where researchers may upload or download bioinformatics programs of interest. Currently the repository has 50 programs. These are freeware packages that are available for anyone to download and install on their own computer. Many of the programs in the Help Desk repository have been thoroughly tested and a number have been published as research articles. Please take advantage of this resource. Downloads are encouraged and submissions are always welcome. The repository can be found at: http://gchelpdesk.ualberta.ca/repository/.

Attention all programmers: we encourage you to submit your favourite bioinformatics software to the Help Desk Software Repository.

Please email Ian Forsythe (ian[A]gchelpdesk.ualberta.ca) if you would like to deposit software into the software repository. To deposit software now, please visit http://www.gchelpdesk.ualberta.ca/repository/SubmitRealSoftware.php

In case you missed it, last issue's Software Spotlight article highlighted parts one and two of the Programming in Perl tutorial from our Software Repository.

6) Bioinformatics Jobs

This is a resource for advertising positions in bioinformatics and computational biology. If you have a job you would like posted in this newsletter please email curators,+,bioinformatics.ca directly. Job postings will be carried for a maximum of 4 issues (8 weeks) unless the position is filled prior to that date.

Genome Canada is advertising several positions. Check out their career brochure (http://www.genomecanada.ca/GCmedia/CareerOpportunities.pdf) and their latest job postings (http://www.genomecanada.ca/GCcarriere/index.asp?l=e).

Job Title	Location	Date Posted
Tenure Stream Assistant Professor	Toronto, ON	March 3, 2004
SHARCNet Chair in Bioinformatics	London, ON	March 2, 2004
NSERC Industrial Research Chair in Biomedical Mass Spectrometry	Winnipeg, MB	March 2, 2004
Database Administrator; Chemistry Database Curators	Toronto, ON	February 26, 2004
Position in functional genomics	Quebec, PQ	February 26, 2004
Postdoctoral position	Montreal / Quebec City, PQ	February 23, 2004
BIOINFORMATICS ANALYST	Montreal (St-Laurent), PQ	February 23, 2004
BIOINFORMATICS SOFTWARE SPECIALIST	Montreal (Saint-Laurent), PQ	February 19, 2004
DNA Sequence Finisher, RAII	Vancouver, BC	February 16, 2004
Molecular Database Curators	Toronto, ON	February 5, 2004
Postdoctoral position in statistical and evolutionary bioinformatics/phylogenetics	Halifax, NS	February 4, 2004
Curators	Montreal, PQ	January 30, 2004
Application Scientists (Part Time)	Calgary, AB	January 26, 2004
Gene Expression Research Associate (plus numerous additional positions at the Genome Sciences Centre)	Vancouver, BC	January 26, 2004
Computational Biology position (Assistant/Associate Professor tenure track)	Hamilton, ON	January 24, 2004
Future positions: Bioinformatics, molecular microbiology, and genomics	Burnaby, BC	Starting in 2004-2005

Source: http://www.bioinformatics.ca/jobs except for the Database Administrator, Chemistry Database Curators, Computational Biology tenure track, and future positions

7) CBHD Registration

WHY REGISTER?

Registering with the Canadian Bioinformatics Help Desk benefits both you and us.

Benefits include:

Add your contact information to our Directory of Canadian Researchers, facilitating communication and collaboration with other Canadian researchers.
Receive updates about bioinformatics software, news, events, conferences, and training sessions via our biweekly newsletter.
Deposit software into our Bioinformatics Software Repository.

Visit our registration web page to register now (http://www.gchelpdesk.ualberta.ca/user/register.php).

Free Subscription

To start your free subscription to this newsletter, send an email message to ian*_*gchelpdesk.ualberta.ca with the word "subscribe" in the subject line or body of the message. Please forward this newsletter to any interested colleagues or collaborators. We appreciate your comments; send your comments and feedback about this newsletter to ian~~gchelpdesk.ualberta.ca

Ian J. Forsythe, MSc
Bioinformatician
Canadian Bioinformatics Help Desk
University of Alberta
Department of Biological Sciences, CW 405
Edmonton, AB
Canada T6G 2E9

Phone: (780) 492-5969
Fax: (780) 492-9234
Email: ian#%#gchelpdesk.ualberta.ca
Website: http://gchelpdesk.ualberta.ca

Received on 2004-03-04 - 21:32 GMT