QSAR - Canadian Bioinformatics Help Desk Newsletter -- February 19, 2004 from ian on 2004-02-20 (QSAR and MS List)

From: ian
Date: Fri, 20 Feb 2004 12:37:23 -0700

Canadian Bioinformatics Help Desk Newsletter -- February 19, 2004


CBHD Newsletter Issue 8 - February 19, 2004

CONTENTS:
Bioinformatics Profile - Genome Canada Bioinformatics Workshop; Winnipeg Registration
Software Spotlight - Freeware: Programming in Perl; Commercial: OptGene™
What's New?
Upcoming Events - CPI: Register now
Software Repository - Call for submissions
Bioinformatics Jobs
CBHD Registration - Free Subscription

Online version of this newsletter:
http://gchelpdesk.ualberta.ca/news/19feb04/cbhd_news_19feb04.php

Welcome to the eighth issue of the Canadian Bioinformatics Help Desk (CBHD) Newsletter. Back issues of our newsletter can be viewed at our newsletter archive site (http://www.gchelpdesk.ualberta.ca/news/news.php). Our circulation base has reached 1050 subscribers.

In this issue's Bioinformatics Profile, we feature an article on the Genome Canada Bioinformatics Workshop. In this issue's Software Spotlight, we highlight Programming in Perl, a collection of scripts for learning Perl. In our Commercial Software Spotlight we feature OptGene™, a novel Gene Optimizing tool, distributed in Canada by United Bioinformatica Inc.

This biweekly newsletter is intended to keep Genome Canada researchers and other Help Desk users informed about new software, events, job postings, conferences, training opportunities, interviews, publications, awards, and other newsworthy items concerning bioinformatics, genomics and proteomics. The CBHD newsletter is a mandated service of the Help Desk and we hope to provide enough useful content to keep you interested and informed.

If you know of anyone who would be interested in receiving future issues of this newsletter or contributing content to the newsletter, please email us at ian : gchelpdesk.ualberta.ca. To unsubscribe from this newsletter, send an email message to ian[a]gchelpdesk.ualberta.ca with the word "unsubscribe" in the subject line or body of the message.

1) Bioinformatics Profile

Genome Canada Bioinformatics Workshop

Feature article contributed by Brian Fristensky and Sophie Chung

Bioinformatics in the post-genomic era requires the analysis of large and diverse datasets using automated tools. While many Web-based tools are available to the wet-lab researcher, the Web is not well suited for tasks beyond single-sequence annotation. Researchers need to become productive in a server-based Unix environment with its wealth of scripting and automation tools. Even at an entry level, this can be an intimidating task if proper guidance is not available.

The Genome Canada Bioinformatics Platform is working to empower researchers by teaching a hands-on course, outlining tried-and-proven approaches as well as new developments, with lectures by a panel of experts and integrated assignments to develop practical skills. The course is taught using tools and services which are available through the Genome Canada Bioinformatics Platform. This workshop takes place twice a year, once in Western Canada and once in Eastern Canada.

Two Genome Canada Bioinformatics workshops, entitled 'The Applied Computational Genomics Course', successfully took place in 2003. The first in the series was held last summer in Calgary (see class photo in Figure 1), while the second was held in the fall in Toronto. Below are just a few of the comments received from participants at the end of each workshop:

"Overall, a really good course. If I retain 10% it will still have been useful. A real eye opener as to what will be coming down the pipe in future years!" June 2003 workshop, Calgary, AB

"The contents of the workshop fitted my needs almost perfectly as I was looking for those bioinformatics tools myself. The workshop also upgraded and updated my knowledge and practical skills. I am sure my work will be much more efficient and could be on another higher level." June 2003 workshop, Calgary, AB

"The focus on new technologies was great. My knowledge of bioinformatics was limited to first user applications (i.e. NCBI website applications) and this 'next-phase' focus was really good for me personally!" November 2003 workshop, Toronto, ON

"Good intro to the basics. The prepared assignment handouts were indispensable for learning, and really helped my understanding." November 2003 workshop, Toronto, ON

The next workshop is scheduled to take place June 12-20, 2004, in Winnipeg, Manitoba. Topics covered will include:

Canadian Bioinformatics Resource (CBR)
Canadian Bioinformatics Help Desk (CBHD)
Basic Unix skills
Working on a network-centric Unix desktop
BIRCH: Working with large numbers of sequences on a comprehensive bioinformatics system
Perl scripting, BioPerl and SeqHound: Quick automation of data analysis tasks and utilization of web services
BioMOBY: A transparent software layer that automatically finds and uses web services
How biological concepts are phrased in computational models and how such models can be made operational in a database
BIND: Analyzing a gene in its biological context and retrieving lists of functionally related genes, i.e. interactors or co-regulated genes
Retrieving datasets from public databases based on lists of functionally related genes
Strategies to discover correlations in biological data
High-throughput genome annotation with TimeLogic and GeneMatcher hardware
MAGPIE: Automated genome analysis and annotation
BLUEJAY: Genome data visualization

Additional information about this workshop and future workshops can be found at http://www.gcbioinformatics.ca

Figure 1. Participants from the Calgary Applied Computational Genomics Course, held in June 2003.

2) Software Spotlight

Programming in Perl: A Perl tutorial for new programmers

Feature article contributed by Ian J. Forsythe and Paul Stothard

In your day-to-day lab activities, have you ever found yourself carrying out the same set of calculations over and over? Imagine having to perform the same operation a thousand times. Data analysis can be an important part of a biologist's job, especially once all the data has been gathered and it is time to make sense of it and write a paper or submit a report. Have you ever wondered if there is a better way to get the job done? A set of calculations that is performed over and over can easily be automated using a scripting language such as Perl. Perl has been very popular with bioinformaticians because of its flexibility, its ease of use, and its powerful pattern matching features—perfect for finding a specific DNA or protein sequence motif. Perl enables users to write "quick and dirty" programs that get the job done. At the Help Desk, we have developed a set of sample scripts that demonstrate some simple applications of Perl in biological data analysis and also teach the biologist how to get started in writing their own Perl scripts.

This collection of programs is intended to introduce the Perl programming language to students with little or no programming experience. Included with every script are detailed comments that describe what each line of code does when executed. Students are encouraged to type (minus the comments), edit, and run these programs. By doing so they will become familiar with the notations used by Perl, and the error messages that arise from common typos. These programs are not meant to illustrate the best way to complete a particular task. Several features of Perl have been omitted for the sake of simplicity. Students are encouraged to write faster or more compact versions of these programs using subroutines, objects, or more complex regular expressions. As Larry Wall, the inventor of Perl, would say, "There's more than one way to do it!"

After typing out a script, Unix users can make it executable by typing chmod +x script_filename at the command line and can then run it by typing ./script_filename, provided the script's first line is a #! line naming the Perl interpreter. Windows users need to install ActivePerl from ActiveState (http://aspn.activestate.com/ASPN/Downloads/ActivePerl/Source) and enter perl script_filename at the Windows command prompt. The programs can be run using the sample input data provided or the user's own data. Figure 2 shows a simple script, called 1_typeseq.pl, which prompts the user for the name of a DNA sequence and for the corresponding raw DNA sequence. The comments included with each script provide a high-level description of what the individual statements do when they are executed by the Perl interpreter. The scripts in the Programming in Perl tutorial become progressively more complex. By script four of Part II, students will be using Perl to communicate with NCBI's BLAST servers. For information about the other scripts that make up this tutorial, please refer to Table 1 below.

Figure 2. An example of an extensively commented Perl script from the Programming in Perl tutorial. The commented lines start with a '#' sign while the interpreted lines of Perl code do not. The comments help the new programmer learn about the structure of a typical Perl script and provide an explanation of how different Perl statements are used.

Name of Script	Function
Part I Scripts
1_typeseq.pl	-This script prompts the user for a name and a DNA sequence and then prints the information to the screen. -Demonstrates the use strict and use warnings pragmas (these are compiler options), declaring variables, basic syntax, print statement, standard input and output, and the chomp statement.
2_seqfromfile.pl	-This script reads a DNA sequence (in FASTA format) from a file, parses its individual components, and prints them to the screen. -Demonstrates opening and reading of files, opening and closing file handles, and the die statement.
3_readmany.pl	-This script reads multiple DNA sequences (in FASTA format) from a file and prints the name, length, and %GC content of each sequence. -Demonstrates arrays, the special variables $/ and $1, the while loop, the if statement, the else statement, the next statement, the matching operator, the substitution operator, the length function for strings, the string equality operator eq, the push statement for adding elements to an array, the sort function for sorting the elements of an array, the reverse function for reversing the elements of an array, the sprintf function for formatting strings, and the scalar function for determining the number of elements in an array.
4_translate.pl	-This script reads multiple DNA sequences (in FASTA format) from a file and translates the DNA into protein in reading frame 1. -Demonstrates hash tables (associative arrays), the lowercase function lc, the for loop, the substring function substr, the exists function for hash tables, and the join function for combining the elements of an array into a string.
5_revcomp.pl	-This script reads a single DNA sequence (in FASTA format) from a file and converts the DNA sequence into its reverse-complement form. Sequence composition statistics are printed for the original and reverse-complement sequence. -Demonstrates various regular expressions for use with the matching and substitution operators, the translation operator (tr), and the split function for converting a string into an array.
6_orfs_a.pl	-This script reads a single DNA sequence (in FASTA format) from a file and finds the open reading frames (ORFs) on the direct strand using pattern matching (regular expressions). The ORF ranges are then printed. -Demonstrates the pos function for monitoring the regular expression search, the elsif statement, the modulus operator, the pop function for removing and returning the last element in an array, the last statement for exiting a loop, the foreach loop, and the special variable $_.
6_orfs_b.pl	-This script is similar to 6_orfs_a.pl, except that it uses for loops to scan the DNA sequence in each of the three reading frames on the direct strand for start and stop codons. It is approximately 30 times slower than 6_orfs_a.pl. The speed difference is probably due to the optimization of the built in pattern matching functions, which are used in 6_orfs_a.pl. -Demonstrates the foreach loop and the special variable $_.
7_transorfs.pl	-This script reads a single DNA sequence (in FASTA format) from a file and then generates the protein translations of ORFs found in any of the six reading frames. The protein ORFs are displayed in FASTA format with an informative title. A minimum ORF length prevents short ORFs from being displayed. -Uses portions of the earlier programs, with slight modifications.
Part II Scripts
1_orf_finder_sequence.pl	-This script submits a sequence to NCBI's ORF finder and writes the unparsed results to a file. -Demonstrates the Library for WWW in Perl (LWP), the unless statement, and writing to a .html file.
2_orf_finder_parser.pl	-This script parses the output produced by 1_orf_finder_sequence.pl, extracts the ORF information, and writes it to a text file.
3_genscan.pl	-This script reads a single DNA sequence (in FASTA format) from a file and sends the sequence to MIT's Genscan web server; the predicted gene translations returned by Genscan are written to a file. -Further demonstrates and discusses the Library for WWW in Perl (LWP).
4_blast.pl	-This script reads multiple protein sequences (in FASTA format) from a file and submits them to NCBI's BLAST server; the results for each sequence are written to a file. -Demonstrates the application program interface (API) for accessing the NCBI BLAST server, how to use Perl to submit BLAST queries, and how to retrieve the results using the assigned request ID.
5_MW.pl	-This script reads multiple protein sequences (in FASTA format) from a file, determines the amino acid usage and molecular weight of each sequence, and determines the combined amino acid usage for all the sequences. The results are written to a file. -Further demonstrates hash tables and writing to a file.

Table 1. This table provides a brief description of the scripts that make up Parts I and II of Programming in Perl.
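
Several of the tasks listed in Table 1 (reading FASTA records, reporting length and %GC content, reverse-complementing a sequence, and finding direct-strand ORFs by pattern matching) can be sketched in a few lines. The sketch below is written in Python rather than Perl and is not taken from the tutorial scripts. The function names, the ORF definition used here (an ATG followed in frame by the first stop codon), and the min_length parameter are all assumptions made for illustration.

```python
import re

# NOTE: illustrative sketch only. The function names and the ORF definition
# are hypothetical and are not taken from the Programming in Perl scripts.


def parse_fasta(text):
    """Return a list of (title, sequence) tuples from FASTA-formatted text."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new title line closes the previous record, if there was one.
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line.upper())
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def gc_percent(seq):
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


# The lookahead lets the search restart at every position, so an ATG inside
# another ORF is reported as well. The lazy codon group stops at the first
# in-frame stop codon.
_ORF = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")


def find_orfs(seq, min_length=30):
    """Return 1-based inclusive (start, end) ranges of direct-strand ORFs."""
    orfs = []
    for match in _ORF.finditer(seq):
        orf = match.group(1)
        if len(orf) >= min_length:
            start = match.start() + 1
            orfs.append((start, start + len(orf) - 1))
    return orfs


if __name__ == "__main__":
    sample = ">demo sequence\nCCATGAAATAGGG\n"
    for name, dna in parse_fasta(sample):
        print(name, len(dna), f"{gc_percent(dna):.1f}% GC")
        print("reverse complement:", reverse_complement(dna))
        print("ORFs:", find_orfs(dna, min_length=9))
```

Because the pattern is anchored on every ATG, nested ORFs that share a stop codon are all reported. A script that wanted only the longest ORF per stop codon would need an extra filtering step.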

To download your free copies of Programming in Perl, Parts I and II, please visit our Software Repository:

Part I http://www.gchelpdesk.ualberta.ca/repository/VersionDetails.php?fileId=38&submissionId=29
Part II http://www.gchelpdesk.ualberta.ca/repository/VersionDetails.php?fileId=39&submissionId=30

Recommended references:

Beginning Perl for Bioinformatics
Learning Perl, 3rd Edition—Making Easy Things Easy and Hard Things Possible
Teach Yourself Perl in 21 Days
Mastering Perl for Bioinformatics

Featured Commercial Software: OptGene™

Feature article contributed by Lindsay Moir

Production of genetically modified organisms to achieve higher productivity, disease resistance and other desirable properties is still based on naturally occurring gene sequences. Such sequences often fall short in modern biotechnology, where there is an increased focus on safety requirements for recombinant products and, at the same time, a demand for greater flexibility in protein design. These gene sequences seldom meet the ever-growing demand for optimized yields in heterologous systems.

OptGene™ is a novel Gene Optimizing tool that optimizes naturally occurring genes to achieve higher productivity, at the same time giving higher flexibility for protein design. The tool optimizes the genes using only the sequence information and the choice of expression system. OptGene™ allows the researcher to adapt genes and their products precisely to their specific requirements.

OptGene™ achieves optimization through:

Adaptation of codon usage to that of the host
Directed Mutagenesis
Introduction of restriction sites
Knockout of cryptic splice sites, RNA destabilizing sequences and other undesirable sequence signals
Removal of secondary structures in RNA
Reduction of transcription regions in unused frames

If you need a) to optimize expression from a transgenic organism (e.g. alfalfa, canola, mouse, goat, etc.), b) a management tool that can take you through the process, c) to manage all of the information, d) to explore a large number of possible alternatives, then OptGene™ is likely in your future. For further information on OptGene™, please visit www.ubi.ca/optgene.htm. Information on other bioinformatic products that UBI offers can be found at www.ubi.ca/products.htm.

3) What's New?

19 Feb 2004	CBHD Bioinformatics Needs Survey Report - We recently conducted a Canada-wide bioinformatics needs survey, asking Genome Canada researchers about their bioinformatics and computational biology needs. I wish to thank all who participated in this survey. The report is available in PDF and DOC formats.
16 Feb 2004	S2K Chooses GeneLinker Products for Advanced Data Analysis - "S2K is a consortium of highly recognized researchers from across Canada funded by a $15 million grant from Genome Canada, Genome Quebec and Ontario Genomics Institute. The S2K program, which is hosted by the Université de Montréal, aims to study the functional genomics, pharmacogenomics and proteomics of the immune response regarding HIV and HCV infections, SARS, transplant rejection and rheumatoid arthritis diseases. The ultimate goal of this program is to develop a bioinformatic model to predict susceptibility and progression of the targeted diseases as well as the response to a given therapy." Source: http://www.prweb.com/releases/2004/2/prweb104338.htm
16 Feb 2004	Bioinformatics Network Cheered - "European Virtual Institute for Genome Annotation will have a global impact. An initiative to tackle the current fragmentation of bioinformatics research across Europe has been welcomed by scientists in both Europe and the United States." Source: http://www.biomedcentral.com/news/20040216/03/
14 Feb 2004	BIOKNOPPIX Distribution - The High Performance Computing Facility at the University of Puerto Rico has released a Knoppix Linux Live CD distribution customized for the molecular biologist. Here is some of the software included with BIOKNOPPIX: EMBOSS 2.8.0, jemboss, artemis, clustal, Cn3D, ImageJ, BioPython, Rasmol, Bioperl, Bioconductor. Source: http://bioinformatics.org/forums/forum.php?forum_id=2500
12 Feb 2004	RNAsoft Release - Researchers in the Department of Computer Science at the University of British Columbia recently released online versions of RNAsoft, "software for RNA/DNA secondary structure prediction and design." RNAsoft consists of three main programs: PairFold, CombFold, and RNA Designer. For further details, see their recent publication in Nucleic Acids Research (http://nar.oupjournals.org/cgi/content/full/31/13/3416). Source: http://bioinformatics.ca/weblogs/
10 Feb 2004	Venter submits whole genome shotgun assemblies to GenBank - "The sequences of the whole genome shotgun assemblies (WGSA) generated by Venter et al. at Celera Genomics and The Center for the Advancement of Genomics (TCAG) have been deposited in the GenBank database (see accession nos. AADD00000000, AADC00000000, and AADB00000000). This data release accompanies a paper in PNAS comparing the sequence from the International Human Genome Sequencing Consortium (NCBI Build 34) with the WGSA." Source: http://www.bioinformatics.ca/weblogs/log.php?wid=128
06 Feb 2004	Science Issue on Mathematics in Biology - The Feb. 6 issue of Science is devoted to Mathematics in Biology. Several Genome Canada researchers co-authored this issue's paper entitled "Global Mapping of the Yeast Genetic Interaction Network". Source: http://www.bioinformatics.ca/weblogs/log.php?wid=127

4) Upcoming Events

BIOINFORMATICS TRAINING

A DNA Microarray Workshop will be held at the University of Saskatchewan on February 26-27, 2004. Due to limited space, participants should register before February 20. To register, please send your name, institution, and email address to Faouzi.Bekkaoui%x%nrc-cnrc.gc.ca. For further details, please see the workshop announcement [PDF].

BioneQ's "A Biologist's Introduction to UNIX" workshop is now full. However, additional registrations will be added to a waiting list for future offerings of this workshop. For further information, please visit BioneQ's training web pages for "A Biologist's Introduction to UNIX" and for other BioneQ workshops.

The next Applied Computational Genomics Course will be held in Winnipeg, Manitoba, on June 12-20, 2004. Early bird registrations must be received before May 1, 2004. For more information, see this issue's Bioinformatics Profile article or visit the course web page (http://www.gcbioinformatics.ca).

Training Program in Bioinformatics for Health Research: A bioinformatics training program, leading to a post-graduate diploma, M.Sc., or Ph.D., is "offered through a partnership between the BC Cancer Agency, Simon Fraser University and the University of British Columbia." For more information, visit http://bioinformatics.bcgsc.ca

Canadian Bioinformatics Workshops (CBW): In 2004, CBW will be offering three remaining bioinformatics workshops on: 1. Developing the Tools (Deadline: March 6, 2004), 2. Proteomics (Deadline: May 22, 2004), and 3. Genomics (Deadline: June 19, 2004). These courses may count toward a Certificate in Bioinformatics. For further details, please visit http://www.bioinformatics.ca/workshops.php

Cold Spring Harbor Laboratory (CSHL) is offering a special two-day course as well as spring, summer, and fall courses. The deadlines for summer and fall courses are March 15, 2004 and July 15, 2004, respectively. For more information, please visit http://meetings.cshl.org/2004/2004courses.htm

BIOINFORMATICS MEETINGS

12-16 May 2004	2004 CSH Meeting on The Biology of the Genomes: This meeting will take place in Cold Spring Harbor on May 12-16, 2004. For further details, please visit http://meetings.cshl.org/2004/2004genome.htm
14-16 May 2004	CPI '04, MONTREAL, CANADA: The Fourth International Conference of the Canadian Proteomics Initiative (CPI) will be held in Montreal, Canada, on May 14-16, 2004. The deadline for abstracts is March 15, 2004. For other key deadlines, please visit http://www.pence.ualberta.ca/CPI/index.php?keydates. For more information, visit http://www.pence.ualberta.ca/CPI/index.php?home
20-22 May 2004	Biotech China 2004: "Biotech China 2004 is an international, multidisciplinary conference designed to offer critical perspectives on the current status and future of cutting-edge genomic technologies such as RNAi, systems biology, functional genomics, proteomics and microarray." The deadline for acceptance of oral presentations has been extended to March 10, 2004. For more information, please visit http://www.biotechcn.com/
31 Jul-4 Aug 2004	ISMB/ECCB 2004: "In 2004—for the first time ever—Intelligent Systems for Molecular Biology (ISMB) will be held jointly with the European Conference on Computational Biology (ECCB), in conjunction with Genes, Proteins and Computers VIII" on July 31-August 4, 2004, in Glasgow, UK. Registration opens March 1, 2004. For further details, please visit http://www.iscb.org/ismbeccb2004/
16-20 Aug 2004	CSB 2004: "The 3rd annual Computational Systems Bioinformatics conference, CSB2004, is being organized once again by the IEEE Computer Society Technical Committee on Bioinformatics under the theme—Systems Bioinformatics." This conference will be held in Stanford, California, USA. The deadline for the submission of papers is March 22, 2004. For more details, please visit the conference web page: http://conferences.computer.org/bioinformatics/
23 Aug 2004	ECAI 2004: The 16th European Conference on Artificial Intelligence (ECAI) will be held in Valencia, Spain. On August 23, 2004, there will be a workshop entitled, "Data Mining in Functional Genomics and Proteomics: Current Trends and Future Directions". The deadline for the submission of papers is March 31, 2004. For further details, please visit http://www.softwareresearch.ca/ecai-bio/index.html and http://www.dsic.upv.es/ecai2004/

5) Help Desk Software Repository

The Help Desk software repository is where researchers may upload or download bioinformatics programs of interest. Currently the repository has 50 programs. These are freeware packages that are available for anyone to download and install on their own computer. Many of the programs in the Help Desk repository have been thoroughly tested and a number have been published as research articles. Please take advantage of this resource. Downloads are encouraged and submissions are always welcome. The repository can be found at: http://gchelpdesk.ualberta.ca/repository/.

Attention all programmers: we encourage you to submit your favourite bioinformatics software to the Help Desk Software Repository.

Please email Ian Forsythe (ian^gchelpdesk.ualberta.ca) if you would like to deposit software into the software repository. To deposit software now, please visit http://www.gchelpdesk.ualberta.ca/repository/SubmitRealSoftware.php

In case you missed it, this issue's Software Spotlight article highlights parts one and two of the Programming in Perl tutorial from our Software Repository.

6) Bioinformatics Jobs

This is a resource for advertising positions in bioinformatics and computational biology. If you have a job you would like posted in this newsletter, please email curators a bioinformatics.ca directly. Job postings will be carried for a maximum of 4 issues (8 weeks) unless the position is filled prior to that date.

Genome Canada is advertising several positions. Check out their career brochure (http://www.genomecanada.ca/media/CareerOpportunities.pdf) and their latest job postings (http://genomecanada.ca/careers/index.asp?l=e).

Job Title	Location	Date Posted
BIOINFORMATICS SOFTWARE SPECIALIST	Montreal (Saint-Laurent), PQ	February 19, 2004
DNA Sequence Finisher, RAII	Vancouver, BC	February 16, 2004
Bioinformatics position (Assistant Professor tenure track)	Toronto, ON	February 9, 2004
Molecular Database Curators	Toronto, ON	February 5, 2004
Postdoctoral position in statistical and evolutionary bioinformatics/phylogenetics	Halifax, NS	February 4, 2004
Curators	Montreal, PQ	January 30, 2004
Application Scientists (Part Time)	Calgary, AB	January 26, 2004
Gene Expression Research Associate (plus twelve additional positions at Genome Sciences Centre)	Vancouver, BC	January 26, 2004
Computational Biology position (Assistant/Associate Professor tenure track)	Hamilton, ON	January 24, 2004
SHARCNet Chair in Bioinformatics	London, ON	January 7, 2004
Future positions: Bioinformatics, molecular microbiology, and genomics	Burnaby, BC	Starting in 2004-2005

Source: http://www.bioinformatics.ca/jobs except for the Bioinformatics tenure track, Computational Biology tenure track, and future positions

7) CBHD Registration

WHY REGISTER?

Registering with the Canadian Bioinformatics Help Desk benefits both you and us.

Benefits include:

Add your contact information to our Directory of Canadian Researchers, facilitating communication and collaboration with other Canadian researchers.
Receive updates about bioinformatics software, news, events, conferences, and training sessions via our biweekly newsletter.
Deposit software into our Bioinformatics Software Repository.

Visit our registration web page to register now (http://www.gchelpdesk.ualberta.ca/user/register.php).

Free Subscription

To start your free subscription to this newsletter, send an email message to ian(!)gchelpdesk.ualberta.ca with the word "subscribe" in the subject line or body of the message. Please forward this newsletter to any interested colleagues or collaborators. We do appreciate your comments. Send your comments and feedback about this newsletter to ian-x-gchelpdesk.ualberta.ca

Ian J. Forsythe, MSc
Bioinformatician
Canadian Bioinformatics Help Desk
University of Alberta
Department of Biological Sciences, CW 405
Edmonton, AB
Canada T6G 2E9

Phone: (780) 492-5969
Fax: (780) 492-9234
Email: ian++gchelpdesk.ualberta.ca
Website: http://gchelpdesk.ualberta.ca

Received on 2004-02-20 - 16:37 GMT