From owner-chemistry ^%at%^ ccl.net Thu Mar 23 19:07:01 2006
From: "Kalju Kahn kalju#chem.ucsb.edu" <owner-chemistry[A]server.ccl.net>
To: CCL
Subject: CCL:G: Time for a Current Protocols in Computational Chemistry? (longer discussion)
Message-Id: <-31303-060323160315-4413-8bfiS8nOZVYQF0AvdkhSyA[A]server.ccl.net>
X-Original-From: Kalju Kahn <kalju^chem.ucsb.edu>
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8
Date: Thu, 23 Mar 2006 12:13:41 -0800
MIME-Version: 1.0


Sent to CCL by: Kalju Kahn [kalju=-=chem.ucsb.edu]
Time For Current Protocols?

I thought I'd throw in my two cents here, as I am somewhat familiar
with both molecular biology and computational chemistry.  I even
happened to teach both classes in the same quarter this year.

Molecular biology, bioinformatics, and even protein analysis are
topics that lend themselves well to recipe-type protocols.  The main
reason is that the chemical and physical properties of the subjects of
such studies are fairly independent of the details of the
composition of these molecules.  For example, the melting temperature
or electrophoretic mobility of DNA depends in a fairly simple and
well-understood manner on its nucleotide composition.  This allows one
to write a general protocol, for example on how to amplify DNA by PCR
(see the sample protocol at
http://media.wiley.com/CurrentProtocols/047150338X/047150338X-sampleUnit.pdf).
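
As a rough illustration of how simple that dependence can be, the
sketch below uses the Wallace rule of thumb for short oligonucleotides
(about 2 deg C per A/T and 4 deg C per G/C).  The function names and
the "5 degrees below the lower Tm" annealing estimate are assumptions
made for this example, not part of the sample protocol linked above.

```python
def wallace_tm(seq):
    """Estimate the melting temperature (deg C) of a short oligonucleotide
    with the Wallace rule: 2 deg C per A/T plus 4 deg C per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def annealing_estimate(forward_primer, reverse_primer, offset=5):
    """Rule-of-thumb starting point (an assumption of this sketch):
    a few degrees below the lower of the two primer Tm values."""
    return min(wallace_tm(forward_primer), wallace_tm(reverse_primer)) - offset
```

The point is only that the estimate needs nothing but base counts; a
real primer-design tool would use a nearest-neighbor model and salt
corrections.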

In essence, the same operations (add the same buffer, the same
concentrations of template and primers, and the same amount of
substrates, always bring the volume to the same level, add the same
amount of polymerase enzyme last, then put it into a robotic
thermocycler) can be used to amplify, clone out, or even mutate
pretty much any DNA sequence.  The few system-specific things are the
template DNA, a pair of primers, and an appropriate annealing
temperature.  There are recipes for figuring out how to purify the
template from cells, how to design appropriate primers, and how to
estimate an appropriate temperature.  Of course, specialists in
molecular biology will point out that there is more to this ... one
may need to adjust the annealing temperature from the default for
better purity, maybe tinker with the Mg concentration, and things are
a little different with eukaryotic DNA ...  However, for the novice
the standard protocol will provide an excellent starting point to
amplify a piece of DNA regardless of what it codes for.  The same
basic protocol works if you plan to clone the whole gene for
hexokinase, make a mutation in the 42-residue Alzheimer peptide, or
identify a suspect based on a DNA sample!

The same goes for bioinformatics.  The steps are roughly the same if I
want to find conserved residues in HIV protease, hemoglobin, or
ribosomal RNA.  Yes, one needs to realize that vertebrate databases
are more useful than bacterial databases for hemoglobin genes, but
besides a few obvious selections, the process is the same.  Again, a
specialist would scoff and point out that an appropriate gap penalty
and BLOSUM matrix should be selected for a particular case for optimal
results.  However, a novice can use a standard protocol with default
settings and still get very meaningful results.  A similar situation
applies to other common bioinformatics tasks, such as finding
evolutionary relationships, predicting secondary structures, and
predicting cleavage by restriction endonucleases or proteases.
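
To make the "same steps for any sequence" point concrete, here is a
minimal sketch of the last step of such a protocol: given sequences
that have already been aligned, report the columns where every
sequence agrees.  The function name and the toy sequences are invented
for illustration; they are not real protease, hemoglobin, or rRNA
data, and the alignment itself is assumed to come from a standard tool
run with default settings.

```python
def conserved_columns(aligned):
    """Return (index, residue) pairs for alignment columns in which
    every sequence has the same residue; gap-only agreement is ignored."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        (i, column[0])
        for i, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Toy, made-up alignment ("-" marks a gap).
toy_alignment = ["DTGADD", "DTGA-D", "DSGADE"]
print(conserved_columns(toy_alignment))
```

Nothing in the function depends on what the sequences encode, which is
exactly why this kind of task fits a recipe.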

Now, let's see what in computational chemistry lends itself to such
recipes.  I think that some things do.  Consider, for example, a
conformational analysis of molecules, say smaller than 600 Da (typical
drugs).  Maybe one could write a general recipe on how to carry out
conformational analysis:
   obtain the structure of the molecule in an arbitrary initial
conformation
   identify rotatable bonds (ignore rotations that lead to the same
structures, e.g. CH3)
   pick a method to generate new conformations (choose between
systematic and random search depending on the number of rotatable
bonds)
   pick a method to evaluate energy (choose a force field that is
known to work for this class of compounds)
   pick a method to minimize the guess structures (choose between
gradient and Hessian based methods, select convergence criteria)
   pick a threshold if you do not care about very-high-energy
conformers (recall that bioactive conformers bound to target proteins
in water could be several kcal/mol above the gas phase minimum)
   generate new conformations, minimize these, eliminate duplicate
structures, sort the remaining according to energy
   perform vibrational analysis to confirm the structures are minima
and not saddle points
   visually examine each to make sure they are really unique
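
The generate / minimize / de-duplicate / sort part of the steps above
can be sketched in a program-independent way.  Everything below is a
toy under stated assumptions: the "force field" is an invented
threefold-plus-onefold cosine term per rotatable bond, the minimizer is
a crude one-degree coordinate descent, and the default values (60
degree grid, 5 kcal/mol window, 5 degree duplicate tolerance) are
arbitrary.  Vibrational analysis and visual inspection are not covered.

```python
import itertools
import math
import random


def torsion_energy(angles):
    """Toy energy in kcal/mol: a threefold plus a onefold cosine term
    per rotatable bond.  A stand-in for a real force field."""
    return sum(1.5 * (1 + math.cos(math.radians(3 * a)))
               + 0.5 * (1 + math.cos(math.radians(a))) for a in angles)


def minimize(angles, step=1.0, max_iter=1000):
    """Crude coordinate descent: move one torsion at a time by +/- step
    while the toy energy keeps decreasing."""
    angles = list(angles)
    best = torsion_energy(angles)
    for _ in range(max_iter):
        improved = False
        for i in range(len(angles)):
            for delta in (-step, step):
                trial = angles[:]
                trial[i] = (trial[i] + delta) % 360.0
                energy = torsion_energy(trial)
                if energy < best - 1e-12:
                    angles, best, improved = trial, energy, True
        if not improved:
            break
    return tuple(angles), best


def starting_points(n_bonds, grid_step=60, max_systematic=10_000,
                    n_random=2_000, seed=0):
    """Systematic grid when it is small enough, random sampling otherwise."""
    grid = range(0, 360, grid_step)
    if len(grid) ** n_bonds <= max_systematic:
        return itertools.product(grid, repeat=n_bonds)
    rng = random.Random(seed)
    return (tuple(rng.randrange(360) for _ in range(n_bonds))
            for _ in range(n_random))


def is_duplicate(a, b, tol=5.0):
    """Same conformer if every torsion agrees within tol degrees."""
    return all(min(abs(x - y), 360 - abs(x - y)) <= tol for x, y in zip(a, b))


def conformer_search(n_bonds, window=5.0):
    """Return (energy, torsions) pairs within `window` kcal/mol of the
    lowest conformer found, sorted by energy."""
    found = []
    for start in starting_points(n_bonds):
        angles, energy = minimize(start)
        if not any(is_duplicate(angles, other) for _, other in found):
            found.append((energy, angles))
    found.sort()
    lowest = found[0][0]
    return [(e, a) for e, a in found if e - lowest <= window]


for energy, torsions in conformer_search(2):
    print(f"{energy:6.3f}  {torsions}")
```

With six grid points per bond the systematic search already needs 6^n
starting structures, which is why the sketch switches to random
sampling once that count passes an (arbitrary) limit.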

Now, what good does this recipe do for us?  First, the material above
is already explained at this level in some textbooks.  Can a novice
follow this recipe?  What if they wonder how to define rotatable
bonds?  In CHARMM?  In AMBER?  In BOSS?  In SYBYL?  In MacroModel?  In
HyperChem?  What about Gaussian?  Would we write a current protocol
for each of the common programs?  No, this would just duplicate the
program manuals.  Would we write a current protocol for one particular
program?  No, this is not fair.

To summarize the first point, even simple techniques in computational
chemistry are too dependent on particular software, and this prevents
the description of universally applicable protocols.  Current
textbooks and online tutorials cover the basic principles of the
methods quite well.

If you are still with me ... There are plenty of things in
computational chemistry that do not lend themselves to standard
protocols.  Ever tried CAS or RAS, or MRCI?  But even simpler and more
common things are problematic.  Take transition states.  The best
methodology for finding a reasonably correct transition state for
the isomerization of HCN is rather different from that for finding a
TS for the SN2 reaction between acetate and ethyl bromide, and yet
another approach may be required to find a correct TS for rotation in
CH3-O-O-CH3.  The search method, the electron correlation method, and
the practical basis sets are different here.  In this case, QST2 is a
good choice for the SN2 reaction but not for HCN isomerization, and
who knows what kind of electron correlation method gets the
CH3-O-O-CH3 torsional motion right.  Furthermore, protocols run the
risk of becoming obsolete quickly.  Remember the days when
HF/6-31G(d) was the method of choice to optimize everything?

To summarize the second point, most techniques in computational
chemistry have a limited range of practical applicability.  Or, the
other way around, most tasks in computational chemistry do not have
one standard solution.  A good practitioner in the art of
computational chemistry knows which methods to apply to which
problems.  However, this knowledge cannot be efficiently presented in
the format of a “current protocol”.

I thus feel that we are not at the point of benefiting significantly
from “current protocols”.  We could benefit for some problems if a
common, free, multiplatform front end to most computational chemistry
programs emerges such that the program-specific input and keywords can
be hidden from the novice.  (In bioinformatics, Biology Workbench at
SDSC is a good example).  I believe that molecular-mechanics-based
methods like conformational analysis, molecular dynamics, Monte Carlo,
and free energy calculations are good first candidates for such
protocols.  For problems that require quantum mechanics, we have to
wait until methods universally regarded as accurate are practically
applicable to a wide range of problems. 
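
What such a front end would have to do can be sketched briefly: the
user describes the job once, in program-neutral terms, and a
per-program backend writes the actual input.  The Job class, the
BACKENDS registry, and the render function are hypothetical names
invented for this example, and only a Gaussian-style backend is shown.
The HCN geometry is approximate and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Program-neutral job description (hypothetical, for illustration)."""
    title: str
    method: str        # e.g. "HF"
    basis: str         # e.g. "6-31G(d)"
    task: str          # "optimize" or "energy"
    charge: int
    multiplicity: int
    atoms: list        # (symbol, x, y, z) tuples in Angstrom


def to_gaussian(job):
    """Render the job as a Gaussian-style input: route line, blank line,
    title, blank line, charge and multiplicity, coordinates, blank line."""
    route = f"# {job.method}/{job.basis}"
    if job.task == "optimize":
        route += " Opt"
    coords = [f"{s:<2} {x:12.6f} {y:12.6f} {z:12.6f}"
              for s, x, y, z in job.atoms]
    lines = [route, "", job.title, "",
             f"{job.charge} {job.multiplicity}", *coords, ""]
    return "\n".join(lines) + "\n"


# Other programs would register their own writer here.
BACKENDS = {"gaussian": to_gaussian}


def render(job, program):
    """Look up the backend for `program` and write the input text."""
    try:
        return BACKENDS[program](job)
    except KeyError:
        raise ValueError(f"no backend registered for {program!r}") from None


# Approximate linear HCN geometry, for illustration only.
hcn = Job(title="HCN optimization", method="HF", basis="6-31G(d)",
          task="optimize", charge=0, multiplicity=1,
          atoms=[("H", 0.0, 0.0, -1.064),
                 ("C", 0.0, 0.0, 0.0),
                 ("N", 0.0, 0.0, 1.156)])
print(render(hcn, "gaussian"))
```

The novice only ever touches the Job description; the choice of
keywords stays inside the backend, which is also where most of the
expert judgment discussed above would still have to live.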

That's all :)
Kalju
-------------------
> Sent to CCL by: "Andrew D. Fant" [fant=pobox.com]
> Afternoon all,
>    For the past couple of weeks, I have been kicking around an idea and I would
> like some feedback from the larger community.
> 
>    Those here who deal with wet biology and biochemistry or bioinformatics have
> probably come in contact with the Current Protocols series.  For those who
> aren't familiar with it, Current Protocols is a set of loose-leaf reference
> volumes (or their electronic equivalent) that provide a standard method for
> common lab operations and enough of the theory behind it.  They have a volume
> for bioinformatics as well, with articles such as "Multiple Sequence Alignment
> using ClustalW and ClustalX". The website, for those who are interested is
> http://www.currentprotocols.com .
> 
>     My question/proposal is simple.  Has the time come for the computational
> chemistry community to have a similar resource available?  I don't necessarily
> think that this need be a volume in the aforementioned series, but someplace
> that someone needing to perform a calculation that they aren't familiar with can
> go to get a sense of what others would consider the "right" way to do it.
> 
> So, what say ye?  Would this be considered useful, and, more to the point, are
> there any practitioners out there who are interested in talking in more depth
> about what form and content this project could take on?
> 
> Thanks,
> 	Andy
> 
> -- 
> Andrew Fant      | And when the night is cloudy    | This space to let
> Molecular Geek   | There is still a light          |----------------------
> fant#,#pobox.com | That shines on me               | Disclaimer:  I don't
> Boston, MA       | Shine until tomorrow, Let it be | even speak for myself

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Kalju Kahn
Department of Chemistry and Biochemistry
University of California, Santa Barbara