CCL:G: Time for a Current Protocols in Computational Chemistry? (longer discussion)



 Sent to CCL by: Kalju Kahn [kalju=-=chem.ucsb.edu]
 Time For Current Protocols?
 I thought I'd throw in my two cents here, as I am somewhat familiar
 with both molecular biology and computational chemistry.  I even
 happened to teach both classes in the same quarter this year.
 Molecular biology, bioinformatics, and even protein analysis are
 topics that lend themselves well to recipe-type protocols.  The main
 reason is that the chemical and physical properties of the subjects of
 such studies are fairly independent of the details of the
 composition of these molecules.  For example, the melting temperature
 or electrophoretic mobility of DNA depends in a fairly simple and
 well-understood manner on its nucleotide composition.  This allows one
 to write a general protocol, for example on how to amplify DNA by PCR
 (see the sample protocol at
 http://media.wiley.com/CurrentProtocols/047150338X/047150338X-sampleUnit.pdf).
 In essence, the same operations (add the same buffer, same
 concentrations of template and primers, same amount of substrates,
 bring volume always to the same level, and add the same amount of
 polymerase enzyme last, then put it into a robotic thermocycler) can
 be used to amplify, clone out, or even mutate a pretty much arbitrary
 DNA sequence.  The few system-specific things are the template DNA, a
 pair of primers, and an appropriate annealing temperature.  There are
 recipes for figuring out how to purify the template from cells, how
 to design appropriate primers, and how to estimate an appropriate
 temperature.  Of course, specialists in molecular biology will
 point out that there is more to this ... one may need to adjust the
 annealing temperature from the default for better purity, maybe tinker
 with the Mg concentration, things are a little different with eukaryotic DNA
 ... However, for the novice the standard protocol will provide an
 excellent starting point to amplify a piece of DNA regardless of what
 it codes for.  The same basic protocol works if you plan to clone the
 whole gene for hexokinase, make a mutation in the 42-residue Alzheimer
 peptide, or identify a suspect based on the DNA sample!
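
 As a small illustration of how composition-based such estimates can be,
 here is a minimal sketch using the Wallace rule for short oligonucleotides
 (Tm = 2 degrees per A/T plus 4 degrees per G/C).  The function names, the
 example primers, and the 5-degree offset below the lower Tm are assumptions
 made for illustration, not part of any published protocol.

```python
def wallace_tm(primer):
    """Estimate the melting temperature (deg C) of a short primer
    with the Wallace rule: 2 * (A + T) + 4 * (G + C)."""
    seq = primer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def annealing_estimate(forward, reverse, offset=5):
    """Rule-of-thumb starting point: a few degrees below the lower Tm.
    The offset is an assumed default and would be tuned in practice."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


# Made-up primers, for illustration only.
fwd = "ATGCATGCATGCATGCAT"
rev = "GGCCGGCCAATTAATT"
print(wallace_tm(fwd), wallace_tm(rev), annealing_estimate(fwd, rev))
```

 Note that nothing in the estimate depends on what the amplified DNA codes
 for, which is exactly why a general recipe is possible here.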
 The same goes for bioinformatics.  The steps are roughly the same if I
 want to find conserved residues in HIV protease, hemoglobin, or
 ribosomal RNA.  Yes, one needs to realize that vertebrate databases
 are more useful than bacterial databases for hemoglobin genes, but
 besides a few obvious selections, the process is the same.  Again, a
 specialist would scoff and point out that an appropriate gap penalty and
 BLOSUM matrix should be selected for a particular case for optimal
 results.  However, a novice can use a standard protocol with default
 settings and still get very meaningful results.  A similar situation
 applies to other common bioinformatics tasks, such as finding
 evolutionary relationships, predicting secondary structures, and
 predicting cleavage by restriction endonucleases or proteases.
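
 To show how system-independent the last step of such an analysis is, here
 is a minimal sketch that reports fully conserved columns in an alignment
 that has already been produced by some alignment program.  The function
 name and the three short sequences are made up for illustration; a real
 study would start from database sequences and a proper alignment tool.

```python
def conserved_columns(aligned):
    """Return (position, residue) pairs for alignment columns that are
    identical in every sequence.  Positions are 1-based; gap-only
    columns are not reported."""
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for position, column in enumerate(zip(*aligned), start=1):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            hits.append((position, column[0]))
    return hits


# Hypothetical pre-aligned fragments ("-" marks a gap).
alignment = [
    "MKT-LDTGAD",
    "MRTALDTGSD",
    "MKSALDTGAD",
]
for position, residue in conserved_columns(alignment):
    print(position, residue)
```

 The same few lines work whether the input is a protease, a globin, or an
 RNA alignment; only the input changes.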
 Now, let's see what in computational chemistry lends itself to such recipes.
  I think that some things do.  Consider, for example, a conformational
 analysis of molecules, say smaller than 600 Da (typical drugs).  Maybe
 one could write a general recipe on how to carry out conformational
 analysis:
    obtain the structure of the molecule in an arbitrary initial
      conformation
    identify rotatable bonds (ignore rotations that lead to the same
      structure, e.g. CH3)
    pick a method to generate new conformations (choose between
      systematic and random search depending on the number of rotatable
      bonds)
    pick a method to evaluate energy (choose a force field that is
      known to work for this class of compounds)
    pick a method to minimize the guess structures (choose between
      gradient- and Hessian-based methods, select convergence criteria)
    pick an energy threshold if you do not care about very-high-energy
      conformers (recall that bioactive conformers bound to target proteins
      in water could be several kcal/mol above the gas-phase minimum)
    generate new conformations, minimize these, eliminate duplicate
      structures, sort the remaining ones according to energy
    perform vibrational analysis to confirm the structures are minima
      and not saddle points
    visually examine each to make sure they are really unique
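
 The steps above can be sketched in a program-independent way on a toy
 model.  Everything here is an assumption made for illustration: the
 "molecule" is just a list of torsion angles, the energy is an invented
 threefold-plus-onefold cosine term per torsion (not a real force field),
 the minimizer is plain steepest descent, and the curvature test stands in
 for a vibrational analysis only because this toy energy is separable.

```python
import itertools
import math
import random


def energy(torsions):
    """Invented toy potential (kcal/mol); angles in radians."""
    return sum(1.5 * (1 + math.cos(3 * t)) + 0.5 * (1 + math.cos(t))
               for t in torsions)


def gradient(torsions):
    return [-4.5 * math.sin(3 * t) - 0.5 * math.sin(t) for t in torsions]


def curvatures(torsions):
    """Second derivatives per torsion; the Hessian is diagonal here."""
    return [-13.5 * math.cos(3 * t) - 0.5 * math.cos(t) for t in torsions]


def minimize(torsions, step=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent until the largest gradient component is below tol."""
    t = list(torsions)
    for _ in range(max_iter):
        g = gradient(t)
        if max(abs(x) for x in g) < tol:
            break
        t = [(ti - step * gi) % (2 * math.pi) for ti, gi in zip(t, g)]
    return t


def starting_points(n_torsions, max_systematic=4, n_random=200, seed=0):
    """Systematic 30-degree grid for few torsions, random search otherwise."""
    if n_torsions <= max_systematic:
        grid = [math.radians(a) for a in range(0, 360, 30)]
        return itertools.product(grid, repeat=n_torsions)
    rng = random.Random(seed)
    return ([rng.uniform(0, 2 * math.pi) for _ in range(n_torsions)]
            for _ in range(n_random))


def conformer_search(n_torsions, window=3.0):
    """Return sorted (relative energy, torsions in degrees) for unique minima
    within `window` kcal/mol of the lowest one."""
    found = {}
    for start in starting_points(n_torsions):
        t = minimize(start)
        if min(curvatures(t)) <= 0:  # maximum or saddle point, not a minimum
            continue
        key = tuple(round(math.degrees(x)) % 360 for x in t)  # drop duplicates
        found[key] = energy(t)
    lowest = min(found.values())
    return sorted((e - lowest, key) for key, e in found.items()
                  if e - lowest <= window)


for rel_energy, angles in conformer_search(2):
    print(f"{rel_energy:6.3f}  {angles}")
```

 Grid points that start exactly on a maximum never move under steepest
 descent, so the curvature test is what removes them; this is the toy
 analogue of confirming that each structure is a minimum and not a saddle
 point.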
 Now, what good does this recipe do for us?  First, the material above
 is already explained at this level in some textbooks.   Can a novice
 follow this recipe?  What if he wonders how to define rotatable
 bonds?  In CHARMM?  In AMBER?  In BOSS?  In SYBYL?  In MacroModel?  In
 HyperChem?  What about Gaussian?  Would we write a current protocol
 for each of the common programs?  No, this would just duplicate the
 program manuals.  Would we write a current protocol for one particular
 program?  No, this is not fair.
 To summarize the first point, even simple techniques in computational
 chemistry are too dependent on particular software, and this prevents
 the description of universally applicable protocols.  Current
 textbooks and online tutorials cover the basic principles of methods
 quite well.
 If you are still with me ... There are plenty of things in
 computational chemistry that do not lend themselves to standard
 protocols.  Ever tried CAS or RAS, or MRCI?  But even simpler and more
 common things are problematic.  Take transition states.  The best
 methodology for finding a reasonably correct transition state for the
 isomerization of HCN is rather different from that for finding a TS for
 the SN2 reaction between acetate and ethyl bromide, and yet another
 approach may be required to find a correct TS for rotation in CH3-O-O-CH3.
 The search method, the electron correlation method, and the practical
 basis sets are different here.  In this case, QST2 is a good choice
 for the SN2 reaction but not for HCN isomerization, and who knows what
 kind of electron correlation method gets the CH3-O-O-CH3 torsional
 motion right.  Furthermore, the protocols run the risk of becoming
 obsolete quickly.  Remember the days when HF/6-31G(d) was the method
 of choice to optimize everything?
 To summarize the second point, most techniques in computational
 chemistry have a limited range of practical applicability.  Or, the other
 way around, most tasks in computational chemistry do not have one
 standard solution.  A good practitioner in the art of computational
 chemistry knows what methods to apply to what problems.  However, this
 knowledge cannot be efficiently presented in the format of a
 "current protocol".
 I thus feel that we are not at the point of benefiting significantly
 from "current protocols".  We could benefit for some problems if a
 common, free, multiplatform front end to most computational chemistry
 programs emerges such that the program-specific input and keywords can
 be hidden from the novice.  (In bioinformatics, Biology Workbench at
 SDSC is a good example).  I believe that molecular-mechanics-based
 methods like conformational analysis, molecular dynamics, Monte Carlo,
 and free energy calculations are good first candidates for such
 protocols.  For problems that require quantum mechanics, we have to
 wait until methods universally regarded as accurate are practically
 applicable to a wide range of problems.
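
 To make the front-end idea concrete, here is a minimal sketch of one task
 description rendered into two different input formats.  Both back ends
 ("program_a" and "program_b") and all of their keywords are invented for
 illustration; they do not correspond to the input syntax of any real
 package.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Program-independent description of a molecular-mechanics job."""
    kind: str         # e.g. "minimize" or "dynamics"
    force_field: str  # name of the force field to use
    max_steps: int


# Hypothetical back ends: the keyword syntax below is made up.
def render_program_a(task):
    return f"task={task.kind}\nff={task.force_field}\nnsteps={task.max_steps}\n"


def render_program_b(task):
    return (f"RUN {task.kind.upper()} FIELD {task.force_field} "
            f"STEPS {task.max_steps}\n")


BACKENDS = {"program_a": render_program_a, "program_b": render_program_b}


def write_input(task, program):
    """Translate a generic task into the input text for one back end."""
    render = BACKENDS.get(program)
    if render is None:
        raise ValueError(f"no back end for {program!r}")
    return render(task)


job = Task(kind="minimize", force_field="toy_ff", max_steps=500)
print(write_input(job, "program_a"))
print(write_input(job, "program_b"))
```

 A protocol written against the generic task would then stay the same no
 matter which program actually runs the calculation.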
 That's all :)
 Kalju
 -------------------
 > Sent to CCL by: "Andrew D. Fant" [fant=pobox.com]
 > Afternoon all,
 >    For the past couple of weeks, I have been kicking around an idea
 > and I would like some feedback from the larger community.
 >
 >    Those here who deal with wet biology and biochemistry or bioinformatics
 > have probably come in contact with the Current Protocols series.  For those
 > who aren't familiar with it, Current Protocols is a set of loose-leaf
 > reference volumes (or their electronic equivalent) that provide a standard
 > method for common lab operations and enough of the theory behind it.  They
 > have a volume for bioinformatics as well, with articles such as "Multiple
 > Sequence Alignment using ClustalW and ClustalX". The website, for those who
 > are interested is http://www.currentprotocols.com .
 >
 >     My question/proposal is simple.  Has the time come for the computational
 > chemistry community to have a similar resource available?  I don't
 > necessarily think that this need be a volume in the aforementioned series,
 > but someplace that someone needing to perform a calculation that they aren't
 > familiar with can go to get a sense of what others would consider the
 > "right" way to do it.
 >
 > So, what say ye?  Would this be considered useful, and, more to the point,
 > are there any practitioners out there who are interested in talking in more
 > depth about what form and content this project could take on?
 >
 > Thanks,
 > 	Andy
 >
 > --
 > Andrew Fant    | And when the night is cloudy    | This space to let
 > Molecular Geek | There is still a light          |----------------------
 > fant#,#pobox.com | That shines on me               | Disclaimer:  I don't
 > Boston, MA     | Shine until tomorrow, Let it be | even speak for myself
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Dr. Kalju Kahn
 Department of Chemistry and Biochemistry
 University of California, Santa Barbara