From owner-chemistry ^%at%^ ccl.net Thu Mar 23 19:07:01 2006
From: "Kalju Kahn kalju#chem.ucsb.edu"
To: CCL
Subject: CCL:G: Time for a Current Protocols in Computational Chemistry? (longer discussion)
Message-Id: <-31303-060323160315-4413-8bfiS8nOZVYQF0AvdkhSyA[A]server.ccl.net>
Date: Thu, 23 Mar 2006 12:13:41 -0800

Sent to CCL by: Kalju Kahn [kalju=-=chem.ucsb.edu]

Time For Current Protocols?

I thought I'd throw in my two cents here, as I am somewhat familiar with both molecular biology and computational chemistry. I even happened to teach both classes in the same quarter this year.

Molecular biology, bioinformatics, and even protein analysis are topics that lend themselves well to recipe-type protocols. The main reason is that the chemical and physical properties of the subjects of such studies are fairly independent of the details of the composition of these molecules. For example, the melting temperature or electrophoretic mobility of DNA depends in a fairly simple and well-understood manner on its nucleotide composition. This allows one to write a general protocol, for example on how to amplify DNA by PCR (see sample protocol at http://media.wiley.com/CurrentProtocols/047150338X/047150338X-sampleUnit.pdf).

In essence, the same operations (add the same buffer, the same concentrations of template and primers, the same amount of substrates, always bring the volume to the same level, add the same amount of polymerase enzyme last, then put it into a robotic thermocycler) can be used to amplify, clone out, or even mutate pretty much any arbitrary DNA sequence. The few system-specific things are the template DNA, a pair of primers, and an appropriate annealing temperature. There are recipes for purifying the template from cells, for designing appropriate primers, and for estimating an appropriate temperature.
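As a small illustration of how simply DNA melting temperature can depend on nucleotide composition, here is a minimal sketch of the Wallace rule, a textbook approximation for short oligonucleotides (roughly 14 to 20 bases): Tm is about 2 °C per A or T plus 4 °C per G or C. The function name and the example primer are made up for illustration and are not taken from the sample protocol linked above.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) of a short oligo via the Wallace rule.

    Tm ~= 2 * (A + T) + 4 * (G + C). Only a first estimate for short primers;
    it ignores salt concentration, primer concentration and sequence context.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 18-mer, invented for this example only
example_primer = "ATGCATGCATGCATGCAT"
print(wallace_tm(example_primer))
```

The point is only that the estimate needs nothing but base counts, which is what makes a sequence-independent recipe possible; real primer-design tools use more detailed nearest-neighbor models.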
Of course, specialists in molecular biology will point out that there is more to this ... one may need to adjust the annealing temperature from the default for better purity, maybe tinker with the Mg concentration, things are a little different with eukaryotic DNA ... However, for the novice the standard protocol will provide an excellent starting point to amplify a piece of DNA regardless of what it codes for. The same basic protocol works if you plan to clone the whole gene for hexokinase, make a mutation in the 42-residue Alzheimer peptide, or identify a suspect based on a DNA sample!

The same goes for bioinformatics. The steps are roughly the same if I want to find conserved residues in HIV protease, hemoglobin, or ribosomal RNA. Yes, one needs to realize that vertebrate databases are more useful than bacterial databases for hemoglobin genes, but besides a few obvious choices, the process is the same. Again, a specialist would scoff and point out that an appropriate gap penalty and BLOSUM matrix should be selected for a particular case for optimal results. However, a novice can use a standard protocol with default settings and still get very meaningful results. A similar situation applies to other common bioinformatics tasks, such as finding evolutionary relationships, predicting secondary structures, and predicting cleavage by restriction endonucleases or proteases.

Now, let's see what in computational chemistry lends itself to such recipes. I think that some things do. Consider, for example, a conformational analysis of molecules, say smaller than 600 Da (typical drugs). Maybe one could write a general recipe on how to carry out conformational analysis:

  - obtain the structure of the molecule in an arbitrary initial conformation
  - identify rotatable bonds (ignore rotations that lead to the same structures, e.g. CH3)
  - pick a method to generate new conformations (choose between systematic or random search depending on the number of rotatable bonds)
  - pick a method to evaluate energy (choose a force field that is known to work for this class of compounds)
  - pick a method to minimize the guess structures (choose between gradient- and Hessian-based methods, select convergence criteria)
  - pick a threshold if you do not care about very-high-energy conformers (recall that bioactive conformers bound to target proteins in water could be several kcal/mol above the gas-phase minimum)
  - generate new conformations, minimize these, eliminate duplicate structures, sort the remaining ones according to energy
  - perform vibrational analysis to confirm that the structures are minima and not saddle points
  - visually examine each to make sure they are really unique

Now, what good does this recipe do for us? First, the material above is already explained at this level in some textbooks. Can a novice follow this recipe? What if he wonders how to define rotatable bonds? In CHARMM? In AMBER? In BOSS? In SYBYL? In MacroModel? In HyperChem? What about Gaussian? Would we write a Current Protocol for each of the common programs? No, this would just duplicate the program manuals. Would we write a Current Protocol for one particular program? No, this is not fair.

To summarize the first point, even simple techniques in computational chemistry are too dependent on particular software, and this prevents the description of universally applicable protocols. Current textbooks and online tutorials cover the basic principles of the methods quite well.

If you are still with me ... There are plenty of things in computational chemistry that do not lend themselves to standard protocols. Ever tried CAS or RAS, or MRCI? But even simpler and more common things are problematic. Take transition states.
The best methodology for finding a reasonably correct transition state for the isomerization of HCN is rather different from that for finding a TS for the SN2 reaction between acetate and ethyl bromide, and yet another approach may be required to find a correct TS for rotation in CH3-O-O-CH3. The search method, the electron correlation method, and the practical basis sets are all different here. In this case, QST2 is a good choice for the SN2 reaction but not for HCN isomerization, and who knows what kind of electron correlation method gets the CH3-O-O-CH3 torsional motion right. Furthermore, the protocols run the risk of becoming obsolete quickly. Remember the days when HF/6-31G(d) was the method of choice to optimize everything?

To summarize the second point, most techniques in computational chemistry have a limited range of practical applicability. Or, the other way around, most tasks in computational chemistry do not have one standard solution. A good practitioner in the art of computational chemistry knows what methods to apply to what problems. However, this knowledge cannot be efficiently presented in the format of a “current protocol”.

I thus feel that we are not at the point of benefiting significantly from “current protocols”. We could benefit for some problems if a common, free, multiplatform front end to most computational chemistry programs emerges, such that the program-specific input and keywords can be hidden from the novice. (In bioinformatics, Biology Workbench at SDSC is a good example.) I believe that molecular-mechanics-based methods like conformational analysis, molecular dynamics, Monte Carlo, and free energy calculations are good first candidates for such protocols. For problems that require quantum mechanics, we have to wait until methods universally regarded as accurate are practically applicable to a wide range of problems.

That's all :)

Kalju

-------------------
> Sent to CCL by: "Andrew D. Fant" [fant=pobox.com]
> Afternoon all,
> For the past couple of weeks, I have been kicking around an idea and I would
> like some feedback from the larger community.
>
> Those here who deal with wet biology and biochemistry or bioinformatics have
> probably come in contact with the Current Protocols series. For those who
> aren't familiar with it, Current Protocols is a set of loose-leaf reference
> volumes (or their electronic equivalent) that provide a standard method for
> common lab operations and enough of the theory behind it. They have a volume
> for bioinformatics as well, with articles such as "Multiple Sequence Alignment
> using ClustalW and ClustalX". The website, for those who are interested is
> http://www.currentprotocols.com .
>
> My question/proposal is simple. Has the time come for the computational
> chemistry community to have a similar resource available? I don't necessarily
> think that this need be a volume in the aforementioned series, but someplace
> that someone needing to perform a calculation that they aren't familiar with can
> go to get a sense of what others would consider the "right" way to do it.
>
> So, what say ye? Would this be considered useful, and, more to the point, are
> there any practitioners out there who are interested in talking in more depth
> about what form and content this project could take on?
>
> Thanks,
> Andy
>
> --
> Andrew Fant      | And when the night is cloudy    | This space to let
> Molecular Geek   | There is still a light          |----------------------
> fant#,#pobox.com | That shines on me               | Disclaimer: I don't
> Boston, MA       | Shine until tomorrow, Let it be | even speak for myself

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Kalju Kahn
Department of Chemistry and Biochemistry
University of California, Santa Barbara