CCL: origin of ab initio/basis set empiricism



 Sent to CCL by: Frank Jensen [frj###chem.au.dk]
 
At the risk of being accused of self-promotion, let me add a few comments on the issue of basis sets.
 
A disproportionate number of calculations are done using variations of the 6-31G* or 6-311G* basis sets. The justification is usually some variation of the statement that 'this basis set has been shown to give good results'. Tracing this back leads to a calibration against experimental data for a given set of test systems. Choosing basis sets in this fashion is thus 'empirical' in the sense that it is based on comparison with experiments. A non-empirical way of selecting a basis set must rely on a systematic sequence that smoothly converges towards the basis set limit.
 
Most 'popular' basis sets are of the segmented type, where both exponents and contraction coefficients are optimized based on energy. Unfortunately this leads to the 'multiple minima' problem. We have for example shown that there are at least 19 different ways of constructing a '6-311' contraction of 11 s-functions. If one also considers possibilities like 7-211, 5-411 and 5-321, the number grows further. Most of these have very similar performance, and there is no unique way of selecting the 'best'. Generating a sequence of segmented basis sets that systematically approaches the limit is, in my opinion, not possible.
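To give a feel for how quickly the choices multiply, the sketch below simply enumerates Pople-style segmented contraction patterns (one core contraction followed by three valence contractions of non-increasing size) for 11 s-primitives. It assumes that any split with at least one primitive per contracted function is allowed, and the function name is hypothetical. Note that it only counts candidate *patterns*; it does not reproduce the 19 distinct '6-311' constructions described in the post, which arise within a single pattern from the multiple-minima problem.

```python
def segmented_patterns(n_primitives=11, n_valence=3):
    """Enumerate labels 'core-v1v2v3' for n_primitives s-functions.

    Illustrative assumption: the core contraction takes >= 1 primitive and the
    valence part is split into n_valence contracted functions of non-increasing
    size, each >= 1 primitive (e.g. 6-311, 7-211, 5-411, 5-321).
    """
    def partitions(total, parts, largest):
        # Partitions of `total` into exactly `parts` positive integers <= `largest`,
        # produced in non-increasing order.
        if parts == 0:
            if total == 0:
                yield ()
            return
        for first in range(min(total, largest), 0, -1):
            for rest in partitions(total - first, parts - 1, first):
                yield (first,) + rest

    labels = []
    for core in range(1, n_primitives):
        for valence in partitions(n_primitives - core, n_valence, n_primitives):
            labels.append(f"{core}-" + "".join(str(v) for v in valence))
    return labels

labels = segmented_patterns()
print(len(labels))  # 31
print([lab for lab in labels if lab in ("6-311", "7-211", "5-411", "5-321")])
# ['5-411', '5-321', '6-311', '7-211']
```

Each of these patterns would then still have its own energy-optimization landscape, which is where the multiple-minima problem enters.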
 
A generally contracted basis set, on the other hand, separates the optimization of the exponents and the contraction coefficients. As the uncontracted set of functions can be made to converge towards the basis set limit, and the contraction error can be rigorously controlled, this allows construction of a systematic sequence of basis sets, as was first explored with the ANO-type basis sets.
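A minimal sketch of what general contraction means in practice, assuming normalized s-type Gaussians on a single centre: the primitive exponents are fixed first, and each contracted function is then a linear combination of *all* primitives with its own coefficient row. The exponents and coefficients below are invented for illustration (in a real set the coefficients would typically come from atomic orbitals, e.g. natural orbitals for ANO sets), and the helper names are hypothetical.

```python
import math

def s_overlap(a, b):
    """Overlap of two normalized s-type Gaussians with exponents a and b on one centre."""
    return (2.0 * math.sqrt(a * b) / (a + b)) ** 1.5

def contracted_overlap(exponents, c1, c2):
    """<chi1|chi2> for chi = sum_k c_k g_k built on the same primitive set."""
    n = len(exponents)
    return sum(c1[k] * c2[l] * s_overlap(exponents[k], exponents[l])
               for k in range(n) for l in range(n))

def normalize(exponents, coeffs):
    """Rescale a coefficient vector so the contracted function has unit norm."""
    norm = math.sqrt(contracted_overlap(exponents, coeffs, coeffs))
    return [c / norm for c in coeffs]

# Step 1: the primitive exponents are fixed on their own
# (illustrative even-tempered values, not taken from any published basis set).
exponents = [0.1 * 3.0**k for k in range(6)]

# Step 2: general contraction. Every contracted function uses ALL primitives.
# These coefficient rows are made up for illustration only.
raw_rows = [
    [0.0, 0.1, 0.3, 0.4, 0.3, 0.1],
    [0.5, 0.6, -0.2, -0.3, -0.1, 0.0],
]
contracted = [normalize(exponents, row) for row in raw_rows]

# Overlap matrix of the contracted functions (diagonal is 1 after normalization).
for ci in contracted:
    print([round(contracted_overlap(exponents, ci, cj), 4) for cj in contracted])
```

The number of coefficient rows kept is what controls the contraction error: with as many linearly independent rows as primitives, the contracted set spans exactly the uncontracted space.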
 
Dunning showed how this could be used to construct the cc-pVXZ sequence for electron correlation methods, and we have used the same idea to construct the pc-n basis sets for DFT methods. The natural quality parameter in these is the highest angular momentum included. Once this has been selected, the rest of the basis set is in principle unique. These basis sets therefore have a single non-empirical parameter that controls the accuracy. Plane waves share this single-parameter property, but as they usually employ a core potential, this prevents a rigorous convergence towards the all-electron limit.
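Plane waves make the single-parameter idea very concrete: for a fixed periodic box, the basis is determined entirely by the kinetic-energy cutoff. The sketch below counts plane waves in a cubic box of side L with |G|^2/2 <= E_cut (Hartree atomic units and the Gamma point only are assumed) and compares the count with the continuum estimate V(2E_cut)^(3/2)/(6 pi^2). The function names are hypothetical, and nothing here models the core-potential issue.

```python
import math

def count_plane_waves(box_length, ecut):
    """Count reciprocal-lattice vectors G = (2*pi/L)(n1, n2, n3) of a cubic box
    with kinetic energy |G|^2 / 2 <= ecut (Hartree atomic units, Gamma point only)."""
    step = 2.0 * math.pi / box_length
    gmax = math.sqrt(2.0 * ecut)
    nmax = int(gmax / step) + 1          # safe bound on each |n_i|
    count = 0
    for n1 in range(-nmax, nmax + 1):
        for n2 in range(-nmax, nmax + 1):
            for n3 in range(-nmax, nmax + 1):
                g2 = step**2 * (n1 * n1 + n2 * n2 + n3 * n3)
                if 0.5 * g2 <= ecut:
                    count += 1
    return count

def estimate_plane_waves(box_length, ecut):
    """Continuum estimate: sphere of radius sqrt(2*ecut) divided by the
    reciprocal-cell volume (2*pi/L)^3, i.e. V * (2*ecut)**1.5 / (6*pi**2)."""
    return box_length**3 * (2.0 * ecut) ** 1.5 / (6.0 * math.pi**2)

# One knob: raise ecut at fixed box size and the basis grows systematically.
for ecut in (5.0, 10.0, 20.0):
    print(ecut, count_plane_waves(10.0, ecut), round(estimate_plane_waves(10.0, ecut)))
```

Because the plane-wave sets for increasing cutoffs are nested, raising E_cut can only enlarge the variational space for a given pseudopotential.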
 
A note on the side: while even-tempered basis sets provide a systematic way of improving the quality of an atomic calculation, this is not the case for molecular systems. Here increasing the number of functions, even for strictly variational methods like HF and with polarization functions included, can increase the energy, and in practice leads to oscillatory behavior.
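As a reminder of what 'even-tempered' means: the exponents are zeta_k = alpha * beta^k, k = 0, ..., N-1, fixed by the two parameters alpha and beta. The sketch below (hypothetical function names, arbitrary values) generates such sets and shows one property that is easy to overlook: if N is increased while the span of exponents is held fixed, beta changes, and the smaller set is not contained in the larger one.

```python
import math

def even_tempered(alpha, beta, n):
    """Two-parameter even-tempered exponents alpha * beta**k, k = 0..n-1."""
    return [alpha * beta**k for k in range(n)]

def even_tempered_span(smallest, largest, n):
    """Even-tempered set of n >= 2 exponents running from `smallest` to `largest`."""
    beta = (largest / smallest) ** (1.0 / (n - 1))
    return even_tempered(smallest, beta, n)

print(even_tempered(0.1, 3.0, 4))   # approximately [0.1, 0.3, 0.9, 2.7]

# Growing the set at a fixed span changes beta, so every interior exponent moves:
small = even_tempered_span(0.05, 5000.0, 6)
large = even_tempered_span(0.05, 5000.0, 7)
shared = [x for x in small if any(math.isclose(x, y) for y in large)]
print(shared)   # only the two end points are common to both sets
```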
 
A practical issue is how fast the basis set converges. While both the cc-pVXZ and pc-n basis sets provide a controlled convergence, the rate of convergence can for some properties be improved by adding diffuse or tight functions. Both options are available for both families of basis sets.
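One simple way of adding diffuse or tight functions is to continue the geometric progression of the existing exponents at either end. This is shown here only as an assumed heuristic; it is not claimed to be how the augmented cc-pVXZ or pc-n sets were actually constructed, and the `augment` helper is hypothetical.

```python
def augment(exponents, n_diffuse=1, n_tight=0):
    """Extend a set of exponents geometrically at either end (assumed heuristic).

    New diffuse exponents continue the ratio of the two smallest exponents;
    new tight exponents continue the ratio of the two largest ones.
    Returns the extended set sorted from most diffuse to tightest.
    """
    exps = sorted(exponents)
    if len(exps) < 2:
        raise ValueError("need at least two exponents to define a ratio")
    low_ratio = exps[1] / exps[0]
    high_ratio = exps[-1] / exps[-2]
    diffuse = [exps[0] / low_ratio**k for k in range(n_diffuse, 0, -1)]
    tight = [exps[-1] * high_ratio**k for k in range(1, n_tight + 1)]
    return diffuse + exps + tight

# Roughly [0.0333, 0.1, 0.3, 0.9, 2.7]
print(augment([0.1, 0.3, 0.9], n_diffuse=1, n_tight=1))
```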
 
My (clearly biased) view is that calculations should always be done using at least a DZP and a TZP quality basis set, to identify the pathological cases that always pop up. To illustrate this point: the Ahlrichs SVP and the Pople 6-31G* basis sets are both of double zeta quality and have typical errors of ~30 ppm for nuclear magnetic shieldings. The B3LYP value for oxygen in MgO is +23600 ppm with SVP and -2960 ppm with 6-31G*. The basis set limiting value is -2440 ppm. Relying on calculations with a single 'empirically' chosen basis set will invariably run into such problems. Using basis sets that belong to families where the error can be controlled allows one to identify such cases, and to rigorously remove the basis set error, albeit at a computational cost.
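The arithmetic behind the MgO example, using only the numbers quoted above. The post only gives two double-zeta results and the limit, so the sketch compares those; in practice one would compare, e.g., a DZP and a TZP result from the same family, with the limit unknown. The function names and the `factor` threshold are hypothetical choices for illustration.

```python
# B3LYP oxygen shielding in MgO, values (ppm) as quoted in the post.
basis_limit = -2440.0
dz_results = {"SVP": 23600.0, "6-31G*": -2960.0}
typical_dz_error = 30.0  # typical DZ shielding error quoted in the post (ppm)

def basis_errors(results, reference):
    """Signed error of each basis-set result relative to a reference value."""
    return {basis: value - reference for basis, value in results.items()}

def looks_pathological(values, typical_error, factor=10.0):
    """Flag a set of results whose spread greatly exceeds the typical error.

    `factor` is an arbitrary, hypothetical threshold chosen for illustration.
    """
    spread = max(values) - min(values)
    return spread > factor * typical_error, spread

errors = basis_errors(dz_results, basis_limit)
print(errors)  # {'SVP': 26040.0, '6-31G*': -520.0}
print(looks_pathological(list(dz_results.values()), typical_dz_error))  # (True, 26560.0)
```

Even without knowing the limit, the disagreement between two basis sets of nominally the same quality, almost three orders of magnitude larger than the typical error, is what exposes the problem.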
 Just my $0.02 (well, maybe $0.03)
 Frank
 Quoting "Rene Fournier renef++yorku.ca" <owner-chemistry~!~ccl.net>:
 
 Sent to CCL by: Rene Fournier [renef(a)yorku.ca]
 Hello,
   I agree with this:
 
 All commonly used basis
 sets, even the Pople-style sets, are generated by optimizing
 exponents and contraction coefficients to minimize (ab initio)
 energies of atoms and sometimes molecules.
 
    That's correct.  But there are other problems with choices of basis
 sets and that's where empiricism can creep in.  How many contracted
 functions is enough?  What's the best contraction pattern?
 (why 6-31G and not 6-21G? or 4-31G? or 9-61G? or 7-1111G?)
 
How many polarization functions? Why not throw in a few bond-centered functions?
 Many basis sets went into oblivion not because they gave higher energies
 for atoms or molecules (adding more functions and keeping them uncontracted
 always lowers the energy), but because they did not give good agreement with
 experimental bond lengths, bond energies etc. relative to other basis sets of
 comparable computational cost.  I'm not sure what to call methods where
 high-level theory is used instead of experiment as the reference to assess
 the goodness of a lower level of theory (maybe a smaller basis). I think it's
 still empirical, because there's the assumption that if the low-level theory
 reproduces high-level theory to within some accuracy for cases X, Y, Z, it
 will also reproduce high-level theory to that accuracy for other cases when
 we do applications.
 (There's also the obvious point that if the high-level theory reference is
 really, really good, then it IS the experimental result!)
    It is possible to take empiricism out of basis set choice by doing
 many calculations in a sequence with increasingly large basis sets defined
 by only one or maybe two parameters, so that one can crank up accuracy
 smoothly and extrapolate results to the "infinite basis set" limit. Basis
 sets suitable for that are:
 - even-tempered fully uncontracted basis sets (K. Ruedenberg);
 - plane waves (increase the energy cut-off and box size);
 - all-numerical programs (mostly for diatomics; Becke's NUMOL for polyatomics).
 But I don't see that done very often, even with plane waves, where it would
 be easy, I suppose.
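A minimal sketch of what "extrapolate results to infinite basis set" can look like for a one-parameter family. It assumes the two-point inverse-cube model E(X) = E_CBS + A/X^3 in the cardinal number X, which is commonly used for correlation energies with cc-pVXZ-type sets; that functional form is an assumption here, not something stated in the thread (HF and DFT energies are often modelled with other forms), and the function name is hypothetical.

```python
def two_point_cbs(e_x, x, e_y, y):
    """Two-point inverse-cube extrapolation to the complete-basis-set limit.

    Assumed model: E(X) = E_CBS + A / X**3 for cardinal numbers X != Y.
    Eliminating A gives E_CBS = (X^3 E_X - Y^3 E_Y) / (X^3 - Y^3).
    """
    if x == y:
        raise ValueError("need two different cardinal numbers")
    return (x**3 * e_x - y**3 * e_y) / (x**3 - y**3)

# Synthetic check with made-up numbers that follow the model exactly:
# E_CBS = -0.300 hartree, A = 0.200 hartree.
e_tz = -0.300 + 0.200 / 3**3   # "X = 3" (triple zeta)
e_qz = -0.300 + 0.200 / 4**3   # "X = 4" (quadruple zeta)
print(two_point_cbs(e_qz, 4, e_tz, 3))   # close to -0.3 (floating-point rounding)
```

With real data, a third point in the sequence is a useful check that the assumed form actually describes the convergence.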
 Regards,
            Rene
   Rene Fournier                   Office:  303 Petrie
   Chemistry Dpt, York University  Phone: (416) 736 2100 Ext. 30687
   4700 Keele Street,  Toronto     FAX:   (416)-736-5936
   Ontario, CANADA   M3J 1P3       e-mail: renef---yorku.ca
 On Wed, 12 Dec 2007, Kirk Peterson kipeters-*-wsu.edu wrote:
 
 Sent to CCL by: Kirk Peterson [kipeters-$-wsu.edu]
 As a sidebar to this discussion, I have to strongly disagree that
 basis set parameters, exponents or contraction coefficients,
 use empirical data in their construction.  All commonly used basis
 sets, even the Pople-style sets, are generated by optimizing
 exponents and contraction coefficients to minimize (ab initio)
 energies of atoms and sometimes molecules.  Some of the Pople-style
 basis sets utilize scale factors to apply to atom-optimized exponents,
 but these were based on (ab initio) molecular calculations
 and not experimental data.
 regards,
 -Kirk
 On Dec 12, 2007, at 8:21 AM, Rene Fournier renef+*+yorku.ca wrote:
 >
 > Sent to CCL by: Rene Fournier [renef\a/yorku.ca]
 > David Craig and Robert Parr first used "ab initio" in quantum chemistry,
 > see http://www.quantum-chemistry-history.com/Parr1.htm
 > Near the middle of that page, Parr recounts:
 >
 > " Craig and I published this paper on "configuration interaction in
 > benzene", where we took the pi-system and did essentially a complete
 > configuration interaction calculation on it.
 >
 > That has some trivial historical interest in that it was there that the
 > word, the term ab initio was introduced. Craig and Ross had computed
 > everything from the start in London and I had personally computed
 > everything from the start in Pittsburgh. Then we compared our answers when
 > we were finished. This involved computing all the integrals as best as
 > they could be done and selecting the configurations to mix for the ground
 > and excited states, because there were electronic states that were of
 > experimental interest, and we checked our answers against each other
 > when we were finished. And what the paper says is that these calculations
 > were done ab initio by Craig and Ross and by me, independently. And
 > Mulliken later said that this was the introduction of the term ab initio
 > into quantum chemistry. In the short review that you have, I talk about
 > this and reproduce a picture of a letter from Craig to me where he uses
 > the term ab initio in a different context. So ab initio was introduced
 > into quantum chemistry by Craig in a letter to me, and I put it into the
 > manuscript. That's where ab initio came from. "
 >
 >
 > The funny thing is that Parr later became a champion of Density Functional
 > Theory, and for many years (70's, 80's) DFT practitioners were often
 > criticized for doing calculations that were not "ab initio". I think views
 > have changed now; "first-principles" was introduced probably to say "mostly
 > not empirical" but without the implications "ab initio" had acquired over
 > the years. The term "ab initio calculation", as it's commonly used, very
 > rarely refers to a calculation "devoid of empiricism"; for example, the
 > choice of basis set parameters is almost always empirical, see the
 > discussion at
 > http://www.ccl.net/chemistry/resources/messages/2001/11/28.002-dir/index.html
 >
 >  Rene Fournier                   Office:  303 Petrie
 >  Chemistry Dpt, York University  Phone: (416) 736 2100 Ext. 30687
 >  4700 Keele Street,  Toronto     FAX:   (416)-736-5936
 >  Ontario, CANADA   M3J 1P3       e-mail: renef*_*yorku.ca
 >
 >
 > On Wed, 12 Dec 2007, Christoph Etzlstorfer christoph.etzlstorfer . jku.at wrote:
 >
 >> There is a story about that in Michael J.S. Dewar's autobiography
 >> "A Semiempirical Life", American Chemical Society, 1992, p. 129.
 >>
 >> Best regards
 >>
 >> Christoph
 >>
 >>
 >> On 11.12.2007 at 03:13, Tommy Ohyun Kwon ohyun.kwon _ chemistry.gatech.edu wrote:
 >>
 >>>
 >>> Sent to CCL by: Tommy Ohyun Kwon [ohyun.kwon . chemistry.gatech.edu]
 >>> Dear CCLers;
 >>> I would appreciate it if anyone could tell me who first used the term
 >>> "ab initio calculations".
 >>> Thank you very much for your kind attention.
 >>>
 >>> Best wishes,
 >>>
 >>> Tommy
 >>>
 >>>
 >>> --
 >>> Tommy Ohyun Kwon, Ph.D
 >>> School of Chemistry and Biochemistry
 >>> Georgia Institute of Technology
 >>> Atlanta Georgia, 30332
 >>> Email: ohyun.kwon]*[chemistry.gatech.edu
 >>>
 >>>
 >>>
 >>> -= This is automatically added to each message by the mailing script =-
 >>> To recover the email address of the author of the message, please
 >>> change> Conferences: http://server.ccl.net/chemistry/announcements/conferences/
 >>>
 >>> Search Messages: http://www.ccl.net/htdig  (login: ccl, Password: search)>
 >>>
 >>
 >> ####################################################
 >>                                  www.etzlstorfer.com
 >> ***********************************************************
 >> Dr. Christoph Etzlstorfer       Phone:  *43-732-2468-8750
 >> Universitaet Linz              Fax:    *43-732-2468-8747
 >> A-4040 Linz               E-mail: christoph.etzlstorfer,+,jku.at
 >> Austria                   http://www.orc.uni-linz.ac.at
 >> ####################################################
 >>
 >>
 >>
 >>
 >>
 >
 >
 >
 
 
 Frank Jensen
 http://www.chem.au.dk/~frj