summary: descriptors for the 'shape' and 'size' of molecules



 Hi,
 Some days ago I posted the following question and received an enormous
 amount of information that will keep me busy for a couple of months. :-)
 Here's the summary!
 Many thanks to all of those who responded (see below).
 Your information is highly appreciated.
 Greetings,
 Norman
 ------------------------------------------------------
 > Subject: CCL:descriptors for the 'shape' and 'size' of molecules
 > Hello,
 >
 > I am looking for a simple (!?) method to describe the 'size' and
 > topology of a molecule.
 > The aim is to establish a correlation between experimentally observed
 > retention times (from liquid chromatography) and the 'shape' of several
 > pure hydrocarbons
 > (bigger species including 'disk'- and umbrella-shaped molecules).
 >
 > I have no experience whatsoever in this field of research, so any hints
 > or comments would be appreciated.
 ----------------------------------------------
 from: "Hank D. Cochran" <hdc-0at0-ctrhdc1.ct.ornl.gov>
 The van der Waals volume and surface area have been used for this
 purpose; see Bondi, "Physical Properties of Molecular Crystals, Liquids,
 and Glasses", and Abrams and Prausnitz, AIChE J., 21, 116 (1975).
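 A minimal sketch of how a van der Waals volume could be estimated
 numerically, assuming Bondi radii of 1.70 A for carbon and 1.20 A for
 hydrogen and a plain grid count over the union of atomic spheres. The
 function name vdw_volume, the input format and the grid spacing are
 illustrative assumptions and are not taken from the references above.

```python
import math

# Bondi van der Waals radii in angstrom (C and H only, enough for hydrocarbons).
BONDI_RADII = {"C": 1.70, "H": 1.20}


def vdw_volume(atoms, step=0.1):
    """Estimate the volume of the union of atomic vdW spheres on a cubic grid.

    atoms: list of (element, x, y, z) tuples, coordinates in angstrom.
    step:  grid spacing in angstrom (smaller is more accurate but slower).
    Returns the volume in cubic angstrom.
    """
    spheres = [(x, y, z, BONDI_RADII[el]) for el, x, y, z in atoms]
    # Bounding box that encloses every sphere.
    lo = [min(s[i] - s[3] for s in spheres) for i in range(3)]
    hi = [max(s[i] + s[3] for s in spheres) for i in range(3)]
    n = [int(math.ceil((hi[i] - lo[i]) / step)) for i in range(3)]
    inside = 0
    for ix in range(n[0]):
        px = lo[0] + (ix + 0.5) * step
        for iy in range(n[1]):
            py = lo[1] + (iy + 0.5) * step
            for iz in range(n[2]):
                pz = lo[2] + (iz + 0.5) * step
                # A cell counts once if its centre lies inside any sphere,
                # so overlapping regions are not counted twice.
                if any((px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= r * r
                       for x, y, z, r in spheres):
                    inside += 1
    return inside * step ** 3
```

 For a single atom the result should approach 4/3 * pi * r**3 as the
 grid spacing is reduced; for bonded atoms it is smaller than the sum of
 the individual spheres because the overlap is counted only once.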
 ------------------------------------------
 from:            Grunenberg Joerg <Joerg.Grunenberg-0at0-tu-bs.de>
 There is a simple method published by Grunenberg and Herges.
 Please have a look at:
 http://www.tu-bs.de/institute/org-chem/herges/grunpic/log_poct/rm_logp.html
 ------------------------------------------
 from Leif Norskov
 lnl-0at0-novo.dk
 Norman,
 maybe the descriptors from MSI's Catalyst/Shape would be useful:
         http://www.msi.com/info/products/modules/catSHAPE.html
 It basically derives the moments of inertia plus some cross terms
 (total of 18 descriptors as far as I recall) for each conformer.
 But it is commercial (expensive) software.
 If need be I could do the calculation for you (assuming that
 one can extract the numbers into some ascii format - I have
 actually never used catShape).
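 As a rough illustration of the moments-of-inertia idea only (this is
 not the Catalyst/Shape descriptor set, whose details are not given
 here), the three principal moments of a conformer can be computed from
 its coordinates. The function name principal_moments, the mass table
 and the input format are assumptions made for this sketch.

```python
import math

ATOMIC_MASS = {"C": 12.011, "H": 1.008}  # atomic masses in amu


def principal_moments(atoms):
    """Return the three principal moments of inertia, sorted ascending.

    atoms: list of (element, x, y, z) tuples, coordinates in angstrom.
    Result is in amu * angstrom**2, taken about the centre of mass.
    """
    masses = [ATOMIC_MASS[el] for el, _, _, _ in atoms]
    total = sum(masses)
    com = [sum(m * a[i + 1] for m, a in zip(masses, atoms)) / total
           for i in range(3)]
    # Build the symmetric inertia tensor about the centre of mass.
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for m, (_, x, y, z) in zip(masses, atoms):
        x, y, z = x - com[0], y - com[1], z - com[2]
        ixx += m * (y * y + z * z)
        iyy += m * (x * x + z * z)
        izz += m * (x * x + y * y)
        ixy -= m * x * y
        ixz -= m * x * z
        iyz -= m * y * z
    # Closed-form eigenvalues of a symmetric 3x3 matrix (trigonometric method).
    off = ixy * ixy + ixz * ixz + iyz * iyz
    if off == 0.0:
        return sorted([ixx, iyy, izz])  # tensor is already diagonal
    q = (ixx + iyy + izz) / 3.0
    p = math.sqrt(((ixx - q) ** 2 + (iyy - q) ** 2 + (izz - q) ** 2
                   + 2.0 * off) / 6.0)
    bxx, byy, bzz = (ixx - q) / p, (iyy - q) / p, (izz - q) / p
    bxy, bxz, byz = ixy / p, ixz / p, iyz / p
    det = (bxx * (byy * bzz - byz * byz)
           - bxy * (bxy * bzz - byz * bxz)
           + bxz * (bxy * byz - byy * bxz))
    r = max(-1.0, min(1.0, det / 2.0))  # guard against rounding error
    phi = math.acos(r) / 3.0
    big = q + 2.0 * p * math.cos(phi)
    small = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return sorted([small, 3.0 * q - big - small, big])
```

 For a planar ('disk'-like) arrangement of atoms the largest moment
 equals the sum of the other two, so ratios of the three values give a
 crude indication of whether a conformer is rod-like, disk-like or
 roughly spherical.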
 -------------------
 From Dr. Martin Mueller
 Internet: http://www.iuct.fhg.de
 If you're looking for a really simple (!?) method: what about
 topological indices such as connectivity indices or shape indices? If you
 have 3D structures, you could calculate the maximum and minimum diameters;
 their ratio could serve as a measure of shape.
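 One possible reading of the diameter suggestion, sketched under the
 assumption that a 'diameter' means the caliper width of the atomic
 positions along a given direction: sample many directions, project the
 atoms onto each, and keep the largest and smallest spread. The function
 name caliper_diameters and the direction count are illustrative choices.

```python
import math


def caliper_diameters(coords, n_directions=2000):
    """Approximate the largest and smallest caliper width of a point set.

    coords: list of (x, y, z) atomic positions in angstrom.
    Returns (d_max, d_min); atomic radii are ignored.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    d_max, d_min = 0.0, float("inf")
    for i in range(n_directions):
        # Quasi-uniform unit vectors from a Fibonacci lattice on the sphere.
        uz = 1.0 - (2.0 * i + 1.0) / n_directions
        rho = math.sqrt(1.0 - uz * uz)
        ux = rho * math.cos(i * golden_angle)
        uy = rho * math.sin(i * golden_angle)
        # Width = spread of the atoms projected onto this direction.
        proj = [x * ux + y * uy + z * uz for x, y, z in coords]
        width = max(proj) - min(proj)
        d_max = max(d_max, width)
        d_min = min(d_min, width)
    return d_max, d_min
```

 The ratio d_max / d_min can then be used as the shape measure. For a
 strictly planar skeleton d_min approaches zero, so in practice one
 would probably add a van der Waals radius to both values before taking
 the ratio.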
 --------------------------
 from:     "Qiang, Cui" <qiang-0at0-tammy.harvard.edu>
 As a matter of fact, the issue is rather similar to some drug design
 problems, just with a different kind of observable. You can certainly
 borrow a technique such as GA-NN (genetic algorithm-neural network) to
 set up the correlation. Look at a standard book on NNs and at some papers
 on 2D/3D drug design (for instance, I recommend several papers by S. So
 and M. Karplus published in J. Med. Chem.; for more info, see
 http://yuri.harvard.edu/~so).
 --------------------------------
 from John Gunn (gunnj-0at0-cerca.umontreal.ca)
 Paul Mezey has written an entire book on this:
 Shape in Chemistry : An Introduction to Molecular Shape and Topology
 -------------------------------
 from David A. Winkler    Email: dave.winkler-0at0-molsci.csiro.au
 There are many kinds of molecular descriptor which may do what you want.
 They can range from the very simple molecular indices such as those of
 Randic and Kier & Hall through to molecular holograms, molecular fields,
 etc. I'm sure there are literature examples of the kind of work you want
 to do. The lipophilicity of the molecules will probably correlate with
 the retention time. Some of the simple indices are described on the Web
 site of my collaborator, Frank Burden
 (http://www.chem.monash.edu.au/Docs/ChemStaffProfiles/Burden.html).
 Frank's molecular eigenvalue descriptors (also known as BCUT) may work
 well.
 --------------------------------
 from Carmen Moure <mmoure-0at0-hsc.vcu.edu>
 There is a professor in my department who has written a couple of
 books on molecular connectivity. His name is Lemont Kier, and the
 topological descriptors he has proposed are very simple, almost
 intuitive. His e-mail is: kier-0at0-gems.vcu.edu. The books are:
 Lemont B. Kier and Lowell H. Hall, "Molecular Connectivity in Chemistry
 and Drug Research", Academic Press, New York, 1976.
 and
 Lemont B. Kier and Lowell H. Hall, "Molecular Connectivity in
 Structure-Activity Analysis", Research Studies Press, Letchworth,
 Hertfordshire, England; Wiley, New York, 1986.
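 To show how simple such a descriptor can be, here is a minimal sketch
 of the first-order connectivity (Randic) index on a hydrogen-suppressed
 graph: each C-C bond contributes 1/sqrt(d_i * d_j), where d is the
 number of heavy-atom neighbours of each bonded atom. The function name
 randic_index and the bond-list input format are assumptions made for
 this example.

```python
import math


def randic_index(bonds):
    """First-order connectivity (Randic) index of a hydrogen-suppressed graph.

    bonds: list of (i, j) pairs of heavy-atom labels, one pair per bond.
    """
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    # Each bond contributes the inverse square root of the degree product.
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)


n_butane = [(1, 2), (2, 3), (3, 4)]
isobutane = [(1, 2), (2, 3), (2, 4)]
print(round(randic_index(n_butane), 3))   # 1.914
print(round(randic_index(isobutane), 3))  # 1.732
```

 The branched isomer gives the smaller value, which is the kind of
 size/branching information one would try to correlate with retention.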
 ------------
    from "Tamas Gunda" <tamasgunda-0at0-tigris.klte.hu>
 This problem is similar to one in drug research: how to describe the
 shape of a molecule or substituent. One of the early approaches (I mean
 before the CoMFA and 3D-QSAR era) was the use of Verloop's sterimol
 parameters, which characterize the shape of the molecule. My program
 Mol2mol can calculate them. To learn more about sterimol, see
 www.compuchem.com/m2m1.htm; at the end of that page there is a summary
 of sterimol in a nutshell.
 -----------------------
 from D.J. Livingstone  e-mail davel-0at0-chmqst.demon.co.uk
 I think you will find that quite a lot has already been done by
 others in modelling retention times (of all sorts) often using
 hydrophobicity descriptors of one form or another (retention times
 are frequently used as hydrophobicity descriptors themselves).  I
 guess you need to do a search.
 As for shape descriptors (and others) I suggest you have a look at
 some QSAR reviews and/or books.  I wrote one in 1991:
 D.J. Livingstone, "Quantitative Structure-Activity Relationships", in:
 Similarity Models in Organic Chemistry, Biochemistry and Related
 Fields (Eds. R.I. Zalewski, T.M. Krygowski & J. Shorter), Elsevier,
 1991, pp 557-627.
 You could also try:
 the book by Hansch & Leo (Exploring QSAR: Fundamentals and
 Applications in Chemistry and Biology, ACS, 1995, ISBN
 0-8412-2987-2), and Chapter 3 (Parameters) of Hugo Kubinyi's book
 (QSAR: Hansch Analysis and Related Approaches, VCH, Weinheim, 1993,
 ISBN 3-527-30035-X). It might also pay to take a look at "Partition
 Coefficient Determination and Estimation" (Eds. W. Dunn, J. Block &
 R. Pearlman), Pergamon, 1986, ISBN ????, although this is perhaps a
 little dated now.
 --------
 from Prof. Luc Morin-Allory
 luc.morin-allory-0at0-univ-orleans.fr
 Have a look at one of the books by Roman Kaliszan:
 "Quantitative Structure-Chromatographic Retention Relationships",
 John Wiley & Sons, 1987.
 It's a rather old book, but you will find all the references that you
 need.
 ----------------------
 From: Gerardo González Aguilar <gerardo-0at0-karin.fmq.uh.edu.cu>
 At the Center for Pharmaceutical Chemistry (Havana), an M.Sc. thesis was
 done on precisely the topic you want to work on, i.e. prediction of the
 retention times (in HPLC) of some compounds using indices developed from
 graph theory (including topological and topographical indices). If this
 work is of interest to you, I can contact the authors to obtain a copy.
 Related work is currently being carried out at the Bioactive Center of
 the University of Las Villas under the guidance of Dr. E. Estrada, a
 referee for various Comp. Chem. journals, and by Dr. Trinajstic. At the
 CQF you can contact Ramon Carrasco, M.Sc., at
 cqf-0at0-ceniai.cu or cqf00-0at0-infomed.sld.cu
 ----------------------------
 from S. Shapiro
 toukie-0at0-zui.unizh.ch
     For your _particular_ purposes I suspect that Kier-Hall molecular
 connectivity descriptors should suffice.  See Rev. Comput. Chem. 2: 367-422
 (1991) and Adv. Drug Res. 22: 1-38 (1992).
 ------------------
 from Gregory L. Durst    email:   gdurst-0at0-dowagro.com
 I can point you to the program by Kier & Hall called "Molconn", which
 calculates topological indices and would be appropriate for the type of
 correlations you describe. Their program is available for Unix and PCs.
 The contact is:
         Dr. Lowell Hall
         Hall Associates Consulting
         2 Davis Street
         Quincy, MA  02170
         USA
         617-773-6350  ext 280
 A publication on an application such as you describe is:
 L.B. Kier & L.H. Hall, J. Pharm. Sci., v68, (1979), 120.
 There is another topological index program called "Polly", by
 Basak, that runs on Unix or PCs. The contact is:
         Dr. Subhash Basak
         Center for Water and the Environment
         University of Minnesota
         5013 Miller Trunk Highway
         Duluth, MN   55811
         USA
         218-720-4279
         email:  sbasak-0at0-ua.d.umn.edu
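 For orientation, a classic example of the kind of topological index
 such programs deal with is the Wiener index: the sum of bond-count
 distances between all pairs of heavy atoms. The sketch below is not
 taken from Molconn or Polly; the function name wiener_index and the
 bond-list input are assumptions made for illustration.

```python
from collections import deque


def wiener_index(bonds):
    """Sum of shortest-path distances over all pairs of heavy atoms.

    bonds: list of (i, j) pairs describing a connected,
    hydrogen-suppressed molecular graph.
    """
    adjacency = {}
    for i, j in bonds:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)
    total = 0
    for start in adjacency:
        # Breadth-first search gives bond-count distances from 'start'.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
    return total // 2  # every pair was counted from both ends


print(wiener_index([(1, 2), (2, 3), (3, 4)]))  # n-butane: 10
print(wiener_index([(1, 2), (2, 3), (2, 4)]))  # isobutane: 9
```

 More compact (branched) skeletons give smaller values than extended
 chains with the same number of carbon atoms.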
 ----------------------------
 from: dr. ANDREA ZALIANI E-mail  andrea-0at0-edith.sublink.org
 Have a look at this:
 G. Bravi, E. Gancia, M. Pegna, P. Mascagni, A. Zaliani, "WHIM-MS, new 3D
 theoretical descriptors derived from molecular surface properties: a
 comparative 3D-QSAR study in a series of steroids", J. Comput.-Aided Mol.
 Des. 11, 79 (1997).
 "Nothing shocks me. I am a scientist."
                                         Indiana Jones
 --------------------
  from: Randy J. Zauhar, PhD      zauhar-0at0-fastrans.net
    I was talking to my collaborator at U. Missouri yesterday, and was
  reminded that he has developed QSAR models to predict retention time
  based on molecular properties! Some of those might explicitly include
  shape descriptors.
     His contact info:
        Prof. Bill Welsh
        Dept. of Chemistry
        U. Missouri - Saint Louis
        wwelsh-0at0-jinx.umsl.edu
     You might send him a message and see if he has references or other
  info. he could provide.
 ----------------------
 from Dr. John Waite, e-mail:  chem8-0at0-york.ac.uk
    Larry Dodd's molecular volume code may be of use to you. Below I
  enclose the header comments from it, plus a table of atomic covalent radii.
    E-mail Larry if you want a copy of the program.
       SUBROUTINE MOLVOL
 c     volume.f - volume determination code
 c
 c     Author: Lawrence R. Dodd <dodd-0at0-roebling.poly.edu>
 c             Doros N. Theodorou <doros-0at0-pylos.cchem.berkeley.edu>
 c     Maintainer: Lawrence R. Dodd <dodd-0at0-roebling.poly.edu>
 c     Created: March 21, 1990
 c     Version: 2.0
 c     Date: 1994/07/22 15:45:51
 c     Keywords: volume and area determination
 c     Time-stamp: <94/07/22 11:02:23 dodd>
 c     Copyright (c) 1990, 1991, 1992, 1993, 1994
 c     by Lawrence R. Dodd and Doros N. Theodorou.
 C---------------------------------------------------------------------C
 C                     Plane Sphere Intersections                      C
 C---------------------------------------------------------------------C
 C     This program will find the total and individual volume and      C
 C     exposed surface area of an arbitrary collection of spheres of   C
 C     arbitrary radii cut by an arbitrary collection of planes        C
 C     analytically by analyzing the plane/sphere intersections.       C
 C---------------------------------------------------------------------C
 C     Algorithm by: Doros N. Theodorou and Lawrence R. Dodd           C
 C     Coded by: L.R. Dodd                                             C
 C---------------------------------------------------------------------C
 C     Created on: March 21, 1990                                      C
 C       Phase 1 Completed on: March 23, 1990                          C
 C       Phase 2 Completed on: April 16, 1990                          C
 C       Phase 3 Completed on: May   17, 1990                          C
 C       Phase 4 Completed on: June   5, 1990                          C
 C       Phase 5 Completed on: July  26, 1990                          C
 C---------------------------------------------------------------------C
 C     Reference:                                                      C
 C                                                                     C
 C       "Analytical treatment of the volume and surface area of       C
 C       molecules formed by an arbitrary collection of unequal        C
 C       spheres intersected by planes"                                C
 C                                                                     C
 C     L.R. Dodd and D.N. Theodorou                                    C
 C     MOLECULAR PHYSICS, Volume 72, Number 6, 1313-1345, April 1991   C
 C---------------------------------------------------------------------C
 C     Acknowledgement:                                                C
 C                                                                     C
 C     LRD wishes to thank his mentor DNT for a stimulating and        C
 C     enjoyable post-doctoral experience.                             C
 C---------------------------------------------------------------------C
 C     General Notes On Program:                                       C
 C                                                                     C
 C     This program has been written with an eye towards both          C
 C     efficiency and clarity. On a philosophical note, many believe   C
 C     that these ideals are mutually exclusive but in general they    C
 C     are not. There are, however, a few instances where one ideal    C
 C     has been given more prominence over the other. The comments in  C
 C     the program, together with the associated journal article,      C
 C     should help to explain any apparent logical leaps in the        C
 C     algorithm.                                                      C
 C                                                                     C
 C     The program was intended to be used as a subroutine called      C
 C     repeatedly by some main program. In this case the subroutine    C
 C     "VOLUME" is called by some main routine which has placed the    C
 C     necessary information in common block /Raw Data/. The answers   C
 C     are returned in common block /Volume Output/. I must apologize  C
 C     for the poor input/output for the program. For example, the     C
 C     area/volume of each sphere is not placed in /Volume Output/.    C
 C                                                                     C
 C     This program was developed on a Sun SPARCstation 330 using Sun  C
 C     FORTRAN 1.3.1 (all trademarks of Sun Microsystems, Inc.). We    C
 C     have used some extensions to the ANSI standard including:       C
 C                                                                     C
 C         o  long variable names (i.e., more than six characters)     C
 C         o  variable names containing the characters '$' and '_'     C
 C         o  END DO used in place of the CONTINUE statement           C
 C         o  DO-WHILE used in place of IF-GOTO constructs             C
 C         o  excessive number of continuation lines in some FORMATs   C
 C         o  generic intrinsic function calls (e.g., SIN for DSIN)    C
 C         o  IMPLICIT NONE statement (needed in development)          C
 C                                                                     C
 C     The advantage of using non-standard FORTRAN is that it makes it C
 C     considerably easier to follow the flow of a program. There are  C
 C     no extraneous statement labels in this program that may have    C
 C     obscured the logic (not a single GOTO was used). The previews   C
 C     of the new F90 standard appear to adopt many of the features    C
 C     already implemented in VMS, Sun, Cray, and IBM FORTRAN.         C
 C                                                                     C
 C     Note that this algorithm is completely parallelizable.          C
 C                                                                     C
 C                           Larry Dodd                                C
 C                           dodd-0at0-mycenae.cchem.berkeley.edu           C
 C                                                                     C
 C                           Department of Chemical Engineering        C
 C                           College of Chemistry                      C
 C                           University of California at Berkeley      C
 C                           Berkeley, California 94720-9989           C
 C                           (415) 643-7691 (LRD)                      C
 C                           (415) 643-8523 (DNT)                      C
 C                           (415) 642-5927 (Lab)                      C
 C                                                                     C
 C                            dodd-0at0-mycenae.cchem.berkeley.edu          C
 C                           doros-0at0-mycenae.cchem.berkeley.edu          C
 C                                                                     C
 C---------------------------------------------------------------------C
 C     Note:                                                           C
 C       Plane_Ordering of common block /Debug/ is, as the name        C
 C       implies, for debugging purposes only as is routine ORDERING.  C
 C       The information contained therein is not needed for solving   C
 C       the sphere plane problem but proved incredibly useful during  C
 C       program development.                                          C
 C---------------------------------------------------------------------C
       BLOCK DATA
 C
       REAL*8 COVRAD,Au
           COMMON/CRADII/
      .               COVRAD(105)
       DATA (COVRAD(I), I = 1, 88) /
 C         H       He        Li      Be
      + 0.320D0, 0.930D0, 1.230D0, 0.900D0,
 C         B        C        N        O        F        Ne
      + 0.820D0, 0.770D0, 0.750D0, 0.730D0, 0.720D0, 0.710D0,
 C         Na       Mg
      + 1.540D0, 1.360D0,
 C         Al       Si       P        S        Cl       Ar
      + 1.180D0, 1.110D0, 1.060D0, 1.020D0, 0.990D0, 0.980D0,
 C         K        Ca
      + 2.030D0, 1.740D0,
 C         Sc       Ti       V        Cr       Mn
      + 1.440D0, 1.320D0, 1.220D0, 1.180D0, 1.170D0,
 C         Fe       Co       Ni       Cu       Zn
      + 1.170D0, 1.160D0, 1.150D0, 1.170D0, 1.250D0,
 C         Ga       Ge       As       Se       Br       Kr
      + 1.260D0, 1.220D0, 1.200D0, 1.160D0, 1.140D0, 1.120D0,
 C         Rb       Sr
      + 2.160D0, 1.910D0,
 C         Y        Zr       Nb       Mo       Tc
      + 1.620D0, 1.450D0, 1.340D0, 1.300D0, 1.270D0,
 C         Ru       Rh       Pd       Ag       Cd
      + 1.250D0, 1.250D0, 1.280D0, 1.340D0, 1.480D0,
 C         In       Sn       Sb       Te       I        Xe
      + 1.440D0, 1.410D0, 1.400D0, 1.360D0, 1.330D0, 1.310D0,
 C         Cs       Ba       La
      + 2.350D0, 1.980D0, 1.690D0,
 C         Ce       Pr       Nd       Pm       Sm       Eu       Gd
      + 1.650D0, 1.650D0, 1.640D0, 1.630D0, 1.620D0, 1.850D0, 1.610D0,
 C         Tb       Dy       Ho       Er       Tm       Yb       Lu
      + 1.590D0, 1.590D0, 1.580D0, 1.570D0, 1.560D0, 1.560D0, 1.560D0,
 C                  Hf       Ta       W        Re
      +          1.440D0, 1.340D0, 1.300D0, 1.280D0,
 C         Os       Ir       Pt       Au       Hg
      + 1.260D0, 1.270D0, 1.300D0, 1.340D0, 1.490D0,
 C         Tl       Pb       Bi       Po       At       Rn
      + 1.480D0, 1.470D0, 1.460D0, 1.460D0, 1.450D0, 0.000D0,
 C         Fr       Ra
      + 0.000D0, 0.000D0/
 C
       DATA (COVRAD(I), I = 89, 105) /
 C
 CMK92 Using the Lanthanides' values is probably the best approximation
 C         Ac
      + 1.690D0,
 C         Th       Pa       U        Np       Pu       Am       Cm
      + 1.650D0, 1.650D0, 1.640D0, 1.630D0, 1.620D0, 1.850D0, 1.610D0,
 C         Bk       Cf       Es       Fm       Md       No       Lr
      + 1.590D0, 1.590D0, 1.580D0, 1.570D0, 1.560D0, 1.560D0, 1.560D0,
      + 2 * 0.000D0/
 C
       E N D
 ---------
 --
 _____________________________________
 Dr. Norman Goldberg   (N.Goldberg-0at0-tu-bs.de)
 Technische Universitaet Braunschweig
 Institut fuer Organische Chemie
 Hagenring 30
 D-38106 Braunschweig (FRG)
 Tel.: +(0)531-391-5312
 Fax : +(0)531-391-5388
 http://www.tu-bs.de/institute/org-chem/goldberg/WELCOME.htm