3D substructure searching



Although some people have done work in searching for motifs in 3D protein
 databases, the problem of finding a given substructure still falls in the
 domain of smaller-molecule 3D substructure searching.  A review of that
 field can be found in Martin, Bures, and Willett, "Searching Databases of
 3D Structures", in Boyd and Lipkowitz, eds., Reviews in Computational
 Chemistry, Vol. 1, VCH, 1990, 213-264.  MDL, for whom I work, has two
 programs which
 do 3D substructure searching over user-created 3D databases - MACCS-3D and
 ISIS-3D (which is distributed).  These allow searching for 2D and 3D
 substructure fragments, which can also be constrained by distances, angles,
 dihedrals, planes, lines, normals, and exclusion spheres.  They allow
 static and conformationally flexible searching (via torsional optimization
 and relaxation).  All that is the good news.... the bad news is that they
 are limited to 256 heavy atoms - fine for drug companies, but not for
 PDB-sized files.  We actually have someone here (Steve Muskal, formerly
 with Sung Ho Kim at UCB), who developed an application where he stored
 alpha carbon backbones only, and searched for motifs using our regular
 3D substructure searching capability.  He can be reached at SteveM -8 at 8-
 molecular.com
 I believe your company has already opened some kind of relationship with
 MDL, though I doubt you have any of our programs in-house.  At present, the
 other programs I know of that do 3D substructure searching (SSS) are
 designed mainly for drug-molecule searching, as ours are.  These programs
 include:
 Aladdin (Daylight Chemical Information systems)
 Sybyl-3DB Unity (Tripos)
 ChemDBS-3D (Chemical Design)
 Cambridge Crystallographic DB and software
 Caveat (Paul Bartlett, UC Berkeley)
 Catalyst (Biocad)
 DOCK (Tack Kuntz, UCSF)
 A few academics have worked in the area - most notable is Peter Willett -
 a literature search on his name will yield a few references  describing
 his protein searching software.
 I can be reached at:
 Doug Henry
 MDL
 2132 Farallon Drive
 San Leandro, CA  94577
 (510) 895-1313 xt 1316
 dough -8 at 8- molecular.com
 Date:    Mon, 26 Jul 1993 17:29:15 +0300 (MET-DST)
 From: GERARD -8 at 8- XRAY.BMC.UU.SE (Gerard Kleijwegt a.k.a. gerard -8 at 8-
 xray.bmc.uu.se)
 Subject: Re: PDB searching
 I'm sending you the manual as the next mail.
 In order to run DEJAVU you would need to have 'O' (successor to FRODO),
 although it's not strictly necessary.
 If you have a particular protein, I could run the program for you if
 you send me a PDB file.
 If you want to obtain the software, please contact Prof Alwyn Jones
 (e-mail: "alwyn -8 at 8- xray.bmc.uu.se") first (he does the licensing
 stuff).
 If you want to inform others, no problemo.
 Note that the software is free for academic users with a valid O-license;
 an O-license can be obtained from prof Jones.  For non-academic users
 there is a charge.
 Stand by for the manual ...
 --Gerard
 (I did not forward the manual to the list due to its length.  If you are
 interested, I will forward it to individuals - mwd -8 at 8- carina.cray.com.)
  Gerard Kleywegt
  Dept. of Molecular Biology
  Biomedical Centre
  University of Uppsala
  DEJAVU will take a description of the secondary structure elements
  that occur in your particular protein and compare it to a huge
  database of secondary structure elements that occur in protein
  structures that have been published as PDB files.
  What's the basic idea ?  A MOTIF of secondary structure elements
  (henceforth abbreviated "SSEs") consists of N SSEs, each of
  which comprises M(i) residues and has a length of L(i) Angstrom
  (measured from the first residue's Calpha to that of the last
  residue), and which is characterised by a matrix D(i,j) which
  contains the centre-to-centre distances (for example) and by
  another matrix C(i,j) which contains the cosines of the angles
  made by the direction vectors of the individual elements (the
  direction vector goes FROM the N-terminal Calpha TO the C-terminal
  one).  Finding a motif in the database that is SIMILAR to that
  which occurs in your protein then comes down to finding suitable
  collections of N SSEs in the structures of other proteins which
  have approximately the same numbers of residues, the same lengths
  and comparable mutual distances and direction-vector cosines.
  And that is ALL there is to it !
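
[Editorial illustration, not part of the DEJAVU manual.]  The description
above can be sketched in a few lines of Python.  This is a minimal sketch
under stated assumptions, not DEJAVU's actual code: the SSE class, the
function names and every tolerance value are hypothetical, and the two
motifs are assumed to list their SSEs in corresponding order (a real
search would have to try many candidate combinations of N SSEs).

```python
import math
from dataclasses import dataclass


@dataclass
class SSE:
    """One secondary structure element (hypothetical container)."""
    n_res: int        # M(i): number of residues
    first_ca: tuple   # Calpha coordinates of the first residue (Angstrom)
    last_ca: tuple    # Calpha coordinates of the last residue (Angstrom)

    @property
    def length(self):
        # L(i): distance from the first Calpha to the last Calpha
        return math.dist(self.first_ca, self.last_ca)

    @property
    def centre(self):
        return tuple((a + b) / 2 for a, b in zip(self.first_ca, self.last_ca))

    @property
    def direction(self):
        # Unit vector FROM the N-terminal Calpha TO the C-terminal one
        return tuple((b - a) / self.length
                     for a, b in zip(self.first_ca, self.last_ca))


def distance_matrix(sses):
    """D(i,j): centre-to-centre distances."""
    return [[math.dist(a.centre, b.centre) for b in sses] for a in sses]


def cosine_matrix(sses):
    """C(i,j): cosines of the angles between direction vectors."""
    return [[sum(x * y for x, y in zip(a.direction, b.direction))
             for b in sses] for a in sses]


def motifs_match(query, cand, max_dres=3, max_dlen=5.0,
                 max_ddist=3.0, max_dcos=0.3):
    """True if cand is SIMILAR to query; all tolerances are made-up values."""
    if len(query) != len(cand):
        return False
    for q, c in zip(query, cand):
        if abs(q.n_res - c.n_res) > max_dres:
            return False
        if abs(q.length - c.length) > max_dlen:
            return False
    dq, dc = distance_matrix(query), distance_matrix(cand)
    cq, cc = cosine_matrix(query), cosine_matrix(cand)
    n = len(query)
    return all(abs(dq[i][j] - dc[i][j]) <= max_ddist
               and abs(cq[i][j] - cc[i][j]) <= max_dcos
               for i in range(n) for j in range(i + 1, n))


# Two parallel elements, a translated copy, and a copy with one element reversed
query = [SSE(10, (0, 0, 0), (15, 0, 0)), SSE(8, (0, 10, 0), (12, 10, 0))]
shifted = [SSE(s.n_res,
               tuple(c + 5 for c in s.first_ca),
               tuple(c + 5 for c in s.last_ca)) for s in query]
flipped = [query[0], SSE(8, query[1].last_ca, query[1].first_ca)]

print(motifs_match(query, shifted))   # translation changes neither D nor C
print(motifs_match(query, flipped))   # reversing one element flips its cosine
```

Because D and C depend only on relative geometry, the comparison needs no
superposition of the two structures.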
  (1) CONTENTS
  ============
   1 - contents
   2 - introduction
   3 - user input files
   4 - running the program
   5 - finding a motif
   6 - analysing the results
   7 - a realistic example
   8 - automatic creation of input files
   9 - detailed analysis of results on cro
  10 - miscellaneous
  11 - release notes
  12 - select option
  13 - incremental search example
  14 - topology option
  15 - installing the software
  16 - running the software
 Date: Mon, 26 Jul 1993 15:30:06 -0500 (CDT)
 From: David Larson <larson -8 at 8- iaf.uiowa.edu>
 I am using Sybyl 6.0, Tripos Associates, on Silicon Graphics workstations,
 and their latest release includes an interface to the program PROTEP.
 Here is their blurb on PROTEP from the Help utility:
 PROTEP
   PROtein Topographic Exploration Programs
   PROTEP is a powerful method for finding relationships among protein
   structures.  It performs similarity searches on the entire Brookhaven
   Protein Data Bank in a matter of minutes (on an SGI 4d/35).  It was
   developed at Sheffield University by Professor Peter Willett, Dr. Pete
   Artymiuk and Dr. David Rice. PROTEP brings the formidable power of the
   branch of mathematics called Graph Theory to bear on the problem of
   ultra rapid 3D structural comparison.
   The PROTEP command gives you access to the various functions of the
   PROTEP software that can be run from within SYBYL.  It also allows you
   to display the results of a PROTEP Motif Search at the SYBYL display.
 Tripos Associates
 800-323-2960
 Hope this helps.
 Best regards,
 Dave Larson
 -------------------------------------------------------------------------------
 Dave Larson				| Image Analysis Facility, 70 EMRB
 University of Iowa			| Iowa City, IA 52242
 larson -8 at 8- caesar.iaf.uiowa.edu		| (319) 335-7900
 -------------------------------------------------------------------------------
 Thanks to: tucker -8 at 8- ERE.UMontreal.CA (Carrington Tucker)  I found out
 about Paul A. Bartlett:
 Date: Mon, 26 Jul 93 16:25:37 -0700
 From: paul -8 at 8- fire.cchem.berkeley.edu (Paul A. Bartlett)
 Dear Mark,
 Yes, you have the right Bartlett.  We have developed a program for
 identifying 3D similarities between molecules, and that can be used on
 structures within the PDB. We originally developed the program CAVEAT to
 assist in the structure-based drug design process, to help chemists
 identify templates and molecule fragments that could be used to hold
 functional groups or chains in a particular orientation, but there are
 many other applications for it.  In particular, I would be very excited
 to see someone apply it to searches in the PDB; it is ideally suited for
 certain types of searches, but we simply haven't identified a specific
 problem yet to apply it to.
 CAVEAT was developed specifically for searching for structural
 similarities that can be defined as a relationship between bonds and
 their orientations, in contrast to the more traditional search
 algorithms that are based on relationships between atoms and their
 distances.  Basically, for a given set of bonds in a molecule (user-
 selected), which we treat as vectors, we calculate the relationship
 between all the pairs and incorporate this information in what we call a
 "vector database".  This database is in reality an index to the source
 database, with the molecules characterized by bond relationships instead
 of atomic coordinates and connectivities. It is then a relatively quick
 process to search for molecules that have in common a particular
 relationship among bonds.
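
[Editorial illustration, not part of the original message.]  The idea of
a "vector database" can be sketched as follows.  This is only a sketch
under stated assumptions, not CAVEAT's actual data structures: the
descriptor used here (distance between the two bond tails, two angles
and a dihedral) is one possible way to describe the relationship between
a pair of bond vectors, and all function names and tolerances are
hypothetical.  It assumes the two bonds of a pair start at different
atoms.

```python
import math
from itertools import combinations


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def _angle(a, b):
    # Angle in degrees between two unit vectors, clamped against rounding
    return math.degrees(math.acos(max(-1.0, min(1.0, _dot(a, b)))))


def pair_descriptor(bond1, bond2):
    """Each bond is (tail_xyz, head_xyz).  Returns a rotation- and
    translation-invariant tuple (distance, angle_lo, angle_hi, dihedral)."""
    (t1, h1), (t2, h2) = bond1, bond2
    w = _sub(t2, t1)
    dist = math.sqrt(_dot(w, w))
    axis = _unit(w)                              # tail1 -> tail2
    u1, u2 = _unit(_sub(h1, t1)), _unit(_sub(h2, t2))
    a1 = _angle(u1, axis)
    a2 = _angle(u2, tuple(-x for x in axis))
    # Dihedral h1-t1-t2-h2: project both bonds onto the plane normal to axis
    v1 = _sub(u1, tuple(_dot(u1, axis) * x for x in axis))
    v2 = _sub(u2, tuple(_dot(u2, axis) * x for x in axis))
    dihedral = math.degrees(math.atan2(_dot(_cross(axis, v1), v2),
                                       _dot(v1, v2)))
    # Sorting the two angles makes the result independent of bond order
    return (dist, min(a1, a2), max(a1, a2), dihedral)


def build_vector_database(molecules):
    """molecules: {name: [bond, ...]} -> list of (descriptor, name)."""
    return [(pair_descriptor(b1, b2), name)
            for name, bonds in molecules.items()
            for b1, b2 in combinations(bonds, 2)]


def search(vector_db, query, dist_tol=0.5, ang_tol=10.0):
    """Names of molecules holding a bond pair similar to the query descriptor."""
    qd, qa, qb, qphi = query
    hits = set()
    for (d, a, b, phi), name in vector_db:
        dphi = abs((phi - qphi + 180.0) % 360.0 - 180.0)   # wrap at +/-180
        if (abs(d - qd) <= dist_tol and abs(a - qa) <= ang_tol
                and abs(b - qb) <= ang_tol and dphi <= ang_tol):
            hits.add(name)
    return sorted(hits)


# Query pair; "B" is the same pair rotated and translated, "C" its mirror image
query_bonds = [((0, 0, 0), (1, 0, 0)), ((0, 3, 0), (0, 3, 1))]
database = build_vector_database({
    "B": [((10, -2, 4), (10, -1, 4)), ((7, -2, 4), (7, -2, 5))],
    "C": [((0, 0, 0), (1, 0, 0)), ((0, 3, 0), (0, 3, -1))],
})
hits = search(database, pair_descriptor(*query_bonds))
print(hits)
```

The linear scan in search() is only for clarity; a practical index would
presumably sort or hash the descriptors so that lookups stay quick as the
source database grows.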
 For specific application to protein structures, we have made a variety
 of vector databases.  For example, in one instance, we have taken all
 the pair-wise combinations of C-alpha to C-beta bonds, within a
 specified radius (e.g., 20 Angstroms).  We can define almost any type of
 secondary structure through a combination of such vectors, and then look
 through the PDB for other proteins that may have the same structural
 element, independent of the identity of the amino acid side chains
 themselves or their connectivity (the relationship need not involve
 contiguous residues).  Or we can define a vector database with the
 C-beta to C-gamma bonds, and look for similarities between side-chain
 conformers.  A database constructed from carbonyl C=O bonds might be
 useful for probing backbone similarities, etc.
 There also exists the opportunity for flexible definition of tolerances;
 for example, in a single search, one can look for an alpha-helix
 (perhaps 3-5 side-chains, tightly defined) and a beta-sheet (similarly,
 a few side-chains with limited tolerance) and a particular relationship
 between them (pairing vectors in one with vectors from another with
 looser tolerances).
 The searches are relatively quick, typically less than a minute or two
 on an SGI R3000 Indigo (sorry, a Cray is not necessary), and the program
 can handle any number and combination of vectors.
 In spite of its origin, there are many ways to use CAVEAT; indeed, I'm
 sure there are a lot that we haven't thought of yet! Fundamentally,
 any search that you can define as a 3-dimensional relationship between
 bonds can probably be cast as a CAVEAT search.
 CAVEAT, as well as the companion program CLASS (that further screens and
 clusters CAVEAT hits) and TRIAD and ILIAD (which are 3D databases
 representing comprehensive collections of computed, minimized
 structures) are available through license from the University.
 I have probably gone on longer than you need (or anticipated...), but I
 am happy to provide additional information if you wish. You know my e-
 mail address, and you can reach me by phone (510-642-1259) or FAX (642-
 1454).
 Cheers,
 Paul
 From: A Poirrette <A.Poirrette -8 at 8- sheffield.ac.uk>
 Date: Tue, 27 Jul 93 15:34:26 BST
 Mark,
 Tripos Associates Inc. market a program called PROTEP that we developed here
 in Sheffield. This allows substructure searches based on protein
 secondary structure elements. You input a pattern of secondary structures
 in 3-D space, and a complete search of the PDB takes about 5-10 mins
 for a substructure search or 20 mins for a maximal common subgraph
 search, where a user-specified number of secondary structure elements
 must be in common. Searches can be performed with whole proteins
 or just elements extracted from proteins (or made-up patterns). The search
 is independent of sequence, although you can specify that the hits must
 be in the same sequence order (for strands in a sheet, for example).
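
[Editorial illustration, not part of the original message.]  A
substructure search over secondary structure elements can be pictured as
labelled-subgraph matching: nodes are elements, edges carry their
relative geometry.  The sketch below uses plain backtracking and is not
meant to reproduce PROTEP's own algorithm; the graph encoding (element
types "H"/"E", edges labelled with an inter-axis angle in degrees and a
distance in Angstrom), the function name and the tolerances are all
assumptions.  It covers only the substructure case, not the maximal
common subgraph search.

```python
def find_matches(p_types, p_edges, t_types, t_edges,
                 ang_tol=20.0, dist_tol=2.0, keep_order=False):
    """Return every mapping of pattern elements onto target elements.

    *_types: list of element types, numbered in sequence order.
    *_edges: {(i, j): (angle_deg, distance)} with i < j.
    A result (a, b, ...) maps pattern element 0 to target a, 1 to b, ...
    """
    def edge(edges, i, j):
        return edges.get((min(i, j), max(i, j)))

    def compatible(mapping, p, t):
        if p_types[p] != t_types[t] or t in mapping:
            return False
        for q, u in enumerate(mapping):       # q already mapped to target u
            if keep_order and u > t:
                return False                  # would break sequence order
            pe = edge(p_edges, q, p)
            if pe is None:
                continue                      # pattern sets no constraint here
            te = edge(t_edges, u, t)
            if (te is None or abs(pe[0] - te[0]) > ang_tol
                    or abs(pe[1] - te[1]) > dist_tol):
                return False
        return True

    def extend(mapping):
        p = len(mapping)
        if p == len(p_types):
            yield tuple(mapping)
            return
        for t in range(len(t_types)):
            if compatible(mapping, p, t):
                yield from extend(mapping + [t])

    return list(extend([]))


# Made-up target: helix, strand, strand, helix
target_types = ["H", "E", "E", "H"]
target_edges = {(0, 1): (90.0, 10.0), (0, 2): (90.0, 12.0),
                (1, 2): (160.0, 5.0), (0, 3): (30.0, 9.0),
                (1, 3): (80.0, 11.0), (2, 3): (85.0, 14.0)}

# Pattern: two roughly antiparallel strands about 4.5 Angstrom apart
pattern_types = ["E", "E"]
pattern_edges = {(0, 1): (165.0, 4.5)}

any_order = find_matches(pattern_types, pattern_edges,
                         target_types, target_edges)
in_order = find_matches(pattern_types, pattern_edges,
                        target_types, target_edges, keep_order=True)
print(any_order)   # the strand pair is found in both orientations
print(in_order)    # only the mapping that preserves sequence order
```

With keep_order=True the mapping must be increasing, which corresponds to
requiring that the hits occur in the same sequence order as the pattern.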
 We are also testing a residue-based version of the above that allows
 similar searches, but this time for patterns of residues in 3-D space,
 so you can search for active sites etc. This is working fine but is not
 yet marketed. Search times are similar to the above on an SGI 3000-type
 machine.
 Also Oxford Molecular market IDITIS - developed at Birkbeck College, London
 under Janet Thornton - this is a relational database-type product that
 is totally different to our approach but may do what you want.
 Also a program, WHATIF, by Vriend at EMBL, Heidelberg, may do what you want.
 Hope this helps
 Andrew Poirrette
 Department of Information Studies
 University of Sheffield
 UK
 --
 Mark Dalton                   AUG-GCU-AGA-AAG                  H
 Cray Research, Inc.           M   A   R   K                    |
 Eagan, MN 55121                                  CH3-S-CH2-CH2-C-COOH
 Internet: mwd -8 at 8- cray.com                                         |
 (612)683-3035                                                  NH2