From DOUGH-!at!-molecular.com Mon Jul 26 12:49:03 1993
Subject: 3D substructure searching

Although some people have done work in searching for motifs in 3D protein databases, the problem of finding a given substructure still falls in the domain of smaller-molecule 3D substructure searching. A review of that field can be found in Martin, Bures, and Willett, "Searching Databases of 3D Structures", in Boyd and Lipkowitz, ed, Reviews in Computational Chemistry, Vol 1, VCH, 1990, 213-264.

MDL, for whom I work, has two programs which do 3D substructure searching over user-created 3D databases - MACCS-3D and ISIS-3D (which is distributed). These allow searching for 2D and 3D substructure fragments, which can also be constrained by distances, angles, dihedrals, planes, lines, normals, and exclusion spheres. They allow static and conformationally flexible searching (via torsional optimization and relaxation).

All that is the good news.... the bad news is that they are limited to 256 heavy atoms - fine for drug companies, but not for PDB-sized files.

We actually have someone here (Steve Muskal, formerly with Sung Ho Kim at UCB), who developed an application where he stored alpha carbon backbones only, and searched for motifs using our regular 3D substructure searching capability. He can be reached at SteveM _-at-_)molecular.com

I believe your company has already opened some kind of relationship with MDL, though I doubt you have any of our programs in-house.

At present, the main other programs I know of which do 3D SSS are mainly designed for drug-molecule searching, like ours is.
These programs include:

    Aladdin (Daylight Chemical Information systems)
    Sybyl-3DB Unity (Tripos)
    ChemDBS-3D (Chemical Design)
    Cambridge Crystallographic DB and software
    Caveat (Paul Bartlett, UC Berkeley)
    Catalyst (Biocad)
    DOCK (Tak Kuntz, UCSF)

A few academics have worked in the area - most notable is Peter Willett - a literature search on his name will yield a few references describing his protein searching software.

I can be reached at:

Doug Henry
MDL
2132 Farallon Drive
San Leandro, CA 94577
(510) 895-1313 xt 1316
dough.,at,.molecular.com


Date: Mon, 26 Jul 1993 17:29:15 +0300 (MET-DST)
From: GERARD # - at - # XRAY.BMC.UU.SE (Gerard Kleijwegt a.k.a. gerard # - at - # xray.bmc.uu.se)
Subject: Re: PDB searching

I'm sending you the manual as the next mail. In order to run DEJAVU you would need to have 'O' (successor to FRODO), although it's not strictly necessary. If you have a particular protein, I could run the program for you if you send me a PDB file.

If you want to obtain the software, please contact Prof Alwyn Jones (e-mail: "alwyn _-at-_)xray.bmc.uu.se") first (he does the licensing stuff). If you want to inform others, no problemo. Note that the software is free for academic users with a valid O-license; an O-license can be obtained from Prof Jones. For non-academic users there is a charge.

Stand by for the manual ...

--Gerard

(I did not forward the manual to the list due to its length; if you are interested I will forward it to individuals - mwd*- at -*carina.cray.com).

Gerard Kleywegt
Dept. of Molecular Biology
Biomedical Centre
University of Uppsala

DEJAVU will take a description of the secondary structure elements that occur in your particular protein and compare it to a huge database of secondary structure elements that occur in protein structures that have been published as PDB files.

What's the basic idea?
A MOTIF of secondary structure elements (henceforth abbreviated "SSEs") consists of N SSEs, each of which comprises M(i) residues and has a length of L(i) Angstrom (measured from the first residue's Calpha to that of the last residue), and which is characterised by a matrix D(i,j) which contains the centre-to-centre distances (for example) and by another matrix C(i,j) which contains the cosines of the angles made by the direction vectors of the individual elements (the direction vector goes FROM the N-terminal Calpha TO the C-terminal one).

Finding a motif in the database that is SIMILAR to that which occurs in your protein then comes down to finding suitable collections of N SSEs in the structures of other proteins which have approximately the same numbers of residues, the same lengths and comparable mutual distances and direction-vector cosines. And that is ALL there is to it!

(1) CONTENTS
============

     1 - contents
     2 - introduction
     3 - user input files
     4 - running the program
     5 - finding a motif
     6 - analysing the results
     7 - a realistic example
     8 - automatic creation of input files
     9 - detailed analysis of results on cro
    10 - miscellaneous
    11 - release notes
    12 - select option
    13 - incremental search example
    14 - topology option
    15 - installing the software
    16 - running the software


Date: Mon, 26 Jul 1993 15:30:06 -0500 (CDT)
From: David Larson

I am using Sybyl 6.0, Tripos Associates, on Silicon Graphics workstations, and their latest release includes an interface to the program PROTEP. Here is their blurb on PROTEP from the Help utility:

PROTEP
PROtein Topographic Exploration Programs

PROTEP is a powerful method for finding relationships among protein structures. It performs similarity searches on the entire Brookhaven Protein Data Bank in a matter of minutes (on an SGI 4d/35). It was developed at Sheffield University by Professor Peter Willett, Dr. Pete Artymiuk and Dr. David Rice.
PROTEP brings the formidable power of the branch of mathematics called Graph Theory to bear on the problem of ultra rapid 3D structural comparison.

The PROTEP command gives you access to the various functions of the PROTEP software that can be run from within SYBYL. It also allows you to display the results of a PROTEP Motif Search at the SYBYL display.

Tripos Associates 800-323-2960

Hope this helps.

Best regards,
Dave Larson
-------------------------------------------------------------------------------
Dave Larson                        | Image Analysis Facility, 70 EMRB
University of Iowa                 | Iowa City, IA 52242
larson |-at-| caesar.iaf.uiowa.edu | (319) 335-7900
-------------------------------------------------------------------------------


Thanks to: tucker %-% at %-% ERE.UMontreal.CA (Carrington Tucker) I found out about Paul A. Bartlett:

Date: Mon, 26 Jul 93 16:25:37 -0700
From: paul -x- at -x- fire.cchem.berkeley.edu (Paul A. Bartlett)

Dear Mark,

Yes, you have the right Bartlett. We have developed a program for identifying 3D similarities between molecules, and that can be used on structures within the PDB. We originally developed the program CAVEAT to assist in the structure-based drug design process, to help chemists identify templates and molecule fragments that could be used to hold functional groups or chains in a particular orientation, but there are many other applications for it. In particular, I would be very excited to see someone apply it to searches in the PDB; it is ideally suited for certain types of searches, but we simply haven't identified a specific problem yet to apply it to.

CAVEAT was developed specifically for searching for structural similarities that can be defined as a relationship between bonds and their orientations, in contrast to the more traditional search algorithms that are based on relationships between atoms and their distances.
Basically, for a given set of bonds in a molecule (user-selected), which we treat as vectors, we calculate the relationship between all the pairs and incorporate this information in what we call a "vector database". This database is in reality an index to the source database, with the molecules characterized by bond relationships instead of atomic coordinates and connectivities. It is then a relatively quick process to search for molecules that have in common a particular relationship among bonds.

For specific application to protein structures, we have made a variety of vector databases. For example, in one instance, we have taken all the pair-wise combinations of C-alpha to C-beta bonds, within a specified radius (e.g., 20 Angstroms). We can define almost any type of secondary structure through a combination of such vectors, and then look through the PDB for other proteins that may have the same structural element, independent of the identity of the amino acid side chains themselves or their connectivity (the relationship need not involve contiguous residues). Or we can define a vector database with the C-beta to C-gamma bonds, and look for similarities between side-chain conformers. A database constructed from carbonyl C=O bonds might be useful for probing backbone similarities, etc.

There also exists the opportunity for flexible definition of tolerances; for example, in a single search, one can look for an alpha-helix (perhaps 3-5 side-chains, tightly defined) and a beta-sheet (similarly, a few side-chains with limited tolerance) and a particular relationship between them (pairing vectors in one with vectors from another with looser tolerances).

The searches are relatively quick, typically less than a minute or two on an SGI R3000 Indigo (sorry, a Cray is not necessary), and the program can handle any number and combination of vectors.

In spite of its origin, there are many ways to use CAVEAT; indeed, I'm sure there are a lot that we haven't thought of yet!
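
As a rough illustration of the vector-database idea described above, the following Python sketch treats each bond as a vector, records a descriptor for every pair of bonds lying within a cutoff radius, and then looks up pairs that match a target within tolerances. It is only a sketch under stated assumptions: the descriptor (distance between bond origins plus the angle between bond directions), the function names and the default tolerances are all hypothetical choices made for this example, and are not taken from CAVEAT itself.

```python
import math
from itertools import combinations


def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))


def pair_descriptor(bond_a, bond_b):
    """Return (origin-to-origin distance, angle in degrees between directions).

    Each bond is a (start_xyz, end_xyz) pair, e.g. (C-alpha, C-beta).
    This two-number descriptor is an assumption made for illustration.
    """
    (a0, a1), (b0, b1) = bond_a, bond_b
    da, db = _sub(a1, a0), _sub(b1, b0)
    cos_t = sum(x * y for x, y in zip(da, db)) / (_norm(da) * _norm(db))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
    return _norm(_sub(b0, a0)), math.degrees(math.acos(cos_t))


def build_vector_index(bonds, radius=20.0):
    """Index every bond pair whose origins lie within `radius` Angstroms."""
    index = {}
    for (i, a), (j, b) in combinations(enumerate(bonds), 2):
        dist, angle = pair_descriptor(a, b)
        if dist <= radius:
            index[(i, j)] = (dist, angle)
    return index


def search(index, target, dist_tol=0.5, angle_tol=10.0):
    """Return the bond pairs whose descriptor matches `target` within tolerance."""
    t_dist, t_angle = target
    return sorted(
        pair
        for pair, (dist, angle) in index.items()
        if abs(dist - t_dist) <= dist_tol and abs(angle - t_angle) <= angle_tol
    )
```

Calling build_vector_index once per structure and then search with a (distance, angle) target mirrors the "index first, query quickly afterwards" design of the description; loosening dist_tol and angle_tol for some pairs corresponds to the flexible tolerances mentioned there.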
Fundamentally, any search that you can define as a 3-dimensional relationship between bonds can probably be cast as a CAVEAT search.

CAVEAT, as well as the companion program CLASS (that further screens and clusters CAVEAT hits) and TRIAD and ILIAD (which are 3D databases representing comprehensive collections of computed, minimized structures) are available through license from the University.

I have probably gone on longer than you need (or anticipated...), but I am happy to provide additional information if you wish. You know my e-mail address, and you can reach me by phone (510-642-1259) or FAX (642-1454).

Cheers,
Paul


From: A Poirrette
Date: Tue, 27 Jul 93 15:34:26 BST

Mark,

TRIPOS ASSOCIATES INC. market a program called PROTEP that we developed here in Sheffield. This allows substructure searches based on protein secondary structure elements. You input a pattern of secondary structures in 3-D space and a complete search of the PDB takes about 5-10 mins for a substructure search or 20 mins for a maximal common subgraph search (where a user-specified number of secondary structure elements must be in common).

 - searches can be performed with whole proteins or just elements extracted from proteins (or made up patterns)
 - search is independent of sequence, although you can specify that the hits must be in the same sequence order - for strands in a sheet, for example.

We are also testing a residue-based version of the above that allows similar searches but this time for patterns of residues in 3-D space so you can search for active sites etc. - this is working fine, but not yet marketed - search times are similar to above - on an SGI 3000 type machine.

Also Oxford Molecular market IDITIS - developed at Birkbeck College, London under Janet Thornton - this is a relational database-type product that is totally different to our approach but may do what you want.

Also a program, WHATIF, by Vriend at EMBL, Heidelberg, may do what you want.
Hope this helps

Andrew Poirrette
Department of Information Studies
University of Sheffield
UK

--
Mark Dalton                     AUG-GCU-AGA-AAG                 H
Cray Research, Inc.              M   A   R   K                  |
Eagan, MN 55121                                   CH3-S-CH2-CH2-C-COOH
Internet: mwd(+ at +)cray.com                                   |
(612)683-3035                                                   NH2