3D substructure searching
Although some people have done work in searching for motifs in 3D protein
databases, the problem of finding a given substructure still falls in the
domain of smaller-molecule 3D substructure searching. A review of that
field can be found in Martin, Bures, and Willett, "Searching Databases of
3D Structures", in Boyd and Lipkowitz, eds., Reviews in Computational
Chemistry, Vol. 1, VCH, 1990, 213-264. MDL, for whom I work, has two
programs which
do 3D substructure searching over user-created 3D databases - MACCS-3D and
ISIS-3D (which is distributed). These allow searching for 2D and 3D
substructure fragments, which can also be constrained by distances, angles,
dihedrals, planes, lines, normals, and exclusion spheres. They allow
static and conformationally flexible searching (via torsional optimization
and relaxation). That is the good news. The bad news is that they
are limited to 256 heavy atoms - fine for drug companies, but not for
PDB-sized files. We actually have someone here (Steve Muskal, formerly
with Sung Ho Kim at UCB), who developed an application where he stored
alpha carbon backbones only, and searched for motifs using our regular
3D substructure searching capability. He can be reached at SteveM -8 at 8-
molecular.com
I believe your company has already opened some kind of relationship with
MDL, though I doubt you have any of our programs in-house. At present, the
other programs I know of which do 3D substructure searching (SSS) are,
like ours, designed mainly for drug-molecule searching. These programs include:
Aladdin (Daylight Chemical Information Systems)
Sybyl-3DB Unity (Tripos)
ChemDBS-3D (Chemical Design)
Cambridge Crystallographic DB and software
Caveat (Paul Bartlett, UC Berkeley)
Catalyst (Biocad)
DOCK (Tack Kuntz, UCSF)
A few academics have worked in the area - most notable is Peter Willett -
a literature search on his name will yield a few references describing
his protein searching software.
I can be reached at:
Doug Henry
MDL
2132 Farallon Drive
San Leandro, CA 94577
(510) 895-1313 xt 1316
dough -8 at 8- molecular.com
Date: Mon, 26 Jul 1993 17:29:15 +0300 (MET-DST)
From: GERARD -8 at 8- XRAY.BMC.UU.SE (Gerard Kleijwegt a.k.a. gerard -8 at 8-
xray.bmc.uu.se)
Subject: Re: PDB searching
I'm sending you the manual as the next mail.
In order to run DEJAVU you would need to have 'O' (successor to FRODO),
although it's not strictly necessary.
If you have a particular protein, I could run the program for you if
you send me a PDB file.
If you want to obtain the software, please contact Prof Alwyn Jones
(e-mail: "alwyn -8 at 8- xray.bmc.uu.se") first (he does the licensing
stuff).
If you want to inform others, no problemo.
Note that the software is free for academic users with a valid O-license;
an O-license can be obtained from Prof Jones. For non-academic users
there is a charge.
Stand by for the manual ...
--Gerard
(I did not forward the manual to the list due to its length. If you are
interested, I will forward it to individuals - mwd -8 at 8- carina.cray.com.)
Gerard Kleywegt
Dept. of Molecular Biology
Biomedical Centre
University of Uppsala
DEJAVU will take a description of the secondary structure elements
that occur in your particular protein and compare it to a huge
database of secondary structure elements that occur in protein
structures that have been published as PDB files.
What's the basic idea? A MOTIF of secondary structure elements
(henceforth abbreviated "SSEs") consists of N SSEs, each of
which comprises M(i) residues and has a length of L(i) Angstrom
(measured from the first residue's Calpha to that of the last
residue), and which is characterised by a matrix D(i,j) which
contains the centre-to-centre distances (for example) and by
another matrix C(i,j) which contains the cosines of the angles
made by the direction vectors of the individual elements (the
direction vector goes FROM the N-terminal Calpha TO the C-terminal
one). Finding a motif in the database that is SIMILAR to that
which occurs in your protein then comes down to finding suitable
collections of N SSEs in the structures of other proteins which
have approximately the same numbers of residues, the same lengths
and comparable mutual distances and direction-vector cosines.
And that is ALL there is to it!
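
The comparison described above can be sketched roughly as follows. This is
only an illustration of the idea, not DEJAVU's actual code or file format:
the function names, the dictionary layout and all tolerance values are
assumptions made up for the example, and the brute-force loop over ordered
selections is only practical for small numbers of SSEs.

```python
import math
from itertools import permutations

def make_sse(n_res, first_ca, last_ca):
    """One SSE: residue count plus first and last C-alpha coordinates."""
    return {"n_res": n_res, "first": first_ca, "last": last_ca}

def sse_length(e):
    # L(i): distance from the first residue's C-alpha to the last one's
    return math.dist(e["first"], e["last"])

def sse_centre(e):
    return tuple((a + b) / 2.0 for a, b in zip(e["first"], e["last"]))

def sse_direction(e):
    # Unit vector FROM the N-terminal C-alpha TO the C-terminal one
    # (assumes the SSE has non-zero length)
    length = sse_length(e)
    return tuple((b - a) / length for a, b in zip(e["first"], e["last"]))

def describe(sses):
    """Return D(i,j) (centre-to-centre distances) and C(i,j) (cosines)."""
    n = len(sses)
    centres = [sse_centre(e) for e in sses]
    dirs = [sse_direction(e) for e in sses]
    D = [[math.dist(centres[i], centres[j]) for j in range(n)] for i in range(n)]
    C = [[sum(p * q for p, q in zip(dirs[i], dirs[j])) for j in range(n)]
         for i in range(n)]
    return D, C

def find_motif(query, target, res_tol=3, len_tol=3.0, dist_tol=2.0, cos_tol=0.3):
    """Try every ordered selection of N target SSEs (tolerances are made up)."""
    qD, qC = describe(query)
    tD, tC = describe(target)
    n = len(query)
    hits = []
    for combo in permutations(range(len(target)), n):
        elements_ok = all(
            abs(query[i]["n_res"] - target[combo[i]]["n_res"]) <= res_tol
            and abs(sse_length(query[i]) - sse_length(target[combo[i]])) <= len_tol
            for i in range(n)
        )
        if not elements_ok:
            continue
        pairs_ok = all(
            abs(qD[i][j] - tD[combo[i]][combo[j]]) <= dist_tol
            and abs(qC[i][j] - tC[combo[i]][combo[j]]) <= cos_tol
            for i in range(n) for j in range(i + 1, n)
        )
        if pairs_ok:
            hits.append(combo)
    return hits

# Hypothetical data: two antiparallel elements, and a target containing a
# translated copy of them plus one unrelated short element.
query = [make_sse(10, (0, 0, 0), (0, 0, 15)),
         make_sse(20, (10, 0, 30), (10, 0, 0))]
target = [make_sse(5, (50, 50, 50), (50, 57, 50)),
          make_sse(11, (5, 5, 5), (5, 5, 20)),
          make_sse(20, (15, 5, 35), (15, 5, 5))]
hits = find_motif(query, target)
```

Here `hits` lists, for each match, which target SSE was paired with each
query SSE, in query order.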
(1) CONTENTS
============
1 - contents
2 - introduction
3 - user input files
4 - running the program
5 - finding a motif
6 - analysing the results
7 - a realistic example
8 - automatic creation of input files
9 - detailed analysis of results on cro
10 - miscellaneous
11 - release notes
12 - select option
13 - incremental search example
14 - topology option
15 - installing the software
16 - running the software
Date: Mon, 26 Jul 1993 15:30:06 -0500 (CDT)
From: David Larson <larson -8 at 8- iaf.uiowa.edu>
I am using Sybyl 6.0, Tripos Associates, on Silicon Graphics workstations,
and their latest release includes an interface to the program PROTEP.
Here is their blurb on PROTEP from the Help utility:
PROTEP
PROtein Topographic Exploration Programs
PROTEP is a powerful method for finding relationships among protein
structures. It performs similarity searches on the entire Brookhaven
Protein Data Bank in a matter of minutes (on an SGI 4d/35). It was
developed at Sheffield University by Professor Peter Willett, Dr. Pete
Artymiuk and Dr. David Rice. PROTEP brings the formidable power of the
branch of mathematics called Graph Theory to bear on the problem of
ultra rapid 3D structural comparison.
The PROTEP command gives you access to the various functions of the
PROTEP software that can be run from within SYBYL. It also allows you
to display the results of a PROTEP Motif Search at the SYBYL display.
Tripos Associates
800-323-2960
Hope this helps.
Best regards,
Dave Larson
-------------------------------------------------------------------------------
Dave Larson | Image Analysis Facility, 70 EMRB
University of Iowa | Iowa City, IA 52242
larson -8 at 8- caesar.iaf.uiowa.edu | (319) 335-7900
-------------------------------------------------------------------------------
Thanks to: tucker -8 at 8- ERE.UMontreal.CA (Carrington Tucker) I found out
about Paul A. Bartlett:
Date: Mon, 26 Jul 93 16:25:37 -0700
From: paul -8 at 8- fire.cchem.berkeley.edu (Paul A. Bartlett)
Dear Mark,
Yes, you have the right Bartlett. We have developed a program for
identifying 3D similarities between molecules, and that can be used on
structures within the PDB. We originally developed the program CAVEAT to
assist in the structure-based drug design process, to help chemists
identify templates and molecule fragments that could be used to hold
functional groups or chains in a particular orientation, but there are
many other applications for it. In particular, I would be very excited
to see someone apply it to searches in the PDB; it is ideally suited for
certain types of searches, but we simply haven't identified a specific
problem yet to apply it to.
CAVEAT was developed specifically for searching for structural
similarities that can be defined as a relationship between bonds and
their orientations, in contrast to the more traditional search
algorithms that are based on relationships between atoms and their
distances. Basically, for a given set of user-selected bonds in a
molecule, which we treat as vectors, we calculate the relationship
between all the pairs and incorporate this information in what we call a
"vector database". This database is in reality an index to the source
database, with the molecules characterized by bond relationships instead
of atomic coordinates and connectivities. It is then a relatively quick
process to search for molecules that have in common a particular
relationship among bonds.
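
As a rough illustration of characterizing a pair of bonds by their relative
geometry instead of by atomic coordinates, here is a minimal sketch. The
message does not say which quantities CAVEAT actually stores; the choice of
base-to-base distance, tip-to-tip distance and inter-vector angle, along
with the function names and tolerances, are assumptions for this example.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair_descriptor(bond1, bond2):
    """Each bond is (tail_xyz, head_xyz); returns (base_dist, tip_dist, angle_deg).

    The three quantities are an assumed descriptor, not CAVEAT's own.
    """
    (t1, h1), (t2, h2) = bond1, bond2
    v1, v2 = _sub(h1, t1), _sub(h2, t2)
    base_dist = math.dist(t1, t2)
    tip_dist = math.dist(h1, h2)
    cos_a = _dot(v1, v2) / (math.sqrt(_dot(v1, v1)) * math.sqrt(_dot(v2, v2)))
    # Clamp against rounding error before taking the arc cosine
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    return (base_dist, tip_dist, angle)

def similar(d1, d2, tol=(0.5, 0.5, 15.0)):
    """True if every component agrees within its (made-up) tolerance."""
    return all(abs(a - b) <= t for a, b, t in zip(d1, d2, tol))

# Hypothetical bonds: two perpendicular 1.5 Angstrom vectors, 3 Angstroms apart
b1 = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.5))
b2 = ((3.0, 0.0, 0.0), (3.0, 1.5, 0.0))
d12 = pair_descriptor(b1, b2)
```

Because the descriptor depends only on the relative geometry of the two
vectors, it is unchanged by rotating or translating the whole molecule,
which is what makes an index of such descriptors searchable.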
For specific application to protein structures, we have made a variety
of vector databases. For example, in one instance, we have taken all
the pair-wise combinations of C-alpha to C-beta bonds, within a
specified radius (e.g., 20 Angstroms). We can define almost any type of
secondary structure through a combination of such vectors, and then look
through the PDB for other proteins that may have the same structural
element, independent of the identity of the amino acid side chains
themselves or their connectivity (the relationship need not involve
contiguous residues). Or we can define a vector database with the C-beta
to C-gamma bonds, and look for similarities between side-chain
conformers. A database constructed from carbonyl C=O bonds might be
useful for probing backbone similarities, etc.
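
To make the idea of a C-alpha to C-beta vector database concrete, here is a
small sketch that enumerates all residue pairs within a cutoff and files
them in distance bins for quick lookup. It is not CAVEAT's database format:
the record layout, the use of the C-alpha to C-alpha distance for the
radius test, the binning scheme and the tolerances are all assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def build_index(residues, radius=20.0, bin_width=1.0):
    """residues: list of (label, ca_xyz, cb_xyz) tuples (hypothetical layout).

    Returns {distance_bin: [(label_a, label_b, ca_ca_distance, cosine), ...]}.
    """
    index = defaultdict(list)
    for (la, ca_a, cb_a), (lb, ca_b, cb_b) in combinations(residues, 2):
        d = math.dist(ca_a, ca_b)
        if d > radius:  # assumed: the radius applies to the CA-CA distance
            continue
        va = tuple(q - p for p, q in zip(ca_a, cb_a))
        vb = tuple(q - p for p, q in zip(ca_b, cb_b))
        index[int(d // bin_width)].append((la, lb, d, _cosine(va, vb)))
    return index

def lookup(index, d, cos, d_tol=1.0, cos_tol=0.2, bin_width=1.0):
    """Return residue-label pairs whose geometry is within the tolerances."""
    lo = int((d - d_tol) // bin_width)
    hi = int((d + d_tol) // bin_width)
    return [(la, lb)
            for b in range(lo, hi + 1)
            for la, lb, dd, cc in index.get(b, [])
            if abs(dd - d) <= d_tol and abs(cc - cos) <= cos_tol]

# Hypothetical coordinates: two nearby residues with parallel CA->CB vectors
# and one residue far outside the 20 Angstrom radius.
residues = [("A1", (0.0, 0.0, 0.0), (0.0, 0.0, 1.5)),
            ("A2", (5.0, 0.0, 0.0), (5.0, 0.0, 1.5)),
            ("A3", (100.0, 0.0, 0.0), (100.0, 1.5, 0.0))]
index = build_index(residues)
found = lookup(index, 5.2, 0.9)
```

Only the bins that overlap the requested distance window are scanned, so a
query never has to touch the full list of pairs.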
There also exists the opportunity for flexible definition of tolerances;
for example, in a single search, one can look for an alpha-helix
(perhaps 3-5 side-chains, tightly defined) and a beta-sheet (similarly,
a few side-chains with limited tolerance) and a particular relationship
between them (pairing vectors in one with vectors from another with
looser tolerances).
The searches are relatively quick, typically less than a minute or two
on an SGI R3000 Indigo (sorry, a Cray is not necessary), and the program
can handle any number and combination of vectors.
In spite of its origin, there are many ways to use CAVEAT; indeed, I'm
sure there are a lot that we haven't thought of yet! Fundamentally,
any search that you can define as a 3-dimensional relationship between
bonds can probably be cast as a CAVEAT search.
CAVEAT, as well as the companion program CLASS (that further screens and
clusters CAVEAT hits) and TRIAD and ILIAD (which are 3D databases
representing comprehensive collections of computed, minimized
structures) are available through license from the University.
I have probably gone on longer than you need (or anticipated...), but I
am happy to provide additional information if you wish. You know my
e-mail address, and you can reach me by phone (510-642-1259) or FAX
(642-1454).
Cheers,
Paul
From: A Poirrette <A.Poirrette -8 at 8- sheffield.ac.uk>
Date: Tue, 27 Jul 93 15:34:26 BST
Mark,
TRIPOS ASSOCIATES INC. market a program called PROTEP that we developed here
in Sheffield. This allows substructure searches based on protein
secondary structure elements. You input a pattern of secondary structures
in 3-D space, and a complete search of the PDB takes about 5-10 mins
for a substructure search or 20 mins for a maximal common subgraph
search, where a user-specified number of secondary structure elements
must be in common. Searches can be performed with whole proteins
or just elements extracted from proteins (or made-up patterns). The search
is independent of sequence, although you can specify that the hits must
be in the same sequence order, for strands in a sheet for example.
We are also testing a residue-based version of the above that allows
similar searches, but this time for patterns of residues in 3-D space,
so you can search for active sites etc. This is working fine but is not
yet marketed. Search times are similar to the above on an SGI 3000 type
machine.
Also Oxford Molecular market IDITIS - developed at Birkbeck College, London
under Janet Thornton - this is a relational database-type product that
is totally different to our approach but may do what you want.
Also a program, WHATIF, by Vriend at EMBL, Heidelberg, may do what you want.
Hope this helps
Andrew Poirrette
Department of Information Studies
University of Sheffield
UK
--
Mark Dalton AUG-GCU-AGA-AAG H
Cray Research, Inc. M A R K |
Eagan, MN 55121 CH3-S-CH2-CH2-C-COOH
Internet: mwd -8 at 8- cray.com |
(612)683-3035 NH2