From san ^at^ mbu.iisc.ernet.in  Mon Feb 20 12:07:17 1995
Received: from sangam.ncst.ernet.in  for san -x- at -x- mbu.iisc.ernet.in
	by www.ccl.net (8.6.9/930601.1506) id LAA29212; Mon, 20 Feb 1995 11:11:04 -0500
From: <san {*at*} mbu.iisc.ernet.in>
Received: from soochak.ncst.ernet.in (soochak.ncst.ernet.in [144.16.11.100]) by sangam.ncst.ernet.in (8.6.8.1/8.6.6) with ESMTP id VAA09709 for <chemistry ":at:" ccl.net>; Mon, 20 Feb 1995 21:40:39 +0530
Received: from iisc.ernet.in (iisc.iisc.ernet.in [144.16.64.3]) by soochak.ncst.ernet.in (8.6.8.1/8.6.5) with SMTP id VAA27527 for <chemistry:~at~:ccl.net>; Mon, 20 Feb 1995 21:37:38 +0530
Received: from vigyan.UUCP by iisc.ernet.in (ERNET-IISc/SMI-4.1)
	   id AA16703; Mon, 20 Feb 95 21:45:30+0530
Date: Mon, 20 Feb 95 21:45:30+0530
Message-Id: <9502201615.AA16703 #at# iisc.ernet.in>
Received: by vigyan.iisc.ernet.in (smail2.3)
	id AA26073; 20 Feb 95 21:38:30 EST (Mon)
To: vigyan!chemistry -8 at 8- ccl.net
Subject: summary of data analysis of protein structures



hi! 
here is the summary of responses I got this time for my query on
data analysis of protein crystal structures. I am grateful to all who
responded, especially Erich Bauer, who wants to have regular communication
with me. However, I have not been able to contact him personally because
my mails to him are bouncing back. Through this medium, I apologise to him
for my failure in this regard and assure him that I shall keep trying until my
mails stop bouncing back. My special thanks to K. Stewart and Dr. Philippe for
the references.
                       yours cordially
                       sandeep kumar
                       san #at# mbu.iisc.ernet.in
P.S.  I will welcome further comments and suggestions.
 
**********START of SUMMARY********************************************
From: tbi.univie.ac.at!erich.bauer (Erich Bauer)
Message-Id: <9502091923.AA00383 /at\renoir>
Subject: Re: CCL:data analysis of protein crystal structure
To: san -8 at 8- mbu.iisc.ernet.in
Date: Thu, 9 Feb 1995 20:23:02 +0100 (MEZ)
Cc: san*- at -*mbu.iisc.ernet.in
In-Reply-To: <9502091622.AA16267.,at,.iisc.ernet.in> from "san.,at,.mbu.iisc.ernet.in" at Feb 9, 95 09:52:49 pm
X-Mailer: ELM [version 2.4 PL2]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Length: 8528      
Status: R

san*- at -*mbu.iisc.ernet.in
> 
> Hi! 

hi,

>     About Ten or fifteen days back I posted a query regarding the
> personal experiences of the people involved in the data analysis of
> the crystal structures of biomolecules, esp. proteins.  I am sorry to
> state that my query evoked only two responses.
>  One piece of advice that came my way concerned the selection of the dataset.
> I agree with the responder that this is a problem.

dear sandeep,
I got the message.
Interestingly, I am at the moment concerned with the same questions.
Being new to CCL and PDB, I am, however, not experienced in these fields.
I will try to give you some hints from what I learned on the
subjects during the last months.
Not all of it is on my mind right now, but, if you agree, we can stay
in touch and post ideas whenever they come to our minds.
It makes no sense to bother ALL CCLers with that.
( If you don't, just send an email saying "OH NO, NOT YOU AGAIN,
STOP, PLEASE STOP !!", I won't feel insulted.)

>   I have found out that in PDB there is a directory called user_group
> which contains different useful subdirectories.  One of them, called
> subset_list, contains lists of PDB entries for proteins which
> have been selected using some criteria.  One of the lists is from
> Jane Richardson's lab for different structural motifs. Another, from
> Chris Sander's lab, lists the proteins with less than 30% sequence
> homology.  I hope this information is useful to the people involved
> in this area. 

It is, indeed.
If you need the original references on how to obtain such datasets:
@Article{Hobohm:91a,
  author =       "Uwe Hobohm and Michael Scharf and Reinhard Schneider and 
                  Chris Sander",
  title =        "Selection of representative protein data sets",
  journal =      "Protein Science",
  year =         "1992",
  volume =       "1",
  OPTnumber =    "",
  pages =        "409 - 417",
  OPTnote =      "eb00",
  OPTannote =    "P-DB"
}
@Article{Boberg:92a,
  author =       "Jorma Boberg and Tapio Salakoski and Mauno Vihinen",
  title =        "Selection of a Representative Set of Structures from
                  Brookhaven Protein Data Bank",
  journal =      "Proteins",
  year =         "1992",
  volume =       "14",
  OPTnumber =    "",
  pages =        "265 - 276",
  OPTnote =      "eb00",
  OPTannote =    "P-DB"
}

If you want, I can send you the ftp sites to obtain the newest datasets.
( I would have to gather the papers from home ... )
The whole subject seems to be a little bit involved:
If you want to choose data, for instance for secondary structure prediction,
you should try to avoid the ( rare ) data from membrane proteins.
So you have to focus on, let's say, globular proteins. If you do so,
you put in additional information that in turn has been taken from the
knowledge of the tertiary structure used to classify the protein.
This can be repeated at any level of description ( certain classes of
proteins, certain folds, domains etc... ) and I don't see any REAL
answer to that problem.
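
The kind of filtering behind a "less than 30% sequence homology" list can be
sketched in a few lines of Python. This is only a minimal illustration, not
the procedure of the two references above: it assumes the sequences are
already aligned to equal length, and the names percent_identity,
select_representatives and the toy sequences are made up for the example.

```python
def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    # Gap characters ('-') never count as matches.
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def select_representatives(aligned, threshold=30.0):
    """Greedy filter over a dict {name: aligned sequence}.

    Entries are visited in dict order, so put the preferred ones
    (e.g. best resolution) first. An entry is kept only if it is below
    `threshold` percent identity to every entry kept so far.
    """
    kept = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy, made-up sequences; "B" is 90% identical to "A".
toy = {"A": "ACDEFGHIKL", "B": "ACDEFGHIKV", "C": "WWWWWWWWWW"}
print(select_representatives(toy))  # ['A', 'C']
```

The result depends on the visiting order, which is one more place where
prior knowledge enters the selection.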

>            I renew my request to the people involved in data analysis
> to come forward and share their experiences with one another.  It may
> also be useful to outsiders, as it can give them a glimpse
> of the current situation in this important field.  I am basically interested
> in discussing how statistics can be exploited to shed some light on the
> hidden principles  and properties of protein structures which are being

If you need programs for data analysis, I might be able to point you to some.
For the reasons mentioned above, and as only a very restricted set of
structural proteins and some enzymes are being crystallized ( by pharma,
medicine etc. ), the PDB itself is a strongly selected dataset.

> deposited in the databanks.  What all can we learn from data analysis?
> I am sure there are many more protein motifs still left undiscovered in

Most pdb files contain a classification of secondary structures.
Several programs for optimising and displaying structures are of different
opinions about it.
I am presently doing an exhaustive search in *.pdb and we are looking at all kinds of
representations for measures, for instance not only the values of bond angles,
plane twists etc. but also higher moments and norms etc. etc.
( my boss is a mathematician, he probably knows which representations make
sense for what )
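
As a rough illustration of reading the secondary structure classification
out of a pdb file, here is a minimal Python sketch. It assumes the
fixed-column layout of HELIX and SHEET records (chain and residue numbers at
the columns given in the comments), which should be checked against the
format description for the files at hand. The function name is made up for
the example, and insertion codes and numbering gaps are ignored.

```python
def secondary_structure_residues(pdb_lines):
    """Count distinct residues covered by HELIX and SHEET records."""
    covered = {"HELIX": set(), "SHEET": set()}
    for raw in pdb_lines:
        line = raw.rstrip("\n").ljust(80)  # pad so column slices never fail
        record = line[:6].strip()
        if record == "HELIX":
            # chain ID in column 20, first residue in 22-25, last in 34-37
            chain, first, last = line[19], line[21:25], line[33:37]
        elif record == "SHEET":
            # chain ID in column 22, first residue in 23-26, last in 34-37
            chain, first, last = line[21], line[22:26], line[33:37]
        else:
            continue
        try:
            start, end = int(first), int(last)
        except ValueError:
            continue  # skip malformed records rather than guess
        # A set avoids counting a residue twice if records overlap.
        covered[record].update((chain, n) for n in range(start, end + 1))
    return {kind: len(residues) for kind, residues in covered.items()}


# Usage sketch (the file name is hypothetical):
# with open("some_entry.pdb") as handle:
#     print(secondary_structure_residues(handle))
```

Comparing these counts with the assignment made by another program on the
same coordinates is one simple way to see how far the opinions differ.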

> these PDB files. There may be answers hidden in these files which could
> tell us why protein secondary structure prediction is still inaccurate.

This question could serve as a basis for a newsgroup or a mail-reflector of its
own.
I can give you SOME impressions right away; for more, please don't hesitate
to ask. Most of the subject is still obscure to me, so comments are welcome.
1) The matter is related to the exploitation of experimental data and to the
   subject of global optimisation: probably it is non-local interactions,
   for instance the neighbouring of other domains, that enable/disable the
   formation of secondary structures, which, in turn, are only the local
   outcome of a non-local problem.
   ( Several attempts have been made with more global approaches: then we have a
   problem of tremendously high dimension even for small molecules.
   ( we are working on a global optimization package that might do well for
   medium sized molecules, however the accuracy of potential functions might
   be much too poor ) ).
2) One could try to predict a molecule's structure with force field calculations:
   too slow even for small molecules.
   ( parametrisation of MD programs takes a LOOOOOOOOOOOOOOOOOOOONG time.
   only large companies, consortia etc. ( BIOSYM/San Diego ) can cope with that.)
3) MD is, in principle, just an inadequate method.
   Why should, for instance, 2 H behave differently at the same distance from
   each other when they have a different distance along the backbone?
4) If you work with ab initio methods to fit data, they take an even
   LOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOONGER time to fit for small molecules.
5) Measured data are not very accurate; sometimes the complete topology
   is wrong, as X-ray pictures have to be interpreted and sometimes people
   need much experience and intuition to guess the right folding pattern.
   ( Sometimes they have to remove one or the other .pdb file as the results
   turn out to be TOTALLY wrong, as I was told.
   Not to talk about the errors that are not being detected and no one
   can prove ... )
6) SecStr prediction is basically heuristics with neural nets etc., based on
   the datasets mentioned before.
   Helices, for instance, are often considered a hydrophobic nucleation site;
   sometimes they are, however, strongly influenced by long range interactions.
> The only thing we need to do is to be frank and informally discuss the
> various problems one generally encounters when taking up such a project,
> and how best one can cope with them.  I will even welcome sharing of the latest
> literature on the subjects amongst the people involved in such projects.

Fine.
On what topics?
If you want, I can send you the list of literature I read during the last months.
There is also a mail-server where you can send your sequence, and they send it
back within some hours with a sec. str. prediction.

>                            yours' cordially
>                            sandeep kumar
> 
> 
> -------This is added Automatically by the Software--------
> -- Original Sender Envelope Address: san%!at!%mbu.iisc.ernet.in
> -- Original Sender From: Address: san \\at// mbu.iisc.ernet.in
> CHEMISTRY- at -ccl.net -- everyone     | CHEMISTRY-REQUEST- at -ccl.net -- coordinator
> MAILSERV&$at$&ccl.net: HELP CHEMISTRY  | Gopher: www.ccl.net 73
> Anon. ftp www.ccl.net     | CHEMISTRY-SEARCH -x- at -x- ccl.net -- archive search
> http://www.ccl.net/chemistry.html |     for info send: HELP SEARCH to MAILSERV
> 

yours sincerely
erich
-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| namenet : Erich  Bornberg - Bauer                        |
------------------------------------------------------------
| snailnet: Inst. f. theor. Chem., | Inst. fuer Mathematik |
|         : Waehringerstr. 17/308  | Strudlhofg. 4         |
------------------------------------------------------------
|         : Univ. Wien,   A - 1090,  Vienna / Austria      |
------------------------------------------------------------
| voicenet: *43-1-40 480 - 667, 677| (nonet yet, try drums)|
| faxnet  :             402 85 25  | (  ---     "   ---  ) |
| internet: erich /at\tbi.univie.ac.at | erich /at\cma.univie.ac.at|
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

project: making every problem NP-complete ...


From: bioinfo.ernet.in!sangeeta
Message-Id: <9502112000.AA03278 # - at - # bioinfo.bioinfo.ernet.in>
Content-Type: text
Apparently-To: san -A_T- mbu.iisc.ernet.in
Status: R

Hello Sandeep!
Extremely sorry for not responding earlier to your discussion. Actually I was
busy preparing the manuscript of a paper. How are you ? Have you received any
responses so far? The point that you have raised, about the relevance of the
three dimensional structures predicted from crystal structure data analysis to 
experimental biologists is quite an important one. So many methods have been
developed so far for predicting the sec. structures and many more may be still
coming up. I think what is more important from the experimental point of view
is the tertiary structure of proteins. So the efforts have to be concentrated 
on making more efficient and effective use of the crystal structure data for
predicting tertiary structures. I personally feel that given a sufficiently
large data set of highly resolved structures, one can get a lot of information
on the tert. structures (thus, the size of data set and quality of structures
being the limiting factors). In today's world of networks and high performance
computing, tools using cellular automata theory and neural networks can definitely
be exploited to derive biologically significant structures. My thoughts may
sound too optimistic but then that is what I sincerely feel. I would welcome
further discussion on this topic.
What else? How's your work going? Do write. I should have a relatively less hectic
time next week, hopefully. Also send me the responses from others. Bye for now.
-Sangeeta.11 Feb. '95.

From: cmda.abbott.com!STEWARTK
Received: from randb.pprd.abbott.com (randb.abbott.com) by abtlabs.abbott.com with SMTP id AA20723
  (5.65c/IDA-1.4.4 for <san ^%at%^ mbu.iisc.ernet.in>); Sat, 11 Feb 1995 12:49:58 -0600
Received: from DECNET-MAIL (STEWARTK {*at*} CMDA)
 by RANDB.PPRD.Abbott.Com (PMDF V4.3-13 #5551)
 id <01HMX9NS5Y688YZ76E "-at-" RANDB.PPRD.Abbott.Com>; Sat,
 11 Feb 1995 12:52:49 -0600 (CST)
Date: Sat, 11 Feb 1995 12:52:49 -0600 (CST)
Subject: Re: CCL:data analysis of protein crystal structure
To: san#* at *#mbu.iisc.ernet.in
Message-Id: <01HMX9NS7K1U8YZ76E ^at^ RANDB.PPRD.Abbott.Com>
X-Vms-To: RANDB::IN%"san: at :mbu.iisc.ernet.in"
Mime-Version: 1.0
Content-Transfer-Encoding: 7BIT
Status: R

Sandeep:  Here are some 1994 references in the area of Protein
Structure Analysis:

Protein Science 3, 1927-1937, 1994
FASEB J. 8, 1237-1239 and 1240-1247, 1994
FEBS Letters 355, 213-219, 1994
CABIOS 10, 545-546, 1994
J. Mol. Biol. 242, 321-329, 1994
Curr. Opinion in Struct. Biol. 4, 422-428, 1994
Proteins: Struct. Funct. Genet. 19, 222-229, 1994
Proteins: Struct. Funct. Genet. 19, 85-97 and 165-173
J. Mol. Biol. 239, 306-314, 1994

Of course, all publications by Chris Sander and Janet Thornton
are important in this area.  Tom Blundell is also
a researcher whose name comes to mind for important work
in this area.

Kent Stewart
Department of Structural Biology
Abbott Laboratories
Chicago, IL,   USA

From: biosym.com!youkha (Philippe Youkharibache)
Message-Id: <9502091857.AA15243#* at *#iris75.biosym.com>
To: <san ":at:" mbu.iisc.ernet.in>
Subject: Re:  CCL:data analysis of protein crystal structure
Status: R

Sandeep,

I do not have the refs at hand, right now.
however check in particular the work from

Manfred Sippl 
Steve Bryant 

who derive residue-based force fields for threading and, in
the longer term, protein folding prediction, from
statistical analysis of known protein structures.
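
The general idea of turning structure statistics into an energy-like score
can be sketched as an inverse Boltzmann relation, E = -kT ln(p_obs / p_ref),
applied per distance bin. The Python below is only a minimal sketch of that
relation and not the published method of either author: the function name,
the pseudocount and the choice of reference histogram are assumptions made
for the example.

```python
import math


def inverse_boltzmann(observed, reference, kT=1.0, pseudocount=1.0):
    """Per-bin pseudo-energy from two histograms of counts.

    observed:  counts for one residue pair type, per distance bin
    reference: counts for the reference state (e.g. all pairs), same bins
    Bins that are over-represented relative to the reference get a
    negative (favourable) value, under-represented bins a positive one.
    """
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    # The pseudocount keeps empty bins from producing log(0).
    obs = [count + pseudocount for count in observed]
    ref = [count + pseudocount for count in reference]
    obs_total, ref_total = sum(obs), sum(ref)
    return [
        -kT * math.log((o / obs_total) / (r / ref_total))
        for o, r in zip(obs, ref)
    ]


# Made-up counts: the first bin is enriched compared with the reference.
energies = inverse_boltzmann([30, 10], [10, 10])
print(energies[0] < 0 < energies[1])  # True
```

With sparse data the result depends strongly on the binning and on the
pseudocount, so the numbers from such a sketch should not be over-read.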

Good luck

Philippe


**************************************************************************
Dr. Philippe Youkharibache 			e-mail: youkha \\at// biosym.com
Biosym Technologies Inc.
9685 Scranton Road				tel: (619) 546 5562
San Diego, CA 92121				fax: (619) 458 0136
**************************************************************************
From: athe.wustl.edu!toni (Toni Kazic)
Message-Id: <9502091953.AA14776 $#at#$ athe.wustl.edu>
To: san&$at$&mbu.iisc.ernet.in
In-Reply-To: <9502091622.AA16267 {*at*} iisc.ernet.in> (san {*at*} mbu.iisc.ernet.in)
Subject: Re: CCL:data analysis of protein crystal structure
Status: R

Dear Sandeep,

I am not directly involved in that area, but I strongly encourage you in
asking such straightforward and practical questions.  There is too much
special-casing and not enough analysis!  Good luck,

Toni

Toni Kazic
Institute for Biomedical Computing
Washington University

From: "William T. Winter" <mailbox.syr.edu!wtwinter>
Subject: Re: CCL:data analysis of protein crystal structure
To: san %-% at %-% mbu.iisc.ernet.in
In-Reply-To: <9502091622.AA16267 %-% at %-% iisc.ernet.in>
Message-Id: <Pine.3.89.9502091700.A1093-0100000 ^%at%^ forbin.syr.edu>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: R

> state that my query evoked only two responses.
This board is still viewed by many as quantum chemistry and hence is
probably not widely followed by protein crystallographers. Subscribe to
bionet.xtallography at your nearest internet newsstand and send them your
request.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Dr. William T. Winter                  Phone: (315)470-6876
315 Baker Lab                          FAX:   (315)470-6856
SUNY-ESF                               Internet: wtwinter \\at// mailbox.syr.edu
Syracuse, NY 13210-2786 
******************end of summary************************************