From san ^at^ mbu.iisc.ernet.in Mon Feb 20 12:07:17 1995
Received: from sangam.ncst.ernet.in for san -x- at -x- mbu.iisc.ernet.in
    by www.ccl.net (8.6.9/930601.1506) id LAA29212; Mon, 20 Feb 1995 11:11:04 -0500
Received: from soochak.ncst.ernet.in (soochak.ncst.ernet.in [144.16.11.100])
    by sangam.ncst.ernet.in (8.6.8.1/8.6.6) with ESMTP id VAA09709 for ; Mon, 20 Feb 1995 21:40:39 +0530
Received: from iisc.ernet.in (iisc.iisc.ernet.in [144.16.64.3])
    by soochak.ncst.ernet.in (8.6.8.1/8.6.5) with SMTP id VAA27527 for ; Mon, 20 Feb 1995 21:37:38 +0530
Received: from vigyan.UUCP by iisc.ernet.in (ERNET-IISc/SMI-4.1) id AA16703; Mon, 20 Feb 95 21:45:30+0530
Date: Mon, 20 Feb 95 21:45:30+0530
Message-Id: <9502201615.AA16703 #at# iisc.ernet.in>
Received: by vigyan.iisc.ernet.in (smail2.3) id AA26073; 20 Feb 95 21:38:30 EST (Mon)
To: vigyan!chemistry -8 at 8- ccl.net
Subject: summary of data analysis of protein structures

Hi!

Here is the summary of the responses I received this time to my query
on data analysis of protein crystal structures. I am grateful to all
who responded, especially Erich Bauer, who would like to stay in
regular communication with me. However, I have not been able to
contact him personally because my mails to him are bouncing. Through
this medium, I apologise to him for my failure in this regard and
assure him that I shall keep trying until my mails stop bouncing. My
special thanks to K. Stewart and Dr. Philippe for the references.

Yours cordially,
Sandeep Kumar
san #at# mbu.iisc.ernet.in

P.S. I welcome further comments and suggestions.
**********START of SUMMARY********************************************

From: tbi.univie.ac.at!erich.bauer (Erich Bauer)
Message-Id: <9502091923.AA00383 /at\renoir>
Subject: Re: CCL:data analysis of protein crystal structure
To: san -8 at 8- mbu.iisc.ernet.in
Date: Thu, 9 Feb 1995 20:23:02 +0100 (MEZ)
Cc: san*- at -*mbu.iisc.ernet.in
In-Reply-To: <9502091622.AA16267.,at,.iisc.ernet.in> from "san.,at,.mbu.iisc.ernet.in" at Feb 9, 95 09:52:49 pm
X-Mailer: ELM [version 2.4 PL2]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Length: 8528
Status: R

san*- at -*mbu.iisc.ernet.in
>
> Hi!

Hi,

> About ten or fifteen days back I posted a query regarding the
> personal experiences of the people involved in the data analysis of
> the crystal structures of biomolecules, esp. proteins. I am sorry to
> state that my query evoked only two responses.
> One of the pieces of advice that came my way was about the selection
> of the dataset. I agree with the responder that this is a problem.

Dear Sandeep,

I got the message. Interestingly, I am at the moment concerned with
the same questions. Being new to CCL and PDB, I am, however, not
experienced in these fields. I will try to give you some hints from
what I have learned on these subjects during the last months. Not all
of it is on my mind right now, but, if you agree, we can stay in touch
and post ideas whenever they come to our minds. It makes no sense to
bother ALL CCLers with that. (If you don't agree, just send an email
saying "OH NO, NOT YOU AGAIN, STOP, PLEASE STOP !!"; I won't feel
insulted.)

> I have found out that in PDB there is a directory called user_group
> which contains different useful subdirectories. One of them, called
> subset_list, contains lists of PDB entries for proteins which
> have been selected using some criteria. One of the lists is from
> Jane Richardson's lab, for different structural motifs. Another, from
> Chris Sander's lab, lists the proteins with less than 30% sequence
> homology.
> I hope this information is very useful to the people involved
> in this area.

It is, indeed. If you need the original references on how to obtain
such datasets:

@Article{Hobohm:91a,
  author =    "Uwe Hobohm and Michael Scharf and Reinhard Schneider and Chris Sander",
  title =     "Selection of representative protein data sets",
  journal =   "Protein Science",
  year =      "1992",
  volume =    "1",
  OPTnumber = "",
  pages =     "409 - 417",
  OPTnote =   "eb00",
  OPTannote = "P-DB"
}

@Article{Boberg:92a,
  author =    "Jorma Boberg and Tapio Salakoski and Mauno Vihinen",
  title =     "Selection of a Representative Set of Structures from Brookhaven Protein Data Bank",
  journal =   "Proteins",
  year =      "1992",
  volume =    "14",
  OPTnumber = "",
  pages =     "265 - 276",
  OPTnote =   "eb00",
  OPTannote = "P-DB"
}

If you want, I can send you the ftp sites to obtain the newest
datasets. (I would have to gather the papers from home ...)

The whole subject seems to be a little bit involved: if you want to
choose data, for instance for secondary structure prediction, you
should try to avoid (rare) data from membrane proteins. So you have to
focus on, let's say, globular proteins. If you do so, you put in
additional information that in turn has been taken from knowledge of
the tertiary structure in order to classify the protein. This can be
repeated at any level of description (certain classes of proteins,
certain folds, domains etc.) and I don't see any REAL answer to that
problem.

> I renew my request to the people involved in data analysis
> to come forward and share their experiences with one another. It may
> also be useful to outsiders, as it can give them a glimpse
> of the current situation in this important field. I am basically
> interested in discussing how statistics can be exploited to shed some
> light on the hidden principles and properties of protein structures
> which are being

If you need programs for data analysis, I might be able to point you
to some.
For the reasons mentioned above, and because only a very restricted
set of structural proteins and some enzymes are being crystallized
(for pharma, medicine etc.), the PDB itself is a strongly selected
dataset.

> deposited in the databanks. What all can we learn from data analysis?
> I am sure there are many more protein motifs still left undiscovered in

Most PDB files contain a classification of secondary structures.
Several programs for optimisation and displaying structures are of a
different opinion. I am presently doing an exhaustive search in *.pdb
and we are looking at all kinds of representations for measures, for
instance not only the values of bond angles, plane twists etc. but
also higher moments, norms etc. (My boss is a mathematician; he
probably knows which representations make sense for what.)

> these PDB files. There may be answers hidden in these files which could
> tell us why protein secondary structure prediction is still inaccurate.

This question could serve as a basis for a newsgroup or a
mail-reflector of its own. I can give you SOME impressions right away;
for more, please don't hesitate to ask. Most of the subject is still
obscure to me, so comments are welcome.

1) The matter is related to the exploitation of experimental data and
   to the subject of global optimisation: it is probably non-local
   interactions, for instance the neighbourhood of other domains, that
   enable or disable the formation of secondary structures, which in
   turn are only the local outcome of a non-local problem. (Several
   attempts have been made with more global approaches: then we have a
   problem of tremendously high dimension even for small molecules.
   (We are working on a global optimization package that might do well
   for medium-sized molecules; however, the accuracy of the potential
   functions might be much too poor.))

2) One could try to predict a molecule's structure with force field
   calculations: too slow even for small molecules. (Parametrisation
   of MD programs takes a LOOOOOOOOOOOOOOOOOOOONG time. Only large
   companies, consortia etc.
   (BIOSYM/San Diego) can cope with that.)

3) MD is, in principle, just an inadequate method. Why should, for
   instance, two H atoms at the same distance from each other behave
   differently when they are a different distance apart along the
   backbone?

4) If you work with ab initio methods to fit data, they take an even
   LOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOnger time to fit for small molecules.

5) Measured data are not very accurate. Sometimes the complete
   topology is wrong, as X-ray pictures have to be interpreted, and
   sometimes people need much experience and intuition to guess the
   right folding pattern. (Sometimes they have to remove one or
   another .pdb file because the results turn out to be TOTALLY wrong,
   as I was told. Not to mention the errors that are not being
   detected and that no one can prove ...)

6) Secondary structure prediction is basically heuristics with neural
   nets etc., based on the datasets mentioned before. Helices, for
   instance, are often considered a hydrophobic nucleation site;
   sometimes, however, they are strongly influenced by long-range
   interactions.

> The only thing we need to do is to be frank and informally discuss the
> various problems one generally encounters when taking up such a project,
> and how best one can cope with them. I will even welcome sharing of the
> latest literature on the subject amongst the people involved in such
> projects.

Fine. On what topics? If you want, I can send you the list of
literature I have read during the last months. There is also a
mail-server to which you can send your sequence; they send it back
within some hours with a secondary structure prediction.

> Yours cordially,
> Sandeep Kumar
>
>
> -------This is added Automatically by the Software--------
> -- Original Sender Envelope Address: san%!at!%mbu.iisc.ernet.in
> -- Original Sender From: Address: san \\at// mbu.iisc.ernet.in
> CHEMISTRY- at -ccl.net -- everyone | CHEMISTRY-REQUEST- at -ccl.net -- coordinator
> MAILSERV&$at$&ccl.net: HELP CHEMISTRY | Gopher: www.ccl.net 73
> Anon.
> ftp www.ccl.net | CHEMISTRY-SEARCH -x- at -x- ccl.net -- archive search
> http://www.ccl.net/chemistry.html | for info send: HELP SEARCH to MAILSERV
>

Yours sincerely,
Erich

--
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| namenet : Erich Bornberg - Bauer                         |
------------------------------------------------------------
| snailnet: Inst. f. theor. Chem., | Inst. fuer Mathematik |
|         : Waehringerstr. 17/308  | Strudlhofg. 4         |
------------------------------------------------------------
|         : Univ. Wien, A - 1090, Vienna / Austria         |
------------------------------------------------------------
| voicenet: *43-1-40 480 - 667, 677| (nonet yet, try drums)|
| faxnet  : 402 85 25              | ( --- " --- )         |
| internet: erich /at\tbi.univie.ac.at | erich /at\cma.univie.ac.at|
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
project: making every problem NP-complete ...


From: bioinfo.ernet.in!sangeeta
Message-Id: <9502112000.AA03278 # - at - # bioinfo.bioinfo.ernet.in>
Content-Type: text
Apparently-To: san -A_T- mbu.iisc.ernet.in
Status: R

Hello Sandeep!

Extremely sorry for not responding earlier to your discussion.
Actually, I was busy preparing the manuscript of a paper. How are you?
Have you received any responses so far?

The point that you have raised, about the relevance to experimental
biologists of the three-dimensional structures predicted from crystal
structure data analysis, is quite an important one. So many methods
have been developed so far for predicting secondary structures, and
many more may still be coming up. I think what is more important from
the experimental point of view is the tertiary structure of proteins.
So the efforts have to be concentrated on making more efficient and
effective use of the crystal structure data for predicting tertiary
structures. I personally feel that, given a sufficiently large data
set of highly resolved structures, one can get a lot of information on
the tert.
structures (thus, the size of the data set and the quality of the
structures being the limiting factors). In today's world of networks
and high performance computing, tools using cellular automata theory
and neural networks can definitely be exploited to derive biologically
significant structures. My thoughts may sound too optimistic, but that
is what I sincerely feel. I would welcome further discussion on this
topic.

What else? How's your work going? Do write. I should have a relatively
less hectic time next week, hopefully. Also send me the responses from
others. Bye for now.

-Sangeeta. 11 Feb. '95.


From: cmda.abbott.com!STEWARTK
Received: from randb.pprd.abbott.com (randb.abbott.com) by abtlabs.abbott.com
    with SMTP id AA20723 (5.65c/IDA-1.4.4 for ); Sat, 11 Feb 1995 12:49:58 -0600
Received: from DECNET-MAIL (STEWARTK {*at*} CMDA) by RANDB.PPRD.Abbott.Com
    (PMDF V4.3-13 #5551) id <01HMX9NS5Y688YZ76E "-at-" RANDB.PPRD.Abbott.Com>;
    Sat, 11 Feb 1995 12:52:49 -0600 (CST)
Date: Sat, 11 Feb 1995 12:52:49 -0600 (CST)
Subject: Re: CCL:data analysis of protein crystal structure
To: san#* at *#mbu.iisc.ernet.in
Message-Id: <01HMX9NS7K1U8YZ76E ^at^ RANDB.PPRD.Abbott.Com>
X-Vms-To: RANDB::IN%"san: at :mbu.iisc.ernet.in"
Mime-Version: 1.0
Content-Transfer-Encoding: 7BIT
Status: R

Sandeep:

Here are some 1994 references in the area of Protein Structure
Analysis:

    Protein Science 3, 1927-1937, 1994
    FASEB J. 8, 1237-1239 and 1240-1247, 1994
    FEBS Letters 355, 213-219, 1994
    CABIOS 10, 545-546, 1994
    J. Mol. Biol. 242, 321-329, 1994
    Curr. Opinion in Struct. Biol. 4, 422-428, 1994
    Proteins: Struct. Funct. Genet. 19, 222-229, 1994
    Proteins: Struct. Funct. Genet. 19, 85-97 and 165-173
    J. Mol. Biol. 239, 306-314, 1994

Of course, all publications by Chris Sander and Janet Thornton are
important in this area. Tom Blundell is also a researcher whose name
comes to mind for important work in this area.

Kent Stewart
Department of Structural Biology
Abbott Laboratories
Chicago, IL, USA


From: biosym.com!youkha (Philippe Youkharibache)
Message-Id: <9502091857.AA15243#* at *#iris75.biosym.com>
Subject: Re: CCL:data analysis of protein crystal structure
Status: R

Sandeep,

I do not have the refs at hand right now. However, check in particular
the work of

    Manfred Sippl
    Steve Bryant

who derive residue-based force fields for threading and, in the longer
term, protein folding prediction, from statistical analysis of known
protein structures.

Good luck,
Philippe

**************************************************************************
Dr. Philippe Youkharibache        e-mail: youkha \\at// biosym.com
Biosym Technologies Inc.
9685 Scranton Road                tel: (619) 546 5562
San Diego, CA 92121               fax: (619) 458 0136
**************************************************************************


From: athe.wustl.edu!toni (Toni Kazic)
Message-Id: <9502091953.AA14776 $#at#$ athe.wustl.edu>
To: san&$at$&mbu.iisc.ernet.in
In-Reply-To: <9502091622.AA16267 {*at*} iisc.ernet.in> (san {*at*} mbu.iisc.ernet.in)
Subject: Re: CCL:data analysis of protein crystal structure
Status: R

Dear Sandeep,

I am not directly involved in that area, but I strongly encourage you
in asking such straightforward and practical questions. There is too
much special-casing and not enough analysis!

Good luck,
Toni

Toni Kazic
Institute for Biomedical Computing
Washington University


From: "William T. Winter"
Subject: Re: CCL:data analysis of protein crystal structure
To: san %-% at %-% mbu.iisc.ernet.in
In-Reply-To: <9502091622.AA16267 %-% at %-% iisc.ernet.in>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: R

> state that my query evoked only two responses.

This board is still viewed by many as a quantum chemistry list and
hence is probably not widely followed by protein crystallographers.
Subscribe to bionet.xtallography at your nearest Internet newsstand
and send them your request.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Dr. William T. Winter             Phone: (315)470-6876
315 Baker Lab                     FAX: (315)470-6856
SUNY-ESF                          Internet: wtwinter \\at// mailbox.syr.edu
Syracuse, NY 13210-2786

******************end of summary************************************
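
A note on the subset lists discussed in the thread: the idea of keeping
only proteins with less than 30% pairwise sequence homology can be
illustrated with a simple greedy filter. What follows is only a minimal
sketch under stated assumptions, not the procedure of either paper
cited above. The entry names, the identity values, and the function
select_representatives are all hypothetical, and the pairwise
identities are assumed to have been computed beforehand by some
alignment program.

```python
from typing import Dict, FrozenSet, List


def select_representatives(
    ids: List[str],
    identity: Dict[FrozenSet[str], float],
    threshold: float = 30.0,
) -> List[str]:
    """Greedily keep entries whose identity to every kept entry is below threshold.

    ids       -- candidate entries, in order of preference (first is kept first)
    identity  -- percent sequence identity for unordered pairs of entries
    threshold -- cutoff in percent; 30.0 mirrors the figure quoted in the thread
    """
    kept: List[str] = []
    for candidate in ids:
        # Assumption: a pair missing from the table is treated as 0% identity.
        if all(
            identity.get(frozenset((candidate, other)), 0.0) < threshold
            for other in kept
        ):
            kept.append(candidate)
    return kept


# Hypothetical entries and made-up identity values, for illustration only.
entries = ["protA", "protB", "protC", "protD"]
pairwise = {
    frozenset(("protA", "protB")): 85.0,
    frozenset(("protA", "protC")): 12.0,
    frozenset(("protB", "protC")): 15.0,
    frozenset(("protC", "protD")): 42.0,
}

representatives = select_representatives(entries, pairwise)
print(representatives)
```

The result of such a filter depends on the order of the input list, so
one would probably want to sort the entries by some quality measure
(for instance resolution) before filtering, so that the better
structures are the ones retained.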