CCL: SUMMARY: Some Molecular File Format Descriptions



 This is a summary of the info I already had and the responses I
 received to my inquiry about molecular file formats.
 The original message said:
 > I have searched the CCL archives and several other web search
 > engines/sites, but I could not find that anybody has compiled
 > a list of the currently available molecular structure file
 > formats, along with their respective format description.
 > I know about the PDB, XYZ, MacroModel and some other formats,
 > but not many. By no means my search was exhaustive, so if someone
 > knows of good pointers related to this I will appreciate
 > your input. I will summarize to the list the info I receive.
 > Greetings and TIA for the help/hints/ideas/pointers.
 Most people pointed to BABEL(++) as a source of info on file
 formats. I use it (almost) every day, and can testify that it is a fine
 program, but what I was looking for were *descriptions* of the formats,
 in order to find out why some files I had were broken. Thanks to
 Pat Walters and Matt Stahl for a good program.
 (++) BABEL: http://mercury.aichem.arizona.edu/babel.html
 Here goes the info:
 [1] I had gathered the info below about packages and file formats:
 (a) Macromodel and related info:
       http://www.columbia.edu/cu/chemistry/mmod/mmod.html
     Also, the manual that accompanies the package describes
     the formats in extenso.
 (b) PDB format and related info:
       http://pdb.pdb.bnl.gov
     especially: "The Protein Data Bank Contents Guide: Atomic Coordinate"
       http://pdb.pdb.bnl.gov/Format.doc/Format_Home.html
 (c) XMol general info (Minnesota Supercomputer Center, Inc.)
       http://www.msc.edu/msc/docs/xmol/XMol.html
     and the man page for XYZ (part of XMol)
 XYZ(5MSC)		   Unix	Programmer's Manual		     XYZ(5MSC)
 NAME
      XYZ - Cartesian molecular model file format
 COPYRIGHT
      Copyright 1991 Research Equipment Inc. dba Minnesota Supercomputer
      Center
 RESTRICTED RIGHTS LEGEND
      Use, duplication, or disclosure of	this software and its documentation by
      the  Government  is subject to restrictions as set	forth in subdivision {
      (b) (3) (ii) } of the Rights in  Technical	 Data  and  Computer  Software
      clause at 52.227-7013.
 DESCRIPTION
      XYZ datafiles specify molecular geometries	using a	 Cartesian  coordinate
      system.  This simple, stripped-down, ASCII-readable format	is intended to
      serve as a "transition" format for the XMol series of applications.  For
      example,  suppose	a  molecular datafile was in a format not supported by
      XMol.  In order to	read the data into  XMol,  it  would  be  possible  to
      modify  the  datafile, perhaps by creating	a shell	script,	so that	it fit
      the relatively lenient requirements  of  the  XYZ	format	specification.
      Once  data	 is in XYZ format, it may be examined by XMol, or converted to
      yet another format.
      The XYZ format supports multi-step	datasets.  Each	step is	represented by
      a two-line	"header," followed by one line for each	atom.
      The first line of a step's	header is the number of	atoms  in  that	 step.
      This  integer  may	 be preceded by	whitespace; anything on	the line after
      the integer is ignored.  The second line of the header leaves room	for  a
      descriptive  string.   This  line	may  be	 blank,	or it may contain some
      information pertinent to that particular step, but	it must	exist, and  it
      must be just one line long.
      Each line of text describing a single atom	must  contain  at  least  four
      fields of information, separated by whitespace:  the atom's type (a short
      string of alphanumeric characters), and  its  x-,	y-,  and  z-positions.
      Optionally,  extra	 fields	 may be	used to	specify	a charge for the atom,
      and/or a vector associated	with the atom.	If an input line contains five
      or	 eight	fields,	 the  fifth field is interpreted as the	atom's charge;
      otherwise,	a charge of zero is assumed.  If an input line contains	 seven
      or	 eight fields, the last	three fields are interpreted as	the components
      of	a vector.  These components should be specified	in angstroms.
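
 The field-count rules above can be sketched in Python roughly as
 follows. This is only a minimal illustration of the man page text, not
 XMol's own reader; the names Atom, Step, parse_atom and parse_xyz are
 made up for the example, and skipping blank lines between steps is an
 assumption that the man page does not state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Atom:
    kind: str                                   # atom type string, e.g. "C"
    xyz: Tuple[float, float, float]             # x, y, z position
    charge: float = 0.0                         # zero unless a charge field is present
    vector: Optional[Tuple[float, ...]] = None  # optional per-atom vector


@dataclass
class Step:
    comment: str
    atoms: List[Atom]


def parse_atom(line: str) -> Atom:
    f = line.split()
    if len(f) < 4:
        raise ValueError(f"atom line needs at least 4 fields: {line!r}")
    x, y, z = (float(v) for v in f[1:4])
    # Five or eight fields: the fifth field is the charge.
    charge = float(f[4]) if len(f) in (5, 8) else 0.0
    # Seven or eight fields: the last three fields are a vector.
    vector = tuple(float(v) for v in f[-3:]) if len(f) in (7, 8) else None
    return Atom(f[0], (x, y, z), charge, vector)


def parse_xyz(text: str) -> List[Step]:
    lines = text.splitlines()
    steps: List[Step] = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1          # assumption: tolerate blank lines between steps
            continue
        # First header line: atom count; anything after the integer is ignored.
        natoms = int(lines[i].split()[0])
        if i + 1 >= len(lines):
            raise ValueError("missing comment line after atom count")
        # Second header line: free-form comment; may be blank but must exist.
        comment = lines[i + 1].strip()
        body = lines[i + 2 : i + 2 + natoms]
        if len(body) != natoms:
            raise ValueError("file ends before all atoms of a step were read")
        steps.append(Step(comment, [parse_atom(l) for l in body]))
        i += 2 + natoms
    return steps
```

 Calling parse_xyz() on the text of a file returns one Step per
 dataset step; a broken file should fail with a ValueError at the
 offending line, which is often enough to see what is wrong with it.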
      Note that the XYZ format doesn't contain connectivity information.	  This
      intentional  omission  allows  for	greater	flexibility:  to create	an XYZ
      file, you don't need to know where	a molecule's bonds are;	you just  need
      to	 know  where  its  atoms  are.	 Connectivity information is generated
      automatically  for	 XYZ  files  as	 they  are  read   into	  XMol-related
      applications.   Briefly,  if  the distance	between	two atoms is less than
      the sum of	their covalent radii, they are considered bonded.
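
 The bonding criterion just described (distance less than the sum of
 the covalent radii) could be sketched as below. The radii table is a
 small illustrative one with approximate values, not the table XMol
 ships, and guess_bonds is a hypothetical helper name.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms, for illustration only.
# This is NOT XMol's table; extend it for other elements as needed.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def guess_bonds(atoms, radii=COVALENT_RADII):
    """Return index pairs (i, j) of atoms considered bonded.

    `atoms` is a list of (type, (x, y, z)) tuples with coordinates in
    angstroms. Two atoms are bonded when their distance is strictly
    less than the sum of their covalent radii.
    """
    bonds = []
    for (i, (ti, pi)), (j, (tj, pj)) in combinations(enumerate(atoms), 2):
        if math.dist(pi, pj) < radii[ti] + radii[tj]:
            bonds.append((i, j))
    return bonds


water = [
    ("O", (0.0, 0.0, 0.0)),
    ("H", (0.757, 0.586, 0.0)),
    ("H", (-0.757, 0.586, 0.0)),
]
print(guess_bonds(water))  # [(0, 1), (0, 2)]
```

 Note that with these radii the O-H bonds in water pass the strict
 test only narrowly, so a real program may well add some tolerance to
 the sum; whether XMol does so is not stated in the man page. An atom
 type missing from the table raises a KeyError in this sketch.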
 FILES
      /usr/local/etc/xmol/examples/*
 	  sample datafiles
      /usr/local/etc/xmol/xyz.types
 	  table	of atom	types supported	by XYZ format
      /usr/local/etc/xmol/xyz.cnvt
 	  conversion table for XYZ format
 SEE ALSO
      xmol(1MSC)
 AUTHORS
      Carolyn Wasikowski
      Stefan Klemm
 				 27 Apr	1993
 (d) AMBER related info:
       http://www.amber.ucsf.edu/amber/amber.html
     and the AMBER file specifications:
       http://www.amber.ucsf.edu/amber/formats.html
 (e) CSD info in general at CCDC
       http://csdvx2.ccdc.cam.ac.uk/
     also the documentation that comes with the CD-ROM distribution.
 (f) SPARTAN (from Wavefunction): uses a Cartesian coordinate representation
     similar to the one used for XYZ files in its output file, minus
     the charge (listed separately).
 [2] From the responses I got the following pointers:
 (a) MDL formats (there is a PDF file with lots of info here)
       http://www.mdli.com/prod/fileformats.html
 (b) Another PDB info site
       http://www.mi.uni-erlangen.de/~dosche/casihp.htm
 Thank you to all who responded (list below in no particular
 order, and I hope I am not missing anyone). Sorry I didn't
 get to answer each one individually:
 Soaring Bear    <bear : at : ellington.pharm.arizona.edu>
 Pat Walters     <pwalters : at : portal.vpharm.com>
 Jonathan Baell  <J.Baell : at : chem.csiro.au>
 Dale Braden     <genghis : at : darkwing.uoregon.edu>
 Henry Chermette <CHERM : at : frcpn11.in2p3.fr>
 Stefan Grzybek  <grzybek : at : athena.chemie.uni-erlangen.de>
 Bill Ross       <ross : at : cgl.ucsf.EDU>
 Ralph Puchta    <Puchta : at : GWUP.org>
 Willie Cui      <microsim : at : nis.net>
 Jasna Klicic    <jasna : at : chem.columbia.edu>
 Greetings.
 P.S. Below is a list of the file formats Babel understands and
      converts.
 Babel 1.5 BETA -- Sep 29 1996 -- 22:48:48
 for menus type -- babel -m
 Usage is :
 babel [-v] -i<input-type> <name> -o<output-type> <name> "<keywords>"
 Currently supported input types
 	alc -- Alchemy file
 	prep -- AMBER PREP file
 	bs -- Ball and Stick file
 	bgf -- MSI BGF file
 	car -- Biosym .CAR file
 	boog -- Boogie file
 	caccrt -- Cacao Cartesian file
 	cadpac -- Cambridge CADPAC file
 	charmm -- CHARMm file
 	c3d1 -- Chem3D Cartesian 1 file
 	c3d2 -- Chem3D Cartesian 2 file
 	cssr -- CSD CSSR file
 	fdat -- CSD FDAT file
 	gstat -- CSD GSTAT file
 	dock -- Dock Database file
 	dpdb -- Dock PDB file
 	feat -- Feature file
 	fract -- Free Form Fractional file
 	gamout -- GAMESS Output file
 	gzmat -- Gaussian Z-Matrix file
 	gauout -- Gaussian 92 Output file
 	g94 -- Gaussian 94 Output file
 	hin -- Hyperchem HIN file
 	sdf -- MDL Isis SDF file
 	m3d -- M3D file
 	macmol -- Mac Molecule file
 	macmod -- Macromodel file
 	micro -- Micro World file
 	mm2in -- MM2 Input file
 	mm2out -- MM2 Output file
 	mm3 -- MM3 file
 	mmads -- MMADS file
 	mdl -- MDL MOLfile file
 	molen -- MOLIN file
 	mopcrt -- Mopac Cartesian file
 	mopint -- Mopac Internal file
 	mopout -- Mopac Output file
 	pcmod -- PC Model file
 	pdb -- PDB file
 	psin -- PS-GVB Input file
 	psout -- PS-GVB Output file
 	msf -- Quanta MSF file
 	schakal -- Schakal file
 	shelx -- ShelX file
 	smiles -- SMILES file
 	spar -- Spartan file
 	semi -- Spartan Semi-Empirical file
 	spmm -- Spartan Mol. Mechanics file
 	mol -- Sybyl Mol file
 	mol2 -- Sybyl Mol2 file
 	wiz -- Conjure file
 	unixyz -- UniChem XYZ file
 	xyz -- XYZ file
 	xed -- XED file
 Currently supported output types
 	diag -- DIAGNOTICS file
 	alc -- Alchemy file
 	bs -- Ball and Stick file
 	bgf -- BGF file
 	bmin -- Batchmin Command file
 	caccrt -- Cacao Cartesian file
 	cacint -- Cacao Internal file
 	cache -- CAChe MolStruct file
 	c3d1 -- Chem3D Cartesian 1 file
 	c3d2 -- Chem3D Cartesian 2 file
 	cdct -- ChemDraw Conn. Table file
 	dock -- Dock Database file
 	wiz -- Wizard file
 	contmp -- Conjure Template file
 	cssr -- CSD CSSR file
 	dpdb -- Dock PDB file
 	feat -- Feature file
 	fhz -- Fenske-Hall ZMatrix file
 	gamin -- Gamess Input file
 	gcart -- Gaussian Cartesian file
 	gzmat -- Gaussian Z-matrix file
 	gotmp -- Gaussian Z-matrix tmplt file
 	hin -- Hyperchem HIN file
 	icon -- Icon 8 file
 	idatm -- IDATM file
 	sdf -- MDL Isis SDF file
 	m3d -- M3D file
 	macmol -- Mac Molecule file
 	macmod -- Macromodel file
 	micro -- Micro World file
 	mm2in -- MM2 Input file
 	mm2out -- MM2 Ouput file
 	mm3 -- MM3 file
 	mmads -- MMADS file
 	mdl -- MDL Molfile file
 	miv -- MolInventor file
 	mopcrt -- Mopac Cartesian file
 	mopint -- Mopac Internal file
 	csr -- MSI Quanta CSR file
 	pcmod -- PC Model file
 	pdb -- PDB file
 	psz -- PS-GVB Z-Matrix file
 	psc -- PS-GVB Cartesian file
 	report -- Report file
 	smiles -- SMILES file
 	spar -- Spartan file
 	mol -- Sybyl Mol file
 	mol2 -- Sybyl Mol2 file
 	maccs -- MDL Maccs file
 	torlist -- Torsion List file
 	unixyz -- UniChem XYZ file
 	xyz -- XYZ file
 	xed -- XED file
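
 Going only by the usage line above, a conversion command can be put
 together as in the following sketch. The helper name babel_command
 and the file names are hypothetical; the type codes "xyz" and "pdb"
 are taken from the lists above. The sketch only builds the argument
 list, so it does not need Babel to be installed.

```python
import shlex


def babel_command(in_type, in_name, out_type, out_name,
                  keywords=None, verbose=False):
    """Build an argument list following the usage line:
    babel [-v] -i<input-type> <name> -o<output-type> <name> "<keywords>"
    """
    argv = ["babel"]
    if verbose:
        argv.append("-v")
    # The type code is attached directly to -i / -o, as in the usage text.
    argv += [f"-i{in_type}", in_name, f"-o{out_type}", out_name]
    if keywords:
        argv.append(keywords)
    return argv


cmd = babel_command("xyz", "water.xyz", "pdb", "water.pdb")
print(shlex.join(cmd))  # babel -ixyz water.xyz -opdb water.pdb
```

 To actually run the conversion, the list could be handed to
 subprocess.run(cmd, check=True), assuming a babel executable is on
 the PATH.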
 -----
          Jesus M. Castagnetto M.       | "Organic Chemistry: The practice
 Dep.of Chemistry - New York University | of transmuting vile substances
 4 Washington Pl, Room 514. NY 10003    | into publications" (The Last Word-
   jesus : at : canarylab.chem.nyu.edu   | The Ultimate Scientific Dictionary)