From: jesus at canarylab.chem.nyu.edu (Jesus M. Castagnetto M.)
Subject: CCL: SUMMARY: Some Molecular File Format Descriptions
To: CHEMISTRY at www.ccl.net (Computational Chemistry List)
Date: Mon, 21 Oct 1996 15:27:43 -0500 (EDT)
Organization: New York University, Department of Chemistry

This is a summary of the information I already had and of the responses I got to my inquiry about molecular file formats. The original message said:

> I have searched the CCL archives and several other web search
> engines/sites, but I could not find that anybody has compiled
> a list of the currently available molecular structure file
> formats, along with their respective format descriptions.
> I know about the PDB, XYZ, MacroModel and some other formats,
> but not many. By no means was my search exhaustive, so if someone
> knows of good pointers related to this I will appreciate
> your input. I will summarize to the list the info I receive.
> Greetings and TIA for the help/hints/ideas/pointers.

Most people pointed to BABEL(++) as a source of information on file formats. I use it (almost) every day and can testify that it is a fine program, but what I was looking for were *descriptions* of the formats, in order to find out why some files I had were broken.
Thanks to Pat Walters and Matt Stahl for a good program.

(++) BABEL: http://mercury.aichem.arizona.edu/babel.html

Here is the info:

[1] I had gathered the info below about packages and file formats:

(a) MacroModel and related info:
http://www.columbia.edu/cu/chemistry/mmod/mmod.html
The manual that accompanies the package also describes the formats in extenso.

(b) PDB format and related info:
http://pdb.pdb.bnl.gov
especially "The Protein Data Bank Contents Guide: Atomic Coordinate":
http://pdb.pdb.bnl.gov/Format.doc/Format_Home.html

(c) XMol general info (Minnesota Supercomputer Center, Inc.):
http://www.msc.edu/msc/docs/xmol/XMol.html
and the man page for XYZ (part of XMol):

XYZ(5MSC)          Unix Programmer's Manual          XYZ(5MSC)

NAME
XYZ - Cartesian molecular model file format

COPYRIGHT
Copyright 1991 Research Equipment Inc. dba Minnesota Supercomputer Center

RESTRICTED RIGHTS LEGEND
Use, duplication, or disclosure of this software and its documentation by the Government is subject to restrictions as set forth in subdivision (b)(3)(ii) of the Rights in Technical Data and Computer Software clause at 52.227-7013.

DESCRIPTION
XYZ datafiles specify molecular geometries using a Cartesian coordinate system. This simple, stripped-down, ASCII-readable format is intended to serve as a "transition" format for the XMol series of applications. For example, suppose a molecular datafile was in a format not supported by XMol. In order to read the data into XMol, it would be possible to modify the datafile, perhaps by creating a shell script, so that it fit the relatively lenient requirements of the XYZ format specification. Once data is in XYZ format, it may be examined by XMol, or converted to yet another format.

The XYZ format supports multi-step datasets. Each step is represented by a two-line "header," followed by one line for each atom. The first line of a step's header is the number of atoms in that step.
This integer may be preceded by whitespace; anything on the line after the integer is ignored. The second line of the header leaves room for a descriptive string. This line may be blank, or it may contain some information pertinent to that particular step, but it must exist, and it must be just one line long.

Each line of text describing a single atom must contain at least four fields of information, separated by whitespace: the atom's type (a short string of alphanumeric characters), and its x-, y-, and z-positions. Optionally, extra fields may be used to specify a charge for the atom, and/or a vector associated with the atom. If an input line contains five or eight fields, the fifth field is interpreted as the atom's charge; otherwise, a charge of zero is assumed. If an input line contains seven or eight fields, the last three fields are interpreted as the components of a vector. These components should be specified in angstroms.

Note that the XYZ format doesn't contain connectivity information. This intentional omission allows for greater flexibility: to create an XYZ file, you don't need to know where a molecule's bonds are; you just need to know where its atoms are. Connectivity information is generated automatically for XYZ files as they are read into XMol-related applications. Briefly, if the distance between two atoms is less than the sum of their covalent radii, they are considered bonded.

FILES
/usr/local/etc/xmol/examples/*    sample datafiles
/usr/local/etc/xmol/xyz.types     table of atom types supported by XYZ format
/usr/local/etc/xmol/xyz.cnvt      conversion table for XYZ format

SEE ALSO
xmol(1MSC)

AUTHORS
Carolyn Wasikowski
Stefan Klemm

27 Apr 1993

(d) AMBER related info:
http://www.amber.ucsf.edu/amber/amber.html
and the AMBER file specifications:
http://www.amber.ucsf.edu/amber/formats.html

(e) General CSD info at the CCDC:
http://csdvx2.ccdc.cam.ac.uk/
and the documentation that comes with the CD-ROM distribution.
(f) SPARTAN (from Wavefunction):
Its output file uses a Cartesian coordinate representation similar to the one used in XYZ files, minus the charge (which is listed separately).

[2] From the responses I got the following pointers:

(a) MDL formats (there is a PDF file with lots of info here):
http://www.mdli.com/prod/fileformats.html

(b) Another PDB info site:
http://www.mi.uni-erlangen.de/~dosche/casihp.htm

Thank you to all who responded (listed below in no particular order; I hope I am not missing anyone). Sorry I didn't get to answer each of you individually:

Soaring Bear
Pat Walters
Jonathan Baell
Dale Braden
Henry Chermette
Stefan Grzybek
Bill Ross
Ralph Puchta
Willie Cui
Jasna Klicic

Greetings.

P.S. Below is the list of file formats Babel understands and converts:

Babel 1.5 BETA -- Sep 29 1996 -- 22:48:48
for menus type -- babel -m

Usage is : babel [-v] -i -o ""

Currently supported input types
alc -- Alchemy file
prep -- AMBER PREP file
bs -- Ball and Stick file
bgf -- MSI BGF file
car -- Biosym .CAR file
boog -- Boogie file
caccrt -- Cacao Cartesian file
cadpac -- Cambridge CADPAC file
charmm -- CHARMm file
c3d1 -- Chem3D Cartesian 1 file
c3d2 -- Chem3D Cartesian 2 file
cssr -- CSD CSSR file
fdat -- CSD FDAT file
gstat -- CSD GSTAT file
dock -- Dock Database file
dpdb -- Dock PDB file
feat -- Feature file
fract -- Free Form Fractional file
gamout -- GAMESS Output file
gzmat -- Gaussian Z-Matrix file
gauout -- Gaussian 92 Output file
g94 -- Gaussian 94 Output file
hin -- Hyperchem HIN file
sdf -- MDL Isis SDF file
m3d -- M3D file
macmol -- Mac Molecule file
macmod -- Macromodel file
micro -- Micro World file
mm2in -- MM2 Input file
mm2out -- MM2 Output file
mm3 -- MM3 file
mmads -- MMADS file
mdl -- MDL MOLfile file
molen -- MOLIN file
mopcrt -- Mopac Cartesian file
mopint -- Mopac Internal file
mopout -- Mopac Output file
pcmod -- PC Model file
pdb -- PDB file
psin -- PS-GVB Input file
psout -- PS-GVB Output file
msf -- Quanta MSF file
schakal -- Schakal file
shelx -- ShelX file
smiles -- SMILES file
spar -- Spartan file
semi -- Spartan Semi-Empirical file
spmm -- Spartan Mol. Mechanics file
mol -- Sybyl Mol file
mol2 -- Sybyl Mol2 file
wiz -- Conjure file
unixyz -- UniChem XYZ file
xyz -- XYZ file
xed -- XED file

Currently supported output types
diag -- DIAGNOTICS file
alc -- Alchemy file
bs -- Ball and Stick file
bgf -- BGF file
bmin -- Batchmin Command file
caccrt -- Cacao Cartesian file
cacint -- Cacao Internal file
cache -- CAChe MolStruct file
c3d1 -- Chem3D Cartesian 1 file
c3d2 -- Chem3D Cartesian 2 file
cdct -- ChemDraw Conn. Table file
dock -- Dock Database file
wiz -- Wizard file
contmp -- Conjure Template file
cssr -- CSD CSSR file
dpdb -- Dock PDB file
feat -- Feature file
fhz -- Fenske-Hall ZMatrix file
gamin -- Gamess Input file
gcart -- Gaussian Cartesian file
gzmat -- Gaussian Z-matrix file
gotmp -- Gaussian Z-matrix tmplt file
hin -- Hyperchem HIN file
icon -- Icon 8 file
idatm -- IDATM file
sdf -- MDL Isis SDF file
m3d -- M3D file
macmol -- Mac Molecule file
macmod -- Macromodel file
micro -- Micro World file
mm2in -- MM2 Input file
mm2out -- MM2 Ouput file
mm3 -- MM3 file
mmads -- MMADS file
mdl -- MDL Molfile file
miv -- MolInventor file
mopcrt -- Mopac Cartesian file
mopint -- Mopac Internal file
csr -- MSI Quanta CSR file
pcmod -- PC Model file
pdb -- PDB file
psz -- PS-GVB Z-Matrix file
psc -- PS-GVB Cartesian file
report -- Report file
smiles -- SMILES file
spar -- Spartan file
mol -- Sybyl Mol file
mol2 -- Sybyl Mol2 file
maccs -- MDL Maccs file
torlist -- Torsion List file
unixyz -- UniChem XYZ file
xyz -- XYZ file
xed -- XED file

-----
Jesus M. Castagnetto M.                  | "Organic Chemistry: The practice
Dep. of Chemistry - New York University  |  of transmuting vile substances
4 Washington Pl, Room 514. NY 10003      |  into publications" (The Last Word-
jesus at canarylab.chem.nyu.edu          |  The Ultimate Scientific Dictionary)
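The XYZ(5MSC) rules quoted in [1](c) are precise enough to check a file for breakage. Below is a minimal Python sketch of a reader, assuming plain ASCII input; the names `read_xyz`, `parse_atom_line`, `Atom` and `Step` are hypothetical and are not part of XMol or Babel. Two choices are my own assumptions, since the man page does not say what XMol does in these cases: atom lines with 6 or more than 8 fields are rejected, and blank lines where a step header is expected are skipped.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Atom:
    atom_type: str
    x: float
    y: float
    z: float
    charge: float = 0.0                                  # zero unless a charge field is given
    vector: Optional[Tuple[float, float, float]] = None  # only for 7- or 8-field lines

@dataclass
class Step:
    comment: str                                         # second header line, may be empty
    atoms: List[Atom] = field(default_factory=list)

def parse_atom_line(line: str) -> Atom:
    """Interpret one atom line by its field count, as the man page describes."""
    fields = line.split()
    n = len(fields)
    # Assumption: only 4, 5, 7 or 8 fields are accepted; the man page is silent on others.
    if n not in (4, 5, 7, 8):
        raise ValueError(f"expected 4, 5, 7 or 8 fields, got {n}: {line!r}")
    x, y, z = (float(v) for v in fields[1:4])
    # Five or eight fields: the fifth field is the charge; otherwise it is zero.
    charge = float(fields[4]) if n in (5, 8) else 0.0
    # Seven or eight fields: the last three fields are the vector components.
    vector = None
    if n in (7, 8):
        vx, vy, vz = (float(v) for v in fields[-3:])
        vector = (vx, vy, vz)
    return Atom(fields[0], x, y, z, charge, vector)

def read_xyz(text: str) -> List[Step]:
    """Split a (possibly multi-step) XYZ file into steps."""
    lines = text.splitlines()
    steps: List[Step] = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # assumption: tolerate blank lines between steps
            continue
        # Header line 1: the atom count, possibly preceded by whitespace;
        # anything after the integer is ignored.
        m = re.match(r"\s*(\d+)", lines[i])
        if m is None:
            raise ValueError(f"line {i + 1}: expected an atom count, got {lines[i]!r}")
        count = int(m.group(1))
        # Header line 2: must exist, but may be blank.
        if i + 1 >= len(lines):
            raise ValueError(f"line {i + 2}: missing header comment line")
        comment = lines[i + 1]
        atom_lines = lines[i + 2 : i + 2 + count]
        if len(atom_lines) != count:
            raise ValueError(
                f"step at line {i + 1}: expected {count} atom lines, found {len(atom_lines)}"
            )
        steps.append(Step(comment, [parse_atom_line(a) for a in atom_lines]))
        i += 2 + count
    return steps
```

A file that fails here (wrong atom count, missing comment line, a line with too few fields) is a good candidate for the kind of "broken" file mentioned at the top of this message.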
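The bonding rule in the XYZ man page (two atoms are bonded if their distance is less than the sum of their covalent radii) can be sketched in Python as below. The radii are approximate textbook single-bond covalent radii chosen for illustration, not the values in XMol's `xyz.types` table; the function name `perceive_bonds` and the `tolerance` parameter are hypothetical additions (with `tolerance=0.0` the test is exactly the rule as stated). It needs Python 3.8+ for `math.dist`.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Approximate single-bond covalent radii in angstroms. Illustrative values only;
# XMol reads its own table from xyz.types, which is not reproduced here.
COVALENT_RADII: Dict[str, float] = {"H": 0.37, "C": 0.77, "N": 0.75, "O": 0.73}

# (atom type, x, y, z) with coordinates in angstroms
AtomXYZ = Tuple[str, float, float, float]

def perceive_bonds(
    atoms: Sequence[AtomXYZ],
    radii: Dict[str, float] = COVALENT_RADII,
    tolerance: float = 0.0,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, closer than r_i + r_j + tolerance."""
    bonds: List[Tuple[int, int]] = []
    # Every pair is checked once: O(n^2), which is fine for small molecules.
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        limit = radii[a[0]] + radii[b[0]] + tolerance  # KeyError for unknown types
        if math.dist(a[1:], b[1:]) < limit:
            bonds.append((i, j))
    return bonds
```

For a water molecule, the two O-H pairs (about 0.96 Å apart) fall under the O+H radius sum, while the H...H pair (about 1.51 Å) does not, so only the two O-H bonds are reported. Real programs usually add a small positive tolerance, since a strict sum-of-radii test can miss slightly long bonds.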
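Going the other way, the man page suggests reshaping data from an unsupported format so that it fits the XYZ requirements. A minimal Python sketch of writing one step, used here instead of the shell script the man page mentions; `format_xyz_step` is a hypothetical helper and the six-decimal formatting is an arbitrary choice. The optional `charges` argument illustrates merging charges that are listed separately (as in the SPARTAN output described in [1](f)) into the fifth field, which an XYZ reader interprets as the charge.

```python
from typing import Optional, Sequence, Tuple

# (atom type, x, y, z) with coordinates in angstroms
AtomXYZ = Tuple[str, float, float, float]

def format_xyz_step(
    atoms: Sequence[AtomXYZ],
    comment: str = "",
    charges: Optional[Sequence[float]] = None,
) -> str:
    """Format one XYZ step: atom count, one comment line, then one line per atom."""
    # The comment line must exist and be exactly one line long.
    if "\n" in comment or "\r" in comment:
        raise ValueError("the comment must be a single line")
    if charges is not None and len(charges) != len(atoms):
        raise ValueError("need exactly one charge per atom")
    lines = [str(len(atoms)), comment]
    for k, (atom_type, x, y, z) in enumerate(atoms):
        fields = [atom_type, f"{x:.6f}", f"{y:.6f}", f"{z:.6f}"]
        if charges is not None:
            # With five fields, the fifth is read back as the atom's charge.
            fields.append(f"{charges[k]:.6f}")
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"
```

Concatenating the strings returned for several steps gives a multi-step XYZ file.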