RE: V3000 molfile format



The V3000 molfile format is not new.  It was first published in 1995 and its
 first role was to remove the arbitrary atom and bond limits that are built
 into the V2000 format.  Unfortunately the V2000 structure is very limiting
 and is not extensible.
 Large structures tend not to be distributed as molfiles, and therefore very
 few V3000 molfiles have been in circulation.  MDL's enhanced stereochemistry
 introduces a number of new representation features that could only be fitted
 in the V3000 format.  We expect that the enhanced stereochemical
 representation will have a noticeable impact on the number of V3000 format
 molfiles in circulation.
 The V2000 format has served us well and will continue to serve us for many
 years.  We have no plans to drop support for it in our products.  Structures
 that can be fully represented in the V2000 format will continue to be handled
 in a V2000 format file.  A V3000 format file will be triggered only if the
 structure contains features that cannot be represented in the V2000 format.
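
 The selection rule described above can be sketched roughly as follows.  This
 is only an illustration, not MDL's actual logic: the function name and its
 parameters are hypothetical, and the limit of 999 is an assumption based on
 the three-character atom and bond count fields of the V2000 counts line.

```python
# Hypothetical sketch: pick the molfile version a writer would need.
# Assumption: V2000 count fields are three characters wide, so at most
# 999 atoms and 999 bonds can be declared in a V2000 counts line.
V2000_MAX_COUNT = 999


def choose_molfile_version(n_atoms: int, n_bonds: int,
                           has_enhanced_stereo: bool = False) -> str:
    """Return "V2000" unless the structure needs a V3000-only feature."""
    if n_atoms > V2000_MAX_COUNT or n_bonds > V2000_MAX_COUNT:
        return "V3000"  # too large for the fixed-width V2000 counts line
    if has_enhanced_stereo:
        return "V3000"  # enhanced stereochemistry needs V3000
    return "V2000"      # everything else stays in the established format
```

 A real writer would test for every V3000-only feature, not just the two
 checked in this sketch.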
 MDL is researching chemical structure file formats.  XML is part of that
 research and compatibility with CML is under consideration.
 XML formats are very verbose and this imposes a large overhead when it comes
 to parsing them.  If you are dealing with a small number of structures this
 overhead is tolerable.  It is, however, common in our user base to need to
 work on structure sets that contain 100,000+ entries.  The overhead then
 becomes significant and a more compact representation is required.  This is
 why applications that consume large numbers of structures tend to read and
 write SDfiles or concatenated SMILES strings.
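
 As an illustration of why SDfiles suit large structure sets, a reader can
 stream one record at a time without holding the whole file in memory.  The
 sketch below assumes only that records are terminated by a line containing
 "$$$$"; the function name is hypothetical and no parsing of the record body
 is attempted.

```python
from typing import Iterable, Iterator


def iter_sd_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each SDfile record as one string, without the "$$$$" line."""
    record = []
    for line in lines:
        if line.rstrip("\r\n") == "$$$$":
            # End-of-record delimiter: emit what has been collected so far.
            yield "".join(record)
            record = []
        else:
            record.append(line)
    # Tolerate a final record that is missing its "$$$$" terminator.
    if any(part.strip() for part in record):
        yield "".join(record)
```

 An open file object can be passed directly, for example
 `for rec in iter_sd_records(open("structures.sdf")): ...`, where the file
 name is a placeholder.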
 If anyone would like to engage in a more detailed discussion about this
 issue or anything else connected with chemical structure representation
 please contact me directly at k.taylor;at;mdl.com.
 -----Original Message-----
 From: Alberto Gobbi [mailto:agobbi;at;anadyspharma.com]
 Sent: Wednesday, December 04, 2002 10:47 PM
 To: CHEMISTRY;at;ccl.net
 Subject: CCL:V3000 molfile format
 Hi Everybody,
 without wanting to be offensive I would like to ask if you have really
 considered all the options before creating a new file structure to store
 structures.
 The V2000 molfile format is well established and handles most of the cases
 required so far. There are thousands of applications which can read and
 write molfiles which would need to be modified. As one of your customers I
 do not consider that you are really doing us a good service in creating a
 new proprietary format.
 Also XML is becoming the standard for persistently storing and transmitting
 any kind of information worldwide in all different kinds of areas. There are
 a lot of standard, open, well tested and robust applications and libraries
 to read, write, and check the consistency of XML files. There is even an open
 standard CML (http://www.xml-cml.org/) for storing chemical structures and
 data based on XML. XML is carefully designed to be both flexible and
 extensible. It's certainly more extensible and flexible than the V3000
 format and would surely meet not just MDL's present needs but their future
 needs as well. So if you really think that there is a need for storing
 additional information I feel you would do your customers a better service
 in supporting CML instead of creating a new standard which will cause a lot
 of headaches to people who would like to exchange structures or simply
 import them into their applications.
 With best regards,
 Alberto
 ===============================================
 Alberto Gobbi
 Anadys Pharmaceuticals
 9050 Camino Santa Fe
 San Diego CA, 92121
 USA
 -----------------------------------------------
 AGobbi;at;AnadysPharma.com
 Tel.: +1 858 530 3657
 -----Original Message-----
 From: Keith Taylor [mailto:K.Taylor;at;mdl.com]
 Sent: Tuesday, December 03, 2002 8:24 AM
 To: Computational Chemistry List
 Subject: CCL:V3000 molfile format
 If you use molfiles to transport structure information between applications,
 you need to be aware that MDL is introducing an enhancement to its
 stereochemical representation and this has an impact on the format of the
 molfile.  The enhanced stereochemical representation will require the use of
 V3000 format molfiles and your molfile readers and writers will need to be
 updated to handle this information.
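
 A reader that currently assumes V2000 could start by checking which version
 a file declares.  The sketch below assumes that the counts line is the
 fourth line of the molfile and that it ends with a "V2000" or "V3000" stamp;
 the function name is hypothetical, and files without a stamp are reported
 as None rather than guessed at.

```python
from typing import Optional


def molfile_version(molfile_text: str) -> Optional[str]:
    """Return "V2000" or "V3000" as declared on the counts line, else None."""
    lines = molfile_text.splitlines()
    if len(lines) < 4:
        return None  # too short to contain a header block and counts line
    counts_line = lines[3].rstrip()
    for version in ("V2000", "V3000"):
        if counts_line.endswith(version):
            return version
    return None  # no recognised version stamp
```

 A reader could dispatch on the result and reject or log files it cannot yet
 handle instead of misreading them.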
 MDL publishes the molfile format and the latest version of the document
 (August 2002) can be downloaded from:
 http://www.mdl.com/downloads/ctfile/ctfile_subs.html