CML (Chemical Markup Language) Update



CML is an Open XML infrastructure that supports molecular information and complies with current W3C (World Wide Web Consortium) protocols and philosophy. CML is designed as an application-independent adapter for chemical information. This message announces a number of new developments. (Much of the work has been done jointly with Henry Rzepa.)
 
(A) Specifications: The W3C released the XML Schema Recommendation last year, and we have converted the current V1.0 DTD to a schema. The schema is still a draft, but it increases the expressive power of the language while simplifying some of the syntax. W3C schemas are seen as the main way forward for most XML applications and will support a variety of powerful new tools such as XForms and XML Query. In addition, we have developed a core XML language (STMML) for representing scientific data: data structures such as arrays and matrices, a range of data types, scientific units, metadata, and so on. STMML is a core part of CML but can be reused in other applications.
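As a rough illustration of the extra power a schema brings over a DTD, the Java sketch below uses the standard javax.xml.validation API to check typed attribute values. The inline schema is hypothetical and heavily simplified, not the actual CML schema; the `atom` element and its `id`, `elementType`, `x2` and `y2` attributes are assumptions modelled loosely on CML-style markup. A DTD cannot require an attribute value to be a number, whereas the schema below can require the coordinates to be `xsd:double`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaTypingDemo {
    // Hypothetical, simplified schema: NOT the real CML schema.
    // It types the x2/y2 coordinates as xsd:double, which a DTD cannot express.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + " <xsd:element name='atom'>"
        + "  <xsd:complexType>"
        + "   <xsd:attribute name='id' type='xsd:ID' use='required'/>"
        + "   <xsd:attribute name='elementType' type='xsd:string' use='required'/>"
        + "   <xsd:attribute name='x2' type='xsd:double'/>"
        + "   <xsd:attribute name='y2' type='xsd:double'/>"
        + "  </xsd:complexType>"
        + " </xsd:element>"
        + "</xsd:schema>";

    /** Returns true if the XML instance validates against the schema above. */
    public static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXParseException
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the instance violates the schema
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<atom id='a1' elementType='C' x2='1.5' y2='0.0'/>")); // true
        System.out.println(isValid("<atom id='a1' elementType='C' x2='one' y2='0.0'/>")); // false
    }
}
```

The second instance is well-formed XML and would pass a DTD that declares `x2` as CDATA; only the typed schema rejects it.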
 
(B) Programming: We have created a W3C DOM (Document Object Model) for CML. This forms an Open abstract data model (and API) for those writing molecular applications in XML (and in other OO approaches such as UML). The CML DOM has been developed alongside the small-molecule specification of the Object Management Group's Life Sciences Research group. We have also created SAX2-based modules for CML. These interfaces are available to all developers under an Open Source license.
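To give a flavour of event-driven processing with SAX2, here is a minimal Java sketch that tallies atoms by element type. It uses the standard JAXP/SAX2 API, not the CML project's own modules, and the `<molecule>`, `<atomArray>` and `<atom elementType="...">` markup is an assumption modelled on CML conventions; check the current specification before relying on it.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CmlAtomCounter extends DefaultHandler {
    // Running tally of element symbols, sorted by symbol, e.g. {C=2, O=1}
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Assumed CML-style markup: <atom id="..." elementType="..."/>
        if ("atom".equals(localName)) {
            String element = attrs.getValue("elementType");
            if (element != null) {
                counts.merge(element, 1, Integer::sum);
            }
        }
    }

    /** Parses the XML string and returns the atom counts per element type. */
    public static Map<String, Integer> countAtoms(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // ensures localName is populated
        SAXParser parser = factory.newSAXParser();
        CmlAtomCounter handler = new CmlAtomCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical fragment: heavy atoms only, hydrogens omitted
        String xml = "<molecule id='m1'><atomArray>"
            + "<atom id='a1' elementType='C'/>"
            + "<atom id='a2' elementType='C'/>"
            + "<atom id='a3' elementType='O'/>"
            + "</atomArray></molecule>";
        System.out.println(countAtoms(xml)); // {C=2, O=1}
    }
}
```

A SAX2 handler like this streams through the document without building a tree, which suits large files; a DOM-based approach instead loads the whole document into memory so it can be navigated and modified.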
 
(C) Data: We are working closely with the National Cancer Institute (NCI), which is converting its database to CML.
 
(D) CML is now an integral part of several Open Source projects, such as OpenBabel, JMOL, JChemPaint and XDrawChem (all hosted at http://www.sourceforge.net).
 
(E) A new resource has been set up at SourceForge, the Open Source repository. Project page: http://cml.sourceforge.net; downloads and CVS repository: http://www.sourceforge.net/projects/cml. There is much supporting material, including examples and Java, JavaScript and XSLT toolkits. We have also developed a C++ SAX2-like parser for OpenBabel: http://www.sourceforge.net/projects/openbabel
 
(F) SELFML: The SELF project (Prof. Henry Kehiaian, Paris) has created an ontology for physicochemical data (especially properties of molecules and of mixtures of molecules). SELFML is the XML incarnation of SELF and supports both the data and their specifications (dictionary entries). SELFML interoperates with CML and can therefore support collections of molecular properties (catalogues, dictionaries, etc.).
 
(G) CML is being extended to support reactions and computational chemistry. We are starting to convert codes (initially Open Source codes such as MOPAC and GROMACS) to read and write XML, and to develop the XML ontologies required. Offers of collaboration are welcome :-)
 Peter Murray-Rust
 --
 Peter Murray-Rust, pm286 AT cam.ac.uk
 Unilever Centre for Molecular Informatics, Chemistry Department
 Lensfield Road, Cambridge, CB2 1EW, UK
 +44-1223-336-432