CML (Chemical Markup Language) Update
- From: Peter Murray-Rust <pm286 #*at*# cam.ac.uk>
- Subject: CML (Chemical Markup Language) Update
- Date: Thu, 16 May 2002 17:54:48 +0100
CML is an Open XML infrastructure which supports molecular information and
is compliant with current W3C (World Wide Web Consortium) protocols and
philosophy. CML is designed as an application-independent adapter for
chemical information. This is to announce a number of new developments.
(Much of the work is joint with Henry Rzepa.)
(A) Specifications: The W3C released the XML Schema Recommendation last
year and we have converted the current V1.0 DTD to a schema. This is
still at a draft stage, but increases the power of the language while
simplifying some of the syntax. W3C schemas are seen as the main way
forward for most XML applications and will support a variety of
powerful new tools like XML forms, XML query, etc. In addition we have
developed a core XML language (STMML) for representing scientific data,
such as data structures (arrays, matrices), many data types, scientific
units, metadata, etc. STMML is a core part of CML but can be re-used in
other applications.
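
As a rough illustration of the kind of data STMML is meant to carry, the
Python sketch below reads a numeric array that declares its size and
units. The element and attribute names (array, dataType, size, units)
and the value "units:angstrom" are assumptions chosen for illustration,
not taken from the draft schema, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical STMML-style fragment; names and values are illustrative only.
SAMPLE = (
    '<array dataType="xsd:float" size="3" units="units:angstrom">'
    "1.0 2.5 4.0</array>"
)


def read_array(xml_text):
    """Return (values, units) from a whitespace-delimited numeric array."""
    elem = ET.fromstring(xml_text)
    values = [float(tok) for tok in (elem.text or "").split()]
    declared = elem.get("size")
    # If a size is declared, check it against the number of values found.
    if declared is not None and int(declared) != len(values):
        raise ValueError(
            "declared size %s does not match %d values" % (declared, len(values))
        )
    return values, elem.get("units")


if __name__ == "__main__":
    print(read_array(SAMPLE))
```

Keeping the units next to the numbers is the point of the sketch: a
reader of the data does not have to guess them from context.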
(B) Programming: We have created a W3C DOM (Document Object Model) for
CML. This forms an Open abstract data model (and API) for those writing
molecular applications in XML (and other OO approaches such as UML).
The CML DOM has been developed alongside the Life Sciences Research
group specification for small molecules for the Object Management
Group. We have also created SAX2-based modules for CML. These
interfaces are available under an Open Source license to any developer.
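
To give a flavour of event-based (SAX-style) processing of molecular
XML, here is a minimal sketch that counts atoms by element type. It uses
Python's standard xml.sax module, not the CML modules announced here,
and the sample fragment (molecule, atomArray, atom, elementType) is an
assumed, simplified layout rather than a quotation from the
specification.

```python
import xml.sax
from collections import Counter

# Hypothetical CML-style fragment; element and attribute names are assumed.
SAMPLE = b"""<molecule id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
    <atom id="a3" elementType="O"/>
  </atomArray>
</molecule>"""


class AtomCounter(xml.sax.ContentHandler):
    """Counts atoms per element type as start-element events arrive."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Only <atom> elements matter here; all other events are ignored.
        if name == "atom":
            self.counts[attrs.get("elementType", "?")] += 1


def count_atoms(xml_bytes):
    """Parse the document once and return {elementType: count}."""
    handler = AtomCounter()
    xml.sax.parseString(xml_bytes, handler)
    return dict(handler.counts)


if __name__ == "__main__":
    print(count_atoms(SAMPLE))
```

A SAX handler never holds the whole document in memory, which suits
large collections of molecules; a DOM is the better fit when an
application needs to navigate or edit the structure after parsing.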
(C) Data: We are working closely with the National Cancer Institute
(NCI), which is converting its database to CML.
(D) CML is now an integral part of several Open Source projects such as
OpenBabel, JMOL, JChemPaint and XDrawChem (all on
http://www.sourceforge.net).
(E) A new resource has been set up at SourceForge, the Open Source
repository. Project Page: http://cml.sourceforge.net; Downloads and CVS
repository at http://www.sourceforge.net/projects/cml. There is much
supporting material including examples and Java, JavaScript, and XSLT
toolkits. We have developed a C++ SAX2-like parser for OpenBabel:
http://www.sourceforge.net/projects/openbabel
(F) SELFML: The SELF project (Prof. Henry Kehiaian, Paris) has created
an ontology for physicochemical data (especially properties of
molecules and mixtures of molecules). SELFML is the XML incarnation of
SELF and supports the data and the specifications (dictionary entries).
SELFML interoperates with CML and can therefore support collections of
molecular properties (catalogues, dictionaries, etc.).
(G) CML is being extended to support reactions and computational
chemistry. We are starting to convert codes (initially Open Source such
as MOPAC and GROMACS) to read and write XML, and to develop the XML
ontologies required. Offers of collaboration welcomed :-)
Peter Murray-Rust
--
Peter Murray-Rust, pm286 AT cam.ac.uk
Unilever Centre for Molecular Informatics, Chemistry Department
Lensfield Road, Cambridge, CB2 1EW, UK
+44-1223-336-432