Chemical table file
Chemical table files (CT files) are a family of text-based chemical file formats that describe molecules and chemical reactions.
File formats
The CT file family comprises several formats. Besides the molfile and SDF formats discussed below, it includes RGfiles, Rxnfiles, RDfiles, XDfiles and Clipboard.
Molfile
Filename extension | .mol |
---|---|
Internet media type | chemical/x-mdl-molfile |
Type of format | chemical file format |
An MDL Molfile is a file format for holding information about the atoms, bonds, connectivity and coordinates of a molecule.
The format was created by MDL Information Systems (MDL), which was acquired by Symyx Technologies. Symyx later merged with Accelrys Corp., which is now called BIOVIA and is a subsidiary of Dassault Systèmes.[1]
A molfile consists of a header block followed by the connection table (CT), which lists the atoms and then the bonds and their types, followed by sections for more complex information.
The molfile format is so common that most, if not all, cheminformatics software can read it, though not always to the same degree. It is also supported by some general computational software, such as Mathematica.
The de facto standard version is molfile V2000, although the more recent V3000 format now circulates widely enough to pose a compatibility problem for software that cannot yet read it.
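Both versions begin with the same three header lines and a counts line, so software that is not V3000-capable can at least detect the version before trying to parse the file. A minimal Python sketch, assuming the version stamp sits in columns 34-39 of the counts line (V3000 files carry ` V3000` there); `ctab_version` is a hypothetical helper name:

```python
def ctab_version(molfile_text: str) -> str:
    """Return "V2000" or "V3000" from the counts line of a molfile.

    Assumes three header lines followed by the counts line, whose version
    stamp occupies columns 34-39 (1-based), i.e. the slice [33:39].
    """
    lines = molfile_text.splitlines()
    if len(lines) < 4:
        raise ValueError("text is too short to contain a counts line")
    stamp = lines[3][33:39].strip()
    if stamp not in ("V2000", "V3000"):
        raise ValueError(f"unrecognised version stamp: {stamp!r}")
    return stamp


# Usage: detect a V3000 file so it can be routed elsewhere instead of misread.
counts_v3000 = "  0  0  0     0  0" + " " * 12 + "999 V3000"
print(ctab_version("name\n\n\n" + counts_v3000))  # prints V3000
```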
BIOVIA publishes a specification of its connection-table (CTfile) formats, which include the molfile and SD formats.[2]
The following is a molfile of benzene created in ChemSketch, as seen in a text editor. (Note: according to the official molfile specification,[3] the '$$$$' delimiter applies only to SD files, not to molfiles, so ChemSketch molfiles may not always be read correctly.)
```
benzene
  ACD/Labs0812062058

  6  6  0  0  0  0  0  0  0  0  1 V2000
    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0
  4  2  2  0  0  0  0
  5  3  1  0  0  0  0
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
M  END
$$$$
```
Lines | Section | Description |
---|---|---|
1-3 | Header | |
1 | | Molecule name ("benzene") |
2 | | User/program/date/etc. information |
3 | | Comment (blank) |
4-17 | Connection table (Ctab) | |
4 | | Counts line: 6 atoms, 6 bonds, ..., V2000 standard |
5-10 | | Atom block (one line per atom): x, y, z (in angstroms), element, etc. |
11-16 | | Bond block (one line per bond): first atom, second atom, bond type, etc. |
17 | | Properties block (empty) |
18 | | $$$$ (see the note above the listing) |
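The layout in the table can be read with a short parser. The following is a hedged Python sketch, not a complete implementation: it assumes the fixed-width V2000 columns (x, y and z in columns 1-30, the element symbol in columns 32-34, and three-character integer fields in the counts and bond lines) and ignores the properties block. Slicing by column rather than splitting on whitespace matters because adjacent three-character fields run together once values reach 100 (atoms 100 and 101 in a bond line become `100101`). The names `parse_v2000`, `Atom` and `Bond` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    x: float       # angstroms
    y: float
    z: float
    symbol: str    # element symbol, e.g. "C"


@dataclass
class Bond:
    first: int      # 1-based index into the atom block
    second: int     # 1-based index into the atom block
    bond_type: int  # 1 = single, 2 = double, 3 = triple, 4 = aromatic


def parse_v2000(text: str) -> tuple[str, list[Atom], list[Bond]]:
    """Parse the name, atom block and bond block of a V2000 molfile.

    Fixed-width columns are sliced directly; the properties block and
    anything after the bond block are ignored.
    """
    lines = text.splitlines()
    name = lines[0].strip()            # header line 1
    counts = lines[3]                  # line 4: counts line
    if counts[33:39].strip() != "V2000":
        raise ValueError("not a V2000 connection table")
    n_atoms = int(counts[0:3])         # columns 1-3
    n_bonds = int(counts[3:6])         # columns 4-6

    atoms = [
        Atom(float(line[0:10]), float(line[10:20]), float(line[20:30]),
             line[31:34].strip())
        for line in lines[4:4 + n_atoms]
    ]
    start = 4 + n_atoms
    bonds = [
        Bond(int(line[0:3]), int(line[3:6]), int(line[6:9]))
        for line in lines[start:start + n_bonds]
    ]
    return name, atoms, bonds


# The benzene example, with the V2000 column padding.
BENZENE = "\n".join([
    "benzene",
    "  ACD/Labs0812062058",
    "",
    "  6  6  0  0  0  0  0  0  0  0  1 V2000",
    "    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  2  1  1  0  0  0  0",
    "  3  1  2  0  0  0  0",
    "  4  2  2  0  0  0  0",
    "  5  3  1  0  0  0  0",
    "  6  4  1  0  0  0  0",
    "  6  5  2  0  0  0  0",
    "M  END",
])

name, atoms, bonds = parse_v2000(BENZENE)
print(name, len(atoms), len(bonds))  # prints: benzene 6 6
```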
SDF
Filename extension | .sd, .sdf |
---|---|
Internet media type | chemical/x-mdl-sdfile |
Type of format | chemical file format |
SDF is one of a family of chemical-data file formats developed by MDL and is intended especially for structural information. "SDF" stands for structure-data file; SDF files wrap the molfile (MDL Molfile) format, and multiple compounds are delimited by lines consisting of four dollar signs ($$$$). A key feature of the SDF format is its ability to include associated data for each compound.
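Because records are separated only by `$$$$` lines, splitting an SD file into its records is straightforward. A minimal Python sketch, assuming the whole file has already been read into a string; `iter_sd_records` is a hypothetical helper name:

```python
from collections.abc import Iterator


def iter_sd_records(text: str) -> Iterator[str]:
    """Yield each record of an SD file as text, without its '$$$$' line."""
    record: list[str] = []
    for line in text.splitlines():
        if line.rstrip() == "$$$$":
            yield "\n".join(record)
            record = []
        else:
            record.append(line)
    # Be lenient about a final record that lacks its closing '$$$$' line.
    if any(line.strip() for line in record):
        yield "\n".join(record)


# Two records; the molfile bodies are elided ("...") for brevity.
sd_text = "mol-1\n\n\n...\nM  END\n$$$$\nmol-2\n\n\n...\nM  END\n$$$$\n"
for rec in iter_sd_records(sd_text):
    print(rec.splitlines()[0])  # prints mol-1, then mol-2
```

A generator keeps memory use flat, so the same loop could be fed lines from a large file instead of one string.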
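Associated data items follow the molfile in each record. Each item consists of a header line beginning with ">" that gives the field name in angle brackets, followed by the value and a blank line that ends the item: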
```
> <Unique_ID>
XCA3464366

> <ClogP>
5.825

> <Vendor>
Sigma

> <Molecular Weight>
499.611

```
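A record's data items can be collected by scanning the lines after the molfile's `M  END` line. This Python sketch assumes that every header line contains the field name in angle brackets (any other text on the header line is ignored) and that a blank line ends each item; `parse_data_items` is a hypothetical name:

```python
import re

# Header lines look like "> <ClogP>"; any other text on the line is ignored.
HEADER = re.compile(r"^>.*?<([^>]+)>")


def parse_data_items(record: str) -> dict[str, str]:
    """Return the associated data items of one SD record as {field: value}."""
    lines = record.splitlines()
    ends = [i for i, line in enumerate(lines) if line.startswith("M  END")]
    if not ends:
        raise ValueError("record has no 'M  END' line")

    items: dict[str, str] = {}
    field = None           # name of the item being read, if any
    value: list[str] = []
    for line in lines[ends[0] + 1:]:
        if field is None:
            match = HEADER.match(line)
            if match:
                field, value = match.group(1), []
        elif line.strip() == "":
            items[field] = "\n".join(value)  # blank line ends the item
            field = None
        else:
            value.append(line)               # multi-line values are kept
    if field is not None:                    # tolerate a missing final blank line
        items[field] = "\n".join(value)
    return items


record = "\n".join([
    "example", "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    "> <Unique_ID>", "XCA3464366", "",
    "> <ClogP>", "5.825", "",
])
print(parse_data_items(record))  # {'Unique_ID': 'XCA3464366', 'ClogP': '5.825'}
```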
Some programs that can import SDF files (e.g. ISIS/Base) require that the first data field after the molecule data (in the example above, Unique_ID) be a unique identifier for each record.
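Given records whose data items have already been parsed into Python dicts (which preserve insertion order, so the first key is the first field in the file), this requirement can be checked before import. Requiring the first field to have the same name in every record is an assumption of this sketch, not something stated above; `first_field_is_unique` is a hypothetical helper:

```python
def first_field_is_unique(records: list[dict[str, str]]) -> bool:
    """True if every record's first data field has the same name and a distinct value."""
    if not records:
        return True
    if any(not record for record in records):
        return False  # a record without data items has no identifier
    key = next(iter(records[0]))  # name of the first field in the first record
    seen: set[str] = set()
    for record in records:
        first_key = next(iter(record))
        if first_key != key or record[key] in seen:
            return False
        seen.add(record[key])
    return True


print(first_field_is_unique([{"Unique_ID": "XCA1", "ClogP": "5.8"},
                             {"Unique_ID": "XCA2"}]))  # prints True
```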
A data value may span multiple lines. The MDL SDF-format specification requires that a hard carriage return be inserted into any text field whose content exceeds 200 characters, so that no line of a value is longer than 200 characters. This requirement is frequently violated in practice, as many SMILES and InChI strings exceed that length.
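When writing SD files, the 200-character rule can be honoured by breaking long values with hard line breaks. A hedged Python sketch (`format_data_item` is a hypothetical helper): a reader will then see embedded newlines, so a wrapped SMILES or InChI string has to be re-joined after reading.

```python
MAX_LINE = 200  # the 200-character line limit described above


def format_data_item(field: str, value: str, width: int = MAX_LINE) -> str:
    """Format one SD data item: a '> <field>' header, the value, a blank line."""
    out = [f"> <{field}>"]
    for line in value.splitlines():
        # Break long lines into chunks of at most `width` characters; empty
        # lines are skipped because a blank line would terminate the item.
        out.extend(line[i:i + width] for i in range(0, len(line), width))
    out.append("")  # blank line that ends the data item
    return "\n".join(out) + "\n"


print(format_data_item("ClogP", "5.825"), end="")
```

Chunking by a fixed width keeps the round trip lossless as long as the reader concatenates the value lines without separators.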
References
- [1] Dalby, A.; Nourse, J. G.; Hounshell, W. D.; Gushurst, A. K. I.; Grier, D. L.; Leland, B. A.; Laufer, J. (1992). "Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited". Journal of Chemical Information and Computer Sciences 32 (3): 244. doi:10.1021/ci00007a012.
- [2] BIOVIA (June 2014). CT File Formats. BIOVIA. CTfile format definitions available on request (registration required).
- [3] Chemical table file specification (December 2011). Available at http://download.accelrys.com/freeware/ctfile-formats/ctfile-formats.zip
External links
- SDF Toolkit: free software to process SD files (SDF).
- NCI/CADD Chemical Identifier Resolver: generates SD files (SDF) from chemical names, CAS Registry Numbers, SMILES, InChI, InChIKey, ...
- KNIME: free software for data manipulation and data mining that can also read and write SD files (SDF).