Chemical table file
From Wikipedia, the free encyclopedia
ctab | |
---|---|
File name extension | .mol, .sd, .sdf |
Type of format | chemical file format |
Chemical table files are files that contain information about chemicals.
File formats
Chemical table files come in various formats. In addition to the formats discussed below, other formats include RGfiles, Rxnfiles, RDfiles, XDfiles and Clipboard.
Molfiles
An MDL Molfile is a file format created by MDL and now owned by Symyx. It holds information about the atoms, bonds, connectivity and coordinates of a molecule. The molfile consists of some header information, followed by the Connection Table (CT), which lists the atoms and then the bond connections and types, followed by sections for more complex information.
The molfile is sufficiently common that most, if not all, cheminformatics software systems/applications are able to read the format, though not always to the same degree. It is also supported by some computational software such as Mathematica.
There are different versions. The current de facto standard is the V2000 molfile, though more recently the V3000 format has been circulating in large enough volumes to be an issue for software that cannot read V3000-format files.
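Because readers differ in which versions they accept, it can help to check the version before parsing. The following is a minimal Python sketch, assuming the file follows the usual layout in which a three-line header block precedes the counts line and the version tag ends that line. The function name `molfile_version` is hypothetical and not part of any MDL tool.

```python
def molfile_version(text: str) -> str:
    """Return 'V2000', 'V3000', or 'unknown' for the molfile in `text`."""
    lines = text.splitlines()
    # Assumption: the counts line is the fourth line of the file,
    # directly after the three-line header block.
    if len(lines) < 4:
        return "unknown"
    counts_line = lines[3].rstrip()
    for tag in ("V2000", "V3000"):
        if counts_line.endswith(tag):
            return tag
    return "unknown"
```

A caller could use the result to reject or reroute V3000 input instead of failing partway through a file.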
MDL publishes a specification of their Connection Table formats, which include Molfile and SD formats.
Following are the contents of a Molfile of benzene created in ChemSketch, as seen in a text editor:
benzene
ACD/Labs0812062058
6 6 0 0 0 0 0 0 0 0 1 V2000
1.9050 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9050 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 -0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 -2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3987 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3987 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0
3 1 2 0 0 0 0
4 2 2 0 0 0 0
5 3 1 0 0 0 0
6 4 1 0 0 0 0
6 5 2 0 0 0 0
M END
$$$$
1: header (molecule name)
2: program name and timestamp
3: counts line: 6 atoms, 6 bonds, ..., V2000 standard
4-9: atom block: x, y, z, element, extra information
10-15: bond block (each bond listed): 1st atom, 2nd atom, type, extra information
According to the official molfile specification, the $$$$ delimiter belongs only to SD files, not to molfiles. The ChemSketch output shown above therefore does not strictly follow the specification.
SDF
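The structure described above can be read with a few lines of code. The following Python sketch parses the atom and bond blocks of a V2000 molfile using the format's fixed-width columns. The column spacing and the blank comment line in `BENZENE` are reconstructed from the V2000 layout and are an assumption about the original file, since the listing above does not show them. The names `Atom`, `Bond` and `parse_molfile` are hypothetical, and the sketch ignores the per-atom and per-bond extra fields and any property lines.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    x: float
    y: float
    z: float
    symbol: str

class Bond(NamedTuple):
    first: int   # 1-based index of the first atom
    second: int  # 1-based index of the second atom
    order: int   # 1 = single, 2 = double, 3 = triple

def parse_molfile(text: str):
    """Return (name, atoms, bonds) for a V2000 molfile given as a string."""
    lines = text.splitlines()
    name = lines[0].strip()
    counts = lines[3]  # fourth line: follows the three-line header block
    if "V2000" not in counts:
        raise ValueError("not a V2000 molfile")
    n_atoms = int(counts[0:3])  # columns 1-3
    n_bonds = int(counts[3:6])  # columns 4-6
    atoms = [
        Atom(float(l[0:10]), float(l[10:20]), float(l[20:30]), l[31:34].strip())
        for l in lines[4:4 + n_atoms]
    ]
    bonds = [
        Bond(int(l[0:3]), int(l[3:6]), int(l[6:9]))
        for l in lines[4 + n_atoms:4 + n_atoms + n_bonds]
    ]
    return name, atoms, bonds

# Benzene example with assumed fixed-width spacing and blank comment line.
ATOM_FLAGS = " C   0  0  0  0  0  0  0  0  0  0  0  0"
BENZENE = "\n".join([
    "benzene",
    "  ACD/Labs0812062058",
    "",
    "  6  6  0  0  0  0  0  0  0  0  1 V2000",
    "    1.9050   -0.7932    0.0000" + ATOM_FLAGS,
    "    1.9050   -2.1232    0.0000" + ATOM_FLAGS,
    "    0.7531   -0.1282    0.0000" + ATOM_FLAGS,
    "    0.7531   -2.7882    0.0000" + ATOM_FLAGS,
    "   -0.3987   -0.7932    0.0000" + ATOM_FLAGS,
    "   -0.3987   -2.1232    0.0000" + ATOM_FLAGS,
    "  2  1  1  0  0  0  0",
    "  3  1  2  0  0  0  0",
    "  4  2  2  0  0  0  0",
    "  5  3  1  0  0  0  0",
    "  6  4  1  0  0  0  0",
    "  6  5  2  0  0  0  0",
    "M  END",
])

name, atoms, bonds = parse_molfile(BENZENE)
```

Slicing by column rather than splitting on whitespace is a deliberate choice here: in fixed-width files, adjacent fields can touch (for example when both the atom and bond counts have three digits), and a whitespace split would then misread them.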
SDF is one of a family of file formats from MDL for holding chemical data, especially structure information. "SDF" stands for structure-data file; an SDF file wraps records in the molfile format. Multiple compounds are separated by a delimiter line consisting of four dollar signs ($$$$). A feature of SDF is that associated data items can be stored with each compound.
Associated data items are denoted as follows:
> <Unique_ID>
XCA3464366

> <ClogP>
5.825

> <Vendor>
Sigma

> <Molecular Weight>
499.611
Some SDF import programs (e.g. ISIS/Base) require that the first data field after the molecule data (in the example above, Unique_ID) be a unique identifier for each record.
A data item may span multiple lines. The MDL SDF format specification requires a hard carriage return to be inserted in any text field exceeding 200 characters in length. This requirement is frequently violated in practice, as many SMILES and InChI strings exceed this limit.
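The record and data-item layout described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete SDF reader: it assumes each data item is a header line beginning with ">" that carries the field name in angle brackets, followed by value lines and closed by a blank line. The function names, the empty placeholder structures and the second record's identifier are hypothetical.

```python
import re

FIELD_HEADER = re.compile(r"^>.*?<([^<>]+)>")

def split_sdf(text):
    """Yield the text of each record, without its $$$$ delimiter line."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield "\n".join(record)
            record = []
        else:
            record.append(line)
    if any(l.strip() for l in record):  # tolerate a missing final delimiter
        yield "\n".join(record)

def parse_data_items(record):
    """Return {field name: value} for the data items of one record."""
    lines = record.splitlines()
    # Data items follow the molfile part, which ends with "M  END".
    start = next((i + 1 for i, l in enumerate(lines) if l.startswith("M  END")), 0)
    items, name, value = {}, None, []
    for line in lines[start:]:
        match = FIELD_HEADER.match(line)
        if match:
            name, value = match.group(1), []
            items[name] = ""
        elif name is not None:
            if not line.strip():
                name = None  # a blank line closes the current item
            else:
                value.append(line)
                items[name] = "\n".join(value)
    return items

def overlong_fields(items, limit=200):
    """Names of fields containing a line longer than `limit` characters."""
    return [n for n, v in items.items()
            if any(len(l) > limit for l in v.splitlines())]

# Two records with empty placeholder structures (hypothetical data).
EMPTY_MOL = ["", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END"]
SAMPLE = "\n".join(
    ["compound 1"] + EMPTY_MOL
    + ["> <Unique_ID>", "XCA3464366", "", "> <ClogP>", "5.825", "", "$$$$"]
    + ["compound 2"] + EMPTY_MOL
    + ["> <Unique_ID>", "HYPOTHETICAL-2", "", "$$$$"]
)

records = list(split_sdf(SAMPLE))
first_items = parse_data_items(records[0])
```

The `overlong_fields` helper only reports lines beyond the 200-character limit; whether to reject or accept such records is left to the caller, since files that break this rule are common.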
References
- Dalby, A.; Nourse, J. G.; Hounshell, W. D.; Gushurst, A. K. I.; Grier, D. L. et al. Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited, Journal of Chemical Information and Computer Sciences, 1992, 32, 244-255.
External links
- MDL
- CTFile format definition latest version available (November, 2007).