Text Encoding Initiative
From Wikipedia, the free encyclopedia
The Text Encoding Initiative (TEI) is a consortium of institutions and research projects that collectively maintains and develops a standard for the representation of texts in digital form. Originally sponsored by three scholarly societies, the TEI is now an independent membership consortium, hosted by academic institutions in the US and in Europe. Its major deliverable is a set of Guidelines, which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics. Since 1994, these Guidelines have been widely used as a standard for encoding text materials for online research and teaching.
Sponsors and organisation
The three scholarly societies that originally sponsored the TEI are the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing. They first organized the TEI in 1987 as a research effort funded by grants from several agencies.
Today, the TEI Consortium is a member-funded non-profit corporation hosted by:
- the Research Technologies Service at the University of Oxford,
- the Scholarly Technology Group at Brown University,
- a francophone group comprising ATILF, INIST, and LORIA, co-ordinated at Nancy,
- the Electronic Text Center and the Institute for Advanced Technology in the Humanities at the University of Virginia.
The guidelines
The Guidelines define some 400 different textual components and concepts, which can be expressed using a markup language and defined by a DTD or XML schema. Early versions of the Guidelines used SGML as a means of expression; more recently XML has been adopted. The basic concepts have been stable for over a decade, with TEI P3 (public release version 3) published in 1994, and updated in 1999. P4 (2002) is a slight update to accommodate XML; TEI P5 is currently under development and adds many new features.
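As a rough illustration of what markup of this kind looks like in its XML form, the sketch below parses a very small TEI-style document with Python's standard library and pulls out its title and paragraphs. It is a minimal sketch, not an excerpt from the Guidelines: the namespace and the element names (teiHeader, fileDesc, titleStmt, text, body, p, name) are assumed to follow P5-style conventions, and the sample text and the helper name summarize are invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: P5-style namespace and element names; the sample content is invented.
TEI_NS = "http://www.tei-c.org/ns/1.0"

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample document</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Written for this example.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First paragraph.</p>
      <p>Second paragraph with a <name>Proper Name</name> inside.</p>
    </body>
  </text>
</TEI>"""


def summarize(xml_text):
    """Return the title and the plain text of each body paragraph (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    ns = {"tei": TEI_NS}
    # The header carries metadata about the text; the title sits in titleStmt.
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=ns
    )
    # itertext() flattens inline markup such as <name> back into running text.
    paragraphs = [
        "".join(p.itertext()) for p in root.findall("tei:text/tei:body/tei:p", ns)
    ]
    return title, paragraphs


if __name__ == "__main__":
    title, paragraphs = summarize(SAMPLE)
    print(title)
    for paragraph in paragraphs:
        print("-", paragraph)
```

The split between a header (metadata about the text) and the text itself is the point of the example; a real project would normally validate its documents against a DTD or schema rather than rely on ad hoc parsing.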
The TEI scheme is modular, designed to be customized for particular research or production environments. Many different applications of it are possible; one very popular customization is the subset known as TEI Lite.
Work on TEI P5 is ongoing. Although P5 breaks backward compatibility in a number of ways, it significantly updates the inner workings of the scheme, including a reorganization of the underlying element structures into classes that allow greater and easier customization. Maintenance and development continue under the sponsorship of the TEI Consortium. The TEI component for marking up feature structures (a data model sometimes used in linguistics) has been adopted as the basis of the ongoing development of an ISO standard for feature structures.
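To make the idea of a feature structure concrete: it is essentially a set of named features, each with a value. The sketch below reads a small feature structure written in TEI-style markup and flattens it into a Python dictionary. The element names (fs, f, symbol, string) are assumed to follow P5-style conventions, the namespace is omitted for brevity, and the features themselves and the helper name fs_to_dict are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: P5-style element names (fs, f, symbol, string); the features are invented.
SAMPLE_FS = """<fs type="word">
  <f name="category"><symbol value="noun"/></f>
  <f name="number"><symbol value="singular"/></f>
  <f name="lemma"><string>house</string></f>
</fs>"""


def fs_to_dict(xml_text):
    """Flatten one feature structure into a plain dictionary (hypothetical helper)."""
    fs = ET.fromstring(xml_text)
    features = {}
    for f in fs.findall("f"):
        for value in f:
            if value.tag == "symbol":
                # Symbolic values are carried in the value attribute.
                features[f.get("name")] = value.get("value")
            elif value.tag == "string":
                # String values are carried as element content.
                features[f.get("name")] = value.text
    return {"type": fs.get("type"), "features": features}


if __name__ == "__main__":
    print(fs_to_dict(SAMPLE_FS))
```

This handles only flat structures with symbolic or string values; nested feature structures and other value types are left out to keep the sketch short.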
TEI projects
The TEI is used by many projects worldwide. The TEI Website contains a list of TEI Projects and a form for adding your project. Some well-known projects include:
- Canterbury Tales Project
- EpiDoc
- Henrik Ibsens skrifter
- Medieval Nordic Text Archive
- Oxford Text Archive
- Perseus Project
- Women Writers Project
- New Zealand Electronic Text Centre