xml:tm

From Wikipedia, the free encyclopedia

xml:tm XML-based Text Memory

xml:tm (XML-based Text Memory) is a vendor-neutral, open, XML-based standard for embedding text memory within an XML document. xml:tm uses the namespace syntax of XML to store text memory information within the XML document itself. xml:tm is developed and maintained by OSCAR[1] (Open Standards for Container/Content Allowing Re-use), a special interest group of LISA[2] (Localization Industry Standards Association).
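
As a minimal sketch of the namespace mechanism, the Python below marks up a small document with text-unit elements and reads them back with the standard library's ElementTree. The namespace URI urn:example:xml-tm, the tm:tu element and its id attribute are hypothetical placeholders chosen for illustration; they are not the vocabulary defined by the xml:tm specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element name, for illustration only.
TM_NS = "urn:example:xml-tm"

# The document's own markup (doc, para) stays in no namespace; the
# text memory view is layered on top of it in the tm: namespace.
DOC = f"""<doc xmlns:tm="{TM_NS}">
  <para>
    <tm:tu id="u1">The pump must be primed before use.</tm:tu>
    <tm:tu id="u2">Check the seal weekly.</tm:tu>
  </para>
</doc>"""


def text_units(xml_text: str) -> dict[str, str]:
    """Return a mapping of text-unit id to text for every tm:tu element."""
    root = ET.fromstring(xml_text)
    # ElementTree expands the tm: prefix to "{namespace-uri}localname".
    return {tu.get("id"): tu.text for tu in root.iter(f"{{{TM_NS}}}tu")}


print(text_units(DOC))
# {'u1': 'The pump must be primed before use.', 'u2': 'Check the seal weekly.'}
```

Because the text memory markup sits in its own namespace, it can be told apart from, and processed separately to, the document's own elements.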

xml:tm takes a radically new approach to authoring and translating XML documents.

At the core of xml:tm is the concept of "text memory". Text memory comprises two components:

  1. Author Memory
  2. Translation Memory

Author Memory

An XML namespace is used to map a text memory view onto a document. This process is called segmentation. Text memory works at the sentence level of granularity: the text unit. Each xml:tm text unit is allocated a unique identifier that remains immutable for the life of the document. As a document goes through its life cycle, existing identifiers are maintained and new ones are allocated as required. This aspect of text memory is called author memory. It can be used to build author memory systems that simplify authoring and improve its consistency.
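
The identifier bookkeeping described above can be sketched in Python. This is an illustration under stated assumptions, not the algorithm defined by xml:tm: it assumes the document has already been segmented into sentences, that a text unit whose wording is unchanged keeps its identifier, that any new or edited sentence receives a fresh identifier, and that identifiers of deleted units are never reused. The function name reallocate_ids and the u<n> identifier format are hypothetical.

```python
def reallocate_ids(
    previous: dict[str, str], sentences: list[str], next_id: int
) -> tuple[dict[str, str], int]:
    """Assign text-unit ids to the sentences of a revised document.

    previous  -- id -> text for the text units of the prior revision
    sentences -- sentences of the new revision, in document order
    next_id   -- first numeric id never used before in this document
    Returns the id -> text mapping for the new revision and the updated next_id.
    """
    # Index the prior revision by text so unchanged units keep their ids.
    by_text: dict[str, list[str]] = {}
    for uid, text in previous.items():
        by_text.setdefault(text, []).append(uid)

    current: dict[str, str] = {}
    for sentence in sentences:
        candidates = by_text.get(sentence)
        if candidates:
            uid = candidates.pop(0)   # unchanged text: keep its existing id
        else:
            uid = f"u{next_id}"       # new or edited text: allocate a fresh id
            next_id += 1
        current[uid] = sentence
    # Ids left unused in by_text belong to deleted units; they are not
    # reused because next_id only ever increases.
    return current, next_id


v1 = {"u1": "Open the valve.", "u2": "Start the pump."}
v2, next_id = reallocate_ids(v1, ["Open the valve.", "Wait ten seconds.", "Start the pump."], 3)
print(v2)
# {'u1': 'Open the valve.', 'u3': 'Wait ten seconds.', 'u2': 'Start the pump.'}
```

Matching on unchanged text keeps the sketch short; a real author memory system would also have to decide how to treat moved, split or merged sentences.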

Translation Memory

When an xml:tm-enabled document is ready for translation, the namespace itself identifies the text that is to be translated. The tm namespace can be used to create an OASIS XLIFF document for translation. xml:tm also allows for much more focused and better-defined translation memory matching:

  • Exact matching
    Author memory provides exact details of any changes to a document. Where text units have not changed since the document was last translated, xml:tm provides the basis for declaring an "exact match" with the corresponding text in the previously translated target-language document.
  • In-document leveraged matching
    xml:tm can also be used to find in-document leveraged matches, where a text unit is identical to another text unit in the same document.
  • Database leveraged matching
    When an xml:tm document is translated, the translation process provides perfectly aligned source and target language text units. These can be used to create traditional translation memories.
  • In-document inexact matching
    The text units contained in the leveraged memory database can also be used to provide approximate matches of similar previously translated text from within the same document.
  • Inexact matching
    The text units contained in the leveraged memory database can also be used to provide approximate matches of similar previously translated text.
  • Non-translatable text
    Text units that consist solely of numeric, alphanumeric, punctuation or measurement items can be identified during authoring and flagged as non-translatable, which reduces the word counts used in translation metrics.
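
The matching categories above can be illustrated with a minimal Python sketch that classifies a single text unit. It is not the xml:tm algorithm: the order of the checks, the 0.75 similarity threshold, the use of Python's difflib.SequenceMatcher as the fuzzy-match score, the token rules for non-translatable text and the small UNITS list are all assumptions made for illustration, and the names match, best_fuzzy and is_non_translatable are hypothetical. Here previous maps text-unit ids to (source, target) pairs from the last translated revision, in_doc holds source-to-target pairs already translated in this document, and database stands in for a leveraged translation memory.

```python
from difflib import SequenceMatcher
from typing import Optional

# Illustrative list of measurement symbols (an assumption, not from xml:tm).
UNITS = {"mm", "cm", "m", "kg", "g", "V", "W", "%"}


def is_non_translatable(text: str) -> bool:
    """True if every token contains a digit, is pure punctuation, or is a listed unit."""
    tokens = text.split()
    return bool(tokens) and all(
        any(ch.isdigit() for ch in tok)           # e.g. "12.5", "A-1234"
        or not any(ch.isalnum() for ch in tok)    # e.g. "-", "..."
        or tok in UNITS                           # e.g. "mm"
        for tok in tokens
    )


def best_fuzzy(text: str, memory: dict[str, str]) -> tuple[float, Optional[str]]:
    """Return (similarity, target) for the most similar source text in memory."""
    best: tuple[float, Optional[str]] = (0.0, None)
    for source, target in memory.items():
        score = SequenceMatcher(None, text, source).ratio()
        if score > best[0]:
            best = (score, target)
    return best


def match(
    uid: str,
    text: str,
    previous: dict[str, tuple[str, str]],
    in_doc: dict[str, str],
    database: dict[str, str],
    threshold: float = 0.75,
) -> tuple[str, Optional[str]]:
    """Classify one text unit and return (match type, suggested target)."""
    if is_non_translatable(text):
        return "non-translatable", text           # copied through unchanged
    old = previous.get(uid)
    if old is not None and old[0] == text:        # same id, unchanged source
        return "exact", old[1]
    if text in in_doc:
        return "in-document leveraged", in_doc[text]
    if text in database:
        return "database leveraged", database[text]
    # Approximate matches: this document first, then the wider memory.
    for label, memory in (("in-document inexact", in_doc), ("inexact", database)):
        score, target = best_fuzzy(text, memory)
        if score >= threshold:
            return label, target
    return "new", None


previous = {"u1": ("Open the valve.", "Ouvrez la vanne.")}
in_doc = {"Start the pump.": "Démarrez la pompe."}
database = {"Close the valve.": "Fermez la vanne."}
for uid, text in [("u1", "Open the valve."), ("u2", "12.5 mm"), ("u3", "Close the valves.")]:
    print(uid, match(uid, text, previous, in_doc, database))
# u1 ('exact', 'Ouvrez la vanne.')
# u2 ('non-translatable', '12.5 mm')
# u3 ('inexact', 'Fermez la vanne.')
```

Checking the cheap, certain categories before the fuzzy search means the similarity scoring only runs on text units that genuinely need it.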

Interoperability with other Localization Industry Standards

xml:tm was designed from the outset to integrate closely with, and leverage the potential of, other XML-based localization industry standards, as well as XML syntax itself. In particular:

  • SRX (Segmentation Rules eXchange)
    xml:tm mandates the use of SRX for text segmentation of paragraphs into text units.
  • Unicode Standard Annex #29-9
    xml:tm mandates the use of Unicode Standard Annex #29 for tokenization of text into words.
  • XLIFF 1.2
    xml:tm mandates the use of XLIFF for the actual translation process. xml:tm is designed to facilitate the automated creation of XLIFF files from xml:tm-enabled documents and, after translation, the easy creation of the target versions of those documents.
  • GMX-V (Global Information Management Metrics eXchange - Volume)
    xml:tm mandates the use of GMX-V for all metrics concerning authoring and translation.
  • TMX (Translation Memory eXchange)
    xml:tm facilitates the easy creation of TMX documents, aligned at the sentence level.
  • DITA (Darwin Information Typing Architecture)
    xml:tm complements the DITA standard by allowing text reuse at the sentence level within DITA documents.
  • W3C ITS (Internationalization Tag Set)
    xml:tm mandates the use of W3C ITS Document Rules for identifying translatable text within an XML document as well as W3C ITS Best Practices with regard to XML document localization.
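
The XLIFF round trip described above can be sketched in Python with ElementTree. This minimal sketch assumes the text units have already been extracted as an id-to-text mapping and writes only the basic XLIFF 1.2 skeleton (xliff, file, body, trans-unit, source); it omits the inline markup, notes and state attributes that a production converter would need. The function names to_xliff and targets_from_xliff are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def to_xliff(units: dict[str, str], original: str,
             source_lang: str, target_lang: str) -> str:
    """Build an XLIFF 1.2 document with one trans-unit per text unit."""
    # xmlns is written as an ordinary attribute, so the serialized elements
    # fall in the XLIFF 1.2 namespace without changing ElementTree's
    # global prefix registry.
    xliff = ET.Element("xliff", {"version": "1.2", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(xliff, "file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, "body")
    for uid, text in units.items():
        tu = ET.SubElement(body, "trans-unit", {"id": uid})
        ET.SubElement(tu, "source").text = text
    return ET.tostring(xliff, encoding="unicode")


def targets_from_xliff(xliff_text: str) -> dict[str, str]:
    """Read back id -> translated target text ('' where no target exists yet)."""
    ns = {"x": XLIFF_NS}
    root = ET.fromstring(xliff_text)
    return {
        tu.get("id"): tu.findtext("x:target", default="", namespaces=ns)
        for tu in root.iterfind(".//x:trans-unit", ns)
    }


print(to_xliff({"u1": "Open the valve."}, "manual.xml", "en", "fr"))
```

Reusing each text unit's immutable identifier as the trans-unit id is a design choice of this sketch: it keys the round trip on identity rather than on position or wording, so each translated target can be written back to the right text unit even when the same sentence occurs more than once.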

References

  1. OSCAR - Open Standards for Container/Content Allowing Re-use
  2. LISA - Localization Industry Standards Association
