OmegaT

From Wikipedia, the free encyclopedia

OmegaT

OmegaT under Windows XP
Developer: Maxym Mykhalchuk
Latest release: 1.6.1_04 / February 3, 2007
OS: Cross-platform
Use: Computer-assisted translation
License: GPL
Website: www.omegat.org

OmegaT® is a computer-assisted translation tool written in the Java programming language. It is free software originally developed by Keith Godfrey in 2000, and is currently developed by a team led by Maxym Mykhalchuk.

OmegaT is intended for professional translators. Some of its features include user-customisable segmentation using regular expressions, translation memory, fuzzy matching, match propagation, glossary matching, context search in translation memories and keyword search in reference materials.
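Here is a minimal Python sketch of rule-based segmentation. It assumes that each rule has a "before" pattern, an "after" pattern and a break/exception flag. Rules are tried in order at each candidate position, and the first rule whose patterns both match decides whether to split. The `Rule` class, the sample rule list and `segment()` are hypothetical. They are not OmegaT's own code or its default rule set.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    before: str     # regex that must match the text ending at the candidate break
    after: str      # regex that must match the text starting at the candidate break
    is_break: bool  # True = split here; False = exception (never split here)


# Hypothetical rule set: the exception comes first so it takes precedence.
RULES = [
    Rule(r"\b(?:e\.g|i\.e|Mr|Dr)\.", r"\s", False),  # no split after common abbreviations
    Rule(r"[.!?]", r"\s", True),                     # split after sentence-final punctuation
]


def segment(paragraph: str, rules=RULES) -> list[str]:
    """Split a paragraph into segments using ordered before/after regex rules."""
    segments, start = [], 0
    for pos in range(1, len(paragraph)):
        head, tail = paragraph[:pos], paragraph[pos:]
        for rule in rules:
            # The first rule whose two patterns both match at this position decides.
            if re.search(rule.before + r"\Z", head) and re.match(rule.after, tail):
                if rule.is_break:
                    segments.append(paragraph[start:pos].strip())
                    start = pos
                break
    segments.append(paragraph[start:].strip())
    return [s for s in segments if s]
```

For example, `segment("Hello Dr. Who. Bye.")` keeps "Dr." inside the first segment because the exception rule matches first.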

It requires Java 1.4, which is available for Linux, Mac OS X and Microsoft Windows 98 or higher.

The name OmegaT, under which the project is registered on SourceForge, is now a registered trademark. Currently, only software released by the OmegaT project may use the name OmegaT.


History

OmegaT was first developed by Keith Godfrey in 2000. The original engine was written in C++, but the first public release in February 2001 was written in Java.

The first Java version used a proprietary translation memory format, and required Java 1.3 to run. It offered support for StarOffice documents, so-called plain text and Unicode text, and HTML, and could do only block-level segmentation (which for most practical purposes meant paragraph segmentation).

The current version, 1.6.1_04,[1] adds many features. Among them are flexible segmentation rules that make sentence segmentation possible, much improved TMX import (up to TMX 1.4b level 2), searches based on regular expressions, DocBook support and a new graphical user interface.

Workflow in OmegaT

The user places source documents, existing translation memories and any glossaries in specified subfolders of a translation project. When a project is "opened", OmegaT extracts the translatable text from all recognised documents. As the translator translates each segment, OmegaT adds the translation units to a translation memory. Finally, OmegaT creates the target documents by merging the translation memory with the source documents.
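The final merge step can be pictured with a small Python sketch. It assumes each document has already been split into paragraphs of segments, and that the memory is a plain dictionary mapping source segments to target segments. `merge()` is hypothetical. Joining segments with a single space is also a simplification: a real tool must preserve the document's original whitespace and formatting.

```python
def merge(paragraphs: list[list[str]], memory: dict[str, str]) -> str:
    """Rebuild target text: each translated segment replaces its source; untranslated ones stay as-is."""
    lines = []
    for segments in paragraphs:
        # Simplification: segments are re-joined with one space, paragraphs with a newline.
        lines.append(" ".join(memory.get(seg, seg) for seg in segments))
    return "\n".join(lines)
```

Untranslated segments fall through unchanged, so the target document stays complete even when translation is unfinished.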

During translation, fuzzy matches from the translation memory and matches from the glossaries for the current segment are displayed in the adjacent Match/Glossary window. Fuzzy matches are inserted by the translator using keyboard shortcuts. Fuzzy matches above a user-determined threshold can optionally be inserted automatically.
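The threshold logic can be sketched in Python. The similarity measure is a swapped-in technique: a word-level `difflib.SequenceMatcher.ratio()` from the standard library, not OmegaT's own fuzzy-matching algorithm. `fuzzy_matches()`, `auto_insert()` and their default thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Word-level similarity as a percentage (0-100), computed with difflib's ratio()."""
    return round(100 * SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio())


def fuzzy_matches(segment: str, memory: dict[str, str], min_score: int = 30):
    """Return (score, source, target) tuples at or above min_score, best match first."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory.items()]
    return sorted((m for m in scored if m[0] >= min_score), key=lambda m: -m[0])


def auto_insert(segment: str, memory: dict[str, str], threshold: int = 80):
    """Return the best match's target text if its score reaches the threshold, else None."""
    matches = fuzzy_matches(segment, memory)
    if matches and matches[0][0] >= threshold:
        return matches[0][2]
    return None
```

Suppose the memory holds "Click the Save button.". Then "Click the Open button." scores 75 against it. It is displayed as a match, but it is inserted automatically only when the threshold is 75 or lower.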

The translator can switch to a different document in the same project at any time using the Project Files viewer, or to a different segment in the same file using keyboard shortcuts or by double-clicking the appropriate segment.

Whenever additional source documents, translation memories or glossaries are added to the project, or when manual changes are made to those files, the translator must "reload" the project, so that OmegaT recognises the newly added segments. The project must also be reloaded when changes to the segmentation rules are made in mid-translation.

Collaboration between translators

Translators using different computer-assisted translation tools can share their translation memories only if (a) at least one of their programs can import or export the other's proprietary format, or (b) both programs can import and export a common intermediary format. OmegaT takes the latter approach: it imports and exports TMX (Translation Memory eXchange), the industry-standard intermediary format.
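For illustration, a minimal TMX 1.4 file can be produced with Python's standard `xml.etree.ElementTree`. The header carries the attributes that the TMX 1.4 specification lists as required, but their values (such as the `creationtool` name) are placeholders. `build_tmx()` is a hypothetical helper, not part of any tool mentioned here.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialised as xml:lang


def build_tmx(pairs, srclang="en", tgtlang="fr") -> str:
    """Build a minimal TMX 1.4 document from (source, target) string pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, src), (tgtlang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # ElementTree escapes &, < and >
    return ET.tostring(tmx, encoding="unicode")
```

Any tool that reads TMX can then look up the translation units by language code.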

OmegaT's glossary files are tab-delimited plain text files, with the source term in the first column and the target term in the second. A third column can be used for anything (e.g. user comments); additional columns are ignored by OmegaT. OmegaT does not support TBX, the industry-standard glossary format proposed by LISA.[2]

Supported source document formats

OmegaT can translate the following formats: text files (any text format which Java can handle) in a variety of encodings including Unicode, HTML/XHTML, Java properties files, StarOffice, OpenOffice.org and OpenDocument (ODF),[3] as well as DocBook files, Portable Object (PO) files and files with a "Key=Value" structure. It handles formatted documents using tagged text, in a way similar to that of commercial translation memory tools.

OmegaT does not offer direct support for the Microsoft Office formats Word, Excel and PowerPoint. However, OpenOffice.org (and its variants)[4] can be used to convert such files to OpenDocument, which OmegaT supports natively.

The Translate Toolkit,[5] a Python toolset, provides a number of converters to and from Portable Object for formats including Mozilla .properties and DTD files, CSV files, Qt .ts files and XLIFF files. It also includes tools to manipulate such files before or after their translation in OmegaT.

File formats such as LaTeX, TeX and POD can be converted to and from Portable Object using the po4a utility.[6]

The Text Extraction Utility from the Okapi Framework has an option for creating an OmegaT project folder tree,[7] which brings even more formats within OmegaT's reach. Okapi is a .NET 2.0 application and requires Windows to run; it does not run on free .NET implementations that lack .NET 2.0 support.

OmegaT does not officially support file formats such as WordML, ExcelML and LaTeX, or formats commonly used in translation such as Trados uncleaned RTF, TTX or XLIFF files. Uncleaned RTF files can nevertheless be translated by converting them to a supported format and adjusting the segmentation rules.

Supported memory and glossary formats

OmegaT's internal translation memory format is not visible to the user, but every time it autosaves the translation project, all new or updated translation units are automatically exported and added to three external TMX memories: a native OmegaT TMX, a level 1 TMX and a level 2 TMX.

  • The native TMX file is for use in OmegaT projects.
  • The level 1 TMX file preserves textual information and can be used with CAT tools that support TMX level 1 or 2.
  • The level 2 TMX file preserves textual information as well as inline tag information and can be used with CAT tools that support TMX level 2.

Exported level 2 files include OmegaT's internal tags encapsulated in TMX tags. This allows such files to generate matches in CAT tools that support TMX level 2. Tests with Trados and SDLX have been positive.
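The difference between the two exports can be sketched in Python. The sketch assumes OmegaT-style shortcut tags such as `<b0>` and `</b0>` inside segment text. For simplicity, every tag is wrapped in a `<ph>` element. Real level 2 files typically use `<bpt>`/`<ept>` for paired tags, so this is only an approximation, and `level1_seg()` and `level2_seg()` are hypothetical helpers.

```python
import re
import xml.etree.ElementTree as ET

# Assumed shape of OmegaT-style shortcut tags, e.g. <b0>, </b0>, <x1/>.
TAG = re.compile(r"</?[a-z]+\d+/?>")


def level1_seg(text: str) -> ET.Element:
    """Level 1: keep only the plain text; inline tags are dropped."""
    seg = ET.Element("seg")
    seg.text = TAG.sub("", text)
    return seg


def level2_seg(text: str) -> ET.Element:
    """Level 2 (simplified): each inline tag is kept, escaped, inside a <ph> element."""
    seg = ET.Element("seg")
    parts = TAG.split(text)   # text between the tags
    tags = TAG.findall(text)  # the tags themselves
    seg.text = parts[0]
    for tag, following in zip(tags, parts[1:]):
        ph = ET.SubElement(seg, "ph")
        ph.text = tag         # serialised as &lt;b0&gt; and so on
        ph.tail = following   # text that follows the tag
    return seg
```

The level 1 output loses the markup entirely. The level 2 output keeps it as escaped native code that a level 2 tool can map back to formatting.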

OmegaT can import TMX files up to version 1.4b, at both level 1 and level 2. However, level 2 files imported into OmegaT do not generate level 2 matches, because OmegaT ignores the encapsulated formatting content. OmegaT does fully recognize its own level 2 TMX files, so they can be used in other OmegaT projects as if they were native OmegaT TMX files. Here again, tests with TMX files created by DVX, Trados and SDLX have been positive.
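Importing a level 2 file while ignoring its encapsulated formatting can be approximated as follows. The text inside inline elements such as `<bpt>`, `<ept>` and `<ph>` is dropped, and only the surrounding text is kept. This is a hypothetical Python sketch, not OmegaT's importer. It matches language codes exactly (case-insensitively), so `en` and `en-US` are treated as different, and it reads only the TMX 1.4 `xml:lang` attribute.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg: ET.Element) -> str:
    """Plain text of a <seg>, skipping the content of inline elements such as bpt, ept and ph."""
    pieces = [seg.text or ""]
    for child in seg:
        # child.text holds the encapsulated native code: ignore it, keep the text after it.
        pieces.append(child.tail or "")
    return "".join(pieces)


def read_tmx(xml_text: str, srclang: str, tgtlang: str) -> dict[str, str]:
    """Map source segments to target segments, ignoring inline formatting."""
    memory = {}
    for tu in ET.fromstring(xml_text).iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[tuv.get(XML_LANG, "").lower()] = seg_text(seg)
        if srclang.lower() in by_lang and tgtlang.lower() in by_lang:
            memory[by_lang[srclang.lower()]] = by_lang[tgtlang.lower()]
    return memory
```

Because the formatting content is discarded, a segment such as "Press <bold>Save</bold> now" comes back as the plain text "Press Save now". It can then only produce plain-text matches.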

For glossaries, OmegaT uses tab-delimited plain text files. The structure of a glossary file is extremely simple. The first column contains the source-language term, the second the corresponding target-language term, and the optional third column can contain anything, such as comments on context. Such glossaries can easily be created by exporting three-column spreadsheets to CSV format with the following parameters: field delimiter={tab}, word delimiter={space}.
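Reading such a file takes only a few lines. The following hypothetical Python sketch parses the three-column layout described above, ignores extra columns and lines without a tab, and finds entries whose source term appears in a segment. The plain substring test is a simplification of real glossary matching.

```python
def load_glossary(text: str) -> list[tuple[str, str, str]]:
    """Parse a tab-delimited glossary: source, target, optional comment; extra columns ignored."""
    entries = []
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) < 2 or not cols[0].strip():
            continue  # skip blank lines and lines without a tab
        source, target = cols[0].strip(), cols[1].strip()
        comment = cols[2].strip() if len(cols) > 2 else ""
        entries.append((source, target, comment))
    return entries


def glossary_hits(segment: str, entries):
    """Entries whose source term occurs in the segment (case-insensitive substring test)."""
    low = segment.lower()
    return [e for e in entries if e[0].lower() in low]
```

A fourth column, if present, is silently dropped, as in OmegaT.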

Documentation

When OmegaT starts, a quick guide called "Instant Start" is displayed. A comprehensive User Manual, originally by Marc Prior, the project coordinator, is bundled with the OmegaT installation. Both of these have been translated into several languages by volunteers. Finally, the archived messages of OmegaT's user groups are searchable by anyone without registration.

As of version 1.6.1_03, OmegaT is localized into the following languages:

  • Full localizations:
    • Albanian
    • Italian (updated)
    • Portuguese (Brazil)
    • Serbo-Croatian
    • Slovak
    • Slovenian
  • Partial localizations:
    • Belarusian
    • French
    • Japanese

Development and localisation

Code development is currently handled by a team led by Maxym Mykhalchuk. Other current code contributors include Didier Briel, Sacha Chua, Kim Bruning, Thomas Huriaux, Henry Pijffers, Benjamin Siband and Martin Wunderlich. The developers respond to bug reports and requests for enhancements filed on the SourceForge development site.[8]

The OmegaT user interface and bundled documentation are translated by volunteers.[9] Speakers of a language that is not available in the latest release are invited to contact the user group[10] and offer their help.

OmegaT users are encouraged to contribute tools written by themselves in response to translators' needs which are not yet addressed by the main OmegaT program itself.[11]

Related software

Tools created by OmegaT contributors

Several tools have been created by OmegaT contributors for use in conjunction with OmegaT. Some of them are only useful with earlier versions of OmegaT.

They can be found either on the OmegaT web site[12] or on the user mailing list web space:[13]

  • Benjamin Siband's OpenOffice.org segmenter macros
  • Didier Briel's aligner utility
  • Dmitri Gabinski's aligner utility
  • Dmitri's Wordfast TMX converter
  • Dmitri's language selector
  • Henry Pijffers's TMX merger tool
  • Henry's TMX cleaner tool
  • Marc Prior's external spell-checker
  • Marc's sentence segmenter tool
  • Samuel Murray's two spell-checker scripts
  • Samuel's complicated utility for creating Trados uncleaned files
  • Sonja Tomaskovic's macro for removing internal tags from TMX files

TMX creation tools

A number of third-party tools can be used to create TMX files (in addition to Didier's and Dmitri's aligners described above):

The OmegaT project also offers a Java bundle properties aligner to create TMX files.[14]
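Aligning bundle properties works because the source and translated files share the same keys. A minimal Python sketch of the idea follows. It is hypothetical, not the OmegaT project's aligner, and its simplified parser skips line continuations and `\uXXXX` escapes.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Simplified .properties reader: key=value or key:value lines, # and ! comments."""
    # Line continuations and \uXXXX escapes are NOT handled in this sketch.
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        # Split at the first '=' or ':' separator.
        seps = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not seps:
            continue
        i = min(seps)
        props[line[:i].strip()] = line[i + 1:].strip()
    return props


def align_properties(source_text: str, target_text: str) -> list[tuple[str, str]]:
    """Pair source and target values that share a key; keys missing on either side are skipped."""
    src, tgt = parse_properties(source_text), parse_properties(target_text)
    return [(src[k], tgt[k]) for k in src if k in tgt and src[k] and tgt[k]]
```

The resulting (source, target) pairs are exactly what a TMX writer needs, one translation unit per shared key.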

CSVConverter, published by Heartsome, creates a TMX from a CSV file. It is distributed for free.[15]

Mikel Forcada and Susana Santos's aligner, bitext2tmx, can also be used in conjunction with OmegaT.[16]

The Translate Toolkit po2tmx converter can be used to create TMX memories too.[17]

See also

Forks

There is a collection of some of the above tools, repackaged in different compression schemes, that also distributes a stripped-down version of OmegaT® 1.4.5.[18] The distributor has not been authorized to use the name OmegaT, which is a registered trademark.

References

  1. ^ OmegaT 1.6.1 Update 4 Release notes - Released on February 3, 2007
  2. ^ LISA - Localization Industry Standards Association
  3. ^ Open Document Format for Office Applications - ISO/IEC 26300:2006 format
  4. ^ OpenOffice.org - A free office suite that offers conversion filters to and from most of the commonly used Microsoft Office file formats
  5. ^ Translate Toolkit - The Translate Toolkit is a toolkit to convert between various different translation formats (such as gettext-based .po formats, OpenOffice.org formats, and Mozilla formats).
  6. ^ po4a - A conversion utility to and from the Portable Object format, a Perl application packaged under Debian
  7. ^ Okapi Framework - The Text Extraction utility can create OmegaT project folder trees
  8. ^ OmegaT development site - Filing requests for enhancements and bug reports
  9. ^ Localization process - How to contribute a translation
  10. ^ OmegaT Localization Plan - Finding the localisation status of a specific language
  11. ^ OmegaT Getting Involved - Translators are encouraged to write their own supplementary tools
  12. ^ OmegaT Resources - Third-party tools on the OmegaT web site
  13. ^ OmegaT Files - Third-party tools on the OmegaT user mailing list web space (registration required)
  14. ^ Java .properties Import - Java application to create TMX files from existing bundle properties translations
  15. ^ Heartsome's free tools - CSVConverter (creates a TMX from a CSV file). Also: TBXMaker, RTFCleaner, TMXValidator (released under the Eclipse Public License), Java Properties Viewer
  16. ^ bitext2tmx - Aligner written in Java by Mikel Forcada and Susana Santos
  17. ^ toolkit:po2tmx - Convert Gettext PO files to a TMX translation memory file
  18. ^ fork - Translation tools collection containing the fork