OmegaT
From Wikipedia, the free encyclopedia
| OmegaT | |
|---|---|
| Screenshot | OmegaT under Windows XP |
| Developer | Maxym Mykhalchuk |
| Latest release | 1.6.1_04 / February 3, 2007 |
| OS | Cross-platform |
| Use | Computer-assisted translation |
| License | GPL |
| Website | www.omegat.org |
OmegaT® is a computer-assisted translation tool written in the Java programming language. It is free software originally developed by Keith Godfrey in 2000, and is currently developed by a team led by Maxym Mykhalchuk.
OmegaT is intended for professional translators. Some of its features include user-customisable segmentation using regular expressions, translation memory, fuzzy matching, match propagation, glossary matching, context search in translation memories and keyword search in reference materials.
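Rule-based segmentation of this kind can be illustrated with a short sketch. It assumes each rule pairs a "before" pattern (which must match the text ending at a candidate position) with an "after" pattern (which must match the text starting there), and is either a break rule or an exception; the first matching rule decides. The rule set and the `Rule`/`segment` names are hypothetical and this is not OmegaT's own code.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical segmentation rule: both patterns must match at a position."""
    before: str      # regex that must match the text ending at the position
    after: str       # regex that must match the text starting at the position
    is_break: bool   # True = break here, False = exception (do not break)


# Hypothetical rule set; exceptions come first so they win over the general rule.
RULES = [
    Rule(r"\b(?:Mr|Dr|e\.g)\.", r"\s", False),
    Rule(r"[.!?]", r"\s+\S", True),
]


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Split text into segments using the first rule that matches at each position."""
    compiled = [(re.compile(rf"(?:{r.before})\Z"), re.compile(r.after), r.is_break)
                for r in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for before, after, is_break in compiled:
            # endpos=pos makes \Z match at pos, i.e. "the text ending here"
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule (break or exception) decides
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

With these rules, `segment("Dr. Smith arrived. He sat down! Was it e.g. late? Yes.", RULES)` yields four sentences, because the exception rule stops a break after "Dr." and "e.g.". Ordering exceptions before the general break rule is what lets a user-defined list of abbreviations override sentence punctuation.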
It requires Java 1.4, which is available for Linux, Mac OS X and Microsoft Windows 98 or higher.
The name OmegaT, under which the project is registered on SourceForge, is now a registered trademark. Currently, only software released by the OmegaT project may use the name.
History
OmegaT was first developed by Keith Godfrey in 2000. The original engine was written in C++, but the first public release in February 2001 was written in Java.
The first Java version used a proprietary translation memory format and required Java 1.3 to run. It supported StarOffice documents, plain text and Unicode text, and HTML, and could perform only block-level segmentation (which for most practical purposes meant paragraph segmentation).
The current version, 1.6.1_04,[1] adds many features, among them flexible segmentation rules that make sentence segmentation possible, much improved TMX import (up to TMX 1.4b level 2), regular-expression-based searches, DocBook support and a new graphical user interface.
Workflow in OmegaT
The user places source documents, existing translation memories and any glossaries in specified subfolders of a translation project. When a project is "opened", OmegaT extracts the translatable text from all recognised documents. As the translator translates each segment, OmegaT adds the translation units to a translation memory. Finally, OmegaT creates the target documents by merging the translation memory with the source documents.
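The final merge step can be pictured with a minimal sketch, assuming the translation memory is simply a mapping from source segment to translation and that untranslated segments are left in the source language. The `Project` class and its methods are hypothetical and do not reflect OmegaT's internal design.

```python
from typing import Dict, List


class Project:
    """Hypothetical, much simplified translation project."""

    def __init__(self, source_segments: List[str]) -> None:
        self.source_segments = list(source_segments)  # extracted translatable text
        self.memory: Dict[str, str] = {}              # stands in for the project TM

    def translate(self, source: str, target: str) -> None:
        """Record a translation unit in the memory."""
        self.memory[source] = target

    def create_target(self) -> List[str]:
        """Merge the memory with the source; untranslated segments stay as-is."""
        return [self.memory.get(seg, seg) for seg in self.source_segments]
```

Because the memory in this sketch is keyed by source text, a segment that occurs several times is translated once and filled in everywhere, which loosely mirrors the match propagation feature mentioned above.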
During translation, fuzzy matches from the translation memory and matches from the glossaries for the current segment are displayed in the adjacent Match/Glossary window. Fuzzy matches are inserted by the translator using keyboard shortcuts. Fuzzy matches above a user-determined threshold can optionally be inserted automatically.
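Threshold-based fuzzy matching can be sketched as follows. The similarity measure here is Python's `difflib.SequenceMatcher.ratio()`, swapped in purely for illustration; OmegaT's own matching algorithm is not described in this article and is likely different. The function names and the default threshold of 0.7 are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def fuzzy_matches(segment: str, memory: Dict[str, str],
                  threshold: float = 0.7) -> List[Tuple[float, str, str]]:
    """Return (score, source, target) for memory entries scoring >= threshold, best first."""
    scored = []
    for source, target in memory.items():
        # ratio() is 2*M/T: M matching characters, T total characters in both strings
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    scored.sort(key=lambda match: match[0], reverse=True)
    return scored


def auto_insert(segment: str, memory: Dict[str, str],
                threshold: float) -> Optional[str]:
    """Return the best translation if it clears the user's threshold, else None."""
    matches = fuzzy_matches(segment, memory, threshold)
    return matches[0][2] if matches else None
```

For example, "The file was saved." scores about 0.92 against a memory entry "The file is saved.", so it would be inserted automatically at a threshold of 0.8 but not at 0.99.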
The translator can switch to a different document in the same project at any time using the Project Files viewer, or to a different segment in the same file using keyboard shortcuts or by double-clicking the appropriate segment.
Whenever additional source documents, translation memories or glossaries are added to the project, or when manual changes are made to those files, the translator must "reload" the project, so that OmegaT recognises the newly added segments. The project must also be reloaded when changes to the segmentation rules are made in mid-translation.
Collaboration between translators
Translators using different computer-assisted translation tools can only share their translation memories if (a) either or both of their programs can import and/or export the other program's proprietary format, or (b) both programs can import and export an intermediary format. OmegaT takes the latter approach: it can import and export the industry-standard intermediary format TMX (Translation Memory eXchange).
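To show what such an intermediary file looks like, here is a minimal sketch that writes and reads a plain-text (level 1 style) TMX 1.4 document with Python's standard `xml.etree.ElementTree`. The header values such as `creationtool="example-sketch"` are placeholders, and the `to_tmx`/`from_tmx` helpers are hypothetical, not part of OmegaT.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# xml:lang lives in the reserved XML namespace; ElementTree serialises it as "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs: List[Tuple[str, str]], srclang: str, tgtlang: str) -> str:
    """Serialise (source, target) pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": srclang, "datatype": "plaintext"})
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def from_tmx(xml_text: str, srclang: str, tgtlang: str) -> List[Tuple[str, str]]:
    """Read back (source, target) pairs for one language direction."""
    root = ET.fromstring(xml_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        if srclang in segs and tgtlang in segs:
            pairs.append((segs[srclang], segs[tgtlang]))
    return pairs
```

A round trip such as `from_tmx(to_tmx([("Hello.", "Bonjour.")], "en", "fr"), "en", "fr")` returns the original pair, which is the property that lets two different tools exchange memories.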
OmegaT's glossary files are tab-delimited plain text files with the source term in the first column and the target term in the second column; a third column can be used for anything (e.g. user comments). Additional columns are ignored by OmegaT. OmegaT does not support the industry-standard glossary format TBX proposed by LISA.[2]
Supported source document formats
OmegaT can translate the following formats: plain text files in any of the encodings Java can handle, including Unicode; HTML/XHTML; Java properties files; StarOffice, OpenOffice.org and OpenDocument (ODF) files;[3] DocBook files; Portable Object (PO) files; and files with a "Key=Value" structure. It handles formatted documents as tagged text, in a way similar to that of commercial translation memory tools.
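For files with a "Key=Value" structure, the idea is that only the values are translatable while the keys must survive unchanged. The sketch below assumes a deliberately simplified format (one entry per line, "#" or "!" comments, "=" as the only separator) and ignores real Java properties features such as escapes, ":" separators and continuation lines. The `extract_values` and `rebuild` helpers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def _split_entry(line: str) -> Optional[Tuple[str, str]]:
    """Return (key, value) for an entry line, or None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "!")) or "=" not in stripped:
        return None
    key, value = stripped.split("=", 1)  # split at the first "=" only
    return key.strip(), value.strip()


def extract_values(text: str) -> List[Tuple[str, str]]:
    """List the translatable (key, value) pairs in document order."""
    return [entry for entry in map(_split_entry, text.splitlines()) if entry]


def rebuild(text: str, translations: Dict[str, str]) -> str:
    """Replace values with their translations, keeping keys and comments."""
    out = []
    for line in text.splitlines():
        entry = _split_entry(line)
        if entry is None:
            out.append(line)  # comments and blank lines pass through untouched
        else:
            key, value = entry
            out.append(f"{key}={translations.get(value, value)}")
    return "\n".join(out)
```

Untranslated values fall back to the source text, so a partially translated file still loads as a valid Key=Value file.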
OmegaT does not offer direct support for the Microsoft Office formats Word, Excel and PowerPoint. However, OpenOffice.org (and its variants)[4] can be used to convert such files to OpenDocument, which OmegaT supports natively.
The Translate Toolkit,[5] a Python tool set, provides converters between Portable Object files and a number of other formats, including Mozilla .properties and DTD files, CSV files, Qt .ts files and XLIFF files. It also includes tools for manipulating such files before or after their translation in OmegaT.
File formats such as LaTeX, TeX and POD can be converted to and from Portable Object using the po4a utility.[6]
The Text Extraction Utility from the Okapi Framework has an option for creating an OmegaT project folder tree,[7] which brings even more formats within OmegaT's reach. Okapi is a .NET 2.0 application and requires Windows to run; it will not run on free .NET implementations that do not support .NET 2.0.
OmegaT does not officially support file formats such as WordML, ExcelML and LaTeX, or formats widely used in the translation industry, such as Trados uncleaned RTF files, TTX or XLIFF files. Uncleaned RTF files can nevertheless be translated by converting them to a supported format and tweaking the segmentation rules.
Supported memory and glossary formats
OmegaT's internal translation memory format is not visible to the user, but every time it autosaves the translation project, all new or updated translation units are automatically exported and added to three external TMX memories: a native OmegaT TMX, a level 1 TMX and a level 2 TMX.
- The native TMX file is for use in OmegaT projects.
- The level 1 TMX file preserves textual information and can be used with CAT tools that support TMX level 1 or 2.
- The level 2 file preserves textual information as well as inline tag information and can be used with CAT tools that support TMX level 2.
Exported level 2 files include OmegaT's internal tags encapsulated in TMX tags, which allows such files to generate matches in CAT tools that support TMX level 2. Tests with Trados and SDLX have been positive.
OmegaT can import TMX files up to version 1.4b, at level 1 as well as level 2. Level 2 files imported into OmegaT will not generate matches of the same level, since OmegaT ignores the encapsulated formatting content. OmegaT does, however, recognise its own TMX level 2 files without problems, so they can be used in other OmegaT projects as if they were native OmegaT TMX files. Here again, tests with TMX files created by DVX, Trados and SDLX have been positive.
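What "ignoring the encapsulated formatting content" means can be illustrated with a short sketch. It assumes the TMX inline elements `<bpt>`, `<ept>`, `<ph>`, `<it>` and `<ut>` hold native formatting code, while `<hi>` holds real text; it keeps only the translatable text of a `<seg>`. The `seg_plain_text` helper is hypothetical and is not OmegaT's import code.

```python
import xml.etree.ElementTree as ET

# TMX inline elements whose content is native formatting code, not text.
CODE_ELEMENTS = {"bpt", "ept", "ph", "it", "ut"}


def seg_plain_text(seg: ET.Element) -> str:
    """Return the text of a <seg>, dropping encapsulated formatting code."""
    parts = [seg.text or ""]
    for child in seg:
        if child.tag not in CODE_ELEMENTS:
            parts.append(seg_plain_text(child))  # e.g. <hi> wraps real text
        parts.append(child.tail or "")  # text after an inline tag is always kept
    return "".join(parts)
```

Applied to `<seg>Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept> now.</seg>`, it returns "Click Save now.". Once the formatting code is discarded in this way, the entry can only produce a plain-text match, which is why imported level 2 files do not yield level 2 matches.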
For glossaries, OmegaT uses tab-delimited plain text files. Their structure is extremely simple: the first column contains the source-language term, the second column the corresponding target-language term(s), and an optional third column can contain anything, such as comments on context. Such glossaries can easily be created by exporting three-column spreadsheets to CSV format with the following parameters: field delimiter={tab}, word delimiter={space}.
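A reader for this three-column format, plus a naive lookup of glossary terms in the current segment, might look like the sketch below. The whole-word, case-insensitive matching is an assumption made for illustration, not a description of how OmegaT matches glossary entries; the `GlossaryEntry`, `parse_glossary` and `glossary_hits` names are hypothetical.

```python
import re
from typing import List, NamedTuple


class GlossaryEntry(NamedTuple):
    source: str
    target: str
    comment: str = ""


def parse_glossary(text: str) -> List[GlossaryEntry]:
    """Parse tab-delimited lines: source, target, optional comment; extra columns ignored."""
    entries = []
    for line in text.splitlines():
        columns = line.split("\t")
        if len(columns) < 2 or not columns[0].strip():
            continue  # blank or malformed line: needs at least source and target
        comment = columns[2] if len(columns) > 2 else ""
        entries.append(GlossaryEntry(columns[0].strip(), columns[1].strip(),
                                     comment.strip()))
    return entries


def glossary_hits(segment: str, entries: List[GlossaryEntry]) -> List[GlossaryEntry]:
    """Entries whose source term occurs as a whole word in the segment (case-insensitive)."""
    lowered = segment.lower()
    return [e for e in entries
            if re.search(r"\b" + re.escape(e.source.lower()) + r"\b", lowered)]
```

Matching on word boundaries keeps a term such as "file" from being reported inside "profile", at the cost of missing inflected forms.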
Documentation
When OmegaT starts, a quick guide called "Instant Start" is displayed. A comprehensive User Manual, originally by Marc Prior, the project coordinator, is bundled with the OmegaT installation. Both of these have been translated into several languages by volunteers. Finally, the archived messages of OmegaT's user groups are searchable by anyone without registration.
As of version 1.6.1_03, OmegaT was localised into the following languages:
- Full localizations:
- Albanian
- Italian (updated)
- Portuguese (Brazil)
- Serbo-Croatian
- Slovak
- Slovenian
- Partial localizations:
- Belarusian
- French
- Japanese
Development and localisation
Code development is currently handled by a team led by Maxym Mykhalchuk. Other current code contributors include Didier Briel, Sacha Chua, Kim Bruning, Thomas Huriaux, Henry Pijffers, Benjamin Siband and Martin Wunderlich. The developers respond to bug reports and requests for enhancements filed on the SourceForge development site.[8]
The OmegaT user interface and bundled documentation are translated by volunteers.[9] Users whose language is not yet available in the latest release are invited to contact the user group[10] and offer their help.
OmegaT users are encouraged to contribute tools written by themselves in response to translators' needs which are not yet addressed by the main OmegaT program itself.[11]
Related software
Tools created by OmegaT contributors
Several tools have been created by OmegaT contributors for use in conjunction with OmegaT; some of them are only useful with earlier versions of OmegaT.
They can be found either on the OmegaT web site[12] or on the user mailing list web space:[13]
- Benjamin Siband's OpenOffice.org segmenter macros
- Didier Briel's aligner utility
- Dmitri Gabinski's aligner utility
- Dmitri's Wordfast TMX converter
- Dmitri's language selector
- Henry Pijffers's TMX merger tool
- Henry's TMX cleaner tool
- Marc Prior's external spell-checker
- Marc's sentence segmenter tool
- Samuel Murray's two spell-checker scripts
- Samuel's complicated utility for creating Trados uncleaned files
- Sonja Tomaskovic's macro for removing internal tags from TMX files
TMX creation tools
Besides Didier's and Dmitri's tools described above, a number of third-party tools can be used to create TMX files:
The OmegaT Project also offers a Java bundle properties aligner to create TMX files.[14]
CSVConverter, published by Heartsome, creates a TMX from a CSV file. It is distributed for free.[15]
Mikel Forcada and Susana Santos's aligner, bitext2tmx, can also be used in conjunction with OmegaT.[16]
The Translate Toolkit po2tmx converter can be used to create TMX memories too.[17]
See also
Forks
There is a collection of some of the above tools, repackaged in different compression schemes, that also distributes a stripped-down version of OmegaT® 1.4.5.[18] The distributor has not been authorized to use the name OmegaT, which is a registered trademark.
External links
- OmegaT Home - Official OmegaT web site
- Project: OmegaT - OmegaT's SourceForge project page
User groups
- omegat@yahoogroups.com - User mailing list (archives searchable without subscription)
- WeSolveIt - Sabine Cretella's discussion board.
- omegat@googlegroups.com - Japanese-language user mailing list
References
- ^ OmegaT 1.6.1 Update 4 release notes - Released on February 3, 2007
- ^ LISA - Localization Industry Standards Association
- ^ Open Document Format for Office Applications - ISO/IEC 26300:2006 format
- ^ OpenOffice.org - A free office suite that offers conversion filters to and from most of the commonly used Microsoft Office file formats
- ^ Translate Toolkit - The Translate Toolkit is a toolkit to convert between various different translation formats (such as gettext-based .po formats, OpenOffice.org formats, and Mozilla formats).
- ^ po4a - A conversion utility to and from the Portable Object format; Perl application packaged under Debian
- ^ Okapi Framework - Text Extraction utility can create an OmegaT project folder tree
- ^ OmegaT development site - Filing requests for enhancements and bug reports
- ^ Localization process - How to contribute a translation
- ^ OmegaT Localization Plan - Finding the localisation status of a specific language
- ^ OmegaT Getting Involved - Translators are encouraged to write their own supplementary tools
- ^ OmegaT Resources - Third-party tools on the OmegaT web site
- ^ OmegaT Files - Third-party tools on the OmegaT user mailing list web space (registration required)
- ^ Java .properties Import - Java application to create TMX files from existing bundle properties translations
- ^ Heartsome's free tools - CSVConverter (creates a TMX from a CSV file). Also: TBXMaker, RTFCleaner, TMXValidator (released under the Eclipse Public License), Java Properties Viewer
- ^ bitext2tmx - Aligner written in Java by Mikel Forcada and Susana Santos
- ^ toolkit:po2tmx - Convert Gettext PO files to a TMX translation memory file
- ^ fork - Translation tools collection containing the fork