OmegaT
OmegaT | |
---|---|
Design by | Keith Godfrey |
Developed by | Didier Briel, Zoltan Bartko, Tiago Saboga and others |
Initial release | November 28, 2002 |
Latest release | 1.7.3_01 / February 17, 2008 |
Preview release | 1.8 / March 2, 2008 |
OS | Cross-platform |
Genre | Computer-assisted translation |
License | GPL |
Website | www.omegat.org |
OmegaT is a computer-assisted translation tool written in the Java programming language. It is free software originally developed by Keith Godfrey in 2000, and is currently developed by a team led by Didier Briel.
OmegaT is intended for professional translators. Some of its features include user-customisable segmentation using regular expressions, translation memory, fuzzy matching, match propagation, glossary matching, context search in translation memories and keyword search in reference materials.
It requires Java 1.4, which is available for Linux, Mac OS X and Microsoft Windows 98 or higher.
The name OmegaT (the name registered on the SourceForge site) is now a registered trademark. Currently, only software released by the OmegaT Project may use the name OmegaT.[1]
History
OmegaT was first developed by Keith Godfrey in 2000. The original engine was written in C++, but the first public release in February 2001 was written in Java.
The first Java version used a proprietary translation memory format, and required Java 1.3 to run. It offered support for StarOffice documents, so-called plain text and Unicode text, and HTML, and could do only block-level segmentation (which for most practical purposes meant paragraph segmentation).
From version 1.4.4 to version 1.6.0, development was led by Maxym Mykhalchuk. Henry Pijffers took over and was release manager until 1.7.1. The current release manager is Didier Briel.
OmegaT requires Java 1.4 or later, and uses a number of LGPL libraries.
Development
Code development is currently handled by a team led by Didier Briel. Current code contributors include Zoltan Bartko, Didier Briel, Kim Bruning, Henry Pijffers, Tiago Saboga, and a few others. The developers respond to bug reports and requests for enhancements filed on the SourceForge development site.[2]
Releases
- A stable version (currently 1.7.3 update 1) is released with a stable feature set and an up-to-date manual. Minor updates to the stable version include bug fixes and possibly new localizations.
- Development then proceeds by putting new code into the code repository so that testers can verify its stability and usability.
- Once this code is considered stable, a preview (test) version is released (currently 1.8). The preview version includes the new code, but its manual still describes the stable version. New localizations of the previous release may also be included.
- Once the manual has been updated with the preview version features, the stable version is released to begin a new release cycle.
The current stable version, 1.7.3 update 1,[3] has features such as flexible segmentation rules, which make sentence segmentation possible, much improved TMX support (up to 1.4b level 2), regular-expression-based searches, DocBook support, and a new graphical user interface. Compared to the previous 1.6 series, version 1.7 contains 18 functional enhancements, including direct support for Office 2007 XML, and 36 bug fixes. The manual has been completely revised and is already available in seven languages in addition to English. The program is available in 27 languages.
The current preview version, 1.8,[4] contains 13 functional enhancements, such as a spell checker. Using Hunspell, the same spell-checking engine as in OpenOffice.org and Firefox, it allows spell checking in more than 80 languages.
Workflow in OmegaT
The user places source documents, existing translation memories and any glossaries in specified subfolders of a translation project. When a project is "opened", OmegaT extracts the translatable text from all recognised documents. As the translator translates each segment, OmegaT adds the translation units to a translation memory. Finally, OmegaT creates the target documents by merging the translation memory with the source documents.
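The extract, translate and merge cycle described above can be pictured with a small toy model. This is only a sketch under simplifying assumptions: every non-empty line counts as one segment, the translation memory is a plain dictionary, and all function names are hypothetical rather than taken from OmegaT. Leaving untranslated segments in the source language is likewise an assumption of the sketch.

```python
# Toy model of the extract / translate / merge cycle.
# All names are hypothetical; this is not OmegaT's code.

def extract_segments(document: str) -> list[str]:
    """Treat each non-empty line as one translatable segment."""
    return [line for line in document.splitlines() if line.strip()]


def create_target(document: str, memory: dict[str, str]) -> str:
    """Rebuild the document, replacing every segment found in the memory."""
    merged = []
    for line in document.splitlines():
        # Assumption: segments without a translation stay in the source language.
        merged.append(memory.get(line, line))
    return "\n".join(merged)


source = "Hello world.\n\nGoodbye."
memory: dict[str, str] = {}

# "Translating" a segment adds a translation unit to the memory.
for segment in extract_segments(source):
    if segment == "Hello world.":
        memory[segment] = "Bonjour le monde."

target = create_target(source, memory)
```

Here `target` keeps the layout of the source document, with the one translated segment substituted and the untranslated one left as it was.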
During translation, fuzzy matches from the translation memory and matches from the glossaries for the current segment are displayed in the adjacent Match/Glossary window. Fuzzy matches are inserted by the translator using keyboard shortcuts. Fuzzy matches above a user-determined threshold can optionally be inserted automatically.
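As an illustration of fuzzy matching with a user-determined threshold, the sketch below scores the current segment against every source segment in a memory. It uses the similarity ratio from Python's standard `difflib` module as a stand-in; this is not the scoring OmegaT uses, and the function name, the sample memory and both threshold values are assumptions made for the example.

```python
from difflib import SequenceMatcher


def fuzzy_matches(segment, memory, threshold=0.6):
    """Return (score, source, target) tuples at or above threshold, best first."""
    scored = []
    for source, target in memory.items():
        # Stand-in similarity measure: difflib's ratio on lower-cased text.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    return sorted(scored, reverse=True)


memory = {
    "Open the project file.": "Ouvrez le fichier du projet.",
    "Close the door.": "Fermez la porte.",
}

matches = fuzzy_matches("Open the project files.", memory)

# Hypothetical user setting: insert the best match automatically only
# when its score reaches this value.
AUTO_INSERT_THRESHOLD = 0.9
auto_insert = None
if matches and matches[0][0] >= AUTO_INSERT_THRESHOLD:
    auto_insert = matches[0][2]
```

Only the near-identical segment survives the display threshold, and because its score is also above the higher automatic-insertion threshold, its translation is proposed without a keystroke.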
The translator can switch to a different document in the same project at any time using the Project Files viewer, or to a different segment in the same file using keyboard shortcuts or by double-clicking the appropriate segment.
Whenever additional source documents, translation memories or glossaries are added to the project, or when manual changes are made to those files, the translator must "reload" the project, so that OmegaT recognises the newly added segments. The project must also be reloaded when changes to the segmentation rules are made in mid-translation.
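The segmentation rules mentioned above are based on regular expressions. The following minimal sketch shows one way such rules could work: each rule pairs a pattern that must match just before a position with one that must match just after it, and exception rules listed first prevent a break. The rule format and the rules themselves are assumptions for illustration, not OmegaT's actual rule set.

```python
import re

# Hypothetical rules: (is_break, pattern before the position, pattern after it).
# Exception rules come first so that they win over the general break rule.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),  # no break after these abbreviations
    (True, r"[.?!]", r"\s+"),                  # break after sentence-final punctuation
]


def segment(text):
    """Split text into segments; the first rule matching a position decides."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in RULES:
            if re.search(before + r"\Z", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # stop at the first matching rule for this position
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


sentences = segment("Dr. Smith arrived. He sat down! Why?")
```

Changing `RULES` changes where the text is cut, which is why a project has to be re-segmented (reloaded) after the rules are edited.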
Collaboration between translators
Translators using different computer-assisted translation tools can only share their translation memories if (a) at least one of their programs can import or export the other program's proprietary format, or (b) both programs can import and export an intermediary format. OmegaT does the latter. It can import and export the industry-standard intermediary format TMX (Translation Memory eXchange).
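To give an idea of what such an exchange file contains, the sketch below reads source and target pairs from a very small TMX document using Python's standard XML parser. The sample file and the function name are made up for the example, and real TMX files carry more header and attribute information than is shown here.

```python
import xml.etree.ElementTree as ET

# Minimal hand-written TMX sample (illustrative only).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world.</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_units(tmx_text, source_lang, target_lang):
    """Return (source, target) text pairs for the requested language pair."""
    pairs = []
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if source_lang in by_lang and target_lang in by_lang:
            pairs.append((by_lang[source_lang], by_lang[target_lang]))
    return pairs


units = read_units(SAMPLE_TMX, "en", "fr")
```

Because each translation unit simply lists the same segment in several languages, any tool that understands this structure can reuse the pairs, whatever its internal memory format.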
OmegaT's glossary files are tab-delimited plain text files with the source term in the first column and the target term in the second column. A third column can be used for anything (e.g. user comments). Additional columns are ignored by OmegaT. OmegaT does not support the industry-standard glossary format TBX proposed by LISA.[5]
Supported source document formats
OmegaT can directly translate the following formats:
- Text files (any text format that Java can handle) in a variety of encodings, including Unicode
- HTML/XHTML
- Java properties files
- StarOffice, OpenOffice.org and OpenDocument (ODF)[6]
- Office Open XML
- XLIFF files
- DocBook files
- Portable Object (PO) files
- Files with a "Key=Value" structure
It handles formatted documents using tagged text, in a way similar to that of commercial translation memory tools.
Currently, OmegaT does not directly support file formats such as WordML, ExcelML and LaTeX, or formats such as Trados uncleaned RTF files or TTX files. It is possible to translate uncleaned RTF files by tweaking segmentation rules after conversion to a supported format.
Unsupported file formats can be accessed in OmegaT by using complementary tools such as:
OpenOffice.org
OmegaT does not offer direct support for the Microsoft Office formats for Word, Excel and PowerPoint that predate Office 2007. However, OpenOffice.org (and variants)[7] can be used to convert such files to OpenDocument, which OmegaT supports natively. The current version (1.7.3) does directly support the Microsoft Office 2007 file formats.
Translate Toolkit
The Translate Toolkit, a Python tool set, provides users with a number of converters to and from Portable Object, including converters for Mozilla .properties and DTD files, CSV files, Qt .ts files and XLIFF files. It also includes a number of tools to manipulate such files before or after their translation in OmegaT.
po4a
File formats such as LaTeX, TeX and POD can be converted to and from Portable Object using the po4a utility.[8]
Okapi Framework
The Text Extraction Utility from the Okapi Framework has an option for creating an OmegaT project folder tree,[9] which brings even more foreign formats within OmegaT's reach. Okapi is a .NET 2.0 application and requires Windows to run. It will not run on free .NET implementations that do not support .NET 2.0. It is also possible to create an OmegaT-specific XLIFF file, which is supported in the current preview version (1.7.2).
Supported memory and glossary formats
OmegaT's internal translation memory format is not visible to the user, but every time it autosaves the translation project, all new or updated translation units are automatically exported and added to three external TMX memories: a native OmegaT TMX, a level 1 TMX and a level 2 TMX.
- The native TMX file is for use in OmegaT projects.
- The level 1 TMX file preserves textual information and can be used with CAT tools that support TMX level 1 or level 2.
- The level 2 TMX file preserves textual information as well as inline tag information and can be used with CAT tools that support TMX level 2.
Exported level 2 files include OmegaT's internal tags encapsulated in TMX tags, which allows such TMX files to generate matches in CAT tools that support TMX level 2. Tests have been positive in Trados and SDLX.
OmegaT can import TMX files up to version 1.4b, level 1 as well as level 2. Level 2 files imported into OmegaT will not generate matches of the same level, since OmegaT ignores the encapsulated formatting content. OmegaT recognizes its own TMX level 2 files flawlessly, which permits their use in other OmegaT projects as if they were native OmegaT TMX files. Here again, tests have been positive with TMX files created by DVX, Trados and SDLX.
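The difference between the two levels can be illustrated on a single segment. In a level 2 file, formatting codes are encapsulated in inline elements such as `bpt` and `ept`; dropping those elements and their content leaves the plain text that a level 1 file would carry. The sketch below is one way to picture "ignoring the encapsulated content" and makes no claim about how OmegaT implements it; the sample segment and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET

# TMX inline elements that encapsulate native formatting codes.
ENCAPSULATING = {"bpt", "ept", "ph", "it", "ut"}


def plain_text(element):
    """Return the text of a <seg>, skipping encapsulated formatting codes."""
    parts = [element.text or ""]
    for child in element:
        if child.tag not in ENCAPSULATING:
            # Other inline elements (e.g. <hi>) wrap translatable text: keep it.
            parts.append(plain_text(child))
        # Text that follows an inline element always belongs to the segment.
        parts.append(child.tail or "")
    return "".join(parts)


# A level 2 style segment: bold markup encapsulated around the word "Save".
level2_seg = ET.fromstring(
    '<seg>Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept> now.</seg>'
)

text_only = plain_text(level2_seg)
```

A tool that compares only `text_only` can still find the segment, but it no longer knows where the formatting was, which is why such a match is not a full level 2 match.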
For glossaries, OmegaT uses tab-delimited plain text files. The structure of a glossary file is simple: the first column contains the source-language term, the second column contains the corresponding target-language term, and the optional third column can contain anything, including comments on context. Such glossaries can easily be created by exporting three-column spreadsheets to CSV format with the following parameters: field delimiter={tab}, word delimiter={space}.
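The glossary layout described above is simple enough to read with a few lines of code. The following sketch parses tab-delimited text into (source, target, comment) entries and ignores any columns beyond the third; the function name and the sample entries are assumptions made for the example.

```python
def parse_glossary(text):
    """Parse tab-delimited glossary text into (source, target, comment) tuples."""
    entries = []
    for line in text.splitlines():
        columns = line.split("\t")
        # Skip blank lines and lines without both a source and a target column.
        if len(columns) < 2 or not columns[0].strip():
            continue
        source, target = columns[0], columns[1]
        comment = columns[2] if len(columns) > 2 else ""  # third column is optional
        entries.append((source, target, comment))          # extra columns are ignored
    return entries


SAMPLE_GLOSSARY = (
    "glossary\tglossaire\tnoun\n"
    "segment\tsegment\n"
    "match\tcorrespondance\tfuzzy or exact\tignored fourth column\n"
)

entries = parse_glossary(SAMPLE_GLOSSARY)
```

The same structure is what a spreadsheet produces when three columns are exported with a tab as the field delimiter.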
Documentation
When OmegaT starts, a quick guide called "Instant Start" is displayed. A comprehensive User Manual is bundled with the OmegaT installation. Both of these have been translated into several languages by volunteers. Finally, the archived messages of OmegaT's user groups are searchable by anyone without registration.
Localizations
The OmegaT user interface and bundled documentation are translated by volunteers.[10]
The current stable version (1.7.3 update 1) is localized to the following languages:
- Full localizations (include the GUI and the manual among other documents):
  - Basque
  - Catalan
  - Dutch
  - Hungarian
  - Russian
  - Serbo-Croatian
  - Slovenian
- Partial localizations (include the GUI and the tutorial, sometimes the 1.6 series manual):
  - Albanian
  - Belarusian
  - Czech
  - Danish
  - Esperanto
  - French
  - German
  - Greek (GUI only)
  - Japanese
  - Italian
  - Polish
  - Portuguese (Brazil)
  - Simplified Chinese
  - Slovak
  - Spanish
  - Traditional Chinese
  - Turkish
  - Ukrainian
For a total of 27 languages.
The OmegaT Project
The OmegaT Project is also a sort of "computer literacy" group that focuses on translators' needs.
OmegaT users are encouraged to contribute tools written by themselves in response to translators' needs which are not yet addressed by the main OmegaT program itself.[11]
Related software
Tools created by OmegaT contributors
Several tools have been created by OmegaT contributors for use in conjunction with OmegaT. Some of them are only useful with earlier versions of OmegaT.
They can be found either on the OmegaT web site[12] or in the user mailing list web space:[13]
- Benjamin Siband's OpenOffice.org segmenter macros
- Didier Briel's aligner utility
- Dmitri Gabinski's aligner utility
- Dmitri's Wordfast TMX converter
- Dmitri's language selector
- Henry Pijffers's TMX merger tool
- Henry's TMX cleaner tool
- Marc Prior's external spell-checker
- Marc's sentence segmenter tool
- Sonja Tomaskovic's macro for removing internal tags from TMX files
- Samuel Murray's collection of scripts
External links
- OmegaT Home - Official OmegaT web site
- Project: OmegaT - OmegaT's SourceForge project page
User group
- omegat@yahoogroups.com - User mailing list (archives searchable without subscription)
References
1. OmegaT registered trademark use policy
2. OmegaT development site - Filing requests for enhancements and bug reports
3. OmegaT stable version 1.7.3 update 1 has been released - Released on February 17, 2008
4. OmegaT test version 1.8 released - Released on March 2, 2008
5. LISA - Localization Industry Standards Association
6. Open Document Format for Office Applications - ISO/IEC 26300:2006 format
7. OpenOffice.org - A free office suite that offers conversion filters to and from most of the commonly used Microsoft Office file formats
8. po4a - A conversion utility to and from the Portable Object format; a Perl application packaged under Debian
9. Okapi Framework - Text Extraction utility that can create an OmegaT project folder tree
10. Localization process - How to contribute a translation
11. OmegaT Getting Involved - Translators are encouraged to write their own supplementary tools
12. OmegaT Resources - Third-party tools on the OmegaT web site
13. OmegaT Files - Third-party tools on the OmegaT user mailing list web space (registration required)