Computer-assisted translation

From Wikipedia, the free encyclopedia

Computer-assisted translation, computer-aided translation, or CAT is a form of translation in which a human translator translates texts using computer software designed to support and facilitate the translation process.

Computer-assisted translation is sometimes called machine-assisted, or machine-aided, translation.

Computer-assisted translation and machine translation

Some advanced computer-assisted translation solutions include controlled machine translation (MT). Integration of MT into computer-assisted translation has been implemented in various ways by various parties. Although this type of technology is neither widely known nor available to individual translators, carefully customized user dictionaries based on correct terminology significantly improve the accuracy of MT and, as a result, the efficiency of the translation process.

Overview

Computer-assisted translation is a broad and imprecise term covering a range of tools, from the fairly simple to the more complicated. These can include:

  • Spell checkers, either built into word processing software or available as add-on programs;
  • Grammar checkers, likewise either built into word processing software or available as add-on programs;
  • Terminology managers, which allow translators to manage their own terminology bank in electronic form. This can range from a simple table created in the translator's word processing software or spreadsheet, to a database created in a program such as FileMaker Pro, to more robust (and more expensive) specialized software packages such as LogiTerm, MultiTerm or Termex;
  • Dictionaries on CD-ROM, either unilingual or bilingual;
  • Terminology databases, either on CD-ROM or accessible through the Internet (such as The Open Terminology Forum, TERMIUM or the Grand dictionnaire terminologique from the Office québécois de la langue française);
  • Full-text search tools (or indexers), which allow the user to query already translated texts or reference documents of various kinds. Indexers used in the translation industry include Naturel, ISYS Search Software and dtSearch;
  • Concordancers, programs that retrieve instances of a word or expression, together with their context, from a monolingual, bilingual or multilingual corpus such as a bitext or a translation memory;
  • Bitexts, a fairly recent development: the result of merging a source text and its translation, which can then be analyzed using a full-text search tool or a concordancer;
  • Project management software that allows linguists to structure complex translation projects, assign the various tasks to different people, and track the progress of each of these tasks;
  • Translation memory managers (TMM), tools consisting of a database of text segments in a source language and their translations in one or more target languages;
  • Systems that are nearly automatic, as in machine translation, but that leave decisions in ambiguous cases to the user. These are sometimes called human-aided machine translation.
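
As an illustration of the concordancer entry in the list above, the following is a minimal sketch that searches the source side of a bitext for a whole-word term and returns each hit with its surrounding context and the aligned translation. The function name `concordance`, the `width` parameter and the sample sentences are assumptions made for this example; they do not describe any of the products named above.

```python
import re

def concordance(bitext, term, width=30):
    """Return (left context, match, right context, translation) for every
    whole-word occurrence of `term` in the source side of the bitext."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for source, target in bitext:
        for m in pattern.finditer(source):
            left = source[max(0, m.start() - width):m.start()]
            right = source[m.end():m.end() + width]
            hits.append((left, m.group(0), right, target))
    return hits

# Hypothetical bitext: a list of (source segment, target segment) pairs.
bitext = [
    ("The translation memory stores segments.",
     "La mémoire de traduction stocke des segments."),
    ("Each segment is a sentence.", "Chaque segment est une phrase."),
    ("Open the file.", "Ouvrez le fichier."),
]

hits = concordance(bitext, "segment")
for left, match, right, translation in hits:
    print(f"{left}[{match}]{right}  =>  {translation}")
```

Because the pattern uses word boundaries, the plural "segments" in the first pair is not reported; a real concordancer would typically offer stemming or wildcard options for that.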

Translation memory software

Translation memory (TM) programs store previously translated source texts and their equivalent target texts in a database and retrieve related segments during the translation of new texts.

Such programs split the source text into manageable units known as "segments." A source-text sentence or sentence-like unit (a heading, a title or an element in a list) may be considered a segment, or texts may be segmented into larger units such as paragraphs or smaller ones such as clauses. As the translator works through a document, the software displays each source segment in turn and, if it finds a matching source segment in its database, offers the previous translation for re-use. If it does not, the program allows the translator to enter a translation for the new segment. Once the translation of a segment is completed, the program stores the new translation and moves on to the next segment.

In the dominant paradigm, the translation memory is, in principle, a simple database whose fields contain the source-language segment, the translation of the segment, and other information such as the segment creation date, last access, translator name, and so on. Another translation memory approach does not involve the creation of a database, relying on aligned reference documents instead (e.g. Star Transit).
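
The database-style workflow described above can be sketched in a few lines. This is a simplified illustration, not the method of any particular product: the class `TranslationMemory`, the helper `split_segments`, the 0.75 similarity threshold and the use of Python's `difflib.SequenceMatcher` for approximate matching are all assumptions chosen for the example, and real tools use more elaborate segmentation rules and match scoring.

```python
import re
from difflib import SequenceMatcher

def split_segments(text):
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

class TranslationMemory:
    def __init__(self):
        # Maps each source segment to its translation and metadata.
        self.entries = {}

    def store(self, source, target, translator="unknown"):
        self.entries[source] = {"target": target, "translator": translator}

    def lookup(self, source, threshold=0.75):
        """Return (stored source, target, score) for the best match whose
        similarity is at least `threshold`, or None if there is none."""
        if source in self.entries:
            return source, self.entries[source]["target"], 1.0
        best = None
        for stored, entry in self.entries.items():
            score = SequenceMatcher(None, source, stored).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (stored, entry["target"], score)
        return best

tm = TranslationMemory()
tm.store("Click the Save button.", "Cliquez sur le bouton Enregistrer.")

text = "Click the Save button. Click the Print button. Restart the computer."
segments = split_segments(text)
results = [tm.lookup(segment) for segment in segments]
for segment, result in zip(segments, results):
    print(segment, "->", result)
```

In this run the first segment is an exact match, the second is offered as an approximate match against the stored "Save" sentence, and the third finds nothing, which is the point at which a translator would enter a new translation and call `store`.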

Some translation memory programs function as standalone environments, while others function as an add-on or macro to commercially available word-processing or other business software programs. Add-on programs allow source documents from other formats, such as desktop publishing files, spreadsheets, or HTML code, to be handled using the TM program.

Language search engine software

New to the translation industry, language search engine software is typically an Internet-based system that works similarly to Internet search engines. Rather than searching the Internet, however, a language search engine searches a large repository of translation memories to find previously translated sentence fragments, phrases, whole sentences, and even complete paragraphs that match source document segments. In this way it aims to reuse more of the content of translation memories than traditional translation memory software does.

Language search engines are designed to use modern search technology to search on the source words in context, so that the search results match the meaning of the source segments. As with traditional TM tools, the value of a language search engine rests heavily on the translation memory repository it searches against.

Terminology management software

Terminology management software gives the translator a means of automatically searching a given terminology database for terms appearing in a document, either by automatically displaying terms in the translation memory software interface window or through hotkeys that open the entry in the terminology database. Some programs have other hotkey combinations that allow the translator to add new terminology pairs to the terminology database on the fly during translation. Some of the more advanced systems enable translators to check, either interactively or in batch mode, whether the correct source/target term combination has been used within and across the translation memory segments in a given project.
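
The term lookup and the batch consistency check described above can be illustrated with a small sketch. The function names `find_terms` and `check_terminology`, the dictionary-based termbase and the sample sentences are hypothetical; the check is a plain substring test that ignores inflection, which real terminology tools handle more carefully.

```python
import re

def find_terms(segment, termbase):
    """Return the termbase entries whose source term occurs in the segment."""
    found = {}
    for source_term, target_term in termbase.items():
        pattern = r"\b" + re.escape(source_term) + r"\b"
        if re.search(pattern, segment, re.IGNORECASE):
            found[source_term] = target_term
    return found

def check_terminology(source, target, termbase):
    """List the source terms whose approved target term is missing
    from the translation (case-insensitive substring test)."""
    return [
        src for src, tgt in find_terms(source, termbase).items()
        if tgt.lower() not in target.lower()
    ]

# Hypothetical English-French termbase and segments.
termbase = {
    "translation memory": "mémoire de traduction",
    "segment": "segment",
}
source = "The translation memory stores each segment."
good = "La mémoire de traduction stocke chaque segment."
bad = "La base de traduction stocke chaque segment."

print(find_terms(source, termbase))
print(check_terminology(source, good, termbase))
print(check_terminology(source, bad, termbase))
```

Running `check_terminology` over every source/target pair in a project corresponds to the batch mode mentioned above; calling it on the segment being edited corresponds to the interactive mode.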

Alignment software

Alignment programs take completed translations, divide both source and target texts into segments, and attempt to determine which segments belong together in order to build a translation memory database with the content. Many alignment programs allow translators to manually realign mismatched segments. The resulting translation memory file can then be imported into a translation memory program for future translations.
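
A deliberately simplified sketch of this process follows. It assumes that source and target contain the same number of sentences in the same order and pairs them by position, flagging pairs whose lengths differ sharply so that a translator can realign them by hand. The names `split_segments` and `align` and the `max_ratio` threshold are assumptions for the example; actual alignment programs rely on more robust methods and can handle merged or split sentences.

```python
import re

def split_segments(text):
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def align(source_text, target_text, max_ratio=2.0):
    """Pair source and target segments by position and mark pairs whose
    character lengths differ by more than `max_ratio` as suspect."""
    src = split_segments(source_text)
    tgt = split_segments(target_text)
    if len(src) != len(tgt):
        raise ValueError("segment counts differ; manual alignment needed")
    pairs = []
    for s, t in zip(src, tgt):
        ratio = max(len(s), len(t)) / min(len(s), len(t))
        pairs.append({"source": s, "target": t, "suspect": ratio > max_ratio})
    return pairs

# Hypothetical completed translation.
pairs = align("Open the file. Save it.",
              "Ouvrez le fichier. Enregistrez-le.")
for pair in pairs:
    print(pair)
```

The resulting list of pairs is the kind of content that would then be written to a translation memory file and imported into a translation memory program.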

Comparison of different CAT tools

(Alphabetical order, free software first, proprietary solutions second.)

Tool | Supported file formats | OS | Price | License
Anaphraseus | ODT, all OpenOffice Writer formats (DOC, TXT etc.) | Multiplatform (StarBasic macro) | | GPL
gtranslator | PO | POSIX | | GPL
Okapi Framework | PO, Windows RC, TMX, Wordfast, Trados, Java Properties, Regular-expression-based text, Illustrator, INX, ResX, Table-type files, XML | Windows (.NET) | | LGPL
OmegaT | HTML, XHTML, DocBook, Plain Text, PO, JavaHelp, Java Resource Bundles, OpenDocument (ODF), OpenOffice, StarOffice, Office Open XML, HTML Help Compiler (HCC), INI files | Multiplatform (Java) | | GPL
OmegaT+ | HTML, XHTML, Plain Text, Java Resource Bundles, OpenDocument (ODF), OpenOffice, StarOffice | Multiplatform (Java) | | GPL
Open Language Tools | HTML/XHTML, XML, DocBook SGML, ASCII, StarOffice/OpenOffice/ODF, .po (gettext), .properties, .java (ResourceBundle), .msg/.tmsg (catgets) | Multiplatform (Java) | | CDDL
Poedit | Gettext PO | Multiplatform | Free | MIT license
Pootle | Gettext PO, XLIFF, OpenOffice GSI files (.sdf), TMX, TBX, Java Properties, DTD, CSV, HTML, XHTML, Plain Text | Multiplatform (Python) | | GPL
Transolution | HTML, StarOffice/OpenOffice, XLIFF, DocBook | Multiplatform (Python) | | GPL
across | MS Word, MS Excel, MS PowerPoint, HTML, XML, RTF, Plain Text (TXT), EXE, RC, DLL, QuarkXPress, Adobe FrameMaker (MIF), Adobe InDesign, MSI, INI, OCX, SCR, CPL, NLS, RESX | Windows, web application | Freelance version: free; corporate licenses on inquiry | Proprietary
AidTransStudio | OpenOffice, MS Excel, MS PowerPoint, MS Word, MS Word XML, HTML, ASP, PHP, ASPX, Plain Text, XML, Trados TTX, TMX, Custom Format (config based on Regular Expressions) | Windows (.NET) | Basic Edition: free; Pro and Ent: see price list | Proprietary
Araya | HTML, XML, plain text, RTF, TMX, XLIFF | Multiplatform (Java) | 400 euro / Server 6500 euro | Proprietary
Cafetran | HTML, XML, OpenOffice, AbiWord, KWord, MS Word | Multiplatform (Java) | 180 euro |
CatsCradle | HTML, CSV, Help contents and index files (.hhc, .hhk) | Windows | 60 euro |
Déjà Vu (DVX) | XML, Plain Text, OpenOffice, Adobe FrameMaker, Adobe PageMaker, ASP, Interleaf/Quicksilver, InDesign, Help Content, SGML, MS Access, MS Excel, MS PowerPoint, MS Word, QuarkXPress, RTF, Resource files, C/C++/Java source files, Java Properties, JavaScript, VBScript, GNU gettext | Windows | Standard: 490 euro; Pro: 990 euro; Workgroup: 1490 euro | Proprietary
Felix | MS Word, MS Excel, MS PowerPoint (for Windows); HTML | Windows | $US 350 | Proprietary
Heartsome Translation Suite | HTML/XHTML, XML, Plain Text, OpenOffice, StarOffice, AbiWord, PO/POT (GNU Gettext), SVG, Adobe FrameMaker (MIF), Adobe InDesign, DocBook, DITA, Java Properties, JavaScript, RTF, Tagged RTF, Trados TTX, MS Office 2003 XML, ResX (Windows .NET Resources), RC (Windows C/C++ Resources), MS Office 2007 (beta) | Multiplatform (Java) | See price list | Proprietary
Lingotek Language Search Engine | HTML, XHTML, XML, MS Word, MS Excel, MS PowerPoint, OpenOffice, OpenDocument (ODF), OpenDocument Text (.odt), OpenDocument Spreadsheet (.ods), OpenDocument Presentation (.odp), Adobe FrameMaker (.mif), Microsoft Resource (.rc), Rich Text Format (.rtf), Plain Text (.txt), Java Properties (.properties), Gettext PO, StarOffice, TMX, XLIFF, TTX | Web application | Free web access | Proprietary
LogiTerm | ? | ? | ? | ?
MemoQ | HTML, plain text, all MS Office 2000/XP/2003 formats (doc, xls, ppt), RTF, bilingual RTF (Trados compatible), Trados TTX, Adobe FrameMaker, Adobe InDesign CS & CS2, proprietary bilingual format (MBD), XML, TMX, CSV, TSV | Windows | 4Free: freeware; Translator Pro: 390 euro; LSP 5: 1490 euro | Proprietary
MetaTexis | HTML, XML, Resource files, MS Word (all kinds of text files that can be imported by MS Word), MS Excel, MS PowerPoint, Adobe FrameMaker, Adobe PageMaker, QuarkXPress | Windows | Lite: free; Pro: 98 euro; NET/Office: 138 euro | Proprietary
MultiCorpora MultiTrans | HTML, XML, MS Word, MS Excel, MS PowerPoint, WordPerfect, QuarkXPress, Adobe FrameMaker (MIF), Adobe InDesign | Windows | | Proprietary
MultiLing Fortis Translation Suite | TMX, Word, HTML, FrameMaker, InDesign, QuarkXPress, and more | Windows | ? |
ppp.helper | MS PowerPoint | Windows | 39 euro | Proprietary
Rainbow | HTML, XHTML, Scripts, Photoshop, etc. | Windows (.NET) | Freeware | Proprietary
SDL Trados 2007 | Features 3 translation environments: dedicated TagEditor, MS Word interface, SDLX. Additional filters for translating with TagEditor available: Word, Excel, PowerPoint, OpenOffice, InDesign, QuarkXPress, PageMaker, Interleaf, FrameMaker, HTML, SGML, XML, SVG, .... Includes SDL MultiTerm for terminology management and Project Management Dashboard for automating tasks and tracking. | Windows | New Freelance version, approx. 200-900 euro | Proprietary
SEER English Spanish Translator | MS Office Word, Excel, PowerPoint, Plain Text, HTML | Windows | $US 299 | Proprietary
Similis | HTML, PDF, Word, Trados | Windows | 295 euro (single-user) | Proprietary
STAR Transit | Text ANSI / ASCII / Unicode for Windows, Text for Apple Macintosh, Corel WordPerfect, HTML, XML (ASP.NET, ASP, JSP, XSL), SGML, SVG (Scalable Vector Graphics), MS Word for Windows, MS Excel, MS PowerPoint, RTF and RTF for WinHelp, RC, QuarkXPress, Adobe FrameMaker, Adobe PageMaker, Interleaf/Quicksilver, Adobe InDesign, XGate for QuarkXPress, AutoCAD | Windows | | Proprietary
Swordfish | XLIFF, HTML/XHTML, XML, Plain Text, OpenOffice, StarOffice, AbiWord, PO/POT (GNU Gettext), SVG, Adobe FrameMaker (MIF), Adobe InDesign (INX), DocBook, DITA, Java Properties, JavaScript, RTF, Trados Tagged RTF, Trados TTX, MS Office 2003 XML, ResX (Windows .NET Resources), RC (Windows C/C++ Resources), MS Office 2007 | Multiplatform: Windows, Linux, Mac OS X (Java) | 200 euro | Proprietary
Termbases | ? | Web application | Free | Proprietary
Tr-aid | ? | ? | ? | ?
TransSearch | Any text copied on the web form | Web application | 129,95 $CAN | Proprietary
Translation Search Engine | Any format that can be copied and pasted onto a web page, TMX | Web application | Free web access | Proprietary
Wordfast | MS Office Word, Excel, PowerPoint (for Windows and Mac); tagged documents | Microsoft Office Word add-in | 250/125 euro | Proprietary
WordFisher | MS Word | WordBasic / MS Office Word macro | Free | Licence
