Caitra is a translation tool developed by the University of Edinburgh. This computer-assisted translation (CAT) tool is delivered through an online platform and is based on AJAX-style Web 2.0 technologies and the Moses decoder. Its web interface is implemented with Ruby on Rails, an open-source web framework, and its translation back-end uses C++.
Caitra helps human translators by offering suggestions and alternative translations, simplifying the process.
Machine translation (MT) systems are typically used by readers who do not need a high-quality translation and want fast access to text in a foreign language. Professional translators need more advanced tools that make their work easier and help them deliver high-quality translations to their clients. The Trans-Type project (Langlais et al., 2000) pioneered the use of MT as an aid to human translators: the tool suggests translations for a segment, and the translator may accept them or type their own translation, which prompts the tool to produce new suggestions. This was a significant development, but it is not necessarily suitable for professional translators. Tools with post-editing facilities have also been developed as a middle ground between fully automatic MT and human translation, in order to combine the two and achieve the desired results. The School of Informatics and the Machine Translation Group of the University of Edinburgh created a research program, CAITRA, to analyze the benefits of different types of MT assistance and to explore the interaction between the machine and the user in order to develop new CAT tools.
Caitra is programmed with Ruby on Rails, an open-source web framework (Thomas and Hansson, 2008). The online platform uses Ajax-style Web 2.0 technologies (Raymond, 2007) connected to a MySQL database back-end. The machine translation back-end is powered by Moses (Koehn et al., 2007), a statistical phrase-based MT system. The C++ programming language is used to speed up the computation of translation suggestions. The tool is provided online to enable broad research on this type of MT and a detailed study of how users interact with the tool. Online availability also gives the translation community access to the tool and lets the researchers learn their opinions.
A simple text box is the link between the user and the tool. When the user clicks the "Upload" icon, Caitra processes the text typed in the box. Processing may take a few minutes, during which Caitra finds several candidate translations, one of which is selected by default. Once processing is finished, translators have multiple assistance options, presented in an interface. The unit of translation is the sentence, so Caitra works on only one sentence at a time.
The Trans-Type project (Langlais et al., 2000) investigated interactive machine translation in depth: sentence segments are translated with the help of a CAT tool that suggests several different translation options. Human translators may choose one of them, or type their own translation if they do not like the offered ones. This process is similar to the auto-completion found in many office programs.
The predictions are generated by the statistical machine translation system. They are offered as short phrases, following the statistical phrase-based translation model. Showing only a few words at a time also avoids overloading the user visually. The University of Edinburgh is still investigating the ideal length for these suggestions; at the moment short phrases are used, since they are more useful and less distracting for users. The suggestions and the user actions are stored in a large database. During the user interaction, Caitra quickly matches the user input against the search graph of the translation system using a string edit distance measure. The prediction is the optimal completion path that matches the user input with (a) minimal string edit distance and (b) highest sentence translation probability. As Philipp Koehn explains, this computation takes place on the server and is implemented in C++.[1] Whenever the user accepts a suggestion or types a new segment, a new suggestion is displayed. This update is fast, taking less than a second. How many suggestions are accepted depends on the language pair and the difficulty of the text; preliminary studies of CAITRA suggest that users usually accept 50-80% of the predictions generated by the system.
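The matching step described above can be sketched in a few lines. This is a minimal Python sketch, not Caitra's C++ implementation, and it rests on a simplifying assumption: instead of walking the decoder's search graph, it scores a hypothetical list of full candidate translations with their probabilities. For each candidate it finds the prefix with the smallest character-level Levenshtein distance to what the user has typed, then prefers the lowest distance and, among ties, the highest probability. The function names `prefix_edit_distance` and `predict` are illustrative only.

```python
from typing import List, Optional, Tuple


def prefix_edit_distance(user: str, candidate: str) -> Tuple[int, int]:
    """Return (distance, end): the smallest Levenshtein distance between the
    whole user input and some prefix candidate[:end] (first such end on ties)."""
    m, n = len(user), len(candidate)
    # After processing row i, prev[j] is the distance between user[:i] and candidate[:j].
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if user[i - 1] == candidate[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # drop a user character
                         cur[j - 1] + 1,      # skip a candidate character
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    end = min(range(n + 1), key=lambda j: prev[j])
    return prev[end], end


def predict(user: str, candidates: List[Tuple[str, float]]) -> str:
    """Hypothetical predictor over (translation, probability) pairs: minimal
    edit distance first, then highest probability; returns the completion."""
    best: Optional[Tuple[Tuple[int, float], str]] = None
    for sentence, prob in candidates:
        dist, end = prefix_edit_distance(user, sentence)
        key = (dist, -prob)
        if best is None or key < best[0]:
            best = (key, sentence[end:])
    return best[1] if best is not None else ""
```

With the candidates `("the house is small", 0.4)` and `("the home is small", 0.35)`, typing `the ho` yields the completion `use is small` from the more probable candidate, while typing `the hom` switches to `e is small`, because an exact prefix match outranks probability. A real search graph shares prefixes between hypotheses, so it avoids recomputing the distance table for every candidate the way this list-based sketch does.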
Once the text is uploaded and after a wait of a few minutes, users can see the machine translation output and edit the text based on the predictions. The prediction table is displayed by clicking the edit icon. The text is divided into sentences, which are further divided into smaller units. Predictions for these units appear in a box, and the most likely suggestion is shown in a different colour at the top of the table. Predictions are accepted by clicking on them, and the system updates the selection to match the user's input. The database consists of a large number of source texts paired with their translations. The most likely prediction is derived from previous matches in the database, and users' choices are scored in the database to be used in future translations. These predictions help not only professional translators but also novice translators who do not yet know the vocabulary, and even people with no knowledge of the foreign language.
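The ranking and scoring of options in the prediction table might look roughly like the following. This is a hypothetical sketch: the class `OptionTable`, the in-memory dictionaries standing in for the database, and the rule of ranking options first by how often users picked them and then by model probability are all assumptions made for illustration, not Caitra's documented scoring.

```python
from typing import Dict, List, Tuple


class OptionTable:
    """Hypothetical prediction table: model options per source phrase plus a
    count of how often users picked each option."""

    def __init__(self, options: Dict[str, List[Tuple[str, float]]]) -> None:
        self.options = options  # source phrase -> [(translation, model probability)]
        self.picks: Dict[Tuple[str, str], int] = {}  # (source, translation) -> times chosen

    def ranked(self, source: str) -> List[str]:
        """Options for `source`: most often picked first, then by model probability."""
        opts = self.options.get(source, [])
        order = sorted(opts, key=lambda o: (-self.picks.get((source, o[0]), 0), -o[1]))
        return [translation for translation, _ in order]

    def accept(self, source: str, translation: str) -> None:
        """Record that a user clicked `translation` for `source`."""
        key = (source, translation)
        self.picks[key] = self.picks.get(key, 0) + 1
```

Ranking by pick count first is only one possible design; a real system might instead combine the count and the model probability into a single weighted score.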
Users can review their translation and make any changes needed to correct mistakes. The changes appear in the output display.
Caitra stores in the database the time users need to accept a prediction or to write their own translation. Each action carries a different weight for future predictions, depending on what the user did and how long it took. Every action, pause or movement is relevant for improving future translations.
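A minimal sketch of the kind of timing log this implies, assuming a hypothetical `SessionLog` object in place of Caitra's actual database schema, which the source does not describe. It records each user action with a timestamp and derives how long the user took for each step, measured from the moment the suggestion was shown. The action labels (`"accept"`, `"type"`) are illustrative.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    kind: str         # hypothetical label, e.g. "accept" or "type"
    text: str         # the accepted or typed text
    timestamp: float  # seconds since the epoch


@dataclass
class SessionLog:
    """Hypothetical in-memory stand-in for the database of user actions."""
    shown_at: float  # when the current suggestion was displayed
    actions: List[Action] = field(default_factory=list)

    def record(self, kind: str, text: str, timestamp: Optional[float] = None) -> None:
        """Store an action, stamping it with the current time unless a time is given."""
        stamp = time.time() if timestamp is None else timestamp
        self.actions.append(Action(kind, text, stamp))

    def durations(self) -> List[float]:
        """Seconds each action took, measured from the previous action
        (or from when the suggestion was shown, for the first one)."""
        result = []
        previous = self.shown_at
        for action in self.actions:
            result.append(action.timestamp - previous)
            previous = action.timestamp
        return result
```

For example, a suggestion shown at t = 100.0 s, accepted at 101.5 s and followed by typing at 104.0 s gives durations of 1.5 s and 2.5 s; how such durations would then weight future predictions is left open here, as the source does not specify it.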