Eurotra

Eurotra was an ambitious machine translation project established and funded by the European Commission from the late 1970s until 1994.

Emboldened by modest success with SYSTRAN, an older, commercially developed machine translation system, a large network of European computational linguists embarked upon the Eurotra project in the hope of creating a state-of-the-art MT system for the then seven, later nine, official languages of the European Community.

However, expectations were tempered as time passed: "Fully Automatic High Quality Translation" proved not to be a reasonably attainable goal. Eurotra's true character was eventually acknowledged to be pre-competitive research rather than prototype development.

The project was motivated by one of the founding principles of the EU: that all citizens had the right to read any and all proceedings of the Commission in their own language. As more countries joined, the number of language pairs involved grew combinatorially, and the need to translate every paper, speech and even set of meeting minutes produced by the EU into the other eight languages meant that translation rapidly became the overwhelming component of the administrative budget. Eurotra was devised to solve this problem.
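The growth in language pairs can be illustrated with a short calculation. The sketch below assumes that every ordered (source, target) combination counts as a separate pair, so n languages give n × (n − 1) pairs; the function name is illustrative only.

```python
def directed_pairs(n_languages: int) -> int:
    """Count ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


# Seven and nine are the language counts mentioned in the article.
for n in (7, 9):
    print(f"{n} languages -> {directed_pairs(n)} translation directions")
```

Under this assumption, going from seven to nine languages raises the number of translation directions from 42 to 72.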

The project was unusual in that, rather than consisting of a single research team, it had member groups of six to twelve distributed around the member countries, with at least one group in each country (Belgium, Greece and the United Kingdom each had two), and an additional secretariat based at the European Commission in Brussels. While this contributed significantly to the culture of the project, it also graphically demonstrated Brooks' assertion in The Mythical Man-Month that adding personnel to a project results in it taking longer to complete: the more groups are involved, the more time is spent on administration and communication rather than on actual research.

The design of the project was also unusual as MT projects go. Older systems, such as SYSTRAN, were heavily dictionary-based, with minor support for rearranging word order. More recent systems have often taken a probabilistic approach based on their source corpora. Eurotra instead addressed the constituent structure of the text to be translated: a first, syntactic parse was followed by a second parse to produce a dependency structure, and then by a final parse with a third grammar to produce what was referred to internally as the Intermediate Representation (IR). Since all three modules were implemented as Prolog programs, it would in principle be possible to run this structure backwards through the corresponding modules for another language to produce a translated text in any of the other languages. However, it is unknown whether this was ever achieved in practice.
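The idea of analysing a sentence into a shared representation and then generating from it in another language can be sketched as follows. This is a toy illustration only, not Eurotra's actual formalism or code: the three stages are reduced to trivial steps, the sentence pattern is fixed to subject, verb, object, and the two small lexicons and all function names are assumptions made for the example.

```python
from typing import Dict

# Hypothetical toy lexicons: surface word -> language-neutral concept label.
EN: Dict[str, str] = {"Japan": "JAPAN", "makes": "MAKE", "computers": "COMPUTER.PL"}
DA: Dict[str, str] = {"Japan": "JAPAN", "laver": "MAKE", "computere": "COMPUTER.PL"}


def analyse(sentence: str, lexicon: Dict[str, str]) -> Dict[str, str]:
    """Run the three toy analysis stages and return an IR-like structure."""
    # Stage 1: constituent structure (toy: fixed subject-verb-object order).
    subj, verb, obj = sentence.split()
    constituents = {"NP_subj": subj, "V": verb, "NP_obj": obj}
    # Stage 2: dependency structure (the verb is the head, the NPs depend on it).
    dependency = {
        "head": constituents["V"],
        "subject": constituents["NP_subj"],
        "object": constituents["NP_obj"],
    }
    # Stage 3: intermediate representation (words replaced by concept labels).
    return {role: lexicon[word] for role, word in dependency.items()}


def generate(ir: Dict[str, str], lexicon: Dict[str, str]) -> str:
    """Run the same stages in reverse using the target language's lexicon."""
    inverse = {concept: word for word, concept in lexicon.items()}
    dependency = {role: inverse[concept] for role, concept in ir.items()}
    return " ".join([dependency["subject"], dependency["head"], dependency["object"]])


def translate(sentence: str, source: Dict[str, str], target: Dict[str, str]) -> str:
    return generate(analyse(sentence, source), target)


print(translate("Japan makes computers", EN, DA))
```

The point of the sketch is the shape of the pipeline: because each language only needs to map to and from the shared representation, adding a language means adding one set of modules rather than one per language pair.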

The first "live" translation occupied a 4 MB MicroVAX running Ultrix and C-Prolog for a complete weekend some time in early 1987. The sentence, translated from English into Danish, was "Japan makes computers". The main problem faced by the system was the generation of so-called "parse forests": often a large number of different grammar rules could be applied to any particular phrase, producing hundreds or even thousands of (often identical) parse trees. This used up huge quantities of memory and slowed the whole process down unnecessarily.
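How quickly the number of parse trees can grow is easy to see in a worst-case toy setting. The sketch below assumes a maximally ambiguous grammar in which any two adjacent phrases may combine into a larger phrase, and counts the distinct binary trees over a sentence of a given length. It is not a model of Eurotra's grammars, and the function name is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_binary_parses(n_words: int) -> int:
    """Count binary parse trees over n words when every split is allowed."""
    if n_words == 1:
        return 1
    # Choose where the top-level split falls, then multiply the number of
    # ways to parse the left part by the number of ways to parse the right.
    return sum(
        count_binary_parses(k) * count_binary_parses(n_words - k)
        for k in range(1, n_words)
    )


for n in (3, 5, 10):
    print(f"{n} words -> {count_binary_parses(n)} parse trees")
```

Under this assumption a ten-word sentence already has 4862 distinct trees, which suggests why storing every tree explicitly was costly on a machine with a few megabytes of memory.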

While Eurotra never delivered a "working" MT system, the project had a far-reaching, long-term impact on the nascent language industries in European member states, in particular in the southern countries of Greece, Italy, Spain, and Portugal. At least one commercial MT system (developed by an academic/commercial consortium in Denmark) is derived from Eurotra technology.