PLuTO | |
Patent Language Translations Online | |
Keywords | Information Retrieval, Machine Translation, Evaluation |
---|---|
Funding Agency | European Union |
Project Type | Type B, Pilot |
Reference | 250416 |
Objective | CIP-ICT-PSP.2009.5.1 - Multilingual Web: Machine translation for the multilingual web |
Participants | CNGL, Dublin City University (Ireland) (coordinator), Information Retrieval Facility (Austria), Cross Language NV (Belgium), ESTEAM AB (Sweden), WON - WERKGEMEENSCHAP OCTROOI-INFORMATIE NEDERLAND (Netherlands) |
Budget | Total: approx. €4.36 million; Funding: approx. €2.18 million |
Duration | 1 April 2010 - 31 March 2013 |
Web Site | http://www.pluto-patenttranslation.eu/ |
Patent Language Translations Online (PLuTO) is a commercial development project funded under the ICT Policy Support Programme that started in April 2010. Over three years of transdisciplinary research and application development, the consortium addresses the growing need for translation services in the patent domain (applications, oppositions, and infringement lawsuits). The aim of the project is to provide an integrated online translation tool that allows human experts (technical and legal specialists, consultants) to take advantage of existing content and of data-driven, adaptable machine translation (MT) tools to collaboratively select and translate patents.
Processing patent documents, in this case translating them, is significantly more difficult than processing regular text (e.g. newswire or web pages) because of the nature of patent language. The translation engine must learn how patent attorneys phrase their texts and produce a translation that makes sense in the target language. The same applies to the retrieval engine: the assumptions about the distribution of terms in a text, which form the basis of statistical IR, do not necessarily hold in the patent domain. One example of such non-standard text is the frequent use of hypernyms followed by long lists of specifications that define the invention.
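As a rough illustration of why term-distribution assumptions matter for retrieval, the sketch below computes a basic TF-IDF weighting over three made-up claim-like sentences. It is not part of PLuTO or of its indexing engine; the function name `tf_idf`, the whitespace tokenisation, and the toy texts are all assumptions chosen for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using raw tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy, invented "claims" (hypothetical text, not taken from MAREC).
docs = [
    "a fastening device comprising a screw a bolt a rivet or a clip",
    "a fastening device comprising a hinge and a latch",
    "a fastening device comprising a magnet",
]
w = tf_idf(docs)
print(w[0]["device"])     # 0.0: the hypernym phrase occurs in every document
print(w[0]["rivet"] > 0)  # True: only the listed specifics carry weight
```

In this toy setting the shared hypernym ("fastening device") receives zero weight, so any match has to come from the enumerated specifics, each of which occurs only once. This is one simple way to see how patent phrasing can interact badly with frequency-based weighting.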
PLuTO will adapt and integrate two mature machine translation solutions (ESTeam Translator[1], MaTrEx[2]) and one indexing engine (SOLR[3]). It uses the MAREC patent data collection.
Another important objective of PLuTO is the evaluation of the machine translation and cross-lingual retrieval components developed in this project. Within the framework of specific use cases, information professionals will test the components and provide feedback. Research on cross-lingual retrieval of patents is part of the broader research fields of cross-language information retrieval and machine translation. The scientific challenge of evaluating cross-language retrieval systems is addressed by the CLEF evaluation campaign[4]; cross-lingual patent information retrieval is covered in the CLEF-IP evaluation exercise[5]. The evaluation of patent machine translation systems is one objective of the respective tracks at NTCIR[6] and the IRF Symposium[7].