Patent Language Translations Online

PLuTO
Patent Language Translations Online
Keywords Information Retrieval, Machine Translation, Evaluation
Funding Agency European Union
Project Type Type B, Pilot
Reference 250416
Objective CIP-ICT-PSP.2009.5.1 - Multilingual Web: Machine translation for the multilingual web
Participants CNGL, Dublin City University (Ireland) (coordinator), Information Retrieval Facility (Austria), Cross Language NV (Belgium), ESTEAM AB (Sweden), WON - WERKGEMEENSCHAP OCTROOI-INFORMATIE NEDERLAND (Netherlands)

Budget Total: approx. €4.36 million

Funding: approx. €2.18 million

Duration 1 April 2010 - 31 March 2013
Web Site http://www.pluto-patenttranslation.eu/

Patent Language Translations Online (PLuTO) is a commercial development project funded under the ICT Policy Support Programme that started in April 2010. Over three years of transdisciplinary research and application development, the consortium addresses the growing need for translation services in the patent domain (applications, oppositions, and infringement lawsuits). The aim of the project is to provide an integrated online translation tool that lets human experts (technical, legal, consultants) take advantage of existing content and of data-driven, adaptable machine translation (MT) tools to collaboratively select and translate patents.

Processing patent documents, in this case translating them, is significantly harder than processing regular text (e.g. newswire or web pages) because of the nature of the language. The translation engine must learn how patent attorneys phrase their texts and produce a translation that makes sense in the target language. The same applies to the retrieval engine: the assumption about the distribution of terms in a text, which is the basis of statistical IR, does not necessarily hold in the patent domain. One example of such non-standard text is the frequent use of hypernyms followed by long lists of specifications that define the invention.
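The point about term distribution can be illustrated with a minimal sketch. The Python below is not part of PLuTO: the two sample sentences and the helper names term_frequencies and hapax_share are invented for illustration only. It counts how often each term occurs and reports the share of distinct terms that occur exactly once. In the newswire-like sample the topical term repeats, whereas in the claim-like sample a hypernym is followed by a list whose members each appear only once, so raw term frequency offers little signal about which terms matter.

```python
import re
from collections import Counter

# Hypothetical toy texts, written for this illustration (not taken from MAREC).
NEWS_SAMPLE = (
    "The bank raised rates. The bank said rates may rise again "
    "as the bank watches inflation."
)
PATENT_SAMPLE = (
    "A fastening means selected from the group consisting of screws, "
    "bolts, rivets, nails, clips, clamps, pins and staples."
)


def term_frequencies(text):
    """Lowercase the text, keep alphabetic tokens, and count each term."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def hapax_share(text):
    """Return the fraction of distinct terms that occur exactly once."""
    counts = term_frequencies(text)
    if not counts:
        return 0.0
    once = sum(1 for n in counts.values() if n == 1)
    return once / len(counts)


if __name__ == "__main__":
    for name, sample in (("news", NEWS_SAMPLE), ("patent", PATENT_SAMPLE)):
        top = term_frequencies(sample).most_common(3)
        print(name, "hapax share:", round(hapax_share(sample), 2), "top terms:", top)
```

Two sentences are far too little to measure anything; the sketch only shows the kind of statistic in which patent-style enumerations can differ from ordinary prose.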

PLuTO will adapt and integrate two mature Machine Translation solutions (ESTeam Translator[1], MaTrEx[2]) and one Indexing Engine (SOLR[3]). It uses the MAREC patent data collection.


Evaluation

Another important objective of PLuTO is the evaluation of the machine translation and cross-lingual retrieval components developed in this project. Within the framework of specific use cases, information professionals will test the components and provide feedback. Research on cross-lingual retrieval of patents is part of the broader research fields of cross-language information retrieval and machine translation. The scientific challenge of evaluating cross-language retrieval systems is addressed by the CLEF evaluation campaign[4]; cross-lingual patent information retrieval is covered in the CLEF-IP evaluation exercise[5]. The evaluation of patent machine translation systems is one objective of the respective tracks at NTCIR[6] and the IRF Symposium[7].

References

Further reading

External links