Language Weaver

Language Weaver is a Los Angeles, California–based company founded in 2002 by Kevin Knight and Daniel Marcu of the University of Southern California to commercialize a statistical approach to automatic language translation and natural language processing, now known globally as statistical machine translation software (SMTS).

Language Weaver’s statistically based translation software is an instance of a recent advance in automated translation. Earlier machine translation technology relied on collections of linguistic rules to analyze the source sentence and then map its syntactic and semantic structure into the target language. Language Weaver instead uses statistical techniques from cryptography, applying machine learning algorithms that automatically acquire statistical models from existing parallel collections of human translations. Because they are learned directly from real translations, these models are more likely to be up to date, appropriate and idiomatic. The software can also be quickly customized to any subject area or style, and it can fully translate previously unseen text.
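The idea of acquiring a statistical model from parallel human translations can be illustrated with a small sketch. The example below is not Language Weaver's implementation, which the article does not describe; it assumes a textbook word-based model (IBM Model 1, trained with expectation-maximization) and a hypothetical three-sentence English-German corpus. The function names train_ibm_model1 and best_translation are invented for this illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(target | source) with EM.

    `pairs` is a list of (source_sentence, target_sentence) strings.
    Returns a dict keyed by (target_word, source_word).
    """
    # Whitespace tokenization is an assumption made to keep the sketch short.
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    target_vocab = {w for _, tgt in corpus for w in tgt}

    # Start from a uniform distribution over the target vocabulary.
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of each source word

        # E-step: distribute each target word over the source words
        # of the same sentence pair, in proportion to the current t.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac

        # M-step: re-normalize the expected counts into probabilities.
        t = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (English source, German target).
toy_corpus = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]
model = train_ibm_model1(toy_corpus)
for word in ("the", "house", "book", "a"):
    print(word, "->", best_translation(model, word))
```

No dictionary or grammar rule is supplied: the word correspondences emerge only from which words co-occur across the sentence pairs. Production systems of the kind the article describes work with phrases rather than single words and with far larger corpora.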

Statistical MT was once thought appropriate only for languages with very large amounts of pre-translated data. However, with new advances in SMT, Language Weaver has also been able to create translation systems for languages with smaller amounts of parallel data. Additionally, with customization, SMT can "learn" to accurately translate highly technical material.

Language Weaver's primary product is its translation software. The company currently offers 24 bi-directional language pairs. These include English to and from French, Italian, Danish, Greek, Spanish, German, Dutch, Portuguese, Swedish, Russian, Czech, Romanian, Polish, Arabic, Persian, Simplified and Traditional Chinese, Korean, and Hindi. Several non-English language pairs are also available, such as Arabic-Spanish, Arabic-French, Spanish-French and French-German.

The current language pairs all utilize phrase-based statistical MT. However, the company is also working on syntax-based statistical MT for certain language pairs to improve the overall translation quality.

Language Weaver can also create customized (domain-specific) language pairs for particular companies. It uses a customer's existing, pre-translated data to "train" a new translation system that statistically models how to translate that customer's information, so new data can be translated in a shorter amount of time and edited as needed prior to publication.

In addition to its primary translation software, Language Weaver has several other products available. Its Alignment Tool is a translation memory generator: users enter previously translated documents and align them at the segment level, producing a translation memory file. The company also offers Customizer, a customization tool that allows users to fine-tune the translation system using small amounts (up to 2 million words) of pre-translated data in a specific subject area. This tool allows for incremental improvements over time and gives users more control of the process.
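Segment-level alignment of the kind described above can be sketched as follows. The article does not say how the Alignment Tool works internally, so this is only an assumed approach: a simplified length-based dynamic program, loosely in the spirit of the Gale-Church method, that pairs segments whose character lengths are similar. The bead penalties, the sentence-splitting rule, and the function names split_segments and align_segments are all hypothetical choices made for the illustration.

```python
import re


def split_segments(text):
    """Split text into sentence-like segments after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


# Allowed alignment "beads": (source segments used, target segments used, penalty).
# The penalty values are arbitrary assumptions that favour 1-to-1 pairings.
BEADS = [(1, 1, 0), (1, 2, 5), (2, 1, 5), (1, 0, 30), (0, 1, 30)]


def align_segments(src, tgt):
    """Align two lists of segments; return a list of (source, target) pairs."""
    inf = float("inf")
    n, m = len(src), len(tgt)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0

    # Fill the table: cost[i][j] is the cheapest way to align the first
    # i source segments with the first j target segments.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = " ".join(src[i:ni])
                t = " ".join(tgt[j:nj])
                # Segments that translate each other tend to have similar length.
                c = cost[i][j] + abs(len(s) - len(t)) + penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j, s, t)

    # Trace the cheapest path back from the end to recover the pairs.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        i, j, s, t = back[i][j]
        pairs.append((s, t))
    pairs.reverse()
    return pairs


# Hypothetical previously translated document pair (English and French).
english = "Open the file. Save your changes before closing the window."
french = "Ouvrez le fichier. Enregistrez vos modifications avant de fermer la fenêtre."

translation_memory = align_segments(split_segments(english), split_segments(french))
for source_segment, target_segment in translation_memory:
    print(source_segment, "|", target_segment)
```

The resulting list of segment pairs plays the role of a translation memory here. A real tool would write the pairs to a translation memory file format and would use a more robust alignment model than raw character-length differences.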
