Omniscien Technologies
Type | Privately held company |
Industry | Localization, eCommerce, Online Research and Publishing, Online Travel, Media Enterprise and Government |
Founder | Gregory Binger, Dion Wiggins, Bob Hayward |
Headquarters | Singapore |
Number of locations | Singapore, Thailand, The Netherlands |
Key people | Andrew Rufener (CEO), Gregory Binger (COO), Dion Wiggins (CTO), Philipp Koehn (Chief Scientist) |
Products | Language Studio™ Language Processing, Machine Translation and Machine Learning Platform |
Services | Automated translation, custom machine translation engines, language processing and machine learning |
Website | http://www.omniscien.com, http://www.languagestudio.com |
Omniscien Technologies (formerly Asia Online) is a privately owned, multinational company delivering services and software for language processing, machine translation and machine learning. The company, led by CEO Andrew Rufener, was founded in 2007 by Prof. Dr. Philipp Koehn, a leading scientist in the field; Gregory Binger, a technologist and IT/IP lawyer; and former Gartner senior analysts Bob Hayward and Dion Wiggins.[1] Omniscien Technologies is headquartered in Singapore and has offices in Zoetermeer, the Netherlands (European and North American sales as well as technical operations) and in Bangkok, Thailand (Asian sales and R&D).
The company provides a range of solutions based on statistical machine translation (SMT) and hybrid neural machine translation (NMT) technology for the localization industry as well as government, eCommerce, online research and publishing, online travel, media and large enterprise customers. Omniscien Technologies currently supports more than 550 language pairs in 13 industry domains.
The company's statistical and neural translation software employs recent advances in automated translation as well as extensive data manufacturing technologies. Until the early 1990s, almost all production-level machine translation technology relied on collections of linguistic rules to analyze the source sentence and then map its syntactic and semantic structure into the target language. The company's current approach instead uses statistical techniques, which have roots in cryptography, and/or neural techniques: machine learning algorithms automatically acquire translation models from existing parallel collections of human translations, in the same way as Google Translate and the systems built with Philipp Koehn's open source Moses toolkit for SMT.
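To make the idea of acquiring a statistical model from parallel text concrete, here is a minimal sketch of IBM Model 1, a standard textbook word-translation model trained with expectation-maximization. It is an illustration only, not Omniscien Technologies' or Moses' implementation; the function names `train_ibm_model1` and `best_translation` and the three-sentence toy corpus are assumptions made for this example.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(target | source) with EM.

    corpus is a list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (target_word, source_word) to a probability.
    """
    tgt_vocab = {t for _, tgt in corpus for t in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Only word pairs that co-occur in some sentence pair get a parameter.
    t_prob = {(t, s): uniform for src, tgt in corpus for s in src for t in tgt}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current probabilities.
        for src, tgt in corpus:
            for t in tgt:
                norm = sum(t_prob[(t, s)] for s in src)
                for s in src:
                    frac = t_prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac
        # M-step: renormalize the expected counts per source word.
        t_prob = {pair: count[pair] / total[pair[1]] for pair in t_prob}
    return t_prob


def best_translation(t_prob, source_word):
    """Return the target word with the highest t(target | source_word)."""
    candidates = {t: p for (t, s), p in t_prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


# Toy parallel corpus (German to English), purely illustrative.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
model = train_ibm_model1(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

No dictionary or grammar rule is supplied: the word correspondences emerge only from which words co-occur across the aligned sentence pairs. Production SMT systems add phrase tables, reordering models and language models on top of this kind of estimate.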
Differences from other approaches
Google, Microsoft, Baidu, KantanMT, SDL, Systran and others have also employed SMT and, more recently, NMT systems, some of them publicly accessible. However, the approaches differ substantially depending on the desired outcome. While the cloud players mainly provide "gist" translation and a few other providers largely aim to do the same within the confines of an enterprise, the SDL, KantanMT and Omniscien Technologies systems concentrate on providing a customized solution. The specific differences in Omniscien Technologies' approach are:
- Clean data: Omniscien Technologies focuses on clean data, in contrast to the traditional approach of harvesting content from corporate websites, news articles and similar web sources where the same content is available in multiple languages but quality is not guaranteed. To keep its data as clean and accurate as possible, the company has invested in both machine and human resources in this area. The data is sourced from high-quality translations provided by book publishers and translation companies, aligned at the segment level (usually sentences) and converted into a consistent format so that it can be processed by the learning software. This step includes extracting segments from files and documents that are not in TMX format. The extracted segments are then aligned and processed by machines, with humans validating the accuracy. The data is converted to UTF-8 encoding for training the SMT system, small subsets are extracted to guide training, and finally the data is reviewed, cleaned, and analyzed.
- Multiple domains: the system allows for training in many domains by extending a base set of information with multiple additional learning sources, including tuning for a specific writing style.
- Real-time feedback loops and unknown term resolution.
- Scalability and control: the system scales up to billions of words per day and allows extensive control over the workflow.
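The data preparation steps in the "Clean data" item above (normalizing to a consistent UTF-8 text form, discarding doubtful segment pairs, and setting aside a small subset to guide training) can be sketched roughly as follows. This is a hedged illustration, not the company's actual pipeline: the helper names `normalize`, `clean_pairs` and `split_tuning_subset`, the length-ratio threshold of 3.0 and the sample data are all assumptions.

```python
import random
import unicodedata


def normalize(segment):
    """Decode bytes as UTF-8, apply Unicode NFC and collapse whitespace."""
    if isinstance(segment, bytes):
        segment = segment.decode("utf-8", errors="replace")
    segment = unicodedata.normalize("NFC", segment)
    return " ".join(segment.split())


def clean_pairs(pairs, max_ratio=3.0):
    """Keep aligned (source, target) pairs that pass simple sanity checks.

    max_ratio is a hypothetical threshold on the word-count ratio; a very
    lopsided pair is often a sign of a misaligned segment.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths differ too much
        if (src, tgt) in seen:
            continue  # exact duplicate after normalization
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


def split_tuning_subset(pairs, tuning_size, seed=0):
    """Hold out a small random subset to guide training."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[tuning_size:], shuffled[:tuning_size]


# Illustrative input only; real data would come from TMX or extracted files.
raw = [
    ("Das  Haus ist klein.", "The house is small."),
    ("Das Haus ist klein.", "The house is small."),
    ("Ja.", "Yes, of course, I would be very happy to do that."),
    ("", "Empty source."),
    (b"Guten Morgen", "Good morning"),
]
cleaned = clean_pairs(raw)
train, tune = split_tuning_subset(cleaned, tuning_size=1)
```

Automatic filters like these only catch mechanical problems; the human validation described above is still needed to judge whether a surviving pair is an accurate translation.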
Languages
The company currently has more than 550 language pairs available in a baseline form and is progressively deploying 13 domains across each language pair. In addition, Omniscien Technologies offers more than 160 Industry Engines that can be used "off the shelf". Language coverage includes all major European languages, Middle Eastern and Asian languages as well as a range of African languages.
Further reading
Machine translation, although made accessible to a large number of users by the cloud players, remains a complex field of expertise. The following articles and webinars discuss some of the challenges:
- Comparison of machine translation applications
- Combined Spoken Language Translation by Markus Freitag, Joern Wuebker, Stephan Peitz, Hermann Ney, Matthias Huck, Alexandra Birch, Nadir Durrani, Philipp Koehn, Mohammed Mediani, Isabel Slawik, Jan Niehues, Eunah Cho, Alex Waibel, Nicola Bertoldi, Mauro Cettolo and Marcello Federico, Proceedings of the International Workshop on Spoken Language Translation (IWSLT), 2014
- The Impact of Machine Translation Quality on Human Post-editing by Philipp Koehn and Ulrich Germann, Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation, 2014
- Improving Machine Translation via Triangulation and Transliteration by Nadir Durrani and Philipp Koehn, Proceedings of 17th Annual Conference of the European Association for Machine Translation, 2014
- Omniscien Technologies' Webinar Series
References
- ↑ https://www.omniscien.com
External links
- Omniscien Technologies Homepage
- Globalization & Localization Association (GALA)
- TAUS
- MultiLingual
- The Language Industry