Talk:Machine translation

If you guys don't mind, I'm going to make some changes to the history section of this article some time in the next week. I'm going to add some material about the work of David G. Hays. He led the MT effort at RAND back in the 50s and 60s, was one of the authors of the ALPAC report, wrote the first textbook in computational linguistics, was instrumental in founding the Association for Computational Linguistics and so forth. I'll be adding an article about Hays himself, then I'll link it to this article and, as I said, make some changes to the history section. Bill 21:12, 16 March 2006 (UTC)


What is all this new stuff in the last two weeks? Most of it has little or no value and none of it makes a hint of sense in an encyclopedia. A lot of it is copyright violation, and even if it wasn't it still doesn't belong here in that form. Deleted most all of it, couldn't figure out exactly which version to revert to. Diderot 19:49, 6 May 2004 (UTC)

Fact tagged

  • The PaTrans system requires both manual pre- and post-editing, but the monthly output is still approximately 400,000 words per operator.[citation needed]

Suggestion

Add "Speech Recognition" and "Computational Linguistics" Wikimedia links under "See Also"

Create pages for each of the following organizations (with external links) under "See Also" and linked to "Machine Translation" page.

International Association for Machine Translation (IAMT) http://www.isi.edu/natural-language/organizations/IAMT-bylaws.html

Association for Machine Translation in the Americas (AMTA) http://www.amtaweb.org/

European Association for Machine Translation (EAMT) http://www.eamt.org/

Asia-Pacific Association for Machine Translation (AAMT) http://www.aamt.info/

Association for Computational Linguistics http://www.aclweb.org/

Under "See also" I added:

  • Cross-Translation Tool comparing SYSTRAN-powered services (Babel Fish, Google Language, etc.) with other translation services

You may wish to remove it, but it seemed to need addressing: the whole category of language translators available freely and easily on the web, without need of so much as a download. My thanks for the indulgence is sincere.



Examples

If no one objects, I'm going to make the Examples into a separate page, maybe Examples of Machine Translation. In my opinion they clutter up the page a bit too much. - FrancisTyers 17:02, 9 October 2005 (UTC)

Moved to Examples of SYSTRAN machine translation. - FrancisTyers 17:03, 16 October 2005 (UTC)


History

Does anyone else think this is a joke?

Although there is no system that provides the holy-grail of "Fully automatic high quality machine translation" (FAHQMT), many systems provide reasonable output.

FAHQ MT? Who came up with this acronym? Isn't there a less dramatic and confusing way to summarize that machine translation isn't as high quality as human translation, but produces practical results? - enjone

Language Weaver

Language Weaver, developed in 2005, already translates general text from Spanish to English with a high degree of accuracy, rivaling that of weaker human translators.

I'd love to see a source for this. What metrics they used for the evaluation etc. I don't doubt that LW is a damned fine product (it has Kevin Knight working on it!) :), but for such a large claim we'd need some source. - FrancisTyers 09:11, 1 March 2006 (UTC)

Commercial software

Edited on 20:03, 20 February 2006 FrancisTyers (→Commercial software - google trans is just repackaged systran)

Not anymore. The one I quoted: http://translate.google.com/translate_t was developed by Franz Och in Google Labs: http://googleblog.blogspot.com/2005/08/machines-do-translating.html

Galilite, 1 March 2006

Cool! :) Feel free to re-add this, it might be worth noting on the page that this is an example of a stat mt package available online? - FrancisTyers 12:50, 1 March 2006 (UTC)
I've done some testing and read the links and I remain unconvinced, at least for the language pairs that are not labelled BETA:
This is a paste from the BBC "This bill brings forward root-and-branch reform I promised ensuring we have a far more comprehensive and co-ordinated system.":
  • SYSTRAN:
« Cette facture apporte la réforme vers l'avant de racine-et-branche que je nous ai promis s'assurant ai un système bien plus complet et plus coordonné. »
  • GOOGLE:
"cette facture apporte la réforme vers l'avant de racine-et-branche que je nous ai promis s'assurant ai un système bien plus complet et plus coordonné."
  • BABELFISH:
"cette facture apporte la réforme vers l'avant de racine-et-branche que je nous ai promis s'assurant ai un système bien plus complet et plus coordonné."
It doesn't specifically say on that googleblog link you pasted that translate.google.com is using this statistical method for all translations. I suspect highly that the ones labelled "BETA" on translate.google.com are using the statistical methods. Perhaps this could be pointed out. - FrancisTyers 13:03, 1 March 2006 (UTC)
FrancisTyers, thanks for welcoming me, correcting me and testing their system. I suspect their announcement and the examples with Al Qaeda material were pure publicity. Maybe their own MT hasn't left the lab yet. I suggest that, instead of linking to their system, we quote their blog in another section; after all, it seems a significant development. In general, great job cleaning up all this stuff, thanks! - Galilite 09:28, 2 March 2006 (UTC)
No problem, this article needs a lot of work, and when I've got some more time and experience I'm going to have a shot at it. If you are interested in machine translation, there are loads of papers at http://www.mt-archive.info :) - FrancisTyers 10:43, 2 March 2006 (UTC)
Yep, actually I'm the one who added it to the list :-) . Galilite 21:36, 2 March 2006 (UTC)
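
For anyone who wants to repeat the comparison above without eyeballing it, here is a minimal sketch in Python. It only normalises case, surrounding quotation marks and whitespace before comparing the three outputs pasted in this thread; the function name normalize and the dictionary layout are my own choices for illustration, not part of any of these services.

```python
import unicodedata


def normalize(text: str) -> str:
    """Normalise Unicode form, drop surrounding quotes/guillemets,
    lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.strip().strip('"«»').strip()
    return " ".join(text.lower().split())


# The three outputs exactly as pasted in the discussion above.
outputs = {
    "SYSTRAN": "« Cette facture apporte la réforme vers l'avant de racine-et-branche que je nous ai promis s'assurant ai un système bien plus complet et plus coordonné. »",
    "GOOGLE": "\"cette facture apporte la réforme vers l'avant de racine-et-branche que je nous ai promis s'assurant ai un système bien plus complet et plus coordonné.\"",
    "BABELFISH": "\"cette facture apporte la réforme vers l'avant de racine-et-branche que je nous ai promis s'assurant ai un système bien plus complet et plus coordonné.\"",
}

# If every service produced the same text, the set has exactly one element.
distinct = {normalize(t) for t in outputs.values()}
print("identical output" if len(distinct) == 1 else "outputs differ")
```

Identical output on one sentence is consistent with a shared engine for that language pair, but it does not prove it; more sentences and more language pairs would be needed.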


External links

Time to tidy up the external links section. More coming soon... Note Wikipedia:External links - FrancisTyers 14:51, 31 March 2006 (UTC)

I've made a start, if anyone disagrees, feel free to comment below. - FrancisTyers 15:11, 31 March 2006 (UTC)
  • Qwika — a multiple language search engine of Wikipedias — in beta as of Feb 17,2006
Delete, Not specifically related to machine translation. - FrancisTyers 14:54, 31 March 2006 (UTC)
Delete, Not the place for advertising this. - FrancisTyers 14:54, 31 March 2006 (UTC)
Keep, the MTBOOK is good. - FrancisTyers 14:54, 31 March 2006 (UTC)
Keep, Invaluable resource. - FrancisTyers 14:54, 31 March 2006 (UTC)
Delete, Unless anyone can give a better reason, mt-archive has most of this stuff I think. - FrancisTyers 14:58, 31 March 2006 (UTC)
Francis, first - great job tidying it up, but - John Hutchins is (the only) de-facto chronicler of MT and deserves a special entry. This site is a separate one and many of its publications do not appear in the MT archive. Galilite 00:12, 5 April 2006 (UTC)
I checked out the site again, and I agree with you, I'll restore it. I didn't realise it had different stuff from the mt-archive site :) And I agree, he should have his own article, I keep meaning to write one but am having trouble finding any biographical information. - FrancisTyers 09:46, 5 April 2006 (UTC)
  • European Association for Machine Translation: EAMT, non-profit org
Delete, Not really necessary, as we have a page. - FrancisTyers 14:54, 31 March 2006 (UTC)
  • Association for Machine Translation in the Americas: AMTA, non-profit org
Delete, As above, if we don't already have a page we should have. - FrancisTyers 14:54, 31 March 2006 (UTC)
Delete, Crystal ball gazing mostly :) - FrancisTyers 14:58, 31 March 2006 (UTC)
Delete, Although the article probably has useful stuff that could make this article better. - FrancisTyers 14:58, 31 March 2006 (UTC)
Delete, Not particularly informative. - FrancisTyers 15:11, 31 March 2006 (UTC)
Delete, Humorous maybe, but not encyclopaedic. - FrancisTyers 14:58, 31 March 2006 (UTC)
Keep, For now... - FrancisTyers 14:58, 31 March 2006 (UTC)
Delete, Might be good to have an OpenDir link maybe? - FrancisTyers 14:58, 31 March 2006 (UTC)
Ehm... This is an official compendium of EAMT, the most influential MT body. I think OpenDir would be less comprehensive. - Galilite 00:12, 5 April 2006 (UTC)
I'd tend to agree, but unfortunately you have to pay for it :( Btw, will you be attending the conference in Norway? - FrancisTyers 09:46, 5 April 2006 (UTC)
Didn't notice that, sorry. Nope, I'm located down under, a bit too far. I am not affiliated with an academic institution, rather trying to get into the commercial MT industry... - Galilite 00:15, 6 April 2006 (UTC)

Image

I created an image from a drawing in John Hutchins' Introduction to Machine Translation. It is nice to have an image in the article, but I'm not sure how much it adds :) Btw, anyone can edit that image because it is created as an SVG; there is a free software program to edit it, see Inkscape. - FrancisTyers 10:23, 5 April 2006 (UTC)

Picture is worth a thousand words, and IMHO it belongs here. - Galilite 23:44, 5 April 2006 (UTC)

Footnotes

There are many kinds of footnotes, I prefer Footnotes3, so that is what I shall be using. If you wish to make some substantial contributions to the article please select whichever footnotes system you prefer. Please don't make edits which just change the footnotes system. Thanks >___> - FrancisTyers 08:39, 24 April 2006 (UTC)

Unfortunately, what you are saying goes against the established Wikipedia policy WP:OWN, which states that no one owns any articles, not even by virtue of being the major or even sole contributor. Please refer to the text at the bottom of every edit page on Wikipedia: If you don't want your writing to be edited mercilessly or redistributed by others, do not submit it. Although maybe you did write a lot of this article, which referencing style you prefer isn't really relevant because of WP:OWN. Cite.php is demonstrably better and the majority of editors prefer it as a referencing style. --Cyde Weys 06:01, 25 April 2006 (UTC)

Thanks Cyde, I wasn't trying to be an asshole, I was just stating my preference, feel free to change it, but I will change it back when I am working on it. I don't think the majority of editors prefer it. - FrancisTyers 07:47, 25 April 2006 (UTC)
I just added an HTML comment requesting people check on the talk page before changing the citation format; IMO, there's no need to alter the format on articles which have a regular editor who prefers another format; there are still thousands of articles where no-one actively prefers the current format, and to me, it's better to work on those first. Nice article, btw. JesseW, the juggling janitor 23:59, 26 April 2006 (UTC)
Thanks, I've rewritten the History section, and I'm hoping to do the rest when I get some more time :) - FrancisTyers 00:09, 27 April 2006 (UTC)
True, it'll be quite a while yet before we reach the situation where there are just a few "holdouts" of the older referencing style to figure out what to do with. But this article's near the top of the first page of Special:Whatlinkshere/Template:Ref, so it's probably going to see a lot of wikignomes like me stumbling across it until then (I was half a second away from clicking "save page" with the ref formatting updated when I saw the HTML comment myself). May lead to a lot of unfortunate reverting, all done in good faith. Bryan 07:22, 10 May 2006 (UTC)
Yeah, I have been in discussion with the developer to try and work a way around it, and other users have too, see the talk page on Wikipedia_talk:Footnotes. - FrancisTyers 12:19, 10 May 2006 (UTC)

Removed history section

The first attempts at machine translation were conducted after World War II. It was assumed at this time that the newly invented computers would have no trouble in translating texts. The reasoning was that computers were able to do complex mathematics quickly, something that humans did with more difficulty. On the other hand, even young children were able to learn to understand human language; therefore, computers could do the same. In actual fact, this belief was soon shown to be incorrect.

On 7 January 1954, the Georgetown-IBM experiment, the first public demonstration of an MT system, was held in New York at the head office of IBM. The demonstration was widely reported in the newspapers and received much public interest. The system itself, however, was no more than what today would be called a "toy" system, having just 250 words and translating just 49 carefully selected Russian sentences into English, mainly in the field of chemistry. Nevertheless it encouraged the view that MT was imminent, and in particular stimulated the financing of MT research, not just in the US but worldwide.

The first serious MT systems were used during the Cold War to parse texts in Russian scientific journals. The rough translations produced were sufficient to understand the "gist" of the articles. If an article discussed a subject deemed to be of security interest, it was sent to a human translator for a complete translation; if not, it was discarded. The governmental support was however cut down in 1966, after the report of ALPAC, a committee established in order to review the investments, which considered that machine translation, despite the expenses, was not likely to reach the quality of a human translator.

Although the ALPAC report had tremendous impact on research in machine translation, there were notable exceptions; SYSTRAN, for example, managed to attract commercial and defence/security customers and survived the decrease of direct governmental funding. Limited field of use systems have also been successful in a number of specialized applications, for instance the METEO System has been used in Canada since 1977 to translate weather forecasts from English to French and now translates close to 80,000 words a day or 30 million words a year.

The advent of low-cost and more powerful computers towards the end of the 20th century brought MT to the masses, as did the availability of sites on the Internet. They are of particular interest to countries in East Asia wishing to export to the North American and European markets.

Much of the effort previously spent on MT research, however, has shifted to the development of computer-assisted translation (CAT) systems, such as translation memories, which are seen to be more successful and profitable. Although the two concepts are similar, machine translation (MT) should not be confused with computer-assisted translation (CAT) (also known as machine-assisted translation (MAT)).

In machine translation, the translator supports the machine, that is to say that the computer or program translates the text, which is then edited by the translator, whereas in computer-assisted translation, the computer program supports the translator, who translates the text himself, making all the essential decisions involved.

Removed for new sub article, feel free to merge in stuff from this. - FrancisTyers 15:00, 25 April 2006 (UTC)

Removed from Users

It has been reported that in April 2003 Microsoft began using a hybrid MT system for the translation of a database of technical support documents from English to Spanish. The system was developed internally by Microsoft's Natural Language Research group. The group is currently testing an English-Japanese system as well as bringing English-French and English-German systems online. The latter two systems use a learned language generation component, whereas the first two have manually developed generation components. The systems were developed and trained using translation memory databases with over a million sentences each.

Probably true, but I'd like to see a citation. - FrancisTyers · 13:38, 7 June 2006 (UTC)

Interesting fact

I have copied the following from talk:Sans-culottes. It's not relevant there. It might be relevant here. - Jmabel | Talk 21:39, 11 September 2006 (UTC)

[Begin copied text]

"Culottes" is also French for "knickers" or "panties":

http://babelfish.altavista.com/tr?doit=done&intl=1&tt=urltext&trtext=panties&lp=en_fr&btnTrTxt=Translate ("panties" translated into French)

"sans" means "Without". —The preceding unsigned comment was added by 86.132.47.53 (talk • contribs) 8 September 2006.

[End copied text]

Standardized English Wikipedia

I have a suggestion for a machine-translatable Wikipedia. The current state of machine translation is that it is not possible to reliably translate natural language. In fact, complete grammars of natural languages have yet to be written.

However, a basic English grammar that fulfills the purposes of most communication would be easy to write in a few dozen augmented context-free rules. The syntax would be unambiguous and therefore easier to parse reliably by machine. And an unambiguous vocabulary could be developed. This has been done before, for example, by Caterpillar and Xerox, and some others.

Once this grammar and vocabulary were developed, it could be used to start a new standardized English Wikipedia, something like std.wikipedia.org. This Wikipedia would have an automatic standardized grammar checker built-in so that edits would have to be grammatical in order to be saved. That is not as difficult as it sounds, since as an automatically grammar-checked page it can provide grammar hints.

In addition, the vocabulary to the language could be extensible by user suggestion, so that proper names can be added.

This version of Wikipedia would also be somewhat more resistant to idle vandalism.

Once this new Standardized English Wikipedia were in place, it should be possible to develop a program to automatically and (mostly) reliably translate between English and other languages. That is my suggestion. LaggedOnUser 17:05, 1 October 2006 (UTC)

If only it were that easy… --Sabik 15:29, 7 November 2006 (UTC)
Something like the Voice of America Special English broadcasts that use a carefully chosen restricted vocabulary. For general use, I think machine translation is best for someone who has had a year or two of the foreign language that the machine is translating. That way you can compare the text yourself so you won't be tricked by a "poison cookie" the machine messes up on. Good Wikipedia articles are better than average as targets for machine translation since they are usually fairly well written and don't have overly complex grammar. For example, I've tried using Google Translate's Arabic -> English translation engine. The results are mostly understandable but I wouldn't rely on it since I haven't studied Arabic. However, I did take a year of Russian and feel confident enough that I could puzzle out where the machine messes up, so I would be much more confident using machine translation from Russian. DavidCowhig 06:04, 2 January 2007 (UTC)
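
To make the restricted-grammar proposal in this thread a bit more concrete, here is a toy sketch in Python of a "save only if grammatical" check. Everything in it is hypothetical: the six-word LEXICON, the two rules (S -> NP V [NP], NP -> Det N) and the function names are invented for illustration, and a usable controlled language would need far more rules and vocabulary than this.

```python
# Hypothetical controlled vocabulary: each word has exactly one category,
# so tagging is unambiguous by construction.
LEXICON = {
    "the": "Det", "a": "Det",
    "computer": "N", "translator": "N", "text": "N",
    "translates": "V", "reads": "V",
}


def tag(sentence):
    """Return (tags, unknown_words) for a sentence."""
    words = sentence.lower().rstrip(".").split()
    tags, unknown = [], []
    for word in words:
        if word in LEXICON:
            tags.append(LEXICON[word])
        else:
            unknown.append(word)
    return tags, unknown


def parse_np(tags, i):
    """NP -> Det N. Return the index after the NP, or None if no NP starts at i."""
    if tags[i:i + 2] == ["Det", "N"]:
        return i + 2
    return None


def parse_s(tags):
    """S -> NP V | NP V NP. True only if the whole tag sequence is consumed."""
    i = parse_np(tags, 0)
    if i is None or i >= len(tags) or tags[i] != "V":
        return False
    i += 1
    if i == len(tags):
        return True
    return parse_np(tags, i) == len(tags)


def check(sentence):
    """Return (ok, hint); the hint is what an editor would see on a rejected save."""
    tags, unknown = tag(sentence)
    if unknown:
        return False, "unknown words: " + ", ".join(unknown)
    if not parse_s(tags):
        return False, "sentence does not match S -> NP V [NP]"
    return True, "ok"


print(check("The computer translates the text."))
print(check("The computer translates quickly."))
```

The second call is rejected with a hint naming the unknown word, which is roughly the kind of feedback the proposal describes. The hard part, as the replies suggest, is scaling the vocabulary and rules up to real articles.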

Updated references on Google Translate and Machine Translation

Updated references on Google Translate and machine translation. DavidCowhig 05:53, 2 January 2007 (UTC)

Good afternoon, Nádia!

Tomorrow I will send you, by priority mail, a letter from Deco Pró teste following a complaint made by a customer.

The current situation is as follows:

The orders for the products that are out of stock have already been placed with the supplier by Rosa Maria and Graça Henriques; we are awaiting their arrival so that we can deliver them.

If you need any further information, I am at your disposal.

Sincerely,

Marta Martins

Rare language

It would be useful to say if there is a free MT project somewhere, and what the cost of building such a system for one pair of languages could be (for instance in terms of the capital of a company doing the job). My underlying question is: "Is it reasonable to expect/promote an MT system for rare languages, and if yes, under which economic model (community, proprietary)?" (I live in Mongolia.) --Henri de Solages 12:46, 27 June 2007 (UTC)

Hi, I've replied on your talk page. - Francis Tyers · 15:10, 27 June 2007 (UTC)

Incorrect theory presented

Article currently states:

> The translation process may be stated as:

  1. Decoding the meaning of the source text; and
  2. Re-encoding this meaning in the target language.

Behind this ostensibly simple procedure lies a complex cognitive operation. <

I think this is false. I can't make much sense of "decode"; what it should mean exactly is quite blurred. In reality, machine translation usually works like this:

  1. Translate from input natural language A to common artificial language X; and
  2. Translate from common artificial language X to output natural language B

(where X may be Esperanto or another construct)

Not that it matters much, without true AI the net result will always be like "bard in, junk out". 82.131.210.162 18:47, 20 July 2007 (UTC)
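
For readers unfamiliar with the pivot scheme described in this comment (A to X, then X to B), here is a toy sketch in Python. It is word-for-word and uses made-up concept labels as the "artificial language X"; the tables EN_TO_X and X_TO_FR and the function names are invented for illustration and do not describe how any particular MT system works.

```python
# Hypothetical pivot vocabulary: source words map to language-neutral
# concept labels, and concept labels map to target words.
EN_TO_X = {"the": "DEF", "cat": "CAT", "sleeps": "SLEEP"}
X_TO_FR = {"DEF": "le", "CAT": "chat", "SLEEP": "dort"}


def analyse(sentence, to_pivot):
    """Step 1: source language A -> pivot representation X."""
    return [to_pivot[word] for word in sentence.lower().split()]


def generate(concepts, from_pivot):
    """Step 2: pivot representation X -> target language B."""
    return " ".join(from_pivot[concept] for concept in concepts)


def translate(sentence):
    return generate(analyse(sentence, EN_TO_X), X_TO_FR)


print(translate("The cat sleeps"))  # le chat dort
```

The appeal of this design is that adding a language needs only one table into X and one out of X, rather than one table per language pair. The toy also shows the limitation: anything outside the tables, or anything that depends on word order, agreement or context, simply fails.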

Hello, how are you? - Preceding unsigned comment added by 200.121.66.43 (talk) 22:02, 28 September 2007 (UTC)


Does anybody know of any other MT applications which can translate English to Hebrew other than Babylon and 1-800-translate?

If you do, please let me know. Acidburn24m 05:18, 16 November 2007 (UTC)

Problems with machine translation

This entry on Language Log gives a cautionary example of what can go wrong when translating from Chinese to English! -- Arwel (talk) 22:59, 9 December 2007 (UTC)

Acronyms

MT is not an acronym. It is an abbreviation. An acronym is a word that is formed from the initial letters of other words (e.g. LASER, RADAR).

Stephen Shaw (talk) 00:34, 23 March 2008 (UTC)

References

There is only one book listed in the references and that is from 1992. New doesn't mean better, but I am sure there have been books written after that. Is there a book that goes through all the steps of building a simple translator between two languages (say English and French or English and Latin)? The books seem to talk about generalities rather than a specific implementation. If I am a beginner who wants to build a translator between English and some new language, it would be really helpful to have a working program to tweak. If you know of any such book, please let me know. Thanks. Kanfoo (talk) 19:35, 24 April 2008 (UTC)