International e-mail

International E-mail (IDN E-mail or Intl E-mail) is e-mail that contains international characters, encoded as UTF-8, in the e-mail header; that is, characters that do not exist in the ASCII character set. Its most significant aspect is that it allows e-mail addresses (also known as e-mail identities) in any language, at both the interface and transport levels.

International E-mail Address

Traditional e-mail addresses are limited to characters from the ASCII character set.[1] It is therefore impossible to use non-ASCII Unicode characters in a traditional e-mail address. This confines every e-mail address in the world to the letters of the English alphabet, the digits, and a few special characters such as those from the set { !, #, $, %, &, ', *, +, -, /, =, ?, ^, _, `, ., {, |, }, ~ }. This is a rather awkward situation for people who do not speak English.

The following are valid traditional e-mail addresses:

  Abc@example.com                                (English, ASCII)
  Abc.123@example.com                            (English, ASCII)
  user+mailbox/department=shipping@example.com   (English, ASCII)
  !#$%&'*+-/=?^_`.{|}~@example.com               (English, ASCII)
  "Abc@def"@example.com                          (English, ASCII)
  "Fred Bloggs"@example.com                      (English, ASCII)
  "Joe.\\Blow"@example.com                       (English, ASCII)
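The character restriction on the unquoted form can be sketched in Python. This is a minimal illustration rather than a full address parser: the helper name `is_plain_ascii_local_part` is hypothetical, and quoted local parts such as "Fred Bloggs" are deliberately out of scope.

```python
import string

# Characters permitted in an unquoted local part besides the dot separator:
# ASCII letters, digits, and the special characters listed above.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_plain_ascii_local_part(local: str) -> bool:
    """Return True if `local` is a dot-separated run of allowed ASCII characters.

    Hypothetical helper for illustration only; quoted local parts are not handled.
    """
    if not local:
        return False
    # Dots may separate groups of characters, but every group must be non-empty.
    return all(part and set(part) <= ATEXT for part in local.split("."))


print(is_plain_ascii_local_part("Abc.123"))
print(is_plain_ascii_local_part("user+mailbox/department=shipping"))
print(is_plain_ascii_local_part("юзер"))
```

The first two calls accept the local parts of addresses from the list above; the last one rejects a Cyrillic local part because its characters fall outside ASCII.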

International E-mail, however, uses UTF-8 to encode the text in addresses and headers, so e-mail can be composed and transported to addresses in any language.[2] The following are all valid international e-mail addresses:

   伊昭傑@郵件.商務                                 (Chinese, Unicode)
   राम@मोहन.ईन्फो                                    (Hindi, Unicode)
   юзер@екзампл.ком                            (Ukrainian, Unicode)
   θσερ@εχαμπλε.ψομ                              (Greek, Unicode)
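As a rough illustration of what UTF-8 encoding means for such an address, the Python sketch below takes one of the example addresses, confirms that it is not pure ASCII, and shows the bytes that would represent it. The variable names are illustrative only.

```python
address = "юзер@екзампл.ком"

# A traditional address must be pure ASCII; this one is not.
print(address.isascii())

# In UTF-8 each Cyrillic letter occupies two bytes, while "@" and "."
# occupy one byte each, so the byte length exceeds the character count.
encoded = address.encode("utf-8")
print(len(address), len(encoded))

# Decoding the bytes recovers the original address unchanged.
print(encoded.decode("utf-8") == address)
```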

Traditional E-mail Addresses and Identity

Imagine a native Russian speaker who doesn't know any English. The Russian language, like several other Slavic languages, is written using characters from the Cyrillic alphabet:

   а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я 

Using an identifier composed of characters from this alphabet, or script, would be far more natural to a native Russian speaker than using characters from the English alphabet. A Russian speaker might wish to use дерек@екзампил.ком as an identifier; however, since traditional e-mail identifiers are confined to the characters of the English (Roman) script, that person is forced to use an identifier in a form that is awkward for them. In other words, the Russian speaker might be forced to use a Roman transcription of the native identifier, such as derek@example.com, or some completely unrelated Roman identifier instead.

As a result, either e-mail users are forced to identify themselves in a script that may not be native to them (since only Roman script characters are traditionally allowed), or programmers of e-mail systems must compensate by converting identifiers from non-English scripts to the English script and back again at the user interface layer, using sophisticated and unconventional conversion processes.
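One widely used conversion of this kind, applied to domain names rather than to whole addresses, is the ASCII-compatible encoding of internationalized domain names. The sketch below uses Python's built-in `idna` codec purely as an illustration; the text above does not say which conversion processes e-mail systems use, so treating this codec as representative is an assumption, and the local part of the address is left out entirely.

```python
domain = "екзампл.ком"

# Convert each label of the domain to its ASCII-compatible form ("xn--...").
ascii_domain = domain.encode("idna")
print(ascii_domain)

# The conversion is reversible, so a user interface can display the native
# form while the ASCII form is used underneath.
print(ascii_domain.decode("idna"))
```

The user only ever sees the Cyrillic form, while the ASCII form exists solely to satisfy the ASCII-only constraint.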

UTF-8 Headers

The traditional format for e-mail headers does allow non-ASCII characters in the value portion of a header by means of the MIME encoded word, but including them requires extra processing to convert the data to and from its encoded-word representation. Including international characters in these fields directly as UTF-8 eliminates this extra processing, and also removes the need to transmit additional charset information, because UTF-8 is assumed implicitly.
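The difference between the two representations can be sketched with Python's standard `email.header` module. The subject text is an arbitrary example, and the snippet only contrasts the two forms; it does not implement any part of the international e-mail drafts.

```python
from email.header import Header, decode_header

subject = "Привет"

# Traditional form: the text is wrapped in a MIME encoded word that names
# its charset and consists only of ASCII characters.
encoded_word = Header(subject, charset="utf-8").encode()
print(encoded_word)

# The receiver has to parse the encoded word and decode the payload with
# the charset that the encoded word names.
raw, charset = decode_header(encoded_word)[0]
print(raw.decode(charset))

# UTF-8 header form: the same text is carried directly as UTF-8 bytes,
# with no wrapper and no per-header charset label.
print(subject.encode("utf-8"))
```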

Interoperability via Downgrading

Since traditional e-mail standards constrain all e-mail header values to ASCII-only characters, the presence of UTF-8 characters in e-mail headers could decrease the stability and reliability of transporting such e-mail. This is because most, if not all, e-mail servers do not support these characters at the time of this writing.

Members of the IETF have proposed a method by which e-mail can be downgraded into the "legacy" all-ASCII format, which all standard e-mail servers should support. This downgrade mechanism fulfills the requirement that e-mail transport be as robust and reliable as possible.

Origin

The E-mail Address Internationalization (EAI) working group of the Internet Engineering Task Force (IETF) is currently finalizing Internet Drafts for International E-mail. These drafts specify changes to the current format of e-mail messages and to the protocols used for transporting them, and so affect the way e-mail messages are actually delivered from sender to recipient. Once finalized, the drafts will be published as Internet RFCs.

Protocol Extensions

SMTP

POP

IMAP

References

  1. ^ RFC 2822: Internet Message Format
  2. ^ RFC 4952: Overview and Framework for Internationalized Email
