Internationalized domain name

Example of Greek IDN with domain name in non-Latin alphabet: ουτοπία.δπθ.gr

An internationalized domain name (IDN) is an Internet domain name that contains at least one label that is displayed in software applications, in whole or in part, in a language-specific script or alphabet, such as Arabic, Chinese, Cyrillic, Tamil or Hebrew, or in Latin alphabet-based characters with diacritics or ligatures, as used in French. These writing systems are encoded by computers in multi-byte Unicode. Internationalized domain names are stored in the Domain Name System as ASCII strings using Punycode transcription.

The Domain Name System, which performs a lookup service to translate user-friendly names into network addresses for locating Internet resources, is restricted in practice[1] to the use of ASCII characters, a practical limitation that initially set the standard for acceptable domain names. The internationalization of domain names is a technical solution to translate names written in language-native scripts into an ASCII text representation that is compatible with the Domain Name System. Internationalized domain names can only be used with applications that are specifically designed for such use; they require no changes in the infrastructure of the Internet.

IDN was originally proposed in December 1996 by Martin Dürst[2][3] and implemented in 1998 by Tan Juay Kwang and Leong Kok Yong under the guidance of Tan Tin Wee. After much debate and many competing proposals, a system called Internationalizing Domain Names in Applications (IDNA)[4] was adopted as a standard, and has been implemented in several top-level domains.

In IDNA, the term internationalized domain name means specifically any domain name consisting only of labels to which the IDNA ToASCII algorithm (see below) can be successfully applied. In March 2008, the IETF formed a new IDN working group to update[5] the current IDNA protocol.

In October 2009, the Internet Corporation for Assigned Names and Numbers (ICANN) approved the creation of internationalized country code top-level domains (IDN ccTLDs) in the Internet that use the IDNA standard for native language scripts.[6][7] In May 2010, the first IDN ccTLDs were installed in the DNS root zone.[8]

Internationalizing Domain Names in Applications

Internationalizing Domain Names in Applications (IDNA) is a mechanism defined in 2003 for handling internationalized domain names containing non-ASCII characters. These names either contain Latin letters with diacritics (ñ, é) or are written in languages or scripts which do not use the Latin alphabet: Arabic, Hangul, Hiragana and Kanji, for instance. Although the Domain Name System supports non-ASCII characters, applications such as e-mail clients and web browsers restrict the characters which can be used in domain names for purposes such as a hostname. Strictly speaking, these restrictions come from the network protocols the applications use, not from the applications or the DNS itself. To retain backwards compatibility with the installed base, the IETF IDNA Working Group decided that internationalized domain names should be converted to a suitable ASCII-based form that could be handled by web browsers and other user applications. IDNA specifies how this conversion between names written in non-ASCII characters and their ASCII-based representation is performed.

An IDNA-enabled application is able to convert between the internationalized and ASCII representations of a domain name. It uses the ASCII form for DNS lookups but can present the internationalized form to users who presumably prefer to read and write domain names in non-ASCII scripts such as Arabic or Hiragana. Applications that do not support IDNA will not be able to handle domain names with non-ASCII characters, but will still be able to access such domains if given the (usually rather cryptic) ASCII equivalent.
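The two conversions an IDNA-enabled application performs can be sketched with the `idna` codec in Python's standard library, which implements the 2003 version of IDNA. The function names `to_lookup_form` and `to_display_form` and the domain `mañana.example` are illustrative assumptions, not part of any standard.

```python
def to_lookup_form(name: str) -> str:
    """Return the ASCII form an application would use for DNS lookups."""
    return name.encode("idna").decode("ascii")


def to_display_form(name: str) -> str:
    """Return the Unicode form an application would present to the user."""
    return name.encode("ascii").decode("idna")


ascii_name = to_lookup_form("mañana.example")
print(ascii_name)                   # ASCII only; the first label carries the "xn--" prefix
print(to_display_form(ascii_name))  # converted back to the Unicode form
```

A name that is already pure ASCII passes through both functions unchanged, which is why applications without IDNA support can still reach such domains when given the ASCII equivalent.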

ICANN issued guidelines for the use of IDNA in June 2003, and it was already possible to register .jp domains using this system in July 2003 and .info[9] domains in March 2004. Several other top-level domain registries started accepting registrations in 2004 and 2005. The IDN Guidelines, first created[10] in June 2003, were updated[11] in November 2005 to respond to phishing concerns. An ICANN working group focused on country code domain names at the top level was formed in November 2007[12] and promoted jointly by the country code supporting organization and the Governmental Advisory Committee.

Mozilla 1.4, Netscape 7.1 and Opera 7.11 were among the first applications to support IDNA. A browser plugin is available for Internet Explorer 6 to provide IDN support. Internet Explorer 7.0[13][14] and Windows Vista's URL APIs provide native support for IDN.[15]

ToASCII and ToUnicode

The conversions between ASCII and non-ASCII forms of a domain name are accomplished by algorithms called ToASCII and ToUnicode. These algorithms are not applied to the domain name as a whole, but rather to individual labels. For example, if the domain name is www.example.com, then the labels are www, example, and com. ToASCII or ToUnicode is applied to each of these three labels separately.
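The label-by-label processing can be sketched as follows, using the `ToASCII` function from Python's `encodings.idna` module. The helper name `labels_to_ascii` is an assumption for illustration, and the sketch splits only on the ASCII full stop, whereas RFC 3490 also recognizes a few other dot characters as label separators.

```python
from encodings import idna


def labels_to_ascii(domain: str) -> list[str]:
    """Split a domain name into labels and convert each label on its own."""
    return [idna.ToASCII(label).decode("ascii") for label in domain.split(".")]


# Each of the three labels is passed to ToASCII separately.
print(labels_to_ascii("www.example.com"))
```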

The details of these two algorithms are complex, and are specified in RFC 3490. The following gives an overview of their function.

ToASCII leaves unchanged any ASCII label, but will fail if the label is unsuitable for the Domain Name System. If given a label containing at least one non-ASCII character, ToASCII will apply the Nameprep algorithm, which converts the label to lowercase and performs other normalization, and will then translate the result to ASCII using Punycode[16] before prepending the four-character string "xn--".[17] This four-character string is called the ASCII Compatible Encoding (ACE) prefix, and is used to distinguish Punycode-encoded labels from ordinary ASCII labels. The ToASCII algorithm can fail in several ways; for example, the final string could exceed the 63-character limit of a DNS label. A label for which ToASCII fails cannot be used in an internationalized domain name.
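The pass-through and failure behaviour can be observed with Python's `encodings.idna.ToASCII`, which signals failure by raising `UnicodeError`. The wrapper `try_to_ascii` is a hypothetical helper for this sketch. Python's implementation does not enforce the stricter hostname character rules (it leaves the UseSTD3ASCIIRules flag off), so only length failures are shown here.

```python
from encodings import idna
from typing import Optional


def try_to_ascii(label: str) -> Optional[str]:
    """Return the ASCII form of a label, or None if ToASCII fails."""
    try:
        return idna.ToASCII(label).decode("ascii")
    except UnicodeError:
        return None


print(try_to_ascii("example"))      # ASCII label, returned unchanged
print(try_to_ascii("a" * 64))       # ASCII label longer than 63 characters: fails
print(try_to_ascii("\u00fc" * 70))  # encoded form would exceed 63 characters: fails
```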

The function ToUnicode reverses the action of ToASCII, stripping off the ACE prefix and applying the Punycode decode algorithm. It does not reverse the Nameprep processing, since that is merely a normalization and is by nature irreversible. Unlike ToASCII, ToUnicode always succeeds, because it simply returns the original string if decoding fails. In particular, this means that ToUnicode has no effect on a string that does not begin with the ACE prefix.
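Python's `encodings.idna.ToUnicode` raises an exception when decoding fails rather than returning its input, so the always-succeeds behaviour described above needs a small wrapper. The following is a sketch under that assumption; `to_unicode_lenient` is a hypothetical name.

```python
from encodings import idna


def to_unicode_lenient(label: str) -> str:
    """Decode an ACE label, returning the input unchanged if decoding fails."""
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        return label


print(to_unicode_lenient("xn--bcher-kva"))  # decoded to the Unicode label
print(to_unicode_lenient("example"))        # no ACE prefix, returned as-is
print(to_unicode_lenient("xn--!!!"))        # invalid Punycode, returned as-is
```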

Example of IDNA encoding

IDNA encoding may be illustrated using the example domain Bücher.ch. “Bücher” is German for “books”, and .ch is the ccTLD of Switzerland. This domain name has two labels, Bücher and ch. The second label is pure ASCII, and is left unchanged. The first label is processed by Nameprep to give bücher, and then converted to Punycode to result in bcher-kva. It is then prefixed with xn-- to produce xn--bcher-kva. The resulting domain name suitable for use in the DNS is therefore xn--bcher-kva.ch.
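The intermediate results of this example can be reproduced step by step with Python's standard library, using `encodings.idna.nameprep` for the Nameprep step and the `punycode` codec for the encoding step. The variable names are illustrative only.

```python
from encodings import idna

label = "Bücher"

prepared = idna.nameprep(label)        # Nameprep: lowercasing and normalization
encoded = prepared.encode("punycode")  # Punycode encoding of the prepared label
ace_label = "xn--" + encoded.decode("ascii")  # prepend the ACE prefix
dns_name = ace_label + ".ch"           # the ASCII label "ch" is left unchanged

print(prepared, encoded, ace_label, dns_name)
```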

Top-level domain implementation

In 2009, ICANN decided to implement a new class of top-level domains, assignable to countries and independent regions, under rules similar to those for country code top-level domains. However, the domain names may be any desired string of characters, symbols, or glyphs in the language-specific, non-Latin alphabet or script of the applicant's language, within certain guidelines to ensure sufficient visual uniqueness.

The process of installing IDN country code domains began with a long period of testing in a set of subdomains in the test top-level domain. Eleven domains used language-native scripts or alphabets, such as δοκιμή,[18] meaning test in Greek.

These efforts culminated in the creation of the first internationalized country code top-level domains (IDN ccTLDs) for production use in 2010.

In the Domain Name System, these domains use an ASCII representation consisting of the prefix xn-- followed by the Punycode translation of the Unicode representation of the language-specific alphabet or script glyphs. For example, the Cyrillic name of Russia's IDN ccTLD is рф. In Punycode representation, this is p1ai, and its DNS name is xn--p1ai.
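The reverse direction, from the DNS name back to the script-specific form, can be sketched by stripping the prefix and decoding the remainder with Python's `punycode` codec. The helper `decode_ace_label` is a hypothetical name, and unlike the full ToUnicode algorithm this sketch performs no round-trip verification.

```python
ACE_PREFIX = "xn--"


def decode_ace_label(label: str) -> str:
    """Strip the ACE prefix and decode the remainder from Punycode."""
    if not label.lower().startswith(ACE_PREFIX):
        return label  # ordinary ASCII label, nothing to decode
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


tld = decode_ace_label("xn--p1ai")
# Show the decoded label together with its Unicode code points.
print(tld, [f"U+{ord(ch):04X}" for ch in tld])
```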

Non-IDNA or non-ICANN registries that support non-ASCII domain names

There are other registries that support non-ASCII domain names. The company ThaiURL.com in Thailand supports .com registrations via its own IDN encoding, ThaiURL. However, since most modern browsers only recognize IDNA/punycode IDNs, ThaiURL-encoded domains must be typed in or linked to in their encoded form, and they will be displayed thus in the address bar. This limits their usefulness; however, they are still valid and universally accessible domains.

ASCII spoofing concerns

Main article: IDN homograph attack

The use of Unicode in domain names makes it potentially easier to spoof web sites, because the visual representation of an IDN string in a web browser may make a spoof site appear indistinguishable from the legitimate site being spoofed, depending on the font used. For example, Unicode character U+0430, Cyrillic small letter a, can look identical to Unicode character U+0061, Latin small letter a, used in English. As a concrete example, using Cyrillic letters а, е (“Ie”/“Ye”, U+0435, looking essentially identical to Latin letter e), Belarusian-Ukrainian і (U+0456, essentially identical to Latin letter i) and р (“Er”, U+0440, essentially identical to Latin letter p), we form the URL wіkіреdіа.org (xn--wkd-8cdx9d7hbd.org in encoded form), which is virtually indistinguishable from the visual representation of the legitimate wikipedia.org (possibly depending on fonts).
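One simple heuristic for spotting labels like the spoofed one above is to flag any label whose letters come from more than one script. The sketch below approximates a character's script by the first word of its Unicode character name, which is an assumption made for brevity rather than a proper script lookup. The function names are hypothetical, and a spoof written entirely in one script would not be caught.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the scripts in a label from Unicode character names."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def looks_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in_label(label)) > 1


# Latin w, k, d mixed with Cyrillic U+0456, U+0440, U+0435, U+0430.
spoof = "w\u0456k\u0456\u0440\u0435d\u0456\u0430"
print(looks_mixed_script("wikipedia"))
print(looks_mixed_script(spoof))
```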

Top-level domains accepting IDN registration

Many top-level domains have started to accept internationalized domain name registrations at the second or lower levels.

DotAsia, the registry for the .asia TLD, conducted a 70-day sunrise period starting May 11, 2011 for second-level domain registrations in the Chinese, Japanese and Korean scripts.[19]

References

  1. RFC 2181, Clarifications to the DNS Specification: section 11 explicitly allows any binary string
  2. Dürst, Martin J. (December 10, 1996). "Internet Draft: Internationalization of Domain Names". The Internet Engineering Task Force (IETF), Internet Society (ISOC). Retrieved 2009-10-31.
  3. Dürst, Martin J. (December 20, 1996). "URLs and internationalization". World Wide Web Consortium. Retrieved 2009-10-30.
  4. RFC 3490, IDN in Applications, Faltstrom, Hoffman, Costello, Internet Engineering Task Force (2003)
  5. John Klensin (2008). "Internationalized Domain Names in Applications (IDNA): Protocol". IETF IDNAbis WG.
  6. "ICANN Bringing the Languages of the World to the Global Internet" (Press release). Internet Corporation For Assigned Names and Numbers (ICANN). October 30, 2009. Retrieved 2009-10-30.
  7. "Internet addresses set for change". BBC News. October 30, 2009. Retrieved 2009-10-30.
  8. "First IDN ccTLDs now available" (Press release). Internet Corporation For Assigned Names and Numbers (ICANN). May 5, 2010. Retrieved 2010-05-06.
  9. Mohan, Ram, German IDN, German Language Table, March 2003
  10. Dam, Mohan, Karp, Kane & Hotta, IDN Guidelines 1.0, ICANN, June 2003
  11. Karp, Mohan, Dam, Kane, Hotta, El Bashir, IDN Guidelines 2.0, ICANN, November 2005
  12. Jesdanun, Anick (Associated Press) (2 November 2007). "Group on Non-English Domains Formed". Archived from the original on December 20, 2008. Retrieved 2 November 2007.
  13. What's New in Internet Explorer 7
  14. International Domain Name Support in Internet Explorer 7
  15. Handling Internationalized Domain Names (IDNs)
  16. RFC 3492, Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA), A. Costello, The Internet Society (March 2003)
  17. IANA e-mails explaining the final choice of ACE prefix
  18. IANA Report on Delegation of Eleven Evaluative Internationalised Top-Level Domains
  19. Dot-Asia releases IDN dates, Managing Internet IP, April 14, 2011.
  20. "draft-duerst-dns-i18n-00 - Internationalization of Domain Names". Tools.ietf.org. Retrieved 2010-07-29.
  21. http://archive.minc.org/about/history/
  22. "the leading Telecom magazine, ICT magazine, Telecom magazine, ICT and Telecom". Connect-World. Retrieved 2010-07-29.
  23. "APNG". APNG. Retrieved 2010-07-29.
  24. "The community of Asia Pacific Internet Organization". Apstar.Org. Retrieved 2010-07-29.
  25. Archived April 22, 2006 at the Wayback Machine
  26. Archived August 23, 2003 at the Wayback Machine
  27. Archived August 11, 2006 at the Wayback Machine
  28. "Method and system for internationalizing domain names (US6182148)". Delphion.com. Retrieved 2010-07-29.
  29. "draft-jseng-utf5-00 - UTF-5, a transformation format of Unicode and ISO 10646". Tools.ietf.org. 1999-07-27. Retrieved 2010-07-29.
  30. "draft-jseng-utf5-01 - UTF-5, a transformation format of Unicode and ISO 10646". Tools.ietf.org. 2000-01-28. Retrieved 2010-07-29.
  31. Archived August 23, 2003 at the Wayback Machine
  32. Archived November 10, 2004 at the Wayback Machine
  33. "APRICOT 2000 in Seoul". Apricot.net. Retrieved 2010-07-29.
  34. "Multilingual Internet Names Consortium". MINC. Retrieved 2010-07-29.
  35. Archived January 26, 2004 at the Wayback Machine
  36. "Chinese Domain Name Consortium". CDNC. 2000-05-19. Retrieved 2010-07-29.
  37. "Chinese Domain Name Consortium". CDNC. Retrieved 2010-07-29.
  38. urduworkshop.sdnpk.org
  39. "Signposts in Cyberspace: The Domain Name System and Internet Navigation". Nap.edu. 2001-11-07. Retrieved 2010-07-29.
  40. "ITU-T SG17 Meeting Documents". Itu.int. Retrieved 2010-07-29.
  41. "ITU-T Newslog - Multilingual Internet Work Progresses". Itu.int. 2006-05-04. Retrieved 2010-07-29.
  42. "GNSO IDN WG". icann.org. 2007-03-22. Retrieved 2010-08-30.
  43. Mohan, Ram, GNSO IDN Working Group, Outcomes Report (PDF), ICANN
  44. On Its Way: One of the Biggest Changes to the Internet
  45. My Name, My Language, My Internet: IDN Test Goes Live
  46. Successful Evaluations of .test IDN TLDs
  47. IDNAbis overview (2008)
  48. ICANN - Paris/IDN CCTLD discussion - Wiki
  49. ICANN Seeks Interest in IDN ccTLD Fast-Track Process
  50. Proposed Final Implementation Plan: IDN ccTLD Fast Track Process, 30 September 2009
  51. Regulator approves multi-lingual web addresses, Silicon Republic, 30.10.2009
  52. "First IDN ccTLDs Requests Successfully Pass String Evaluation". ICANN. 2010-01-21.

This article is issued from Wikipedia - version of the Tuesday, February 09, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.