IDN homograph attack
From Wikipedia, the free encyclopedia
The internationalized domain name (IDN) homograph attack is a means by which a malicious party may seek to deceive computer users about what remote system they are communicating with, by exploiting the fact that many different characters may have nearly (or wholly) indistinguishable glyphs.
Homographs
In multilingual computer systems, different logical characters may have identical or very similar appearances. For example, Unicode character U+0430, Cyrillic small letter a ("а"), can look identical to Unicode character U+0061, Latin small letter a ("a"), which is the lowercase "a" used in English. Technically, characters that look alike in this way are known as homoglyphs (a subgroup of homographs). Spoofing attacks based on these similarities are known as homograph spoofing attacks.
The problem arises from the different treatment of the characters in the user's mind and the computer's programming. From the viewpoint of the user, a Cyrillic "а" within a Latin string is a Latin "a"; there is no difference in the glyphs for these characters in most fonts. However, the computer treats them differently when processing the character string as an identifier. Thus, the user's assumption of a one-to-one correspondence between the visual appearance of a name, and the named entity, breaks down.
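The mismatch between what the user sees and what the computer compares can be shown with a minimal Python sketch. The helper name `describe` is hypothetical and introduced only for illustration; it relies on the standard `unicodedata` module.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin_a = "\u0061"     # Latin small letter a
cyrillic_a = "\u0430"  # Cyrillic small letter a

print(describe(latin_a))       # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))    # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)   # False: identical glyphs, different identifiers
```

The two characters may render identically, yet any string comparison, hash or DNS lookup treats them as unrelated.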
In a typical hypothetical attack, someone registers a domain name that appears identical to an existing domain but leads somewhere else. For example, the spoofed domain "pаypal.com" contains a Cyrillic a, not a Latin a.
In many ways, this is not a new problem. Even within the old character set of A-Z, 0-9 and hyphen, G00GLE.COM looks much like GOOGLE.COM in some fonts; with a mix of uppercase and lowercase characters, googIe.com (capital i, not lowercase L) looks much like google.com in some fonts. PayPal itself was the target of a phishing scam that exploited this, using the domain PayPaI.com. In lowercase alone, rnozilla.org (r followed by n; "RNOZILLA.ORG" in capitals) looks very much like mozilla.org in many fonts. Similarly, in certain narrow-spaced fonts such as Tahoma (the default address bar font in Windows XP), placing a c in front of an l, j or i produces the pairs cl, cj and ci, which resemble d, g and a respectively. A distinct homograph issue is that of the long s (ſ), which has long been confused with "f" but is treated as "s" in URLs.
What is new is that the internationalized domain name system expanded the character repertoire from a few dozen characters in a single alphabet to many thousands of characters in many scripts, which greatly increased the scope for homograph attacks.
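The ASCII-only lookalikes mentioned above can be caught by reducing names to a common "skeleton" before comparing them. The following is a rough Python sketch, not an established algorithm: the function names and the short substitution list (capital I for l, zero for o, rn for m) are assumptions taken from the examples in the text.

```python
def ascii_skeleton(name: str) -> str:
    """Collapse a few ASCII lookalikes onto one canonical spelling."""
    # Case-sensitive substitutions must happen before lowercasing,
    # otherwise capital I would turn into a plain i.
    s = name.replace("I", "l").replace("0", "o")
    s = s.lower()
    # "rn" set closely together reads as "m" in many fonts.
    return s.replace("rn", "m")


def may_be_confused(a: str, b: str) -> bool:
    """True if two different domain names share the same skeleton."""
    return a.lower() != b.lower() and ascii_skeleton(a) == ascii_skeleton(b)


for spoof, real in [
    ("G00GLE.COM", "GOOGLE.COM"),
    ("googIe.com", "google.com"),
    ("PayPaI.com", "paypal.com"),
    ("rnozilla.org", "mozilla.org"),
]:
    print(spoof, real, may_be_confused(spoof, real))
```

Because both names pass through the same mapping, a legitimate name containing "rn" is only flagged when some other name collapses to the same skeleton.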
Homographs in internationalized domain names
The limitation of domain names to ASCII characters may not last forever, and is coming under pressure from organizations based in regions that do not use Latin characters. Internationalized domain names provide a backward-compatible way for domain names to use the full Unicode character set, and this standard is already widely supported.
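The backward compatibility comes from encoding each non-ASCII label into an ASCII form (Punycode with an "xn--" prefix). A minimal sketch using the `idna` codec built into Python follows; that codec implements the original IDNA 2003 rules, and the helper names `to_ascii` and `to_unicode` are chosen here only for illustration.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain name into its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Decode an ASCII-compatible domain name back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


# The Cyrillic name gazeta.rf, written with escapes so that every
# code point is unambiguous.
cyrillic_name = "\u0433\u0430\u0437\u0435\u0442\u0430.\u0440\u0444"

encoded = to_ascii(cyrillic_name)
print(encoded)                  # each label now starts with "xn--"
print(to_unicode(encoded) == cyrillic_name)
print(to_ascii("gazeta.ru"))    # plain ASCII names pass through unchanged
```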
For example, the Russian newspaper website gazeta.ru may wish to use the URL газета.рф, reflecting the newspaper's name spelled in Cyrillic. The disadvantage in this example is that the Cyrillic letters 'а', 'е', and 'р' all strongly resemble (or are indistinguishable from, depending on the font) the Latin letters 'a', 'e', and 'p'. Some of these pairings (such as а-a) are of two letters that are close etymologically, while others look similar by coincidence. For instance, the Cyrillic letter 'р' represents a phoneme similar to the English 'r', but the glyph strongly resembles the Latin letter 'p' in most fonts.
This opens a rich vein of opportunities for phishing and other varieties of fraud. An attacker could register a domain name that looks just like that of a legitimate website, but in which some of the letters have been replaced by homographs in another alphabet. The attacker could then send e-mail messages purporting to come from the original site, but directing people to the bogus site. The spoof site could then record information such as passwords or account details, while passing traffic through to the real site. The victims may never notice the difference, until suspicious or criminal activity occurs with their accounts.
The following alphabets have characters that can be used for spoofing attacks (please note, these are only the most obvious and common, given artistic license and how much risk the spoofer will take of getting caught; the possibilities are far more numerous than can be listed here):
Cyrillic
Cyrillic is by far the most commonly used alphabet for homoglyphs, largely because it contains 11 lowercase glyphs that are identical (or nearly identical) to Latin counterparts. The following Cyrillic letters have optical counterparts in the basic Latin alphabet: асһеіјорѕху, which look close or identical to acheijopsxy, and Cyrillic З resembles the numeral 3. Italic type generates more homoglyphs: in italics, the letters тпи resemble mnu. Cyrillic ёї can also be used if an IDN itself is being spoofed, to fake ëï.
If capital letters are counted, ВНКМТ can substitute BHKMT, in addition to the capitals for the lowercase Cyrillic homoglyphs.
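The lowercase pairs listed above can be turned into a simple translation table. The sketch below is illustrative only: the table covers just the eleven lowercase letters named in this section, and the names `CYRILLIC_TO_LATIN`, `latin_lookalike` and `spoofs` are hypothetical.

```python
# The eleven Cyrillic lowercase letters from the list above, written as
# escapes so they cannot be mistaken for their Latin lookalikes.
CYRILLIC_TO_LATIN = str.maketrans(
    "\u0430\u0441\u04bb\u0435\u0456\u0458\u043e\u0440\u0455\u0445\u0443",
    "acheijopsxy",
)


def latin_lookalike(name: str) -> str:
    """Replace each listed Cyrillic homoglyph with its Latin counterpart."""
    return name.translate(CYRILLIC_TO_LATIN)


def spoofs(candidate: str, protected: str) -> bool:
    """True if candidate differs from protected but looks the same."""
    return candidate != protected and latin_lookalike(candidate) == protected


fake = "p\u0430ypal.com"  # second letter is Cyrillic U+0430
print(spoofs(fake, "paypal.com"))          # True
print(spoofs("paypal.com", "paypal.com"))  # False
```

A registry or browser could run such a check against a list of protected names, although a real deployment would need a far larger confusables table than this one.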
Greek
From the Greek alphabet, only omicron ο and sometimes nu ν qualify in the lowercase used for URLs. In italic type, Greek alpha α looks like a Latin a.
This list increases if close matches are also allowed (such as Greek εικηρτυωχγ for eiknptuwxy). Using capital letters, the list expands greatly. Greek ΑΒΕΗΙΚΜΝΟΡΤΧΥΖ looks identical to Latin ABEHIKMNOPTXYZ.
If an IDN itself is being spoofed, Greek beta β can be a substitute for German eszett ß in some fonts (and in fact, code page 437 treats them as equivalent), as can Greek sigma ς for ç; accented Greek substitutes όίά can usually be used for óíá in many fonts, with the last of these (alpha) again only resembling a in italic type.
Armenian
The Armenian alphabet can also contribute critical characters: ցհոօզս, which look like ghnoqu; յ, which resembles j (albeit dotless); and ք, which can resemble either p or f depending on the font. However, the use of Armenian is problematic. Most standard fonts do not feature the Armenian glyphs (whereas the Greek and Cyrillic scripts are in most standard fonts). Because of this, Windows normally renders Armenian in a distinct font, Sylfaen, which supports Armenian, and a mixture of Armenian and Latin will appear obviously different if the surrounding text uses a font other than Sylfaen or a Unicode typeface. Furthermore, this font differentiates Latin g from Armenian ց.
Two letters in Armenian (Ձշ) also can resemble the number 2, while another (վ) sometimes resembles the number 4.
Hebrew
Hebrew spoofing is generally rare. Only two letters from that alphabet can reliably be used: samekh (ס), which sometimes resembles o, and vav with diacritic, וֹ, which resembles an i. Less accurate approximants for some other alphanumerics can also be found, but these are usually only accurate enough to use for the purposes of foreign branding and not for substitution. Furthermore, the Hebrew alphabet is written from right to left and trying to mix it with left-to-right glyphs may cause problems.
Defending against the attack
The simplest defense is for web browsers not to support IDNA or other similar mechanisms, or for users to turn off whatever support their browsers have. That could mean blocking access to IDNA sites, but generally browsers permit access and just display IDNs in Punycode. Either way, this amounts to abandoning non-ASCII domain names.
Firefox and Opera display punycode for IDNs unless the top-level domain (TLD, for example, .ac or .museum) prevents homograph attacks by restricting which characters can be used in domain names.[1] They both also allow users to manually add TLDs to the allowed list.[2][3]
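The TLD-based policy can be sketched in a few lines of Python. This is an illustration of the idea rather than the code of any browser: the allowed list below contains only the two TLDs named as examples in the text, and `ALLOWED_TLDS` and `display_form` are hypothetical names. The built-in `idna` codec (IDNA 2003) does the conversion.

```python
# Hypothetical allowed list, seeded with the two example TLDs from the text.
ALLOWED_TLDS = {"ac", "museum"}


def display_form(domain: str, allowed_tlds=ALLOWED_TLDS) -> str:
    """Return the string an address bar would show under a TLD allowed list."""
    ascii_form = domain.encode("idna").decode("ascii")
    tld = ascii_form.rsplit(".", 1)[-1].lower()
    if tld in allowed_tlds:
        # Registry is trusted to restrict characters: show the Unicode form.
        return ascii_form.encode("ascii").decode("idna")
    # Otherwise fall back to the unambiguous ASCII (punycode) form.
    return ascii_form


fake = "p\u0430ypal"  # contains Cyrillic U+0430
print(display_form(fake + ".com"))     # punycode, visibly not "paypal.com"
print(display_form(fake + ".museum"))  # Unicode form, TLD is on the list
print(display_form("example.com"))     # plain ASCII is unaffected
```

Letting users extend the allowed list, as both browsers do, would correspond to passing a larger set as `allowed_tlds`.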
Internet Explorer 7 allows IDNs except for labels that mix scripts for different languages. Labels that mix scripts are displayed in punycode. Exceptions are made for locales where ASCII characters are commonly mixed with localized scripts.[4]
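A mixed-script check of this kind can be approximated as follows. This is a sketch under stated assumptions, not the Internet Explorer algorithm: Python's standard library does not expose the Unicode script property, so the first word of each character's Unicode name (LATIN, CYRILLIC, GREEK and so on) stands in for the script, and all function names are hypothetical.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Heuristic script guess: first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(label: str) -> set:
    """Scripts used by the letters of one label (digits and hyphens ignored)."""
    return {script_of(ch) for ch in label if ch.isalpha()}


def display_label(label: str) -> str:
    """Show a label as punycode if it mixes scripts, otherwise unchanged."""
    if len(scripts_in(label)) > 1:
        return label.encode("idna").decode("ascii")
    return label


def display_domain(domain: str) -> str:
    return ".".join(display_label(label) for label in domain.split("."))


print(display_domain("p\u0430ypal.com"))  # mixed Latin/Cyrillic label
print(display_domain("paypal.com"))       # single script, shown as typed
```

A label written entirely in one foreign script passes this check, so whole-script lookalikes still need a separate defense.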
As an additional defense, Internet Explorer 7, Firefox 2.0 and Opera 9.10 include phishing filters to alert users when they visit malicious websites.[5][6][7]
Another possible defense would be for web browsers to display non-ASCII characters in URLs distinctively, perhaps by changing their color or that of their background. This would not protect against spoofing that replaces one non-ASCII character with another similar-looking one (for example, replacing a Greek ο with a Cyrillic о or vice versa). (A solution to this problem would be to use a different color for each character group, but no software implements it that way.) Highlighting of non-ASCII characters was adopted, as of July 9, 2005, by the plug-in Quero Toolbar for Internet Explorer. Besides IDN highlighting, Quero has implemented several other techniques to mitigate IDN spoofing attacks, such as mixed-script/missing glyph detection, IDN/digit indication and "core domain" highlighting.
Using fonts that differentiate between homoglyphs can help identify a phony character in a URL. For instance, Courier New, which is widely available as a standard monospace font, constructs its characters in such a way that some characters that appear to be homoglyphs in other fonts look distinctly different (although several characters still appear identical). However, changing the font of the address bar is not yet a widespread option, nor is it easy for the typical Internet user.
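The per-group highlighting idea can be sketched by splitting a name into runs of characters from the same group and marking every non-ASCII run. The Python below is an illustration only and does not describe how Quero works: bracketed tags stand in for colors, the first word of the Unicode character name stands in for the character group, and the function names are hypothetical.

```python
import unicodedata
from itertools import groupby


def char_group(ch: str) -> str:
    """Group label for one character: ASCII, or a heuristic script name."""
    if ch.isascii():
        return "ASCII"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def segments(text: str) -> list:
    """Split text into (group, run) pairs of adjacent same-group characters."""
    return [(group, "".join(run)) for group, run in groupby(text, key=char_group)]


def mark(text: str) -> str:
    """Render non-ASCII runs in brackets tagged with their group."""
    return "".join(
        run if group == "ASCII" else f"[{group}:{run}]"
        for group, run in segments(text)
    )


print(mark("p\u0430ypal.com"))  # the Cyrillic letter is singled out
print(mark("\u03bf\u043e"))     # Greek and Cyrillic o get different tags
```

Tagging each group separately is what distinguishes a Greek ο from a Cyrillic о, which a single "non-ASCII" highlight color cannot do.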
There is not yet (as of March 2005) a clear consensus as to the best way to balance the needs of the international community with protection against domain-name spoofing.
References
- [1] Advisory: Internationalized domain names (IDN) can be used for spoofing. Opera (2005-02-25). Retrieved on 2007-02-24.
- [2] IDN-enabled TLDs. Mozilla (2006-08-07). Retrieved on 2006-11-30.
- [3] Opera's Settings File Explained: IDNA White List. Opera Software (2006-12-18). Retrieved on 2007-02-24.
- [4] Sharif, Tariq (2006-07-31). Changes to IDN in IE7 to now allow mixing of scripts. IEBlog. Microsoft. Retrieved on 2006-11-30.
- [5] Sharif, Tariq (2005-09-09). Phishing Filter in IE7. IEBlog. Microsoft. Retrieved on 2006-11-30.
- [6] Firefox 2 Phishing Protection. Mozilla (2006). Retrieved on 2006-11-30.
- [7] Opera Fraud Protection. Opera Software (2006-12-18). Retrieved on 2007-02-24.
External links
- http://www.shmoo.com/idn/homograph.txt The state of homograph attacks, by Eric Johanson.
- http://secunia.com/advisories/14163/, http://secunia.com/advisories/14209/ Secunia advisories about IDN spoofing
- http://www.centr.org/docs/2005/02/homographs.html CENTR statement on IDN homograph attacks, issued by the Council of European National TLD registries.
- The Homograph Attack, Evgeniy Gabrilovich and Alex Gontmakher, Communications of the ACM, 45(2):128, February 2002
- Quero Toolbar - An IDN-enabling plug-in for Internet Explorer with anti-spoofing techniques.
- Erik van der Poel's Unofficial Nameprep/IDNA/Stringprep website