IETF language tag

IETF language tags are defined by BCP 47, which currently comprises RFC 4646 and RFC 4647. These language tags are used in a number of modern standards, such as HTTP,[1] HTML,[2] XML[3] and PNG.[4]

Each language tag is composed of one or more “subtags” separated by hyphens. With the exception of private use language tags and grandfathered language tags, the subtags occur in the following order:

  • a language subtag (potentially followed by up to three extended language subtags)
  • an optional script subtag
  • an optional region subtag
  • optional variant subtags
  • optional extension subtags
  • optional private use subtags
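
The ordering above can be sketched as a shape-only parser. This is a minimal illustration, not a validator: the subtag lengths are taken from the RFC 4646 grammar, the names `LANGTAG` and `parse_language_tag` are hypothetical, grandfathered and private-use-only tags are deliberately not handled, and nothing is checked against the Language Subtag Registry.

```python
import re

# Shape-only pattern for the ordered subtags listed above.
# Assumption: lengths follow the RFC 4646 grammar; grandfathered and
# private-use-only tags are not covered by this sketch.
LANGTAG = re.compile(
    r"""
    ^
    (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[a-wy-z0-9](?:-[a-z0-9]{2,8})+)*)
    (?:-(?P<private>x(?:-[a-z0-9]{1,8})+))?
    $
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_language_tag(tag):
    """Split a tag into its subtag groups, or return None if the shape is wrong."""
    m = LANGTAG.match(tag)
    if m is None:
        return None
    # The language group may carry up to three extended language subtags.
    language, *extlangs = m.group("language").split("-")
    return {
        "language": language,
        "extlang": extlangs,
        "script": m.group("script"),
        "region": m.group("region"),
        "variants": [v for v in m.group("variants").split("-") if v],
        # Extensions are kept as one raw string (or None) for simplicity.
        "extensions": m.group("extensions").lstrip("-") or None,
        "private": m.group("private"),
    }


if __name__ == "__main__":
    for example in ("en", "en-CA", "zh-Hant-TW", "de-CH-1901"):
        print(example, parse_language_tag(example))
```

Because the pattern only looks at the length and character class of each subtag, a well-formed but unregistered tag is still accepted; checking validity would additionally require a lookup in the registry.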

Language subtags are mainly derived from ISO 639-1 and ISO 639-2, script subtags from ISO 15924, and region subtags from ISO 3166-1 alpha-2 and UN M.49. Variant subtags are not derived from any standard. No extension subtags have yet been defined. The Language Subtag Registry, maintained by IANA, lists the currently valid public subtags.

The most commonly seen language tags consist of just a language subtag, or a language subtag and a region subtag. For example, en represents English, and consists of a single language subtag (from ISO 639-1), while en-CA represents Canadian English, and consists of the language subtag en followed by the region subtag CA (from ISO 3166-1).

Subtags are not case sensitive, but the specification recommends using the same case as in the Language Subtag Registry, where region subtags are uppercase, script subtags are titlecase and all other subtags are lowercase. This capitalization follows the recommendations of the underlying ISO standards.
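
The recommended capitalization can be approximated from the position and length of each subtag alone. The sketch below is a heuristic under stated assumptions, not the registry lookup the specification describes: it treats a two-letter subtag after the first position as a region, a four-letter one as a script, and lowercases everything following a single-character subtag (an extension singleton or `x`). The function name `canonicalize_case` is hypothetical.

```python
def canonicalize_case(tag):
    """Apply the recommended case conventions using subtag shape only."""
    out = []
    after_singleton = False
    for i, sub in enumerate(tag.split("-")):
        positional = i > 0 and not after_singleton
        if positional and len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())        # region subtag, e.g. "CA"
        elif positional and len(sub) == 4 and sub.isalpha():
            out.append(sub.capitalize())   # script subtag, e.g. "Hant"
        else:
            out.append(sub.lower())        # language, variant, extension, private use
        if len(sub) == 1:
            # Everything after an extension singleton or "x" stays lowercase.
            after_singleton = True
    return "-".join(out)


if __name__ == "__main__":
    for example in ("EN-ca", "ZH-hant-tw", "en-ca-X-CA"):
        print(example, "->", canonicalize_case(example))
```

Since tags are not case sensitive, this normalization changes only the presentation of a tag, never its meaning.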

History

IETF language tags were first defined in RFC 1766, published in March 1995. In January 2001 this was superseded by RFC 3066, which added the use of ISO 639-2 codes (whereas previously only ISO 639-1 codes had been allowed), permitted subtags with digits for the first time, and adopted the concept of language ranges from HTTP/1.1 to help with matching of language tags.

The next revision of the specification came in September 2006 with the publication of RFC 4646 (the main part of the specification) and RFC 4647 (which deals with matching behaviour). RFC 4646 introduced a more structured format for language tags and replaced the old register of tags with a new register of subtags that draws on ISO 15924 and UN M.49 in addition to the previously used ISO 639 and ISO 3166. The small number of previously defined tags that did not conform to the new structure were grandfathered to maintain compatibility with RFC 3066.

An IETF Working Group is currently preparing the next version of the specification. The main purpose of this revision is to incorporate codes from ISO 639-3 into the Language Subtag Registry.[5]

Relation to other standards

Although subtags are often derived from ISO standards, they do not track these standards exactly, because doing so could cause the meaning of language tags to change over time.

In particular, a subtag derived from a code assigned by ISO 639, ISO 15924 or ISO 3166 remains a valid (though deprecated) subtag even if the code is withdrawn from the corresponding ISO standard. If the ISO standard later assigns a new meaning to the withdrawn code, the corresponding subtag will still retain its old meaning.

This stability was introduced in RFC 4646. Before RFC 4646, changes in the meaning of ISO codes could cause changes in the meaning of language tags.

Issues with ISO 3166-1 and UN M.49

If a new ISO 3166-1 alpha-2 code would conflict with an existing region subtag (because the code previously had a different meaning), a UN M.49 code can be used instead. This rule was introduced in RFC 4646; so far there has been no need to use it. UN M.49 is also the source for region subtags such as 005 for South America, as ISO 3166 does not provide codes for supranational regions.
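
The two sources of region subtags can be told apart by shape: two letters for ISO 3166-1 alpha-2 and three digits for UN M.49. The helper below is a small sketch of that distinction only; the name `region_subtag_source` is hypothetical, and the function does not check whether the subtag is actually registered.

```python
def region_subtag_source(region):
    """Guess which standard a region subtag comes from, based on shape alone."""
    if len(region) == 2 and region.isascii() and region.isalpha():
        return "ISO 3166-1 alpha-2"
    if len(region) == 3 and region.isascii() and region.isdigit():
        return "UN M.49"
    return None  # not shaped like a region subtag


if __name__ == "__main__":
    for example in ("CA", "005", "CAN"):
        print(example, "->", region_subtag_source(example))
```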

Relation to ISO 639-3

RFC 4646, unlike its predecessors, defines the concept of an “extended language subtag”, although it does not permit the registration of such subtags. The next version of the specification (currently in draft) is expected to require certain ISO 639-3 codes to be registered as extended language subtags, and to require other ISO 639-3 codes to be registered as (primary) language subtags.[6]
