Collating sequence

From Wikipedia, the free encyclopedia

The term collating sequence refers to the order in which character strings should be placed when sorting them.

A common example is the familiar "alphabetical order," in which "Alfred" comes before "Zeus" because "A" precedes "Z" in the English alphabet. A collating sequence, particularly one used in a computer system, must also settle other questions:

  • Upper and lower case: Should "Alfred" be placed before or after "alfred"? Generally one would say "neither," because an upper-case "A" and a lower-case "a" are usually considered to be the same letter. An application may nevertheless need to sort such records in a definite order.
  • National characters, accents, tildes: Various languages place these marks over and around letters, but once again speakers of the language might consider the marked and unmarked characters to be "the same."
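
The two issues above can be illustrated with a minimal sketch of a sort key that treats case variants and accented variants of a letter as "the same." The function name fold_key and the sample names are hypothetical, chosen only for illustration, and stripping accents this way is a simplification that does not match the conventions of every language.

```python
import unicodedata

def fold_key(s: str) -> str:
    """Hypothetical sort key: ignore accents and case."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that "A" and "a" compare equal.
    return stripped.casefold()

names = ["Zeus", "alfred", "Émile", "Alfred", "emile"]

# sorted() is stable, so names with equal keys keep their input order.
ordered = sorted(names, key=fold_key)
print(ordered)
```

Because "Alfred" and "alfred" produce the same key, neither is placed "before" the other by the key itself; their relative order is simply whatever it was in the input.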

In a computer system, each letter is necessarily assigned a unique numeric code (as in the ASCII or Unicode character set), but the proper and customary ordering of strings is not performed by a simple numeric comparison of those codes. Rather, the ordering is determined by reference to the collating sequence.
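
The difference between comparing numeric codes and consulting a collating sequence can be sketched as follows. The sequence used here (each lower-case letter immediately followed by its upper-case form) is an assumption made up for this example, not a standard ordering, and the names SEQUENCE, RANK and collation_key are likewise hypothetical.

```python
import string

# Assumed collating sequence for illustration: a, A, b, B, ..., z, Z.
SEQUENCE = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)

# Position of each character within the collating sequence.
RANK = {ch: i for i, ch in enumerate(SEQUENCE)}

def collation_key(s: str) -> list[int]:
    # Characters missing from the table sort after every listed character,
    # ordered among themselves by code point.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in s]

words = ["alfred", "Zeus", "Alfred", "zebra"]

# Plain comparison uses the numeric codes: every upper-case ASCII letter
# (65-90) precedes every lower-case one (97-122).
by_code_point = sorted(words)

# Comparison by rank in the collating sequence keeps "a" and "A" adjacent.
by_sequence = sorted(words, key=collation_key)

print(by_code_point)
print(by_sequence)
```

Sorting by numeric code puts "Zeus" ahead of "alfred," whereas sorting by rank in the assumed sequence groups the words by letter first. This is a single-level comparison; production collation schemes are generally more elaborate.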