Head/tail Breaks

The figure shows 1024 cities that follow Zipf's law exactly, which implies that the largest city has size 1, the second largest has size 1/2, the third largest has size 1/3, ..., and the smallest has size 1/1024. The left pattern is produced by head/tail breaks, the right one by natural breaks (also known as Jenks natural breaks optimization).

Head/tail breaks is a clustering algorithm for data with a heavy-tailed distribution, such as power laws and lognormal distributions. A heavy-tailed distribution can be summarized as a scaling pattern in which there are far more small things than large ones. The classification divides things into large ones (the head) and small ones (the tail) around the arithmetic mean, and then repeats the division recursively for the head until the notion of far more small things than large ones no longer holds, or until only more or less similar things remain.[1]

Motivation

Head/tail breaks is mainly motivated by the inability of conventional classification methods, such as equal intervals, quantiles, geometric progressions, standard deviation, and natural breaks (commonly known as Jenks natural breaks optimization), to reveal the underlying scaling pattern of far more small things than large ones. Note that this notion refers not only to geometric properties, but also to topological and semantic properties. In this sense, it should be interpreted as far more unpopular (or less-connected) things than popular (or well-connected) ones, or far more meaningless things than meaningful ones.

Method

Given a variable X with a heavy-tailed distribution, there are far more small values x than large ones. Take the average of all xi to obtain the first mean m1. Then calculate the second mean m2 from those xi greater than m1. In the same recursive way, m3 is obtained from those xi greater than m2, and so on, for as long as the condition of far more small x than large ones still holds. For simplicity, assume there are three means, m1, m2, and m3. This classification leads to four classes: [minimum, m1], (m1, m2], (m2, m3], (m3, maximum]. In general, it can be represented as a recursive function as follows:

    Recursive function Head/tail Breaks:
        Break the input data (around the mean or average) into the head and the tail;
        // the head: data values greater than the mean
        // the tail: data values less than or equal to the mean
        if (length(head) / length(data) <= 40%):
            Head/tail Breaks(head);
    End Function
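
The recursion above can be sketched in Python as a simple loop. This is a minimal illustration rather than a reference implementation: the function name `head_tail_breaks`, the parameter `head_limit`, and the sample data are assumptions made for this example, and stopping when a split yields an empty head is an added safeguard that the pseudocode does not spell out.

```python
from statistics import fmean


def head_tail_breaks(values, head_limit=0.4):
    """Return the successive means m1, m2, ... used as class breaks.

    head_limit is the 40% threshold from the pseudocode (an assumed default).
    """
    data = list(values)
    breaks = []
    while len(data) > 1:
        mean = fmean(data)
        head = [v for v in data if v > mean]
        if not head:
            # All remaining values are equal, so there is nothing to split.
            break
        breaks.append(mean)
        if len(head) / len(data) > head_limit:
            # The head is no longer a small minority: stop recursing.
            break
        data = head
    return breaks


# Hypothetical sample: many small values, a few mid-sized ones, one large one.
sizes = [1] * 16 + [4] * 4 + [20]
breaks = head_tail_breaks(sizes)
print(breaks)
```

For this sample the function returns two breaks, near 2.48 and 7.2, which split the data into three classes. Like the pseudocode, the sketch always records the first mean before checking the head proportion, so evenly spread data still receive one break.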

The resulting number of classes is referred to as the ht-index, an alternative to fractal dimension for characterizing the complexity of fractals or geographic features: the higher the ht-index, the more complex the fractal.[2]
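
As an illustration of how the means define the classes and the ht-index, the sketch below assigns each value to its interval and counts the classes. The function names `assign_classes` and `ht_index` and the numeric means are hypothetical, chosen only for this example. The means are assumed to be sorted in ascending order, which holds when each mean is computed from the head above the previous one.

```python
from bisect import bisect_left


def assign_classes(values, breaks):
    """Map each value to a class index.

    Index 0 is [minimum, m1], index 1 is (m1, m2], and so on;
    intervals are closed on the right, as in the method description.
    """
    return [bisect_left(breaks, v) for v in values]


def ht_index(breaks):
    """Number of classes produced by the given breaks."""
    return len(breaks) + 1


means = [2.0, 5.0, 9.0]          # hypothetical m1, m2, m3
values = [1, 2, 3, 5, 6, 9, 12]  # hypothetical data
labels = assign_classes(values, means)
print(labels)           # [0, 0, 1, 1, 2, 2, 3]
print(ht_index(means))  # 4
```

With three means there are four classes, so the ht-index of this toy example is 4.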

Applications

Instead of more or less similar things, there are far more small things than large ones surrounding us. Given the ubiquity of this scaling pattern, head/tail breaks has been found useful for statistical mapping, map generalization, cognitive mapping, and even the perception of beauty.[3][4][5] It helps visualize big data, since big data are likely to show the scaling property of far more small things than large ones. The visualization strategy is to recursively drop the tail parts until the head parts are clear or visible enough.[6] In addition, it helps delineate cities (or, more precisely, natural cities) from various kinds of geographic information, such as street networks, social media geolocation data, and nighttime images.
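
The tail-dropping strategy can be sketched as a filter on the class breaks. This is only an illustration: the function name `drop_lowest_levels`, the break values, and the sample sizes are hypothetical, and the breaks are assumed to be the ascending means produced by head/tail breaks.

```python
def drop_lowest_levels(values, breaks, n_drop):
    """Keep only the values above the n_drop lowest hierarchical levels."""
    if not 0 <= n_drop <= len(breaks):
        raise ValueError("n_drop must be between 0 and the number of breaks")
    if n_drop == 0:
        return list(values)
    # Dropping k levels removes every value at or below the k-th break.
    threshold = breaks[n_drop - 1]
    return [v for v in values if v > threshold]


sizes = [1, 1, 1, 1, 2, 2, 5, 5, 9, 20]  # hypothetical city sizes
breaks = [3.0, 8.0]                      # hypothetical means m1, m2
print(drop_lowest_levels(sizes, breaks, 1))  # [5, 5, 9, 20]
print(drop_lowest_levels(sizes, breaks, 2))  # [9, 20]
```

Each call removes one more tail, so the remaining head becomes progressively easier to display.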

The left panel contains 50,000 natural cities, which can be put into 7 hierarchical levels. It looks like a hairball. Instead of showing all 7 hierarchical levels, the right panel shows only the top 4 levels, dropping the 3 lowest. In the right panel, the scaling pattern of far more small cities than large ones emerges. It is important to note that the remaining parts are self-similar to the whole.

References

  1. Jiang, Bin (2013a). "Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution", The Professional Geographer, 65(3), 482–494.
  2. Jiang, Bin and Yin, Junjun (2014). "Ht-index for quantifying the fractal or scaling structure of geographic features", Annals of the Association of American Geographers, 104(3), 530–541.
  3. Jiang, Bin, Liu, Xintao and Jia, Tao (2013). "Scaling of geographic space as a universal rule for map generalization", Annals of the Association of American Geographers, 103(4), 844–855.
  4. Jiang, Bin (2013b). "The image of the city out of the underlying scaling of city artifacts or locations", Annals of the Association of American Geographers, 103(6), 1552–1566.
  5. Jiang, Bin and Sui, Daniel (2014). "A new kind of beauty out of the underlying scaling of geographic space", The Professional Geographer, 66(4), 676–686.
  6. Jiang, Bin (2015). "Head/tail breaks for visualization of city structure and dynamics", Cities, 43, 69–77.

Further reading