Head/tail Breaks
Head/tail breaks is a new clustering algorithm scheme for data with a heavy-tailed distribution such as power laws and lognormal distributions. The heavy-tailed distribution can be simply referred to the scaling pattern of far more small things than large ones. The classification is done through dividing things into large (or called the head) and small (or called the tail) things around the arithmetic mean or average, and then recursively going on for the division process for the large things or the head until the notion of far more small things than large ones is no longer valid, or with more or less similar things left only. [1]
Motivation
The head/tail breaks is mainly motivated by inability of conventional classification methods such as equal intervals, quantiles, geometric progressions, standard deviation, and natural breaks - commonly known as Jenks natural breaks optimization for revealing the underlying scaling pattern of far more small things than large ones. Note that the notion of far more small things than large one is not only referred to geometric property, but also to topological and semantic properties. In this connection, the notion should be interpreted as far more unpopular (or less-connected) things than popular (or well-connected) ones, or far more meaningless things than meaningful ones.
Method
Given some variable X that demonstrates a heavy-tailed distribution, there are far more small x than large ones. Take the average of all xi, and obtain the first mean m1. Then calculate the second mean for those xi greater than m1, and obtain m2. In the same recursive way, we can get m3 depending on whether the ending condition of no longer far more small x than large ones is met. For simplicity, we assume there are three means, m1, m2, and m3. This classification leads to four classes: [minimum, m1], (m1, m2], (m2, m3], (m3, maximum]. In general, it can be represented as a recursive function as follows:
Recursive function Head/tail Breaks: Break the input data (around mean or average) into the head and the tail; // the head for data values greater the mean // the tail for data values less the mean while (head <= 40%): Head/tail Breaks(head); End Function
The resulting number of classes is referred to as ht-index, an alternative index to fractal dimension for characterizing complexity of fractals or geographic features: the higher the ht-index, the more complex the fractals.[2]
Applications
Instead of more or less similar things, there are far more small things than large ones surrounding us. Given the ubiquity of the scaling pattern, head/tail breaks is found to be of use to statistical mapping, map generalization, cognitive mapping and even perception of beauty .[3][4][5] It helps visualize big data, since big data are likely to show the scaling property of far more small things than large ones. The visualization strategy is to recursively drop out the tail parts until the head parts are clear or visible enough [6] In addition, it helps delineate cities or natural cities to be more precise from various geographic information such as street networks, social media geolocation data, and nighttime images.
References
- ↑ Jiang, Bin (2013a). "Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution", The Professional Geographer, 65 (3), 482 – 494.
- ↑ Jiang, Bin and Yin Junjun (2014). "Ht-index for quantifying the fractal or scaling structure of geographic features", Annals of the Association of American Geographers, 104(3), 530–541.
- ↑ Jiang, Bin, Liu, Xintao and Jia, Tao (2013). "Scaling of geographic space as a universal rule for map generalization", Annals of the Association of American Geographers, 103(4), 844 – 855.
- ↑ Jiang, Bin (2013b). "The image of the city out of the underlying scaling of city artifacts or locations", Annals of the Association of American Geographers, 103(6), 1552-1566.
- ↑ Jiang, Bin and Sui, Daniel (2014). "A new kind of beauty out of the underlying scaling of geographic space", The Professional Geographer, 66(4), 676–686
- ↑ Jiang, Bin (2015). "Head/tail breaks for visualization of city structure and dynamics", Cities, 43, 69-77.
Further reading
- Lin, Yue (2013), A comparison study on natural and head/tail breaks involving digital elevation models. http://www.diva-portal.org/smash/get/diva2:658963/FULLTEXT02.pdf
- Ma, Ding (2014), A simple .Net implementation for head/tail breaks applying on single data array, https://github.com/digmaa/HeadTailBreaks
- Wu, Jou-Hsuan (2014), The mirror of the self test: http://sharon19891101.wix.com/mirror-of-the-self