Data classification

From Wikipedia, the free encyclopedia

Data classification is the process of determining class intervals and class boundaries for the data to be mapped. The appropriate number of classes depends in part on the number of observations. Most maps are designed with 4 to 6 classes. A larger number of observations may call for more classes, but too many classes make the map difficult to interpret. Five classification methods are used for making a graduated color or graduated symbol map, and each produces a different pattern in the map display:

1. Natural breaks
2. Quantile
3. Equal interval
4. Standard deviation
5. Equal area

Natural Breaks Classification

Natural breaks classification divides data into classes based on the natural groupings in the data distribution. Breaks can be placed manually by inspecting the distribution, which is a subjective decision, or computed with a statistical formula (Jenks optimization) that groups data values so as to reduce variance within classes and maximize variance between classes. It is the best choice for grouping similar values together. Because the class ranges are specific to the individual dataset, it is difficult to compare one map with another, and it can be difficult to choose the optimum number of classes, especially if the data is evenly distributed.
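The optimization idea can be illustrated with a small sketch. The function below is not taken from any GIS package; it is one possible formulation, assuming that "variance within classes" is measured as the sum of squared deviations from each class mean and that an exhaustive dynamic-programming search over the sorted values is acceptable. The name `jenks_breaks` and the sample data are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class, chosen to minimize the
    total within-class sum of squared deviations (assumed criterion)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice in constant time.
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest total SSD when the first j values form k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # start[k][j]: index at which the last of those k classes begins.
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0

    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i

    # Walk back from the full dataset to recover the last value of each class.
    breaks = []
    j = n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = start[k][j]
    return breaks[::-1]


sample = [1, 2, 3, 4, 5, 6, 50, 100]
natural_3 = jenks_breaks(sample, 3)
natural_4 = jenks_breaks(sample, 4)
```

On this sample the two outliers each end up in a class of their own, which is the behavior the method is valued for. The search is quadratic in the number of values, so it is only meant for illustration on small datasets.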

Quantile Classification

The quantile classification method distributes a set of values into classes that each contain an equal number of values. Because every class receives the same number of data values, there are never empty classes or classes with too few or too many values. The method is attractive in that it always produces distinct map patterns.
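As a minimal sketch of the idea, the breaks can be read off the sorted data at evenly spaced positions. The function name `quantile_breaks` and the sample data are hypothetical, and the sketch assumes there are at least as many values as classes.

```python
def quantile_breaks(values, n_classes):
    """Return the upper bound of each class so that the classes hold
    as nearly equal a number of values as possible."""
    ordered = sorted(values)
    n = len(ordered)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")
    breaks = []
    for k in range(1, n_classes + 1):
        # Position of the last observation that belongs to class k.
        last = (k * n) // n_classes - 1
        breaks.append(ordered[last])
    return breaks


def class_counts(values, breaks):
    """Count how many values fall at or below each successive upper bound."""
    counts = [0] * len(breaks)
    for v in values:
        for i, upper in enumerate(breaks):
            if v <= upper:
                counts[i] += 1
                break
    return counts


sample = [1, 2, 3, 4, 5, 6, 50, 100]
q_breaks = quantile_breaks(sample, 4)
q_counts = class_counts(sample, q_breaks)
```

With this sample every class holds two values, even though the last class spans 50 to 100. If the data contain tied values that straddle a class boundary, the counts can no longer be exactly equal.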

Equal Interval Classification

The equal interval classification method divides a set of attribute values into classes that each span an equal range of values. It is best suited to continuous data. A map designed with equal interval classification is easy to produce and to read. It is not well suited to clustered data, however, because many features may fall into one or two classes while other classes contain no features at all.
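The calculation is simple enough to sketch in a few lines. The function names and the sample data below are hypothetical and chosen to show the clustered-data problem.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each class for classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    # The last bound is the maximum itself, which avoids rounding drift.
    return [lo + width * i for i in range(1, n_classes)] + [hi]


def class_counts(values, breaks):
    """Count how many values fall at or below each successive upper bound."""
    counts = [0] * len(breaks)
    for v in values:
        for i, upper in enumerate(breaks):
            if v <= upper:
                counts[i] += 1
                break
    return counts


sample = [1, 2, 3, 4, 5, 6, 50, 100]
ei_breaks = equal_interval_breaks(sample, 4)
ei_counts = class_counts(sample, ei_breaks)
```

Here the range 1 to 100 is cut into four classes of width 24.75. Six of the eight values land in the first class and the third class is empty, which is the weakness with clustered data noted above.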

Standard Deviation Classification

The standard deviation classification method finds the mean value and then places class breaks above and below the mean at intervals of 0.25, 0.5, or 1 standard deviation until all the data values are contained within the classes. Values more than three standard deviations from the mean are aggregated into two classes: greater than three standard deviations above the mean and less than three standard deviations below the mean.
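A rough sketch of this procedure follows. It makes several assumptions that the description above does not settle: the mean itself is treated as a break, the population standard deviation is used, and a value that equals a break is placed in the class above it. The function name `std_dev_breaks` and the sample data are hypothetical.

```python
import bisect
import statistics


def std_dev_breaks(values, step=1.0, max_sd=3.0):
    """Return breaks at the mean and at multiples of step * sd on either
    side of it, out to max_sd standard deviations (assumed layout)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    n_steps = int(round(max_sd / step))
    return [mean + i * step * sd for i in range(-n_steps, n_steps + 1)]


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
sd_breaks = std_dev_breaks(sample, step=1.0)

# Class 0 collects everything below -3 sd and the last class everything at
# or above +3 sd, so the two tails are each aggregated into a single class.
sd_classes = [bisect.bisect_right(sd_breaks, v) for v in sample]
```

For this sample the breaks fall at -1, 1, 3, 5, 7, 9 and 11, giving eight classes in total. Passing `step=0.5` or `step=0.25` produces the finer intervals mentioned above.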
