Ordinal data

Ordinal data is a categorical, statistical data type where the variables have natural, ordered categories and the distances between the categories are not known.^[1]^:2 These data exist on an ordinal scale, one of four levels of measurement described by S. S. Stevens in 1946. The ordinal scale is distinguished from the nominal scale by having ordered categories. It also differs from interval and ratio scales by not having category widths that represent equal increments of the underlying attribute.^[2]

Examples of ordinal data

A well-known example of ordinal data is the Likert scale. An example of a Likert scale is:^[3]^:685

Like	Like Somewhat	Neutral	Dislike Somewhat	Dislike
1	2	3	4	5

Examples of ordinal data are often found in questionnaires: for example, the survey question "Is your general health poor, reasonable, good, or excellent?" may have those answers coded respectively as 1, 2, 3, and 4. Sometimes data on an interval scale or ratio scale are grouped onto an ordinal scale: for example, individuals whose income is known might be grouped into the income categories $0-$19,999, $20,000-$39,999, $40,000-$59,999, ..., which then might be coded as 1, 2, 3, 4, .... Other examples of ordinal data include socioeconomic status, military ranks, and letter grades for coursework.^[4]
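The coding described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample inputs, and the choice to stop the income cut points at $60,000 are assumptions made for the example, not part of the cited sources.

```python
from bisect import bisect_right

# Ordered categories from the survey question; list position gives the code.
HEALTH_LEVELS = ["poor", "reasonable", "good", "excellent"]

def code_health(answer: str) -> int:
    """Return the 1-based ordinal code for a health response."""
    return HEALTH_LEVELS.index(answer) + 1

# Lower bounds of the second and later income brackets.
# Assumption: only the brackets listed in the text are modelled, so every
# income of $60,000 or more falls into code 4.
INCOME_CUTS = [20_000, 40_000, 60_000]

def code_income(income: float) -> int:
    """Return the bracket code: 1 for $0-$19,999, 2 for $20,000-$39,999, and so on."""
    return bisect_right(INCOME_CUTS, income) + 1

health_codes = [code_health(a) for a in ["good", "poor", "excellent"]]
income_codes = [code_income(x) for x in [5_000, 19_999, 20_000, 45_000]]
```

The resulting codes preserve the order of the categories only; the difference between two codes is not a measured distance.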

Ways to analyze ordinal data

Ordinal data call for different analyses than other qualitative variables. Methods for ordinal data incorporate the natural ordering of the categories in order to avoid loss of power.^[1]^:88 Computing a mean or standard deviation for ordinal data is often discouraged; measures such as the median or mode are generally recommended instead.^[5]

General

Stevens (1946) argued that, because the assumption of equal distance between categories does not hold for ordinal data, the use of means and standard deviations for description of ordinal distributions and of inferential statistics based on means and standard deviations was not appropriate. Instead, positional measures like the median and percentiles, in addition to descriptive statistics appropriate for nominal data (number of cases, mode, contingency correlation), should be used.^[2]^:678 Nonparametric methods have been proposed as the most appropriate procedures for inferential statistics involving ordinal data, especially those developed for the analysis of ranked measurements.^[4]^:25–28 However, use of parametric statistics for ordinal data may be permissible with certain caveats to take advantage of the greater range of available statistical procedures.^[6]^[7]^[3]^:90

Univariate statistics

In place of means and standard deviations, univariate statistics appropriate for ordinal data include the median,^[8]^:59–61 other percentiles (such as quartiles and deciles),^[8]^:71 and the quartile deviation.^[8]^:77 One-sample tests for ordinal data include the Kolmogorov-Smirnov one-sample test,^[4]^:51–55 the one-sample runs test,^[4]^:58–64 and the change-point test.^[4]^:64–71
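As a rough illustration of these positional measures, the sketch below computes a median, quartiles, and the quartile deviation from coded responses. The quantile rule used here (the smallest observed code with at least a fraction p of the observations at or below it) is one common convention and is an assumption of this example; the helper name and the sample data are hypothetical.

```python
import math

def ordinal_quantile(codes, p):
    """Smallest observed code with at least a fraction p of observations at or below it."""
    ordered = sorted(codes)
    rank = max(1, math.ceil(p * len(ordered)))  # 1-based rank into the sorted data
    return ordered[rank - 1]

# Hypothetical responses on a five-point scale.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

median = ordinal_quantile(responses, 0.50)
q1 = ordinal_quantile(responses, 0.25)
q3 = ordinal_quantile(responses, 0.75)

# Quartile deviation: half the distance between the quartiles, in code units.
quartile_deviation = (q3 - q1) / 2
```

Because this rule always returns a value that was actually observed, the median and quartiles correspond to real categories rather than to points between them.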

Bivariate statistics

In lieu of testing differences in means with t-tests, differences in distributions of ordinal data from two independent samples can be tested with Mann-Whitney,^[8]^:259–264 runs,^[8]^:253–259 Smirnov,^[8]^:266–269 and signed-ranks^[8]^:269–273 tests. Tests for two related or matched samples include the sign test^[4]^:80–87 and the Wilcoxon signed ranks test.^[4]^:87–95 Analysis of variance with ranks^[8]^:367–369 and the Jonckheere test for ordered alternatives^[4]^:216–222 can be conducted with ordinal data in place of independent-samples ANOVA. Tests for more than two related samples include the Friedman two-way analysis of variance by ranks^[4]^:174–183 and the Page test for ordered alternatives.^[4]^:184–188 Correlation measures appropriate for two ordinal-scaled variables include Kendall's tau,^[8]^:436–439 gamma,^[8]^:442–443 r_s,^[8]^:434–436 and d_yx/d_xy.^[8]^:443
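Several of the correlation measures listed above are built from counts of concordant and discordant pairs. The sketch below shows this for the gamma statistic, taken here as (C − D)/(C + D), where C and D are the numbers of concordant and discordant pairs and tied pairs are ignored. The function names and the sample data are hypothetical, and the code is a plain pairwise count rather than an optimized routine.

```python
from itertools import combinations

def concordant_discordant(x, y):
    """Count pairs of observations ordered the same way (C) or oppositely (D) on x and y."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0: the pair is tied on x or y and is not counted
    return concordant, discordant

def gamma_statistic(x, y):
    """Gamma = (C - D) / (C + D); undefined when every pair is tied."""
    c, d = concordant_discordant(x, y)
    if c + d == 0:
        raise ValueError("gamma is undefined when all pairs are tied")
    return (c - d) / (c + d)

# Hypothetical ordinal codes for five respondents on two variables.
x = [1, 2, 2, 3, 4]
y = [1, 1, 3, 2, 4]
gamma = gamma_statistic(x, y)
```

Only the sign of each difference enters the calculation, so the result depends on the ordering of the codes and not on their spacing.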

Regression applications

Ordinal data can be considered as a quantitative variable. In logistic regression, the equation

logit[P(Y=1)]=\alpha +\beta _{1}c+\beta _{2}x

is the model and c takes on the assigned levels of the categorical scale.^[1]^:189 In regression analysis, outcomes (dependent variables) that are ordinal variables can be predicted using a variant of ordinal regression, such as ordered logit or ordered probit.
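To give a sense of how an ordered logit model assigns probabilities to ordered outcome categories, the following sketch evaluates a cumulative-logit model for a single predictor. It assumes the parameterization logit P(Y ≤ j) = θ_j − βx with increasing thresholds θ_j; other sources write the linear term with the opposite sign. The threshold and slope values are hypothetical and are not fitted to any data.

```python
import math

def ordered_logit_probs(x, thresholds, beta):
    """Category probabilities under an ordered logit model.

    Assumes logit P(Y <= j) = thresholds[j] - beta * x, with thresholds
    sorted in increasing order; returns len(thresholds) + 1 probabilities.
    """
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(t - beta * x))) for t in thresholds]
    cumulative.append(1.0)
    # Differences of successive cumulative probabilities give each category.
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

# Hypothetical parameters for a four-category outcome.
probs = ordered_logit_probs(x=0.5, thresholds=[-1.0, 0.0, 1.5], beta=0.8)
```

Under this sign convention, a positive β moves probability toward the higher categories as x increases.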

In multiple regression/correlation analysis, ordinal data can be accommodated using power polynomials and through normalization of scores and ranks.^[9]

Linear trends

Linear trends are also used to find associations between ordinal data and other categorical variables, normally in a contingency table. A correlation r is found between the variables, where r lies between -1 and 1. To test the trend, a test statistic:

M^{2}=(n-1)r^{2}

is used where n is the sample size.^[1]^:87

The correlation r can be found by letting $u_{1}\leq u_{2}\leq \cdots \leq u_{I}$ be the row scores and $v_{1}\leq v_{2}\leq \cdots \leq v_{J}$ be the column scores. Let ${\bar {u}}=\sum _{i}u_{i}p_{i+}$ be the mean of the row scores and ${\bar {v}}=\sum _{j}v_{j}p_{+j}$ be the mean of the column scores, where $p_{i+}$ is the marginal row probability and $p_{+j}$ is the marginal column probability. Then r is calculated by:

r={\frac {\sum _{i,j}\left(u_{i}-{\bar {u}}\right)\left(v_{j}-{\bar {v}}\right)p_{ij}}{\sqrt {\left[\sum _{i}\left(u_{i}-{\bar {u}}\right)^{2}p_{i+}\right]\left[\sum _{j}\left(v_{j}-{\bar {v}}\right)^{2}p_{+j}\right]}}}
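The calculation of r and of the statistic M² = (n − 1)r² can be sketched as follows for a table of counts. The function name, the equally spaced scores, and the counts in the example are assumptions chosen for illustration. In practice M² is usually compared with a chi-squared distribution with one degree of freedom, a step this sketch leaves out.

```python
import math

def linear_trend_test(table, row_scores, col_scores):
    """Return (r, M2) for a two-way table of counts with the given row and column scores."""
    n = sum(sum(row) for row in table)
    p = [[cell / n for cell in row] for row in table]   # joint proportions p_ij
    p_row = [sum(row) for row in p]                     # marginals p_i+
    p_col = [sum(col) for col in zip(*p)]               # marginals p_+j

    u_bar = sum(u * pi for u, pi in zip(row_scores, p_row))
    v_bar = sum(v * pj for v, pj in zip(col_scores, p_col))

    covariance = sum(
        (u - u_bar) * (v - v_bar) * p[i][j]
        for i, u in enumerate(row_scores)
        for j, v in enumerate(col_scores)
    )
    var_u = sum((u - u_bar) ** 2 * pi for u, pi in zip(row_scores, p_row))
    var_v = sum((v - v_bar) ** 2 * pj for v, pj in zip(col_scores, p_col))

    r = covariance / math.sqrt(var_u * var_v)
    m2 = (n - 1) * r ** 2
    return r, m2

# Hypothetical 3x3 table of counts with scores 1, 2, 3 on both margins.
table = [
    [20, 10, 5],
    [10, 15, 10],
    [5, 10, 20],
]
r, m2 = linear_trend_test(table, row_scores=[1, 2, 3], col_scores=[1, 2, 3])
```

The value of r, and therefore of M², depends on the scores chosen for the categories, so a different choice of scores can give a different result.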

Classification methods

Classification methods have also been developed for ordinal data. The data are divided into categories such that the observations within each category are similar to one another. Dispersion is measured and minimized in each group to maximize classification results. The dispersion function is used in information theory.^[10]

Visualization and display

Ordinal data can be visualized in several different ways. Common visualizations are the bar chart and the pie chart. Tables can also be useful for displaying ordinal data and frequencies. Mosaic plots can be used to show the relationship between an ordinal variable and a nominal or ordinal variable.^[11] A bump chart, a line chart that shows the relative ranking of items from one time point to the next, is also appropriate for ordinal data.^[12]

Color or grayscale gradation can be used to represent the ordered nature of the data. A single-direction scale, such as income ranges, can be represented with a bar chart where increasing (or decreasing) saturation or lightness of a single color indicates higher (or lower) income. The ordinal distribution of a variable measured on a dual-direction scale, such as a Likert scale, could also be illustrated with color in a stacked bar chart. A neutral color (white or gray) might be used for the middle (zero or neutral) point with contrasting colors used in the opposing directions from the midpoint, where increasing saturation or darkness of the colors could indicate categories at increasing distance from the midpoint.^[13] Choropleth maps also use color or grayscale shading to display ordinal data.^[14]

Example bar plot of opinion on defense spending.

Example bump plot of opinion on defense spending by political party.

Example mosaic plot of opinion on defense spending by political party.

Example stacked bar plot of opinion on defense spending by political party.

Applications

Ordinal data are used in most areas of research where categorical data are generated. Settings where ordinal data are often collected include the social and behavioral sciences and governmental and business settings where measurements are collected from persons by observation, testing, or questionnaires. Some common contexts for the collection of ordinal data include survey research^[15]^[16] and intelligence, aptitude, and personality testing.^[3]^:89–90

References

1. Agresti, Alan (2013). Categorical Data Analysis (3rd ed.). Hoboken, New Jersey: John Wiley & Sons. ISBN 978-0-470-46363-5.
2. Stevens, S. S. (1946). "On the Theory of Scales of Measurement". Science. New Series. 103 (2684): 677–680.
3. Cohen, Ronald Jay; Swerdlik, Mark E.; Phillips, Suzanne M. (1996). Psychological Testing and Assessment: An Introduction to Tests and Measurement (3rd ed.). Mountain View, CA: Mayfield. p. 685. ISBN 1-55934-427-X.
4. Siegel, Sidney; Castellan, N. John, Jr. (1988). Nonparametric Statistics for the Behavioral Sciences (2nd ed.). Boston: McGraw-Hill. pp. 25–26. ISBN 0-07-057357-3.
5. Jamieson, Susan (December 2004). "Likert scales: how to (ab)use them". Medical Education. 38 (12): 1212–1218. doi:10.1111/j.1365-2929.2004.02012.x.
6. Sarle, Warren S. (September 14, 1997). "Measurement theory: Frequently asked questions".
7. van Belle, Gerald (2002). Statistical Rules of Thumb. New York: John Wiley & Sons. pp. 23–24. ISBN 0-471-40227-3.
8. Blalock, Hubert M., Jr. (1979). Social Statistics (Rev. 2nd ed.). New York: McGraw-Hill. ISBN 0-07-005752-4.
9. Cohen, Jacob; Cohen, Patricia (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, New Jersey: Lawrence Erlbaum Associates. p. 273. ISBN 0-89859-268-2.
10. Laird, Nan M. (1979). "A Note on Classifying Ordinal-Scale Data". Sociological Methodology. 10: 303–310. doi:10.2307/270775.
11. "Plotting Techniques".
12. Berinato, Scott (2016). Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations. Boston: Harvard Business Review Press. p. 228. ISBN 978-1633690707.
13. Kirk, Andy (2016). Data Visualisation: A Handbook for Data Driven Design (1st ed.). London: SAGE. p. 269. ISBN 978-1473912144.
14. Cairo, Alberto (2016). The Truthful Art: Data, Charts, and Maps for Communication (1st ed.). San Francisco: New Riders. p. 280. ISBN 978-0321934079.
15. Alwin, Duane F. (2010). "Assessing the Reliability and Validity of Survey Measures". In Marsden, Peter V.; Wright, James D. (eds.). Handbook of Survey Research. Bingley, UK: Emerald House. p. 420. ISBN 978-1-84855-224-1.
16. Fowler, Floyd J., Jr. (1995). Improving Survey Questions: Design and Evaluation. Thousand Oaks, CA: SAGE. pp. 156–165. ISBN 0-8039-4583-3.