Parallel coordinates

Parallel coordinates is a common way of visualizing high-dimensional geometry and analyzing multivariate data.

To show a set of points in an n-dimensional space, a backdrop is drawn consisting of n parallel lines, typically vertical and equally spaced. A point in n-dimensional space is represented as a polyline with vertices on the parallel axes; the position of the vertex on the ith axis corresponds to the ith coordinate of the point.
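The mapping from a point to its polyline can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `to_polyline` and the `spacing` parameter are hypothetical, and the axes are assumed to be the vertical lines x = 0, spacing, 2·spacing, and so on.

```python
def to_polyline(point, spacing=1.0):
    """Return the (x, y) vertices of the polyline for one n-dimensional point.

    Assumption: axis i is the vertical line x = i * spacing, and the vertex
    on that axis sits at height equal to the ith coordinate of the point.
    """
    return [(i * spacing, value) for i, value in enumerate(point)]


# A point in 3-dimensional space becomes a polyline with three vertices.
vertices = to_polyline([2.0, 10.0, -1.0])
print(vertices)
```

Joining consecutive vertices with straight segments gives the polyline that a plotting library would draw.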

1 History
2 Higher dimensions
3 Statistical considerations
4 Generalized parallel coordinates
5 References
6 External links

History

Parallel coordinates were invented by Maurice d'Ocagne in 1885,^[1] and were independently rediscovered and popularised by Al Inselberg^[2] in 1959, then systematically developed as a coordinate system starting in 1977. Important applications include collision avoidance algorithms for air traffic control (1987, three US patents), data mining (US patent), computer vision (US patent), optimization, process control and, more recently, intrusion detection. Most of these applications, and their success, are credited to the landmark paper "Hyperdimensional Data Analysis Using Parallel Coordinates" (Wegman 1990). A generalized parallel coordinate system was proposed by Moustafa and Wegman (2002, 2006), in which the Cartesian coordinate system is transformed into a parameter space (parallel coordinates) using basis functions. The same authors explore the relationships between generalized parallel coordinates, Andrews plots and the grand tour.

Higher dimensions

Adding more dimensions in parallel coordinates (often abbreviated ||-coords or PCs) involves adding more axes. The value of parallel coordinates is that certain geometrical properties in high dimensions transform into easily seen 2D patterns. For example, a set of points on a line in n-space transforms to a set of polylines (or curves) in parallel coordinates all intersecting at n − 1 points. For n = 2 this yields a point ↔ line duality, which is why the mathematical foundations of parallel coordinates are developed in projective rather than Euclidean space. Patterns are also known for (hyper)planes, curves, several smooth (hyper)surfaces, proximities, convexity and, more recently, non-orientability.^[3] Because the process maps n-dimensional data onto a 2D plane, some loss of information is expected. The loss can be measured using Parseval's identity (or energy norm).
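The point ↔ line duality for n = 2 can be checked numerically. The sketch below assumes two vertical axes at x = 0 and x = d, so that a point (a, b) is drawn as the line through (0, a) and (d, b). Under that assumption, all points on the Cartesian line b = m·a + c, with m ≠ 1, are drawn as lines that meet at (d / (1 − m), c / (1 − m)). The function names are illustrative only.

```python
def dual_point(m, c, d=1.0):
    """Intersection point, in the parallel-coordinates plane, of the lines
    representing all points on the Cartesian line b = m * a + c.

    Assumption: the two axes are the vertical lines x = 0 and x = d.
    """
    if m == 1:
        # The representing lines are parallel: they meet only at an ideal
        # point, which is why projective space is the natural setting.
        raise ValueError("slope 1 has no finite dual point")
    return (d / (1 - m), c / (1 - m))


def height_at(a, b, x, d=1.0):
    """Height at horizontal position x of the line through (0, a) and (d, b)."""
    return a + (b - a) * x / d


m, c = -2.0, 3.0
x_star, y_star = dual_point(m, c)

# Every point (a, m * a + c) on the line yields the same height at x_star.
heights = [height_at(a, m * a + c, x_star) for a in (-1.0, 0.0, 0.5, 2.0)]
print(x_star, y_star, heights)
```

The case m = 1 is rejected because the representing lines are then parallel and have no finite intersection.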

Statistical considerations

When used for statistical data visualisation there are three important considerations: the order, the rotation, and the scaling of the axes.

The order of the axes is critical for finding features, and in typical data analysis many reorderings will need to be tried. Some authors have come up with ordering heuristics which may create illuminating orderings.^[4]
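As an illustration of what an ordering heuristic can look like, the following sketch places strongly correlated variables on adjacent axes using a greedy chain. It is a hypothetical example chosen for brevity, not the method of the cited work; the names `pearson` and `greedy_axis_order` are assumptions, and at least two variables are required.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def greedy_axis_order(rows):
    """Return a list of column indices giving a left-to-right axis order."""
    columns = list(zip(*rows))
    n = len(columns)
    sim = [[abs(pearson(columns[i], columns[j])) for j in range(n)] for i in range(n)]

    # Start from the most strongly correlated pair of variables.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: sim[pair[0]][pair[1]],
    )
    order = [first, second]
    remaining = set(range(n)) - {first, second}

    # Repeatedly append the variable most correlated with the last axis.
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda j: sim[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


rows = [[1.0, 2.0, 2.0], [2.0, 1.0, 4.0], [3.0, 4.0, 6.0], [4.0, 3.0, 8.5]]
print(greedy_axis_order(rows))
```

A greedy chain examines only one candidate ordering, so in practice several orderings would still be compared by eye.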

Rotating an axis corresponds to a translation in parallel coordinates: if the lines intersect outside the parallel axes, the intersection can be brought between them by a rotation. The simplest example is rotating an axis by 180 degrees. More details are given by Moustafa and Wegman.^[5]
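The 180-degree case can be illustrated for two variables. The sketch below assumes axes at x = 0 and x = d, and interprets a 180-degree rotation of the second axis as reversing its direction, that is, negating the second variable. Under these assumptions, points on the line b = m·a + c are drawn as lines meeting at x = d / (1 − m); negating the second variable replaces m by −m. All names are illustrative.

```python
def intersection_x(m, d=1.0):
    """Horizontal position where the lines representing b = m * a + c meet.

    Assumption: axes at x = 0 and x = d, and m != 1.
    """
    return d / (1 - m)


def is_between_axes(x, d=1.0):
    """True if x lies strictly between the two axes."""
    return 0 < x < d


m = 2.0                              # positive slope: intersection lies outside
x_before = intersection_x(m)         # left of the first axis
x_after = intersection_x(-m)         # second axis reversed: slope becomes -m

print(x_before, is_between_axes(x_before))
print(x_after, is_between_axes(x_after))
```

For any positive slope other than 1 the intersection lies outside the axes, and reversing one axis moves it between them, where it is easier to see.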

The necessity of scaling stems from the fact that the plot is based on interpolation (linear combination) of consecutive pairs of variables.^[5] The variables must therefore be on a common scale. Many scaling methods can be considered as part of the data preparation process, and some reveal more informative views than others.
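One common choice, shown here only as an example of the many possible scaling methods, is min-max scaling of each variable to the interval [0, 1]. The function name `min_max_scale` and the convention of mapping a constant variable to 0.5 are assumptions of this sketch.

```python
def min_max_scale(rows):
    """Rescale each column of a list of rows to the interval [0, 1].

    Assumption: a constant column carries no information about position,
    so it is mapped to the middle of the axis (0.5).
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.5
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# The second variable is two orders of magnitude larger than the first.
rows = [[1.0, 200.0], [2.0, 400.0], [3.0, 1000.0]]
scaled = min_max_scale(rows)
print(scaled)
```

After scaling, both variables span the full height of their axes, so neither dominates the slopes of the line segments between them.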

Generalized parallel coordinates

The generalized parallel coordinate plot (GPCP) has been proposed^[6] as a generalization of parallel coordinate plots, based on parameter transformation. In this design the data are not plotted raw but are first transformed by an interpolation function. If the interpolation function is piecewise Lagrange, the result is the traditional PCP. If splines are used as the interpolation function, the result is the smooth parallel coordinate plot (SPCP). In the smooth plot, every observation is mapped to a parametric line (or curve) that is smooth, continuous on the axes, and orthogonal to each parallel axis.^[5]
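The difference between a piecewise-linear plot and a smooth one can be sketched by swapping the interpolation function used between consecutive axes. The example below is an assumption-laden illustration, not the construction of the cited papers: it uses a cubic Hermite (smoothstep) segment with zero slope at both ends, which makes the curve cross each vertical axis at a right angle. Axes are assumed to sit at x = 0, 1, ..., n − 1, and all names are hypothetical.

```python
def linear_segment(y0, y1, t):
    """Straight-line interpolation between two axes, t in [0, 1]."""
    return y0 + (y1 - y0) * t


def smooth_segment(y0, y1, t):
    """Cubic Hermite interpolation with zero slope at t = 0 and t = 1."""
    s = t * t * (3.0 - 2.0 * t)
    return y0 + (y1 - y0) * s


def sample_curve(point, segment, samples_per_gap=4):
    """Sample the plotted curve of one observation as a list of (x, y) pairs."""
    samples = []
    for i in range(len(point) - 1):
        for k in range(samples_per_gap):
            t = k / samples_per_gap
            samples.append((i + t, segment(point[i], point[i + 1], t)))
    # Close the curve on the last axis.
    samples.append((float(len(point) - 1), float(point[-1])))
    return samples


observation = [0.0, 1.0, 0.0]
print(sample_curve(observation, linear_segment, samples_per_gap=2))
print(sample_curve(observation, smooth_segment, samples_per_gap=2))
```

Both curves pass through the same vertices on the axes; only the path between axes changes.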

The SPCP design gives a clear quantization level for each data attribute, which can describe its distribution well in complex situations, even with large data sets. Finally, if Fourier interpolation of degree equal to the data dimensionality is used, the result is an Andrews plot.^[7] The GPCP design lets researchers explore alternative interpolation functions that are best suited to a particular application, as well as statistical dualities between the data space and GPC space that are important for visual pattern recognition using GPCP.^[8]
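For reference, the Andrews curve of a point x = (x1, x2, x3, ...) is the finite Fourier series f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., evaluated for t in [−π, π]. The sketch below evaluates this function and numerically checks the energy relation that the integral of f(t)² over [−π, π] equals π times the squared Euclidean norm of x. The function names and the sample count are assumptions of the example.

```python
import math


def andrews_curve(point, t):
    """Evaluate the Andrews curve of an n-dimensional point at parameter t."""
    total = point[0] / math.sqrt(2.0)
    for i, value in enumerate(point[1:], start=1):
        harmonic = (i + 1) // 2  # frequencies 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            total += value * math.sin(harmonic * t)
        else:
            total += value * math.cos(harmonic * t)
    return total


def curve_energy(point, samples=1000):
    """Approximate the integral of f(t)^2 over [-pi, pi] with a Riemann sum."""
    step = 2.0 * math.pi / samples
    return step * sum(
        andrews_curve(point, -math.pi + k * step) ** 2 for k in range(samples)
    )


x = [1.0, 2.0, 3.0]
print(curve_energy(x), math.pi * sum(v * v for v in x))
```

The two printed numbers should agree up to rounding error, which shows that the curve preserves the energy (squared norm) of the data point.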

References

^ d'Ocagne, Maurice (1885). Coordonnées Parallèles et Axiales: Méthode de transformation géométrique et procédé nouveau de calcul graphique déduits de la considération des coordonnées parallèles. Paris: Gauthier-Villars.
^ Alfred Inselberg (1985). "The Plane with Parallel Coordinates". Visual Computer 1 (4): pages 69–91. doi:10.1007/BF01898350.
^ A. Inselberg (2009). Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications. Springer.
^ Interactive Hierarchical Dimension Ordering Spacing and Filtering for Exploration of High Dimensional Datasets (pages 3–4)
^ ^a ^b ^c R. Moustafa, E. Wegman (2006). "Multivariate continuous data – Parallel Coordinates". In: Unwin, A., Theus M., Hofmann, H. (Eds.), Graphics of Large Datasets: Visualizing a Million, Springer: 143–156.
^ R. Moustafa, E. Wegman (2002). "On Some Generalization to Parallel Coordinate Plot". Seeing a million, A Data Visualization Workshop, Rain am Lech (nr.), Germany.
^ D. F. Andrews (1972). "Plots of High-Dimensional Data". Biometrics 28 (1): pages 125–136. JSTOR 2528964.
^ R. Moustafa (2009). "QGPCP: Quantized Generalized Parallel Coordinate Plots for Large Multivariate Data Visualization". Journal of Computational and Graphical Statistics 18 (1): pages 32–51. doi:10.1198/jcgs.2009.0003.

External links

Alfred Inselberg's Homepage, with Visual Tutorial, History, Selected Publications and Applications
Parallel Coordinates: Visual Multidimensional Geometry and Its Applications by Alfred Inselberg, Springer, 2009.
An Investigation of Methods for Visualising Highly Multivariate Datasets by C.Brunsdon, A.S.Fotheringham & M.E.Charlton, University of Newcastle, UK
Parallel coordinates plot in GGobi
Parallel coordinates plot in the public-domain software package XmdvTool
Using Curves to Enhance Parallel Coordinate Visualisations by Martin Graham & Jessie Kennedy, Napier University, Edinburgh, UK
On Some Generalization of Parallel Coordinate Plots by Rida E. Moustafa and Edward J. Wegman (2002), George Mason University, Fairfax, VA
picviz — the graphviz of parallel coordinates licensed under the GNU GPL v3 – implemented in C, with Python bindings used for the GUI.
Clustergram: A graph for visualizing cluster analyses based on the parallel coordinates of each observation's cluster mean over the number of potential clusters (implemented in R).
XDAT – a free GPL Java-based software for plotting parallel coordinates.