Stress majorization

From Wikipedia, the free encyclopedia

Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of n, m-dimensional data items, a configuration X of n points in r(<<m)-dimensional space is sought that minimises the so called stress function $σ(X)$ . Usually r is 2 or 3, i.e. the $r\times n$ matrix X lists points in 2- or 3-dimensional Euclidean space so that the result may be visualised (i.e. an MDS plot). The function $σ$ is a loss or cost function that measures the squared differences between ideal ( $m$ -dimensional) distances and actual distances in r-dimensional space. It is defined as:

$\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2$

where $w_{ij}\ge 0$ is a weight for the measurement between a pair of points $(i, j)$ , $d i j (X)$ is the euclidean distance between $i$ and $j$ and $δ i j$ is the ideal distance between the points (their separation) in the $m$ -dimensional data space. Note that $w i j$ can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).

A configuration $X$ which minimises $σ(X)$ gives a plot in which points that are close together correspond to points that are also close together in the original $m$ -dimensional data space.

There are many ways that $σ(X)$ could be minimised. For example, Kruskal^[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimising stress was introduced by Jan de Leeuw.^[2] De Leeuw's iterative majorization method at each step minimises a simple convex function which both bounds $σ$ from above and touches the surface of $σ$ at a point $Z$ , called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by majorizing a complicated function").

[edit] The SMACOF algorithm

The stress function $σ$ can be expanded as follows:

$\sigma(X)=\sum_{i<j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2 =\sum_{i<j}w_{ij}\delta_{ij}^2 + \sum_{i<j}w_{ij}d_{ij}^2(X)-2\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)$

Note that the first term is a constant $C$ and the second term is quadratic in X (i.e. for the Hessian matrix V the second term is equivalent to tr $X' V X$ ) and therefore relatively easily solved. The third term is bounded by:

$\sum_{i<j}w_{ij}\delta_{ij}d_{ij}(X)=\,\operatorname{tr}\, X'B(X)X \ge \,\operatorname{tr}\, X'B(Z)Z$

where $B (Z)$ has:

$b_{ij}=-\frac{w_{ij}\delta_{ij}}{d_{ij}(Z)}$ for $d_{ij}(Z)\ne 0, i \ne j$

and $b i j = 0$ for $d_{ij}(Z)=0, i\ne j$

and $b_{ii}=-\sum_{j=1,j\ne i}^n b_{ij}$ .

Proof of this inequality is by the Cauchy-Schwartz inequality, see Borg^[3] (pp. 152--153).

Thus, we have a simple quadratic function $τ(X, Z)$ that majorizes stress:

$\sigma(X)=C+\,\operatorname{tr}\, X'VX - 2 \,\operatorname{tr}\, X'B(X)X\le C+\,\operatorname{tr}\, X' V X - 2 \,\operatorname{tr}\, X'B(Z)Z = \tau(X,Z)$

The iterative minimization procedure is then:

at the k^th step we set $Z\leftarrow X^{k-1}$
$X^k\leftarrow \min_X \tau(X,Z)$
stop if $σ(X k - 1) - σ(X k) < ε$ otherwise repeat.

This algorithm has been shown to decrease stress monotonically (see de Leeuw^[2]).

[edit] Use in graph drawing

Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.^[4]^[5] That is, one can find a reasonably aesthetically-appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the $δ i j$ are usually set to the graph-theoretic distances between nodes i and j and the weights $w i j$ are taken to be $\delta_{ij}^{-\alpha}$ . Here, $α$ is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for $α = 2$ .^[6]

[edit] References

^ Kruskal, J. B. (1964), “Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis”, Psychometrika 29: 1–27 .
^ ^a ^b de Leeuw, J. (1977), “Applications of convex analysis to multidimensional scaling”, in Barra; Brodeau, F. & Romie, G. et al., Recent developments in statistics, pp. 133-145 .
^ Borg, I. & Groenen, P. (1997), Modern Multidimensional Scaling: theory and applications, New York: Springer-Verlag .
^ Michailidis, G. & de Leeuw, J. (2001), “Data visualization through graph drawing”, Computation Stat. 16 (3): 435–450 .
^ Gansner, E.; Koren, Y. & North, S. (2004), “Graph Drawing by Stress Majorization”, Proceedings of 12th Int. Symp. Graph Drawing (GD'04), vol. 3383, Lecture Notes in Computer Science, Springer-Verlag, pp. 239–250 .
^ Cohen, J. (1997), “Drawing graphs to convey proximity: an incremental arrangement method”, ACM Transactions on Computer-Human Interaction 4 (3): 197–229 .