User:Zmoboros

Johnson-Lindenstrauss lemma

The Johnson-Lindenstrauss lemma asserts that any set of n points in a high-dimensional Euclidean space can be mapped down into an O(\frac{\log n}{\epsilon^{2}}) dimensional Euclidean space such that the distance between any two points changes by a factor of at most (1 + ε), for any 0 < ε < 1.

Introduction

Johnson and Lindenstrauss {cite} proved a fundamental mathematical result: any n-point set in any Euclidean space can be embedded in O(\log n / \epsilon^2) dimensions without distorting the distance between any pair of points by more than a factor of (1 + ε), for any 0 < ε < 1. The original proof of Johnson and Lindenstrauss was much simplified by Frankl and Maehara {cite}, using geometric insights and refined approximation techniques.

Proof

Suppose we have a set of n d-dimensional points p_1, p_2, \ldots, p_n and we map them down to k = C \frac{\log n}{\epsilon^2} \ll d dimensions, for an appropriate constant C > 1. Define T(\cdot) as the linear map; that is, if x \in R^d, then T(x) \in R^k. For example, T could be a k \times d matrix.
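
As a concrete illustration, here is a minimal Python/NumPy sketch of one standard way to instantiate such a map T: a k \times d matrix with i.i.d. N(0, 1) entries scaled by 1/\sqrt{k}. The function name, the constant C, and the choice of Gaussian entries are illustrative assumptions, not the only valid ones.

import numpy as np

def random_projection(points, eps=0.5, C=4.0, seed=0):
    """Map the rows of `points` (n points in R^d) down to k = ceil(C log n / eps^2) dimensions."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    k = int(np.ceil(C * np.log(n) / eps ** 2))
    # One common choice of the linear map T: i.i.d. Gaussian entries, scaled by 1/sqrt(k).
    T = rng.standard_normal((k, d)) / np.sqrt(k)
    return points @ T.T  # row i is T(p_i) in R^k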

The general proof framework

All known proofs of the Johnson-Lindenstrauss lemma proceed according to the following scheme: for given n and an appropriate k, one defines a suitable probability distribution F on the set of all linear maps T: R^d \to R^k. Then one proves the following statement:

Statement: If T: R^d \to R^k is a random linear map drawn from the distribution F, then for every vector x \in R^d we have

 Prob[(1-\epsilon)||x|| \leq ||T(x)|| \leq (1+\epsilon) ||x||] \geq 1 - \frac{1}{n^2}

Having established this statement for the considered distribution F, the JL result follows easily: we choose T at random according to F. Then for every i < j, using linearity of T and the above Statement with x = p_i - p_j, we get that T fails to satisfy  (1-\epsilon) ||p_i - p_j|| \leq ||T(p_i) - T(p_j)|| \leq (1+\epsilon)||p_i - p_j|| with probability at most \frac{1}{n^2}. Consequently, the probability that any of the \binom{n}{2} pairwise distances is distorted by T by more than a factor of (1 + ε) is at most \binom{n}{2} / n^2 < \frac{1}{2}. Therefore, a random T works with probability at least \frac{1}{2}.
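
The union-bound argument above can also be checked empirically. The following Python sketch (all constants here are illustrative assumptions) projects n random points with a Gaussian T and reports the extreme ratios ||T(p_i) - T(p_j)|| / ||p_i - p_j|| over all pairs; with high probability they lie within [1 - ε, 1 + ε].

import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, d, eps, C = 50, 5000, 0.5, 4.0
P = rng.standard_normal((n, d))                # n points p_1, ..., p_n in R^d
k = int(np.ceil(C * np.log(n) / eps ** 2))
T = rng.standard_normal((k, d)) / np.sqrt(k)   # random linear map T: R^d -> R^k
Q = P @ T.T                                    # projected points T(p_i)

ratios = [np.linalg.norm(Q[i] - Q[j]) / np.linalg.norm(P[i] - P[j])
          for i, j in combinations(range(n), 2)]
print(min(ratios), max(ratios))                # typically within [1 - eps, 1 + eps]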

References

  • S. Dasgupta and A. Gupta, An elementary proof of the Johnson-Lindenstrauss lemma, Tech. Rep. TR-99-06, Intl. Comput. Sci. Inst., Berkeley, CA, 1999.
  • W. Johnson and J. Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26:189--206, 1984.

Fast Monte Carlo algorithms for matrix multiplication

Given an m \times n matrix A and an n \times p matrix B, we present two simple and intuitive algorithms to compute an approximation P to the product AB, with provable bounds for the norm of the "error matrix" P - A \times B. Both algorithms run in O(mp + mn + np) time. In both algorithms, we randomly pick s = O(1) columns of A to form an m \times s matrix S and the corresponding rows of B to form an s \times p matrix R. After scaling the columns of S and the rows of R, we multiply them together to obtain our approximation P. The choice of the probability distribution we use for picking the columns of A, and the scaling, are the crucial features which enable us to give fairly elementary proofs of the error bounds. Our first algorithm can be implemented without storing the matrices A and B in Random Access Memory, provided we can make two passes through the matrices (stored in external memory). The second algorithm has a smaller bound on the 2-norm of the error matrix, but requires storage of A and B in RAM. We also present a fast algorithm that "describes" P as a sum of rank one matrices if B = A.
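
As a rough illustration of the column/row sampling scheme described above, here is a minimal Python/NumPy sketch. The function name, the use of sampling probabilities proportional to the product of column and row norms, and the square-root split of the rescaling factor are assumptions made for illustration, not the authors' exact algorithm.

import numpy as np

def approx_matmul(A, B, s, seed=0):
    """Approximate A @ B by sampling s column/row pairs and rescaling (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Pick column j of A (and row j of B) with probability proportional to the
    # product of their norms; rescale by 1/sqrt(s * p_j) on each side so that
    # S @ R is an unbiased estimate of A @ B.
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = weights / weights.sum()
    idx = rng.choice(n, size=s, replace=True, p=probs)
    scale = 1.0 / np.sqrt(s * probs[idx])
    S = A[:, idx] * scale            # m x s: scaled sampled columns of A
    R = B[idx, :] * scale[:, None]   # s x p: scaled corresponding rows of B
    return S @ R                     # approximation P of A @ B

Increasing s tightens the error bound at the cost of a larger m \times s by s \times p multiplication.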