Positive-definite kernel

In operator theory, a branch of mathematics, a positive definite kernel is a generalization of a positive-definite matrix.

Definition

Let

\{ H_n \}_{n \in {\mathbb Z}}

be a sequence of (complex) Hilbert spaces and

\mathcal{L}(H_i, H_j)

be the bounded operators from H_i to H_j.

A map A on {\mathbb Z} \times {\mathbb Z} where

A(i,j)\in\mathcal{L}(H_i, H_j)

is called a positive definite kernel if for all m > 0 and h_i \in H_i, the following non-negativity condition holds:

\sum_{-m \leq i, j \leq m} \langle A(i,j) h_i, h_j \rangle \geq 0.
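In the scalar case, where every H_n is C and each A(i, j) is a complex number, the condition says that every finite matrix [A(i, j)], -m ≤ i, j ≤ m, is positive semidefinite. The following minimal sketch (Python with NumPy; the kernel exp(-(i - j)^2) is an assumption chosen purely for illustration) checks this reduction numerically:

import numpy as np

# Scalar case H_n = C: each A(i, j) is a complex number and the condition
# above says every finite matrix [A(i, j)], -m <= i, j <= m, is positive
# semidefinite.  The kernel exp(-(i - j)^2) is an illustrative assumption.

def A(i, j):
    return np.exp(-(i - j) ** 2)

def window_is_psd(kernel, m, tol=1e-10):
    """Check the defining condition on the window -m, ..., m by verifying
    that the Gram matrix [A(i, j)] has no eigenvalue below -tol."""
    idx = range(-m, m + 1)
    gram = np.array([[kernel(i, j) for j in idx] for i in idx], dtype=complex)
    return bool(np.linalg.eigvalsh(gram).min() > -tol)   # gram is Hermitian

print(window_is_psd(A, m=5))   # True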

Examples

Positive definite kernels provide a framework that encompasses some basic Hilbert space constructions.

Reproducing kernel Hilbert space

The definition and characterization of positive kernels extend verbatim to the case where the index set Z is replaced by an arbitrary set X. One can then give a fairly general procedure for constructing Hilbert spaces that is itself of some interest.

Consider the set F_0(X) of complex-valued functions f: X → C with finite support. With the natural operations, F_0(X) is called the free vector space generated by X. Let δ_x be the element of F_0(X) defined by δ_x(y) = δ_{xy} (the Kronecker delta). The set {δ_x}_{x ∈ X} is a vector space basis of F_0(X).

Suppose now that K: X × X → C is a positive definite kernel; then the Kolmogorov decomposition of K gives a Hilbert space

 ({\mathcal H}, \langle \cdot, \cdot \rangle)

where F_0(X) is "dense" (after possibly passing to the quotient by the degenerate subspace). Also, ⟨[δ_x], [δ_y]⟩ = K(x, y), which is a special case of the square root factorization claim below. This Hilbert space is called the reproducing kernel Hilbert space with kernel K on the set X.
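A minimal numerical sketch of this construction, assuming a finite set X and a scalar Gaussian kernel chosen purely for illustration: elements of F_0(X) are stored as coefficient vectors in the basis {δ_x}, and the (possibly degenerate) inner product is induced by the kernel.

import numpy as np

# Sketch of the construction for a finite set X and a scalar kernel
# (the Gaussian kernel below is an illustrative assumption).  Elements of
# F_0(X) are stored as coefficient vectors in the basis {delta_x}.

X = np.array([0.0, 0.5, 1.3, 2.0])            # the underlying set
K = np.exp(-np.subtract.outer(X, X) ** 2)     # Gram matrix of K(x, y) on X

def inner(f, g):
    """(Possibly degenerate) inner product <f, g> = sum_{x,y} f(x) K(x, y) conj(g(y))."""
    return f @ K @ np.conj(g)

delta = np.eye(len(X))                        # delta_x as standard basis vectors
# The defining property <delta_x, delta_y> = K(x, y):
print(np.isclose(inner(delta[1], delta[2]), K[1, 2]))   # True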

Notice that in this context, we have (from the definition above)

\{ H_n \}_{n \in {\mathbb Z}}

being replaced by

 \{ {\mathbb C} \}_{x \in X}.

Thus the Kolmogorov decomposition, which is unique up to isomorphism, starts with F_0(X).

One can readily show that every Hilbert space H is isomorphic to a reproducing kernel Hilbert space on a set X whose cardinality is the Hilbert space dimension of H. Let {e_x}_{x ∈ X} be an orthonormal basis of H. Then the kernel K defined by K(x, y) = ⟨e_x, e_y⟩ = δ_{xy} reproduces a Hilbert space H'. The bijection taking e_x to δ_x extends to a unitary operator from H to H'.

Direct sum and tensor product

Let H(K, X) denote the Hilbert space corresponding to a positive kernel K on X × X. The structure of H(K, X) is encoded in K. One can thus describe, for example, the direct sum and the tensor product of two Hilbert spaces via their kernels.

Consider two Hilbert spaces H(K, X) and H(L, Y). The disjoint union of X and Y is the set

X \sqcup Y = \{ (x, \xi)| x \in X\} \cup \{ (\xi, y)| y \in Y \} .

Define a kernel

 K \oplus L

on this disjoint union in a way analogous to the direct sum of positive matrices: it agrees with K on pairs from X, with L on pairs from Y, and vanishes on mixed pairs. The resulting Hilbert space

H( K \oplus L , X \sqcup Y)

is then the direct sum, in the sense of Hilbert spaces, of H(K, X) and H(L, Y).
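A minimal sketch of this in the scalar, finite case (the two Gaussian kernels below are illustrative assumptions): the Gram matrix of K ⊕ L on X ⊔ Y is block diagonal, so it inherits positivity from K and L.

import numpy as np

# Sketch in the scalar, finite case (the two kernels are illustrative
# assumptions): the Gram matrix of K ⊕ L on X ⊔ Y is block diagonal,
# mirroring the direct sum of positive matrices.

X = np.array([0.0, 1.0, 2.0])
Y = np.array([0.0, 0.7])
K = np.exp(-np.subtract.outer(X, X) ** 2)         # kernel on X
L = np.exp(-0.5 * np.subtract.outer(Y, Y) ** 2)   # kernel on Y

direct_sum = np.block([
    [K, np.zeros((len(X), len(Y)))],              # K on pairs from X
    [np.zeros((len(Y), len(X))), L],              # L on pairs from Y, 0 on mixed pairs
])
print(np.linalg.eigvalsh(direct_sum).min() >= -1e-10)   # True: still positive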

For the tensor product, a suitable kernel

K \otimes L

is defined on the Cartesian product X × Y in a way that extends the Schur product of positive matrices:

(K \otimes L) ((x,y), (x', y')) = K(x, x') L(y, y').

This positive kernel gives the tensor product of H(K, X) and H(L, Y),

H (K \otimes L, X \times Y)

in which the family { [δ_{(x,y)}] } is a total set, i.e. its linear span is dense.
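A corresponding sketch for the tensor product, again for scalar kernels on finite sets (illustrative assumptions): the Gram matrix of K ⊗ L on X × Y is the Kronecker product of the two Gram matrices and is again positive.

import numpy as np

# Sketch for scalar kernels on finite sets (illustrative assumptions):
# (K ⊗ L)((x, y), (x', y')) = K(x, x') L(y, y'), so the Gram matrix of
# K ⊗ L on X × Y is the Kronecker product of the two Gram matrices.

X = np.array([0.0, 1.0, 2.0])
Y = np.array([0.0, 0.7])
K = np.exp(-np.subtract.outer(X, X) ** 2)
L = np.exp(-0.5 * np.subtract.outer(Y, Y) ** 2)

tensor = np.kron(K, L)                    # indexed by pairs (x, y), (x', y')
print(tensor.shape)                       # (6, 6)
print(np.linalg.eigvalsh(tensor).min() >= -1e-10)   # True: still positive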

Characterization

Motivation

Consider a positive matrix A ∈ C^{n × n}, whose entries are complex numbers. Every such matrix A has a "square root factorization" in the following sense:

A = B*B, where B: C^n → H_A for some (finite-dimensional) Hilbert space H_A.

Furthermore, if C and G are another such pair, C an operator and G a Hilbert space, for which the above is true, then there exists a unitary operator U: G → H_A such that B = UC.

This can be shown readily as follows. The matrix A induces a (possibly degenerate) inner product ⟨·, ·⟩_A given by ⟨x, y⟩_A = ⟨x, Ay⟩. Taking the quotient with respect to the degenerate subspace gives a Hilbert space H_A, a typical element of which is an equivalence class we denote by [x].

Now let B: C^n → H_A be the natural projection map, Bx = [x]. One can calculate directly that

\langle x, B^*By\rangle = \langle Bx, By\rangle_A = \langle[x], [y]\rangle_A = \langle x, Ay\rangle.

So B*B = A. If C and G are another such pair, it is clear that the operator U: G → H_A that takes [x]_G in G to [x] in H_A has the properties claimed above.

If {e_i} is a given orthonormal basis of C^n, then {B_i = Be_i} are the column vectors of B. The expression A = B*B can be rewritten as A_{i,j} = B_i* B_j. By construction, H_A is the linear span of {B_i}.
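A numerical sketch of the square root factorization (the 2 × 2 matrix A below is an illustrative assumption; B is built from an eigendecomposition, one of many valid choices):

import numpy as np

# Numerical sketch of the square root factorization A = B* B (the matrix A
# is an illustrative assumption; B is built from an eigendecomposition).

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                       # a positive matrix

w, V = np.linalg.eigh(A)                         # A = V diag(w) V*
B = np.diag(np.sqrt(w)) @ V.conj().T             # then B* B = A

print(np.allclose(B.conj().T @ B, A))            # True

# The columns B_i = B e_i realize A as a Gramian matrix: A_{ij} = B_i* B_j.
gram = np.array([[B[:, i].conj() @ B[:, j] for j in range(2)] for i in range(2)])
print(np.allclose(gram, A))                      # True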

Kolmogorov decomposition

The preceding discussion shows that every positive matrix A with complex entries can be expressed as a Gramian matrix. A similar description can be obtained for general positive definite kernels, with an analogous argument. This is called the Kolmogorov decomposition:

Let A be a positive definite kernel. Then there exists a Hilbert space H_A and a map B defined on Z, where B(n) lies in \mathcal{L}(H_n, H_A), such that
\quad A(i,j) = B^*(i)B(j) \quad \mbox{and} \quad H_A = \bigvee_{n \in {\mathbb Z}} B(n) H_n \;,

where ⋁ denotes the closed linear span. The condition that H_A = ⋁ B(n)H_n is referred to as the minimality condition. As in the scalar case, this requirement implies unitary freedom in the decomposition:

If there is a Hilbert space G and a map C on Z that gives a Kolmogorov decomposition of A, then there is a unitary operator
U: G \rightarrow H_A \quad \mbox{such that} \quad UC(n) = B(n) \quad \mbox{for all} \quad n \in {\mathbb Z}.
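The unitary freedom can be illustrated in the scalar, finite-dimensional case: two factorizations A = B*B = C*C of the same positive matrix (an illustrative assumption below) are related by a unitary U with UC = B.

import numpy as np

# Sketch of the unitary freedom in the scalar, finite-dimensional case
# (the matrix A below is an illustrative assumption): two factorizations
# A = B* B = C* C are related by a unitary U with U C = B.

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

w, V = np.linalg.eigh(A)
B = np.diag(np.sqrt(w)) @ V.conj().T     # one factorization, A = B* B
C = np.linalg.cholesky(A).conj().T       # another one, A = C* C (Cholesky)

# U = B C^{-1} sends each column of C to the corresponding column of B,
# and U* U = C^{-*} (B* B) C^{-1} = C^{-*} A C^{-1} = I, so U is unitary.
U = B @ np.linalg.inv(C)
print(np.allclose(U.conj().T @ U, np.eye(2)))   # True: U is unitary
print(np.allclose(U @ C, B))                    # True: U C = B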

Some applications

Embedding probability distributions in a RKHS

In machine learning, a class of algorithms based on the kernel embedding of distributions has been formulated to represent probability distributions as functions in a RKHS. This embedding thus allows manipulations of the distributions to be done via Hilbert space operations.
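A minimal sketch of the idea, assuming a scalar Gaussian kernel and synthetic one-dimensional samples (both purely illustrative): a sample is represented by its empirical kernel mean embedding, and the RKHS distance between two embeddings (the maximum mean discrepancy) is computable from kernel evaluations alone.

import numpy as np

# Minimal sketch of the kernel mean embedding, assuming a scalar Gaussian
# kernel and synthetic one-dimensional samples (both purely illustrative).
# A sample {x_1, ..., x_n} from P is represented by the RKHS element
# mu_P = (1/n) sum_i K(x_i, .); distances between embeddings (the maximum
# mean discrepancy, MMD) are computed from kernel evaluations alone.

def K(a, b):
    return np.exp(-np.subtract.outer(a, b) ** 2)

def mmd_squared(x, y):
    """Estimate ||mu_P - mu_Q||^2 from samples x ~ P and y ~ Q."""
    return K(x, x).mean() - 2.0 * K(x, y).mean() + K(y, y).mean()

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=500)   # sample from P
z = rng.normal(loc=0.0, size=500)   # second sample from P
y = rng.normal(loc=1.0, size=500)   # sample from Q (shifted mean)

print(mmd_squared(x, z))            # close to 0: same underlying distribution
print(mmd_squared(x, y))            # clearly positive: different distributions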
