Talk:Mercer's theorem

From Wikipedia, the free encyclopedia

Two pages link to here: kernel trick and James Mercer. Yet, when I click on "what links here" I am told that no pages link to this one. Michael Hardy 22:22, 24 Aug 2003 (UTC)

Kernel trick is showing up for me -- perhaps the changes just take a while to propagate? -- Tlotoxl 22:33, 24 Aug 2003 (UTC)

I dispute the last 2 days edits to this article: a kernel is not just defined to map two 1-D intervals (a square) to a real. In general, the arguments of the kernel can elements in a measurable set. For example, see [1].

I don't mind using a 1-D interval as an example, but the definitions and theorems in this article should not be limited to 1-D intervals. Right now, the generalization to measurable sets is deferred to the very end, which is very misleading. In general, the kernel trick for the vast majority of cases people are interested in needs to operate on measurable sets (typically R^n) ---hike395 00:33, 13 May 2004 (UTC)



Thanks for the article. The machinery you are talking about deaals with an entirely different subject altogether; that of reproducing kernel Hilbert spaces; the theorems you need are

  • Facts about reproducing kernel Hilbert spaces
  • The spectral theorem for compact operators applied to Hilbert Schmidt operators.

Examples of reproducing kernel spaces are spaces of analytic functions, not the only ones

Mercer's theorem is a much older theorem which existed much before reproducing kernel spaces were introduced by Aronszaijn (sp?) and Bergman.

Yes I have seen othre work on learning involvin reproducing kernel Hilbert spaces. We should start an article on that if something doesn't already exist.CSTAR 01:46, 13 May 2004 (UTC)


Contents

[edit] Reproducing kernel Hilbert space

is now an article. I will add more to it subsequently.CSTAR 02:08, 13 May 2004 (UTC)

I don't have access to the original paper by Mercer. Are you saying that he proved the theorem only for 1-D intervals? If so, we need to fix the kernel trick article and not attribute it to Mercer's theorem. This would mean that the article is still somewhat misleading --- the definition of kernel has changed since Mercer's day, and we should use the true, up-to-date definition. -- hike395 02:28, 13 May 2004 (UTC)

I added a representation for an arbitrary continuous kernel on a compact space. As a matter of expository style, I don't recommend putting it befor the more special form, particularly since that form is the one used in represntations of Stochastic processes via Karhunen Loeve. Yes th ekernel trick stuff needs to be fixed. I'll read the article you referenced.CSTAR 02:33, 13 May 2004 (UTC)

I respectfully disagree. Stochastic processes also are not limited to have index sets that are intervals. In Kriging, for example, geostatisticians use Gaussian processes with index sets that are typically in R^2. Machine learning researchers use Gaussian processes have index sets of quite high dimension.
I'll attempt to make the edit, let's see if we can get agreement.

Yes, of course, you are right about stochastic processes on higher index sets, but the more general case of the kernel representation (the correct generalization which is stated further down) is pedagogically more involved needing a measure witt full support. I still insist that pedagogically this approach is clearer.CSTAR 04:12, 13 May 2004 (UTC)


In fact we can consider kernels on locally comapct spaces and on and on. The generalizations are limitless (we can consider even kernels on non-commutatiive spaces a la Alain Connes ... but these articles also have to be approachable by a novice. And as far as stochastic processes go, the classic I. Guikhman and A. Skorokhod (at least in the old SOviet version translated into French that I have) deals with 1-dimensional index sets.CSTAR 04:17, 13 May 2004 (UTC)



[edit] The Theorem you need?

Here's the proof: If K is a p.d. continuos kernel on a compact space, then there is a Hilbert space H and a measurable function phi:X -> H such that

K(x,y) = <phi(x) , phi(y)>

Proof: Let H = l^2 and let

phi(x) = sum_i sqrt{lambda_i} e_i(x)

By the trace property of the associated operator, this thing is well defined.

I think this is the right statement of the Kernel Trick (I don't think you can get by with L^2 kernels, because you don't get fast enough convergence.CSTAR 04:36, 13 May 2004 (UTC)~

[edit] Further clarification

In fact, the kernel trick mentioned above has really nothing to do with Mercer's theorem. It results from a general procedure for constructing Hilbert spaces out of positive definite functions on arbitrary sets: If K is positive definite function, there is a Hilbert space H and a function phi : X --> H

K(x,y) = <phi(x) , phi(y)>

This is an old trick. Define a (possibly degenerate) inner product on functions of finite support on X using the kernel K as foolws

< a, b > = sum_x,y K(x,y) a(x) b(y)

Divide out by vectors of norm zero ancd complete. This is the oldest trick in the book of Hilbert space constructions. So you can forget about continuity., Mercer's theorem and everything else. CSTAR 05:23, 13 May 2004 (UTC)

[edit] More precise reference

A more precise reference of where to find the proofs of the theorems in the article would be nice.

130.238.5.7 18:47, 26 January 2006 (UTC)

The last two theorems (generalizations of Mercer's theorem) are very interesting but I haven't managed to find them in any of the books cited, does anyone have a reference?

Added a reference to the journal where the theorm was first published and cited the year which I added the other day and someone added the tag that it needed to be cited hope its all following the templates and guidelines I have tried to make sure everything is as it should be - sorry if I haven't done it the proper wiki way.

158.143.77.11 11:41, 7 December 2006 (UTC) Thanks for all the work on this topic. I haven't been able to find the last generalization of Mercer's theorem in any of the books cited, in particular, a proof that: if (X,M,μ) is a σ-finite measure space, and K a symmetric nonnegative definite kernel on (X,M,μ), then K has the given spectral expansion. This is a very nice extension of the unnecessarily restrictive condition on K which is usually given in textbooks, namely that K:[a,b]x[a,b]->R. But: how to prove this, or where to find a proof in the literature? Thanks.

This follows from the structure theorem for Hilbert-Schmidt operators. Note that it is required that K be a square-integrable kernel. I'm sure this is wel-known, but I'll have to dig up a reference.--CSTAR 14:02, 7 December 2006 (UTC)
Thanks for the quick reply, the reference would be very useful. 158.143.77.11 14:42, 7 December 2006 (UTC)