User talk:Nvrmnd


Your redirect of linear system was very clumsy, and I have reverted it. Contrary to what you wrote in the edit summary, there was plenty included in that page which is not included in system of linear equations.

Charles Matthews 07:26, 25 Nov 2004 (UTC)

Hello. I think you used too many capital letters in the title of Sum of Squared Error; I'm going to move the page. Generally, capital letters are not used gratuitously in Wikipedia article titles or in section headings. The first letter of an article title is case-insensitive in links (the later letters are case-sensitive) and always appears as a capital at the top of the article. Proper names and other things that are capitalized in the middle of a sentence are capitalized in article titles; common nouns generally are not. Michael Hardy 03:37, 26 Nov 2004 (UTC)

Yes of course, good idea. Nvrmnd 03:52, 26 Nov 2004 (UTC)

Regarding your redirect to least squares.
Least squares is an optimization technique that minimizes SSE. Now it is perfectly valid for someone in the field of machine learning to say "Print the SSE your model gets on training data X". This sentence has nothing to do with the optimization technique (I may have minimized absolute error for that). I suppose that my argument is that SSE is a difference measure between two vectors and not really an optimization technique. In stats perhaps they are the same thing, but not in general. Nvrmnd 01:41, 27 Nov 2004 (UTC)
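A minimal sketch of the point above: SSE is just a difference measure between two vectors, computed without reference to any optimization procedure. (The function name `sse` and the example vectors are illustrative, not from the article.)

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of squared errors between two equal-length vectors."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sum(residuals ** 2))

# The predictions could come from any model, fit by any criterion;
# SSE is still well-defined as a measure of how far they are from the truth.
print(sse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.25 + 0 + 1 = 1.25
```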

Several problems: (1) I don't think the name makes sense. "Sum of squares of errors" would make more sense, or "sum of squared errors" (although strictly speaking they are residuals rather than errors). (2) In what sense are they either errors or residuals if you're just talking about a difference between two arbitrary vectors? Michael Hardy 04:15, 27 Nov 2004 (UTC)

1) The name "sum of squared errors" is probably best; I dropped the 's' simply because I've most often seen it written without. And yes, error in this sense is synonymous with residual; however, the acronym SSE (which is most common) would no longer work. 2) The article should, I suppose, explain this better, but they are errors because one of the vectors is interpreted as the true value, and we are looking for a measure of how far away from the correct value we are so that it may be minimized. Nvrmnd 05:21, 27 Nov 2004 (UTC)

Function approximation

Function approximation is a general class of problems in which one tries to approximate an unknown function from a labeled data set (X, y).
....
Mathematically the problem can be posed as:
\min_{w} \|Xw - y\|^2.

I do not understand the above. What does "labeled" mean? Is the "data set" simply a finite collection of ordered pairs of numbers? The part following the words "the problem can be posed as" makes no sense at all. Could you make some attempt to explain it in the article? Michael Hardy 04:24, 27 Nov 2004 (UTC)

Yes of course I find it a bit confusing myself. Nvrmnd 05:06, 27 Nov 2004 (UTC)

OK, I can now hazard a guess. In the first place, the fact that it says "min_w" means that it's about optimization, not just about differences between two vectors. In the second place, one may take y to be an n×1 column vector, X to be an n×p matrix, w to be a p×1 column vector, and \|v\|^2 to be the sum of squares of the scalar components of the vector v, for any such v. Then it makes sense. Michael Hardy 23:16, 27 Nov 2004 (UTC)
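The reading above can be sketched concretely: with y as an n×1 vector, X as an n×p matrix, and w as a p×1 vector, the problem \min_w \|Xw - y\|^2 is an ordinary least-squares problem. (The variable names and the noiseless example data here are illustrative assumptions, not from the article.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2
X = rng.normal(size=(n, p))       # n-by-p design matrix
w_true = np.array([2.0, -1.0])    # p-by-1 "unknown function" coefficients
y = X @ w_true                    # n-by-1 labels; noiseless, so minimum SSE is 0

# np.linalg.lstsq returns the w minimizing ||Xw - y||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = float(np.sum((X @ w_hat - y) ** 2))
print(w_hat, sse)
```

Because the labels were generated without noise, the minimizer recovers w_true and drives the sum of squared errors to (numerically) zero, which illustrates why the "min_w" makes this an optimization problem rather than a plain vector difference.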

I'm working on it at the moment, and I will explain that part a bit better. You are correct that we are optimizing some distance metric between two vectors. Nvrmnd 23:19, 27 Nov 2004 (UTC)