Talk:Covariance
From Wikipedia, the free encyclopedia
[edit] Unclear line 3
I think the following is unclear, can someone clarify it? (Since it's not clear to me, I doubt I'm the right one to clarify it...) (3) positive definite: Var(X) = Cov(X, X) ≥ 0, and Cov(X, X) = 0 -> X is a constant random variable (K). Pt314156 15:38, 27 April 2007 (UTC)pt314156
[edit] Miscellaneous
I question the phrase 'Random Variables' - surely if they were truly random variables, there could be no question of any meaningful correlation, and the whole concept would lose all utility - <email removed> 12-July-2005 00:59 Sydney Earth :)
- Please don't be a total crackpot. If you want tutoring or instruction on these concepts, ask someone. Don't use the word "would", as if you were talking about something hypothetical. Admit your ignorance. Simple example: Pick a married couple randomly from a large population. The husband's height is a random variable, the randomness coming from the fact that the couple was chosen randomly. The wife's height, for the same reason, is a random variable. The two are correlated, in that the conditional probability that the wife is more than six feet tall, given that the husband is more than six-and-a-half feet tall, is different from the unconditional probability that the wife is more than six feet tall. See the Wikipedia article on statistical independence of random variables; some pairs of random variables are independent; some are not. Another example: the square of the husband's age is a random variable; the husband's age is another random variable. And they are correlated. As for utility, just look at all of the practical applications of these concepts in statistical analysis. Michael Hardy 21:04, 11 July 2005 (UTC)
not sure about my edit; what are μ and ν?
The converse however is not true? Do you mean that covariance could be 0 with dependent variables?
- Yes. Example: y = x² (for x distributed symmetrically on −1 ≤ x ≤ 1). Cov = 0. Dependent? Yes, it's a function!
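A quick numerical check of this example (my own illustration, not part of the original discussion; it assumes NumPy and a grid of x values symmetric about 0):

```python
import numpy as np

# x symmetric about 0, y a deterministic function of x
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

# Off-diagonal entry of the 2x2 sample covariance matrix
cov_xy = np.cov(x, y)[0, 1]
# cov_xy is ~0 (zero up to floating-point rounding), even though
# y is completely determined by x: uncorrelated but dependent.
```

So zero covariance rules out a *linear* relationship, not dependence in general.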
It is not clear what 'E' means in the equation E(XY).
E(XY) = (1/N)*sum((X_i*Y_i), i=1..N), perhaps the sum definition should be stated explicitly in this article as well as in the expected value article? --dwee
- That formula is correct only in the case in which the number of possible values of X and Y is some finite number N. More generally, the expectation could be an infinite series or an integral. At any rate, E(X) is the expected value of the random variable X. Michael Hardy 01:59, 7 Oct 2004 (UTC)
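As a concrete illustration of the finite equal-weight case mentioned above (my own example, not from the thread):

```python
import numpy as np

# E(XY) for a finite sample with equal weights 1/N is just the
# average of the products X_i * Y_i.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
e_xy = (x * y).mean()   # (1/N) * sum(X_i * Y_i) = (4 + 10 + 18) / 3
```

In the general case the equal weights 1/N are replaced by probabilities (or a density, giving an integral).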
Just to note that the second equation is not rendered.
- It is on the browser I am using. Michael Hardy 20:26, 6 Jan 2004 (UTC)
the covariance definition for vector-valued random variables looks very sophisticated...
- That seems like a completely subjective statement. Michael Hardy 21:51, 3 Jan 2005 (UTC)
what is the actual reason for defining a thing like this? Why not put the E(X_iY_j) entries into a table,
- That's exactly what it says is done! Michael Hardy 21:51, 3 Jan 2005 (UTC)
why is a matrix needed?
- The only difference between a "table" and a matrix is that one has a definition of matrix multiplication. And one often has occasion to multiply these matrices. Michael Hardy 21:51, 3 Jan 2005 (UTC)
your "explanation" does not explain anything, as any table can be treated as a matrix and multiplied with other tables. This of course does not make much sense in general. So the actual question is: why does the definition (and multiplication) of a matrix with entries E(X_iY_j) make sense?
- Your original question was completely unclear, to say the least. Maybe I'll write something on motivation of this definition at some point. Michael Hardy 20:23, 4 Jan 2005 (UTC)
- "For column-vector valued random variables X and Y with respective expected values μ and ν, and n and m scalar components respectively, the covariance is defined to be the n×m matrix"
- I don't get it either; how do you get a matrix as the cov(X,Y) when normally it is a scalar? Probably I am not understanding how you are defining X and Y. To me it sounds like the m or n components are just sample values of random variables X and Y? --Chinasaur 02:00, 1 Apr 2005 (UTC)
___
I'm sorry but this page makes UTTERLY NO SENSE, could someone please add in a paragraph explaining things for those of us that aren't super mathematicians? 203.112.19.195 16:24, 25 July 2005 (UTC)
___
The covariance of two column vectors is stated to generate a matrix. Is there a similar function to covariance which generates a single scalar instead of a matrix, by instead multiplying the transpose of the first term against the unaltered column vector? Is there a reference where we could find the derivations of these terms? Ben hall 15:09, 16 September 2005 (UTC)
- If X and Y are both n × 1 random vectors, so that cov(X,Y) is n × n, then the trace of the covariance may perhaps be what you're looking for. Michael Hardy 18:29, 16 September 2005 (UTC)
- Sorry, but I think I may have expressed myself poorly. I was thinking in terms of a list of variables which could be described as a vector, or as a matrix. For example if I have the cartesian coordinates of particles in a box over a period of time, I see how I can find the covariance matrix based on each of the components for each of the particles but I cannot see how I might find a covariance matrix based solely on the motions of the particles with respect to one another (ie if they are moving in the same direction or opposing directions). For this would it be suitable to take the inner product of the differences between the cartesian coordinates and their averages? Also, how could I show that this is a valid approach? Thanks for your suggestions so far. Ben hall 20:02, 17 September 2005 (UTC)
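For what it's worth, the trace suggestion above can be sketched numerically (my own illustration, assuming NumPy; the data are arbitrary): the trace of the n × n cross-covariance matrix collapses it to a single scalar, and equals the average inner product of the centred vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
samples, n = 5, 3
X = rng.normal(size=(samples, n))   # 5 observations of an n-vector
Y = rng.normal(size=(samples, n))

Xc = X - X.mean(axis=0)             # centre each component
Yc = Y - Y.mean(axis=0)

cov_matrix = Xc.T @ Yc / (samples - 1)   # n x n cross-covariance

# Scalar version: trace of the matrix equals the mean inner
# product of the centred vectors.
scalar = np.trace(cov_matrix)
inner = np.sum(Xc * Yc) / (samples - 1)
# scalar and inner agree up to rounding
```

This scalar is large and positive when the vectors tend to move in the same direction, which seems close to what the particles-in-a-box question is after.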
___
In probability theory and statistics, the covariance between two real-valued random variables X and Y, with expected values E(X) = μ and E(Y) = ν is defined as: -- it is unclear that we speak about n sets of variables {X} and {Y}. I suggest starting with In probability theory and statistics, the covariance between two real-valued random variables X and Y in the given sets {X} and {Y}, with expected values E(X) = μ and E(Y) = ν is defined as:. This is unclear too, but it better explains why we speak about E(X) and E(Y) without noting that they are sets of variables, not just 'variables'. Also, using 'mean value' and 'expected value' with the same meaning would be preferable. I would also suggest adding a link to http://mathworld.wolfram.com/Covariance.html.
Please comment on my comments :) I will start reorganizing the article if there are no comments within a month. --GrAndrew 13:07, 21 April 2006 (UTC)
- Having a PhD in statistics somehow fails to enable me to understand what you're saying. What is this thing you're calling "n"?? Why do you speak of X and Y as being "in the given sets {X} and {Y}, and what in the world does that mean? If you change the sentence to say that, I will certainly revert. The first sentence looks fine to me, and your proposed change to it looks very bad in a number of ways, not only because it's completely cryptic. The covariance is between random variables, not between sets of random variables. And to refer to "n" without saying what it is would be stupid. Michael Hardy 18:20, 21 April 2006 (UTC)
I've just looked at that article on mathworld. It's quite confused. This article is clearer and more accurate. I'm still trying to guess what you mean by saying "they are sets of variables, not just 'variables'". In fact, they are just random variables, not sets of random variables. Michael Hardy 21:01, 21 April 2006 (UTC)
- I would guess that he is coming at the problem from a time series analysis point of view and thinking of covariance in terms of a statistic calculated from a time series of sampled data. To someone from that background it can seem confusing to think of covariance defined for a single pair of random variables when, in practice, you calculate it from a set of data. This is not to say I think the article should be changed, though perhaps if I found time to add something about calculating the sample covariance from data it would clarify matters. --Richard Clegg 12:45, 24 April 2006 (UTC)
[edit] Positive feedback
Just want to say that as a high school student working in a lab, I found this article (especially the first section) to be exceptionally well written. Most higher-level math pages are overly pedantic and throw terminology around far too much, only occasionally linking to equally poorly defined articles. As an outside observer, I think this article should be an example to other mathematical pages. Either that, or I've become far too familiar with this type of subject matter! Thanks for making my work easier to understand!
Thanks to the above - nice to have positive feedback! Johnbibby 19:38, 1 September 2006 (UTC)
[edit] Redundancy
After the latest edit, the article includes this:
- If X and Y are independent, then their covariance is zero. This follows because under independence, E(XY) = E(X)E(Y).
- The converse, however, is not true: it is possible that X and Y are not independent, yet their covariance is zero. This is because although under statistical independence, E(XY) = E(X)E(Y),
- the converse is not true.
The second sentence seems to be just a repetition of the first. Why is the addition of the second sentence, repeating the content of the first, an improvement? Michael Hardy 18:15, 21 August 2006 (UTC)
- Clarification: On second thought, what I meant was: the part that begins with "This is because..." and ends with "... is not true" seems to repeat what came before it. Michael Hardy 18:24, 21 August 2006 (UTC)
You're right! I introduced some (partial) redundancy - I'll go back to amend it.
{I'm new to Wikipedia & still haven't sorted this 'Talk' thing out yet - so please bear with me while I learn!} Johnbibby 16:59, 22 August 2006 (UTC)
[edit] algorithm to estimate
This article should include (an easily understandable) outline of an algorithm to estimate the covariance between two sets of (finite) N measurements of variables X and Y. A note should be included about maximum likelihood vs. unbiased estimators and how to convert between the two.
e.g. start with two random variables X and Y, each with N measured values
X = { X_1, X_2, ... , X_N } = { X_n }, n = 1 ... N
Y = { Y_1, Y_2, ... , Y_N }
estimate their means
muX = sum(X_i)/N, i = 1 ... N
muY = sum(Y_i)/N, i = 1 ... N
'centre' the values about their estimated means
centreX = { X_i - muX }, i = 1 ... N
centreY = { Y_i - muY }, i = 1 ... N
then estimate the covariance of X and Y
Cov(X, Y) = sum( centreX_i * centreY_i ) / (N - 1), i = 1 ... N (unbiased)
Cov(X, Y) = sum( centreX_i * centreY_i ) / N, i = 1 ... N (maximum likelihood)
I'm not entirely sure that I've got unbiased / maximum likelihood correct, but this paper seems to agree at the bottom of page 1 and top of page 2.
142.103.107.125 23:34, 30 August 2006 (UTC)
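A minimal, runnable version of the outline above (my own sketch; the names muX, centreX etc. follow the pseudocode, and NumPy is assumed):

```python
import numpy as np

def sample_covariance(x, y, unbiased=True):
    """Estimate Cov(X, Y) from N paired measurements."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    muX = x.sum() / n                 # estimated mean of X
    muY = y.sum() / n                 # estimated mean of Y
    centreX = x - muX                 # centre about the estimated means
    centreY = y - muY
    denom = n - 1 if unbiased else n  # unbiased vs. maximum likelihood
    return float((centreX * centreY).sum() / denom)

# Example with perfectly correlated data:
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
c_unbiased = sample_covariance(x, y)             # 10/3
c_ml = sample_covariance(x, y, unbiased=False)   # 10/4 = 2.5
```

Dividing by N − 1 gives the unbiased estimator; dividing by N gives the maximum-likelihood estimator (for Gaussian data), which agrees with the estimator/bias split the comment asks about.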
[edit] outer product?
Outer_product#Applications mentions that the outer product can be used for computing the covariance and auto-covariance matrices for two random variables. How this is accomplished should be outlined on this page, or that page... somewhere. 142.103.107.125 00:53, 31 August 2006 (UTC)
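One way the connection could be sketched (my own illustration, assuming NumPy): the cross-covariance matrix is the average of the per-sample outer products of the centred vectors, i.e. an estimate of E[(X − μ)(Y − ν)^T].

```python
import numpy as np

rng = np.random.default_rng(1)
samples, n = 1000, 3
X = rng.normal(size=(samples, n))
Y = X + rng.normal(size=(samples, n))    # make Y correlated with X

Xc = X - X.mean(axis=0)                  # centre the samples
Yc = Y - Y.mean(axis=0)

# Average of per-sample outer products (maximum-likelihood estimate)
cov_outer = sum(np.outer(xc, yc) for xc, yc in zip(Xc, Yc)) / samples

# Equivalent single matrix product
cov_matmul = Xc.T @ Yc / samples
# cov_outer and cov_matmul agree entrywise
```

Taking Y = X in the same code gives the (auto-)covariance matrix.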
[edit] unbiased estimation
For variance, there is a biased estimate and an unbiased estimate. Is there any unbiased estimate for covariance? Jackzhp 17:55, 2 February 2007 (UTC)
[edit] How about a beginners definition
Don't get me wrong, I think it's great that wikipedia has brainiacs that want to include all kinds of details. But how about a really good definition for math newbies? How about a metaphor or an example for folks that only want to understand it enough to complete a conversation and then go back to their relatively mathless life? —The preceding unsigned comment was added by Tghounsell (talk • contribs) 01:09, 20 March 2007 (UTC).
[edit] Myu vs v thing
Why are we using a little v thing for the mean of y? Shouldn't we use μy ?? Fresheneesz 07:28, 21 March 2007 (UTC)
ν is the Greek letter "nu", which comes after μ in the alphabet. 66.28.71.70 17:47, 29 March 2007 (UTC)
[edit] Inner product
The last part of the section on inner product is not clear. Perhaps someone could explain better why random variables is in quotes, for example.
- "It follows that covariance is an inner product over a vector space of "random variables", with a(X) = (aX) and X + Y = (X + Y). "Random variables" is in quotes because it is not true that X + K is distributed the same as X for any constant K; but as long as these three basic properties of covariance apply, the duals of theorems regarding inner products that depend only on those properties will be valid."
Of course X + K is not distributed the same as X; usually the mean will be different. That doesn't explain why "random variables" is in quotation marks. Maybe it should say it's because a constant K isn't really a random variable? Or just remove the quotation marks? Using quotation marks to indicate vagueness in a math article may not be a good idea; better to state it a different way, correctly. --Coppertwig 12:34, 17 June 2007 (UTC)
- OK, I think I fixed it. --Coppertwig 12:46, 17 June 2007 (UTC)
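A quick numerical check of the inner-product properties discussed above (my own example, not from the article): covariance is symmetric, homogeneous under scalar multiplication, and Cov(X, X) = Var(X) ≥ 0.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=1000)
Y = rng.normal(size=1000)
a = 3.0

def cov(u, v):
    return np.cov(u, v)[0, 1]

sym_gap = abs(cov(X, Y) - cov(Y, X))          # symmetry: should be 0
hom_gap = abs(cov(a * X, Y) - a * cov(X, Y))  # homogeneity: ~0
var_x = cov(X, X)                             # = Var(X), nonnegative
```

These are exactly the three properties the quoted passage says the inner-product theorems depend on; the only wrinkle is that constants have zero "length", which is why the quotation marks were there in the first place.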