Talk:Multivariate normal distribution

== Question on Bivariate Normal ==

If X is normally distributed and Y is normally distributed, and Z = X * Y, is Z bivariate normally distributed?

Thanks

I moved the following condition from the main page:

φ_X(z) = exp(i μ^T z − ½ u^T Γ u).

Is u the same as μ here? AxelBoldt

The vector u is not the expected vector. The characteristic functional of X is the expectation value of exp[i(u_1 X_1 + ... + u_n X_n)]. I write φ_X(u) = E[e^{i u·X}]. The expected vector is the gradient of φ_X at u = 0, divided by i. I made a mistake: the correct characterization is

φ_X(u) = exp(i μ^T u − ½ u^T Γ u).

This characterization is necessary as a technical step in the proof of equivalence. As far as I know, it is the only way to show that, if every linear combination of the X_i is Gaussian, then the X_i are jointly Gaussian. -- Miguel

ok, I'll put it back in. AxelBoldt

The motivation for this page is that it is a prerequisite to defining a "Gaussian stochastic process". The best way to do this is to say that every linear functional of the random function is a Gaussian random variable. -- Miguel


The "characterization"

is actually not correct because it implies that the various Xi are uncorrelated.

yup, you're right; I'll mention that in the article. AxelBoldt

Also, it seems that Gaussian is capitalized because it is the name of a person. -- Miguel

There is also a program used in computational chemistry whose name is Gaussian (with the G) -- people used to this program may expect lower-case g for other uses of "gaussian". And according to the ACS Style Guide, the trend is toward lower-casing surnames that are used as units. But then again, this is math, not chemistry... -- Marj Tiefert, Monday, May 6, 2002
Oh, didn't know that "Gaussian" is based on a surname -- that's probably why I was not able to find many examples of the term lowercased on Google. A redirect with the term in lowercase already points here, so I think that is enough. --maveric149
I agree that units should be lower-case. For example, the "gauss" is a unit of magnetic field strength, in honour of Gauss' work on magnetism.
Now that we're at it and maveric is in this discussion, maybe we should agree on the best way to disambiguate gauss (unit) from Gauss (prince of mathematicians), and gaussian (computer program) from Gaussian (random variable) -- Miguel

Hum... Interesting quandary. First let's start with the personality since that is the easiest. It really isn't necessary to list Gauss the man on a disambiguation page located at Gauss since nobody with half a brain would simply link to Gauss and expect that link to go directly to an article about Carl Friedrich Gauss (which is named correctly BTW). The same would be true about Smith and Adam Smith -- it is a misuse of disambiguation pages to list people who had X for a last name unless they were primarily known only by their last name and other things are also known by that name (wikipedia is not a name directory). A good example of this would be Seneca, which is both the name of a first century philosopher and the name of a Native American tribe (some disambiguation is needed at Seneca I see...).

However, Gauss is already redirected to Carl Friedrich Gauss, which is not surprising since mathematicians (and, more generally, scientists) are almost universally known by a single surname. This leads to confusion: there were five Bernoullis, two Banachs, two Pearsons... To confuse matters more, there is not only "gauss (unit)", which is a unit of magnetic field intensity, but also "Gauss units", which is a particular choice of normalization of the Maxwell equations and the elementary charge (there are also "Heaviside units") -- Miguel

As for the "gaussian" issue: I wasn't sure about this one since I didn't know there was a computer program with the same name, so I did a little Googling. Found out that <gaussian> got 2/3rds of a million hits and <gaussian "computer program"> got less than 1% of that number.

Searching for "Gaussian computational chemistry" gives 16200 hits  :-) (Marj)

This tells me that Gaussian the computer program is far less widely known than gaussian the variable -- thus confirming my first reaction. Since one usage is far more widely known and expected than the other, we should have an article titled gaussian that is only about the mathematics term. A link to either Gaussian computer program or Gaussian (computer program) can then be placed at the bottom of that page (in the same way as Paris, Texas is linked at the bottom of the Paris entry -- which is about Paris, France BTW). This is what I like to call 'weak disambiguation'.

Not sure what the name of the article about the computer program should be... Would it sound odd to use "Gaussian computer program" in a sentence talking about Gaussian the computer program? Or is this computer program almost always referred to simply as "Gaussian"?

Gaussian is produced by Gaussian, Inc (http://www.gaussian.com/) who refer to it as simply "Gaussian". Their website looks like they have more of an academic than a big-corporation mindset, however - like, I didn't notice whether they'd trademarked this use of "Gaussian" (if they in fact could have). Among computational chemists, I've always heard it referred to as "Gaussian", but there wasn't any ambiguity, since they were talking about computational chemistry. Probably the program makes use of the mathematical species of "Gaussian", or "gaussian".  ;-) -- Marj Tiefert, Wednesday, May 8, 2002
You can see "the mathematical species of Gaussian" in gaussian.com's logo. -- Miguel

This is important since a major part of our naming conventions deals with easy linking, and whenever a disambiguation issue like this arises, we really should first look for alternatives that are also widely used yet less ambiguous. Who wants to have to write [[Gaussian (computer program)|Gaussian]] each time they link to that article? However, if the use of the term "Gaussian computer program" makes for contrived and odd-sounding sentences, then we might just as well place that article at [[Gaussian (computer program)]] so as not to needlessly imply that "computer program" is part of its name. The use of parentheses in disambiguation is what I like to call 'strong disambiguation' and is something to be used only as a last resort. Hope this helps.

BTW, I'm still not sure about a general rule for capitalizing units that are derived from surnames... As it is, I am beginning to lean in favor of making them lowercase. However, we might want to explore whether there might be any exceptions where a capitalized term would be used. For example, the unit newton is commonly expressed in lowercase form, but then Celsius is usually shown with a capital 'C' (along with the other two common temperature scales).... Any other thoughts? --maveric149



Miguel, I don't think the first and the second condition given in the article are equivalent. Take for instance X=(X1,X2) where X1 is standard normal and X2 is uniform on [0,1]. Then the first condition is not satisfied, but the second is, using the matrix A = (1 0). I claim A needs to be square (and will then automatically be invertible.) --AxelBoldt

You're right. Thanks for pointing that out. I reversed the relation between X and Z. The result, with a rectangular A, is correct. The reason the original Z = A(X − μ) doesn't work is that the covariance matrix of Z doesn't have the right rank. If Z = A X and the covariance matrix of X is Γ, then the covariance matrix of Z must be A Γ A^T. But the rank of this is at most the rank of Γ, and we are requiring the components of Z to be independent N[0,1]. That's why Z needs to have a smaller dimension. But, as you point out, this doesn't work either.
As far as the current statement goes, the number of components of Z could be arbitrarily large, but not smaller than the rank of Γ. Miguel

We still have serious problems with the definition here. First, do we consider a variable that's constant 0 to be normally distributed? If not, then the first two statements are not equivalent. Also, in the third statement, should we go to a positive semidefinite Γ? AxelBoldt 06:13 Jan 24, 2003 (UTC)


We definitely need to consider a constant (not only 0) to be normally distributed (with variance 0, of course), and we need to eliminate the words "unless all a_i are 0". The reason is that we need to allow singular variance matrices, and once that happens we have some nonzero linear combinations of non-degenerate normals adding up to a constant. Example: the residuals (which are not independent, and must not be confused with the errors, which are independent) from the simplest sort of ordinary linear regression are constrained to lie within a space of codimension 2. That vector of residuals has a singular variance matrix. The distribution of its sum of squares is chi-square with n-2 degrees of freedom. The whole discussion leading to that conclusion would be horribly complicated if we're forbidden to speak of normal distributions whose variance is a singular matrix. Michael Hardy 17:19 Jan 24, 2003 (UTC)
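To make the regression example above concrete, here is a minimal numerical sketch. The predictor values and the sample size are hypothetical, chosen only for illustration. It assumes independent errors with common variance σ², so that the variance matrix of the residual vector is σ² times the matrix M computed below.

```python
import numpy as np

n = 6
x = np.arange(n, dtype=float)            # hypothetical predictor values
X = np.column_stack([np.ones(n), x])     # design matrix: intercept and slope
H = X @ np.linalg.inv(X.T @ X) @ X.T     # projection onto the column space of X
M = np.eye(n) - H                        # residuals = M @ errors

# With independent N(0, sigma^2) errors, Var(residuals) = sigma^2 * M.
rank = np.linalg.matrix_rank(M)          # dimension of the space the residuals live in
det = np.linalg.det(M)                   # numerically zero: M is singular

print("rank of residual variance matrix:", rank)
print("determinant:", det)
```

The rank comes out as n - 2 because M annihilates both the constant vector and x, which are the two linear constraints on the residuals.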


Shouldn't

  • there is a vector μ=(μ_1,...,μ_n) and a symmetric, positive semidefinite matrix Γ such that X has density
f_X(x_1,...,x_n) dx_1...dx_n = (det(2πΓ))^{-1/2} exp(−½ (x−μ)^T Γ^{-1} (x−μ)) dx_1...dx_n

be

  • there is a vector μ=(μ_1,...,μ_n) and a symmetric, positive definite matrix Γ such that X has density
f_X(x_1,...,x_n) dx_1...dx_n = (det(2πΓ))^{-n/2} exp(−½ (x−μ)^T Γ^{-1} (x−μ)) dx_1...dx_n

(semidefinite -> definite, 1 -> n) or should I stick to things I know something about? — user:192.38.66.188

positive semidefinite means that we are allowing zero variance (i.e., a random variable that always takes the same value). See the discussion just above your question.
The determinant of Γ takes into account the variances and covariances of all variables, and so it need not be raised to the nth power.
Last but not least, if you know enough to ask these questions, you actually "know something about" this ;-) — Miguel 17:44, 2004 Feb 24 (UTC)
I agree with the non-logged-in user's criticism. Multivariate normal distributions exist in which the variance is a positive semi-definite matrix of determinant zero. In a coordinate system in which the components are independent, one or more components has variance zero. But: such a distribution has no density with respect to the usual n-dimensional Lebesgue measure; no density function should be attributed to such distributions unless it is with respect to a measure on a space of lower dimension. Michael Hardy 21:07, 24 Feb 2004 (UTC)
You're completely right, as usual :-) Miguel 21:24, 2004 Feb 24 (UTC)
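For anyone who wants to check the point about the exponent numerically: since det(2πΓ) = (2π)^n det(Γ) for an n×n matrix, the factor (det(2πΓ))^{-1/2} already equals (2π)^{-n/2} det(Γ)^{-1/2}. A small sketch, using an arbitrary (hypothetical) positive definite Γ:

```python
import numpy as np

# Hypothetical symmetric positive definite matrix, chosen only for illustration.
gamma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
n = gamma.shape[0]

# Normalizing constant written as in the quoted condition.
lhs = np.linalg.det(2 * np.pi * gamma) ** -0.5

# Same constant written with the factor (2*pi)^(n/2) pulled out.
rhs = (2 * np.pi) ** (-n / 2) * np.linalg.det(gamma) ** -0.5

print(lhs, rhs)
```

The two values agree up to rounding, so the exponent -1/2 on det(2πΓ) is the right one.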

== proposed rearranged first section ==

I propose the following rearrangement and partial rewrite of the intro section of this article. The main motivation is that the general case can be understood at an informal level without the need to be familiar with characteristic functions. Comments, please. --Zero 12:37, 15 Sep 2004 (UTC)

Since the definition you single out applies only to non-degenerate multivariate normals, you need to mention degeneracy explicitly in the following paragraph.
I would call X a "random vector", not a "random variable".
Make the paragraphs after "A formal definition" the first section of the body of the article, called "Formal definition".
IMHO, the most intuitively compelling and informally understandable definition is the one that says every linear combination of the coordinates is normally distributed.
Miguel 19:59, 2004 Sep 15 (UTC)


In probability theory and statistics, a multivariate normal distribution, also sometimes called a multivariate Gaussian distribution in honor of Carl Friedrich Gauss, is a generalization of the normal distribution to several dimensions.

In the case of a random variable X with a non-degenerate multivariate normal distribution, there is a vector μ and a symmetric, positive definite matrix Σ such that X has density

f_X(x_1,\ldots,x_n)\, dx_1\ldots dx_n= \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}({\mathbf x}-{\mathbf\mu})^T{\mathbf\Sigma}^{-1}({\mathbf x}-{\mathbf\mu}) \right)dx_1\ldots dx_n,

where \left|A\right| is the determinant of A. Note how the equation above reduces to that of the univariate normal distribution if Σ is a 1\times 1 matrix (i.e., a real number).

More generally, a multivariate normal distribution in n dimensions consists of a non-degenerate multivariate normal distribution sitting inside some m-dimensional affine subspace (a linear subspace possibly shifted from the origin) for some m\le n. For example, if Z is a 1-dimensional normal distribution, then the vector (Z,Z) whose components are equal has a multivariate normal distribution which sits inside the subspace {(x,y) | x = y}.

A formal definition is that an n-dimensional random variable X = (X_1, ..., X_n) has a multivariate normal distribution if it satisfies the following equivalent conditions:

  • there is a random vector Z = (Z_1, ..., Z_m), whose components are independent standard normal random variables, a vector μ = (μ_1, ..., μ_n) and an n×m matrix A such that X = A Z + μ.
  • there is a vector μ and a symmetric, positive semi-definite matrix Γ such that the characteristic function of X is
φ_X(u) = exp(i μ^T u − ½ u^T Γ u).

The vector μ in these conditions is the expected value of X and the matrix {\mathbf\Sigma}={\mathbf A}{\mathbf A}^T (the matrix Γ of the second condition) is the covariance matrix of the components X_i.

Note that the Xi are in general not independent; they can be seen as the result of applying the linear transformation A to a collection of independent Gaussian variables Z.
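As an informal illustration of the first condition, the sketch below draws samples of X = A Z + μ and compares the sample covariance with A A^T. The matrix A, the vector μ, the seed and the sample size are all hypothetical choices. A is picked with two identical rows so that the result is degenerate, as in the (Z,Z) example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x2 matrix (n = 3, m = 2); the first two rows are identical.
A = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.5, 2.0]])
mu = np.array([1.0, 1.0, -2.0])

Z = rng.standard_normal((2, 200_000))   # independent standard normal components
X = A @ Z + mu[:, None]                 # each column is one draw of X

sigma = A @ A.T                         # covariance matrix predicted by the definition
sample_cov = np.cov(X)                  # rows of X are treated as variables
sample_mean = X.mean(axis=1)

print("rank of A A^T:", np.linalg.matrix_rank(sigma))
print("sample covariance:\n", sample_cov.round(2))
```

Here A A^T has rank 2, so X is concentrated on a 2-dimensional affine subspace of R^3 and has no density with respect to 3-dimensional Lebesgue measure, yet it still satisfies the definition.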



What is the N at the end of the Kullback-Leibler distance? It would make sense to state what the value N signifies in the formula. And where does the formula come from? Any references would help too.

The N is the dimension of both multivariate normal distributions, as defined above. But I will make it clearer. Unfortunately, I have not yet found a reference confirming that the formula is correct.
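For reference while someone looks for a citation, here is a sketch of the standard closed form for the Kullback-Leibler divergence between two non-degenerate N-dimensional normals, KL = ½ [ tr(Σ1⁻¹ Σ0) + (μ1 − μ0)^T Σ1⁻¹ (μ1 − μ0) − N + ln(det Σ1 / det Σ0) ]. The notation may differ from what the article currently uses; the function name kl_mvn and the test values are my own, not taken from the article.

```python
import numpy as np

def kl_mvn(mu0, sigma0, mu1, sigma1):
    """KL(N(mu0, sigma0) || N(mu1, sigma1)) for non-degenerate normals."""
    n = mu0.shape[0]                          # the N in the formula: the dimension
    sigma1_inv = np.linalg.inv(sigma1)
    diff = mu1 - mu0
    trace_term = np.trace(sigma1_inv @ sigma0)
    quad_term = diff @ sigma1_inv @ diff
    _, logdet0 = np.linalg.slogdet(sigma0)    # log-determinants, numerically stable
    _, logdet1 = np.linalg.slogdet(sigma1)
    return 0.5 * (trace_term + quad_term - n + logdet1 - logdet0)

# Univariate sanity check against ln(s1/s0) + (s0^2 + (m0 - m1)^2) / (2 s1^2) - 1/2.
kl_1d = kl_mvn(np.array([0.0]), np.array([[1.0]]),
               np.array([1.0]), np.array([[4.0]]))
expected_1d = np.log(2.0) + (1.0 + 1.0) / (2.0 * 4.0) - 0.5
print(kl_1d, expected_1d)
```

The −N term is what makes the divergence vanish when the two distributions coincide: in that case the trace term equals N and the other terms are zero.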

== A counterexample ==

Would it not make the article clearer to merge "A counterexample" and "correlation and independence" into one section? — ciphergoth 07:09, 2005 Apr 29 (UTC)