Hotelling's T-square distribution

From Wikipedia, the free encyclopedia

In statistics, Hotelling's T-square statistic,[1] named for Harold Hotelling, is a generalization of Student's t statistic that is used in multivariate hypothesis testing.

Hotelling's T-square statistic is defined as follows. Suppose

{\mathbf x}_1,\dots,{\mathbf x}_n

are p×1 column vectors whose entries are real numbers. Let

\overline{\mathbf x}=(\mathbf{x}_1+\cdots+\mathbf{x}_n)/n

be their mean. Let the p×p positive-definite matrix

{\mathbf W}=\sum_{i=1}^n (\mathbf{x}_i-\overline{\mathbf x})(\mathbf{x}_i-\overline{\mathbf x})'/(n-1)

be their sample covariance matrix. (The transpose of any matrix M is denoted above by M′.) Let μ be some known p×1 column vector (in applications, a hypothesized value of a population mean). Then Hotelling's T-square statistic is

t^2=n(\overline{\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}(\overline{\mathbf x}-{\mathbf\mu}).

Note that t^2 is closely related to the squared Mahalanobis distance.
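As an illustrative sketch (not part of the article), the statistic defined above can be computed directly from a data matrix with NumPy; the function name `hotelling_t2_one_sample` is ours, not a standard API:

```python
import numpy as np

def hotelling_t2_one_sample(X, mu):
    """One-sample Hotelling's T^2 statistic.

    X  : (n, p) array whose rows are the observations x_1, ..., x_n.
    mu : (p,) hypothesized population mean vector.
    Returns t^2 = n (x_bar - mu)' W^{-1} (x_bar - mu).
    """
    n, p = X.shape
    x_bar = X.mean(axis=0)
    # Sample covariance with divisor n - 1; atleast_2d keeps the p = 1
    # case a 1x1 matrix rather than a scalar.
    W = np.atleast_2d(np.cov(X, rowvar=False))
    diff = x_bar - mu
    # Solve W z = diff instead of forming W^{-1} explicitly.
    return n * diff @ np.linalg.solve(W, diff)
```

For p = 1 this reduces to the square of the usual one-sample Student's t statistic, which gives a quick sanity check.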

The statistic is useful because of the following distributional result: if \mathbf{x}\sim N_p(\mu,{\mathbf V}) is a random variable with a multivariate normal distribution, {\mathbf W}\sim W_p({\mathbf V},n) has a Wishart distribution, and {\mathbf x} and {\mathbf W} are independent, then the probability distribution of t^2 is T^2(p,n), Hotelling's T-square distribution with parameters p and n.

The assumptions above are frequently met in practice: it can be shown[2] that if {\mathbf x}_1,\dots,{\mathbf x}_n\sim N_p(\mu,{\mathbf V}) are independent, and \overline{\mathbf x} and {\mathbf W} are as defined above, then {\mathbf W} is independent of \overline{\mathbf x}, and

\overline{\mathbf x}\sim N_p(\mu,{\mathbf V}/n)
{\mathbf W} \sim W_p({\mathbf V},n-1).

If, moreover, both distributions are nonsingular, it can be shown[2] that

t^2 = n(\overline{\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}(\overline{\mathbf x}-{\mathbf\mu}) \sim T^2(p, n-1)

and

\frac{m-p+1}{pm} t^2\sim F(p,m-p+1)

where m = n − 1 and F is the F-distribution.
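The F relation above turns the statistic into a hypothesis test. A minimal sketch, assuming SciPy's `f` distribution for the tail probability (the function name is ours): with m = n − 1, the test statistic (m − p + 1)/(pm) · t² is compared against F(p, m − p + 1).

```python
import numpy as np
from scipy import stats

def hotelling_one_sample_test(X, mu):
    """One-sample Hotelling's T^2 test of H0: E[x] = mu.

    Returns (t2, F, p-value) using the relation
    (m - p + 1)/(p m) t^2 ~ F(p, m - p + 1) with m = n - 1.
    """
    n, p = X.shape
    x_bar = X.mean(axis=0)
    W = np.atleast_2d(np.cov(X, rowvar=False))   # divisor n - 1
    diff = x_bar - mu
    t2 = n * diff @ np.linalg.solve(W, diff)
    m = n - 1
    F = (m - p + 1) / (p * m) * t2
    pval = stats.f.sf(F, p, m - p + 1)           # upper tail of F
    return t2, F, pval
```

For p = 1 the F statistic equals t² and the p-value coincides with the two-sided one-sample t-test, which makes the reduction to Student's t easy to verify numerically.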

Hotelling's two-sample T-square statistic

If {\mathbf x}_1,\dots,{\mathbf x}_{n_x}\sim N_p(\mu,{\mathbf V}) and {\mathbf y}_1,\dots,{\mathbf y}_{n_y}\sim N_p(\mu,{\mathbf V}) are two samples drawn independently from multivariate normal distributions with the same mean and covariance, and we define

\overline{\mathbf x}=\frac{1}{n_x}\sum_{i=1}^{n_x} \mathbf{x}_i \qquad \overline{\mathbf y}=\frac{1}{n_y}\sum_{i=1}^{n_y} \mathbf{y}_i

as the sample means, and

{\mathbf W}= \frac{\sum_{i=1}^{n_x}(\mathbf{x}_i-\overline{\mathbf x})(\mathbf{x}_i-\overline{\mathbf x})' +\sum_{i=1}^{n_y}(\mathbf{y}_i-\overline{\mathbf y})(\mathbf{y}_i-\overline{\mathbf y})'}{n_x+n_y-2}

as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-square statistic is

t^2 = \frac{n_x n_y}{n_x+n_y}(\overline{\mathbf x}-\overline{\mathbf y})'{\mathbf W}^{-1}(\overline{\mathbf x}-\overline{\mathbf y}) \sim T^2(p, n_x+n_y-2)

and it can be related to the F-distribution by

\frac{n_x+n_y-p-1}{(n_x+n_y-2)p}t^2 \sim F(p,n_x+n_y-1-p).[2]
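The two-sample construction above can be sketched in the same way; again the function name is ours and SciPy's `f` distribution supplies the p-value, under the article's assumptions of independent samples with a common covariance:

```python
import numpy as np
from scipy import stats

def hotelling_two_sample_test(X, Y):
    """Two-sample Hotelling's T^2 test of H0: E[x] = E[y].

    X : (n_x, p) array, Y : (n_y, p) array.
    Returns (t2, F, p-value), where
    (n_x + n_y - p - 1)/((n_x + n_y - 2) p) t^2 ~ F(p, n_x + n_y - 1 - p).
    """
    nx, p = X.shape
    ny, _ = Y.shape
    xb, yb = X.mean(axis=0), Y.mean(axis=0)
    # Pooled (unbiased) covariance estimate with divisor n_x + n_y - 2.
    Sx = (X - xb).T @ (X - xb)
    Sy = (Y - yb).T @ (Y - yb)
    W = (Sx + Sy) / (nx + ny - 2)
    diff = xb - yb
    t2 = nx * ny / (nx + ny) * diff @ np.linalg.solve(W, diff)
    F = (nx + ny - p - 1) / ((nx + ny - 2) * p) * t2
    pval = stats.f.sf(F, p, nx + ny - 1 - p)
    return t2, F, pval
```

For p = 1 this reduces to the square of the pooled-variance two-sample t statistic, with the same p-value as the two-sided t-test.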


References

  1. ^ H. Hotelling (1931) "The generalization of Student's ratio", Annals of Mathematical Statistics, Vol. 2, pp. 360–378.
  2. ^ K. V. Mardia, J. T. Kent and J. M. Bibby (1979) Multivariate Analysis, Academic Press.