Behrens–Fisher distribution

From Wikipedia, the free encyclopedia

In statistics, the Behrens–Fisher distribution, named after Ronald Fisher and W. V. Behrens, is a parameterized family of probability distributions arising from the solution of the Behrens–Fisher problem proposed first by Behrens and several years later by Fisher. The Behrens–Fisher problem is that of statistical inference concerning the difference between the means of two normally distributed populations when the ratio of their variances is not known (and in particular, it is not known that their variances are equal).

Definition

The Behrens–Fisher distribution is the distribution of a random variable of the form

$T_{2}\cos \theta -T_{1}\sin \theta \,$

where T₁ and T₂ are independent random variables each with a Student's t-distribution, with respective degrees of freedom ν₁ = n₁ − 1 and ν₂ = n₂ − 1, and θ is a constant. Thus the family of Behrens–Fisher distributions is parameterized by ν₁, ν₂, and θ.
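The definition translates directly into a simulation. The following is a minimal illustrative sketch, not a reference implementation. It draws each Student's t variate as Z/√(V/ν), where Z is standard normal and V is chi-squared with ν degrees of freedom, generated with Python's `random.gammavariate`. The function names `student_t` and `behrens_fisher_sample` are hypothetical.

```python
import math
import random


def student_t(nu, rng):
    """One Student's t variate with nu degrees of freedom, built as Z / sqrt(V / nu)."""
    z = rng.gauss(0.0, 1.0)              # standard normal Z
    v = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared with nu d.f. = gamma(shape nu/2, scale 2)
    return z / math.sqrt(v / nu)


def behrens_fisher_sample(nu1, nu2, theta, size, seed=None):
    """Draw `size` independent values of T2*cos(theta) - T1*sin(theta)."""
    rng = random.Random(seed)
    c, s = math.cos(theta), math.sin(theta)
    return [student_t(nu2, rng) * c - student_t(nu1, rng) * s for _ in range(size)]
```

For example, `behrens_fisher_sample(10, 10, math.pi / 4, 20000, seed=0)` returns 20000 draws. With ν₁ = ν₂ = 10, each t variate has variance 10/8 = 1.25. Because cos²θ + sin²θ = 1, the draws should have variance close to 1.25 whatever θ is.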

Derivation

Suppose it were known that the two population variances are equal, and samples of sizes n₁ and n₂ are taken from the two populations:

${\begin{aligned}X_{{1,1}},\ldots ,X_{{1,n_{1}}}&\sim \operatorname {i.i.d.}N(\mu _{1},\sigma ^{2}),\\[6pt]X_{{2,1}},\ldots ,X_{{2,n_{2}}}&\sim \operatorname {i.i.d.}N(\mu _{2},\sigma ^{2}).\end{aligned}}$

where "i.i.d." means independent and identically distributed, and N denotes the normal distribution. The two sample means are

${\begin{aligned}{\bar {X}}_{1}&=(X_{{1,1}}+\cdots +X_{{1,n_{1}}})/n_{1}\\[6pt]{\bar {X}}_{2}&=(X_{{2,1}}+\cdots +X_{{2,n_{2}}})/n_{2}\end{aligned}}$

The usual "pooled" unbiased estimate of the common variance σ² is then

$S_{{\mathrm {pooled}}}^{2}={\frac {\sum _{{k=1}}^{{n_{1}}}(X_{{1,k}}-{\bar X}_{1})^{2}+\sum _{{k=1}}^{{n_{2}}}(X_{{2,k}}-{\bar X}_{2})^{2}}{n_{1}+n_{2}-2}}={\frac {(n_{1}-1)S_{1}^{2}+(n_{2}-1)S_{2}^{2}}{n_{1}+n_{2}-2}}$

where S₁² and S₂² are the usual unbiased (Bessel-corrected) estimates of the two population variances.
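The two expressions for the pooled estimate agree because (nᵢ − 1)Sᵢ² is exactly the sum of squared deviations in sample i. The small Python sketch below computes the estimate both ways with the standard `statistics` module, whose `variance` applies Bessel's correction. The function names are hypothetical.

```python
import statistics


def pooled_variance(x1, x2):
    """Pooled unbiased estimate of a common variance, from raw squared deviations."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    # Numerator: squared deviations of each sample about its own mean.
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    return ss / (n1 + n2 - 2)


def pooled_variance_from_s2(x1, x2):
    """Same estimate, written as a weighted average of the Bessel-corrected variances."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = statistics.variance(x1), statistics.variance(x2)  # divisor n - 1
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
```

With `x1 = [1, 2, 3]` and `x2 = [4, 6]`, both functions return 4/3. The squared deviations sum to 2 in each sample, and n₁ + n₂ − 2 = 3.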

Under these assumptions, the pivotal quantity

${\frac {(\mu _{2}-\mu _{1})-({\bar X}_{2}-{\bar X}_{1})}{\displaystyle {\sqrt {{\frac {S_{{\mathrm {pooled}}}^{2}}{n_{1}}}+{\frac {S_{{\mathrm {pooled}}}^{2}}{n_{2}}}}}}}$

has a t-distribution with n₁ + n₂ − 2 degrees of freedom. Accordingly, one can find a confidence interval for μ₂ − μ₁ whose endpoints are

${\bar {X}}_{2}-{\bar {X}}_{1}\pm A\cdot S_{{\mathrm {pooled}}}{\sqrt {{\frac {1}{n_{1}}}+{\frac {1}{n_{2}}}}},$

where A is an appropriate percentage point of the t-distribution.
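The sketch below computes the endpoints of this interval. It assumes the percentage point A has already been looked up, for instance in a t table. Python's standard library has no Student's t quantile function, so A is passed in as an argument. The name `pooled_t_interval` is hypothetical.

```python
import math
import statistics


def pooled_t_interval(x1, x2, a):
    """Endpoints of Xbar2 - Xbar1 +/- a * S_pooled * sqrt(1/n1 + 1/n2).

    `a` is the percentage point of the t-distribution with n1 + n2 - 2
    degrees of freedom, supplied by the caller.
    """
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = statistics.variance(x1), statistics.variance(x2)
    s_pooled = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    diff = statistics.fmean(x2) - statistics.fmean(x1)
    half_width = a * s_pooled * math.sqrt(1 / n1 + 1 / n2)
    return diff - half_width, diff + half_width
```

A two-sided 95% interval with n₁ + n₂ − 2 = 3 degrees of freedom uses A ≈ 3.182. So `pooled_t_interval([1, 2, 3], [4, 6], 3.182)` centers the interval on X̄₂ − X̄₁ = 3, with half-width 3.182·√10/3 ≈ 3.35.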

However, in the Behrens–Fisher problem, the two population variances are not known to be equal, nor is their ratio known. Fisher considered^{[citation needed]} the pivotal quantity

${\frac {(\mu _{2}-\mu _{1})-({\bar X}_{2}-{\bar X}_{1})}{\displaystyle {\sqrt {{\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}}}}}.$

This can be written as

$T_{2}\cos \theta -T_{1}\sin \theta ,\,$

where

$T_{i}={\frac {\mu _{i}-{\bar {X}}_{i}}{S_{i}/{\sqrt {n_{i}}}}}{\text{ for }}i=1,2\,$

are the usual one-sample t-statistics and

$\tan \theta ={\frac {S_{1}/{\sqrt {n_{1}}}}{S_{2}/{\sqrt {n_{2}}}}}$

and one takes θ to be in the first quadrant. The algebraic details are as follows:

${\begin{aligned}{\frac {(\mu _{2}-\mu _{1})-({\bar X}_{2}-{\bar X}_{1})}{\displaystyle {\sqrt {{\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}}}}}&={\frac {\mu _{2}-{\bar {X}}_{2}}{\displaystyle {\sqrt {{\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}}}}}-{\frac {\mu _{1}-{\bar {X}}_{1}}{\displaystyle {\sqrt {{\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}}}}}\\[10pt]&=\underbrace {{\frac {\mu _{2}-{\bar {X}}_{2}}{S_{2}/{\sqrt {n_{2}}}}}}_{{{\text{This is }}T_{2}}}\cdot \underbrace {\left({\frac {S_{2}/{\sqrt {n_{2}}}}{\displaystyle {\sqrt {{\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}}}}}\right)}_{{{\text{This is }}\cos \theta }}-\underbrace {{\frac {\mu _{1}-{\bar {X}}_{1}}{S_{1}/{\sqrt {n_{1}}}}}}_{{{\text{This is }}T_{1}}}\cdot \underbrace {\left({\frac {S_{1}/{\sqrt {n_{1}}}}{\displaystyle {\sqrt {{\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}}}}}\right)}_{{{\text{This is }}\sin \theta }}.\qquad \qquad \qquad (1)\end{aligned}}$

The squares of the two expressions in parentheses above sum to 1, and both expressions are positive, so they are the cosine and sine of some angle θ in the first quadrant.
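Identity (1) is pure algebra, so it can be checked numerically on any data set. The sketch below is illustrative only. The arguments `mu1` and `mu2` are hypothetical population means, plugged in purely to exercise the algebra; in the Behrens–Fisher problem they are unknown. θ is obtained with `math.atan2`, which returns an angle in the first quadrant when both standard errors are positive.

```python
import math
import statistics


def check_decomposition(x1, x2, mu1, mu2):
    """Return (pivot, T2*cos(theta) - T1*sin(theta), theta) for hypothetical means mu1, mu2."""
    n1, n2 = len(x1), len(x2)
    se1 = statistics.stdev(x1) / math.sqrt(n1)  # S1 / sqrt(n1)
    se2 = statistics.stdev(x2) / math.sqrt(n2)  # S2 / sqrt(n2)
    xbar1, xbar2 = statistics.fmean(x1), statistics.fmean(x2)

    # One-sample t-statistics T_i = (mu_i - Xbar_i) / (S_i / sqrt(n_i)).
    t1 = (mu1 - xbar1) / se1
    t2 = (mu2 - xbar2) / se2

    # tan(theta) = se1 / se2 with both positive, so theta lies in (0, pi/2).
    theta = math.atan2(se1, se2)

    # Left-hand side of (1); hypot(se1, se2) = sqrt(S1^2/n1 + S2^2/n2).
    pivot = ((mu2 - mu1) - (xbar2 - xbar1)) / math.hypot(se1, se2)
    return pivot, t2 * math.cos(theta) - t1 * math.sin(theta), theta
```

Take `x1 = [1, 2, 3]`, `x2 = [4, 6]`, μ₁ = 2.5 and μ₂ = 4. Both sides evaluate to about −1.299. Here θ = π/6, because S₁/√n₁ = 1/√3 and S₂/√n₂ = 1.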

The Behrens–Fisher distribution is actually the conditional distribution of the quantity (1) above, given the values of the quantities labeled cos θ and sin θ. In effect, Fisher conditions on ancillary information.

Fisher then found the "fiducial interval" whose endpoints are

${\bar {X}}_{2}-{\bar {X}}_{1}\pm A{\sqrt {{\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}}}$

where A is the appropriate percentage point of the Behrens–Fisher distribution. Fisher claimed^{[citation needed]} that the probability that μ₂ − μ₁ lies in this interval, given the data (ultimately the Xs), is the probability that a Behrens–Fisher-distributed random variable lies between −A and A.
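The sketch below estimates A by Monte Carlo simulation instead of using tabulated or numerically integrated percentage points. It simulates T₂ cos θ − T₁ sin θ with ν₁ = n₁ − 1, ν₂ = n₂ − 1 and θ computed from the data. It then takes the empirical quantile of the absolute value, which is valid because the distribution is symmetric about 0. The function names, the number of draws and the seed are assumptions for illustration, and the resulting A carries Monte Carlo error.

```python
import math
import random
import statistics


def bf_percentage_point(nu1, nu2, theta, level=0.95, draws=100_000, seed=0):
    """Monte Carlo estimate of A with P(-A < T2 cos(theta) - T1 sin(theta) < A) = level."""
    rng = random.Random(seed)
    c, s = math.cos(theta), math.sin(theta)

    def t(nu):
        # Student's t variate as Z / sqrt(chi-squared_nu / nu).
        return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)

    # Symmetric about 0, so the central interval comes from a quantile of |value|.
    abs_values = sorted(abs(t(nu2) * c - t(nu1) * s) for _ in range(draws))
    return abs_values[int(level * draws)]


def fiducial_interval(x1, x2, level=0.95, **mc_options):
    """Endpoints Xbar2 - Xbar1 +/- A * sqrt(S1^2/n1 + S2^2/n2), with A estimated by simulation."""
    n1, n2 = len(x1), len(x2)
    se1 = statistics.stdev(x1) / math.sqrt(n1)  # S1 / sqrt(n1)
    se2 = statistics.stdev(x2) / math.sqrt(n2)  # S2 / sqrt(n2)
    theta = math.atan2(se1, se2)                # tan(theta) = se1 / se2, first quadrant
    a = bf_percentage_point(n1 - 1, n2 - 1, theta, level, **mc_options)
    diff = statistics.fmean(x2) - statistics.fmean(x1)
    half_width = a * math.hypot(se1, se2)       # hypot = sqrt(S1^2/n1 + S2^2/n2)
    return diff - half_width, diff + half_width
```

With θ = 0 the variable reduces to T₂. So `bf_percentage_point(10, 10, 0.0)` should land near the t-table value 2.228 for 10 degrees of freedom, which gives a quick sanity check on the simulation.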

Fiducial intervals versus confidence intervals

Bartlett^{[citation needed]} showed that this "fiducial interval" is not a confidence interval because it does not have a constant coverage rate. Fisher did not consider that a cogent objection to the use of the fiducial interval.^{[citation needed]}

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.