Truncated distribution

From Wikipedia, the free encyclopedia

A truncated distribution is a conditional distribution derived by restricting the domain of some other probability distribution. Suppose we have a random variable $X$ that is distributed according to some probability density function $f (x)$ with cumulative distribution function $F (x)$ , both of which have infinite support. Suppose we wish to know the probability density of the random variable after restricting the support to lie between two constants, so that the support is $(a, b]$ . That is to say, suppose we wish to know how $X$ is distributed given $a < X \leq b$ .

$f(x|a < X \leq b) = \frac{g(x)}{F(b)-F(a)} = Tr(x)$

where $g (x) = f (x)$ for all $a <x \leq b$ and $g (x) = 0$ everywhere else. Notice that $T r (x)$ has the same support as $g (x)$ .

There is, unfortunately, an ambiguity in the term truncated distribution. It may refer to $g (x)$ , where parts of the distribution $f (x)$ have been removed but the remainder has not been scaled up, or it may refer to $T r (x)$ . In general $g (x)$ is not a probability density function, since it does not integrate to one, whereas $T r (x)$ is a probability density function. In this article, a truncated distribution refers to $T r (x)$ .

Notice that the truncated density is in fact a probability density, since it integrates to one:

$\int_{a}^{b} f(x|a < X \leq b) dx = \frac{1}{F(b)-F(a)} \int_{a}^{b} g(x) dx = 1$ .
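The normalisation can be checked numerically. The following is a minimal Python sketch, assuming a standard normal distribution for $f (x)$ and the illustrative limits $a = -1$ and $b = 2$ ; the helper names (`normal_pdf`, `normal_cdf`, `truncated_pdf`, `midpoint_integral`) are hypothetical and chosen only for this example.

```python
import math

def normal_pdf(x):
    """Density f(x) of the standard normal distribution (assumed example)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Cumulative distribution function F(x) of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_pdf(x, a, b, pdf=normal_pdf, cdf=normal_cdf):
    """Tr(x) = g(x) / (F(b) - F(a)), with g(x) = f(x) on (a, b] and 0 elsewhere."""
    if not (a < x <= b):
        return 0.0
    return pdf(x) / (cdf(b) - cdf(a))

def midpoint_integral(func, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

a, b = -1.0, 2.0  # illustrative truncation limits
total = midpoint_integral(lambda x: truncated_pdf(x, a, b), a, b)
print(total)  # close to 1, up to quadrature error
```

Inside $(a, b]$ the truncated density is larger than $f (x)$ by the constant factor $1 / (F(b) - F(a))$ , and outside the interval it is zero.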

Truncated distributions need not have parts removed from both the top and the bottom. A truncated distribution where just the bottom of the distribution has been removed is as follows:

$f(x|X>y) = \frac{g(x)}{1-F(y)}$

where $g (x) = f (x)$ for all $y < x$ and $g (x) = 0$ everywhere else, and $F (x)$ is the cumulative distribution function.

A truncated distribution where the top of the distribution has been removed is as follows:

$f(x|X \leq y) = \frac{g(x)}{F(y)}$

where $g (x) = f (x)$ for all $x \leq y$ and $g (x) = 0$ everywhere else, and $F (x)$ is the cumulative distribution function.
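As a worked illustration of both one-sided cases (the exponential distribution and its rate $\lambda$ are assumptions chosen for this example), take $f (x) = \lambda e^{-\lambda x}$ for $x \geq 0$ , so that $F (x) = 1 - e^{-\lambda x}$ . For a truncation point $y > 0$ , the density with the bottom removed is

$\frac{g(x)}{1-F(y)} = \frac{\lambda e^{-\lambda x}}{e^{-\lambda y}} = \lambda e^{-\lambda (x-y)}$ for $x > y$ ,

which is the original density shifted to the right by $y$ , and the density with the top removed is

$\frac{g(x)}{F(y)} = \frac{\lambda e^{-\lambda x}}{1-e^{-\lambda y}}$ for $0 \leq x \leq y$ .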

Expectation of Truncated Random Variable

Suppose we wish to find the expected value of a random variable $X$ with density $f (x)$ and cumulative distribution function $F (x)$ , given that $X$ is greater than some known value $y$ . The expectation of the truncated random variable is:

$E(X|X>y) = \frac{\int_y^\infty x g(x) dx}{1 - F(y)}$

where again $g (x) = f (x)$ for all $y < x$ and $g (x) = 0$ everywhere else.
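The expectation formula can be evaluated numerically. The sketch below is a minimal Python illustration, assuming an exponential density with a hypothetical rate parameter `rate`; for that assumed density the exact value is $y + 1/\lambda$ by the memoryless property, which gives something to compare against. The function names and the finite upper integration limit are choices made for this example only.

```python
import math

def exp_pdf(x, rate):
    """Density f(x) of an exponential distribution (assumed example)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def exp_cdf(x, rate):
    """Cumulative distribution function F(x) of the same distribution."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def truncated_mean(y, rate, upper_tail=50.0, n=200000):
    """Approximate E(X | X > y) = int_y^inf x f(x) dx / (1 - F(y)).

    The infinite upper limit is replaced by y + upper_tail / rate, beyond
    which the exponential tail is negligible; the integral uses the
    midpoint rule.
    """
    hi = y + upper_tail / rate
    h = (hi - y) / n
    numerator = h * sum(
        (y + (i + 0.5) * h) * exp_pdf(y + (i + 0.5) * h, rate)
        for i in range(n)
    )
    return numerator / (1.0 - exp_cdf(y, rate))

rate, y = 2.0, 1.5             # illustrative values
approx = truncated_mean(y, rate)
exact = y + 1.0 / rate         # memoryless property of the exponential
print(approx, exact)           # the two values should agree closely
```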

Let $a$ and $b$ be the lower and upper limits, respectively, of the support of the original density $f (x)$ . Let $u (X)$ be some continuous function of $X$ with a continuous derivative, and assume that $f (x)$ is continuous. Then the properties of $E (u (X) | X > y)$ include:

(i) $\lim_{y \to a} E(u(X)|X>y) = E(u(X))$

(ii) $\lim_{y \to b} E(u(X)|X>y) = u(b)$

(iii) $\frac{\partial}{\partial y}[E(u(X)|X>y)] = \frac{f(y)}{1-F(y)}[E(u(X)|X>y) - u(y)]$

(iv) $\lim_{y \to a}\frac{\partial}{\partial y}[E(u(X)|X>y)] = f(a)[E(u(X)) - u(a)]$

(v) $\lim_{y \to b}\frac{\partial}{\partial y}[E(u(X)|X>y)] = \frac{1}{2}u'(b)$

These properties hold provided that the limits exist, that is: $\lim_{y \to c} u'(y) = u'(c)$ , $\lim_{y \to c} u(y) = u(c)$ and $\lim_{y \to c} f(y) = f(c)$ , where $c$ represents either $a$ or $b$ .
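Properties (iii) and (v) can be checked on a simple case. The following minimal Python sketch assumes $X$ uniform on $[0, 1]$ (so $a = 0$ , $b = 1$ ) and $u (x) = x^2$ , for which $E(u(X)|X>y) = (1 + y + y^2)/3$ in closed form. These choices, and the function names, are assumptions made for illustration only.

```python
def cond_mean_u(y):
    """E(u(X) | X > y) for X uniform on [0, 1] and u(x) = x**2 (closed form)."""
    return (1.0 + y + y * y) / 3.0

def hazard(y):
    """f(y) / (1 - F(y)) for the uniform density on [0, 1]."""
    return 1.0 / (1.0 - y)

def derivative_numeric(y, h=1e-6):
    """Central finite-difference estimate of d/dy E(u(X) | X > y)."""
    return (cond_mean_u(y + h) - cond_mean_u(y - h)) / (2.0 * h)

def derivative_property_iii(y):
    """Right-hand side of property (iii): hazard * (E(u(X) | X > y) - u(y))."""
    return hazard(y) * (cond_mean_u(y) - y * y)

y = 0.4
lhs = derivative_numeric(y)
rhs = derivative_property_iii(y)
print(lhs, rhs)  # both approximate (1 + 2y) / 3

# Property (v): as y approaches b = 1 the derivative tends to u'(b) / 2,
# which equals 1 here because u'(x) = 2x.
near_b = derivative_property_iii(1.0 - 1e-6)
print(near_b)
```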

The Tobit model employs truncated distributions.

Random Truncation

Suppose we have the following setup: a truncation value, $t$ , is selected at random from a density, $g (t)$ , but this value is not observed. Then a value, $x$ , is selected at random from the truncated distribution, $f (x | t) = T r (x)$ . Suppose we observe $x$ and wish to update our belief about the density of $t$ given the observation.

First, by definition:

$f(x)=\int_{x}^{\infty} f(x|t)g(t)dt$ , and $F(a)=\int_{-\infty}^a[\int_{x}^{\infty} f(x|t)g(t)dt]dx$

Notice that $t$ must be greater than $x$ , hence when we integrate over $t$ , we set a lower bound of $x$ . $f (x)$ and $F (x)$ are the unconditional density and unconditional cumulative distribution function, respectively.

By Bayes' rule:

$g(t|x)= \frac{f(x|t)g(t)}{f(x)}$

which expands to:

$g(t|x) = \frac{f(x|t)g(t)}{\int_{x}^{\infty} f(x|t)g(t)dt}$
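When the integral in the denominator has no convenient closed form, the posterior can be approximated on a grid. The sketch below is a minimal Python illustration under stated assumptions: $f (x | t)$ is taken to be a unit-rate exponential truncated to $(0, t]$ , $g (t)$ is taken to be a unit-rate exponential, and the infinite upper limit is replaced by a hypothetical cutoff `t_max`. None of these choices come from the text above.

```python
import math

def likelihood(x, t):
    """f(x | t): a unit-rate exponential truncated to (0, t] (assumed example)."""
    if not (0.0 < x <= t):
        return 0.0
    return math.exp(-x) / (1.0 - math.exp(-t))

def prior(t):
    """g(t): a unit-rate exponential density for the truncation value (assumed)."""
    return math.exp(-t) if t > 0.0 else 0.0

def posterior_grid(x, t_max=40.0, n=100000):
    """Return grid points t_i > x, the values g(t_i | x), and the grid step.

    The denominator, the integral of f(x|t) g(t) over t > x, is approximated
    by the midpoint rule on (x, t_max); t_max stands in for infinity.
    """
    h = (t_max - x) / n
    grid = [x + (i + 0.5) * h for i in range(n)]
    weights = [likelihood(x, t) * prior(t) for t in grid]
    evidence = h * sum(weights)  # approximates the unconditional density f(x)
    return grid, [w / evidence for w in weights], h

x_obs = 1.0
grid, post, h = posterior_grid(x_obs)
print(h * sum(post))   # 1 up to rounding, by construction
print(min(grid) > x_obs)  # the grid only covers t > x
```

The grid starts just above the observed $x$ because values of $t$ at or below $x$ have zero likelihood.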

Example, Two Uniform Distributions

Suppose we know that $t$ is uniformly distributed on $[0,T]$ and that $x|t$ is uniformly distributed on $[0,t]$ . Let $g(t)$ and $f(x|t)$ be the densities that describe $t$ and $x$ respectively. Suppose we observe a value of $x$ and wish to know the distribution of $t$ given that value of $x$ .

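Here $f(x|t) = \frac{1}{t}$ and $g(t) = \frac{1}{T}$ , so that $f(x) = \int_{x}^{T} \frac{1}{tT} dt = \frac{\ln(T) - \ln(x)}{T}$ and

$g(t|x) =\frac{f(x|t)g(t)}{f(x)} = \frac{1}{t(\ln(T) - \ln(x))}$ for all $x < t \leq T$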
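This result can be compared with a simulation. The following is a minimal Python sketch, not a definitive check: it draws pairs $(t, x)$ from the two uniform distributions, keeps the pairs whose $x$ falls in a narrow window around a chosen value, and compares the fraction of kept $t$ values below a cutoff with the integral of the posterior density. The values `x0 = 0.2`, `t_cut = 0.5`, `T = 1`, the window width, the sample size and the seed are all assumptions made for this example.

```python
import math
import random

def posterior_cdf(t, x, T):
    """Integral of 1 / (s (ln T - ln x)) over s in (x, t], for x < t <= T."""
    return (math.log(t) - math.log(x)) / (math.log(T) - math.log(x))

def simulate_fraction(x0, t_cut, T=1.0, width=0.01, n=200000, seed=0):
    """Draw (t, x) pairs, keep those with x near x0, and return the fraction
    of kept pairs with t <= t_cut together with the number of pairs kept."""
    rng = random.Random(seed)
    kept = below = 0
    for _ in range(n):
        t = T * rng.random()       # t uniform on [0, T]
        x = t * rng.random()       # x | t uniform on [0, t]
        if abs(x - x0) < width:    # approximate conditioning on x = x0
            kept += 1
            below += t <= t_cut
    return below / kept, kept

x0, t_cut, T = 0.2, 0.5, 1.0
empirical, kept = simulate_fraction(x0, t_cut, T)
exact = posterior_cdf(t_cut, x0, T)
print(empirical, exact)  # the two values should be close
```

Conditioning on a window rather than on an exact value of $x$ introduces a small bias, so only approximate agreement should be expected.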