Mixed logit

From Wikipedia, the free encyclopedia

Mixed logit is a fully general statistical model for approximating utility functions. The inspiration for the mixed logit model came from the limitations of the standard logit and probit models. The standard logit model has three problems which mixed logit solves. "It [Mixed Logit] obviates the three limitations of standard logit by allowing for random taste variation, unrestricted substitution patterns, and correlation in unobserved factors over time."[1] Mixed logit can also use any distribution for the random coefficients, unlike probit, which is limited to the normal distribution.

Random taste variation

The standard logit model's "taste" coefficients, or betas, are fixed, which means the betas are the same for everyone. Mixed logit has a different beta for each respondent or person.

The utility of person n for alternative i with the standard logit model is:


 U_{ni} = \beta x_{ni} + \varepsilon_{ni}


with


 \varepsilon_{ni} \sim \text{iid extreme value}


The utility of person n for alternative i with the mixed logit model is:


 U_{ni} = \beta_n x_{ni} + \varepsilon_{ni}

with


 \varepsilon_{ni} \sim \text{iid extreme value}


 \quad \beta_n \sim f(\beta_n | \theta)


where θ collects the parameters of the distribution of β_n over the population. This is also called the random coefficients model, since β_n is a random variable. It allows the slope of the model to be random, an extension of the random effects model, where only the intercept was stochastic.

The probability density of the parameters over the population can be modeled with a variety of distributions. This gives the researcher more flexibility than with probit, where the distribution is fixed.
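As a concrete illustration, the contrast between one fixed β and person-specific β_n can be sketched in Python. This is a minimal hypothetical example: the attribute matrix X and the normal taste distribution with mean b and standard deviation s are assumptions for illustration, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 alternatives described by 2 attributes (e.g. price, quality).
X = np.array([[1.0, 0.5],
              [2.0, 1.5],
              [0.5, 0.2]])

# Standard logit: one fixed beta shared by everyone.
beta_fixed = np.array([-1.0, 2.0])
v = X @ beta_fixed
p_logit = np.exp(v) / np.exp(v).sum()

# Mixed logit: each person n gets their own draw beta_n ~ N(b, s^2).
b = np.array([-1.0, 2.0])   # assumed population mean of the tastes
s = np.array([0.5, 1.0])    # assumed population std. dev. of the tastes
beta_n = b + s * rng.standard_normal(2)   # one respondent's tastes
v_n = X @ beta_n
p_n = np.exp(v_n) / np.exp(v_n).sum()     # that respondent's choice probabilities
```

Each respondent's probabilities sum to one, but different respondents trade off the attributes differently.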

Unrestricted substitution patterns

The mixed logit model does not have a restrictive substitution pattern because, unlike logit, it does not exhibit the independence of irrelevant alternatives (IIA) property. "The percentage change in the probability for one alternative given a change in the mth attribute of another alternative is


 E_{ni x_{nj}^m} = -\frac{x_{nj}^m}{P_{ni}} \int \beta^m L_{ni}(\beta) L_{nj}(\beta) f(\beta)\, d\beta = -x_{nj}^m \int \beta^m L_{nj}(\beta) \frac{L_{ni}(\beta)}{P_{ni}} f(\beta)\, d\beta


where β^m is the mth element of β."[2] Because the probability P_ni of respondent n choosing alternative i does not cancel out of this expression, the cross elasticity differs across alternatives i: "A ten-percent reduction for one alternative need not imply (as with logit) a ten-percent reduction in each other alternative."[3] The relative percentages depend on the likelihood L_ni(β) that respondent n chooses alternative i versus the likelihood L_nj(β) that respondent n chooses alternative j over various draws of β. The distribution of β depends on which probability density function the researcher thinks is appropriate for the data.
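This flexible substitution pattern can be checked by simulation. The sketch below is hypothetical (the attributes in X, the normal mixing distribution, and the size of the attribute change are all assumptions): it approximates the mixed logit probabilities by averaging the logit formula over draws of β, degrades one alternative, and compares the resulting percentage changes, which need not be equal across the other alternatives as they would be under IIA.

```python
import numpy as np

rng = np.random.default_rng(1)
R = 20_000

# Hypothetical attributes for 3 alternatives (columns: e.g. price, quality).
X = np.array([[1.0, 0.2],
              [1.0, 1.0],
              [3.0, 1.2]])
b = np.array([-1.0, 2.0])                     # assumed mean tastes
s = np.array([0.5, 1.0])                      # assumed taste std. devs.
betas = b + s * rng.standard_normal((R, 2))   # shared draws for both scenarios

def mixed_probs(X, betas):
    # Simulate the mixed logit integral: average the logit formula over draws.
    v = betas @ X.T
    ev = np.exp(v - v.max(axis=1, keepdims=True))   # numerically stable softmax
    return (ev / ev.sum(axis=1, keepdims=True)).mean(axis=0)

p_before = mixed_probs(X, betas)
X_worse = X.copy()
X_worse[0, 0] += 1.0                          # worsen alternative 0's first attribute
p_after = mixed_probs(X_worse, betas)

rel_change = (p_after - p_before) / p_before  # percentage change per alternative
# Under standard logit (IIA), rel_change[1] and rel_change[2] would be identical;
# with random tastes they generally differ.
```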

Correlation in unobserved factors over time

Standard logit does not take into account how utility changes over time. This is a problem when using panel data, which are essentially repeated choices over time. Applying a standard logit model to panel data assumes that whatever is observed is new every time it is observed, which is a very unlikely assumption. Taking into account both random taste variation and correlation in unobserved factors over time, the utility for respondent n for alternative i at time t is as follows,


 U_{nit} = \beta_{n} X_{nit} + \varepsilon_{nit}


where the subscript t is the time dimension. We still make the logit assumption that \varepsilon is i.i.d. extreme value, which means that \varepsilon is independent over time, people, and alternatives; \varepsilon is essentially just white noise.

For a normal distribution, the βs have standard deviation s and mean b. The utility equation then becomes:


 U_{nit} = (b + s\eta_{n}) X_{nit} + \varepsilon_{nit}


where η_n is a draw taken from the standard normal density. Separating the observed from the unobserved parts, that equation becomes:


 U_{nit} = b X_{nit} + (s \eta_{n} X_{nit} + \varepsilon_{nit})


 U_{nit} = b X_{nit} + e_{nit}


In the preceding equation the observed factors are separated from the unobserved factors. Of the unobserved factors, \varepsilon is independent over time, while s η_n X_{nit} is not independent over time.

Then the covariance between the composite errors for alternatives i and j at times t and s is,


 \operatorname{Cov}(e_{nit}, e_{njs}) = s^2 X_{nit} X_{njs}


Thus, by adding random coefficients on the explanatory variables, the X's, the model induces correlation in utility over time.
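That covariance result can be verified with a quick simulation. In this hypothetical sketch the taste coefficient is a scalar, β_n = b + sη_n, and the empirical covariance of the composite errors at two time periods, computed across many simulated respondents, is compared with the s²·X·X product above (all numeric values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000                 # number of simulated respondents

b, s = 1.0, 0.5             # assumed mean and std. dev. of a scalar taste
x_t, x_s = 2.0, 3.0         # the attribute at two time periods t and s

eta = rng.standard_normal(N)          # one taste draw per respondent
eps_t = rng.gumbel(size=N)            # iid extreme value (Gumbel) noise at t
eps_s = rng.gumbel(size=N)            # ... and at s
e_t = s * eta * x_t + eps_t           # composite unobserved part at t
e_s = s * eta * x_s + eps_s           # composite unobserved part at s

emp_cov = np.cov(e_t, e_s)[0, 1]      # empirical covariance across respondents
theory = s**2 * x_t * x_s             # the s^2 * X * X formula from the text
```

With these values `theory` is 1.5, and the empirical covariance converges to it as N grows.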

Example

If you observe a series of a decision maker's choices about which coffee maker to buy each time one is needed, then the probability of that sequence of choices is simply the product of the logit probabilities of the individual purchases (assuming the error term, \varepsilon, is i.i.d. extreme value).

Mathematically, this looks like the following,


 L_{n} (\beta_{n}) = \prod_{t} \frac{e^{\beta_{n}X_{nit}}} {\sum_{j} e^{\beta_{n}X_{njt}}}


Then the probability is simply the integral of the product of the logits over the density of β.


 P_{ni} = \int L_{ni} (\beta) f(\beta | \theta) d\beta


Unfortunately there is no closed-form solution to this equation, so the researcher must simulate P_ni. Fortunately for the researcher, simulating P_ni can be very simple for certain distributions, and very difficult for others. There are four basic steps to follow:

1. Take draws from the probability density function that you assigned to the 'taste' parameter.

2. Calculate the logit formula L_ni(β^r), the choice probability conditional on that draw. This is done for each alternative.

3. Repeat many times.

4. Average the results.

Then the formula for the simulation looks like the following,

 P_{ni} = \frac {\sum_{r} L_{ni}(\beta^r)} {R}

where R is the total number of draws taken from the distribution, and r is one draw.

Once this is done, you will have a value for the probability of each alternative i for each respondent n.
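The four steps above can be sketched in Python. This is a minimal hypothetical implementation: the data X, the normal mixing density with mean b and standard deviation s, and the number of draws R are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: one respondent, 3 alternatives, 2 attributes.
X = np.array([[1.0, 0.5],
              [2.0, 1.5],
              [0.5, 0.2]])
b = np.array([-1.0, 2.0])   # assumed population mean of the tastes
s = np.array([0.5, 1.0])    # assumed population std. dev. of the tastes
R = 10_000                  # number of draws

# Step 1: draw taste parameters from the assumed density f(beta | theta).
betas = b + s * rng.standard_normal((R, 2))

# Step 2: for each draw, evaluate the logit formula L_ni(beta^r).
v = betas @ X.T
ev = np.exp(v - v.max(axis=1, keepdims=True))   # numerically stable softmax
L = ev / ev.sum(axis=1, keepdims=True)          # shape (R, alternatives)

# Steps 3-4: repeat over the R draws and average.
P = L.mean(axis=0)          # simulated P_ni for each alternative i
```

`P` then holds the simulated choice probability of each alternative for this respondent, and by construction the entries sum to one.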

References

  1. ^ Train, Kenneth. Discrete Choice Methods with Simulation. Cambridge University Press.
  2. ^ Train, Kenneth. Discrete Choice Methods with Simulation. Cambridge University Press.
  3. ^ Train, Kenneth. Discrete Choice Methods with Simulation. Cambridge University Press.
