Mixed logit

From Wikipedia, the free encyclopedia

Mixed logit is a fully general statistical model for approximating utility functions. The inspiration for the mixed logit model came from the limitations of the standard logit and probit models. The standard logit model has three problems which mixed logit solves. "It [Mixed Logit] obviates the three limitations of standard logit by allowing for random taste variation, unrestricted substitution patterns, and correlation in unobserved factors over time."[1] Mixed logit can also use any distribution for the random coefficients, unlike probit, which is limited to the normal distribution.

Random taste variation

The standard logit model's "taste" coefficients, or betas, are fixed, which means the betas are the same for everyone. Mixed logit has a different beta for each respondent or person.

The utility of person n for alternative i with the standard logit model is:


 U_{ni} = \beta x_{ni} + \varepsilon_{ni}


with


 \varepsilon_{ni} \sim \text{iid extreme value}


The utility of person n for alternative i with the mixed logit model is:


 U_{ni} = \beta_n x_{ni} + \varepsilon_{ni}

with


 \varepsilon_{ni} \sim \text{iid extreme value}


 \quad \beta_n \sim f(\beta_n | \theta)


where θ collects the parameters of the distribution of β_n over the population. This is also called the random coefficients model, since β_n is a random variable. It allows the slope of the model to be random, an extension of the random effects model, where only the intercept was stochastic.

The probability density of the parameters over the population can be modeled with a variety of distributions. This gives the researcher more flexibility than with probit, where the distribution is fixed.
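As a concrete illustration, the contrast between one fixed β and person-specific β_n can be sketched in Python. This is a minimal hypothetical example: the attribute matrix X and the normal taste distribution with mean b and standard deviation s are assumptions for illustration, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 alternatives described by 2 attributes (e.g. price, quality).
X = np.array([[1.0, 0.5],
              [2.0, 1.5],
              [0.5, 0.2]])

# Standard logit: one fixed beta shared by everyone.
beta_fixed = np.array([-1.0, 2.0])
v = X @ beta_fixed
p_logit = np.exp(v) / np.exp(v).sum()

# Mixed logit: each person n gets their own draw beta_n ~ N(b, s^2).
b = np.array([-1.0, 2.0])   # assumed population mean of the tastes
s = np.array([0.5, 1.0])    # assumed population std. dev. of the tastes
beta_n = b + s * rng.standard_normal(2)   # one respondent's tastes
v_n = X @ beta_n
p_n = np.exp(v_n) / np.exp(v_n).sum()     # that respondent's choice probabilities
```

Each respondent's probabilities sum to one, but different respondents trade off the attributes differently.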

Unrestricted substitution patterns

The mixed logit model does not have a restrictive substitution pattern because, unlike logit, it does not exhibit the independence of irrelevant alternatives (IIA) property. "The percentage change in the probability for one alternative given a change in the mth attribute of another alternative is


 E_{ni x_{nj}^m} = -\frac{x_{nj}^m}{P_{ni}} \int \beta^m L_{ni}(\beta) L_{nj}(\beta) f(\beta)\, d\beta = -x_{nj}^m \int \beta^m L_{nj}(\beta) \frac{L_{ni}(\beta)}{P_{ni}} f(\beta)\, d\beta


where β^m is the mth element of β."[2] Because the probability P_ni of respondent n choosing alternative i does not cancel out of this expression, the cross elasticity differs across alternatives i: "A ten-percent reduction for one alternative need not imply (as with logit) a ten-percent reduction in each other alternative."[3] The relative percentages depend on the likelihood L_ni(β) that respondent n chooses alternative i versus the likelihood L_nj(β) that respondent n chooses alternative j over various draws of β. The distribution of β depends on which probability density function the researcher thinks is appropriate for the data.
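This flexible substitution pattern can be checked by simulation. The sketch below is hypothetical (the attributes in X, the normal mixing distribution, and the size of the attribute change are all assumptions): it approximates the mixed logit probabilities by averaging the logit formula over draws of β, degrades one alternative, and compares the resulting percentage changes, which need not be equal across the other alternatives as they would be under IIA.

```python
import numpy as np

rng = np.random.default_rng(1)
R = 20_000

# Hypothetical attributes for 3 alternatives (columns: e.g. price, quality).
X = np.array([[1.0, 0.2],
              [1.0, 1.0],
              [3.0, 1.2]])
b = np.array([-1.0, 2.0])                     # assumed mean tastes
s = np.array([0.5, 1.0])                      # assumed taste std. devs.
betas = b + s * rng.standard_normal((R, 2))   # shared draws for both scenarios

def mixed_probs(X, betas):
    # Simulate the mixed logit integral: average the logit formula over draws.
    v = betas @ X.T
    ev = np.exp(v - v.max(axis=1, keepdims=True))   # numerically stable softmax
    return (ev / ev.sum(axis=1, keepdims=True)).mean(axis=0)

p_before = mixed_probs(X, betas)
X_worse = X.copy()
X_worse[0, 0] += 1.0                          # worsen alternative 0's first attribute
p_after = mixed_probs(X_worse, betas)

rel_change = (p_after - p_before) / p_before  # percentage change per alternative
# Under standard logit (IIA), rel_change[1] and rel_change[2] would be identical;
# with random tastes they generally differ.
```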

Correlation in unobserved factors over time

Standard logit does not take into account how utility changes over time. This is a problem when using panel data, which are essentially repeated choices over time. Applying a standard logit model to panel data assumes that whatever is observed is new every time it is observed, which is a very unlikely assumption. Taking into account both random taste variation and correlation in unobserved factors over time, the utility for respondent n for alternative i at time t is as follows,


 U_{nit} = \beta_{n} X_{nit} + \varepsilon_{nit}


where the subscript t is the time dimension. We still make the logit assumption that \varepsilon is i.i.d. extreme value, which means that \varepsilon is independent over time, people, and alternatives; \varepsilon is essentially just white noise.

For a normal distribution, the βs have standard deviation s and mean b. The utility equation then becomes:


 U_{nit} = (b + s\eta_{n}) X_{nit} + \varepsilon_{nit}


where η_n is a draw taken from the standard normal density. Separating the observed from the unobserved parts, that equation becomes:


 U_{nit} = b X_{nit} + (s \eta_{n} X_{nit} + \varepsilon_{nit})


 U_{nit} = b X_{nit} + e_{nit}


In the preceding equation the observed factors are separated from the unobserved factors. Of the unobserved factors, \varepsilon is independent over time, while s η_n X_{nit} is not independent over time.

Then the covariance between the composite errors for alternatives i and j at times t and s is,


 \operatorname{Cov}(e_{nit}, e_{njs}) = s^2 X_{nit} X_{njs}


Thus, by adding random coefficients on the explanatory variables, the X's, the model induces correlation in utility over time.
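That covariance result can be verified with a quick simulation. In this hypothetical sketch the taste coefficient is a scalar, β_n = b + sη_n, and the empirical covariance of the composite errors at two time periods, computed across many simulated respondents, is compared with the s²·X·X product above (all numeric values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000                 # number of simulated respondents

b, s = 1.0, 0.5             # assumed mean and std. dev. of a scalar taste
x_t, x_s = 2.0, 3.0         # the attribute at two time periods t and s

eta = rng.standard_normal(N)          # one taste draw per respondent
eps_t = rng.gumbel(size=N)            # iid extreme value (Gumbel) noise at t
eps_s = rng.gumbel(size=N)            # ... and at s
e_t = s * eta * x_t + eps_t           # composite unobserved part at t
e_s = s * eta * x_s + eps_s           # composite unobserved part at s

emp_cov = np.cov(e_t, e_s)[0, 1]      # empirical covariance across respondents
theory = s**2 * x_t * x_s             # the s^2 * X * X formula from the text
```

With these values `theory` is 1.5, and the empirical covariance converges to it as N grows.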

Example

If you observe a series of a decision maker's choices about which coffee maker to buy each time one is needed, then the probability of that sequence of choices is simply the product of the logit probabilities of the individual purchases (assuming the error term, \varepsilon, is i.i.d. extreme value).

Mathematically, this looks like the following,


 L_{n} (\beta_{n}) = \prod_{t} \frac{e^{\beta_{n}X_{nit}}} {\sum_{j} e^{\beta_{n}X_{njt}}}


Then the probability is simply the integral of the product of the logits over the density of β.


 P_{ni} = \int L_{ni} (\beta) f(\beta | \theta) d\beta


Unfortunately there is no closed-form solution to this equation, so the researcher must simulate P_ni. Fortunately for the researcher, simulating P_ni can be very simple for certain distributions, and very difficult for others. There are four basic steps to follow:

1. Take draws from the probability density function that you assigned to the 'taste' parameter.

2. Calculate the logit formula L_ni(β^r), the choice probability conditional on that draw. This is done for each alternative.

3. Repeat many times.

4. Average the results.

Then the formula for the simulation looks like the following,

 P_{ni} = \frac {\sum_{r} L_{ni}(\beta^r)} {R}

where R is the total number of draws taken from the distribution, and r is one draw.

Once this is done, you will have a value for the probability of each alternative i for each respondent n.
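The four steps above can be sketched in Python. This is a minimal hypothetical implementation: the data X, the normal mixing density with mean b and standard deviation s, and the number of draws R are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: one respondent, 3 alternatives, 2 attributes.
X = np.array([[1.0, 0.5],
              [2.0, 1.5],
              [0.5, 0.2]])
b = np.array([-1.0, 2.0])   # assumed population mean of the tastes
s = np.array([0.5, 1.0])    # assumed population std. dev. of the tastes
R = 10_000                  # number of draws

# Step 1: draw taste parameters from the assumed density f(beta | theta).
betas = b + s * rng.standard_normal((R, 2))

# Step 2: for each draw, evaluate the logit formula L_ni(beta^r).
v = betas @ X.T
ev = np.exp(v - v.max(axis=1, keepdims=True))   # numerically stable softmax
L = ev / ev.sum(axis=1, keepdims=True)          # shape (R, alternatives)

# Steps 3-4: repeat over the R draws and average.
P = L.mean(axis=0)          # simulated P_ni for each alternative i
```

`P` then holds the simulated choice probability of each alternative for this respondent, and by construction the entries sum to one.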

References

  1. ^ Train, Kenneth. Discrete Choice Methods with Simulation. Cambridge University Press.
  2. ^ Train, Kenneth. Discrete Choice Methods with Simulation. Cambridge University Press.
  3. ^ Train, Kenneth. Discrete Choice Methods with Simulation. Cambridge University Press.
