Talk:Exponential family


This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: Start-Class, Mid-priority. Field: Probability and statistics


Must H be a pdf

Question: must the reference distribution H be a probability distribution, or will a positive measure do? Note that the normalization condition applies to F, not to H. I am thinking of cases where the reference "distribution" is Lebesgue measure (to wit: the Normal distribution) or counting measure. — Miguel 11:08, 2005 Apr 16 (UTC)

Certainly it is Lebesgue measure in some cases. And counting measure on positive integers -- clearly not assigning finite measure to the whole space -- in some cases. Which causes me to notice that this article is woefully deficient in examples. I'll be back.... Michael Hardy 20:36, 16 Apr 2005 (UTC)

Article reversion

Would you mind explaining the reversion of my edits on the article on the Exponential family? — Miguel 07:14, 2005 Apr 18 (UTC)

It said:

A is important in its own right, as it is the cumulant-generating function of the probability distribution of the sufficient statistic T(X) when the distribution of X is H.

You changed it to:

A is important in its own right, as it is the cumulant-generating function of the probability distribution of the sufficient statistic T(X).

The edit consisted of deleting the words "when the distribution of X is H". The statement doesn't make sense without those words. A cumulant-generating function is always a cumulant-generating function of some particular probability distribution.

Well, actually the derivatives of A(η) evaluated at η instead of at zero give you the cumulants of dF(x|η), which is what I meant. The cumulants of dH are actually irrelevant to dF; what is interesting is that the cumulants of the entire family of exponential distributions with the same dH and T are encoded in A. Miguel 09:34, 2005 Apr 19 (UTC)
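To make the point about where the derivatives are taken concrete, here is a small sketch with the Poisson family (an illustration of my own, not taken from the article, written in the plus-sign convention dF(x|\eta) = e^{\eta T(x) - A(\eta)}\, dH(x)):

    \frac{e^{-\lambda}\lambda^{x}}{x!} = \exp(\eta x - e^{\eta}) \cdot \frac{1}{x!}, \qquad \eta = \log\lambda,

so T(x) = x, A(\eta) = e^{\eta}, and dH is counting measure on {0, 1, 2, ...} with weights 1/x!. Differentiating A at \eta (not at zero) gives

    A'(\eta) = e^{\eta} = \lambda = \mathrm{E}[X], \qquad A''(\eta) = e^{\eta} = \lambda = \mathrm{Var}[X],

which are indeed the first two cumulants of dF(x|\eta); the cumulants of dH itself never enter.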

I see now that you also changed some other things. I haven't looked at those closely, but I now see that you changed "cdf" to "Lebesgue-Stieltjes integrator". I don't think that change makes sense either. That it is the Lebesgue-Stieltjes integrator is true, but the fact that it is the cdf is more to the point in this context. Michael Hardy 21:59, 18 Apr 2005 (UTC)

Except that, as you agree above, dH need not be a probability distribution, and hence it need not have a cdf. dH is a positive measure, and H is its integrated form. I have never seen H(x) = x on the whole real line called a cdf. It is fine if you want to call it a cdf, but then you'll have to explain somewhere else that the corresponding probability distribution may be "non-normalizable", and that will raise some eyebrows (not mine, though).
You also reverted a lot of valid content about the relationship between the exponential family and information entropy, as well as a reorganization of the existing information into sections, plus placeholders for discussing estimation and testing. The two edits that so bothered you were the last of a long series spanning two days. You could have been a little more careful. The appropriate thing would have been to discuss these things on this page. Miguel 09:34, 2005 Apr 19 (UTC)

Disputed

The current revision says that the Weibull distributions do not form an exponential family. This seems to ignore non-canonical exponential families (where the natural parameter may be transformed by another function). What am I missing? --MarkSweep (call me collect) 23:28, 14 November 2005 (UTC)

I may have lifted that from another source without checking the pdf. This is not open to interpretation. Either the Weibull distribution is exponential or it isn't according to the definition. Transforming the natural parameter is not the issue: the normal distribution's natural parameters are not the mean and variance. Miguel 09:31, 15 November 2005 (UTC)
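As a quick illustration of that last point (a standard computation, not a quote from the article): writing the normal density in exponential-family form,

    \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \exp\!\left(\frac{\mu}{\sigma^2}\,x - \frac{1}{2\sigma^2}\,x^2 - \frac{\mu^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma)\right),

so the natural parameters are (\mu/\sigma^2, -1/(2\sigma^2)), paired with the sufficient statistics (x, x^2); they are a reparameterization of (\mu, \sigma^2), not the mean and variance themselves.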

AFAIK, it's not an exponential family according to the definition given in this article. So I guess I put the {{dubious}} tag in the wrong place. What I was trying to point out is that there is a more general definition of exponential family in common use, see e.g. [1]. The difference between these definitions becomes apparent when one considers the Weibull distribution: it's not an exponential family according to the definition used here, but it is according to the more general definition (unless I'm missing something). --MarkSweep (call me collect) 19:49, 15 November 2005 (UTC)

If you "transform" the natural parameter, you're using a different parametrization of the family of probability distributions involved, but you're not looking at a different family of probability distributions. Is that what you're talking about? Michael Hardy 22:46, 15 November 2005 (UTC)

The Weibull distribution is not in the exponential family according to the definition given here, which is one that can also be found in widely-used textbooks such as Casella and Berger, Statistical Inference (2nd edition), 2002, page 114. Later in the article it is noted that you NEED this specific definition for a distribution to have sufficient statistics (a result of Darmois, Koopman and Pitman from the 1930s), so you can't generalize the definition any further without penalty. To shorten the controversy about Weibull, I propose that ALL MENTION of Weibull be dropped from the article, leaving only Cauchy as the agreed-upon non-member of the exponential family. Ed 02:01, 16 June 2006 (UTC)

OK, further study does not find any references supporting Weibull in the exponential family, so I suggest that we remove the "dispute" tag and let the original wording stand, the one where Cauchy and Weibull are both left out of the family. I consulted books on the exponential family by Lawrence Brown and Ole Barndorff-Nielsen. EdJohnston 17:14, 23 June 2006 (UTC)

It is an exponential family, and it is one according to the definition given in this article. The cdf defines a measure. It does not depend on the particular parameterization of the family of distributions. The natural parameters of the Weibull are (λ^-k, k-1), where λ and k are the parameters given in Wikipedia's defn of the Weibull. H is then Lebesgue measure, and you can work out A yourself. Note that the Weibull is a generalization of the exponential distribution, which is Weibull with k=1. CWoo on PlanetMath gets this right (though his article seems to simplify some things). Odometer 05:34, 27 December 2006 (UTC)
To put Weibull in the exponential family needs a reference, in my opinion. Membership in the family is not expected to be invariant under transformation of the parameters. I left a question about this at User_talk:Odometer but have not received a response. I found that I was unable to 'work out A myself'. PlanetMath is interesting but it is not a reliable source for our purposes. EdJohnston 18:06, 8 January 2007 (UTC)
Look at the planetmath definition of exponential family. It's an equivalent definition when density functions exist, and it's the one that's often used because it's much easier to comprehend and pattern match on, so it's probably the more appropriate definition to use on wikipedia. Casella and Berger use that defn in their book. You just factor the density into a parameter part, a "data" part, and an interaction part. Generally speaking the 'A' doesn't matter that much when determining if it's an exponential family. 'A' is just the normalizing constant to make the density integrate to 1. It's sometimes called the log partition function while the rest of the density is sometimes called the kernel. In this case it's -k log(λ) + log(k) using the parameterization in wikipedia. You can stick that in the natural parameterization if you want. Also membership in the family is definitely invariant under transformation of the parameters. A family of distributions is just a collection of probability measures which are indexed by some parameters. You can change the indices, and it's still the same collection. Odometer 00:33, 16 February 2007 (UTC)
Can you provide the functions a, b, c and d needed at [2] for the Weibull distribution? I would be happy if a simpler pattern for the exponential family could be used in this article. It's not crystal clear that measure theory is essential for explaining this stuff. It does set a high prerequisite for understanding the article. EdJohnston 02:15, 16 February 2007 (UTC)
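I can't vouch for the exact a, b, c and d of that external reference, but here is a hedged numerical sketch of the fixed-shape factorization under discussion; the parameterization η = -λ^{-k} and all the variable names below are my own choices for illustration, not taken from the article:

    # Sketch: for FIXED shape k, the Weibull(k, lam) density factors as
    # h(x) * exp(eta*T(x) - A(eta)), i.e. a one-parameter exponential family
    # in the scale parameter.  Illustrative assumptions: eta = -lam**(-k),
    # T(x) = x**k, A(eta) = -log(-eta) = k*log(lam).
    import numpy as np
    from scipy.stats import weibull_min

    k, lam = 2.5, 1.7                      # shape held fixed; scale is the free parameter
    x = np.linspace(0.1, 5.0, 50)

    eta = -lam**(-k)                       # natural parameter (illustrative choice)
    T = x**k                               # sufficient statistic (valid only for fixed k)
    h = k * x**(k - 1)                     # carrier ("data") part
    A = -np.log(-eta)                      # log-normalizer, equals k*log(lam)

    factored = h * np.exp(eta * T - A)
    direct = weibull_min.pdf(x, k, scale=lam)
    print(np.allclose(factored, direct))   # expect: True

With both k and λ free, the statistic x^k changes with k, which, as far as I can tell, is what keeps the two-parameter Weibull outside the definition used in this article.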

It's worth noting that McCullagh and Nelder's "Generalized Linear Models" talks about knowing the dispersion parameter (i.e. σ^2 in the normal case) as being akin to having a one-parameter (μ) distribution. But this implies that the distribution could be placed in exponential family form before the dispersion parameter was fixed. They also talk about fitting with a Weibull distribution and mention that the fitting procedure isn't entirely inside the framework of a GLM--meaning it can't be placed in exponential family form except when it's coincident with the exponential distribution (p. 423). Pdbailey 06:54, 9 July 2007 (UTC)

Question

Questions: Is the "prior" mentioned in the opening section the same as the "prior distribution" mentioned later? If so, isn't the convention that the first mention of something links to the Wikipedia article on it? (And calling it the "prior distribution" there would be good too - people like me don't actually know any stats.)

Also shouldn't "cdf" appear in brackets after the words "Cumulative distribution function" for clarity rather than just straight in the text? 20:25, 18 May 2006 (BST)

Technical (expert) tag

I just added a technical (expert) tag to this page. I understand and have used exponential family forms of distributions many times, but I find this article extremely difficult to read and almost willfully obscure. I can understand why one might want to use Lebesgue-Stieltjes integration, but the examples I have read have managed to avoid the topic completely and still dealt with continuous as well as discrete probability distributions. See, for example, McCullagh and Nelder, "Generalized Linear Models", p. 28, or Givens and Hoeting, "Computational Statistics", p. 5. A read through the mathematics WikiProject page (specifically the section titled "Some issues to think about") and the Wikipedia page on making technical articles accessible may also help.

The general idea is to start simple (i.e. without invoking complicated concepts), which is definitely possible as I have noted two examples from textbooks, and then to get to more rigorous treatments later on. Thanks in advance to anyone who helps with this! Pdbailey 17:17, 1 June 2007 (UTC)

Yes, this article would be more useful without the measure theory and the Lebesgue-Stieltjes integration. No books on my shelf seem to do a patient exposition of the exponential family. Maybe the treatment you found in McCullagh and Nelder would provide inspiration for our article? Also there's quite a bit of matrix notation used in the article and that adds to the reader's burden. EdJohnston 04:31, 24 June 2007 (UTC)
I think the MC&N method is too focused on GLM; others might be better. Pdbailey 14:43, 24 June 2007 (UTC)

minus sign mismatch? and no previous mention of K(u)

The definition of the exponential family is given at the top of the page as dF(x|\eta) = e^{-\eta^{\top} T(x) - A(\eta)}\, dH(x), which has a minus sign before the first η. However, in the section "Differential identities: an example", the "exponential family with canonical parameter" is written without that minus sign. Is the minus absorbed into η in the "Differential identities" section in order for the expectation and variance formulas to work? If so, perhaps the form of the exponential family should stay the same throughout and this result be stated consistently with that single form.

Also, section "Differential identities: an example" claims that "As mentioned above K(u) = A(u + \eta) - A(\eta)", but K(u) is not previously mentioned.

Erik Barry Erhardt 21:10, 23 June 2007 (UTC)
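For what it's worth, here is one way to reconcile the two conventions (my own derivation sketch, not a quote from the article). Under the plus-sign form dF(x|\eta) = e^{\eta^{\top} T(x) - A(\eta)}\, dH(x),

    \mathrm{E}_{\eta}\!\left[e^{u^{\top} T(X)}\right] = \int e^{u^{\top} T(x)}\, e^{\eta^{\top} T(x) - A(\eta)}\, dH(x) = e^{A(\eta+u) - A(\eta)},

so K(u) = A(\eta+u) - A(\eta) is the cumulant-generating function of T(X) under dF(\cdot|\eta). The same computation under the minus-sign form in the lead gives K(u) = A(\eta-u) - A(\eta); the minus sign is simply absorbed by relabeling \eta as -\eta, but the article should probably commit to a single convention throughout.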

uniform dist

This used to say that the uniform distribution is not in the exponential family. But the uniform dist is a special kind of beta dist, and the beta dist is in the family. Benwing 06:02, 22 August 2007 (UTC)

I've never seen anyone miss the point of a definition so completely. Yes, the uniform distribution is a beta distribution; yes, the beta distributions form an exponential family. To suggest that that in some way implies that the uniform distributions form an exponential family is ridiculous. You really need to read and understand the definition of "exponential family" before you can understand such things. An exponential family is not just a distribution; it is a family of distributions. Every uniform distribution belongs to some exponential family, and in fact to more than one exponential family. That does not mean that the family of distributions that includes only the uniform distributions is an exponential family. It obviously is not, since the support of the various distributions in the family varies. Michael Hardy 00:16, 23 August 2007 (UTC)
OK, fine, I made a mistake, but you have too -- you've forgotten to be civil, see WP:CIVIL. Benwing 07:40, 23 August 2007 (UTC)

far too technical

This page is a very good example of how a math page should *NOT* be. It's totally opaque to someone who doesn't already have a PhD in statistics, and such a person has no need for this page anyway. It would be far, far better if this page dispensed with all this Lebesgue measure business and gave a simple explanation, plus simple, clearly explained examples, plus a simple derivation of the relation to maximum entropy, etc. Then -- maybe -- include a complete, technically correct discussion at the bottom.

It should go something like this:

First, explain the simple case:

  • Explain the simple case of one parameter, f(x|t) = exp(a(t) + b(x) + c(t)*d(x)) = A(t)*h(x)*exp(c(t)*d(x)) = B(u)*h(x)*exp(u*d(x)) (a worked Bernoulli sketch follows this list).
  • Show briefly why these different definitions are equivalent, and how you can reparameterize from t to u to eliminate c(t), and describe that u is a "natural parameter".
  • Explain how A(t) is just a normalization factor.
  • Explain briefly that d(x) is the basis of a sufficient statistic.
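As an aside, here is a sketch of what that opening one-parameter example might look like, using the Bernoulli distribution (my own choice of example, not currently in the article), written exactly in the f(x|t) = exp(a(t) + b(x) + c(t)*d(x)) pattern above:

    p^x (1-p)^{1-x} = \exp\!\left( x \log\tfrac{p}{1-p} + \log(1-p) \right), \qquad x \in \{0, 1\},

so a(p) = \log(1-p), b(x) = 0, c(p) = \log\tfrac{p}{1-p} and d(x) = x. Reparameterizing to the natural parameter u = \log\tfrac{p}{1-p} gives B(u) = 1/(1+e^u), h(x) = 1 and f(x|u) = B(u)\, e^{u x}.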

Then expand the definition to cover vector-valued x and t.

Then rewrite sections 1-4 to eliminate the discussion in terms of CDF's, Lebesgue measure, Radon-Nikodym derivatives, Einstein's summation convention, and anything else that an undergrad is likely to find opaque. Show how the maxent relationship is derived, rather than just saying "it's a simple matter of variational calculus" -- and do it *without* invoking any calculus of variations. (If you're not sure how to do this, look up standard NLP papers on maximum entropy.)

Then, finally, if you want, include the full gory advanced details.

If you are a statistics whiz, you may think that doing what I suggested is intolerably stupid or obvious, etc. But keep in mind that Wikipedia articles are *NOT* addressed to fellow experts, but to a general audience. See WP:WPM and WP:MOSDEF for more discussion.

The corresponding article on PlanetMath would be a good place to start.

Benwing 07:40, 23 August 2007 (UTC)

Actually, I am no slouch at statistics, yet what you propose (aside from details to be fleshed out later) is neither stupid nor intolerably obvious. I hesitate to dive right in, though, as my tendency is to keep as much of the intrinsic material as is possible and appropriate but change the writing style, yet my comfort level with these topics would make my work quite drudging and painful.
A good start might be to adapt the section from generalized linear model; I should note I had a hand in writing that section. Check it out and let us know if that is closer to what you had in mind (I know it would have to be adapted somewhat away from the GLM context...). Baccyak4H (Yak!) 14:29, 23 August 2007 (UTC)
I agree with every word Benwing said. In fact, I have been meaning to work on this but never seem to get around to it. I think there is no escaping a major rewrite here. --Zvika 18:55, 7 September 2007 (UTC)

"It's totally opaque to someone who doesn't already have a PhD in statistics"

That is nonsense. Any mathematics graduate student who knows the basic definitions in probability theory would understand it. Many mathematicians who do not already know this material would understand it readily. Michael Hardy 19:06, 7 September 2007 (UTC)

Perhaps nonsense in letter, but not in spirit. The thrust of this talk section is discussion to improve the article, not to point out true but minor pedagogical points which will not help improve the article, and may as a side effect disparage well-meaning editors. That aside, I know you have a good handle on these topics; your participation could be very helpful. Would you care to contribute to this effort? Baccyak4H (Yak!) 19:21, 7 September 2007 (UTC)
(edit conflict) The point is that a basically simple idea is explained in overly technical terms. You should think not of a mathematics graduate but of an engineer or scientist who has taken a basic course in probability. These courses very often do not include any measure theory at all; many readers will certainly never have heard of a Lebesgue-Stieltjes integral. Yet the idea of an exponential family -- a class of distributions whose probability functions share a particular form -- can be understood both at the technical and at the intuitive level by such a reader. All we have to do is initially talk only about continuous and discrete distributions, and defer the general treatment to a later section. "Things should be made as simple as possible, but not any simpler." [3] --Zvika 19:33, 7 September 2007 (UTC)

I agree that it could be made accessible to a wider audience. But there's no reason to dismiss as worthless the many mathematicians not familiar with this concept who could learn it by reading this article. Michael Hardy 23:21, 7 September 2007 (UTC)

(edit conflict) Michael, do you agree with Benwing's thought that maybe this is the least important audience? (added text: Wikipedia:Make_technical_articles_accessible suggests that this topic should be accessible to the widest possible audience.)
Would anyone else be willing to work on a (sandbox version)? Pdbailey 00:14, 8 September 2007 (UTC) (edit Pdbailey 00:41, 8 September 2007 (UTC)) -- edited to remove broken link Pdbailey 04:10, 30 September 2007 (UTC)

What, specifically, is the least important audience? I don't see anything above where Benwing says any particular audience is the least important one. Michael Hardy 00:21, 8 September 2007 (UTC)

proposed overhaul

Several editors have written an alternative version of this page that is intended to be more readable for readers without Ph.D.s in maths. This effort resulted from the above conversation, and is located at (User:Pdbailey/Sandbox/Exponential_family). I intend to replace the body of this article in a few days, if there are no objections. Please also feel free to edit the linked page before it is moved or after it is moved. Pdbailey 19:51, 22 September 2007 (UTC) -- edited to remove broken sandbox link

Update: The rewrite has now been carried out. We hope you like it. Feel free to directly edit the exponential family page. --Zvika 08:38, 27 September 2007 (UTC)

Conjugate priors

Hello, the article says that "In the case of a likelihood which belongs to the exponential family there exists a conjugate prior, which is often also in the exponential family." This suggests that exponential families exist with conjugate priors that are not in the exponential family, is that true? But the form given for the conjugate prior, \pi(\eta) \propto \exp(-\eta^{\top} \alpha - \beta\, A(\eta)), looks like it is in the exponential family (for instance if we append to the vector η a component A(η)). Thanks in advance. A5 10:05, 1 November 2007 (UTC)
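A concrete instance may help here (a Poisson/Gamma sketch of my own, using the minus-sign convention of the formula you quote). Write the Poisson family as e^{-\eta T(x) - A(\eta)}/x! with T(x) = x, \eta = -\log\lambda and A(\eta) = e^{-\eta} = \lambda. The stated prior form then becomes

    \pi(\eta) \propto \exp(-\eta\alpha - \beta A(\eta)) = \lambda^{\alpha} e^{-\beta\lambda},

and changing variables to \lambda (with |d\eta/d\lambda| = 1/\lambda) gives \pi(\lambda) \propto \lambda^{\alpha-1} e^{-\beta\lambda}, a Gamma(\alpha, \beta) density, itself a member of an exponential family. More generally, \pi(\eta) \propto \exp(\alpha(-\eta) + \beta(-A(\eta))) is linear in (\alpha, \beta) against the statistics (-\eta, -A(\eta)), which is exactly the "append A(\eta)" observation: as far as I can tell, the conjugate prior is an exponential family in (\alpha, \beta) whenever its normalizer is finite.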

Isn't it rather the Darmois-Koopman-Pitman theorem? (the letters... and maybe the dates) —Preceding unsigned comment added by 129.175.52.7 (talk) 13:45, 14 February 2008 (UTC)

Still too technical? (and inaccurate)

This article is probably still too technical. While many basic books speak about "THE exponential family", this article confuses the lay (though mathematically educated) reader by suggesting that there are many such families. (The first sentence is very confusing, as it suggests that any group of distribution functions sharing any arbitrary property is an exponential family, e.g. the set of distributions uniform over a domain. Unless the word "class" implies the use of the exponential function, of course... but again this would be arcane stuff.) While there is certainly room for a presentation of the exponential famil(y/ies) at a more technical level in the body of this article, it should also clearly start at the level of scientists and engineers with a degree who didn't take advanced courses in prob/stat. Belonging to the latter category (and in summary), I found this article of little use. —Preceding unsigned comment added by 130.223.123.54 (talk • contribs) 19 February 2008, 08:52 (UTC)

I changed the first sentence and I hope you find it clearer. Regarding the use of "the exponential family" vs. "an exponential family", two standard textbooks that I have with me both use the "an" option:
  • Lehmann and Casella, Theory of Point Estimation, 2nd ed., p.22.
  • Shao, Mathematical Statistics, 2nd ed., p.96.
In both cases, they define an exponential family as a parametric family of distributions; each such family is a different exponential family. So there is no "the" (in the sense of "one and only") exponential family. --Zvika (talk) 09:14, 19 February 2008 (UTC)
Previous talk on this page has tried to figure out whether an exponential family is just a set of functions, or whether any of the parameters also separate one family from another. I don't recall the result, but certainly the premise was that there are many families. Even if 130.223.123.54 cannot produce a reference, it might be worth considering having a section that addresses exactly this topic--notably, I am not qualified to write it. Pdbailey (talk) 21:05, 20 February 2008 (UTC)