Template talk:Probability distribution

1 Usage
2 Standard Plots
- 2.1 Discussion
- 2.2 Use of color
3 Status of usage
- 3.1 List
4 Testing
5 Addition field: support
6 Standard Layout for Probability Distribution Pages
7 Subpages
8 Inverse-gamma
9 Italics or not?
10 Uniform Distribution
11 Entropy
12 Compound Poisson - help
13 additional field? Exponential family
14 Template needs repairing
15 background color clashes with math png

Usage

To use this template, put this in the article and fill it in as appropriate (see the field descriptions below the code for details):

{{Probability distribution|
   name       =|
   type       =|
   pdf_image  =|
   cdf_image  =|
   parameters =|
   support    =|
   pdf        =|
   cdf        =|
   mean       =|
   median     =|
   mode       =|
   variance   =|
   skewness   =|
   kurtosis   =|
   entropy    =|
   mgf        =|
   char       =|
 }}

Fields (data goes between the equals sign and the pipe):

"name" should be the name of the distribution without "distribution" in it (e.g., "Normal", "Exponential")
"type" should be either "density" or "mass", which corresponds to probability density function and probability mass function
"pdf_image" should be the full wikicode for an image (including the "[[Image: ...]]")
"cdf_image" same as "pdf_image"
The following should all be TeX equations and exclude any function labels (exclude the function portion, like f(x;μ,σ²); brevity is key)
- "parameters" should be the parameters for the distribution (such as $\mu$ and $\sigma^2$ for the normal distribution)
- "support" should be the support of the distribution, which may depend on the parameters. Specify this as "<math>x \in some set</math>" for continuous distributions, and as "<math>k \in some set</math>" for discrete distributions.
- "pdf" the pdf/pmf
- "cdf" the cdf
- "mean" the mean
- "median" the median
- "mode" the mode
- "variance" the variance
- "skewness" the skewness
- "kurtosis" the excess kurtosis
- "entropy" the information entropy
- "mgf" the moment generating function
- "char" the characteristic function

If any of these don't exist, then put "Does not exist" (or something to the same effect); leave blank if unknown.
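As a rough illustration of the field list above, here is a minimal Python sketch that fills in the template for the normal distribution. The helper name `render_infobox` and the constant `FIELD_ORDER` are hypothetical (they are not part of the template); the image fields and the longer formulas are left blank, which the template treats as "unknown".

```python
# Hypothetical helper, not part of the template itself: it only assembles
# the wikitext call shown above from a dict of field values.
FIELD_ORDER = [
    "name", "type", "pdf_image", "cdf_image", "parameters", "support",
    "pdf", "cdf", "mean", "median", "mode", "variance", "skewness",
    "kurtosis", "entropy", "mgf", "char",
]


def render_infobox(fields):
    """Return the template call as wikitext; missing fields stay blank."""
    unknown = set(fields) - set(FIELD_ORDER)
    if unknown:
        raise KeyError(f"not a template field: {sorted(unknown)}")
    lines = ["{{Probability distribution|"]
    for key in FIELD_ORDER:
        # Pad the field name to 11 characters so the '=' signs line up.
        lines.append(f"   {key:<11}={fields.get(key, '')}|")
    lines.append(" }}")
    return "\n".join(lines)


# Values for the normal distribution; "kurtosis" is the excess kurtosis.
normal = {
    "name": "Normal",
    "type": "density",
    "parameters": r"<math>\mu</math> location, <math>\sigma^2 > 0</math> squared scale",
    "support": r"<math>x \in (-\infty, \infty)</math>",
    "mean": r"<math>\mu</math>",
    "median": r"<math>\mu</math>",
    "mode": r"<math>\mu</math>",
    "variance": r"<math>\sigma^2</math>",
    "skewness": "0",
    "kurtosis": "0",
}

print(render_infobox(normal))
```

Field values must not contain a bare pipe character, since the pipe ends a field in the template call.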

Standard Plots

Construction - Standard plots are generally done at approximately 6000 x 4500 pixels, using PostScript Times or Symbol font at size 48 and a line thickness of 17 pixels. The axes of the plot should be in a ratio of 4 wide to 3 high. The size of the image should also be in a ratio of 4 wide to 3 high. The image should consist of a few distinct colors only, with different curves having different colors. The image should then be blurred (Gaussian 2.5 pixel in Photoshop) and reduced to 1300 x 975 pixels. When multiple curves are plotted, a table of parameter values and associated colors should be included on the plot. If one of the curves is "prototypical" or "standard", it should be in black. We want these plots to be usable in any language, so only numerals and mathematical symbols should appear in the image. Axes should be labelled with the appropriate symbols (x and p(x)/P(x) for PDF/CDF plots, k and p_k/P_k for PMF/CMF plots). Further explanation should be made in the text caption for the image. Continuous distributions should be done as solid lines. Display of discrete distributions should use points connected by lines, with a short explanation in the caption (e.g., "connecting lines do not imply continuity"), unless a single plot is used, in which case "impulse" plots are best.

Upload Procedure - Images should be uploaded to Wikimedia Commons. The file names should be of the form XXX_distribution_ZZZ.png, where XXX is the distribution name and ZZZ is either "PDF", "CDF", "PMF", or "CMF" depending on which function is plotted. The description page for the plots should contain a short description, the GFDL tag, and a link to "Category:Probability distributions images". When the plot is made using gnuplot, the relevant instructions should be included.
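The naming and sizing rules above are easy to check mechanically. The following Python sketch is only an illustration; the function names `standard_plot_filename` and `is_four_by_three` are made up here and do not refer to any existing tool.

```python
ALLOWED_FUNCTIONS = ("PDF", "CDF", "PMF", "CMF")


def standard_plot_filename(distribution, function):
    """Build a name of the form XXX_distribution_ZZZ.png."""
    if function not in ALLOWED_FUNCTIONS:
        raise ValueError(f"function must be one of {ALLOWED_FUNCTIONS}")
    # Spaces in the distribution name become underscores.
    return f"{distribution.replace(' ', '_')}_distribution_{function}.png"


def is_four_by_three(width, height):
    """True if the image is exactly 4 wide to 3 high."""
    return 3 * width == 4 * height


print(standard_plot_filename("Normal", "PDF"))
# Both the working size and the reduced size satisfy the 4:3 rule.
print(is_four_by_three(6000, 4500), is_four_by_three(1300, 975))
```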

Discussion

(I favor visible points connected by lines, with an explanation in the caption that the lines don't imply continuity. I know this goes against the "no caption" idea, but I really don't like losing the freedom to clarify things with a caption. Especially if we don't use axis labels! PAR 05:12, 10 Apr 2005 (UTC))

Comments: 1) In the past, we've used filenames of the form FOO_distribution_PDF.png (additional underscore/space before "PDF"). 2) Gnuplot "linespoints" style for discrete PMFs is a good idea. The alternative, using "impulses" or something similar, makes it difficult to display more than one distribution at a time. CDFs can be plotted using the floor function and one of the "steps" styles. Alternatively, if only a single PMF is plotted, "impulses" is probably the best style to use. 3) I don't think including any kind of descriptive text is a good idea, because it renders the plots less useful for other editions of Wikipedia, which may want to include their own descriptive text in French, Japanese, etc. A brief legend with formulas is fine, of course. --MarkSweep 06:07, 10 Apr 2005 (UTC)

I included the file naming convention you mentioned in the rewrite. I took out the "no caption" requirement in the template description of the pdf field, because by making the requirement of no text in the caption and no text in the image, we're boxed into having no explanation capabilities at all, and we definitely need the freedom to explain. I also added that the axes of an image need to be labelled (math symbols only). I notice that the mathematical symbols vary among the pages. Should there be some slight standardization, like p(a,b,c;x) for the PDF and P(a,b,c;x) for the CDF, with parameters a,b,c? I tentatively threw these into the specification too. PAR 12:36, 10 Apr 2005 (UTC)

Regarding captions, they can be added below plots outside the image. Have a look at normal distribution or my recent edit to Poisson distribution.

About the formulas, I've seen $f(x\,|\,a,b,c)$ used for the PDF/PMF and $F(x\,|\,a,b,c)$ for the CDF in the articles themselves. Or perhaps use a more descriptive and/or conventional name instead of f, like $N(x\,|\,0,1)$ for the standard normal PDF and $\Phi(x\,|\,0,1)$ for the standard normal CDF. I've also used things like $\mathrm{Gamma}(\lambda\,|\,\alpha,\beta)$ (see exponential distribution#Bayesian inference for an example).

I'm not sure if axes need to be labeled. We've been fairly consistent in using x for continuous distributions and k for discrete distributions. So the label of the horizontal axis should be obvious. (Though I realize that repeating the obvious wouldn't hurt either.) --MarkSweep 19:39, 10 Apr 2005 (UTC)

I like that caption method you used. I prefer p(x) and P(x) (p for probability), but if you have any argument against it, let's do f(x) and F(x). For the parameters, $f(x\,|\,a,b,c)$ looks fine to me. PAR 20:26, 10 Apr 2005 (UTC)

I would prefer p(x) for pdf/pmf and P(x) for cdf and to use the captions I made for Normal (external to image in small font). Cburnett 22:37, Apr 10, 2005 (UTC)

I will leave the specification as it stands, then, using p and P. PAR 00:04, 11 Apr 2005 (UTC)

Suits me. --MarkSweep 00:24, 11 Apr 2005 (UTC)

Another thing though: I saw you added a CMF plot for the Poisson distribution with essentially the same caption as for the PMF saying that the function is only defined for integer values. I don't think that's strictly true: I would have expected to see a step function that's constant almost everywhere except for non-continuous jumps at integers 0 to n. --MarkSweep 00:24, 11 Apr 2005 (UTC)

I don't think that discrete distributions even deal in the real number system, just integers (or integral multiples of something), at least as far as the random variate is concerned. The CDF is undefined between the integers because the random variates are not selected from the real number system, but from the integer number system (or equivalently some real number times the integers). I looked at the CDF article and it only talks about continuous distributions. Maybe we should write an article for CMFs. PAR 01:58, 11 Apr 2005 (UTC)

For a discrete random variable X it's customary to define the CDF as

$P(x) = \Pr[X\leq x] = \sum_{k\leq x} p(k).$

That way, x can be a real number and P is then a step function $P: \mathbb{R} \to [0,1]$ . I don't think there is a need to define a separate notion of a CMF. --MarkSweep 02:48, 11 Apr 2005 (UTC)

I really think that is wrong. X and x have to be from the same set. X is discrete, so x must be discrete. I mean it's wrong by dimensional analysis. X and x must have the same dimensions. For example, when dealing with income distribution, there are N people ranked by income, and X is R/N, which means XN has units of people. If x is just any real number then we could have xN=3.7. 3.7 what? 3.7 people, but there's no such thing as 3.7 people. In Poisson statistics there is no such thing as 3.7 counts. It's like saying we need to define the CDF over the complex number plane.

$P(x) = \Pr[X\leq \Re(x)] = \sum_{k\leq \Re(x)} p(k).$

That way, x can be a complex number and P is then a step function $P: \mathbb{R} \to [0,1]$ . Not only that, it screws up the plots :) PAR 16:10, 11 Apr 2005 (UTC)

Casella & Berger's Statistical Inference (ISBN 0-534-24312-6) defines the pmf — pedantically — as

$f_X(x) = P(X = x) = \left \{ \begin{matrix} (1-p)^{x-1} p & \mbox{for x=1,2,...} \\ 0 & \mbox{otherwise} \end{matrix} \right.$

so that $f_X$ is defined on the real line and can use the same definition of the cdf as the continuous RVs. But they seem to generally not write the "otherwise" condition. Cburnett 00:23, Apr 12, 2005 (UTC)

Ok, I went to check my books and found the name of the plot is a "cumulative frequency polygon" or a "cumulative ogive". Please google these terms. With regard to the definition of the cumulative distribution function for discrete variables, the relevant books I checked say:

Guenther, "Concepts of Statistical Inference": " $Pr(X\le r)$ means the probability that the experiment yields a value less than or equal to r... Almost always r or x will be one of the numerical values which the experiment can generate." All cumulative distribution functions for discrete variables are given as lists at the values of X. No plots.

Parsons, "Statistical Analysis": Section 2.4 is titled "Graphic representations of frequency distributions" and lists only the "cumulative ogive" as a method of plotting the cumulative distribution function. Again, the cumulative distribution function examples are given as lists in X.

Lindgren, "Statistical Theory": There's no concise quote, but it's clear that the CDF is defined as a continuous function on the real number line. Plots are done accordingly.

Basically, there is some disagreement as to the proper definition of the CDF and how to plot it. There is, however, ample justification for the use of the "cumulative ogive", and since it is desirable to have multiple plots of the CDF that are easily readable, I favor the ogive plots.

Also, every reference I checked uses f(x) and F(x) as the PDF and CDF, so I think I will change my mind on that. PAR 05:11, 12 Apr 2005 (UTC)
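For reference, the step-function definition quoted earlier in this thread, $P(x) = \Pr[X\leq x] = \sum_{k\leq x} p(k)$, can be sketched in a few lines of Python. This is only an illustration using the Poisson distribution; the function names are made up here, and the sketch takes no side in the disagreement over how the CDF of a discrete variable should be plotted.

```python
import math


def poisson_pmf(k, lam):
    """p(k) for a Poisson distribution with rate lam (k a non-negative integer)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cdf(x, lam):
    """P(x) = Pr[X <= x] = sum of p(k) over integers k <= x, for any real x."""
    if x < 0:
        return 0.0
    return sum(poisson_pmf(k, lam) for k in range(math.floor(x) + 1))


# Under this definition the CDF is constant between integers:
print(poisson_cdf(3.0, 2.0) == poisson_cdf(3.7, 2.0))
```

Evaluating only at integer arguments gives the list-style values used in the books cited above, so the two conventions agree wherever both are defined.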

I was about to make some Zipf and Zeta distribution plots, and I was thinking it would be very informative to plot these PMF's on a log-log scale, where they become straight lines. Assuming there's an explanation in the caption, does this sound like a good idea? PAR 21:07, 20 Apr 2005 (UTC)

Absolutely. I did the same for the Yule-Simon distribution (which I should re-do to match the standard style). Perhaps do both linear scale and double log scale plots for these three distributions? --MarkSweep 23:07, 20 Apr 2005 (UTC)

I uploaded the Zipf plots, but inadvertently labelled the CMF horizontal axis with k. Before I fix it, can anyone remind me of the reason for not labelling axes? Is it just to maintain flexibility in the text notation? PAR 01:33, 23 Apr 2005 (UTC)
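On the log-log idea raised above: for a Zipf PMF of the form $p(k) \propto k^{-s}$ on $k = 1, \ldots, N$, taking logs gives $\ln p(k) = -s \ln k - \ln H$, where $H$ is the normalizing sum, so the points fall on a straight line of slope $-s$. A small Python check, with function names that are purely illustrative:

```python
import math


def zipf_pmf_table(s, n):
    """Return [p(1), ..., p(n)] for a Zipf PMF with exponent s on 1..n."""
    weights = [k ** -s for k in range(1, n + 1)]
    norm = sum(weights)  # the normalizing sum H
    return [w / norm for w in weights]


def loglog_slopes(s, n):
    """Slopes between consecutive points of (ln k, ln p(k))."""
    pmf = zipf_pmf_table(s, n)
    pts = [(math.log(k), math.log(p)) for k, p in enumerate(pmf, start=1)]
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]


# Every consecutive slope equals -s up to rounding error.
print(loglog_slopes(1.5, 10))
```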

Use of color

Discussion moved here from User talk:MarkSweep.

Thanks for your work on these graphics; they look great. I have a gripe though (sorry): could you guys use dashed/dotted/marked lines for the different colors in recognition of the needs of color blind people? Or at least make a link to a color blind version? The most common color blindnesses by far are protanopia and deuteranopia. Protanopes require some distinguishing scheme for green/yellow, green/orange, green/brown, blue/purple, and cyan/grey. I'm not sure about deuteranopes, but I imagine they would have trouble with red/orange, red/yellow, blue/cyan, and purple/grey. --Chinasaur 01:25, 11 Apr 2005 (UTC)

Right, I'm peripherally aware of the issue, especially concerning red/green color blindness (forget what it's called, if only I had an encyclopedia…). Unfortunately, the choice of colors provided by gnuplot is extremely limited and not at all compatible with color deficient vision. I seem to recall that it is possible to use a small set of colors that most people can distinguish easily. Do you have any advice on which colors to use? This is assuming that we can get gnuplot to use colors specified by arbitrary RGB triples. Failing that, one could use dashed and dotted lines. In any case, we're rapidly approaching the point where we need to automate the creation of these plots. Does anyone have experience with Gimp scripting? --MarkSweep 02:05, 11 Apr 2005 (UTC)

My understanding of color blindness isn't the lack of ability to see colors, just that shades of red/green (or whatever) appear to be the same. So unless you have someone with color blindness on hand to determine if two colors appear the same then I don't see it as worth the time *guessing* what they *might* see. Also, with providing source then anyone could generate their own plots (though not everyone will be able to, at least it's a start). Cburnett 03:42, Apr 11, 2005 (UTC)

Side note: Even considering that I personally have run octave at some point in my life, I think that assuming everyone (anyone?) will be able and willing to generate his/her own plots given the source is absurdly optimistic :)...

Main point: I gave you some suggestions above about colors that are difficult for protanopes (someone missing the "red", i.e. long wavelength cone cell) to distinguish; I am protanopic, so this is accurate. In general the principle is simple: if you take any color and change the R value in its RGB, this change will be hard for a protanope to notice. It's a little trickier for the deuteranopes because (to simplify grossly) the green of RGB does not so well match the "green" cone cell that they are missing. However, the corresponding principle should be adequate for your purposes.

For example, in the plots at Beta distribution, I believe there are blue (RGB:0 0 1) and purple (1 0 1) lines that I can barely distinguish, red (1 0 0) and black (0 0 0) lines that I can distinguish slightly but not well, and a light colored line that could be either green (0 1 0) or yellow (1 1 0). As you can see, the difficulties encountered by a real, live protanope are predictable by the principle given above.

My suggestions are:

For confusing colors, differ them in saturation and brightness in addition to differing them in hue. For example, rather than just blue=(0 0 1) and purple=(1 0 1), use blue=(0 0 .5) purple=(1 .5 1). For red and black, use (1 .25 .25) and (0 0 0). Etc.
Alternatively, for confusing colors differ the line style, so for blue and purple, make one dashed. Likewise for red and black, yellow green and orange, etc.

Whether you want to muck up your current graphs or create alternative colorblind versions and then somehow link to those, up to you.

These suggestions cover cases of dichromats, people missing one cone entirely. Another common form of red/green color blindness is anomalous trichromacy, people with all three cones but messed up spectra. I don't think your plots should be too problematic for anomalous trichromats.

Sorry this is so pedantic; probably there should be a more central WP color blindness styleguide for creating graphics; then I would only have to rant about this in one place. If anyone knows where and how to make this happen I will be happy to contribute. --Chinasaur 10:12, 11 Apr 2005 (UTC)
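The rule of thumb given above (changing only the R value of an RGB color is hard for a protanope to notice) can be turned into a crude check for a plot palette. The Python sketch below is only that rule of thumb, with a made-up function name; it is not a model of color vision, and a proper simulation of color deficiency would be considerably more involved.

```python
def differs_only_in_red(c1, c2):
    """True if two RGB triples differ in R but match in G and B."""
    (r1, g1, b1), (r2, g2, b2) = c1, c2
    return r1 != r2 and g1 == g2 and b1 == b2


# The confusing pairs reported for the Beta distribution plots.
reported = {
    "blue/purple": ((0, 0, 1), (1, 0, 1)),
    "red/black": ((1, 0, 0), (0, 0, 0)),
    "green/yellow": ((0, 1, 0), (1, 1, 0)),
}
# The replacements suggested in the comment above.
suggested = {
    "blue/purple": ((0, 0, 0.5), (1, 0.5, 1)),
    "red/black": ((1, 0.25, 0.25), (0, 0, 0)),
}

for name, (a, b) in reported.items():
    print(name, "flagged:", differs_only_in_red(a, b))
for name, (a, b) in suggested.items():
    print(name, "(suggested) flagged:", differs_only_in_red(a, b))
```

All three reported pairs are flagged, while the suggested replacements are not, because they also differ in G or B.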

Regarding side point: most people that would be genuinely interested in generated pdfs/pmfs and cdfs of distributions are likely to know of a way to generate plots (either through gnuplot or matlab or some of the statistical packages).

Regarding main point: finally, someone complaining about color choice that is color blind! :) How's this for an idea: on each image page (i.e., Image:Normal distribution cdf.png) you/me/whomever picks a point (abscissa or ordinate) and then relates the order they appear to the legend on the graph. So for the normal cdf, I could say something like "at ordinate=0.3, the order of plots from left-to-right matches the legend top-to-bottom."? For most plots, there's some point where you can do this. For the pdf of the normal: "the plots with the peak at 0 are the top 3 plots in the legend in the same order; the bottom legend entry is the plot with peak at -2" Basically describing the plots instead of marking up the plots.

Though, now that I think of it, isn't it easy in gnuplot to add marks to lines and they show up in the legend? (By marks I mean symbols on top of a solid line, not using dotted or dashed lines). Cburnett 17:40, Apr 13, 2005 (UTC)

Yes, one could use "linespoints" style. However, that would look very similar to the plots of PMFs we have now and could be confusing. Probably best to either switch on "dashed" in gnuplot's PostScript "terminal", or to use the method you describe. Alternatively, one can put arbitrary labels on plots, which would require manual intervention. Overall I think "dashed" output and/or using a safer set of colors that vary on more than one dimension would be the best choice. --MarkSweep 19:35, 13 Apr 2005 (UTC)

Will anyone still read this...? Cburnett, good point about the nerdiness of people reading these articles. Your colors solution is clever, but looking at the figures and imagining the caption you would have to add it seems a little laborious. My suggestion is the use of two linestyles, solid and dashed. The colors that are likely to be confused are not that numerous, so you should be able to cover most confusing groups with two linestyles; use solid/dashed for pairs like red/black, blue/purple, green/yellow, red/brown, grey/cyan, grey/purple (see that's already way more lines than you need). --Chinasaur 11:57, 15 July 2005 (UTC)

A guide for creating plots has been started at Wikipedia:How to create graphs for Wikipedia articles. It is mostly about gnuplot so far.
It is not too difficult to alter the colors of a PostScript file after the fact in a text editor. Maybe someone could come up with a palette of colors that is distinguishable by pretty much anyone, and we could convert the gnuplot-generated colors to that palette? A .ps file text-conversion script could probably be made by someone who knew what they were doing.
I really don't like dashes or dotted lines. Sorry. :-) In my opinion they should only be used for special situations, like asymptotes of a function or whatever. Maybe we could come up with another scheme that doesn't look bad, like notating each plot with a symbol or something? - Omegatron 22:44, July 24, 2005 (UTC)

Status of usage

The following is a list of probability distribution pages, as classified by the probability distribution page. Following the list is an additional category "unclassified", for pages which have not yet been entered into the probability distribution page and need to be. The status of each page is given by the letters following the name of the page:

A - has an infobox
B - has standardized plots of PDF/PMF and CDF/CMF
C - has all relevant infobox entries filled other than images
D - has gnuplot code in the above image description pages
E - uses "standard" notation

The status is not up to date!!! Please bring it up to date as you check out the pages.

List

Discrete univariate distributions
- With finite support
  - degenerate distribution ABC E
  - discrete uniform distribution ABC E
  - Bernoulli distribution A C E
  - binomial distribution A E
  - hypergeometric distribution A E
  - Rademacher distribution A C E
  - Zipf's law AB E
  - Zipf-Mandelbrot law A E
- With infinite support
  - Boltzmann distribution
  - geometric distribution (special case of negative binomial)
  - negative binomial distribution A
  - logarithmic distribution A
  - Poisson distribution ABC E
  - Skellam distribution AB E
  - Yule-Simon distribution AB E
  - zeta distribution AB
Continuous univariate distributions
- Supported on a bounded interval
  - continuous uniform distribution A C
  - beta distribution ABCDE (needs some copyediting)
  - Kumaraswamy distribution A
  - Raised cosine distribution AB E
  - triangular distribution ABC
  - von Mises distribution AB
  - Wigner semicircle distribution ABC
- Supported on semi-infinite intervals
  - chi distribution AB E
  - chi-square distribution ABC E
  - Erlang distribution AB E
  - exponential distribution ABCD
  - F-distribution A
  - gamma distribution AB DE
  - inverse-chi-square distribution A E
  - inverse-gamma distribution AB D
  - noncentral chi distribution A E
  - noncentral chi-square distribution A E
  - noncentral F-distribution
  - Lévy distribution ABC E
  - log-logistic distribution
  - log-normal distribution AB
  - Pareto distribution ABC E
  - Pearson distribution A
  - Rayleigh distribution ABC
  - Rice distribution AB
  - scale-inverse-chi-square distribution A E
  - type-2 Gumbel distribution
  - Wald distribution or inverse-normal distribution
  - Weibull distribution A E
- Supported on whole real line
  - Cauchy distribution ABCD
  - Dirac delta function ABC
  - Fisher-Tippett distribution A C E
  - Generalized extreme value distribution A E
  - hyperbolic secant distribution ABC
  - Landau distribution
  - Laplace distribution ABCD
  - Levy skew alpha-stable distribution ABC E
  - logistic distribution A C
  - noncentral t-distribution
  - normal distribution ABCDE
  - Student's t-distribution A
  - type-1 Gumbel distribution
  - Voigt distribution AB E
Multivariate distributions
- Two or more random variables on same sample space
  - Dirichlet distribution (multivariate Beta)
  - Ewens sampling formula
  - multivariate normal distribution
  - multinomial distribution
Matrix-valued distributions
- Wishart distribution
- matrix normal distribution
- matrix t-distribution
- Hotelling's T-square distribution
Other distributions
- Cantor distribution

Testing

I put this template on:

to test how it looked.

The purpose of the template is to consolidate the basic information in one spot, since it seems rather spotty and inconsistent across the distribution articles. Please give me some feedback. Cburnett 02:19, 10 Mar 2005 (UTC)

I like it. I've been meaning to work on these articles and to fill in all those details. Let's start with the most important distributions and also create some missing articles about the less common ones. --MarkSweep 07:17, 10 Mar 2005 (UTC)

I think it is an excellent idea, and have added it to a number of probability distribution pages that I am interested in. I have a problem with the idea of simply entering the name of the distribution without adding the words "distribution" after it. For example, the title of the exponential distribution infobox should read "exponential distribution" not just "exponential". Replicating what I wrote on the exponential distribution discussion page:

I understand that the infobox text reads "Name: Exponential" and that makes sense, but what is displayed is just "Exponential" and that makes no sense; it's an adjective that's missing a noun to modify. Alternatively, we could change the infobox to display "{{{name}}} distribution"?

I mean, if you were writing a paper on the exponential distribution, would you title it "exponential"? Even if it's a name, like Poisson, it's a modifier. Are there hidden benefits to having a poorly written title? PAR 17:29, 1 Apr 2005 (UTC)

P.S. to Cburnett - sorry if there was any aggravation, I didn't know this page existed.

Discussion is more relevant on this talk page than on an individual distribution's page.

My primary reasons for not wanting "distribution" in the infobox are that it a) takes up width in the template and will most likely cause it to wrap (ugh) and b) I don't see it as wholly necessary since the "distribution" is at the top of the page. Further point on b: I commonly say (and hear) things like "Let X be gamma" or "Let X be gaussian" since "distribution" is understood and, thus, implicit. I have no qualms with excluding "distribution" from the infobox.

Actually, I would just as soon drop the name from the infobox than clutter it up... Cburnett 19:03, 1 Apr 2005 (UTC)

Well, we could make it a smaller font, put in line breaks for long ones, etc., but I favor keeping it rather than dropping it. My qualms remain, but it's a style issue, not a true-false issue, so I'm not fanatical. Can we try to put another interested contributor on the spot as a tie breaker, e.g. MarkSweep? PAR 19:30, 1 Apr 2005 (UTC)

I'm inclined to side with Cburnett on this, purely for reasons of space. We have articles with long titles like Scale-inverse-chi-square distribution, and I don't see how the full title would fit in an infobox, which shouldn't be wider than 350px. --MarkSweep 04:15, 2 Apr 2005 (UTC)

Ok, I accept the will of the majority but I reserve the right to complain endlessly about it. PAR 06:33, 2 Apr 2005 (UTC)

Addition field: support

I definitely think Support (mathematics) should be added right above parameters. Though, I think Support (statistics) or Support (mathematics)#Statistics should be created to specifically address pdf/pmf supports. (Posting this here since I won't get to it for a bit.) Cburnett 02:57, 21 Mar 2005 (UTC)

I concur. I'm adding this now. --MarkSweep 21:48, 23 Mar 2005 (UTC)

Actually, I changed the order: parameters first; then support (which may depend on the parameters, e.g. for the binomial distribution); then the pdf and cdf formulas, which depend both on the parameters and the support/domain. --MarkSweep 22:01, 23 Mar 2005 (UTC)

Now, how about a yes/no answer to "In exponential family?" Can't imagine that it'd be fun/easy to work that into each article. Easier to be in the infobox. Cburnett 00:28, 24 Mar 2005 (UTC)

Good idea. Perhaps a combination of that and a link to the conjugate prior distribution, if available? --MarkSweep 23:49, 7 Apr 2005 (UTC)

Oh, and also sufficient statistic etc. --MarkSweep 01:42, 9 Apr 2005 (UTC)

Perhaps we should get everything else done first then worry about adding stuff? :) With the number of distributions I think it'll still be a fair amount of work just to get them all using the template with distribution plots and somewhat cohesive articles to boot. Otherwise, I'd like to see expo family, conjugate prior if it has one, sufficient statistic for N samples, and anything else we can think of.

Though I have talked to some people on IRC and know it won't happen any time soon, but it'd be neat to have an online pdf/cdf generator using gnuplot or something. Put in the distribution and parameters and plot it. Cburnett 03:54, Apr 11, 2005 (UTC)

Standard Layout for Probability Distribution Pages

I suggest we continue this on Wikipedia talk:WikiProject Probability. --MarkSweep 03:22, 19 August 2005 (UTC)

Subpages

These two are used in the template to only show "PDF" or "PMF" depending on the distribution:

Inverse-gamma

Can anyone verify the pdf image at Inverse-gamma distribution? I don't think I've ever actually seen a plot of the pdf so I'm not sure it's correct. Cburnett 05:44, Apr 7, 2005 (UTC)

I plotted them and, by eye, they look fine. PAR 12:07, 7 Apr 2005 (UTC)

Italics or not?

Should this be italicized or not?

$X \sim N(\mu, \sigma^2)$ or $X \sim \mbox{N}(\mu, \sigma^2)$

$X \sim Gamma(\alpha,\beta)$ or $X \sim \mbox{Gamma}(\alpha,\beta)$

$X \sim Inv-Gamma(\alpha,\beta)$ or $X \sim Inv\mbox{-}Gamma(\alpha,\beta)$ or $X \sim \mbox{Inv-Gamma}(\alpha,\beta)$

I guess I prefer the italics, but TeX interprets the hyphen as subtraction (first inv-gamma), whereas putting the hyphen into an \mbox{} makes it look better (second inv-gamma). So going italics would mildly complicate names unless we drop the hyphen altogether:

$X \sim InvGamma(\alpha,\beta)$ or $X \sim \mbox{InvGamma}(\alpha,\beta)$

Cburnett 17:25, Apr 13, 2005 (UTC)

Well, since no one gave input: I'm using no italics using \mathrm or \mbox (if there's a hyphen). Cburnett 05:14, Apr 24, 2005 (UTC)

That sounds fine. On a related topic, should there be an entry in the infobox to inform about these names? Perhaps the first entry after the plots could be "Formula: $\mathrm{Binom}(n, p)$", for example. --MarkSweep 05:47, 24 Apr 2005 (UTC)

To be pedantic, I would prefer something like $X \sim \mathrm{Binomial}(k; n, p)$ which explicitly states the parameters, their order, and the variable used for the support. Though, for binomial I've also seen "Bin". It's just whatever we want to set, I guess.

Maybe it's time to start Wikipedia:WikiProject Probability distributions or move this all to Wikipedia:WikiProject Probability (though all this about distributions is just a subset of probability so I think it merits its own project...but not if we're going to set notation of distributions). Cburnett 06:41, Apr 24, 2005 (UTC)

Uniform Distribution

We need some input on the Talk:Uniform distribution (continuous) page. Michael Hardy and I have had a running discussion on the values of the uniform distribution at the transition points. I think we have settled the text aspect of the problem, but the PDF plot is at issue now. Most of the discussion page is devoted to our back and forth, so if people could read the discussion and put in their two cents, I think we can settle this issue. PAR 03:10, 25 Apr 2005 (UTC)

Entropy

Is this the right kind of entropy in this context? Maybe "free entropy" should be used instead? (Michael Hardy)

I think MarkSweep or CBurnett added this to the template, we should ask them what they had in mind. I have had a question about this too, because the entropy is defined in the template as:

$S=\int_{-\infty}^\infty f(x)\ln(f(x))\,dx$

and I don't see that definition explicitly in the information entropy article. By the way, what is free entropy? PAR 11:10, 21 May 2005 (UTC)

I think it was me who added that entry to the infobox template. The definition of the entropy functional I've been using is the following:

$\mathrm{\Eta}(f) = - \int_{-\infty}^{\infty} f(x)\,\ln(f(x))\,dx\!$

with the added convention that $0\,\ln(0) = 0$ . This is how entropy is defined in the information entropy article, except that that article uses discrete distributions in its introductory examples and doesn't explicitly mention this integral. The above integral is also the definition that has been used in all other infoboxes (when it exists and can be expressed compactly). I don't see a reason for using a different notion of entropy for the Wigner semicircle distribution~~, just because it arises primarily in physics~~. --MarkSweep 20:59, 21 May 2005 (UTC)
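As a sanity check on the integral definition above, the entropy of a normal distribution can be computed numerically and compared with the well-known closed form $\ln(\sigma\sqrt{2\pi e})$. The Python sketch below is only an illustration: the function names, the trapezoidal rule, and the integration range of 12 standard deviations are all choices made here, not anything prescribed by the template.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def entropy_integrand(fx):
    """-f ln f, with the convention 0 ln 0 = 0."""
    return 0.0 if fx == 0.0 else -fx * math.log(fx)


def differential_entropy(pdf, lo, hi, steps=20000):
    """Trapezoidal approximation of -int f(x) ln f(x) dx over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (entropy_integrand(pdf(lo)) + entropy_integrand(pdf(hi)))
    for i in range(1, steps):
        total += entropy_integrand(pdf(lo + i * h))
    return total * h


sigma = 2.0
numeric = differential_entropy(
    lambda x: normal_pdf(x, 0.0, sigma), -12 * sigma, 12 * sigma
)
closed_form = math.log(sigma * math.sqrt(2 * math.pi * math.e))
print(numeric, closed_form)
```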

Compound Poisson - help

I was playing around with a compound Poisson distribution, which is the sum of a number of identically distributed variables X_i. The number of elements in the sum is a Poisson-distributed variable. For every probability distribution, there will be a compound Poisson version. The one I was working on was one in which the X_i are zeta-distributed. My question is: what is the name of this distribution? "Compound Poisson/zeta" or what? In general, if XXX is the name of a distribution, what's the name of the compound Poisson distribution over XXX? Thanks - PAR 01:15, 7 Jun 2005 (UTC)
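To make the construction above concrete, here is a minimal Python simulation sketch: draw N from a Poisson distribution, then sum N independent draws of X_i. Everything here is illustrative: the function names are made up, the Poisson sampler is Knuth's multiplication method (adequate only for small rates), and a simple three-point distribution stands in for the X_i because sampling from a zeta distribution is not shown.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; suitable for small lam only."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_compound_poisson(lam, sample_x, rng):
    """Sum of N i.i.d. draws from sample_x, where N ~ Poisson(lam)."""
    n = sample_poisson(lam, rng)
    return sum(sample_x(rng) for _ in range(n))


rng = random.Random(0)
# Stand-in for the X_i: uniform on {1, 2, 3}, so E[X] = 2.
draws = [
    sample_compound_poisson(3.0, lambda r: r.choice([1, 2, 3]), rng)
    for _ in range(20000)
]
# The sample mean should be close to lam * E[X] = 6.
print(sum(draws) / len(draws))
```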

additional field? Exponential family

Many distributions are in the exponential family, and I think including the exponential family form would be a nice addition. It is true that even the exponential family page has only one distribution in the exponential family form, but I think this would be a worthwhile endeavor for Wikipedia. I'm not sure how the specification of the natural parameter should be handled: in another field, or in the exponential family form field. Pdbailey 17:02, 24 April 2006 (UTC)

Template needs repairing

One of the box items points to a non-existing page Template:Probability distribution/link. The template therefore needs repairing. I don't know what was intended here, so could someone who does know please make the necessary correction. DFH 10:40, 31 March 2007 (UTC)

I think what is happening here is that this construction splices the entry in the "type" field (i.e., "mass" or "density") into "Probability <blank> function", which is then made into a wikilink. When there is no type entry or one besides those two (e.g., Cantor distribution), the construction fails to parse to anything linkable. How to fix it I don't know, but I hope this helps someone who does. Baccyak4H (Yak!) 20:10, 21 August 2007 (UTC)

I've just noticed that the key to this is the #subpages section above (Bit late for DFH I suspect). It should be straightforward to make another subpage e.g. for singular distributions, if someone really wants to. --Qwfp (talk) 11:53, 10 February 2008 (UTC)

background color clashes with math png

Whenever one of the fields is done using LaTeX math markup and the result converted to PNG rather than to HTML, the background is white, which clashes with the table background color. Is it possible to change either the table background or the rendering math background to be consistent or at least less clashing?

I believe I tried to find a way to change the math background some time ago but never succeeded, so suspect the table would be where it is possible. Any thoughts or suggestions? Baccyak4H (Yak!) 18:45, 15 April 2008 (UTC)