Talk:Probability distribution


WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: B Class Top Priority  Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.


CDF vs. PDF

For a Continuous Random Variable you ARE giving the Cumulative Distribution Function, not the Probability Density Function... eh?


The cdf is defined for all random variables, discrete or continuous, so it is a better starting point than either the probability function or the density function. In one case you use differences to get the probability function and in the other you use the derivative. Most students are introduced to the derivative before the integral, so this approach is a bit more accessible -- DickBeldin


The probability that a continuous random variable X takes a value less than or equal to x is denoted Pr(X<=x). The probability density function of X, where X is a continuous random variable, is the function f such that

  • \Pr(a\le X\le b) = \int_a^b f(x)\,dx.

Correct, but F[b]-F[a] gives the probability of an interval directly without all the complications. We hide the complications in the cdf. It is inconvenient that we can't feature the explicit form of the cdf for many of the distributions we like to use, but it is important to build the concepts with proper spacing of the difficulties. One hurdle, then a straight stretch, then a curve, then another straight ... -- DickBeldin

You may present this material as you feel best. I don't disagree with your argument. But, mislabelling definitions is never okay. You have defined the probability density function for continuous random variables with the cumulative distribution function for the same. RoseParks

Surely you mean absolutely continuous. And defining the pdf from the cdf is the right way to do things. If you want to go to first principles, you need to specify a Borel measure on the real numbers, and the best way to do that is using a Lebesgue-Stieltjes measure. In probability theory, you call measures distributions and the Lebesgue-Stieltjes measure is called the cdf. -- Miguel
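The pdf/cdf relationship at issue in this thread can be checked numerically. Below is a minimal sketch (the exponential distribution is chosen only because its cdf has a simple closed form): integrating the density over [a, b] reproduces F(b) - F(a).

```python
import math

# Exponential distribution with rate lam:
#   pdf f(x) = lam * exp(-lam * x),  cdf F(x) = 1 - exp(-lam * x),  x >= 0
lam = 1.5
pdf = lambda x: lam * math.exp(-lam * x)
cdf = lambda x: 1.0 - math.exp(-lam * x)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 0.3, 2.0
print(integrate(pdf, a, b))   # probability of [a, b] via the density
print(cdf(b) - cdf(a))        # the same probability via F(b) - F(a)
```

Both prints agree to within the quadrature error, which is the point both editors accept: the cdf difference gives the interval probability directly, while the density requires an integral.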


Is this a joke?

Please read the first paragraph; this is a graduate-level description of probability distribution, yet probability distribution is a topic even high school students can be expected to lookup on Wikipedia. I'd fix it, but I am not smart enough. Perhaps you can fix this? —Preceding unsigned comment added by 128.12.146.118 (talk) 01:08, 29 November 2007 (UTC)

someone has already made substantial improvements since I posted above; thanks! —Preceding unsigned comment added by 128.12.146.118 (talk) 07:50, 29 November 2007 (UTC)

Restriction to real-valued variables

The definitions given on this page seem much too limited. A probability distribution can be defined for random variables whose domain is not even ordered (take the multinomial, for instance). In these cases, the cumulative distribution function makes no sense. To claim, as this page does, that the distribution must have the reals as the domain is nonsense.

Agreed, but it seems to be customary to use this restricted interpretation of the domain. It is possible to define the cdf for vector-valued random variables (including your example) but this is very clumsy. Vector-valued functions are usually treated as collections of correlated real variables. -- Miguel

The Boltzmann distribution

The so-called Boltzmann distribution is a strange beast to include in the list of discrete distributions, as are all the "special cases" listed under it. The reason is that the Boltzmann distribution is just a rule that, given a collection of states (not necessarily a set of real numbers) and their energies (not necessarily all distinct) gives a probability measure on the collection of states. It can be applied to discrete and continuous collections of states, and especially in the discrete case there is no reason why the states should be labelled by real numbers. Some of the special cases, for instance the Maxwell-Boltzmann distribution, are not even discrete! — Miguel 21:32, 2004 Apr 24 (UTC)

  • All true, but it is still an important distribution. The fact that it has a strong relationship to physics does not single it out. --Pdbailey 01:35, 1 Sep 2004 (UTC)
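The rule Miguel describes can be sketched for the discrete case in a few lines (the state energies below are made-up values; the states themselves need not be numbers): each state gets weight exp(-E/kT), normalized by the partition function.

```python
import math

def boltzmann(energies, kT):
    """Boltzmann probabilities: p_i proportional to exp(-E_i / kT),
    normalized by the partition function Z = sum of the weights."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

# Three arbitrary states with energies 0, 1, 2 (in units of kT)
probs = boltzmann([0.0, 1.0, 2.0], kT=1.0)
print(probs)        # lower-energy states are more probable
print(sum(probs))   # sums to 1, up to rounding
```

This is exactly the sense in which it is "a rule that gives a probability measure on the collection of states" rather than one fixed distribution on the reals.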

Rare events

I think describing the Poisson and associated "counting" distributions as concerning 'rare random events' is not quite right. For example, counting decays of Potassium-40 with a gamma spectrometer, one could count a hundred per second... I would edit it, but I can't come up with a better way of saying it. Can you?--Pdbailey 01:32, 1 Sep 2004 (UTC)

Nonetheless, they are rare in the sense intended. The reason they are Poisson-distributed is that there is only one decay out of each zillion or so opportunities. Michael Hardy 14:18, 2 Sep 2004 (UTC)
... and besides, if you were looking at some very long time -- say several seconds -- you'd probably want to model it as a normal distribution rather than as a Poisson distribution. Michael Hardy 14:19, 2 Sep 2004 (UTC)
Well, I understand what you are saying, but the point is to make it easier to understand. If I get 2 counts per second, then after 30 seconds, the normal distribution isn't going to do it. But the events are not rare. Me washing my car is rare, something that happens at 2 Hz is not rare. Pdbailey 05:17, 4 Sep 2004 (UTC)
The original meaning of rare in this context was that the probability of two events occurring simultaneously is zero. You may disagree with that use of the word, but we're stuck with it for historical reasons. I have heard that one of its first uses in the 19th century was to model the occurrence of army officers being kicked by horses.
Also, it becomes clear in what sense the Poisson distribution is rare if you look at its derivation as a limit of the Binomial.
Finally, whether two decay events per second are rare depends on the timescales involved. For you a second seems like a very short time, but when talking about subatomic physics a second is an eternity. Similarly, you think washing a car is a rare event because you measure the frequency per day, or per week. If you do it per year it ceases to be "rare" by your definition.
There are three time scales involved here: the time resolution of the experiment; the average time interval between events; and the unit of time used to express frequencies. The unit is irrelevant. If the resolution is much smaller than the interval, you use Poisson. If it is much larger, you use normal (as an approximation: Poisson is still exact). — Miguel 16:25, 4 Sep 2004 (UTC)
Your arguments (referred to by paragraph) do not hold water. (1) The definition of rare is well given by the wiktionary as "very uncommon; scarce", and how somebody misused it centuries ago while describing this particular distribution does not change the fact that this is misleading. (2) it becomes clear not that it is rare but that each iota of the Poisson-distributed events is unlikely, but not rare. Phone calls arriving at a help center will often be frequent (think many per second) and are Poisson distributed. The chance that any given person called is very low. (3) we can dismiss this out of hand with the previous example.
The Poisson distribution is the limit of the Binomial distribution when the probability of success goes to zero (hence rare events) but the average number of successes per unit time is kept constant when taking the limit. Hence, rare events. If you don't get it, you don't get it.
There is such a thing as historical accidents, conventions and tradition in the way science, technology and all of human knowledge is organized. You have to live with that, and wikipedia is not the place to revolutionize notation or terminology. If you don't get it, you don't get it.
Miguel 03:02, 7 Sep 2004 (UTC)
Miguel, please explain to me how a call center that receives 20 calls per second is observing rare events? Read the definition, "scarce." 20 calls per second is hardly a drought. What I have changed it to, "which describes a very large number of individually unlikely events" is more accurate (captures the derivation from binomial distribution). If you have another wording that is more accurate, please, propose it. Pdbailey 04:43, 7 Sep 2004 (UTC)
I told you the unit in which you measure time is irrelevant. You are talking about .05 calls per millisecond.
Now seriously, your change is inaccurate because the Poisson distribution can describe a very small number of individually unlikely events, too. The name "distribution of rare events" is something we're stuck with for historical reasons, and it is a synonym for "Poisson distribution". Try this:
The Poisson family of distributions describes rare independent events and is parameterized by the average number of events occurring. Note that the average number of events can be large or small depending on the situation, and that it is the individual events that are "rare".
Here's the problem: the Poisson distribution is as ubiquitous as the normal distribution and has many applications. The intuitive explanation of why the Poisson distribution applies in one particular situation may be "misleading" in another situation. The list of probability distributions is not the place to discuss those nuances, that's what the article Poisson distribution is for. — Miguel
Miguel, please answer this set of questions. Rare is a relative word; if you want to be clear, almost any other word would be better. Please explain why you want to use it. You keep saying that we are stuck with it for historical reasons. What is your argument? Why do we have to use it based on 'historical reasons'? What are the historical reasons?
We are stuck with it for reasons of tradition. That is the name it was given in old texts, and such things propagate as people copy each other. — Miguel 17:17, 2004 Sep 12 (UTC)
I do not like your definition because it is overly wordy, "note...can be...depending...and that it is..."
My definition is very tight and accurate, let me argue for it.
which describes a very large number of individually unlikely events that happen in a certain time interval.
This is inaccurate: it can describe a very small number of events, too. Can't you see that the number of events (per unit time) can be any positive number, large or small? Can't you see that simply changing the unit of time can make this average as large or small as you please? — Miguel 17:17, 2004 Sep 12 (UTC)
Poisson distributed events must be a large number (look at the proof on the page for the Poisson distribution) each of which must be unlikely (look at the proof on the page). The time interval bit differentiates it from the Erlang distribution. Again, this definition is short, accurate, and even hints at the proof.
Finally, this page (discussion) is for the discussion of entries on this page. So long as the sentence is on this page, discussion about it belongs here.--Pdbailey 00:10, 11 Sep 2004 (UTC)
Do as you please, I don't care any more. — Miguel 17:17, 2004 Sep 12 (UTC)
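The limit both editors invoke — Binomial(n, p) with p → 0 while np is held fixed — can be seen numerically. A small sketch (the values of λ and n are arbitrary) compares the two pmfs as n grows:

```python
import math

def binom_pmf(k, n, p):
    # Binomial(n, p): probability of exactly k successes in n trials
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0
diffs = {}
for n in (10, 100, 10_000):
    p = lam / n  # each trial gets rarer as n grows; the mean n*p stays fixed
    diffs[n] = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
                   for k in range(10))
    print(n, diffs[n])  # the largest pmf gap shrinks as n grows
```

This makes both points in the thread concrete: each individual trial is "rare" (p = λ/n → 0), yet the expected count λ can be any positive number.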

other distributions

Are the Rayleigh distribution and Rician distribution important enough to be included in the List? -FZ 15:13, 6 Jan 2005 (UTC)

What about the Nakagami-m distribution? -Mangler 12:05, 26 July 2005 (UTC)

Zipf and Zipf-Mandelbrot

Zipf's law is for a finite N = number of elements or outcomes, for example, the number of words in the English language. When N becomes infinite, Zipf's law becomes the zeta distribution. Zipf-Mandelbrot law is a generalization of Zipf. When N becomes infinite, Zipf-Mandelbrot becomes something, I don't know what at this point, but whatever it is, it involves the Lerch transcendent function just as the zeta distribution involves the Riemann zeta function. PAR 08:04, 12 Apr 2005 (UTC)

You're right. Sorry about moving it back – I had looked at the support field in the infobox for Zipf's law and decided to recategorize the article here. The right thing to do for me would have been to fix the support field, which suggested that k ranges over the full set of natural numbers. I'm gonna work on that now. --MarkSweep 13:58, 12 Apr 2005 (UTC)

Ok, I'll go ahead with some plots for Zipf and zeta. PAR 15:06, 12 Apr 2005 (UTC)


The Economist

This page was featured in The Economist at Psychology - Bayes Rules.

From: Schaefer, Tom PAX Tecolote
Sent: Monday, May 08, 2006 9:24 AM
To: 'robert.dragoset@nist.gov'
Cc: McDowell, Jeff HSV Tecolote
Subject: Uncertainty in Physical Values

Hi PhD Dragoset,

   I hope your presentation on units to OASIS went well, and the issue is being given the priority and urgency it deserves.
   The input values required for the execution of our cost estimating models are often uncertain, and rather than a discrete value, can be more accurately described as a range or distribution of values.  I would like you to consider helping to develop an XML standard for representing the uncertainty of a numeric value in a standard way that “Monte Carlo” and other analysis tools could interpret universally.  The intent would be to develop an XML element that could be used almost anywhere a quantitative attribute is currently used (the current discrete or point value being a subset of the element).  The parameters for common distribution types, such as uniform, triangular, Gaussian, beta, Poisson, Weibull, etc. ( http://en.wikipedia.org/wiki/Probability_distribution ) would be supported, as well as a way to represent a data set of values to sample from.
   Do you have any interest in this topic?  It could have profound importance to the meaningfulness of data exchanges and quality of analysis.

Tom Schaefer Senior Technical Expert Tecolote Research, Inc.

Diagnostic Tool

I think there should be a section on randomness as a diagnostic tool in certain mathematical applications, such as regression etc. Just a thought... --Gogosean 20:58, 15 November 2006 (UTC)


Plots

The plots in general are quite illustrative and pretty. However, for the discrete distributions (Poisson, etc.) wouldn't it make more sense to have bar-like plots rather than connected points? The segments between the points have no meaning as far as the distribution is concerned; only the frequency value does. I understand it's tricky to superpose bar graphs, but there are ways around that, like outlining the bars or making separate plots. Also, the order of the plots seems arbitrary. Why is the relatively obscure Skellam distribution near the top and the Gaussian at the very bottom? Shouldn't the beta and the gamma be closer? Shouldn't the t and F distributions be included? Or the binomial and negative-binomial? -- Eliezg 01:49, 9 December 2006 (UTC)

Links to source code & a Statistical Distribution Explorer from all Wikipedia Distribution entries.

I trust it will be in order to add the following link

Math Toolkit Boost (candidate)

to the 'other links' for ALL statistical distributions.

This links to information about open-source C++ code which is soon to be reviewed for inclusion in the Boost C++ library. (Many of the Boost Libraries go on to become ISO Standard Libraries.) It will soon also include a Windows Distribution Explorer using this C++ code that will allow evaluation of nearly all the properties (mean, variance...) of most distributions and their PDF, CDF, complements and quantiles: a direct link to this should also be useful.

--Paul A Bristow 14:52, 15 January 2007 (UTC)


Lattice Distribution

The random variable takes value a+nb where n is an integer and b > 0. This is a discrete distribution with infinite support. Jackzhp 14:48, 18 February 2007 (UTC)

symmetric vs. asymmetric distribution

Can someone please classify all the distributions into symmetric and asymmetric distributions? Jackzhp 15:05, 24 February 2007 (UTC)

Merge from discrete probability distribution!

Please merge in any text that was missed.

In probability theory, a probability distribution is called discrete if it is characterized by a probability mass function. Thus, the distribution of a random variable X is discrete, and X is then called a discrete random variable, if

\sum_u \Pr(X=u) = 1\qquad\qquad\qquad(1)

as u runs through the set of all possible values of X.

If a random variable is discrete, then the set of all values that it can assume with non-zero probability is finite or countably infinite, because the sum of uncountably many positive real numbers (which is the smallest upper bound of the set of all finite partial sums) always diverges to infinity.
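The countable sum in equation (1) can be watched converging for a distribution with infinite support. A small sketch using the geometric distribution on k = 1, 2, ... (chosen because its pmf is a one-liner):

```python
# Geometric distribution on k = 1, 2, ...: Pr(X = k) = (1 - p)**(k - 1) * p
p = 0.3
pmf = lambda k: (1 - p) ** (k - 1) * p

# Sum the pmf over the first 199 support points
partial = sum(pmf(k) for k in range(1, 200))
print(partial)  # partial sums approach 1, as equation (1) requires
```

The support is countably infinite, yet the total probability is still 1 — exactly the situation the paragraph above describes.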

Typically, this set of possible values is a topologically discrete set in the sense that all its points are isolated points. But, there are discrete random variables for which this countable set is dense on the real line.

The Poisson distribution, the Bernoulli distribution, the binomial distribution, the geometric distribution, and the negative binomial distribution are among the most well-known discrete probability distributions.

Alternative description

Equivalently to the above, a discrete random variable can be defined as a random variable whose cumulative distribution function (cdf) increases only by jump discontinuities — that is, its cdf increases only where it "jumps" to a higher value, and is constant between those jumps. The points where jumps occur are precisely the values which the random variable may take. The number of such jumps may be finite or countably infinite. The set of locations of such jumps need not be topologically discrete; for example, the cdf might jump at each rational number.

Representation in terms of indicator functions

For a discrete random variable X, let u_0, u_1, ... be the values it can assume with non-zero probability. Denote

\Omega_i=\{\omega: X(\omega)=u_i\},\, i=0, 1, 2, \dots

These are disjoint sets, and by formula (1)

\Pr\left(\bigcup_i \Omega_i\right)=\sum_i \Pr(\Omega_i)=\sum_i\Pr(X=u_i)=1.

It follows that the probability that X assumes any value except for u_0, u_1, ... is zero, and thus one can write X as

X=\sum_i \alpha_i 1_{\Omega_i}

except on a set of probability zero, where \alpha_i=u_i and 1_A is the indicator function of A. This may serve as an alternative definition of discrete random variables.
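The representation can be checked on a small finite example (a six-point sample space, chosen arbitrarily): rebuilding X as the sum of the values u_i times the indicators of the level sets Ω_i recovers X exactly.

```python
# Finite sample space: the six faces of a die.  X maps each outcome to a value.
omega = [1, 2, 3, 4, 5, 6]
X = {w: w % 3 for w in omega}            # X takes values u in {0, 1, 2}

values = sorted(set(X.values()))          # u_0, u_1, ...
Omega = {u: [w for w in omega if X[w] == u] for u in values}

# Rebuild X from the indicators: X(w) = sum over i of u_i * 1_{Omega_i}(w)
def X_rebuilt(w):
    return sum(u * (w in Omega[u]) for u in values)

print(all(X_rebuilt(w) == X[w] for w in omega))
```

Since the Ω_i are disjoint, exactly one indicator fires for each outcome, so the sum reproduces the original value.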

Merge from continuous...

Ditto!

In probability theory, a probability distribution is called continuous if its cumulative distribution function is continuous. That is equivalent to saying that for random variables X with the distribution in question, Pr[X = a] = 0 for all real numbers a, i.e.: the probability that X attains the value a is zero, for any number a.

While for a discrete probability distribution one could say that an event with probability zero is impossible, this cannot be said in the case of a continuous random variable, because then no value would be possible. This paradox is resolved by realizing that the probability that X attains some value within an uncountable set (for example an interval) cannot be found by adding the probabilities for individual values.

Under an alternative and stronger definition, the term "continuous probability distribution" is reserved for distributions that have probability density functions. These are most precisely called absolutely continuous random variables (see Radon–Nikodym theorem). For a random variable X, being absolutely continuous is equivalent to saying that the probability that X attains a value in any given subset S of its range with Lebesgue measure zero is equal to zero. This does not follow from the condition Pr[X = a] = 0 for all real numbers a, since there are uncountable sets with Lebesgue measure zero (e.g. the Cantor set).

A random variable with the Cantor distribution is continuous according to the first convention, but according to the second, it is not (absolutely) continuous. Also, it is neither discrete nor a weighted average of discrete and absolutely continuous random variables.
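The Cantor cdf mentioned above can be sketched numerically from the ternary expansion (a rough approximation, truncated at a fixed depth): the function climbs from 0 to 1 yet is constant on every removed middle-third interval, so it is continuous without having a density.

```python
def cantor_cdf(x, depth=40):
    """Approximate the Cantor function F(x) on [0, 1] via ternary digits."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    result, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)
        x -= digit
        if digit == 1:            # inside a removed middle third: F is flat here
            return result + scale
        result += scale * (digit // 2)
        scale /= 2
    return result

print(cantor_cdf(0.5))   # the whole interval (1/3, 2/3) maps to 1/2
print(cantor_cdf(0.2))   # (1/9, 2/9) maps to 1/4
```

All of the probability mass sits on the Cantor set, which has Lebesgue measure zero — which is exactly why this distribution fails the "absolutely continuous" definition while passing the weaker one.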

In practical applications, random variables are often either discrete or absolutely continuous, although mixtures of the two also arise naturally.

The normal distribution, continuous uniform distribution, Beta distribution, and Gamma distribution are well known absolutely continuous distributions. The normal distribution, also called the Gaussian or the bell curve, is ubiquitous in nature and statistics due to the central limit theorem: every variable that can be modelled as a sum of many small independent variables is approximately normal. —Preceding unsigned comment added by MisterSheik (talkcontribs)
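The central-limit statement in the last sentence can be illustrated by simulation (a sketch; the sample sizes and seed are arbitrary): sums of n independent Uniform(0, 1) draws concentrate around mean n/2 with standard deviation sqrt(n/12), and roughly 68% of the mass falls within one standard deviation, as a normal approximation predicts.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible
n, trials = 50, 20_000

# Each observation is a sum of n independent Uniform(0, 1) variables
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

print(statistics.mean(sums))    # close to n/2 = 25
print(statistics.stdev(sums))   # close to sqrt(n/12), about 2.04

# Fraction of observations within one standard deviation of the mean
within = sum(abs(s - n / 2) <= math.sqrt(n / 12) for s in sums) / trials
print(within)                   # close to 0.68 for a normal distribution
```

The individual summands are far from normal (they are flat on [0, 1]), which is what makes the convergence of their sum the interesting part of the theorem.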

The initial sentence is terrible!!

Here it is:

In probability theory, a probability distribution is a function of the probabilities of a mutually exclusive set of events.

That is idiotic nonsense. I had no idea this article was in such profoundly bad shape. I'm going to have to think about how to rephrase this. Michael Hardy 17:24, 7 May 2007 (UTC)

And now another introductory sentence

Now it says:

In probability theory, every random variable is a function defined on a state space equipped with a probability distribution that assigns a probability to every subset (more precisely every measurable subset) of its state space in such a way that the probability axioms are satisfied.

That makes sense, except as an introductory sentence. I'll think about what would be good in that role. Michael Hardy 22:17, 11 August 2007 (UTC)

When you say state space ("... of its state space in such a way ..."), do you mean sample space? A state space is what you are defining. From the first paragraph of the random variable article, "Formally, a random variable is a measurable function from a sample space to the measurable space of possible values of the variable". Also, given that this article and the random variable article will very likely be referenced together, it might be nice for the two definitions to be more obviously equivalent. —Preceding unsigned comment added by 74.211.70.98 (talk) 07:57, 8 September 2007 (UTC)

Also, although outside the scope of this article, state space is not very well defined. —Preceding unsigned comment added by 74.211.70.98 (talk) 08:22, 8 September 2007 (UTC)

Accessibility

"Probability distribution" is a term that many non-mathematicians encounter while reading about the application of statistics to non-mathematical subjects. However, the introductory paragraph of the article is completely incomprehensible to anyone not trained to a fairly high level in statistics / probability theory. Would it be possible to summarize the concept in less specialized language as well as giving the formal definition? This is a question and not a complaint.Spiridens (talk) 15:58, 18 November 2007 (UTC)

In view of the many comments regarding the technical nature of the intro, and of the high importance of this article, I took a stab at making it general and comprehensible. I'm not sure how well I succeeded, but there it is. Best, Eliezg (talk) 02:44, 29 November 2007 (UTC)

[edit] "Greater than" vs. "different": a discourse on dartboards

An IP changed the following sentence in the introduction:

"The probability of landing within the small area of the bullseye would (hopefully) be greater than landing on an equivalent area elsewhere on the board."

to:

"The probability of landing within the small area of the bullseye could be different than landing on an equivalent area elsewhere on the board."

I reverted the change, which was clearly made in good faith and is probably technically more accurate. The original wording is meant to indirectly justify the presumably unimodal distribution of dart landings. Even a very poor dart player would have a slightly higher chance of landing IN THE BULLSEYE than within a bullseye-sized area far away, while an excellent dart thrower (aiming for the bullseye) would have a smaller variance on the landing distribution. The parenthetical "hopefully" is unattractive, and perhaps a total rewording or a different example would be preferable, but the idea is that the reader would (hopefully!) have an intuitive feel for the two-dimensional continuous distribution of dart landings. The great thing about dartboards, after all, is that they record all of the aggregated data, and it is invariably clustered around the bullseye (while featuring prominent outliers near the ceiling and the floor and, perhaps, the bartender's bottom). Best, Eliezg (talk) 04:57, 25 December 2007 (UTC)

Split

Should the List of important probability distributions be split off into its own article? It seems like a good idea, but I figured I would post here before doing it myself. Silly rabbit (talk) 14:13, 12 March 2008 (UTC)

There is a lot of overlap with List of probability distributions. One or the other needs tidying. Perhaps only a few very important ones should be listed in this page, leaving the job of a comprehensive list to the list page? Tayste (talk - contrib) 18:56, 12 March 2008 (UTC)
Oooo. There already is a list. I think the answer is to get rid of the list in this article (moving content to List of probability distributions as needed), and then to try to tie together the section here with prose. Silly rabbit (talk) 21:09, 12 March 2008 (UTC)
I am against removing the list. But it's o.k. to reduce its size here. The existence of the list in this page helped me realize that there are many distributions, and that they are divided into groups. I think that if we are going to split at all, then this article needs to have many references to the List of distributions, and highlight the existence of that page, or else people will not be aware of this important and useful list. Sandman2007 (talk) 17:00, 3 April 2008 (UTC)