Talk:Cumulative distribution function


WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: Start-Class, High-Priority. Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.


Distribution function

This page had a redirect from distribution function, which I've now made into its own article describing a related but distinct concept in physics. I'll try to modify the pages pointing here through that redirect so that the net change in Wikipedia is minimal. SMesser 16:12, 24 Feb 2005 (UTC)

I added a reference here on the "distribution" page so that "distribution function" appears separately for statistics and for physics. Melcombe (talk) 16:35, 21 February 2008 (UTC)

Cumulative density function

I originally created the redirect cumulative density function in March to point to this article. Why? A simple Google test for cumulative density function shows 41,000 hits, while cumulative distribution function shows 327,000 hits. Michael Hardy's contention is that "cumulative density" is patent nonsense (see deletion log) and a redirect shouldn't exist.

Regardless of the correctness of "cumulative density", there still is a significant usage of it in reference to this article and its content. "Cumulative density function" is even used in a doctoral thesis. Hardly patent nonsense.

Even if "cumulative density function" is incorrect, someone still may look for it, find nothing, and create an article paralleling this article. If you don't buy the "it's not patent nonsense, or even just nonsense" then I invoke (from WP:R#When should we delete a redirect?) that it increases accidental linking and therefore should not be deleted.

Michael, if you have a problem with the correctness of "cumulative density" then by all means add a section here or change the redirect to an article and explain it there. Either way, cumulative density function needs to be a valid link. Cburnett 14:42, 14 December 2005 (UTC)

I just saw this debate now. I've changed the redirect page into a navigation page explaining the severe confusion. Michael Hardy 21:59, 20 July 2007 (UTC)

Consistency

Please be consistent! In probability theory the integral of the "probability density function" (PDF) is called the "cumulative density function" (CDF) or simply the "distribution function". Thus the adjective "cumulative".

See http://mathworld.wolfram.com/DistributionFunction.html

The term "Cumulative distribution function" is nonsense because it implies the integral of the integral of the PDF. Utterly nonsense! Please correct this link! User:lese 4 Nov 2007.

"Cumulative distribution function" appears in Everitt's Dictionary of Statistics while "cumulative density function" does not. Similarly in the Unwin Dictionary of Mathematics. Melcombe (talk) 16:42, 21 February 2008 (UTC)

How is this a debate?

The word "cumulative distribution function" is used in many elementary books. It is a pretty stupid term, but we are stuck with it. The best we can do is acknowledge that the term is out there, that is should simply be "distribution function" and that it's definition MUST be with <= or else many tables, software routines, etc will be incorrectly used. —Preceding unsigned comment added by Jmsteele (talkcontribs)

I don't think it's a stupid term and I have no problem with it. On the other hand, "cumulative density function" is a horribly stupid term. Michael Hardy 21:59, 20 July 2007 (UTC)

Doesn't make sense

"Note that in the definition above, the "less or equal" sign, '≤' could be replaced with "strictly less" '<'. This would yield a different function, but either of the two functions can be readily derived from the other. The only thing to remember is to stick to either definition as mixing them will lead to incorrect results. In English-speaking countries the convention that uses the weak inequality (≤) rather than the strict inequality (<) is nearly always used."

Surely it doesn't matter at all! Since the probability of any single value is 0, the two interval boundaries can be included or excluded.

If you're only interested in integrals. Shinobu 22:50, 7 June 2006 (UTC)
The convention in the entire world is to use '≤' and it matters HUGELY for the binomial, Poisson, negative binomial, etc. To use anything else and to rely upon the formulas in any text would lead to substantial errors, say when one is using a table of the binomial distribution. Jmsteele 01:18, 21 October 2006 (UTC)
I'm not sure about that. The definition: F(x) = P(X <= x).
Because P(X <= x) = P(X < x) + P(X = x), we have F(x) = P(X < x) + P(X = x).
Now for normal functions (the kind of functions you mention) P(X = x) = 0.
Of course, there are things like delta functions, but that's not what you're talking about. Shinobu 16:27, 27 October 2006 (UTC)

Please consider some very important distributions: the binomial, Poisson, hypergeometric. You simply MUST use the definition F(x) = P(X <= x) or else all software packages and all tables will be misunderstood. P.S. I am a professor of statistics, so give me some slack here. This is not a matter of delta functions; it is a matter of sums of coin flips ... very basic stuff.
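
A concrete illustration of why the choice matters for discrete distributions (a made-up example, not taken from any of the comments above): take X ~ Binomial(2, 1/2), so P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4. Then P(X <= 1) = 3/4 while P(X < 1) = 1/4, so a table built under one convention and read under the other gives the wrong value. For a continuous distribution the two definitions do agree, since P(X = x) = 0 for every x.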

F(x) vs Phi(x)

I completely disagree with "It is conventional to use a capital F for a cumulative distribution function, in contrast to the lower-case f used for probability density functions and probability mass functions." From all the literature I have read, Φ(x) is the cumulative distribution function and φ(x) is used for probability density/mass functions. Where's the reference to make such a bold claim that F and f are convention? See the probit article, which uses Φ⁻¹(x) for the inverse of the cdf. -- Thoreaulylazy 19:13, 3 October 2006 (UTC)

There is no such convention - you can pick any symbol you like, of course. It is common practice to use the capital for the cdf, because it's the primitive of the df. I've seen phi in quantum mechanical books, but I've also seen f and rho. Shinobu 22:58, 3 October 2006 (UTC)
From all the literature I have read, the pair F and f was the convention. I don't mean to say that Φ and φ are wrong, but how can you be so sure as to declare something else a bold claim? Many different fields have different notational conventions, and we just have to accept it. Musiphil 07:03, 3 December 2006 (UTC)

This collapses a distinction. One uses Phi for the normal distribution and phi for the normal density; these are reserved symbols for these purposes --- see any statistics book. One uses F and f for generic distributions and densities, but these are not reserved: in many books and papers one will find G and g, H and h, etc., each time with the capital representing the distribution and the lower case the density.

Programming algorithm

I've been looking for a better algorithm to generate a random value based on an arbitrary CDF (better than the one I wrote). For example, if one would like to obtain a random value with a "flat" distribution, one can use the 'rand()' function from C's stdlib.h. However, I wrote this function to use an arbitrary function to generate the random value:

#include <stdlib.h>   /* rand(), RAND_MAX */

/* xmin and xmax are the range of outputs you want;
   ymin and ymax are the actual limits of the function you want;
   function is a function pointer that points to the CDF. */
long double randfunc(double xmin, double xmax,
                     long double (*function)(long double),
                     double ymin, double ymax)
{
    while (1) {
        /* Draw a candidate x uniformly from [xmin, xmax). */
        long double x = (xmax - xmin) * (rand() / ((long double)RAND_MAX + 1)) + xmin;
        /* Draw a threshold y uniformly from [ymin, ymax). */
        long double y = (ymax - ymin) * (rand() / ((long double)RAND_MAX + 1)) + ymin;
        /* Accept the candidate only if it falls below the supplied curve. */
        if (y < function(x))
            return x;
    }
}

I was trying to find a way to do it faster/better. If anyone knows of anything, let me know. Fresheneesz 07:53, 27 December 2006 (UTC)

Wikipedia really isn't the place to ask these sorts of questions. The talk pages are more for discussion of the articles themselves. Anyway, I will tell you that Donald Knuth's textbook Numerical Recipes in C has a good dissertation on random number generation and also includes algorithms. I would further advise that you read the text, not just implement the algorithms listed there; it's quite good! User A1 13:48, 12 March 2007 (UTC)
I think it'd be nice to have something on algorithms on this page. I have actually found a better answer. It involves either integrating the CDF and using the definite integral instead of an indefinite integral, or, if no definite integral is possible, pre-integrating the function and using the precomputed values stored in memory. Fresheneesz 02:39, 13 March 2007 (UTC)
This is a well-understood problem that has been solved many times over. However, to really appreciate the solutions, I recommend that you pick up a graduate-level textbook on random variables (for example, the book by Papoulis and Pillai). I think you'll find that given an arbitrary CDF and a random variable that is uniformly distributed from 0 to 1, the inverse of the CDF will transform the uniformly distributed random variable into a random variable with that CDF. That is, if your desired CDF is F, the function F⁻¹ will transform a random variable distributed uniformly between 0 and 1 into a random variable distributed according to F. This can be used to motivate such algorithms. HOWEVER, I think you'll find that if you know more about the particular random variable you are generating, there are much more efficient ways to generate that random variable from a uniform random variable. Again, an understanding of the underlying probability will greatly simplify the generation of such algorithms. (Students of probability are often asked to generate such algorithms as homework problems in, for example, MATLAB.) --TedPavlic 21:19, 8 April 2007 (UTC)
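
A minimal sketch of the inverse-CDF idea described above, using the exponential distribution (rate 1) purely as an illustration because its inverse CDF has a closed form; the names uniform01, exp_inverse_cdf and rand_exponential are made up for this example:

#include <stdlib.h>
#include <math.h>

/* Uniform draw in [0, 1). */
static double uniform01(void)
{
    return rand() / ((double)RAND_MAX + 1);
}

/* Inverse CDF of the exponential distribution with rate 1:
   F(x) = 1 - exp(-x), so the inverse is F^-1(u) = -log(1 - u). */
static double exp_inverse_cdf(double u)
{
    return -log(1.0 - u);
}

/* Inverse transform sampling: feed a uniform variate through the inverse CDF. */
double rand_exponential(void)
{
    return exp_inverse_cdf(uniform01());
}

Unlike the acceptance loop above, this uses exactly one uniform draw per sample, but it only helps when the inverse CDF is available in closed form or can be evaluated numerically (e.g. by tabulating and interpolating the CDF, as suggested above).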

cdf vs pdf

Hello,

I removed the comment that probability distribution function is the same as CDF, which I assert to be wrong. My reference is "Probability and Statistics for Engineering and the Sciences", p. 140 (J. Devore). The PDF is the same as the probability density function, not the CDF. The CDF is the integral of the PDF, not the PDF itself.

Please comment. 129.78.208.4 05:28, 12 March 2007 (UTC)

The "probability distribution function" is the same as the "cumulative distribution function" (CDF) and the "distribution function". The "probability density function" (PDF) is the derivative of the probability distribution function. See e.g. [1]. --X-Bert 22:22, 7 April 2007 (UTC)
For more information, you should look into measure theory, which is the basis for probability. Originally, the word probability was prepended to measure theoretic concepts to imply a special structure of the measure being used. However, because probability is now used by many who do not have the mathematical sophistication for measure theory, lots of other terms have been introduced (often accidentally) to hide the roots of probability. Thus, the language is now quite sloppy. Reviewing the measure theoretic roots of probability clears up any confusion about why the terms that are used in probability have the names that they do. --TedPavlic 21:10, 8 April 2007 (UTC)
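
For reference, the relationship under discussion, stated for a continuous random variable (a standard textbook identity, not a quotation from either of the sources cited above):

    F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \qquad f(x) = \frac{dF(x)}{dx}

The disagreement in this thread is only about which of these two objects the phrase "probability distribution function" should name.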

Properties Notation

The formula after 'If X is a discrete random variable, then it attains values x1, x2, ... with probability pi = P(xi),' has the final sum over p(xi); why would this not be the sum over pi, since we have already introduced pi? Also, I think pi = P(xi) should be introduced as pi = P(X = xi) for clarity. Chrislawrence5 17:31, 16 April 2007 (UTC)
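
For concreteness, the discrete-case formula with the notation suggested above would read (a rendering based on the comment above, not a quotation from the article):

    F(x) = P(X \le x) = \sum_{x_i \le x} p_i, \quad \text{where } p_i = P(X = x_i)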