Talk:Pareto distribution

For those visual thinkers among us, can we have an example graph of this?

A very dull graph: starting at xmin, the density falls as x increases and the cumulative distribution rises, each with a slope which becomes shallower for large x.
I have removed the statement "If the value of k is chosen judiciously then the Pareto distribution obeys the '80-20 rule'", since it depends on a right truncation which this distribution doesn't have; allowing such truncation judiciously would mean most distributions met the "80-20" rule. --Henrygb 00:13, 6 Aug 2004 (UTC)
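
For anyone who wants numbers behind the shape described above, here is a minimal Python sketch. It assumes the unshifted form with density k * xmin^k / x^(k+1) for x >= xmin; the function names and the parameter values k = 2, xmin = 1 are illustrative choices, not taken from the article.

```python
def pareto_pdf(x, k, xmin):
    """Assumed density: k * xmin**k / x**(k + 1) for x >= xmin, else 0."""
    if x < xmin:
        return 0.0
    return k * xmin**k / x**(k + 1)


def pareto_cdf(x, k, xmin):
    """Assumed CDF: 1 - (xmin / x)**k for x >= xmin, else 0."""
    if x < xmin:
        return 0.0
    return 1.0 - (xmin / x)**k


# Illustrative parameters (assumptions, not from the article).
k, xmin = 2.0, 1.0
for x in (1.0, 1.5, 2.0, 3.0, 5.0, 10.0):
    # The density falls and the CDF rises, both flattening as x grows.
    print(f"x={x:5.1f}  pdf={pareto_pdf(x, k, xmin):.4f}  cdf={pareto_cdf(x, k, xmin):.4f}")
```

Plotting these two columns against x gives the "very dull graph": a density that starts at its maximum at xmin and decays, and a CDF that climbs from 0 toward 1.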

Technical cleanup tag

It was commented to me that articles like this are not and should not be aimed at non-technical readers. WikiProject Science and other communal efforts I've seen generally have the goal of making the first part of the article accessible to the general public, but allowing for later parts which may be intelligible only to technical readers. That's certainly possible to do in this case.

I'm the one who commented, and I didn't say "articles like this"; I said articles on probability distributions, and I didn't just say "non-technical readers"; I said readers not familiar with the mathematical theory of probability. Michael Hardy 02:47, 6 Mar 2005 (UTC)

Because Pareto distributions are used in economics and sociology with regard to political issues of public interest, it's entirely likely that non-technical readers will arrive at this article needing to know what this thing is. Not necessarily in precise detail, but in vague outline, at least.

This article isn't very accessible even to many technical readers. I have a degree from MIT, and I've taken math up through differential equations.

But the relevant question is whether you've studied probability theory. Course 18.440 at MIT, titled Probability and Random Variables, does not require "up through differential equations", but only first-year calculus, which most MIT students have before entering as freshmen (and "first-year" is construed differently at MIT in this case). Anyone who's studied continuous probability distributions (and not just at MIT!) can understand this article. But yes, probably some things could be said for "lay" readers initially. Perhaps because of this distribution's occurrence in social sciences, that would make more of a difference in this case than with most probability distributions. But generally, articles on mathematics shouldn't need to be comprehensible to everyone who's studied only high-school math. Michael Hardy 02:47, 6 Mar 2005 (UTC)

I could make a graph (either mentally, digitally, or on paper) that plots a typical Pareto distribution, but that would be a lot of work that I shouldn't really have to do. I'm sure there are many scientists and computer engineers who would benefit from a better introduction.

... actually, if I were trying to write an introduction for non-mathematically inclined social scientists, a graph wouldn't be the first thing I would attend to. Maybe I'll work on this at some point .... Michael Hardy 02:47, 6 Mar 2005 (UTC)

Fortunately, I think all this article needs to be much more widely accessible is a graph or two of typical Pareto distributions, with labels and a brief explanation. -- Beland 02:19, 6 Mar 2005 (UTC)

I did study probability theory, back in the day, and found the article a bit terse. What I hoped to see at the start was a few extra paragraphs:

1. A brief general introductory paragraph or two pitched at people with only a craps or Texas hold 'em knowledge of probability: why it matters, the elevator-speech statement of what it means, etc.

2. Move the short section on things claimed to match a Pareto up from the bottom of the article, with perhaps a few hard numbers added to it. (The usual: for k=1, x% will be <=3, with similar figures for k=2 or 3.) This is still fluffy, but gives a numerical feel to that graph and the fluffy stuff in the first paragraph.

Pretty much the same content, but with the take-home goodies near the top. --ScottEllsworth 08:00, 20 Mar 2005 (UTC)
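
The "hard numbers" suggested in item 2 could be produced along these lines. This is only a sketch: it assumes the unshifted CDF 1 - (xmin/x)^k and takes xmin = 1, which the comment above does not specify; `fraction_at_most` is a made-up helper name.

```python
def fraction_at_most(x, k, xmin=1.0):
    """Fraction of values <= x under the assumed CDF 1 - (xmin / x)**k."""
    if x < xmin:
        return 0.0
    return 1.0 - (xmin / x) ** k


for k in (1, 2, 3):
    pct = 100 * fraction_at_most(3.0, k)
    print(f"k={k}: {pct:.1f}% of values are <= 3")
# prints 66.7%, 88.9% and 96.3% for k = 1, 2, 3 respectively
```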

Pareto density at xmin

Don't we need to come up with a value at the transition point x=xmin? I'm in favor of p(xmin)=(1/2) k/xmin because it allows definition in terms of, say, the Heaviside step function, and F^{-1}(F(p(x)))=p(x) uniformly, where F is the Fourier transform. Whatever we come up with, I will alter the graphic accordingly. Paul Reiser 23:27, 15 Mar 2005 (UTC)

For purposes of probability theory, the value of a density at a boundary point does not matter since it does not affect the value of any integral. But for purposes of maximum likelihood estimation in statistical inference, you'd probably want to make it the maximum. Therefore I would not use half the maximum. But of course, the inverse Fourier transform of the Fourier transform of the density may give you half the maximum. Michael Hardy 00:08, 16 Mar 2005 (UTC)
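
To make the maximum-likelihood point concrete: under the unshifted density k * xmin^k / x^(k+1), the likelihood grows with xmin until xmin reaches the smallest observation, so the estimate of xmin is the sample minimum and the density is evaluated exactly at the boundary point. A minimal sketch assuming that density; `pareto_mle` is a hypothetical helper name, not something from the article.

```python
import math


def pareto_mle(sample):
    """Maximum likelihood estimates (xmin_hat, k_hat) for an unshifted Pareto.

    Assumes the density k * xmin**k / x**(k + 1) on x >= xmin, taking its
    full (maximum) value at x = xmin.
    """
    if not sample or min(sample) <= 0:
        raise ValueError("sample must be non-empty and strictly positive")
    xmin_hat = min(sample)  # likelihood is maximised at the smallest observation
    log_sum = sum(math.log(x / xmin_hat) for x in sample)
    if log_sum == 0.0:
        raise ValueError("all observations are equal; k is not estimable")
    k_hat = len(sample) / log_sum
    return xmin_hat, k_hat


xmin_hat, k_hat = pareto_mle([1.0, 2.0, 4.0])
print(xmin_hat, k_hat)
```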


Comment by an actuary

Why is the exponent called "k"? This tends to make one think that the parameter only takes on integer values, which is not true. European actuaries use alpha, Americans use "Q"; either would be better.

Technically, when the exponent is <= 1, the mean "does not exist"; "is infinity" is slightly off. Similar comment for the variance when the exponent is <= 2.
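
For reference, the integral behind this remark can be written out as follows. It assumes the unshifted density k x_m^k / x^(k+1) on x >= x_m, which is the derivative of the CDF 1 - (x_m/x)^k discussed further down this page.

```latex
E[X] = \int_{x_\mathrm{m}}^{\infty} x \cdot \frac{k\,x_\mathrm{m}^k}{x^{k+1}}\,dx
     = k\,x_\mathrm{m}^k \int_{x_\mathrm{m}}^{\infty} x^{-k}\,dx
     = \begin{cases}
         \dfrac{k\,x_\mathrm{m}}{k-1}, & k > 1 \\[1ex]
         \text{divergent}, & k \le 1
       \end{cases}
```

The same calculation with x^2 in place of x has integrand proportional to x^(1-k), which converges only for k > 2, so the variance is finite only in that case.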

Consider mentioning that the Pareto is often shifted so its support starts at 0; put in a reference to "shifted distribution."

Note that the conditional distribution (given that the variable exceeds some threshold) is also Pareto with the same exponent.
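
A short derivation of this property, assuming the unshifted survival function P(X > x) = (x_m/x)^k for x >= x_m and a threshold y >= x_m (the symbol y is introduced here only for illustration):

```latex
P(X > x \mid X > y)
  = \frac{P(X > x)}{P(X > y)}
  = \frac{(x_\mathrm{m}/x)^k}{(x_\mathrm{m}/y)^k}
  = \left(\frac{y}{x}\right)^k,
  \qquad x \ge y \ge x_\mathrm{m}
```

This is again a Pareto survival function, with minimum y in place of x_m and the same exponent k.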

Maybe note that method-of-moments parameter estimation doesn't work (even more so than usual!), because setting the mean equal to the sample mean implies an assumption that the exponent is greater than one.

Asymptotic theory says that, in the limit, the tails of distributions (if not of finite support) look exponential or Pareto. The article should link to this.

Error in CDF formula for Pareto

I believe that the expression for the cumulative distribution function has an error. It currently reads

cdf = 1-\left(\frac{x_\mathrm{m}}{x_\mathrm{m}+x}\right)^k

and should read

cdf = 1-\left(\frac{x_\mathrm{m}}{x}\right)^k

This is perhaps part of the confusion arising out of not shifting the origin to x_m. There should also be a reference to the excellent (highly technical) article in MathWorld: http://mathworld.wolfram.com/ParetoDistribution.html

Unless I get, within a short period of time, some indication that I am wrong, I will change it in the main article.

I got the wrong PDF?

I changed

cdf = 1-\left(\frac{x_\mathrm{m}}{x}\right)^k

to this one:

cdf = 1-\left(\frac{x_\mathrm{m}}{x_\mathrm{m}+x}\right)^k

I got the result from integrating a pdf which is a bit different from the one given; it is essentially the same one, but mine did not shift the origin to x_m. This is not a fake result or anything. Something should indicate this "kinda" conflict between two versions of the same probability function. But I will definitely remove my mistake, if only because the current cdf does not reflect the shifting nature of the pdf. I'll specify in the generating topic that it will generate a random sample from a non-shifted Pareto distribution.
Cyberyder 04:24, 6 April 2006 (UTC)
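
The two formulas in this thread can both be obtained by integration, depending on which density one starts from. The following is a sketch under assumed conventions, since the page does not quote the article's density: the first line takes the support to start at x_m; the second measures y from 0 (the form often called Lomax or Pareto Type II), with y used here only to keep the two variables apart.

```latex
% Support starting at x_m:
F(x) = \int_{x_\mathrm{m}}^{x} \frac{k\,x_\mathrm{m}^k}{t^{k+1}}\,dt
     = 1 - \left(\frac{x_\mathrm{m}}{x}\right)^k,
     \qquad x \ge x_\mathrm{m}

% Support starting at 0, with y = x - x_m:
G(y) = \int_{0}^{y} \frac{k\,x_\mathrm{m}^k}{(x_\mathrm{m}+t)^{k+1}}\,dt
     = 1 - \left(\frac{x_\mathrm{m}}{x_\mathrm{m}+y}\right)^k,
     \qquad y \ge 0
```

Each CDF is correct for its own density; the conflict arises only when one is paired with the other's density.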

Alternative R code

The provided R code for random sample generation does not shift the origin to lambda, and thus yields numbers lower than lambda. A good alternative that provides the wanted values directly can be found in [1].
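
For comparison, here is a minimal inverse-transform sketch in Python (not the R code from the article, and not the code from [1]). It assumes the CDF 1 - (lambda/x)^k on x >= lambda, so every draw is at least lambda; `sample_pareto` and its parameter names are illustrative.

```python
import random


def sample_pareto(k, lam, n, rng=random):
    """Draw n values from the assumed CDF 1 - (lam / x)**k, x >= lam.

    Solving u = 1 - (lam / x)**k for x gives x = lam / (1 - u)**(1 / k).
    """
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and is never zero.
    return [lam / (1.0 - rng.random()) ** (1.0 / k) for _ in range(n)]


random.seed(0)  # fixed seed only to make the illustration repeatable
draws = sample_pareto(k=2.0, lam=3.0, n=1000)
print(min(draws) >= 3.0)  # True: no draw falls below lam
```

Subtracting lam from each draw would give the version whose support starts at 0 instead.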