Talk:Central limit theorem

This is to bring contents to the top

I removed this:

An interesting illustration of the central tendency, or Central Limit Theorem, is to compare, for a number of lifts (elevators for those on the left-hand side of the Atlantic), the maximum load and the maximum number of people. For small lifts holding only a few people, the maximum load divided by maximum number of people is usually greater than it is in large lifts holding a larger number of people. This is necessary because some small groups of people who fill the lift may well have several people who are above average weight (just as, on other occasions, other small groups may have several who are well below average weight), whereas the larger the sample (the number of people in the large lift) the nearer the proportion of overweight people will be to the norm for the whole population.

While it is a nice example, it doesn't illustrate the Central limit theorem, whose gist is that the sum is normally distributed. I don't quite know where to put this example though. Maybe in standard deviation or normal distribution? AxelBoldt 21:02 Oct 14, 2002 (UTC)


I've encountered another definition of "the" central limit theorem.

My statistics textbook (Mathematical Statistics with Applications, 6th edition, by Wackerly, Mendenhall III, and Scheaffer) defines it in this way:

If Y_1, Y_2, ..., Y_n are iid with mean μ and standard deviation σ, then n^{1/2}(Ȳ − μ)/σ converges in distribution to a standard normal distribution as n goes to infinity. (my paraphrase)

The HyperStat on-line basic statistics text says

The central limit theorem states that given a distribution with a mean m and variance s^2, the sampling distribution of the mean approaches a normal distribution with a mean (m) and a variance s^2/N as N, the sample size, increases. (quoted directly)

I suppose this follows from the definition given in this article. Nonetheless, it is not identical to the one given in the article.

Is there a general trend for more basic/applied statistics books to use this mean-centric definition, while more advanced/theoretical ones use the definition given in the article? Is the definition given in the article better somehow? (I assume the mean-centric definition can be derived from it, but not vice versa.) Should the article also mention the mean-centric definition, since it seems to be somewhat popular?

--Ryguasu 10:52 Dec 2, 2002 (UTC)

No --- the "mean-centric" version and the "sum-centric" version are trivially exactly the same thing; either can be derived from the other, and it's completely trivial: Just multiply both the numerator and the denominator by the same thing; you need to figure out which thing. Michael Hardy 04:34 Feb 21, 2003 (UTC)
Right. This became obvious to me sometime after posting the question. Nonetheless, I think I'm going to stick in the mean-based formulation at some point; I've found more books using only the mean-based definition, and I imagine that some not so mathematically inclined people who nonetheless have to brush up against the CLT (certain social scientists come to mind) might like having what is not trivial to them pointed out. I agree, however, that unless proofs of the CLT typically involve the mean-based formulation, the one currently given on this page should be presented as more fundamental. --Ryguasu
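
For readers who want the "completely trivial" step spelled out, here is a short sketch in the notation of the textbook quote above (Y_i iid with mean μ and standard deviation σ; S_n for the sum is the article's symbol). Dividing the numerator and the denominator of the sum-based statistic by n gives the mean-based one:

```latex
\frac{S_n - n\mu}{\sigma\sqrt{n}}
  = \frac{\tfrac{1}{n}\,(S_n - n\mu)}{\tfrac{1}{n}\,\sigma\sqrt{n}}
  = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}
  = \frac{\sqrt{n}\,(\bar{Y} - \mu)}{\sigma},
\qquad S_n = \sum_{i=1}^{n} Y_i, \quad \bar{Y} = \frac{S_n}{n}.
```

So any limit statement about one form is automatically a statement about the other.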

Maybe I'm getting in over my head here, but do you really need to normalize S_n to say anything precise here? Can't we clarify the first "informal" claim of convergence of S_n by saying, parallel to what AxelBoldt has said for the normalized (i.e. Z_n) case

The distribution of S_n converges towards the normal distribution N(nμ, σ^2 n) as n approaches ∞. This means: if F(z) is the cumulative distribution function of N(nμ, σ^2 n), then for every real number z, we have
lim_{n→∞} Pr(S_n ≤ z) = F(z).

Is there a lurking desire here to state the non-standard normal part as a corollary, rather than as central to the CLT? That might be ok, although the general-purpose version looks more useful to me.

--Ryguasu 01:18 Dec 11, 2002 (UTC)

The problem is that on one side of your equality you have a limit as n approaches infinity, so that the value of that side does not depend on anything called n, and which CDF you've got on the other side does depend on the value of n. -- Mike Hardy

Actually, the CDF on the right hand size depends on z, not on n. There are no free ns anywhere. --Ryguasu

It does depend on n, but your notation inappropriately suppresses that dependency. You defined F(z) as the cumulative distribution function of N(nμ, σ^2 n). AxelBoldt 02:23 Dec 14, 2002 (UTC)

Excellent point. Nonetheless, I find it suspicious that someone with more mathematical experience than me can't express the "informal" claim in a rigorous manner. At Talk:Normal distribution, you mentioned "goodness of fit" tests. Couldn't you express the informal version formally, through some limit statement about the results of such a test as the number of samples/trials goes to infinity? --Ryguasu 02:11 Jan 30, 2003 (UTC)

Probably, I don't know. But the version given in the article is also a rigorous statement of the "informal" claim you have in mind. AxelBoldt 00:55 Jan 31, 2003 (UTC)
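
One possible way to make the unnormalized claim precise, offered as a sketch rather than as what the article states: write F_n for the CDF of N(nμ, σ^2 n), so that the dependence on n is explicit (the subscript is notation introduced here). Because the normal limit has a continuous CDF, the convergence in distribution of Z_n is uniform in z, and the substitution z = (x − nμ)/(σ√n) carries that over to S_n:

```latex
\sup_{x \in \mathbb{R}} \bigl|\Pr(S_n \le x) - F_n(x)\bigr|
  = \sup_{z \in \mathbb{R}} \bigl|\Pr(Z_n \le z) - \Phi(z)\bigr|
  \;\longrightarrow\; 0 \quad (n \to \infty),
\qquad Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}.
```

Here n appears inside the quantity whose limit is taken, so there is no mismatch between a side that depends on n and a side that does not.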


How about adding some examples? (This is something most of the math pages are lacking.) How about an illustration involving coin flips? I.e., X_n is defined on the probability space [0, 1] so that X_n is 1 with probability 1/2 and -1 with probability 1/2. A series of graphs and equations could be given.
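
The coin-flip suggestion can be sketched numerically. The snippet below is an illustration only, not something taken from the article: it uses the exact binomial law of the number of +1 outcomes to compute Pr(S_n/√n ≤ z) for the ±1 variables described above, and compares it with the standard normal CDF. The function names are made up for this sketch.

```python
import math

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def coin_sum_cdf(n, z):
    """Exact Pr(S_n / sqrt(n) <= z) for S_n = sum of n independent fair +/-1 flips."""
    # With H = number of +1 outcomes, S_n = 2*H - n and H ~ Binomial(n, 1/2),
    # so S_n <= z*sqrt(n) exactly when H <= (n + z*sqrt(n)) / 2.
    h_max = math.floor((n + z * math.sqrt(n)) / 2.0)
    if h_max < 0:
        return 0.0
    h_max = min(h_max, n)
    favourable = sum(math.comb(n, h) for h in range(h_max + 1))
    return favourable / 2 ** n

if __name__ == "__main__":
    for n in (4, 16, 64, 256, 1024):
        gap = abs(coin_sum_cdf(n, 1.0) - normal_cdf(1.0))
        print(f"n={n:5d}  |Pr(S_n/sqrt(n) <= 1) - Phi(1)| = {gap:.4f}")
```

Because the sum is discrete, the exact CDF is a step function, so the gap at a single z need not shrink monotonically even though it tends to zero.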


In the article, there is a comment reading, "picture of a distribution being "smoothed out" by summation would be nice". I've created an animated gif to address this comment. Since animated gifs are considered questionable, I am posting it to the talk page to see if others think it's a good idea. (The image has a rather large footprint on the screen. If anyone can easily shrink it, that would be good. With the rather rudimentary image manipulation tools at my disposal, it would be a moderately involved undertaking for me, so I'm not going to do it unless it's a worthwhile effort.)

I also propose the following explanatory text:

The figure below demonstrates the central limit theorem in action. It shows the distribution of the random variable Y = S_n/√n for values of n from 1 to 7. (In this particular case, the random variables X_i have variance equal to 1, so the variance of S_n is equal to n. The factor 1/√n scales Y so that its variance is equal to 1 independent of n.)

Image:clt_in_action.gif

Any and all comments appreciated. -- Cyan 22:15, 2 Feb 2004 (UTC)

Testing...

The central limit theorem in action

Yes, using the thumbnail feature would be a quick work-around. I don't know anything about this, but the diagram seems useful to me (it's particularly useful that it pauses between repetitions). You can count along in your head 1 to 7 as the shape of the graph changes, it doesn't rely on captions you need to read at the same time as observing the graph. I give it my uninformed support.  :) (Plus, if this is replicating information already included in the text then that's even better; relying on an animated gif to impart key information rather than to give an example of it would be a bad thing). fabiform | talk 04:32, 4 Feb 2004 (UTC)

An animation can't be printed, and I've always found animated diagrams to be very frustrating, particularly in a case like this. I have to wait for it to come around again if I'm trying to wrap my head around some individual part of it. There's no pause button, no frame forward, no rewind, at least in most browsers. I'd rather see such images side by side in most cases. Perhaps an animation in addition might be neat, but forcing it on readers is to me not friendly.
Here's a quick vertically flattened version (which could float to the side of the body text, for instance). A horizontal version might be better, or break it on two lines. --Brion 09:15, 4 Feb 2004 (UTC)
Image:Smoothing by summation sample.png
My $0.02:
a) This is, indeed, an example of an appropriate use of an animated GIF. There's no actual need to change it. However...
b) I actually think that in this particular case the separate pictures are really just as good. I find the animation irritatingly jumpy, and, of course, the constant-time steps are too fast for the early steps (where you might even want to take a moment to visualize the convolution in your head, and notice that you go from two sharp peaks to three blunt peaks to a single broad peak with four bumps), and too slow for the later steps (which all look alike). This is a nit-pick, though.
c) Footprint of the animated version is OK. Note, however, that you could easily reduce the extent of the X axis to +/- 3.5. Maybe by the last iteration there is some data outside those limits and maybe you know it's there, but visually it doesn't matter.
d) The individual thumbnails in Brion's version need a bit of work. They're currently too small and the vertical arrangement isn't very good. You're going to get a million "try this, try that" suggestions, each of which would be a couple of hours' work to try... mine is that you use a table and put them into some kind of comic strip format, maybe two rows of four, maybe four rows of two... yes, you'd need to provide an eighth image but since it would look just the same as the seventh that wouldn't be a problem... you'd need to tinker with the axis labelling, slightly bigger type, perhaps slightly fewer divisions... the axis labels (numbers) do NOT need to be TRULY legible, they should be reduced with antialiased smoothing, it's OK if they look blurry when you enlarge them, but they need to be just legible enough that you think you're seeing 1, 2, 3...
Very appropriate to the subject matter, by the way, and a nice illustration. Good stuff! Dpbsmith 11:37, 4 Feb 2004 (UTC)

Thanks for all the comments, folks! Here's what I'm going to do. As Dpbsmith and Brion suggest, I'm going to create a static image in 2 strips of 4 graphs. I'll play around with the x-axis limits for aesthetic effect, and I'll include a link to the animated gif for those of our readers who want to click on it. The reason to include it at all is that the last few panels will be indistinguishable as static images, but small changes will be apparent in the animated version, thus giving the viewer a sense of the scale of changes in distribution that occur past a certain value of n. -- Cyan 16:04, 4 Feb 2004 (UTC)

I looked at the different proposed diagrams, and I think I prefer the 2 strips of 4 graphs idea. I like the static images better than the animated image. -- It occurs to me that the illustration of the central limit theorem could be expanded by showing two or more different initial distributions, or adding a different distribution each time (not identical). After all the whole point of the theorem is that for a large class of distributions, adding them together brings you to the same limiting distribution. Thoughts? Happy editing, Wile E. Heresiarch 02:47, 18 Mar 2004 (UTC)
Oh, just a minor followup -- maybe it would help if the example shown on the main central limit theorem page were the same as one of (hopefully several) examples shown in illustration of the central limit theorem. I'm thinking the main page could just show the phenomenon, and the illustration page could go into more detail. Thinking out loud, Wile E. Heresiarch 14:08, 18 Mar 2004 (UTC)
Yet another half-baked idea -- maybe the effect of the animation can be sort-of imitated by leaving each plotted line in the succeeding figures, but grayed-out or something like that. So you could see just how much the line is changing, and the old lines won't block out the new ones if we use a lighter/grayer color. Wile E. Heresiarch 14:15, 18 Mar 2004 (UTC)

I have to agree with the no-animation camp. While it does show the progression nicely, having to watch it repeat a few times isn't ideal, and it distracts from the article. The images are great though, and as shown above they work nicely in a line. One other problem with animation is that it can show effects that are not there - the line looks to move which kinda hides the fact that it is a convolution. There might be a case to argue for a link to the animated version, but I would argue it is unnecessary. Good work folks. Mat-C 00:41, 18 Apr 2004 (UTC)

Mat-C, maybe you can look at the figures in Student's t-distribution and tell me what you think -- I attempted to show the progression of the t distribution to the normal distribution by using different colors. How successful was that, do you think? Thanks for any comments, Wile E. Heresiarch 02:53, 19 Apr 2004 (UTC)

Just for those who are wondering, the reason I haven't followed up on producing a set of images is because I discovered that the numerical convolution method I'm using isn't actually converging to a Gaussian. The images above look like Gaussians, but in fact are flatter and have wider tails than a Gaussian actually has. In fact, if I start with a Gaussian, the convolution moves it away from Gaussianity, flattening it and widening the tails. I haven't the time to devote to correcting this problem right now... I may get to it at some less busy time in the future. -- Cyan 05:53, 18 Apr 2004 (UTC)

Hmm, can you tell me a little about how you're going about the convolution, then? The reason that I ask is that I have also computed a numerical convolution (via FFT) for the figures on the illustration of the central limit theorem page, and I'd like to try to make sure those figures don't have the same problem. Thanks for any info. Wile E. Heresiarch 02:53, 19 Apr 2004 (UTC)
I used a two-sided filter algorithm based on MATLAB's built-in one-sided "filter" function (more info on this function here). I convolved a vector containing discrete samples of the distribution with the original distribution, and then rescaled it back to standard deviation 1, which involves resampling the distribution so that the discrete grid matches that of the original distribution. Apparently this quick and dirty procedure is affected by some kind of numerical error, because the distribution it converges to is not Gaussian. If you want to check the convergence, why not just plot a Gaussian over your filter-derived distribution? -- Cyan 04:54, 19 Apr 2004 (UTC)
Thanks for your comments. Just a thought -- the problem that you describe might be caused by the discretization effects -- I ran into that when working on another convolution problem and found the convolution result slowly drifting away from the correct result. I think it might be possible to solve the problem without resampling, which could reduce the discretization error. I think I'll post the Octave code which I used to construct the figures -- then it can be inspected and compared, as well as making it possible to "try this at home". Happy editing, Wile E. Heresiarch 02:22, 20 Apr 2004 (UTC)
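
As a rough sketch of the "no resampling" idea and of the suggested check against a Gaussian, here is a small pure-Python version. It is neither the MATLAB filter code nor the Octave code mentioned above. The starting distribution (uniform on [-1/2, 1/2]), the grid size and all function names are assumptions chosen for illustration. The sum is left unrescaled, so every convolution stays on the same grid spacing, and the result is compared with the N(0, n/12) density.

```python
import math

def convolve(a, b, step):
    """Riemann-sum approximation of the convolution of two sampled densities."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj * step
    return out

def density_of_sum(n, points=50):
    """Grid and sampled density of the sum of n iid Uniform(-1/2, 1/2) variables."""
    step = 1.0 / points
    base = [1.0] * points          # midpoint samples of the uniform density
    dens = base
    for _ in range(n - 1):
        dens = convolve(dens, base, step)
    # The grid stays symmetric about 0 with the original spacing, so no resampling.
    centre = (len(dens) - 1) / 2.0
    xs = [(k - centre) * step for k in range(len(dens))]
    return xs, dens

def gaussian_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def max_gap(n):
    """Largest pointwise difference between the convolved density and N(0, n/12)."""
    xs, dens = density_of_sum(n)
    var = n / 12.0                 # each uniform summand has variance 1/12
    return max(abs(d - gaussian_pdf(x, var)) for x, d in zip(xs, dens))

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"n={n}: max |density - Gaussian| = {max_gap(n):.4f}")
```

If a procedure of this kind drifts away from the Gaussian as n grows, the rescaling or resampling step is the first place to look, since the plain convolution itself does not introduce such a drift.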

o(t^2)

Just a note: o(t^2), t → 0, refers to a function which goes to zero more quickly than t^2 (like t^3), and not a function 'like' t^2, which would be O(t^2). Hence, I have reverted the recent edits that changed o(t^2) to o(t^3). Notably, the article on Big-O notation does not discuss limits other than the limit as t → ∞. However, it should do so! Ben Cairns 06:56, 14 Feb 2005 (UTC).

o(t^2) Reply

Sorry, I had not seen your message (Bjcairns) in the discussion entry. I confused big O with small o. I thought that this o was referring to the higher-order corrections of Taylor's expansion formula. I suppose that you are right, so I changed the article back to its previous version with o(t^2) without being logged in. That IP 143.233.xxx.xxx etc. is mine :) My version is perhaps correct if we consider the big O and not the small one. Theofilatos 17:07, 17 Feb 2005 (UTC)
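
For reference, the distinction drawn in this thread can be written out as follows. These are the standard definitions, stated for the limit t → 0; the symbol φ for the characteristic function of a summand with mean 0 and variance 1 is notation introduced here.

```latex
f(t) = o(t^2) \ (t \to 0) \iff \lim_{t \to 0} \frac{f(t)}{t^2} = 0,
\qquad
f(t) = O(t^2) \ (t \to 0) \iff \limsup_{t \to 0} \left|\frac{f(t)}{t^2}\right| < \infty .
% Examples: t^3 = o(t^2), whereas t^2 = O(t^2) but t^2 \neq o(t^2).
% With only a finite second moment assumed, Taylor's theorem gives
\varphi(t) = 1 - \frac{t^2}{2} + o(t^2), \qquad t \to 0 .
```

Writing the remainder as o(t^3) or O(t^3) would implicitly ask for more than a finite variance.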


Needs layman's language too

This article seems to be very mathematically complex. It could benefit from some simple layman's language. Ian Howlett 13:24, 30 June 2005 (UTC).

Quotation marks de-emphasize

Quotation marks around a word often mean something like: that's what some people are often heard to call it, but I don't want to commit myself to agreeing. Thus they de-emphasize. If you write "John has a 'degree' from the University of Metaphysics", the quotation marks enclosing the word "degree" mean that maybe John and some others call it a "degree", but you don't necessarily agree. Often quotation marks mean "don't take this word literally." That is the meaning of the quotation marks around "the" in the section heading that says "The" central limit theorem. The word "the" in this context implies uniqueness: that there is only one central limit theorem. In fact there are many, with varying assumptions: sometimes independence is relaxed; sometimes identical distribution is relaxed; sometimes the random variables live in some space besides the real line, etc. The quotation marks mean that often people call this one "the" central limit theorem, but the word "the" should not be taken too literally. Michael Hardy 18:16, 16 September 2005 (UTC)

I am aware of the use of quotes in this way, I use them like that "every" day. :) However, I find it strange that somebody would quote the word the. Oleg Alexandrov 18:42, 16 September 2005 (UTC)
Ironic emphasis of "the" is common enough in informal American English (dunno if the Brits use it too). I don't think we want an easily misunderstood wordplay here. I've replaced "The" central limit theorem with Classical central limit theorem. Feel free to find a different adjective. There are other uses of "scare quotes" in the article which should be reviewed. Regards & happy editing, Wile E. Heresiarch 03:08, 19 September 2005 (UTC)

Link to polymers

Hi guys, I'm an editor who delves a lot in physics (statistical mechanics especially) and a bit in statistics. That means I use this theorem a lot. I'll have other things to say, but as of now I just need to share something that sprang to mind (not that it's original work; someone probably thought of it before me):

There is very probably a link between the non-independent case and polymer physics. A real-world polymer is basically a correlated random walk, although this correlation tends to decrease exponentially. Yet the object follows the Central Limit Theorem. About this see the ideal chain and worm-like chain articles, especially the parts about the Kuhn segment.

Either mathematicians have a version of the non-independent CLT corresponding to this, in which case as a polymer and random walk editor I need to know, or this should probably be added as another case of non-independent CLT, in some form or other. (ThorinMuglindir 23:40, 25 October 2005 (UTC))

A few thoughts

From the article: "The density of the sum of two or more independent variables is the convolution of their densities (if these densities exist)."

This should also appear (and probably be explained in detail) in probability density function.

A section dedicated to the CLT and the Fourier transform wouldn't be superfluous either, as the CLT is quite easy to demonstrate in Fourier space. Such considerations are for the moment mentioned, but in very little detail. That would lead us to being able to say that the convergence of the CLT is faster in the low Fourier modes, and slower in the high Fourier modes (if you don't renormalize the sum, there can even be no convergence at all in the high Fourier modes in some cases, see below). I wouldn't attempt to formalize that in a clean mathematical way myself, though.
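
As a sketch of why the CLT is "quite easy to demonstrate in Fourier space": this is the standard argument, written here for iid X_i with mean 0 and variance 1, with φ denoting their common characteristic function (a symbol introduced for this sketch).

```latex
\varphi_{S_n/\sqrt{n}}(t)
  = \mathbb{E}\, e^{\,i t (X_1 + \dots + X_n)/\sqrt{n}}
  = \Bigl[\varphi\bigl(t/\sqrt{n}\bigr)\Bigr]^{n}
  = \Bigl[1 - \frac{t^2}{2n} + o\!\bigl(\tfrac{t^2}{n}\bigr)\Bigr]^{n}
  \;\xrightarrow[n \to \infty]{}\; e^{-t^2/2}
  \quad \text{for each fixed } t .
```

The limit is the characteristic function of N(0,1). The convergence is pointwise in t, which fits the remark that nothing is promised uniformly over the high Fourier modes.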

Some singular cases that might be worth explaining

As I said yesterday, I am very much into editing polymer and random walk stuff for physics, which leads me to linking to CLT a lot. There is a case where convergence toward the CLT limit is singular, yet it arises quite often in random walks (namely, the random walk on a lattice).

Take for instance independent variables which take the values -1 or 1 with probability 1/2 each, and sum N of them.

If you look at the density function you obtain, it is not strictly a Gaussian. It is a series of successive Dirac delta functions. Now CLT is not that far off, because the amplitude of the Dirac peaks of the resulting sum varies according to a Gaussian curve. So if you look at the function in the low Fourier modes, it will correspond to the Gaussian curve that is predicted by CLT. For high Fourier modes (k > 2πN, or k > 2π if you don't renormalize the sum) the density of the resulting sum has nothing to do with a Gaussian.

The situation is not the same if you consider a sum of continuous variables, or a lattice-free random walk: the same problem does not arise.

For example, consider the continuous variable that is uniformly distributed in [-\sqrt 3; \sqrt 3], and sum a large number of independent realisations of this variable. This new variable has the same mean and variance as the previous one, yet you won't obtain a series of Dirac peaks like in the previous case. The resulting density will look like a Gaussian at pretty much any scale, including in the high Fourier range...

All this will probably be clearer by writing a formula: for the latter variable (variance 1, mean 0), the (normalized) sum converges toward the density function P(X), corresponding to N(0,1/N). Now, strictly speaking, the density function of the former variable's sum does not converge toward P(X), but rather toward:

P'(X) = P(X)\Sigma_{k=-N}^{+N}\delta(X-k/N), where delta is the Dirac delta function

About this here are my questions: is the above somehow related to what you say about the nature of the third moment of the variable controlling the speed of convergence? Can this difference in convergence in the high and low fourier modes be formalized mathematically?(192.54.193.37 08:58, 26 October 2005 (UTC))

Oh, I again forgot to log in... Well, the section above is from ThorinMuglindir 09:00, 26 October 2005 (UTC)
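
The low-mode versus high-mode distinction raised above can be checked directly on the characteristic function, which is the Fourier transform of the distribution. This is a sketch under the assumption that the sum is normalized as S_N/√N; the post above normalizes differently, so the frequency thresholds here do not correspond to its k values. For fair ±1 steps the characteristic function of S_N/√N is cos(t/√N)^N, which is close to exp(−t²/2) for moderate t but returns to modulus 1 at t = π√N because of the lattice. The function names are made up for this illustration.

```python
import math

def coin_cf(t, n):
    """Characteristic function of S_n / sqrt(n) for n independent fair +/-1 steps."""
    return math.cos(t / math.sqrt(n)) ** n

def gaussian_cf(t):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-t * t / 2.0)

if __name__ == "__main__":
    n = 400
    # Low Fourier modes: the lattice sum tracks the Gaussian closely.
    for t in (0.5, 1.0, 2.0):
        print(f"t={t}: lattice {coin_cf(t, n):.5f}  gaussian {gaussian_cf(t):.5f}")
    # High Fourier mode at t = pi*sqrt(n): the lattice structure dominates.
    t_high = math.pi * math.sqrt(n)
    print(f"t=pi*sqrt(n): |lattice| {abs(coin_cf(t_high, n)):.5f}  "
          f"gaussian {gaussian_cf(t_high):.3e}")
```

This does not contradict the CLT, which concerns convergence in distribution and hence pointwise convergence of the characteristic function at each fixed t.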

Not all of what you are saying is immediately clear to me, but you seem to be talking about discrete random variables compared with continuous random variables. Discrete random variables do not have probability density functions, but the central limit theorem is not about densities anyway. It is about convergence in distribution, i.e. about cumulative distribution functions. So there is less problem about comparing discrete and continuous random variables. The classic example is the normal approximation to the binomial distribution; even here the approximation can be misleading in the tails, as it often is when applying the central limit theorem. --Henrygb 23:55, 26 October 2005 (UTC)
Thanks, my question was indeed related to that binomial distribution. Just as a remark, it is often possible and useful to define a probability density function for a discrete variable, using the Dirac delta function. Mathematically speaking, the Dirac delta function is not a function, but it's still a distribution (distribution not in the sense of statistics but in the sense of topology, that is to say, an object in the closure of the space of functions). Of course when you do physics you couldn't care less about what exactly is a function and what is a distribution... What I wrote above is just a reformulation of the meaning of the graph that compares the curve and the histogram in the binomial distribution article, a reformulation that is based on Dirac delta functions. (ThorinMuglindir 10:03, 27 October 2005 (UTC))

I'll add a very short bit to the article, explaining that CLT can also be adapted to sums of discrete variables, although in a slightly different form, and link to binomial distribution as an example. Be it just to not confuse a reader who comes here from, say the random walk article, where CLT is applied to a sum of discrete variables.ThorinMuglindir 10:04, 27 October 2005 (UTC)

sum has finite variance, or the random numbers themselves?

The first paragraph states: The most important and famous result is called simply The Central Limit Theorem which states that if the sum of the variables has a finite variance, then it will be approximately normally distributed.

The random variables must have finite variance, right? This was the impression I got from http://mathworld.wolfram.com/CentralLimitTheorem.html. I am not skilled at mathematics so I do not know if saying the sum has a finite variance is correct. Thank you. Jason Katz-Brown 05:32, 11 February 2006 (UTC)

They say the same thing: variances are non-negative so the sum of a finite number of them will be finite if and only if each of them is. --Henrygb 16:31, 11 February 2006 (UTC)

Organigram?!

From the article:

This means that if we build an organigram of the realisations of the sum of n independent identical discrete variables, the curve that joins the centers of the upper faces of the rectangles forming the organigram converges toward a gaussian curve as n approaches ∞. The binomial distribution article details such an application of the central limit theorem in the simple case of a discrete variable taking only two possible values.

Huh? We draw an organizational chart of what, and how? I suppose "independent identical" is meant to be iid (as independent and identical is a contradiction), but what about the rest of it? Given the binomial distribution article reference, "organigram" is probably meant to be "histogram", though I don't see how the curve would join the "centers" of the upper faces more than any other points on them (a description which makes more sense for organigrams, even though as a whole using organigrams to depict distributions would be a strange idea). The histogram, then, is presumably of the probability distribution of a random variable that is the sum of n iid discrete random variables (or approximation of the same by frequencies in a finite sample of the random variable, but then we need to take limit at infinite sample size or the other limit won't converge). Is this interpretation correct? I'm not sure. How to explain it in good encyclopedia style? I don't know. As it stands this part of the article is very confusing and should be fixed, preferably by someone who knows something about probability theory (I don't, so I'm not touching it). 82.103.195.147 20:51, 12 August 2006 (UTC)

What about the sum of non-identically distributed random variables?

This is a personal doubt, but probably more people coming to this page may have it. Is there any result related to the CLT that says anything about the sum of random variables in general? For example, in my problem I have 40 Beta variables, each of them with its own mean and variance. I think there is some result saying their sum is a Normal variable with mean and variance equal to their respective sums. Is that right? Arauzo 10:15, 17 September 2006 (UTC)

See the sections on Lyapunov condition and Lindeberg condition. --Henrygb 17:54, 17 September 2006 (UTC)
I read the article and understand that whether the random variables are identical or not, their sum will be normally distributed. I disagree with the condition that the random variables must be identical.--Piyatad 09:56, 28 November 2006 (UTC)
It doesn't say they must be identical. It says that IF their distributions are identical (the distributions, not the random variables!) THEN etc. etc. But it also says:
Several generalizations for finite variance exist which do not require identical distribution but incorporate some condition which guarantees that none of the variables exert a much larger influence than the others.
So there you have it: the article says the distributions do not need to be identical if "some other condition" holds. Just which other condition depends on which version of the theorem you're talking about. I think equal variances may be more than strong enough; and if I weren't writing this comment in some haste I just might say that's obvious.... Michael Hardy 01:45, 4 December 2006 (UTC)
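
The 40-Beta question can be probed with a quick simulation. This is only a sketch under assumptions that are not in the question: the variables are taken to be independent, and the shape parameters below are invented for illustration. It checks how often the simulated sum lands within 1.96 standard deviations of the summed means, using the summed variances, which would be about 95% of the time if the sum were exactly normal.

```python
import math
import random

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) variable."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

def coverage_of_normal_interval(params, trials=5000, seed=0):
    """Fraction of simulated sums within 1.96 standard deviations of the summed means."""
    rng = random.Random(seed)
    total_mean = sum(beta_mean_var(a, b)[0] for a, b in params)
    # Variances add because the summands are assumed independent.
    total_sd = math.sqrt(sum(beta_mean_var(a, b)[1] for a, b in params))
    hits = 0
    for _ in range(trials):
        s = sum(rng.betavariate(a, b) for a, b in params)
        if abs(s - total_mean) <= 1.96 * total_sd:
            hits += 1
    return hits / trials

# Hypothetical parameters: 40 Beta variables with differing shapes.
PARAMS = [(1.0 + i % 5, 1.0 + i % 7) for i in range(40)]

if __name__ == "__main__":
    print(coverage_of_normal_interval(PARAMS))
```

A result near 0.95 is consistent with the normal approximation for this particular parameter set; it says nothing about cases where one variable's variance dominates the rest, which is what the Lyapunov and Lindeberg conditions rule out.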

Large sample size

The need for a large sample size should be included. n >= 30 to 70 for it to be large. 70.111.238.17 14:11, 1 October 2006 (UTC)

It says as n approaches ∞, and that is certainly quite large. But some "rules of thumb" could be added too. In many cases, "≥ 30" is quite conservative. Michael Hardy 21:29, 2 October 2006 (UTC)
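
How large n needs to be depends on the summands, and skewness in particular slows things down. As one hedged illustration (the choice of Exponential(1) summands is an assumption made here, not something from this discussion), the sum of n such variables has an exact Erlang CDF, so the error of the normal approximation can be computed without simulation. The function names are made up for this sketch.

```python
import math

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def erlang_cdf(n, x):
    """Exact CDF at x of the sum of n iid Exponential(1) variables."""
    if x <= 0.0:
        return 0.0
    # Pr(S_n <= x) = 1 - exp(-x) * sum_{k=0}^{n-1} x^k / k!
    term = math.exp(-x)   # k = 0 term; underflows for x above roughly 700
    tail = 0.0
    for k in range(n):
        tail += term
        term *= x / (k + 1)
    return 1.0 - tail

def worst_gap(n, zs=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Largest CDF error of the normal approximation over a few standardized points."""
    # Exponential(1) has mean 1 and variance 1, so S_n has mean n and variance n.
    return max(abs(erlang_cdf(n, n + z * math.sqrt(n)) - normal_cdf(z)) for z in zs)

if __name__ == "__main__":
    for n in (5, 30, 70, 300):
        print(f"n={n:3d}: largest gap at the test points = {worst_gap(n):.4f}")
```

For symmetric summands the error at the same n is typically smaller, which is one reason a single threshold such as 30 can be conservative in some cases and optimistic in others.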

central {limit theorem} or {central limit} theorem?

The article now says:

This is a limit theorem and is about central limits.

I've long thought it was the central theorem on limits, not the theorem on central limits. Can someone explain just what "central limits" are? The article's present comment seems to confuse rather than to clarify. Michael Hardy 02:51, 6 November 2006 (UTC)

I removed a bit that said that the theorem was NOT a 'central' theorem, but was a theorem about 'central limits'. The text now says what it is, not what it isn't. I hope that clears it up. 8_)--Light current 03:24, 6 November 2006 (UTC)

I don't see how it clears up what a "central limit" is. What is a "central limit"? Michael Hardy 03:42, 6 November 2006 (UTC)

...and now I've edited it to say it's a central theorem about limits, not a theorem about central limits. Can no one explain what a "central limit" is? I suspect no one can, because I suspect there's no such thing. Michael Hardy 03:45, 6 November 2006 (UTC)

I don't think you are correct. You should read the whole thing, then you'll see. Excerpt from page:

Note the following apparent "paradox": by adding many independent identically distributed positive variables, one gets approximately a normal distribution. But for every normally distributed variable, the probability that it is negative is non-zero! How is it possible to get negative numbers from adding only positives? The reason is simple: the theorem applies to terms centered about the mean. Without that standardization, the distribution would, as intuition suggests, escape away to infinity.

My itals, bolding--Light current 03:52, 6 November 2006 (UTC)

So a central limit is one that is evenly distributed about zero.--Light current 03:54, 6 November 2006 (UTC)

Hey, I've just noticed you are a statistician!! Why are you asking me about stats? 8-)--Light current 03:55, 6 November 2006 (UTC)

I've removed the controversial statement until we can get the proper dope on it 8-)--Light current 04:01, 6 November 2006 (UTC)

easier to understand

I reckon the section on Classical CLT should start by stating the theorem. Justification should come after this. So the passage might read

" The central limit theorem says that the means of samples are normally distributed."

comments please —The preceding unsigned comment was added by 212.159.75.167 (talk) 20:18, 3 January 2007 (UTC).