Talk:Law of large numbers

This article is in dire need of being written so that non-mathematicians can successfully find their way through the first sentence. -- Wapcaplet 04:33, 1 Mar 2004 (UTC)

I agree. Is there a way to explain it for the lay person?

The law of large numbers basically says this: As the number of entities in a specific group increases, the likelihood of a particular event occurring (however unlikely it may be) also grows. If you roll a die once, the chance of rolling a ONE is one-sixth. But if you roll a die 100 times, the likelihood of rolling a ONE at some point during those 100 rolls is very near 100%.


  • No, the LLN is a much stronger result than this. It says that you are likely to get close to 17 ONEs in 100 rolls of a die. This is the fundamental thing that makes statistics worthwhile: if you have a lot of observations then the average of the sample is close to the average of the population. This is why you can, e.g., do clinical trials in only a few thousand people and extrapolate to approving drugs for a population of hundreds of millions. I agree that the article could use revision in the introduction. I don't really like the proof, either. Only a very small number of people will understand it, and for them there is a simpler, shorter proof under similar assumptions: by algebra the variance of a sum is proportional to the number of summands, and so the variance of an average decreases as the number of summands increases. (TSL)
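The two readings above can be put side by side numerically. The following is only a rough sketch, assuming a fair die and independent rolls; the function names and the seed are made up for illustration. It computes the "at least one ONE" probability from the first comment, simulates the count of ONEs in 100 rolls (which the LLN says should land near 100/6, about 17), and shows the variance of the observed proportion shrinking like 1/n, as in the variance argument.

```python
import random
from fractions import Fraction

P_ONE = Fraction(1, 6)  # chance of a ONE on a single roll of a fair die


def prob_at_least_one(p: Fraction, n: int) -> Fraction:
    """Probability that an event of probability p happens at least once in n independent trials."""
    return 1 - (1 - p) ** n


def count_ones(n_rolls: int, rng: random.Random) -> int:
    """Roll a fair six-sided die n_rolls times and count how many ONEs appear."""
    return sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 1)


def variance_of_proportion(n: int) -> Fraction:
    """Variance of the proportion of ONEs in n independent rolls: p(1-p)/n."""
    return P_ONE * (1 - P_ONE) / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
    print("P(at least one ONE in 100 rolls):", float(prob_at_least_one(P_ONE, 100)))
    print("ONEs in 100 rolls, ten repeats:", [count_ones(100, rng) for _ in range(10)])
    for n in (100, 400, 1600):
        print(n, "rolls: variance of the proportion =", float(variance_of_proportion(n)))
```

Quadrupling the number of rolls cuts the variance of the proportion by a factor of four, which is the algebraic fact the shorter proof leans on.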

It should be noted that the proof of the weak law (with convergence in probability) has to be more or less as it is. A semi-intuitive argument using the decreasing variances may be useful to the understanding, but that doesn't really prove convergence in probability.

Can't you make this argument? I may be missing something, but it looks right to me:
Chebyshev's inequality tells us that \operatorname{P}( \left| \overline{X}_n-\mu \right|  \geq \varepsilon) \leq \frac{\sigma^2}{{n\varepsilon^2}} (as already noted).
By elementary properties of variance, the variance of the sample mean is \frac{\sigma^2}{n}, and \lim_{n \rightarrow \infty} \frac{\sigma^2}{n} = 0.
Therefore, by Chebyshev's inequality, \lim_{n \rightarrow \infty} \operatorname{P}( \left| \overline{X}_n-\mu \right|  \geq \varepsilon) \leq 0.
--Delirium 08:08, 4 November 2005 (UTC)
Never mind, I suppose this may run into some trouble with limits also being defined in terms of epsilons, juxtaposed with the presence of an explicit epsilon in the same ratio. This explanation seems more intuitive to me, but perhaps the one in the article is more rigorous. --Delirium 08:09, 4 November 2005 (UTC)
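For reference, here is one way the Chebyshev argument can be written out in full. This is a sketch under assumptions that are not all spelled out in the thread: X_1, ..., X_n are independent and identically distributed with mean \mu and finite variance \sigma^2, and \varepsilon > 0 is fixed before the limit in n is taken, so it does not collide with the epsilon in the definition of a limit.

```latex
\begin{align*}
\operatorname{Var}\left(\overline{X}_n\right)
  &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{\sigma^2}{n}, \\
\operatorname{P}\left(\left|\overline{X}_n-\mu\right| \geq \varepsilon\right)
  &\leq \frac{\operatorname{Var}\left(\overline{X}_n\right)}{\varepsilon^2}
   = \frac{\sigma^2}{n\varepsilon^2}, \\
0 \leq \lim_{n \rightarrow \infty}
  \operatorname{P}\left(\left|\overline{X}_n-\mu\right| \geq \varepsilon\right)
  &\leq \lim_{n \rightarrow \infty} \frac{\sigma^2}{n\varepsilon^2} = 0
  \qquad \text{for each fixed } \varepsilon > 0 .
\end{align*}
```

The last line is a squeeze between 0 and a bound that tends to 0, which is exactly convergence in probability of \overline{X}_n to \mu. Note that this needs finite variance, which the weak law in its general form does not require.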

Another possible definition of the LLN

The long-run relative frequency of repeated independent events gets closer and closer to the true relative frequency as the number of trials increases. 202.7.183.131 11:42, 18 January 2006 (UTC)

  • True relative frequency? What is that? The law states that the average of repeated independent events converges toward the expectation. That is it. Aastrup 20:11, 18 January 2006 (UTC)

Should this article contain a proof of the strong law?

A proof of the strong law would be rather long, but I think that the strong law is such an important result that it ought to be shown with its proof. What do you think?

Some notes on history

I'm putting some notes on history here to remind myself to add a history section later. Someone else can feel free to do so though, especially if you know more about it than I do:

  • Throughout the 19th century, the "law of large numbers" simply meant the weak law (convergence in probability); I think Bernoulli proved this, but I have to look that up.
  • Some interesting stuff happened to clarify/extend it in the late 19th and early 20th centuries
  • Émile Borel proved a special case of the strong law (almost-sure convergence) in 1909, for Bernoulli trials (cite?)
  • Francesco Cantelli provided the first relatively general proof of the strong law (which he called "uniform convergence in probability") in his 1917 paper: "Sulla probabilità come limite della frequenza." Atti Reale Accademia Nazionale Lincei 26: 39-45.
  • Aleksandr Yakovlevich Khinchin coined the now-current term "strong law of large numbers" to describe what Cantelli had called "uniform convergence", in a short published letter of 1928: "Sur la loi forte des grands nombres." Comptes Rendus de l'Académie des Sciences 186: 285-87.
  • Andrey Kolmogorov proved that the strong law holds in cases other than independent identically-distributed variables, subject to some other conditions. I think this was in 1929 (cite?).

--Delirium 09:43, 4 November 2005 (UTC)

intro

In the first sentence, do the words 'average' and 'mean' alternate on purpose? Spencerk 04:16, 5 December 2005 (UTC)

I think by 'average' they are referring to the average of the sample from a large population, but the term 'mean' refers to the entire population. However, I think the first sentence is wrong, because the law should refer to the size of the sample, not the size of the population. I will let someone else change it, if they agree.

Possibly more important, the Wall Street Journal noted that people frequently misuse the 'law of large numbers'. For example, in a January appearance on CNBC, eBay chief executive Meg Whitman said, "Now, our businesses are getting larger and we will obviously face the law of large numbers, but we have actually changed the trajectory of the growth curve in our two largest businesses over the last three quarters."


Suggested First Paragraph revision

The Law of Large Numbers is a fundamental concept in statistics and probability. Stated in a formal style of language the law is described as follows:

If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.

This means that the more units of something that are measured, the closer that sample average will be to the average of all of the units -- including those that were not measured. (The term "average" here means specifically "the arithmetic mean".)

For example, the average weight of 10 apples taken from a barrel of 100 apples is probably closer to the "real" average weight than the average weight of 3 apples taken from that same barrel. This is because the sample of 10 is a larger number than the sample of 3. And then, if you took a sample of 99 apples out of 100 apples, the average would be almost exactly the same as the average for all 100 apples.

While this rule may appear to be self-evident to many readers, the development and use of this law allows statisticians to draw conclusions or make forecasts that would not be possible otherwise. In particular, it permits precise measurement of the likelihood that an estimate is close to the "right" number.

There are two versions of the Law of Large Numbers, one called the "weak" law and the other called the "strong" law. This article will describe both versions in technical detail, but in essence the two laws do not describe different actual laws but instead refer to different ways of describing the convergence of the sample mean with the population mean. The weak law states that as the sample size grows larger, the difference between the sample mean and the population mean will approach zero. The strong law states that as the sample size grows larger, the probability that the sample mean and the population mean will be exactly equal approaches 1.0.

One of the most important applications of the Law of Large Numbers is called the Central Limit Theorem which, generally, describes how sample means tend to occur in a Normal Distribution around the mean of the population regardless of the shape of the population distribution, especially as sample sizes get larger. (See the article Central Limit Theorem for details of this application, including some important limitations.) This helps statisticians evaluate the reliability of their results because they are able to make assumptions about a sample and extrapolate their results or conclusions to the population from which the sample was derived with a certain degree of confidence. See Statistical hypothesis testing as an example.

The phrase "law of large numbers" is also sometimes used in a less technical way to refer to the principle that the probability of any possible event (even an unlikely one) occurring at least once in a series increases with the number of events in the series. For example, the odds that you will win the lottery are very low; however, the odds that someone will win the lottery are quite good, provided that a large enough number of people purchased lottery tickets.

The remainder of this article will assume the reader has a familiarity with mathematical concepts and notation.


I offer this as a revision to the first paragraph to make the concept more accessible to readers who may not be familiar with statistics. It may not be right yet, but with editing perhaps it can be incorporated into the article. --Blue Tie 04:47, 11 July 2006 (UTC)

Oh... I also like the term "Miracle of Large Numbers" instead of "Law of Large Numbers"... :-) --Blue Tie 04:48, 11 July 2006 (UTC)

Didn't you mean "stated in informal language"? Michael Hardy 14:57, 11 July 2006 (UTC)

Well, not a bad point from the perspective of a mathematician. However, I did not mean formal definition. I meant that the language was more formal than ordinary conversation (and I think this is a typical usage of the term "formal" when discussing general mathematical concepts in English words rather than describing "forms", "proofs" or other specific items). My audience is someone who does not know what the Law of Large Numbers is or even, perhaps, what statistics are or that there are such things as formal proofs. However, if you think that the language is wrong, perhaps it should be changed. However, I would not agree to the word "informal" for the audience I was looking to speak to. They would NOT agree!--Blue Tie 15:42, 11 July 2006 (UTC)

One reason I commented is that the proposed language does not distinguish between the weak and strong laws. That's OK if you're being informal and will come to the precise statement later. Michael Hardy 17:03, 11 July 2006 (UTC)


You are right, it does not distinguish between those two laws. I will see if I can adjust it by adding something in that regard, because even though the target audience may not have a full understanding, the opening paragraph should be a reasonably complete summary and an introduction to the topic. --Blue Tie 02:38, 12 July 2006 (UTC)

I have made some changes to the paragraph. PLEASE COMMENT AND CORRECT.--Blue Tie 12:24, 12 July 2006 (UTC)

I am still looking for comments or criticisms. --Blue Tie 21:45, 16 July 2006 (UTC)

What is this?

This is not the LLN as I learned it in college (two schools). What I learned was:

If a numerically-valued random event has known probabilities, the probability distribution of the average of a large number of independent occurrences of the event is concentrated near the expectation based on the probabilities.

Almost every mention of expectation in the article has been removed in favor of some concept of subsets samples vs. whole population. Who first stated the law formally and what was the original form? Gazpacho 19:36, 2 October 2006 (UTC)

How is your statement above functionally different from this statement:
If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.

Older versions of the page can be viewed on the "History Tab". Select a version of interest.

Incidentally, there is no discussion on here of subsets. There are examples for people who have not studied the concept of the Law of Large numbers showing how it applies conceptual examples. Is that what you meant?

It appears that you may not have a substantive problem... the page is not incorrect... but that you may have a problem with how it is worded. The effort is to make the page more accessible in the first few paragraphs to people who are not experts. Do you have an improvement?

--Blue Tie 19:57, 2 October 2006 (UTC)

In the LLN proof using Chebyshev, don't you need strict inequalities when switching to the probability *within* epsilon of mu?

I did not write the original equations here but looking at them they seem correct.. tautological. But perhaps if you were to add a constraint of epsilon > 0 you would be right. But I think without that constraint they do not have to be strict. --Blue Tie 22:10, 2 October 2006 (UTC)

The use of the word "converges" is troublesome as it suggests a certainty of outcome in the long run. I have edited the intro accordingly. Gazpacho 23:31, 2 October 2006 (UTC)


I have a couple of problems with your edit. First of all, you have made the reading of the article more dense and oblique rather than easier. You have done so because the term "converges" bothers you but that is not only "a" standard term for such processes but in this case it is "the" standard term particularly with respect to the "Strong" version of the law. Finally, you have removed some helpful descriptive content.
This is not an emotional matter for me nor is it a matter of ownership, but I think your edits have degraded the article. Before I edit them back I want to discuss it. First of all... why do you think there is a problem with the word "converges"? The truth is that the proof is that the mean does indeed converge to p... or that the probability of the mean being equal goes exactly to 1.00. It is not a matter of being "close" and your edit is technically wrong on that matter and somewhat contradicts the rest of the article. That cannot be permitted. Either the math must change or the heading must change. --Blue Tie 00:30, 3 October 2006 (UTC)

I believe it's important not to assume that people understand how "converges" is used here because that invites the misinterpretations that lead to the Gambler's fallacy. I have suggested a different wording. Gazpacho 01:18, 3 October 2006 (UTC)

We could link to Convergence so that they could understand the correct terminology. I am not sure how to deal with problems where people change the meaning of words, but I don't think that we should compromise wikipedia for those people. I realize "converge" is standard terminology and it has a very definite and accurate meaning in this context where it is supposed to be that this is the limit of the probability or the probability function leading inexorably "almost surely" to mu. But, we could look at a thesaurus for other terms: assemble, coincide, combine, come together, concenter, concentrate, concur, encounter, enter in, focalize, focus, join, meet, merge, mingle, rally, unite. None of these quite get to the same meaning as converge, but "meet" and "coincide" are sort of close. Maybe those would fit.
Saying "converging in probability toward p" is redundant. Not good form. I do not understand your objections to the clearer and more precise wording. --Blue Tie 02:23, 3 October 2006 (UTC)


I read your edit and I understand what you are trying to do though the wording seems awkward. As I grasp it, it seems that you are concerned that people will believe that they can predict the outcome of an individual random event based upon past events. I do not think that converge leads to that possible error. I think that the concept of statistical inferencing can lead to that misunderstanding though. Somehow I think you have got the wrong things labeled as the problem here. Something looks off about this. --Blue Tie 02:35, 3 October 2006 (UTC)

Maybe a separate section about "misconceptions" could be the answer. I do not know of anyone who has these misconceptions but I think maybe you have had some experience with people who have. --Blue Tie 02:37, 3 October 2006 (UTC)


Further Suggested Revision

I still think the article could use some simplification. Thoughts on these first few paragraphs?


The law of large numbers is a fundamental concept in statistics and probability that describes how the average of a random sample from a large population approaches the average of the whole population.

In formal language:

If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.

In statistics, this says simply that as the size of a random sample increases, the average of the sample converges to the true average.

For example, consider flipping a fair coin (that is, it comes up heads 50% of the time). It is certainly possible that if we flip the coin 10 times, we may end up with only 3 heads (30%). The Law of Large Numbers shows that as you flip the coin more and more (say 10,000 times), the percentage of coin flips that are heads will come ever closer to 50%. Alternatively, it becomes less probable to maintain a rate of 30% heads as more coins are flipped, as this is below the true average.

While this rule may appear self-evident, it allows statisticians to draw conclusions or make forecasts that would not be possible otherwise. In particular, it permits precise measurement of the likelihood that an estimate is close to the "right" or true number.

Note, however, that the value of a single observation cannot be predicted using previous observations of independent events. (For example, if a coin has come up heads 30% of the time, it is incorrect to say that the coin is "due" to come up heads - see the Gambler's Fallacy.)

--Topher0128 21:28, 25 October 2006 (UTC)
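The coin-flip example in the draft above is easy to try out. A minimal sketch, assuming a fair coin and independent flips; the function name and the seed are arbitrary choices for illustration:

```python
import random


def heads_fraction(n_flips: int, rng: random.Random) -> float:
    """Flip a fair coin n_flips times and return the fraction of heads."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips


if __name__ == "__main__":
    rng = random.Random(2006)  # fixed seed so repeated runs give the same flips
    for n in (10, 100, 10_000, 1_000_000):
        print(f"{n:>9} flips: {heads_fraction(n, rng):.4f} heads")
```

With only 10 flips a result like 30% heads is unremarkable; at a million flips the fraction typically sits within a fraction of a percent of 50%. Each flip is still independent of the earlier ones, so nothing here implies the coin is ever "due".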

99 out of 100 apples

The article had the following example: "If you took a sample of 99 apples out of 100 apples, the average would be almost exactly the same as the average for all 100 apples." I removed it because in this example there is only one apple left to be counted, so the sample average will be close to the population average. It suggests that LLN works because 99% of the apples have been counted, which is not correct. The correct example to illustrate LLN would be, say: "If you took a sample of 1,000,000 apples out of 100,000,000 apples, the average would be almost exactly the same as the average for all 100,000,000 apples." That is, even though 99% of the apples have not been counted, the sample average will still be very close to the population average. JS 00:43, 11 January 2007 (UTC)


I believe what you have described is the "Central Limit Theorem" which derives from the Law of Large Numbers but is not the same thing. Consequently I have reverted back. --Blue Tie 13:18, 11 January 2007 (UTC)

CLT makes a statement about the (normal) distribution of means of samples. Here I am referring to LLN which does not say anything about distributions but rather convergence in probability of one particular estimator (the sample mean) to one particular number (the population mean).
The question is whether we can apply LLN when we have a large sample but have sampled only, say 1%, of a population. I believe we can, because of convergence in probability.
The probability of the sample mean being "significantly" different from the population mean becomes smaller as the sample grows larger. This probability "approaches" zero (convergence in probability) as the size of the sample becomes "large". This is based on an analysis of probabilities. In fact this paragraph would probably be a reasonable way to explain LLN to a layman.
The example 99 out of 100 on the other hand is not based on probabilities, but merely because most (99%) of the population has been counted. The example is based on algebra. Thanks! JS 17:53, 11 January 2007 (UTC)
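The statement that the probability of a "significant" difference shrinks as the sample grows can be estimated by simulation. The sketch below is only an illustration under assumed choices that are not in the discussion: fair die rolls (mean 3.5) stand in for the observations, and a tolerance of 0.5 stands in for "significantly different".

```python
import random


def prob_far_from_mean(n: int, eps: float, trials: int, rng: random.Random) -> float:
    """Estimate P(|sample mean - 3.5| >= eps) for the mean of n fair die rolls."""
    far = 0
    for _ in range(trials):
        sample_mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(sample_mean - 3.5) >= eps:
            far += 1
    return far / trials


if __name__ == "__main__":
    rng = random.Random(11)  # arbitrary fixed seed
    for n in (10, 100, 400):
        estimate = prob_far_from_mean(n, eps=0.5, trials=500, rng=rng)
        print(f"n = {n:>3}: estimated P(|mean - 3.5| >= 0.5) = {estimate:.3f}")
```

The estimated probability drops toward zero as n grows, for this fixed tolerance, which is what convergence in probability describes. No population size appears anywhere in the code.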


But look at it this way: Doesn't the law of large numbers explicitly say that as N approaches infinity (for an infinite population) the probability that Xn = mu approaches exactly 1.00? Doesn't that translate into "As a sample measurement of a finite population encompasses the full population, the probability that the mean of the sample and the mean of the population converges to 1.00"? This is like saying that as you get closer and closer to 100 out of 100 apples, you will get closer and closer to the population mean. So the apples example is a specific and detailed example of the concept that is described later in the equations. It is what the Law of Large Numbers really says.
You are right that the sample mean is an estimate of the population mean -- an estimate with some degree of error to it. And the Law of Large Numbers says that is the case. What you are describing is an outgrowth of analysis under the Law of Large Numbers, which leads to the Central Limit Theorem and Hypothesis Testing using sample means. But that concept is better handled under those topics. This article is more limited; it is just the Law of Large Numbers... not an article about sampling probability, which is what you are addressing.

The example of 99 Apples was developed to help people who had no familiarity with Statistics to grasp the basic concept of the Law of Large Numbers. With that in mind, recognizing what the equations say, and realizing that the focus of the article is not sampling confidence probabilities, is it really a good idea to remove that example? I do not think so. --Blue Tie 15:21, 12 January 2007 (UTC)

Hello Blue, You wrote "Doesn't the law of large numbers explicitly say that as N approaches infinity (for an infinite population) the probability that Xn = mu approaches exactly 1.00?"
Yes, LLN does say that as N approaches infinity the probability approaches 1.
But LLN says much more than that. It says that even if you sample only 1% (or even less) of the population, the sample mean will still approach the population mean as long as the sample is "large".
Suppose you wish to know how 100 million voters are going to vote in the elections (a binomial distribution). You do not have to sample 99 million of them (as the example suggests), you only have to sample 10,000 to estimate the mean quite accurately.
The 99/100 apples example works because of algebra.
Population Mean = Mean of 99 apples * (99/100) + Mean of last apple * (1/100). As the mean of the last apple is divided by a large number (100), so its impact is small and the mean of 99 is "close" to the population mean. This is just algebra, there is no probability involved. The means are "close" because 99% have been counted.
LLN on the other hand says means will be close even if only 1% (or less) are counted, as long as the sample is "large".
The 99/100 apples example is misleading because the logic behind it is algebra, whereas the logic of LLN is probability theory.
The 99/100 example would also lead a reader to wrongly believe that most of the population (~99%) has to be sampled for LLN to work, whereas LLN will work for samples 1% or smaller as long as they are "large".
Regards, JS 17:53, 12 January 2007 (UTC)
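The polling point can be checked on a scaled-down population. This is a hedged sketch, not a model of any real election: the population of 1,000,000 voters with a 52% share for one candidate is invented for illustration, and the sample of 10,000 covers only 1% of it.

```python
import random


def poll(population: list[int], sample_size: int, rng: random.Random) -> float:
    """Share of 1s (votes for candidate A) in a simple random sample drawn without replacement."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


if __name__ == "__main__":
    # Hypothetical electorate: 1 = votes for candidate A, 0 = votes for someone else.
    population = [1] * 520_000 + [0] * 480_000
    rng = random.Random(7)  # arbitrary fixed seed
    for size in (100, 1_000, 10_000):
        share = poll(population, size, rng)
        print(f"sample of {size:>6} ({size / len(population):.2%} of voters): {share:.3f}")
```

The accuracy of the estimate is driven by the absolute sample size, so the same 10,000 responses would do about as well against 100 million voters as against one million.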


I understand what you are saying. In essence, you are talking about the notion of events that may have an infinite number of trials, having success p and that a less than infinite number of repetitions is required to estimate p. I agree that this is true. I also agree that this is a reasonable conclusion from analysis under the law of large numbers. But that is not exactly what the law of large numbers says when we examine the equations in the article. They do not directly refer to the rate of convergence and, in fact, I can find no source that discusses rate of convergence as an intrinsic part of the Law of Large Numbers, but rather, all sources simply point out that as the numbers get larger, they get closer to the population mean. It is discussed as a limit when the sample size approaches infinity. As Bernoulli (who first described the law of large numbers) said: Even the stupidest man knows by some instinct of nature per se and by no previous instruction that the greater the number of confirming observations, the surer the conjecture.
But, of course, when we apply the law, we find that the convergence generally exists and operates as a square function of the sample size.
So, I again would argue that the Apple example is not a bad one, but it may not be exactly complete, and by itself it could be misleading. What is missing is that the rate of convergence is rapid (a square law) so that reasonable conclusions may be developed with a sample that is much smaller than the whole population -- as long as the population is rather large. So I would change the example from 100 apples to 1000 apples and then point out that, while a sample of 999 would be the most accurate partial sample, a sample of only 40 or 50 will typically suffice under conditions where we accept the risk of certain errors. It is not that this sample size is expressly described by the Law of Large Numbers, but rather that it is a corollary in most cases.
Am I still wrong about that?--Blue Tie 04:07, 13 January 2007 (UTC)
1. There are rules of thumb about what number may be regarded as "large", and one can use convergence results without worrying about rate of convergence when the sample size exceeds that number. I am not a statistician, but I do remember an example of 30 (or was it 60?) being mentioned as a sample size large enough for CLT to be applied to a binomial distribution.
2. If you worry about the rate of convergence and wish to be strict, you can always use CLT rather than just LLN to estimate confidence intervals for the population mean.
3. I suppose it would help the lay reader if we were able to provide some rules of thumb for sample sizes.
4. The standard deviation of sample means decreases at the rate of square-root of sample size (CLT).
5. My problem with the 99/100 example is it seems to say "LLN works because almost all apples have been counted", whereas LLN really says "Leave 99% uncounted, or even 99.99% uncounted, just make sure the number you count is large and you will get close to the population mean."
6. I would say the example you suggest would be improved by saying that if there were 1,000,000 apples, then 99 would give a good estimate, 999 an even better estimate, 9,999 still better, 99,999 best yet and so on... The point is that "yes, larger samples improve accuracy". But LLN is an asymptotic result and in practice can be used for samples that exceed, say 100, without worrying about convergence. For example, if you are trying to estimate a binomial distribution mean, and the population mean is 0.5, then a sample of 100 would give the standard deviation of the sample mean to be sq-root(0.5 * 0.5)/sq-root(100) = 0.05. So the probability that the sample mean will lie outside 0.40 to 0.60 (interval length 4 sd) is less than 5%. I suppose whether you regard that interval as sufficiently convergent to 0.50 is a matter of taste. If you increase the sample size to 10,000, then the interval shrinks to 0.49 to 0.51.
Regards, JS 07:34, 13 January 2007 (UTC)
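The binomial arithmetic in point 6 can be worked through directly. A small sketch, assuming fair coin flips (p = 0.5) and the standard formula sqrt(p(1-p)/n) for the standard deviation of a sample proportion; the function names are invented for illustration:

```python
import math


def sd_of_sample_mean(p: float, n: int) -> float:
    """Standard deviation of the sample proportion for n independent trials with success probability p."""
    return math.sqrt(p * (1 - p) / n)


def prob_heads_count_outside(n: int, k_lo: int, k_hi: int) -> float:
    """Exact probability that the number of heads in n fair flips falls outside [k_lo, k_hi]."""
    inside = sum(math.comb(n, k) for k in range(k_lo, k_hi + 1))
    return 1 - inside / 2 ** n


if __name__ == "__main__":
    for n in (100, 10_000):
        sd = sd_of_sample_mean(0.5, n)
        lo, hi = 0.5 - 2 * sd, 0.5 + 2 * sd  # interval of total length 4 sd
        outside = prob_heads_count_outside(n, round(lo * n), round(hi * n))
        print(f"n = {n:>6}: sd = {sd:.4f}, interval = [{lo:.3f}, {hi:.3f}], "
              f"P(outside) = {outside:.4f}")
```

Going from 100 to 10,000 observations multiplies the sample size by 100 and divides the width of the interval by 10, which is the square-root behaviour mentioned in point 4.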
Your points one at a time:
1. Yes there are rules of thumb, but the LLN does not give them.
2. It is not that I am worried about the rate of convergence. I am saying that the LLN does not specifically address that. It simply talks about sample size going to infinity.
3. Regarding rules of thumb for sample size, this article is not about sample size or statistical inferencing. I think you should address those issues in these other articles. This is just about the Law of Large Numbers.
4. Yes you are right about the SD decreasing according to a square law, and you are right to cite the CLT for that, but this is not the CLT, this is the LLN.
5. In the 99 out of 100 apples examples, LLN works because MORE apples have been counted. Read the example. 3 is not as good as 9. 99 is better than 9. It is simply a restatement of the LLN. The larger the sample the more correct or accurate the mean. Your problem with the example is invalid. You are conflating the concepts of statistical confidence with the related but different concept of LLN.
6. Again, you are trying to address statistical inferencing and confidence. That is not really part of LLN but are things that come out of the Law of Large Numbers and the Central Limit Theorem.
I would ask that, if you feel that the LLN addresses the issues you raise, you find a source that says so. But in the meantime, look at the equations and see that the LLN simply states that as the sample size gets larger the mean approaches more surely the mean of the population or the theoretical mean. --Blue Tie 17:09, 13 January 2007 (UTC)
Hello Blue, you wrote "Yes there are rules of thumb, but the LLN does not give them." and "Regarding rules of thumb for sample size, this article is not about sample size or statistical inferencing. I think you should address those issues in these other articles. This is just about the Law of Large Numbers." Yes, strictly speaking LLN applies to infinitely large samples, but of course samples are never infinitely large. Does this mean LLN has no application in real life? No, all it means is that LLN can only be applied when the sample is regarded "large" enough, for which we need rules of thumb.
I repeat, when the example says 99 out of 100, it is suggesting to the reader that the means are close because most apples have been counted. If the example said 99 out of 10,000, or 999 out of 30,000 or 999 out of 20,000 or 100 out of 5,000 or 120 out of 60,000 or 1,354 out of 2,245,523 I would not have any problem. The problem is 99 out of 100 very strongly suggests that LLN works because most apples have been counted, whereas LLN does NOT require most of the population to be counted. It can work even if only 0.001% or less have been counted as long as the number counted is "large". Regards, JS 22:36, 13 January 2007 (UTC)
You wrote "Read the example. 3 is not as good as 9. 99 is better than 9. It is simply a restatement of the LLN." I agree that 99 more than 9 more than 3 is in the spirit of LLN. I propose we change the example from population size 100 to 100,000 apples. You should not have a problem with that as the 3 vs. 9 vs. 99 will be retained. Regards, JS 22:44, 13 January 2007 (UTC)
I would agree with that if you would agree that the example should go to 99,999 apples. But then, there would be no point for the change would there? You see the Law of Large Numbers does not explicitly say that there is some small number that is sufficient. It simply says that more is better and that infinite is best. You are trying to make it say something else -- something about a small sample size being enough. That certainly falls out of the analysis that can be conducted later but that is NOT what the Law of Large Numbers says. But again, if you can find a valid source that says so, then use that source and I shall be satisfied.
Let me be clear. I want the example to say that the Law of Large numbers works because MORE have been counted and when all but 1 have been counted that is next to the very best thing and when they are all counted that is best. I specifically disagree with this point that you keep making: "It can work even if only 0.001% or less have been counted as long as the number counted is 'large'". Although that statement is true -- sometimes -- it is not specifically what the LLN says.
I feel pretty strongly about this. And apparently so do you. Shall we seek outside comment?--Blue Tie 23:33, 14 January 2007 (UTC)
Hello Blue, I said "It can work even if only 0.001% or less have been counted as long as the number counted is 'large'". You replied "Although that statement is true -- sometimes -- it is not specifically what the LLN says." As this is math, there should be no ambiguity about what LLN says. When you say "sometimes" you imply that there exist examples where 0.001% is not sufficient even though the sample is "large". If you can find such an example where the sample size is "large" but LLN is not true you would have disproved LLN as it is currently stated.
Specifically look at the assumptions of LLN and you will find no assumption that says the sample size has to be larger than some fraction of the population. What you are saying is that for LLN to apply the sample size has to be (at least sometimes) larger than a certain fraction of the population. This would be a new assumption for LLN.
I can prove the equality of sample and population means for a large population of size M, if all but one data point is counted, without resorting to probabilities.
Algebraically,
Population Mean = Sample Mean * (M-1)/M + Last Value * (1/M) = Sample Mean + (Last Value - Sample Mean) * (1/M)
=> Sample Mean = Population Mean - (Last Value - Sample Mean)*(1/M)
=> As M approaches infinity we have Sample Mean = Population Mean, since (Last Value - Sample Mean) is finite.
Note this is just algebra, and not LLN.
I agree that an opinion by an outsider, especially a statistician, would be helpful.
JS 06:30, 15 January 2007 (UTC)
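
The algebraic identity above can be checked on a small example. The following is a minimal sketch, not part of the original post: the apple weights are made-up illustrative numbers and the helper names are hypothetical. It uses exact fractions so the equality is not blurred by rounding.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean of a non-empty list of Fractions."""
    return sum(values, Fraction(0)) / len(values)


def check_identity(population):
    """Check Population Mean = Sample Mean + (Last Value - Sample Mean) / M.

    The "sample" is every value except the last one, as in the post above.
    Returns (identity_holds, |population mean - sample mean|).
    """
    m = len(population)
    sample, last = population[:-1], population[-1]
    sample_mean = mean(sample)
    population_mean = mean(population)
    rhs = sample_mean + (last - sample_mean) / m
    return population_mean == rhs, abs(population_mean - sample_mean)


# Hypothetical apple weights in grams (illustrative values only).
weights = [Fraction(w) for w in (150, 162, 171, 148, 155, 190)]
holds, gap = check_identity(weights)
print(holds, gap)
```

The gap equals |Last Value - Sample Mean| / M, so it shrinks as M grows only because the numerator stays bounded; no probability is involved, which is the point being made.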

The apple example is horribly wrong. It is NOT because something near the whole population has been counted that the LLN works. The apple example implies trials being nowhere near independent when the sample size approaches the whole population size. In the apple example, the sample average approaches the population average precisely because of the LACK OF INDEPENDENCE. But the LLN relies very heavily on the assumption of independence. Picture sampling apples WITH REPLACEMENT, so that trials are independent. Maybe only 100 apples are there, but the sample size may be 1 million. As the sample size approaches infinity, the sample average approaches the population average, and that has nothing at all to do with whether anything near the whole population is represented when the sample size is big enough. Michael Hardy 21:39, 17 January 2007 (UTC)
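
The with-replacement picture described above can be simulated directly. This is a rough sketch under stated assumptions: the population of 100 apple weights is invented for illustration, the function name is hypothetical, and a fixed seed is used only to make runs repeatable.

```python
import random


def sample_mean_with_replacement(population, draws, seed=0):
    """Mean of `draws` independent draws (with replacement) from `population`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += rng.choice(population)  # each draw is independent of the others
    return total / draws


# 100 hypothetical apple weights in grams: 100, 101, ..., 199.
population = [100 + i for i in range(100)]
population_mean = sum(population) / len(population)

# The sample can be far larger than the population because apples are put back.
for draws in (10, 1_000, 100_000):
    error = abs(sample_mean_with_replacement(population, draws) - population_mean)
    print(draws, error)
```

Because every draw is independent, the error typically shrinks as the number of draws grows, regardless of how the number of draws compares with the 100 apples in the population.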

I think you are right... The lack of independence invalidates the example. I would like to address this in more detail and produce a better example, but I am busy right now. Maybe later. --Blue Tie 11:35, 18 January 2007 (UTC)

I agree completely that the apple example is misleading. It encourages a common misconception (that the sample needs to be a large fraction of the population). I would rather suggest an example that contradicts this misguided intuition: "If you take photographs of the faces of a random sample of about 30 male American college students, and project them upon each other, then you will see a pretty good picture of the average face of all male American college students, even though that population is much larger than the sample. Moreover, if you take a new sample of about 30 male American college students, and project their 30 portraits upon each other, then you will see almost the same face again. That is, the first sample of 30 persons has approximately the same average face as the second sample, even though these samples have no persons in common!" JulesEllis 01:34, 19 January 2007 (UTC)

Hello Jules, I think the photograph example may be hard to understand. Normally when we say sample average, we think of a scalar. But a photograph is a vector of facial feature points, and it is not easy to associate sample average with such a vector. I think the old apple example with the total number of apples increased to, say 100,000 (rather than just 100) could work. Regards, JS 01:39, 23 January 2007 (UTC)
I do not know about a common misperception that the sample needs to be a large fraction of the population, but the problem with the example of 99 Apples is not that some people would not understand it (that may also be a problem, but a different one). The problem is that the independence criterion might be violated. --Blue Tie 04:15, 19 January 2007 (UTC)

LLN follows from CLT, not the other way around

Currently the article says "One of the most important conclusions of the Law of Large Numbers is the Central Limit Theorem which, generally, describes how sample means tend to occur in a Normal Distribution around the mean of the population regardless of the shape of the population distribution, especially as sample sizes get larger." I think it would be more accurate to say that LLN arises out of CLT rather than the other way around. CLT gives the mean, the variance, and the distribution of sample means. As the variance of the distribution collapses as the sample size grows larger, it follows that the mean converges to a number (the population mean) in probability. This is the LLN. So LLN is a result that can be obtained from the CLT rather than the other way around. Of course there have to be some assumptions for this to work, for example finite variance etc. I am correcting the article accordingly. JS 18:08, 12 January 2007 (UTC)

Since LLN was proposed prior to CLT and since LLN is more basic than CLT there is no way that LLN could have come out of CLT. It may well be that LLN does not produce CLT (I think a case can be made that it does) but logically it is impossible that the LLN comes from CLT and historically it did not happen that way. Any changes you made to suggest that CLT produced LLN should be removed. --Blue Tie 04:30, 13 January 2007 (UTC)

In fact, isn't LLN the First Fundamental Theorem of Probability and CLT the second? --Blue Tie 04:39, 13 January 2007 (UTC)
I am not sure about the nomenclature, what is given what name. But I think it is true that if you have proven CLT, then you have also proven LLN. While on the way to proving CLT, you may prove LLN.
CLT does contain within it LLN. CLT "exceeds" LLN in the sense that not only does it give the mean value for the distributions of sample means to be equal to the population mean, but it also gives the variance and the nature of the distribution (normal).
If you have evidence that LLN preceded CLT in time (which it probably did as LLN is a weaker result), and you regard the history of development of these results worth including, you are welcome to add that information. However the current statement "So LLN is a result that can be obtained from the CLT." is true and should be retained. Essentially it tells the reader that LLN is contained within CLT.
Regards, JS 06:47, 13 January 2007 (UTC)
I am not sure that is true. I have to think about it. The proof for both LLN and CLT can be similar but I do not think that CLT certainly contains LLN. CLT talks to the distribution around the mean of the sample. I have to think about whether it alone refers the mean to the mean of the population. Maybe it does. LLN talks to the mean of the sample vs the mean of the population but not to the distribution around the mean of either. I would agree however that the CLT certainly IMPLIES the LLN. But to settle the issue, can you find a source that says CLT contains LLN or that if you have proven CLT you have done more than imply LLN? --Blue Tie 17:29, 13 January 2007 (UTC)
The CLT does give the mean. It gives the entire normal distribution which includes the mean and variance. For example see the Wikipedia article on Central Limit Theorem. It says that the modified distribution Z approaches the standard normal distribution N(0,1). Z is obtained by subtracting n * mu (where n is the sample size and the Greek letter mu is the population mean) from the sum of the random variables and then dividing by sigma * sq-root(n). So CLT gives not only the nature of the distribution, but also the mean and variance. JS 17:45, 13 January 2007 (UTC)
But the CLT does not say that as the sample size increases, the estimate of the mean improves. It is a theoretical construct looking at a variety of possible sample means and explaining that these fall into a normal distribution pattern. Sure, it gives the mean, but it does not say whether the mean is "right" because the sample size is larger. --Blue Tie 20:44, 14 January 2007 (UTC)
"CLT does not say that as the sample size increases, the estimate of the mean improves." That is not right. The variance of the normal distribution as given by CLT decreases in inverse proportion to the sample size (the standard deviation at the rate of its sq-root). Hence the estimate of the mean becomes more accurate as the sample grows larger. In CLT, as the sample size approaches infinity, the variance approaches zero and the sample mean converges in probability to the population mean. This is the LLN. Regards, JS 21:16, 14 January 2007 (UTC)


I think that this is an example of the problem. The current wording is: "The variance as given by CLT collapses as the sample size grows larger, it follows that the mean converges to a number (which CLT says is the population mean)." But CLT does not directly address sample size. (In fact, sample size can be almost any positive number and CLT still works -- if the number of samples is large enough). It addresses the means of samples of ANY size following a normal distribution. It is the Law of Large Numbers that addresses sample size. Of course the LLN also addresses variance, but not really directly. In the same way, CLT addresses sample size, but not directly. I think that this distinction is why these two things are called the first and second fundamental theorems of probability: they work together. LLN describes sample size and CLT describes the distribution of possible sampling results around the true mean. These work together, and the results are stronger when you have both large sample size and large numbers of samples. But the normality of the distributions follows the number of samples more than it follows the sample size of the samples. In addition, CLT was preceded by LLN both historically and logically. CLT is, in a sense, a refinement of LLN in that it takes one very large sample and "looks" at what happens when it is divided into a number of smaller samples.
In any case, I think the current wording is not quite right.--Blue Tie 21:13, 14 January 2007 (UTC)

(Unindent)Hello Blue,

1) You wrote "CLT does not directly address sample size." That is not correct. CLT does consider the sample size; in fact the variance it provides for the normal distribution depends upon sample size (the standard deviation is inversely proportional to its square root).
2) Also see the article on CLT on Wikipedia. It starts by saying "any sum of many independent identically-distributed random variables". Note the use of the word "many".
3) Also go down to the proofs of CLT and you will see they are for n approaching infinity.
4) As a simple example, consider a binomial sample of heads = 0 and tails = 1 of size just 2. The probability distribution for the mean will be 0 with probability 1/4, 0.5 with probability 1/2, and 1 with probability 1/4. This certainly isn't a normal distribution.
5) So saying "sample size can be almost any positive number and CLT still works" is not right.
Regards, JS 21:37, 14 January 2007 (UTC)
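
The small-sample distribution in point 4 can be enumerated exactly. The sketch below is illustrative only (the function name is hypothetical and a fair coin is assumed, as in the post); it lists every possible sample of 0s and 1s and accumulates the probability of each sample mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sample_mean_distribution(n, p_one=Fraction(1, 2)):
    """Exact distribution of the mean of n independent 0/1 draws.

    p_one is the probability of drawing a 1 (1/2 for a fair coin).
    Returns a dict mapping each possible sample mean to its probability.
    """
    dist = Counter()
    for outcome in product((0, 1), repeat=n):
        ones = sum(outcome)
        prob = p_one ** ones * (1 - p_one) ** (n - ones)
        dist[Fraction(ones, n)] += prob
    return dict(dist)


for mean_value, prob in sorted(sample_mean_distribution(2).items()):
    print(mean_value, prob)
```

For n = 2 this gives the three values quoted above, and for n = 3 it gives four possible means; in both cases all the probability sits on a handful of points rather than on a continuum.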

Your points, one at a time
1. CLT uses sample size, but it does not make a comment about how the mean of the sample approaches the mean of the population as sample size increases. It does view the variance as changing with sample size -- but LLN did this first and it is somewhat irrelevant to the point that it does so.
2. The article on Wikipedia is written by just anyone. Perhaps it is a mistake to say "many" or perhaps they meant many "sets" or many "types". It is not the typical wording for CLT definitions and unless you are referring to types of variables then it does not make sense.
3. The proofs do use N going to infinity, but this is for a different effect. They are looking at dividing a continuum and it says that as N increases the distribution of the sample approaches Normality. But this says NOTHING about the original population. Indeed, the original population might be a distorted, perhaps even discontinuous function and yet the sample means will be normally distributed.
4. Why do you believe that is not a normal distribution? It looks like one to me -- with just 4 points extracted. How do you figure it isn't?
5. Let's see.... you can see with your eyes that what I have said is right. Take a look here: http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/ Try N=2 and # of Samples = 1000 and you will see a normal curve. That is with N=2. As I said, sample size does not matter, CLT still works. This is part of the miracle of the Central Limit Theorem. --Blue Tie 00:46, 15 January 2007 (UTC)

It's not altogether true that NOTHING is assumed in the CLT about the original population. Usually it's assumed to have finite variance, and sometimes weaker assumptions are involved. These assumptions may be weakened but not simply discarded, since the Cauchy distribution would then provide a counterexample to such a proposed modified version of the CLT. Also, the sample size matters, in the sense that with too small a sample, the distribution will fail to be a close approximation to a normal distribution. Michael Hardy 01:30, 15 January 2007 (UTC)

About assumptions, I agree. If I said that (I do not see it above), I was wrong. When you talk about the Central Limit Theorem, sample size does not matter, unless for some reason you are trying to discuss some practical application of the Central Limit Theorem where you are (for some odd reason) not sure if your sample will be normally distributed, or if you are looking at the practical implications of the range in a hypothesis test. However, with regard to the Central Limit Theorem describing how the distribution of sample means will be normal, how does the distribution of the means get affected by the size of the samples that create those means? There may be some cross talk but essentially the normal distribution will occur in the sample means if you use sample sizes of 2 or 200.
If I take a sample of say --- 3 points and if someone else takes a sample of 30 points... both these samples are testable against some hypothesis by virtue of the central limit theorem. How did the sample size change that in either case? Certainly I can wish I had done the 30 observations -- I will get a tighter result. But, how did the lack of a large sample make the CLT void in that case? (I think this article makes the same case: http://en.wikipedia.org/wiki/Central_limit_theorem#Convergence_to_the_limit) --Blue Tie 01:57, 15 January 2007 (UTC)


1) If you take a sample of 3 points, and the underlying distribution is binomial, then the possible values of sample means will number 4. This certainly is not a normal distribution, which has an infinite number of possible values. You cannot do hypothesis testing by approximating a distribution that has probability mass only at 4 points to a normal distribution.
2)I looked at the Rice simulation, but couldn't get it to work. What was the distribution of the population? Can you select binomial? If the distribution of the population is, say, normal, then samples of size N=2 will indeed have normal distributions.
3) I feel the discussion is digressing now. You wrote "CLT uses sample size, but it does not make a comment about how the mean of the sample approaches the mean of the population as sample size increases. It does view the variance as changing with sample size". There is a lack of mathematical precision in your statements. You need to define what these phrases mean: make a comment; approaches; view the variance.
4) The current sentence in the article that you find objectionable was (many posts back): "The variance as given by CLT collapses as the sample size grows larger, it follows that the mean converges to a number (which CLT says is the population mean)." To this you said "But CLT does not directly address sample size." Define "directly address". Also please state what part of the sentence in the current article is wrong, and precisely what the error is? Regards, JS 07:52, 15 January 2007 (UTC)


1)Yes, you can do testing on a small sample but I understand what you mean. As sample size increases, you are more confident of a normal distribution. Thus the small sample adjustments that modify the normal distribution.
2)You can choose a variety of non normal distributions such as skewed, uniform or custom distributions. You have to hit the button to the top left.
3)I agree generally and accept the blame. Sometimes it is that I have done an answer and then somehow it gets lost and I have to retype and I get impatient. (That just happened) I do not understand your last request though ("You need to define what these phrases mean: make a comment; approaches; view the variance.") But that might be a distraction anyway.
4)Actually, my main problem is that you are declaring LLN to be superseded or replaced by CLT. But maybe you are right. I don't think so, but maybe. I have been tossing it back and forth in my mind. And I can see the math.
I think that part of my problem with this is how I connect the two concepts. I think of LLN as saying "large enough samples can help you determine population mean and variance" while CLT says: "FURTHERMORE the sample is normally distributed". To me, the furthermore is important. I think you do not really think about "furthermore" and instead see it this way: the CLT says that "a large enough sample can give you mean and variance of the population in a normally distributed response". All complete. You see CLT as including LLN and I see CLT as additive to LLN. Perhaps this is like a comparison of the set of counting integers with the set of rational numbers. Rational Numbers may include the counting integers, while the counting integers do not include the rational numbers but they do form the foundation for the rational numbers. I could go on, but this is long enough for now. --Blue Tie 12:47, 15 January 2007 (UTC)
Hello Blue, I do not care much how LLN and CLT are connected as long as they are accurately described. You wrote "I think of LLN as saying "large enough samples can help you determine population mean and variance" while CLT says: "FURTHERMORE the sample is normally distributed"." I agree with what you wrote except that LLN only provides mean, not the variance. I think what "precedes" what can be argued either way, and really I think it is unimportant.
If you do have only the CLT result, you can get to LLN. But if you have only LLN, you cannot get to CLT. But for the lay reader to know this may be unimportant, I would be satisfied if they were described as "related results". You can accordingly change the article if you believe the current version says that CLT "precedes" LLN. Regards, JS 22:27, 15 January 2007 (UTC)
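
The step from CLT to the weak LLN described above can be written out as a short sketch. It assumes independent, identically distributed variables with mean \mu and finite variance \sigma^2 > 0 (the finite-variance assumption mentioned earlier in this thread); the symbol Z_n is notation introduced here for illustration only.

```latex
% Standardized sample mean (Z_n is notation introduced for this sketch)
Z_n = \frac{\sqrt{n}\,(\overline{X}_n - \mu)}{\sigma}
\quad\Longrightarrow\quad
\overline{X}_n - \mu = \frac{\sigma}{\sqrt{n}}\, Z_n .

% CLT: Z_n converges in distribution to N(0,1), while \sigma/\sqrt{n} tends to 0.
% By Slutsky's theorem the product converges in distribution to the constant 0,
% and convergence in distribution to a constant implies convergence in probability:
\lim_{n\to\infty}
\operatorname{P}\left( \left| \overline{X}_n - \mu \right| \geq \varepsilon \right) = 0
\quad \text{for every } \varepsilon > 0 .
```

This recovers only the weak law, and only under finite variance. The weak law is known to hold for i.i.d. variables with a finite mean even when the variance is infinite, so this route does not give the LLN in its full generality.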

LLN and likelihood

The article had the sentence "In particular, it permits precise measurement of the likelihood that an estimate is close to the "right" or true number." I removed it as I think this is wrong. "measurement of likelihood" requires the distribution, whereas LLN is only an asymptotic result. For distributions we need the CLT. LLN does not enable "precise measurement of likelihood". JS 22:54, 13 January 2007 (UTC)

An asymptotic result IS a distribution. The LLN provides for an estimate of both the population mean and the population standard deviation or variance. However, to define this in terms of likelihood may indeed require the Central Limit Theorem. So, rather than it permitting a precise measurement of the likelihood, it provides an estimate of the mean and an analysis of the "reasonable" range in which the mean may be found. --Blue Tie 20:39, 14 January 2007 (UTC)
One requires the distribution, and (estimate of) variance to estimate "likelihood". Do you have any reference showing LLN provides these? The reason I ask is that if LLN indeed provided the distribution and variance, then it appears to me that it would provide the results of CLT. JS 21:39, 14 January 2007 (UTC)
As I reflect on it, I do not think you can show likelihood without the CLT. For example, when those assumptions are not used, pollsters use an "error margin of XX %". But what that margin really means is not well defined. The CLT actually defines such things because it uses a distribution to provide a probability. However, the LLN certainly considers variance. Look at the assumptions. It assumes a population with a fixed mean and a fixed variance. Consider: As N increases toward infinity, the sample variance will asymptotically approach the expected variance. If you were to know, in advance, the population variance, you could use that information to provide a measure of closeness, if not exactly probability. I do not think anyone tries to do such analysis in depth because the research has gone in the direction of the CLT, but I believe it would be a natural progression if the CLT were not discovered.
Getting more to your point, I believe that the two fundamental theorems are co-equal and describe different things. The CLT looks at the distribution of sample means -- regardless of sample size -- while the LLN looks at the size of a sample without regard to how sample means are distributed.
But without original research, let's look outside of Wikipedia. If the CLT answered the same question as the LLN then there would be no need for the LLN -- it would not be the 1st Fundamental Theorem of Probability and CLT would be the 2nd. Instead, CLT would be the only one mentioned. It is not. That is because CLT does not describe the behavior of the mean with respect to the population as N approaches infinity. --Blue Tie 23:59, 14 January 2007 (UTC)
Hello Blue, There are multiple confusions in your last post.
1) You wrote "LLN certainly considers variance". Most statistical results require population variance to be finite and LLN is no different. However, I never said that LLN does not consider population variance; what I said was that LLN does not provide the variance for the sample mean. And sample variance is also different from the variance of the distribution of sample means.
2) CLT does not look at the sample mean distribution "regardless of sample size". Look at the proofs of CLT and you will see they require sample size to be "large" (infinite).
3) I do not know how the nomenclature 1st, 2nd etc. arose. Nor can you prove that CLT does not lead to LLN by saying "CLT would be the only one mentioned". These are mathematical issues, and there should be no reason for this kind of ambiguity. JS 07:08, 15 January 2007 (UTC)
You wrote "pollsters use an "error margin of XX %". But what that margin really means is not well defined." That is not correct. Pollsters are pretty good statisticians. They usually use 95% confidence intervals when they give error margins. If you delve into their reports you will find the details, though they may not be reported for the lay public. For example Gallup describing one of its polls says "For results based on this sample, one can say with 95% confidence that the maximum error attributable to sampling and other random effects is ±3 percentage points." [1] -- The preceding unsigned comment was added by Jayanta Sen (talk · contribs) 07:59, 15 January 2007 (UTC).
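
A margin such as "±3 percentage points at 95% confidence" can be reproduced with the usual normal-approximation formula. The sketch below is an illustration under assumptions that are not in the quoted poll: simple random sampling, the worst-case proportion p = 0.5, the standard 95% multiplier z = 1.96, and a sample of 1,000 respondents. The function names are hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest n whose normal-approximation margin of error is at most `margin`."""
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


print(margin_of_error(1000))       # margin for an assumed sample of 1,000
print(required_sample_size(0.03))  # respondents needed for a 3-point margin
```

Neither function takes the population size as an input, which is consistent with the point made elsewhere on this page that the sample does not need to be a large fraction of the population.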
1)LLN does not provide variance for the sample mean but it is a normally contemplated statistic for such samples (separate from either LLN or CLT). I have lost track about why this might be important.
2)The proofs of both LLN and CLT contemplate infinity but is that really required in either case? Doesn't LLN hold true even if a sample has only 2 observations? Doesn't that also hold for CLT?
3)I do not know either, but they exist and are standard. The history would be interesting. Referring to my previous comment and relating it to your current comment.. would you say that Rational Numbers lead to Counting Integers or that Counting Integers lead to Rational Numbers?
4)I agree that I erred in that example --Blue Tie 13:08, 15 January 2007 (UTC)
You wrote "Doesn't LLN hold true even if a sample has only 2 observations? Doesn't that also hold for CLT?" As far as I understand, both LLN and CLT require sample sizes to go to infinity. However these results are applied whenever the statistician believes the sample size is "large" enough.
Variance of the distribution of the sample means is important, because if statistical inferences are to be made it is required. My point was that as LLN doesn't provide the variance (or the distribution) it is not sufficient for estimating likelihood (testing statistical hypotheses). Regards, JS 22:33, 15 January 2007 (UTC)

Gambler's Fallacy etc.

The article had the sentences "However, in an infinite (or very large) set of observations, the value of any one individual observation cannot be predicted based upon past observations. Such predictions are known as the Gambler's Fallacy." These sentences seem to have no connection with LLN and I am removing them. The issues addressed by these sentences are sampling with or without replacement, independence of draws etc. Not central to LLN and confusing to the reader to have them here. JS 23:07, 13 January 2007 (UTC)

Big mistake. The gambler's fallacy is frequently mentioned in connection with misperceptions about the Law of Large Numbers. They should not be removed. --Blue Tie 16:32, 14 January 2007 (UTC)
At the end of a later paragraph which ends with the sentence "For example, the odds that you will win the lottery are very low; however, the odds that someone will win the lottery are quite good, provided that a large enough number of people purchased lottery tickets." the removed sentences make more sense. I am accordingly repositioning. JS 16:37, 14 January 2007 (UTC)
I have explicitly explained the misperception you refer to. JS 17:04, 14 January 2007 (UTC)

Sample mean may never become exactly equal

The current article contains this sentence:

"The strong law states that as the sample size grows larger, the probability that the sample mean and the population mean will be exactly equal approaches 1."

This is not true. Consider a population with values 0 and 1, where 1 has probability 1/sqrt(2). Then the population mean is also 1/sqrt(2), which is an irrational number. The sample means are always rational numbers however. So the probability that the sample mean is exactly equal to the population mean is 0, and it remains 0 if the sample size goes to infinity.

Furthermore, the strong law does not say that some probability approaches 1, it says that a certain probability is exactly 1, namely the probability that the sample mean converges to the population mean.

A better formulation would be

"The strong law states that almost every sample mean will approach the population mean arbitrarily close as the sample size increases. Although one can theoretically conceive samples for which this does not hold (for example, throwing infinitely many fives with a die), the strong law implies that these samples jointly have a probability exactly 0, which means that they are practically impossible."

JulesEllis 23:18, 18 January 2007 (UTC)
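
The rational-versus-irrational argument above can be checked by brute force for small sample sizes. This is only an illustrative sketch with a hypothetical helper name: a sample of n zeros and ones has mean k/n, and k/n equals 1/sqrt(2) exactly only if 2*k*k == n*n, which is tested in integer arithmetic.

```python
def sample_mean_can_equal_target(n):
    """True if some k/n with 0 <= k <= n satisfies (k/n)**2 == 1/2 exactly."""
    return any(2 * k * k == n * n for k in range(n + 1))


# Sample sizes (up to 1000) for which an exact match is possible.
hits = [n for n in range(1, 1001) if sample_mean_can_equal_target(n)]
print(hits)
```

The list stays empty, in line with the irrationality of sqrt(2): the sample mean can come arbitrarily close to 1/sqrt(2) without ever being exactly equal to it.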

The sample mean will approach 1/sqrt(2) as the sample size goes to infinity. There is no problem with this. Your restatement would not be right, because as long as the assumptions hold, the law will hold also... not "almost all". I will try to return this weekend to go further, but not now.--Blue Tie 04:12, 19 January 2007 (UTC)

I believe Jules is right about the current sentence: "exactly equal" has a specific meaning in mathematics, different from "approaches" or "converges". A well known fact is that an irrational number cannot be expressed as the ratio of two integers, and that is what Jules is referring to here. But Jules, shouldn't the better formulation be "samples have a probability 'approaching' or 'converging' to 0", rather than "exactly" zero? JS 08:07, 19 January 2007 (UTC)
It seems to me what I stated in the above post would remove the difference between the weak law and the strong law. As this is "technical" I will remove myself from this discussion. JS 17:19, 19 January 2007 (UTC)

Blue Tie, of course there is no problem with the law, the problem is the formulation in the quoted sentence of the article. My restatement is exactly right, and not in contradiction with the law. The only problem is that you obviously do not know the meaning of the term almost all, which has an exact meaning in measure theory and probability theory (where it is usually rephrased as almost surely). This meaning is that the complement of the event has measure 0, or in the special case of a probability space, probability 0. This implies that the probability of the stated event is 1. Exactly 1. Not merely approaches 1. The complement having probability 0, however, does not mean that the complement is the empty set (= the impossible event). The situation is entirely analogous to the length of a mathematical point on a line. The point has length 0 (exactly 0) even though it exists. Anyhow, many other readers will probably not know the meaning of almost either, so on this point a rephrasing of my formulation is needed. The present formulation in the article is evidently an error though. Believe me, or read a good book like Billingsley, Probability and measure, 1986.

JS, indeed, that would rather be the weak law. JulesEllis 23:29, 19 January 2007 (UTC)

It occurs to me that for non-technical readers I can just delete the word almost in my reformulation. That there are exceptions with probability 0 is explained in the sentence after it. JulesEllis 10:59, 20 January 2007 (UTC)

I just noticed that the sentence before it, about the weak law, is wrong too. The person who wrote this seems to confuse the weak and the strong law! I wonder why someone writes about something he obviously didn't understand. I will change it shortly. JulesEllis 10:19, 22 January 2007 (UTC)

Removed lottery example

I removed the text "For example, the odds that you will win the lottery are very low; however, the odds that someone will win the lottery are quite good, provided that a large enough number of people purchased lottery tickets." I would like to see some cites that this is really a "less technical way to refer". Actually LLN is inappropriate for this example, as LLN speaks of sample means, whereas this is about the probability of one success in a sample of very low probability events. Besides as it says "win the lottery", it violates the independence requirement of LLN. JS 19:58, 3 March 2007 (UTC)

You are missing the point. It is a description of how some people use the term and usually this is their informal way of viewing it. Though informal, it is not technically incorrect. With a large enough number of trials, events with low probability may happen... and this derives from the law of large numbers. Because, as with the roll of the dice, the law of large numbers says that each side will get, on average, very close to its fair share of events given enough rolls. Thus, with enough trials, something with a probability of .000000001 may still be likely to occur at least once or more often. So, though the lottery may pay off with a probability of .0000001, it will happen if enough people buy tickets. --Blue Tie 03:20, 4 March 2007 (UTC)
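
The "at least once" probabilities mentioned in this thread follow from the complement rule for independent trials, 1 - (1 - p)^N. The sketch below is illustrative: the function name is hypothetical, and the ticket count of ten million is an assumed figure, not one taken from the discussion.

```python
import math


def prob_at_least_once(p, trials):
    """Probability that an event of probability p occurs at least once
    in `trials` independent trials, i.e. 1 - (1 - p)**trials.

    log1p/expm1 keep the result accurate when p is very small.
    """
    return -math.expm1(trials * math.log1p(-p))


print(prob_at_least_once(1 / 6, 100))          # a ONE somewhere in 100 die rolls
print(prob_at_least_once(1e-7, 10_000_000))    # assumed 10 million lottery tickets
```

The calculation relies on the trials being independent and says nothing about sample means, which is the distinction at issue between the two comments above.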

Moving CLT

I have a problem with this paragraph:

The central limit theorem (CLT) gives the distribution of sums of identical random variables, regardless of the shape of the distribution of the random variables (as long as the distribution has finite variance), as long as the number of random variables added is large. CLT thus applies to the sample mean of a large sample as the mean is a sum. The variance as given by CLT collapses as the sample size grows larger, it follows that the mean converges to a number (which CLT says is the population mean). This is the LLN. So LLN is a result that can be obtained from the CLT.

CLT allows statisticians to evaluate the reliability of their results because they are able to make assumptions about a sample and extrapolate their results or conclusions to the population from which the sample was derived with a certain degree of confidence. See Statistical hypothesis testing.


Here is part of my problem: The central limit theorum is about the distribution of the sample mean. The law of large numbers, not only says that the probability of all rolls will be the expected mean, but that the probability of each outcome will be exactly 1/6. This is not quite the same thing as the CLT. Furthermore, this paragraph justifies by saying that the variance collapses as the sample size grows, which is not exactly true.

Another part of my problem is that this is an article about the law of large numbers and not about the central limit theorem. The law of large numbers does not depend upon the central limit theorem, and no discussion of the CLT is required to understand the LLN. So, in this article it becomes confusing. It is further confusing in this paragraph because the wording is a bit opaque.

For these reasons I have moved it here.

--Blue Tie 03:57, 4 March 2007 (UTC)

Article has become less useful?

There have been large-scale changes made in one day to this article. In my opinion they have made the article less helpful to a reader wanting to understand the LLN. Specifically, the history of the LLN and the quotes provided are confusing and unhelpful. If other editors agree, then we should revert these changes. Also, the reasons for the creation of the two sections "Probability" and "Statistics" and the distinction between the two are not apparent to me. To make it easier to judge, here is the version of the article prior to the changes: http://en.wikipedia.org/w/index.php?title=Law_of_large_numbers&oldid=112389587 and here is the version after the changes: http://en.wikipedia.org/w/index.php?title=Law_of_large_numbers&oldid=112521145 Regards, JS 12:23, 4 March 2007 (UTC)
I am okay with removing the section about the CLT, as it may or may not aid the reader in understanding the LLN, but the rest of the edits are problematic. JS 13:14, 4 March 2007 (UTC)

I think it is helpful to read what Bernoulli originally said as he discussed the theory. He expressed his views in a way that is helpful for non-mathematicians to understand. When someone can do that, it is useful. Einstein tried to do the same thing with his papers.
The difference between probability and statistics is this: The Law of Large Numbers is expressed in terms of probability. It may not be exactly clear to some readers how it applies to statistics.
I think the structure is not quite right. I thought so last night, but it was late. So I will try to fix it. But my main interest is to make the opening paragraphs more accessible to people who do not care about the mathematical proofs -- people who hear about the Law and want to know briefly what it is, or parents having to deal with their kids' homework. --Blue Tie 13:47, 4 March 2007 (UTC)
Check it now. I still think it would be nice to quote Bernoulli more fully, but this is probably sufficient. I hope it reads better. --Blue Tie 16:50, 4 March 2007 (UTC)

Maybe a silly question

I'm not a mathematician, but I can follow the basic idea in the article (I think!). It explains how the law describes certain behavior. My question, though, is why the behavior occurs in the first place. What is it that maintains the tendency for randomness to "even out" over large samples of events? In other words, why should I expect that in 1,000 tosses the coin would come up roughly 500 times heads and roughly 500 times tails, rather than (e.g.) 999 heads and 1 tail (etc.)? Can anyone answer this? Thanks. 89.100.149.237 11:16, 4 April 2007 (UTC)


It's not a silly question. It gets to the heart of what "probability" is. One of the reasons that this is called the "Law" of large numbers is that it is considered somewhat axiomatic -- it "just is". Perhaps the reasons are rooted in physics. We have a certain number of dimensions in which a coin flip operates. We have gravity. We have mass. But really, it is a thought experiment. There can be only four possible alternatives: 1) The coin turns up heads. 2) The coin turns up tails. 3) The coin lands on its edge. 4) The coin vanishes into thin air when tossed. OK, clearly the last is not reasonable. Coins actually CAN come to rest on their edges and sometimes do, but for this experiment we PRESUME that there are only two final outcomes -- heads or tails -- so even if the coin lands on its edge, it falls one way or the other. No other outcomes are allowed to be thought about. Assuming (this is a thought experiment) that the coin is "fair" -- meaning that the coin is perfectly round and has no weight anomalies that favor landing on one side or the other -- any toss may result in either a head or a tail. Since there are only two possible outcomes and neither outcome is favored by the coin, it will "choose" to fall one way sometimes and the other way other times, in a fashion that is entirely random. On average, this randomness will not favor either side but will be a perfect split between the two choices -- 50%. --Blue Tie 12:23, 4 April 2007 (UTC)

Thanks! I guess what bothers me is the concept of probability in the first place. Since the coin (obviously) doesn't "know" which way to fall in one case -- i.e. the result is entirely random, or let us suppose so for the sake of argument -- then there is no principle at work in the individual case other than randomness. What I don't understand is why randomness multiplied by n equals some kind of pattern. (I mean, I understand that it does -- I just don't understand why.) 89.100.149.237 17:18, 4 April 2007 (UTC)

You are right. It is guided entirely by randomness.
Let's see if we can understand where the confusion is. When the coin comes down, there is absolutely no way to know in advance which side will be up. It could be heads or it could be tails. But it can ONLY be one of those two choices.
First you must recognize that there are only two possible outcomes. Is that a problem?
Next you must recognize that out of these possible outcomes, there is only going to be one result each time - either heads or tails. Do you recognize that?
Third, and maybe this is the hardest one of all, you must recognize that the coin is "fair", that is that it does not "prefer" to fall one way or the other and it does not "remember" how it fell in the past. Each toss is its own toss and heads may turn up just as easily as tails each and every time. There is no magical force that will cause it to suddenly prefer to fall one way or the other.
Finally, since there are 2 possible outcomes but only 1 of these will actually occur, the chance of that one occurrence is the number of things that will actually occur divided by the number that might possibly have occurred. In this case, for example, the chance that the coin will land heads is 1/2 = 50%. Incidentally, the chance that the coin will land tails is also 1/2 = 50%. This means that the chance the coin will land either heads or tails is 1/2 + 1/2 = 1 = 100%; it will always be either heads or tails. (We ignored landing on its edge or disappearing into air.) Is that part clear?
Maybe what you are really asking is "Why does it not work perfectly? Why is it that I can toss a coin 10 times and sometimes it will be 7 heads and 3 tails. Other times it will be 4 heads and 6 tails. Why is it not exactly 5 heads and 5 tails?" I will answer that, but I have to use a smaller example of 4 tosses instead of 10. You will see why in just a moment.

Looking into the future, the chance that you will get a head or a tail is 50% on any throw. So what is the chance that you will get 2 heads and 2 tails out of 4 throws? First, list all the different ways the coin can come up in 4 tosses. Such a list is called an "Outcome Table". Here it is:

Outcome Table - 4 Coin Tosses
(rows marked * have exactly 2 heads and 2 tails)

First   Second  Third   Fourth
H       H       H       H
H       H       H       T
H       H       T       H
H       H       T       T       *
H       T       H       H
H       T       H       T       *
H       T       T       H       *
H       T       T       T
T       H       H       H
T       H       H       T       *
T       H       T       H       *
T       H       T       T
T       T       H       H       *
T       T       H       T
T       T       T       H
T       T       T       T

One thing you notice immediately is how long it is. The number of possible final outcomes is the number of outcomes possible on a single toss (2) raised to the power of the number of tosses, or 2^4. That equals 16. If we had gone with 10 tosses our table would have been 2^10 = 1024 rows long. Way too long. That is why I chose just 4 for the example.
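The table and its row count can be reproduced mechanically. This is a minimal sketch that simply enumerates every sequence of H and T; the function name `outcome_table` is an illustrative choice.

```python
from itertools import product


def outcome_table(n_tosses: int) -> list:
    """List every possible sequence of heads (H) and tails (T) for
    n_tosses coin tosses, in the same order as the table above."""
    return ["".join(row) for row in product("HT", repeat=n_tosses)]


if __name__ == "__main__":
    table = outcome_table(4)
    print(len(table))  # number of rows for 4 tosses
    # Rows with exactly 2 heads (and therefore 2 tails).
    print([row for row in table if row.count("H") == 2])
    print(len(outcome_table(10)))  # number of rows for 10 tosses
```

Enumerating works for small examples like these, but the list doubles with every extra toss, so counting formulas are more practical for larger numbers of tosses.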

If you pick out the rows with exactly 2 heads and 2 tails (HHTT, HTHT, HTTH, THHT, THTH and TTHH), you can see that there are 6 of them out of 16. That means that the chance of tossing the coin 4 times and getting exactly 50-50 heads and tails is only 6/16 = 37.5% - a bit more than 1 in 3. That means we are more likely to get some other combination of heads and tails. You would think that with a fair 50-50 coin you would get exactly 2 heads and 2 tails more often. But you won't. The probabilities of the different head counts form what is called the Binomial Probability Distribution, by the way.
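For readers who want the count without writing out the table, the same number can be obtained from the standard binomial formula. The following is a sketch assuming a fair coin and independent tosses; the symbols n (number of tosses), k (number of heads) and X (the head count) are notation introduced here for illustration.

```latex
\[
P(X = k) = \binom{n}{k} \left(\frac{1}{2}\right)^{n}
\]
\[
n = 4,\ k = 2: \quad P(X = 2) = \frac{\binom{4}{2}}{2^{4}} = \frac{6}{16} = 0.375
\]
\[
n = 10,\ k = 5: \quad P(X = 5) = \frac{\binom{10}{5}}{2^{10}} = \frac{252}{1024} \approx 0.246
\]
```

The binomial coefficient counts the rows of the outcome table with exactly k heads, and 2^n is the total number of rows.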

If we had chosen to look at 10 coin tosses, out of 1024 possible outcomes we would have seen exactly 5 heads and 5 tails 252 times. That would be 252/1024 = about .246 -- not quite one in four.

We can also see the probability of getting 4 heads in a row. (That is the one on the top row.) It is 1 in 16, or about 6%. If we looked at getting 10 heads in a row it would be 1 in 1024 -- less than 1 chance in a thousand.
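The numbers above also hint at an answer to the original "why does it even out?" question: an exact 50-50 split becomes less likely as the number of tosses grows, but a split close to 50-50 becomes more likely. The sketch below counts outcomes exactly, assuming a fair coin and independent tosses; the function name `prob_share_near_half` and the 10-point tolerance are illustrative choices.

```python
from math import comb


def prob_share_near_half(n: int, tol_percent: int) -> float:
    """Exact probability that the share of heads in n fair tosses lies
    within tol_percent percentage points of 50%."""
    # |k/n - 1/2| <= tol_percent/100, rewritten with integers only so that
    # no floating-point rounding affects which head counts are included.
    favourable = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(100 * (2 * k - n)) <= 2 * tol_percent * n
    )
    return favourable / 2**n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, prob_share_near_half(n, 10))
```

Each individual sequence of tosses stays equally likely; what changes is that the overwhelming majority of the 2^n sequences have a head share near one half, so landing on one of them becomes nearly certain as n grows.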

I hope that helps a bit --Blue Tie 20:50, 4 April 2007 (UTC)