Talk:Normal distribution
From Wikipedia, the free encyclopedia
Please add your comments to the end of the article.
Archives
- Pre-2005 (Range: 2002 - 2004)
- 2005 comments with two from 2006 (Range: 2005 - 2006)
Geometric mean and asset returns?
The usual mean is μ, what's the geometric mean? If μ is 1,15 and σ 0,2 the geometric mean seems to be around 1,132 or something? There's a formula for it right? -- JR, 10:29, 8 April 2007 (UTC)
- It doesn't make sense to speak of a geometric mean of a random variable that's not always positive. Michael Hardy 20:26, 8 April 2007 (UTC)
- I took a second look at the book (John Hull, chapt 13). It's assumed that the realized cumulative (geometric) return is ø(μ - σ^2/2, σ/T) over, for example, 100 periods T. So that distribution cannot be used to test the real cumulative return. It seems to assume that the return per period T is distributed as ln x ~ ø(μ - σ^2/2, σ) (from eq 13.2), which is e ^ ø(8%, 20%) when μ = 10% and σ = 20%. The arithmetic mean of that lognormal distribution is 10.517... % and the geometric mean is 8.3287.... % per period T over a large number of periods. So e ^ ø(8%, 20%) is the return that can be tested for a large number of periods. It will give a compounded return of close to 8.3287.... %. So my question is, why would you call an expected cumulative return of 8.3287....% 10%? I'll see if reading the other chapters will clear things up. -- JR, 11:31, 15 April 2007 (UTC)
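The two percentages quoted above can be reproduced with a short sketch. It assumes the per-period log-return is normal with mean μ − σ²/2 and standard deviation σ, as the comment describes; the variable names are illustrative only.

```python
import math

mu = 0.10      # drift per period (value used in the comment above)
sigma = 0.20   # volatility per period (value used in the comment above)

# Mean of the log-return: ln x ~ N(m, sigma^2) with m = mu - sigma^2 / 2
m = mu - sigma**2 / 2

# Arithmetic mean of the lognormal gross return is exp(m + sigma^2 / 2)
arithmetic_return = math.exp(m + sigma**2 / 2) - 1

# Geometric mean of the lognormal gross return is exp(m)
geometric_return = math.exp(m) - 1

print(f"arithmetic: {arithmetic_return:.5%}")
print(f"geometric:  {geometric_return:.5%}")
```

The gap between the two figures comes entirely from the σ²/2 term, so it shrinks as the volatility goes to zero.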
error in the cdf?
You need to be more specific about what exactly you think might be wrong. --MarkSweep✍ 00:19, 8 September 2005 (UTC)
Integrating the normal density function
Can anyone tell me what the integral of (2π)^(-0.5) * e^(-0.5x^2) is? I tried integrating by parts and other methods, but no luck. Can someone help?
- The antiderivative does not have a closed-form expression. The definite integral can be found:
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = 1$$
- See Gaussian function for a derivation of this. Michael Hardy 20:40, 22 May 2006 (UTC)
- I didn't find an explicit derivation in the Gaussian function article, so I created this page: Integration_of_the_normal_density_function. Would it be appropriate to place this link somewhere in the normal distribution article? Mark.Howison 06:32, 1 February 2007 (UTC)
Sorry---it's not in Gaussian function; it's in Gaussian integral. Michael Hardy 21:31, 1 February 2007 (UTC)
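Although the antiderivative has no elementary closed form, the definite integral is easy to check numerically. The following is a minimal sketch, assuming a plain trapezoid rule is accurate enough for this smooth integrand; the helper names `phi`, `trapezoid` and `Phi` are illustrative, not from the article.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def Phi(x):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

total_area = trapezoid(phi, -10.0, 10.0)    # stands in for the whole real line
area_numeric = trapezoid(phi, -1.0, 2.0)    # area between -1 and 2
area_erf = Phi(2.0) - Phi(-1.0)             # same area via the error function

print(total_area, area_numeric, area_erf)
```

In practice the error function plays the role of the missing antiderivative: areas over finite intervals are differences of `Phi` values.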
Gaussian curve estimation
I came to this article looking for a way to approximate the Gaussian curve, and couldn't find it on this page, which is a pity. It would be nice to have a paragraph about the different ways to approximate it. One such way (using polynomials on intervals) is described here: [1] I can write it; any suggestion for where to put it? A top-level paragraph before trivia? --Nicolas1981 15:53, 6 October 2006 (UTC)
- I think it would fit there. Michael Hardy 19:52, 6 October 2006 (UTC)
- I added it. I felt a bit bold because it is very drafty when compared to the rest of the page, but I hope that many people will bring their knowledge and make it an interesting paragraph :-) Nicolas1981 21:37, 6 October 2006 (UTC)
I just noticed that the French article has a good paragraph about a trivial way to approximate it (with steps). There is also this table on wikisource. I have to go out now, but if anyone wants to translate them, please do :-) Nicolas1981 21:54, 20 October 2006 (UTC)
reference in The Economist
Congratulations, guys - the Economist used Wikipedia as the source for a series of pdf graphs (the normal, power-law, Poisson and one other) in an article on Bayesian logic in the latest edition. Good work! --Cruci 14:58, 8 January 2006 (UTC)
Typesetting conventions
Please notice the difference in (1) sizes of parentheses and (2) the dots at the end:
Michael Hardy 23:54, 8 January 2006 (UTC)
Quick compliment
I've taught intro statistics, and I've seen treatments in many textbooks. This is head and shoulders above any other treatment! Really well done guys! Here is the Britannica article just for a point of comparison (2 paragraphs of math with 2 paragraphs of history) jbolden1517Talk 18:54, 5 May 2006 (UTC) <clapping>
- Thank you. (Many people worked on this page; I'm just one of those.) Michael Hardy 22:02, 5 May 2006 (UTC)
Eigenfunction of FFT?
I believe the normal distribution is the eigenfunction of the Fourier transform. Is that correct? If so, should it be added? —Ben FrantzDale 16:57, 26 June 2006 (UTC)
- That was quick. According to Gaussian function, all Gaussian functions with c² = 2 are, so the standard normal, with σ=1, is an eigenfunction of the FFT. --Ben FrantzDale 16:57, 26 June 2006 (UTC)
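A quick numerical check of the claim, as a sketch only. It assumes the continuous Fourier transform with the unitary convention (1/√(2π)) ∫ f(x) e^(−ikx) dx, which is a choice made here rather than something stated in the thread, and it uses a simple trapezoid rule. The function names are illustrative.

```python
import math

def gaussian(x):
    """Unnormalized standard Gaussian, exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x)

def fourier_transform(f, k, half_width=12.0, n=24000):
    """Approximate (1/sqrt(2*pi)) * integral of f(x) * exp(-i*k*x) dx.

    f is assumed even, so the imaginary (sine) part vanishes and only
    the cosine part is integrated, using the trapezoid rule.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(x) * math.cos(k * x)
    return total * h / math.sqrt(2.0 * math.pi)

for k in (0.0, 0.5, 1.0, 2.0):
    print(k, fourier_transform(gaussian, k), gaussian(k))
```

Under this convention the transform of exp(−x²/2) evaluated at k should match exp(−k²/2), i.e. the function is reproduced with eigenvalue 1. Other normalizations of the transform rescale the result.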
q-function
I'm trying to find out what a q-function is, specifically q-orthogonal polynomials. I searched q-function in the search and it came here. I'm guessing this is wrong. --Preceding unsigned comment added by 149.169.52.82 (talk)
Added archives
I added archives. I tried to organize the content so that any comments from 2006 are still on this page. There was one comment from 2006 that I didn't think was worth keeping. It's in the 2005 archive. If you have any questions about how I did the archive, ask me here or on my talk page. — Chris53516 (Talk) 14:51, 7 November 2006 (UTC)
Can you please link the article to the Czech version
Hello, can you please link the article to the Czech version as follows?
I would do it myself but as I see some characters as question marks in the main article I am afraid that I would damage the article by editing it. Thank you. —Dan
- Ok, I did it. Check out how it is done, so you can do it yourself in the future. PAR 10:47, 12 November 2006 (UTC)
Kurtosis Clarity
Is there a way to make clear that the kurtosis is 3 but the excess kurtosis (listed in table) is 0? Some readers may find this confusing, as it isn't explicitly labeled.
- Well, it looks clunky, but I changed it. PAR 01:48, 15 November 2006 (UTC)
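The distinction between the two numbers can be illustrated with a small sketch that integrates the standard normal density numerically. It assumes the usual definitions (kurtosis as the fourth standardized moment, excess kurtosis as kurtosis minus 3); the helper names are illustrative.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def moment(order, half_width=12.0, n=24000):
    """Trapezoid estimate of the integral of x**order * phi(x)."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (x ** order) * phi(x)
    return total * h

variance = moment(2)
kurtosis = moment(4) / variance ** 2   # fourth standardized moment
excess_kurtosis = kurtosis - 3.0       # the quantity listed in the table

print(kurtosis, excess_kurtosis)
```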
huh?-summer
what? PAR 00:42, 14 December 2006 (UTC)
Standard normal distribution
In the section "Standardizing normal random variables" it's noted that "The standard normal distribution has been tabulated, and the other normal distributions are simple transformations of the standard one." Perhaps these simple transformations should be discussed? —The preceding unsigned comment was added by 130.88.85.150 (talk • contribs) 11:36, 4 December 2006 (UTC).
- They are discussed in the article, just above the sentence that you quote. Michael Hardy 21:58, 4 December 2006 (UTC)
- I reworded the section slightly to make that clearer. --Coppertwig 04:35, 5 December 2006 (UTC)
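For readers wondering what the "simple transformation" looks like in practice, here is a minimal sketch using Python's `statistics.NormalDist`. The parameter values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical example parameters
x = 130.0

# Standardize: z = (x - mu) / sigma
z = (x - mu) / sigma

standard = NormalDist(0.0, 1.0)
general = NormalDist(mu, sigma)

p_via_table = standard.cdf(z)   # what a standard normal table lookup would give
p_direct = general.cdf(x)       # the same probability computed directly

print(z, p_via_table, p_direct)
```

Both probabilities agree, which is why a single table for the standard normal distribution is enough for every choice of μ and σ.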
Jonnas Mahoney?
Um... This is my first time commenting on anything on Wiki. There seems to be an error in the article, although I'm not certain. Jonnas Mahoney... should really be Johann Carl Friedrich Gauss? Who's Jonnas Mahoney? :S
Edit: lol. fixed. that was absolutely amazing.
-- —The preceding unsigned comment was added by Virux (talk • contribs).
PDF function
I believe there is an error in the pdf function listed, it is missing a -(1/2) in the exponent of the exp!!! —The preceding unsigned comment was added by 24.47.176.251 (talk) 19:02, 11 December 2006 (UTC).
- Well, scanning the article I find the first mention of the pdf, and clearly the factor of −1/2 is there, where it belongs:
- The probability density function of the normal distribution with mean μ and variance σ² (equivalently, standard deviation σ) is a Gaussian function,
$$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sigma}\,\varphi\left(\frac{x-\mu}{\sigma}\right),$$
- where
$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$
- is the density function of the "standard" normal distribution, i.e., the normal distribution with μ = 0 and σ = 1.
- Similarly, I find the factor of −1/2 in all the other places in the article where the density is given. Did I miss one? If so, please be specific as to where it is found. Michael Hardy 21:26, 11 December 2006 (UTC)
One thing about the PDF: I was for a moment under the mistaken impression that the PDF can't go higher than 1. This mistaken impression was supported by the fact that the graphs have y values < 1. However, I believe it can go arbitrarily high (max $1/(\sigma\sqrt{2\pi})$, where σ can be arbitrarily small). I wonder if someone could produce a graph with y values higher than one, just for illustration. dfrankow (talk) 22:36, 1 March 2008 (UTC)
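The point about the density exceeding 1 is easy to confirm by evaluating the peak height for a few values of σ. This is a small sketch; the chosen σ values are arbitrary.

```python
import math

def peak_density(sigma):
    """Height of the normal pdf at its mean: 1 / (sigma * sqrt(2*pi))."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

for sigma in (1.0, 0.5, 0.1, 0.01):
    print(f"sigma={sigma:<5} peak={peak_density(sigma):.4f}")
```

The peak passes 1 as soon as σ drops below 1/√(2π), roughly 0.4, while the total area under the curve stays equal to 1.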
Definition of density function
I know I'm probably being somewhat picky, but here goes: In the section "Characterization of the Normal Distribution," we find the sentence:
The most visual is the probability density function (plot at the top), which represents how likely each value of the random variable is.
This statement isn't technically accurate. Since a (real-valued) Gaussian random variable can take on any number on the real line, the probability of any particular number occurring is always zero. Instead, the PDF tells us the probability of the random variable taking on a value inside some region: if we integrate the pdf over the region, we get the probability that the random variable will take on a number in that region. I know that the pdf gives a sort of visual intuition for how likely a particular realization is, so I don't want to just axe the sentence, but maybe we can find a way to be precise about this while avoiding an overly pedantic discussion like the one I've just given? Mateoee 19:46, 12 December 2006 (UTC)
- I took a try at it, staying away from calculus. It's still not correct, but it's closer to the truth. PAR 23:50, 12 December 2006 (UTC)
- I think I found a way to be precise without getting stuck in details or terminology. What do you think? Mateoee 03:19, 14 December 2006 (UTC)
- Well, it's correct, but to a newcomer, I think it's less informative. It's a tough thing to write. PAR 03:41, 14 December 2006 (UTC)
The new version seems a bit vague. But I don't think this article is the right place to explain the nature of PDFs. It should just link to the article about those. Michael Hardy 17:05, 14 December 2006 (UTC)
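The distinction raised in this thread, between the density at a point and the probability of an interval, can be made concrete with a short sketch. It uses `statistics.NormalDist`; the point `x0` and the interval widths are arbitrary illustrative choices.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)
x0 = 0.5

for width in (1.0, 0.1, 0.001):
    a, b = x0 - width / 2, x0 + width / 2
    prob = X.cdf(b) - X.cdf(a)     # P(a < X < b), the integral of the pdf
    approx = X.pdf(x0) * width     # density times interval width
    print(f"width={width:<6} P={prob:.6f}  pdf*width={approx:.6f}")
```

As the interval shrinks, the probability goes to zero while the ratio of probability to width approaches the density, which is the sense in which the pdf indicates "how likely" values near a point are.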
Summary too high depth
I linked this article for a friend because they didn't know what a normal distribution was. However, the summary lacked a brief English-language notion of what one is. The summary is confusing for people who haven't had some statistics. If there's not immense negative reaction to altering the summary, I'll do that tomorrow. i kan reed 23:08, 2 January 2007 (UTC)
weird way of generating gaussian
Does anyone know why the following method works?
Generate n random numbers so that n>=3
Add the results together
Repeat many times
Create a histogram of the sums. The histogram will be a "gaussian" distribution centered at n/2. I put "gaussian" in quotes because clearly the distribution will not go from negative infinity to infinity, but will rather go from 0 to n. It sounds bogus, but it really works! I really wish I knew why though. --uhvpirate 23:04, 16 January 2007 (UTC)
- The article titled central limit theorem treats that phenomenon. Michael Hardy 01:34, 17 January 2007 (UTC)
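The recipe described above can be tried directly. This is a minimal simulation sketch, assuming each "random number" is uniform on [0, 1); the choice n = 12, the seed and the sample size are arbitrary.

```python
import random
import statistics

def sum_of_uniforms(n, rng):
    """Sum of n independent draws, each uniform on [0, 1)."""
    return sum(rng.random() for _ in range(n))

rng = random.Random(12345)   # fixed seed so the run is repeatable
n = 12
sums = [sum_of_uniforms(n, rng) for _ in range(50_000)]

sample_mean = statistics.fmean(sums)
sample_var = statistics.pvariance(sums)

print(f"mean     {sample_mean:.3f}  (theory: n/2  = {n / 2})")
print(f"variance {sample_var:.3f}  (theory: n/12 = {n / 12})")
```

A histogram of `sums` is bell-shaped around n/2, but every value lies between 0 and n, which matches the caveat about the quotes around "gaussian".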
lattice distribution
Can someone add a link about lattice distribution? Of course, and add an article about lattice distribution. Jackzhp 23:40, 7 February 2007 (UTC)
Open/closed interval notation
In this sentence: "' uniformly distributed on (0, 1], (e.g. the output from a random number generator)" I suspect the user who called this a "typo" and changed it to "[0, 1]" (matching square brackets) didn't understand the notation. "(0, 1]" means an interval that includes 1 but does not include 0. "[0, 1]" includes both 0 and 1. Each of these intervals also includes all the real numbers between 0 and 1. It's a standard mathematical notation. Maybe we need to put a link to a page on mathematical notation? --Coppertwig 13:10, 13 February 2007 (UTC)
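One reason the half-open interval can matter in code: many generators return values in [0, 1), and a method that takes a logarithm of the uniform value (the Box–Muller transform is assumed here to be the context of the quoted sentence) needs to exclude 0. A hedged sketch using Python's `random.random()`, which is documented to return a float in [0.0, 1.0):

```python
import math
import random

rng = random.Random(0)

def uniform_open_closed(rng):
    """Draw from (0, 1]: random() is in [0, 1), so 1 - random() is in (0, 1]."""
    return 1.0 - rng.random()

samples = [uniform_open_closed(rng) for _ in range(100_000)]

# Every sample is strictly positive, so the logarithm is always finite.
logs = [math.log(u) for u in samples]

print(min(samples), max(samples))
```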
sum-of-uniforms approximation
The sum-of-uniforms approximate scheme for generating normal variates cited in the last section of the article is probably fine for small sets (<10,000), but the statement about it being 12th order is misleading. The moments begin to diverge at the 4th order. Also, note that this scheme samples a distribution with compact support (-6,6); so it is ill-advised for any application that depends on accurate estimation of the mass of extreme outcomes. JADodson 18:58, 15 February 2007 (UTC)
- Applications that depend on accurate estimation of the mass of extreme outcomes are rare, and they are rarely exactly normal, because the normal distribution is often used as an approximation to some nonnormal distribution, such as a gamma or beta or poisson or binomial or hypergeometric distribution. So an unsophisticated method is called for, such as the sum of uniforms. Bo Jacoby 16:01, 9 April 2007 (UTC).
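The remark that the moments diverge at the 4th order can be checked exactly. The sketch below assumes the scheme in question is the sum of 12 independent Uniform(0, 1) variables (shifted by 6, which does not affect central moments) and uses the standard formula for the fourth central moment of a sum of independent, identically distributed variables.

```python
from fractions import Fraction

n = 12

# Central moments of a single Uniform(0, 1) variable
var_u = Fraction(1, 12)    # second central moment
mu4_u = Fraction(1, 80)    # fourth central moment

# For a sum of n independent copies:
#   variance           = n * var_u
#   4th central moment = n * mu4_u + 3 * n * (n - 1) * var_u**2
var_sum = n * var_u
mu4_sum = n * mu4_u + 3 * n * (n - 1) * var_u**2

kurtosis = mu4_sum / var_sum**2

print(var_sum, mu4_sum, kurtosis)
```

The variance matches the standard normal exactly, but the kurtosis comes out slightly below the normal value of 3, so the tails are lighter than normal, in line with the compact-support caveat.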
Complex Gaussian Process
Consider a complex Gaussian random variable,
where x and y are real Gaussian variables, with equal variances σ_r = σ_x = σ_y. The pdf of the joint variables will be,
since , the resulting PDF for the complex Gaussian variable is,
Parameters
In the article as it stands, the distribution has parameters μ and σ², whereas the distribution function has parameters μ and σ (in addition to its argument, x). I have found sources corroborating this choice, but it seems odd. I am aware that wikipedia should report on the state of affairs, not try to repair it. But if some sources could be found that use either σ in both cases, or σ² in both cases, we might do the same, and just indicate briefly that other sources do it differently. Any comments?--Niels Ø (noe) 12:06, 30 April 2007 (UTC)
- I've changed it: they now all say μ and σ² (I think the comment by user:209.244.152.96 misses the point). Michael Hardy 19:56, 27 August 2007 (UTC)
This is an unnecessary discussion. The two parameters are μ and σ². It just so happens that in the pdf the square root of σ² appears. And since no one would write a non-simplified item into a pdf function, they wrote it as σ. It does not mean there is any discrepancy in the statement of the parameters.
If you need further explanation think of it like this. Whether you use σ or σ² as the parameter will essentially NOT change the pdf at all!
Remember that in a given normal distribution σ has some specified decimal value. If you use σ then that value will simply remain unchanged in the overall denominator and then squared in the exponent of e; if you use σ² then that value's square root will be taken in the denominator and it will remain unchanged in the exponent of e. But in either case when you write the generalized function the denominator will always have σ and the exponent of e will always have σ², regardless of which one you choose to put in the statement of parameters. It does not matter and YOU CANNOT use BOTH at once in the statement of parameters. In addition, x is not a parameter, it is the representation of specific decimal values for the normally distributed random variable X. --Preceding unsigned comment added by 209.244.152.96 (talk) 18:53, August 27, 2007 (UTC)
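The claim that the choice of parameter does not change the density can be shown with two small functions, one taking σ and one taking σ². This is a sketch with hypothetical example values; the function names are illustrative.

```python
import math

def pdf_from_sigma(x, mu, sigma):
    """Normal density parameterized by the standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def pdf_from_variance(x, mu, variance):
    """The same density parameterized by the variance sigma^2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

mu, sigma = 1.0, 2.0   # hypothetical example values
x = 0.3

a = pdf_from_sigma(x, mu, sigma)
b = pdf_from_variance(x, mu, sigma ** 2)

print(a, b)
```

The two functions describe the same family of curves; what differs is only which number the caller is expected to pass, which is why a consistent convention in the article still helps readers.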
A totally useless article for the majority of people
I consider myself a pretty smart guy. I have a career in IT management, a degree, and 3 technical certifications. Granted, I am certainly not brilliant, nor am I an expert in statistics. However, I was interested in learning about the normal curve. I have only a fair understanding of standard deviation (compared with the average person who has no idea what SD is) but wanted to really "get it" and wanted to know why the normal curve is so fundamental. Basically, I wanted to learn. So I googled "normal curve". As always, Wiki comes up first. But sadly, as (not always, but usually), the article is hardly coherent. This article to me was written by the PhD for the PhD. It is not conducive to learning...it is conducive to impressing. It reminds me of a graduate student trying to impress a professor "look Dr. Stat, look at my super complex work". This article has defeated the purpose of wiki to me, which is to educate people. Now I will go back to Google and search for another article on the normal curve that was written for the average person who wants to learn, rather than the stat grad. Wiki is chronic for this. Either articles are meant as a politically biased rant (so much bias here), or written for a "niche" community (like this article). But so few of them are actually written to introduce, explain, and heighten learning. I read 2 paragraphs of this article, and that was more than enough. You might think I'm just too stupid to understand, and that's fine. But when I make contributions to articles that are about internet protocols and networking, I make sure that the layperson is kept in mind. This was not done here. What is so hard...seriously...about just introducing a topic and providing a nice explanation for people who do not have statistical degrees? --The preceding unsigned comment was added by 24.18.108.5 (talk) 19:57, 1 May 2007 (UTC).
- I think it was written to be understood by people who do not already know what the normal distribution is, and it succeeds in being comprehensible to mathematicians who don't know what the normal distribution is, and also to anyone who's had undergraduate mathematics and does not know what the normal distribution is.
- Granted, some material at the beginning could be made comprehensible to a broader audience, but why do those who write complaints like this always go so very far beyond what they can reasonably complain about, saying that it can be understood only by PhD's or only by people who already know the material?
- And why do they always make these abusive suggestions about the motives of the authors of the article, saying it was intended to IMPRESS people, when an intention to impress people would (as in the present case) clearly have been written so differently?
- I am happy to listen to suggestions that articles should be made comprehensible to a broader audience, if those suggestions are written politely (instead) and stick to that topic instead of these condescending abusive paranoid rants about the motives of the authors. Michael Hardy 01:00, 2 May 2007 (UTC)
- Another reply.
- I agree with you that many mathematics articles do not do a good enough job of keeping things simple. Sometimes I even think that people go out of their way to make things complicated. So, I empathize with you.
- My advice to you is that after you do your research, it would really be awesome if you came here and shared with us some paragraphs that really made you "get it". The best person to improve an article that "is written by PhDs" is you! One thing to keep in mind though is that an encyclopaedia has to function as a reference first and foremost. It's not really a tutorial, which is what you're looking for. Maybe in a few years the wikibooks on statistics will be better developed. As a reference, I think this page works well. (For example, suppose that you want to add two normal distributions, then the formula is right there for you.)
- Perhaps if you're struggling with the introduction, it occurred to me that you might not know what a probability distribution is in the first place. You might want to go to probability theory or probability distribution to get the basics first. One of the nice things about wikipedia is that information is separated into pages, but it means that you have to click around to familiarize with the background as it's not included in the main articles. MisterSheik 01:15, 2 May 2007 (UTC)
MisterSheik, do you have ANY evidence for your suspicion that anyone has ever gone out of their way to make things complicated? Can you point to ONE instance?
I've seen complaints like this on talk pages before. Often they say something to the general effect that:
- The article ought to be written in such a way as to be comprehensible to high-school students and is written in such a way that only those who've had advanced undergraduate courses can understand it.
Often they are right to say that. And in most cases I'd sympathize if they stopped there. But all too often they don't stop there and they go on to attribute evil motives to the authors of the article. They say:
- The article is written to be understood ONLY by those who ALREADY know the materials;
- The authors are just trying to IMPRESS people with what they know rather than to communicate.
Should I continue to sympathize when they say things like that? Can't they suggest improvements in the article, or even say there are vast amounts of material missing from the article that should be there in order to broaden the potential audience, without ALSO saying the reason those improvements haven't been made already is that those who have contributed to the article must have evil motives? Michael Hardy 01:51, 2 May 2007 (UTC)
Hi Michael. I think that the user's complaint was definitely worded rudely, and so I understand your indignation. It's not like he's paying for some service, but he's looking for information and then complaining that it isn't tailored for him. So, rudeness aside.
I'm going to go through some pages, and you can tell me what you think. (Apologies in advance to the contributors of this work.) Look at this version of mixture model: [2]. Two meanings? They're the same meaning.
But, what about this? [3] versus now pointwise mutual information.
There's a lot of this wordiness going on as well: [4], [5], [6], [7], [8], [9] and [10].
And equations for their own sake: [11] and [12] (looks like useful information at first, but it's just an expansion of conditional entropy.)
Maybe all of the examples aren't perfect, but some are indefensible.
I like to see things explained succinctly, but making the material instructional instead of making it function as a good reference is a bad idea, I think. And that's one of the things I told the person: find the wikibook.
But I still haven't answered your point about
- The article is written to be understood ONLY by those who ALREADY know the materials;
- The authors are just trying to IMPRESS people with what they know rather than to communicate.
Maybe it's not happening intentionally, or even consciously, but how do people produce some of the examples above without first snapping into some kind of mode where they are trying to speak "like a professor does"?
MisterSheik 03:33, 2 May 2007 (UTC)
- I'm afraid I don't understand your point. You've shown examples of articles that are either incomplete or in some cases inefficiently expressed, but how is any of this even the least bit relevant to the questions you were addressing? I said I'd seen it claimed that some articles are written to be understood only by those who already know the material; you have not cited anything that looks like an example. I said I'd seen it claimed that some articles were written as if the author was trying to impress someone. Your examples don't look like that. You say "maybe it's not happening intentionally", but you seem to act as if the articles you cite are places where it's happening. I don't see it. What in the world do you mean by speaking like a professor, unless that means speaking in a way intended to convey information? Are you suggesting that professors typically speak in a manner intended simply to impress people? Or that professors speak in a manner that communicates only to those who already know the material? Maybe you can mention some such cases, but you're actually acting as if that's typical.
- Could you please try to answer the questions I asked? Do you know any cases of Wikipedia articles where the author deliberately tried to make things complicated? You said you did. Can you cite ONE? Michael Hardy 21:57, 3 May 2007 (UTC)
- PS: In mixture models: No, they're not the same thing. Both involve "mixtures", i.e. weighted averages, but they're not the same thing. Michael Hardy 21:57, 3 May 2007 (UTC)
Hi Michael, it's fine to say that these ideas are inefficiently expressed, but why are they inefficiently expressed? I think it's because writers are subconsciously aiming to make things difficult in order to achieve a certain tone: the one that they associate with "a professor". In other words, I think that people are imagining a target tone rather than directly trying to convey information succinctly. ps they are both examples of a "mixture model", which has one definition ;) MisterSheik 23:00, 3 May 2007 (UTC)
- Well I think it's because they just haven't worked on the article enough. If you're going to make claims about their subconscious motivations, you have a heavy burden of proof, and you haven't carried it, so I'm not convinced, to say the least. Are you going to make assertions about what you believe, or are you going to try to convince me? And is that relevant to this article? Is there anything in this article that looks as if someone's trying to make things difficult for the reader, consciously or otherwise? It looks as if it's not written for an audience of intelligent high-school students, and possibly that could be changed with more work, but it is written for mathematicians and others who don't know what the normal distribution is. And you speak of what they associate with "a professor". You know what you associate with a professor; how would you know what others associate with a professor? The simple fact is, it's harder to write for high-school students than for professionals. Don't you know that? It takes more work, and the additional work has not been done, yet. Are you saying people did not do that additional work because they're trying (subconsciously, maybe?) to make things difficult for the reader? What makes you think that? Be specific. When people try to feign sounding like a professor, they typically misuse words in ways that look stupid to those who actually know the material. "An angry Martin Luther nailed 95 theocrats to a church door." That sort of thing. Using words in the wrong way and unintentionally sounding childish. That's not happening in this article. It's also not happening in the ones you cited. Some parts of those are clumsily written; some parts are hard to understand because there's not enough explanation there. This article is generally well-written, and that would be impossible if someone were trying to fake sounding like a professor.
- You're shooting your mouth off a lot, telling us about people's subconscious motivations, as if we're supposed to think you know about those, and it's really not proper to do that unless you're going to at least attempt to give us some reason to think you're right about this. Michael Hardy 23:45, 3 May 2007 (UTC)
Whoa. I'm not "shooting my mouth off". I made it really clear that it was my impression that sometimes I think that authors make things difficult to understand. How is that "improper". I'm just sharing my opinions about the motivations of authors unknown. No one is attacking you. I don't have a "heavy burden of proof", because they're just my opinions and you're entitled to disagree. I showed you some examples of what convinced me and asked you what you thought. Ask yourself if you're getting a bit too worked up over nothing here?
(On the other hand, when you use rhetoric like "Don't you know that?", I can't see that you're kidding, and so it sounds like you are shooting your mouth off.)
Regarding this article, I think it's fine. I guess the "overview" section could be renamed "importance" since it's not an overview at all. And, the material could be reorganized a little bit since occurrence and importance have similar information, but maybe not.
You make a really good point about people feigning sounding like a professor, and we have both seen that kind of thing. That's not what I meant though. I was trying to get at professionals or academics who know the material going out of their way to word things awkwardly. Let's take one example: "A typical examplar is the following:" Are we supposed to believe that someone actually uses that kind of language day-to-day? Someone is trying to impress the reader with his vocabulary, or achieve an air of formality, or what? Whatever it is, it's bad writing that, due to its unnaturalness, seems intentional (to me). I'm not saying someone is intentionally trying to trip up the reader. I'm saying that someone is trying to achieve something other than inform the reader in the most succinct way. I was trying to illustrate with my examples "undue care" for the presentation of information. MisterSheik 00:12, 4 May 2007 (UTC)
- I didn't think you were attacking me, but I did think you were asking me to believe something far-fetched without giving reasons. If you're talking about wordiness, I think it often takes longer to express things more simply. Michael Hardy 00:21, 4 May 2007 (UTC)
Error in Standard Deviation section?
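Hi, I think there's a slight error in the "Standard Deviation" section of this article. That is, the article says that the area underneath the curve from −nσ to nσ is:
$$\operatorname{erf}\left(\frac{n}{\sqrt{2}}\right)$$
However, if $\operatorname{erf}$ is defined as:
$$\operatorname{erf}(x) = 2\,\Phi\left(\frac{x}{\sqrt{2}}\right) - 1$$
Then
$$\operatorname{erf}\left(\frac{n}{\sqrt{2}}\right) \approx 0.383,\ 0.683,\ 0.866 \quad (n = 1, 2, 3)$$
Which is incorrect. However,
$$\operatorname{erf}\left(n\sqrt{2}\right) \approx 0.683,\ 0.954,\ 0.997 \quad (n = 1, 2, 3)$$
is correct. So, I think that the area underneath the curve in the article should be:
$$\operatorname{erf}\left(n\sqrt{2}\right)$$
Here's the R code that shows this: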
> erf <- function(x) 2 * pnorm(x / sqrt(2)) - 1
> erf(c(1,2,3)/(sqrt(2)))
[1] 0.3829249 0.6826895 0.8663856
> erf(c(1,2,3)*(sqrt(2)))
[1] 0.6826895 0.9544997 0.9973002
Thoughts? -- Joebeone (Talk) 18:19, 18 May 2007 (UTC)
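- I was wrong. I had the wrong formula written down for the relationship between R's pnorm() and $\operatorname{erf}$.
- Here's a quick justification... From the definition of $\operatorname{erf}$ (See: Error function),
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$$
- Now, the normal distribution function (pnorm() in R) is
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du$$
- so ($\Phi$ is the cumulative normal distribution function[13]):
$$\Phi(x) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_0^{x} e^{-u^2/2}\,du$$
- Now substitute
$$u = \sqrt{2}\,t$$
- so
$$\Phi(x) = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_0^{x/\sqrt{2}} e^{-t^2}\,dt = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)$$
- or
$$\operatorname{erf}(x) = 2\,\Phi\left(x\sqrt{2}\right) - 1$$
- Now, using the definition in the article for the area underneath the normal distribution from −nσ to nσ:
$$\operatorname{erf}\left(\frac{n}{\sqrt{2}}\right)$$
- we calculate
$$\operatorname{erf}\left(\frac{n}{\sqrt{2}}\right), \quad n = 1, 2, 3$$
- using the following R code: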
> erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
> erf(c(1,2,3)/(sqrt(2)))
[1] 0.6826895 0.9544997 0.9973002
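For readers without R, the same check can be sketched in Python. Here `statistics.NormalDist().cdf` stands in for R's pnorm(), and `math.erf` is the library error function; the name `erf_via_cdf` is illustrative.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal CDF, the analogue of R's pnorm()

def erf_via_cdf(x):
    """erf(x) = 2 * Phi(x * sqrt(2)) - 1."""
    return 2.0 * Phi(x * math.sqrt(2.0)) - 1.0

# Area under the normal curve within n standard deviations of the mean
areas = [math.erf(n / math.sqrt(2.0)) for n in (1, 2, 3)]
print([round(a, 7) for a in areas])
```

The printed values should agree with the R session above, and `erf_via_cdf` agrees with `math.erf`, which supports the corrected relationship erf(x) = 2Φ(x√2) − 1.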
photon counts
Photon counts do not have a Gaussian (normal) distribution. Photon generation is a random process that can be approximated with the Poisson distribution (counting statistics). —Preceding unsigned comment added by 129.128.54.121 (talk)
...and of course the Poisson distribution can be approximated by the normal distribution. Michael Hardy 02:14, 26 July 2007 (UTC)
...which means it isn't a good example of the normal distribution showing up in nature. MisterSheik 07:27, 26 July 2007 (UTC)
...Well, the normal distribution never shows up in nature, does it? But the law of large numbers implies that the normal distribution is a good approximation in many cases, including this one - at least assuming that the count is large. I suppose the argument gets complicated if you take into account dead-time in the counter and what not, but all the same, I think it's a fine example.--Niels Ø (noe) 08:58, 26 July 2007 (UTC)
There are physical effects that are the sum of many small errors, which are normally distributed, e.g., noise. These are better examples. MisterSheik 09:02, 26 July 2007 (UTC)
- The probability distribution function of a normal random variable is mathematically scary, and so it seems to be an advanced concept. However, there are easier and better ways to describe random variables. See cumulant. The derivative of the cumulant generating function, g '(t) , is a nice description of a random variable. The photon count is described by the poisson distribution for which g '(t) = μ·et = μ+μ·t+μ·t2/2+... I you truncate the series to just one term you find g '(t) ~ μ, which describes a constant. This approximation is appropriate for bright light where the quantum fluctuation of light intensity is neglected. Include one more term to get g '(t) ~ μ+μ·t. This describes a normal distribution having mean value = μ and variance = μ. This approximation is appropriate for dim light where the fluctuation of light intensity is important, but where the granularity of photons can be neglected. If photons are counted one by one, then these approximations are insufficient and the poisson distribution is used. So the normal distribution is the two-term approximation of any random variable with well defined variance. Bo Jacoby 09:05, 26 July 2007 (UTC).
- Cool. It would be good to expand the section titled "photon counting" so that this is clear. I'm not sure, but it seems that the normal distribution crops up here not because of the central limit theorem, but because, as you said "the normal distribution is the two-term approximation of any random variable with well defined variance." If that's a different reason, then it should be in a different paragraph, I think. Thanks for clarifying this by the way. MisterSheik 09:12, 26 July 2007 (UTC)
- Thank you sir. The use of cumulant generating functions is not as common as it deserves, probably for historical reasons. The central limit theorem is sophisticated when expressed in the language of probability distribution functions, but straightforward when expressed in terms of cumulant generating functions. In the article Multiset#Cumulant generating function the central limit theorem is derived based on cumulant generating functions. A finite multiset of real numbers is an important special case of a random variable, and it is much easier to understand than the general case, so I prefer to study finite multisets before I proceed to general random variables. The important concept of a constant, g '(t) ~ μ, is described in Degenerate distribution. It is a random variable even if it is neither random nor variable. Bo Jacoby 13:23, 26 July 2007 (UTC).
- I'm not sure I follow everything, but I'll give summarizing a shot: a) because of the central limit theorem, processes that are the sums of a lot of small errors are normally distributed, and b) because of the central limit theorem, processes that have well-defined variances are nearly normally distributed, which includes processes that are better-modeled by other distributions. My wording may be imprecise, but this is the gist, right? I think we should have two paragraphs. Noise is in the first paragraph, and photon counting in the second. MisterSheik 04:34, 27 July 2007 (UTC)
- a) Yes. b) No. The random variable of playing heads or tails is represented by the multiset {0,1}. It is the simplest case of the Bernoulli distribution, with p=1/2. It has mean value 1/2 and standard deviation 1/2, and the variance, being the square of the standard deviation, is 1/4. The derivative of the cumulant generating function is g '(t) = 1/2 + t/4 + terms of higher order. (Actually g '(t) = (e^(−t) + 1)^(−1), see Cumulant#Cumulants of particular probability distributions). If you play it with hundredfold stake, the shape of the distribution function is unchanged and the derivative of the cumulant generating function becomes g '(t) = 50 + 2500·t + terms of higher order. However, if you rather play the game a hundred times, the distribution function becomes bell-shaped, and the derivative of the cumulant generating function becomes g '(t) = 50 + 25·t + insignificant terms of higher order. Even if the distribution of heads-or-tails is not at all normal, it acts in the same way as a normal distribution when played many times, because only the low-order terms in the cumulant generating functions matter. Bo Jacoby 11:41, 27 July 2007 (UTC).
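The numbers in the coin example above (mean and variance for one flip, for a hundredfold stake, and for a hundred plays) can be checked by exact arithmetic. This is a minimal sketch; the helper `mean_var` and the three dictionaries are illustrative names only, not anything taken from the article.

```python
from math import comb

def mean_var(pmf):
    """Return (mean, variance) of a finite distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

# One fair flip: the multiset {0, 1}.
single = {0: 0.5, 1: 0.5}

# Same game with hundredfold stake: the multiset {0, 100}.
staked = {0: 0.5, 100: 0.5}

# Number of heads in a hundred independent fair flips (binomial, p = 1/2).
repeated = {k: comb(100, k) / 2 ** 100 for k in range(101)}

for name, pmf in [("single", single), ("staked", staked), ("repeated", repeated)]:
    print(name, mean_var(pmf))
```

The first two cumulants (mean and variance) are the constant and linear coefficients of g '(t), so the staked and repeated games share the mean 50 but differ in the coefficient of t.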
It does indeed follow from the central limit theorem that the Poisson distribution is approximately normal when its expected value is large. Michael Hardy 16:33, 27 July 2007 (UTC). Yes, I agree. Bo Jacoby 10:12, 28 July 2007 (UTC).
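As a rough numerical illustration of this point, the sketch below compares the Poisson probability mass function with the density of a normal distribution having the same mean and variance. The function names and the choice of μ = 4 and μ = 100 are assumptions made for the example, and the comparison range 0..3μ is arbitrary.

```python
from math import exp, lgamma, log, pi, sqrt

def poisson_pmf(k, mu):
    # Work in log space so that large k does not overflow.
    return exp(k * log(mu) - mu - lgamma(k + 1))

def normal_pdf(x, mean, var):
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def max_gap(mu):
    # Largest pointwise gap between Poisson(mu) and N(mu, mu) over k = 0 .. 3*mu.
    return max(
        abs(poisson_pmf(k, mu) - normal_pdf(k, mu, mu))
        for k in range(0, 3 * mu + 1)
    )

gap_small = max_gap(4)    # small expected value: poorer fit
gap_large = max_gap(100)  # large expected value: closer fit
print(gap_small, gap_large)
```

The gap shrinks as the expected value grows, which is the behaviour the comment describes; the sketch says nothing about how good the approximation is in the tails.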
Hi,
I think there is an error in the section Properties. It claims that if X and Y are independent normal variables, then U=X+Y and V=X-Y are independent. However, though this holds for STANDARD normal X, Y, it does not hold generally.
Cov(X+Y,X-Y)=Var(X)-Var(Y)
and hence if Var(X) differs from Var(Y) then U and V are not independent.
Could you please correct it? Based on this incorrect information, I got inconsistent results in my computations and it took me half a day to find the source of the error. —Preceding unsigned comment added by 213.151.83.161 (talk) 08:18, August 26, 2007 (UTC)
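The identity Cov(X+Y, X-Y) = Var(X) - Var(Y) quoted above can be checked with a small simulation. This is only a sketch: the function names, the seed, the sample size and the standard deviations (1 and 1, then 2 and 1) are assumptions chosen for illustration.

```python
import random

def sample_cov(us, vs):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / (n - 1)

def cov_sum_diff(sd_x, sd_y, n=100_000, seed=0):
    """Estimate Cov(X+Y, X-Y) for independent zero-mean normal X and Y."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sd_x) for _ in range(n)]
    ys = [rng.gauss(0.0, sd_y) for _ in range(n)]
    us = [x + y for x, y in zip(xs, ys)]
    vs = [x - y for x, y in zip(xs, ys)]
    return sample_cov(us, vs)

cov_equal = cov_sum_diff(1.0, 1.0)    # Var(X) = Var(Y): expect a value near 0
cov_unequal = cov_sum_diff(2.0, 1.0)  # Var(X) - Var(Y) = 4 - 1: expect a value near 3
print(cov_equal, cov_unequal)
```

A nonzero covariance rules out independence, so U and V can only be independent when the two variances are equal; standard normal X and Y are one such case.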
[edit] Mandelbrot
There are critiques of the normal curve, not simply Stephen Jay Gould-type critiques (though they might be relevant to consider in terms of the social implications of uncertainty). In fact, mathematicians like Mandelbrot recognized flaws in the assumptions behind the normal curve but provided no alternatives, and believed that, despite its imperfections, the use of the bell curve could not be sacrificed. Can anyone intelligently comment further and provide discussion of these views on the page? --Kenneth M Burke 01:45, 8 September 2007 (UTC)
[edit] history error
According to O'Connor and Robertson (2004) De Moivre's 'The Doctrine of Chance' was published on 13 November 1733, not 1734 as the article says. The date 1733 is confirmed by Ross (2002, p209). Ross goes on to tell us that the curve is so common it was regarded as
"'normal' for a data set to follow this curve....Following the lead of the British Statistician Karl Pearson, people began referring to the curve simply as the normal curve." Ross (2002, p209).
Ross, S. (2002), An Introduction to Probability, 6th edition, Prentice Hall, New Jersey.
O'Connor and Robertson(2004), Abraham de Moivre, University of St Andrews, Available: http://www-history.mcs.st-andrews.ac.uk/Biographies/De_Moivre.html —Preceding unsigned comment added by Ikenstein (talk • contribs) 02:01, 9 September 2007 (UTC)
[edit] Central Limit Theorem
A while ago I edited the first paragraph on the central limit theorem from:
The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the central limit theorem.
to:
The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of identically distributed independent variables is approximately normal. This is the central limit theorem.
I thought I was so clever. But recently I talked to a math grad student friend of mine and he said that it's not necessary that the independent variables be identically distributed so long as other conditions are met. (He didn't go into detail about what those other conditions were, and I must confess, I probably wouldn't have followed if he had.)
Now when I reread the paragraph, I think my addition of identically distributed is generalized by (and therefore made redundant by) under certain conditions, which are probably the very conditions my friend was thinking of.
Thoughts? Expert opinions? —Preceding unsigned comment added by 143.115.159.53 (talk) 17:03, 13 September 2007 (UTC)
- I agree with your second take on it. "under certain conditions" is general and covers the iid case, as well as others. I recommend having just that and linking to the CLT article. I should add that even independence is not necessary, although deviations from that cannot be too large. Here is an issue: there is more than one "central limit theorem", although one could argue the iid case is the canonical one. Baccyak4H (Yak!) 17:17, 13 September 2007 (UTC)
There are lots of different versions of central limit theorems. The one most frequently stated assumes the random variables are i.i.d. and have finite variance. Some versions allow them not to be identically distributed, but instead make weaker assumptions. Some get by with weaker assumptions than independence. I think this article can content itself with stating the most usual one, mentioning briefly that there are others, and linking to the main CLT article, which can treat those other versions at greater length. Michael Hardy 20:01, 13 September 2007 (UTC)
I've edited it to read Under certain conditions (such as being independent and identically-distributed), the sum of a large number of random variables is approximately normally distributed — this is the central limit theorem.; I think it's more concise and clear. Thoughts? ⇌Elektron 19:11, 14 September 2007 (UTC)
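For readers following this thread, the i.i.d. case can be illustrated with a short simulation: sum a number of independent uniform variables and check how often the sum falls within one standard deviation of its mean, which for a normal distribution happens about 68.27% of the time. This is a sketch under assumed settings (30 terms, 20,000 trials, fixed seed, uniform summands); it illustrates only the most usual version of the theorem, not the weaker-assumption versions mentioned above.

```python
import random
from math import sqrt

def fraction_within_one_sd(n_terms=30, n_trials=20_000, seed=0):
    """Fraction of sums of n_terms Uniform(0,1) draws lying within one
    standard deviation of the sum's mean."""
    rng = random.Random(seed)
    mean = n_terms * 0.5           # each Uniform(0,1) has mean 1/2
    sd = sqrt(n_terms / 12.0)      # each Uniform(0,1) has variance 1/12
    hits = 0
    for _ in range(n_trials):
        total = sum(rng.random() for _ in range(n_terms))
        if abs(total - mean) <= sd:
            hits += 1
    return hits / n_trials

frac = fraction_within_one_sd()
print(frac)
```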
[edit] IQ tests
The paragraph
- Sometimes, the difficulty and number of questions on an IQ test is selected in order to yield normal distributed results. Or else, the raw test scores are converted to IQ values by fitting them to the normal distribution. In either case, it is the deliberate result of test construction or score interpretation that leads to IQ scores being normally distributed for the majority of the population.
skillfully evades the question of whether IQ tests that yield normally distributed scores are always deliberately constructed to do so, or if a normal distribution of scores is to be expected for any reasonably broad test. The latter question was answered in the positive in the following paragraph:
- Historically, though, intelligence tests were designed without any concern for producing a normal distribution, and scores came out approximately normally distributed anyway. American educational psychologist Arthur Jensen claims that any test that contains "a large number of items," "a wide range of item difficulties," "a variety of content or forms," and "items that have a significant correlation with the sum of all other scores" will inevitably produce a normal distribution.
However, the latter paragraph was commented out. Is it incorrect? AxelBoldt (talk) 02:57, 27 December 2007 (UTC)
- I think the statement refers to the IQ tests rather than to the concept of IQ. Any test which is composed of a large number of independent subtests will approximately provide normally distributed results. Historically the IQ tests are important, however. Bo Jacoby (talk) 15:01, 27 December 2007 (UTC).
[edit] Carl Friedrich Gauß, not Gauss
Actually his surname is written Gauß (German sharp s), not Gauss. (I'll leave that to you, but most articles also contain the spelling in the mother tongue.) --Albedoshader (talk) 21:18, 30 April 2008 (UTC)
- When writing in English, it is usually written "ss" rather than "ß", and sometimes when writing in German it's done that way (especially in Switzerland). Michael Hardy (talk) 00:06, 1 May 2008 (UTC)
[edit] Very important topic
Lead to article is excellent, and the first few sections are readable, but topic is essential to a basic understanding of many fields of study and therefore a special effort should be made to improve the accessibility of the remaining sections. 69.140.159.215 (talk) 13:00, 12 January 2008 (UTC)
[edit] Rows of Pascal's triangle?
Hi, I'm only in grade 11, so go easy on my maths, but I couldn't help noticing the other day that if you take a row of Pascal's triangle, and use each of the numbers as the y-value for consecutive points, it looks distinctly like a normal distribution. e.g. for the 6th row (x,y) (0,1) (1,6) (2,15) (3,20) (4,15) (5,6) (6,1) I find that the 40th row is pretty clear. Any comments? Cheers —Preceding unsigned comment added by 58.168.190.63 (talk) 11:11, 9 May 2008 (UTC)
- You are correct. This is a well-known result when approached from a slightly different context. The values in Pascal's triangle are the binomial coefficients and the probability masses in a binomial distribution which has parameter p=0.5 are proportional to these. You should find more in the binomial distribution article about how the distribution behaves as the "size" parameter N increases. Melcombe (talk) 12:37, 9 May 2008 (UTC)
OK, thanks for that. —Preceding unsigned comment added by 58.168.190.63 (talk) 23:29, 10 May 2008 (UTC)
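The observation in this thread can be checked numerically: dividing row n of Pascal's triangle by 2^n gives the binomial probabilities for p = 0.5, which can be compared with a normal density of mean n/2 and variance n/4. The function name `pascal_row_vs_normal` is made up for this sketch, and row 40 is used only because it is the row mentioned above.

```python
from math import comb, exp, pi, sqrt

def pascal_row_vs_normal(n):
    """Return (k, binomial mass, normal density) for each entry of row n."""
    mean, var = n / 2, n / 4
    rows = []
    for k in range(n + 1):
        mass = comb(n, k) / 2 ** n   # row entry scaled to a probability
        density = exp(-(k - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)
        rows.append((k, mass, density))
    return rows

# Largest pointwise gap between the scaled 40th row and the normal density.
max_gap_40 = max(abs(mass - density) for _, mass, density in pascal_row_vs_normal(40))
print(max_gap_40)
```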
[edit] N(-x)
it's worth including that N(-x)=1-N(x). It's implied by 2N(x)-1=N(x)-N(-x) but it would be good to state it explicitly. 96.28.232.4 (talk) —Preceding comment was added at 16:12, 15 May 2008 (UTC)
- I presume that by N you mean what is often called Φ, the cumulative distribution function. Michael Hardy (talk) 17:08, 15 May 2008 (UTC)
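Taking N to be the standard normal cumulative distribution function Φ, as presumed above, both identities can be checked numerically. The sketch below writes Φ in terms of the error function from Python's standard library; the function name `Phi` and the test points are illustrative choices.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

for x in (0.0, 0.5, 1.96, 3.0):
    # Symmetry: Phi(-x) = 1 - Phi(x)
    print(x, Phi(-x), 1.0 - Phi(x))
    # Equivalent form: 2*Phi(x) - 1 = Phi(x) - Phi(-x)
    print(x, 2.0 * Phi(x) - 1.0, Phi(x) - Phi(-x))
```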
[edit] An Easy Way to Help Make Article More Comprehensible
Correct me if I am wrong, but an easy way to make the article more accessible to readers from high school through Ph.D. level would be to leave it as is but work through easy examples at the beginning. —Preceding unsigned comment added by 69.145.154.29 (talk) 23:29, 10 May 2008 (UTC)