Talk:True variance
From Wikipedia, the free encyclopedia
Hi Michael, please do not deface my site with ugly stop signs without stating your case. Say what your objections are and let's work together to make this entry I just finished writing better. Cruise 02:35, 24 August 2005 (UTC)
Hi Mark and Michael, The "true variance" does not mean that your variance entry is not right.
- That is true. Michael Hardy 22:15, 3 September 2005 (UTC)
It is just a terminus technicus.
- No, that's wrong. There are good reasons for this usage, but User:Cruise seems to want to keep them out of the article. Michael Hardy 22:15, 3 September 2005 (UTC)
Your excellent entry treats variance from the viewpoint of estimation theory; this entry describes variance from the viewpoint of data analysis.
- I don't think so. Michael Hardy 22:15, 3 September 2005 (UTC)
The resulting values of variance are the same, only the perspectives differ. Please do not start deletion wars. I have better things to do with my life than to revert deleted entries. Take care, Cruise 05:12, 24 August 2005 (UTC)
As I see it, there are several problems. First, it's highly unusual to have two distinct articles about the same concept. Second, I have no problem with "true variance" as a technical term. However, I do have a problem with the way it's used here. There are three distinct concepts:
- The underlying variance of the population one is sampling from. This is what I'd call the "true variance", and it's generally unknown and has to be estimated. Also note that it can generally only be defined in terms of moments: if the underlying population is assumed to be Gamma-distributed, say, then the variance can be derived using calculus, but not by finite summation. The population variance, which is not guaranteed to always exist, is usually denoted by σ².
- The variance of a finite sample viewed as a finite population. (Obviously, this is a confusing and pedagogically tricky issue.) Simply put, take a finite sample, and forget about the fact that it is a sample that is (hopefully) representative of something bigger. Just look at it as a finite collection of data, i.e., a finite population. That population always has a variance (except in trivial cases), and that variance is also often denoted by σ², and sometimes called "sample variance". You seem to call this "true variance", but it is only identical to the first concept for finite populations and exhaustive sampling.
- The unbiased estimator s² of the true population variance. This is also sometimes called the "sample variance". You call it "unbiased variance", but that's at best a shorthand for "unbiased variance estimate" (it's the estimate that's unbiased, not the variance).
The main substantive problem I'm having with the current article is that the term "true variance" used here seems to conflate the first two concepts.
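The difference between the second and third concepts can be sketched in a few lines of Python, using only the standard library. This is an illustration under stated assumptions, not anything taken from the article: the data values are invented, and the variable names are hypothetical. The first concept (the variance of the underlying population) cannot be computed from the data at all, which is why it does not appear in the code.

```python
import statistics

# Hypothetical data, invented for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Second concept: variance of the data viewed as a finite population
# (divide the sum of squared deviations by n).
finite_population_variance = sum_sq_dev / n

# Third concept: the unbiased estimate s^2 of an underlying population
# variance (divide the same sum by n - 1).
unbiased_estimate = sum_sq_dev / (n - 1)

# The standard library implements both conventions under separate names.
assert abs(finite_population_variance - statistics.pvariance(data)) < 1e-12
assert abs(unbiased_estimate - statistics.variance(data)) < 1e-12
```

The two numbers are computed from the same sum of squared deviations, but they answer different questions: the first describes these eight values as a complete population, the second estimates the variance of whatever larger population they were drawn from.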
The main stylistic problem is that the article reads like a chapter from a textbook. There is nothing wrong with that per se, but a better place would be WikiBooks. For an encyclopedia article, we do want some of the concrete examples found here, but the article should also be comprehensive and be useful to readers with different backgrounds. I personally think that Wikipedia's math articles are often excellent when they find that middle ground that's neither covered by educational sites (too verbose for a reference work) nor by MathWorld (too many trivia, not enough text, no mention of applications; how's that for a ten word review?).
There are a number of formatting problems. It's very easy to use math formulas and tables on Wikipedia. No need to use images.
Finally, your work that went into this article is certainly appreciated. But keep in mind that anything and everything can and will be edited mercilessly. It's not possible to look at an article in isolation. Rather, we have to consider what's already there, how similar topics are covered here and elsewhere, etc. In this case, I don't see any reason for keeping this article alongside the article on variance. The only solution I can think of is to merge the two articles and redirect this one to variance. That way we'll have one comprehensive article which will be better overall than the two current individual articles.
Cheers, --MarkSweep 07:01, 24 August 2005 (UTC)
Hi Mark, I read your comments and the modifications you made to the variance article, especially the integration of the Press et al. comments, with great interest and think that you made a sincere effort to integrate both the 'variance' and 'true variance' entries. I also agree with you that the 'true variance' reads like a chapter from a textbook and has a number of formatting problems. This could be remedied by editing and taking out the rather childish Allan, Beth ... example, but before undertaking this rather boring work, let's think about the real issues you talk about. Variance is the key concept of statistics, and the problems you mention reflect two schools of thought. Variance is the microcosm of these conceptual differences, which diverge as you progress into differences that are hard to reconcile. This is the reason why some people try to start a new discipline of 'visual statistics' as different from 'statistics.'
I wrote the 'true variance' as a footnote to the matrix subtraction article, since a firm understanding of this concept helps to understand the next entry I was going to write on the implicational scales. Within this context, the 'true variance' entry is a trial sonde to see whether the Wikipedia community will be able to tolerate divergent perspectives on these issues or whether these issues will have to be voiced elsewhere. Best Wishes, David Cruise 15:10, 24 August 2005 (UTC)
Hi Mark,
I wish we were able to incorporate the 'true variance' into the 'variance'; however, the differences in notation and conceptualization are difficult to overcome. I thought about making a Faustian bargain with you. I replaced pictures with formulae, removed the section on the degrees of freedom, and cut the narrative to the bone. As I need this entry for the matrix subtraction article, I promise that I'll not add any references to any other variance entry, to keep the true variance entry a footnote, as it was meant to be.
If you would like to, you can add the picture about the Monte Carlo simulation to the variance, degrees of freedom, or some other entry, as I think it is rather cool. Looking forward to working with you and Michael on making the Wikipedia diverse and exciting. Best wishes, David Cruise 19:00, 24 August 2005 (UTC)
- To some extent I hope my case is made by my recent edits. There's more to do; I'll be back. Michael Hardy 19:27, 24 August 2005 (UTC)
Hi Michael,
Your recent addition is an excellent elucidation of the concepts we are talking about here. Thanx. Best wishes, David Cruise 19:56, 24 August 2005 (UTC)
Mark, talking about the Monte Carlo study, it was based on 100,000 repeated samples. In the old days of the original IBM PC I had to run it over several consecutive nights. Take care, David Cruise 19:56, 24 August 2005 (UTC)
Deletion?
Maybe this article should be improved by deleting it.
What the hell is going on here? The "true variance" differs from the "unbiased variance" only in the denominator???? Someone who got an "F" in a statistics course would write that; it's hard to imagine anyone else doing so.
Here is the truth: the expression with n − 1 in the denominator that is a conventional unbiased estimate of a population variance is based on an i.i.d. random sample. The "true" variance, more properly called the "population variance" is based on the whole population. It is absurd to simply change the denominator in the population variance and call it "unbiased variance". Michael Hardy 22:26, 3 September 2005 (UTC)
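The role of the n − 1 denominator under i.i.d. sampling can be checked exactly on a small case. The sketch below is an illustration only: the four-element population and the helper name mean_of_estimates are hypothetical and do not come from the article. It enumerates every ordered sample of size n drawn with replacement (all equally likely under i.i.d. draws) and averages the two candidate estimates using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def mean_of_estimates(population, n, denominator):
    """Average a variance estimate over every ordered size-n sample drawn
    with replacement; each such sample is equally likely under i.i.d. draws."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):
        m = Fraction(sum(sample), n)                      # sample mean
        ss = sum((Fraction(x) - m) ** 2 for x in sample)  # squared deviations
        total += ss / denominator
        count += 1
    return total / count


population = [1, 2, 3, 4]  # hypothetical finite population
N = len(population)
mu = Fraction(sum(population), N)
# Population variance: based on the whole population, denominator N.
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N

n = 2  # sample size
expected_unbiased = mean_of_estimates(population, n, n - 1)
expected_biased = mean_of_estimates(population, n, n)

# The n - 1 version averages to the population variance exactly;
# the n version falls short by the factor (n - 1) / n.
assert expected_unbiased == sigma2
assert expected_biased == sigma2 * Fraction(n - 1, n)
```

The enumeration makes the division of labour explicit: sigma2 is a property of the whole population, while the n − 1 formula is a rule applied to samples whose average over all possible samples matches sigma2.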
Mr. Hardy,
Persons who use words like hell and crap in the course of professional discussion are not worthy of my reply.
Sincerely Yours,
David Cruise 17:40, 4 September 2005 (UTC)
Population and sample quantities become confused in this article. I honestly think that Wikipedia would benefit if the article were deleted. (PJP)
Factual accuracy dispute
The current version of the article is simply incorrect and confusing. In general the concept of "true variance" cannot be defined in terms of finite sums (that would only work for finite populations). For example, if you're sampling from a population that follows a Cauchy distribution, you can sample till you're blue in the face, and you'd still be wrong: any finite sample will necessarily have a sample variance, and based on that you can try to estimate the population variance using a biased or unbiased estimator. But whatever you do, you will be wrong, because the true variance of the population is undefined. Hence it is necessary to make a strict distinction between (a) the underlying population, which may not have a well-defined variance, skewness, etc.; (b) the mean, variance, etc. of a finite sample from that population; and (c) various estimators of the population moments, which may or may not involve sample moments. I've tried to say as much in my comments above. There should be at least some pointer to a discussion of these issues, and the article should make explicit any assumptions it makes for purposes of exposition. For example, if this article is only about finite populations, it should say so and point to a more general discussion. --MarkSweep✍ 03:15, 4 September 2005 (UTC)
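The Cauchy example can be tried numerically. The following is a rough sketch, not a proof: the seed, the sample sizes, and the helper name cauchy_sample are arbitrary choices made for illustration. It draws standard Cauchy variates by the inverse-CDF method and computes the n − 1 sample variance for samples of increasing size.

```python
import math
import random
import statistics


def cauchy_sample(rng, size):
    """Draw standard Cauchy variates by inverse-CDF sampling:
    tan(pi * (U - 1/2)) with U uniform on [0, 1)."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(size)]


rng = random.Random(0)  # arbitrary seed, used only for reproducibility
sample_sizes = [100, 1_000, 10_000, 100_000]

# Every finite sample has a sample variance, even though the population
# it was drawn from has no variance to estimate.
sample_variances = [
    statistics.variance(cauchy_sample(rng, size)) for size in sample_sizes
]

for size, s2 in zip(sample_sizes, sample_variances):
    print(f"n = {size:>7}: s^2 = {s2:.1f}")
```

Each run produces a finite number for every sample, but there is no reason to expect the values to settle down as n grows, which is consistent with the point that the quantity being estimated is undefined.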
Different points of view
Hi Mark, Your article on variance describes this concept from the standpoint of sampling theory. This article describes the concept of variance from the standpoint of data analysis. There is nothing wrong with either your approach or this approach. They just reflect different points of view. David Cruise 05:04, 4 September 2005 (UTC)
Working together
Hi Michael and Mark,
Last night I was thinking about the core nature of this controversy, and this morning I added subscripts to all equations in the article and made explicit that those definitions are not the abstract definitions of true or unbiased variances, but the computational definitions of true variance for the variable X, unbiased variance for the variable .... Michael, I also noticed your edits for the matrix subtraction entry, and I thank you for your help.
As you can see from this discussion page, it appears that it is only the three of us who care about these issues anyway. I propose to bury the hatchet, link each other's work, and let the reader decide if he/she needs more information on the theoretical or computational facet of statistics, the discipline we love and care about.
Yours,
David Cruise 17:56, 5 September 2005 (UTC)
- Hello. I'll return to this article before the end of the week, but probably not today ....... Michael Hardy 20:57, 6 September 2005 (UTC)
Hi Michael,
In the spirit of working together I restored your entry on the Conventional language of computation. If possible, please keep the style uniform. E.g., since the formulae are not numbered throughout, the (1) at the end of Equation (1) looks somewhat odd. I know that many people use x bar to signify the arithmetic mean, while some prefer the symbol M. I use the M as I am a devotee of the principle of parsimony in math notation. The x bar consists of two elements while M is singular, as is R for multiple correlation, etc. Also, please gather your comments toward the end of the article, since the Conventional language of computation provides a link to other articles on this topic. I am a strong supporter of the dialectic method, which mandates the thesis - antithesis - synthesis order of discussion.
Best Wishes,
David Cruise 19:37, 8 September 2005 (UTC)
About infinity and eternity
Hi Mark,
Thank you for your comments; they are constructive, incisive and helpful. The key point you make is that if this article is only about finite populations, it should say so. Today I elaborated on this point in the preamble to the article and it reads much better. David Cruise 12:32, 9 September 2005 (UTC)
Lag Time
As the lag time is up, I deleted the disputed tab. Cruise 19:24, 18 September 2005 (UTC)
Copyright violation
Parts of this article (see http://www.visualstatistics.net) violate copyright of Cruise Scientific and were removed. For further information contact info@visualstatistics.net. —Preceding unsigned comment added by 71.211.66.24 (talk)