Wikipedia talk:WikiProject Statistics

1 Things on boundary of scope
2 can I just add wpstatistics?
3 Data matrix
4 Outlier
5 "Temporal mean"
6 Guttman scale
7 Proportionality principle
8 a posteriori probability and Empirical probability
9 Covariate
10 Additive smoothing
11 Eigenpoll
12 "Exact test"
13 Category:Probability and statistics, Category:Probability, Category:Statistics
14 SkewLogistic
15 "Statistical law"
16 merge binomial test and sign test ?

Things on boundary of scope

We probably need to have a policy about what to do with articles that have a statistical background or relevance but are set in a different context. For example, I came across Evidence under Bayes theorem which seems dedicated to a legal context. It is not (yet) listed in the list of statistical topics, and perhaps it shouldn't be? No doubt there are others that contain applications of statistical ideas but are not strictly about statistics. But perhaps these would be more distantly related and so more obvious. Should there be a "list of non-statistical topics related to statistics"?

Melcombe (talk) 17:08, 2 April 2008 (UTC)

I would consider "statistical thinking" and conceptual topics to be relevant. Best, --Shirahadasha (talk) 20:10, 3 April 2008 (UTC)

Another example might be statistical multiplexing. I tried adding it to Category:Statistics but this was reverted. Btyner (talk) 14:01, 26 May 2008 (UTC)

I have added it to Category:Queueing theory which does contain some telecoms stuff and which is under Category:Statistics. Melcombe (talk) 09:55, 30 May 2008 (UTC)

can I just add wpstatistics?

Can I just add {{WPStatistics}} as I did to Bias_of_an_estimator or do I have to add something somewhere else too? Sorry, I'm new to this whole project thing. Pdbailey (talk) 23:43, 5 April 2008 (UTC)

I don't think there's anything else that needs to be done. Michael Hardy (talk) 05:10, 6 April 2008 (UTC)

Melcombe (talk) 08:55, 14 April 2008 (UTC)

Data matrix

Data matrix (lower-case m) now redirects to Data Matrix (capital M). The latter is about a topic in computer science. Several statistics articles link to the former and get inappropriately redirected. Some disambiguation work is needed. Michael Hardy (talk) 18:58, 10 April 2008 (UTC)

Changed Data matrix into a disambig page for Matrix (mathematics), Data matrix (statistics), and Data matrix (computer). For now, made Data matrix (statistics) a redirect to Matrix (mathematics) but this approach permits it to be built as a separate article when someone is ready. Best, --Shirahadasha (talk) 21:26, 10 April 2008 (UTC)

I've bypassed the disambig page for the 5 links to data matrix from article space, of which three now point to data matrix (statistics), namely Biplot, Origin of birds and Cluster analysis. Qwfp (talk) 07:35, 11 April 2008 (UTC)

For now, I've hidden the Data matrix (statistics) entry at Data matrix since it is just a redirect to Matrix (mathematics) as noted by Shirahadasha above. I also added Data matrix (statistics) to Category:Redirects with possibilities. Btyner (talk) 14:23, 26 May 2008 (UTC)

Those thinking of these pages might want to consider also the article Dataset, which seems close to implying that a dataset is a single data matrix. Melcombe (talk) 09:17, 11 April 2008 (UTC)

Outlier

Can someone who knows how these things are best done sort out the recent overwriting of article Outlier in some acceptable way? Melcombe (talk) 08:55, 14 April 2008 (UTC)

I reverted it, but the article itself could do with a few changes. For one thing, defining an outlier in terms of standard deviations is poor form -3mta3 (talk) 09:15, 14 April 2008 (UTC)

"Temporal mean"

What should we do with the stub article titled temporal mean? Michael Hardy (talk) 16:00, 19 April 2008 (UTC)

Let's see, what are the options? Transwiki to wiktionary? Merge to mean? Both?? Qwfp (talk) 18:57, 19 April 2008 (UTC)

...or expand into a substantial article? Michael Hardy (talk) 23:10, 19 April 2008 (UTC)

Is there substantially more to say than (essentially) "temporal mean means mean over time"? I don't know myself, as time series and related topics are not something I've ever really studied. Qwfp (talk) 10:58, 20 April 2008 (UTC)

Well, there may be something more to say using the context of space-time modelling and data, so that a temporal mean would often be spatially varying. Also, for "ordinary" time-series, there might be something relevant to say about reducing datasets of say daily data to monthly, using monthly means etc., so as to create time-series of temporal means. However, I have not found a relevant reference in which the phrase is used, although I did find "temporal autocorrelation". Melcombe (talk) 08:49, 21 April 2008 (UTC)

Based on the comment by Melcombe, I think the better option would be to move it into an article on temporal statistics or high frequency statistics. A brief search turned up nothing; if others find nothing, I say delete it, and the concept must wait until the other article is written. Pdbailey (talk) 15:38, 21 April 2008 (UTC)

What about a redirect to Moving average? --Lambiam 23:14, 28 April 2008 (UTC)

I think a deletion would be best at present, as there are many possible somewhat distinct meanings and any possible redirect is likely to be off-target. The article seems not to have any substantive articles linking to it (?) ... one guess is that it originated in a list of topics found on other general maths/stats websites. Melcombe (talk) 08:53, 29 April 2008 (UTC)

Given the present content of Temporal mean, the redirect is on the dot. Should different and notable meanings of the term "temporal mean" emerge later, we can always change this then into, for example, a disambiguation page. --Lambiam 16:03, 30 April 2008 (UTC)
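
For illustration only, the daily-to-monthly reduction Melcombe mentions above (a time series of temporal means) can be sketched in a few lines of Python. The function name `monthly_means` and the made-up daily series are assumptions introduced for this example; they do not come from the article or from any reference cited in this thread.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def monthly_means(daily):
    """Reduce {date: value} daily data to {(year, month): mean of that month}."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[(day.year, day.month)].append(value)
    # Sort the keys so the resulting series of temporal means is in time order.
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Hypothetical data: 60 days starting 1 January 2008, where each day's
# value is simply its day-of-month number.
start = date(2008, 1, 1)
days = [start + timedelta(days=i) for i in range(60)]
daily = {d: float(d.day) for d in days}

result = monthly_means(daily)
print(result)
```

Each output value is a mean over a fixed calendar window, which is different from a moving average, where the window slides along the series one observation at a time.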

Guttman scale

The article titled Guttman scale is a profoundly terrible mess. One is led by various clues to suspect (and the fact that one can only suspect is part of what's so bad about the article in its present form) that this has something to do with statistics. Please see talk:Guttman scale. Michael Hardy (talk) 17:20, 22 April 2008 (UTC)

Note that there was some text in article Homogeneity (statistics) (now hidden) that implied that the Guttman scale was associated with this (and there is still presently a link). This article was also in a mess, but for info it was/is in category Psychometrics but not Statistics, while Guttman scale is in both as well as Market Research. Melcombe (talk) 17:35, 22 April 2008 (UTC)

See also Scale (social sciences)#Comparative scaling techniques which seems uninformative, but a google does find some stuff that seems understandable. Melcombe (talk) 17:58, 22 April 2008 (UTC)

I've added a lede. Please review and improve. --Lambiam 08:47, 29 April 2008 (UTC)

Proportionality principle

Does the "proportionality principle" as described at [1] have a more well known name? I'm thinking about adding a section to Monty Hall problem with this analysis, but I'm a little hesitant without a better reference backing up the basic principle. -- Rick Block (talk) 16:09, 26 April 2008 (UTC)

It's a special case of the likelihood principle. I'm not sure if there's any standard name for this special case. Michael Hardy (talk) 16:20, 26 April 2008 (UTC)

It's not really a special case of the likelihood principle, which is more concerned with inference. The ref given indicates that it is really Bayes' Theorem presented in a way that allows the avoidance of some mathematical expressions. Melcombe (talk) 08:50, 28 April 2008 (UTC)

Except that Bayes theorem is used in inference. The likelihood principle says identical inferences should be drawn from proportional likelihood functions; this is the case in which the inferences are the posterior probabilities. So it's a special case of the likelihood principle. Michael Hardy (talk) 15:08, 28 April 2008 (UTC)
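
As a hedged illustration of the point discussed above, the sketch below assumes the principle amounts to "posterior is proportional to prior times likelihood", so that rescaling the likelihoods by a constant leaves the posterior unchanged. The Monty Hall numbers use the standard assumptions (player picks door 1, host knowingly opens door 3 to show a goat, and chooses at random when he has a choice); those assumptions and the function name `posterior` are introduced here and are not taken from the linked reference.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Normalise prior * likelihood so the result sums to one."""
    weights = [p * l for p, l in zip(prior, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypotheses: the car is behind door 1, 2 or 3 (equal prior probability).
prior = [Fraction(1, 3)] * 3

# Probability that the host opens door 3 under each hypothesis,
# given that the player picked door 1 (standard assumptions).
likelihood = [Fraction(1, 2), Fraction(1), Fraction(0)]

post = posterior(prior, likelihood)

# A proportional likelihood function (every value multiplied by 10)
# gives exactly the same posterior probabilities.
scaled = posterior(prior, [10 * l for l in likelihood])

print(post, scaled == post)
```

Only the ratios between the likelihoods matter, which is why the argument can be presented without writing out the normalising constant.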

a posteriori probability and Empirical probability

The article a posteriori probability is essentially a disambig which leads to both Bayesian stuff and to Empirical probability. Empirical probability is brief and seems to imply that a posteriori probability is covered by what is meant by Empirical probability without saying much else. This seems doubtful to me. Any thoughts on this? There seems to have been an attempt in the past to convert the article a posteriori probability, which was then simply a redirect to Empirical probability, into a redirect to posterior probability, but this was then changed to point both ways. Melcombe (talk) 10:38, 29 April 2008 (UTC)

The term is used on these slides in the slogan "Hypothesis testing compares a posteriori probability with a priori probability" – which seems based (in my opinion) on a misunderstanding. Hypothesis testing does compare a posterior probability P, not with a prior probability, but with a previously selected significance level. Here P is the posterior probability under the null hypothesis of an outcome deviating (one-sided or two-sided) at least as much from the null-hypothesis norm as the experimentally observed outcome. On the slides the term "a posteriori probability" is indeed construed as being the experimentally observed relative frequency. I haven't examined if this misuse of the term is sufficiently widespread to warrant inclusion of this mistaken meaning in Wikipedia. --Lambiam 18:19, 30 April 2008 (UTC)

I suggest that Empirical probability should be sent to AfD. One of its two references is at answers.com! I challenge anyone to find a widely-used textbook of probability or statistics that has the phrase 'empirical probability' as a term in the index. The current article makes empirical probability simply a relative frequency. I think we can use the term 'relative frequency' for that. EdJohnston (talk) 19:10, 30 April 2008 (UTC)

I did find "empirical probability" in my dictionary of mathematics (Unwin) and it did define it as a posterior probability ... but without saying anything about a prior probability, so it may well be wrong. As for your challenge, I found "empirical probability" in the index of Mood & Graybill's Intro to the Theory of Statistics (2nd Edition)(1963), but the term doesn't seem to be in the text ... it uses "relative frequency" (only) in a section headed "A Posteriori or Frequency Probability". Melcombe (talk) 13:33, 14 May 2008 (UTC)

Maybe it should be redirected to empirical distribution function. Michael Hardy (talk) 20:09, 30 April 2008 (UTC)

I think Empirical probability can usefully be revised to fill the context where, if there is a continuous random variable X being observed, there is the choice between (i) estimating Pr(X>x) by counting such events in the observed data set and (ii) fitting a parametric distribution function F and estimating Pr(X>x) as 1-F(x). But if no-one sees an equivalence between a posteriori probability and Empirical probability, then perhaps the simplest would be to redirect the former to posterior probability with a little rephrasing of the latter. Melcombe (talk) 09:42, 1 May 2008 (UTC)

Given the above finding in Mood & Graybill, I have now left "a posteriori probability" to point to both places. I have revised "empirical probability" mainly by adding in some statistical context and to indicate alternatives to estimation using empirical probabilities. In that article I have said that the use of the term "a posteriori probability" is not directly related to Bayesian inference (simply "after the event"?). If someone wants to put in exactly how the empirical probability estimate can be obtained as a Bayesian estimate, they might well do so. Additionally, I note that where the article apparently links to "relative frequency" it actually goes to frequency (statistics). Melcombe (talk) 13:34, 14 May 2008 (UTC)
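
A minimal sketch of the two estimates of Pr(X>x) contrasted in this thread: (i) the empirical probability obtained by counting, and (ii) 1-F(x) from a fitted parametric distribution. The choice of a normal distribution for F, the sample values and the function names are all assumptions made for the example; nothing in the discussion above prescribes them.

```python
from statistics import NormalDist


def empirical_exceedance(data, x):
    """(i) Relative frequency of observations strictly greater than x."""
    return sum(1 for v in data if v > x) / len(data)


def parametric_exceedance(data, x):
    """(ii) 1 - F(x), with F a normal distribution fitted to the data.

    The normal model is an assumption of this sketch only.
    """
    fitted = NormalDist.from_samples(data)
    return 1.0 - fitted.cdf(x)


# Hypothetical sample of ten observations.
data = [2.1, 2.9, 3.4, 3.8, 4.0, 4.3, 4.7, 5.2, 5.9, 6.7]
x = 5.0

emp = empirical_exceedance(data, x)
par = parametric_exceedance(data, x)
print(emp, par)
```

The empirical estimate can only take values that are multiples of 1/n, and it is zero for any x beyond the largest observation, whereas the parametric estimate varies smoothly with x but depends on the fitted model being appropriate.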

Covariate

Does covariate need some work? Michael Hardy (talk) 17:53, 30 April 2008 (UTC)

All I see are the possibilities: (i) include other near-equivalent words such as "explanatory variable" for regression and exogenous and endogenous variables for econometrics; (ii) add an example application where the term can reasonably be used. Melcombe (talk) 09:47, 1 May 2008 (UTC)

I have modified the article and it may now be clearer. I did not add exogenous and endogenous variables, as these are subtly different ideas. As usual, more might be done. Melcombe (talk) 14:02, 14 May 2008 (UTC)

Additive smoothing

The nearly orphaned article titled Additive smoothing could probably use some work. Michael Hardy (talk) 01:11, 16 May 2008 (UTC)

Eigenpoll

Eigenpoll is also deficient. Michael Hardy (talk) 01:38, 16 May 2008 (UTC)

"Exact test"

At talk:exact test I've asked if someone can fill in certain items of information in the article that I could not. Further comments on that page are welcome. (Or on this page.) Michael Hardy (talk) 20:13, 16 May 2008 (UTC)

Category:Probability and statistics, Category:Probability, Category:Statistics

I'm sure this has been debated before, but what use does Category:Probability and statistics serve? Certainly there are articles that belong in both categories, but is the intersection of these categories really a useful category itself? Note that Probability and statistics, the "main article" for Category:Probability and statistics, is essentially a disambiguation page. Btyner (talk) 14:07, 26 May 2008 (UTC)

Category:Probability and Category:Statistics are both subcategories of Category:Probability and statistics, together with Category:Randomness. At present there are many articles listed directly under Category:Probability and statistics that might be better removed/moved to other categories. Are there any obvious other categories that should reasonably be added as subcategories to Category:Probability and statistics rather than just being subcategories of either Category:Probability or Category:Statistics? How is operations research dealt with? Melcombe (talk) 11:25, 28 May 2008 (UTC)

I have added this task, and revision of some articles mentioned above to the "Todo" lists in the project page. Melcombe (talk) 10:53, 29 May 2008 (UTC)

SkewLogistic

Can anyone help with the "SkewLogistic" distribution? It is used in the "Related distributions" sections of the Chi-square distribution, Gamma distribution and Exponential distribution articles, but doesn't have its own article and doesn't appear anywhere else. It seems it needs to be some type of Gumbel or extreme value distribution to fulfill what is in the articles where it appears. Melcombe (talk) 15:55, 29 May 2008 (UTC)

I was wrong about the extreme value distribution bit, but there are still problems. It seems that the "SkewLogistic" distribution here needs to be a generalized logistic distribution of Type I according to Johnson, Kotz & Balakrishnan terminology, whereas the "literature" (i.e. Google) comes up with a very different distribution for "skew-logistic". Melcombe (talk) 15:36, 30 May 2008 (UTC)

"Statistical law"

What are we to make of the stub article titled Statistical law? As it stands, I'm not sure there's any precisely defined concept here. Michael Hardy (talk) 23:36, 6 June 2008 (UTC)

We do not have articles titled Mathematical law, Geometrical law, Topological law, etcetera, nor should we, for the simple reason that these are not established concepts. I likewise see no raison d'être for this article – which at best would be a dictionary definition. --Lambiam 03:48, 7 June 2008 (UTC)

Should we also get rid of Category:Statistical laws? There seem to be a variety of ways of speaking that people have used in the past. I suppose we don't have to take notice of all of them. But is Zipf's law not a law? Is it not statistical? That article was put in the category Statistical laws in August, 2006 but our article Statistical law was only created this week. I agree that the current text of the article Statistical law doesn't seem right. EdJohnston (talk) 04:40, 7 June 2008 (UTC)

One role for the article Statistical law would be as a target link from the article Scientific law to act as another marker that not all scientific laws concern physics. There may be a need to make distinctions between probability-theory-based laws and statistical-observation-based laws: note that there are some "laws" under Category:Statistical theorems.

A specific suggestion is to place Category:Statistical laws not only directly under Category:Statistics but also under Category:Statistical theory, so that there would be the following sub-categories of this: Estimation theory; Hypothesis testing; Statistical inequalities; Probability interpretations; Statistical approximations; Statistical theorems. Thus "inequalities", "approximations", "theorems" and "laws" would form a natural grouping of categories.

Another suggestion is to make article Statistical law (renamed) a lead article for Category:Statistical laws, with content saying something about the types of things "statistical laws" are, which might be something like... "types of empirical behaviour commonly observed across many different collections of data". Perhaps the article could then have a brief introduction, from an empirical point of view, to things like the central limit theorem (to avoid having to place the theory under "laws"). And perhaps some of the articles under Category:Statistical laws could be moved to other subcategories.

Melcombe (talk) 11:55, 11 June 2008 (UTC)

As there are no articles of substance which link to it, I would say delete it. If it is to be kept, I agree that it should be renamed to something like empirical statistical laws to distinguish it from probability theorems like the law of large numbers. -3mta3 (talk) 11:21, 12 June 2008 (UTC)

It may be difficult to distinguish ... many things now backed-up by theorems may have started off as empirical observances. Melcombe (talk) 10:08, 13 June 2008 (UTC)

merge binomial test and sign test ?

Aren't these really the same thing? I proposed this merge in Nov. 2006, but forgot about it and the tags were removed in Oct. 2007 without any discussion for or against. Any comments from the crowd here? Btyner (talk) 23:16, 11 June 2008 (UTC)

They are the same thing in a mathematical sense eventually, but only after going through a layer or two of reduction from different contexts, and it is these different contexts that make it reasonable to have separate articles. I suggest putting a link to binomial test into sign test and expanding the latter to include either/both more discussion about nonparametric tests of shifts of location (which this isn't quite of course) and/or links to other such tests. If the article ever got particularly detailed, there could be discussion of the power of the test against shift-alternatives, which wouldn't really fit immediately into a more general article on binomial test. Melcombe (talk) 09:00, 12 June 2008 (UTC)
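
To make the "layer of reduction" concrete, here is a rough Python sketch in which a sign test on paired differences is carried out by handing a count of positive signs to an exact two-sided binomial test with p = 1/2. Dropping zero differences is a common convention assumed here, and the data and function names are hypothetical, chosen only for this example.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials under p = 1/2.

    With p = 1/2 the distribution is symmetric, so the two-sided value
    is twice the smaller tail, capped at 1.
    """
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def sign_test_p(differences):
    """Sign test: reduce the data to signs, then apply the binomial test."""
    nonzero = [d for d in differences if d != 0]  # assumed convention: drop ties
    positives = sum(1 for d in nonzero if d > 0)
    return binomial_two_sided_p(positives, len(nonzero))


# Hypothetical paired differences (after minus before).
diffs = [1.2, 0.4, -0.3, 2.1, 0.0, 0.8, 1.5, 0.9, -0.1, 0.6]

p_value = sign_test_p(diffs)
print(p_value)
```

The binomial function knows nothing about pairs, medians or shifts of location; all of that context lives in the reduction step, which is the part a separate sign test article would need to explain.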