Talk:Multiple comparisons

From Wikipedia, the free encyclopedia

Multiple comparisons or multiple testing?

I have always heard of the problems explained in the article under the name multiple testing (cf. also the Benjamini & Hochberg paper cited at the end of the page), so I would be tempted to suggest moving the article to this name, but this may simply be a bias on my side. Any opinions? Schutz 21:00, 21 September 2005 (UTC)

Multiple comparisons is what I've always heard; multiple testing to me sounds like simultaneous testing of multiple null hypotheses, and that is not the same topic. So I oppose such a move. Michael Hardy 22:09, 15 October 2005 (UTC)
Multiple testing is exactly what you call multiple comparisons, but I'd be interested to know of any reference that documents the meaning you indicated above. As I wrote above, I have mainly read papers that use the terminology multiple testing, but this may be a bias among researchers specialised in a specific area. For example, almost all the literature on the statistical analysis of DNA microarrays uses this term (I just added a sentence on this to the microarray page). Given that no one else has answered my suggestion (thanks for jumping in!), I will not move the page, but I will indicate clearly that multiple testing is (also) what we are talking about here. Schutz 00:03, 16 October 2005 (UTC)
OK, after rereading the article, it seems to me that it is very confusing in its present state! If one reads only the first sentence, multiple comparisons is indeed not the same as multiple testing: if, by multiple comparisons, this article basically means "what you do after obtaining a significant ANOVA F-test", then we are indeed not talking about multiple testing in general (which more generally covers the problems of "using statistical tests repeatedly", as indicated in the intro). Do you agree? In this case, most of the discussion could go in a more general article on multiple testing, which I will be happy to start. Unfortunately, all the definitions I have been able to find so far for multiple comparisons blur the distinction with multiple testing. Schutz 00:53, 16 October 2005 (UTC)
I have to agree with Michael Hardy that the term often used is multiple comparisons. I have used it myself, and had reviewers of my own research papers claim I need to do "multiple comparisons corrections." Also, Google returns 177,000 hits for the search bonferroni+"multiple comparisons", and 48,000 for bonferroni+"multiple testing". Debivort 08:09, 3 January 2006 (UTC)
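A Bonferroni correction, the procedure behind the search terms above, can be sketched as follows. This is a minimal illustration under stated assumptions rather than code from the article: with m tests, each p-value is multiplied by m and capped at 1, which is equivalent to comparing each raw p-value with α/m. The function name `bonferroni` and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for a family of m tests.

    Each p-value is multiplied by m and capped at 1. Rejecting when the
    adjusted value is <= alpha is equivalent to comparing the raw p-value
    with alpha / m.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    rejected = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, rejected


if __name__ == "__main__":
    # Hypothetical p-values from four separate tests.
    raw = [0.010, 0.020, 0.030, 0.200]
    adjusted, rejected = bonferroni(raw)
    for p, p_adj, rej in zip(raw, adjusted, rejected):
        print(f"raw p = {p:.3f}  adjusted p = {p_adj:.3f}  reject H0: {rej}")
```

With these hypothetical p-values only the first test survives the correction, since 0.010 × 4 = 0.04 ≤ 0.05 while 0.020 × 4 = 0.08 > 0.05.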
Sorry, I am lost here; please see the last comment I wrote above. It seemed clear to me that this article was about the ANOVA F-test multiple comparison problem; indeed, most of the procedures linked from this page are specifically about "comparing sets of means", as was the lead section. Based on this, the article has been split between multiple comparisons (this page) and multiple testing (the general problem). The (good) changes you have made are about the general problem. If the consensus is that multiple comparisons is the general problem, then the two pages should be merged, and this article should be cleaned up. But I must say that I like the split approach, and it seems logical: with the ANOVA, you are really comparing a set of means, while testing really refers to the application of multiple statistical tests, whatever they are. The Google searches do not tell us whether the two terms have exactly the same meaning (some of the links for multiple comparisons point to the ANOVA question only; some talk about the more general problem). For the record, even though it is not relevant to this particular discussion, I have mostly seen the term multiple testing for the general problem, including in reviews of research papers. Hey, the only paper in the article's bibliography that mentions either term says multiple testing, and it is about the general problem ;-). Schutz 15:30, 3 January 2006 (UTC)
MathWorld says that Bonferroni corrections address the multiple comparisons problem. Alas, it does not have an entry on "multiple testing". It seems like the article text (the parts not written by me) and all of the statistical tests linked below that I am familiar with address "multiple comparisons" as the problem is conceived by me and MathWorld. Is your conception of multiple comparisons (i.e. the ANOVA F-test) a specific example of multiple testing, that is, of my conception of multiple comparisons? I wonder if we aren't just running into a linguistic rather than a content-based hurdle here. Debivort 16:54, 3 January 2006 (UTC)
Basically, I thought it was only a linguistic question when I started this discussion a few months ago. It was only on the basis of the comments above (in particular, it was said that multiple testing and multiple comparisons were not the same thing) and of the content of the article that I assumed multiple comparisons (i.e. the ANOVA F-test) was a specific example of multiple testing. While that was not my own conception, I was OK with the distinction and spun off the multiple testing article, to which no one objected. This is why I am a bit puzzled about going back. I wonder if there may be a systematic difference in vocabulary between statisticians working in different fields; the statistical papers I have seen so far were all about multiple testing (starting with Benjamini-Hochberg, as mentioned above). This is probably why I easily believed that multiple comparisons was the special case, but it may be only a bias. In any case, if the consensus is that multiple testing == multiple comparisons (hopefully other people will say something), then the first priority would be to merge the other article rather than rewrite it (although it may be too late). As a side note, at least some of the linked articles are indeed specifically related to ANOVA. Schutz 17:35, 3 January 2006 (UTC)
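Since the Benjamini-Hochberg paper keeps coming up as the reference point for the multiple testing literature, here is a hedged sketch of its step-up procedure. It controls the false discovery rate (the expected proportion of false rejections among all rejections, assuming independent tests) rather than the probability of making any false rejection. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions, not taken from the paper or the article.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at false discovery rate q.

    Sort the m p-values, find the largest rank k such that
    p_(k) <= k * q / m, and reject the k hypotheses with the smallest p-values.
    Returns one boolean per input, in the original order.
    """
    m = len(p_values)
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes (step-up)
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, deliberately listed out of order.
    p = [0.200, 0.010, 0.030, 0.020]
    print(benjamini_hochberg(p))
```

With these hypothetical p-values, the sorted values 0.010, 0.020, 0.030, 0.200 are compared with 0.0125, 0.025, 0.0375 and 0.05, so the three smallest are rejected, whereas a Bonferroni threshold of 0.05/4 = 0.0125 would reject only 0.010.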
Yeah, it does seem like we need to rope in some other commenters. I'll ask around to see if anyone has the time to comment on it. Maybe you can do the same? Debivort 05:20, 4 January 2006 (UTC)
Try asking at Wikipedia talk:WikiProject Mathematics. linas 15:08, 5 January 2006 (UTC)

Disclaimer: I don't know whether I am biased by my concrete problem, as I am neither a statistician nor a native English speaker, but I'll try to help. According to the dictionary:

Testing n. 1. The act of testing or proving; trial; proof. [1913 Webster]

Comparison n. 1. The act of comparing; an examination of two or more objects with the view of discovering the resemblances or differences; relative estimate. [1913 Webster]

With these definitions, I think that doing a _multiple test_ means repeating a test several times. For example, suppose we want to test whether A is better than B (or equal, or whatever). After that we get C, and we want to test A vs C and B vs C. Then comes D, and I want A-D, B-D, C-D... If we do that with a t-test or a Wilcoxon test, false positives become more likely (the coin example in the article). In this way, we would be wrongly rejecting a true hypothesis, for example concluding that A and D have different means when they are actually equal. For this reason, we have tests designed to avoid this: ANOVA (parametric), Friedman (nonparametric), others?...

After performing ANOVA or Friedman, we only know (if the test is significant) that, for example, H0: A = B = C = D is not true. Then we would probably want to know which ones differ from the others. For this purpose, we can apply one of the techniques that allow us to _compare_ every pair: Tukey test, Nemenyi, Bonferroni...

The above could clearly split the article in two, but I have probably left out other ideas I do not know of, such as techniques for repeating a test in order to increase power. I think we should clarify which content we want here before deciding between one article and two. Arauzo 18:59, 20 April 2006 (UTC)
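The inflation of false positives described above (A-B, then A-C and B-C, then A-D, B-D and C-D) can be illustrated with a small simulation. This sketch rests on a loudly simplifying assumption: all null hypotheses are true and each pairwise test yields an independent p-value uniformly distributed on [0, 1], as a valid test with a continuous statistic does under its null. Real pairwise tests on shared groups are correlated, so the numbers are only indicative. The function name and its parameters are hypothetical.

```python
import itertools
import random


def pairwise_fwer_simulation(groups, alpha=0.05, trials=20_000, seed=0):
    """Estimate P(at least one false positive) among all pairwise tests.

    Assumption: every null hypothesis is true and the pairwise p-values are
    independent Uniform(0, 1) draws. Real pairwise tests that share groups
    are correlated, so this is only an approximation.
    """
    pairs = list(itertools.combinations(groups, 2))
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under a true null, a valid test's p-value is uniform on [0, 1].
        if any(rng.random() < alpha for _ in pairs):
            hits += 1
    return pairs, hits / trials


if __name__ == "__main__":
    for k in range(2, 6):
        groups = "ABCDE"[:k]
        pairs, rate = pairwise_fwer_simulation(groups)
        print(f"{k} groups, {len(pairs)} pairwise tests: "
              f"P(any false positive) ~ {rate:.3f}")
```

Under these assumptions the estimate for four groups (six tests) should land near 1 - 0.95^6 ≈ 0.26, against 0.05 for a single comparison.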

Reviewing some bibliography: in (Zar, 1999), chapters 10 and 11 are:
  • Multiple Hypotheses: the analysis of variance. This chapter introduces the problem of repeating the same test over different samples to test various hypotheses about them (coin example). It then explains ANOVA and its nonparametric counterparts, such as Kruskal-Wallis, and points to chapter 14 for other techniques with more than one factor, e.g. Friedman.
  • Multiple comparisons. This chapter explains how pairwise comparisons among the samples tested in an ANOVA should be done, and presents different tests for these comparisons, such as Tukey's.
At the start of chapter 11: 'The term "multiple comparisons" was introduced by D. B. Duncan in 1951', according to (David 1995).
H. A. David, First (?) occurrence of common terms in mathematical statistics. Amer. Statist. 49: 121-133, 1995.
Jerrold H. Zar, Biostatistical Analysis, 4th ed. Prentice-Hall, 1999. ISBN 013081542X
Arauzo 11:35, 23 April 2006 (UTC)
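As a companion to the description of Zar's chapter 10, a one-way ANOVA replaces the many pairwise tests with a single test of H0: all group means are equal. The sketch below computes only the F statistic and its degrees of freedom from the standard sums of squares; converting F to a p-value needs the F distribution (for example `scipy.stats.f.sf(f_stat, df_between, df_within)`), which is omitted to keep the example dependency-free. The function name and the sample data are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = [SS_between / (k - 1)] / [SS_within / (N - k)], with k groups and
    N observations in total.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
    print(f"F = {f_stat:.3f} on ({df_b}, {df_w}) degrees of freedom")
```

For the made-up groups [1, 2, 3], [2, 3, 4] and [5, 6, 7], SS_between = 26 and SS_within = 6, giving F = (26/2)/(6/6) = 13 on 2 and 6 degrees of freedom.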

Strong Support. I'm late to this discussion, but I had never heard of multiple testing until just now. I use multiple comparisons as a term all the time (especially Bonferroni and friends). Could it be a UK/US thing, or a case where SPSS has dictated the vocabulary to the world? Otherwise, I think the time for a merger may be here. I'll plan to do it in a couple of days if I don't hear from anyone else. -Scott Alberts 03:59, 6 September 2006 (UTC)

Lead section

I think that the lead section should be a little more accessible: the big picture in plain language. There is plenty of room for the subtleties of the concept further down. ike9898 01:59, 8 October 2005 (UTC)

I agree. I don't know what the sentence "The experimentwise α level increases exponentially as the number of comparisons increases." means. What is an α level, or where do I go to look it up? It's not really a field I know much about, so I look forward to a clearer article. -- Jake 07:13, 15 October 2005 (UTC)
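To unpack the quoted sentence: α is the probability that a single test produces a false positive (rejecting a null hypothesis that is true), commonly set to 0.05. If m independent tests are each run at level α, the probability of at least one false positive, the experimentwise or familywise error rate, is 1 - (1 - α)^m. Strictly speaking, it is the probability of no false positive, (1 - α)^m, that shrinks exponentially; the familywise rate itself climbs toward 1. The independence assumption and the function name below are illustrative.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) for m independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 5, 10, 20, 50):
        print(f"{m:>2} tests at alpha = 0.05: "
              f"familywise error rate = {familywise_error_rate(0.05, m):.3f}")
```

With α = 0.05, ten independent tests give about a 40% chance of at least one false positive, and fifty give about 92%.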
I agree as well, and will take a crack at an edit with a more accessible intro, taking into account the current trend in the multiple comparisons vs. testing discussion (above). Debivort 08:10, 3 January 2006 (UTC)

Tukey's Studentized Range Test/Distribution

There is a nice summary of this by NIST at [1], which I believe is in the public domain, as NIST is a US government agency. In fact, I made a template for this: NIST-PD. Btyner 18:57, 15 May 2006 (UTC)
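For readers who want to see what the studentized range test computes, here is a minimal sketch of Tukey's statistic for equal group sizes: q = |ȳ_i - ȳ_j| / sqrt(MSE / n), where MSE is the pooled within-group mean square from the ANOVA, with N - k degrees of freedom. A pair is declared different when q exceeds the critical value of the studentized range distribution for the chosen α, k groups and N - k degrees of freedom; that value has to be looked up in a table (such as the NIST one) or computed with a library. The function names, the sample data and the tabulated value 4.34 (assumed for α = 0.05, k = 3, df = 6) are assumptions to check before use. Recent SciPy versions also offer `scipy.stats.tukey_hsd` as a ready-made implementation.

```python
import itertools
import math
from statistics import fmean


def tukey_q_statistics(groups):
    """Tukey studentized-range statistic for every pair of equal-sized groups.

    q_ij = |mean_i - mean_j| / sqrt(MSE / n), where MSE is the pooled
    within-group variance (ANOVA error mean square) with N - k degrees of freedom.
    Returns (dict mapping (name_i, name_j) -> q, df_within).
    """
    sizes = {len(g) for g in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()
    k = len(groups)
    means = {name: fmean(g) for name, g in groups.items()}
    ss_within = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
    df_within = k * n - k
    mse = ss_within / df_within
    se = math.sqrt(mse / n)
    q = {
        (a, b): abs(means[a] - means[b]) / se
        for a, b in itertools.combinations(groups, 2)
    }
    return q, df_within


def tukey_decisions(q, q_critical):
    """Declare a pair different when its q exceeds the tabulated critical value."""
    return {pair: value > q_critical for pair, value in q.items()}


if __name__ == "__main__":
    data = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 6, 7]}  # hypothetical data
    q, df = tukey_q_statistics(data)
    Q_CRIT = 4.34  # assumed table value of q(0.05; k=3, df=6); verify before use
    for pair, different in tukey_decisions(q, Q_CRIT).items():
        print(pair, f"q = {q[pair]:.3f}", "different" if different else "not different")
```

For these made-up groups MSE = 1 and n = 3, so q for A vs B is √3 ≈ 1.73, while A vs C and B vs C give 4√3 ≈ 6.93 and 3√3 ≈ 5.20.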