Talk:Resampling (statistics)

This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

I originally wrote the permutation test article. I understand that in the Wikipedia world that doesn't mean much, but it had gotten so convoluted with parenthetical phrases and qualifications and such that it was virtually impenetrable. So I edited it. I appreciate the helpful additions as well as putting the article into the general rubric of resampling which makes sense. - Respectfully, WJF.

I made changes to Bootstrap and Jackknife, wrote a much longer text on Permutation test, and moved the reference list after Bootstrap to the reference list at the end of the chapter. My writing is based on my experience as an applied statistician and a developer of statistical software with emphasis on resampling techniques, except for the text about Jackknife, which borrows heavily from Mooney & Duval (see list of references). I have tried to keep as much as possible of the original text, but in some cases where it clashed with my own writing it was removed. For example the sentence
"permutation tests usually involve calculation of test statistics from and permutation of the observed data, as opposed to other non-parametric tests which may involve analysis of the ranks of data points"
has been removed because it may confuse readers, since rank tests are not 'other non-parametric tests'. Rank tests, for example the Mann-Whitney U test and the Spearman rank correlation test, are permutation tests. - Respectfully, VS.


I made some changes to the Permutation test section to correct some vagueness and misleading comments. I am not an expert in this area, but the current section does not appear to present a balanced perspective on parametric vs. permutation tests. In addition, all sections would benefit mightily from simple examples of each technique. - Ken K 21:15, 1 March 2006 (UTC)


I changed the "approximation" section title to "Monte Carlo Testing" and some of the language therein. Monte Carlo testing is not an approximation but an exact test (meaning that the true alpha = nominal alpha), and it is asymptotically equivalent to the test performed by enumerating all of the possible arrangements. Ken K 19:45, 30 March 2006 (UTC)
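
To make the Monte Carlo point concrete, here is a minimal Python sketch of a two-sample Monte Carlo permutation test. It is an illustration under stated assumptions, not the article's method: the test statistic (absolute difference in group means), the function name, and the default number of resamples are all hypothetical choices. The p-value uses the common convention of counting the observed arrangement as one of the sampled arrangements, (hits + 1) / (n_resamples + 1), which is the convention under which the test does not exceed its nominal level.

```python
import random


def monte_carlo_permutation_pvalue(x, y, n_resamples=9999, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    Assumed statistic: absolute difference between the two group means.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    def abs_mean_diff(values):
        # First n_x values play the role of group x, the rest of group y.
        group_x, group_y = values[:n_x], values[n_x:]
        return abs(sum(group_x) / len(group_x) - sum(group_y) / len(group_y))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random reallocation of observations to groups
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs_mean_diff(pooled) >= observed - 1e-12:
            hits += 1
    # The observed arrangement is counted as one of the arrangements.
    return (hits + 1) / (n_resamples + 1)
```

Because the observed arrangement is included in the count, the returned value can never be smaller than 1 / (n_resamples + 1), so a p-value of exactly zero is impossible.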

Wikipedia wrote "An important consequence of the exchangeability assumption is that tests of difference in location (like a permutation t-test) require equal variance" I'm wondering... requires equal variance to infer what? Do you mean to draw an inference about the population from which the samples are drawn? Okay, maybe so. But there is a radically different way of thinking about permutation tests - as not only distribution free but POPULATION FREE. If the inference is limited to the sample at hand (or to put it a different way, if the entire population is being measured) then I don't see how equal variance is necessary. Why do we need statistical inference if we have the whole population? Because we need to know whether the difference between groups is plausibly attributable to chance (random assignment or simply chance factors).

Answer to the previous post: A test of group difference is not 'POPULATION FREE'. It is a test of whether the observed data belong to one population or to two different populations. This holds regardless of whether the test is parametric or non-parametric, and the requirement of exchangeability likewise does not depend on whether we regard the observed sample as a random sample from a larger population or as the population itself. For a comprehensive explanation of this, read the article by Welch. The requirement is also easy to understand from a concrete example of testing that two groups have the same mean. Assume that we have two samples (or two complete populations) with different variances, that we randomly draw one observation from the combined sample, and that this observation happens to lie in the tail of the combined distribution. A permutation test is a conditional test, which means that the marginal distribution of the combined sample is fixed. So if we observe an extreme value and know (for example) that the first group has a larger variance than the second, then even when the null hypothesis is true that observation is more likely to belong to the first group than to the second. This invalidates the basic assumption of a permutation test, namely that all permutations of the observed sample are equally probable when H0 is true. (Permuting, in this situation, is equivalent to allocating each observation to the first or the second group.) This means that if the two groups have very different variances, the significance from the permutation test of a group difference in means may be completely misleading. V.S. 28 July 2006
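
The effect described above can be checked with a small simulation. The Python sketch below is only an illustration: the sample sizes, the standard deviations, the normal populations, the test statistic (absolute difference in means), and all function names are assumptions chosen for the example. Both groups are drawn with the same mean, so the null hypothesis of equal means is true, and the function estimates how often a Monte Carlo permutation test rejects at the 5% level.

```python
import random


def permutation_pvalue(x, y, n_resamples, rng):
    """Monte Carlo permutation p-value for the absolute difference in means."""
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    total = sum(pooled)

    def abs_mean_diff(values):
        sum_x = sum(values[:n_x])
        return abs(sum_x / n_x - (total - sum_x) / n_y)

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


def rejection_rate(n_x, sd_x, n_y, sd_y, n_trials=300, n_resamples=199,
                   alpha=0.05, seed=1):
    """Share of simulated data sets, all with equal means, that are rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both groups have mean 0; only the spread may differ.
        x = [rng.gauss(0.0, sd_x) for _ in range(n_x)]
        y = [rng.gauss(0.0, sd_y) for _ in range(n_y)]
        if permutation_pvalue(x, y, n_resamples, rng) <= alpha:
            rejections += 1
    return rejections / n_trials
```

Comparing rejection_rate(5, 5.0, 25, 1.0), where the small group has the large variance, with rejection_rate(5, 1.0, 25, 1.0), where the variances are equal, shows how far the first case departs from the nominal 5% level even though the means are equal in both.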

the external link at the bottom of the page (to some random verizon user's page) is (i) broken and (ii) an advertisement (to a "Statistical Consultants for Clinical Trials, Legal Affairs, and Marketing." company). i suggest deletion. -c.w., nyc, Mon Sep 4 05:37:07 EDT 2006


This article is not very clear about what a permutation test actually is. I read the main section on permutation test several times and compared it to other sources and I'm still kind of fuzzy on it. Specifically I'm confused over how you compare the results after the permutations. Is it necessarily implied by permutation test that you order all of the test statistic values, find the number of t values "more extreme" than your t value (I'll call it k), and say that your confidence of the null hypothesis is (k/n!)? Or is that just one way to do it? -Anadverb 16:24, 24 September 2006 (UTC)
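
One way to picture the counting asked about above is the following minimal Python sketch of an exact two-sample permutation test by full enumeration. It is an illustration only: the statistic (absolute difference in means) and the function name are assumptions, and other statistics could be used in the same way. For two groups, the distinct arrangements are the C(n, n_x) ways of choosing which observations go to the first group; enumerating all n! orderings would give the same proportion. The proportion of arrangements at least as extreme as the observed one is the p-value.

```python
from itertools import combinations


def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation p-value for a difference in means.

    Assumed statistic: absolute difference between the two group means.
    """
    pooled = list(x) + list(y)
    n = len(pooled)
    n_x = len(x)
    n_y = n - n_x
    total = sum(pooled)

    def abs_mean_diff(sum_x):
        return abs(sum_x / n_x - (total - sum_x) / n_y)

    observed = abs_mean_diff(sum(x))
    extreme = 0
    arrangements = 0
    # Every choice of n_x positions is one allocation of data to group x.
    for idx in combinations(range(n), n_x):
        arrangements += 1
        stat = abs_mean_diff(sum(pooled[i] for i in idx))
        # Small tolerance so floating-point ties count as "at least as extreme".
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / arrangements
```

For example, exact_permutation_pvalue([1, 2, 3], [4, 5, 6]) returns 0.1: of the 20 possible arrangements, only the observed split and its mirror image give a mean difference as large as the observed one.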


Question

What's a "reference distribution"? There's no definition, and no wikipedia article on it.

Permutation test

With the aim of clarifying the permutation test, I added to its section a couple of paragraphs describing how the test is performed. In the near future, I could add an example. Gideon fell 14:09, 9 March 2007 (UTC)

Would be nice not only for this test but for the others as well. Stevemiller (talk) 04:15, 3 March 2008 (UTC)
Presumably the example is a test not of whether the samples come from the same distribution, but of whether they come from distributions with equal means. Should this be corrected? - Preceding unsigned comment added by 128.243.220.21 (talk) 15:46, 3 March 2008 (UTC)

Richard von Mises

"Richard von Mises was the first to conceive and apply the jackknife" I thought it was Quenouille. Googling quenouille jackknife and then "von mises" jackknife suggests I might be correct. Is there a source for the statement? —The preceding unsigned comment was added by Tolstoy the Little Black Cat (talkcontribs)

17:56, 14 March 2007 (UTC).

Misconceptions

Some misconceptions have crept into this article since I last read it.
Misconception 1: (some authors speak of permutation tests in this last case only, using the term randomization test in the previous situation).

Maybe they do, but then they don’t understand what a permutation test is, and it would be best to keep quiet about this. A permutation test is a test that derives the distribution of the test statistic from the permutation distribution defined by the observed data. When we perform the test in a practical situation, it may entail the enumeration of all permutations, or a random selection of them, but that does not mean that we have two different tests, nor two variants of the same test. The test itself is the same in both cases; the only difference is that in some situations we prefer to take a time-saving shortcut when we calculate the p-value of the test. So even if there is a difference (of no practical importance) on a practical level, they are the same test on the theoretical level.

This misconception is also shown in the sentence: This type of permutation test is known under various names: approximate permutation test, Monte Carlo permutation tests or random permutation tests[2].

There exists only one type of permutation test from this perspective. Even if there are a few alternative ways to calculate the p-value, this is only a matter of computational detail and does not give rise to different tests.

Misconception 2: The Student's t test is exactly a permutation test under normality and is thus relatively robust. The F-test (z-test) and chi-squared test are far from exact except for in large samples (n > 5, or 20).
The Student's t test is not a permutation test in any situation.
Valter Sundh 2007-04-14

Requests

  • "butcher knife" method [1] [2] — DIV (128.250.204.118 07:58, 20 July 2007 (UTC))

wha?

What is the meaning of the "(1 - )" in the description of permutation testing? 96.241.2.69 (talk) 04:16, 8 May 2008 (UTC)