Talk:Sample size

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: Start Class High Priority  Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

Rule of Thumb in an article about statistics!!

Under "Estimating proportions", a paragraph begins "The rule of thumb for (a maximum or 'conservative')"

I think Deming is rolling over in his grave! 70.22.59.134 (talk) 20:33, 8 April 2008 (UTC)

Maximum Error

I see someone has changed the inequalities around again. They were set the way they were because, if we want a certain maximum error epsilon, we require the half-width of the CI to be at most epsilon, i.e. B \leq epsilon. This gives a minimum required sample size, not the maximum that the inequalities in their current state would suggest. I can see no interpretation that would lead us to set these inequalities the other way around, particularly one that would give us a maximum value of n rather than a minimum! HyDeckar 00:48, 22 March 2007 (UTC)

Look, I'm really quite confident about this - if anyone feels like responding, I'm more than willing to figure out what is right, but I'll change it for now (but come back at me here rather than start a possible 'edit war') HyDeckar 14:22, 23 March 2007 (UTC)

This is, I'm afraid, a fundamental error of interpretation on your part. You want to be able to say that the sampling error from your procedure is no larger than B, i.e. \leq B. The sample size is still the minimum required to assure that. As n increases, B and, consequently, the error decrease. -MBHiii 12:41, 30 March 2007 (UTC)

I still disagree - I am claiming that the sampling error is no larger than epsilon (as *epsilon* -not B- is the required maximum error, B is simply a working variable which is the half width of the CI). Therefore, we derive an inequality which must be satisfied by n for this to occur. It is clear that this inequality _must_ be of the form n \geq some value, as small n leads to larger error. HyDeckar 12:32, 6 April 2007 (UTC)
Now I see what you're doing. The confusion for me was in using epsilon for a fixed quantity. I contend epsilon's usually used for a random variable denoting error. Calling B a variable is also non-standard (B is for "Bound"), and it's a simple function of parameters, not variables. By calling B a variable, you require the concoction of a new fixed boundary, your epsilon. The fewer the steps, the better. I still contend you should use epsilon for sampling error, a variable, and B for the bound on that error. I see what you want: a small error pushes up the required n. My formulation turns the emphasis around: allowing large error pushes down the required n. Both are correct, but I contend mine is more standard and focuses on limiting the random error (epsilon) using a value of B that's determined before starting. MBHiii 20:59, 6 April 2007 (UTC)
Ok, that seems fair, but I contend that it is hardly clear to someone unfamiliar with the subject what exactly is going on. Therefore, I've changed these inequalities to approximations, as that at least is unambiguous in meaning. HyDeckar 12:44, 7 April 2007 (UTC)
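
For readers following the thread above, here is a minimal sketch of the point both sides agree on: fixing a maximum error in advance yields a lower bound on n. It assumes a normal-approximation confidence interval with known standard deviation, so the half-width is B = z * sigma / sqrt(n). The function names `half_width` and `min_sample_size` are hypothetical, and the example value sigma = 0.5 is an assumption (it is the worst case for a proportion, where p = 0.5).

```python
import math
from statistics import NormalDist


def half_width(n, sigma, confidence=0.95):
    """Half-width B of a normal-approximation CI for the mean (assumed form)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


def min_sample_size(epsilon, sigma, confidence=0.95):
    """Smallest integer n whose half-width B does not exceed epsilon.

    Solving z * sigma / sqrt(n) <= epsilon for n gives
    n >= (z * sigma / epsilon) ** 2, a lower bound on n.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / epsilon) ** 2)


# Assumed example: maximum error 0.05 at 95% confidence, sigma = 0.5.
n = min_sample_size(epsilon=0.05, sigma=0.5)
print(n, half_width(n, 0.5), half_width(n - 1, 0.5))
```

Under these assumptions, any n at or above the returned value keeps the half-width within the chosen error, while n - 1 does not, which is why the inequality reads n \geq some value rather than n \leq some value.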

Question

I am unsure about the statement

Note, if the mean is to be estimated using P parameters that must first be estimated themselves from the same sample, then sample size should be n+P.

If we mean to increase the sample size by P so as to maintain enough degrees of freedom, then we would arguably need to consider the use of degrees of freedom throughout, i.e. t distributions, etc. Also, such a situation would typically involve covariates, which may allow us greater accuracy than the CI given here. Given all of this, I'm just not sure that this comment is a useful one to have here.

An article on sample size is a good place to introduce the concept of degrees of freedom. Suggest:

Note, if the mean is to be estimated using P parameters that must first be estimated themselves from the same sample, then to preserve sufficient "degrees of freedom" sample size should be at least n+P.

Looks good HyDeckar 00:48, 22 March 2007 (UTC)


There seems to be some information missing in the Required Sample Sizes for Hypothesis testing section - the formulae seem to be missing, unless I'm reading it incorrectly or it isn't displayed on my screen. The proof doesn't seem to be appearing! Vickie —Preceding unsigned comment added by 163.156.240.17 (talk) 10:11, 13 February 2008 (UTC)

Old Stuff

Can someone please stop User:Lgallindo from vandalizing this page! I see he has problems with someone else vandalizing a page on sampling, but this has nothing to do with that. -- Mbhiii 18:23, 23 October 2006 (UTC)

New Version

New version now online HyDeckar 16:36, 20 March 2007 (UTC)

I have written a tentative new version of this page; it is available for comment on my user page. Unless I get a huge negative response, I'll load it up in a couple of days. HyDeckar 15:10, 19 March 2007 (UTC)

It seems like a big improvement to me. I say go for it. -- Avenue 03:06, 20 March 2007 (UTC)

What you wrote is good with improved notation and generally, BUT you DELETED the useful "rule of thumb" and its derivation - a bad move on your part. I'm restoring it. --63.98.135.196 19:47, 20 March 2007 (UTC)

Sorry about that, accidental 'friendly fire' - I've rehashed the "rule of thumb" (not under that name) to line up with the rest of the article stylewise. HyDeckar 08:28, 21 March 2007 (UTC)

Clarification

This article is crazy-confusing. Piuro 22:17, 23 October 2006 (UTC)

Please do not remove the confusing tag until the article is much clearer. Piuro 19:12, 24 October 2006 (UTC)


Hi, I hope this works. The main ideas I'm trying to answer are (1) just what does "sample size" mean, (2) what are its effects, and (3) how can you estimate it? I think the first paragraph answers (1), the first and second answer (2), and the third and fourth answer (3). --Mbhiii 16:23, 25 October 2006 (UTC)


Hello Khatru2, thanks for bolding "Sample size", but you should know I wrote every word of Sample size and take responsibility for it. That's what my (or anyone else's) signature means, so please leave it. It provides a quick link to my contact information for further, detailed or ancillary discussion. --Mbhiii 12:40, 26 October 2006 (UTC)

Whoops, as per Wikipedia:Ownership of articles, no signature. --Mbhiii 17:16, 26 October 2006 (UTC)

Notation

There is a misconception that the sample size is denoted N, but all serious sampling texts (e.g. Cochran, or Särndal et al.) use n for the sample size and N for the population size. I accordingly changed N to n throughout the article. However, 68.221.1.30 (talk) changed it back to N. I see this as a very retrograde step, which is likely to confuse our readers. It was only slightly mitigated by the addition of an end note that N usually denotes the population size. Another note was added to say that N is being used for legibility.

Why should we use the wrong notation? Legibility is not a good reason; lower case italics are the symbols used most often in mathematics, so anyone who will understand the formulae should be very used to reading this sort of text. A note saying that we are doing this intentionally does not solve the problem; it just makes us look foolish rather than ignorant. -- Avenue 01:06, 4 February 2007 (UTC)

Fixed. Hopefully, it'll stay that way. — Xaonon (Talk) 18:33, 19 February 2007 (UTC)
I'll leave it. My eyes are old, so legibility is a top priority. It'd be nice if someone made the small n larger. --mbhiii 15:16, 14 March 2007 (UTC)
Did you know that you can increase the size of displayed text in most browsers by pressing the Control and "+" keys simultaneously? (See the links at Computer_accessibility#Web_browser_accessibility_features for a lot more information.) While I agree it's important to keep accessibility in mind, I don't think changing the size of individual elements such as the small n is a good idea. -- Avenue 21:58, 14 March 2007 (UTC)