User:Cryptfiend64/for school

From Wikipedia, the free encyclopedia

You are reading Hamlet, when you recall that one of your favorite math teachers said, "On average, Shakespeare writes 1,500 words per act, with a standard deviation of 125." You count the words in each act of Hamlet, and you find that there are, on average, 1,600 per act. You find this odd, and decide to read your statistics book to see if Shakespeare decided to make Hamlet extra long. To do this, you can use the 1-sample Z-test, which requires you to know the following:

  • σ (the population standard deviation, which is 125 in this case)
  • μ (the population mean, which is 1,500 in this case)
  • \overline{x} (the sample mean, which is 1,600 in this case)
  • n (the number of samples we took; in our case, the "samples" are each Act, and there are five Acts in Hamlet, so n = 5)

When using any Z-test, we assume that the population standard deviation (σ) is already known (usually not a realistic assumption). A useful rule of thumb is the 68-95-99.7 rule. It states that for any normal distribution, about 68% of the values lie within one standard deviation of the mean. For example, in the above problem, about 68% of acts would have word counts in the interval 1500\pm125, or (1375, 1625). Similarly, about 95% of the values lie within 2 standard deviations of the mean, and about 99.7% lie within 3 standard deviations. The sample mean \overline{x} of n acts is less spread out than a single act: its standard deviation is σ/\sqrt{n}.
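The rule can be checked numerically. The sketch below is in Python and uses the standard library's `statistics.NormalDist` (Python 3.8+). It assumes that word counts per act follow a normal distribution with the teacher's μ = 1500 and σ = 125, which the story only asserts. The last line shows that the mean of five acts varies much less, with standard deviation σ/√5.

```python
import math
from statistics import NormalDist

# Assumption: words per act are normal with the teacher's mean and SD.
MU, SIGMA = 1500, 125
words_per_act = NormalDist(mu=MU, sigma=SIGMA)

def share_within(k):
    """Fraction of the distribution lying within k standard deviations of the mean."""
    low, high = MU - k * SIGMA, MU + k * SIGMA
    return words_per_act.cdf(high) - words_per_act.cdf(low)

for k in (1, 2, 3):
    print(f"within {k} SD ({MU - k * SIGMA} to {MU + k * SIGMA}): {share_within(k):.4f}")
# within 1 SD (1375 to 1625): 0.6827
# within 2 SD (1250 to 1750): 0.9545
# within 3 SD (1125 to 1875): 0.9973

# Standard deviation of the mean of n = 5 acts is sigma / sqrt(n)
print(f"SD of the five-act mean: {SIGMA / math.sqrt(5):.1f}")  # 55.9
```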

We look at our original problem again. To decide whether Shakespeare did something unusual when writing Hamlet, we need two statistical hypotheses, H0 and Ha. H0 is known as the null hypothesis, and Ha is known as the alternative hypothesis. The null hypothesis states that the population mean is equal to the previously established value; in our case, it is μ = 1500. To have something to weigh the null hypothesis against (it seems to be false in our case), we formulate an alternative hypothesis, which in our case is μ > 1500.

The purpose of a 1-sample Z-test is to see whether the data give strong enough evidence to reject the null hypothesis. To see how that works, it's best to try a problem. First of all, the formula for the 1-sample Z-test statistic is:

z = \frac{\overline{x}-\mu}{\sigma/\sqrt{n}}
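As a quick check of the formula, here is a minimal Python sketch; the function name `one_sample_z` is only an illustrative choice. Its denominator is σ/√n, the standard deviation of the sample mean, and with the Hamlet numbers (n = 5 acts) it reproduces the z-value of 1.7889 used below.

```python
import math

def one_sample_z(x_bar, mu, sigma, n):
    """Test statistic of a 1-sample Z-test; sigma is the known population SD."""
    standard_error = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return (x_bar - mu) / standard_error

# Hamlet: sample mean 1600 words, mu = 1500, sigma = 125, n = 5 acts
z = one_sample_z(1600, 1500, 125, 5)
print(round(z, 4))  # 1.7889
```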

This value z is simply a standardized value. If you look at Table A in the front of your statistics book, you will see a Z-value chart. The values down the left column give the z-value up to the tenths place, and the values along the top row give the hundredths digit. For example, to find the area corresponding to a z-value of -1.91, you would look for -1.9 in the left column, then for 0.01 along the top. The number there is 0.0281. This means that the area to the left of z = -1.91 under the standard normal curve is 0.0281.

On a TI-83/84 graphing calculator, normalcdf(-1E99,Z) finds the area to the left of the value Z on the standard normal curve.
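Outside of the calculator, the same lookup can be sketched in Python with `statistics.NormalDist().cdf`, which integrates from negative infinity, so no stand-in like -1E99 is needed. The helper name `area_to_left` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_to_left(z):
    """Area under the standard normal curve to the left of z, as listed in Table A."""
    return standard_normal.cdf(z)

print(f"{area_to_left(-1.91):.4f}")  # 0.0281
```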

Let's look at the Hamlet problem again. Through calculation, you find that the z-value is 1.7889. Rounding to 1.79 and looking it up in the chart, you find that the area to its left is 0.9633. This area is not yet the p-value, so we must interpret it. We are assuming that the null hypothesis (that the population mean equals 1,500) is true, because we used μ = 1,500 when we computed z. Under that assumption, values of \overline{x} below 1,600 make up 0.9633 of all possible values, and values above 1,600 make up the remaining 0.0367. This means that if the population mean really were 1,500, there would be a 3.67% chance of getting \overline{x}\ge1600. That probability, 0.0367, is the p-value.

Statisticians compare the p-value to a significance level α, which is usually 0.05 or 0.01. If the p-value is less than α (0.05 for most experiments, or 0.01 when, for example, lives are at stake), the results are considered statistically significant, and you can reject the null hypothesis. In this case, our p-value is 0.0367, which is less than 0.05, so we reject the null hypothesis at the α = 0.05 level. (It is not less than 0.01, so we would not reject it at α = 0.01.) Rejecting is never risk-free: when the null hypothesis is actually true, a test at α = 0.05 will still reject it 5% of the time.
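The whole one-sided test can be sketched in Python as follows, assuming σ is known and Ha: μ > μ0 as in the Hamlet problem; `z_test_upper_tail` is a hypothetical helper name. Because it computes the area to the right of z exactly instead of reading a rounded table entry, it gives a p-value of about 0.0368, which leads to the same decision at α = 0.05.

```python
import math
from statistics import NormalDist

def z_test_upper_tail(x_bar, mu0, sigma, n, alpha=0.05):
    """One-sided 1-sample Z-test of H0: mu = mu0 against Ha: mu > mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value, p_value < alpha  # True means "reject H0"

for alpha in (0.05, 0.01):
    z, p, reject = z_test_upper_tail(1600, 1500, 125, 5, alpha)
    print(f"alpha = {alpha}: z = {z:.4f}, p = {p:.4f}, reject H0: {reject}")
# alpha = 0.05: z = 1.7889, p = 0.0368, reject H0: True
# alpha = 0.01: z = 1.7889, p = 0.0368, reject H0: False
```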

Problems

  • There is a normal curve with mean 10 and standard deviation 1. What percentage of the values lie in the interval (8, 12)?
    1. 50%
    2. 60%
    3. 95%
    4. 99.7%
    5. 10 out of every 13
  • Say you calculated a z-score of 2.15. What is the area to the left of this value on a normal curve?
    1. .9824
    2. .9842
    3. .9991
    4. .9142
    5. 9.82%

You have recovered from your awesome discovery, and you decide to go on another quest of statistical inference. This time, you decide to work on your Culminating project, which deals with the intelligence of teachers and professors. You have each of your (non-math) teachers take an extremely hard calculus test. Their scores are the following:

  • 56, 61, 46, 62, 50

You wonder what the actual average score of non-math teachers in Stafford High School would be.

If you've ever watched the news, you've seen Gallup Polls (which are covered in a later chapter). They always report a margin of error, usually around 3%. Say a poll asked, "Do you support President Bush?" and the results were 47% yes and 53% no. Because of the margin of error, the share of yes answers in the real population could be anywhere from 44% to 50%, and the share of no answers anywhere from 50% to 56%. A confidence interval, although a little more complicated than that, has the same general characteristics. We need the population standard deviation (σ), the sample mean, the number of subjects in our sample, and an upper critical value (written z^{*}, or "Z star"). The upper critical values can be found in Table C in the back of your book; if we want 95% confidence in our interval, we must use an upper critical value of 1.96. The formula for finding a confidence interval is as follows:

\overline{x} \pm z^{*}\frac{\sigma}{\sqrt{n}}

This seems reasonable: we start out with the sample mean (\overline{x}), and we add and subtract the margin of error, z^{*}\sigma/\sqrt{n}, to find an interval of values in which the actual population mean probably lies. It is helpful to know the statistical definition of "95% confidence":

If we were to take a large number of random samples from a population and compute a confidence interval from each one, about 95% of those intervals would contain the true population mean.
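The definition can be illustrated with a small simulation. This Python sketch assumes a normal population with a made-up true mean of 50 and σ = 12 (σ borrowed from the example below). It draws many samples of size 5, builds a 95% interval from each, and counts how often the interval covers the true mean. It also computes z* with `NormalDist().inv_cdf` instead of reading Table C; for 95% this gives about 1.95996, which the table rounds to 1.96. The helper name `coverage_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, confidence=0.95, trials=10_000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu.

    Samples are drawn from a normal population with known sigma,
    matching the assumptions of the z-interval.
    """
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # upper critical value
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

# Hypothetical population: true mean 50, sigma 12, samples of 5
print(coverage_rate(mu=50, sigma=12, n=5))  # prints a value close to 0.95
```

With 10,000 trials, the printed share typically lands within about half a percentage point of 0.95; it will not be exactly 0.95 because each run is random.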

Anyway, let's go back to our test score problem. Say we know from previous experience that the standard deviation of the population is 12. We want a 95% confidence interval for the mean test score of the population (which consists of the non-math teachers in Stafford High School, assuming our five teachers are a random sample). We can then define all values:

  • \overline{x} is 55, since (56 + 61 + 46 + 62 + 50)/5 = 55
  • z * is 1.96, because we are calculating a 95% confidence interval
  • σ is 12, as defined previously
  • n is 5, because we had five teachers take the test

So, the interval is:

55 \pm 1.96\cdot\frac{12}{\sqrt{5}}

55 is the sample mean, and we can calculate the margin of error. After calculations, the answer turns out to be:

55 \pm 10.518, or an interval of (44.482, 65.518)
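The arithmetic above can be checked with a short Python sketch that uses the scores from the text, the assumed σ = 12, and the table value z* = 1.96.

```python
import math
from statistics import fmean

scores = [56, 61, 46, 62, 50]  # calculus test scores from the text
sigma = 12                     # population SD, assumed known
z_star = 1.96                  # upper critical value for 95% confidence (Table C)

x_bar = fmean(scores)
margin = z_star * sigma / math.sqrt(len(scores))
low, high = x_bar - margin, x_bar + margin
print(f"{x_bar} ± {margin:.3f}, or ({low:.3f}, {high:.3f})")
# 55.0 ± 10.518, or (44.482, 65.518)
```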

So we are 95% confident that the population mean score lies between 44.482 and 65.518: if we took repeated samples, about 95% of the intervals built this way would contain the true population mean. It looks like all of the teachers failed the test.

Problems

  • Say you were a forest ecologist, and you decided to visit four rain forests. You counted 14, 20, 21, and 37 boll weevils in the forests. However, scientific evidence suggested that there were, on average, 20 boll weevils in a rain forest, with a standard deviation of 3.4. Perform a 1-sample z-test on these values.
  • You then decided to pursue your other hobby of translating the Bible. After translating the Bible five times, you found 101, 210, 331, 142, and 121 mistakes in your translations, compared with the King James Version. We know that the standard deviation of translation mistakes is about 10.2. Construct a 95% confidence interval with these values.