Talk:Student's t-distribution

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.


This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.

Mathematics rating:

B+ Class

Top Priority

Field: Probability and statistics

One of the 500 most frequently viewed mathematics articles.

Please update this rating as the article progresses, or if the rating is inaccurate.
Please add to or update the comments to suggest improvements to the article.
Consider a GA nomination! Geometry guy 01:31, 5 June 2007 (UTC)

1 Miscellany
2 Should we put an index n
3 Definition of ψ and B
4 Table
5 Missing definition of F_1 in table
6 Expected Mean
7 reasons for my recent reversion
8 Reason for reversion on 11-Jun-2006
9 Alternate forms of the t-table
10 comment moved from article
11 Better explanation
12 plots under student's t are not comprehensible
13 Redirect for t-value?
14 Explanation of 80% confidence interval
15 Confidence interval: 80% or 90%
16 Used Template:Abramowitz_Stegun_ref to make the A&S reference clickable
17 Slash in denominator: double division? Is the formula correct?
18 Remove external link to Shaw's paper about the capital-T statistic?
19 Clear definitions of the quantities in the table
20 Student did not actually present the t-statistic in 1908
21 Please comment if you have an opinion on the opening section
22 t-table
23 Article hard to understand?
24 Incorrect pdf Formula
25 what does "t" stand for?
26 standardization
27 p-value and A(t | ν) ambiguous
28 A(t|nu) discussion not very enlightening
29 Examples are always helpful
30 Error in example after table?
31 Corrected error in table.
32 please explain the formulas
33 Table of Student t's

Miscellany

The mgf diverges; a note has been made in the summary table. WT

This page contains special entities (characters) that cannot be displayed on many browsers. See Maxwell's equations for a way to fix this. David 21:17 Oct 15, 2002 (UTC)

As of the beginning of 2003, there is a much better way to fix that problem, and this page now (in displayed math, as opposed to symbols embedded in lines of text) takes advantage of that advance. Michael Hardy 01:14 Feb 9, 2003 (UTC)

Should we put an index n

Should we put an index n on our sample variance S²? AxelBoldt 07:58 Feb 9, 2003 (UTC)

I agree with AxelBoldt: the n subscript is confusing. I prefer n-1 or no subscript. The n subscript is usually used when you want to highlight the fact that you divided by n instead of n-1 when calculating the sample variance. It's especially confusing if you are trying to cross-reference the Wikipedia article on variance; an extract from that article follows:

$s_n^2 = \frac 1n \sum_{i=1}^n \left(y_i - \overline{y} \right)^ 2 = \left(\frac{1}{n} \sum_{i=1}^{n}y_i^2\right) - \overline{y}^2,$

and

$s^2 = \frac{1}{n-1} \sum_{i=1}^n\left(y_i - \overline{y} \right)^ 2 = \frac{1}{n-1}\sum_{i=1}^n y_i^2 - \frac{n}{n-1} \overline{y}^2,$

Both are referred to as sample variance. Most advanced electronic calculators can calculate both $s_n^2$ and $s^2$ at the press of a button, in which case that button is usually labeled $\sigma^2$ or $\sigma_n^2$ for $s_n^2$ and $\sigma_{n-1}^2$ for $s^2$.

--Jconnolly 02:20, 27 February 2007 (UTC)
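A small Python sketch of the two formulas quoted above, on a made-up data set. Python's standard `statistics` module happens to expose both conventions, `pvariance` (divisor n, i.e. $s_n^2$) and `variance` (divisor n-1, i.e. $s^2$), so the hand-written versions of each formula can be checked against it.

```python
import math
import statistics

def var_divisor_n(ys):
    """s_n^2: sum of squared deviations from the mean, divided by n."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / n

def var_divisor_n_minus_1(ys):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

def var_divisor_n_shortcut(ys):
    """Right-hand form of the s_n^2 formula: mean of squares minus squared mean."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum(y * y for y in ys) / n - ybar ** 2

def var_divisor_n_minus_1_shortcut(ys):
    """Right-hand form of the s^2 formula."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum(y * y for y in ys) / (n - 1) - n / (n - 1) * ybar ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample with mean 5
s_n2 = var_divisor_n(data)          # 32 / 8 = 4
s2 = var_divisor_n_minus_1(data)    # 32 / 7
print(s_n2, s2, statistics.pvariance(data), statistics.variance(data))
```

The one-pass shortcut forms agree with the two-pass forms here, but subtracting two large, nearly equal quantities can lose precision when the mean is large relative to the spread.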

Now I've done that. There are times when that matters; those are when the dependence on the sample-size n is important, especially when a limit as n approaches infinity is to be mentioned. Perhaps this article is not such an occasion, especially considering that Student's distribution is most important when n is small. But I've already put the subscript on the sample mean, so consistency makes it preferable to put it on the sample variance too. And it might also not hurt to mention the limiting distribution as the number of degrees of freedom grows. Michael Hardy 21:29 Feb 9, 2003 (UTC)
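The limiting behaviour mentioned in the last sentence (the t-distribution approaches the standard normal as the degrees of freedom grow) can be illustrated with a short Python sketch. It evaluates the t density $f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}$ on a grid and reports the largest gap from the standard normal density; the grid and the chosen values of ν are arbitrary.

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

std_normal = NormalDist()                 # mean 0, standard deviation 1
grid = [i / 10 for i in range(-40, 41)]   # t from -4 to 4 in steps of 0.1

gaps = []
for nu in (1, 5, 30, 1000):
    gap = max(abs(t_pdf(x, nu) - std_normal.pdf(x)) for x in grid)
    gaps.append(gap)
    print(f"nu = {nu:4d}: max |t density - normal density| = {gap:.5f}")
```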

I'm a French user named Thorin. You might be interested in reading my article on Student's distribution: fr:Loi de Student.

I have noticed in the history that someone mentioned finding a wrong value in the table. Related or not, I personally think the table isn't wrong, but the example below it is not quite right.

The definition of $t_{k,x}$ is that if $T$ is a variable following Student(k), then the probability that $T > t_{k,x}$ is equal to x.

Symmetrically, the probability that $T < -t_{k,x}$ is also equal to x.

This means that the probability that $-t_{k,x} < T < t_{k,x}$ is equal to $1 - 2x$ (and not to $1 - x$, as the example assumes). It is only the probability that $T < t_{k,x}$ that is equal to $1 - x$.

This means the confidence level shown in the table is the confidence level for the one-sided statement $\mu < \overline{X}_n + A\sqrt{S/n}$, but it is not the confidence level for the interval $[\overline{X}_n - A\sqrt{S/n},\ \overline{X}_n + A\sqrt{S/n}]$ (hence the change from 90% to 80% that I made in the article).
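Thorin's point can be checked numerically. The Python sketch below uses only the standard library: it integrates the t density with Simpson's rule to get the CDF, inverts that by bisection to find $t_{k,x}$, and then compares the one-sided and two-sided probabilities for k = 10, x = 0.10. The helper names `t_pdf`, `t_cdf` and `t_ppf` are illustrative; in practice a statistics library would supply these functions.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=1000):
    """P(T <= x) = 1/2 + integral of t_pdf from 0 to x, by composite Simpson's rule."""
    h = x / steps  # negative for x < 0, which mirrors the integral by symmetry
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_ppf(p, nu, lo=-50.0, hi=50.0):
    """Quantile: solve t_cdf(x, nu) = p by bisection (assumes the root lies in [lo, hi])."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k, x = 10, 0.10
t_kx = t_ppf(1 - x, k)                        # P(T > t_kx) = x
upper_tail = 1 - t_cdf(t_kx, k)               # x
lower_tail = t_cdf(-t_kx, k)                  # also x, by symmetry
central = t_cdf(t_kx, k) - t_cdf(-t_kx, k)    # 1 - 2x, not 1 - x
print(f"t_(k,x) = {t_kx:.5f}; tails {upper_tail:.3f} and {lower_tail:.3f}; central {central:.3f}")
```

So with a one-sided table, the symmetric interval built from the 90% entry has 80% two-sided coverage.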

I just did a few minor edits to fr:Loi de Student. My feeble grasp of French is such that I would not attempt more than very minor edits. Please note that in TeX you don't need to write ">=". I was identified in the edit history only by the IP number 128.101.152.68. Michael Hardy 19:48, 19 October 2005 (UTC)

THORIN writes: I'll give a few references to back me up (sorry, I'm not yet familiar with external links).

The first reference is in English:

http://www.itl.nist.gov/div898/handbook/eda/section3/eda3672.htm

I draw your attention to the following quote, and to the graph you will find above the table:

"The most commonly used significance level is alpha = 0.05. For a two-sided test, we compute the percent point function at alpha/2 (0.025)."

Their formalism, as well as their variable names seem to be pretty close to yours.

Here's the second reference (in French, sorry):

http://rfv.insa-lyon.fr/~jolion/STAT/node144.html

It counts confidence level the other way around, but that is not the only difference. This alternative table is built for a formalism that is different from ours, so you'll notice that it is their .20 column which corresponds to your .90 column. In fact their table counts twice the probability corresponding to $\mathrm{UCL}_{1-\alpha}$ (.20 = 2(1 - .9)).

The third reference is the one I've been using as my main source.

http://newton.mat.ulaval.ca/pages/belisle/Notes-tableaux/Lois-khi2-t-F.pdf

It confirms the two others.

THORIN writes: thanks for the tip on how to adjust bracket size, Michael. It seems that now we should focus on the graphs.

What is the meaning of the psi and B functions in the entropy? I haven't seen them defined anywhere. User:ThorinMuglindir

Those denote the digamma function and Beta function, respectively. --MarkSweep✍ 23:49, 23 October 2005 (UTC)

Thanks Mark, I'll just say that in the article (ThorinMuglindir 22:01, 27 October 2005 (UTC))
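For reference, the entropy (in nats) of Student's t is commonly written $h = \frac{\nu+1}{2}\left[\psi\left(\frac{1+\nu}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right] + \ln\left[\sqrt{\nu}\,B\left(\frac{\nu}{2},\frac{1}{2}\right)\right]$, with ψ the digamma function and B the Beta function. Python's standard library provides neither, so this sketch builds ψ from the recurrence $\psi(x) = \psi(x+1) - 1/x$ plus its asymptotic series, and ln B from `math.lgamma`. Two sanity checks: at ν = 1 (the Cauchy case) the entropy should equal $\ln(4\pi)$, and for large ν it should approach the normal value $\tfrac12\ln(2\pi e)$.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x up via psi(x) = psi(x + 1) - 1/x, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    # 1/(12x^2) - 1/(120x^4) + 1/(252x^6) - 1/(240x^8) + 1/(132x^10), in nested form
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series

def log_beta(a, b):
    """ln B(a, b) computed from log-gamma values."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def t_entropy(nu):
    """Differential entropy (nats) of Student's t with nu degrees of freedom."""
    return ((nu + 1) / 2 * (digamma((nu + 1) / 2) - digamma(nu / 2))
            + 0.5 * math.log(nu) + log_beta(nu / 2, 0.5))

print(t_entropy(1), math.log(4 * math.pi))
print(t_entropy(1e6), 0.5 * math.log(2 * math.pi * math.e))
```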

Definition of $ψ$ and $B$

Mark, I have to disagree with you reverting my change in the table. First, we simply can't put a formula in an article that uses non-defined items. So that links to the definition of psi and B have to appear some place or the other. I'll thus revert the change.

If you really think that these can't be in the table, then I undertake to move them to some other part of the article rather than deleting them outright. If you wish to do this, I draw your attention to two points. First, putting that in the article's body will make the expression less straightforward for a reader, and will force you to add a "see text for definitions of psi and B" note in the table, which will hardly take less room than the links themselves. Second, if this table is intended as a summary, then it should be self-sufficient to the greatest extent possible, which for me speaks in favor of putting links to the two functions inside the table... ThorinMuglindir 08:22, 28 October 2005 (UTC)

Could the confidence limit section be made clearer with regard to confusion of one-tailed and two-tailed confidence limits? Under the heading "Confidence intervals derived from Student's t-distribution" it currently says "The interval whose endpoints are ... where A is an appropriate percentage-point of the t-distribution, is a confidence interval for μ." I think if you want a 95% confidence interval, then the "appropriate percentage-point" is 97.5% (right?) But the sentence as it currently stands can very easily be misinterpreted as meaning that where A is the 95% percentage point, then the given interval is a 95% confidence interval. If I'm right, how about putting immediately after this sentence, "For example, if A is the value of the t-distribution for the 97.5% percentage point, then the interval is a 95% confidence interval." Cathy Woodgold 2005 Nov 8, 00:02 UT

I've taken the liberty of changing the sentence so it reads "Therefore the interval whose endpoints are ... is a 90% confidence interval for mu." because that is what the previous equation means: the probability that mu lies in the stated interval is 0.9. Simon Duane 2005 Nov 29, 14:53 UT
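To make the point about percentage points concrete: for a two-sided 95% interval, A should be the 97.5% point of the t-distribution, and using the 95% point instead yields only a 90% interval. This self-contained Python sketch (standard library only; the Simpson-rule CDF and bisection quantile are illustrative helpers) checks both cases for ν = 10.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=1000):
    """P(T <= x) = 1/2 + integral of t_pdf from 0 to x, by composite Simpson's rule."""
    h = x / steps  # negative for x < 0, which mirrors the integral by symmetry
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_ppf(p, nu, lo=-50.0, hi=50.0):
    """Quantile: solve t_cdf(x, nu) = p by bisection (assumes the root lies in [lo, hi])."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_sided_coverage(a, nu):
    """P(-a < T < a): confidence level of the interval mean +/- a * S / sqrt(n)."""
    return t_cdf(a, nu) - t_cdf(-a, nu)

nu = 10
a_975 = t_ppf(0.975, nu)  # 97.5% point: alpha/2 = 0.025 in each tail
a_95 = t_ppf(0.95, nu)    # 95% point
print(f"A = {a_975:.3f}: coverage {two_sided_coverage(a_975, nu):.3f}")
print(f"A = {a_95:.3f}: coverage {two_sided_coverage(a_95, nu):.3f}")
```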

Table

OK, the table was not as grossly wrong as I thought, but it was badly explained. I will replace it with a further edited version. Michael Hardy 00:46, 8 November 2005 (UTC)

In the example for the table it says:

So that at 90% confidence, we have a true mean lying between

$10\pm1.37218 \frac{\sqrt{2}}{\sqrt{11}}=[9.41490,10.58510]$

But in the case of a two-sided confidence interval, it should change from 90% to 80% if one still uses the value for the 90% one-sided confidence interval. Please revise this. 137.132.3.12 14:14, 3 May 2006 (UTC) Konrad
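The arithmetic in the quoted example is easy to reproduce. This Python sketch uses the example's numbers (sample variance 2, sample mean 10, n = 11, and A = 1.37218, the one-sided 90% point for 10 degrees of freedom); as discussed above, the resulting symmetric interval has 80%, not 90%, two-sided coverage.

```python
import math

mean, variance, n = 10.0, 2.0, 11
a = 1.37218                                 # one-sided 90% point, 10 degrees of freedom
half_width = a * math.sqrt(variance) / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width
print(f"[{lower:.5f}, {upper:.5f}]")        # [9.41490, 10.58510]
```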

Missing definition of F_1 in table

Hi, the cdf stated in the summary table uses the function F_1, which I couldn't find a definition for anywhere in the article. Thanks for any explanation; maybe someone knows what the cdf should look like. Thanks, --ISee 13:13, 2 February 2006 (UTC)

I added the link to the hypergeometric function. Pdbailey 20:32, 20 April 2006 (UTC)
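For readers looking for the definition: the function in the cdf is the Gauss hypergeometric function ${}_2F_1$, and the cdf is commonly written $F(t) = \frac12 + t\,\frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}\,{}_2F_1\left(\frac12, \frac{\nu+1}{2}; \frac32; -\frac{t^2}{\nu}\right)$. The Python sketch below sums the hypergeometric series directly, which is only valid when $t^2 < \nu$, and checks it against the elementary closed forms for ν = 1 and ν = 2. The function names are made up for this example.

```python
import math

def hyp2f1(a, b, c, z, tol=1e-16, max_terms=10000):
    """Gauss hypergeometric series 2F1(a, b; c; z); the series converges only for |z| < 1."""
    if abs(z) >= 1:
        raise ValueError("series form needs |z| < 1")
    term, total = 1.0, 1.0
    for k in range(max_terms):
        # ratio of consecutive terms: (a+k)(b+k) / ((c+k)(k+1)) * z
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol:
            break
    return total

def t_cdf_hyp(t, nu):
    """CDF of Student's t via 2F1; this series form requires t*t < nu."""
    coef = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi * nu)
    return 0.5 + t * coef * hyp2f1(0.5, (nu + 1) / 2, 1.5, -t * t / nu)

# nu = 1 (Cauchy): F(t) = 1/2 + arctan(t)/pi;  nu = 2: F(t) = 1/2 + t / (2 sqrt(t^2 + 2))
print(t_cdf_hyp(0.5, 1), 0.5 + math.atan(0.5) / math.pi)
print(t_cdf_hyp(1.0, 2), 0.5 + 1 / (2 * math.sqrt(3)))
```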

Expected Mean

I'm not entirely sure, but isn't the expected mean only defined for degrees of freedom > 1? Student's t with one degree of freedom is supposed to be equivalent to the standard Cauchy distribution, which definitely has an undefined mean.

I would update the page (it's only a minor edit) if I were more sure about my math. Pcastine 18:06, 30 March 2006 (UTC)
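The recollection above is right: with one degree of freedom the t density reduces to the standard Cauchy density $1/(\pi(1+t^2))$, and the mean exists only for ν > 1. The Python sketch checks the first claim pointwise and illustrates the second with the closed-form truncated integral $\int_{-L}^{L} |t|\,\frac{dt}{\pi(1+t^2)} = \frac{\ln(1+L^2)}{\pi}$, which grows without bound as L increases.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def cauchy_pdf(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))

max_diff = max(abs(t_pdf(x, 1) - cauchy_pdf(x)) for x in (-3.0, -0.5, 0.0, 1.0, 7.5))
print("largest pointwise difference:", max_diff)

def truncated_abs_moment(L):
    """Integral of |t| * cauchy_pdf(t) over [-L, L], in closed form."""
    return math.log1p(L * L) / math.pi

for L in (10.0, 1e3, 1e6):
    print(L, truncated_abs_moment(L))  # keeps growing: E|T| is infinite for nu = 1
```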

reasons for my recent reversion

The edits by user:129.137.87.25 that I just reverted don't make sense.

The central limit theorem is not in fact involved in the way it said, because it was given that the random variables involved were already normally distributed, and moreover, if one did use the central limit theorem, one would have to speak of convergence to the normal distribution, rather than of being exactly normally distributed.

One should not use the same symbol, the lower-case t, to refer both to the random variable and to the argument to the density function.

Michael Hardy 01:40, 3 April 2006 (UTC)

Reason for reversion on 11-Jun-2006

user:128.135.17.110 recently edited the page and removed an external link that I had added to a free online Student-t calculator. The comment left by user:128.135.17.110 indicated that s/he believed I added the external link to improve the Google page rank of the link target. I will state plainly that this is not the case; on the contrary, the external link was added simply to improve the utility and usefulness of the page. Prior to my edit, the page contained external links to an offline software-based t-value calculator and a t-distribution generator, but did not contain an external link to a free online t-value calculator. As I believe that many of the researchers/scientists who use this page will find a free online t-value calculator to be valuable to their work, I am reverting the page to the previous version that contains the external link. If this is problematic, please post your concerns here and I will be happy to respond to them. --DanSoper 08:50, 11 June 2006 (UTC)

The text you wrote looks like an advertisement; you'll forgive me for mistaking it for one. You toned down the text quite a bit when you updated it; please do so with the other links you made to your page. In addition, I'm not sure that the link adds anything, since there are already links on many of these pages that do exactly the same thing. 128.135.133.123 18:17, 20 June 2006 (UTC)

External link - Input please

Greetings all,

Recently User:128.135.17.105 removed an external link (Free Student t Calculator) that I had placed on this page to an online Student t calculator that is available for free on my website. The reason given for this removal by User:128.135.17.105 is that the calculator adds no value to the page. I would like to hear whether or not this is the majority opinion, as I believe that a link to the free Student t calculator provides a great deal of value to the page. Here's why:

1. The other online calculator that is linked to from this page (VassarStats) cannot supply t-values when there are fewer than 10 degrees of freedom.

2. The other external link (Distribution Calculator) can only perform Student t calculations if the user downloads and installs the software on their computer.

3. Well over 300 people from Wikipedia have used the free Student t calculator on my website since I posted the link two weeks ago -- a clear indication of the value of the free calculator to the readers of this page.

Out of respect for the opinion of User:128.135.17.105, I will not repost the link right away. If anyone agrees that there is value in the external link that User:128.135.17.105 removed, please let the community know by posting your thoughts here. I would particularly enjoy discussing this issue further with User:128.135.17.105, as I believe that (in the spirit of Wikipedia) we can resolve this issue amicably. :-)

--DanSoper 23:49, 22 June 2006 (UTC)

the main discussion is at Talk:Chi-square distribution. 128.135.226.222 00:36, 28 June 2006 (UTC)

Alternate forms of the t-table

It would be very useful to explain the alternate form of the t-table such as the one found here. If my memory serves me correctly, it is based on the population standard deviation rather than the sample standard deviation. I think it is okay not to address the tables that use the residual (alpha), as that can be easily figured out. But the table presented here and these other tables are not obvious.--Nick Y. 23:25, 11 June 2006 (UTC)

Your memory serves you very badly in this case. The table you cite is 2-sided, as opposed to the 1-sided table given here. The population-versus-sample issue has nothing at all to do with it. The explanation is essentially here in this article, where it talks about 1-sided versus 2-sided, but the explanation is missing from the page you cite. Michael Hardy 23:41, 11 June 2006 (UTC)

comment moved from article

user:193.11.239.45 put the following comment into the article; I've deleted it from there and pasted it here:

Comment: We are novices but strongly believe that the value of v is wrong; it should be 0.879, which corresponds to v=10 and 80% in the table above. /Emil and Christian

I don't understand which "v" is being referred to. Michael Hardy 18:36, 20 September 2006 (UTC)

Better explanation

In the first example of how to calculate, we have: "We can determine that at 90% confidence, we have a true mean lying below 10+1.37218+(squareroot of 11)/(squareroot of 2)=10.58510"

In the second example, with 80% confidence, we have: "So that at 80% confidence, we have a true mean lying between 10+-1.37218+(squareroot of 11)/(squareroot of 2)=9.41490 and 10.58510"

We have the same value of "a" in both cases, which is wrong. In the 90% case it is correct, 1.37218. In the 80% case it is wrong, it should be 0.879 taken from the table.

So the correct calculation for the 80% case is: 10+-0.879(squareroot of 11)/(squareroot of 2)=7.9386 and 12.0614

Hope this explanation is better! :) /The swedes
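The disagreement above comes down to one-sided versus two-sided probabilities, which can be computed directly. This Python sketch (standard library only, with an illustrative Simpson-rule CDF) evaluates, for ν = 10, both $P(T < A)$ and $P(-A < T < A)$ for the two candidate values of A. In this check 0.879 behaves as the one-sided 80% point, whose symmetric interval covers only about 60%, while 1.37218 gives the 80% two-sided interval.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=1000):
    """P(T <= x) = 1/2 + integral of t_pdf from 0 to x, by composite Simpson's rule."""
    h = x / steps  # negative for x < 0, which mirrors the integral by symmetry
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

nu = 10
for a in (0.879, 1.37218):
    one_sided = t_cdf(a, nu)                     # P(T < A)
    two_sided = t_cdf(a, nu) - t_cdf(-a, nu)     # P(-A < T < A)
    print(f"A = {a}: P(T < A) = {one_sided:.3f}, P(-A < T < A) = {two_sided:.3f}")
```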

plots under student's t are not comprehensible

Both plots suffer from missing labels on their x-axes. One has to guess at the x-axis variable (t).

And the formula for probability density function has no variable k in it. So are the various plots in the graphs done for different values of ν instead?

same for cumulative distribution function...

Redirect for t-value?

I think there should be a redirect to this article from t-value and t value. Those two are common names for the t statistic as well. --Big Wang 11:51, 14 November 2006 (UTC)

Done! Next time, just go ahead and put in redirects yourself, if you think they're appropriate and helpful. See Wikipedia:Redirect and Help:Starting a new page. Just now, when trying to find the "Starting a new page" instructions for you, I had a little trouble finding it, so after I did find it I also put in a few redirects such as Wikipedia:Create --> Help:Starting a new page.

You can do things like that, too. New users are the best ones to know what redirects are needed in the help pages. It's good to have lots of redirects; one advantage is that then someone won't go ahead and write a new article not realizing that an equivalent article already exists. If it turns out that the redirect is inappropriate somehow, it can be deleted later or expanded into a full article -- so you can be bold. --Coppertwig 12:23, 14 November 2006 (UTC)

Explanation of 80% confidence interval

This is to discuss recent disagreements in the last sentence or two of the section "Table of selected values".

Recently, someone changed "80%" to "90%" and I changed it back. I then figured that other readers might also have trouble understanding why this should be 80% rather than 90%, so I inserted a sentence to try to explain this: "(It has a 10% chance of being above that range, and a 10% chance of being below that range, so it has a total of 20% chance of being outside that range, either above or below.)" It's possible I didn't use the correct statistical terminology in this sentence, but I'm pretty confident :-) that the general idea I'm trying to get across is correct. If there is no objection, I'll re-insert the same sentence. If someone can improve the sentence, that's even better. Please discuss. --Coppertwig 14:45, 17 November 2006 (UTC)

I object. I am not a probability theorist, but I don't think your statement is accurate. We need an expert on the subject. – Chris53516 ^(Talk) 14:49, 17 November 2006 (UTC)

Is it acceptable to use a Wikipedia page as a citation (Confidence interval)? I believe I can do that while also rewording the sentence to be more correct. Or does it have to actually come from a book? Do the examples have to have all of the same numbers as examples in cited material, or can Wikipedia pages use their own examples? Remember, stuff is not supposed to be copied verbatim from books. --Coppertwig 15:04, 17 November 2006 (UTC)

NO. Internal references are not acceptable. If they made a reference in that article, use that reference. When referencing, never plagiarize. – Chris53516 ^(Talk) 15:22, 17 November 2006 (UTC)

Another question: I believe you stated in the edit summary "I don't think that is what it means." Would you please explain here what you do think it (the sentence and formula immediately above my edit which you reverted) means? Thanks for watching a statistics page and helping make sure it's right! --Coppertwig 15:16, 17 November 2006 (UTC)

Your sentence is an extrapolation from the "80% confidence interval". The interval may or may not be evenly distributed. Therefore, your extrapolation that 10% must be above and 10% must be below is most likely incorrect. However, we need an expert on the topic to provide an answer. Additionally, you provided no source for verification of your statement. – Chris53516 ^(Talk) 15:24, 17 November 2006 (UTC)

Look a little further back: if you look at the two previous sentences, (starting from the second sentence after the table), they do state that 10% is above and 10% is below. I should have said "probability" rather than "chances". The Confidence interval page indicates that probability is what is meant. --Coppertwig 15:31, 17 November 2006 (UTC)

So? Got a reference? You have no evidence that what you say is true other than another Wiki page. – Chris53516 ^(Talk) 15:57, 17 November 2006 (UTC)

How does the following look? I've put the sentences I want to insert in italics here so you can see what I'm adding, but they would be normal text in the article. I think I understand Chris53516's objection. In place of the original sentence I had inserted, I inserted a different sentence which IMO achieves my goal of making the text more understandable for the non-expert reader, but which I believe does not give rise to what Chris53516 was objecting to. I've also inserted two other sentences. Each of the three additions has the purpose of clarifying for the non-expert what was just said in the sentence before it. What do you think, Chris53516?

For example, given a sample with a sample variance 2 and sample mean of 10, taken from a sample set of 11 (10 degrees of freedom), using the formula:

$\overline{X}_n\pm A\frac{S_n}{\sqrt{n}}$

We can determine that at 90% confidence, we have a true mean lying below:

$10+1.37218 \frac{\sqrt{2}}{\sqrt{11}}=10.58510$

(In other words, the probability that the true mean is higher than 10.58510 is 0.10.) And, still at 90% confidence, we have a true mean lying over:

$10-1.37218 \frac{\sqrt{2}}{\sqrt{11}}=9.41490$

(In other words, the probability that the true mean is lower than 9.41490 is 0.10.) So that at 80% confidence, we have a true mean lying between

$10\pm1.37218 \frac{\sqrt{2}}{\sqrt{11}}=[9.41490,10.58510]$

(In other words, the probability that the true mean is outside that interval, either above it or below it, is 0.20.)

Wait! Wait! No, the words in italics above which I was going to put in are wrong. Sorry. That was the prosecutor's fallacy. Let me try again -- how about the following?

For example, given a sample with a sample variance 2 and sample mean of 10, taken from a sample set of 11 (10 degrees of freedom), using the formula:

$\overline{X}_n\pm A\frac{S_n}{\sqrt{n}}$

We can determine that at 90% confidence, we have a true mean lying below:

$10+1.37218 \frac{\sqrt{2}}{\sqrt{11}}=10.58510$

(In other words, on average, 90% of the times that an upper threshold is calculated by this method, the true mean lies below this upper threshold.) And, still at 90% confidence, we have a true mean lying over:

$10-1.37218 \frac{\sqrt{2}}{\sqrt{11}}=9.41490$

(In other words, on average, 90% of the times that a lower threshold is calculated by this method, the true mean lies above this lower threshold.) So that at 80% confidence, we have a true mean lying between

$10\pm1.37218 \frac{\sqrt{2}}{\sqrt{11}}=[9.41490,10.58510]$

(In other words, on average, 80% of the times that upper and lower thresholds are calculated by this method, the true mean is both below the upper threshold and above the lower threshold. This is not the same thing as saying that there is an 80% probability that the true mean lies between a particular pair of upper and lower thresholds that have been calculated by this method -- see confidence interval and prosecutor's fallacy.)

Please let me know what you think of adding the italicized words above to the article (though I would not italicize them in the article). Thanks --Coppertwig 23:12, 18 November 2006 (UTC)
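The frequentist wording in the second proposal can be checked by simulation. The Python sketch below draws many samples of size 11 from a normal population (μ = 10 and σ = 1 are arbitrary choices, not from the article), builds $\overline{X}_n \pm 1.37218\,S_n/\sqrt{n}$ each time with $S_n$ the divisor-(n-1) sample standard deviation, and counts how often the true mean falls below the upper threshold and how often it falls inside the interval. The long-run fractions should come out near 90% and 80% respectively; they describe the procedure, not any single computed interval.

```python
import math
import random

def simulate(trials=20000, n=11, a=1.37218, mu=10.0, sigma=1.0, seed=1):
    """Fractions of trials with mu below the upper threshold, and with mu inside the interval."""
    rng = random.Random(seed)
    below_upper = inside = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # divisor n - 1
        half_width = a * s / math.sqrt(n)
        if mu < xbar + half_width:
            below_upper += 1
            if mu > xbar - half_width:
                inside += 1
    return below_upper / trials, inside / trials

upper_rate, interval_rate = simulate()
print(f"true mean below upper threshold: {upper_rate:.3f}")
print(f"true mean inside the interval:   {interval_rate:.3f}")
```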

Confidence interval: 80% or 90%

On 24 September 2007, user 141.212.137.29 made the following change (under the header 'Confidence intervals derived from Student's t-distribution'):

$\overline{X}_n\pm A\frac{S_n}{\sqrt{n}}$

is a 90-percent confidence interval for μ

He changed it to:

$\overline{X}_n\pm A\frac{S_n}{\sqrt{n}}$

is a 95-percent confidence interval for μ.

I think that the old version was correct, however, I'm not sure. Can anyone respond on this? Basten 09:35, 11 October 2007 (UTC)

Used Template:Abramowitz_Stegun_ref to make the A&S reference clickable

Since this page refers to the famous work of Abramowitz and Stegun, I went ahead and replaced the citation of A&S in this article's reference list with a template invocation. The template was originally created by User:William Ackerman and explained at [1] because Abramowitz and Stegun are cited so often in the physics pages. A sample invocation is {{Abramowitz_Stegun_ref|26|985}}. This expands into a cite of chapter 26, where the number 26 is clickable so that it opens up page 985 of the online version of A&S. In this case I had to replace the original '26.7' with '26' because the template won't take decimal points in the chapter field. If this bothers you, revert the change. Up till now the template has always been used without subst. EdJohnston 16:20, 17 November 2006 (UTC)

Thanks for the references! I didn't know there was an online copy of Abramowitz! That will sure come in handy for lots of things, not just editing this page! And a link to the original work by Gosset -- that's great! I still see some problems with this article; I hope you and Chris53516 will stick around and help work them out. I'll probably make some more comments and/or changes soon. --Coppertwig 03:23, 18 November 2006 (UTC)

Slash in denominator: double division? Is the formula correct?

Consider the following formula from the article (3rd formula under the heading "Occurrence and specification of Student's t-distribution".)

be the sample variance. It is readily shown that the quantity

$Z=\frac{\overline{X}_n-\mu}{\sigma/\sqrt{n}}$

Note that there is a slash in the denominator, immediately after sigma. What does this mean? It seems to mean double division; that is, it seems to mean that the rhs is (xnbar - mu)/(sigma/sqrt(n)), which simplifies to:

$(\sqrt{n})\left(\frac{\overline{X}_n-\mu}{\sigma}\right)$

However, I believe that this formula is wrong and that the slash should simply be deleted. Later when I have time to think more clearly I may figure this out one way or the other. Meanwhile maybe someone else can figure it out. Either the slash is correct, in which case the expression should be simplified by putting the sqrt(n) in the denominator, or (as I believe) the slash is wrong and should be deleted. Or possibly the slash has some meaning I don't know, in which case it should be explained in the text.

Exactly the same problem (a slash, possibly spurious, in the denominator) also occurs in the following places: (2) the very next formula after the one I mentioned; (3) the third formula in the section "Confidence intervals derived from Student's t-distribution"; and (4) possibly the first formula in the section "Further theory", although it may be correct even if the others are wrong. If so, that formula could possibly be improved by putting the material inside the square root sign into parentheses, or perhaps it's fine as-is.

Thanks in advance for anyone shedding light on this. --Coppertwig 03:52, 18 November 2006 (UTC)

The above formulas are both correct, and the slash is ordinary division. See for example Casella and Berger, Statistical Inference, 2nd ed., page 222. EdJohnston 05:20, 18 November 2006 (UTC)

Now that I look at it again, I see that you're right. The formulas are mathematically correct. However, they need to be simplified down to standard form. If you look in Gosset's paper, you don't see any slashes in the denominators. He has horizontal division lines in denominators, but almost always only when necessary, for example as part of multiterm expressions inside an integral or square root sign. I think maybe he has one other one in the middle of a calculation. Other than that, he presents formulas in the conventional way, which means you don't have division going on in the denominator if you can help it. So, I think the formula needs to be edited to be one of the following forms or something similar (if I've done this right, these are all mathematically equivalent to each other and to the formula currently in the article, but are in a more standard form).:

$Z = (\sqrt{n})\left(\frac{\overline{X}_n-\mu}{\sigma}\right)$

$Z = \sqrt{n}\left(\frac{\overline{X}_n-\mu}{\sigma}\right)$

$Z = \frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma}$

--Coppertwig 23:33, 18 November 2006 (UTC)

I disagree. The form with the fraction in the denominator is easier to understand, because σ/√n is the standard deviation of the random variable $\overline{X}_n-\mu\,$ , so it makes it obvious that you're just subtracting the rv's expected value from it and then dividing by its SD, the usual standardization. Michael Hardy 02:46, 19 November 2006 (UTC)
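Both points in this thread can be checked numerically: the fraction-in-the-denominator form and the rearranged forms give the same number, and σ/√n really is the standard deviation of the sample mean. The Python sketch below uses a hypothetical normal population (μ = 5, σ = 2) and sample size n = 16, chosen only for illustration.

```python
import math
import random

mu, sigma, n = 5.0, 2.0, 16  # hypothetical population mean, SD, and sample size
rng = random.Random(0)

def sample_mean():
    """Mean of one sample of size n drawn from the normal population."""
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n

xbar = sample_mean()
z_fraction = (xbar - mu) / (sigma / math.sqrt(n))   # form used in the article
z_product = math.sqrt(n) * (xbar - mu) / sigma      # rearranged form
print(z_fraction, z_product)

# Spread of many sample means, compared with sigma / sqrt(n) = 0.5
means = [sample_mean() for _ in range(20000)]
m = sum(means) / len(means)
sd_of_means = math.sqrt(sum((x - m) ** 2 for x in means) / (len(means) - 1))
print(f"SD of sample means: {sd_of_means:.3f} (sigma/sqrt(n) = {sigma / math.sqrt(n):.3f})")
```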

Remove external link to Shaw's paper about the capital-T statistic?

The last item in our external links section is a paper by William T. Shaw which concerns drawing random samples from what he calls the univariate 'T' distribution, when working in a multivariate setting. This paper seems rather esoteric for the current article, and the non-standard usage of capital 'T' could be confusing. The Shaw paper is not explicitly mentioned in the text of the article and no reference to its subject matter is made. If no-one objects I'll remove this paper from the reference list. EdJohnston 06:02, 18 November 2006 (UTC)

I have just noted Ed's removal of the reference to my paper on this, which is now published in the Journal of Computational Finance. There seems to be a little confusion here about relevance, though this is partly my fault for not going through with a planned edit of the main article to discuss quantile functions (i.e. the inverses of the CDFs) and explain their relevance, and indeed to post some other discussions on quantiles for other distributions in the relevant bits of Wiki. These are fundamental to the sampling of the univariate case for any distribution. In the case of Student's t, the inverse CDF, which makes sampling trivial, had not been understood for many decades, and now it is. There are some nice special cases which will go in the special cases section as well. I will do a tentative edit presently and let people make their own judgement. I must confess I regard the comments about capital T vs lower case t as rather bizarre; quite why anybody would be confused by this is beyond me. - William Shaw
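The remark about quantile functions refers to inverse transform sampling: if U is uniform on (0, 1) and $F^{-1}$ is the quantile function, then $F^{-1}(U)$ has distribution F. For a few small ν the t quantile has an elementary closed form. The Python sketch below uses the ν = 2 case, where $F(t) = \tfrac12 + \frac{t}{2\sqrt{t^2+2}}$ inverts to $F^{-1}(u) = \frac{2u-1}{\sqrt{2u(1-u)}}$. It is a textbook special case chosen for illustration, not a reproduction of the general results in Shaw's paper.

```python
import math
import random

def t2_cdf(t):
    """CDF of Student's t with 2 degrees of freedom."""
    return 0.5 + t / (2.0 * math.sqrt(t * t + 2.0))

def t2_quantile(u):
    """Inverse of t2_cdf, valid for 0 < u < 1."""
    return (2.0 * u - 1.0) / math.sqrt(2.0 * u * (1.0 - u))

def draw_t2(rng):
    """Inverse transform sampling: push a uniform draw through the quantile function."""
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0, where the quantile is infinite
        u = rng.random()
    return t2_quantile(u)

rng = random.Random(42)
samples = [draw_t2(rng) for _ in range(100_000)]
frac_below_1 = sum(1 for t in samples if t <= 1.0) / len(samples)
print(f"empirical P(T <= 1) = {frac_below_1:.4f}, exact = {t2_cdf(1.0):.4f}")
```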

Clear definitions of the quantities in the table

(I thought I had already made this comment but don't see it; I must have forgotten to click "save changes".) Re the (large) table in the section "Table of selected values": I would like to have clear explanations in the article of the definitions of the three types of quantities in the table: the integers along the left, the percentages along the top, and the numbers in the main body of the table. Note that one of the problems I see is that it's not clear whether the $\nu$ in the corner refers to the numbers along the left or to the percentages along the top. I suggest something like the following. Please critique. I would put this just below the table:

The number at the beginning of each row in the table above is $\nu$, which has been defined above as $n - 1$. The percentage along the top is $100\%\,(1 - \alpha)$. The numbers in the main body of the table are $t_{\alpha,\nu}$. If a quantity $T$ is distributed as a Student's t distribution with $\nu$ degrees of freedom, then there is a probability $1 - \alpha$ that $T$ will be less than $t_{\alpha,\nu}$.

--Coppertwig 23:52, 18 November 2006 (UTC)
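The proposed definition, $P(T < t_{\alpha,\nu}) = 1 - \alpha$, is enough to regenerate any row of a one-sided table. This self-contained Python sketch does so for ν = 10 at a handful of common one-sided levels, using a Simpson-rule CDF and bisection; the helper names and the choice of levels are illustrative, not taken from the article's table.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=1000):
    """P(T <= x) = 1/2 + integral of t_pdf from 0 to x, by composite Simpson's rule."""
    h = x / steps  # negative for x < 0, which mirrors the integral by symmetry
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_ppf(p, nu, lo=-50.0, hi=50.0):
    """Quantile t_{alpha,nu} with p = 1 - alpha, found by bisection on t_cdf."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

nu = 10
levels = (0.75, 0.80, 0.85, 0.90, 0.95, 0.975, 0.99, 0.995)  # 100%(1 - alpha)
row = [t_ppf(p, nu) for p in levels]
print([round(v, 3) for v in row])
```

The values should agree, to three decimals, with the ν = 10 row of a standard one-sided table (0.700, 0.879, 1.093, 1.372, 1.812, 2.228, 2.764, 3.169).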

That seems less clear to me than the present explanation. Perhaps it will help to specify which entry is intended? Septentrionalis 00:25, 19 November 2006 (UTC)

Sorry; I don't understand what you mean by "which entry is intended" and I don't know what part of the article you're referring to by "the present explanation". Maybe you mean the example shown immediately above the table? An example is fine, but I would like to see also a concise definition that applies to all elements of the table and that someone can refer to, in order to get the meaning of the table, without having to work through an example again. An example is not a substitute for a definition. --Coppertwig 03:58, 19 November 2006 (UTC)

[edit] Student did not actually present the t-statistic in 1908

I think the intro might need to be reworded slightly, since the t-statistic was actually first defined by R. A. Fisher in a paper of 1924. The quantity whose distribution was discussed by Student was 'z', where t = z sqrt(n-1) is the relation between the old and new definitions. This 'z' was not the one we use nowadays with the normal distribution. The external link to 'Earliest known uses...' makes this evident. When time permits I'd like to try rewording the opening paragraph, and will offer it here on the Talk page for discussion. Also, I think it's NOT desirable to capitalize 't', which happens further down on the page; regular statistics books don't do that. Moreover, I think Student DID NOT write the formula attributed to him in the article, where t is related to the gamma function. He was not a mathematician, but Fisher was.

Here is the actual quote from the 'Earliest known uses of some of the words of mathematics' web site which makes clear that Student did not call it 't':

In his 1908 paper, "The Probable Error of a Mean", Biometrika, 6, 1-25 Gosset introduced the statistic, z, for testing hypotheses on the mean of the normal distribution. Gosset used the divisor n, not the modern (n - 1), when he estimated σ and his z is proportional to the modern t with t = z sqrt (n - 1). Fisher introduced the t form because it fitted in with his theory of degrees of freedom (q.v.). Fisher used the t symbol and described Student's distribution (and others based on the normal distribution) and the role of degrees of freedom in "On a Distribution Yielding the Error Functions of Several well Known Statistics", Proceedings of the International Congress of Mathematics, Toronto, 2, 805-813. Although the paper was presented in 1924, it was not published until 1928 (Tankard, page 103; David, 1995). According to the OED2, the letter t was chosen arbitrarily. A new symbol suited Fisher for he was already using z for a statistic of his own (see entry for F). -- EdJohnston 21:24, 20 November 2006 (UTC)
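The relation t = z√(n − 1) in the quote is easy to check numerically. A minimal sketch, assuming (as the quote describes) that Gosset's z divides the deviation of the sample mean by the standard deviation computed with divisor n, while the modern t uses divisor n − 1 and the standard error s/√n; the data and the hypothesised mean are invented for illustration.

```python
import math
import statistics


def gosset_z(xs, mu):
    """Gosset's 1908 z: deviation of the mean over the divisor-n standard deviation."""
    return (statistics.fmean(xs) - mu) / statistics.pstdev(xs)


def modern_t(xs, mu):
    """Fisher's t: deviation of the mean over s / sqrt(n), with s using divisor n - 1."""
    n = len(xs)
    return (statistics.fmean(xs) - mu) / (statistics.stdev(xs) / math.sqrt(n))


data = [9.8, 10.4, 10.1, 9.6, 10.9, 10.3]  # invented values
mu0 = 10.0                                # hypothesised population mean
n = len(data)
z = gosset_z(data, mu0)
t = modern_t(data, mu0)
print(z, t, z * math.sqrt(n - 1))  # the last two numbers agree
```

Because t is a fixed monotone rescaling of z for a given n, tests and intervals based on either statistic lead to the same conclusions.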

If you've got the references for it, change the article and cite your sources. – Chris53516 ^(Talk) 21:31, 20 November 2006 (UTC)

This seems like a pretty minor point compared to what the section heading might lead one to suspect. So Student's statistic introduced in 1908 was not exactly identical to the version that is now conventional. Nonetheless, the hypothesis tests and confidence intervals would be exactly the same. So it's worth mentioning, but not as big a deal as one might expect after reading "Student did not introduce the T-statistic." Michael Hardy 23:52, 20 November 2006 (UTC)

Perhaps not an earth-shaking issue, but the first version of this article, as created in 2002, got the terminology and the attribution correct. Somewhere along the way Student morphed into Fisher. Another issue with the article is that the term 'statistic' never gets defined, so the presentation seems incomplete. For modern readers it is most natural to describe Student's t-distribution as the distribution of the t-statistic, even though Student did not use that terminology. (Fisher introduced the term 'statistic' in 1922). I think that only minor rewording would be enough to make this clear. EdJohnston 06:34, 21 November 2006 (UTC)

[edit] Please comment if you have an opinion on the opening section

I'm proposing this new version, to (1) state what the t-distribution really IS in the first two sentences, (2) get the sequence of events right, so Student isn't credited for Fisher's work. As you see, I've reused most of the existing language, but changed the order. Please give me your comments on this alleged improvement. If I don't hear anything back, I'll make the change in a few days. I'll also add the necessary references. EdJohnston 04:14, 22 November 2006 (UTC)

In probability and statistics, the t-distribution or Student's t-distribution is the probability distribution of the t-statistic for samples of a fixed size repeatedly drawn from a normal population. The t-statistic is the difference between the sample mean and the true population mean, divided by a standard deviation computed from the sample, and multiplied by the square root of the sample size.

Student's distribution arises when (as in nearly all practical statistical work) the population standard deviation is unknown and has to be estimated from the data. Textbook problems treating the standard deviation as if it were known are of two kinds: (1) those in which the sample size is so large that one may treat a data-based estimate of the variance as if it were certain, and (2) those that illustrate mathematical reasoning, in which the problem of estimating the standard deviation is temporarily ignored because that is not the point that the author or instructor is then explaining.

The t-distribution can also be generalized to the case of two samples drawn from related populations, and be employed to compute confidence intervals. The Student's t-distribution is a special case of the generalised hyperbolic distribution.

The mathematical form of what is now called the t-distribution was presented in 1908 by William Sealy Gosset, while he worked at a Guinness brewery in Dublin. He was not allowed to publish under his own name, so the paper was written under the pseudonym Student. The t-test and the associated theory became well-known through the work of R.A. Fisher, who called the distribution "Student's distribution". Student himself called it 'the frequency distribution of a quantity z', where z was an expression for a certain kind of a normalized deviation in a small sample. Fisher later introduced the quantity 't', a deviation normalized in a slightly different way, and established all its mathematical properties.

A t-test is any statistical hypothesis test in which the test statistic has a Student's t-distribution if the null hypothesis is true.

EdJohnston 04:14, 22 November 2006 (UTC)

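This proposal says: "The t-distribution can also be generalized to the case of two samples drawn from related populations."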

That is not a generalization of the t-distribution. It's still exactly the same distribution. It's a different statistical test, but the same distribution. Michael Hardy 19:18, 22 November 2006 (UTC)

You're right. I'll try to come up with a correct version. EdJohnston 03:39, 23 November 2006 (UTC)

I am not impressed by this new lead. It leaves the third paragraph of the lead untouched, which I felt was one of the weak points of the lead. It explains in words something that is already explained much more clearly using equations in the first section of the article. It uses more technical terms than the previous lead. It uses two very short paragraphs. And it clarifies the origins of the subject in a way that would be better done in a separate section in the main body of the article. In essence, I don't feel the changes are compatible with the lead section guidelines. Remember, a lead is meant to provide an accessible overview, establish context, and be no longer than three paragraphs for an article of this size. My strong preference is to stick with the old lead. Cedars 00:54, 23 November 2006 (UTC)

Thanks for your reply. My concern was that the article took so long to get to the point (saying what the t-distribution really is). If no-one else thinks the intro is too slow, I may reduce my proposal just to clarifying the history. At present I think there are (minor) factual errors regarding the attribution of who discovered what, between Student and Fisher, and I have the references needed to be sure of the accuracy. (Some of them listed here [2] on my Talk page).

You mention the third paragraph, the one that starts 'Student's distribution arises when..'. The value that I saw in that paragraph was it is the only place currently where the point of small-sample statistics is explained. Do you have any ideas for revising or replacing that paragraph? EdJohnston 04:02, 23 November 2006 (UTC)
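The point of the 'Student's distribution arises when...' paragraph can be illustrated by simulation. This rough sketch assumes samples of size n = 5 from a standard normal population (so ν = 4) and counts how often the standardized mean exceeds 1.96 in absolute value, first using the true σ and then using σ estimated from each sample. With σ known the rate should sit near the nominal 5%; with σ estimated it is noticeably higher (the t-distribution with ν = 4 gives roughly 12%), which is the heavier tail that Student's distribution accounts for.

```python
import math
import random
import statistics


def exceedance_rates(n: int, trials: int, cutoff: float, seed: int = 0):
    """Fraction of trials where |z| (sigma known) and |t| (sigma estimated) exceed cutoff."""
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # population mean 0, sigma 1
        mean = statistics.fmean(xs)
        z = mean / (1.0 / math.sqrt(n))                   # uses the true sigma = 1
        t = mean / (statistics.stdev(xs) / math.sqrt(n))  # sigma estimated from the sample
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / trials, t_hits / trials


z_rate, t_rate = exceedance_rates(n=5, trials=20000, cutoff=1.96)
print(z_rate, t_rate)
```

Increasing n in this sketch brings the two rates together, which is case (1) in that paragraph: with large samples the estimated variance can be treated as if it were known.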

[edit] t-table

The t-table is incorrect, every value should be moved horizontally to the left one box. —The preceding unsigned comment was added by 134.10.2.125 (talk)

Yep, I agree. It's incorrect. 72.142.195.237 00:55, 17 July 2007 (UTC)

[edit] Article hard to understand?

Not to seem frustrated, but good God Almighty, could we make this article just a smidgen more accessible to the average user? I'm a college senior majoring in biology, I've used this test many, many times before, and I was simply looking for a light refresher on its nuances. Even so, I'm having serious trouble following what's going on in this article. Perhaps you should split it into several sections, i.e. an Overview (for regular people) and a Detailed section (for statisticians and mathematicians). Otherwise, I suspect that it'll remain totally useless to the vast majority of users.

Antagonistrex 14:37, 26 February 2007 (UTC)

You're mistaken: this page is not supposed to be about any particular statistical test. There are various statistical tests that rely on Student's t-distribution, and there are separate articles about those. Michael Hardy 20:01, 26 February 2007 (UTC)

Try Student's t-test. Michael Hardy 20:02, 26 February 2007 (UTC)

... also, could you be SPECIFIC about which parts you're having trouble with? I've just looked at the parts on "occurence and specification" and "confidence intervals". The parts about the density function are asserted without any explanation of where they came from, but otherwise the two sections looked as if you don't need to know much to read them except things that are naturally prerequisites to this topic. (I still wonder if your problem is that you were looking for something on how to use this distribution in statistical tests, and failing to realize that's found in separate articles.) Michael Hardy 20:16, 26 February 2007 (UTC)

From Student's t-test: "To determine or calculate significance, see Student's t-distribution." This is in contradiction to your assertion that the use of this distribution in statistical tests is explained in the articles for those tests. rah 15:18, 18 June 2007 (UTC)

[edit] Incorrect pdf Formula

I have just fixed a t-pdf function in the info-box, that was different from the t-pdf function in the text. They were both correct in essence, but the difference is confusing.

This is the formula in the text:

$f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi\,}\,\Gamma(\nu/2)} (1+t^2/\nu)^{-(\nu+1)/2}$

This is the one I fixed:

$f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi\,}\,\Gamma(\nu/2)\,(1+t^2/\nu)^{(\nu+1)/2}}$

Note the (-) sign in the exponent of the first function.

Dyaka 05:46, 14 March 2007 (UTC)

The pdf function in the text was not different from the one in the info-box; only the notation was different (just barely). There was certainly nothing "incorrect" in either (unless both were incorrect, in which case they still are---I'll look closely later). Michael Hardy 18:34, 14 March 2007 (UTC)

Yes, bad title, but I changed one of the formulas to avoid confusion. Dyaka 04:15, 22 March 2007 (UTC)
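Since $a^{-b} = 1/a^{b}$, the two notations give identical values. A minimal sketch that evaluates both; the first works in log space with `math.lgamma`, a design choice made here (not something the article prescribes) to avoid overflowing Γ for large ν. The function names are illustrative.

```python
import math


def t_pdf_negative_exponent(t: float, nu: float) -> float:
    """Form in the text: constant * (1 + t^2/nu)^(-(nu+1)/2), evaluated via logs."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_pdf_quotient(t: float, nu: float) -> float:
    """Form in the info-box: the power of (1 + t^2/nu) moved into the denominator."""
    numerator = math.gamma((nu + 1) / 2)
    denominator = (math.sqrt(nu * math.pi) * math.gamma(nu / 2)
                   * (1 + t * t / nu) ** ((nu + 1) / 2))
    return numerator / denominator


for nu in (1, 4, 30):
    for t in (-2.0, 0.0, 1.5):
        print(nu, t, t_pdf_negative_exponent(t, nu), t_pdf_quotient(t, nu))
```

The direct quotient form raises `OverflowError` once Γ(ν/2) exceeds the float range (around ν ≈ 340), whereas the log-space version keeps working.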

[edit] what does "t" stand for?

can anyone add the definition of t?

Jfermiller 17:25, 13 May 2007 (UTC)

Well, basically, t stands for the domain of the probability distribution: ${-\infty} < t < \infty$ ... much in the same way z stands for values of the domain of the [standardized] normal CDF and PDF, or $\chi^2$ stands for domain values for the Chi-square distributions.

It is a dummy variable, in the sense that g or q could be used instead; the usage of "t", however, immediately suggests that the pertaining distribution is precisely Student's.
Pallida Mors 76 00:20, 5 November 2007 (UTC).

[edit] standardization

When ν > 2 the variance is defined and equals ν/(ν - 2); how can it be standardized to one? Is this its pdf: $f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{(\nu-2)\pi\,}\,\Gamma(\nu/2)} (1+t^2/(\nu-2))^{-(\nu+1)/2}$? I hope someone can verify this and put it in the article. Jackzhp 15:05, 13 July 2007 (UTC)

Change scale on t. I trust this is covered under Normalization. Septentrionalis PMAnderson 15:37, 13 July 2007 (UTC)

Normalization is a disambiguation page with lots of entries: normalization in metallurgy, normalization in sociology, text normalization, maybe even normalization of diplomatic relations. Try normalizing constant. Michael Hardy 01:40, 17 July 2007 (UTC)
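Jackzhp's formula is what the rescaling gives: if T has ν > 2 degrees of freedom, X = T√((ν − 2)/ν) has unit variance, and the change of variables turns t²/ν into x²/(ν − 2) and √ν into √(ν − 2) in the density. The sketch below checks this numerically for ν = 5 (an arbitrary choice), integrating with Simpson's rule over [−200, 200]; truncating there is an assumption that is harmless for ν = 5 because the neglected tails contribute on the order of 10⁻⁶.

```python
import math


def standardized_t_pdf(x: float, nu: float) -> float:
    """Proposed density: Student's t rescaled to unit variance (requires nu > 2)."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log((nu - 2) * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(x * x / (nu - 2)))


def simpson(f, a: float, b: float, intervals: int) -> float:
    """Composite Simpson's rule; intervals must be even."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


nu = 5.0
mass = simpson(lambda x: standardized_t_pdf(x, nu), -200.0, 200.0, 100_000)
variance = simpson(lambda x: x * x * standardized_t_pdf(x, nu), -200.0, 200.0, 100_000)
print(mass, variance)  # both close to 1
```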

[edit] p-value and A(t | ν) ambiguous

I spent hours figuring out why the p-value equation has a 2 in the denominator, and finally concluded that the p-value here is a single-sided (single-tailed) p-value, while A(t|v) is a double-sided (double-tailed) probability. I would argue that a cumulative distribution integrated from negative infinity is much closer to convention, or should at least be used to introduce the two-sided A(t|v). The sentence "For the statistic t, with ν degrees of freedom, A(t | ν) is the probability that t would be less than the observed value if the two means were the same" made it even more confusing, because it assumes t is positive and does not define t. I assumed t was the same as the T in the previous sections, but it is actually its absolute value. I would also argue that a two-sided p-value is the "default" for many people, as in this article: [3] (in PDF). If an absolute value of t covering both sides is used, why make the p-value single-sided? —Preceding unsigned comment added by 128.227.105.227 (talk) 23:19, 27 September 2007 (UTC)
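The relationship described here can be made concrete. A minimal sketch, assuming A(t|ν) is the two-sided central probability P(−|t| ≤ T ≤ |t|) as in Abramowitz and Stegun; the two-sided p-value is then 1 − A and the one-sided p-value, for an alternative in the direction of the observed deviation, is (1 − A)/2, which is where the 2 in the denominator comes from. The integral is evaluated after the substitution u = √ν tan θ, which turns the density into cos^(ν−1) θ on a finite interval; that numerical approach and the function names are choices made for this illustration.

```python
import math


def central_probability(t: float, nu: float, intervals: int = 1000) -> float:
    """A(t|nu) = P(-|t| <= T <= |t|) for Student's t with nu degrees of freedom.

    Substituting u = sqrt(nu) * tan(theta) turns the density integral over
    [0, |t|] into an integral of cos(theta)**(nu - 1) over [0, atan(|t|/sqrt(nu))],
    which is smooth and bounded, so Simpson's rule handles it well.
    """
    theta_max = math.atan(abs(t) / math.sqrt(nu))
    h = theta_max / intervals
    total = 1.0 + math.cos(theta_max) ** (nu - 1)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (nu - 1)
    const = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    return 2.0 * const * total * h / 3.0  # doubled: symmetric about zero


def p_values(t_observed: float, nu: float) -> tuple[float, float]:
    """Return (one-sided, two-sided) p-values for an observed statistic."""
    a = central_probability(t_observed, nu)
    one_sided = (1.0 - a) / 2.0  # area beyond |t| in one tail
    two_sided = 1.0 - a          # area beyond |t| in both tails
    return one_sided, two_sided


print(p_values(1.8125, 10))  # approximately (0.05, 0.10)
```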

[edit] A(t|nu) discussion not very enlightening

I've edited the page to try to explain the relation between A(t|nu) and f(t). As noted above, it still needs t defined. This is actually done on the "Student t test" page, so it could do with a reference. The whole business about using the absolute value of t for the test based on A(t|nu) is still untidy, not helped by the absence of any statement that 1 - A(t|nu) is a two-sided probability. But I don't have time for more. Also see Abramowitz and Stegun for the definition of A(t|nu) and the condition on its relation to the beta function. JohnPhysicist. Sorry, no Wikipedia account (or I've forgotten its name) 77.99.31.195 21:58, 4 November 2007 (UTC)
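For reference, the relations under discussion, written with the article's density $f$ and with $I_x(a,b)$ denoting the regularized incomplete beta function (the beta-function form is the one given by Abramowitz and Stegun; here $t \ge 0$):

$$f(t) = \frac{1}{\sqrt{\nu}\,B\!\left(\tfrac12,\tfrac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$

$$A(t\mid\nu) = \int_{-t}^{t} f(u)\,du = 1 - I_{\nu/(\nu+t^2)}\!\left(\tfrac{\nu}{2},\tfrac12\right)$$

$$p_{\text{two-sided}} = 1 - A(|t|\mid\nu), \qquad p_{\text{one-sided}} = \tfrac12\bigl(1 - A(|t|\mid\nu)\bigr)$$

The first line agrees with the Γ form above because $B\!\left(\tfrac12,\tfrac{\nu}{2}\right) = \sqrt{\pi}\,\Gamma(\nu/2)/\Gamma((\nu+1)/2)$.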

[edit] Examples are always helpful

Can someone add a few examples, perhaps using the TTEST() function from MS Excel or with a simple set of data? I think this would add greatly to the usability of your content.
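A small worked example with a simple set of data. This is a sketch with invented measurements: a one-sample test of whether a population mean equals 10, with the two-sided p-value computed by numerically integrating the density (via the substitution u = √ν tan θ). Excel's TTEST() takes two arrays, so this one-sample calculation is not a direct reproduction of it.

```python
import math
import statistics


def central_probability(t: float, nu: float, intervals: int = 1000) -> float:
    """P(-|t| <= T <= |t|) for Student's t, via u = sqrt(nu) * tan(theta) and Simpson's rule."""
    theta_max = math.atan(abs(t) / math.sqrt(nu))
    h = theta_max / intervals
    total = 1.0 + math.cos(theta_max) ** (nu - 1)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (nu - 1)
    const = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    return 2.0 * const * total * h / 3.0


sample = [10.2, 9.7, 10.9, 10.5, 10.8, 9.9, 10.6, 10.4]  # invented measurements
mu0 = 10.0                                              # hypothesised population mean
n = len(sample)
mean = statistics.fmean(sample)
s = statistics.stdev(sample)                # sample standard deviation, divisor n - 1
t_stat = (mean - mu0) / (s / math.sqrt(n))  # Student's t statistic
nu = n - 1                                  # degrees of freedom
p_two_sided = 1.0 - central_probability(t_stat, nu)
print(f"mean={mean:.3f}  s={s:.4f}  t={t_stat:.3f}  nu={nu}  p={p_two_sided:.4f}")
```

With these numbers t is about 2.5 on 7 degrees of freedom, so the two-sided p-value falls between 0.02 and 0.05 (the one-sided 0.975 and 0.99 critical values for ν = 7 are about 2.365 and 2.998).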

[edit] Error in example after table?

Am I mistaken or is there an error in the example given of how to use the values from the table under "Table of Selected Values"? The text says the following:

For example, given a sample with a sample variance 2 and sample mean of 10, taken from a sample set of 11 (10 degrees of freedom), using the formula

$\overline{X}_n\pm A\frac{S_n}{\sqrt{n}}.$

We can determine that at 90% confidence, we have a true mean lying below

$10+1.37218 \frac{\sqrt{2}}{\sqrt{11}}=10.58510.$

But when I look at the table, the value (A) for 90% with 10 degrees of freedom should be 1.812, not 1.37218. The 1.372 value appears to correspond to 80%, not 90%.

I'm certainly not an expert in this area, so it's possible I'm misunderstanding something.. Am I reading the table wrong, or is this example using the wrong number? -- Foogod (talk) 00:21, 22 November 2007 (UTC)

Yes, the table is wrong; the T value quoted in the text is correct, except for the other example. I've checked the T values using a calculator and also Mathematica, and it turns out that the percentages are wrong: 80% should be 90%, 90% should be 95%, etc. Count Iblis (talk) 17:59, 29 November 2007 (UTC)

Aha.. Thank you, that answers another question I was rather confused about (namely, if it is a one-tailed probability table, then shouldn't the 50% numbers, by definition, all be 0?) That table makes more sense now.. -- Foogod 01:41, 4 December 2007 (UTC)
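The resolution above can be checked with the example's own numbers (n = 11, sample mean 10, sample variance 2, ν = 10). In this sketch the two quantiles are taken from the discussion rather than computed: 1.37218 is the 0.90 quantile, which gives a one-sided 90% upper bound, and 1.812 is the 0.95 quantile, which gives a two-sided 90% interval.

```python
import math

n = 11
sample_mean = 10.0
sample_variance = 2.0
standard_error = math.sqrt(sample_variance) / math.sqrt(n)

t_one_sided_90 = 1.37218  # 0.90 quantile for nu = 10 (value used in the example)
t_two_sided_90 = 1.812    # 0.95 quantile for nu = 10 (value read from the table)

# One-sided: with 90% confidence the true mean lies below this bound.
upper_bound = sample_mean + t_one_sided_90 * standard_error
# Two-sided: with 90% confidence the true mean lies inside this interval.
interval = (sample_mean - t_two_sided_90 * standard_error,
            sample_mean + t_two_sided_90 * standard_error)
print(round(upper_bound, 5))  # 10.5851
print(interval)
```

Both numbers are "90%" statements, which is why the table needs to say whether its headings are one-sided or two-sided.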

[edit] Corrected error in table.

The listed probabilities (first row of the table) were wrong; I've corrected them. I suggest that we all check that they are correct, and also check every single critical T value that is listed to make sure they are all correct. Count Iblis (talk) 19:00, 29 November 2007 (UTC)

This may partly be my responsibility - many apologies. I was trying to patch up some IP edits made on 14th November. When restoring the column headings I looked at (contrary to the article statements and original version prior to the IP edit) a two-sided distribution reference. Apologies again - second math error I've made in the past few weeks, on a small number of edits so my percentage rate is quite appalling. Asperal 20:11, 29 November 2007 (UTC)
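One way to do the check suggested above without relying on any printed table: compute each one-sided critical value $t_p$ (defined by $P(T \le t_p) = p$) by bisection on a numerically integrated CDF, and label every column with both the one-sided probability p and the two-sided confidence level 2p − 1 it corresponds to, so the two labelling conventions cannot be confused. A sketch using only the Python standard library; the numerical method and the choice of ν values and probabilities are assumptions made for illustration.

```python
import math


def central_probability(t: float, nu: float, intervals: int = 1000) -> float:
    """P(-|t| <= T <= |t|), via u = sqrt(nu) * tan(theta) and Simpson's rule."""
    theta_max = math.atan(abs(t) / math.sqrt(nu))
    h = theta_max / intervals
    total = 1.0 + math.cos(theta_max) ** (nu - 1)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (nu - 1)
    const = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    return 2.0 * const * total * h / 3.0


def t_cdf(t: float, nu: float) -> float:
    """P(T <= t), using the symmetry of the density about zero."""
    half = central_probability(t, nu) / 2.0
    return 0.5 + half if t >= 0 else 0.5 - half


def t_quantile(p: float, nu: float, tol: float = 1e-9) -> float:
    """One-sided critical value t_p with P(T <= t_p) = p, found by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p < 0.5:
        return -t_quantile(1.0 - p, nu, tol)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


one_sided = [0.90, 0.95, 0.975, 0.99, 0.995]
# Column headings: one-sided probability / equivalent two-sided confidence.
print("nu  " + "  ".join(f"{p:.3f}/{2 * p - 1:.2f}" for p in one_sided))
for nu in (1, 2, 5, 10, 30):
    row = "  ".join(f"{t_quantile(p, nu):10.4f}" for p in one_sided)
    print(f"{nu:<3} {row}")
```

The 0.900 column, for instance, is a one-sided 90% point but a two-sided 80% point, which is exactly the kind of mislabelling discussed on this page.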

[edit] please explain the formulas

This article may be too technical for a general audience.
Please help improve this article by providing more context and better explanations of technical details to make it more accessible, without removing technical details.

The introduction and the section "Why use the student's t-distribution?" are excellent but the rest of the article is difficult to understand. 69.140.159.215 (talk) 12:57, 12 January 2008 (UTC)

[edit] Table of Student t's

I just want to comment here that there is a mismatch between student t values and the confidence interval values. For sure the values under 97.5 % are the 95 % confidence interval student t's. These table headings should be corrected. Thanks. —Preceding unsigned comment added by Julescarlson (talk • contribs) 05:03, 29 February 2008 (UTC)

Oops... I think I see what you are indicating: these are one-sided student t's and I'm talking about two-sided student t's. Maybe you could just make sure this is clear (or maybe it's just me). 05:03, 29 February 2008 (UTC) —Preceding unsigned comment added by Julescarlson (talk • contribs)


[edit] Definition of $ψ$ and $B$