Talk:Misuse of statistics
From Wikipedia, the free encyclopedia
Data Dredging
The paragraphs on data dredging seem completely reasonable to me. I checked them against the main data dredging page, and against my textbook.
However, I thought it would be far too easy for people casually checking this page to take this section as a reason to believe that this kind of analysis is always wrong, and that's just not true. So I added a short paragraph explaining the caveat: it's OK to do it as long as you check yourself!
This is the first time I've edited Wikipedia... hope I did it right. 71.193.16.80 (talk) 17:53, 2 March 2008 (UTC)
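For what it's worth, here is a minimal sketch of what "checking yourself" could look like, assuming it means confirming a dredged-up relationship on data that was held back from the search. The candidate count, sample sizes, seed, and all names are illustrative assumptions, not anything taken from the article.

```python
import math
import random

# Assumptions (illustrative only): 1000 candidate predictors that are pure
# noise, 50 observations used for exploration and 50 held back for checking.
N_CANDIDATES = 1000
N_HALF = 50


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
outcome = [rng.gauss(0, 1) for _ in range(2 * N_HALF)]
candidates = [
    [rng.gauss(0, 1) for _ in range(2 * N_HALF)] for _ in range(N_CANDIDATES)
]

explore_y, confirm_y = outcome[:N_HALF], outcome[N_HALF:]

# Dredging step: keep whichever noise predictor correlates most strongly
# with the outcome in the exploration half.
best = max(candidates, key=lambda c: abs(pearson_r(c[:N_HALF], explore_y)))
explore_r = pearson_r(best[:N_HALF], explore_y)

# Checking step: re-test the same predictor on the half that was held back.
confirm_r = pearson_r(best[N_HALF:], confirm_y)

print(f"exploration r = {explore_r:.2f}, held-back r = {confirm_r:.2f}")
```

Because every predictor here is noise, the correlation found by searching should look impressive on the exploration half and usually shrinks toward zero on the held-back half.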
Almost certain that this interpretation of the 95% C.I is wrong
"In marketing terms all a company has to do to promote a neutral (useless) product is to find or conduct, for example, 20 studies with a confidence level of 95%. Even if the product is really useless, on average one of the 20 studies will show a positive effect purely by chance (this is what a 95% level of confidence means)"
The 95% C.I. refers to the "middle" 95% of the distribution; this means that the outer 5% is split between values above and below the interval. So the "unusually good" experiments alone would make up only the top 2.5% of the sample.
Long story short, I think that on average only 1 in 40 studies will show the positive effect.
P.S.: It's kind of ironic that an article about the misuse of statistics would (accidentally) misuse statistics. Akshayaj 21:31, 2 July 2007 (UTC)
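The arithmetic behind the 1-in-20 versus 1-in-40 figures can be sketched as follows. This assumes each study is an independent two-sided test at the 5% level and that the product truly has no effect; with a one-sided test at 5% the favourable tail would be the full 5% instead. The variable names are made up for illustration.

```python
# Assumptions: independent studies, two-sided tests at the 5% level,
# and a product with no real effect.
ALPHA = 0.05
N_STUDIES = 20

# Chance that one study comes out "significant" in either direction.
p_any = ALPHA
# Chance that one study comes out "significant" in the favourable direction.
p_positive = ALPHA / 2

# Expected number of such studies among the 20.
expected_any = N_STUDIES * p_any
expected_positive = N_STUDIES * p_positive

# Chance that at least one of the 20 studies shows a significant positive effect.
p_at_least_one_positive = 1 - (1 - p_positive) ** N_STUDIES

print(expected_any, expected_positive, round(p_at_least_one_positive, 3))
```

Under these assumptions about one study in 20 is significant in some direction, and about one in 40 is significant in the favourable direction.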
Oversimplification of C.I (again)
"Data mining is the examination of large compilations of statistics in order to find a correlation. Since the required confidence interval to establish a relationship between two parameters is usually pegged at 95% (meaning that there is a 95% chance that two parameters are related), there is also a 5% chance to find a correlation between two sets of completely random variables."
Usually, a researcher says there is a significant relationship between two random variables if they can reject the null hypothesis, which means that there is a less than 5% chance of a correlation this strong showing up just by chance. While this is basically what the paragraph above is saying, I think it confuses the issue, and it unfairly pegs the error rate at 5% in every case.
As I'm not really sure how this is different from the section I edited above, I'll delete it and let someone who knows the specifics of data mining edit it back in. Akshayaj 21:51, 2 July 2007 (UTC)
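A rough way to see what "showing up just by chance" means is to simulate it. The sketch below draws pairs of independent random samples and counts how often a two-sided 5% test on the Pearson correlation rejects the null hypothesis. The sample size, number of trials, and seed are arbitrary assumptions; the 5% threshold is a convention, so changing `T_CRITICAL` changes the error rate.

```python
import math
import random

# Assumptions (not from the discussion): 30 observations per sample,
# 2000 repetitions, a fixed seed, and a two-sided 5% test.
N_OBS = 30
N_TRIALS = 2000
# Two-sided 5% critical value of Student's t with N_OBS - 2 = 28 degrees of freedom.
T_CRITICAL = 2.048


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def false_positive_rate(seed=0):
    """Fraction of independent random pairs whose correlation tests as significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(N_TRIALS):
        xs = [rng.gauss(0, 1) for _ in range(N_OBS)]
        ys = [rng.gauss(0, 1) for _ in range(N_OBS)]
        r = pearson_r(xs, ys)
        # Standard t statistic for testing whether the true correlation is zero.
        t = r * math.sqrt((N_OBS - 2) / (1 - r * r))
        if abs(t) > T_CRITICAL:
            hits += 1
    return hits / N_TRIALS


rate = false_positive_rate()
print(f"share of spurious 'significant' correlations: {rate:.3f}")
```

With unrelated variables the share should land near 0.05, which is the false-positive rate the test was set to, not the probability that any particular reported correlation is real.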
Update
I see the data mining section has been put back in. I'm pretty sure this section is incorrect. For one thing, nobody ever uses confidence intervals to establish correlations; they use R^2 instead. The idea that there's a 5% chance that two independent variables show correlation "just by chance" is wrong.
However, I'm no expert on data mining, so I'll defer to the author of the paragraph and ask for another poster to confirm or deny. Akshayaj 19:09, 18 July 2007 (UTC)
Good book to read
The book _Lies, damn lies, and statistics_ is good reading on the misuse of statistics, as commonly used to promote politicians, policies, products, ideologies, and medical ideas regardless of their merit.
From Talk:Misuse Of Statistics:
The latest presidential election might provide some food for thought about the misuse of statistics.
What do you think of the exit poll numbers not matching the vote on voting machines? Does the wide discrepancy prove anything? Do those exit poll numbers amount to a good statistic misused, a flawed statistic, or a good one?
So many ways to misuse a statistic. So little time! (-;
bad example
" With a subject on which the general public has no personal knowledge of, you can fool a lot of people. For example you can say on TV "Most autistics are hopelessly incurable if raised without parents or normal education" and many people will only remember the first part of the claim, "Most autistics are hopelessly incurable". "
The suggestion that autism is curable, hopelessly or otherwise, is itself arguable: the behavioural symptoms can be addressed, but the underlying condition cannot be cured.
Unfortunately I can't think of a better alternative - but this is a wrong 'un.
Additional info???
I am wondering if there might be a box or section devoted to specific procedures for DIAGNOSING/CURING misuses? Or does that belong in a whole different entry, like "Detecting Misuse of Statistics"?
Quality of the article and NPOV
Obviously there are unlimited examples of people with something to sell abusing statistics, but let's keep the examples here off hot-button issues. There's no benefit to it, and people can debate contemporary topics under the appropriate topic headings. The Michael Fumento bit reads like an example of the very thing it's attempting to illustrate: selective reporting. And its citation is a dead link. --DKEdwards 21:12, 21 November 2006 (UTC)