Talk:Relative risk


This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

Signal-Noise approach

I'm uncomfortable with the signal-noise ratio approach to significance. I realise it maps conceptually pretty well onto significance, but it really belongs to another domain and will not clarify matters for people used to thinking in stats terms rather than signal processing terms. The issue here is one of sampling variability, and it should be presented in those terms. BrendanH 12:34, 8 March 2006 (UTC)

BrendanH wrote: ....will not clarify matters for people used to thinking in stats terms rather than signal processing terms
Signal-to-noise (SNR) is important in all measurements. It is rational to discuss it in connection with significance, a view I think is consistent with Sackett's and the paper of his I reference.
Sackett DL. Why randomized controlled trials fail but needn't: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!). CMAJ. 2001 Oct 30;165(9):1226-37. PMID 11706914. Free Full Text.
Sackett, in case you’re wondering, is one of the fathers of evidence-based medicine. His work together with Gordon Guyatt has been very influential in shaping the practice of medicine in the past 15 years. I think it is fair to say he is not an expert in signal processing--he is, however, a trialist and he sees SNR as an essential part of determining whether there are differences between populations.
I think SNR is poorly understood and, unfortunately, not taught, even though it is essential for analysing different systems and comparing populations.
Variability in sampling is only one factor that plays into significance. It depends on the noise and sampling error. It is possible to have very small variability and a statistically insignificant result (if the difference between the means of the populations being compared (the signal) is small). It is possible to have a high variability in the samples (i.e. a poor estimate of the mean and standard deviation) of two populations one is comparing and still have a significant result (if the means are far apart and the standard deviations do not overlap (high signal)). Sampling variability changes the estimates of the mean and standard deviation. While the variability can without doubt influence significance it is only part of the picture. The above stated, I think a discussion of the signal-to-noise ratio is essential.
It is possible that my explanation is lacking and I encourage you to edit as you see fit. I think a good example (e.g. ASA (aspirin) use for heart attacks) is missing. That said, I think the theoretical basis is solid. If you disagree after reading the above I encourage you to read Sackett's paper. If you still disagree after that-- explain yourself here and supply easy to understand references. I look forward to your comments and edits. Nephron 03:24, 9 March 2006 (UTC)
Very brief comment: I read Sackett's paper as providing a quick and easy way of understanding significance without getting lost in statistical inference. That can be very effective for certain audiences (esp. clinical practitioners) but I think the conventional approach would be more adequate here. However, I don't have time to implement anything just yet. BrendanH 09:45, 9 March 2006 (UTC)
Actually, looking at the section again (and thinking how I'd change it) it seems to me that it has nothing to do with relative risk per se, but is a digression on statistical significance (which applies to any sort of statistical measure). In so far as it's useful, the material should relate to the Statistical significance article (maybe an article of its own, with a bit more than Sackett's "just think of it this way" heuristics, because there is value in viewing stats from the SNR point of view (and it is much more complex than suggested here)). In so far as this article needs to mention significance (as part of the Brignell nonsense), it should link to Statistical significance. BrendanH 17:02, 9 March 2006 (UTC)
I agree that significance is somewhat of a digression but (like you said) you need to talk about it to debunk Brignell's baloney. That said, I think it is appropriate to have some duplication (which is alright 'cause Wikipedia, unlike a paper encyclopedia, doesn't have a space limitation). In any case, some of the stuff I wrote should probably be reproduced in the statistical significance article (which I've noted is quite sparse and could be improved with a discussion about SNR). Nephron 05:31, 10 March 2006 (UTC)

I have re-ordered the Size/Relevance and Significance/SNR sections to make the latter a subheading of the former, rather than vice versa. I think this reflects the importance to the general argument better (i.e. we're interested in significance because of the size/relevance argument). BrendanH 10:16, 10 March 2006 (UTC)

Revisiting the issue-- I think the SNR is important as it explains the relationship between confidence and signal (effect size). As I point out below-- if the signal is smaller one is less confident. Brignell's argument essentially is:
It's difficult to show a difference if the differences are small and thus small differences (i.e. small relative risk values) should be discounted.
Once the fallacy is clearly explained, the argument turns into one about how small the confidence intervals should be and whether it is worth pursuing a smaller RR vs. a larger one. Nephron 07:22, 15 March 2006 (UTC)

The fallacious argument that because of publication bias we ought ignore lower RR values

I hope that someone will edit this section to reflect perspectives from different fields that use stats such as RRs. This section currently appears to be based upon a very narrow area of study, such as small clinical trials. I work with demographics and public health issues, where a required RR of 2 would be nonsensical for most of our studies. I suspect this mirrors a difference in outlook between researchers who work with narrow, small, well-defined groups and those whose data come from larger populations. For example, herd immunity levels for most childhood diseases are between 80% and 95%, whereas many US populations of children have vaccination rates below 70-75%. An intervention that raised the vaccination rate from 70% to 90% would be lauded as wildly successful (we have no such interventions, by the way), even though this has a low RR (90/70, or about 1.3). If an RR of 2 were a requirement, yes, I'd start trying to pick a subpopulation that was justifiable as not a priori likely to have gotten vaccinations without the intervention. The game of picking or recruiting subpopulations is far more likely to introduce bias than is the use of a whole population and a lower RR. Yet I do recognize that in areas such as drug trials, where a statistically significant if substantively meaningless improvement in outcome is usually the basis for the introduction of new, more expensive medicines, this is a meaningful issue. 170.104.105.237 23:51, 12 April 2007 (UTC) SG Robison


I'm very uncomfortable with the notion that RR>2 is a must. I can agree that if we had a significance of exactly 95% we might apply stricter criteria to RR; however, from my work I'd say that in reality confidence for RR can easily be 99% (and it is normally reported). Should we just ignore the effect only because it happens to be less than 2? As far as this goes, instead of increasing the sample I could a) make a bunch of subsamples and b) report a subsample which happens to have an RR of 2.0 with confidence of, say, exactly 95%. The "look at the effect size only" approach is a real no-no, in my opinion - Cyberodin

Engjs - I agree that publication bias can influence the significance. That said, I have no proof John Brignell made that argument. What I did find is:

"A relative risk of 1.5 is not acceptable as significant" [1][2] (not true -- if confidence intervals are narrow) and a lengthy discussion about how high relative risk values are significant (usually true -- but depends on the confidence intervals).

In any case, I'll address the fallacy in your argument. You argue that it's difficult to show a difference if the differences are small and then use that to say a small relative risk should be discounted. I agree that smaller differences are harder to measure-- that doesn't mean they are irrelevant-- if they are measured accurately (i.e. the confidence interval is small).

Suppose a million people smoke and they are compared with a million non-smokers and a million passive smokers, where the absolute risk of lung ca in non-smokers is 1/40, the RR for smokers is 20 and the RR for passive smokers is 1.5. (I've just made up the RR numbers-- but I imagine they are in the ballpark.)

  • 25,000 non-smokers will get lung ca
  • 500,000 smokers will get lung ca (25,000*20=500,000)
  • 37,500 passive smokers will get lung ca (25,000*1.5=37,500)

The RR is essentially the impact size. As for relevance-- you get the most bang for the buck if you get the smokers to quit. The passive smokers-- there is less bang for the buck if they stopped passively smoking (i.e. got their partner to smoke outside). The argument that a RR of 1.5 is irrelevant is clearly fallacious (if the confidence interval is very narrow and we're sure the RR is close to 1.5). The difference between passive smokers and non-smokers is 12,500 extra cases of lung ca-- in my opinion clearly not irrelevant!
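The arithmetic in the example above can be checked with a short sketch. The inputs (one million people per group, a baseline risk of 1/40, RRs of 20 and 1.5) are the made-up figures from the comment, and the function name is purely illustrative.

```python
def expected_cases(group_size, baseline_risk, relative_risk):
    """Expected number of cases in a group with the given relative risk."""
    return group_size * baseline_risk * relative_risk

GROUP_SIZE = 1_000_000      # people per group (figure from the example)
BASELINE_RISK = 1 / 40      # risk of lung ca in non-smokers (from the example)

non_smokers = expected_cases(GROUP_SIZE, BASELINE_RISK, 1.0)
smokers = expected_cases(GROUP_SIZE, BASELINE_RISK, 20.0)
passive = expected_cases(GROUP_SIZE, BASELINE_RISK, 1.5)
excess_passive = passive - non_smokers  # extra cases attributable to RR = 1.5

# Rounded to whole cases to hide floating-point noise.
print(round(non_smokers), round(smokers), round(passive), round(excess_passive))
# prints: 25000 500000 37500 12500
```

The excess for passive smokers is simply baseline cases times (RR - 1), which is why a modest RR applied to a large group still yields a large absolute number.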

Now let's talk confidence intervals -- their width is primarily a function of the sample size. If the sample size is large you can be pretty darn sure something is happening... and for a large population it means a lot of extra deaths (as the above example illustrates). Nephron 06:11, 15 March 2006 (UTC)

I support Nephron's deletion of Engjs's argument, in particular his claim that an RR of over 2 is required for significance. This is simply a misunderstanding of the concept of significance. Significance is orthogonal to effect size. BrendanH 09:07, 15 March 2006 (UTC)

Firstly, I am reporting the argument Brignell makes in his book, Sorry Wrong Number, not what can be gleaned from his website. His website is written under the assumption that the reader will have read his book, and he does not detail everything there as it is presented in his book. If you want to check what he says, you need to read his book. What I have presented is verifiable, but not by looking on the web.

People have read the website and made assumptions about what he is saying that are simply not true. The cases where he applies his relative risk rule of thumb are ones where researchers study 1063 people, find 16 with the factor where they expected 9, and report a relative risk of 1.78, but avoid reporting the underlying numbers. You do the maths. Or studies where they examine 10,000 people, expect 44, find 30, 42, 57, and 66, and report only the last as being significant with a relative risk of 1.5, but don't factor in that they've done four tests rather than just one. He even gives the example of using rbinom in mathcad to generate two lots of random numbers with mean and standard deviation similar to those in the studies he criticises, and then compares them case by case looking for significance and relative risk, showing just what sort of numbers you can get when there cannot be any link. If you can get a relative risk of 1.64 from random noise, and a similar study finds a relative risk of 1.43, what does that say about the study?
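For anyone who wants to "do the maths" on the first example, here is a minimal sketch. It assumes the 9 expected cases can be treated as a known Poisson mean, and that the four tests in the second example are independent and each run at the 5% level; neither assumption comes from Brignell's book, and the function names are illustrative.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean), computed in log space for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))

# Assumption: the 9 expected cases are treated as a known Poisson mean.
p_16_vs_9 = poisson_upper_tail(16, 9.0)

# Assumption: four independent tests, each at the 5% level.
family_wise_error = 1.0 - (1.0 - 0.05) ** 4

print(f"One-sided P(X >= 16 | mean 9) = {p_16_vs_9:.4f}")
print(f"Chance of at least one false positive in 4 tests = {family_wise_error:.4f}")
```

The second number is the usual multiple-comparisons point: running several tests and reporting only the one that crosses the threshold inflates the chance of a spurious finding well above the nominal 5%.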

The value the EPA calculated for RR of lung cancer from passive smoking is 1.19. "If a million people smoke..." I picked out one of the better (tier 1) studies in the report at random, the A.4.18 KALA study. It states "Marriage of a nonsmoking woman to a smoker was associated with a relative risk for lung cancer of 2.1 (95% C.I. = 1.1, 4.1)". 64 of 90 women with lung cancer were exposed to passive smoking, compared to 70 of 116 cancer free controls who were also exposed. This is not a million people, it is 206. Again, you do the maths, and then try to figure out how they got this figure.
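As a rough check on the counts quoted above, the crude (unadjusted) odds ratio for the 2x2 table can be computed directly. This is only a sketch using the standard log odds ratio (Woolf) interval; it does not attempt to reproduce the study's own analysis, which may have been adjusted or stratified, and the function name is illustrative.

```python
import math

def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with an approximate 95% CI on the log scale."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts quoted in the comment: 64 of 90 cases exposed, 70 of 116 controls exposed.
odds_ratio, ci_lower, ci_upper = crude_odds_ratio(64, 90 - 64, 70, 116 - 70)
print(f"crude OR = {odds_ratio:.2f} (95% CI {ci_lower:.2f}, {ci_upper:.2f})")
```

On these counts the crude estimate comes out near 1.6 with an interval that spans 1, which differs from the published 2.1 (1.1, 4.1); the gap presumably reflects whatever adjustment the authors applied.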

Brignell criticised the study for a number of things: applying meta-analysis to studies that include a lot of noise, changing the confidence level to 90% after finding no significance at 95%, excluding at least one study that found significance the other way, and ignoring about 20 confounding factors, and then only finding a relative risk of 1.19.

"Significance is orthogonal to effect size." In a Poisson distribution the standard deviation is the square root of the mean. If you know the mean, that alone determines the shape of the binomial curve. So if you know the excess number of cases you know the area under the curve and therefore you know the significance. Similarly, if you know the mean and the excess number of cases you know the relative risk. So significance and relative risk are not independent. In a standard binomial distribution things are different, because the standard deviation is not a function of the mean. Brignell is talking about studies based on Poisson distributions. Engjs 13:55, 15 March 2006 (UTC)

In much research, the Poisson distribution is found to underestimate variation (overdispersion), and whether the variance is equal to the mean is an empirical question, not a theoretical or methodological one. You can't just use the Poisson distribution, you also have to test that your model fits properly. And you're forgetting that the significance is based on the standard error, not the standard deviation (spot the difference? it's an important one). I'm not sure whether you have a poor grasp of the concept or are getting carried away in the argument, but significance is orthogonal to effect size. BrendanH 15:05, 15 March 2006 (UTC)
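The standard error point can be illustrated with a small sketch: the same relative risk gives very different test statistics depending on sample size. The counts below are hypothetical, chosen only so that RR = 1.2 at every scale, and the standard error is the usual large-sample approximation on the log scale.

```python
import math

def relative_risk_z(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Return (RR, z) using the large-sample standard error of log(RR)."""
    rr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
    se_log_rr = math.sqrt(1 / cases_exposed - 1 / n_exposed
                          + 1 / cases_unexposed - 1 / n_unexposed)
    return rr, math.log(rr) / se_log_rr

results = []
for scale in (1, 10, 100):
    # Hypothetical data: 12 vs 10 cases per 1,000 people, scaled up.
    rr, z = relative_risk_z(12 * scale, 1000 * scale, 10 * scale, 1000 * scale)
    results.append((scale, rr, z))
    print(f"n per group = {1000 * scale:>6}: RR = {rr:.2f}, z = {z:.2f}")
```

The effect size stays fixed while z grows roughly with the square root of the sample size, which is the sense in which significance and effect size are separate questions.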


I was taught statistics as part of a maths degree, from first principles, starting with set theory and working up to deriving the equations for the various tests; I wasn't just taught how to use a bunch of equations, so my take on things may be different to yours. In particular I'm not used to working in confidence intervals but rather in terms of probability.

The probability associated with an event is the value of the probability function at that point. The probability of an interval is the area under the probability function for that interval. For example the probability that x>=6 is the area under the probability function from 6 to infinity. The total area under the function is 1.0. For discrete functions the probability of an interval is a summation rather than an integral. When you perform a statistical test you are calculating what the area under the curve is for that part of the curve greater (or lesser) than your found value. That is what the p-value is. Another, equivalent way of putting it is that the p-value is the probability of getting an event as rare or rarer than what you saw. If the p-value you get is .025, that means that there was a 1/40 chance of getting an event at least as extreme as the one you saw just by chance. The test part is that you accept or reject that the event occurred just by chance based on the probability you see. You can do that by comparing it with a confidence level, you can do it by calculating confidence intervals, or whatever. The basis is always that you are saying that the event is too rare to have happened by chance.
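The "area under the curve" description for a discrete distribution can be sketched as a plain summation. The numbers (20 trials, event probability 0.1, 6 events observed) are hypothetical and chosen only to echo the x>=6 example; the function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_upper_tail(k, n, p):
    """P(X >= k): the one-sided p-value for observing k or more events."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

n_trials, p_event, observed = 20, 0.1, 6   # hypothetical values
p_value = binomial_upper_tail(observed, n_trials, p_event)
total_area = binomial_upper_tail(0, n_trials, p_event)  # should be 1.0

print(f"P(X >= {observed}) = {p_value:.4f}, total probability = {total_area:.4f}")
```

The p-value here is the summed probability of every outcome at least as extreme as the observed one, not the probability of the observed outcome alone.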

In a binomial distribution the shape of the curve depends on the mean and moments of the data set. A Poisson distribution is a binomial distribution where the underlying p or q is relatively small and the sample size relatively high. When that happens, the binomial terms include powers of small numbers and so go to zero, causing the function to collapse on itself. A side effect is that the moments of the function become functions of the mean, so the shape of the curve is only dependent on the mean. If you know the mean you can calculate both the p-value and the relative risk for whatever number of samples you found.

As the mean increases, you move away from a Poisson distribution towards a normal distribution and the relationship no longer holds good. Also, if the underlying distribution is not a Poisson distribution the relationship will not hold good. Brignell is looking at studies where the underlying distribution is a Poisson distribution, where the number of cases is small and where the CI's are wide.

In any binomial distribution, 95% of events fall within 2 standard deviations of the mean. So if you know the mean and the standard deviation (or for a Poisson distribution just the mean) you can calculate exactly how many extra samples you need to find significance at a 95% confidence level. And from that you can calculate the relative risk at which significance will be found.

But all this is missing the point. The people who created the sourcewatch article quoted above have not understood the argument that Brignell makes, have misrepresented it, and then attacked the misrepresentation. Surely that's not appropriate in a forum like Wikipedia, and it needs to be corrected.

"I agree that publication bias can influence the significance. That said, I have no proof John Brignell made that argument." See [3] and [4].

It also occurs to me that the word significant is being used here with two different meanings. There is significant as in the p-value is less than 0.05, and there is significant as in the study has found a worthwhile result. Brignell is using the word in the latter sense, not the former, arguing that the significance found in these sorts of studies is mostly due to Type I errors, and is therefore not significant. Engjs 00:44, 16 March 2006 (UTC)


I'm with BrendanH about significance and effect size-- they CANNOT be equated.
"As the mean increases, you move away from a Poisson distribution towards a normal distribution and the relationship no longer holds good. Also, if the underlying distribution is not a Poisson distribution the relationship will not hold good. Brignell is looking at studies where the underlying distribution is a Poisson distribution, where the number of cases is small and where the CI's are wide."
AFAIK, in most studies in medicine the mean and standard deviation of the populations are not related. So, the argument sounds pretty hokey to me. As for the latter part-- I don't think you have a Poisson distribution by virtue of a wide CI. The large CI is a result of (1) random noise and (2) sampling error (the impact of both can be reduced with a larger sample size).
I don't dispute that there are a large number of weak studies out there. Also, I don't dispute that, at times, some people jump to conclusions based on small data sets. What I take issue with is that small effects don't matter. I think they clearly do matter-- not as much as large effects... but they still matter.
A few weaker studies aside-- your argument boils down to ... it's difficult to show a difference if the differences are small and thus small differences (i.e. small relative risk values) should be discounted.
The first part is true; the conclusion is NOT. Also, if I take an RR of 1.2 (in the above example) there are 5,000 extra cases due to passive smoking. The point of my example (above) was to direct things to what is important-- that is, the inference to the larger population. An RR of 1.2 may seem small but if you apply it to large populations -- it is a lot of unnecessary death. The US population is 300 million-- if you assume 50 million of those are passive smokers, the risk of lung ca in non-smokers is 1/40 and the RR of passive smoking is 1.2, you get 1.5 million cases of cancer in that 50 million-- 250,000 additional cancer cases... cases of cancer that could be prevented if the people didn't smoke passively (and several hundred thousand deaths-- 'cause lung ca has a very high mortality (unlike breast ca.)).
We can argue about whether the RR is 1.1 or 1.2 or 1.0 (i.e. not significant)-- and confidence intervals. That said, if the relative risk is 1.2 (and the confidence interval narrow e.g. 1.13-1.36 P<0.001 like in this meta-analysis: Copas JB, Shi JQ. Reanalysis of epidemiological evidence on lung cancer and passive smoking. BMJ. 2000 Feb 12;320(7232):417-8. PMID 10669446.)--it has a significant impact on the population and is worthy of action. Type I errors are related to the CI. I think Sackett's paper may be worth a read if you're not familiar with it.
Sackett DL. Why randomized controlled trials fail but needn't: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!). CMAJ. 2001 Oct 30;165(9):1226-37. PMID 11706914. Free Full Text.
A good book on stats is Biostatistics: The Bare Essentials, 2nd Edition, David L. Streiner, Geoffrey R. Norman, B C Decker Inc. ISBN 1550091239. Most of my general biostats knowledge derives from that. I have done a course in statistical process control. I look forward to your future comments. Nephron 07:00, 16 March 2006 (UTC)
"AFAIK, in most studies in medicine the mean and standard deviation of the populations are not related. So, the argument sounds pretty hokey to me. As for the latter part-- I don't think you have a Poisson distribution by virtue of a wide CI. The large CI is a result of (1) random noise and (2) sampling error (the impact of both can be reduced with a larger sample size)."

A Poisson distribution is a binomial distribution where the sample size is large and the probability of an event is small.[5] More properly, it's the limit of the binomial distribution when the sample size tends to infinity and the probability of an event tends to zero while their product doesn't change, but for big and small values it's a very good approximation. When you have big and small values, terms in the binomial expansion which include powers of small numbers collapse and leave you with a simpler distribution. If you examine 50,000 people and find 50 cases, then your sample size is large (50,000) and your event probability is small (1/1000), so the results are in a Poisson distribution. If the underlying distribution is not binomial, then of course your data won't be in a Poisson distribution, but then most data is binomial (or normal for real data). How many studies have you seen that are based on examining thousands of cases to find the handful that have a particular condition? Those studies are of data that is almost certainly in a Poisson distribution.
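The quality of the approximation for the 50,000-person example can be checked numerically. This is a sketch using log-gamma functions from the standard library to avoid overflow; the range of counts compared (0 to 150) is an arbitrary choice that comfortably covers the bulk of both distributions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean), computed in log space."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

n_people, p_event = 50_000, 1 / 1000     # figures from the example
mean_cases = n_people * p_event          # about 50 expected cases

max_diff = max(abs(binomial_pmf(k, n_people, p_event) - poisson_pmf(k, mean_cases))
               for k in range(151))
binomial_variance = n_people * p_event * (1 - p_event)

print(f"largest pmf difference over k = 0..150: {max_diff:.2e}")
print(f"binomial variance = {binomial_variance:.2f}, Poisson variance = {mean_cases:.2f}")
```

The two variances differ only by the factor (1 - p), which is why the variance is so close to the mean when the event probability is small.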

One of the neat things about Poisson distributions is that the variance is equal to the mean (see equation (28) in the above reference). And given that for any binomial distribution 95% of events fall within 2 standard deviations of the mean, that makes it easy to calculate from the mean how many extra cases you need for significance. For the above example, if you break your 50 cases into five sets, you expect 10 cases in each. For 95% significance you then need more than 16 or fewer than 4 cases in a set.
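The two-standard-deviation rule of thumb for a mean of 10 can be compared with the exact Poisson tails. This is a sketch only; it treats the expected count of 10 as known, and the variable names are illustrative.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

mean_cases = 10.0
sd = math.sqrt(mean_cases)                 # Poisson: variance equals the mean

# Rule of thumb: flag counts outside mean +/- 2 standard deviations.
max_low_count = math.ceil(mean_cases - 2 * sd) - 1     # largest count below the band
min_high_count = math.floor(mean_cases + 2 * sd) + 1   # smallest count above the band

lower_tail = sum(poisson_pmf(k, mean_cases) for k in range(max_low_count + 1))
upper_tail = 1.0 - sum(poisson_pmf(k, mean_cases) for k in range(min_high_count))
outside_band = lower_tail + upper_tail

print(f"flag counts <= {max_low_count} or >= {min_high_count}")
print(f"exact probability outside the band = {outside_band:.4f}")
```

The cut-offs match the "more than 16 or fewer than 4" figures, while the exact tail probability shows how close the rule of thumb comes to a nominal 5% for a skewed, discrete distribution.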

"What I take issue with is that small effects don't matter." Small effects are acceptable where the sample size is large or where the confidence level is high. If you do a test on 50 cases at 99% and find significance that's fine. If you do a test on 500 cases at 95% and find significance that's fine. Brignell makes that quite clear in his books, and on his website if you know where to look. If you find significance on 20 cases with 95% that's where things start to look dodgy.

"if the relative risk is 1.2 (and the confidence interval narrow e.g. 1.13-1.36 P<0.001" If this were a single study, Brignell would call it a good study. As it's a meta-analysis, Brignell would want to look at the underlying studies. Engjs 12:22, 16 March 2006 (UTC)


Engjs writes: It also occurs to me that the word significant is being used here with two different meanings. There is significant as in the p-value is less than 0.05, and there is significant as in the study has found a worthwhile result. Brignell is using the word in the latter sense, not the former. If Brignell is using the word "significant" in a statistical context, but meaning something entirely different from its very specific technical definition, he is going to be misunderstood, and it is entirely his fault. BrendanH 10:35, 16 March 2006 (UTC)
Nevertheless, if you accept that he has been misunderstood, you have to accept that the attempt to demolish a claim he never made is inappropriate and shouldn't be here. I can't change what is here because any changes I make will be reverted. Engjs 11:16, 16 March 2006 (UTC)
I think the first five paragraphs in the Relative risk#Size of relative risk and relevance section are a clear, correct and adequate summary of the argument. Brignell (and Milloy) have made these claims, so it is correct to mention them. A far more intellectually coherent argument is coming from Ioannidis, and that deservedly gets more space. If there is to be an argument about Brignell it should be at his page, not here. BrendanH 11:27, 16 March 2006 (UTC)
Have you read the following [6], under "Why RR>2.0?"? This is where Brignell explains why he criticises RR<2.0. Note the sentence "If reasonable levels of significance were observed (RR>2, P<0.01) there would be virtually none of the contradictions." How can you possibly read this as an argument for replacing a criterion based on significance with one based on relative risk? It's a call for extra stringency, not for different criteria. Then note the following: "For those who did not follow the link on RR above, the statement about the unacceptability of 1.5 applies to observational studies, not necessarily to properly randomised double blind surveys that produce a highly significant result." [7] He is saying that it should only be applied to weak studies. The article as it currently stands clearly misrepresents his views on the subject. Engjs 12:56, 16 March 2006 (UTC)
If justice needs to be done to Brignell, it should be done at John Brignell. He's only a bit player in this article, and features only because he has made bald claims about RR<2. The fact that he has outlined his position in a more sophisticated way at other times is not particularly relevant here (but would be at his own page). Moreover, he's not really talking about relative risk, but rather effect size and observational data in general, and uses RR simply as a common and convenient example, which is another reason for dealing with the controversy elsewhere. BrendanH 13:14, 16 March 2006 (UTC)


Look, all you need to do to fix the thing is to change the words "have argued instead for a requirement that the point estimate of RR should exceed 2" to "have argued in addition for a requirement that the point estimate of RR should exceed 2 when 95% confidence levels are used". Then what you are saying is accurate. Is that too much to ask? Engjs 14:21, 16 March 2006 (UTC)

I don't think the contributors so far have grasped the problem. The problem is not the Relative Risk per se but the data behind it. A high relative risk is meant to guard against the acceptance of a weak alternative hypothesis. However, it cannot completely remove such a problem.

Simply saying that smaller confidence intervals and larger sample sizes make up for this is misleading. For example, observing a misleading correlation 1,000 times as opposed to 100 times will create narrower confidence intervals and vastly increase the likelihood of a significant result. This, however, does not mitigate the fact the misleading correlation is still there. The larger sample size merely amplifies it.

No matter what, you can't make a poorly designed study, where possible confounding effects are not controlled for, 'scientific' by virtue of statistical sleights of hand. The issue of Relative Risk needs to be examined from the point of view primarily of data collection and not of data analysis. Mixino1 22:46, 11 November 2006 (UTC)

This point is already made in the article, in the following para
In addition, if estimates are biased by the exclusion of relevant factors, the likelihood of a spurious finding of significance is greater if the estimated RR is close to 1. In his paper "Why Most Published Research Findings Are False" [4], John Ioannidis writes "The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. [...] research findings are more likely true in scientific fields with [...] relative risks 3–20 [...], than in scientific fields where postulated effects are small [...] (relative risks 1.1–1.5)." "if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors."
I get the impression from this and your other comment that you have been looking at a very old version of the article. JQ 02:06, 12 November 2006 (UTC)
There is no perspective in the article as it stands. The layman still won't know the relative risk of passive smoking is around 1.16. The article merely states "The issue has arisen particularly in relation to debates about the effects of passive smoking, where the effect size appears to be small (relative to smoking), and exposure levels are difficult to quantify in the affected population." This is wholly misleading. The Relative Risk of passive smoking is small compared to anything in general - not just smoking. It then goes on to say "(In the case of lung cancer, however, the base risk is substantial)." which also doesn't give a point of reference to whether the smoking is passive or first-hand. The article then goes on to make an entirely spurious claim "A relative risk of 1.10 may seem very small, but over a large number of patients will make a noticeable difference. Whether a given treatment is considered a worthy endeavour is dependent on the risks, benefits and costs.". Evidence? Where is the benefit in treating a confounding correlation? The article seems deliberately vague. Mixino1 12:23, 12 November 2006 (UTC)
If you think confounding variables are a problem in studies of passive smoking you should probably take the question up in that article. Since there's a lot of bogus material from tobacco lobbyists in this debate, be sure to cite articles from reputable sources to support your view. Since (as that article shows) there is currently general agreement by scientific authorities that the link between passive smoking and cancer (and other problems) is real, that's the basis on which the question is treated here. The example of passive smoking demonstrates that there is no general rule saying that studies with RR<2 should be discarded. JQ 20:27, 12 November 2006 (UTC)
No. It is a problem throughout epidemiology. The drop in RR, caused by making passive smoking look dangerous, has caused all kinds of things to appear to become 'dangerous' that clearly are not. Stop trying to pretend this is an isolated issue. You state "Since (as that article shows) there is currently general agreement by scientific authorities that the link between passive smoking and cancer (and other problems) is real" although you know well enough this is conjecture not supported by statistical analysis. As a statistician, I really couldn't care if there is a consensus in what non-statisticians think about statistical evidence. All ETS evidence is statistical and, as such, statisticians should have the last say, not medical doctors or wishful thinkers. Mixino1 21:02, 12 November 2006 (UTC)

Nephron's reversion of Engjs's defence of Brignell

I have to say I support Nephron's reversion of Engjs's paragraph: This is a misrepresentation. What John Brignell has suggested is that where statistical significance is found in studies based on small numbers of actual cases, the probability of the significance being due to publication bias or other factors rather than an actual link is fairly high. Hence some proportion of such published studies are reporting false significance; some estimate well over half of them. Those studies most likely to be affected are those that find the weakest levels of significance, so such studies should be rejected. For Poisson distributions where the mean expected value is known or can be estimated, relative risk is an estimate of statistical significance, and so can be used as a basis for rejecting studies based on weak levels of significance. Hence Brignell suggests the above rule of thumb as a guide in deciding which studies to reject.

It may well be the case that Brignell has a more balanced statement of the RR>2 rule than the examples I have seen, but it is the case that he has made fairly bald claims about it from time to time, and it is echoed in the public domain. However, Engjs's paragraph also needs to be rejected on the basis of incoherence about the Poisson distribution. First, were a statistic Poisson distributed, though we know its estimated standard deviation from the mean we do not know its significance without knowing the sample size (needed to calculate the standard deviation). Thus it is simply incorrect that "relative risk is an estimate of statistical significance". Moreover, in a Poisson regression the relative risk parameter does not have a Poisson sampling distribution, though the dependent variable does. Not only does the standard error depend on the sample size, it also depends on the rates in the two groups, both their levels and their ratio.

There is one good argument in favour of large effect sizes with observational data, and it is that subtle unanticipated confounding processes are less plausible than they are with small effect sizes (this is a paraphrase from memory from Rosenbaum, Observational Data, Springer). Another point for the record: if one finds standards of proof too low, it is far more satisfactory to insist on higher confidence levels, for instance p<=0.01 instead of p<=0.05. I've done some simulations with Poisson distributed data and this excludes many more results than insisting on RR>2, and does so in a way that is statistically more satisfactory. BrendanH 21:04, 19 March 2006 (UTC)
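
The point made above, that the standard error depends on sample size and on the rates in the two groups, can be illustrated with a minimal sketch. It assumes two independent binomial samples and uses the usual large-sample standard error of log(RR) with a normal approximation for the p-value; the function name and all counts are hypothetical and are not taken from any study discussed here.

```python
import math

def log_rr_test(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Return (RR, SE of log RR, two-sided p-value) under a normal approximation."""
    rr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
    # Large-sample (delta-method) standard error of log(RR)
    # for two independent binomial samples.
    se = math.sqrt(1 / cases_exposed - 1 / n_exposed
                   + 1 / cases_unexposed - 1 / n_unexposed)
    z = math.log(rr) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return rr, se, p

# Hypothetical counts: the same point estimate (RR = 2) from a small and a large study.
small = log_rr_test(10, 1000, 5, 1000)
large = log_rr_test(200, 20000, 100, 20000)
# Hypothetical counts: a modest RR of 1.2 from a very large study.
modest = log_rr_test(1200, 100000, 1000, 100000)

for label, (rr, se, p) in [("small", small), ("large", large), ("modest", modest)]:
    print(f"{label}: RR={rr:.2f}  SE(log RR)={se:.3f}  p={p:.2g}")
```

Under these assumptions the small study with RR = 2 does not reach p < 0.05, while the large studies do, including the one with RR = 1.2. The point estimate alone therefore does not fix the significance level.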

The article as currently presented implies that John Brignell favours replacing significance testing with a test based on RR>2.0. This is a false implication; he has never made such a claim. It is perfectly clear from reading his website that he chooses to apply such a test only where significance has been found, and only when it has been found at the 95% level or worse. Until this is changed the article is conveying false information. If you do not fix this I will tag the article as containing errors of fact and list the matter for mediation. Engjs 23:43, 19 March 2006 (UTC)
BrendanH has done a fairly good job at explaining things here. I will, however, address one point Engjs made more fully-- they stated:
Look, all you need to do to fix the thing is to change the words "have argued instead for a requirement that the point estimate of RR should exceed 2" to "have argued in addition for a requirement that the point estimate of RR should exceed 2 when 95% confidence levels are used". Then what you are saying is accurate. Is that too much to ask?
I think this is very much to ask. A study that has a RR>2 isn't necessarily more robust statistically than one that has a RR<2 -- it depends on the P value. I think this is clear if you've examined Sackett's paper, specifically the equation:
confidence = (signal / noise) * sqrt(sample size).
The above said, if I were to buy into Engjs' logic we'd have to change how we deal with linear correlations where the slope is quite small. With the same sample size, it is harder to show a linear association (y=mx+b) with a small slope (m)-- this follows from Sackett's equation. Like BrendanH stated earlier-- it is the P value that matters. Stated differently, RR=1.2 & P<0.01 is less likely to be by chance than RR=20 & P=0.05. Also, RR=1.2 & P=0.05 is less likely to be by chance than RR=20 & P=0.06.
Engjs's argument essentially is--
... it is harder to show small effect size therefore we should have a higher standard for small effect sizes.
The first part is true-- never disputed that. That said, the argument doesn't make sense. The P value reflects the probability of getting a result that isn't there. RR=20 & P=0.05 and RR=1.5 & P=0.05 are equally likely to be a chance occurrence-- or do you disagree? Nephron 23:56, 19 March 2006 (UTC)
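
A minimal sketch of the formula quoted above, taken exactly as written in this discussion (it has not been checked against Sackett's paper here). The function names, noise level and confidence threshold are hypothetical; the sketch only shows how the required sample size grows as the signal shrinks.

```python
import math

def confidence(signal, noise, n):
    """The quoted rule of thumb: (signal / noise) * sqrt(sample size)."""
    return signal / noise * math.sqrt(n)

def required_n(signal, noise, target_confidence):
    """Sample size at which the quoted formula reaches target_confidence."""
    return (target_confidence * noise / signal) ** 2

noise = 1.0    # hypothetical noise level
target = 2.0   # hypothetical confidence threshold
for signal in (1.0, 0.5, 0.25):
    n = required_n(signal, noise, target)
    print(f"signal={signal}: n={n:.0f}, check={confidence(signal, noise, n):.2f}")
```

Halving the signal quadruples the sample size needed to reach the same confidence, which is the sense in which a small effect is harder to show. Once that confidence is reached, the formula treats a small and a large signal alike.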
I think we may be talking at cross-purposes here. If I understand Engjs correctly, he sees Brignell as requiring both a P value <0.05 and an effect size RR>2, whereas the text made it appear as if Brignell only wanted RR>2. Although you can find statements from Brignell to support either interpretation, this one seems more plausible since it lets him reject more studies. I've made a change that reflects this interpretation. JQ 01:07, 20 March 2006 (UTC)

The references you have given to Brignell's site are to (a) his blog and (b) a wiki run at the behest of a pair of pro-environmental zealots. Neither are good sources. I am replacing them with a link to the page where he outlines his opinion on relative risk. You need to provide a link to show that Milloy has offered the same opinion. Engjs 02:38, 20 March 2006 (UTC)

"The issue has arisen particularly in relation to debates about the effects of passive smoking, where the difficulty of distinguishing levels of exposure means that typical estimates of RR are less than 2." I can't comment on Milloy, but Brignell applies this standard to lots of different issues, not just passive smoking. The lacing of the argument with constant references to the issue of passive smoking can only be seen as a needless attempt to stir up a prejudiced reaction from the reader, making the article POV. They are extraneous to the argument and should be removed. Engjs 02:46, 20 March 2006 (UTC)

This is a simple statement of fact about the RR>2 controversy. Whatever Brignell may or may not have written, most of the attention has been on passive smoking, not surprisingly because there's big money at stake and Milloy's whole organisation is based on tobacco money. If you think there's another issue where the question is equally or more important, add it in. BTW, I've added the requested link to Milloy JQ 06:42, 20 March 2006 (UTC)

I read Milloy's article and can see why you didn't like it. Like Brignell, he's writing for the masses rather than for academia. It's interesting that the very next chapter talks about statistical significance. I've added an extra phrase to the comment, which I know to be true in Brignell's case, though I can't vouch for it in Milloy's.

Ad hominem titles

BrendanH: this encyclopedia has credibility problems as it is, without you adding to the problem by giving Steven Milloy the soubriquet of "tobacco lobbyist". Steve Milloy is a biostatistician. If this wiki needs anything, it needs tasteless innuendo against people, who are not here to defend themselves, to be removed. If you don't like his politics, take it to him personally and keep it off Wikipedia.

I didn't introduce the term -- check the history. Moreover, it is factually correct, so I object to its removal by an anonymous contributor. As for being a biostatistician, I can believe he has training in that discipline but it seems to me he is primarily a communicator, producing funded PR from an industry perspective. I've made enough reactive changes to this article for today so I'll leave it for now, but it certainly seems to me that "tobacco lobbyist" is more appropriate than "biostatistician". BrendanH 20:00, 20 March 2006 (UTC)
I deleted the description altogether. Milloy's work as a tobacco lobbyist is amply demonstrated by the linked article, so no real need to refer to it here, I guess. JQ 20:20, 20 March 2006 (UTC)


BrendanH, I think we're all used to people with non-PC views being labelled as anything but their correct title. A term like "tobacco lobbyist" is obviously more useful to discredit someone than "Biostatistician". The weak in science like straw men. The anti-science/pro-PC brigade will never accept that statisticians think they are talking nonsense. As a statistician myself, I refuse to accept this. Mixino1 01:37, 12 November 2006 (UTC)

I'm puzzled by this contribution to discussion. The description of Milloy as a tobacco lobbyist was deleted as unnecessary eight months ago, so why are you piling on now? JQ 02:02, 12 November 2006 (UTC)
He is now referred to as "the "Junk Science" commentator for FoxNews.com" not a "Biostatistician" (when you follow the link). Is being labelled "Fox News" infinitely better than being labelled "tobacco lobbyist"? To most people of a liberal bent, the "Fox News" label is meant to imply a lack of credibility. I suspect this is known and that's why it is used instead of his professional title. I realise this is probably more relevant on Milloy's page than here. Mixino1 12:48, 12 November 2006 (UTC)
This is his description because that's what he does for a living and what brings him to public prominence. Having, say, a physics degree does not make you a physicist, and neither does having a degree in biostatistics make you a biostatistician. But, as you say, this is more relevant on Milloy's page. As regards credibility I'd say Milloy does more to detract from Fox's than vice versa. He scores a highly negative mention in Fox News Channel controversies. JQ 20:36, 12 November 2006 (UTC)
Another ad hominem attack to defend an ad hominem attack. You are clearly scraping the barrel. Mixino1 21:16, 12 November 2006 (UTC)

Deletion

I've felt unhappy about the list of "effects" from Science for a long time, because I think it exemplifies the problem of taking isolated results out of context. Small significant effects are important when they are (i) part of a well conducted analysis (proper controls, good measurement etc), (ii) have a good theoretical rationale, and (iii) are replicated in other analyses. The list seems arbitrary and some of the observed effects obviously need to be analysed more closely to see if there actually is a causal chain linking the effect to the outcome.

So I've deleted it. BrendanH 09:47, 21 March 2006 (UTC)

Fair enough. I moved all this stuff out of John Brignell, where it was part of a side debate there. I thought the Wikipedia process would deal with it appropriately and in due course. JQ 10:15, 21 March 2006 (UTC)
No offense to John, but I think it was a good edit. Nephron 00:49, 22 March 2006 (UTC)
No offence taken :-) JQ 06:10, 22 March 2006 (UTC)

Request for Mediation

This article was up for a Request for Mediation. The request was rejected (see RFM archive - entry for RR). Nephron  T|C 02:02, 11 June 2006 (UTC)

Please define "standard approach"

In the section titled "Size of relative risk and relevance", I come across the text Critics of the standard approach.... But the "standard approach" has not been defined or even mentioned elsewhere in the article so I am left scratching my head. One may infer from the following text that the "standard approach" does not require that the point estimate of RR should exceed 2, I suppose? It should be stated explicitly. What exactly does the standard approach require? --Yath 03:05, 25 September 2006 (UTC)

I agree. This is deliberately vague. It is meant to appear like there is a standardised approach used by epidemiologists and statisticians. The reason it is vague is because the Relative Risk level that publications find acceptable has fallen markedly over the last 10-15 years. This really has more to do with the drive to make passive smoking look like a risk rather than any improvement in the precision of the methodology used. It should really say something like "The modern use of relative risks close to one is criticised by...". Mixino1 13:28, 12 November 2006 (UTC)

Question about "The log of relative risk ... has an approximately normal distribution"

Is anyone able to tell me where I can find more information about this statement: "The log of relative risk is usually taken to have a sampling distribution that has an approximately normal distribution"? I am talking about relative risk in my senior thesis and am at a standstill until I can get proof of this statement. Any information about the distribution of relative risk would be helpful. Thank you. --Joyo711 16:58, 4 October 2007 (UTC)

This is a modelling assumption rather than a statement that can be proved true in general. You may find it useful to look at Logistic regression where the log-odds ratio is the LHS of a standard linear model, for which we usually assume that errors are (approximately) normal. There are various tests of normality that can be applied, maybe a text on limited dependent variables would say which is best here. Hope this helps. JQ 03:07, 7 October 2007 (UTC)
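
Since the normality of log(RR) is described above as a modelling assumption, one informal way to see how well it holds in a given setting is simulation. The sketch below is an illustration under stated assumptions, not a proof: it draws two binomial samples repeatedly, builds the usual large-sample 95% interval for log(RR), and reports how often the interval covers the true value. The risks, group size, number of repetitions and seed are all hypothetical.

```python
import math
import random

def simulate_coverage(p_exposed, p_unexposed, n, reps, seed=0):
    """Share of nominal 95% intervals for log(RR) that cover the true value."""
    rng = random.Random(seed)
    true_log_rr = math.log(p_exposed / p_unexposed)
    covered = used = 0
    for _ in range(reps):
        # Binomial counts drawn as sums of Bernoulli trials (standard library only).
        a = sum(rng.random() < p_exposed for _ in range(n))
        b = sum(rng.random() < p_unexposed for _ in range(n))
        if a == 0 or b == 0:
            continue  # log(RR) is undefined when either count is zero
        log_rr = math.log(a / b)  # equal group sizes, so n cancels
        se = math.sqrt(1 / a - 1 / n + 1 / b - 1 / n)
        used += 1
        if abs(log_rr - true_log_rr) <= 1.96 * se:
            covered += 1
    return covered / used

coverage = simulate_coverage(0.2, 0.1, n=200, reps=2000)
print(f"share of 95% intervals covering the true log RR: {coverage:.3f}")
```

If the normal approximation is reasonable for the chosen risks and sample size, the reported share should be close to 0.95. With rare outcomes or small groups it can drift, which is consistent with treating normality as an approximation to be checked case by case.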