Talk:Conditional probability

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

This article is within the scope of WikiProject Philosophy, which collaborates on articles related to philosophy. To participate, you can edit this article or visit the project page for more details.
This article has been rated as Start-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: Start-Class, Mid-priority. Field: Probability and statistics.
One of the 500 most frequently viewed mathematics articles.

I've struck out the sentence about decision trees. There is certainly no sense in which conditional probability calculations are generally easier with decision trees. Decision trees can indeed be interpreted as conditional probability models (or not), but in any event, they are a very, very small part of the world of conditional probability, and making an unwarranted assertion about a minor topic is out of place. Wile E. Heresiarch 17:13, 1 Feb 2004 (UTC)


Wrong?

Conditional probability is the probability of some event A, given that some other event, B, has already occurred

...

In these definitions, note that there need not be a causal or temporal relation between A and B. A may precede B, or vice versa, or they may happen at the same time.

This statement is totally confusing: if event B has already occurred, there has to be a temporal relation between A and B (i.e. B happens before A). --Abdull 12:50, 25 February 2006 (UTC)

I've reworded it. --Zundark 14:32, 25 February 2006 (UTC)
Great, thank you! --Abdull 11:24, 26 February 2006 (UTC)

Since the subject of the article is completely formal, I dislike the references to time, expressions like "temporal relation" or one event "preceding" another, because I find them informal in this context. In the framework of the probability space where we are working, time is not formally introduced: at what "time" does an event A take place? In fact, when we specifically want to represent or model how our knowledge of the world (represented by random variables) grows as time passes, we can do it by means of filtrations. And I feel the same goes for the "causal relation"; in the article such a notion is not defined formally.--zeycus 15:22, 23 February 2007 (UTC)

The purpose of this paragraph is to dispel the common misconception that conditional probability has something to do with temporal relationships or causality. The paragraph is necessarily informal, as a probability space does not even have such concepts. (By the way, contrary to your suggestion on my Talk page, this paragraph was added by Wile E. Heresiarch on 10 February 2004. The rewording I mentioned above did not touch this paragraph, it simply removed incorrect suggestions of temporal relationships elsewhere in the article. All this can be seen from the edit history.) --Zundark 08:36, 24 February 2007 (UTC)
I apologize for attributing the paragraph to you. I understand what you mean, but I think it is important to separate formal notions from informal ones. So I will add a short comment afterwards. --zeycus 9:42, 24 February 2007 (UTC)

Undefined or Indeterminate?

In the Other Considerations section, the statement "If P(B) = 0, then P(A \mid B) is left undefined" seems incorrect. Is it not more correct to say that P(A \mid B) is indeterminate?

If P(B) = 0, then P(A \cap B) = 0 regardless of P(A) or P(A \mid B).

Bob Badour 04:36, 11 June 2006 (UTC)

It's undefined. If you think it's not undefined, then what do you think its definition is? --Zundark 08:54, 11 June 2006 (UTC)
Indeterminate, as I said; a definition one would paraphrase as "incalculable" or "unknown". However, an indeterminate form can be undefined, and the consensus in the literature is to call the conditional undefined in the above-mentioned case. There are probably reasons for treating it as undefined that I am unaware of, and changing the text in the article would be OR. Thank you for your comments, and I apologize for taking your time. -- Bob Badour 00:07, 12 June 2006 (UTC)

Something about this is bothering me. Suppose X is standard normal. I am considering A \equiv \{X = 0\} and B \equiv \{X \in \{0, 5\}\}, for example. Clearly P(A) = P(B) = 0. However, I feel that P(A\mid B) should be defined, and in fact equal to \frac{f(0)}{f(0)+f(5)}, where f is the density function of X. To justify this informally, I would define A_\epsilon = (-\epsilon, \epsilon) and B_\epsilon = (-\epsilon, \epsilon)\cup (5-\epsilon, 5+\epsilon) for any \epsilon > 0. Then, if I am not wrong, \lim_{\epsilon\to 0^+} P(A_\epsilon\mid B_\epsilon) = \frac{f(0)}{f(0)+f(5)} \approx 0.999996273.

Suppose someone tells me that a number has been obtained from a standard normal variable, that it is 0 or 5, and that I have a chance at a double-or-nothing bet trying to guess which one of them it was. Shouldn't I bet on the 0? And how could I justify it, if not with the calculations above? Opinions are most welcome. What do you think? -- zeycus 18:36, 22 February 2007 (UTC)
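A quick numerical check of the limiting argument above (a Python sketch; it assumes SciPy's norm for the standard normal CDF and density):

  from scipy.stats import norm

  def cond_prob(eps):
      # P(A_eps | B_eps) with A_eps = (-eps, eps) and
      # B_eps = (-eps, eps) U (5 - eps, 5 + eps); A_eps is a subset of B_eps
      p_a = norm.cdf(eps) - norm.cdf(-eps)
      p_b = p_a + norm.cdf(5 + eps) - norm.cdf(5 - eps)
      return p_a / p_b

  for eps in (1.0, 0.1, 0.01):
      print(eps, cond_prob(eps))

  # The limit predicted by the density ratio:
  print(norm.pdf(0) / (norm.pdf(0) + norm.pdf(5)))   # approx. 0.999996273

The ratio converges to f(0)/(f(0)+f(5)) as claimed.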

I think you are absolutely right. However, the theory needed to obtain this is a lot more complicated than the theory needed to understand conditional probability as such. Should the article state clearly from the start that we are dealing with discrete distributions only, and then perhaps have a last section dealing with generalization to continuous distributions?--Niels Ø (noe) 19:33, 22 February 2007 (UTC)
It's already valid for continuous distributions, at least now that I've cleaned up the definition section. It's not usual to define P(A|B) when P(B) = 0, but if someone can find a decent reference for this then it might be worth adding to the article. --Zundark 11:25, 24 February 2007 (UTC)
I was not able to find any source defining P(A\mid B) when P(B) = 0. I posted in a math forum, and after an interesting discussion someone gave a nice argument (a bit long to be copied here) justifying why it does not make sense. I consider the point clarified. --zeycus 15:30, 28 February 2007 (UTC)
I'm afraid these comments are almost entirely wrong. It is perfectly possible to condition on events of probability zero, and this is in fact common. Consider tossing a coin. If one does not know whether the coin is fair or not, in the Bayesian world one assigns a probability distribution to the parameter p representing the probability of getting a head. This distribution reflects the degree of belief one has in the fairness of the coin. In the event that this distribution is continuous, it is perfectly reasonable to condition on the event that p = 1/2, even though this event has probability zero. To define these conditional probabilities rigorously requires measure theory, and this approach agrees with the naive interpretation given in first-level courses. A good reference is Basic Stochastic Processes by Zastawniak and Brzezniak. PoochieR 21:45, 6 November 2007 (UTC)
To repeat myself from above: Should the article state clearly from the start that we are dealing with discrete distributions only, and then perhaps have a last section dealing with generalization to continuous distributions?--Niels Ø (noe) 12:31, 7 November 2007 (UTC)
No, because we aren't dealing only with discrete distributions. --Zundark 12:39, 7 November 2007 (UTC)
Some of the most important modern uses of conditional probability are in martingale theory, with direct practical applications in all areas of mathematical finance. It is simply impossible to deal with these without conditioning on events of probability zero, so I think it's important that you should include these. A way round would be to make it clear that the definition you have given is a naive definition, which only works for conditioning on events with probability > 0; however, to give the definition which works for conditioning on any event requires the use of measure theory. The measure-theoretic definition agrees with the naive definition where that is applicable. The natural way to express the measure-theoretic formulation is in terms of conditional expectations, conditional on sigma-algebras of events; in this formulation P(A | B) = E(I(A) | σ(I(B))), where I(A) is the indicator random variable of event A. A better reference than the one I gave before is: Probability with Martingales, David Williams, Ch. 9. PoochieR 09:41, 8 November 2007 (UTC)
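For what it's worth, the way E(I(A) | σ(I(B))) reduces to the naive ratio can be seen concretely on a finite space. A minimal Python sketch (fair die, illustrative only; it does not touch the measure-zero case, where the full machinery is genuinely needed):

  from fractions import Fraction

  omega = set(range(1, 7))          # one roll of a fair die
  A = {2, 4, 6}                     # "even"
  B = {4, 5, 6}                     # "greater than 3"

  def P(E):
      return Fraction(len(E & omega), len(omega))

  def cond_exp_indicator(A, B, w):
      # Value of E(I(A) | sigma(I(B))) at the sample point w: it is constant
      # on each atom of sigma(I(B)), i.e. on B and on its complement.
      cell = B if w in B else omega - B
      return P(A & cell) / P(cell)

  print(cond_exp_indicator(A, B, 5))   # 2/3, the naive P(A|B)
  print(cond_exp_indicator(A, B, 1))   # 1/3, the naive P(A | complement of B)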
I am quite happy to edit your definition with respect to P(A | B) when P(B) = 0, but you cannot leave it as it is. The correct definition to make the discrete case correspond with the more general case is to define P(A | B) = 0 when P(B) = 0. There are no problems then with the naive interpretation, and the benefit of agreeing with the more sophisticated approach. In many ways it is similar to the debates that used to go on regarding 0!.PoochieR 18:16, 15 November 2007 (UTC)

This is a general encyclopedia. I think it's important to write a readable and accessible article, as far as possible, and as a matter of presentation, I think we do that best by limiting ourselves to discrete situations. The purely continuous cases (requiring integrals and such) and the mixed cases (requiring measure theory) can be treated

  • further down in the article,
  • in separate articles,
  • or by reference to external sources like MathWorld.

--Niels Ø (noe) (talk) 14:00, 23 November 2007 (UTC)

Use of 'modulus signs' and set theory

Are the modulus signs in the "Definition" section intended to refer to the cardinality of the respective sets? It's not clear from the current content of the page. I think the set theory background to probability is a little tricky, so perhaps more explanation could go into this section?

I absolutely agree.--Niels Ø (noe) 14:13, 29 January 2007 (UTC)

I may be wrong, but it seems to me that the definition P(A\mid B) = \frac{\mid A\cap B\mid}{\mid B\mid} is not just unfortunate, but simply incorrect. Consider for example the probability space (Ω,F,P) with Ω = {a,b,c,d}, the set of events F = 2^Ω, and probabilities P({a}) = 0.4, P({b}) = 0.3, P({c}) = 0.2 and P({d}) = 0.1. Let A = {a} and B = {a,b}. Then P(A\mid B) = \frac{P(A\cap B)}{P(B)} = \frac{P(A)}{P(B)} = \frac{4}{7}. However, \frac{\mid A\cap B \mid}{\mid B \mid} = \frac{\mid A \mid}{\mid B \mid} = \frac{1}{2}. --zeycus 4:46, 24 February 2007 (UTC)
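The two formulas can be compared directly (a Python sketch of the example just given):

  P = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
  A = {'a'}
  B = {'a', 'b'}

  def prob(E):
      return sum(P[x] for x in E)

  print(prob(A & B) / prob(B))   # 4/7 = 0.571..., the standard definition
  print(len(A & B) / len(B))     # 1/2, the counting formula; wrong here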

The text talks about elements randomly chosen from a set. The author's intent is clearly that this implies symmetry.--Niels Ø (noe) 08:29, 24 February 2007 (UTC)
Yes, you are absolutely right. But then, why define conditional probability only in that particular case, when it makes sense and is usually defined for any probability space with the same formula P(A\mid B) = \frac{P(A\cap B)}{P(B)}? --zeycus 8:43, 24 February 2007 (UTC)
I agree this is a weak section in the article; one should not have to guess about the author's intentions. Anyway, I think the idea is to generalize from the fairly obvious situation with symmetry to the general formulae. Of course, that kind of reasoning does not really belong under the heading "Definition". Go ahead; try your hand at it!--Niels Ø (noe) 10:07, 24 February 2007 (UTC)
I've restored the proper definition. --Zundark 11:06, 24 February 2007 (UTC)

Valid for continuous distributions?

Two events A and B are mutually exclusive if and only if P(A∩B) = 0...

Let X be a continuous random variable, e.g. normally distributed with mean 0 and standard deviation 1. Let A be the event that X >= 0, and B the event that X <= 0. Then, A∩B is the event X=0, which has probability 0, but which is not impossible. I don't think A and B should be called exclusive in this case. So, either the context of the statement from the article I quote above should be made clear (For discrete distributions,...), or the statement itself should be modified.

Would it in all cases be correct to say that A and B are exclusive if and only if A∩B = Ø ? Suppose U={0,1,2,3,4,5,6}, P(X=0)=0 and P(X=x)=1/6 for x=1,2,3,4,5,6 (i.e. a silly but not incorrect model of a die). Are A={X even}={0,2,4,6} and B={X<2}={0,1} mutually exclusive or not?--Niels Ø (noe) 14:13, 29 January 2007 (UTC)
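The die model above can be checked mechanically; the two candidate definitions disagree on it (a Python sketch):

  P = {0: 0, 1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}
  A = {0, 2, 4, 6}   # "X even"
  B = {0, 1}         # "X < 2"

  def prob(E):
      return sum(P[x] for x in E)

  print(A & B)         # {0}: non-empty, so not exclusive under the set definition
  print(prob(A & B))   # 0.0: exclusive under the "P of the intersection is 0" definition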

I wonder if that definition is correct. In the article mutually exclusive, n events are defined as exclusive if the occurrence of any one of them automatically implies the non-occurrence of the remaining n − 1 events. Very similarly, in mathworld:
n events are said to be mutually exclusive if the occurrence of any one of them precludes any of the others.
As Niels said, that is in fact stronger than saying P(A\cap B)=0. Somehow, I think the definition now in the article should be labeled "almost mutually exclusive". Shouldn't we just say that A and B are mutually exclusive if A\cap B=\emptyset, and avoid all this fuss?--User:Zeycus 10:03, 20 March 2007 (UTC)
No answer in three weeks. In a few days, if nobody has anything to say, I will change the definition in the article.--User:Zeycus 14:03, 9 April 2007 (UTC)
Done.--User:Zeycus 8:30, 13 April 2007 (UTC)

PLAIN ENGLISH

Thank you for your fine work. However, it would be useful to more people if you would provide a definition of your math notation such as http://upload.wikimedia.org/math/6/d/e/6de3a4670340b7be5303b63574cb3113.png

An example?

Here's an example involving conditional probabilities that makes sense to me, and usually also to the students to whom I teach this stuff. It clearly shows the difference between P(A|B) and P(B|A).

As the example is currently in the article, there's no need to repeat it here.--Niels Ø (noe) (talk) 11:28, 16 December 2007 (UTC)

So that's my example. Do you like it? Should I include it in the article? Can you perhaps help me improve on it first?--Niels Ø (noe) 11:48, 13 February 2007 (UTC)

No replies for 10 days. I don't know how to interpret that, but I'll now be bold and add my example to the article.--Niels Ø (noe) 09:25, 23 February 2007 (UTC)
Suggestions for improvement:
* It's a bit confusing that the odds of having the disease and the odds of a false positive are BOTH 1%. It would be better to have one be different, say 10%.
* I think some people (including myself) see things better graphically. You can represent the same problem (using your original numbers) as a 1.0 x 1.0 square, with one edge divided up into 0.99 and 0.01, and the other edge divided up into 0.99 and 0.01. Now you have one large rectangle (0.99 x 0.99) which represents those that test negative and are negative, and a tiny rectangle (0.01 x 0.01) that represents those that test negative but are positive. The remaining two tall and skinny rectangles (0.99 x 0.01 each) represent those who test positive. One of those skinny rectangles represents positive and testing positive, the other represents negative and testing positive. Those are the same size, which is where the 50% figure comes from. I think exploding the rectangles, exaggerating the size of the 0.01 portions, and clearly labelling them would help too.
Clemwang 04:05, 22 March 2007 (UTC)

So, my example has been in the article for about half a year now. My text has some imperfections, e.g. the way equations are mixed into sentences, which is grammatically incorrect. I hoped someone with a better command of English than myself might correct that, but nothing has happened. I wonder, did anyone actually read this example?

Replies to Clemwang: I don't think having all three probabilities equal 1% is really a problem. Of course, without making the example less realistic, one might let them be 1%, 2% and 3%, say. I like the type of diagram you suggest; in my experience, such diagrams are good for understanding this type of situation, but (surprisingly to me) tree diagrams are more helpful for solving problems (i.e. fewer students mess things up that way). In this particular example, the graphical problem of comparing 1% to 99% is severe; the best solution would actually be if someone could come up with a meaningful example to replace mine, where the three probabilities are 10%, 20% and 30%, say.--Niels Ø (noe) 11:23, 17 September 2007 (UTC)

Is it just me, or is there a typo in the example in the article where it gives the final result of false positives as .5%? Shouldn't it be 50%, as it says on this page? 146.244.153.149 22:22, 29 September 2007 (UTC)

I'm not sure what you mean. It says at one place: "0.99% / 1.98% = 50%". Reading "%" as "times 0.01", it just says 0.0099 / 0.0198 = 0.50, which is correct.--Niels Ø (noe) 06:53, 2 October 2007 (UTC)
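Spelling that arithmetic out (a Python sketch; it assumes, per the discussion above, that prevalence, false-positive rate and false-negative rate are all 1%):

  prevalence     = 0.01
  false_positive = 0.01   # P(test positive | healthy)
  false_negative = 0.01   # P(test negative | diseased)

  p_pos_and_sick    = prevalence * (1 - false_negative)    # 0.0099, i.e. 0.99%
  p_pos_and_healthy = (1 - prevalence) * false_positive    # 0.0099, i.e. 0.99%
  p_pos             = p_pos_and_sick + p_pos_and_healthy   # 0.0198, i.e. 1.98%

  print(p_pos_and_sick / p_pos)   # 0.5: P(disease | positive test) = 50%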

Improving the Independence Section

Someone ought to expand the discussion of independence to n terms. For instance, three events E, F, G are independent iff:

P(EFG)=P(E)P(F)P(G),

P(EF)=P(E)P(F)

P(EG)=P(E)P(G)

P(GF)=P(G)P(F)

And so on. Verbally, every combination of k of the n events (k = 2, 3, ..., n, as in n choose k) must be independent for ALL of them to be independent of each other. Most textbooks I've seen include independence definitions for more than two events.

—The preceding unsigned comment was added by 171.66.41.25 (talk • contribs).
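The subset condition described in the comment above is mechanical to check. A Python sketch with three genuinely independent coin-flip events (illustrative numbers, not from the article):

  from itertools import combinations
  from fractions import Fraction

  omega = set(range(8))   # three fair coin flips encoded as the bits of 0..7
  E1 = {w for w in omega if w & 1}   # first coin is heads
  E2 = {w for w in omega if w & 2}   # second coin is heads
  E3 = {w for w in omega if w & 4}   # third coin is heads

  def P(E):
      return Fraction(len(E), len(omega))

  def mutually_independent(events):
      # Check the product rule for every subset of size 2, ..., n.
      for k in range(2, len(events) + 1):
          for combo in combinations(events, k):
              prod = Fraction(1)
              for E in combo:
                  prod *= P(E)
              if P(set.intersection(*combo)) != prod:
                  return False
      return True

  print(mutually_independent([E1, E2, E3]))   # True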

This article is only concerned with independence as far as it relates to conditional probability. The general case is covered in the article on statistical independence. --Zundark 21:27, 20 July 2007 (UTC)

WikiProject class rating

This article was automatically assessed because at least one WikiProject had rated the article as start, and the rating on other projects was brought up to start class. BetacommandBot 03:52, 10 November 2007 (UTC)

P(A | B,C) etc

The article is all about the conditional probability of A given B. What about the probability of A given B AND C? (Plus extensions to more variables.)

Maybe the answer is blindingly obvious to people who know about the subject, but to an ignoramus with only three numerate degrees to his name, who nevertheless finds probability theory the most counter-intuitive maths he's ever studied, it's as clear as mud.

--84.9.73.211 (talk) 18:17, 1 February 2008 (UTC)

A, B and C would be "events". An event is any subset of the "sample space" U, i.e. the set of all possible outcomes. What you call P(A|B,C) or P(A|B and C) would be P(A|B\cap C), i.e. the probability of A given that the event B\cap C has happened. Here, B\cap C is the intersection of B and C, i.e. the event that happens if both B and C happen at the same time. Confused? Try reading this again, with the following die-rolling events in mind:
U={1,2,3,4,5,6}
A={X even}={2,4,6}
B={X>3}={4,5,6}
C={X<6}={1,2,3,4,5}
B\cap C={4,5}
A\cap B\cap C={4}
Then, P(A|B\cap C)=\frac{P(A\cap B\cap C)}{P(B\cap C)}=\frac{1/6}{2/6}=\frac12.
As this is equal to P(A), in this case A happens to be independent of B\cap C.
Did that help?--Noe (talk) 20:01, 1 February 2008 (UTC)
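The numbers in that die example are easy to verify mechanically (a Python sketch):

  from fractions import Fraction

  U = set(range(1, 7))
  A = {2, 4, 6}           # X even
  B = {4, 5, 6}           # X > 3
  C = {1, 2, 3, 4, 5}     # X < 6

  def P(E):
      return Fraction(len(E), len(U))

  print(P(A & B & C) / P(B & C))   # (1/6) / (2/6) = 1/2
  print(P(A))                      # also 1/2, so A is independent of the event B-and-C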
Well, partially, but I'd hoped for a formula in terms of conditional and marginal probabilities. Moreover, I think bringing time into it confuses the issue. For example, suppose A = "It rains today", B = "It rained yesterday", C = "It rained the day before yesterday". Clearly the events described in B and C can't happen "at the same time" in any sense, although the propositions B and C can both be true. P(A|B,C) then means "the probability that it rains today knowing that it rained on the previous two days". OK, suppose I know P(A), the probability of rain on any single day; P(A) = P(B) = P(C) because the day labels are arbitrary. Suppose I also know P(A|B), the probability of rain on one day given rain the previous day; P(A|B) = P(B|C) by the same argument. How do I work out P(A|B,C) in terms of these probabilities (and possibly others)?
You need to supply something like the probability of it raining three days in a row, P(A,B,C|I); or alternatively, the probability of it raining both the day after and the day before a rainy day, P(A,C|B).
Does C give you any more information about A than you already have through B ? Maybe it does, maybe it doesn't. It depends, given the data, or the physical intuition, that you're assessing your probabilities from. Jheald (talk) 12:35, 11 March 2008 (UTC)
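To illustrate the point: once a full joint distribution over the three propositions is supplied, P(A|B,C) is just a ratio of sums. A Python sketch with made-up rain probabilities (the numbers are hypothetical, chosen only so they sum to 1):

  # Joint distribution over (A, B, C); 1 = rain, 0 = no rain. Hypothetical values.
  joint = {
      (1, 1, 1): 0.10, (1, 1, 0): 0.06, (1, 0, 1): 0.04, (1, 0, 0): 0.10,
      (0, 1, 1): 0.05, (0, 1, 0): 0.09, (0, 0, 1): 0.06, (0, 0, 0): 0.50,
  }

  def P(pred):
      return sum(p for outcome, p in joint.items() if pred(outcome))

  p_abc = P(lambda o: o == (1, 1, 1))             # P(A, B, C) = 0.10
  p_bc  = P(lambda o: o[1] == 1 and o[2] == 1)    # P(B, C)    = 0.15
  print(p_abc / p_bc)   # P(A|B,C) = 2/3; not recoverable from P(A) and P(A|B) alone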
BTW, the \cap notation used here (intersection) is applicable to sets. When dealing with logical propositions such as the ones above, it's more appropriate to use the notation of conjunction: \wedge. However, most publications seem to use the comma notation instead. Also Pr(.) is often used nowadays for a single probability value, to distinguish it from p(.) or P(.) for a probability density. So Pr(A|B,C) would be my preference.
--84.9.94.255 (talk) 00:02, 11 March 2008 (UTC) (formerly 84.9.73.211)

Merge with Marginal distribution

The Marginal distribution article does not present enough information to stand on its own. It should be merged into this article.

Neelix (talk) 14:52, 13 April 2008 (UTC)

I disagree. Marginal distribution should be expanded. Michael Hardy (talk) 15:47, 13 April 2008 (UTC)
Oppose, per Michael Hardy. Marginal distribution is a sufficiently important and distinctive idea that it deserves its own article. Plus it's a rather different thing from a conditional distribution. Jheald (talk) 16:06, 13 April 2008 (UTC)

Maybe I should expand on this a bit. "Marginal probability" is a rather odd concept. The "marginal probability" of an event is merely the probability of the event; the word "marginal" merely emphasizes that it's not conditional, and is used in contexts in which it is important to emphasize that. So the occasions when it's important to emphasize that are very context-dependent. For those reasons I can feel a certain amount of sympathy for such a "merge" proposal. But on the other hand, just look at the way the concept frequently gets used, and that convinces me that it deserves its own article. Wikipedia is quite extensive in coverage, and it's appropriate that articles are not as clumped together as if coverage were not so broad. Michael Hardy (talk) 18:05, 13 April 2008 (UTC)

That does make sense. I will remove the merge suggestion. The marginal distribution article, however, still requires expansion. I will place a proper notice on that article. Neelix (talk) 18:10, 13 April 2008 (UTC)

First impression

I feel that the whole page needs rewriting. The following statement, for example, cannot be a definition because it contains many implications and does not make sense when taken alone:

Marginal probability is the probability of one event, regardless of the other event. Marginal probability is obtained by summing (or integrating, more generally) the joint probability over the unrequired event. This is called marginalization. The marginal probability of A is written P(A), and the marginal probability of B is written P(B). —Preceding unsigned comment added by 207.172.220.58 (talk) 15:42, 14 May 2008 (UTC)

That paragraph was not optimally clear; I've tried to rewrite it, but there ought to be a section on Marginal probability in the main body of the article. An example with a table of joint probabilities in which the margins are the marginal probabilities might help to clarify this; something like:
     B1   B2   B3  | TOT
A1   23%  17%  31% |  71% 
A2   16%   4%   9% |  29%
-------------------+-----
TOT  39%  21%  40% | 100%
but preferably with something concrete, meaningful, and realistic, instead of the abstract A_i and B_j. The totals, given in the margins, are marginal probabilities.  --Lambiam 09:18, 19 May 2008 (UTC)
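Computing the margins of a table like the one above is just summing rows and columns (a short Python sketch):

  joint = [
      [0.23, 0.17, 0.31],   # row A1
      [0.16, 0.04, 0.09],   # row A2
  ]

  row_totals = [sum(row) for row in joint]           # [0.71, 0.29]: P(A1), P(A2)
  col_totals = [sum(col) for col in zip(*joint)]     # [0.39, 0.21, 0.40]: P(B1), P(B2), P(B3)
  print(row_totals, col_totals)
  print(sum(row_totals))                             # 1.0 (up to rounding)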