Talk:Conditional probability

From Wikipedia, the free encyclopedia

I've struck out the sentence about decision trees. There is certainly no sense in which conditional probability calculations are generally easier with decision trees. Decision trees can indeed be interpreted as conditional probability models (or not), but in any event, they are a very, very small part of the world of conditional probability, and making an unwarranted assertion about a minor topic is out of place. Wile E. Heresiarch 17:13, 1 Feb 2004 (UTC)

Contents

[edit] Wrong?

Conditional probability is the probability of some event A, given that some other event, B, has already occurred

...

In these definitions, note that there need not be a causal or temporal relation between A and B. A may precede B, or vice versa, or they may happen at the same time.

This statement is totally confusing - if event B has already occurred, there has to be a temporal relation between A and B (i.e. B happens before A). --Abdull 12:50, 25 February 2006 (UTC)

I've reworded it. --Zundark 14:32, 25 February 2006 (UTC)
Great, thank you! --Abdull 11:24, 26 February 2006 (UTC)

Since the subject of the article is completely formal, I dislike the references to time, such as the expression "temporal relation" or one event "preceding" another; I find them informal in this context. In the framework of a probability space, where we are working, time is not formally introduced: at what "time" does an event A take place? In fact, when we specifically want to represent or model how our knowledge of the world (represented by random variables) grows as time passes, we can do it by means of filtrations. And I feel the same goes for the "causal relation": such a notion is not defined formally in the article.--zeycus 15:22, 23 February 2007 (UTC)

The purpose of this paragraph is to dispel the common misconception that conditional probability has something to do with temporal relationships or causality. The paragraph is necessarily informal, as a probability space does not even have such concepts. (By the way, contrary to your suggestion on my Talk page, this paragraph was added by Wile E. Heresiarch on 10 February 2004. The rewording I mentioned above did not touch this paragraph, it simply removed incorrect suggestions of temporal relationships elsewhere in the article. All this can be seen from the edit history.) --Zundark 08:36, 24 February 2007 (UTC)
I apologize for attributing the paragraph to you. I understand what you mean, but I think it is important to separate formal notions from informal ones, so I will add a short comment afterwards. --zeycus 9:42, 24 February 2007 (UTC)

[edit] Undefined or Indeterminate?

In the Other Considerations section, the statement "If P(B) = 0, then P(A|B) is left undefined" seems incorrect. Is it not more correct to say that P(A|B) is indeterminate?

...

If P(B) = 0, then P(A∩B) = 0 regardless of P(A) or P(A|B).

Bob Badour 04:36, 11 June 2006 (UTC)

It's undefined. If you think it's not undefined, then what do you think its definition is? --Zundark 08:54, 11 June 2006 (UTC)
Indeterminate, as I said; its definition one would paraphrase as incalculable or unknown. However, an indeterminate form can be undefined, and the consensus in the literature is to call the conditional probability undefined in the above-mentioned case. There are probably reasons for treating it as undefined that I am unaware of, and changing the text in the article would be OR. Thank you for your comments, and I apologize for taking your time. -- Bob Badour 00:07, 12 June 2006 (UTC)

Something about this is bothering me. Suppose X is standard normal. I am considering the events A ≡ (X = 0) and B ≡ (X ∈ {0, 5}), for example. Clearly P(A) = P(B) = 0. However, I feel that P(A|B) should be defined, and in fact equal to f(0)/(f(0) + f(5)), where f is the density function of X. In order to justify this informally, I would define A_ε ≡ (X ∈ (−ε, ε)) and B_ε ≡ (X ∈ (−ε, ε) ∪ (5−ε, 5+ε)) for any ε > 0. Then, if I am not wrong, lim_{ε→0+} P(A_ε|B_ε) = f(0)/(f(0) + f(5)) ≈ 0.999996273.

Suppose someone tells me that a number has been obtained from a standard normal variable, that it is 0 or 5, and that I have a chance to make a double-or-nothing bet on which of them it was. Shouldn't I bet on the 0? And how can I argue for that, if not with the calculations above? Opinions are most welcome. What do you think? -- zeycus 18:36, 22 February 2007 (UTC)
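The limit argument above is easy to check numerically. A minimal sketch in Python (the function names are mine, not from the discussion): it compares the claimed limit f(0)/(f(0) + f(5)) with P(A_ε|B_ε) for a small ε, using only the standard library.

```python
from math import exp, pi, sqrt, erf

def phi(x):
    """Standard normal density f(x)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def cond_prob(eps):
    """P(X in (-eps, eps) | X in (-eps, eps) u (5-eps, 5+eps))."""
    p_A = Phi(eps) - Phi(-eps)
    p_B = p_A + (Phi(5 + eps) - Phi(5 - eps))
    return p_A / p_B

# Claimed limit: f(0) / (f(0) + f(5))
limit = phi(0) / (phi(0) + phi(5))
print(limit)            # approximately 0.999996273
print(cond_prob(1e-3))  # already very close to the limit
```

For ε = 0.001 the two values agree to many decimal places, which supports the informal limiting argument.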

I think you are absolutely right. However, the theory needed to obtain this is a lot more complicated than the theory needed to understand conditional probability as such. Should the article state clearly from the start that we are dealing with discrete distributions only, and then perhaps have a last section dealing with the generalization to continuous distributions?--Niels Ø (noe) 19:33, 22 February 2007 (UTC)
It's already valid for continuous distributions, at least now that I've cleaned up the definition section. It's not usual to define P(A|B) when P(B) = 0, but if someone can find a decent reference for this then it might be worth adding to the article. --Zundark 11:25, 24 February 2007 (UTC)
I was not able to find any source defining P(A\mid B) when P(B) = 0. I posted in a math forum, and after an interesting discussion someone gave a nice argument (a bit long to be copied here) justifying why it does not make sense. I consider the point clarified. --zeycus 15:30, 28 February 2007 (UTC)

[edit] Use of 'modulus signs' and set theory

Are the modulus signs in the "Definition" section intended to refer to the cardinality of the respective sets? It's not clear from the current content of the page. I think the set theory background to probability is a little tricky, so perhaps more explanation could go into this section?

I absolutely agree.--Niels Ø (noe) 14:13, 29 January 2007 (UTC)

I may be wrong, but it seems to me that the definition P(A|B) = |A∩B| / |B| is not just unfortunate, but simply incorrect. Consider for example the probability space (Ω, F, P) with Ω = {a,b,c,d}, the set of events F = 2^Ω, and probabilities P({a}) = 0.4, P({b}) = 0.3, P({c}) = 0.2 and P({d}) = 0.1. Let A = {a} and B = {a,b}. Then P(A|B) = P(A∩B)/P(B) = P(A)/P(B) = 4/7. However, |A∩B| / |B| = |A| / |B| = 1/2. --zeycus 4:46, 24 February 2007 (UTC)
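The counterexample can be verified mechanically. A small sketch in Python (exact arithmetic via the standard library's fractions; the helper name `prob` is mine): it computes both the measure-based conditional probability and the cardinality ratio for the space above.

```python
from fractions import Fraction

# Probability space from the example: Omega = {a,b,c,d}
P = {'a': Fraction(4, 10), 'b': Fraction(3, 10),
     'c': Fraction(2, 10), 'd': Fraction(1, 10)}

def prob(event):
    """P(event) = sum of the probabilities of its outcomes."""
    return sum(P[w] for w in event)

A = {'a'}
B = {'a', 'b'}

cond = prob(A & B) / prob(B)             # P(A|B) = P(A∩B)/P(B)
counting = Fraction(len(A & B), len(B))  # |A∩B| / |B|

print(cond)      # 4/7
print(counting)  # 1/2
```

The two values disagree, confirming that the counting formula only works when all outcomes are equally likely.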

The text talks about elements randomly chosen from a set. The author's intent clearly is that this implies symmetry.--Niels Ø (noe) 08:29, 24 February 2007 (UTC)
Yes, you are absolutely right. But then, why define conditional probability only in that particular case, when it makes sense and is usually defined for any probability space with the same formula P(A|B) = P(A∩B)/P(B)? --zeycus 8:43, 24 February 2007 (UTC)
I agree this is a weak section in the article; one should not have to guess about the author's intentions. Anyway, I think the idea is to generalize from the fairly obvious situation with symmetry to the general formula. Of course, that kind of reasoning does not really belong under the heading "Definition". Go ahead; try your hand at it!--Niels Ø (noe) 10:07, 24 February 2007 (UTC)
I've restored the proper definition. --Zundark 11:06, 24 February 2007 (UTC)

[edit] Valid for continuous distributions?

Two events A and B are mutually exclusive if and only if P(A∩B) = 0...

Let X be a continuous random variable, e.g. normally distributed with mean 0 and standard deviation 1. Let A be the event that X >= 0, and B the event that X <= 0. Then, A∩B is the event X=0, which has probability 0, but which is not impossible. I don't think A and B should be called exclusive in this case. So, either the context of the statement from the article I quote above should be made clear (For discrete distributions,...), or the statement itself should be modified.

Would it in all cases be correct to say that A and B are exclusive if and only if A∩B = Ø ? Suppose U={0,1,2,3,4,5,6}, P(X=0)=0 and P(X=x)=1/6 for x=1,2,3,4,5,6 (i.e. a silly but not incorrect model of a die). Are A={X even}={0,2,4,6} and B={X<2}={0,1} mutually exclusive or not?--Niels Ø (noe) 14:13, 29 January 2007 (UTC)
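The "silly die" model above can be checked directly; a small sketch in Python (my own, using exact fractions) shows that A∩B is non-empty yet has probability 0, which is exactly the case the two candidate definitions disagree on:

```python
from fractions import Fraction

# Die model: outcome 0 has probability 0, outcomes 1..6 have 1/6 each
P = {x: (Fraction(0) if x == 0 else Fraction(1, 6)) for x in range(7)}

def prob(event):
    """P(event) = sum of the probabilities of its outcomes."""
    return sum(P[x] for x in event)

A = {0, 2, 4, 6}   # X even
B = {0, 1}         # X < 2

print(A & B)        # {0}: the intersection is non-empty...
print(prob(A & B))  # ...but it has probability 0
```

So under the definition P(A∩B) = 0 the events are "mutually exclusive", while under A∩B = Ø they are not.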

I wonder if that definition is correct. In the article mutually exclusive, n events are defined as exclusive if the occurrence of any one of them automatically implies the non-occurrence of the remaining n − 1 events. Very similarly, in mathworld:
n events are said to be mutually exclusive if the occurrence of any one of them precludes any of the others.
As Niels said, that is in fact stronger than saying P(A∩B) = 0. Somehow, I think the definition now in the article should be labeled "almost mutually exclusive". Shouldn't we just say that A and B are mutually exclusive if and only if A∩B = Ø, and avoid all this fuss?--User:Zeycus 10:03, 20 March 2007 (UTC)

[edit] An example?

Here's an example involving conditional probabilities that makes sense to me, and usually also to the students to whom I teach this stuff. It clearly shows the difference between P(A|B) and P(B|A).

In order to identify individuals having a serious disease in an early curable form, one may consider screening a large group of people. While the benefits are obvious, an argument against such screenings is the disturbance caused by false positive screening results: If a person not having the disease is incorrectly found to have it by the initial test, they will most likely be quite distressed until a more careful test hopefully shows that they do not have the disease. Even after being told they are well, their lives may be affected negatively.

The magnitude of this problem is best understood in terms of conditional probabilities.

Suppose 1% of the group suffer from the disease D. Choosing an individual at random, P(D)=1%=0.01 and P(W)=99%, where W=D' means the person is well. Suppose that when the screening test is applied to a person not having the disease, there is a 1% chance of getting a false positive result, i.e. P(P|W)=1%, and P(N|W)=99%, where P means positive result, and N=P' means negative result. Finally, suppose that when the test is applied to a person having the disease, there is a 1% chance of a false negative result, i.e. P(N|D)=1% and P(P|D)=99%.

Now, calculation shows that:

P(W∩N) = P(W) × P(N|W) = 99% × 99% = 98.01% is the fraction of the whole group being well and testing negative.
P(D∩P) = P(D) × P(P|D) = 1% × 99% = 0.99% is the fraction of the whole group being ill and testing positive.
P(W∩P) = P(W) × P(P|W) = 99% × 1% = 0.99% is the fraction of the whole group having false positive results.
P(D∩N) = P(D) × P(N|D) = 1% × 1% = 0.01% is the fraction of the whole group having false negative results.

Furthermore,

P(P) = P(W∩P) + P(D∩P) = 0.99% + 0.99% = 1.98% is the fraction of the whole group testing positive.
P(D|P) = P(D∩P) / P(P) = 0.99% / 1.98% = 50% is the probability that you actually have the disease if you tested positive.

In this example, it should be easy to relate to the difference between P(P|D)=99% and P(D|P)=50%: The first is the conditional probability that you test positive if you have the disease; the second is the conditional probability that you have the disease if you test positive. With the numbers chosen here, the last result is likely to be deemed unacceptable: Half the people testing positive are actually false positives.
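The arithmetic in the example can be reproduced with a few lines of code. A minimal sketch in Python (the variable names are mine, not part of the example), applying the law of total probability and then Bayes' theorem:

```python
# Screening example: prevalence 1%, 1% false positives, 1% false negatives
p_D = 0.01                # P(D): prevalence of the disease
p_W = 1 - p_D             # P(W): probability of being well
p_P_given_W = 0.01        # P(P|W): false positive rate
p_P_given_D = 0.99        # P(P|D): probability an ill person tests positive

# Law of total probability: P(P) = P(W)P(P|W) + P(D)P(P|D)
p_P = p_W * p_P_given_W + p_D * p_P_given_D

# Bayes' theorem: P(D|P) = P(D)P(P|D) / P(P)
p_D_given_P = p_D * p_P_given_D / p_P

print(round(p_P, 4))          # 0.0198
print(round(p_D_given_P, 2))  # 0.5
```

This makes the contrast explicit: P(P|D) = 0.99 goes in as an input, while P(D|P) = 0.5 comes out of the calculation.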

So that's my example. Do you like it? Should I include it in the article? Can you perhaps help me improve on it first?--Niels Ø (noe) 11:48, 13 February 2007 (UTC)

No replies for 10 days. I don't know how to interpret that, but I'll now be bold and add my example to the article.--Niels Ø (noe) 09:25, 23 February 2007 (UTC)
Suggestions for improvement:
* It's a bit confusing that the probability of having the disease and the probability of a false positive are BOTH 1%. It would be better to make one of them different, say 10%.
* I think some people (including myself) see things better graphically. You can represent the same problem (using your original numbers) as a 1.0 x 1.0 square, with one edge divided up into 0.99 and 0.01, and the other edge divided up into 0.99 and 0.01. Now you have one large rectangle (0.99 x 0.99) which represents those who are well and test negative, and a tiny rectangle (0.01 x 0.01) which represents those who have the disease but test negative. The remaining two tall and skinny rectangles (0.99 x 0.01 each) represent those who test positive: one represents having the disease and testing positive, the other being well and testing positive. Those are the same size, which shows that half the positive results are false positives. I think exploding the rectangles, exaggerating the size of the 0.01 portions, and clearly labelling them would help too.
Clemwang 04:05, 22 March 2007 (UTC)