I. Elements of Probabilistic Models
1. Every probabilistic model involves an underlying process, called the experiment. (Example: flip two coins.)
2. The experiment produces exactly one out of several possible outcomes. (Example: four outcomes, {HH, HT, TH, TT}.)
3. The set of all possible outcomes is the sample space. (Example: Ω = {HH, HT, TH, TT}.)
4. An event is a subset of the sample space. (Example: A = {HH, TT}, the event that the two coins show the same side.)
5. The probability law assigns to each event A a number P(A) ≥ 0 that encodes our knowledge or belief about how likely A is.
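The two-coin example can be written out as a small Python sketch. It assumes both coins are fair, so that the four outcomes are equally likely; the names omega, event_a, and prob are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space for flipping two coins: every ordered pair of H/T.
omega = {"".join(flips) for flips in product("HT", repeat=2)}

# Event A: both coins show the same side.
event_a = {outcome for outcome in omega if outcome[0] == outcome[1]}

def prob(event, sample_space):
    """Uniform law on a finite sample space (assumption: fair coins)."""
    return Fraction(len(event & sample_space), len(sample_space))

print(sorted(omega))         # ['HH', 'HT', 'TH', 'TT']
print(sorted(event_a))       # ['HH', 'TT']
print(prob(event_a, omega))  # 1/2
```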
II. Probability Axioms
1. (Non-negativity) P(A) ≥ 0, for every event A.
2. (Additivity) For any two disjoint events A and B, P(A ∪ B) = P(A) + P(B). In general, if A_1, A_2, … are disjoint events, then P(A_1 ∪ A_2 ∪ ⋯) = P(A_1) + P(A_2) + ⋯.
3. (Normalization) P(Ω) = 1.
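As a sanity check, the three axioms can be verified by brute force on a small finite model. The sketch below uses hypothetical outcome probabilities for two flips of a biased coin with P(H) = 2/3; those numbers are an assumption made for illustration.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical outcome probabilities: two flips of a coin with P(H) = 2/3.
p_outcome = {
    "HH": Fraction(4, 9),
    "HT": Fraction(2, 9),
    "TH": Fraction(2, 9),
    "TT": Fraction(1, 9),
}
omega = frozenset(p_outcome)

def prob(event):
    """Probability of an event: the sum of its outcomes' probabilities."""
    return sum((p_outcome[o] for o in event), Fraction(0))

def all_events(sample_space):
    """Every subset of the sample space (the power set)."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = all_events(omega)

nonnegative = all(prob(a) >= 0 for a in events)
additive = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if not (a & b)  # only disjoint pairs
)
normalized = prob(omega) == 1

print(nonnegative, additive, normalized)  # True True True
```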
III. Discrete and Continuous Models
In discrete models, for any event A = {a_1, …, a_n}, it holds that P(A) = P(a_1) + ⋯ + P(a_n). When the sample space is finite and the probability law is uniform, P(A) = |A| / |Ω|.
However, the sample space can also be infinite and continuous. For continuous sample spaces, the probabilities of the single-element events may not be sufficient to characterize the probability law.
A natural candidate: for the continuous model Ω = [0, 1], define the probability of any subinterval [a, b] ⊆ [0, 1] to be P([a, b]) = b − a (i.e., probability = “the length of the interval”). Under this law every single-element event {x} has probability x − x = 0, so the single-element probabilities alone cannot determine the law.
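A minimal sketch of the length law on [0, 1], together with a Monte Carlo estimate for comparison. The function names, the sample count, and the seed are illustrative choices, not part of the notes.

```python
import random

def interval_prob(a, b):
    """Length law on [0, 1]: P([a, b]) = b - a for 0 <= a <= b <= 1."""
    if not (0 <= a <= b <= 1):
        raise ValueError("need 0 <= a <= b <= 1")
    return b - a

def estimate_interval_prob(a, b, n_samples=100_000, seed=0):
    """Monte Carlo estimate: fraction of uniform draws that land in [a, b]."""
    rng = random.Random(seed)
    hits = sum(a <= rng.random() <= b for _ in range(n_samples))
    return hits / n_samples

print(interval_prob(0.25, 0.75))           # 0.5
print(interval_prob(0.3, 0.3))             # 0.0, a single point has probability zero
print(estimate_interval_prob(0.25, 0.75))  # should be close to 0.5
```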
IV. Properties of Probability Laws
Consider a probability law, and let A, B, and C be events.
1. If A ⊆ B, then P(A) ≤ P(B).
2. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
3. P(A ∪ B) ≤ P(A) + P(B).
4. P(A ∪ B ∪ C) = P(A) + P(A' ∩ B) + P(A' ∩ B' ∩ C).
Note: A' denotes the complement of A.
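The four properties can be checked numerically on a concrete case. The sketch below assumes a hypothetical experiment, one roll of a fair six-sided die, with events a, b, and c chosen only for illustration; property 1 is checked on the pair A ∩ B ⊆ B.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))

def prob(event):
    """Uniform law: |event| / |omega|."""
    return Fraction(len(event), len(omega))

def complement(event):
    return omega - event

a = frozenset({2, 4, 6})  # even roll
b = frozenset({4, 5, 6})  # roll of at least 4
c = frozenset({1, 2})     # roll of at most 2

# Property 1: (A ∩ B) is a subset of B, so its probability cannot exceed P(B).
monotone = prob(a & b) <= prob(b)
# Property 2: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
incl_excl = prob(a | b) == prob(a) + prob(b) - prob(a & b)
# Property 3: P(A ∪ B) <= P(A) + P(B).
union_bound = prob(a | b) <= prob(a) + prob(b)
# Property 4: P(A ∪ B ∪ C) = P(A) + P(A' ∩ B) + P(A' ∩ B' ∩ C).
decomposition = prob(a | b | c) == (
    prob(a)
    + prob(complement(a) & b)
    + prob(complement(a) & complement(b) & c)
)

print(monotone, incl_excl, union_bound, decomposition)  # True True True True
print(prob(a | b | c))                                  # 5/6
```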
V. Conditional Probability
Definition: The conditional probability of A given B is P(A | B) = P(A ∩ B) / P(B), where we assume that P(B) > 0.
Note: If P(B) = 0, then P(A | B) is undefined.
If the possible outcomes are finitely many and equally likely, then P(A | B) = |A ∩ B| / |B|.
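A short sketch of the definition on the two-coin sample space, assuming fair coins so that all outcomes are equally likely. The helper cond_prob is hypothetical; it returns None when P(B) = 0 to signal that the conditional probability is undefined.

```python
from fractions import Fraction

# Two fair coin flips: four equally likely outcomes (assumption).
omega = {"HH", "HT", "TH", "TT"}

def prob(event):
    """Uniform law: |event| / |omega|."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); None when P(B) = 0 (undefined)."""
    if prob(b) == 0:
        return None
    return prob(a & b) / prob(b)

both_heads = {"HH"}
at_least_one_head = {"HH", "HT", "TH"}

print(cond_prob(both_heads, at_least_one_head))  # 1/3
print(cond_prob(both_heads, set()))              # None

# With equally likely outcomes this matches the counting formula |A ∩ B| / |B|.
counting = Fraction(len(both_heads & at_least_one_head), len(at_least_one_head))
print(counting)                                  # 1/3
```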
VI. Total Probability Theorem
Theorem: Let A_1, …, A_n be disjoint events that form a partition of the sample space, and assume P(A_i) > 0 for all i. Then, for any event B, we have
P(B) = P(A_1 ∩ B) + ⋯ + P(A_n ∩ B) = P(A_1)P(B | A_1) + ⋯ + P(A_n)P(B | A_n).
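A worked sketch of the formula with made-up numbers: suppose three machines produce 1/2, 3/10, and 1/5 of all items (the events A_1, A_2, A_3), with defect rates 1/100, 2/100, and 3/100 (the values P(B | A_i)). The scenario and the function name total_probability are assumptions for illustration.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(B) = sum over i of P(A_i) * P(B | A_i), for a partition A_1..A_n."""
    if sum(priors) != 1:
        raise ValueError("priors must sum to 1 (the A_i must partition the sample space)")
    return sum(p * l for p, l in zip(priors, likelihoods))

# Hypothetical numbers: share of production per machine, and defect rate per machine.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

print(total_probability(priors, likelihoods))  # 17/1000
```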
VII. Bayes' Rule
Let A_1, A_2, …, A_n be disjoint events that form a partition of the sample space, and assume that P(A_i) > 0 for all i. Then, for any event B with P(B) > 0,
P(A_i | B) = P(A_i ∩ B) / P(B) = P(A_i)P(B | A_i) / P(B) = P(A_i)P(B | A_i) / (P(A_1)P(B | A_1) + ⋯ + P(A_n)P(B | A_n)).
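Bayes' rule can be sketched as a small function that takes the priors P(A_i) and the likelihoods P(B | A_i) and returns the posteriors P(A_i | B). The numbers are hypothetical: three machines with production shares 1/2, 3/10, 1/5 and defect rates 1/100, 2/100, 3/100, where B is the event that an item is defective.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(A_i | B)] from priors P(A_i) and likelihoods P(B | A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i ∩ B)
    evidence = sum(joint)                                 # P(B), by total probability
    if evidence == 0:
        raise ValueError("P(B) = 0, so the posterior is undefined")
    return [j / evidence for j in joint]

# Hypothetical numbers, chosen only for illustration.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

posterior = bayes_posterior(priors, likelihoods)
print([str(p) for p in posterior])  # ['5/17', '6/17', '6/17']
print(sum(posterior))               # 1
```

In this made-up case the first machine makes half of all items but accounts for only 5/17 of the defective ones, because its defect rate is the lowest.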