# CS代考计算机代写 Bayesian Bayesian Learning

Bayesian Learning

Prof. Dan A. Simovici

UMB

1/28

Outline

1 Probability Spaces

2 Random Variables

3 Conditional Probabilities

4 ML and Conditional Probabilities

5 Bayes Theorem and Concept Learning

2/28

Probability Spaces

Suppose that (Ω,E,P) is a probability space, E is a family of subsets of Ω known as events, and P is a probability. The elements of Ω are elementary events.

In many cases, E consists of all subsets of Ω, and we will make this assumption unless a special statement says otherwise.

3/28

Probability Spaces

Example

Roling two dice is described by a finite probability space that consists of 36 elementary events: (1, 1), (1, 2), . . . , (6, 6).

6 5 4 3 2 1

123456

4/28

Probability Spaces

An event in the previous example is a subset of {1,…,6}×{1,…,6}, that is, a subset of the set of pairs {(u,v) | 1 u 6,1 v 6}.

Example

throws that have the same number of both dice:

S = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}

throws such that the sum of the numbers is greater than 8:

B = {(2,6),(3,5),(3,6),(4,4),(4,5),(4,6),(5,3),(5,4),(5,6),(6,6),(6

5/28

,

Probability Spaces

Note that Ω consists of 36 elementary events and there are 236 ≈ 1012 events in this very simple probability space

Probability of an event V in this context is the number P(V) given by

P(V) = |V|. |Ω|

Example

We have and

P(S)= 6 =1, 36 6

P(B) = 18 = 1 36 2

6/28

Random Variables

Informally, Borel sets of R are the sets that can be constructed from open or closed sets by repeatedly taking countable unions and intersections.

Definition

Let (Ω,E,P) be a probability space. A function X : Ω −→ R is a random variable if X−1(U) ∈ E for every Borel subset of R.

7/28

Random Variables

Definition

A simple random variable is defined by a table: x1 x2 ··· xn

X:pp···p, 12n

where x1,…,xn are the values assummed by X and pi = P(X = xi) for 1 i n. We always have p1 + · · · + pn = 1.

The expected value of X is

E[X]=x1p1 +x2p2 +···+xnpn.

8/28

Random Variables

Example

A random variable X whose distribution is: 0 1

where p + q = 1 is said to have a Bernoulli distribution with parameter p. Note that E[X] = p and var(X) = pq.

X:qp,

9/28

Random Variables

Example

Letp,q∈[0,1]betwonumberssuchthatp+q=1. Considerthe random variable defined by

01···k···n X: qn nqn−1p ··· nqn−kpk ··· pn ,

1k

We refer to a random variable with this distribution as a binomial random

variable. Note that

n nn−1 nn−kk n n

q + 1 q p+···+ k q p +···+p =(q+p) =1.

10/28

Random Variables

Example cont’d

The expectation of a binomial variable is E[X] = np.

The variance of a random variable X is

var(X) = E[(X − E(X))2] = E[X2] − (E[X])2.

In the case of a binomial variable the variance is var(X) = npq.

11/28

Random Variables

The Characteristic Function of an Event

If A is an event, then the function 1A : Ω −→ {0, 1} defined by 1 ifω∈A,

is a random variable,

1A(ω) = 0 otherwise, 01

1A: 1−P(A) P(A)

Note that E(1A) = P(A) and var(1A) = P(A)(1 − P(A)).

12/28

Random Variables

The event A∧B takes place when both A and B occur; the event A∨B takes place when at least one of A and B occur.

Example

The event S ∧ B takes place when the result of throwing the dice results in a pair of numbers (n, n) whose sum is greater than 8 and consists of the pairs:

(4, 4), (5, 5), (6, 6)

Therefore,P(S∧B)= 3 = 1 . 36 12

13/28

Conditional Probabilities

Definition

If B is an event such that P(B) > 0 one can define the probability of an event A conditioned on B as

P(A|B) = P(A ∩ B). P(B)

Example

The probability of the event S conditioned on B is

P(S|B)=

P(B|S)=

P(S ∧ B) 1 1 =12= ,

P(B) 1 6 2

and

P(S ∧ B) 1 1 =12= .

P(S) 1 2 6

14/28

Conditional Probabilities

Definition

Two events A, B are independent if P(A ∧ B) = P(A)P(B). If A, B are independent events, then

and

P(A|B) = P(A ∧ B) = P(A)P(B) = P(A) P(B) P(B)

P(B|A) = P(B ∧ A) = P(B)P(A) = P(B). P (A) P (A)

Note that B and S are independent events because

P(B∧S)= 1 =P(B)P(S). 12

15/28

Conditional Probabilities

The product rule or the Bayes theorem:

P(A ∧ B) = P(A|B)P(B) = P(B|A)P(A).

The sum rule:

P(A ∨ B) = P(A) + P(B) − P(A ∧ B).

The total probability rule: if A1, . . . , An are mutually exclusive and

ni=1 P(Ai ) = 1, then

n

P(B) = P(B|Ai )P(Ai ).

i=1

16/28

ML and Conditional Probabilities

In ML we are often interested in determining the best hypothesis from some space H given the observed data S.

“Best” means in this context, the most probable hypothesis given

the data S, and

any initial knowledge of prior probabilities of hypotheses in H.

17/28

ML and Conditional Probabilities

“Prior probabilities” (or a priori probabilities) mean probabilities of hypotheses before seeing the data S.

“Posterior probabilities” mean probabilities of hypotheses after seeing the data S.

If no prior knowledge exist all hypotheses have the same probability. In ML we are interested to compute P(h|S) that h holds given the observed training data S.

18/28

ML and Conditional Probabilities

Bayes’ Theorem in ML

For a sample S and a hypothesis h we have P(h|S) = P(S|h)P(h)

P(S)

Note that:

P(h|S) increases with P(h) and with P(S|h).

P(h|S) decreases with P(S) because the more probable is that S will be observed independent of h, the less evidence S provides for h.

19/28

ML and Conditional Probabilities

Learning Scenario

Consider some set of candidate hypotheses H and seek the most probable hypothesis given the observed data S.

Any such maximally probabile hypothesis is called a maximum a posteriori hypothesis, MAP.

hMAP is

because P(S) is a constant.

hMAP =

argmaxh∈H P (h|S )

= argmax P (S |h)P (h)

h∈H P(S)

= argmaxh∈H P (S |h)P (h)

20/28

ML and Conditional Probabilities

Maximum Likelihood Hypothesis

In some cases we assume that every hypothesis of H is apriori equally probable, that is, P(hi) = P(hj) for all hi,hj ∈ H.

Now,

hMAP = argmaxh∈HP(S|h). P(S|h) is known as the likelihood of S given h.

21/28

ML and Conditional Probabilities

Example

A medical diagnosis problem:

The hypothesis space contains two hypotheses:

h0: patient has no cancer;

h1: patient has cancer.

An imperfect diagnosis test that has two outcomes; ⊕ and ⊖.

P(⊕|h1) = 0.98 P(⊕|h0) = 0.03 . P(⊖|h1) = 0.02 P(⊖|h0) = 0.97

Prior knowlege: Only 0.08% of population has cancer; 99.2% does not.

22/28

ML and Conditional Probabilities

Example (cont’d)

The test returns ⊕. Should we conclude that the patient has cancer? The MAP hypothesis is obtained as

hMAP = argmaxh∈HP(S|h)P(h). P(⊕|h1)P(h1) = 0.98 ∗ 0.008 = 0.0078,

P(⊕|h0)P(h0) = 0.03 ∗ 0.992 = 0.0298. The MAP hypothesis is h0; the patient has no cancer.

23/28

Bayes Theorem and Concept Learning

Brute-Force Bayes Concept Learning

For each hypothesis h ∈ H calculate the posterior probablity: P(h|S) = P(D|h)P(h)

P(S) Output the hypothesis hMAP with

hMAP = argmaxh∈HP(h|S).

24/28

Bayes Theorem and Concept Learning

Assumption for the Brute-Force Bayes Concept Learning: TrainingdataisS={(x1,y1),…,(xm,ym)},whereyi =f(xi)for

1 i m and it is noise-free.

The target hypothesis is contained in H.

We have no apriori reason to believe that any hypothesis is more probable than the other

25/28

Bayes Theorem and Concept Learning

Consequences

P(h)= 1 ; |H|

The probability of S given h is 1 if S is consistent with h and 0 otherwise:

1 ifyi =h(xi)for1im 0 otherwise;

P(S|h) =

26/28

Bayes Theorem and Concept Learning

Let VSH,S be the subset of hypotheses of H that is consistent with S. If S is inconsistent with h then P(h|S) = 0·P(h) = 0.

P(S)

If S is consistent with h then

1·1 1·1

1

P(h|S)= |H| = |H| = P(S) |VSH,S|

|VSH,S |

|H|

27/28

Bayes Theorem and Concept Learning

Since the hypotheses are mutually exclusive (that is, P(hi ∧ hj ) = 0 if i ̸= j), by the total probability law:

P(S) = P(S|hi)P(hi) hi ∈H

= 1·1+0·1 h∈VS |H | h̸∈VS |H |

H,S H,S = 1· 1 =|VSH,S|.

h∈VS

|H| |H|

H,S

Note that under this setting every consistent hypothesis is a MAP hypothesis.

28/28