# Bayesian Networks

Bayesian Networks

[AIMA 4G]

Chapter 13.1-13.3, 13.5

[13.2.3, 13.2.4, 13.4]

Artificial Intelligence

COSC1127/1125

Semester 2, 2021

Prof.

* Many slides are based on those of others; thanks to all of them!

Wominjeka!

Week 10

Bayesian Networks

Prof.

What are the chances that there was an earthquake given that the alarm went off and Sophia did not call?

Beds are burning (cover) – Midnight Oil 1987

20 years from “Beds are burning” in

“After Midnight Oil toured through the Outback in 1986, playing to remote Aboriginal communities and seeing first hand the seriousness of the issues in health and living standards, , and wrote “Beds Are Burning” to criticize how said populations were often forcibly removed from their lands, highlighted by the pre-chorus lines “it belongs to them, let’s give it back”. – Wikipedia

Artificial Intelligence

8 Markov Decision Processes

9 Reinforcement Learning

10 Bayesian Reasoning

11 Recap: Prob+BN, MDP+RL

12 Practical Reasoning & Philosophical AI, AI @ RMIT


Some news…

Preliminary submission passed…

We are processing results and contributions.

Remember to certify each member.

THE has been marked, results soon

More in next slide

Bonus Project 3 available (individual)

Final Contest due Week 12

Agent system (Wiki+Video report later)

Wiki template to be provided soon.

CES Survey running now

Please fill it if you are participating actively. 🙏 😃


THE feedback: “Tough but fair”

After dropping 3’s: 85% positive (4+5) vs 15% negative (1+2)

“Tough”: marks must be earned; higher levels in Bloom’s hierarchy.

“Fair”: within what was covered; lots of small bits; depth+breadth

Test had 4 “theoretical” questions/parts.

73 points = 66 core + 7 bonus

Partial marks given as deemed suitable for each question.

Almost every question was awarded partial marks.

But not via bean counting!

Average was 52% (with max 100% and 14% HDs)

Comparable with 2020 (but they did it in 3hrs)

You will get an email with results per question.

Final mark (out of 100%) = score / 66

Final mark will be uploaded to Canvas

Thanks for your professionalism (& humor!)

THE summary (preliminary!)

Extra 1-on-1 support for Pacman Project

Want 1-on-1 team support for your project?

Book 1-on-1 team 15’ consultation time on Wednesdays 3pm-4pm, weeks 9 to 12.

Book here!

When you book:

Make a PRIVATE post beforehand to tell us what you want to chat about.

Include the link to your repo.

Use the 1-on-1 tutorial consultation time!

Remember 1-on-1 Mondays 4pm-5pm Consultation Time for tutorials, use that & book it HERE!

Course Survey is live

Results have big impact on:

Teacher: promotion, resources, course assignments, teaching awards, etc.

Prospective students who may take the course!

If you like the course, please do not just ignore it. This is where it counts!

I want to know your opinion too…

CES: The Two Questions

What would you tell a friend who is interested in the course?

Do you think you will be a better Computer Scientist after taking AI?

Questions?


Today…

IMPORTANT DISCLAIMER

This pack of slides is NOT core study material. The core & main material is the book and any specific resources given for the week.

The slides in this presentation were prepared only to support the instructor during the lectorial, and only some of them were used in the session as needed.

This pack of slides does not represent the complete material of the week, only a few bits here and there around what was discussed in lectorials. Most of the week's content will NOT show up in this pack of slides.

Please refer to the book (and any resources specifically given) for a comprehensive coverage of the week material.

probabilities…


Recap from probabilities: Week 7

“When it is not in our power to determine what is true, we ought to follow what is most probable.” ― René Descartes

Sample space & events.

Probability functions.

Random variables.

Probability distributions.

Joint distributions.

Conditional probabilities.

Inference:

Enumeration/marginalization.

Bayes rule.

An Aside on Notation

P(X) : distribution over X.

P(X | Y): family of conditional distributions over X, one for each y ∊ Dom(Y).

Distinguish between:

P(X): a distribution (over the domain of variable X)

P(x) or P(¬x) (or P(xi) for non-Boolean): numbers.

Think of P(X|Y) as a function that accepts any xi and yk and returns P(xi | yk).

Conditional Probability

Definition:

P(A | B) = P(A Λ B) / P(B)

P(A Λ B) and P(B): by marginalization

Chain/Product rule:

P(A Λ B) = P(A | B) P(B)

Bayes Rule: Update beliefs in H after evidence e:

Know these!
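These identities can be checked numerically. A minimal sketch, assuming a tiny made-up joint distribution over two Boolean variables (the four joint values below are invented for illustration):

```python
# A made-up joint distribution over two Boolean variables A and B,
# used to check conditional probability, the chain rule, and Bayes rule.
joint = {
    (True, True): 0.12,    # P(a ∧ b)
    (True, False): 0.28,   # P(a ∧ ¬b)
    (False, True): 0.18,   # P(¬a ∧ b)
    (False, False): 0.42,  # P(¬a ∧ ¬b)
}

def p_a(a):  # P(A = a), by marginalizing out B
    return sum(pr for (av, _), pr in joint.items() if av == a)

def p_b(b):  # P(B = b), by marginalizing out A
    return sum(pr for (_, bv), pr in joint.items() if bv == b)

# Definition: P(a | b) = P(a ∧ b) / P(b) = 0.12 / 0.30 = 0.4
p_a_given_b = joint[(True, True)] / p_b(True)

# Bayes rule: P(a | b) = P(b | a) P(a) / P(b)
p_b_given_a = joint[(True, True)] / p_a(True)   # 0.12 / 0.40 = 0.3
via_bayes = p_b_given_a * p_a(True) / p_b(True)

assert abs(p_a_given_b - via_bayes) < 1e-12
```

Any joint table works here; the point is that both routes give the same number.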

Chain rule

P(A Λ B Λ C) = P(A, B, C) = P(A)*P(B|A)*P(C|A,B)

We know that:

P(B|A) = P(AΛB) / P(A)

P(C|A,B) = P(AΛBΛC) / P(AΛB)

So:

P(A Λ BΛ C) = P(A,B,C) =

P(A)*(P(AΛB) / P(A))*(P(AΛBΛC) / P(AΛB))


Bayes Rule: CP + Chain Rule

P(A|B) = P(AΛB) / P(B) [ Def Cnd Prob]

P(AΛB) = P(B|A)*P(A) [ Chain/Prod Rule]

So: P(A|B) = P(B|A)*P(A) / P(B)

A = the cause or hypothesis (illness)

B = the effect or evidence (symptom)

Deriving Bayes Rule: another way!

P(AΛB) = P(A|B)*P(B) [ Chain/Prod Rule]

P(AΛB) = P(B|A)*P(A) [ Chain/Prod Rule]

So:

P(B|A)*P(A) = P(A|B)*P(B)

Hence:

P(A|B) = P(B|A)*P(A) / P(B)

Deriving Bayes Rule

P(Cause | Effect) =

P(Effect|Cause)*P(Cause) / P(Effect)

We generally can easily get P(Effect|Cause) & P(Cause) via stat analysis of examples

But what about P(Effect)? IMPLICIT!

P(E) = P(E|C)*P(C) + P(E|¬C)*P(¬C)
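A numeric sketch of the cause/effect pattern above; the prior and likelihoods below are invented for illustration:

```python
# Hypothetical numbers for a cause (illness) and effect (symptom).
p_c = 0.01        # P(C): prior probability of the cause
p_e_c = 0.90      # P(E | C): effect given cause
p_e_notc = 0.10   # P(E | ¬C): effect without the cause

# Total probability: P(E) = P(E|C) P(C) + P(E|¬C) P(¬C)
p_e = p_e_c * p_c + p_e_notc * (1 - p_c)   # 0.009 + 0.099 = 0.108

# Bayes rule: P(C | E) = P(E|C) P(C) / P(E) ≈ 0.0833
p_c_given_e = p_e_c * p_c / p_e
```

Even with a reliable test (90% true positive rate), a rare cause stays fairly unlikely after one positive observation.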

Independence

Generally we want to compute query variables Y given evidence variables E = e

There may be hidden (i.e. unobserved) vars H

H = U – Y – E

If we had the joint probability distribution then could marginalize (the hidden variables):

P(Y | E=e) = P(Y Λ E=e) / P(E = e)

= Σh P(Y Λ E=e Λ H=h) /

Σh,y P(Y=y Λ E=e Λ H=h)

Computing Conditional Probabilities

Inference: Computational Bottleneck

Semantically/conceptually the picture is clear, but several issues must be addressed…

Issue 1

How do we specify the full joint distribution over a set of random variables X1, X2,…, Xn ?

e.g., what is the value of P(X1 Λ ¬X2 Λ …. Λ ¬Xn-1 Λ Xn)?

Exponential number of possible worlds.

e.g., if the Xi are Boolean, then 2^n numbers (or 2^n − 1 parameters/degrees of freedom, since they sum to 1)

These numbers are not robust/stable.

These numbers are not natural to assess (what is probability that “Sebastian wants a cup of tea; it’s not raining or snowing in Melbourne; robot charge level is low; …”?).

Issue 2

Inference in this representation is frightfully slow.

To calculate P(α) we must either:

Sum over exponential number of worlds (enumeration/marginalization); or

Sum over each possible evidence e to determine

P(α | E=e).

The Dentistry Example

Toothache: reported toothache.

Cavity: cavity is present.

Catch: feeling a ‘‘catch’’ with a dental probe.

Sunny: weather is sunny.


Calculating Distributions

| | toothache ∧ catch | toothache ∧ ¬catch | ¬toothache ∧ catch | ¬toothache ∧ ¬catch |
|---|---|---|---|---|
| cavity | 0.108 | 0.012 | 0.072 | 0.008 |
| ¬cavity | 0.016 | 0.064 | 0.144 | 0.576 |

P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2

P(cavity Λ toothache) = P(cavity Λ toothache Λ catch) + P(cavity Λ toothache Λ ¬catch) = 0.108 + 0.012 = 0.12

Issue 1: how to get this big table?

Issue 2: lots of sums…
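Both sums can be read straight off the table; a minimal sketch:

```python
# Joint distribution over (cavity, toothache, catch), values from the table.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# P(cavity): marginalize out Toothache and Catch -> 0.2
p_cavity = sum(pr for (cav, _, _), pr in joint.items() if cav)

# P(cavity ∧ toothache): marginalize out Catch -> 0.12
p_cav_tooth = sum(pr for (cav, t, _), pr in joint.items() if cav and t)
```

Note the exponential blow-up: each query touches every row of the table.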


Is there anything we can do?

How do we avoid these two problems?

no solution in general;

but in practice there is structure we can exploit.

We’ll use: 💪

independence;

conditional independence;

Bayesian networks.

Independence

A and B are independent iff

P(A|B) = P(A) or P(B|A) = P(B) or P(A,B) = P(A)P(B)

(Figure: two networks over Toothache, Catch, Cavity, Weather; Weather is disconnected from the other three.)

P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)

P(Toothache, Catch, Cavity | Weather) = P(Toothache, Catch, Cavity)

intuitively, learning B doesn’t influence beliefs about A


Conditional Independence

Independence is very strong and hence rare…

Conditional independence is more common

Consider P(catch | toothache, cavity):

it depends on whether there is a cavity,

but not on whether there is a toothache:

P(catch | toothache, cavity) = P(catch | cavity)

P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

P(Catch | Toothache, Cavity) = P(Catch| Cavity )


Conditional Independence

P(Catch | Toothache, Cavity) = P(Catch | Cavity)

Similarly:

P(Toothache | Catch, Cavity) = P(Toothache | Cavity)

P(Toothache, Catch | Cavity) =

P(Toothache | Cavity) P(Catch | Cavity)

Catch is conditionally independent of Toothache given Cavity


Conditional Independence

x and y are conditionally independent given z iff:

P(x | y, z) = P(x | z) OR

P(y | x, z) = P(y | z) OR

P(x, y | z) = P(x | z) P(y | z)

learning y doesn’t influence your beliefs about x if you already know z

Example: learning someone’s mark on the AI exam can influence the probability you assign to their GPA; but if you already knew their final AI grade, learning the exam mark would not change your GPA assessment.

Conditional Independence

P(Toothache, Catch, Cavity) =

P(Toothache | Catch, Cavity) P(Catch, Cavity) =

P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)

Because of conditional independence this becomes

P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

Reduces the size of the joint distribution.
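This conditional independence can be verified numerically on the dentist table from earlier (a sketch; world tuples are ordered (cavity, toothache, catch)):

```python
# Joint distribution over (cavity, toothache, catch), same table as before.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(pred):  # probability of the event described by pred
    return sum(pr for world, pr in joint.items() if pred(world))

# P(catch | toothache, cavity) = 0.108 / 0.12 = 0.9
p1 = prob(lambda w: w[0] and w[1] and w[2]) / prob(lambda w: w[0] and w[1])

# P(catch | cavity) = 0.18 / 0.2 = 0.9; same value, so
# Catch is conditionally independent of Toothache given Cavity.
p2 = prob(lambda w: w[0] and w[2]) / prob(lambda w: w[0])

assert abs(p1 - p2) < 1e-12
```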


What good is independence?

Suppose (say, boolean) variables X1, X2,…, Xn are mutually independent.

We can specify the full joint distribution using only n parameters (linear) instead of 2^n − 1 (exponential)!

How? Simply specify P(x1), …, P(xn)

From this we can recover the probability of any world or any (conjunctive) query easily

Recall P(x,y)=P(x)P(y) and P(x|y)=P(x) and P(y|x)=P(y)

Example

4 independent boolean random variables X1, X2, X3, X4

P(x1) = 0.4, P(x2) = 0.2, P(x3) = 0.5, P(x4) = 0.8

P(x1, ¬x2, x3, x4) = P(x1)(1-P(x2))P(x3)P(x4)

= (0.4)(0.8)(0.5)(0.8)

= 0.128

P(x1, x2, x3 | x4) = P(x1,x2,x3)

= P(x1)P(x2)P(x3)

= (0.4)(0.2)(0.5)

= 0.04
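The computations above follow directly from the marginals; a minimal sketch:

```python
# Marginals of four mutually independent Boolean variables.
p = {"x1": 0.4, "x2": 0.2, "x3": 0.5, "x4": 0.8}

def lit(name, positive):
    # Probability of a literal: P(xi) if positive, 1 - P(xi) otherwise.
    return p[name] if positive else 1 - p[name]

# P(x1, ¬x2, x3, x4) = 0.4 * 0.8 * 0.5 * 0.8 = 0.128
joint = lit("x1", True) * lit("x2", False) * lit("x3", True) * lit("x4", True)

# P(x1, x2, x3 | x4) = P(x1) P(x2) P(x3) by independence = 0.04
cond = lit("x1", True) * lit("x2", True) * lit("x3", True)
```

Four numbers specify the whole distribution, versus 2^4 − 1 = 15 for an arbitrary joint.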

The Value of Independence

Complete independence reduces both representation of the joint and inference from O(2^n) to O(n)!

Unfortunately, such complete mutual independence is very rare in most realistic domains.

Fortunately, most domains do exhibit a fair amount of conditional independence. We can exploit conditional independence for representation and inference as well.

Bayesian networks do just this!

Method of teaching:

Lectorials + Forum

Assessments:

Pacman Projects

Flexible THE

Technology:

EdStem + GitHub

Learning Materials:

Videos + book + tute

If you are an active participant in the course (lectorials, tutes), please don’t just ignore it, so I get a balanced view…


Exploiting Conditional Independence

(Network: E → C → G → L → S)

E – Sebastian woke up too early G – Sebastian is grumpy

S – Students are sad C – Sebastian needs coffee

L– The lecture did not go smoothly

If Sebastian woke up too early E, Sebastian probably needs coffee C; if Sebastian needs coffee, he’s likely grumpy G. If he is grumpy then it’s possible that the lecture won’t go smoothly L. If the lecture does not go smoothly then the students will likely be sad S.

Conditional Independence


If you learned any of E, C, G, or L, would your assessment of P(S) change?

If any of these are seen to be true, you would increase P(s) and decrease P(¬s).

So S is not independent of E, or C, or G, or L.

Conditional Independence


If you knew the value of L (true or false), would learning the value of E, C, or G influence P(S)?

Influence that these factors have on S is mediated by their influence on L.

Students aren’t sad because Sebastian was grumpy, they are sad because of the lecture.

So S is independent of E, C, and G, given L

Conditional Independence


So S is independent of E, and C, and G, given L

Similarly:

S is independent of E, and C, given G.

G is independent of E, given C.

This means that:

P(S | L, {G,C,E} ) = P(S|L)

P(L | G, {C,E} ) = P(L| G)

P(G| C, {E} ) = P(G | C)

P(C | E) and P(E) don’t “simplify”.

Conditional Independence


By the chain rule (for any instantiation of S…E):

P(S,L,G,C,E) =

P(S|L,G,C,E) P(L|G,C,E) P(G|C,E) P(C|E) P(E)

By our independence assumptions:

P(S,L,G,C,E) = P(S|L) P(L|G) P(G|C) P(C|E) P(E)

We can specify the full joint by specifying just five local conditional distributions:

P(S|L); P(L|G); P(G|C); P(C|E); and P(E)
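A sketch checking that five local distributions really do define a full joint. The slides give only the shapes of these CPTs (and a few values used later), so the numbers for L and S below are invented:

```python
from itertools import product

# Chain E -> C -> G -> L -> S.
p_e = 0.7
p_c = {True: 0.9, False: 0.5}   # P(c | E)
p_g = {True: 0.7, False: 0.0}   # P(g | C)
p_l = {True: 0.6, False: 0.1}   # P(l | G), made up
p_s = {True: 0.8, False: 0.2}   # P(s | L), made up

def bern(p, val):  # P(Var = val) for a Bernoulli parameter p
    return p if val else 1 - p

def joint(s, l, g, c, e):
    # P(S, L, G, C, E) = P(S|L) P(L|G) P(G|C) P(C|E) P(E)
    return (bern(p_s[l], s) * bern(p_l[g], l) * bern(p_g[c], g)
            * bern(p_c[e], c) * bern(p_e, e))

# The product of local distributions sums to 1 over all 32 worlds.
total = sum(joint(*w) for w in product([True, False], repeat=5))
assert abs(total - 1.0) < 1e-9
```

Nine numbers (1 + 2 + 2 + 2 + 2) specify the joint, instead of 2^5 − 1 = 31.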

Inference is Easy


Want to know P(g)?

P(c) = P(c|e)P(e) + P(c|¬e)P(¬e)

= 0.9 * 0.7 + 0.5 * 0.3 = 0.78

P(¬c) = P(¬c|e)P(e) + P(¬c|¬e)P(¬e) = 0.22

P(¬c) = 1 – P(c), as well

P(g) = P(g|c)P(c) + P(g|¬c)P(¬c)

= 0.7 * 0.78 + 0.0 * 0.22 = 0.546

P(¬g) = 1 – P(g) = 0.454
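The same calculation as a sketch in Python; note the worked numbers require P(c|e) = 0.9 so that 0.9 × 0.7 + 0.5 × 0.3 = 0.78:

```python
# CPT values consistent with the worked numbers above.
p_e = 0.7                       # P(e)
p_c = {True: 0.9, False: 0.5}   # P(c | E)
p_g = {True: 0.7, False: 0.0}   # P(g | C)

# P(c) = P(c|e)P(e) + P(c|¬e)P(¬e) = 0.78
pc = p_c[True] * p_e + p_c[False] * (1 - p_e)

# P(g) = P(g|c)P(c) + P(g|¬c)P(¬c) = 0.546
pg = p_g[True] * pc + p_g[False] * (1 - pc)
```

Inference walks down the chain one marginal at a time; no joint table is ever built.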

Bayesian Networks

Bayesian Networks

This structure is a Bayesian network.

Graphical representation of the direct dependencies over a set of variables + a set of conditional probability tables (CPTs) quantifying the strength of those influences.

Bayes nets generalize the above ideas in very interesting ways, leading to effective means of representation and inference under uncertainty.

Bayesian Networks

Graphical notation for conditional independence statements.

aka belief networks, probabilistic networks

Compact specification of joint distributions.

One node per variable.

Directed acyclic graph.

Link means ‘directly influences’.

Node conditional on its parents: P(Xi | parents(Xi) )


‘Dentist’ Network

(Network: Cavity → Toothache and Cavity → Catch; Weather stands alone.)

Weather is independent of other variables.

Toothache and Catch are conditionally independent given Cavity.


Bayesian Networks

(Network: A → C ← B. CPTs: P(a)/P(¬a) for A; P(b)/P(¬b) for B; and P(c | A, B), P(¬c | A, B) for each of the four assignments of A and B.)

BN over variables {X1, X2,…, Xn} consists of:

a DAG whose nodes are the variables.

a set of CPTs (P(Xi | Parents(Xi) ) for each Xi

Semantics of a Bayes Net

The structure of the BN means: every Xi is conditionally independent of all of its non-descendants given its parents:

P(Xi | S ∪ Par(Xi)) = P(Xi | Par(Xi))

for any subset S ⊆ NonDescendants(Xi)

Burglary Network

‘My neighbour John calls me to say the house alarm has gone off. My other neighbour Mary hasn’t rung. Minor earthquakes (!) could set off the alarm. Is there a burglar?’

Causal links:

Earthquake can set off alarm

Burglar can set off the alarm

Alarm can cause John to call

Alarm can cause Mary to call


Burglary Network

Query: P(B = true | A = true, M = false, J = false)?

… or, what is the same: P(b | a, ¬m, ¬j)?

Semantics of Bayes Nets

If we ask for P(x1, x2,…, xn) we obtain ∏i P(xi | Par(xi)), assuming an ordering consistent with the network.

By the chain rule, we have:

P(x1, x2,…, xn)

= P(xn | xn-1, … , x1) P(xn-1 | xn-2, …, x1) … P(x1)

= P(xn | Par(xn)) P(xn-1 | Par(xn-1)) … P(x1)

Thus, the joint is recoverable using the parameters (CPTs) specified in an arbitrary BN

Check my video…

Burglary Network

P(j, m, a, ¬e, ¬b)

= P(j | a) P(m | a) P(a | ¬e,¬b) P(¬e) P(¬b)

= 0.90 x 0.70 x 0.001 x 0.998 x 0.999

= 0.0006281113

Full joint distribution can be calculated as follows:

P(X1, …, Xn) = P(X1) P(X2 | X1) … P(Xn | X1, …, Xn-1)

= ∏i=1..n P(Xi | X1, …, Xi-1)

= ∏i=1..n P(Xi | Parents(Xi))

This makes the calculation a lot simpler …
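The product above, as a minimal sketch using exactly the CPT entries quoted in the slide:

```python
# P(j, m, a, ¬e, ¬b) = P(j|a) P(m|a) P(a|¬e,¬b) P(¬e) P(¬b)
p_j_a, p_m_a = 0.90, 0.70        # P(j | a), P(m | a)
p_a_ne_nb = 0.001                # P(a | ¬e, ¬b)
p_not_e, p_not_b = 0.998, 0.999  # P(¬e), P(¬b)

# Five local lookups and four multiplications; no 2^5-row table needed.
joint = p_j_a * p_m_a * p_a_ne_nb * p_not_e * p_not_b   # ≈ 0.0006281113
```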


Inference by enumeration

What is the probability

of Burglary given that

John and Mary call?

Formally: P(B | j, m)?

Calculate (by marginalizing/enumerating a, e):

ΣaΣeP(B, j, m, a, e) ←— use Bayes Net to simplify!
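A sketch of inference by enumeration for this query. The network structure follows the slides, but the full CPT table isn't reproduced in this text, so the numbers below are the standard textbook values; treat them as an assumption:

```python
from itertools import product

# Burglary network: B -> A <- E, A -> J, A -> M.
# CPT numbers are the usual textbook values (assumed, not given in this text).
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | B, E)
p_j = {True: 0.90, False: 0.05}                      # P(j | A)
p_m = {True: 0.70, False: 0.01}                      # P(m | A)

def bern(p, val):  # P(Var = val) for a Bernoulli parameter p
    return p if val else 1 - p

def joint(b, e, a, j, m):
    # Full joint as the product of the five CPT entries.
    return (bern(p_b, b) * bern(p_e, e) * bern(p_a[(b, e)], a)
            * bern(p_j[a], j) * bern(p_m[a], m))

# P(B | j, m): sum out the hidden variables E and A, then normalize.
unnorm = {b: sum(joint(b, e, a, True, True)
                 for e, a in product([True, False], repeat=2))
          for b in (True, False)}
p_b_given_jm = unnorm[True] / (unnorm[True] + unnorm[False])  # ≈ 0.284
```

Even with both neighbours calling, a burglary is still far from certain, because the prior P(b) is so small.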

Burglary inference via pgmpy

Check Piazza post @285 from Benhao: https://piazza.com/class/kbsmlzxg3k7418?cid=285

Designing Bayesian Networks

Designing Bayesian Networks

Choose a set of variables that describe the domain

Choose an ordering for the variables (!!)

While there are variables left:

Choose a variable Xi and add a node for it to the network

Set Parent(Xi) to a minimal set of nodes such that P(Xi | X1, …, Xi-1) = P(Xi | Parent(Xi) )

(Variable can only have parents that come earlier in the ordering)

Define the conditional probability table for Xi

The trick is to choose the ordering so that Parent(Xi) is much smaller than {X1, …, Xi-1} …


Causal Intuitions

The construction of a BN is simple:

works with arbitrary orderings of variable set

but some orderings are much better than others!

if ordering/dependence structure reflects causal intuitions, a more natural, compact BN results

In this BN, we’ve used the ordering Mal, Flu, Cold, Aches to build a BN for the distribution P.

Variable can only have parents that come earlier in the ordering

Causal Intuitions

Suppose we build the BN for distribution P using the opposite ordering: Aches, Cold, Flu, Malaria

resulting network is more complicated!

Mal depends on Aches; but it also depends on Cold, Flu given Aches

Cold, Flu explain away Mal given Aches.

Flu depends on Aches; but also on Cold given Aches

Cold depends on Aches

Compactness

1 + 1 + 1 + 8 = 11 numbers

1 + 2 + 4 + 8 = 15 numbers

If each random variable is directly influenced by at most k others, then each CPT has at most 2^k rows. Thus the entire network of n variables is specified by at most n·2^k numbers.

Contrast that with a full joint probability table: 2^n.
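The counts above come from summing 2^k per node; a minimal sketch, where the parent counts correspond to the two orderings discussed:

```python
def bn_param_count(parent_counts):
    # One CPT per node; a Boolean node with k Boolean parents needs 2**k
    # independent numbers (one per parent assignment).
    return sum(2 ** k for k in parent_counts)

# Causal ordering Mal, Flu, Cold, Aches: three roots, Aches has 3 parents.
assert bn_param_count([0, 0, 0, 3]) == 1 + 1 + 1 + 8 == 11

# Reverse ordering Aches, Cold, Flu, Mal: 0, 1, 2, 3 parents respectively.
assert bn_param_count([0, 1, 2, 3]) == 1 + 2 + 4 + 8 == 15

# A full joint over n = 4 Boolean variables would need 2**4 - 1 = 15 numbers.
assert 2 ** 4 - 1 == 15
```

The gap widens fast: for n = 30 variables with k ≤ 5, the BN needs at most 960 numbers versus over a billion for the full joint.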

Testing Independence (optional)

Given BN, how do we determine if two variables X, Y are independent given evidence E?

we use a (simple) graphical property!

X and Y are conditionally independent given evidence E if E d-separates X and Y

D-separation: A set of variables E d-separates X and Y if E “blocks” every undirected path in the BN between X and Y.

Blocking: Graphical View

Conclusions

Probabilities allow us to reason under uncertainty.

Full probability distribution tables are just not feasible.

Exploit Conditional Independence.

Bayesian networks are a compact representation of such independences.

This reduces complexity!

Whiteboard used in lectorials