COSC1127/1125 Artificial Intelligence

School of Computing Technologies Semester 2, 2021

Prof. ̃a

Tutorial No. 10 Bayesian Networks

PART1: Conditionalprobabiltygraphs……………………………………………………. We can use the full joint probability tables to answer any question we may have about a set of random variables; however, as the number of variables increases, the size of the joint probability table increases exponentially. Thankfully, we can use the independence and conditional independence relationships between variables to design a compact representation that greatly reduces the number of probabilities needed to define the full joint probability distribution. These structures are called Bayesian networks and are typically represented as directed, acyclic graphs G = (V, E) where V is the set of random variables and E ⊆ V × V is the set of directed edges. Random variables are denoted by capital letters (e.g., X) and lower-case denotes the truth assignment of a variable (e.g., x or ¬x). Edge (Y,X) ∈ E indicates that Y is the parent of X and we can denote the set of parents of X as parents(X). The parents of X are the only variables that directly affect the probability of X. This means that given parents(X), there is no other information that is useful to estimate P(X),formally:P(X|parents(X))=P(X|V1,V2,…,Vn)forallrandomvariablesVi ∈V.

In order to compute the full joint distribution using a Bayesian network, we need to find the product of all the conditional probabilities. We can do this using the following equation:

n

P(x1,x2,…,xn) = P(xi | parents(Xi)).

i=1

It is clear that if Xi has no parents, its “conditional” probability is simply P(Xi).

Where we are making a query about a variable but we don’t have full knowledge about its parents, we must sum over the hidden variables. This can be done in the following manner:

P(X) = P(X,y), y

This technique is known as marginalisation and here, Y is the vector of all hidden variables we wish to sum over. For example, in the case where there is one hidden variable Y , we have,

P (X) = P (X, y) + P (X, ¬y); in the case of two hidden variables Y1 and Y2 we have,

P (X) = P (X, y1, y2) + P (X, y1, ¬y2) + P (X, ¬y1, y2) + P (X, ¬y1, ¬y2);

and so on. In general, we will have the sum of 2n terms, where n is the number of hidden variables being summed over. Each of these terms will represent one of the possible truth assignments for all of the hidden variables.

1

COSC1127/1125 – AI Tutorial 10: Bayesian Networks Example: Consider the following Bayesian network:

Page 2 of 8

P(B) 0.001

A P(J) T 0.90 F 0.05

Burglary

JohnCalls

Alarm

Earthquake

B E P(A) T T 0.95 T F 0.94 F T 0.29 F F 0.001

MaryCalls

P(E) 0.002

A P(M) T 0.70 F 0.01

(a) GiveanexpressionfortheprobabilityP(j,m,a,¬e,¬b).

Solution: Here, we need to find conditional probabilities for each truth assignment of the variables, based on the structure of the graphical model given. So, j depends on a; m depends on a; a depends on both e and b; and, b and e don’t depend on any other variables. The total probability for the entry in the full joint table represented by P (j, m, a, ¬e, ¬b) is the product of all of these conditional probabilities. Therefore an expression for the above probability would be:

P (j | a) × P (m | a) × P (a | ¬e, ¬b) × P (¬e) × P (¬b)

(b) ComputetheprobabilityP(j,m,a,¬e,¬b).

Solution: Having worked out an expression for the probability in the previous question, computing the value is a simple matter of plugging in the correct values – remembering that P (¬a) = 1 − P (a). Therefore, the value for the above probability is:

P (j, m, a, ¬e, ¬b) = P (j | a) × P (m | a) × P (a | ¬e, ¬b) × P (¬e) × P (¬b)

= P (j | a) × P (m | a) × P (a | ¬e, ¬b) × (1 − P (e)) × (1 − P (b))

= 0.90×0.70×0.001×(1−0.002)×(1−0.001) = 0.90 × 0.70 × 0.001 × 0.998 × 0.999

= 0.0006281113

(c) GiveanexpressionfortheprobabilityP(j,¬m,¬a,b).

Solution: Here, we are missing one of the truth assignment to the variable e, so we must marginalise over that variable by summing up all entries in the full joint table corresponding to all of the other truth assignments and when e is both true and when it is false.

Therefore, an expression for the above probability is:

P(j,¬m,¬a,b) = P(j,¬m,¬a,b,e) e

= P (j | ¬a) × P (¬m | ¬a) × P (¬a | b, e) × P (b) × P (e) e

= P (j | ¬a) × P (¬m | ¬a) × P (¬a | b, e) × P (b) × P (e)

+ P (j | ¬a) × P (¬m | ¬a) × P (¬a | b, ¬e) × P (b) × P (¬e)

(d) ComputetheprobabilityP(j,¬m,¬a,b).

RMITAI2021(Semester2)-SebastianSardin ̃a Exercise1continuesonthenextpage…

COSC1127/1125 – AI Tutorial 10: Bayesian Networks Page 3 of 8 Solution: Just plug the values in as you did before:

P (j, ¬m, ¬a, b) = P (j | ¬a) × (1 − P (m | ¬a)) × (1 − P (a | b, e)) × P (b) × P (e)

+ P (j | ¬a) × (1 − P (m | ¬a)) × (1 − P (a | b, ¬e)) × P (b) × (1 − P (e))

Questions:

= 0.05 × 0.99 × 0.05 × 0.001 × 0.002 +0.05×0.99×0.06×0.001×0.998

= 0.00000296901

Consider the following directed graphical models:

AAB CCC

DEEDE G1 G2 G3

Give expressions for the following probabilities using only terms of the form P(X | parents(X)): (a) with respect to graph G1;

i. P(c,d,e)

ii. P(d,e|c)

Solution: Your turn!

(b) with respect to graph G2; i. P(a,c,e)

Solution: Your turn! ii. P(a,e|c)

Solution:

P (d | c) × P (e | c) × P (c)

Solution:

P(a,e | c) = P(a,e,c) P (c)

= P(a)×P(c|a)×P(e|c) P (c)

Here, we have only the conditional probability table for C, which depends on A, so we must marginalise over A to obtain P (c).

P (a) × P (c | a) × P (e | c) P (a) × P (c | a) × P (e | c) P (a) × P (c | a) × P (e | c) P (c) = a P (c | a) × P (a) = P (c | a) × P (a) + ….

(c) with respect to graph G3. i. P(a,¬b,c,d,¬e)

RMITAI2021(Semester2)-SebastianSardin ̃a Exercise1continuesonthenextpage…

COSC1127/1125 – AI Tutorial 10: Bayesian Networks Page 4 of 8

Solution: Your turn! ii. P(a | ¬d,e)

Solution:

Using conditional probability rule:

P(a | ¬d,e) = P(a,¬d,e) = αP(a,¬d,e), P (¬d, e)

where α is the normalisation constant 1 . P (¬d,e)

Now, because d and e are conditional on c and c is conditional on b, we have to sum up over the hidden variables b and c.

So we have,

P(a | ¬d,e) = αP(a,¬d,e)

= αP(a,¬d,e,b,c)

b,c

= α P (a) × P (b) × P (c | a, b) × P (¬d | c) × …. b,c

= α[P (a) × P (b) × P (c | a, b) × P (¬d | c) × P (e | c)

+ P (a) × P (b) × P (¬c | a, b) × P (¬d | ¬c) × P (e | ¬c)

+ ….

+ P (a) × P (¬b) × P (¬c | a, ¬b) × P (¬d | ¬c) × P (e | ¬c)]

PART2: Cardiagnosis……………………………………………………………………. Consider the network for car diagnosis shown in the figure below:

Battery

Gas

(a) Extend the network with the Boolean variables IcyWeather and StarterMotor, by reasoning how compo- nents might affect each other. Being a design/modelling process, there may be many correct answers, just state your assumptions.

Radio

Ignition

Starts

Moves

Solution:

Adding variables to an existing net can be done in two ways. Formally speaking, one should insert the variable ordering and rerun the network construction process from the point where the first new variable appears. Informally speaking, one never really builds a network by a strict ordering. Instead, one asks what variables are direct causes or influences on what other ones, and builds local parent/child graphs

RMITAI2021(Semester2)-SebastianSardin ̃a Exercise2continuesonthenextpage…

COSC1127/1125 – AI Tutorial 10: Bayesian Networks Page 5 of 8

that way. It is usually very easy to identify where in such a structure the new variable goes, but one must be very careful to check for possible induced dependencies downstream.

IcyWeather is not caused by any of the car-related variables, so needs no parents. It directly affects the battery and starter moter. StarterMotor is an additional precondition for Starts. The new network is shown below:

1

IcyWeather

22

Battery StarterMotor

221

Radio Ignition

Gas

8

Starts

2

Moves

(b) Give reasonable conditional probability tables for all the nodes.

Solution: (Note: this is a partial solution)

1. A reasonable prior for IcyWeather might be 0.05 (perhaps, depending on the location and sea- son).

2. P (Battery | IcyW eather) = 0.95, P (Battery | ¬IcyW eather) = 0.997

3. P(StarterMotor | IcyWeather) = …., P(StarterMotor | ¬IcyWeather) = ….

4. P (Radio | Battery) = …., P (Radio | ¬Battery) = ….

5. P (Ignition | Battery) = 0.998, P (Ignition | ¬Battery) = 0.01

6. P (Gas) = 0.995

7. P(Starts | Ignition,StarterMotor,Gas) = ….

8. P(Moves | Starts) = ….

There are more entries in the full CPT, however they have not been listed here for space reasons.

(c) How many independent values are contained in the joint probability distribution for eight Boolean nodes, assuming that no conditional independence relations are known to hold among them?

RMITAI2021(Semester2)-SebastianSardin ̃a Exercise2continuesonthenextpage…

COSC1127/1125 – AI Tutorial 10: Bayesian Networks Page 6 of 8

Solution:

With 8 Boolean variables, the joint has 28 − 1 = 255 independent entries. In general there are 2n entries in a joint probability distribution; however, only 2n − 1 independent entries. If the probabilities of all of the independent entries are summed up, the probability of the final entry must be 1− P (independent), making its value dependent on the rest.

(d) How many independent probability values do your network tables contain?

Solution: Your turn!

(e) (optional)TheconditionaldistributionforStartscouldbedescribedasanoisy-ANDdistribution.Define

this family in general and relate it to the noisy-OR distribution.

Solution: (Note: this is a partial solution)

The CPT for Starts describes a set of nearly necessary conditions that are together almost sufficient. That is, all the entries are nearly zero except for the entry where all conditions are true. That entry will be not quite 1 (because there is always some other possible fault that we didn’t think of), but as we add more conditions it gets closer to 1. If we add a Leak node as an extra parent, then the probability is exactly 1 when all parents are true. We can relate noisy-AND to noisy-OR using de Morgan’s rule: A ∧ B ≡ ¬(¬A ∨ ¬B). That is, noisy-AND is the same as noisy-OR except that the polarities of the parent and child variables are reversed. In the noisy-OR case, we have

P(Y =true|x1,…,xk)=1− qi (1) {i:xi=true}

where qi is the probability that the presence of the ith parent fails to cause the child to be true. In the noisy-AND case, we can write

P(Y =true|x1,…,xk)=1−…. (2)

where ri is the probability that the absence of the ith parent fails to cause the child to be false (e.g., it is magically bypassed by some other mechanism).

(f) Reconstruct, showing your workings, five entries of the full joint probability distribution (e.g., reconstruct P (¬iW, b, r, i, g, s, sM, ¬m) or P (iW, ¬b, r, i, ¬g, ¬s, sM, m), or any other variable assignment).

Solution: (Note: this is a partial solution)

Here is one example. The probabilities are taken from question 1(b).

P(¬iW,b,r,i,g,s,sM,¬m) =

P (¬iW ) · P (b | ¬iW ) · P (r | b) · P (i | b) · P (g) · P (s | i, sM, g) · …. = (1 − P (iW )) · P (b | ¬iW ) · P (r | b) · P (i | b) · …. =

0.95 · 0.997 · 0.9999 · 0.998 · 0.995 · 0.9999 · 0.999 · 0.002 = 0.00188

PART 3: Power station diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . In your local nuclear power station, there is an alarm that senses when a temperature gauge exceeds a given threshold. The gauge measures the temperature of the core. Consider the Boolean variable A (alarm sounds), FA (alarm is faulty), and FG (gauge is faulty) and the multivalued nodes G (gauge reading) and T (actual core temperature).

(a) Draw a Bayesian network for this domain, given that the gauge is more likely to fail when the core temperature gets too high.

RMITAI2021(Semester2)-SebastianSardin ̃a Exercise3continuesonthenextpage…

COSC1127/1125 – AI Tutorial 10: Bayesian Networks Page 7 of 8

Solution:

A suitable network is shown in the figure below. The key aspects are: the failure nodes are parents of the sensor nodes, and the temperature node is a parent of both the gauge and the gauge failure node. It is exactly this kind of correlation that makes it difficult for humans to understand what is happening in complex systems with unreliable sensors.

FG FA

TGA

(b) Is your network a polytree?

Solution: Your turn!

(c) Suppose there are just two possible actual and measured temperatures, normal and high; the probability that the gauge gives the correct temperature is x when it is working, but y when it is faulty. Give the conditional probability table associated with G.

Solution: (Note: this is a partial solution)

The CPT for G is shown below. The wording of the question is a little tricky because x and y are defined in terms of “incorrect” rather than “correct”.

FG G = High 1 − y

G = Normal y

¬FG 1 − x x

T = Normal

T = High FG ¬FG …. …. …. ….

(d) Suppose the alarm works correctly unless it is faulty, in which case it never sounds. Give the conditional probability table associated with A.

Solution: (Note: this is a partial solution)

The alarm works correctly unless it is faulty. So, if the alarm is not faulty, then the alarm works correctly, meaning that it sounds only if gauge reading is high.

The CPT for A is as follows:

G = Normal G = High

FA ¬FA FA ¬FA A0001 ¬A …. …. …. ….

(e) (optional) Suppose the alarm and gauge are working and the alarm sounds. Using the information from the previous questions, calculate an expression for the probability that the temperature of the core is too high, in terms of the various conditional probabilities in the network.

Solution: (Note: this is a partial solution)

The assumption says that the alarm and gauge are working, which means that FA = ¬fa and FG = ¬fg .

RMIT AI 2021 (Semester 2) – ̃a

COSC1127/1125 – AI Tutorial 10: Bayesian Networks Page 8 of 8

Also, we have that the alarm sounds (A = a). As we are asked the probability of the temperature being too high (T = t), we can formulate the query as:

P(t | a,¬fa,¬fg)

Which we would have to evaluate for the values of G (the remaining variable in our network). However, since we are using the assumptions above (also from parts (c) and (d)) we can deduce the following: Since the alarm sounds (a) and the alarm is not faulty (¬fa), we can infer by the table above that for this to be possible, the gauge reading must be high (G = g). Hence, we can rewrite the above expression as:

Your turn!

P(t | a,¬fa,g,¬fg) = ···

PART 4: Bayesian Networks via pgmpy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Finally, let’s get our hands dirty with Bayesian Networks in Python!

One can build and do inference with Bayesian Networks in Python using the pgmy package! So, head to pgmy home page and:

(a) Install and set-up the package in your machine.

(b) Understand and run the Earthquake example in the documentation.

(c) Use the system to perform and validate some of the inferences you did in Part 1 for this domain.

(d) Enjoy & learn Bayesian Networks with Python! Make sure you understand what the code is doing and that it does not appear as “magic” to you!

End of tutorial worksheet

RMIT AI 2021 (Semester 2) – ̃a