F71SM STATISTICAL METHODS

2 PROBABILITY

2.1 Introduction

A random experiment is an experiment which is repeatable under identical conditions, and for which, at each repetition, the outcome is uncertain but is one of a known and describable set of possible outcomes.

The sample space S is the set of possible outcomes.

An event A is a subset of S (but see below).

Event A occurs if the outcome of the experiment is an element of the set A.

The union of two events A and B, denoted A ∪ B, is the event which occurs ⇔ at least one of events A, B occurs.

The intersection of two events A and B, denoted A ∩ B, is the event which occurs ⇔ both A and B occur.

The complement of event A, denoted A′, occurs if and only if the event A does not occur.

The empty set ∅, considered as a subset of S, contains none of the set of possible outcomes of the experiment and so corresponds to the impossible event.

A ∪ A′ = S , A ∩ A′ = ∅

Events A and B are mutually exclusive ⇔ A ∩ B = ∅ (that is, the events cannot occur simultaneously).

A set function is a function whose domain is a collection of sets.

[Figure: Venn diagram]

2.2 Probability

Probability is a set function (P), also called a probability measure, on the collection of subsets of S.

The domain of the function P is a collection 𝒮 of subsets of S (the events). For A ∈ 𝒮, P(A) is a real number which gives the probability that the outcome of the experiment is in A; it is the ‘probability that event A occurs’. We require that P behave like relative frequency: it is a function which is non-negative, bounded above (by 1), and additive. This leads us to the following definition, in which we declare probability to be a function subject to the corresponding three axioms.

Axioms of Probability

Given a sample space S, the probability set function P : 𝒮 → ℝ is such that

A1 P(A) ≥ 0 for all A ∈ 𝒮

A2 P(S) = 1

A3 For A1, A2, A3, . . . ∈ 𝒮 with Ai ∩ Aj = ∅ for i ≠ j, $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$

2.3 Basic results for probabilities

(i) P(∅) = 0

Proof: S = S ∪ ∅ and S ∩ ∅ = ∅ ⇒ P(S) = P(S ∪ ∅) = P(S) + P(∅) by A3; the result follows.

(ii) For any event A, P(A) ≤ 1

Proof: 1 = P(S) = P(A ∪ A′) = P(A) + P(A′) by A2 and A3, and the result follows by A1.

(iii) For any event A, P(A′) = 1 − P(A)

Proof: from proof of (ii) above.

(iv) A ⊆ B ⇒ P(A) ≤ P(B)

Proof: A ⊆ B ⇒ B = A ∪ (B ∩ A′) ⇒ P(B) = P(A) + P(B ∩ A′) by A3, since A ∩ (B ∩ A′) = ∅, and hence P(B) ≥ P(A) by A1.

Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof: A ∪ B = A ∪ (B ∩ A′) and A ∩ (B ∩ A′) = ∅ ⇒ P(A ∪ B) = P(A) + P(B ∩ A′)

B = B ∩ S = B ∩ (A ∪ A′) = (B ∩ A) ∪ (B ∩ A′) and (B ∩ A) ∩ (B ∩ A′) = ∅

⇒ P(B) = P(B ∩ A) + P(B ∩ A′)

Together these results ⇒ P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

This generalises the result that for A and B mutually exclusive events, P(A ∪ B) = P(A) + P(B).

(a) Long term relative frequency [e.g. P (drawing-pin lands with its pin sticking up)]

We observe that relative frequency tends to settle down as the number of trials increases. Formally, we define the probability of the event occurring as the limit of the relative frequency, so

as number of trials → ∞, relative frequency → probability .

The graph below shows the relative frequency of occurrence of an event with probability 0.4 after 1, 2, . . . , 200 trials (the outcomes of the trials were simulated in R).
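The R code behind the graph is not reproduced in these notes. As a rough stand-in, the following Python sketch simulates independent trials in which the event occurs with a fixed probability 0.4 and records the running relative frequency after each trial; the function name `relative_frequencies` and the seed are illustrative choices, not part of the original.

```python
import random

def relative_frequencies(p, n_trials, seed=None):
    """Simulate n_trials independent trials in which the event occurs with
    probability p; return the running relative frequency after each trial."""
    rng = random.Random(seed)
    freqs = []
    successes = 0
    for n in range(1, n_trials + 1):
        if rng.random() < p:  # the event occurs on this trial
            successes += 1
        freqs.append(successes / n)
    return freqs

freqs = relative_frequencies(0.4, 200, seed=1)
print(f"after 10 trials: {freqs[9]:.3f}, after 200 trials: {freqs[-1]:.3f}")
```

With a different seed the early values fluctuate differently, but over many trials the relative frequency settles near 0.4.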

(b) Symmetry [e.g. P (fair six-sided die lands showing a 6)]

Finite number of equally likely outcomes.

In this case

$$P(A) = \frac{\text{number of outcomes favourable to } A}{\text{number of outcomes possible}}$$

Evaluating probabilities in practice

The key to evaluating probabilities in this case is to define a sample space in a convenient way and then to count the numbers of outcomes corresponding to various events.

For example, consider a throw of two fair six-sided dice, one red and the other blue. Let A be the event ‘score = 7 or 8’. Let S = {(i,j) : i = 1,2,3,4,5,6; j = 1,2,3,4,5,6} where i and j are the scores on the red and blue die respectively. S consists of 36 elements (equally-likely outcomes).

A = {(1,6),(2,5),(3,4),(4,3),(5,2),(6,1),(2,6),(3,5),(4,4),(5,3),(6,2)}

A consists of 11 elements, so P(A) = 11/36 = 0.3056.
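The count of favourable outcomes can be checked by listing the sample space explicitly. This Python sketch builds S as the 36 ordered pairs (red score, blue score) and counts the pairs in A; exact fractions avoid any rounding until the final decimal.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (i, j) of scores on the red and blue dice
S = list(product(range(1, 7), repeat=2))
A = [(i, j) for (i, j) in S if i + j in (7, 8)]  # event 'score = 7 or 8'

# Equally likely outcomes: P(A) = favourable / possible
p_A = Fraction(len(A), len(S))
print(len(S), len(A), p_A)  # 36 11 11/36
```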

(c) Uniform distribution over an interval

In the case that the sample space is an interval on the real line and the event occurs ‘at a random point in the interval’ we adopt a uniform distribution of probability over the interval so that the probabilities of the event occurring in sub-intervals of the same length are equal.

For example, suppose we ‘choose a time at random’ between 14:00 and 15:00. Then P(selected time is before 14:30) = 30/60 = 0.5

P(selected time is between 14:20 and 14:45) = 25/60 = 0.4167

P(selected time is before 14:10 or after 14:50) = 20/60 = 0.3333
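Under the uniform model each probability is the total length of the relevant sub-intervals divided by the length of the whole interval. A small Python sketch, measuring time in minutes after 14:00; the helper `uniform_prob` is hypothetical and assumes the sub-intervals are disjoint and lie inside [0, 60].

```python
from fractions import Fraction

def uniform_prob(intervals, start=0, end=60):
    """P(a uniformly chosen time lies in the union of the given disjoint
    sub-intervals), with probability proportional to length."""
    total = sum(b - a for a, b in intervals)
    return Fraction(total, end - start)

print(uniform_prob([(0, 30)]))            # before 14:30 -> 1/2
print(uniform_prob([(20, 45)]))           # between 14:20 and 14:45 -> 5/12
print(uniform_prob([(0, 10), (50, 60)]))  # before 14:10 or after 14:50 -> 1/3
```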

We introduce the concept of the probability that an event occurs, conditional on another specified event occurring (or, in other language, given that another specified event occurs).

For example, consider the event that in a throw of a fair six-sided die we score 6, conditional on scoring more than 2. The event ‘scoring more than 2’ corresponds to the 4 equally-likely outcomes {3, 4, 5, 6} and of these only 1 outcome corresponds to ‘score of 6’, so the probability required is 1/4. Imposing the condition has effectively reduced/restricted the sample space from {1,2,3,4,5,6} to {3,4,5,6}.

Conditional probability

Note that the conditional probability can be expressed as the ratio of two unconditional probabilities of events defined in terms of the original sample space of size 6, as $\frac{1}{4} = \frac{1/6}{4/6}$.

Again, consider a throw of two fair six-sided dice, one red and the other blue. Let A be the event ‘score = 7 or 8’ and let B be the event ‘score = 8, 9 or 10’.

Let S = {(i,j) : i = 1,2,3,4,5,6; j = 1,2,3,4,5,6} where i and j are the scores on the red and blue dice respectively.

A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}

B = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2), (3, 6), (4, 5), (5, 4), (6, 3), (4, 6), (5, 5), (6, 4)}

P (A) = 11/36, P (B) = 12/36

A ∩ B is the event ‘score of 8’ and P(A ∩ B) = 5/36


Consider the event A conditional on B, that is ‘a score of 7 or 8 given that the score is 8, 9, or 10’.

The outcomes in B favourable to A are (2, 6), (3, 5), (4, 4), (5, 3), (6, 2), so the probability of event A conditional on B is 5/12. This probability is

$$\frac{5}{12} = \frac{5/36}{12/36} = \frac{P(A \cap B)}{P(B)}.$$

This motivates the general definition:

The probability of event A conditional on event B is denoted P(A | B) and is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{for } P(B) \neq 0.$$
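The definition can be checked on the two-dice example by enumeration. The sketch below computes P(A | B) both from the ratio P(A ∩ B)/P(B) and by counting within the reduced sample space B; `prob` and `cond_prob` are illustrative helper names.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))     # (red, blue) scores, equally likely
A = {s for s in S if sum(s) in (7, 8)}       # 'score = 7 or 8'
B = {s for s in S if sum(s) in (8, 9, 10)}   # 'score = 8, 9 or 10'

def prob(event):
    return Fraction(len(event), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b), defined only when P(b) != 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)

print(cond_prob(A, B))               # 5/12
print(Fraction(len(A & B), len(B)))  # 5/12, counting within B only
```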

The multiplication rule for probabilities follows, namely P (A ∩ B) = P (A)P (B | A).

For example, suppose we draw two balls at random, one after the other and without replacement, from a bag containing 6 red and 4 blue balls. Let A = ‘first ball drawn is red’ and let B = ‘second ball drawn is blue’. Then

P(1st ball drawn is red and 2nd ball drawn is blue) = P(A ∩ B) = P(A)P(B | A) = (6/10) × (4/9) = 4/15 = 0.2667.

2.6 Independent events and independent trials

Events A and B are independent ⇔ P(A ∩ B) = P(A)P(B)

[⇔ P(A | B) = P(A) and P(B | A) = P(B), provided P(A) ≠ 0 and P(B) ≠ 0]

So events A and B are independent if and only if the occurrence of one does not affect the probability of occurrence of the other.

Events A1, A2, …, Ak are independent if and only if the probability of the intersection of any 2, 3, . . . , k of the events equals the product of their respective probabilities. So, for three events A, B, C to be independent, we require P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), P(B ∩ C) = P(B)P(C) and P(A ∩ B ∩ C) = P(A)P(B)P(C).

A trial is a single repetition of a random experiment.

T1 and T2 are independent trials ⇔ all events defined on the outcome of T1 are independent of all events defined on the outcome of T2.

2.7 Partitioning of an event

Let {E1, E2, …, Ek} be a partition of S (that is, the Ei are mutually exclusive and E1 ∪ E2 ∪ … ∪ Ek = S) and let A be an event.

Then A = A ∩ S = A ∩ (∪i Ei) = ∪i (A ∩ Ei) and

$$P(A) = \sum_{i=1}^{k} P(A \cap E_i) = \sum_{i=1}^{k} P(E_i)\,P(A \mid E_i).$$

The event A has been partitioned into events A ∩ Ei, i = 1, 2, . . . , k. For example, with k = 4:

[Figure: event A partitioned into A ∩ E1, A ∩ E2, A ∩ E3, A ∩ E4]

P(A) is the sum of the probabilities of the events which make up the partition of A.
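The partition formula is a weighted sum of conditional probabilities. As an illustration, the sketch below applies it to the bag of 6 red and 4 blue balls from the multiplication-rule example, partitioning on the colour of the first ball to find P(second ball drawn is blue). This particular calculation is an added illustration rather than one worked in the notes, and `total_probability` is a hypothetical helper.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum_i P(E_i) P(A | E_i) over a partition {E_1, ..., E_k} of S."""
    return sum(p * l for p, l in zip(priors, likelihoods))

# Partition on the first ball: E_1 = 'first ball red', E_2 = 'first ball blue'
priors = [Fraction(6, 10), Fraction(4, 10)]
# P(second ball blue | first red), P(second ball blue | first blue)
likelihoods = [Fraction(4, 9), Fraction(3, 9)]
print(total_probability(priors, likelihoods))  # 2/5
```

The answer 2/5 equals P(first ball blue), which mirrors the observation P(W2) = P(W1) in Worked Example 2.5.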

2.8 Bayes theorem

Let {E1, E2, …, Ek} be a partition of S and let A be an event with P(A) ≠ 0. Then

$$P(E_i \mid A) = \frac{P(E_i \cap A)}{P(A)} = \frac{P(E_i)\,P(A \mid E_i)}{P(A)} = \frac{P(E_i)\,P(A \mid E_i)}{\sum_{j=1}^{k} P(E_j)\,P(A \mid E_j)}, \quad i = 1, 2, \dots, k.$$

The result is often written in proportional terms:

P(Ei | A) ∝ P(Ei)P(A | Ei), i = 1, 2, …, k.

The probabilities P(Ei), i = 1, 2, . . . , k, are the prior probabilities; the probabilities P(Ei | A), i = 1, 2, . . . , k, are the posterior probabilities.

For example, suppose a population is made up of 60% men and 40% women. The percentages of men and women in the population who have an iPod are 30% and 40% respectively. A person is selected at random from the population and is found to have an iPod. What is the probability that the selected person is male?

[Figure: tree diagram]

$$P(\text{male} \mid \text{has iPod}) = \frac{0.6 \times 0.3}{0.6 \times 0.3 + 0.4 \times 0.4} = \frac{0.18}{0.18 + 0.16} = \frac{0.18}{0.34} = 0.5294$$
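In the proportional form of Bayes theorem, each posterior is the product prior × likelihood, normalised so the posteriors sum to 1. A minimal Python sketch for the iPod example; `posterior` is an illustrative helper name.

```python
def posterior(priors, likelihoods):
    """Bayes theorem: P(E_i | A) ∝ P(E_i) P(A | E_i), normalised over the partition."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_A = sum(joint)  # P(A) by the partition (total probability) formula
    return [j / p_A for j in joint]

# Partition: male, female. Event A: 'has an iPod'.
post = posterior([0.6, 0.4], [0.3, 0.4])
print([round(x, 4) for x in post])  # [0.5294, 0.4706]
```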

2.9 Worked examples

2.1 In a family with five children, what is the probability that all the children are of the same sex? What is the probability that the three oldest children are boys and the two youngest are girls? What is the probability that the three oldest children are boys?

A suitable sample space is S = {(x1, x2, x3, x4, x5) : xi = M, F; i = 1, 2, 3, 4, 5}, where xi is the sex of the ith oldest child. Size of sample space: n(S) = $2^5$ = 32.

Two outcomes are favourable to the event ‘all are same sex’, namely (M,M,M,M,M) and (F,F,F,F,F). One outcome is favourable to the event ‘three oldest are boys and two youngest are girls’, namely (M, M, M, F, F). $2^2$ = 4 outcomes are favourable to the event ‘three oldest are boys’, namely any one of the form (M, M, M, ·, ·).

If we make the assumption that all 32 outcomes are equally likely (under what conditions is this a reasonable assumption?) then

P(all same sex) = 2/32 = 0.0625

P(three oldest are boys and two youngest are girls) = 1/32 = 0.0313

P(three oldest are boys) = 4/32 = 0.125.
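Under the equally-likely assumption, all three answers can be checked by listing the 32 outcomes. This Python sketch represents each outcome as a tuple of "M"/"F" from oldest to youngest; the `prob` helper is illustrative.

```python
from fractions import Fraction
from itertools import product

# Outcomes (x1, ..., x5), xi = sex of the ith oldest child, assumed equally likely
S = list(product("MF", repeat=5))

def prob(predicate):
    return Fraction(sum(1 for s in S if predicate(s)), len(S))

p_same = prob(lambda s: len(set(s)) == 1)
p_mmmff = prob(lambda s: s == ("M", "M", "M", "F", "F"))
p_three_oldest_boys = prob(lambda s: s[:3] == ("M", "M", "M"))
print(len(S), p_same, p_mmmff, p_three_oldest_boys)  # 32 1/16 1/32 1/8
```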

2.2 In how many ways can we choose 6 numbers from a group of 59 for a line in the UK National Lottery?

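Solution:

$$\binom{59}{6} = \frac{59!}{6!\,53!} = \frac{59 \times 58 \times 57 \times 56 \times 55 \times 54}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = 45{,}057{,}474$$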

Note: This leads to (approximately) ‘a chance of 1 in 45 million’ of winning the jackpot.
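The binomial coefficient can be checked directly in Python (3.8 or later, where `math.comb` and `math.prod` are available), both from the library function and from the product formula above.

```python
import math

n_lines = math.comb(59, 6)  # ways to choose 6 numbers from 59, order ignored
# Same value from the product formula: (59 × 58 × ... × 54) / 6!
via_product = math.prod(range(54, 60)) // math.factorial(6)
print(n_lines, via_product)  # 45057474 45057474
print(1 / n_lines)           # chance that a single line matches all six numbers
```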

2.3 A fair die is thrown 4 times. Find P(total score is 4 or 24).

Solution: S = {(x1, x2, x3, x4) : xi = 1, 2, 3, 4, 5, 6; i = 1, 2, 3, 4}, with n(S) = $6^4$ = 1296.

Two outcomes are favourable to the event ‘total score is 4 or 24’, namely (1, 1, 1, 1) and (6, 6, 6, 6). So P(score 4 or 24) = 2/1296 = 0.0015.

[Note: We are effectively taking a random sample of size 4 with replacement from the population {1, 2, 3, 4, 5, 6}.]
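Sampling with replacement corresponds to `itertools.product` with `repeat=4`, so the sample space and the favourable outcomes can be listed directly in this short Python sketch.

```python
from fractions import Fraction
from itertools import product

# Ordered outcomes of 4 throws: samples of size 4 with replacement from {1, ..., 6}
S = list(product(range(1, 7), repeat=4))
favourable = [s for s in S if sum(s) in (4, 24)]
p = Fraction(len(favourable), len(S))
print(len(S), favourable, p, round(float(p), 4))  # 1296 [(1, 1, 1, 1), (6, 6, 6, 6)] 1/648 0.0015
```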

2.4 A committee consists of 7 men and 4 women and a sub-committee of 6 is to be chosen at random. Find the probabilities that the sub-committee contains exactly k women, k = 0, 1, 2, 3, 4.

Number of ways of choosing the sub-committee = number of different selections of 6 people from 11 = $\binom{11}{6}$

Number of different selections of k women from 4 = $\binom{4}{k}$

Number of different selections of (6 − k) men from 7 = $\binom{7}{6-k}$

Number of ways of choosing a sub-committee containing k women = $\binom{4}{k}\binom{7}{6-k}$

$$\therefore\ P(\text{sub-committee contains exactly } k \text{ women}) = \frac{\binom{4}{k}\binom{7}{6-k}}{\binom{11}{6}}, \quad k = 0, 1, 2, 3, 4.$$

Substituting in values of k, P(sub-committee contains exactly k women) is

k = 0: 0.01515
k = 1: 0.18182
k = 2: 0.45455
k = 3: 0.30303
k = 4: 0.04545

2.5 A bag contains 4 white and 3 red balls. Two balls are drawn out at random without replacement. What is the probability that they are white and red respectively? What is the probability that the second ball drawn is white?

Formally: let Wi be the event ‘ith ball drawn is white’ and Ri the event ‘ith ball drawn is red’. Then

P(W1 ∩ R2) = P(W1)P(R2 | W1) = (4/7) × (3/6) = 2/7.

Noting that W1, R1 are mutually exclusive and exhaustive, we partition W2 as

W2 = W2 ∩ (W1 ∪ R1) = (W2 ∩ W1) ∪ (W2 ∩ R1).

Then P(W2) = P((W2 ∩ W1) ∪ (W2 ∩ R1)) = P(W2 ∩ W1) + P(W2 ∩ R1) = P(W1)P(W2 | W1) + P(R1)P(W2 | R1)

= (4/7) × (3/6) + (3/7) × (4/6) = 2/7 + 2/7 = 4/7 = 0.5714.

It is very easy to sort out all possibilities with a ‘tree diagram’:

[Figure: tree diagram]

Note that P(W2) = P(W1).
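Both Worked Examples 2.4 and 2.5 can be checked by exact counting. The Python sketch below evaluates the committee formula with `math.comb`, then lists all ordered draws of two distinct balls from the bag of 4 white and 3 red; `p_k_women` and the positional labelling of the balls are illustrative choices.

```python
import math
from fractions import Fraction
from itertools import permutations

# Example 2.4: P(exactly k women) = C(4, k) C(7, 6 - k) / C(11, 6)
def p_k_women(k, women=4, men=7, size=6):
    return Fraction(math.comb(women, k) * math.comb(men, size - k),
                    math.comb(women + men, size))

for k in range(5):
    print(k, round(float(p_k_women(k)), 5))

# Example 2.5: ordered pairs (first, second) of distinct ball positions
balls = ["W"] * 4 + ["R"] * 3
draws = list(permutations(range(len(balls)), 2))
p_w1_r2 = Fraction(sum(1 for i, j in draws if balls[i] == "W" and balls[j] == "R"), len(draws))
p_w2 = Fraction(sum(1 for i, j in draws if balls[j] == "W"), len(draws))
print(p_w1_r2, p_w2)  # 2/7 4/7
```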

2.6 Andrew and Brian play a round of golf. The probability that Andrew (Brian) gets a 4 at the first hole is 0.3(0.6). Assuming independence, find the probability that at least one of them gets a 4 at the first hole.

Let A(B) be the event ‘Andrew (Brian) gets a 4’

Method 1: P(A∪B) = P(A)+P(B)−P(A∩B) = P(A)+P(B)−P(A)P(B) (by independence) = 0.3 + 0.6 − 0.18 = 0.72.

Method 2: P(neither gets a 4) = P(A′ ∩ B′) = P(A′)P(B′) = 0.7 × 0.4 = 0.28, so P (at least one gets a 4) = 1 − 0.28 = 0.72.
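The two methods are algebraically identical, since 1 − (1 − a)(1 − b) = a + b − ab. A short Python check with the given probabilities (floating-point results are rounded for display):

```python
p_a, p_b = 0.3, 0.6  # P(Andrew gets a 4), P(Brian gets a 4)

# Method 1: addition rule, with P(A ∩ B) = P(A)P(B) by independence
method1 = p_a + p_b - p_a * p_b
# Method 2: complement of 'neither gets a 4'
method2 = 1 - (1 - p_a) * (1 - p_b)
print(round(method1, 2), round(method2, 2))  # 0.72 0.72
```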

2.7 Sampling inspection

Regular sampling of mass-produced items is carried out by taking random samples of 8 items from the production line. Each selected item is tested to find out if it is defective. Assuming independence from item to item, and assuming that 10% of the production is defective, we can adopt a model with i.i.d. trials and with P (selected item is defective) = 0.1.

$$\therefore\ P(\text{sample contains 2 defective items}) = \binom{8}{2} \times 0.1^2 \times 0.9^6 = 0.1488$$

Note: Our production of items is finite. Strictly speaking, as we sample items one after another the successive trials are not independent: the outcome of one trial conditions the probabilities associated with all later trials.

e.g. P(2nd item defective | 1st item OK) ≠ P(2nd item defective)

and P(1st item OK and 2nd item defective) ≠ P(1st item OK)P(2nd item defective)

However, if the population of items is large (as in the above illustration) and the sample of moderate size, it is reasonable to adopt a model of independent trials with constant probability of an item being defective i.e. a model of i.i.d. trials. We are in effect assuming an unchanging population as an approximation to a population which is actually changing slightly from trial to trial — we are using the theory of sampling with replacement to approximate the real situation, which is sampling without replacement.
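To see how good the i.i.d. approximation is, one can compare the binomial probability with the exact probability for sampling without replacement from a finite population (the hypergeometric probability). The population of 1000 items with 100 defectives below is a hypothetical choice, made only to match the 10% defective rate; the helper names are illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """P(x defectives in n i.i.d. trials, each defective with probability p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def hypergeometric_pmf(x, n, N, K):
    """P(x defectives in a sample of n drawn without replacement from
    N items of which K are defective)."""
    return math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n)

print(round(binomial_pmf(2, 8, 0.1), 4))  # 0.1488
# Hypothetical finite population: 1000 items, 100 of them defective
print(round(hypergeometric_pmf(2, 8, 1000, 100), 4))
```

For a population this large relative to the sample of 8, the two probabilities differ only in the fourth decimal place.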

2.8 Players A and B throw a regular 6-sided die in turn. The first to throw a 6 wins the game. A throws first. Find the probability that A wins the game.

We can represent the sample space as the countable union of the events Ek, where Ek = game ends on the kth throw of the die, k = 1,2,3,…

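E1: outcome 6, final throw by A, probability 1/6

E2: outcome 6′6, final throw by B, probability (5/6)(1/6)

E3: outcome 6′6′6, final throw by A, probability $(5/6)^2(1/6)$

E4: outcome 6′6′6′6, final throw by B, probability $(5/6)^3(1/6)$

E5: outcome 6′6′6′6′6, final throw by A, probability $(5/6)^4(1/6)$

. . .

where 6′ denotes ‘not a 6’.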

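$$\begin{aligned}
P(A \text{ wins}) &= P(E_1 \cup E_3 \cup E_5 \cup \cdots) \\
&= P(E_1) + P(E_3) + P(E_5) + \cdots \\
&= \tfrac{1}{6} + \left(\tfrac{5}{6}\right)^2 \tfrac{1}{6} + \left(\tfrac{5}{6}\right)^4 \tfrac{1}{6} + \cdots \\
&= \tfrac{1}{6}\left[1 + \tfrac{25}{36} + \left(\tfrac{25}{36}\right)^2 + \cdots\right] \\
&= \tfrac{1}{6}\left(1 - \tfrac{25}{36}\right)^{-1} = \tfrac{1}{6} \times \tfrac{36}{11} = \tfrac{6}{11} = 0.5455.
\end{aligned}$$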

Note: the advantage, of course, lies with the player who throws first.

OR: Let p = P(A wins)

‘A wins’ = ‘A wins on first throw’ ∪ ‘die passes to B and B does not win the game’

After the die passes to B, B is then in exactly the same position as A was at the start of the game, so p = 1/6 + (5/6)(1 − p), which gives p = 6/11.
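Both arguments can be checked in Python: the geometric series is summed exactly with `Fraction`, the equation p = 1/6 + (5/6)(1 − p) is verified, and a simple simulation of the game gives an estimate near 6/11. The function name `simulate_a_wins` and the seed are illustrative.

```python
import random
from fractions import Fraction

# Exact: P(A wins) = (1/6) * sum_{m >= 0} (25/36)^m = (1/6) / (1 - 25/36)
p_exact = Fraction(1, 6) / (1 - Fraction(25, 36))
# Check the second argument: p = 1/6 + (5/6)(1 - p)
assert p_exact == Fraction(1, 6) + Fraction(5, 6) * (1 - p_exact)

def simulate_a_wins(n_games, seed=None):
    """Estimate P(A wins) by playing n_games; A throws first, first 6 wins."""
    rng = random.Random(seed)
    a_wins = 0
    for _ in range(n_games):
        non_sixes = 0
        while rng.randint(1, 6) != 6:  # keep throwing until someone gets a 6
            non_sixes += 1
        # Winning throw has 0-based index non_sixes; even indices (1st, 3rd, ...) are A's
        if non_sixes % 2 == 0:
            a_wins += 1
    return a_wins / n_games

print(p_exact)                             # 6/11
print(simulate_a_wins(100_000, seed=1))   # close to 0.5455
```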

2.9 A stick of length 12 cm is broken into two pieces at a point chosen at random along its length. What is the probability that the rectangle which can be constructed using the pieces as two adjacent sides has an area less than 27 cm²?

Lay the stick down and let x be the distance of the break point from the left end of the stick. Then

S = {x : 0 < x < 12}
The probability measure is length.
Area of rectangle = x(12 − x), which is less than 27 ⇔ x² − 12x + 27 > 0 ⇔ (x − 3)(x − 9) > 0 ⇔ x < 3 or x > 9. The event ‘x < 3 or x > 9’ has total length 3 + 3 = 6, so its probability is 6/12 = 0.5.
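The length calculation can be reproduced by finding the roots of x² − 12x + 27 = 0 with the quadratic formula, and checked against a Monte Carlo estimate with a uniformly chosen break point. A Python sketch; the constant and function names are illustrative.

```python
import math
import random

LENGTH, THRESHOLD = 12.0, 27.0
# x(LENGTH - x) < THRESHOLD  <=>  x^2 - LENGTH x + THRESHOLD > 0  <=>  x < r1 or x > r2
disc = math.sqrt(LENGTH**2 - 4 * THRESHOLD)
r1, r2 = (LENGTH - disc) / 2, (LENGTH + disc) / 2
p_exact = (r1 + (LENGTH - r2)) / LENGTH  # length of the event / length of S

def simulate(n, seed=None):
    """Monte Carlo estimate: break point uniform on (0, LENGTH)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(0, LENGTH)
        if x * (LENGTH - x) < THRESHOLD:
            hits += 1
    return hits / n

print(r1, r2, p_exact)            # 3.0 9.0 0.5
print(simulate(100_000, seed=1))  # close to 0.5
```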

2.10 Two students are each, independently of the other, equally likely to arrive for a 10:15 lecture at any time between 10:15 and 10:30. Find the probability that their arrivals are separated by at least 10 minutes.

Call the students A and B and let x, y be their respective arrival times in minutes after 10:15. Then

S = {(x, y) : 0 < x < 15, 0 < y < 15}
The probability measure is area.
The region in which the students arrive more than 10 minutes apart is {(x, y) ∈ S : |x − y| > 10}. It consists of two triangles, each with two perpendicular sides of length 5, so its area is 2 × (1/2) × 5 × 5 = 25 (draw a diagram to see this); the area of S is 15² = 225.

Required probability = 25/225 = 1/9 = 0.1111.
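The area argument can be checked exactly and by simulation. This Python sketch computes the two-triangle area as a fraction of the square, then estimates P(|X − Y| > 10) for independent uniform arrival times; the constant and function names are illustrative.

```python
import random
from fractions import Fraction

WIDTH, GAP = 15, 10  # arrival window and required separation, in minutes

# Exact: two corner triangles, each with perpendicular sides WIDTH - GAP
triangle_area = Fraction((WIDTH - GAP) ** 2, 2)
exact = 2 * triangle_area / WIDTH**2

def simulate_gap(n, seed=None):
    """Monte Carlo estimate of P(|X - Y| > GAP), X, Y independent uniform on (0, WIDTH)."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n)
        if abs(rng.uniform(0, WIDTH) - rng.uniform(0, WIDTH)) > GAP
    )
    return hits / n

print(exact)                            # 1/9
print(simulate_gap(200_000, seed=3))    # close to 0.1111
```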