# Bayesian Statistics: Statistics 4224/5224, Spring 2021


Assignment 3

Reading:

By Tuesday, February 16, read Chapters 1–3, Sections 4.1–4.2, Chapters 5–6 and Section 7.4 of Bayesian Data Analysis, by Andrew Gelman et al.

For Thursday, February 18, read Chapters 10–11 and Section 13.1 of Gelman et al.

Homework 3:

The following problems are nominally due before class on Tuesday, February 23. Homework submissions will be accepted on Courseworks, without penalty, through the end of the day on Wednesday, February 24, after which no late homework will be accepted.

1. The data in the table below are the result of a survey of commuters in J = 10 counties likely to be affected by the proposed addition of a high occupancy vehicle (HOV) lane.

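| County, $j$ | Approve, $y_j$ | Disapprove, $n_j - y_j$ |
|---|---|---|
| 1 | 12 | 50 |
| 2 | 90 | 150 |
| 3 | 80 | 63 |
| 4 | 5 | 10 |
| 5 | 63 | 63 |
| 6 | 15 | 8 |
| 7 | 67 | 56 |
| 8 | 22 | 19 |
| 9 | 56 | 63 |
| 10 | 33 | 19 |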

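Letting $\theta_j$ be the proportion of commuters in county $j$ that approve of the HOV lane, assume

$$y_j \mid \theta \sim \text{indep Binomial}(n_j, \theta_j)$$

with the prior and hyperprior defined by

$$\theta_1, \dots, \theta_J \mid \alpha, \beta \sim \text{iid Beta}(\alpha, \beta), \quad \text{where } p(\alpha, \beta) \propto (\alpha + \beta)^{-5/2}.$$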

(a) Simulate at least 1000 Monte Carlo samples from the posterior distributions of $\theta_1, \dots, \theta_J$. Make a plot of posterior medians and 90% credible intervals versus the sample proportions $y_j / n_j$ for $j = 1, \dots, J$.


(b) Approximate the posterior predictive distribution of $\tilde{y}_{10}$, the number of approvals out of 20 respondents in a follow-up survey taken in county 10. Summarize with a histogram and 90% prediction interval.

(c) Approximate the posterior distribution of $\tilde{\theta}_{11}$, the true approval rate in the next (11th) county from which a survey will be taken. Summarize with a histogram and 90% credible interval.

(d) Approximate the posterior predictive distribution of $\tilde{y}_{11}$, the number of approvals out of 20 respondents in a survey from the next (11th) county. Summarize with a histogram and 90% prediction interval.
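One possible way to approach Problem 1 is a grid approximation to the marginal posterior of the hyperparameters, followed by conjugate Beta draws for each county. The sketch below is illustrative only and not a reference solution. It assumes the reparameterization $u = \log(\alpha/\beta)$, $v = \log(\alpha+\beta)$ (whose Jacobian contributes the factor $\alpha\beta$), and the grid limits, grid resolution, seed, number of draws and the `quantile` helper are all arbitrary choices made for this example. It uses only the Python standard library and leaves the plots and histograms out.

```python
import math
import random

random.seed(4224)  # arbitrary seed, only for reproducibility

# Survey data from the table: approvals y_j and disapprovals n_j - y_j.
approve = [12, 90, 80, 5, 63, 15, 67, 22, 56, 33]
disapprove = [50, 150, 63, 10, 63, 8, 56, 19, 63, 19]

def log_post(u, v):
    """Unnormalized log posterior at u = log(alpha/beta), v = log(alpha+beta)."""
    alpha = math.exp(v) / (1.0 + math.exp(-u))
    beta = math.exp(v) - alpha
    # Hyperprior (alpha+beta)^(-5/2) times the Jacobian alpha*beta.
    lp = -2.5 * v + math.log(alpha) + math.log(beta)
    log_b0 = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    # Beta-binomial marginal likelihood of each county, theta_j integrated out.
    for y, m in zip(approve, disapprove):
        lp += (math.lgamma(alpha + y) + math.lgamma(beta + m)
               - math.lgamma(alpha + beta + y + m) - log_b0)
    return lp

# ASSUMED grid limits and resolution; widen them if mass piles up at an edge.
U_LO, U_HI, V_LO, V_HI, G = -2.0, 2.0, -1.0, 7.0, 80
du, dv = (U_HI - U_LO) / G, (V_HI - V_LO) / G
cells = [(U_LO + (i + 0.5) * du, V_LO + (k + 0.5) * dv)
         for i in range(G) for k in range(G)]
logp = [log_post(u, v) for u, v in cells]
top = max(logp)
weights = [math.exp(lp - top) for lp in logp]

def quantile(xs, q):
    """Crude empirical quantile: the order statistic at position q."""
    s = sorted(xs)
    return s[min(len(s) - 1, int(q * len(s)))]

S = 2000
theta = [[] for _ in approve]   # theta[j] holds the draws for county j + 1
theta_new, y_new = [], []       # draws for a new (11th) county
for u, v in random.choices(cells, weights=weights, k=S):
    u += random.uniform(-du / 2, du / 2)   # jitter within the grid cell
    v += random.uniform(-dv / 2, dv / 2)
    alpha = math.exp(v) / (1.0 + math.exp(-u))
    beta = math.exp(v) - alpha
    for j, (y, m) in enumerate(zip(approve, disapprove)):
        theta[j].append(random.betavariate(alpha + y, beta + m))
    t = random.betavariate(alpha, beta)    # approval rate in a new county
    theta_new.append(t)
    y_new.append(sum(random.random() < t for _ in range(20)))

# County number, then 5%, 50% and 95% posterior quantiles of theta_j.
for j, draws in enumerate(theta):
    print(j + 1, [round(quantile(draws, q), 3) for q in (0.05, 0.5, 0.95)])
```

The lists `theta_new` and `y_new` correspond to parts (c) and (d). Part (b) could reuse the draws in `theta[9]` in the same way, simulating 20 Bernoulli trials per draw.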

2. The following table summarizes data on the amount of time students from J = 8 high schools spent on studying or homework during an exam period.

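| School, $j$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| No. of students, $n_j$ | 25 | 23 | 20 | 24 | 24 | 22 | 22 | 20 |
| Avg hours, $\bar{y}_j$ | 9.5 | 7.0 | 8.0 | 6.2 | 10.8 | 6.2 | 6.1 | 7.4 |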

Analyze these data using the following hierarchical normal model. Letting $y_{ij}$ be the number of hours spent studying by the $i$th student at the $j$th school, assume

$$y_{ij} \mid \theta \sim \text{indep Normal}(\theta_j, \sigma^2) \quad \text{for } i = 1, \dots, n_j \text{ and } j = 1, \dots, J,$$

where

$$\theta_1, \dots, \theta_J \mid \mu, \tau \sim \text{iid Normal}(\mu, \tau^2).$$

Assume the variance is known to be $\sigma^2 = 14.3$, and assign independent uniform priors to the hyperparameters, $p(\mu, \tau) \propto 1$. Compute or approximate the following:

(a) 95% posterior intervals for the mean hours spent studying at each school;

(b) the posterior probability that $\theta_7$ is smaller than $\theta_6$, as well as the posterior probability that $\theta_7$ is the smallest of the $\theta$’s;

(c) the posterior probability that $\tilde{y}_7 < \tilde{y}_6$, as well as the posterior probability that $\tilde{y}_7$ is the smallest of all the $\tilde{y}$’s, where $\tilde{y}_j$ is the number of hours spent studying for a randomly selected student from school $j$.

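For Problem 2, one standard factorization is $p(\tau \mid y)\, p(\mu \mid \tau, y)\, \prod_j p(\theta_j \mid \mu, \tau, y)$, where the last two factors are normal and the first can be evaluated on a grid. The following is a minimal sketch under that factorization, not a definitive solution. The grid range for $\tau$, the number of grid points, the seed, the number of draws and the helper names are assumptions made for illustration.

```python
import math
import random

random.seed(5224)  # arbitrary seed, only for reproducibility

# Data from the table: students per school and average hours studied.
n = [25, 23, 20, 24, 24, 22, 22, 20]
ybar = [9.5, 7.0, 8.0, 6.2, 10.8, 6.2, 6.1, 7.4]
SIGMA2 = 14.3                      # known within-school variance
se2 = [SIGMA2 / nj for nj in n]    # sampling variance of each school mean

def mu_given_tau(tau):
    """Mean and variance of the normal distribution p(mu | tau, y)."""
    prec = [1.0 / (s + tau * tau) for s in se2]
    v_mu = 1.0 / sum(prec)
    return v_mu * sum(p * y for p, y in zip(prec, ybar)), v_mu

def log_post_tau(tau):
    """Unnormalized log p(tau | y) under the flat prior on (mu, tau)."""
    mu_hat, v_mu = mu_given_tau(tau)
    lp = 0.5 * math.log(v_mu)
    for s, y in zip(se2, ybar):
        tot = s + tau * tau
        lp -= 0.5 * math.log(tot) + 0.5 * (y - mu_hat) ** 2 / tot
    return lp

# ASSUMED grid for tau: G cell midpoints covering (0, TAU_MAX).
TAU_MAX, G = 10.0, 400
step = TAU_MAX / G
grid = [(k + 0.5) * step for k in range(G)]
logp = [log_post_tau(t) for t in grid]
top = max(logp)
weights = [math.exp(lp - top) for lp in logp]

S = 5000
theta_draws, ytilde_draws = [], []
for tau in random.choices(grid, weights=weights, k=S):
    tau += random.uniform(-step / 2, step / 2)   # jitter within the cell
    mu_hat, v_mu = mu_given_tau(tau)
    mu = random.gauss(mu_hat, math.sqrt(v_mu))
    t2 = tau * tau
    # theta_j | mu, tau, y is normal with a precision-weighted mean.
    th = [random.gauss((y * t2 + mu * s) / (t2 + s), math.sqrt(s * t2 / (t2 + s)))
          for s, y in zip(se2, ybar)]
    theta_draws.append(th)
    # One new student per school: y-tilde_j ~ Normal(theta_j, sigma^2).
    ytilde_draws.append([random.gauss(t, math.sqrt(SIGMA2)) for t in th])

def quantile(xs, q):
    """Crude empirical quantile: the order statistic at position q."""
    s = sorted(xs)
    return s[min(len(s) - 1, int(q * len(s)))]

intervals = [(quantile([d[j] for d in theta_draws], 0.025),
              quantile([d[j] for d in theta_draws], 0.975)) for j in range(len(n))]
p_7_lt_6 = sum(d[6] < d[5] for d in theta_draws) / S
p_7_min = sum(d[6] == min(d) for d in theta_draws) / S
p_y7_lt_y6 = sum(d[6] < d[5] for d in ytilde_draws) / S
p_y7_min = sum(d[6] == min(d) for d in ytilde_draws) / S
print(intervals)
print(p_7_lt_6, p_7_min, p_y7_lt_y6, p_y7_min)
```

The shrinkage mean is written as `(y * t2 + mu * s) / (t2 + s)` rather than with reciprocals of $\tau^2$, so a draw of $\tau$ very close to zero does not cause a division by zero.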
3. A cancer laboratory is estimating the rate of tumorigenesis in two strains of mice, A and B. They have tumor counts for 10 mice in strain A and 13 mice in strain B. Type A mice have been well studied, and information from other laboratories suggests that type A mice have tumor counts that are approximately Poisson-distributed with mean 12. Tumor count rates for type B mice are unknown, but type B mice are related to type A mice. The observed tumor counts for the two populations are

$$y_A = (12, 9, 12, 14, 13, 13, 15, 8, 15, 6); \quad y_B = (11, 11, 10, 8, 8, 8, 7, 10, 6, 8, 8, 9, 7).$$

Assume a Poisson sampling distribution for each group and the prior distribution

$$\theta_A \sim \text{Gamma}(120, 10), \quad \theta_B \sim \text{Gamma}(12, 1), \quad p(\theta_A, \theta_B) = p(\theta_A) \times p(\theta_B).$$

(a) Find the equal-tail 95% posterior intervals for $\theta_A$ and $\theta_B$.

(b) Obtain $\Pr(\theta_A > \theta_B \mid y_A, y_B)$ via Monte Carlo sampling.

(c) Obtain $\Pr(\tilde{y}_A > \tilde{y}_B \mid y_A, y_B)$, where $\tilde{y}_A$ and $\tilde{y}_B$ are samples from the posterior predictive distribution.
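Because the Gamma prior is conjugate to the Poisson likelihood, Problem 3 reduces to sampling from two Gamma posteriors. The sketch below assumes that Gamma(a, b) denotes shape a and rate b, which is consistent with the stated prior mean of 12 for strain A (120/10). The helper names, the seed and the number of draws are illustrative, and the intervals are Monte Carlo approximations to the exact Gamma quantiles. The standard library has no Poisson sampler, so a simple one (Knuth's multiplication method) is written out here.

```python
import math
import random

random.seed(2021)  # arbitrary seed, only for reproducibility

y_a = [12, 9, 12, 14, 13, 13, 15, 8, 15, 6]
y_b = [11, 11, 10, 8, 8, 8, 7, 10, 6, 8, 8, 9, 7]

# Conjugate update, ASSUMING Gamma(a, b) means shape a and rate b:
# theta | y ~ Gamma(a + sum(y), b + n).
post_a = (120 + sum(y_a), 10 + len(y_a))   # (237, 20)
post_b = (12 + sum(y_b), 1 + len(y_b))     # (123, 14)

def draw_gamma(shape, rate):
    # random.gammavariate takes a scale parameter, hence 1 / rate.
    return random.gammavariate(shape, 1.0 / rate)

def draw_poisson(lam):
    """Knuth's multiplication method; adequate for the small means here."""
    limit, k, prod = math.exp(-lam), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

S = 10000
th_a = [draw_gamma(*post_a) for _ in range(S)]
th_b = [draw_gamma(*post_b) for _ in range(S)]
# Posterior predictive draws: one new count per posterior draw of theta.
yt_a = [draw_poisson(t) for t in th_a]
yt_b = [draw_poisson(t) for t in th_b]

def interval(xs, level=0.95):
    """Equal-tail interval from sorted Monte Carlo draws."""
    s = sorted(xs)
    lo = int((1 - level) / 2 * len(s))
    return s[lo], s[len(s) - 1 - lo]

p_theta = sum(a > b for a, b in zip(th_a, th_b)) / S
p_pred = sum(a > b for a, b in zip(yt_a, yt_b)) / S
print("theta_A 95% interval:", interval(th_a))
print("theta_B 95% interval:", interval(th_b))
print("Pr(theta_A > theta_B | y):", p_theta)
print("Pr(ytilde_A > ytilde_B | y):", p_pred)
```

The predictive probability is typically much smaller than the probability for the rates, because the Poisson sampling variability is added on top of the posterior uncertainty in $\theta_A$ and $\theta_B$.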

4. Use posterior predictive checks to investigate the adequacy of the Poisson model for the tumor count data of the previous exercise. Specifically, set $S = 1000$ and generate posterior predictive datasets $y_A^{\text{rep}\,1}, \dots, y_A^{\text{rep}\,S}$; each $y_A^{\text{rep}\,s}$ is a sample of size $n_A = 10$ from the Poisson($\theta_A^s$) distribution, where $\theta_A^s$ is itself a sample from the posterior distribution $p(\theta_A \mid y_A)$. Similarly, generate $y_B^{\text{rep}\,s}$, a sample of size $n_B = 13$ from the Poisson($\theta_B^s$) distribution, where $\theta_B^s \sim p(\theta_B \mid y_B)$.

(a) For each $s$, let $T^s$ be the sample average of the 10 values of $y_A^{\text{rep}\,s}$, divided by the sample standard deviation of $y_A^{\text{rep}\,s}$. Make a histogram of $T^s$ and compare to the observed value of this statistic. Based on this statistic, assess the fit of the Poisson model for these data.

(b) Repeat the above goodness of fit evaluation for the data in population B; that is, let $T^s$ be the sample average of the 13 values of $y_B^{\text{rep}\,s}$ divided by the sample standard deviation of $y_B^{\text{rep}\,s}$. Compare the distribution of $T^s$ to the observed value of this statistic.
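The replication scheme in Problem 4 can be sketched as follows. This is an illustrative outline rather than a model answer. It assumes the shape and rate reading of the Gamma priors from Problem 3, the function names and seed are hypothetical, and the tail proportion `p_a` / `p_b` is a numerical summary added here for convenience (the problem itself asks for a histogram, which is omitted).

```python
import math
import random
import statistics

random.seed(1000)  # arbitrary seed, only for reproducibility

y_a = [12, 9, 12, 14, 13, 13, 15, 8, 15, 6]
y_b = [11, 11, 10, 8, 8, 8, 7, 10, 6, 8, 8, 9, 7]

def draw_poisson(lam):
    """Knuth's multiplication method; adequate for the small means here."""
    limit, k, prod = math.exp(-lam), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

def t_stat(ys):
    """Sample mean divided by sample standard deviation."""
    sd = statistics.stdev(ys)
    # Guard against a replicate whose values are all equal.
    return statistics.mean(ys) / sd if sd > 0 else math.inf

def replicate_t(ys, prior_shape, prior_rate, n_rep=1000):
    """T for n_rep replicated datasets of the same size as ys."""
    # ASSUMES Gamma(shape, rate); the posterior is Gamma(a + sum(y), b + n).
    shape, rate = prior_shape + sum(ys), prior_rate + len(ys)
    out = []
    for _ in range(n_rep):
        theta = random.gammavariate(shape, 1.0 / rate)  # scale = 1 / rate
        out.append(t_stat([draw_poisson(theta) for _ in ys]))
    return out

t_rep_a = replicate_t(y_a, 120, 10)
t_rep_b = replicate_t(y_b, 12, 1)
t_obs_a, t_obs_b = t_stat(y_a), t_stat(y_b)

# Share of replicated statistics at least as large as the observed one.
p_a = sum(t >= t_obs_a for t in t_rep_a) / len(t_rep_a)
p_b = sum(t >= t_obs_b for t in t_rep_b) / len(t_rep_b)
print("A: observed T =", round(t_obs_a, 2), " tail proportion =", p_a)
print("B: observed T =", round(t_obs_b, 2), " tail proportion =", p_b)
```

A tail proportion close to 0 or 1 indicates that the observed statistic sits in the tail of its posterior predictive distribution, which is the kind of discrepancy the histogram comparison is meant to reveal.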

5. Using the “Mr. October” data from Homework 2 Problem 2 (Reggie Jackson’s $y_1 = 563$ home runs in $n_1 = 2820$ regular season games, and $y_2 = 10$ home runs in $n_2 = 27$ World Series games), compare the two models

$$H_1: \; y_1 \mid \theta \sim \text{Poisson}(n_1 \theta_0) \text{ and } y_2 \mid \theta \sim \text{Poisson}(n_2 \theta_0)$$

$$H_2: \; y_1 \mid \theta \sim \text{Poisson}(n_1 \theta_1) \text{ and } y_2 \mid \theta \sim \text{Poisson}(n_2 \theta_2)$$

based on the Bayes factor of Model $H_2$ relative to Model $H_1$. Assume a Uniform(0,1) prior for all $\theta_j$.
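With Uniform(0,1) priors, each marginal likelihood in Problem 5 is an integral of the form $\int_0^1 \theta^{a-1} e^{-n\theta}\, d\theta$. For a positive integer $a$ this equals $\Gamma(a)\, n^{-a} \Pr(K \ge a)$ with $K \sim \text{Poisson}(n)$, a standard identity for the incomplete gamma function. The sketch below uses that identity to evaluate the Bayes factor on the log scale. It is one possible route among several (numerical integration or Monte Carlo would also work), and the function names are hypothetical.

```python
import math

y1, n1 = 563, 2820   # regular season: home runs, games
y2, n2 = 10, 27      # World Series: home runs, games

def log_tail(a, lam):
    """log Pr(K >= a) for K ~ Poisson(lam) and a positive integer a."""
    below = sum(math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
                for k in range(a))
    return math.log1p(-below)

def log_unit_integral(a, n):
    """log of the integral over (0, 1) of theta^(a-1) * exp(-n * theta)."""
    return math.lgamma(a) - a * math.log(n) + log_tail(a, n)

# Factor shared by both models: n1^y1 * n2^y2 / (y1! * y2!).
const = (y1 * math.log(n1) + y2 * math.log(n2)
         - math.lgamma(y1 + 1) - math.lgamma(y2 + 1))

# H1: a single rate theta_0 shared by both samples.
log_m1 = const + log_unit_integral(y1 + y2 + 1, n1 + n2)
# H2: separate rates theta_1 and theta_2 with independent uniform priors.
log_m2 = const + log_unit_integral(y1 + 1, n1) + log_unit_integral(y2 + 1, n2)

bf_21 = math.exp(log_m2 - log_m1)
print("Bayes factor of H2 relative to H1:", bf_21)
```

Working with `math.lgamma` and logarithms avoids overflow in terms such as $2820^{563}$ and $563!$, which cannot be represented as floats.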
