CS代考计算机代写 Bayesian Hints:
Hints:
Bayesian Statistics Statistics 4224/5224 — Spring 2021
Homework 3
1. The data keyed in below are the result of a survey of commuters in J = 10 counties likely to be affected by the proposed addition of a high occupancy vehicle (HOV) lane.
> approve <- c(12, 90, 80, 5, 63, 15, 67, 22, 56, 33)
> disapprove <- c(50, 150, 63, 10, 63, 8, 56, 19, 63, 19)
> y <- approve; rm(approve);
> n <- y + disapprove; rm(disapprove);
> J <- length(y)
Letting θj be the proportion of commuters in county j that approve of the HOV lane, assume yj |θ ∼ indep Binomial(nj , θj ) with the prior and hyperprior defined by
θ1,...,θJ|α,β∼iidBeta(α,β), where p(α,β)∝(α+β)−5/2 .
(a) Simulate at least 1000 Monte Carlo samples from the posterior distributions of θ1, . . . , θJ .
Conditional on α and β, the joint posterior of θ1, . . . , θJ is given by θj |α, β, y ∼ indep Beta(α + yj , β + nj − yj ) .
We will simulate a random sample from the marginal posterior p(θ1, . . . , θJ |y) by: i. Simulate random draws (αs,βs) ∼ p(α,β|y) for s = 1,...,S; then
ii. Simulate θjs ∼ Beta(αs + yj , βs + nj − yj ) for j = 1, . . . , J for s = 1, . . . S . This approach is justified by the equality
p(θ|α, β, y) p(α, β|y) dβ dα .
Now we just need to get a handle on the marginal posterior p(α,β|y). We use the relation
p(α, β|y) = p(θ, α, β|y) ∝ p(α, β) p(θ|α, β) , p(θ|α, β, y) p(θ|α, β, y)
which can be shown to reduce to p(α, β|y) ∝ p(α, β)[B(α, β)]−J Jj=1 B(α + yj , β + nj − yj ), where B(a, b) = Γ(a)Γ(b)/Γ(a + b). We can approximate this posterior distribution on a discrete grid of α- and β-values, then simulate random draws from this posterior using the sample function in R.
Because of the skewness in this distribution, however, it works better to reparameterize, and compute grid approximation for the joint posterior of (log(α/β), log(α + β)). The contour plots both before and after this reparameterization are given here (the R code is very similar to that provided for Example05a):
p(θ|y) =
1
Contours of p(alpha,beta|y)
Contours of posterior
beta
0 10 20 30 40 50
log(alpha+beta)
0123456
0 10 20 30 40
alpha
−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5
log(alpha/beta)
Sampling from the marginal posterior of θ, p(θ|y), is accomplished by:
i. Draw a sample from the marginal posterior of log(α/β), using the discrete grid approx- imation used to produce the right-hand plot above.
ii. Draw a sample from the corresponding conditional distribution of log(α+β), again using the discrete grid approximation.
iii. Add random jitter to both of the above, to better approximate sampling from a continuous joint distribution.
iv. Back-transform those samples, to obtain αs and βs.
v. Draw a sample θs = (θ1s, . . . , θ1s0) from p(θ|αs, βs, y).
Make a plot of posterior medians and 90% credible intervals versus the sample proportions yj/nj for j = 1,...,J.
(b) Approximate the posterior distribution of y ̃10, the number of approvals out of 20 respondents in a follow-up survey taken in county 10. Summarize with a histogram and 90% prediction interval.
(c) Approximate the posterior distribution of θ ̃11, the true approval rate in the next (11th) county from which a survey will be taken. Summarize with a histogram and 90% confidence interval.
(d) Approximate the posterior predictive distribution of y ̃11, the number of approvals out of 20 respondents in a survey from the next (11th) county. Summarize with a histogram and 90% prediction interval.
2. The following data show the amount of time students from J = 8 high schools spent on studying or homework during an exam period.
> y <- c(95,70,80,62,108,62,61,74) / 10
> n <- c(25,23,20,24, 24,22,22,20)
2
Letting yij = number of hours spent studying by the ith student at the jth school, assume yij|θ ∼ indep Normal(θj,σ2) for i = 1,...,nj and j = 1,...,J ; where θ1,...,θJ|μ,τ ∼ iid Normal(μ,τ2) . Assume the variance is known to be σ2 = 14.3,
> sigma <- sqrt(14.3/n); rm(n);
and assign independent uniform priors to the hyperparameters, p(μ, τ ) ∝ 1.
This problem is basically a matter of adapting the code for simulating from the posterior of the hierarchical normal model with known variance, as illustrated in Example05b. To simulate a random draw from the joint posterior p(θ,μ,τ|y) we proceed by:
• Generate τs ∼ p(τ|y), using a numerical approximation to the marginal posterior of τ on a discrete set of point;
• sample μs ∼ p(μ|τs,y), which has a Normal distribution; and
• sample θs ∼ p(θ|μs,τs,y), which are a product of J independent normal distributions.
• Sampling from the posterior predictive distributions, as required for part (d), is accomplished by y ̃js ∼ Normal(θs,σ2 = 14.3), for j = 1,...,J.
(a) A 95% posterior intervals for the mean hours spent studying at each school.
(b) The posterior probability that θ7 is smaller than θ6, as well as the posterior probability that
θ7 is the smallest of the θ’s.
(c) The posterior probability that y ̃7 < y ̃6, as well as the posterior probability that y ̃7 is the smallest of all the y ̃’s, where y ̃j is the number of hours spent studying for a randomly selected student from school j:
3. A cancer laboratory is estimating the rate of tumorigenesis in two strains of mice, A and B. They have tumor counts for 10 mice in strain A and 13 mice in strain B. The observed tumor counts for the two populations are
> y.A <- c(12, 9,12,14,13,13,15, 8,15, 6)
> y.B <- c(11,11,10, 8, 8, 8, 7,10, 6, 8, 8, 9, 7)
Assume a Poisson sampling distribution for each group and the prior distribution:
θA ∼Gamma(120,10), θB ∼Gamma(12,1), p(θA,θB)=p(θA)×p(θB).
(a) Find the equal tail 95% posterior intervals for θA and θB.
(b) Obtain Pr(θA > θB|yA,yB) via Monte Carlo sampling.
(c) Obtain Pr(y ̃A > y ̃B|yA,yB), where y ̃A and y ̃B are samples from the posterior predictive distribution.
3
4. Use posterior predictive checks to investigate the adequacy of the Poisson model for the tumor
count data of the previous exercise. Specifically, set S = 1000 and generate posterior predictive
datasetsyrep1,…,yrepS;eachyrepsisasampleofsizenA =10fromthePoisson(θs)distribution, AAA A
where θs is itself a sample from the posterior distribution p(θA|yA). Similarly, generate yrep s, a AB
sample of size nB = 13 from the Poisson(θBs ) distribution, where θBs ∼ p(θB|yB).
(a) For each s, let Ts be the sample average of the 10 values of yrep s, divided by the sample
standard deviation of yrep s. Make a histogram of Ts and compare to the observed value of A
this statistic. Based on this statistic, assess the fit of the Poisson model for these data. (b) Repeat the above goodness of fit evaluation for the data in population B.
5. Using the “Mr. October” data from Homework 2 Problem 2 (Reggie Jackson’s y1 = 563 home runs in n1 = 2820 regular season games, and y2 = 10 home runs in n2 = 27 World Series games), compare the two models
H1 : y1|θ ∼ Poisson(n1θ0) and y2|θ ∼ Poisson(n2θ0) H2 : y1|θ ∼ Poisson(n1θ1) and y2|θ ∼ Poisson(n2θ2)
based on the Bayes factor of Model H2 relative to Model H1. Assume a Uniform(0,1) prior for all θj.
The Bayes factor for Model H2 relative to Model H1 is
p(y1, y2|H2) BF(H2; H1) = p(y1, y2|H1) =
p(y1, y2|θ1, θ2; H2)p(θ1, θ2|H2)dθ2dθ1
p(y1, y2|θ0; H1)p(θ0|H1)dθ0
.
With Uniform(0,1) priors on all θj this reduces to
where
1 1
dpois(y1|n1θ)dθ × dpois(y2|n2θ)dθ
00 BF(H2;H1)= 1
[dpois(y1|n1θ) × dpois(y2|n2θ)] dθ 0
−λ λx
dpois(x|λ)=e x! for λ>0 and x=0,1,2,… .
.
A
The easiest approach is to evaluate all three integrals by Monte Carlo integration. An integral over θ from 0 to 1 is well approximated by taking the average of the integrand over a large number of θs ∼ iid Uniform(0,1),s = 1,…,S.
4