Data Science And Statistical Modelling In Space And Time

Assessment – Practical Modelling exercises

Section A consists of spatial modelling questions, and Section B consists of time series modelling questions. Commented R code (and the outcomes/plots) should be part of your answers.

This assessment is worth 50% of the module mark.


You should submit a single pdf containing answers to A and B to the

A. Spatial modelling [100 marks]

You have just started work at an oceanographic consultancy. You are asked to interpolate a set of sea surface temperature data for one month in the Kuroshio region off the coast of Japan onto a grid with a resolution of 0.5° in both the east (E) and north (N) directions. We are going to assume a flat Earth!

The data are in the file kuroshio.csv. You are also provided with an R program to read the data (readkuro.R).
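
As a rough illustration of the 0.5° target grid described above, the following sketch builds the grid nodes as planar coordinates. It is written in plain Python purely to show the logic (the assessment itself asks for R, where `expand.grid` plays a similar role). The longitude and latitude bounds are hypothetical placeholders and should be replaced by the extent of the observations in kuroshio.csv.

```python
def axis(start, stop, step=0.5):
    """Return coordinates from start to stop (inclusive) at the given spacing."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


# Hypothetical bounds in degrees (assumption); use the range of the data instead.
lon_min, lon_max = 140.0, 145.0
lat_min, lat_max = 30.0, 35.0

lons = axis(lon_min, lon_max)
lats = axis(lat_min, lat_max)

# One (lon, lat) pair per grid node, treated as planar x/y (flat Earth).
grid = [(lon, lat) for lat in lats for lon in lons]
print(len(lons), len(lats), len(grid))
```

Building each axis from an integer count of steps, rather than repeatedly adding 0.5, avoids accumulating floating-point error along the axis.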

Analyse the data and answer the following questions (indicative marks are given).

1. Produce numerical and graphical summaries of the data. Comment on your findings and highlight any potential outliers in the data. [10 marks]

2. Check for isotropy (the function variog4 in geoR may be useful). Do you need a trend in the model? [20 marks]

3. Decide what spatial model you want to fit. You may want to try several and see which one fits best. Estimate the parameters of your chosen model by Maximum Likelihood and plot the expected value and variance for the estimate on the required grid. Validate your model or models. [35 marks]

4. Repeat 3 but use Bayesian methods. Show your priors and the ensuing posteriors. Consider different priors and models and justify your choice of the final model. Illustrate your results by plotting the mean and variance fields as well as some samples from the posterior fields. [25 marks]

5. Comment on the differences between the two methods of estimation, and on their advantages and disadvantages. [10 marks]
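
To make the isotropy check in question 2 more concrete, here is a minimal sketch of a directional empirical semivariogram using the classical (Matheron) estimator, where the semivariance in a lag bin is half the mean squared difference of the paired values. It is plain Python for illustration only and is not how geoR implements variog4; the function name, the angle convention (anticlockwise from east), the tolerance and the lag width are all assumptions made for this example.

```python
import math
from collections import defaultdict


def directional_semivariogram(points, values, direction_deg,
                              tol_deg=22.5, lag_width=1.0):
    """Classical semivariogram using only pairs aligned with one direction.

    direction_deg is measured anticlockwise from the +x (east) axis. A pair is
    used when its orientation lies within tol_deg of that direction (mod 180).
    Returns {lag_bin: (mean_distance, semivariance, n_pairs)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # distance sum, squared-diff sum, count
    target = direction_deg % 180.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue  # co-located points carry no directional information
            angle = math.degrees(math.atan2(dy, dx)) % 180.0
            diff = abs(angle - target)
            diff = min(diff, 180.0 - diff)  # orientations wrap around at 180 degrees
            if diff > tol_deg:
                continue
            b = int(dist // lag_width)
            sums[b][0] += dist
            sums[b][1] += (values[i] - values[j]) ** 2
            sums[b][2] += 1
    return {b: (d / n, s / (2 * n), n) for b, (d, s, n) in sorted(sums.items())}


# Toy field that varies only in the east-west direction (synthetic data).
pts = [(float(x), float(y)) for x in range(5) for y in range(5)]
vals = [x for x, _ in pts]
for d in (0, 45, 90, 135):
    print(d, directional_semivariogram(pts, vals, d))
```

For this toy field the east-west semivariogram rises much faster than the north-south one, which is the kind of directional difference that can indicate anisotropy or an unmodelled trend.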

Note: fitting Gaussian processes becomes significantly more expensive as the number of data points increases. You may want to consider fitting models to a subset of the data for computational efficiency (consider how you might want to split the data, and how you might use the left-out data).
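
One possible way to act on this note is a random split into a fitting set and a hold-out set, with the hold-out points then used to score predictions. The sketch below is plain Python for illustration; the 70/30 split, the seed and the helper names are assumptions, and a spatially structured split may suit the data better than a purely random one.

```python
import math
import random


def split_indices(n, train_fraction=0.7, seed=1):
    """Randomly partition range(n) into fitting and hold-out index lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(round(train_fraction * n))
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def rmse(predicted, observed):
    """Root mean squared error between predictions and held-out observations."""
    sq = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(sq) / len(sq))


fit_idx, holdout_idx = split_indices(10)
print(fit_idx, holdout_idx)
```

Predicting at the hold-out locations from a model fitted to the other points, and comparing with the held-out observations through `rmse`, gives a simple validation score for competing models.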

B. Time series modelling [100 marks]

1. The figures labelled A to E show five time series whose defining equations are given below.

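i) $X_t = 0.8X_{t-1} + \varepsilon_t$

ii) $X_t = \varepsilon_t - 0.5\varepsilon_{t-1}$

iii) $X_t = 2X_{t-1} - X_{t-2} + \varepsilon_t + 0.5\varepsilon_{t-1} + 0.4\varepsilon_{t-2}$

iv) $X_t = \varepsilon_t + 0.1(250 - t)\varepsilon_{t-1}$

v) $X_t = X_{t-1} + 0.9\varepsilon_{t-1} + \varepsilon_t$

In each case, $\varepsilon_t \sim N(0,1)$.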

State, with reasons, which equation corresponds to which plot. [10 marks]

[Figures A to E: time series plots, with t from 0 to 250 on the horizontal axis]
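
Simulating the five defining equations is one way to build intuition for their qualitative behaviour (stationary or wandering, constant or changing variance). The sketch below is plain Python for illustration; the zero starting values, the seed and the series length of 250 are assumptions made for this example, and the output will not reproduce the plotted figures exactly.

```python
import random


def simulate_all(n=250, seed=0):
    """Simulate equations i) to v) for t = 1..n with a shared N(0, 1) noise sequence."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]

    def e(t):
        # eps[t + 1] stores epsilon_t, so shocks for t = -1 and t = 0 exist.
        return eps[t + 1]

    # Each list starts with X_{-1} = X_0 = 0 (assumed initial conditions).
    x = {k: [0.0, 0.0] for k in ("i", "ii", "iii", "iv", "v")}
    for t in range(1, n + 1):
        x["i"].append(0.8 * x["i"][-1] + e(t))
        x["ii"].append(e(t) - 0.5 * e(t - 1))
        x["iii"].append(2 * x["iii"][-1] - x["iii"][-2]
                        + e(t) + 0.5 * e(t - 1) + 0.4 * e(t - 2))
        x["iv"].append(e(t) + 0.1 * (250 - t) * e(t - 1))
        x["v"].append(x["v"][-1] + 0.9 * e(t - 1) + e(t))
    # Drop the two initial values so each series has exactly n points.
    return {k: v[2:] for k, v in x.items()}


series = simulate_all()
for label, xs in series.items():
    print(label, round(min(xs), 1), round(max(xs), 1))
```

Comparing the printed ranges, or plotting each simulated series, against the vertical scales and shapes of the figures can help support the reasoning asked for in the question.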

2. The ACF and PACF are plotted below for 5 different series. Suggest appropriate ARMA models for each (A, B, C, D, E), giving reasons for your choice in each case. [10 marks]

[Figures: ACF and PACF plots for Series A, B, C, D and E, lags 0 to 30]
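
As a reminder of what the ACF and PACF plots display, the sketch below computes the sample autocorrelations and obtains the partial autocorrelations from them with the Durbin-Levinson recursion. It is plain Python for illustration; the function names and the simulated AR(1) test series are assumptions for this example, not part of the assessment data.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag (divisor n at every lag)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    rho = acf(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        # phi_prev[j] holds the lag-(j + 1) coefficient of the order k - 1 fit.
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Synthetic AR(1) series with coefficient 0.8 (illustrative data only).
rng = random.Random(42)
ar1 = [0.0]
for _ in range(5000):
    ar1.append(0.8 * ar1[-1] + rng.gauss(0.0, 1.0))

bound = 1.96 / math.sqrt(len(ar1))  # approximate 95% band under white noise
print([round(r, 2) for r in acf(ar1, 5)])
print([round(p, 2) for p in pacf(ar1, 5)], round(bound, 3))
```

For an AR(p) process the PACF is expected to cut off after lag p while the ACF decays, and for an MA(q) process the roles are reversed; this is the pattern to look for when suggesting models from the plots.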

3. The data for this assignment are the measured strength of the overturning in the North Atlantic from moorings at 26°N between April 2004 and March 2014, provided in the file overturning.csv.

a. Average the data to quarterly means. Produce numerical and graphical summaries of the averaged data, and comment on your findings and highlight any potential outliers. You might find it useful to convert the averaged data to a time series object ts(). [10 marks]

b. Fit an ARMA and an ARIMA model to the data. Choose the most appropriate model, and use this to predict the values for the six 3-month periods from April 2014 to September 2015. [30 marks]

c. Fit a DLM to the data (including both a trend and a seasonal component). Use your model to predict the values for April 2014 to September 2015. [30 marks]

d. Compare the results of parts b and c, and comment on any differences you may find. [10 marks]
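
The quarterly averaging in part a can be sketched as grouping observations by calendar quarter and taking the mean of each group. The example below is plain Python for illustration; the `(year, month, value)` record layout and the synthetic values are assumptions, since the actual column layout of overturning.csv is not described here.

```python
from collections import defaultdict


def quarterly_means(records):
    """Average (year, month, value) records by calendar quarter.

    Returns a list of ((year, quarter), mean) tuples sorted in time order.
    Any number of observations per month is allowed.
    """
    acc = defaultdict(lambda: [0.0, 0])  # running total and count per quarter
    for year, month, value in records:
        key = (year, (month - 1) // 3 + 1)  # months 1-3 -> Q1, 4-6 -> Q2, ...
        acc[key][0] += value
        acc[key][1] += 1
    return [(key, total / count) for key, (total, count) in sorted(acc.items())]


# Synthetic monthly records from April 2004 to March 2014 (illustrative only):
# the "value" is simply the month number, so the quarterly means are easy to check.
records = []
for k in range(120):
    year = 2004 + (3 + k) // 12
    month = (3 + k) % 12 + 1
    records.append((year, month, float(month)))

quarters = quarterly_means(records)
print(len(quarters), quarters[0], quarters[-1])
```

With calendar quarters, April 2004 to March 2014 gives 40 quarterly values, and the six forecast periods in parts b and c continue the same sequence from the second quarter of 2014.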
