# hw6

CS/INFO 3300; INFO 5100

Homework 6

Due 11:59pm Friday April 28

Linear Regression, and the intuitive meaning of the t-test

— or —

If I secretly randomized some aspect of your data set, would you notice?

Simple 2D linear regression takes as input an array of pairs of values. Each pair consists of an input x and an output y. The model says that there is a linear relationship between the input and the output: each time we increase the input by 1.0, we expect the output to change by slope. But how can we be sure there really is a relationship? In other words, how can we be sure that slope isn’t 0.0? The linear regression function will always give us a value, but can we trust it?
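To make the model concrete, here is a minimal least-squares sketch. The function name `fitLine` and the `{x, y}` object layout are assumptions made for illustration; the regression code provided with the assignment may be organized differently.

```javascript
// Ordinary least-squares fit for an array of {x, y} points.
// NOTE: fitLine and the {x, y} layout are hypothetical names for this sketch,
// not necessarily what hw6.html uses.
function fitLine(points) {
  const n = points.length;

  // Means of the inputs and outputs.
  let sumX = 0;
  let sumY = 0;
  for (const p of points) {
    sumX += p.x;
    sumY += p.y;
  }
  const meanX = sumX / n;
  const meanY = sumY / n;

  // Sum of cross-deviations (sxy) and squared x-deviations (sxx).
  let sxy = 0;
  let sxx = 0;
  for (const p of points) {
    sxy += (p.x - meanX) * (p.y - meanY);
    sxx += (p.x - meanX) * (p.x - meanX);
  }

  const slope = sxy / sxx;                 // expected change in y per unit x
  const intercept = meanY - slope * meanX; // value of the line at x = 0
  return { slope, intercept };
}

// Four points that lie exactly on y = 2x + 1.
const examplePoints = [
  { x: 0, y: 1 },
  { x: 1, y: 3 },
  { x: 2, y: 5 },
  { x: 3, y: 7 },
];
console.log(fitLine(examplePoints)); // slope 2, intercept 1
```

Note that the function returns a slope for any input with at least two distinct x values, whether or not x and y are actually related. That is exactly why a significance test is needed.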

This question is what statisticians mean when they ask whether a value is significant. One way to evaluate the significance of a slope parameter is to use a t-test. We calculate a function of the absolute value of the slope, and pass that value to a piece of code that calculates the tail probability of a Student t distribution. The result is your p-value. I’ve included the code used to calculate that p-value. We talked about p-values in class, and you may have seen them before. When you run a linear regression in R or SAS, for each parameter (i.e., slope, intercept) you get a number between 0 and 1. Values less than 0.05 are supposed to be good. But what do these numbers really mean?
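As a rough illustration of the "function of the absolute value of the slope", the sketch below computes the standard textbook t statistic for a regression slope: the absolute slope divided by its standard error, with n - 2 degrees of freedom. This is an assumption about what is meant here; the provided pValue() code may take different inputs or compute the statistic differently. The name `slopeTStatistic` is hypothetical.

```javascript
// Textbook t statistic for the slope of a simple linear regression.
// NOTE: slopeTStatistic is a hypothetical helper for this sketch; it does not
// compute the p-value itself (that is the job of the provided pValue() code).
function slopeTStatistic(points) {
  const n = points.length;
  const meanX = points.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.y, 0) / n;

  let sxx = 0;
  let sxy = 0;
  for (const p of points) {
    sxx += (p.x - meanX) * (p.x - meanX);
    sxy += (p.x - meanX) * (p.y - meanY);
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;

  // Sum of squared residuals around the fitted line.
  let sse = 0;
  for (const p of points) {
    const residual = p.y - (intercept + slope * p.x);
    sse += residual * residual;
  }

  const df = n - 2; // two parameters (slope, intercept) were estimated
  const standardError = Math.sqrt(sse / df / sxx);
  return { t: Math.abs(slope) / standardError, df };
}

const noisyPoints = [
  { x: 0, y: 0 },
  { x: 1, y: 1 },
  { x: 2, y: 1 },
  { x: 3, y: 2 },
];
console.log(slopeTStatistic(noisyPoints));
```

A large t means the slope is large relative to the scatter of the points around the line; the tail probability of the Student t distribution turns that value into a p-value.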

P-values often seem mysterious. The way they are calculated is, as you can see from the pValue() code, incredibly confusing. You will compare the p-value you get from this code to the value you get from a permutation test like the one we did in class. You will edit code in the file hw6.html. When you open it and hit the “Run” button, the script will generate 10 points from a linear model, plot the points, and plot the linear regression line for those points. It will also show you a p-value for the slope. Now we need to figure out what this value means.

Remember that the linear model is testing whether there is a systematic relationship between the input and output variables. A permutation test tells us what would happen if there were no relationship between these variables by creating fake datasets that have the same x and y values as the original data but where there is no relationship between the x and y values. We then compare the model we actually got from the real data to the possible models we could have gotten from randomly shuffled variations of the same data.
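The idea can be sketched end to end as follows. This is a generic illustration under stated assumptions, not the structure of hw6.html: the names `shuffled`, `slopeOf`, and `permutationTest` are hypothetical, and the sketch returns the fraction of shuffles with a steeper slope rather than drawing anything.

```javascript
// Return a shuffled copy of an array using the Fisher-Yates algorithm.
// NOTE: all function names in this sketch are hypothetical.
function shuffled(values) {
  const out = values.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // random index in 0..i
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Least-squares slope for an array of {x, y} points.
function slopeOf(points) {
  const n = points.length;
  const meanX = points.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.y, 0) / n;
  let sxx = 0;
  let sxy = 0;
  for (const p of points) {
    sxx += (p.x - meanX) * (p.x - meanX);
    sxy += (p.x - meanX) * (p.y - meanY);
  }
  return sxy / sxx;
}

// Fraction of shuffled datasets whose slope is steeper (in absolute value)
// than the slope of the real data.
function permutationTest(points, numPermutations) {
  const realSlope = Math.abs(slopeOf(points));
  const ys = points.map((p) => p.y);
  let steeper = 0;
  for (let k = 0; k < numPermutations; k++) {
    // Keep the x values in place, pair them with shuffled y values.
    const fakeYs = shuffled(ys);
    const fakePoints = points.map((p, i) => ({ x: p.x, y: fakeYs[i] }));
    if (Math.abs(slopeOf(fakePoints)) > realSlope) {
      steeper++;
    }
  }
  return steeper / numPermutations;
}

const samplePoints = [
  { x: 0, y: 1.2 },
  { x: 1, y: 2.7 },
  { x: 2, y: 5.4 },
  { x: 3, y: 6.6 },
  { x: 4, y: 9.1 },
];
console.log(permutationTest(samplePoints, 200));
```

If the real data have a strong relationship, very few shuffles will produce a slope as steep as the real one, so the returned fraction will be small. If x and y are unrelated, the real slope is just one more draw from the shuffled slopes, and the fraction will be large.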

a. Create a function that randomly shuffles the values of the y variable. You can start with the provided stub of a function called permute(); you will fill in the body of this function. In this function, create a new array of x,y pairs. The x values should appear in the same order as the x values in the input array points, but the y values should be randomly shuffled. In other words, each y value in the input array should appear the same number of times in the output array, but it may or may not be paired with its “correct” x value. (Class notes may be of interest.) (10 pts)

b. Inside the run() function, create 200 random permutations of the real data. For each one, compute a linear model from that permuted data. Call drawLine() with a different color or opacity value from the real-data line. Check whether the absolute value of the permuted-data slope is larger than the absolute value of the real-data slope. Set steeperSlopes equal to the total number of permutations for which this condition is true. (20 pts)

c. The code will now print half the number of permutations that had a more extreme (i.e., steeper) slope than the “real” slope from the original points array. Hit “Run” a few times. How does this permutation test value compare to the t-test p-value? (Use the free-response