hw6

CS/INFO 3300; INFO 5100
Homework 6
Due 11:59pm Friday April 28

Linear Regression, and the intuitive meaning of the t-test
— or —

If I secretly randomized some aspect of your data set, would you notice?

Simple 2D linear regression takes, as input, an array of pairs of variables. Each pair consists
of an input x and an output y. The model says that there is a linear relationship between
the input and output: each time we increase the input by 1.0, we expect the output to
change by slope. But how can we be sure there really is a relationship? Or in other words,
how can we be sure that slope isn’t 0.0? The linear regression function will always give us a
value, but can we trust it?
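
As a point of reference, the slope and intercept of a simple linear regression can be computed in a few lines. The sketch below is not the code in hw6.html; the function name fitLine and the { x, y } point format are assumptions made for illustration.

```javascript
// Hypothetical helper (not from hw6.html): ordinary least-squares fit.
// Assumes each point is an object of the form { x: number, y: number }.
function fitLine(points) {
  const n = points.length;

  // Means of the inputs and outputs.
  let sumX = 0;
  let sumY = 0;
  for (const p of points) {
    sumX += p.x;
    sumY += p.y;
  }
  const meanX = sumX / n;
  const meanY = sumY / n;

  // Covariance-like and variance-like sums around the means.
  let sxy = 0;
  let sxx = 0;
  for (const p of points) {
    sxy += (p.x - meanX) * (p.y - meanY);
    sxx += (p.x - meanX) * (p.x - meanX);
  }

  // slope = expected change in y per unit change in x.
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  return { slope, intercept };
}
```

Note that this function returns a slope for any input with at least two distinct x values, whether or not x and y are actually related, which is exactly why a significance test is needed.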

This question is what statisticians mean when they ask whether a value is significant. One
way to evaluate the significance of a slope parameter is to use a t-test. We calculate a
function of the absolute value of the slope, and pass that value to a piece of code that
calculates the tail probability of a Student t distribution. The result is your p-value. I’ve
included code used to calculate that p-value. We talked about p-values in class. You may
have seen them before. When you run a linear regression in R or SAS, for each parameter
(i.e., slope, intercept) you get a number that is between 0 and 1. Values less than 0.05 are
supposed to be good. But what do these numbers really mean?
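
For reference, the textbook form of the t statistic for a regression slope is shown below. This is the standard formula, given here as an assumption about what the provided pValue() code computes; the details of that code (for example, whether it reports one tail or two) are not stated in this handout. Here n is the number of points, and the hatted y values are the predictions of the fitted line.

```latex
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
\mathrm{SE}(\hat{\beta}_1) = \sqrt{\frac{\frac{1}{n-2}\sum_i (y_i - \hat{y}_i)^2}{\sum_i (x_i - \bar{x})^2}},
\qquad
t = \frac{|\hat{\beta}_1|}{\mathrm{SE}(\hat{\beta}_1)}
```

The p-value is then the tail probability of a Student t distribution with n - 2 degrees of freedom beyond t.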

P-values often seem mysterious. The way they are calculated is, as you can see from the
pValue() code, incredibly confusing. You will compare the p-value you get from this code
to the value you get from a permutation test, as we did in class. You will edit code in the file
hw6.html. When you open it and hit the “Run” button, the script will generate 10 points
from a linear model, plot the points, and plot the linear regression line for those points. It
will also show you a p-value for the slope. Now we need to figure out what this value
means.

Remember that the linear model is testing whether there is a systematic relationship
between the input and output variable. A permutation test tells us what would happen if
there were no relationship between these variables by creating fake datasets that have the
same x and y values as the original data but where there is no relationship between the x
and y values. We then compare the model we actually got from the real data to the possible
models we could have gotten from randomly shuffled variations of the same data.

a. Create a function that randomly shuffles the values of the y variable. You can start with
the provided stub of a function called permute(); you will fill in the body of this function.
In this function create a new array of x,y pairs. The x values should appear in the same
order as the x values in the input array points, but the y values should be randomly
shuffled. In other words, each y value in the input array should appear the same number of
times in the output array, but it may or may not be paired with its “correct” x value. (Class
notes may be of interest.) (10 pts)
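
One common way to shuffle an array uniformly is the Fisher-Yates shuffle. The sketch below applies it to the y values only. It assumes each point is a { x, y } object, which may not match the format used in hw6.html, and it is one possible approach rather than the expected solution.

```javascript
// Sketch of a y-shuffling function, assuming points look like { x: number, y: number }.
// The x values keep their original order; only the y values move.
function permute(points) {
  // Copy the y values so the input array is not modified.
  const ys = points.map((p) => p.y);

  // Fisher-Yates shuffle: walk backwards, swapping each element
  // with a randomly chosen element at or before its position.
  for (let i = ys.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = ys[i];
    ys[i] = ys[j];
    ys[j] = tmp;
  }

  // Pair each original x with a shuffled y in a brand-new array.
  return points.map((p, i) => ({ x: p.x, y: ys[i] }));
}
```

Because the shuffle only rearranges a copy of the y values, every y value appears in the output exactly as many times as in the input.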

b. Inside the run() function, create 200 random permutations of the real data. For each
one, compute a linear model from that permuted data. Call drawLine() with a different
color or opacity value from the real-data line. Check whether the absolute value of the
permuted-data slope is larger than the absolute value of the real-data slope. Set
steeperSlopes equal to the total number of permutations for which this condition is true.
(20 pts)
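
The counting step can be sketched on its own, outside of run(). The helper names fitSlope and countSteeperSlopes and the { x, y } point format are assumptions for illustration; the call to drawLine() is left as a comment because its signature is defined in hw6.html and not shown here.

```javascript
// Sketch only: assumes points look like { x: number, y: number }.

// Least-squares slope of y on x.
function fitSlope(points) {
  const n = points.length;
  const meanX = points.reduce((s, p) => s + p.x, 0) / n;
  const meanY = points.reduce((s, p) => s + p.y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const p of points) {
    sxy += (p.x - meanX) * (p.y - meanY);
    sxx += (p.x - meanX) * (p.x - meanX);
  }
  return sxy / sxx;
}

// Keep x order, shuffle y values (Fisher-Yates on a copy).
function permute(points) {
  const ys = points.map((p) => p.y);
  for (let i = ys.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = ys[i];
    ys[i] = ys[j];
    ys[j] = tmp;
  }
  return points.map((p, i) => ({ x: p.x, y: ys[i] }));
}

// Count how many permuted datasets give a strictly steeper slope
// (in absolute value) than the real data.
function countSteeperSlopes(points, numPermutations) {
  const realSlope = Math.abs(fitSlope(points));
  let steeperSlopes = 0;
  for (let k = 0; k < numPermutations; k++) {
    const permutedSlope = Math.abs(fitSlope(permute(points)));
    // In hw6.html, this is where drawLine() would be called for the permuted fit.
    if (permutedSlope > realSlope) {
      steeperSlopes++;
    }
  }
  return { steeperSlopes, fraction: steeperSlopes / numPermutations };
}
```

Calling countSteeperSlopes(points, 200) mirrors the 200 permutations asked for above. If the real relationship is strong, very few shuffled datasets should produce a steeper line, so steeperSlopes stays near 0.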

c. The code will now print half the number of permutations that had a more extreme (i.e.,
steeper) slope than the “real” slope from the original points array. Hit “Run” a few times.
How does this permutation test value compare to the t-test p-value? (Use the free-response
