
MECH 203


Week 7 Jupyter Notebook Written Report
LINEAR REGRESSION
Due date: 11:59PM on Tuesday, March 3rd, 2020
Grading & Weight: This assignment is out of 50 marks, as further specified in the mark
breakdown for each question. The assignment is worth 6% of your overall final grade in the
course.
Late Penalty: Late submissions will be penalized at 10% each day for up to 5 days, after
which a grade of zero will be given.
1. Overview
This assignment is about applying simple linear regression to interpret data sets on model
rockets, thermal expansion of Al and hybrid cars in Questions 1, 2 and 3, respectively.
Before you start working on the assignment make sure you:
 Review the online lecture videos, in-class lecture slides and the required reading.
This assignment aligns with the following CLOs:
CLO 4: Implement simple linear regression
CLO 4: Implement simple linear regression with error bars
CLO 4: Apply a statistical test to compare regression models
1.1 Time for completion
This assignment will take approximately 6 hours to complete.
2. Instructions
For each question the corresponding data is available both as a *.csv file
(comma-separated values) and as a *.dat file (tab-delimited text).
The *.csv and *.dat files with the same name contain the same data.
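As a minimal sketch of reading both file types, the example below parses a comma-separated and a tab-delimited table with numpy.loadtxt and checks that they hold the same values. The header names (X, Y), the single header row and the numbers are assumptions made for illustration; the actual files may be laid out differently. In-memory text stands in for the files, so a real file name would be passed in place of the io.StringIO object.

```python
import io
import numpy as np

# Hypothetical file contents; the real header names and layout are assumptions.
csv_text = "X,Y\n10.0,1.5\n20.0,3.1\n30.0,4.4\n"
dat_text = "X\tY\n10.0\t1.5\n20.0\t3.1\n30.0\t4.4\n"


def load_xy(source, delimiter):
    """Read a two-column table with one header row into x and y arrays."""
    data = np.loadtxt(source, delimiter=delimiter, skiprows=1)
    return data[:, 0], data[:, 1]


# A file name such as "my_data.csv" could be passed instead of io.StringIO(...).
x_csv, y_csv = load_xy(io.StringIO(csv_text), ",")
x_dat, y_dat = load_xy(io.StringIO(dat_text), "\t")

# Both formats should yield identical arrays.
same_data = np.array_equal(x_csv, x_dat) and np.array_equal(y_csv, y_dat)
print(same_data)
```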
When you have completed the assignment, upload the Jupyter Notebook
file to onQ.
TASKS
Question 1
The “Q1_rocket_data” file contains data on the performance of model rockets constructed by
MME students. Each line represents a rocket launch, where X and Y correspond to the
pressure of the propellant gas (measured in psi) and the maximum height (apogee, measured in
m) reached by the rocket, respectively.
By applying linear regression to this data we can create an empirical model which can predict the
expected apogee of such a rocket from the pressure of the propellant. To perform the linear
regression follow the steps below:
a. Calculate the values of $\bar{x}$, $\bar{y}$, $\overline{xy}$, $\overline{x^2}$, $\overline{y^2}$ (2/50)
b. Calculate the regression coefficients $\hat{\beta}_0$, $\hat{\beta}_1$ for the best fitting regression line using the
quantities above (2/50)
c. Calculate the sum of squares corresponding to the best fitting regression line (3/50)
d. Calculate the standard error of the regression coefficients $\hat{\beta}_0$, $\hat{\beta}_1$ and comment on their
value (3/50)
e. Make a plot which shows the data points and the best fitting regression line (2/50)
f. Calculate the $R^2$ and comment on its value (i.e. interpret its meaning) (2/50)
g. Perform the linear fit using Python (e.g. numpy.polyfit) and compare the coefficients $\hat{\beta}_0$,
$\hat{\beta}_1$ for the best fitting regression line and the $R^2$ value obtained this way to the values
obtained above (2/50).
(The data was provided by Prof. Surgenor.)
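The calculation steps a, b, c, d and f can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic (not the rocket measurements), the "sum of squares" in step c is taken to mean the residual sum of squares, and the standard errors use the usual textbook formulas with n - 2 degrees of freedom. Check these choices against the lecture notes before relying on them.

```python
import numpy as np

# Synthetic illustration data (NOT the rocket measurements).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = x.size

# Step a: sample means of x, y, xy, x^2 and y^2.
x_bar, y_bar = x.mean(), y.mean()
xy_bar, x2_bar, y2_bar = (x * y).mean(), (x**2).mean(), (y**2).mean()

# Step b: least-squares slope and intercept expressed through the means.
beta1 = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar**2)
beta0 = y_bar - beta1 * x_bar

# Step c: residual sum of squares (assumed meaning of "sum of squares").
residuals = y - (beta0 + beta1 * x)
ss_res = np.sum(residuals**2)

# Step d: standard errors of the coefficients, n - 2 degrees of freedom.
s_xx = n * (x2_bar - x_bar**2)
sigma2 = ss_res / (n - 2)
se_beta1 = np.sqrt(sigma2 / s_xx)
se_beta0 = np.sqrt(sigma2 * (1.0 / n + x_bar**2 / s_xx))

# Step f: coefficient of determination.
ss_tot = n * (y2_bar - y_bar**2)
r2 = 1.0 - ss_res / ss_tot

print(beta0, beta1, ss_res, se_beta0, se_beta1, r2)
```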
Question 2
The “Q2_Al-thermal-expansion_data” file contains data collected using neutron scattering on
the crystal lattice parameter of an Al-based composite as a function of temperature (i.e. the data
is on the thermal expansion of the material). Each line represents a measurement, where X
and Y correspond to the crystal lattice parameter (measured in Angstroms, 1 Angstrom = $10^{-10}$
m) and the temperature (measured in °C), respectively.
By applying linear regression to this data we can create an empirical model which can predict the
expected lattice parameter of this Al-composite if the temperature of the material is known. To
perform the linear regression follow the steps below:
a. Calculate the values of $\bar{x}$, $\bar{y}$, $\overline{xy}$, $\overline{x^2}$, $\overline{y^2}$ (2/50)
b. Calculate the regression coefficients $\hat{\beta}_0$, $\hat{\beta}_1$ for the best fitting regression line
using the quantities above (2/50)
c. Calculate the sum of squares corresponding to the best fitting regression line (3/50)
d. Calculate the standard error of the regression coefficients $\hat{\beta}_0$, $\hat{\beta}_1$ and comment on their
value (3/50)
e. Make a plot which shows the data points and the best fitting regression line (2/50)
f. Calculate the $R^2$ and comment on its value (i.e. interpret its meaning) (2/50)
g. Perform the linear fit using Python (e.g. numpy.polyfit) and compare the coefficients $\hat{\beta}_0$,
$\hat{\beta}_1$ for the best fitting regression line and the $R^2$ value obtained this way to the values
obtained above (2/50).
(The data was collected by E. Tulk.)
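For step g, a possible way to cross-check a hand calculation is sketched below using numpy.polyfit and numpy.corrcoef. The numbers are made up for illustration and are not the neutron-scattering measurements; treating temperature as the regressor is also an assumption, so confirm which column plays which role in the actual file.

```python
import numpy as np

# Made-up values for illustration only (NOT the neutron-scattering data).
temperature = np.array([25.0, 100.0, 200.0, 300.0, 400.0, 500.0])
lattice = np.array([4.050, 4.057, 4.066, 4.076, 4.086, 4.095])

# numpy.polyfit returns coefficients from the highest degree down,
# so a degree-1 fit gives [slope, intercept].
slope, intercept = np.polyfit(temperature, lattice, 1)

# R^2 from the residuals of the fitted line.
fitted = slope * temperature + intercept
ss_res = np.sum((lattice - fitted) ** 2)
ss_tot = np.sum((lattice - lattice.mean()) ** 2)
r2_fit = 1.0 - ss_res / ss_tot

# For simple linear regression, R^2 equals the squared Pearson correlation,
# which gives an independent check on the value above.
r2_corr = np.corrcoef(temperature, lattice)[0, 1] ** 2

print(slope, intercept, r2_fit, r2_corr)
```

The slope and intercept printed here play the roles of $\hat{\beta}_1$ and $\hat{\beta}_0$, so they can be compared directly with the values computed from the means.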
Question 3
The “Q3_hybrid-cars_data” file contains data on hybrid cars from various manufacturers which
came out in the years between 1997 and 2013. Each line represents a specific car. The columns
denoted year, msrp, accelrate and mpg represent the model year, the manufacturer’s suggested
retail price in 2013 $, the maximum acceleration rate in km/hour/second and the fuel economy in
miles/gallon, respectively.
Using this data set we would like to investigate how the characteristics listed above correlate
with each other. Use $R^2$ to quantify and investigate these correlations while answering the
questions below:
a. How much does the year the car was manufactured affect its retail price? I.e. what is the
$R^2$ for year vs msrp? Make a plot which shows the data points and the best fitting
regression line (3/50)
b. How much does the retail price of the car affect its maximum acceleration rate? I.e. what
is the $R^2$ for msrp vs accelrate? Make a plot which shows the data points and the best
fitting regression line (3/50)
c. How much does the fuel economy of the car affect its maximum acceleration rate? I.e.
what is the $R^2$ for mpg vs accelrate? Make a plot which shows the data points and the
best fitting regression line (3/50)
d. How much does the year the car was manufactured affect its fuel economy? I.e. what is
the $R^2$ for year vs mpg? Make a plot which shows the data points and the best fitting
regression line (3/50)
e. Compare the $R^2$ values obtained above and comment on their relative value: In your
subjective opinion which cases from above show a noteworthy effect (correlation) and
which don’t? Explain why (6/50).
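One way to organize parts a to d is to loop over the four column pairs and collect the $R^2$ of each, as in the rough sketch below. The six rows are invented for illustration and do not come from the hybrid-car data set; only the column names year, msrp, accelrate and mpg are taken from the description above, and the helper name r_squared is hypothetical.

```python
import numpy as np

# Invented rows for illustration only (NOT the hybrid-car data set).
cars = {
    "year": np.array([1997.0, 2001.0, 2005.0, 2008.0, 2011.0, 2013.0]),
    "msrp": np.array([24500.0, 30500.0, 27000.0, 41000.0, 35500.0, 52000.0]),
    "accelrate": np.array([7.5, 9.0, 8.7, 12.0, 11.2, 14.5]),
    "mpg": np.array([41.0, 45.0, 38.0, 30.0, 34.0, 27.0]),
}

# The four pairs asked about in parts a to d, as (regressor, response).
pairs = [("year", "msrp"), ("msrp", "accelrate"),
         ("mpg", "accelrate"), ("year", "mpg")]


def r_squared(u, v):
    """R^2 of a simple linear fit, via the squared Pearson correlation."""
    return np.corrcoef(u, v)[0, 1] ** 2


results = {pair: r_squared(cars[pair[0]], cars[pair[1]]) for pair in pairs}

# List the pairs from strongest to weakest linear association.
for (a, b), value in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{a} vs {b}: R^2 = {value:.3f}")
```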
(Note: the value of $R^2$ is independent of the choice of the response and regressor variables for
the data set pairs above, i.e. X vs Y and Y vs X have identical $R^2$ values. This does not hold for
the regression coefficients and sum of squares.)
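The symmetry stated in the note can be checked numerically. The sketch below fits Y on X and X on Y for a small synthetic data set (an assumption, not course data) and compares the results; the helper name fit_line is hypothetical.

```python
import numpy as np


def fit_line(u, v):
    """Least-squares fit of v on u; returns (slope, intercept, R^2)."""
    slope, intercept = np.polyfit(u, v, 1)
    ss_res = np.sum((v - (slope * u + intercept)) ** 2)
    ss_tot = np.sum((v - v.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic values for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.9, 2.7, 5.1, 4.8, 7.0])

slope_yx, _, r2_yx = fit_line(x, y)  # Y regressed on X
slope_xy, _, r2_xy = fit_line(y, x)  # X regressed on Y

# R^2 agrees in both directions, while the slopes differ.
print(r2_yx, r2_xy)
print(slope_yx, slope_xy)
```

A related identity that follows from the definitions is that the product of the two slopes equals $R^2$, which offers one more consistency check.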
(Source of the data: D-J. Lim, S.R. Jahromi, T.R. Anderson, A-A. Tudorie (2014) “Comparing
Technological Advancement of Hybrid Electric Vehicles (HEV) in Different Market Segments”,
Technological Forecasting & Social Change, http://dx.doi.org/10.1016/j.techfore.2014.05.008)
3. Evaluation
The report will be evaluated based on the completeness of the answers/solutions provided
for each question.
