# Homework 15

• Type your name and email in the “Student details” section below.

• Use this notebook to develop the code and generate the figures you need to solve the problems.

• For answers that require a mathematical proof or derivation, you can either:

• Type the answer using the notebook's built-in LaTeX support. In this case, simply export the notebook as a PDF and upload it to Gradescope; or

• Print the notebook (after you are done with all the code), write your answers by hand, scan them, combine everything into a single PDF, and upload it to Gradescope.

• The homework is worth 100 points in total. Please note that the problems are not weighted equally.

Note

• This is due before the beginning of the next lecture.

• When you submit on Gradescope, please match each question to all of the pages that correspond to it.

## Student details

• First Name:

• Last Name:

• Email:

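import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
# IPython.display.set_matplotlib_formats is deprecated; the function now lives in matplotlib_inline
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st
import pandas as pd
import requests
import os

def download(url, local_filename=None):
    """
    Downloads the file in the url and saves it in the current working directory.
    """
    data = requests.get(url)
    # Fail loudly on HTTP errors instead of saving an error page to disk
    data.raise_for_status()
    if local_filename is None:
        # Default to the last component of the URL, e.g. compressor_data.xlsx
        local_filename = os.path.basename(url)
    with open(local_filename, 'wb') as fd:
        fd.write(data.content)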


## Problem 1 - Optimizing the performance of a compressor

In this problem, we need a dataset that was kindly provided to us by Professor Davide Ziviani. As before, you can either put it on your Google Drive or just download it with the code segment below:

url = 'https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-297-intro-to-data-science/master/data/compressor_data.xlsx'
download(url)

Note that this is an Excel file, so we need pandas to read it. Here is how:

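import pandas as pd
# Read the downloaded spreadsheet (reading .xlsx files requires the openpyxl package)
data = pd.read_excel('compressor_data.xlsx')
data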

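| | T_e | DT_sh | T_c | DT_sc | T_amb | f | m_dot | m_dot.1 | Capacity | Power | Current | COP | Efficiency |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | -30 | 11 | 25 | 8 | 35 | 60 | 28.8 | 8.000000 | 1557 | 901 | 4.4 | 1.73 | 0.467 |
| 1 | -30 | 11 | 30 | 8 | 35 | 60 | 23.0 | 6.388889 | 1201 | 881 | 4.0 | 1.36 | 0.425 |
| 2 | -30 | 11 | 35 | 8 | 35 | 60 | 17.9 | 4.972222 | 892 | 858 | 3.7 | 1.04 | 0.382 |
| 3 | -25 | 11 | 25 | 8 | 35 | 60 | 46.4 | 12.888889 | 2509 | 1125 | 5.3 | 2.23 | 0.548 |
| 4 | -25 | 11 | 30 | 8 | 35 | 60 | 40.2 | 11.166667 | 2098 | 1122 | 5.1 | 1.87 | 0.519 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 60 | 10 | 11 | 45 | 8 | 35 | 60 | 245.2 | 68.111111 | 12057 | 2525 | 11.3 | 4.78 | 0.722 |
| 61 | 10 | 11 | 50 | 8 | 35 | 60 | 234.1 | 65.027778 | 10939 | 2740 | 12.3 | 3.99 | 0.719 |
| 62 | 10 | 11 | 55 | 8 | 35 | 60 | 222.2 | 61.722222 | 9819 | 2929 | 13.1 | 3.35 | 0.709 |
| 63 | 10 | 11 | 60 | 8 | 35 | 60 | 209.3 | 58.138889 | 8697 | 3091 | 13.7 | 2.81 | 0.693 |
| 64 | 10 | 11 | 65 | 8 | 35 | 60 | 195.4 | 54.277778 | 7575 | 3223 | 14.2 | 2.35 | 0.672 |

65 rows × 13 columns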

The data are part of an experimental study of a variable-speed reciprocating compressor. The experimentalists varied two temperatures, $T_e$ and $T_c$ (both in °C), and measured various other quantities. Our goal is to learn the map from $T_e$ and $T_c$ to the measured Capacity and Power (both in W). First, let’s see how you can extract only the relevant data.

# Here is how to extract the T_e and T_c columns and put them in a single numpy array
X = data[['T_e','T_c']].values
X

array([[-30,  25],
       [-30,  30],
       [-30,  35],
       [-25,  25],
       [-25,  30],
       [-25,  35],
       [-25,  40],
       [-25,  45],
       [-20,  25],
       [-20,  30],
       [-20,  35],
       [-20,  40],
       [-20,  45],
       [-20,  50],
       [-15,  25],
       [-15,  30],
       [-15,  35],
       [-15,  40],
       [-15,  45],
       [-15,  50],
       [-15,  55],
       [-10,  25],
       [-10,  30],
       [-10,  35],
       [-10,  40],
       [-10,  45],
       [-10,  50],
       [-10,  55],
       [-10,  60],
       [ -5,  25],
       [ -5,  30],
       [ -5,  35],
       [ -5,  40],
       [ -5,  45],
       [ -5,  50],
       [ -5,  55],
       [ -5,  60],
       [ -5,  65],
       [  0,  25],
       [  0,  30],
       [  0,  35],
       [  0,  40],
       [  0,  45],
       [  0,  50],
       [  0,  55],
       [  0,  60],
       [  0,  65],
       [  5,  25],
       [  5,  30],
       [  5,  35],
       [  5,  40],
       [  5,  45],
       [  5,  50],
       [  5,  55],
       [  5,  60],
       [  5,  65],
       [ 10,  25],
       [ 10,  30],
       [ 10,  35],
       [ 10,  40],
       [ 10,  45],
       [ 10,  50],
       [ 10,  55],
       [ 10,  60],
       [ 10,  65]])

# Here is how to extract the Capacity
y = data['Capacity'].values
y

array([ 1557,  1201,   892,  2509,  2098,  1726,  1398,  1112,  3684,
        3206,  2762,  2354,  1981,  1647,  5100,  4547,  4019,  3520,
        3050,  2612,  2206,  6777,  6137,  5516,  4915,  4338,  3784,
        3256,  2755,  8734,  7996,  7271,  6559,  5863,  5184,  4524,
        3883,  3264, 10989, 10144,  9304,  8471,  7646,  6831,  6027,
        5237,  4461, 13562, 12599, 11633, 10668,  9704,  8743,  7786,
        6835,  5891, 16472, 15380, 14279, 13171, 12057, 10939,  9819,
        8697,  7575])
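The problem asks for models of both Capacity and Power, so the same column-indexing pattern is needed twice. A minimal sketch, assuming a small stand-in DataFrame `data_head` (a hypothetical name) built from the first five rows of the table above, restricted to the relevant columns. It uses `.to_numpy()`, which pandas recommends over `.values`; both return the same array here.

```python
import pandas as pd

# Stand-in for `data`: first five rows of the table above, relevant columns only.
data_head = pd.DataFrame({
    "T_e": [-30, -30, -30, -25, -25],
    "T_c": [25, 30, 35, 25, 30],
    "Capacity": [1557, 1201, 892, 2509, 2098],
    "Power": [901, 881, 858, 1125, 1122],
})

# Inputs: an (n_samples, 2) array with columns T_e and T_c.
X_head = data_head[["T_e", "T_c"]].to_numpy()
# Targets: one 1D array per measured quantity we want to model.
y_capacity = data_head["Capacity"].to_numpy()
y_power = data_head["Power"].to_numpy()

print(X_head.shape, y_capacity.shape, y_power.shape)  # (5, 2) (5,) (5,)
```

On the full dataset, the same calls give a (65, 2) input array and two targets of length 65.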


Fit the following multivariate polynomial model to both the Capacity and the Power:

$$y = w_1 + w_2T_e + w_3 T_c + w_4 T_eT_c + w_5 T_e^2 + w_6T_c^2 + w_7 T_e^2T_c + w_8T_eT_c^2 + w_9 T_e^3 + w_{10}T_c^3 + \epsilon,$$

where $\epsilon$ is a Gaussian noise term with unknown variance.
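Although the model is cubic in the temperatures, it is linear in the weights, which is why it can be fit with linear regression once the design matrix is built. Stacking the $n$ observations (and assuming, as is standard, that the noise terms are independent) gives the following matrix form, where row $i$ of $\boldsymbol{\Phi}$ lists the ten monomials of the model evaluated at $(T_{e,i}, T_{c,i})$ in the same order as $w_1, \dots, w_{10}$:

```latex
\mathbf{y} = \boldsymbol{\Phi}\mathbf{w} + \boldsymbol{\epsilon},
\qquad
\boldsymbol{\epsilon} \sim \mathcal{N}\left(\mathbf{0}, \sigma^2 \mathbf{I}_n\right),
\qquad
\mathbf{w} = (w_1, \dots, w_{10})^\top,

\boldsymbol{\Phi}_{i,:} = \begin{pmatrix}
1 & T_{e,i} & T_{c,i} & T_{e,i}T_{c,i} & T_{e,i}^2 & T_{c,i}^2 & T_{e,i}^2T_{c,i} & T_{e,i}T_{c,i}^2 & T_{e,i}^3 & T_{c,i}^3
\end{pmatrix}.
```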

Hints:

• You may use `sklearn.preprocessing.PolynomialFeatures` to construct the design matrix of your polynomial features. Do not program the design matrix by hand.

• Split your data into training and test sets and use various diagnostics (e.g., the test mean square error, the coefficient of determination $R^2$, and an observations vs. predictions plot) to make sure that your models make sense.

### Part A - Fit the capacity

Please don’t just fit blindly. Split the data into training and test sets and use all the usual diagnostics.

# For polynomial features
from sklearn.preprocessing import PolynomialFeatures
# For splitting the data
from sklearn.model_selection import train_test_split
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# Here is how to make Polynomials features
poly = PolynomialFeatures(degree=3)
# Design matrix for train data
Phi_train = poly.fit_transform(X_train)
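# Fit with Bayesian Ridge. Older scikit-learn versions offered
# BayesianRidge(normalize=True), which centered the inputs and divided them by
# their l2 norm (it did not rescale the outputs). That option was removed in
# scikit-learn 1.2, so standardize the features explicitly instead, e.g. with
# sklearn.preprocessing.StandardScaler.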
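Since `BayesianRidge` no longer accepts `normalize=True` in recent scikit-learn (1.2 or later), one way to get well-scaled features is to chain the polynomial features, a `StandardScaler`, and `BayesianRidge` in a single pipeline. This is a swapped-in approach, not the original notebook's code, shown as a minimal sketch on a **hypothetical** synthetic response (not the compressor data). The pipeline ensures the test points are scaled with statistics learned from the training set only. The `_syn`, `_demo`, `_tr`, and `_te` names are illustrative and chosen so they do not overwrite your notebook's `X`, `y`, or `poly`.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical (T_e, T_c) grid spanning roughly the same ranges as the dataset (C).
Te_grid, Tc_grid = np.meshgrid(np.arange(-30, 15, 5), np.arange(25, 70, 5))
X_syn = np.column_stack([Te_grid.ravel(), Tc_grid.ravel()]).astype(float)

# Hypothetical response with Gaussian noise; this is NOT the compressor data.
rng = np.random.default_rng(0)
y_syn = (5000.0 + 300.0 * X_syn[:, 0] - 100.0 * X_syn[:, 1]
         + 2.0 * X_syn[:, 0] ** 2 + rng.normal(scale=50.0, size=X_syn.shape[0]))

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.33, random_state=0)

# Cubic features -> standardize each column -> Bayesian linear regression.
# include_bias=False because BayesianRidge fits its own intercept.
model_demo = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    StandardScaler(),
    BayesianRidge(),
)
model_demo.fit(X_tr, y_tr)

# Predictive mean and standard deviation on the held-out points;
# Pipeline.predict forwards return_std to BayesianRidge.predict.
y_te_mean, y_te_std = model_demo.predict(X_te, return_std=True)
print("test R^2:", model_demo.score(X_te, y_te))
```

For the real problem, pass your `X_train`/`y_train` split in place of the synthetic arrays; the Power model is the same pipeline refit on the Power column.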


The mean square error:

# your code here
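The mean square error on the test set is $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. A minimal sketch comparing this formula with `sklearn.metrics.mean_squared_error`, using the first three measured capacities from the table and **hypothetical** predictions chosen only for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# First three measured capacities (W) from the table above.
y_obs_demo = np.array([1557.0, 1201.0, 892.0])
# Hypothetical model predictions, for illustration only.
y_pred_demo = np.array([1500.0, 1250.0, 900.0])

mse_manual = np.mean((y_obs_demo - y_pred_demo) ** 2)
mse_sklearn = mean_squared_error(y_obs_demo, y_pred_demo)
print(mse_manual, mse_sklearn)  # both ≈ 1904.67 (units: W^2)
```

Because the MSE is in W², its square root (the RMSE, here about 43.6 W) is often easier to compare with the capacities themselves.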


The coefficient of determination $R^2$:

# your code here
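The coefficient of determination is $R^2 = 1 - \mathrm{SS}_{\text{res}}/\mathrm{SS}_{\text{tot}}$: values near 1 on the test set mean the model explains most of the variance, while values near 0 (or negative) mean it does no better than predicting the mean. A minimal sketch checking the formula against `sklearn.metrics.r2_score`, again with three measured capacities and **hypothetical** predictions:

```python
import numpy as np
from sklearn.metrics import r2_score

y_obs_demo = np.array([1557.0, 1201.0, 892.0])   # measured capacities (W) from the table
y_pred_demo = np.array([1500.0, 1250.0, 900.0])  # hypothetical predictions

ss_res = np.sum((y_obs_demo - y_pred_demo) ** 2)          # residual sum of squares
ss_tot = np.sum((y_obs_demo - y_obs_demo.mean()) ** 2)    # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot
print(r2_manual, r2_score(y_obs_demo, y_pred_demo))  # both ≈ 0.974
```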


The observations vs predictions plot:

# your code here
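In an observations vs. predictions plot, points close to the $y = x$ line indicate accurate predictions, and systematic departures from it suggest the model is missing structure. A minimal sketch with a hypothetical helper `plot_obs_vs_pred` and illustrative predictions; in the homework you would pass your test-set observations and model predictions instead.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_obs_vs_pred(y_obs, y_pred, ax=None):
    """Scatter observed vs. predicted values together with the y = x reference line."""
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(y_obs, y_pred, s=15, label="test data")
    # Reference line spanning the range of both observations and predictions.
    lo = min(np.min(y_obs), np.min(y_pred))
    hi = max(np.max(y_obs), np.max(y_pred))
    ax.plot([lo, hi], [lo, hi], "r--", label="perfect prediction")
    ax.set_xlabel("Observed capacity (W)")
    ax.set_ylabel("Predicted capacity (W)")
    ax.legend()
    return ax


y_obs_demo = np.array([1557.0, 1201.0, 892.0])   # measured capacities from the table
y_hat_demo = np.array([1500.0, 1250.0, 900.0])   # hypothetical predictions
ax_demo = plot_obs_vs_pred(y_obs_demo, y_hat_demo)
```

With a Bayesian model you can additionally show predictive uncertainty, e.g. by replacing the scatter with `ax.errorbar(y_obs, y_pred, yerr=2 * y_std, fmt="o")` using the standard deviation returned by `predict(..., return_std=True)`.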


# your code here (use as many code and text blocks as you like)