# Homework 15

Type your name and email in the “Student details” section below.

Develop the code and generate the figures you need to solve the problems using this notebook.

For the answers that require a mathematical proof or derivation you can either:

- Type the answer using the built-in LaTeX capabilities. In this case, simply export the notebook as a PDF and upload it to Gradescope; or
- Print the notebook (after you are done with all the code), write your answers by hand, scan them, combine your response into a single PDF, and upload it to Gradescope.

The total homework points are 100. Please note that the problems are not weighted equally.

Note

This is due before the beginning of the next lecture.

Please match all the pages corresponding to each of the questions when you submit on Gradescope.

## Student details

**First Name:**

**Last Name:**

**Email:**

```
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st
import pandas as pd
import requests
import os

def download(url, local_filename=None):
    """
    Downloads the file in the ``url`` and saves it in the current working directory.
    """
    data = requests.get(url)
    # Default to the last component of the URL as the local file name
    if local_filename is None:
        local_filename = os.path.basename(url)
    with open(local_filename, 'wb') as fd:
        fd.write(data.content)
```

## Problem 1 - Optimizing the performance of a compressor

In this problem we are going to need the compressor dataset (`compressor_data.xlsx`). The dataset was kindly provided to us by Professor Davide Ziviani. As before, you can either put it on your Google Drive or just download it with the code segment below:

```
url = 'https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-297-intro-to-data-science/master/data/compressor_data.xlsx'
download(url)
```

Note that this is an Excel file, so we are going to need pandas to read it (recent versions of pandas rely on the `openpyxl` package to read `.xlsx` files). Here is how:

```
import pandas as pd
data = pd.read_excel('compressor_data.xlsx')
data
```

| | T_e | DT_sh | T_c | DT_sc | T_amb | f | m_dot | m_dot.1 | Capacity | Power | Current | COP | Efficiency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -30 | 11 | 25 | 8 | 35 | 60 | 28.8 | 8.000000 | 1557 | 901 | 4.4 | 1.73 | 0.467 |
| 1 | -30 | 11 | 30 | 8 | 35 | 60 | 23.0 | 6.388889 | 1201 | 881 | 4.0 | 1.36 | 0.425 |
| 2 | -30 | 11 | 35 | 8 | 35 | 60 | 17.9 | 4.972222 | 892 | 858 | 3.7 | 1.04 | 0.382 |
| 3 | -25 | 11 | 25 | 8 | 35 | 60 | 46.4 | 12.888889 | 2509 | 1125 | 5.3 | 2.23 | 0.548 |
| 4 | -25 | 11 | 30 | 8 | 35 | 60 | 40.2 | 11.166667 | 2098 | 1122 | 5.1 | 1.87 | 0.519 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 60 | 10 | 11 | 45 | 8 | 35 | 60 | 245.2 | 68.111111 | 12057 | 2525 | 11.3 | 4.78 | 0.722 |
| 61 | 10 | 11 | 50 | 8 | 35 | 60 | 234.1 | 65.027778 | 10939 | 2740 | 12.3 | 3.99 | 0.719 |
| 62 | 10 | 11 | 55 | 8 | 35 | 60 | 222.2 | 61.722222 | 9819 | 2929 | 13.1 | 3.35 | 0.709 |
| 63 | 10 | 11 | 60 | 8 | 35 | 60 | 209.3 | 58.138889 | 8697 | 3091 | 13.7 | 2.81 | 0.693 |
| 64 | 10 | 11 | 65 | 8 | 35 | 60 | 195.4 | 54.277778 | 7575 | 3223 | 14.2 | 2.35 | 0.672 |

65 rows × 13 columns

The data are part of an experimental study of a variable speed reciprocating compressor. The experimentalists varied two temperatures, \(T_e\) and \(T_c\) (both in °C), and measured various other quantities. Our goal is to learn the map from \(T_e\) and \(T_c\) to the measured Capacity and Power (both in W). First, let’s see how you can extract only the relevant data.

```
# Here is how to extract the T_e and T_c columns and put them in a single numpy array
X = data[['T_e','T_c']].values
X
```

```
array([[-30, 25],
[-30, 30],
[-30, 35],
[-25, 25],
[-25, 30],
[-25, 35],
[-25, 40],
[-25, 45],
[-20, 25],
[-20, 30],
[-20, 35],
[-20, 40],
[-20, 45],
[-20, 50],
[-15, 25],
[-15, 30],
[-15, 35],
[-15, 40],
[-15, 45],
[-15, 50],
[-15, 55],
[-10, 25],
[-10, 30],
[-10, 35],
[-10, 40],
[-10, 45],
[-10, 50],
[-10, 55],
[-10, 60],
[ -5, 25],
[ -5, 30],
[ -5, 35],
[ -5, 40],
[ -5, 45],
[ -5, 50],
[ -5, 55],
[ -5, 60],
[ -5, 65],
[ 0, 25],
[ 0, 30],
[ 0, 35],
[ 0, 40],
[ 0, 45],
[ 0, 50],
[ 0, 55],
[ 0, 60],
[ 0, 65],
[ 5, 25],
[ 5, 30],
[ 5, 35],
[ 5, 40],
[ 5, 45],
[ 5, 50],
[ 5, 55],
[ 5, 60],
[ 5, 65],
[ 10, 25],
[ 10, 30],
[ 10, 35],
[ 10, 40],
[ 10, 45],
[ 10, 50],
[ 10, 55],
[ 10, 60],
[ 10, 65]])
```

```
# Here is how to extract the Capacity
y = data['Capacity'].values
y
```

```
array([ 1557, 1201, 892, 2509, 2098, 1726, 1398, 1112, 3684,
3206, 2762, 2354, 1981, 1647, 5100, 4547, 4019, 3520,
3050, 2612, 2206, 6777, 6137, 5516, 4915, 4338, 3784,
3256, 2755, 8734, 7996, 7271, 6559, 5863, 5184, 4524,
3883, 3264, 10989, 10144, 9304, 8471, 7646, 6831, 6027,
5237, 4461, 13562, 12599, 11633, 10668, 9704, 8743, 7786,
6835, 5891, 16472, 15380, 14279, 13171, 12057, 10939, 9819,
8697, 7575])
```

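Fit the following multivariate polynomial model to **both the Capacity and the Power**:

\[
y = w_1 + w_2 T_e + w_3 T_c + w_4 T_e T_c + w_5 T_e^2 + w_6 T_c^2 + w_7 T_e^2 T_c + w_8 T_e T_c^2 + w_9 T_e^3 + w_{10} T_c^3 + \epsilon,
\]

where \(\epsilon\) is a Gaussian noise term with unknown variance.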

**Hints:**

- You may use `sklearn.preprocessing.PolynomialFeatures` to construct the design matrix of your polynomial features. Do not program the design matrix by hand.
- You should split your data into training and test sets and use various diagnostics to make sure that your models make sense.

### Part A - Fit the capacity

Please don’t just fit blindly. Split the data into training and test sets and use all the usual diagnostics.

```
# For polynomial features
from sklearn.preprocessing import PolynomialFeatures
# For splitting the data
from sklearn.model_selection import train_test_split
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# Here is how to make polynomial features
poly = PolynomialFeatures(degree=3)
# Design matrix for train data
Phi_train = poly.fit_transform(X_train)
# Fit with Bayesian Ridge. The original hint was to pass normalize=True so that
# BayesianRidge rescales the features, but that argument was removed in
# scikit-learn 1.2; on recent versions, standardize Phi_train yourself instead
# (for example with sklearn.preprocessing.StandardScaler).
# Your code here
```
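One possible way to organize such a fit is sketched below on synthetic stand-in data, not on the compressor measurements. Everything with a `_syn` suffix, the random seed, and the made-up response are assumptions for illustration only; the sketch standardizes the polynomial features with `StandardScaler` and leaves the intercept to `BayesianRidge`.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (an assumption, NOT the compressor measurements)
rng = np.random.default_rng(0)
X_syn = np.column_stack([rng.uniform(-30, 10, 200), rng.uniform(25, 65, 200)])
y_syn = (5000.0 + 200.0 * X_syn[:, 0] - 50.0 * X_syn[:, 1]
         + 2.0 * X_syn[:, 0] ** 2 + rng.normal(0.0, 20.0, 200))

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.33, random_state=0)

# include_bias=False because BayesianRidge fits its own intercept
poly_syn = PolynomialFeatures(degree=3, include_bias=False)
scaler_syn = StandardScaler()
# Fit the feature map and the scaler on the training data only
Phi_syn_train = scaler_syn.fit_transform(poly_syn.fit_transform(Xs_train))
Phi_syn_test = scaler_syn.transform(poly_syn.transform(Xs_test))

model_syn = BayesianRidge().fit(Phi_syn_train, ys_train)
# Predictive mean and standard deviation on the held-out points
ys_mean, ys_std = model_syn.predict(Phi_syn_test, return_std=True)
print(model_syn.score(Phi_syn_test, ys_test))
```

Fitting the scaler on the training split only keeps information from the test points out of the model, which is what makes the test diagnostics meaningful.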

The mean square error:

```
# your code here
```

The coefficient of determination \(R^2\):

```
# your code here
```
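For reference, the two diagnostics named above can be computed as in the following sketch. The arrays `y_obs` and `y_hat` are hypothetical values chosen for illustration; in the homework they would be the test observations and the model predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical observed and predicted values, for illustration only
y_obs = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.5])

mse = mean_squared_error(y_obs, y_hat)
r2 = r2_score(y_obs, y_hat)

# The same quantities written out from their definitions
mse_manual = np.mean((y_obs - y_hat) ** 2)
r2_manual = 1.0 - np.sum((y_obs - y_hat) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
print(mse, r2)
```

The mean squared error carries the squared units of the output, so it is easiest to interpret next to the spread of the observations; \(R^2\) is unitless and equals 1 for a perfect fit.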

The observations vs predictions plot:

```
# your code here
```
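A minimal sketch of an observations vs predictions plot follows. The helper `plot_obs_vs_pred` is a hypothetical name introduced here, and the numbers passed to it are made up; points close to the dashed 45-degree line indicate predictions that agree with the observations.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_obs_vs_pred(y_obs, y_pred, ax=None):
    """Scatter predictions against observations with a 45-degree reference line."""
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(y_obs, y_pred, 'o', label='test points')
    # Reference line spanning the common range of both arrays
    lo = min(np.min(y_obs), np.min(y_pred))
    hi = max(np.max(y_obs), np.max(y_pred))
    ax.plot([lo, hi], [lo, hi], 'r--', label='perfect prediction')
    ax.set_xlabel('Observations')
    ax.set_ylabel('Predictions')
    ax.legend(loc='best')
    return ax

# Hypothetical values, for illustration only
ax = plot_obs_vs_pred(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2]))
```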

Would you trust your model? Explain your answer.

*your comments here*

### Part B - Fit the power

Repeat what you did above.

```
# your code here (use as many code and text blocks as you like)
```