Homework 15

Type your name and email in the “Student details” section below.
Develop the code and generate the figures you need to solve the problems using this notebook.
For the answers that require a mathematical proof or derivation you can either:
- Type the answer using the built-in LaTeX capabilities. In this case, simply export the notebook as a PDF and upload it to Gradescope; or
- Print the notebook (after you are done with all the code), write your answers by hand, scan them, combine your response into a single PDF, and upload it to Gradescope.
The total homework points are 100. Please note that the problems are not weighted equally.

Note

This is due before the beginning of the next lecture.
Please match all the pages corresponding to each of the questions when you submit on Gradescope.

Student details

First Name:
Last Name:
Email:

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
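# set_matplotlib_formats now lives in matplotlib_inline; the IPython.display
# version is deprecated
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')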
import numpy as np
import scipy.stats as st
import pandas as pd
import requests
import os
def download(url, local_filename=None):
    """
    Downloads the file in the ``url`` and saves it in the current working directory.
    """
    data = requests.get(url)
    # Fail early on an HTTP error instead of saving an error page to disk
    data.raise_for_status()
    if local_filename is None:
        local_filename = os.path.basename(url)
    with open(local_filename, 'wb') as fd:
        fd.write(data.content)

Problem 1 - Optimizing the performance of a compressor

In this problem we are going to need this dataset. The dataset was kindly provided to us by Professor Davide Ziviani. As before, you can either put it on your Google Drive or just download it with the code segment below:

url = 'https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-297-intro-to-data-science/master/data/compressor_data.xlsx'
download(url)

Note that this is an Excel file, so we are going to need pandas to read it. Here is how:

import pandas as pd
data = pd.read_excel('compressor_data.xlsx')
data

	T_e	DT_sh	T_c	DT_sc	T_amb	f	m_dot	m_dot.1	Capacity	Power	Current	COP	Efficiency
0	-30	11	25	8	35	60	28.8	8.000000	1557	901	4.4	1.73	0.467
1	-30	11	30	8	35	60	23.0	6.388889	1201	881	4.0	1.36	0.425
2	-30	11	35	8	35	60	17.9	4.972222	892	858	3.7	1.04	0.382
3	-25	11	25	8	35	60	46.4	12.888889	2509	1125	5.3	2.23	0.548
4	-25	11	30	8	35	60	40.2	11.166667	2098	1122	5.1	1.87	0.519
...	...	...	...	...	...	...	...	...	...	...	...	...	...
60	10	11	45	8	35	60	245.2	68.111111	12057	2525	11.3	4.78	0.722
61	10	11	50	8	35	60	234.1	65.027778	10939	2740	12.3	3.99	0.719
62	10	11	55	8	35	60	222.2	61.722222	9819	2929	13.1	3.35	0.709
63	10	11	60	8	35	60	209.3	58.138889	8697	3091	13.7	2.81	0.693
64	10	11	65	8	35	60	195.4	54.277778	7575	3223	14.2	2.35	0.672

65 rows × 13 columns

The data are part of an experimental study of a variable-speed reciprocating compressor. The experimentalists varied two temperatures, \(T_e\) and \(T_c\) (both in °C), and measured various other quantities. Our goal is to learn the map from \(T_e\) and \(T_c\) to the measured Capacity and Power (both in W). First, let’s see how you can extract only the relevant data.

# Here is how to extract the T_e and T_c columns and put them in a single numpy array
X = data[['T_e','T_c']].values
X

array([[-30,  25],
       [-30,  30],
       [-30,  35],
       [-25,  25],
       [-25,  30],
       [-25,  35],
       [-25,  40],
       [-25,  45],
       [-20,  25],
       [-20,  30],
       [-20,  35],
       [-20,  40],
       [-20,  45],
       [-20,  50],
       [-15,  25],
       [-15,  30],
       [-15,  35],
       [-15,  40],
       [-15,  45],
       [-15,  50],
       [-15,  55],
       [-10,  25],
       [-10,  30],
       [-10,  35],
       [-10,  40],
       [-10,  45],
       [-10,  50],
       [-10,  55],
       [-10,  60],
       [ -5,  25],
       [ -5,  30],
       [ -5,  35],
       [ -5,  40],
       [ -5,  45],
       [ -5,  50],
       [ -5,  55],
       [ -5,  60],
       [ -5,  65],
       [  0,  25],
       [  0,  30],
       [  0,  35],
       [  0,  40],
       [  0,  45],
       [  0,  50],
       [  0,  55],
       [  0,  60],
       [  0,  65],
       [  5,  25],
       [  5,  30],
       [  5,  35],
       [  5,  40],
       [  5,  45],
       [  5,  50],
       [  5,  55],
       [  5,  60],
       [  5,  65],
       [ 10,  25],
       [ 10,  30],
       [ 10,  35],
       [ 10,  40],
       [ 10,  45],
       [ 10,  50],
       [ 10,  55],
       [ 10,  60],
       [ 10,  65]])

# Here is how to extract the Capacity
y = data['Capacity'].values
y

array([ 1557,  1201,   892,  2509,  2098,  1726,  1398,  1112,  3684,
        3206,  2762,  2354,  1981,  1647,  5100,  4547,  4019,  3520,
        3050,  2612,  2206,  6777,  6137,  5516,  4915,  4338,  3784,
        3256,  2755,  8734,  7996,  7271,  6559,  5863,  5184,  4524,
        3883,  3264, 10989, 10144,  9304,  8471,  7646,  6831,  6027,
        5237,  4461, 13562, 12599, 11633, 10668,  9704,  8743,  7786,
        6835,  5891, 16472, 15380, 14279, 13171, 12057, 10939,  9819,
        8697,  7575])

Fit the following multivariate polynomial model to both the Capacity and the Power:

\[ y = w_1 + w_2T_e + w_3 T_c + w_4 T_eT_c + w_5 T_e^2 + w_6T_c^2 + w_7 T_e^2T_c + w_8T_eT_c^2 + w_9 T_e^3 + w_{10}T_c^3 + \epsilon, \]

where \(\epsilon\) is a Gaussian noise term with unknown variance.

Hints:

- You may use sklearn.preprocessing.PolynomialFeatures to construct the design matrix of your polynomial features. Do not program the design matrix by hand.
- You should split your data into training and test sets and use various diagnostics to make sure that your models make sense.

Part A - Fit the capacity

Please don’t just fit blindly. Split the data into training and test sets and use all the usual diagnostics.

# For polynomial features
from sklearn.preprocessing import PolynomialFeatures
# For splitting the data
from sklearn.model_selection import train_test_split
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# Here is how to make polynomial features
poly = PolynomialFeatures(degree=3)
# Design matrix for train data
Phi_train = poly.fit_transform(X_train)
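# Fit with Bayesian Ridge. Scale the features to reasonable values first
# (subtract their empirical mean and divide by their empirical standard deviation).
# Older scikit-learn versions accepted BayesianRidge(normalize=True) for a similar purpose;
# that argument was removed in version 1.2, so on recent versions
# use sklearn.preprocessing.StandardScaler instead.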
# Your code here
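As a rough sketch of the scaling-plus-fitting step, here is one way to chain the pieces on synthetic data. Everything in it (`X_demo`, `y_demo`, the made-up response function, the choice of a pipeline) is an assumption for illustration, not the compressor data and not the required solution.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in data (NOT the compressor measurements)
X_demo = rng.uniform(-1.0, 1.0, size=(40, 2))
y_demo = (1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] ** 2
          + 0.05 * rng.normal(size=40))

# include_bias=False because BayesianRidge fits its own intercept
model_demo = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    StandardScaler(),
    BayesianRidge(),
)
model_demo.fit(X_demo, y_demo)

# Predictive mean and standard deviation at the first three inputs
y_mean, y_std = model_demo.predict(X_demo[:3], return_std=True)
```

Putting the scaler inside the pipeline means it is fitted on the training inputs only and then reused unchanged on the test inputs.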

The mean square error:

# your code here
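As a reminder of the definition, the mean square error is the average of the squared differences between observations and predictions. A minimal sketch on made-up numbers (the arrays `y_obs` and `y_hat` are assumptions for illustration; in the homework you would use the test outputs and your model's test predictions):

```python
import numpy as np

# Made-up observations and predictions, for illustration only
y_obs = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Average of the squared residuals
mse = np.mean((y_obs - y_hat) ** 2)
print(mse)  # 0.375
```

Keep in mind that the value has the units of the output squared, so it is easiest to interpret relative to the spread of the observations.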

The coefficient of determination \(R^2\):

# your code here
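The coefficient of determination compares the residual sum of squares to the total sum of squares around the mean of the observations, \(R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}\). A minimal sketch on made-up numbers (the arrays `y_true` and `y_model` are assumptions for illustration):

```python
import numpy as np

# Made-up observations and predictions, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_model = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_model) ** 2)        # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(r2)
```

A value close to 1 means the model explains most of the variance of the observations; a value near 0 means it does no better than predicting their mean.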

The observations vs predictions plot:

# your code here
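One common layout for this plot puts the observations on the horizontal axis, the predictions on the vertical axis, and adds the diagonal on which a perfect model would place every point. A minimal sketch on made-up numbers (the arrays `obs` and `pred` are assumptions for illustration):

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up observations and predictions, for illustration only
obs = np.array([3.0, -0.5, 2.0, 7.0])
pred = np.array([2.5, 0.0, 2.0, 8.0])

fig, ax = plt.subplots()
ax.scatter(obs, pred, label="test points")

# Diagonal y = x: points on this line are predicted exactly
lims = [min(obs.min(), pred.min()), max(obs.max(), pred.max())]
ax.plot(lims, lims, "r--", label="perfect prediction")

ax.set_xlabel("Observations")
ax.set_ylabel("Predictions")
ax.legend()
```

Systematic departures from the diagonal, such as curvature or a consistent offset, suggest that the model is missing structure in the data.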

Would you trust your model? Explain your answer.

your comments here

Part B - Fit the power

Repeat what you did above.

# your code here (use as many code and text blocks as you like)

Introduction to Data Science for Mechanical Engineers (Lecture Book)
