Homework 3
Type your name and email in the “Student details” section below.
Develop the code and generate the figures you need to solve the problems using this notebook.
For the answers that require a mathematical proof or derivation, you can either:
Type the answer using the built-in LaTeX capabilities. In this case, simply export the notebook as a PDF and upload it to Gradescope; or
Print the notebook (after you are done with all the code), write your answers by hand, scan them, combine your response into a single PDF, and upload it to Gradescope.
The total homework points are 100. Please note that the problems are not weighted equally.
Note
This is due before the beginning of the next lecture.
Please match all the pages corresponding to each of the questions when you submit on Gradescope.
Student details
First Name:
Last Name:
Email:
Problem 1 - Playing with matrices
Write Python code that constructs the 90-degree (counterclockwise) rotation matrix in 2D:
\(A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\)
# your code here
Verify that the matrix \(A\) rotates the vector \(x=\begin{bmatrix} 1 \\ 0 \end{bmatrix}\) by 90 degrees counterclockwise. That is, \(A x = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\).
# your code here
Create the matrix:
# your code here
Is \(B\) a rotation matrix? What is the effect of \(B\) on the vector \(x\)? Hint: To verify that \(B\) is a rotation matrix, you need to check that \(B^T\cdot B = I\) (i.e., that \(B\) is orthogonal) and that the determinant of \(B\) is 1. You can use the np.linalg.det() function to compute the determinant of a matrix.
# your code here
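The two conditions in the hint can be wrapped in a small helper. The following is a minimal sketch, not the expected solution: the function name is_rotation, the 30-degree test matrix R, and the reflection F are all illustrative assumptions and are not part of the assignment.

```python
import numpy as np


def is_rotation(M, tol=1e-12):
    """Return True if M is orthogonal (M^T M = I) and has determinant +1."""
    M = np.asarray(M, dtype=float)
    identity = np.eye(M.shape[0])
    orthogonal = np.allclose(M.T @ M, identity, atol=tol)
    proper = np.isclose(np.linalg.det(M), 1.0, atol=tol)
    return bool(orthogonal and proper)


# Hypothetical test matrices (not the homework's A or B):
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation by 30 degrees
F = np.array([[1.0,  0.0],
              [0.0, -1.0]])                      # reflection across the x-axis

print(is_rotation(R))
print(is_rotation(F))
```

The reflection F is included because it is orthogonal but has determinant -1, which shows why the determinant check is needed in addition to the orthogonality check.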
Apply the matrix \(B\) to the vector \(x\) and verify that it rotates the vector by 180 degrees counterclockwise.
# your code here
If you perform the 90-degree rotation 4 times, what is the effect of the resulting matrix on the vector \(x\)? Verify your answer by showing that multiplying \(A\) by itself 4 times results in the identity matrix.
# your code here
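Repeated multiplication of a matrix by itself can be written with np.linalg.matrix_power. The sketch below illustrates the idea on a different, hypothetical example (a 120-degree rotation applied 3 times), so the names R and R_cubed are assumptions and not part of the assignment.

```python
import numpy as np

# Hypothetical example: a 120-degree rotation, which returns to the
# starting orientation after three applications.
theta = 2.0 * np.pi / 3.0
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# matrix_power(R, 3) computes R @ R @ R.
R_cubed = np.linalg.matrix_power(R, 3)

x = np.array([1.0, 0.0])

# Compare with a tolerance, because floating-point round-off means the
# entries are only approximately 0 and 1.
print(np.allclose(R_cubed, np.eye(2)))
print(np.allclose(R_cubed @ x, x))
```

Using np.allclose rather than == is a deliberate choice here: the trigonometric entries are not exactly representable, so an exact comparison with the identity would typically fail.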
Problem 2 - Real data set from high-performance buildings
In this problem we are going to use the pandas library to analyze a real dataset collected as part of the NSF-funded project Sociotechnical Systems to Enable Energy-Aware Residential Communities.
You can find the raw dataset here.
Feel free to download it and open it up in Excel.
Uploading data files to Google Colab
The first thing we are going to do in this problem is make this file accessible from this notebook. We could simply use the download() function that I introduced in the lecture. I will put the code at the very last block for your convenience.
However, I still want to show you how you can upload your own data. Here we go.
Go to your Google Drive. We need to select a folder to put the files in. To keep things simple, let’s just dump everything in “My Drive/Colab Notebooks,” i.e., the same folder that contains the copy of this Jupyter notebook, which you should have already made. Once you have entered this folder in your Google Drive (just double click on it), drag and drop the temperature_raw.xlsx file in there.
Now we need to make the Google Drive visible from this computational session. We do this by mounting the drive. You need to run this code and follow the instructions:
# The following code does not run unless you are running the notebook on Google Colab
from google.colab import drive
drive.mount('/content/drive')
Finally, change directories so that the data files are in the current working directory:
# Only if you are running on Google Colab...
# Print working directory before changing
!pwd
# Change to the desired directory on Google drive
%cd "/content/drive/My Drive/Colab Notebooks"
# Print working directory after changing
!pwd
# List the contents of the directory - Make sure it does contain the files we want:
!ls
Now you should have access to the file and you can skip the next code block. If you tried and something went wrong, simply run the code block below and it will put the data file in the right location for you.
!curl -O https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-239-intro-to-data-science/master/data/temperature_raw.xlsx
Loading Excel files
Use the pd.read_excel function to read the temperature_raw.xlsx file.
Name the data frame you read df.
import pandas as pd
df = # your code here
Questions
Now that you have access to the data frame, let’s explore it. Answer the following questions.
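Print the first 10 rows of the data frame df using the df.head() method. Note that df.head() returns only 5 rows by default, so pass the number of rows you want as an argument.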
# Your code here
Print the last 10 rows of the data frame df using the df.tail() method. As with df.head(), the default is 5 rows.
# Your code here
Print summary statistics of the data set using the df.describe() method.
# Your code here
Notice that there is a column called “date”, but pandas is not aware that this is actually a date. Let’s make it aware of that using the pd.to_datetime() function.
df.date = pd.to_datetime(df['date'], format='%Y-%m-%d')
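To see what the conversion changes, here is a minimal sketch on a small made-up data frame. The frame toy and its values are hypothetical and unrelated to the real dataset; only the column name date and the format string are taken from the line above.

```python
import pandas as pd

# Hypothetical toy data: dates stored as plain strings.
toy = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02"],
    "value": [1.0, 2.0],
})
print(toy["date"].dtype)  # not a datetime type yet

# Parse the strings using the same format string as above.
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d")
print(toy["date"].dtype)  # now a datetime64 type

# Datetime columns expose the .dt accessor (year, month, day, ...).
print(toy["date"].dt.day.tolist())
```

Once the column has a datetime type, pandas can sort, filter, and resample by date, which is not possible while the values are plain strings.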
Count how many NaN values you have in each column. Hint: Use the df.isna() method to create a boolean data frame that is True for NaN values and False otherwise. Then, use the sum() method to count the number of True values in each column. True is equivalent to 1 and False is equivalent to 0.
# your code here
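The hint above can be tried out on a small made-up data frame first. This is only a sketch: the frame toy and its columns a, b, and c are hypothetical and unrelated to the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing entries.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": ["x", "y", "z"],
})

# isna() gives a boolean frame of the same shape; sum() then adds up
# the True values column by column.
nan_counts = toy.isna().sum()
print(nan_counts)
```

The result is a pandas Series indexed by column name, so an individual count can be read with, for example, nan_counts["b"].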
Clean the data set by dropping all NaN values. Call the cleaned data df_clean. Use the df.dropna() method.
df_clean = # your code here
Verify that there are no NaN values in df_clean.
# your code here
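As an illustration of dropping and then verifying, here is a minimal sketch on made-up data. The names toy and toy_clean are hypothetical stand-ins, not the homework's df and df_clean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the last row is complete.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, 6.0],
})

# dropna() returns a new frame without the rows that contain any NaN;
# the original frame is left unchanged.
toy_clean = toy.dropna()

# any().any() collapses the boolean frame to a single True/False.
has_nan = bool(toy_clean.isna().any().any())
print(len(toy), len(toy_clean), has_nan)
```

Because dropna() does not modify the frame it is called on, the result has to be assigned to a new name (or back to the same name) to take effect.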
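How many unique households do we have and what are their unique names? Hint: Use the unique() method on the relevant column. Note that unique() is a method of a single column (a pandas Series); a data frame itself does not have it.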
# your code here
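The following sketch shows unique() and the related nunique() on a single column of made-up data. The frame toy, the column name household, and the labels h1, h2, h3 are assumptions for illustration; check the actual column names in the real dataset.

```python
import pandas as pd

# Hypothetical toy data with repeated labels.
toy = pd.DataFrame({
    "household": ["h1", "h2", "h1", "h3", "h2"],
    "t": [20.5, 21.0, 20.7, 19.8, 21.3],
})

# unique() returns the distinct values in order of first appearance.
names = toy["household"].unique()

# nunique() returns how many distinct values there are.
count = toy["household"].nunique()

print(list(names), count)
```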
Print summary statistics for your cleaned dataset.
# your code here
Save your cleaned dataset in a csv file using df_clean.to_csv. Use the name temperature_clean.csv.
# your code here
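A quick way to check that a file was written correctly is to read it back. The sketch below does this for a made-up data frame in a temporary directory; the frame toy and the file name toy_clean.csv are hypothetical. Passing index=False is one possible choice that keeps the row index out of the file; whether to keep the index is up to you.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "household": ["h1", "h2"],
    "t": [20.5, 21.0],
})

# Write to a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_clean.csv")
    toy.to_csv(path, index=False)  # index=False: do not write the row index
    reloaded = pd.read_csv(path)

print(reloaded)
```

Without index=False, the row index is written as an extra unnamed first column, which then reappears as a column when the file is read back.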
Save the cleaned data set in Microsoft Excel format. Look at pandas.DataFrame for the right function. Call the file temperature_clean.xlsx.
# your code here
Download the newly created temperature_clean.xlsx file to your computer and open it up. There is no need to show anything here. Just do it for your own education. If you are running the Jupyter notebook on your own computer, then there is nothing to do apart from finding the file. If you are working on Google Colab, see this.