Homework 3

  • Type your name and email in the “Student details” section below.

  • Develop the code and generate the figures you need to solve the problems using this notebook.

  • For the answers that require a mathematical proof or derivation you can either:

    • Type the answer using the built-in latex capabilities. In this case, simply export the notebook as a pdf and upload it on gradescope; or

    • Print the notebook (after you are done with all the code), write your answers by hand, scan them, combine your response into a single pdf, and upload it on gradescope.

  • The total homework points are 100. Please note that the problems are not weighted equally.

Note

  • This is due before the beginning of the next lecture.

  • Please match all the pages corresponding to each of the questions when you submit on gradescope.

Student details

  • First Name:

  • Last Name:

  • Email:

Problem 1 - Playing with matrices

Write Python code that constructs the 90-degree rotation matrix in 2D:

\[\begin{split} A = \begin{bmatrix} \cos(\pi/2) & -\sin(\pi/2) \\ \sin(\pi/2) & \cos(\pi/2) \end{bmatrix} \end{split}\]
# your code here

Verify that the matrix \(A\) rotates the vector \(x=\begin{bmatrix} 1 \\ 0 \end{bmatrix}\) by 90 degrees counterclockwise. That is, \(A x = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\).

# your code here

Create the matrix:

\[ B = A\cdot A \]
# your code here

Is \(B\) a rotation matrix? What is the effect of \(B\) on the vector \(x\)? Hint: To verify that \(B\) is a rotation matrix, you need to check that \(B^T\cdot B = I\) (i.e., that \(B\) is orthogonal) and that the determinant of \(B\) is 1. You can use the np.linalg.det() function to compute the determinant of a matrix.

# your code here
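The two checks in the hint can be sketched on a different matrix. The example below is only an illustration, not the answer for \(B\): it assumes a 45-degree rotation matrix, and the names `theta`, `R`, `is_orthogonal`, and `has_unit_det` are made up for this sketch. Because of floating-point round-off, it compares with `np.allclose` and `np.isclose` rather than `==`.

```python
import numpy as np

# Illustration only: a 45-degree counterclockwise rotation (not the matrix B).
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Orthogonality check: R^T R should equal the identity up to round-off.
is_orthogonal = np.allclose(R.T @ R, np.eye(2))

# Determinant check: a rotation matrix has determinant 1.
has_unit_det = np.isclose(np.linalg.det(R), 1.0)

print(is_orthogonal, has_unit_det)
```

Exact equality tests would likely fail here, since quantities such as \(\cos(\pi/2)\) evaluate to a tiny nonzero number in floating point.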

Apply the matrix \(B\) to the vector \(x\) and verify that it rotates the vector by 180 degrees counterclockwise.

# your code here

If you perform a 90-degree rotation four times, what is the effect of the resulting matrix on the vector \(x\)? Verify your answer by showing that multiplying \(A\) by itself 4 times results in the identity matrix.

# your code here
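Repeated multiplication of a matrix by itself can be sketched as follows. This is a related illustration rather than the requested answer: it assumes a 120-degree rotation applied three times, and the names `theta`, `R`, `R3`, and `returns_to_identity` are made up for this sketch. It uses `np.linalg.matrix_power`, which is one possible way to compute a matrix power; chaining the `@` operator works as well.

```python
import numpy as np

# Illustration only: a 120-degree counterclockwise rotation.
theta = 2 * np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# R multiplied by itself three times, i.e. R @ R @ R.
R3 = np.linalg.matrix_power(R, 3)

# Three 120-degree rotations add up to a full turn, so R3 should be
# the identity matrix up to floating-point round-off.
returns_to_identity = np.allclose(R3, np.eye(2))
print(returns_to_identity)
```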

Problem 2 - Real data set from high-performance buildings

In this problem we are going to use the pandas library to analyze a real dataset collected as part of the NSF-funded project Sociotechnical Systems to Enable Energy-Aware Residential Communities. You can find the raw dataset here.

Feel free to download it and open it up in Excel.

Uploading data files to Google Colab

The first thing we are going to do in this problem is make this file accessible from this notebook. We could simply use the download() function that I introduced in the lecture. I will put the code at the very last block for your convenience. However, I still want to show you how you can upload your own data. Here we go.

Go to your Google Drive. We need to select a folder to hold the files. To keep things simple, let’s just dump everything in “My Drive/Colab Notebooks,” i.e., the same folder that contains the copy of this Jupyter notebook, which you should have already made. Once you have entered this folder in your Google Drive (just double click on it), drag and drop the temperature_raw.xlsx file in there.

Now we need to make the Google Drive visible from this computational session. We do this by mounting the drive. You need to run this code and follow the instructions:

# The following code does not run unless you are running the notebook on Google Colab
from google.colab import drive
drive.mount('/content/drive')

Finally, change directories so that the data files are in the current working directory:

# Only if you are running on Google Colab...
# Print working directory before changing
!pwd
# Change to the desired directory on Google drive
%cd "/content/drive/My Drive/Colab Notebooks"
# Print working directory after changing
!pwd
# List the contents of the directory - Make sure it does contain the files we want:
!ls

Now you should have access to the file and you can skip the next code block. If you tried and something went wrong, simply run the code block below and it will put the data file in the right location for you.

!curl -O https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-239-intro-to-data-science/master/data/temperature_raw.xlsx

Loading Excel files

Use the pd.read_excel function to read the temperature_raw.xlsx file. Name the data frame you read df.

import pandas as pd
df = # your code here

Questions

Now that you have access to the data frame, let’s explore it. Answer the following questions.

  • Print the first 10 rows of the data frame df using the df.head() method.

# Your code here
  • Print the last 10 rows of the data frame df using the df.tail() method.

# Your code here
  • Print summary statistics of the data set using the df.describe() method.

# Your code here
  • Notice that there is a column called “date”, but pandas is not aware that this is actually a date. Let’s make it aware of that using the pd.to_datetime() function.

df.date = pd.to_datetime(df['date'], format='%Y-%m-%d')
  • Count how many NaN values you have in each column. Hint: Use the df.isna() method to create a boolean data frame that is True for NaN values and False otherwise. Then, use the sum() method to count the number of True values in each column. True is equivalent to 1 and False is equivalent to 0.

# your code here
  • Clean the data set by dropping all NaN values. Call the cleaned data df_clean. Use the df.dropna() method.

df_clean = # your code here
  • Verify that there are no NaN values in df_clean.

# your code here
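  • How many unique households do we have and what are their unique names? Hint: Use the unique() method on the column that identifies the household. Note that unique() is defined on a single column (a pandas Series), not on the whole data frame.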

# your code here
  • Print summary statistics for your cleaned dataset.

# your code here
  • Save your cleaned dataset in a csv file using df_clean.to_csv. Use the name temperature_clean.csv.

# your code here
  • Save the cleaned data set in Microsoft Excel format. Look at pandas.DataFrame for the right function. Call the file temperature_clean.xlsx.

# your code here
  • Download the newly created temperature_clean.xlsx file to your computer and open it up. There is no need to show anything here. Just do it for your own education. If you are running the Jupyter notebook on your own computer, then there is nothing to do apart from finding the file. If you are working on Google Colab, see this.
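The date conversion, NaN counting, NaN dropping, and unique-value steps from the questions above can be sketched on a small made-up data frame. Everything in this example is hypothetical: the frame `toy` and its columns `date`, `unit`, and `value` are invented for illustration and are not the columns of temperature_raw.xlsx.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for this sketch.
toy = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    "unit": ["a1", "a1", "b2", "b2"],
    "value": [20.5, np.nan, 21.0, np.nan],
})

# Tell pandas that the "date" column holds dates.
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d")

# isna() gives a boolean data frame; sum() counts the True values per column.
nan_counts = toy.isna().sum()

# Drop every row that contains at least one NaN.
toy_clean = toy.dropna()

# Total number of NaN values left after cleaning.
remaining = toy_clean.isna().sum().sum()

# unique() is called on a single column (a Series), not on the data frame.
units = toy_clean["unit"].unique()

print(nan_counts)
print(remaining, units)
```

Note that `dropna()` returns a new data frame and leaves `toy` unchanged, which is why the result is assigned to a new name.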