(applying-functions-to-dataframes)=
# Applying functions to dataframes

By now, you know that you need to load the standard libraries for plotting.
I am going to start hiding this code from now on.

```python
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
# IPython.display.set_matplotlib_formats is deprecated; matplotlib_inline provides it now
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
```

Let's download the `temp_price.csv` dataset we introduced in {ref}`lecture03:pandas`.

```python
import os

import requests


def download(url, local_filename=None):
    """
    Downloads the file in the ``url`` and saves it in the current working directory.
    """
    data = requests.get(url)
    # Stop with an error instead of silently saving an error page
    data.raise_for_status()
    if local_filename is None:
        # Default to the last part of the URL, e.g. 'temp_price.csv'
        local_filename = os.path.basename(url)
    with open(local_filename, 'wb') as fd:
        fd.write(data.content)


# The url of the file we want to download
url = 'https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-297-intro-to-data-science/master/data/temp_price.csv'
download(url)
```

Let's load it and clean it as we did before:

```python
import pandas as pd

temp_price = pd.read_csv('temp_price.csv')
# Drop every row that has at least one missing value
temp_no_null_rows = temp_price.dropna(axis=0)
# Give the price columns short names without spaces
clean_data = temp_no_null_rows.rename(columns={'Price per week': 'week_price',
                                               'Price per day': 'daily_price'})
clean_data.head()
```

|   | household | date | score | t_out | t_unit | hvac | price | week_price | daily_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | a1 | 2019-01-06 | 85 | 38.599231 | 71.580704 | 35.113758 | 0.17303 | 6.075734 | 0.867962 |
| 1 | a10 | 2019-01-06 | 70 | 38.599231 | 73.28626 | 63.949057 | 0.17303 | 11.065105 | 1.580729 |
| 2 | a11 | 2019-01-06 | 61 | 38.599231 | 74.252046 | 147.612108 | 0.17303 | 25.541323 | 3.64876 |
| 3 | a12 | 2019-01-06 | 65 | 38.599231 | 73.708482 | 74.394518 | 0.17303 | 12.872483 | 1.838926 |
| 4 | a13 | 2019-01-06 | 66 | 38.599231 | 73.549554 | 173.095836 | 0.17303 | 29.950772 | 4.278682 |
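The cleaning step above relies on two `pandas` calls: `dropna(axis=0)` drops every row that contains at least one missing value, and `rename(columns=...)` returns a new dataframe with the old column names mapped to new ones. Here is a minimal sketch on a tiny made-up dataframe (the `toy` names and values are purely illustrative) showing both effects, including the fact that the row index is not reset after dropping:

```python
import numpy as np
import pandas as pd

# A tiny made-up dataframe with one missing value (illustrative only)
toy = pd.DataFrame({
    'household': ['a1', 'a2', 'a3'],
    'Price per day': [0.87, np.nan, 3.65],
})

# axis=0 drops the rows that contain at least one NaN
toy_no_null_rows = toy.dropna(axis=0)

# rename returns a new dataframe with the mapped column names
toy_clean = toy_no_null_rows.rename(columns={'Price per day': 'daily_price'})

print(toy_clean)
print(list(toy_clean.columns))  # ['household', 'daily_price']
print(list(toy_clean.index))    # [0, 2]: row 1 was dropped, index not reset
```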


We are going to convert the temperature columns `t_out` and `t_unit` from degrees Fahrenheit (°F) to degrees Celsius (°C).
We will use the function [pandas.DataFrame.apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) to achieve this.
First, let us make a copy of the data frame:

```python
new_price_data = clean_data.copy()
```
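Why copy? Assigning a dataframe to a new name (`alias = df`) does not duplicate any data: both names refer to the same object, so a change made through one is visible through the other. `DataFrame.copy()` (a deep copy by default) returns an independent dataframe. A small sketch with made-up values (`df`, `alias` and `independent` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'t_out': [38.6, 40.1]})  # made-up values

alias = df               # same object, no data copied
independent = df.copy()  # deep copy by default

alias.loc[0, 't_out'] = 0.0         # also changes df
independent.loc[1, 't_out'] = -1.0  # does not touch df

print(alias is df)                    # True
print(df['t_out'].tolist())           # [0.0, 40.1]
print(independent['t_out'].tolist())  # [38.6, -1.0]
```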

Now we define a function that converts a temperature from degrees F to degrees C:

```python
def change_degrees_F_to_C(deg_F):
    """
    Changes the temperature from degrees F to degrees C.
    """
    return (deg_F - 32) * 5 / 9
```
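A quick sanity check uses two reference points: water freezes at 32 °F = 0 °C and boils at 212 °F = 100 °C (at standard pressure). Because the function uses only arithmetic operators, the same code also works on a whole `pandas` Series, element by element. The sketch below repeats the function so that it runs on its own; the sample Series values are made up:

```python
import pandas as pd

def change_degrees_F_to_C(deg_F):
    """Changes the temperature from degrees F to degrees C."""
    return (deg_F - 32) * 5 / 9

# Scalars: known reference points
print(change_degrees_F_to_C(32))   # 0.0
print(change_degrees_F_to_C(212))  # 100.0

# A Series: the arithmetic is applied to every element at once
temps_F = pd.Series([32.0, 50.0, 212.0])
print(change_degrees_F_to_C(temps_F).tolist())  # [0.0, 10.0, 100.0]
```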

Now we need to update the two temperature columns, `t_out` and `t_unit`.
`DataFrame.apply()` calls our function on each selected column, so a single line computes the converted values of both:

```python
clean_data[['t_out', 't_unit']].apply(change_degrees_F_to_C)
```

|   | t_out | t_unit |
|---|---|---|
| 0 | 3.66624 | 21.98928 |
| 1 | 3.66624 | 22.936811 |
| 2 | 3.66624 | 23.473359 |
| 3 | 3.66624 | 23.171379 |
| 4 | 3.66624 | 23.083085 |
| 5 | 3.66624 | 22.414698 |
| 6 | 3.66624 | 27.48804 |
| 7 | 3.66624 | 24.88028 |
| 8 | 3.66624 | 22.866871 |
| 9 | 3.66624 | 23.347305 |
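To see what `apply()` hands to the function: by default (`axis=0`) it calls the function once per column, passing that column as a `Series`; with `axis=1` it passes each row instead. Since our conversion function is vectorized, calling it directly on the sub-dataframe gives the same result. A sketch on made-up data (the `describe_input` helper is hypothetical, for illustration only):

```python
import pandas as pd

def change_degrees_F_to_C(deg_F):
    """Changes the temperature from degrees F to degrees C."""
    return (deg_F - 32) * 5 / 9

# Made-up temperatures in degrees F
df = pd.DataFrame({'t_out': [32.0, 50.0], 't_unit': [68.0, 77.0]})

# Hypothetical helper: reports what apply() passes to the function
def describe_input(x):
    return f'{type(x).__name__} named {x.name}'

print(df.apply(describe_input).tolist())
# ['Series named t_out', 'Series named t_unit']

# apply() column by column vs. calling the vectorized function directly
via_apply = df.apply(change_degrees_F_to_C)
direct = change_degrees_F_to_C(df)
print(via_apply.equals(direct))  # True
```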


Here, `apply()` called the function on each column of `clean_data[['t_out', 't_unit']]`, converting every entry.
However, `apply()` returns a new dataframe; the `clean_data` dataframe itself was not changed.
To keep the converted values, assign them back to the columns. We do this on a copy, so that `clean_data` keeps the original units:

```python
clean_data_in_deg_C = clean_data.copy()
clean_data_in_deg_C[['t_out', 't_unit']] = clean_data[['t_out', 't_unit']].apply(change_degrees_F_to_C)
clean_data_in_deg_C.head()
```

|   | household | date | score | t_out | t_unit | hvac | price | week_price | daily_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | a1 | 2019-01-06 | 85 | 3.66624 | 21.98928 | 35.113758 | 0.17303 | 6.075734 | 0.867962 |
| 1 | a10 | 2019-01-06 | 70 | 3.66624 | 22.936811 | 63.949057 | 0.17303 | 11.065105 | 1.580729 |
| 2 | a11 | 2019-01-06 | 61 | 3.66624 | 23.473359 | 147.612108 | 0.17303 | 25.541323 | 3.64876 |
| 3 | a12 | 2019-01-06 | 65 | 3.66624 | 23.171379 | 74.394518 | 0.17303 | 12.872483 | 1.838926 |
| 4 | a13 | 2019-01-06 | 66 | 3.66624 | 23.083085 | 173.095836 | 0.17303 | 29.950772 | 4.278682 |
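Assigning to a list of columns replaces only those columns and leaves the others untouched, while the source dataframe keeps its original units. An equivalent, chain-friendly alternative is `DataFrame.assign()`, which also returns a new dataframe. A sketch with made-up values (the names `df`, `df_in_C` and `df_in_C_alt` are illustrative):

```python
import pandas as pd

def change_degrees_F_to_C(deg_F):
    """Changes the temperature from degrees F to degrees C."""
    return (deg_F - 32) * 5 / 9

# Made-up data: two temperature columns in degrees F plus another column
df = pd.DataFrame({'t_out': [32.0, 50.0],
                   't_unit': [68.0, 77.0],
                   'hvac': [35.1, 63.9]})

# Pattern used above: copy, then overwrite the selected columns
df_in_C = df.copy()
df_in_C[['t_out', 't_unit']] = df[['t_out', 't_unit']].apply(change_degrees_F_to_C)

# Alternative: assign() builds a new dataframe without touching df
df_in_C_alt = df.assign(t_out=change_degrees_F_to_C(df['t_out']),
                        t_unit=change_degrees_F_to_C(df['t_unit']))

print(df_in_C.equals(df_in_C_alt))  # True
print(df['t_out'].tolist())         # [32.0, 50.0]: original unchanged
```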


## Question

+ Write code that converts the `hvac` column from kWh to MJ (megajoules).

```python
# your code here
```