Applying functions to dataframes

By now, you know that you need to load the standard libraries for plotting. I am going to start hiding this code from now on.

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')

Let’s download the temp_price.csv dataset we introduced in The Python data analysis library.

import requests
import os
def download(url, local_filename=None):
    """
    Downloads the file in the ``url`` and saves it in the current working directory.
    """
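    data = requests.get(url)
    # Stop with an error if the download failed (e.g., a 404 response)
    data.raise_for_status()
    if local_filename is None:
        # Use the last part of the url as the file name
        local_filename = os.path.basename(url)
    with open(local_filename, 'wb') as fd:
        fd.write(data.content)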
   
# The url of the file we want to download
url = 'https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-297-intro-to-data-science/master/data/temp_price.csv'
download(url)

Let’s load it and clean it as we did before:

import pandas as pd
temp_price = pd.read_csv('temp_price.csv')
temp_no_null_rows = temp_price.dropna(axis=0)
clean_data = temp_no_null_rows.rename(columns={'Price per week': 'week_price',
                                               'Price per day': 'daily_price'})
clean_data.head()
   household        date  score      t_out     t_unit        hvac    price  week_price  daily_price
0         a1  2019-01-06     85  38.599231  71.580704   35.113758  0.17303    6.075734     0.867962
1        a10  2019-01-06     70  38.599231  73.286260   63.949057  0.17303   11.065105     1.580729
2        a11  2019-01-06     61  38.599231  74.252046  147.612108  0.17303   25.541323     3.648760
3        a12  2019-01-06     65  38.599231  73.708482   74.394518  0.17303   12.872483     1.838926
4        a13  2019-01-06     66  38.599231  73.549554  173.095836  0.17303   29.950772     4.278682
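
As a reminder of what the two cleaning steps do, here is a minimal sketch on a tiny made-up data frame. The `toy` frame and its values are invented for illustration and are not part of temp_price.csv. `dropna(axis=0)` drops every row that contains at least one missing value, and `rename(columns=...)` returns a new data frame with the listed columns renamed.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only; not taken from temp_price.csv
toy = pd.DataFrame({
    'household': ['x1', 'x2', 'x3'],
    'Price per week': [7.0, np.nan, 14.0],
    'Price per day': [1.0, 2.0, 2.0],
})

# Drop the rows that have at least one missing value (the row labeled 1 here)
toy_no_null_rows = toy.dropna(axis=0)

# Rename the columns to names that are easier to type
toy_clean = toy_no_null_rows.rename(columns={'Price per week': 'week_price',
                                             'Price per day': 'daily_price'})
print(toy_clean)
```

Notice that the surviving rows keep their original labels (0 and 2 here). This is why the index of a cleaned data frame can have gaps.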

We are going to change units of temperature from degrees F to degrees C. We will use the function pandas.DataFrame.apply() to achieve this. First, let us make a copy of the data frame:

new_price_data = clean_data.copy()

Now we need to define the function that changes from degrees F to degrees C:

def change_degrees_F_to_C(deg_F):
    """
    Changes the temperature from degrees F to degrees C.
    """
    return (deg_F - 32) * 5 / 9
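
Before applying the function to the data, it does not hurt to check it on a few temperatures for which we know the answer. This is just a quick sanity check, not part of the original analysis:

```python
def change_degrees_F_to_C(deg_F):
    """
    Changes the temperature from degrees F to degrees C.
    """
    return (deg_F - 32) * 5 / 9

# Water freezes at 32 F (0 C) and boils at 212 F (100 C)
print(change_degrees_F_to_C(32))
print(change_degrees_F_to_C(212))
# The two scales agree at -40 degrees
print(change_degrees_F_to_C(-40))
```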

Now we need to apply this function to the two temperature columns, t_out and t_unit. This is a very elegant way to do it.

clean_data[['t_out', 't_unit']].apply(change_degrees_F_to_C)
t_out t_unit
0 3.66624 21.989280
1 3.66624 22.936811
2 3.66624 23.473359
3 3.66624 23.171379
4 3.66624 23.083085
5 3.66624 22.414698
6 3.66624 27.488040
7 3.66624 24.880280
8 3.66624 22.866871
9 3.66624 23.347305
10 3.66624 21.884039
11 3.66624 24.847084
12 3.66624 23.770820
13 3.66624 24.484086
14 3.66624 23.297591
15 3.66624 24.337357
16 3.66624 22.750551
17 3.66624 21.637511
18 3.66624 23.686701
19 3.66624 24.048280
20 3.66624 24.305156
21 3.66624 25.893959
22 3.66624 25.940972
23 3.66624 23.154225
24 3.66624 25.505263
25 3.66624 24.003582
27 3.66624 21.964906
28 3.66624 22.496552
29 3.66624 25.355489
30 3.66624 21.614005
32 3.66624 22.715719
34 3.66624 21.346781
35 3.66624 24.490217
36 3.66624 23.475860
37 3.66624 23.798280
38 3.66624 23.262373
39 3.66624 24.211034
40 3.66624 20.633749
42 3.66624 25.585772
43 3.66624 25.672784
44 3.66624 23.767251
45 3.66624 22.729856
46 3.66624 22.692419
47 3.66624 23.679756
48 3.66624 22.648424
49 3.66624 24.263090

In this way, we applied the function to every entry of clean_data[['t_out', 't_unit']]. However, apply() returns a new data frame; the clean_data data frame itself was not changed. To keep the converted values, you need to assign the result back to the same columns. Here we do this on a copy, so that clean_data stays in degrees F:

clean_data_in_deg_C = clean_data.copy()
clean_data_in_deg_C[['t_out', 't_unit']] = clean_data[['t_out', 't_unit']].apply(change_degrees_F_to_C)
clean_data_in_deg_C.head()
   household        date  score    t_out     t_unit        hvac    price  week_price  daily_price
0         a1  2019-01-06     85  3.66624  21.989280   35.113758  0.17303    6.075734     0.867962
1        a10  2019-01-06     70  3.66624  22.936811   63.949057  0.17303   11.065105     1.580729
2        a11  2019-01-06     61  3.66624  23.473359  147.612108  0.17303   25.541323     3.648760
3        a12  2019-01-06     65  3.66624  23.171379   74.394518  0.17303   12.872483     1.838926
4        a13  2019-01-06     66  3.66624  23.083085  173.095836  0.17303   29.950772     4.278682
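
To see more clearly what is going on, here is a small sketch on a made-up data frame (the `toy` frame and its values are invented for illustration). By default, `apply()` on a data frame passes each column to the function as a pandas Series, collects the results in a new data frame, and leaves the original untouched. Because our conversion only uses arithmetic, calling it directly on the data frame should give the same result.

```python
import pandas as pd

def change_degrees_F_to_C(deg_F):
    """
    Changes the temperature from degrees F to degrees C.
    """
    return (deg_F - 32) * 5 / 9

# Made-up temperatures in degrees F, for illustration only
toy = pd.DataFrame({'t_out': [32.0, 50.0], 't_unit': [68.0, 77.0]})

# apply() returns a new data frame with the converted values
converted = toy.apply(change_degrees_F_to_C)
print(converted)

# The original data frame is still in degrees F
print(toy)

# The function only uses arithmetic, so it also works on the whole frame
direct = change_degrees_F_to_C(toy)
print(converted.equals(direct))

# To keep the converted values, assign them to the columns of a copy
toy_in_deg_C = toy.copy()
toy_in_deg_C[['t_out', 't_unit']] = converted
```

For a simple formula like this one, the direct call is a reasonable alternative. `apply()` becomes more useful when the function cannot be written as plain arithmetic on whole columns.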

Question

  • Write code that changes the hvac column from kWh to MJ (megajoules). Recall that 1 kWh = 3.6 MJ.

# your code here