Applying functions to dataframes
By now, you know that you need to load the standard libraries for plotting. From now on, I am going to hide this code.
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
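# Render inline figures in high resolution (IPython.display.set_matplotlib_formats is deprecated)
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')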
Let’s download the temp_price.csv dataset we introduced in The Python data analysis library.
import requests
import os
def download(url, local_filename=None):
    """
    Downloads the file in the ``url`` and saves it in the current working directory.
    """
    data = requests.get(url)
    # Stop with an error if the server did not return the file (e.g., a 404)
    data.raise_for_status()
    if local_filename is None:
        # Default to the last part of the URL, e.g. 'temp_price.csv'
        local_filename = os.path.basename(url)
    with open(local_filename, 'wb') as fd:
        fd.write(data.content)
# The url of the file we want to download
url = 'https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-297-intro-to-data-science/master/data/temp_price.csv'
download(url)
Let’s load it and clean it as we did before:
import pandas as pd
temp_price = pd.read_csv('temp_price.csv')
temp_no_null_rows = temp_price.dropna(axis=0)
clean_data = temp_no_null_rows.rename(columns={'Price per week': 'week_price',
                                               'Price per day': 'daily_price'})
clean_data.head()
 | household | date | score | t_out | t_unit | hvac | price | week_price | daily_price |
---|---|---|---|---|---|---|---|---|---|
0 | a1 | 2019-01-06 | 85 | 38.599231 | 71.580704 | 35.113758 | 0.17303 | 6.075734 | 0.867962 |
1 | a10 | 2019-01-06 | 70 | 38.599231 | 73.286260 | 63.949057 | 0.17303 | 11.065105 | 1.580729 |
2 | a11 | 2019-01-06 | 61 | 38.599231 | 74.252046 | 147.612108 | 0.17303 | 25.541323 | 3.648760 |
3 | a12 | 2019-01-06 | 65 | 38.599231 | 73.708482 | 74.394518 | 0.17303 | 12.872483 | 1.838926 |
4 | a13 | 2019-01-06 | 66 | 38.599231 | 73.549554 | 173.095836 | 0.17303 | 29.950772 | 4.278682 |
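To see what the two cleaning steps do, here is a minimal sketch on a tiny made-up dataframe. The name toy and all of its values are invented for illustration; only the two column names come from temp_price.csv. dropna(axis=0) drops every row that contains a missing value, and rename() returns a dataframe with the new column labels. Notice that the surviving rows keep their original index labels, which is why some index numbers (26, 31, 33, ...) are missing from the longer output further down.

```python
import numpy as np
import pandas as pd

# A tiny made-up table that reuses the two column names of temp_price.csv;
# the values are invented for illustration only.
toy = pd.DataFrame({
    'household': ['a1', 'a2', 'a3'],
    'Price per week': [6.0, np.nan, 25.5],
    'Price per day': [6.0 / 7, 1.5, 25.5 / 7],
})

# axis=0 drops every *row* that has at least one missing value (here: a2)
toy_no_null_rows = toy.dropna(axis=0)

# rename() maps old column labels to new ones and returns a new dataframe
toy_clean = toy_no_null_rows.rename(columns={'Price per week': 'week_price',
                                             'Price per day': 'daily_price'})

# The remaining rows keep their original index labels 0 and 2
print(toy_clean)
```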
We are going to convert the temperature columns from degrees Fahrenheit (F) to degrees Celsius (C). We will use the function pandas.DataFrame.apply() to achieve this. First, let us make a copy of the dataframe:
new_price_data = clean_data.copy()
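Why make a copy at all? copy() (deep by default) returns a dataframe that owns its own data, so changing it does not leak back into the dataframe it came from. Here is a small check; original and duplicate are hypothetical two-row stand-ins, not variables from this notebook.

```python
import pandas as pd

# Hypothetical two-row dataframe standing in for clean_data
original = pd.DataFrame({'t_out': [38.6, 40.1]})

# copy() defaults to deep=True: the new dataframe has its own data
duplicate = original.copy()
duplicate['t_out'] = 0.0

# Changing the copy leaves the original untouched
print(original['t_out'].tolist())   # [38.6, 40.1]
print(duplicate['t_out'].tolist())  # [0.0, 0.0]
```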
Next, we define a function that converts a temperature from degrees F to degrees C:
def change_degrees_F_to_C(deg_F):
    """
    Changes the temperature from degrees F to degrees C.
    """
    return (deg_F - 32) * 5 / 9
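Before applying the function to real data, it is worth checking it against temperatures whose conversions we already know. The sketch below repeats the function so that it runs on its own. It also passes a pandas Series, which works because subtraction, multiplication and division act element by element; the value 38.599231 F is the first t_out reading in the table above and should come out as about 3.66624 C, the value that appears in the converted output.

```python
import pandas as pd

def change_degrees_F_to_C(deg_F):
    """
    Changes the temperature from degrees F to degrees C.
    """
    return (deg_F - 32) * 5 / 9

# Two reference points: water freezes at 32 F and boils at 212 F
print(change_degrees_F_to_C(32))    # 0.0
print(change_degrees_F_to_C(212))   # 100.0
# -40 is the one temperature that reads the same on both scales
print(change_degrees_F_to_C(-40))   # -40.0

# The same function works on a whole Series, element by element
temps_F = pd.Series([32.0, 212.0, 38.599231])
print(change_degrees_F_to_C(temps_F).round(5).tolist())
```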
Now we need to update the two temperature columns, t_out and t_unit. Selecting both columns and calling apply() on the selection converts them in a single line:
clean_data[['t_out', 't_unit']].apply(change_degrees_F_to_C)
 | t_out | t_unit |
---|---|---|
0 | 3.66624 | 21.989280 |
1 | 3.66624 | 22.936811 |
2 | 3.66624 | 23.473359 |
3 | 3.66624 | 23.171379 |
4 | 3.66624 | 23.083085 |
5 | 3.66624 | 22.414698 |
6 | 3.66624 | 27.488040 |
7 | 3.66624 | 24.880280 |
8 | 3.66624 | 22.866871 |
9 | 3.66624 | 23.347305 |
10 | 3.66624 | 21.884039 |
11 | 3.66624 | 24.847084 |
12 | 3.66624 | 23.770820 |
13 | 3.66624 | 24.484086 |
14 | 3.66624 | 23.297591 |
15 | 3.66624 | 24.337357 |
16 | 3.66624 | 22.750551 |
17 | 3.66624 | 21.637511 |
18 | 3.66624 | 23.686701 |
19 | 3.66624 | 24.048280 |
20 | 3.66624 | 24.305156 |
21 | 3.66624 | 25.893959 |
22 | 3.66624 | 25.940972 |
23 | 3.66624 | 23.154225 |
24 | 3.66624 | 25.505263 |
25 | 3.66624 | 24.003582 |
27 | 3.66624 | 21.964906 |
28 | 3.66624 | 22.496552 |
29 | 3.66624 | 25.355489 |
30 | 3.66624 | 21.614005 |
32 | 3.66624 | 22.715719 |
34 | 3.66624 | 21.346781 |
35 | 3.66624 | 24.490217 |
36 | 3.66624 | 23.475860 |
37 | 3.66624 | 23.798280 |
38 | 3.66624 | 23.262373 |
39 | 3.66624 | 24.211034 |
40 | 3.66624 | 20.633749 |
42 | 3.66624 | 25.585772 |
43 | 3.66624 | 25.672784 |
44 | 3.66624 | 23.767251 |
45 | 3.66624 | 22.729856 |
46 | 3.66624 | 22.692419 |
47 | 3.66624 | 23.679756 |
48 | 3.66624 | 22.648424 |
49 | 3.66624 | 24.263090 |
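In this way, we applied the function to clean_data[['t_out', 't_unit']]: apply() passed each column to change_degrees_F_to_C as a Series, and the arithmetic inside the function converted every entry of that column.
However, apply() returns a new dataframe, so the clean_data dataframe itself was not changed.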
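A minimal sketch that makes both points visible, on a made-up two-column dataframe (df and its values are invented for illustration): with the default axis=0, apply() hands each column to the function as a Series, and it returns a new dataframe while df stays in degrees F. Because change_degrees_F_to_C only uses arithmetic, calling it on the whole selection directly gives the same result; apply() is mainly useful for functions that cannot operate on a whole dataframe at once.

```python
import pandas as pd

def change_degrees_F_to_C(deg_F):
    return (deg_F - 32) * 5 / 9

# Made-up temperatures in degrees F (not taken from temp_price.csv)
df = pd.DataFrame({'t_out': [32.0, 50.0], 't_unit': [68.0, 212.0]})

# With the default axis=0, apply() calls the function once per column,
# passing each column in as a pandas Series
seen_types = df.apply(lambda col: type(col).__name__)
print(seen_types.tolist())  # ['Series', 'Series']

converted = df.apply(change_degrees_F_to_C)

# apply() returns a new dataframe; df still holds degrees F
print(df['t_out'].tolist())         # [32.0, 50.0]
print(converted['t_out'].tolist())  # [0.0, 10.0]

# Pure arithmetic also works on the whole dataframe without apply()
print(converted.equals(change_degrees_F_to_C(df)))  # True
```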
To make the change stick, assign the result back to the columns of a dataframe. Here we do this on a copy, so that clean_data keeps the original values in degrees F:
clean_data_in_deg_C = clean_data.copy()
clean_data_in_deg_C[['t_out', 't_unit']] = clean_data[['t_out', 't_unit']].apply(change_degrees_F_to_C)
clean_data_in_deg_C.head()
 | household | date | score | t_out | t_unit | hvac | price | week_price | daily_price |
---|---|---|---|---|---|---|---|---|---|
0 | a1 | 2019-01-06 | 85 | 3.66624 | 21.989280 | 35.113758 | 0.17303 | 6.075734 | 0.867962 |
1 | a10 | 2019-01-06 | 70 | 3.66624 | 22.936811 | 63.949057 | 0.17303 | 11.065105 | 1.580729 |
2 | a11 | 2019-01-06 | 61 | 3.66624 | 23.473359 | 147.612108 | 0.17303 | 25.541323 | 3.648760 |
3 | a12 | 2019-01-06 | 65 | 3.66624 | 23.171379 | 74.394518 | 0.17303 | 12.872483 | 1.838926 |
4 | a13 | 2019-01-06 | 66 | 3.66624 | 23.083085 | 173.095836 | 0.17303 | 29.950772 | 4.278682 |
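If you would rather keep the original readings next to the converted ones, one alternative (not the approach used above) is DataFrame.assign(), which returns a new dataframe with extra columns. The column names t_out_C and t_unit_C, like the sample values, are made up for this sketch.

```python
import pandas as pd

def change_degrees_F_to_C(deg_F):
    return (deg_F - 32) * 5 / 9

# Made-up example data in degrees F
df = pd.DataFrame({'household': ['a1', 'a2'],
                   't_out': [50.0, 68.0],
                   't_unit': [68.0, 77.0]})

# assign() returns a new dataframe with the extra columns appended;
# the names t_out_C and t_unit_C are illustrative choices
df_with_C = df.assign(t_out_C=change_degrees_F_to_C(df['t_out']),
                      t_unit_C=change_degrees_F_to_C(df['t_unit']))

print(df_with_C.columns.tolist())
# ['household', 't_out', 't_unit', 't_out_C', 't_unit_C']
print(df_with_C['t_unit_C'].tolist())  # [20.0, 25.0]
```

Since assign() never modifies df in place, no separate copy() is needed here.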