Applying functions to dataframes
Let’s download the temp_price.csv dataset we introduced in The Python Data Analysis Library.
Let’s load it and clean it as we did before:
| | household | date | score | t_out | t_unit | hvac | price | week_price | daily_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | a1 | 2019-01-06 | 85 | 38.599231 | 71.580704 | 35.113758 | 0.17303 | 6.075734 | 0.867962 |
| 1 | a10 | 2019-01-06 | 70 | 38.599231 | 73.286260 | 63.949057 | 0.17303 | 11.065105 | 1.580729 |
| 2 | a11 | 2019-01-06 | 61 | 38.599231 | 74.252046 | 147.612108 | 0.17303 | 25.541323 | 3.648760 |
| 3 | a12 | 2019-01-06 | 65 | 38.599231 | 73.708482 | 74.394518 | 0.17303 | 12.872483 | 1.838926 |
| 4 | a13 | 2019-01-06 | 66 | 38.599231 | 73.549554 | 173.095836 | 0.17303 | 29.950772 | 4.278682 |
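The loading and cleaning code is not shown on this page. A minimal sketch of what it might look like follows, assuming that cleaning means dropping rows with missing values (the gaps in the row index of the outputs further down are consistent with that, but it is an assumption). The helper name load_and_clean and the two-row sample are hypothetical stand-ins so that the sketch runs on its own.

```python
import io

import pandas as pd


def load_and_clean(source):
    """Hypothetical helper: read a CSV and drop rows with missing values."""
    data = pd.read_csv(source)
    # Assumption: "cleaning" means removing every row that contains a NaN.
    return data.dropna(axis=0)


# Tiny made-up stand-in for temp_price.csv (not the real data).
sample_csv = io.StringIO(
    "household,date,t_out,t_unit\n"
    "a1,2019-01-06,38.6,71.6\n"
    "a2,2019-01-06,38.6,\n"
)

clean_data = load_and_clean(sample_csv)
print(clean_data)
```

With the real file in the working directory, the call would be load_and_clean("temp_price.csv").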
We are going to change units of temperature from degrees F to degrees C. We will use the function pandas.DataFrame.apply() to achieve this. First, we need to define the function that changes from degrees F to degrees C:
def change_degrees_F_to_C(deg_F):
    """
    Changes the temperature from degrees F to degrees C.
    """
    return (deg_F - 32) * 5 / 9
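Before applying the function to a dataframe, it can help to check it on values whose conversion is known. The sketch below repeats the definition so that it runs on its own and tests it on the freezing and boiling points of water.

```python
def change_degrees_F_to_C(deg_F):
    """
    Changes the temperature from degrees F to degrees C.
    """
    return (deg_F - 32) * 5 / 9


# Known reference points: water freezes at 32 F (0 C) and boils at 212 F (100 C).
freezing_C = change_degrees_F_to_C(32)
boiling_C = change_degrees_F_to_C(212)
print(freezing_C, boiling_C)
```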
Now we need to convert the two columns t_out and t_unit.
Selecting both columns and calling apply() on the selection converts them in a single statement:
clean_data[['t_out', 't_unit']].apply(change_degrees_F_to_C)
| | t_out | t_unit |
|---|---|---|
| 0 | 3.66624 | 21.989280 |
| 1 | 3.66624 | 22.936811 |
| 2 | 3.66624 | 23.473359 |
| 3 | 3.66624 | 23.171379 |
| 4 | 3.66624 | 23.083085 |
| 5 | 3.66624 | 22.414698 |
| 6 | 3.66624 | 27.488040 |
| 7 | 3.66624 | 24.880280 |
| 8 | 3.66624 | 22.866871 |
| 9 | 3.66624 | 23.347305 |
| 10 | 3.66624 | 21.884039 |
| 11 | 3.66624 | 24.847084 |
| 12 | 3.66624 | 23.770820 |
| 13 | 3.66624 | 24.484086 |
| 14 | 3.66624 | 23.297591 |
| 15 | 3.66624 | 24.337357 |
| 16 | 3.66624 | 22.750551 |
| 17 | 3.66624 | 21.637511 |
| 18 | 3.66624 | 23.686701 |
| 19 | 3.66624 | 24.048280 |
| 20 | 3.66624 | 24.305156 |
| 21 | 3.66624 | 25.893959 |
| 22 | 3.66624 | 25.940972 |
| 23 | 3.66624 | 23.154225 |
| 24 | 3.66624 | 25.505263 |
| 25 | 3.66624 | 24.003582 |
| 27 | 3.66624 | 21.964906 |
| 28 | 3.66624 | 22.496552 |
| 29 | 3.66624 | 25.355489 |
| 30 | 3.66624 | 21.614005 |
| 32 | 3.66624 | 22.715719 |
| 34 | 3.66624 | 21.346781 |
| 35 | 3.66624 | 24.490217 |
| 36 | 3.66624 | 23.475860 |
| 37 | 3.66624 | 23.798280 |
| 38 | 3.66624 | 23.262373 |
| 39 | 3.66624 | 24.211034 |
| 40 | 3.66624 | 20.633749 |
| 42 | 3.66624 | 25.585772 |
| 43 | 3.66624 | 25.672784 |
| 44 | 3.66624 | 23.767251 |
| 45 | 3.66624 | 22.729856 |
| 46 | 3.66624 | 22.692419 |
| 47 | 3.66624 | 23.679756 |
| 48 | 3.66624 | 22.648424 |
| 49 | 3.66624 | 24.263090 |
In this way, we applied the function to each column of clean_data[['t_out', 't_unit']]. Because the function uses only arithmetic, every entry of those columns is converted.
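To see what apply() actually passes to the function, here is a small sketch on made-up numbers (the toy dataframe is hypothetical, not taken from temp_price.csv). By default the function receives one whole column at a time as a pandas Series, so for a purely arithmetic function the result matches doing the arithmetic on the dataframe directly.

```python
import pandas as pd


def change_degrees_F_to_C(deg_F):
    """Changes the temperature from degrees F to degrees C."""
    return (deg_F - 32) * 5 / 9


# Hypothetical toy data in degrees F.
toy = pd.DataFrame({"t_out": [32.0, 50.0], "t_unit": [68.0, 212.0]})

# Record the type of the object that apply() hands to the function.
received = toy.apply(lambda col: type(col).__name__)
print(received)

# For an arithmetic function, apply() and direct arithmetic agree.
via_apply = toy.apply(change_degrees_F_to_C)
direct = (toy - 32) * 5 / 9
print(via_apply.equals(direct))
```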
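However, apply() returned a new dataframe; the clean_data dataframe itself was not changed.
To keep the converted values, assign the result back to the columns. Here we do it on a copy, so the original values in degrees F remain available in clean_data: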
clean_data_in_deg_C = clean_data.copy()
clean_data_in_deg_C[['t_out', 't_unit']] = clean_data[['t_out', 't_unit']].apply(change_degrees_F_to_C)
clean_data_in_deg_C.head()
| | household | date | score | t_out | t_unit | hvac | price | week_price | daily_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | a1 | 2019-01-06 | 85 | 3.66624 | 21.989280 | 35.113758 | 0.17303 | 6.075734 | 0.867962 |
| 1 | a10 | 2019-01-06 | 70 | 3.66624 | 22.936811 | 63.949057 | 0.17303 | 11.065105 | 1.580729 |
| 2 | a11 | 2019-01-06 | 61 | 3.66624 | 23.473359 | 147.612108 | 0.17303 | 25.541323 | 3.648760 |
| 3 | a12 | 2019-01-06 | 65 | 3.66624 | 23.171379 | 74.394518 | 0.17303 | 12.872483 | 1.838926 |
| 4 | a13 | 2019-01-06 | 66 | 3.66624 | 23.083085 | 173.095836 | 0.17303 | 29.950772 | 4.278682 |
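The copy-then-assign pattern can be checked on a small example. The sketch below uses a hypothetical toy dataframe in place of clean_data and confirms that the original keeps its values in degrees F while the copy holds degrees C.

```python
import pandas as pd


def change_degrees_F_to_C(deg_F):
    """Changes the temperature from degrees F to degrees C."""
    return (deg_F - 32) * 5 / 9


# Hypothetical toy data standing in for clean_data.
toy = pd.DataFrame(
    {"household": ["a1", "a10"], "t_out": [32.0, 50.0], "t_unit": [68.0, 212.0]}
)

# Work on a copy so that the original stays in degrees F.
toy_in_deg_C = toy.copy()
temperature_columns = ["t_out", "t_unit"]
toy_in_deg_C[temperature_columns] = toy[temperature_columns].apply(
    change_degrees_F_to_C
)

print(toy["t_out"].tolist())           # original, still in degrees F
print(toy_in_deg_C["t_out"].tolist())  # copy, now in degrees C
```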
Question
Write code that changes the hvac column from kWh to MJ (megajoules).
# your code here