{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(lecture03:pandas)=\n",
    "# The Python Data Analysis Library\n",
    "\n",
    "[Pandas](https://pandas.pydata.org/) is Python library that provides fast, flexible and data structures that lets you work real life data quickly. \n",
    "Typically, we import the library like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load the `catalysis.csv` file we created before. I will download it for you first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100   261  100   261    0     0    544      0 --:--:-- --:--:-- --:--:--   544\n"
     ]
    }
   ],
   "source": [
    "!curl -O https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-239-intro-to-data-science/master/data/catalysis.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The file should now be in the working directory. Let's check that:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "catalysis.csv\n"
     ]
    }
   ],
   "source": [
    "!ls catalysis.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now load the file into a pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th># Time</th>\n",
       "      <th>NO3</th>\n",
       "      <th>NO2</th>\n",
       "      <th>N2</th>\n",
       "      <th>NH3</th>\n",
       "      <th>N2O</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>500.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>250.95</td>\n",
       "      <td>107.32</td>\n",
       "      <td>18.51</td>\n",
       "      <td>3.33</td>\n",
       "      <td>4.98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>60</td>\n",
       "      <td>123.66</td>\n",
       "      <td>132.33</td>\n",
       "      <td>74.85</td>\n",
       "      <td>7.34</td>\n",
       "      <td>20.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>90</td>\n",
       "      <td>84.47</td>\n",
       "      <td>98.81</td>\n",
       "      <td>166.19</td>\n",
       "      <td>13.14</td>\n",
       "      <td>42.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>120</td>\n",
       "      <td>30.24</td>\n",
       "      <td>38.74</td>\n",
       "      <td>249.78</td>\n",
       "      <td>19.54</td>\n",
       "      <td>55.98</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>150</td>\n",
       "      <td>27.94</td>\n",
       "      <td>10.42</td>\n",
       "      <td>292.32</td>\n",
       "      <td>24.07</td>\n",
       "      <td>60.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>180</td>\n",
       "      <td>13.54</td>\n",
       "      <td>6.11</td>\n",
       "      <td>309.50</td>\n",
       "      <td>27.26</td>\n",
       "      <td>62.54</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   # Time     NO3     NO2      N2    NH3    N2O\n",
       "0       0  500.00    0.00    0.00   0.00   0.00\n",
       "1      30  250.95  107.32   18.51   3.33   4.98\n",
       "2      60  123.66  132.33   74.85   7.34  20.14\n",
       "3      90   84.47   98.81  166.19  13.14  42.10\n",
       "4     120   30.24   38.74  249.78  19.54  55.98\n",
       "5     150   27.94   10.42  292.32  24.07  60.65\n",
       "6     180   13.54    6.11  309.50  27.26  62.54"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv('catalysis.csv')\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrame is a two-dimensional data structure that can store data of different types (e.g., numbers, strings, booleans) in a table format. It is similar to a spreadsheet. The columns can have names. Here are the column names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['# Time', 'NO3', 'NO2', 'N2', 'NH3', 'N2O'], dtype='object')"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We do not particularly like the fact that the first column is has the \"#\" symbol. Let's remove it by renaming the column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Time', 'NO3', 'NO2', 'N2', 'NH3', 'N2O'], dtype='object')"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = data.rename(columns={\"# Time\": \"Time\"})\n",
    "data.columns\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's better!\n",
    "\n",
    "Let's summarize the data. The function `DataFrame.describe()` provides a summary of the data. It shows the mean, standard deviation, minimum, maximum, and quartiles of the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Time</th>\n",
       "      <th>NO3</th>\n",
       "      <th>NO2</th>\n",
       "      <th>N2</th>\n",
       "      <th>NH3</th>\n",
       "      <th>N2O</th>\n",
       "      <th>X</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>7.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>7.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>90.000000</td>\n",
       "      <td>147.257143</td>\n",
       "      <td>56.247143</td>\n",
       "      <td>158.735714</td>\n",
       "      <td>13.525714</td>\n",
       "      <td>35.198571</td>\n",
       "      <td>89.035714</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>64.807407</td>\n",
       "      <td>175.825979</td>\n",
       "      <td>55.215843</td>\n",
       "      <td>129.625326</td>\n",
       "      <td>10.498362</td>\n",
       "      <td>26.626079</td>\n",
       "      <td>44.289176</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>13.540000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>45.000000</td>\n",
       "      <td>29.090000</td>\n",
       "      <td>8.265000</td>\n",
       "      <td>46.680000</td>\n",
       "      <td>5.335000</td>\n",
       "      <td>12.560000</td>\n",
       "      <td>82.825000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>90.000000</td>\n",
       "      <td>84.470000</td>\n",
       "      <td>38.740000</td>\n",
       "      <td>166.190000</td>\n",
       "      <td>13.140000</td>\n",
       "      <td>42.100000</td>\n",
       "      <td>95.290000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>135.000000</td>\n",
       "      <td>187.305000</td>\n",
       "      <td>103.065000</td>\n",
       "      <td>271.050000</td>\n",
       "      <td>21.805000</td>\n",
       "      <td>58.315000</td>\n",
       "      <td>110.315000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>180.000000</td>\n",
       "      <td>500.000000</td>\n",
       "      <td>132.330000</td>\n",
       "      <td>309.500000</td>\n",
       "      <td>27.260000</td>\n",
       "      <td>62.540000</td>\n",
       "      <td>141.680000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Time         NO3         NO2          N2        NH3        N2O  \\\n",
       "count    7.000000    7.000000    7.000000    7.000000   7.000000   7.000000   \n",
       "mean    90.000000  147.257143   56.247143  158.735714  13.525714  35.198571   \n",
       "std     64.807407  175.825979   55.215843  129.625326  10.498362  26.626079   \n",
       "min      0.000000   13.540000    0.000000    0.000000   0.000000   0.000000   \n",
       "25%     45.000000   29.090000    8.265000   46.680000   5.335000  12.560000   \n",
       "50%     90.000000   84.470000   38.740000  166.190000  13.140000  42.100000   \n",
       "75%    135.000000  187.305000  103.065000  271.050000  21.805000  58.315000   \n",
       "max    180.000000  500.000000  132.330000  309.500000  27.260000  62.540000   \n",
       "\n",
       "                X  \n",
       "count    7.000000  \n",
       "mean    89.035714  \n",
       "std     44.289176  \n",
       "min      0.000000  \n",
       "25%     82.825000  \n",
       "50%     95.290000  \n",
       "75%    110.315000  \n",
       "max    141.680000  "
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How can we extract the data pertaining to \"NO3\"? Easy!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    500.00\n",
       "1    250.95\n",
       "2    123.66\n",
       "3     84.47\n",
       "4     30.24\n",
       "5     27.94\n",
       "6     13.54\n",
       "Name: NO3, dtype: float64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['NO3']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What type of object is this? Let's check:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(data['NO3'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Aha! It's a pandas [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html)! You can go to the documentation to see what you can do with it.\n",
    "\n",
    "What if you want to extract the data pertaining to \"NO3\" and \"NO2\"? You can do this by passing a list of column names to the dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>NO3</th>\n",
       "      <th>NO2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>500.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>250.95</td>\n",
       "      <td>107.32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>123.66</td>\n",
       "      <td>132.33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>84.47</td>\n",
       "      <td>98.81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>30.24</td>\n",
       "      <td>38.74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>27.94</td>\n",
       "      <td>10.42</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>13.54</td>\n",
       "      <td>6.11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      NO3     NO2\n",
       "0  500.00    0.00\n",
       "1  250.95  107.32\n",
       "2  123.66  132.33\n",
       "3   84.47   98.81\n",
       "4   30.24   38.74\n",
       "5   27.94   10.42\n",
       "6   13.54    6.11"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[['NO3', 'NO2']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What if I ask for a column that does not exist? Try it out to see the error.\n",
    "\n",
    "Another useful operation is to extract a subset of the dataframe based on some condition. For example, let's extract the data for the first 90 minutes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Time</th>\n",
       "      <th>NO3</th>\n",
       "      <th>NO2</th>\n",
       "      <th>N2</th>\n",
       "      <th>NH3</th>\n",
       "      <th>N2O</th>\n",
       "      <th>X</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>500.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>250.95</td>\n",
       "      <td>107.32</td>\n",
       "      <td>18.51</td>\n",
       "      <td>3.33</td>\n",
       "      <td>4.98</td>\n",
       "      <td>114.91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>60</td>\n",
       "      <td>123.66</td>\n",
       "      <td>132.33</td>\n",
       "      <td>74.85</td>\n",
       "      <td>7.34</td>\n",
       "      <td>20.14</td>\n",
       "      <td>141.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>90</td>\n",
       "      <td>84.47</td>\n",
       "      <td>98.81</td>\n",
       "      <td>166.19</td>\n",
       "      <td>13.14</td>\n",
       "      <td>42.10</td>\n",
       "      <td>95.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Time     NO3     NO2      N2    NH3    N2O       X\n",
       "0     0  500.00    0.00    0.00   0.00   0.00    0.00\n",
       "1    30  250.95  107.32   18.51   3.33   4.98  114.91\n",
       "2    60  123.66  132.33   74.85   7.34  20.14  141.68\n",
       "3    90   84.47   98.81  166.19  13.14  42.10   95.29"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[data['Time'] <= 90]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is going on here? What on earth is ``data['Time'] <= 90``? Let's print it to see:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     True\n",
       "1     True\n",
       "2     True\n",
       "3     True\n",
       "4    False\n",
       "5    False\n",
       "6    False\n",
       "Name: Time, dtype: bool"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['Time'] <= 90"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It looks like a boolean ``Series`` object. It contains ``True`` and ``False`` values. When you pass it to the dataframe, it only keeps the rows where the condition is ``True``. This is neat!\n",
    "\n",
    "What about getting the rows corresponding to times between 60 and 150 minutes? This is how you can do it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Time</th>\n",
       "      <th>NO3</th>\n",
       "      <th>NO2</th>\n",
       "      <th>N2</th>\n",
       "      <th>NH3</th>\n",
       "      <th>N2O</th>\n",
       "      <th>X</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>60</td>\n",
       "      <td>123.66</td>\n",
       "      <td>132.33</td>\n",
       "      <td>74.85</td>\n",
       "      <td>7.34</td>\n",
       "      <td>20.14</td>\n",
       "      <td>141.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>90</td>\n",
       "      <td>84.47</td>\n",
       "      <td>98.81</td>\n",
       "      <td>166.19</td>\n",
       "      <td>13.14</td>\n",
       "      <td>42.10</td>\n",
       "      <td>95.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>120</td>\n",
       "      <td>30.24</td>\n",
       "      <td>38.74</td>\n",
       "      <td>249.78</td>\n",
       "      <td>19.54</td>\n",
       "      <td>55.98</td>\n",
       "      <td>105.72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>150</td>\n",
       "      <td>27.94</td>\n",
       "      <td>10.42</td>\n",
       "      <td>292.32</td>\n",
       "      <td>24.07</td>\n",
       "      <td>60.65</td>\n",
       "      <td>84.60</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Time     NO3     NO2      N2    NH3    N2O       X\n",
       "2    60  123.66  132.33   74.85   7.34  20.14  141.68\n",
       "3    90   84.47   98.81  166.19  13.14  42.10   95.29\n",
       "4   120   30.24   38.74  249.78  19.54  55.98  105.72\n",
       "5   150   27.94   10.42  292.32  24.07  60.65   84.60"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[(data['Time'] >= 60) & (data['Time'] <= 150)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pay attention to the parentheses. They are necessary! The character ``&`` is the logical AND operator. It is applied element-wise to the ``Series`` objects. If it sees a ``True`` and a ``False``, it returns ``False``. Only if it sees two ``True``s, it returns ``True``. Note that this is different from the ``and`` operator in Python. If you try using ``and`` instead of ``&``, you will get an error.\n",
    "\n",
    "Let's do an OR operation. For example, let's get the rows where the time is either less than 60 minutes or greater than 150 minutes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Time</th>\n",
       "      <th>NO3</th>\n",
       "      <th>NO2</th>\n",
       "      <th>N2</th>\n",
       "      <th>NH3</th>\n",
       "      <th>N2O</th>\n",
       "      <th>X</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>500.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>250.95</td>\n",
       "      <td>107.32</td>\n",
       "      <td>18.51</td>\n",
       "      <td>3.33</td>\n",
       "      <td>4.98</td>\n",
       "      <td>114.91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>180</td>\n",
       "      <td>13.54</td>\n",
       "      <td>6.11</td>\n",
       "      <td>309.50</td>\n",
       "      <td>27.26</td>\n",
       "      <td>62.54</td>\n",
       "      <td>81.05</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Time     NO3     NO2      N2    NH3    N2O       X\n",
       "0     0  500.00    0.00    0.00   0.00   0.00    0.00\n",
       "1    30  250.95  107.32   18.51   3.33   4.98  114.91\n",
       "6   180   13.54    6.11  309.50  27.26  62.54   81.05"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[(data['Time'] < 60) | (data['Time'] > 150)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ``|`` operator is the logical OR operator. It is applied element-wise to the ``Series`` objects. If it sees a ``True`` or a ``False``, it returns ``True``. Only if it sees two ``False``s, it returns ``False``.\n",
    "\n",
    "How do we do a NOT operation? The operator we need is ``~``. For example, let's get the rows where the time is not between 60 and 150 minutes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Time</th>\n",
       "      <th>NO3</th>\n",
       "      <th>NO2</th>\n",
       "      <th>N2</th>\n",
       "      <th>NH3</th>\n",
       "      <th>N2O</th>\n",
       "      <th>X</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>500.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>250.95</td>\n",
       "      <td>107.32</td>\n",
       "      <td>18.51</td>\n",
       "      <td>3.33</td>\n",
       "      <td>4.98</td>\n",
       "      <td>114.91</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Time     NO3     NO2     N2   NH3   N2O       X\n",
       "0     0  500.00    0.00   0.00  0.00  0.00    0.00\n",
       "1    30  250.95  107.32  18.51  3.33  4.98  114.91"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[~(data['Time'] >= 60) & (data['Time'] <= 150)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's repeat the total mass calculation that we did before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    500.00\n",
       "1    385.09\n",
       "2    358.32\n",
       "3    404.71\n",
       "4    394.28\n",
       "5    415.40\n",
       "6    418.95\n",
       "dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "total_mass = data.iloc[:, 1:].sum(axis=1)\n",
    "total_mass\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result of such manipulations is of type [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(total_mass)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mass is not conserved in this reaction. This doesn't sound right. There should be another species that we are not taking into account. Let's call that species \"X\". The mass of that species is the total mass minus the mass of the other species:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      0.00\n",
       "1    114.91\n",
       "2    141.68\n",
       "3     95.29\n",
       "4    105.72\n",
       "5     84.60\n",
       "6     81.05\n",
       "dtype: float64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mass_X = 500 - data.iloc[:, 1:].sum(axis=1)\n",
    "mass_X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's add a column to the dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Time</th>\n",
       "      <th>NO3</th>\n",
       "      <th>NO2</th>\n",
       "      <th>N2</th>\n",
       "      <th>NH3</th>\n",
       "      <th>N2O</th>\n",
       "      <th>X</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>500.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>30</td>\n",
       "      <td>250.95</td>\n",
       "      <td>107.32</td>\n",
       "      <td>18.51</td>\n",
       "      <td>3.33</td>\n",
       "      <td>4.98</td>\n",
       "      <td>114.91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>60</td>\n",
       "      <td>123.66</td>\n",
       "      <td>132.33</td>\n",
       "      <td>74.85</td>\n",
       "      <td>7.34</td>\n",
       "      <td>20.14</td>\n",
       "      <td>141.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>90</td>\n",
       "      <td>84.47</td>\n",
       "      <td>98.81</td>\n",
       "      <td>166.19</td>\n",
       "      <td>13.14</td>\n",
       "      <td>42.10</td>\n",
       "      <td>95.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>120</td>\n",
       "      <td>30.24</td>\n",
       "      <td>38.74</td>\n",
       "      <td>249.78</td>\n",
       "      <td>19.54</td>\n",
       "      <td>55.98</td>\n",
       "      <td>105.72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>150</td>\n",
       "      <td>27.94</td>\n",
       "      <td>10.42</td>\n",
       "      <td>292.32</td>\n",
       "      <td>24.07</td>\n",
       "      <td>60.65</td>\n",
       "      <td>84.60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>180</td>\n",
       "      <td>13.54</td>\n",
       "      <td>6.11</td>\n",
       "      <td>309.50</td>\n",
       "      <td>27.26</td>\n",
       "      <td>62.54</td>\n",
       "      <td>81.05</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Time     NO3     NO2      N2    NH3    N2O       X\n",
       "0     0  500.00    0.00    0.00   0.00   0.00    0.00\n",
       "1    30  250.95  107.32   18.51   3.33   4.98  114.91\n",
       "2    60  123.66  132.33   74.85   7.34  20.14  141.68\n",
       "3    90   84.47   98.81  166.19  13.14  42.10   95.29\n",
       "4   120   30.24   38.74  249.78  19.54  55.98  105.72\n",
       "5   150   27.94   10.42  292.32  24.07  60.65   84.60\n",
       "6   180   13.54    6.11  309.50  27.26  62.54   81.05"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['X'] = mass_X\n",
    "data\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's save our work to a new file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.to_csv('catalysis-with-X.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are the contents of the new file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ",Time,NO3,NO2,N2,NH3,N2O,X\n",
      "0,0,500.0,0.0,0.0,0.0,0.0,0.0\n",
      "1,30,250.95,107.32,18.51,3.33,4.98,114.91000000000003\n",
      "2,60,123.66,132.33,74.85,7.34,20.14,141.68\n",
      "3,90,84.47,98.81,166.19,13.14,42.1,95.28999999999996\n",
      "4,120,30.24,38.74,249.78,19.54,55.98,105.71999999999997\n",
      "5,150,27.94,10.42,292.32,24.07,60.65,84.60000000000002\n",
      "6,180,13.54,6.11,309.5,27.26,62.54,81.05000000000001\n"
     ]
    }
   ],
   "source": [
    "!cat catalysis-with-X.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can save the file in different formats. For example, you can save it in Excel format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.to_excel('catalysis-with-X.xlsx')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you are working on Google Colab, download the file to your local machine. If you are working on your own computer, locate the file, and open it with Excel."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}