Homework 7#

  • Type your name and email in the “Student details” section below.

  • Develop the code and generate the figures you need to solve the problems using this notebook.

  • For the answers that require a mathematical proof or derivation you can either:

    • Type the answer using the built-in latex capabilities. In this case, simply export the notebook as a pdf and upload it on gradescope; or

    • You can print the notebook (after you are done with all the code), write your answers by hand, scan them, combine your response into a single pdf, and upload it on gradescope.

  • The total homework points are 100. Please note that the problems are not weighted equally.

Note

  • This is due before the beginning of the next lecture.

  • Please match all the pages corresponding to each of the questions when you submit on gradescope.

# Here are some modules that you may need - please run this block of code:
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_context('paper')
import numpy as np
import scipy
import scipy.stats as st
# A helper function for downloading files
import requests
import os
def download(url, local_filename=None):
    """
    Downloads the file in the ``url`` and saves it in the current working directory.
    """
    data = requests.get(url)
    if local_filename is None:
        local_filename = os.path.basename(url)
    with open(local_filename, 'wb') as fd:
        fd.write(data.content)

Student details#

  • First Name:

  • Last Name:

  • Email:

Problem 1 - Blackjack probabilities#

Blackjack is a popular card game. The background information \(I\) captures the basic rules of the game relevant to this problem:

We have a deck of 52 cards. The deck includes four aces (A), four versions of each number from 2 to 10, and four versions of each of the figures J, Q, and K. In blackjack, every card is associated with a number. The cards that have a number on them are associated with that number. The figures J, Q, and K are associated with the number 10. An ace (A) can count as either 1 or 11. The deck of cards is shuffled adequately.

Now consider the logical proposition \(A\) (blackjack):

You draw two cards at random from the deck without replacement. You either have two aces (AA) or the maximum sum of the numbers associated with the cards is 21. For example: (10, A), (J, A), etc.

2.A - Finding the probability of \(A\) using the principle of insufficient reason#

  • Find the number of ways in which you can choose two unique cards from the deck. Hint: Google “N choose k”.
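
As a small illustration of the "N choose k" hint, the sketch below counts the unordered pairs that can be formed from a toy collection of five items, once with the closed-form formula and once by brute-force enumeration. The list `items` and the variable names are made up for this example and do not come from the problem; the same two tools apply to a collection of any size.

```python
from itertools import combinations
from math import comb

# Hypothetical toy collection of five items (this is NOT the card deck).
items = ['a', 'b', 'c', 'd', 'e']

# Closed-form count: N! / (k! (N - k)!) with N = 5 and k = 2.
n_pairs_formula = comb(len(items), 2)

# Brute-force count: list every unordered pair and count them.
all_pairs = list(combinations(items, 2))
n_pairs_enumerated = len(all_pairs)

print(n_pairs_formula, n_pairs_enumerated)  # prints: 10 10
```

The two counts agree because `itertools.combinations` yields each unordered pair exactly once.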

Answer:

Your answer here.

  • Find the number of ways in which you can get two cards that sum to 21. Hint: Enumerate all possibilities.

Answer:

Your answer here.

  • Find the probability that you pick two cards that sum to 21, i.e., find \(p(A|I)\). Hint: Use the principle of insufficient reason.

Answer:

Your answer here.

2.B - Estimating the probability of A by simulation#

In this problem, we are going to use Monte Carlo simulations to estimate the probability of picking two cards that sum to 21, i.e., \(p(A|I)\). Basically, we are going to simulate the process of picking these two cards. First, let’s start by making all the different cards that appear in a deck of 52. In what follows, I use the following conventions:

  • ‘d’ stands for ‘diamonds’.

  • ‘h’ stands for ‘hearts’.

  • ‘s’ stands for ‘spades’.

  • ‘c’ stands for ‘clubs’.

Numbers stand for themselves. And finally:

  • ‘A’ for ‘ace’.

  • ‘J’ for ‘jack’.

  • ‘Q’ for ‘queen’.

  • ‘K’ for ‘king’.

For example, if you see the string ‘2h’, this is the ‘two of hearts’. If you see ‘Ad’, this is the ‘ace of diamonds’. And so on. Let’s make a deck of cards:

deck = []
for n in ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']:
    for c in ['d', 'h', 's', 'c']:
        card = n + c
        deck.append(card)
print(deck)

We can use numpy.random.shuffle to shuffle the deck in place. Here is how:

np.random.shuffle(deck)
print(deck)
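
To make the "in place" behavior concrete, here is a minimal sketch that rebuilds the same 52-card deck under a different name (`demo_deck`, chosen for this example), shuffles it, and checks that the call returns `None` while the list still holds the same cards. The seed value is arbitrary and is only there so that repeated runs give the same order.

```python
import numpy as np

# Rebuild the deck so that this block runs on its own.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['d', 'h', 's', 'c']
demo_deck = [r + s for r in ranks for s in suits]
original_order = list(demo_deck)  # copy kept for comparison

np.random.seed(12345)  # arbitrary seed, used only for reproducibility
return_value = np.random.shuffle(demo_deck)

print(return_value)                                 # shuffle returns None
print(len(demo_deck), len(set(demo_deck)))          # still 52 distinct cards
print(sorted(demo_deck) == sorted(original_order))  # same cards, reordered
```

Because the list itself is modified, writing `deck = np.random.shuffle(deck)` would replace the deck with `None`.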

Once the deck is shuffled, you can pick two cards at random by just picking the first two cards of the deck:

my_cards = deck[:2]
print(my_cards)

Now, let’s write a function that calculates the sum of the cards. I wrote the function so that it only works with two cards. It will always use 11 for aces.

def count_cards(cards):
    """Counts cards according to blackjack conventions.
    
    Arguments:
    cards   -   Two cards. They must be strings from the deck.
    
    Returns: The blackjack value of the cards.
    """
    assert len(cards) == 2, 'This only works for two cards.'
    s = 0
    for c in cards:
        n = c[0]
        if n == 'A':
            s += 11
        elif n == 'J' or n == 'Q' or n == 'K':
            s += 10
        elif len(c) == 3: # this is the case of '10d', '10h', etc.
            s += 10
        else:
            s += int(n)
    return s
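
For comparison, the same counting rule can also be written with a lookup table. This is only an alternative sketch, not a replacement for `count_cards`; the names `RANK_VALUES`, `card_value`, and `two_card_total` are hypothetical. It relies on the naming convention above, where the last character of a card string is the suit and everything before it is the rank.

```python
# Hypothetical lookup table: rank string -> blackjack value (aces fixed at 11).
RANK_VALUES = {str(k): k for k in range(2, 11)}
RANK_VALUES.update({'J': 10, 'Q': 10, 'K': 10, 'A': 11})

def card_value(card):
    """Value of a single card string such as '10d' or 'Ah'."""
    rank = card[:-1]  # everything except the final suit character
    return RANK_VALUES[rank]

def two_card_total(cards):
    """Sum of two cards, always counting an ace as 11."""
    assert len(cards) == 2, 'This only works for two cards.'
    return sum(card_value(c) for c in cards)

print(two_card_total(['Ah', 'Kd']))   # ace + king -> 21
print(two_card_total(['10s', '7c']))  # ten + seven -> 17
print(two_card_total(['As', 'Ac']))   # two aces, both counted as 11 -> 22
```

Note that with aces always counted as 11, two aces total 22 rather than 21, which may matter depending on how the AA case in proposition \(A\) is treated.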

Let’s test it a few times:

print(my_cards)
print(count_cards(my_cards))

Do it ten times at random:

for i in range(10):
    np.random.shuffle(deck)
    my_cards = deck[:2]
    sum_of_cards = count_cards(my_cards)
    print(my_cards, ' sum to: ', sum_of_cards)

  • Now, we have everything we need. Complete the following code, which uses the Monte Carlo method to estimate the probability of randomly picking two cards that sum to 21. Feel free to experiment with the number of simulations so that you get an accurate estimate.

# The number of experiments you want to simulate
num_exp = 1000
# This is a list in which we are going to put the result
# of each "experiment". We will record a 1 (one) if the experiment
# is successful (cards sum to 21) and a 0 (zero) otherwise
result = []
# Loop over experiments
for i in range(num_exp):
    # YOUR CODE HERE (shuffle the deck)
    my_cards = # YOUR CODE HERE (pick the first two cards from the deck)
    sum_of_cards = # YOUR CODE HERE (find the sum of the cards)
    # YOUR CODE HERE (write a conditional statement that appends 1
    #                 to result if the sum of cards is 21 and 0 otherwise)
p_A_g_I = # YOUR CODE HERE (use result to estimate the probability of getting two
        #                 cards that sum to 21)
print(f'p(A|I) ~= {p_A_g_I:1.5f}')
  • Plot the estimate of \(p(A|I)\) as a function of the number of experiments. In the same plot, use a red dashed line to mark the true value of \(p(A|I)\) based on your answer to the very first question. Hint: See the discussion at the very end of Estimating probabilities by simulation.
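
To illustrate what "the estimate as a function of the number of experiments" means, the sketch below computes a running estimate for a different, simpler event: rolling a six with a fair die, whose true probability is 1/6. The event, the seed, and all variable names are assumptions made for this example; the point is only the cumulative-sum pattern, in which the estimate after n experiments is the number of successes so far divided by n.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
num_rolls = 20000

# Record 1.0 if a roll is a six and 0.0 otherwise.
rolls = rng.integers(1, 7, size=num_rolls)  # integers in {1, ..., 6}
hits = (rolls == 6).astype(float)

# Running estimate after n rolls: (number of sixes so far) / n.
n = np.arange(1, num_rolls + 1)
running_estimate = np.cumsum(hits) / n

true_value = 1.0 / 6.0
print(f'estimate after {num_rolls} rolls: {running_estimate[-1]:.4f}')
print(f'true value:                  {true_value:.4f}')
```

Passing `n` and `running_estimate` to `plt.plot`, and adding `plt.axhline(true_value, color='r', linestyle='--')`, would give a plot of the kind requested here, with the reference value drawn as a red dashed line.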

# Your code here

Problem 2 - Predicting the probability of major earthquakes in Southern California#

We will use the Southern California Earthquake Data Center catalog in this problem. The catalog contains all earthquakes recorded from 1932 until now in Southern California. Do not worry about how I get the data. Just run the code and it will produce a nice dataframe that you can play with. Our goal is to estimate the probability of a major earthquake (to be defined below) somewhere in Southern California during a given year.

First, let’s download the data and put them in a dataframe.

for year in range(1932, 2021):
    print('Downloading year', year)
    url = f'https://raw.githubusercontent.com/SCEDC/SCEDC-catalogs/master/SCEC_DC/{year}.catalog'
    !curl -O $url
Downloading year 1932
...
Downloading year 2020
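
The `!curl` line in the download cell is an IPython shell escape, so it only works inside a notebook and only where `curl` is installed. A pure-Python alternative could look like the sketch below. The base URL is copied from the download cell; `catalog_url` and `fetch_catalog` are hypothetical helper names, and the choice to skip files that already exist on disk is an assumption made to avoid repeated downloads.

```python
import os
import urllib.request

# Base URL taken from the download cell in this notebook.
BASE_URL = 'https://raw.githubusercontent.com/SCEDC/SCEDC-catalogs/master/SCEC_DC'

def catalog_url(year):
    """URL of the catalog file for a given year."""
    return f'{BASE_URL}/{year}.catalog'

def fetch_catalog(year, dest_dir='.'):
    """Download one yearly catalog unless it is already on disk (hypothetical helper)."""
    local_path = os.path.join(dest_dir, f'{year}.catalog')
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(catalog_url(year), local_path)
    return local_path

# Example usage (needs network access, so it is left commented out):
# for year in range(1932, 2021):
#     fetch_catalog(year)
print(catalog_url(1932))
```

The files end up with the same names (`1932.catalog`, and so on) that the reading code in this notebook expects.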

Each one of these is a csv file. We will put them all in the same dataframe for your convenience:

import pandas as pd
list_of_dfs = []
for year in range(1932, 2021):
    filename = '{0:d}.catalog'.format(year)
    print('Reading: ', filename)
    df_year = pd.read_csv(filename, sep=r'\s+', comment='#',
                      names=['Date', 'Hour', 'ET', 'GT', 'MAG', 'M', 'LAT', 'LON',
                               'DEPTH', 'Q', 'EVID', 'NPH', 'NGRM'])
    df_year.Date = pd.to_datetime(df_year['Date'], format='%Y/%m/%d')
    list_of_dfs.append(df_year)
df = pd.concat(list_of_dfs, ignore_index=True)
df['Year'] = pd.DatetimeIndex(df['Date']).year
Reading:  1932.catalog
...
Reading:  2020.catalog
df.round(2)
Date Hour ET GT MAG M LAT LON DEPTH Q EVID NPH NGRM Year
0 1932-01-01 23:52:07.87 eq l 0.00 n 34.13 -117.99 6.0 D 3358386 7 0 1932
1 1932-01-02 16:42:43.68 eq l 2.73 l 33.90 -117.64 6.0 C 3358387 12 0 1932
2 1932-01-03 17:58:10.01 eq l 3.00 h 32.00 -116.00 6.0 D 3358388 7 0 1932
3 1932-01-04 21:30:00.96 eq l 2.00 h 33.77 -117.49 6.0 C 3358396 11 0 1932
4 1932-01-05 02:37:27.96 eq l 1.50 h 33.56 -118.44 6.0 C 3358398 8 0 1932
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
818704 2020-12-31 20:50:40.05 eq l 0.33 l 33.67 -116.76 14.0 A 39508231 11 1933 2020
818705 2020-12-31 21:37:28.19 qb l 1.23 l 32.59 -116.87 -0.3 C 39508247 38 2085 2020
818706 2020-12-31 22:15:56.72 eq l 1.45 l 33.35 -116.42 13.4 A 39508263 69 2079 2020
818707 2020-12-31 22:21:28.12 eq l 2.12 l 33.18 -115.60 3.9 A 39508279 70 2121 2020
818708 2020-12-31 23:04:53.02 eq l 1.66 l 36.06 -117.38 2.7 A 39508287 50 1659 2020

818709 rows × 14 columns

Each row in this dataframe corresponds to an earthquake event that happened between 1/1/1932 and 12/31/2020. The meaning of the columns is explained here. But for the purposes of this problem we will only need information from the following columns:

  • Year: This is the year of the event.

  • ET: This is the type of the event. The seismometers pick up more than earthquakes, e.g., explosions. We are only interested in earthquake events, which are labeled eq.

  • MAG: This is the magnitude of the event.

Let’s play with the data set to gain some experience. First, let’s extract all data for a random year. Say, year 2019.

df_2019 = df[df['Year'] == 2019]
df_2019
Date Hour ET GT MAG M LAT LON DEPTH Q EVID NPH NGRM Year
717665 2019-01-01 01:39:57.67 eq l 0.83 l 33.506 -116.794 5.0 A 38412384 45 1154 2019
717666 2019-01-01 01:43:28.42 eq l 0.47 l 33.484 -116.785 5.5 A 38412392 36 1309 2019
717667 2019-01-01 02:27:57.45 eq l 0.98 l 33.505 -116.798 3.5 A 38412416 52 1410 2019
717668 2019-01-01 02:31:17.10 eq l 0.61 l 33.511 -116.794 2.7 A 38412424 39 825 2019
717669 2019-01-01 02:38:38.38 eq l 0.61 l 33.706 -116.808 15.3 A 38412432 27 1135 2019
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
781419 2019-12-31 22:47:36.78 eq l 1.47 l 35.638 -117.460 6.9 A 39019127 34 1906 2019
781420 2019-12-31 22:52:18.08 eq l 0.23 l 33.589 -116.805 5.5 A 39019135 38 1665 2019
781421 2019-12-31 23:05:39.85 eq l 0.94 l 35.755 -117.559 6.7 A 39019143 19 691 2019
781422 2019-12-31 23:18:31.60 eq l 0.74 l 35.721 -117.555 3.8 A 39019151 38 476 2019
781423 2019-12-31 23:28:38.45 eq l 1.45 l 33.018 -116.000 1.1 A 39019159 41 1790 2019

63759 rows × 14 columns

Out of these, we only care about earthquake events. So, let’s filter out everything else:

df_2019_eq = df_2019[df_2019['ET'] == 'eq']
df_2019_eq
Date Hour ET GT MAG M LAT LON DEPTH Q EVID NPH NGRM Year
717665 2019-01-01 01:39:57.67 eq l 0.83 l 33.506 -116.794 5.0 A 38412384 45 1154 2019
717666 2019-01-01 01:43:28.42 eq l 0.47 l 33.484 -116.785 5.5 A 38412392 36 1309 2019
717667 2019-01-01 02:27:57.45 eq l 0.98 l 33.505 -116.798 3.5 A 38412416 52 1410 2019
717668 2019-01-01 02:31:17.10 eq l 0.61 l 33.511 -116.794 2.7 A 38412424 39 825 2019
717669 2019-01-01 02:38:38.38 eq l 0.61 l 33.706 -116.808 15.3 A 38412432 27 1135 2019
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
781419 2019-12-31 22:47:36.78 eq l 1.47 l 35.638 -117.460 6.9 A 39019127 34 1906 2019
781420 2019-12-31 22:52:18.08 eq l 0.23 l 33.589 -116.805 5.5 A 39019135 38 1665 2019
781421 2019-12-31 23:05:39.85 eq l 0.94 l 35.755 -117.559 6.7 A 39019143 19 691 2019
781422 2019-12-31 23:18:31.60 eq l 0.74 l 35.721 -117.555 3.8 A 39019151 38 476 2019
781423 2019-12-31 23:28:38.45 eq l 1.45 l 33.018 -116.000 1.1 A 39019159 41 1790 2019

63108 rows × 14 columns
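
The two filtering steps above can also be combined into a single boolean mask. The following is a minimal sketch on a small toy dataframe; the name `toy` and its values are made up for illustration and are not taken from the catalog:

```python
import pandas as pd

# Toy stand-in for the catalog (hypothetical values, not real events)
toy = pd.DataFrame({
    'Year': [2018, 2019, 2019, 2019],
    'ET':   ['eq', 'eq', 'qb', 'eq'],
    'MAG':  [1.2, 0.8, 1.5, 6.4],
})

# Combine both conditions in one boolean mask. The parentheses are required
# because & binds more tightly than ==.
mask = (toy['Year'] == 2019) & (toy['ET'] == 'eq')
toy_2019_eq = toy[mask]
print(toy_2019_eq)
```

Filtering in two steps, as in the cells above, gives the same rows; the single mask just avoids creating the intermediate dataframe.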

Now, let’s see if there was at least one major earthquake during 2019:

test_mag = df_2019_eq['MAG'] >= 6
test_mag
717665    False
717666    False
717667    False
717668    False
717669    False
          ...  
781419    False
781420    False
781421    False
781422    False
781423    False
Name: MAG, Length: 63108, dtype: bool

Is there at least one True value in this array?

test_mag.value_counts()
MAG
False    63106
True         2
Name: count, dtype: int64

There are exactly 2 major earthquakes. You can extract the number like this:

test_mag.value_counts()[True]
2

So, to test whether or not there was a major earthquake you need to do:

True in test_mag.value_counts().keys()
True
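
As an aside, pandas boolean series also offer `any()` and `sum()`, which answer the same two questions without going through `value_counts()`. A minimal sketch on toy data (the magnitudes below are made up for illustration):

```python
import pandas as pd

# Hypothetical magnitudes, not taken from the catalog
toy_mag = pd.Series([0.8, 1.5, 6.4, 7.1])
toy_test = toy_mag >= 6

has_major = bool(toy_test.any())   # True if at least one entry is True
num_major = int(toy_test.sum())    # True counts as 1, False as 0
print(has_major, num_major)
```

Both calls also work when no entry is True: `any()` gives False and `sum()` gives 0.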
  • Now, we will use bootstrapping to estimate the probability of a major earthquake during a randomly picked year. Follow the instructions below completing the code where necessary.

def estimate_probability_of_major_earthquake_during_year(num_years, df):
    """
    Estimate the probability of major earthquake in a random year.
    
    Arguments:
    num_years    -    The number of years to pick at random.
    df           -    The dataframe containing all the observed events.
    
    Returns: The number of years in which we had at least one major earthquake divided by the num_years.
    """
    num_major_eqs = 0
    for i in range(num_years):
        # Pick a year at random between 1932 and 2020
        y = np.random.randint(1932, 2021)
        # Extract all the events that happened in that year
        df_y = # YOUR CODE HERE
        # Find all earthquake events
        df_y_eq = # YOUR CODE HERE
        # Flag each earthquake event as major (True) or not (False)
        test_mag = # YOUR CODE HERE
        test_mag_counts = test_mag.value_counts()
        # Test if there is at least one major event in this year
        # and increase num_major_eqs by one if yes
        if True in test_mag_counts.keys():
            num_major_eqs += 1
    return num_major_eqs / num_years

Use the following lines to test your code. We run it 10 times. Notice that every time you get a slightly different estimate.

for i in range(10):
    p_major_eq = estimate_probability_of_major_earthquake_during_year(50, df)
    print(f'p_major_eq = {p_major_eq:1.2f}')
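
Why do the estimates differ from run to run? Each estimate is the average of `num_years` yes/no draws. If the probability of a "yes" is p, the standard deviation of such an average is about sqrt(p(1 - p) / num_years). The sketch below checks this on synthetic draws only. The value `p_true = 0.3` is an arbitrary assumption for illustration, not a result computed from the catalog:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3          # hypothetical probability, NOT estimated from the catalog
num_years = 50        # draws per estimate, as in the test loop above
num_repeats = 2000    # number of synthetic estimates

# Each row is one experiment: num_years yes/no draws.
# The row mean is one estimate of p_true.
draws = rng.random((num_repeats, num_years)) < p_true
estimates = draws.mean(axis=1)

predicted_std = np.sqrt(p_true * (1 - p_true) / num_years)
print(f'empirical std = {estimates.std():.3f}, predicted std = {predicted_std:.3f}')
```

Under this assumption the spread shrinks like one over the square root of `num_years`, so quadrupling `num_years` roughly halves it.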
# A place to store the estimates
p_major_eqs = []
# Put 200 estimates in there
for i in range(200):
    print(i)
    p_major_eq = # your code here
    p_major_eqs.append(p_major_eq)
# And now do the histogram
# Your code here