Homework 7
Type your name and email in the “Student details” section below.
Develop the code and generate the figures you need to solve the problems using this notebook.
For the answers that require a mathematical proof or derivation you can either:
Type the answer using the built-in latex capabilities. In this case, simply export the notebook as a pdf and upload it on gradescope; or
You can print the notebook (after you are done with all the code), write your answers by hand, scan, turn your response to a single pdf, and upload on gradescope.
The total homework points are 100. Please note that the problems are not weighted equally.
Note
This is due before the beginning of the next lecture.
Please match all the pages corresponding to each of the questions when you submit on gradescope.
# Here are some modules that you may need - please run this block of code:
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_context('paper')
import numpy as np
import scipy
import scipy.stats as st
# A helper function for downloading files
import requests
import os
def download(url, local_filename=None):
    """
    Downloads the file in the ``url`` and saves it in the current working directory.
    """
    data = requests.get(url)
    if local_filename is None:
        local_filename = os.path.basename(url)
    with open(local_filename, 'wb') as fd:
        fd.write(data.content)
Student details
First Name:
Last Name:
Email:
Problem 1 - Blackjack probabilities
Blackjack is a popular card game. The background information \(I\) captures the basic rules of the game relevant to this problem:
We have a deck of 52 cards. The deck includes: four aces (A); four of each number from 2 to 10; and four of each of the figures J, Q, and K. In blackjack, every card is associated with a number. Cards that have a number on them are associated with that number. The figures J, Q, and K are associated with the number 10. An ace (A) can count as either 1 or 11. The deck of cards is shuffled adequately.
Now consider the logical proposition \(A\) (blackjack):
You draw two cards at random from the deck without replacement. You either have two aces (AA) or the maximum sum of the numbers associated with the cards is 21. For example: (10, A), (J, A), etc.
1.A - Finding the probability of \(A\) using the principle of insufficient reason
Find the number of ways in which you can choose two unique cards from the deck. Hint: Google “N choose k”.
Answer:
Your answer here.
Find the number of ways in which you can get two cards that sum to 21. Hint: Enumerate all possibilities.
Answer:
Your answer here.
Find the probability that you pick two cards that sum to 21, i.e., find \(p(A|I)\). Hint: Use the principle of insufficient reason.
Answer:
Your answer here.
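To make the counting recipe concrete without touching the blackjack numbers, here is a minimal sketch on a toy problem that is not part of the assignment: draw two cards without replacement from a hypothetical five-card deck numbered 1 to 5, and ask for the probability that they sum to 6. The toy deck and the target sum are assumptions chosen only for illustration.

```python
import itertools
import math

# Hypothetical toy deck (not the 52-card deck from the problem).
toy_deck = [1, 2, 3, 4, 5]

# All unordered pairs of distinct cards; there are "5 choose 2" of them.
pairs = list(itertools.combinations(toy_deck, 2))
assert len(pairs) == math.comb(5, 2)

# Favorable outcomes: pairs whose values sum to the target.
target = 6
favorable = [p for p in pairs if sum(p) == target]

# Principle of insufficient reason: every pair is equally likely,
# so the probability is (favorable outcomes) / (all outcomes).
prob = len(favorable) / len(pairs)
print(f'{len(favorable)} favorable out of {len(pairs)} pairs, p = {prob:.2f}')
```

The same three steps (count all pairs, count favorable pairs, divide) carry over to the full deck.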
1.B - Estimating the probability of \(A\) by simulation
In this problem, we are going to use Monte Carlo simulations to estimate the probability of picking two cards that sum to 21, i.e., \(p(A|I)\). Basically, we are going to simulate the process of picking these two cards. First, let’s start by making all the different cards that appear in a deck of 52. In what follows, I use the following conventions:
‘d’ stands for ‘diamonds’.
‘h’ stands for ‘hearts’.
‘s’ stands for ‘spades’.
‘c’ stands for ‘clubs’.
Numbers stand for themselves. And finally:
‘A’ for ‘ace’.
‘J’ for ‘jack’.
‘Q’ for ‘queen’.
‘K’ for ‘king’.
For example, if you see the string ‘2h’, this is the ‘two of hearts’. If you see ‘Ad’, this is the ‘ace of diamonds’. And so on. Let’s make a deck of cards:
deck = []
for n in ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']:
    for c in ['d', 'h', 's', 'c']:
        card = n + c
        deck.append(card)
print(deck)
We can use numpy.random.shuffle to shuffle the deck in place. Here is how:
np.random.shuffle(deck)
print(deck)
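"In place" means that the list itself is reordered and nothing is returned. The short sketch below illustrates this on a hypothetical four-card mini deck, and also shows one way to get a shuffled copy while leaving the original untouched. The mini deck, the seed, and the variable names are assumptions made for this illustration.

```python
import numpy as np

# Hypothetical four-card mini deck used only for this illustration.
mini_deck = ['Ad', 'Kh', '7s', '10c']
before = list(mini_deck)  # keep a copy to compare against

# np.random.shuffle reorders the list in place and returns None.
returned = np.random.shuffle(mini_deck)
print(returned)
print(sorted(mini_deck) == sorted(before))  # same cards, possibly a new order

# To keep the original order, build a shuffled copy instead.
rng = np.random.default_rng(seed=0)  # seeded generator for reproducibility
shuffled_copy = [str(c) for c in rng.permutation(before)]
print(before)         # unchanged
print(shuffled_copy)  # a permutation of the same cards
```

A common mistake is to write `deck = np.random.shuffle(deck)`, which replaces the deck with `None`.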
Once the deck is shuffled, you can pick two cards at random by just picking the first two cards of the deck:
my_cards = deck[:2]
print(my_cards)
Now, let’s write a function that calculates the sum of the cards. I wrote the function so that it only works with two cards. It will always use 11 for aces.
def count_cards(cards):
    """Counts cards according to blackjack conventions.

    Arguments:
    cards - Two cards. They must be strings from the deck.

    Returns: The blackjack value of the cards.
    """
    assert len(cards) == 2, 'This only works for two cards.'
    s = 0
    for c in cards:
        n = c[0]
        if n == 'A':
            s += 11
        elif n == 'J' or n == 'Q' or n == 'K':
            s += 10
        elif len(c) == 3: # this is the case of '10d', '10h', etc.
            s += 10
        else:
            s += int(n)
    return s
Let’s test it a few times:
print(my_cards)
print(count_cards(my_cards))
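Random hands are hard to check by eye, so it can help to also test the function on a few hand-picked hands whose totals are obvious. The sketch below restates the counting rule in an equivalent compact form so that it runs on its own. The restated function body and the chosen hands are illustrative assumptions, not part of the assignment.

```python
def count_cards(cards):
    """Equivalent restatement of the two-card counting rule (aces always 11)."""
    assert len(cards) == 2, 'This only works for two cards.'
    s = 0
    for c in cards:
        rank = c[:-1]  # everything except the trailing suit letter
        if rank == 'A':
            s += 11
        elif rank in ('J', 'Q', 'K'):
            s += 10
        else:
            s += int(rank)  # also handles '10'
    return s

# Hand-picked hands with totals that are easy to verify by eye.
checks = {
    ('Ad', 'Kh'): 21,
    ('10c', 'As'): 21,
    ('2h', '3d'): 5,
    ('Ah', 'As'): 22,  # aces always count as 11 in this function
}
for hand, expected in checks.items():
    assert count_cards(list(hand)) == expected, (hand, expected)
print('all checks passed')
```

Note the last case: because aces are always counted as 11, two aces total 22 under this function.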
Do it ten times at random:
for i in range(10):
    np.random.shuffle(deck)
    my_cards = deck[:2]
    sum_of_cards = count_cards(my_cards)
    print(my_cards, ' sum to: ', sum_of_cards)
Now, we have everything we need. Complete the following code, which uses the Monte Carlo method to estimate the probability of randomly picking two cards that sum to 21. Feel free to experiment with the number of simulations so that you get an accurate estimate.
# The number of experiments you want to simulate
num_exp = 1000
# This is a list in which we are going to put the result
# of each "experiment". We will record a 1 (one) if the experiment
# is successful (cards sum to 21) and a 0 (zero) otherwise
result = []
# Loop over experiments
for i in range(num_exp):
    # YOUR CODE HERE (shuffle the deck)
    my_cards = # YOUR CODE HERE (pick the first two cards from the deck)
    sum_of_cards = # YOUR CODE HERE (find the sum of the cards)
    # YOUR CODE HERE (write a conditional statement that appends 1
    # to result if the sum of cards is 21 and 0 otherwise)
p_A_g_I = # YOUR CODE HERE (use result to estimate the probability of getting two
          # cards that sum to 21)
print(f'p(A|I) ~= {p_A_g_I:1.5f}')
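If the overall pattern is unclear, the sketch below applies the same record-a-1-or-0 recipe to a different toy experiment: rolling two fair dice and checking whether they sum to 7, for which the exact probability is 1/6. The dice experiment, the seed, and the number of experiments are assumptions chosen for illustration. It is not a solution to the card problem.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

num_exp = 20000
result = []
for _ in range(num_exp):
    # One "experiment": roll two fair six-sided dice.
    roll = random.randint(1, 6) + random.randint(1, 6)
    # Record 1 for success (sum is 7) and 0 otherwise.
    result.append(1 if roll == 7 else 0)

# The Monte Carlo estimate is the fraction of successful experiments.
p_hat = sum(result) / num_exp
print(f'estimate = {p_hat:1.4f}, exact = {1 / 6:1.4f}')
```

Increasing `num_exp` typically brings the estimate closer to the exact value, at the cost of a longer run.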
Plot the estimate of \(p(A|I)\) as a function of the number of experiments. In the same plot, use a red dashed line to mark the true value of \(p(A|I)\) based on your answer to the very first question. Hint: See the discussion at the very end of Estimating probabilities by simulation.
# Your code here
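One possible way to get the estimate as a function of the number of experiments is to keep a running count of successes and divide by the number of experiments seen so far. The sketch below does this for hypothetical fair-coin outcomes standing in for `result`. The coin data, the seed, and the names `outcomes` and `running_estimate` are assumptions made for this illustration.

```python
import itertools
import random

random.seed(1)

# Hypothetical 0/1 outcomes from a fair coin, standing in for `result`.
outcomes = [random.randint(0, 1) for _ in range(5000)]

# Running estimate after n experiments: (successes so far) / n.
running_successes = itertools.accumulate(outcomes)
running_estimate = [s / n for n, s in enumerate(running_successes, start=1)]

print(running_estimate[9], running_estimate[-1])
# To visualize: plt.plot(range(1, len(running_estimate) + 1), running_estimate)
# and plt.axhline(0.5, color='r', linestyle='--') for the known value.
```

The last entry of `running_estimate` equals the usual estimate based on all experiments.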
Problem 2 - Predicting the probability of major earthquakes in Southern California
We will use the Southern California Earthquake Data Center catalog in this problem. The catalog contains all earthquakes recorded from 1932 until now in Southern California. Do not worry about how I get the data. Just run the code and it will produce a nice dataframe that you can play with. Our goal is to estimate the probability of a major earthquake (to be defined below) somewhere in Southern California during a given year.
First, let’s download the data and put them in a dataframe.
for year in range(1932, 2021):
    print('Downloading year', year)
    url = f'https://raw.githubusercontent.com/SCEDC/SCEDC-catalogs/master/SCEC_DC/{year}.catalog'
    !curl -O $url
Downloading year 1932
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 35589 100 35589 0 0 110k 0 --:--:-- --:--:-- --:--:-- 110k
...
Each one of these is a whitespace-delimited text file. We will put them all in the same dataframe for your convenience:
import pandas as pd
list_of_dfs = []
for year in range(1932, 2021):
    filename = '{0:d}.catalog'.format(year)
    print('Reading: ', filename)
    df_year = pd.read_csv(filename, sep=r'\s+', comment='#',
                          names=['Date', 'Hour', 'ET', 'GT', 'MAG', 'M', 'LAT', 'LON',
                                 'DEPTH', 'Q', 'EVID', 'NPH', 'NGRM'])
    df_year.Date = pd.to_datetime(df_year['Date'], format='%Y/%m/%d')
    list_of_dfs.append(df_year)
df = pd.concat(list_of_dfs, ignore_index=True)
df['Year'] = pd.DatetimeIndex(df['Date']).year
Reading: 1932.catalog
...
Reading: 2020.catalog
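For readers curious about what the `sep`, `comment`, and `names` arguments do, here is a minimal sketch on a made-up two-line excerpt read from memory rather than from disk. The excerpt, its reduced set of four columns, and the name `toy` are assumptions for illustration. The real catalog files have more columns.

```python
import io
import pandas as pd

# Hypothetical excerpt: a header line starting with '#', then fields
# separated by runs of spaces (not commas).
text = """# toy header line, skipped because of comment='#'
1932/01/02 16:42:43.68 eq  2.73
1932/01/03 17:58:10.01 qb  3.00
"""

toy = pd.read_csv(io.StringIO(text), sep=r'\s+', comment='#',
                  names=['Date', 'Hour', 'ET', 'MAG'])
# Parse the date strings and derive the year, as done for the full catalog.
toy['Date'] = pd.to_datetime(toy['Date'], format='%Y/%m/%d')
toy['Year'] = toy['Date'].dt.year
print(toy)
```

Here `sep=r'\s+'` splits on any run of whitespace, and `names` supplies column labels because the data lines carry none.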
df.round(2)
| | Date | Hour | ET | GT | MAG | M | LAT | LON | DEPTH | Q | EVID | NPH | NGRM | Year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1932-01-01 | 23:52:07.87 | eq | l | 0.00 | n | 34.13 | -117.99 | 6.0 | D | 3358386 | 7 | 0 | 1932 |
1 | 1932-01-02 | 16:42:43.68 | eq | l | 2.73 | l | 33.90 | -117.64 | 6.0 | C | 3358387 | 12 | 0 | 1932 |
2 | 1932-01-03 | 17:58:10.01 | eq | l | 3.00 | h | 32.00 | -116.00 | 6.0 | D | 3358388 | 7 | 0 | 1932 |
3 | 1932-01-04 | 21:30:00.96 | eq | l | 2.00 | h | 33.77 | -117.49 | 6.0 | C | 3358396 | 11 | 0 | 1932 |
4 | 1932-01-05 | 02:37:27.96 | eq | l | 1.50 | h | 33.56 | -118.44 | 6.0 | C | 3358398 | 8 | 0 | 1932 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
818704 | 2020-12-31 | 20:50:40.05 | eq | l | 0.33 | l | 33.67 | -116.76 | 14.0 | A | 39508231 | 11 | 1933 | 2020 |
818705 | 2020-12-31 | 21:37:28.19 | qb | l | 1.23 | l | 32.59 | -116.87 | -0.3 | C | 39508247 | 38 | 2085 | 2020 |
818706 | 2020-12-31 | 22:15:56.72 | eq | l | 1.45 | l | 33.35 | -116.42 | 13.4 | A | 39508263 | 69 | 2079 | 2020 |
818707 | 2020-12-31 | 22:21:28.12 | eq | l | 2.12 | l | 33.18 | -115.60 | 3.9 | A | 39508279 | 70 | 2121 | 2020 |
818708 | 2020-12-31 | 23:04:53.02 | eq | l | 1.66 | l | 36.06 | -117.38 | 2.7 | A | 39508287 | 50 | 1659 | 2020 |
818709 rows × 14 columns
Each row in this dataframe corresponds to an earthquake event that happened between 1/1/1932 and 12/31/2020. The meaning of the columns is explained here. But for the purposes of this problem we will only need information from the following columns:
Year: This is the year of the event.
ET: This is the type of the event. There are various types of events. For example, the seismometers may pick up more than earthquakes, e.g., explosions. We are only interested in earthquake events, which are labeled by eq.
MAG: This is the magnitude of the event.
Let’s play with the data set to gain some experience. First, let’s extract all data for a random year. Say, year 2019.
df_2019 = df[df['Year'] == 2019]
df_2019
| | Date | Hour | ET | GT | MAG | M | LAT | LON | DEPTH | Q | EVID | NPH | NGRM | Year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
717665 | 2019-01-01 | 01:39:57.67 | eq | l | 0.83 | l | 33.506 | -116.794 | 5.0 | A | 38412384 | 45 | 1154 | 2019 |
717666 | 2019-01-01 | 01:43:28.42 | eq | l | 0.47 | l | 33.484 | -116.785 | 5.5 | A | 38412392 | 36 | 1309 | 2019 |
717667 | 2019-01-01 | 02:27:57.45 | eq | l | 0.98 | l | 33.505 | -116.798 | 3.5 | A | 38412416 | 52 | 1410 | 2019 |
717668 | 2019-01-01 | 02:31:17.10 | eq | l | 0.61 | l | 33.511 | -116.794 | 2.7 | A | 38412424 | 39 | 825 | 2019 |
717669 | 2019-01-01 | 02:38:38.38 | eq | l | 0.61 | l | 33.706 | -116.808 | 15.3 | A | 38412432 | 27 | 1135 | 2019 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
781419 | 2019-12-31 | 22:47:36.78 | eq | l | 1.47 | l | 35.638 | -117.460 | 6.9 | A | 39019127 | 34 | 1906 | 2019 |
781420 | 2019-12-31 | 22:52:18.08 | eq | l | 0.23 | l | 33.589 | -116.805 | 5.5 | A | 39019135 | 38 | 1665 | 2019 |
781421 | 2019-12-31 | 23:05:39.85 | eq | l | 0.94 | l | 35.755 | -117.559 | 6.7 | A | 39019143 | 19 | 691 | 2019 |
781422 | 2019-12-31 | 23:18:31.60 | eq | l | 0.74 | l | 35.721 | -117.555 | 3.8 | A | 39019151 | 38 | 476 | 2019 |
781423 | 2019-12-31 | 23:28:38.45 | eq | l | 1.45 | l | 33.018 | -116.000 | 1.1 | A | 39019159 | 41 | 1790 | 2019 |
63759 rows × 14 columns
Out of these, we only care about earthquake events. So, let’s filter out everything else:
df_2019_eq = df_2019[df_2019['ET'] == 'eq']
df_2019_eq
| | Date | Hour | ET | GT | MAG | M | LAT | LON | DEPTH | Q | EVID | NPH | NGRM | Year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
717665 | 2019-01-01 | 01:39:57.67 | eq | l | 0.83 | l | 33.506 | -116.794 | 5.0 | A | 38412384 | 45 | 1154 | 2019 |
717666 | 2019-01-01 | 01:43:28.42 | eq | l | 0.47 | l | 33.484 | -116.785 | 5.5 | A | 38412392 | 36 | 1309 | 2019 |
717667 | 2019-01-01 | 02:27:57.45 | eq | l | 0.98 | l | 33.505 | -116.798 | 3.5 | A | 38412416 | 52 | 1410 | 2019 |
717668 | 2019-01-01 | 02:31:17.10 | eq | l | 0.61 | l | 33.511 | -116.794 | 2.7 | A | 38412424 | 39 | 825 | 2019 |
717669 | 2019-01-01 | 02:38:38.38 | eq | l | 0.61 | l | 33.706 | -116.808 | 15.3 | A | 38412432 | 27 | 1135 | 2019 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
781419 | 2019-12-31 | 22:47:36.78 | eq | l | 1.47 | l | 35.638 | -117.460 | 6.9 | A | 39019127 | 34 | 1906 | 2019 |
781420 | 2019-12-31 | 22:52:18.08 | eq | l | 0.23 | l | 33.589 | -116.805 | 5.5 | A | 39019135 | 38 | 1665 | 2019 |
781421 | 2019-12-31 | 23:05:39.85 | eq | l | 0.94 | l | 35.755 | -117.559 | 6.7 | A | 39019143 | 19 | 691 | 2019 |
781422 | 2019-12-31 | 23:18:31.60 | eq | l | 0.74 | l | 35.721 | -117.555 | 3.8 | A | 39019151 | 38 | 476 | 2019 |
781423 | 2019-12-31 | 23:28:38.45 | eq | l | 1.45 | l | 33.018 | -116.000 | 1.1 | A | 39019159 | 41 | 1790 | 2019 |
63108 rows × 14 columns
Now, let’s see if there was at least one major earthquake (here, an event with magnitude 6 or greater) during 2019:
test_mag = df_2019_eq['MAG'] >= 6
test_mag
717665 False
717666 False
717667 False
717668 False
717669 False
...
781419 False
781420 False
781421 False
781422 False
781423 False
Name: MAG, Length: 63108, dtype: bool
Is there at least one True value in this array?
test_mag.value_counts()
MAG
False 63106
True 2
Name: count, dtype: int64
There are exactly 2 major earthquakes. You can extract the number like this:
test_mag.value_counts()[True]
2
So, to test whether or not there was a major earthquake you need to do:
True in test_mag.value_counts().keys()
True
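As a side note, pandas boolean Series also offer `.any()` and `.sum()`, which answer the same two questions (is there at least one, and how many) without going through `value_counts()`. The sketch below compares a year with no large events to a year with two, using made-up magnitudes. The data and variable names are assumptions for illustration. Unlike indexing the counts with `[True]`, these calls do not assume that at least one `True` is present.

```python
import pandas as pd

# Hypothetical magnitudes for two toy years.
quiet_year = pd.Series([1.2, 3.4, 5.9], name='MAG')
active_year = pd.Series([0.8, 6.4, 7.1], name='MAG')

for label, mags in [('quiet', quiet_year), ('active', active_year)]:
    test = mags >= 6
    # .any() is True if at least one entry is True; .sum() counts the Trues.
    print(label, bool(test.any()), int(test.sum()))

# The membership test used above gives the same yes/no answer.
print(True in (quiet_year >= 6).value_counts().keys())
print(True in (active_year >= 6).value_counts().keys())
```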
Now, we will use bootstrapping to estimate the probability of a major earthquake during a randomly picked year. Follow the instructions below completing the code where necessary.
def estimate_probability_of_major_earthquake_during_year(num_years, df):
    """
    Estimate the probability of a major earthquake in a random year.

    Arguments:
    num_years - The number of years to pick at random.
    df - The dataframe containing all the observed events.

    Returns: The number of years in which we had at least one major earthquake divided by num_years.
    """
    num_major_eqs = 0
    for i in range(num_years):
        # Pick a year at random between 1932 and 2020
        y = np.random.randint(1932, 2021)
        # Extract all the events that happened in that year
        df_y = # YOUR CODE HERE
        # Find all earthquake events
        df_y_eq = # YOUR CODE HERE
        # Test if there is at least one major earthquake in this year
        test_mag = # YOUR CODE HERE
        test_mag_counts = test_mag.value_counts()
        # Test if there is at least one major event in this year
        # and increase num_major_eqs by one if yes
        if True in test_mag_counts.keys():
            num_major_eqs += 1
    return num_major_eqs / num_years
Use the following lines to test your code. We run it 10 times. Notice that every time you get a slightly different estimate.
for i in range(10):
    p_major_eq = estimate_probability_of_major_earthquake_during_year(50, df)
    print(f'p_major_eq = {p_major_eq:1.2f}')
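To see why the estimate changes from run to run, it may help to look at the resampling idea on a tiny synthetic record. In the sketch below, 3 of 10 hypothetical years are flagged as having a major event, and each estimate draws years at random with replacement. The toy years, the flags, the seed, and the function name `bootstrap_estimate` are all assumptions for illustration. No catalog data is used.

```python
import random

random.seed(2)

# Hypothetical record: for each toy year, did a major event occur?
# Here 3 of the 10 toy years are flagged, so the empirical frequency is 0.3.
toy_years = list(range(2000, 2010))
had_major = {y: (y in (2001, 2004, 2008)) for y in toy_years}

def bootstrap_estimate(num_years):
    """Resample years with replacement; return the fraction with a major event."""
    hits = 0
    for _ in range(num_years):
        y = random.choice(toy_years)  # the same year may be drawn more than once
        if had_major[y]:
            hits += 1
    return hits / num_years

# Each call uses a fresh resample, so the estimates scatter around 0.3.
estimates = [bootstrap_estimate(50) for _ in range(5)]
print(estimates)
```

The spread of such estimates is what the histogram in the next step is meant to show.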
Repeat the probability estimation above 200 times, store all estimates in a list, and do a histogram of the estimates. Hint: Replicate what we did at the very end of Estimating probabilities from data - Bootstrapping.
# A place to store the estimates
p_major_eqs = []
# Put 200 estimates in there
for i in range(200):
    print(i)
    p_major_eq = # your code here
    p_major_eqs.append(p_major_eq)
# And now do the histogram
# Your code here