Homework 9

  • Type your name and email in the “Student details” section below.

  • Develop the code and generate the figures you need to solve the problems using this notebook.

  • For the answers that require a mathematical proof or derivation you can either:

    • Type the answer using the built-in LaTeX capabilities. In this case, simply export the notebook as a PDF and upload it to Gradescope; or

    • Print the notebook (after you are done with all the code), write your answers by hand, scan them, combine your response into a single PDF, and upload it to Gradescope.

  • The total homework points are 100. Please note that the problems are not weighted equally.


  • This is due before the beginning of the next lecture.

  • Please match all the pages corresponding to each of the questions when you submit on Gradescope.

Student details

  • First Name:

  • Last Name:

  • Email:

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
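# IPython.display.set_matplotlib_formats is deprecated; matplotlib_inline provides the same function.
from matplotlib_inline.backend_inline import set_matplotlib_formats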
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st

Problem 1: Predicting the probability of major earthquakes in Southern California

We are going to revisit Problem 2, “Predicting the probability of major earthquakes in Southern California”, but this time we are going to use a Poisson distribution to carry out our analysis.

The San Andreas fault extends through California forming the boundary between the Pacific and the North American tectonic plates. It has caused some of the major earthquakes on Earth. We are going to focus on Southern California and we would like to assess the probability of a major earthquake, defined as an earthquake of magnitude 6.5 or greater, during the next ten years.

The first thing we are going to do is go over a database of past earthquakes that have occurred in Southern California and collect the relevant data. We are going to start at 1900 because data before that time may be unreliable. Go over each decade and count the occurrences of major earthquakes (i.e., count the number of orange and red colors in each decade). I have done this for you.

eq_data = np.array([
    0, # 1900-1909
    1, # 1910-1919
    2, # 1920-1929
    0, # 1930-1939
    3, # 1940-1949
    2, # 1950-1959
    1, # 1960-1969
    2, # 1970-1979
    1, # 1980-1989
    4, # 1990-1999
    0, # 2000-2009
    2, # 2010-2019
])

fig, ax = plt.subplots(dpi=150)
ax.bar(np.linspace(1900, 2019, eq_data.shape[0]), eq_data, width=10)
ax.set_ylabel('# of major earthquakes in Southern CA');

The Poisson distribution is a discrete distribution with values \(\{0,1,2,\dots\}\) that is commonly used to model the number of events occurring in a certain time period. It is the right choice when the events happen independently and the probability of an event happening over a small period of time is constant. Let’s use the Poisson distribution to model the number of earthquakes \(X\) occurring in a decade. We write:
\[ X \sim \operatorname{Poisson}(r), \]
where \(r\) is the *rate parameter* of the Poisson distribution. The rate is the expected number of events per time period. Here, \(r\) is the expected number of earthquakes per decade.
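
For reference, the probability mass function of a Poisson random variable with rate \(r\) is \(p(X = k) = \frac{r^k e^{-r}}{k!}\) for \(k = 0, 1, 2, \dots\). The sketch below compares this closed-form expression with scipy.stats.poisson. The names `demo_rate` and `X_demo` and the value 2.5 are made up for illustration; they are not the rate estimated from eq_data.

```python
import math

import numpy as np
import scipy.stats as st

# Hypothetical rate, chosen only for illustration (not the homework answer).
demo_rate = 2.5
X_demo = st.poisson(demo_rate)

# Counts at which to evaluate the probability mass function.
ks = np.arange(0, 6)

# PMF as computed by scipy.
pmf_scipy = X_demo.pmf(ks)

# The same PMF from the closed-form expression r^k e^{-r} / k!.
pmf_formula = np.array(
    [demo_rate ** int(k) * math.exp(-demo_rate) / math.factorial(int(k)) for k in ks]
)

# The two computations should agree up to floating-point error.
print(np.allclose(pmf_scipy, pmf_formula))
```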

  • Use eq_data to find the rate \(r\) of the Poisson distribution. We can set it to the empirical average of the observed number of earthquakes per decade:

r = # Your code here
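
As a small illustration of the empirical-average estimate on made-up data, the sketch below computes the mean count per period. The array `toy_counts` is hypothetical and unrelated to eq_data. The sample mean is also the maximum likelihood estimate of the rate of a Poisson distribution.

```python
import numpy as np

# Made-up event counts per period, for illustration only (not eq_data).
toy_counts = np.array([1, 0, 2, 1, 3, 1])

# Empirical average number of events per period.
toy_rate = toy_counts.mean()
print(f"Estimated rate: {toy_rate:.3f} events per period")
```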
  • Initialize a Poisson random variable \(X\) with rate parameter \(r\) using scipy.stats. Hint: See The Poisson distribution.

X = # Your code here
  • Plot the probability mass function of \(X\).

# Your code here
  • What is the probability that no major earthquake will occur during the next decade?


# You may use code to answer
  • What is the probability that one or two major earthquakes will occur during the next decade?


# You may use code to answer
  • What is the probability that at least one major earthquake will occur during the next decade? Hint: Use the obvious rule.


# You may use code to answer
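
The last three questions all reduce to sums of the probability mass function. The sketch below shows one possible way to compute such probabilities with scipy.stats, assuming a made-up rate `demo_rate = 2.5` (not the value estimated from eq_data). It reads \(p(X = 0)\) off the PMF, sums the PMF over a range of counts, and uses the complement rule \(p(X \ge 1) = 1 - p(X = 0)\).

```python
import scipy.stats as st

# Hypothetical rate, chosen only for illustration (not the homework answer).
demo_rate = 2.5
X_demo = st.poisson(demo_rate)

# Probability of exactly zero events.
p_zero = X_demo.pmf(0)

# Probability of one or two events: sum the PMF over k = 1, 2 ...
p_one_or_two = X_demo.pmf(1) + X_demo.pmf(2)
# ... or, equivalently, take a difference of the CDF: P(X <= 2) - P(X <= 0).
p_one_or_two_cdf = X_demo.cdf(2) - X_demo.cdf(0)

# Probability of at least one event, via the complement rule.
p_at_least_one = 1.0 - p_zero
# The survival function sf(0) = P(X > 0) gives the same number.
p_at_least_one_sf = X_demo.sf(0)

print(p_zero, p_one_or_two, p_at_least_one)
```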