# Homework 14¶

• Type your name and email in the “Student details” section below.

• Develop the code and generate the figures you need to solve the problems using this notebook.

• For the answers that require a mathematical proof or derivation you can either:

• Type the answer using the built-in latex capabilities. In this case, simply export the notebook as a pdf and upload it on gradescope; or

• You can print the notebook (after you are done with all the code), write your answers by hand, scan, turn your response to a single pdf, and upload on gradescope.

• The total homework points are 100. Please note that the problems are not weighed equally.

Note

• This is due before the beginning of the next lecture.

• Please match all the pages corresponding to each of the questions when you submit on gradescope.

## Student details¶

• First Name:

• Last Name:

• Email:

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st
import pandas as pd
import requests
import os
"""
Downloads the file in the url and saves it in the current working directory.
"""
data = requests.get(url)
if local_filename is None:
local_filename = os.path.basename(url)
with open(local_filename, 'wb') as fd:
fd.write(data.content)


## Problem 1 - Joint probability mass function, covariance, and correlation¶

In this problem you will perform some analytical calculations that demonstrate further the concept of the joint probability mass (density) function (see Lecture 13.

Consider two random variables $$X$$ and $$Y$$ taking discrete values in $$\{0,1\}$$. The joint probability mass function $$p(x,y)$$ is given by:

$p(X = 0, Y = 0) = 0.1,$
$p(X = 1, Y=0) = 0.2,$
$p(X = 0, Y = 1) = 0.2,$

and

$p(X = 1, Y = 1) = 0.5.$

For your convenience, I have put these numbers in a 2 x 2 matrix:

A = np.array([[0.1, 0.2],
[0.2, 0.5]])

• Verify that the sum of all entries in A is one, i.e., verify that $$\sum_{x,y} p(x,y) = 1$$.

# Your code here

• What is the probability mass function of $$Y$$? Hint: Use the marginalization property,

$p(Y = y) = \sum_{x} p(x, Y = y) = p(X=0, Y = y) + p(X=1, Y = y),$

to find $$p(Y=y)$$ for $$y=0$$ and $$1$$.

# Your code/text here

• What is the probability mass function of $$X$$? Hint: Same as above if you swap $$x$$ and $$y$$.

# Your code/text here

• What is the expectation of $$X$$?

# Your code/text here

• What is the expectation of $$Y$$?

# Your code/text here

• What is the variance of $$X$$?

# Your code/text here

• What is the variance of $$Y$$?

# Your code/text here

• What is the covariance of $$X$$ and $$Y$$?

# Your code/text here

• What is the correlation of $$X$$ and $$Y$$?

# Your code/text here

• What is the covariance of $$X$$ with itself? Hint: Use the definition of covariance.

# Your code/text here

• What is the covariance of $$3X$$ with $$Y$$? Hint: Use the definition of covariance.

# Your code/text here

• What is the covariance of $$X$$ with 0.31? Yes, the constant 0.31. Hint: Use the definition of covariance for $$Y=0.31$$. You will need the mean of the constant. What is the mean of a constant?

# Your code/text here


## Problem 2 - Estimating the mechanical properties of a plastic material from molecular dynamics simulations¶

First, make sure that this dataset is visible from this Jupyter notebook.

url = 'https://raw.githubusercontent.com/PurdueMechanicalEngineering/me-297-intro-to-data-science/master/data/stress_strain.txt'


data =  pd.read_csv('stress_strain.txt', delimiter='\t', names=['Strain', 'Stress'], skiprows=1)
data

Strain Stress
0 0.000000 -29.489760
1 0.000252 11.779628
2 0.000503 45.195298
3 0.000755 49.495215
4 0.001007 42.283019
... ... ...
996 0.250583 115.896653
997 0.250835 125.218603
998 0.251086 118.189281
999 0.251338 142.209075
1000 0.251589 153.353895

1001 rows × 2 columns

The dataset was generated using a molecular dynamics simulation of a plastic material (thanks to Professor Alejandro Strachan for sharing the data!). This is a simulated version of the real experiment we saw in Homework 5. Specifically, Strachan’s group did the following:

• They took a rectangular chunk of the material and marked the position of each one of its atoms;

• They started applying a tensile force along one dimension. The atoms are coupled together through electromagnetic forces and they must all satisfy Newton’s law of motion.

• For each value of the applied tensile force they marked the Stress (force be unit area) in the middle of the materail and the corresponding Strain of the material (percent enlogation in the pulling direction).

• Eventually the material entered the plastic regime and then it broke.

Here is a visualization of the data:

fig, ax = plt.subplots()
ax.plot(data['Strain'], data['Stress'], 'o', label='Stress-strain data')
plt.xlabel('$\epsilon$ (strain in %)')
plt.ylabel('$\sigma$ (stress in MPa)')
plt.legend(loc='best'); Note that for each particular value of the strain, you don’t necessarily get a unique stress. This is because in molecular dynamics the atoms are jiggling around due to thermal effects. So there is always this “jiggling” noise when you are trying to measure the stress and the strain. We would like to process this noise in order to extract what is known as the stress-strain curve of the material. The stress-strain curve is a macroscopic property of the the material which is affeted by the fine structure, e.g., the chemical bonds, the crystaline structure, any defects, etc. It is a required input to mechanics of materials.

The very first part of the stress-strain curve should be linear. It is called the elastic regime. In that region, say $$\epsilon < \epsilon_l=0.04$$, the relationship between stress and strain is: $$$\sigma(\epsilon) = E\epsilon.$$$$The constant$$E$$is known as the *Young modulus* of the material. Assume that you measure$$\epsilon$$without any noise, but your measured$$\sigma$ is noisy.

First, extract the relevant data for this problem, split it into training and validation datasets, and visualize the training and validation datasets using different colors.

# The point at which the stress-strain curve stops being linear
epsilon_l = 0.04
# Relevant data (this is nice way to get the linear part of the stresses and straints)
data_linear = data[data['Strain'] < 0.04]
data_nonlinear = data[data['Strain'] >= 0.04]


Let’s visualize this to make sure that we have split the data correctly:

fig, ax = plt.subplots()
ax.plot(data_linear['Strain'], data_linear['Stress'], 'o', label='Linear part')
ax.plot(data_nonlinear['Strain'], data_nonlinear['Stress'], 'x', label='Linear part')
plt.xlabel('$\epsilon$ (strain in %)')
plt.ylabel('$\sigma$ (stress in MPa)')
plt.legend(loc='best'); • Find the empirical correlation between Strain and Stress in the linear regime.

rho_linear = # Your code here
print('rho_linear = {0:1.2f}'.format(rho_linear))

• Find the empirical correlation between Strain and Stress in the nonlinear regime.

rho_nonlinear = # Your code here
print('rho_nonlinear = {0:1.2f}'.format(rho_nonlinear))

• Find the empirical mean of the Strain in the linear regime:

mu_strain_linear = # Your code here
# print something

• Find the standard deviation of the Strain in the linear regime:

sigma_strain_linear = # Your code here
# print something

• Find the empirical mean of the Stress in the linear regime:

mu_stress_linear = # Your code here
# print something

• Find the standard deviation of the Stress in the linear regime:

sigma_stress_linear = # Your code here
# print something

• In the linear regime, fit the following model:

$\sigma(\epsilon) = a\epsilon + b.$

In this model, $$\hat{a}$$ will be your estimate of the Young’s modulus $$E$$ of the material.

Hint: Use the formulas for $$\hat{a}$$ and $$\hat{b}$$ that we derived in Linear Regression with One Variable.

a = # Your code here
b = # Your code here
# print something

• Theoretically, $$b$$ should be excactly zero. If you get a non-zero value, where do you think this is coming from?

• In the same figure, plot the data from the linear regime along with the predictions of the fitted model $$\sigma(\epsilon) = \hat{a}\epsilon + \hat{b}$$.
# Your code here