# Plotting noisy measurements¶

Let’s continue the previous example of the damped harmonic oscillator by making things a bit more realistic. Assume now that there is a small error in the measurement of the position. That is, our measurement $$y_i$$ at timestep $$t_i$$ is not exactly given by the function $$f(t_i) = e^{-0.1t_i}\left[\cos(\pi t_i) + 0.5\sin(\pi t_i)\right]$$, but by:

$y_i = f(t_i) + \text{noise}.$

This is a model of the measurement process, one of the fundamental building blocks of data science problems. We do not yet know enough to characterize the nature of the noise. Instead, I am just going to make some noise for you and tell you how you can make it bigger or smaller. Then, we are going to simulate the measurement process and generate some data to plot.

Start by importing some basic libraries.

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np

These are the times at which I will have measurements:

ts = np.linspace(0, 4 * np.pi, 100)
ts
array([ 0.        ,  0.12693304,  0.25386607,  0.38079911,  0.50773215,
0.63466518,  0.76159822,  0.88853126,  1.01546429,  1.14239733,
1.26933037,  1.3962634 ,  1.52319644,  1.65012947,  1.77706251,
1.90399555,  2.03092858,  2.15786162,  2.28479466,  2.41172769,
2.53866073,  2.66559377,  2.7925268 ,  2.91945984,  3.04639288,
3.17332591,  3.30025895,  3.42719199,  3.55412502,  3.68105806,
3.8079911 ,  3.93492413,  4.06185717,  4.1887902 ,  4.31572324,
4.44265628,  4.56958931,  4.69652235,  4.82345539,  4.95038842,
5.07732146,  5.2042545 ,  5.33118753,  5.45812057,  5.58505361,
5.71198664,  5.83891968,  5.96585272,  6.09278575,  6.21971879,
6.34665183,  6.47358486,  6.6005179 ,  6.72745093,  6.85438397,
6.98131701,  7.10825004,  7.23518308,  7.36211612,  7.48904915,
7.61598219,  7.74291523,  7.86984826,  7.9967813 ,  8.12371434,
8.25064737,  8.37758041,  8.50451345,  8.63144648,  8.75837952,
8.88531256,  9.01224559,  9.13917863,  9.26611167,  9.3930447 ,
9.51997774,  9.64691077,  9.77384381,  9.90077685, 10.02770988,
10.15464292, 10.28157596, 10.40850899, 10.53544203, 10.66237507,
10.7893081 , 10.91624114, 11.04317418, 11.17010721, 11.29704025,
11.42397329, 11.55090632, 11.67783936, 11.8047724 , 11.93170543,
12.05863847, 12.1855715 , 12.31250454, 12.43943758, 12.56637061])

Get the true (but unobserved) oscillator position:

fs = np.exp(-0.1 * ts) * (np.cos(np.pi * ts) + 0.5 * np.sin(np.pi * ts))

And let’s make some noise!

This is the variable that controls the noise. It is called “standard deviation.” We will explain what it is in a few lectures. For now, this is what you need to know:

• A smaller standard deviation means less noise.

• A bigger standard deviation means more noise.

• The smallest standard deviation you can have is zero (no noise).
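As a quick sanity check (a sketch, not part of the original notebook), you can verify that scaling standard normal samples by sigma produces samples whose empirical standard deviation is roughly sigma:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the check is reproducible

for sigma in [0.01, 0.1, 1.0]:
    noise = sigma * rng.standard_normal(10_000)
    # the empirical standard deviation of the samples is close to sigma
    print(f"sigma = {sigma:5.2f} -> sample std = {noise.std():.3f}")
```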

sigma = 0.1

And now we are going to make some random numbers centered at zero, and we are going to multiply them with sigma and add them to fs:

ys = fs + sigma * np.random.randn(fs.shape[0])
ys
array([ 1.56973355,  1.36883717,  0.9781098 ,  0.44663113, -0.05728611,
-0.59942684, -1.03941339, -1.40012097, -1.23195924, -1.34212573,
-0.85960661, -0.43792201,  0.12108928,  0.65846996,  1.0428189 ,
1.12905815,  1.09954211,  1.13637876,  0.57997425,  0.30727974,
-0.12185871, -0.64338427, -1.03608775, -1.20521753, -1.10805339,
-1.01655788, -0.61122002, -0.11188226,  0.16986948,  0.40607295,
0.76543685,  0.88804251,  0.91099358,  0.78897458,  0.65022897,
0.23795338, -0.33168056, -0.57382496, -0.8087279 , -1.05312185,
-0.76679741, -0.74115966, -0.4456702 ,  0.09277946,  0.03023698,
0.54138368,  0.5820714 ,  0.77258375,  0.64617286,  0.69570863,
0.37914282,  0.13689345, -0.31098637, -0.59876778, -0.74369654,
-0.75611362, -0.50016996, -0.56391642, -0.34302369, -0.08051999,
0.10209056,  0.49143289,  0.7692702 ,  0.65642031,  0.44075706,
0.5148094 ,  0.09448634, -0.03307068, -0.25133974, -0.35346487,
-0.49275691, -0.61384692, -0.44140502, -0.50161076, -0.061796  ,
-0.07955332,  0.26266313,  0.39539221,  0.55443381,  0.73261334,
0.53816396,  0.27905128,  0.16885766, -0.22357881, -0.37953348,
-0.37841846, -0.45235417, -0.57089282, -0.33201214, -0.28557784,
-0.02496453, -0.0252706 ,  0.38162693,  0.33649605,  0.44754801,
0.37404447,  0.29067732,  0.18992481,  0.05718148, -0.16327172])

We now have the noisy position measurements ys. Let’s plot them against the true (but, I repeat, unobserved) oscillator positions:

fig, ax = plt.subplots()
ax.plot(ts, fs, label='Unobserved true position')
ax.plot(ts, ys, '.', label='Noisy measurement of position')
ax.set_xlabel('$t$ (Time)')
ax.set_ylabel('$y$ (Position)')
plt.legend(loc='best');

Typically, we do not observe the velocity of the oscillator. We can, however, reconstruct the velocity from the position measurements. This is done by estimating the derivative via finite differences, i.e., by:

$v(t) \approx \frac{y(t+\delta t) - y(t)}{\delta t},$

for $$\delta t$$ the time that passes between two consecutive timesteps. The NumPy library has a function, numpy.gradient, that reconstructs the derivative in exactly this way. Let’s use it to reconstruct the velocity. We have to use the noisy position data:
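As an aside (worth checking against the NumPy documentation), numpy.gradient does not use the one-sided formula above everywhere: in the interior it uses second-order central differences, $$(y_{i+1} - y_{i-1})/(2\delta t)$$, falling back to one-sided differences only at the endpoints. A quick check on a function with a known derivative:

```python
import numpy as np

t = np.linspace(0, 1, 101)
dt = t[1] - t[0]
y = t ** 2                    # known derivative: 2 t
v = np.gradient(y, dt)

# interior points match the central difference (y[i+1] - y[i-1]) / (2 dt),
# which happens to be exact for a quadratic
central = (y[2:] - y[:-2]) / (2 * dt)
print(np.allclose(v[1:-1], central))       # True
print(np.allclose(v[1:-1], 2 * t[1:-1]))   # True
```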

dt = ts[1] - ts[0]
vs = np.gradient(ys, dt)
vs
array([-1.58269575, -2.33045615, -3.63264785, -4.07851233, -4.1205111 ,
-3.8686827 , -3.15400212, -0.75845441,  0.2284482 ,  1.46672859,
3.56173514,  3.86304429,  4.318781  ,  3.63077118,  1.85368681,
0.22343753,  0.02883651, -2.04662187, -3.26589141, -2.76457956,
-3.74474619, -3.60122575, -2.21310886, -0.28347876,  0.74314634,
1.95706877,  3.56359404,  3.07677782,  2.04026953,  2.34599038,
1.89851899,  0.57336033, -0.390237  , -1.02717392, -2.17051926,
-3.86782494, -3.19766372, -1.87912997, -1.88799112,  0.16516777,
1.22884551,  1.26494734,  3.28495697,  1.87463873,  1.76709006,
2.17372263,  0.91071671,  0.25250108, -0.30281761, -1.05185397,
-2.20122041, -2.71847747, -2.897832  , -1.70448204, -0.61979863,
0.95927185,  0.75708108,  0.61901247,  1.90413956,  1.75334279,
2.25297091,  2.62807722,  0.64989945, -1.29404113, -0.55781741,
-1.36398974, -2.15814612, -1.36223826, -1.26205991, -0.95096273,
-1.02566696,  0.20227947,  0.44210775,  1.49531211,  1.66252007,
1.27807206,  1.87085073,  1.14930944,  1.32834266, -0.0640883 ,
-1.78661945, -1.45472886, -1.97990257, -2.16015924, -0.60992652,
-0.28684688, -0.75817285,  0.47403749,  1.12387991,  1.20948657,
1.02537233,  1.60159824,  1.42502952,  0.25966875,  0.14790641,
-0.61792697, -0.72526296, -0.91975992, -1.39127112, -1.73676773])

Let’s plot these data against the true (but unobserved) velocity:

vs_true = -0.1 * fs + np.exp(-0.1 * ts) * (-np.pi * np.sin(np.pi * ts) + 0.5 * np.pi * np.cos(np.pi * ts))
fig, ax = plt.subplots()
ax.plot(ts, vs_true, label='Unobserved true velocity')
ax.plot(ts, vs, '.', label='Reconstruction of velocity from noisy measurements')
ax.set_xlabel('$t$ (Time)')
ax.set_ylabel('$v$ (Velocity)')
plt.legend(loc='best');

Observe that the noise in the velocity reconstruction is bigger than the noise in the position measurements. With the things that we learn in this class it is actually possible to predict how much uncertainty there is going to be in the estimate of the velocity. Using state-of-the-art data science, it is also possible to construct a much more accurate estimate of the velocity than the naïve method we used above. However, these so-called filtering problems are outside the scope of this course.
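To get a feel for why the velocity is noisier, note that each interior value of the reconstruction is $$(y_{i+1} - y_{i-1})/(2\delta t)$$, so the measurement noise gets divided by $$2\delta t$$; with independent noise of standard deviation $$\sigma$$ in each measurement, the noise in the velocity estimate has standard deviation $$\sqrt{2}\,\sigma/(2\delta t)$$. A rough empirical check (a sketch, not part of the original notebook), pushing pure noise through the same finite-difference operation:

```python
import numpy as np

sigma = 0.1
ts = np.linspace(0, 4 * np.pi, 100)
dt = ts[1] - ts[0]

# many realizations of pure measurement noise, differentiated like the data
rng = np.random.default_rng(0)
noise = sigma * rng.standard_normal((1000, ts.shape[0]))
v_noise = np.gradient(noise, dt, axis=1)

# predicted noise level in the interior: sqrt(2) * sigma / (2 dt)
predicted = sigma / (np.sqrt(2) * dt)
print(predicted, v_noise[:, 1:-1].std())
```

With sigma = 0.1 and this dt, the predicted level is roughly 0.56, several times larger than the position noise, which matches what the plot shows.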

## Questions¶

• Rerun the code above with a smaller sigma so that the velocity reconstruction becomes very accurate. How low should you go?