Plotting noisy measurements

Contents

Plotting noisy measurements#

Let’s continue on the previous example with the dumped harmonic oscillator by making things a bit more realistic. Now assume that we have a small error in the measurement of the position. So, we assume that our measurement \(y_i\) at timestep \(t_i\) is not exactly given by the function \(f(t_i) = e^{-0.1t_i}\left[\cos(\pi t_i) + 0.5\sin(\pi t_i)\right]\), but by:

\[ y_i = f(t_i) + \text{noise}. \]

This is a model of the measurement process. It is one of the fundamental blocks in data science problems. We do not know enough to understand the nature of the noise. Instead, I am going to just make some noise for you and tell you how you can make it bigger or smaller. Then, we are going to simulate the measurement process and generate some data to plot.

Start by importing some basic libraries:

Hide code cell source
MAKE_BOOK_FIGURES=False

import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import seaborn as sns
sns.set_context("paper")
sns.set_style("ticks")

def set_book_style():
    plt.style.use('seaborn-v0_8-white') 
    sns.set_style("ticks")
    sns.set_palette("deep")

    mpl.rcParams.update({
        # Font settings
        'font.family': 'serif',  # For academic publishing
        'font.size': 8,  # As requested, 10pt font
        'axes.labelsize': 8,
        'axes.titlesize': 8,
        'xtick.labelsize': 7,  # Slightly smaller for better readability
        'ytick.labelsize': 7,
        'legend.fontsize': 7,
        
        # Line and marker settings for consistency
        'axes.linewidth': 0.5,
        'grid.linewidth': 0.5,
        'lines.linewidth': 1.0,
        'lines.markersize': 4,
        
        # Layout to prevent clipped labels
        'figure.constrained_layout.use': True,
        
        # Default DPI (will override when saving)
        'figure.dpi': 600,
        'savefig.dpi': 600,
        
        # Despine - remove top and right spines
        'axes.spines.top': False,
        'axes.spines.right': False,
        
        # Remove legend frame
        'legend.frameon': False,
        
        # Additional trim settings
        'figure.autolayout': True,  # Alternative to constrained_layout
        'savefig.bbox': 'tight',    # Trim when saving
        'savefig.pad_inches': 0.1   # Small padding to ensure nothing gets cut off
    })

def save_for_book(fig, filename, is_vector=True, **kwargs):
    """
    Save a figure with book-optimized settings.
    
    Parameters:
    -----------
    fig : matplotlib figure
        The figure to save
    filename : str
        Filename without extension
    is_vector : bool
        If True, saves as vector at 1000 dpi. If False, saves as raster at 600 dpi.
    **kwargs : dict
        Additional kwargs to pass to savefig
    """    
    # Set appropriate DPI and format based on figure type
    if is_vector:
        dpi = 1000
        ext = '.pdf'
    else:
        dpi = 600
        ext = '.tif'
    
    # Save the figure with book settings
    fig.savefig(f"{filename}{ext}", dpi=dpi, **kwargs)


def make_full_width_fig():
    return plt.subplots(figsize=(4.7, 2.9), constrained_layout=True)

def make_half_width_fig():
    return plt.subplots(figsize=(2.35, 1.45), constrained_layout=True)

if MAKE_BOOK_FIGURES:
    set_book_style()
make_full_width_fig = make_full_width_fig if MAKE_BOOK_FIGURES else lambda: plt.subplots()
make_half_width_fig = make_half_width_fig if MAKE_BOOK_FIGURES else lambda: plt.subplots()

These are the times on which I will have measurements:

import numpy as np
ts = np.linspace(0, 4 * np.pi, 100)
ts
array([ 0.        ,  0.12693304,  0.25386607,  0.38079911,  0.50773215,
        0.63466518,  0.76159822,  0.88853126,  1.01546429,  1.14239733,
        1.26933037,  1.3962634 ,  1.52319644,  1.65012947,  1.77706251,
        1.90399555,  2.03092858,  2.15786162,  2.28479466,  2.41172769,
        2.53866073,  2.66559377,  2.7925268 ,  2.91945984,  3.04639288,
        3.17332591,  3.30025895,  3.42719199,  3.55412502,  3.68105806,
        3.8079911 ,  3.93492413,  4.06185717,  4.1887902 ,  4.31572324,
        4.44265628,  4.56958931,  4.69652235,  4.82345539,  4.95038842,
        5.07732146,  5.2042545 ,  5.33118753,  5.45812057,  5.58505361,
        5.71198664,  5.83891968,  5.96585272,  6.09278575,  6.21971879,
        6.34665183,  6.47358486,  6.6005179 ,  6.72745093,  6.85438397,
        6.98131701,  7.10825004,  7.23518308,  7.36211612,  7.48904915,
        7.61598219,  7.74291523,  7.86984826,  7.9967813 ,  8.12371434,
        8.25064737,  8.37758041,  8.50451345,  8.63144648,  8.75837952,
        8.88531256,  9.01224559,  9.13917863,  9.26611167,  9.3930447 ,
        9.51997774,  9.64691077,  9.77384381,  9.90077685, 10.02770988,
       10.15464292, 10.28157596, 10.40850899, 10.53544203, 10.66237507,
       10.7893081 , 10.91624114, 11.04317418, 11.17010721, 11.29704025,
       11.42397329, 11.55090632, 11.67783936, 11.8047724 , 11.93170543,
       12.05863847, 12.1855715 , 12.31250454, 12.43943758, 12.56637061])

Get the true (but unobserved) oscillator position:

fs = np.exp(-0.1 * ts) * (np.cos(np.pi * ts) + 0.5 * np.cos(np.pi * ts))

And let’s make the some noise!

This is the variable that controls the noise. It is called “standard deviation.” We will explain what it is in a few lectures. For now, this is what you need to know:

  • Smaller standard deviation, means less noise.

  • Bigger standard deivation, means more noie.

  • The smallest standard deviation you can have is zero (no noise).

sigma = 0.1

And now we are going to make some random numbers centered at zero, and we are going to multiply them with sigma and add them to fs:

ys = fs + sigma * np.random.randn(fs.shape[0])
ys
array([ 1.4939844 ,  1.17297325,  1.11161168,  0.4457545 , -0.01144643,
       -0.56612288, -0.95509287, -1.36750521, -0.98265014, -1.12213091,
       -1.00365159, -0.29097008,  0.20368885,  0.62108445,  0.93699972,
        1.22450894,  1.27634348,  1.11822838,  0.72398129,  0.39639402,
       -0.22696911, -0.61813557, -0.81504844, -1.03958969, -1.08393304,
       -1.03012538, -0.72964004, -0.06272475, -0.01652057,  0.6668657 ,
        0.80025996,  0.97866772,  0.89450173,  0.94303851,  0.67499594,
        0.1247536 , -0.39189089, -0.47506442, -0.7569252 , -0.81565284,
       -0.89192101, -0.56752135, -0.52283979, -0.13536049,  0.07293474,
        0.65442844,  0.8138885 ,  0.83482351,  0.80203939,  0.68979821,
        0.51111427,  0.15894807, -0.14073572, -0.44448025, -0.57539898,
       -0.80868805, -0.6793171 , -0.50159409, -0.31214813, -0.08595849,
        0.26325071,  0.4455467 ,  0.52048849,  0.63648893,  0.59483714,
        0.42795686,  0.41930225, -0.04761196, -0.24924812, -0.30065718,
       -0.60876726, -0.61532532, -0.66010308, -0.29250997, -0.11325648,
        0.00542164,  0.14324618,  0.51294569,  0.51496321,  0.42051548,
        0.3756283 ,  0.20882418,  0.05597006, -0.00370275, -0.33633546,
       -0.27962092, -0.43309922, -0.41337406, -0.34514539, -0.29411204,
        0.04215301,  0.25735351,  0.26873612,  0.38187978,  0.28557635,
        0.42992947,  0.27479935,  0.27599961,  0.07668661, -0.03371873])

We now have the noisy position measurements ys. Let’s plot them against the true (but, I repeat, unobserved) oscillator positions:

fig, ax = make_full_width_fig()
ax.plot(ts, fs, label='Unobserved true position')
ax.plot(ts, ys, '.', label='Noisy measurement of position')
ax.set_xlabel('$t$ (Time)')
ax.set_ylabel('$y$ (Position)')
plt.legend(loc='best')
save_for_book(fig, 'ch4.fig3');
../_images/3259b4fa5899320d91b98d85c87129cdcc380dfc74d07b8f4bf0b27a161b1c92.svg

Typically, we do not observe the velocity of the oscillator. We can, however, reconstruct the velocity from the position measurements. This is done by estimating the derivative via finite differences, i.e., by:

\[ v(t) \approx \frac{y(t+\delta t) - y(t)}{\delta t}, \]

for \(\delta t\) being the time that passes between two consecutive timesteps. The Numpy library has a function for reconstructing the velocity in this way. It is called numpy.gradient. Let’s use it to reconstruct the velocity. We have to use the noisy position data:

dt = ts[1] - ts[0]
vs = np.gradient(ys, dt)
vs
array([-2.52898032, -1.50619859, -2.86457635, -4.42382118, -3.98587084,
       -3.71710341, -3.156713  , -0.10855042,  0.96655018, -0.08272649,
        3.27401301,  4.75581642,  3.59266017,  2.88857373,  2.37694028,
        1.33670389, -0.41864816, -2.17580155, -2.84336677, -3.74587428,
       -3.99631812, -2.31649437, -1.66014356, -1.05915927,  0.03728075,
        1.3955902 ,  3.81067315,  2.80903808,  2.87391869,  3.21736781,
        1.22821459,  0.37122634, -0.14034649, -0.86465196, -3.22329368,
       -4.20255773, -2.36273407, -1.4379011 , -1.34160666, -0.53175995,
        0.97741099,  1.45384222,  1.70231826,  2.34680641,  3.11104562,
        2.91867975,  0.71059147, -0.04667468, -0.57126695, -1.14597871,
       -2.09106374, -2.5676924 , -2.37695536, -1.71217548, -1.43464542,
       -0.40934227,  1.20966919,  1.44630971,  1.63722387,  2.26654484,
        2.09364404,  1.01328147,  0.75213767,  0.29286565, -0.82142552,
       -0.69144686, -1.87330592, -2.6334766 , -0.99676659, -1.41617642,
       -1.2395045 , -0.20221615,  1.27159706,  2.15407515,  1.17357786,
        1.01038576,  1.99918032,  1.46422492, -0.36409045, -0.54885205,
       -0.83386999, -1.2591609 , -0.8371616 , -1.54532475, -1.0868651 ,
       -0.38116067, -0.52686495,  0.34645759,  0.46978319,  1.52560128,
        2.17226962,  0.89253009,  0.49051951,  0.06633509,  0.1892718 ,
       -0.0424515 , -0.60634276, -0.78038288, -1.22000682, -0.86979203])

Let’s plot these data against the true (but unobserved) velocity:

vs_true = -0.1 * fs + np.exp(-0.1 * ts) * (-np.pi * np.sin(np.pi * ts) + 0.5 * np.pi * np.cos(np.pi * ts))
fig, ax = make_full_width_fig()
ax.plot(ts, vs_true, label='Unobserved true velocity')
ax.plot(ts, vs, '.', label='Reconstruction of velocity from noisy measurements')
ax.set_xlabel('$t$ (Time)')
ax.set_ylabel('$v$ (Velocity)')
plt.legend(loc='best')
save_for_book(fig, 'ch4.fig4');
../_images/35aea33142980399e450e1a3b30c3351e3d57cf3bbaef194f7117ffff8c43ca0.svg

Observe that the noise of the velocity is bigger than for the position. With the things that we learn in this class it is actually possible to predict how much uncertainty there is going to be in the estimate of the velocity. Using state-of-the-art data science, it is also possible to construct a much more accurate estimate the velocity than the naïve method we used above. However, these so-called filtering problems are outside of the scope of this course.

Questions#

  • Rerun the code above with smaller sigma so that that the velocity reconstruction becomes very accurate. How low should you go?