Polynomial Regression

We pick up where we left off in the previous section. Recall that we tried to fit a linear regression model to data generated from:

\[ y_i = -0.5 + 2x_i + 2x_i^2 + \epsilon_i, \]

where \(\epsilon_i \sim N(0, \sigma^2)\) with \(\sigma = 0.1\), and where we sample \(x_i \sim U([-1,1])\):

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(rc={"figure.dpi":100, 'savefig.dpi':300})
sns.set_context('notebook')
sns.set_style("ticks")
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina', 'svg')
import numpy as np
import scipy.stats as st
# How many observations we have
num_obs = 10
x = -1.0 + 2 * np.random.rand(num_obs)
w0_true = -0.5
w1_true = 2.0
w2_true = 2.0
sigma_true = 0.1
y = w0_true + w1_true * x + w2_true * x ** 2 + sigma_true * np.random.randn(num_obs)
# Let's plot the data
fig, ax = plt.subplots()
ax.plot(x, y, 'x', label='Observed data')
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
plt.legend(loc='best');
[Figure: scatter plot of the observed data.]

We already saw that the linear model does not work here. Let's try to fit a quadratic model instead:

\[ y = w_0 + w_1 x + w_2 x^2. \]

How can we do this? Of course, by minimizing the square loss:

\[ L(w_0, w_1, w_2) = \sum_{i=1}^N(y_i - w_0 - w_1 x_i - w_2 x_i^2)^2. \]

Fortunately, we do not have to start from scratch. The notation we developed previously comes to our rescue. Recall that \(\mathbf{y} = (y_1,\dots,y_N)\) is the vector of observations. We use

\[ \mathbf{w} = (w_0, w_1, w_2), \]

to denote the weight vector. What about the design matrix? Before, it was an \(N\times 2\) matrix whose first column was all ones and whose second column was the vector of observed inputs. Now it is an \(N\times 3\) matrix: the first two columns are exactly as before, and the third column contains the squares of the observed inputs. So, it is:

\[\begin{split} \mathbf{X} = \begin{bmatrix} 1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2\\ \vdots & \vdots & \vdots\\ 1 & x_N & x_N^2 \end{bmatrix}. \end{split}\]

As before, multiplying the design matrix \(\mathbf{X}\) by the weight vector \(\mathbf{w}\) gives the predictions of our model. So, again, the square loss can be written as:

\[ L(w_0, w_1, w_2) = L(\mathbf{w}) = \parallel \mathbf{y} - \mathbf{X}\mathbf{w}\parallel^2. \]

This is mathematically the same expression as before. The only differences are that the weight vector is now 3-dimensional (instead of 2-dimensional) and that the design matrix is \(N\times 3\) instead of \(N\times 2\). If you take the gradient with respect to \(\mathbf{w}\) and set it equal to zero, you find that you need to solve exactly the same kind of linear system as before (but now it is 3 equations for 3 unknowns instead of 2 for 2).
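
Concretely, setting the gradient of \(L(\mathbf{w})\) to zero gives the normal equations

\[ \mathbf{X}^T\mathbf{X}\mathbf{w} = \mathbf{X}^T\mathbf{y}, \]

which, for the quadratic model, is a \(3\times 3\) linear system.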

Let’s solve it numerically. First, the design matrix:

X = np.hstack([np.ones((num_obs, 1)), x.reshape((num_obs, 1)), x.reshape((num_obs, 1)) ** 2])
X
array([[ 1.        ,  0.3008122 ,  0.09048798],
       [ 1.        , -0.50014368,  0.2501437 ],
       [ 1.        ,  0.99029017,  0.98067462],
       [ 1.        , -0.77137143,  0.59501388],
       [ 1.        ,  0.89471904,  0.80052216],
       [ 1.        , -0.69097515,  0.47744666],
       [ 1.        , -0.9697118 ,  0.94034098],
       [ 1.        ,  0.84920533,  0.7211497 ],
       [ 1.        , -0.4348905 ,  0.18912974],
       [ 1.        ,  0.08657948,  0.00749601]])

and then:

w, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print('w_0 = {0:1.2f}'.format(w[0]))
print('w_1 = {0:1.2f}'.format(w[1]))
print('w_2 = {0:1.2f}'.format(w[2]))
w_0 = -0.56
w_1 = 2.01
w_2 = 2.09
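
As a quick sanity check, we can also solve the normal equations directly and verify that we recover the same weights. This is a minimal sketch that assumes the X and y arrays defined above are still in scope:

# Solve the normal equations X^T X w = X^T y directly
w_normal = np.linalg.solve(X.T @ X, X.T @ y)
print('w (normal equations) =', w_normal)
# The value of the square loss at the minimizer
print('loss =', np.linalg.norm(y - X @ w_normal) ** 2)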

Let’s visualize the model predictions:

fig, ax = plt.subplots()
# Some points on which to evaluate the regression function
xx = np.linspace(-1, 1, 100)
# The true connection between x and y
yy_true = w0_true + w1_true * xx + w2_true * xx ** 2
# The model we just fitted
yy = w[0] + w[1] * xx + w[2] * xx ** 2
# plot the data again
ax.plot(x, y, 'x', label='Observed data')
# overlay the true response surface
ax.plot(xx, yy_true, label='True response surface')
# overlay our prediction
ax.plot(xx, yy, '--', label='Fitted model')
plt.legend(loc='best');
[Figure: observed data, true response surface, and fitted quadratic model.]

Questions

  • Repeat the analysis with a very small num_obs and a very large num_obs and observe the behavior of the fit (one possible setup is sketched below).
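
One possible way to set up this experiment is to re-run the data generation and fit for a few sample sizes. The helper name fit_quadratic and the particular sample sizes below are our own choices, and the cell assumes w0_true, w1_true, w2_true, and sigma_true from above are still in scope:

def fit_quadratic(n):
    # Sample n inputs uniformly in [-1, 1] and generate noisy observations
    x_n = -1.0 + 2 * np.random.rand(n)
    y_n = w0_true + w1_true * x_n + w2_true * x_n ** 2 + sigma_true * np.random.randn(n)
    # Build the n x 3 quadratic design matrix and fit by least squares
    X_n = np.hstack([np.ones((n, 1)), x_n[:, None], x_n[:, None] ** 2])
    w_n, _, _, _ = np.linalg.lstsq(X_n, y_n, rcond=None)
    return w_n

for n in [5, 10, 100, 1000]:
    print(n, fit_quadratic(n))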

Regression with high-degree polynomials and overfitting

What would happen if we tried to use a higher-degree polynomial? To do this, we need to be able to evaluate a design matrix of the form:

\[\begin{split} \mathbf{X} = \begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^\rho\\ 1 & x_2 & x_2^2 & \dots & x_2^\rho\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_N & x_N^2 & \dots & x_N^\rho \end{bmatrix}, \end{split}\]

where \(\rho\) is the degree of the polynomial. The linear system we need to solve is the same as before; the only changes are that the weight vector is now \((\rho + 1)\)-dimensional and the design matrix is \(N\times (\rho + 1)\).

Let’s write some code to build this design matrix.

def get_polynomial_design_matrix(x, degree):
    """
    Returns the polynomial design matrix of ``degree`` evaluated at ``x``.
    """
    # Make sure this is a 2D numpy array with only one column
    assert isinstance(x, np.ndarray), 'x is not a numpy array.'
    assert x.ndim == 2, 'You must make x a 2D array.'
    assert x.shape[1] == 1, 'x must be a column.'
    # Start with an empty list where we are going to put the columns of the matrix
    cols = []
    # Add one column per power of x, from x ** 0 up to x ** degree
    for i in range(degree+1):
        cols.append(x ** i)
    return np.hstack(cols)

Let’s try fitting a degree 3 polynomial and see what we get:

degree = 3
# The design matrix is:
X = get_polynomial_design_matrix(x[:, None], degree)
# And we fit just like previously:
w, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print('w = ', w)
w =  [-0.53728843  2.37957674  2.1012689  -0.50997047]
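
As an aside, NumPy can build this kind of Vandermonde-style matrix for us. The following sketch (not part of the original analysis) checks that np.vander agrees with our hand-rolled design matrix:

# np.vander with increasing=True gives columns 1, x, x^2, ..., x^degree
X_vander = np.vander(x, degree + 1, increasing=True)
print(np.allclose(X, X_vander))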

Let’s visualize the fit. Notice that, to make predictions, we evaluate the design matrix at the points where we want to predict.

fig, ax = plt.subplots()
# Some points on which to evaluate the regression function
xx = np.linspace(-1, 1, 100)
# The true connection between x and y
yy_true = w0_true + w1_true * xx + w2_true * xx ** 2
# The model we just fitted
XX = get_polynomial_design_matrix(xx[:, None], degree)
yy = np.dot(XX, w)
# plot the data again
ax.plot(x, y, 'x', label='Observed data')
# overlay the true response surface
ax.plot(xx, yy_true, label='True response surface')
# overlay our prediction
ax.plot(xx, yy, '--', label='Fitted model')
ax.set_title(r'$\rho = {0:d}$'.format(degree))
plt.legend(loc='best');
[Figure: observed data, true response surface, and fitted model with \(\rho = 3\).]

Questions

  • Start increasing the polynomial degree from 3, to 4, and on to a degree where things get bad… You will soon start overfitting. (One way to set up this sweep is sketched below.)
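
One way to run this sweep is sketched below; it re-uses get_polynomial_design_matrix together with the data x, y, the plotting grid xx, and the true response yy_true from the cells above, and the particular degrees are just a suggestion:

fig, ax = plt.subplots()
ax.plot(x, y, 'x', label='Observed data')
ax.plot(xx, yy_true, label='True response surface')
for rho in [4, 8, 16]:
    # Fit a polynomial of degree rho to the same observations
    X_rho = get_polynomial_design_matrix(x[:, None], rho)
    w_rho, _, _, _ = np.linalg.lstsq(X_rho, y, rcond=None)
    # Evaluate the fitted polynomial on the plotting grid
    yy_rho = np.dot(get_polynomial_design_matrix(xx[:, None], rho), w_rho)
    ax.plot(xx, yy_rho, '--', label=r'$\rho = {0:d}$'.format(rho))
plt.legend(loc='best');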