Lab 2: Bayesian Linear Regression

ECE368: Probabilistic Reasoning
In this lab, we use Bayesian regression to fit a linear model. Consider a linear model of the form
z = a1x + a0 + w, (1)
where x is the scalar input variable, a = (a0, a1) is the vector-valued parameter with unknown entries
a0 and a1, and w is additive Gaussian noise:
w ∼ N(0, σ²), (2)
where σ² is a known parameter.
Suppose that we have access to a training data set containing N samples {x1, z1}, {x2, z2}, . . . , {xN , zN }.
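To make the model concrete, the sketch below draws samples from equations (1) and (2) in plain Python. The function name simulate_samples, the parameter values, and the input points are illustrative assumptions, not part of the lab files.

```python
import math
import random


def simulate_samples(a0, a1, sigma2, xs, seed=0):
    """Draw z_i = a1 * x_i + a0 + w_i with w_i ~ N(0, sigma2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)  # random.gauss expects a standard deviation
    return [a1 * x + a0 + rng.gauss(0.0, sigma) for x in xs]


# Illustrative values only (assumptions, not taken from the lab data)
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
zs = simulate_samples(a0=0.3, a1=1.5, sigma2=0.1, xs=xs)
```

Setting sigma2 to 0 removes the noise term, which gives a quick check that the samples fall exactly on the line a1 x + a0.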
We aim to estimate the parameter a by finding its posterior distribution. When the training finishes, we
make predictions based on new inputs. We consider a Bayesian approach, which models the parameter a as
a zero mean isotropic Gaussian random vector whose probability distribution is expressed as
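$$p(a) = \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \beta & 0 \\ 0 & \beta \end{bmatrix} \right), \tag{3}$$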
where β is a known hyperparameter.
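As a rough illustration of the prior in (3), the following sketch evaluates the zero-mean isotropic Gaussian density on a grid of (a0, a1) values and shows how such a grid could be passed to a contour plot. The helper names, the grid limits, and the grid resolution are assumptions made for this example. matplotlib is imported only inside the plotting helper, so the density computation runs without it.

```python
import math


def prior_pdf(a0, a1, beta):
    """Density of the 2D Gaussian N(0, beta * I) at a = (a0, a1)."""
    return math.exp(-(a0 * a0 + a1 * a1) / (2.0 * beta)) / (2.0 * math.pi * beta)


def prior_grid(beta, lo=-1.0, hi=1.0, steps=101):
    """Evaluate the prior on a square grid; grid[j][i] is at a0=axis[i], a1=axis[j]."""
    axis = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    grid = [[prior_pdf(a0, a1, beta) for a0 in axis] for a1 in axis]
    return axis, grid


def plot_prior(axis, grid):
    """Contour plot of the grid (requires matplotlib)."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    plt.contour(axis, axis, grid)
    plt.xlabel("a0")
    plt.ylabel("a1")
    plt.show()


axis, grid = prior_grid(beta=1.0)
```

Calling plot_prior(axis, grid) should show circular contours centred at the origin, since the covariance is a multiple of the identity.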
Download under Files/Labs/Lab2 Part1/ on Quercus and unzip the file. File training.txt contains the
training data: the first column is the inputs; the second column is the targets. The training data is generated
from z = −0.5x−0.1+w. Please answer the questions below and complete File contains
a few useful functions.
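A minimal sketch of reading the two-column training data is shown below. It assumes the columns in training.txt are separated by whitespace, which the handout does not state; the function name load_training and the sample rows are made up for illustration.

```python
import io


def load_training(lines):
    """Parse two whitespace-separated columns into (inputs, targets)."""
    xs, zs = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        xs.append(float(fields[0]))  # first column: input x
        zs.append(float(fields[1]))  # second column: target z
    return xs, zs


# Made-up rows standing in for the real file contents
sample = io.StringIO("0.5 -0.31\n-1.2 0.47\n")
xs, zs = load_training(sample)
```

With the real file, the same function could be used as `with open("training.txt") as f: xs, zs = load_training(f)`.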
1. Express the posterior distribution p(a|x1, z1, . . . , xN , zN ) using σ², β, x1, z1, x2, z2, . . . , xN , zN .
2. Let σ² = 0.1 and β = 1. Based on the posterior distribution obtained in the previous question, draw four
contour plots corresponding to p(a), p(a|x1, z1), p(a|x1, z1, . . . , x5, z5), and p(a|x1, z1, . . . , x100, z100).
In all contour plots, the x-axis represents a0, and the y-axis represents a1. The range is set as [−1, 1]×
[−1, 1]. In each figure, also draw the true value of a, which corresponds to the point (−0.1, −0.5).
3. Suppose that there is a new input x, for which we want to predict the target value z. Write down the
distribution of the prediction z, i.e., p(z|x, x1, z1, . . . , xN , zN ).
4. Let σ² = 0.1 and β = 1. Suppose that the set of new inputs is {−4, −3.8, −3.6, . . . , 0, . . . , 3.6, 3.8, 4}.
Plot three figures corresponding to the following three cases:
(a) The predictions are based on one training sample, i.e., based on p(z|x, x1, z1).
(b) The predictions are based on 5 training samples, i.e., based on p(z|x, x1, z1, . . . , x5, z5).
(c) The predictions are based on 100 training samples, i.e., based on p(z|x, x1, z1, . . . , x100, z100).
In all figures, the x-axis is the input, the y-axis is the target, and the range is set as [−4, 4] × [−4, 4].
Each figure should contain three components: 1) the new inputs and the predicted targets; 2) a vertical
interval at each predicted target, indicating the range within one standard deviation; 3) the training
sample(s) that are used for the prediction. Use plt.errorbar for 1) and 2); use plt.scatter for 3).
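One possible way to organize the computations behind the questions above is sketched here in plain Python. It relies on the standard conjugate Gaussian result (see the Bishop reference): with feature vector (1, x), the posterior covariance is S = (I/β + XᵀX/σ²)⁻¹, the posterior mean is m = S Xᵀz/σ², and the predictive distribution at a new x is Gaussian with mean m0 + m1 x and variance σ² + (1, x) S (1, x)ᵀ. All function names and the single training pair are hypothetical, and this is not the lab's reference solution.

```python
import math


def posterior(xs, zs, sigma2, beta):
    """Mean and covariance of p(a | data) for a = (a0, a1)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sz = sum(zs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    # Precision matrix I / beta + X^T X / sigma2, rows of X being (1, x_i)
    p00 = 1.0 / beta + n / sigma2
    p01 = sx / sigma2
    p11 = 1.0 / beta + sxx / sigma2
    det = p00 * p11 - p01 * p01
    cov = [[p11 / det, -p01 / det], [-p01 / det, p00 / det]]  # 2x2 inverse
    b0, b1 = sz / sigma2, sxz / sigma2  # X^T z / sigma2
    mean = [cov[0][0] * b0 + cov[0][1] * b1, cov[1][0] * b0 + cov[1][1] * b1]
    return mean, cov


def predict(x, mean, cov, sigma2):
    """Mean and standard deviation of p(z | x, data)."""
    mu = mean[0] + mean[1] * x
    var = sigma2 + cov[0][0] + 2.0 * cov[0][1] * x + cov[1][1] * x * x
    return mu, math.sqrt(var)


def plot_predictions(new_xs, mus, stds, train_x, train_z):
    """Error-bar plot of predictions plus the training points (requires matplotlib)."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    plt.errorbar(new_xs, mus, yerr=stds, fmt="o", markersize=3, capsize=2)
    plt.scatter(train_x, train_z, color="red", zorder=3)
    plt.xlim(-4, 4)
    plt.ylim(-4, 4)
    plt.xlabel("input x")
    plt.ylabel("target z")
    plt.show()


# New inputs -4, -3.8, ..., 4; rounding removes floating-point drift
new_xs = [round(-4.0 + 0.2 * i, 1) for i in range(41)]
train_x, train_z = [0.5], [-0.35]  # made-up training pair, not from training.txt
mean, cov = posterior(train_x, train_z, sigma2=0.1, beta=1.0)
preds = [predict(x, mean, cov, 0.1) for x in new_xs]
mus = [p[0] for p in preds]
stds = [p[1] for p in preds]
```

Calling plot_predictions(new_xs, mus, stds, train_x, train_z) would draw the figure. Passing empty training lists to posterior returns the prior mean and covariance, which is a convenient sanity check.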
References: C. M. Bishop, Pattern Recognition and Machine Learning, Springer New York, 2006, pp. 152–159;
K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012, pp. 231–234.

