School of Computing Theory Assignment 2 of 4
COMP3670/6670: Introduction to Machine Learning

Maximum credit: 100
Errata: In Exercise 4, the loss function included a regulariser term $\|c\|_B^2$, which is undefined due to a dimensionality mismatch. This has been replaced with $\|c\|_A^2$.
Exercise 1 Inner Products induce Norms 20 credits
Let $V$ be a vector space, and let $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ be an inner product on $V$. Define $\|x\| := \sqrt{\langle x, x \rangle}$.
Prove that $\|\cdot\|$ is a norm.
(Hint: To prove the triangle inequality holds, you may need the Cauchy-Schwarz inequality, $\langle x, y \rangle \leq \|x\| \, \|y\|$.)
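
As an informal numerical sanity check (not part of the required proof), the Python sketch below assumes a concrete inner product on $\mathbb{R}^3$ induced by an arbitrarily chosen symmetric positive definite matrix M (a hypothetical example, not specified in the assignment) and probes the norm axioms on random vectors.

import numpy as np

rng = np.random.default_rng(0)

# A hypothetical symmetric positive definite matrix M defining <x, y> := x^T M y.
M = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 3.0]])

def inner(x, y):
    return x @ M @ y

def norm(x):
    return np.sqrt(inner(x, x))

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    a = rng.normal()
    assert np.isclose(norm(a * x), abs(a) * norm(x))   # absolute homogeneity
    assert norm(x + y) <= norm(x) + norm(y) + 1e-9     # triangle inequality
    assert norm(x) >= 0                                # non-negativity
assert np.isclose(norm(np.zeros(3)), 0.0)              # ||0|| = 0
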
Exercise 2 Vector Calculus Identities 10+10 credits
1. Let $x, a, b \in \mathbb{R}^n$. Prove that $\nabla_x(x^T a b^T x) = a^T x \, b^T + b^T x \, a^T$.
2. Let $B \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$. Prove that $\nabla_x(x^T B x) = x^T (B + B^T)$.
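
The two identities above can also be probed numerically (again, only a sanity check, not a proof). The Python sketch below compares each closed-form gradient against a central finite-difference approximation at a random point; the dimension n = 4 and the random a, b, B are hypothetical choices for illustration.

import numpy as np

rng = np.random.default_rng(1)
n = 4
a, b, x = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
B = rng.normal(size=(n, n))
eps = 1e-6

def num_grad(f, x):
    # Central finite-difference approximation of the gradient of f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# Identity 1: gradient of x^T a b^T x is (a^T x) b^T + (b^T x) a^T.
f1 = lambda v: (v @ a) * (b @ v)
g1 = (a @ x) * b + (b @ x) * a
assert np.allclose(num_grad(f1, x), g1, atol=1e-5)

# Identity 2: gradient of x^T B x is x^T (B + B^T).
f2 = lambda v: v @ B @ v
g2 = x @ (B + B.T)
assert np.allclose(num_grad(f2, x), g2, atol=1e-5)

(With 1-D NumPy arrays the row/column distinction in the gradient convention does not matter.)
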
Exercise 3 Properties of Symmetric Positive Definiteness 10 credits
Let $A$, $B$ be symmetric positive definite matrices (a matrix is symmetric positive definite if it is both symmetric and positive definite). Prove that for any $p, q > 0$, $pA + qB$ is also symmetric and positive definite.
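
An optional numerical illustration of this fact, assuming randomly generated SPD matrices and arbitrarily chosen weights p = 0.7, q = 2.5 (neither is specified in the assignment):

import numpy as np

rng = np.random.default_rng(2)
n = 5

def random_spd(n):
    # G^T G + I is symmetric positive definite for any real G.
    G = rng.normal(size=(n, n))
    return G.T @ G + np.eye(n)

A, B = random_spd(n), random_spd(n)
p, q = 0.7, 2.5
C = p * A + q * B
assert np.allclose(C, C.T)                  # symmetry
assert np.all(np.linalg.eigvalsh(C) > 0)    # positive definiteness
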
Exercise 4 General Linear Regression with Regularisation (10+10+10+10+10 credits)
Let $A \in \mathbb{R}^{N \times N}$, $B \in \mathbb{R}^{D \times D}$ be symmetric, positive definite matrices. From the lectures, we can use symmetric positive definite matrices to define a corresponding inner product, as shown below. From the previous question, we can also define a norm using the inner products.
$\langle x, y \rangle_A := x^T A y, \qquad \|x\|_A^2 := \langle x, x \rangle_A$
$\langle x, y \rangle_B := x^T B y, \qquad \|x\|_B^2 := \langle x, x \rangle_B$
Suppose we are performing linear regression, with a training set $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, where for each $i$, $x_i \in \mathbb{R}^D$ and $y_i \in \mathbb{R}$. We can define the matrix
$X = [x_1, \ldots, x_N]^T \in \mathbb{R}^{N \times D}$
and the vector
$y = [y_1, \ldots, y_N]^T \in \mathbb{R}^N$.
We would like to find $\theta \in \mathbb{R}^D$, $c \in \mathbb{R}^N$ such that $y \approx X\theta + c$, where the error is measured using $\|\cdot\|_A$. We avoid overfitting by adding a weighted regularisation term, measured using $\|\cdot\|_B$. We define the loss function with regulariser:
$L_{A,B,y,X}(\theta, c) = \|y - X\theta - c\|_A^2 + \|\theta\|_B^2 + \|c\|_A^2$
For the sake of brevity we write $L(\theta, c)$ for $L_{A,B,y,X}(\theta, c)$.
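
To make the objective concrete, the following Python sketch evaluates $L(\theta, c)$ directly from the definition above; the sizes N and D, the random data, and the randomly generated SPD matrices A and B are all hypothetical toy choices, not values from the assignment.

import numpy as np

rng = np.random.default_rng(3)
N, D = 6, 3                                  # hypothetical toy sizes

def random_spd(n):
    G = rng.normal(size=(n, n))
    return G.T @ G + np.eye(n)               # symmetric positive definite

A, B = random_spd(N), random_spd(D)
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

def loss(theta, c):
    # L(theta, c) = ||y - X theta - c||_A^2 + ||theta||_B^2 + ||c||_A^2
    r = y - X @ theta - c
    return r @ A @ r + theta @ B @ theta + c @ A @ c

print(loss(np.zeros(D), np.zeros(N)))        # e.g. the loss at theta = 0, c = 0
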
For this question:
• You may use (without proof) the property that a symmetric positive definite matrix is invertible.
• We assume that there are sufficiently many non-redundant data points for X to be full rank. In
particular, you may assume that the null space of X is trivial (that is, the only solution to Xz = 0
is the trivial solution, z = 0.)
1. Find the gradient $\nabla_\theta L(\theta, c)$.
2. Let $\nabla_\theta L(\theta, c) = 0$, and solve for $\theta$. If you need to invert a matrix to solve for $\theta$, you should prove the inverse exists.
3. We now compute the gradient with respect to $c$. Find the gradient $\nabla_c L(\theta, c)$.
4. Let $\nabla_c L(\theta, c) = 0$, and solve for $c$. If you need to invert a matrix to solve for $c$, you should prove the inverse exists.
5. Show that if we set $A = I$, $c = 0$, $B = \lambda I$, where $\lambda \in \mathbb{R}$, your answer for 4.2 agrees with the analytic solution for the standard least squares regression problem with $L_2$ regularisation, given by
$\theta = (X^T X + \lambda I)^{-1} X^T y.$
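
As an optional check of the quoted formula (not part of the assignment), the sketch below solves the $A = I$, $c = 0$, $B = \lambda I$ special case on hypothetical toy data with $\lambda = 0.1$ and verifies that the gradient of the standard $L_2$-regularised least-squares loss vanishes at the closed-form solution.

import numpy as np

rng = np.random.default_rng(4)
N, D, lam = 8, 3, 0.1                        # hypothetical toy sizes and regulariser weight
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

# Closed-form solution theta = (X^T X + lam I)^{-1} X^T y.
theta = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# The gradient of ||y - X theta||^2 + lam * ||theta||^2 should vanish at the solution.
grad = 2 * X.T @ (X @ theta - y) + 2 * lam * theta
assert np.allclose(grad, 0.0, atol=1e-10)
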
