Starting from:

$30

Problem Set 0: Math prerequisites

CM146
Problem Set 0: Math prerequisites

Submission instructions
• Submit your solutions electronically on the course Gradescope site as PDF files.
• If you plan to typeset your solutions, please use the LaTeX solution template. If you must
submit scanned handwritten solutions, please use a black pen on blank white paper and a
high-quality scanner app.
Although many students find a machine learning class to be rewarding, we do assume that you
have a basic familiarity with several types of math. Before taking the class, you should evaluate
whether you have the mathematical background the class depends upon.
• Multivariate Calculus (at the level of a first undergraduate course, e.g., Math 32A and 32B
at UCLA). For example, we rely on you being able to take derivatives and integrals. During
the class you might be asked, for example, to derive gradients of multivariate functions.
• Linear Algebra (at the level of a first undergraduate course, e.g., Math 33A). For example,
we assume you know how to multiply vectors and matrices, and that you understand matrix
inversion, eigenvectors and eigenvalues. During the class, you might also be asked to also
learn about methods for matrix factorization.
• Probability and Statistics (at the level of a first undergraduate course, e.g., Statistics
100A). For example, we assume you know how to find the mean and variance of a set of
data, that you are familiar with common probability distributions such as the Gaussian and
Uniform distributions, and that you understand basic notions such as conditional probabilities
and Bayes rule. During the class, you might be asked to calculate the likelihood (probability)
of a data set with respect to some given probability distribution, and to then derive the
parameters of the distribution that maximize this likelihood.
This assignment helps you self-evaluate whether you have the background to succeed in the class.
For each of these mathematical topics, we provide below (1) a minimum background test and (2) a
moderate background test. If you pass the moderate background test, you are in excellent shape to
take the class. If you pass the minimum background but not the moderate background test, then
you can still take the class, but you should expect to devote extra time to fill in necessary math
background. If you cannot pass the minimum background test, we suggest you fill in your math
background before taking the class.
You may find the following resources helpful:
This assignment is adapted from course material by William Cohen, Ziv Bar-Joseph (CMU) and Jessica Wu
(Harvey Mudd).
1
• Andrew Ng’s CS229 Course (Stanford)
– Linear Algebra Review (http://cs229.stanford.edu/section/cs229-linalg.pdf)
– Probability Theory Review (http://cs229.stanford.edu/section/cs229-prob.pdf)
Additional resources are available on the course syllabus.
2
Necessary Minimum Background Test [45 pts]
While you are welcome to use online resources, such as Wolfram-Alpha, you should be able to solve
these problems by hand.
1 Multivariate Calculus [2 pts]
Consider y = z sin(x)e
−x
. What is the partial derivative of y with respect to x?
3
2 Linear Algebra [8 pts]
Consider the matrix X and the vectors y and z below:
X =
?
2 4
1 2?
y =
?
1
3
?
z =
?
2
3
?
(a) What is the inner product y
T z?
(b) What is the product Xy?
(c) Is X invertible? If so, give the inverse; if not, explain why not.
(d) What is the rank of X?
4
3 Probability and Statistics [10 pts]
Consider a sample of data S obtained by flipping a coin five times. Xi
, i ∈ {1, . . . , 5} is a random
variable that takes a value 0 when the outcome of coin flip i turned up heads, and 1 when it turned
up tails. Assume that the outcome of each of the flips does not depend on the outcomes of any of
the other flips. The sample obtained S = (X1, X2, X3, X4, X5) = (1, 1, 0, 1, 0).
(a) What is the sample mean for this data?
(b) What is the unbiased sample variance ?
(c) What is the probability of observing this data assuming that a coin with an equal probability
of heads and tails was used? (i.e., The probability distribution of Xi
is P(Xi = 1) = 0.5,
P(Xi = 0) = 0.5.)
(d) Note the probability of this data sample would be greater if the value of the probability of
heads P(Xi = 1) was not 0.5 but some other value. What is the value that maximizes the
probability of the sample S? [Optional: Can you prove your answer is correct?]
(e) Given the following joint distribution between X and Y , what is P(X = T|Y = b)?
P(X, Y )
Y
a b c
X
T 0.2 0.1 0.2
F 0.05 0.15 0.3
5
4 Probability axioms [5 pts]
Let A and B be two discrete random variables. In general, are the following true or false? (Here
Ac denotes complement of the event A.)
(a) P(A ∪ B) = P(A ∩ (B ∩ Ac
))
(b) P(A ∪ B) = P(A) + P(B)
(c) P(A) = P(A ∩ B) + P(Ac ∩ B)
(d) P(A|B) = P(B|A)
(e) P(A1 ∩ A2 ∩ A3) = P(A3|(A2 ∩ A1))P(A2|A1)P(A1)
6
5 Discrete and Continuous Distributions[5 pts]
Match the distribution name to its formula.
(a) Gaussian (i) p
x
(1 − p)
1−x
, when x ∈ {0, 1}; 0 otherwise
(b) Exponential (ii) 1
b−a when a ≤ x ≤ b; 0 otherwise
(c) Uniform (iii)
n
x
?
p
x
(1 − p)
n−x
(d) Bernoulli (iv) λe−λx when x ≥ 0; 0 otherwise
(e) Binomial (v) √
1
(2π)σ2
exp

1
2σ2 (x − µ)
2
?
6 Mean and Variance[5 pts]
(a) What is the mean and variance of a Bernoulli(p) random variable?
(b) If the variance of a zero-mean random variable X is σ
2
, what is the variance of 2X? What
about the variance of X + 3?
8
7 Algorithms [10 pts]
(a) Big-O notation
For each pair (f, g) of functions below, list which of the following are true: f(n) = O(g(n)),
g(n) = O(f(n)), or both. Briefly justify your answers.
i. f(n) = ln(n), g(n) = lg(n). Note that ln denotes log to the base e and lg denotes log to
the base 2.
ii. f(n) = 3n
, g(n) = n
10
iii. f(n) = 3n
, g(n) = 2n
(b) Divide and Conquer
Assume that you are given an array with n elements all entries equal either to 0 or +1 such
that all 0 entries appear before +1 entries. You need to find the index where the transition
happens, i.e., you need to report the index with the last occurrence of 0. Give an algorithm
that runs in time O(log n). Explain your algorithm in words, describe why the algorithm is
correct, and justify its running time.
9
Moderate Background Test [35 pts]
10
8 Probability and Random Variables [5 pts]
(a) Mutual and Conditional Independence If X and Y are independent random variables,
show that E[XY ] = E[X]E[Y ].
(b) Law of Large Numbers and Central Limit Theorem
Provide one line justifications.
i. If a fair die is rolled 6000 times, the number of times 3 shows up is close to 1000.
ii. If a fair coin is tossed n times and X¯ denotes the average number of heads, then the
distribution of X¯ satisfies

n(X¯ −
1
2
)
n→∞ −−−→ N (0,
1
4
)
11
9 Linear Algebra [20 pts]
(a) Vector Norms [4 pts]
Draw the regions corresponding to vectors x ∈ R
2 with following norms (you can hand draw
or use software for this question):
i. ||x||2 ≤ 1 (Recall ||x||2 =
qP
i
x
2
i
.)
ii. ||x||0 ≤ 1 (Recall ||x||0 =
P
i:xi6=0 1.)
iii. ||x||1 ≤ 1 (Recall ||x||1 =
P
i
|xi
|.)
iv. ||x||∞ ≤ 1 (Recall ||x||∞ = maxi
|xi
|.)
12
(b) Matrix Decompositions [6 pts]
i. Give the definition of the eigenvalues and the eigenvectors of a square matrix.
ii. Find the eigenvalues and eigenvectors of
A =
?
2 1
1 2?
.
13
iii. For any positive integer k, show that the eigenvalues of Ak
are λ
k
1
, λk
2
, . . . , λk
n
, the k
th
powers of the eigenvalues of matrix A, and that each eigenvector of A is still an eigenvector of Ak
.
14
(c) Vector and Matrix Calculus [5 pts]
Consider the vectors x and a and the symmetric matrix A.
i. What is the first derivative of a
Tx with respect to x?
ii. What is the first derivative of x
T Ax with respect to x? What is the second derivative?
15
(d) Geometry [5 pts]
i. Show that the vector w is orthogonal to the line wTx + b = 0. (Hint: Consider two
points x1, x2 that lie on the line. What is the inner product wT
(x1 − x2)?)
ii. Argue that the distance from the origin to the line wTx + b = 0 is b
||w||2
.
16
17
Programming Skills
Start familiarizing yourself with the Python libraries numpy and matplotlib by completing the
following exercises. (You do not have to submit your code.)
You may find the following references helpful:
• http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.multivariate_
normal.html
• http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eig.html
18
10 Sampling from a Distribution [2.5 pts]
For questions (a-e), only submit your plots. You do not need to submit code.
(a) Draw 1000 samples x =

x1
x2
?
from a 2-dimensional Gaussian distribution with mean
0
0
?
and
identity covariance matrix, i.e. p(x) = √
1
(2π)
2
exp ?

||x||2
2
?
, and make a scatter plot (x1 vs
x2).
(b) How does the scatter plot change if the mean is
1
1
?
?
(c) How does the (original) scatter plot change if you double the variance of each component?
(d) How does the (original) scatter plot change if the covariance matrix is changed to
1 0.5
0.5 1 ?
?
(e) How does the (original) scatter plot change if the covariance matrix is changed to
1 −0.5
−0.5 1 ?

20
11 Eigendecomposition [2.5 pts]
Write a python program to compute the eigenvector corresponding to the largest eigenvalue of the
following matrix and submit the computed eigenvector.
A =
?
1 0
1 3?
.
21
12 Data [5 pts]
There are now lots of really interesting data sets publicly available to play with. They range in
size, quality and the type of features and have resulted in many new machine learning techniques
being developed.
Find a public, free, supervised (i.e. it must have features and labels), machine learning dataset.
You may NOT list a data set from 1) The UCI Machine Learning Repository or 2) from Kaggle.com.
Once you have found the data set, provide the following information:
(a) The name of the data set.
(b) Where the data can be obtained.
(c) A brief (i.e. 1-2 sentences) description of the data set including what the features are and
what is being predicted.
(d) The number of examples in the data set.
(e) The number of features for each example. If this is not concrete (i.e. it is text), then a short
description of the features.
For this question, do not just copy and paste the description from the website or the paper; reference
it, but use your own words. Your goal here is to convince the staff that you have taken the time to
understand the data set, where it came from, and potential issues involved.
22

More products