CS 541 Artificial Intelligence: Homework 1

Instruction: See Lecture 1.
Notation: ‖x‖ denotes the ℓ2-norm of the vector x.
Problem 1: Random Projection for Nearest Neighbor Search
Background. The promise of random projection is that for any data points x1 and x2, it is possible to construct a mapping matrix A such that

‖x1 − x2‖ ≈ ‖Ax1 − Ax2‖. (1)

Therefore, if we think of Ax1 (respectively Ax2) as the new representation of x1 (respectively x2), their ℓ2 distance is almost preserved. That means we can always perform a pre-processing step to reduce the dimension of the data without sacrificing the performance of the algorithm for some specific applications. Observe that, since A(x1 − x2) = Ax1 − Ax2, Eq. (1) is equivalent to verifying that for any vector x, ‖x‖ ≈ ‖Ax‖.
Experiments. Let d = 1000. Write a Python or Matlab program to randomly generate a d-dimensional vector x, where each entry of x follows a uniform distribution on [−100, 100]. Then for each

k ∈ {10, 30, 50, 80, 100, 150, 200, 300, 400, 500, 600, 800, 1000},

Step 1: randomly generate a matrix A ∈ R^{k×d}, where each element of A is an i.i.d. draw from the normal distribution N(0, 1/k);
Step 2: plot figures to compare ‖x‖ and ‖Ax‖.
Conclusion. Based on the figures, summarize your findings.
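The experiment above can be sketched in a few lines of NumPy. This is a minimal version, not the required submission: the random seed is arbitrary, and it prints the ratio ‖Ax‖/‖x‖ for each k instead of producing the figures (those ratios are exactly what you would plot, e.g. with matplotlib).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000
x = rng.uniform(-100, 100, size=d)        # entries i.i.d. Uniform[-100, 100]
norm_x = np.linalg.norm(x)

ks = [10, 30, 50, 80, 100, 150, 200, 300, 400, 500, 600, 800, 1000]
ratios = []
for k in ks:
    # A has i.i.d. N(0, 1/k) entries, so E[||Ax||^2] = ||x||^2
    A = rng.normal(0.0, np.sqrt(1.0 / k), size=(k, d))
    ratios.append(np.linalg.norm(A @ x) / norm_x)

for k, r in zip(ks, ratios):
    print(f"k={k:4d}  ||Ax|| / ||x|| = {r:.3f}")
```

You should see the ratio concentrate around 1 as k grows, which is the behavior the Conclusion asks you to summarize.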
Problem 2: Reliable Data Annotation
Background. Most AI applications require high-quality data. While raw data are easy to obtain, it is expensive to gather correct labels even with crowdsourcing platforms. In particular, suppose we have a data point x (e.g., you can think of it as an image), and we want to determine the correct label of whether it is a digit or not. Suppose that the ground truth is +1, i.e., it is a digit. Furthermore, suppose that crowd worker i returns +1 with probability 0.6, and returns −1 otherwise. That is,

Pr(yi = +1) = 0.6, Pr(yi = −1) = 0.4. (2)
Our goal is to obtain the correct label for this image with confidence at least 0.99, by querying multiple workers and taking the majority vote.
1. Give a condition, phrased in terms of y1, . . . , yn, under which the majority vote returns the correct label;
2. Use Chebyshev’s inequality to estimate n;
3. Use Chernoff bound to estimate n;
4. Compare the estimates given by the above two inequalities, and summarize your findings.
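The setting above can be simulated directly. The sketch below is not the requested Chebyshev/Chernoff derivation, but a Monte Carlo sanity check you can compare your estimates of n against; the trial count and the (odd, to avoid ties) test values of n are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.6          # each worker returns the correct label +1 with probability 0.6
trials = 20000   # Monte Carlo trials per value of n

def majority_accuracy(n):
    # Each trial: n workers vote +1/-1; majority vote is correct when the sum is positive.
    votes = rng.choice([1, -1], size=(trials, n), p=[p, 1 - p])
    return np.mean(votes.sum(axis=1) > 0)

for n in [11, 51, 101, 201, 301]:
    print(f"n={n:3d}  estimated Pr(majority correct) = {majority_accuracy(n):.4f}")
```

Comparing the empirical n at which the accuracy crosses 0.99 with your Chebyshev and Chernoff estimates illustrates how loose each bound is.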
Appendix
Chernoff bound. Let Z1, Z2, . . . , Zn be n independent random variables taking values in {0, 1}. Let Z = Σ_{i=1}^{n} Zi. Suppose that Pr(Zi = 1) ≤ η for each Zi. Then for any α ∈ [0, 1],

Pr(Z ≥ (1 + α)ηn) ≤ e^{−α²ηn/3}.

When Pr(Zi = 1) ≥ η, for any α ∈ [0, 1],

Pr(Z ≤ (1 − α)ηn) ≤ e^{−α²ηn/2}.
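As a quick sanity check on the first inequality, the sketch below compares the exact binomial upper tail (taking the tightest case Pr(Zi = 1) = η for all i, so Z ~ Binomial(n, η)) against the bound e^{−α²ηn/3}; the particular values n = 200, η = 0.4, α = 0.25 are arbitrary choices for illustration.

```python
import math

def binom_upper_tail(n, eta, alpha):
    """Exact Pr(Z >= (1 + alpha) * eta * n) for Z ~ Binomial(n, eta)."""
    threshold = math.ceil((1 + alpha) * eta * n)
    return sum(math.comb(n, i) * eta**i * (1 - eta)**(n - i)
               for i in range(threshold, n + 1))

n, eta, alpha = 200, 0.4, 0.25
exact = binom_upper_tail(n, eta, alpha)
bound = math.exp(-alpha**2 * eta * n / 3)
print(f"exact tail = {exact:.6f}, Chernoff bound = {bound:.6f}")
```

The exact tail is well below the bound, as expected: the Chernoff bound is valid but not tight.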