CS5785 Homework 4
A complete submission should include:
1. A write-up as a single .pdf file
2. Source code and data files for all of your experiments (AND figures) in .py files if you use
Python or .ipynb files if you use the IPython Notebook. If you use some other language, include all build scripts necessary to build and run your project along with instructions on how
to compile and run your code.
The write-up should be in professional lab report format. It should contain a general summary of
what you did, how well your solution works, any insights you found, etc. On the cover page, include
the class name, homework number, and team member names. You are responsible for submitting
clear, organized answers to the questions.
Please include all relevant information for a question, including text response, equations, figures,
graphs, output, etc. If you include graphs, be sure to include the source code that generated them.
Please pay attention to the discussion board for relevant information regarding updates, tips, and
policy changes. You are encouraged (but not required) to work in groups of 2.
1 WRITTEN EXERCISES
1. Association rule learning. As a data scientist employed by the wildly popular online retail web
site Nile.com, your job is to maximize profits by bundling related items together based on your
customers’ spending habits. On your first day of work, you put on your statistician hat and download a sampling of Nile.com’s purchase logs for last month. Of course, Nile.com has an extensive
catalog of N ≈ millions of conveniently packaged everyday items, all delivered to your door within
two or three business days, shipping price not included. Since any combination of these items
could be interesting to the marketing team, you decide to use association rule mining to decide
which item correlations could be the most profitable. The table below lists sets of items that appear in customers’ shopping baskets along with the count of purchases that contain those items.
For example, the fifth row says that 60,159 transactions contained both dog food and cat food in
addition to whatever else those customers purchased.
The entire purchase log is too big to download, but you can probably already make some interesting inferences. For some of the questions below, the table does not contain enough information
to give an exact answer. In these cases, give the tightest upper and lower bounds for the possible
value.
a. Sanity check: How many purchases did Nile.com fulfill last month?
Item set                                      Number of purchases containing at least these items
{}                                            927,125
{Dog Food}                                    80,915
{Cat Food}                                    185,279
{Cat Litter}                                  130,122
{Cat Food, Dog Food}                          60,159
{Cat Food, Cat Litter}                        120,091
{Burgers}                                     29,751
{Burgers, Vitamin C Tablets}                  231
{Vitamin C Tablets, Artesian Tap Water}       25
{Burgers, Buns, Ketchup}                      15,293
...
b. Sanity check: How many customers purchased Vitamin C tablets in addition to their other
items?
c. Sanity check: How many customers purchased only cat food and nothing else?
d. What is the (possible) value of SUPP({Burgers, Vitamin C Tablets, Ketchup})?
e. What is the (possible) value of SUPP({Burgers, Buns})? How do you know?
f. What is the (possible) value of CONF({Burgers} ⇒ {Vitamin C Tablets})? Does this seem
like an interesting promotion opportunity?
g. What is the (possible) value of CONF({Dog Food, Cat Food} ⇒ {Cat Litter})?
h. What is the (possible) value of LIFT({Dog Food} ⇒ {Cat Food})? (A short code sketch of how SUPP, CONF, and LIFT are computed from the table counts appears after part k.)
i. Suppose you wish to run the APriori algorithm with a minimum support of 0.1 (10% of purchases). You begin by gathering the support for all item sets of length 1. Which pairs of items
could APriori definitely eliminate using the downward closure property?
j. Which pairs of items could APriori probably eliminate if you knew more of the table?
k. Which pairs of items could APriori definitely not eliminate?
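These questions use the standard definitions: SUPP(X) is the fraction of all transactions containing X, CONF(X ⇒ Y) = SUPP(X ∪ Y)/SUPP(X), and LIFT(X ⇒ Y) = CONF(X ⇒ Y)/SUPP(Y). The sketch below only illustrates how these quantities fall out of the table counts; the dictionary of counts and the reading of the {} row as the total number of transactions are our own assumptions, not part of the assignment.

    # Minimal sketch: SUPP, CONF, and LIFT from the table counts.
    # Assumption: the {} row counts every transaction, since every basket
    # contains the empty set. Returns None when the table lacks an entry.
    counts = {
        frozenset(): 927_125,
        frozenset({"dog food"}): 80_915,
        frozenset({"cat food"}): 185_279,
        frozenset({"cat litter"}): 130_122,
        frozenset({"cat food", "dog food"}): 60_159,
        frozenset({"cat food", "cat litter"}): 120_091,
        frozenset({"burgers"}): 29_751,
        frozenset({"burgers", "vitamin c tablets"}): 231,
        frozenset({"vitamin c tablets", "artesian tap water"}): 25,
        frozenset({"burgers", "buns", "ketchup"}): 15_293,
    }
    TOTAL = counts[frozenset()]

    def supp(items):
        key = frozenset(items)
        return counts[key] / TOTAL if key in counts else None

    def conf(lhs, rhs):
        joint, left = supp(set(lhs) | set(rhs)), supp(lhs)
        return joint / left if joint is not None and left else None

    def lift(lhs, rhs):
        c, right = conf(lhs, rhs), supp(rhs)
        return c / right if c is not None and right else None

    print(conf({"cat food"}, {"cat litter"}))  # ~0.65: cat food buyers often buy litter

For item sets not listed in the table, the sketch returns None; those are exactly the cases where you must reason about the tightest upper and lower bounds instead.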
2. Neural networks as function approximators. Design a feed-forward neural network to approximate the 1-dimensional function given in Fig. 1. The output should match exactly. How many
hidden layers do you need? How many units are there within each layer? Show the hidden layers,
units, connections, weights, and biases.
Use the ReLU nonlinearity for every unit. Every possible path from input to output must pass
through the same number of layers. This means each layer should have the form
Y_i = σ(W_i Y_{i−1} + β_i),                                                    (1)

where Y_i ∈ R^(d_i × 1) is the output of the i-th layer, W_i ∈ R^(d_i × d_{i−1}) is the weight matrix for that layer, Y_0 = x ∈ R^(1 × 1), and the ReLU nonlinearity is defined as

σ(x) ≜ x if x ≥ 0, and 0 otherwise.                                            (2)
[Figure 1: Example function to approximate using a neural network. The plot, titled "Pride Rock", shows Output against Input over the range 0 to 10.]
Hint 1: Try to express the given function as a sum of shifted and scaled ReLU units. Once you can do this, it should be straightforward to turn your final equation into a series of neural network layers with individual weights and biases.
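For intuition, any continuous piecewise-linear function can be written as a constant plus a weighted sum of shifted ReLUs, one term per kink. The sketch below uses made-up knot locations and slope changes, not the actual values from Fig. 1 (those you have to read off the plot yourself); it only illustrates the construction the hint points at.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    # Hypothetical knots and slope changes -- NOT the "Pride Rock" values.
    # A piecewise-linear f with kinks at b_1 < b_2 < ... can be written as
    #   f(x) = c + sum_i a_i * relu(x - b_i),
    # where a_i is the change in slope at knot b_i and c is f at the left end.
    knots = np.array([1.0, 3.0, 4.0, 6.0])             # assumed kink locations
    slope_changes = np.array([2.0, -2.0, -1.0, 1.0])   # assumed slope changes
    offset = 0.0

    def f(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return offset + relu(np.subtract.outer(x, knots)) @ slope_changes

    print(f([0.0, 2.0, 3.5, 5.0, 8.0]))

Each term a_i · relu(x − b_i) then corresponds to one hidden ReLU unit with input weight 1 and bias −b_i, and a_i becomes that unit's weight into the next layer, which is what Equation (1) expresses layer by layer.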
Hint 2: If you're not convinced neural networks can approximate this function,
see http://neuralnetworksanddeeplearning.com/chap4.html for an example of the flavor of
solution we're looking for. (If you rely on this source, cite it!) However, keep in mind that your
network's output must match the given function exactly.
3. Approximating images with neural networks. In this question, you will implement your own
neural network toolkit. You will be writing your own implementation from scratch, using C++ and
CUDA. You should calculate the derivatives of each layer by hand using pencil and paper. Please
attach a scan of your paper notes to the homework.
Just kidding. We’re not that mean. There are several good convolutional neural network packages
that have done the heavy lifting for us. One of the more interesting (and well-written!) demos is
called CONVNETJS. It is implemented in Javascript and runs in a modern web browser without any
dependencies.
Take a look at convnet.js’s “Image Painting” demo at: http://cs.stanford.edu/people/karpathy/
convnetjs/demo/image_regression.html
a. Describe the structure of the network. How many layers does this network have? What is
the purpose of each layer?
b. What does “Loss” mean here? What is the actual loss function? You may need to consult the
source code, which is available on Github.
c. Plot the loss over time, after letting it run for 5,000 iterations. How good does the network
eventually get?
d. Can you make the network converge to a lower loss by lowering the learning rate
every 1,000 iterations? (Some learning rate schedules, for example, halve the learning rate
every n iterations. Does this technique let the network converge to a lower training loss? A small sketch of such a schedule follows part f below.)
e. Lesion study. The text box contains a small snippet of Javascript code that initializes the network. You can change the network structure by clicking the “Reload network” button, which
simply evaluates the code. Let’s perform some brain surgery: Try commenting out each layer,
one by one. Report some results: How many layers can you drop before the accuracy drops
below a useful value? How few hidden units can you get away with before quality drops noticeably?
f. Try adding a few layers by copy+pasting lines in the network definition. Can you noticeably
increase the accuracy of the network?
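In the demo itself (which is JavaScript), the learning rate is set alongside the network definition in the text box, so in practice you would edit it there and press "Reload network" at the chosen iteration counts. Purely to make the schedule mentioned in part (d) concrete, here is a sketch in Python of a step decay that halves the rate every n iterations; the constants are arbitrary placeholders.

    # Sketch of a step-decay learning-rate schedule (arbitrary constants).
    def step_decay_lr(iteration, base_lr=0.01, decay_factor=0.5, step_size=1000):
        """Multiply the base rate by decay_factor once every step_size iterations."""
        return base_lr * decay_factor ** (iteration // step_size)

    for it in range(0, 5001, 1000):
        print(it, step_decay_lr(it))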
2 RANDOM FORESTS FOR IMAGE APPROXIMATION
In this question, you will use random forest regression to approximate an image by learning a function,
f : R^2 → R,                                                                   (3)
that takes image (x, y) coordinates as input and outputs pixel brightness. This way, the function learns to
approximate areas of the image that it has not seen before.
2a. Start with an image of the Mona Lisa. If you don’t like the Mona Lisa, pick another interesting
image of your choice.
Figure 2: Left: Mona Lisa, Leonardo da Vinci, via Wikipedia (http://tinyurl.com/mona-lisa-small).
Licensed under Public Domain. Middle: Example output of a decision tree regressor. The input is a
"feature vector" containing the (x, y) coordinates of the pixel. The output at each point is an (r, g, b)
tuple. This tree has a depth of 7. Right: Example output of a k-NN regressor, where k = 1. The output at
each pixel is equal to its closest sample from the training set.
2b. Preprocessing the input. To build your “training set,” uniformly sample 5,000 random (x, y) coordinate locations.
• What other preprocessing steps are necessary for random forests inputs? Describe them,
implement them, and justify your decisions. In particular, do you need to perform mean
subtraction, standardization, or unit-normalization?
2c. Preprocessing the output. Sample pixel values at each of the given coordinate locations. Each
pixel contains red, green, and blue intensity values, so decide how you want to handle this. There
are several options available to you:
• Convert the image to grayscale
• Regress all three values at once, so your function maps (x, y) coordinates to (r, g, b) values: f : R^2 → R^3
• Learn a different function for each channel, f_Red : R^2 → R, and likewise for f_Green and f_Blue.
2d. Rescale the pixel intensities to lie between 0.0 and 1.0. (The default for pixel values may be between 0 and 255, but your image library may have different defaults.)
2e. What other preprocessing steps are necessary for random regression forest outputs? Describe
them, implement them, and justify your decisions.
2f. To build the final image, for each pixel of the output, feed the pixel coordinate through the random
forest and color the resulting pixel with the output prediction. You can then use imshow to view
the result. (If you are using grayscale, try imshow(Y, cmap='gray') to avoid fake-coloring). You
may use any implementation of random forests, but you should understand the implementation
and you must cite your sources.
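As a starting point, here is a minimal sketch of the 2b-2f pipeline for the grayscale option, using scikit-learn's RandomForestRegressor. The file name, sampling details, and hyperparameters are placeholders for whatever you choose, and the preprocessing decisions asked about in 2b and 2e are still yours to justify.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestRegressor

    # "mona_lisa.png" is a placeholder path; depth and tree count are arbitrary here.
    img = plt.imread("mona_lisa.png")
    if img.ndim == 3:                      # collapse RGB(A) to a single brightness channel
        img = img[..., :3].mean(axis=2)
    img = img.astype(float)
    if img.max() > 1.0:                    # rescale 0-255 pixel values to [0, 1]
        img /= 255.0
    h, w = img.shape

    # 2b: uniformly sample 5,000 random (x, y) coordinate locations.
    rng = np.random.default_rng(0)
    xs = rng.integers(0, w, size=5000)
    ys = rng.integers(0, h, size=5000)
    X_train = np.column_stack([xs, ys])

    # 2c/2d: sample the (already rescaled) brightness at those locations.
    y_train = img[ys, xs]

    # Fit the regressor f : R^2 -> R.
    forest = RandomForestRegressor(n_estimators=10, max_depth=7, random_state=0)
    forest.fit(X_train, y_train)

    # 2f: feed every pixel coordinate through the forest and reassemble the image.
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    X_all = np.column_stack([grid_x.ravel(), grid_y.ravel()])
    Y_pred = forest.predict(X_all).reshape(h, w)

    plt.imshow(Y_pred, cmap='gray')
    plt.show()

The same skeleton covers the experiments in 2g by varying max_depth and n_estimators.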
2g. Experimentation.
(a) Repeat the experiment for a random forest containing a single decision tree, but with depths
1, 2, 3, 5, 10, and 15. How does depth impact the result? Describe in detail why.
(b) Repeat the experiment for a random forest of depth 7, but with number of trees equal to 1, 3,
5, 10, and 100. How does the number of trees impact the result? Describe in detail why.
(c) As a simple baseline, repeat the experiment using a k-NN regressor, for k = 1. This means
that every pixel in the output will equal the nearest pixel from the "training set." Compare
and contrast the output: why does this look the way it does? (A minimal k-NN baseline sketch follows this list.)
(d) Experiment with different pruning strategies of your choice.
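For part (c), the baseline can reuse the same training data. A minimal sketch with scikit-learn's KNeighborsRegressor, continuing the variable names (X_train, y_train, X_all, h, w) from the earlier pipeline sketch:

    import matplotlib.pyplot as plt
    from sklearn.neighbors import KNeighborsRegressor

    # k-NN baseline: every output pixel copies its nearest training sample.
    knn = KNeighborsRegressor(n_neighbors=1)
    knn.fit(X_train, y_train)
    Y_knn = knn.predict(X_all).reshape(h, w)

    plt.imshow(Y_knn, cmap='gray')
    plt.show()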
2h. Analysis.
(a) What is the decision rule at each split point? Write down the 1-line formula for the split point
at the root node for one of the trained decision trees inside the forest. Feel free to define any
variables you need.
(b) Why does the resulting image look the way it does? What shape are the patches of color,
and how are they arranged?
(c) Easy: How many patches of color may be in the resulting image if the forest contains a single
decision tree? Define any variables you need.
(d) Tricky: How many patches of color might be in the resulting image if the forest contains n
decision trees? Define any variables you need.