$30
685.621 Algorithms for Data Science
Homework 2
Assigned at the start of Module 3
Due at the end of Module 4
Total Points 100/100
Collaboration groups have been set up in Blackboard. Make sure your group starts an individual
thread for each collaborative problem and subproblem. You are required to participate in each of the
collaborative problem and subproblem. Do not directly post a complete solution, the goal is for the
group develop a solution after everyone has participated.
Problems for Grading
1. Problem 1
20 Points Total
CLRS 34.3-2: Show that the ≤P relation is a transitive relation on languages. That is, show
that if L1 ≤P L2 and L2 ≤P L3, then L1 ≤P L3.
2. Problem 2
20 Points Total
Recall the definition of a complete graph Kn is a graph with n vertices such that every vertex
is connected to every other vertex. Recall also that a clique is a complete subset of some graph.
The graph coloring problem consists of assigning a color to each of the vertices of a graph such
that adjacent vertices have different colors and the total number of colors used is minimized. We
define the chromatic number of a graph G to be this minimum number of colors required to color
graph G. Prove that the chromatic number of a graph G is no less than the size of the maximal
clique of G.
3. Problem 3 Note this is a Collaborative Problem
30 Points Total
Suppose you’re helping to organize a summer sports camp, and the following problem comes up.
The camp is supposed to have at least one counselor who’s skilled at each of the n sports covered by the camp (baseball, volleyball, and so on). They have received job applications from m
potential counselors. For each of the n sports, there is some subset of the m applicants qualified
in that sport. The question is “For a given number k < m, is is possible to hire at most k of the
counselors and have at least one counselor qualified in each of the n-sports?” We’ll call this the
Efficient Recruiting Problem. Prove that Efficient Recruiting is NP-complete.
4. Problem 4 Note this is a Collaborative Problem
30 Points Total
We start by defining the Independent Set Problem (IS). Given a graph G = (V, E), we say a set
of nodes S ⊆ V is independent if no two nodes in S are joined by an edge. The Independent
Set Problem, which we denote IS, is the following. Given G, find an independent set that is as
large as possible. Stated as a decision problem, IS answers the question: “Does there exist a set
S ⊆ V such that |S| ≥ k?” Then set k as large as possible. For this problem, you may take as
given that IS is NP-complete.
1
Table 1: Customer Tracking Table
Customer Detergent Beer Diapers Cat Litter
Raj 0 6 0 3
Alanis 2 3 0 0
Chelsea 0 0 0 7
A store trying to analyze the behavior of its customers will often maintain a table A where the
rows of the table correspond to the customers and the columns (or fields) correspond to products
the store sells. The entry A[i, j] specifies the quantity of product j that has been purchased by
customer i. For example, Table 1 shows one such table.
One thing that a store might want to do with this data is the following. Let’s say that a subset
S of the customers is diverse if no two of the customers in S have ever bought the same product
(i.e., for each product, at most one of the customers in S has ever bought it). A diverse set of
customers can be useful, for example, as a target pool for market research.
We can now define the Diverse Subset Problem (DS) as follows: Given an m × n array A as
defined above and a number k ≤ m, is there a subset of at least k customers that is diverse?
Prove that DS is NP-complete.
2