$30
DSCI 599 Homework 4
1. [10 points] What is a Bayesian Network (BN)? What are its advantages over a tabular
representation of the joint distribution of the system variables? Explain both the
representational and the computational advantages. What is a Dynamic BN (DBN)?
Describe Markov Models (MMs) and Hidden Markov Models (HMMs). Pick a problem of
interest in Data Science that can be modeled using HMMs. Describe the problem and
the application of HMM techniques on it.
2. [10 points] What is a Weighted Constraint Satisfaction Problem (WCSP)? Pick two
problems of interest in Graph Theory and explain how they can be modeled as WCSPs.
Explain how the MAP query on a BN is equivalent to a WCSP. How can a WCSP on
Boolean variables be translated to the Minimum Weighted Vertex Cover (MWVC)
problem? In what cases can the substrate MWVC problem be solved efficiently?
3. [10 points] What is the Variable Interaction (VI) graph associated with a combinatorial
problem? Explain how the Variable Elimination (VE) algorithm uses the VI graph to solve
the combinatorial problem via Dynamic Programming. In this context, explain the notion
of the treewidth of the VI graph. Is computing the treewidth NP-hard? In what cases is it
easy?
4. [10 points] Describe the architecture of a Feed-Forward Neural Network (NN). Explain
the Backpropagation (BP) algorithm used to train it. What are the key algorithmic
principles used in BP? Pick a problem of interest in Data Science that can be solved
efficiently using a Feed-Forward NN and BP. Describe the problem and the application of
these techniques on it.
5. [10 points] What is Polynomial Identity Checking (PIC)? Describe the randomized
algorithm standardly used for PIC. Pick a problem of interest in Data Science that can be
solved efficiently using PIC. Describe the problem and the application of PIC on it. What
are heavy-tailed distributions and where do they occur? Describe a situation where the
runtime behavior of an algorithm on a combinatorial problem can exhibit a heavy-tailed
distribution. In such a situation, what can be done to boost the performance of the
algorithm?