Starting from:

$30

COSI 126A: Homework 3

COSI 126A: Homework 3

Section I: Association Problems (50 points)
Problem 1 (10 points)
Transaction ID Items Bought
1 {Milk, Beer, Diapers}
2 {Bread, Butter, Milk}
3 {Milk, Diapers, Cookies}
4 {Bread, Butter, Cookies}
5 {Beer, Cookies, Diapers}
6 {Milk, Diapers, Bread, Butter}
7 {Bread, Butter, Diapers}
8 {Beer, Diapers}
9 {Milk, Diapers, Bread, Butter}
10 {Beer, Cookies}
Consider the market basket transactions shown above.
(a) What is the maximum number of association rules that can be extracted from this
data (including rules that have zero support)?
(b) What is the maximum size of frequent itemsets that can be extracted (assuming
minsup > 0)?
(c) Write an expression for the maximum number of size-3 itemsets that can be derived
from this data set.
(d) Find an itemset (of size 2 or larger) that has the largest support.
1
(e) Find a pair of items, a and b, such that the rules {a} −→ {b} and {b} −→ {a} have
the same confidence.
Problem 2 (10 points)
Consider the following set of frequent 3-itemsets:
{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5}, {3, 4, 5}.
Assume that there are only five items in the data set.
(a) List all candidate 4-itemsets obtained by a candidate generation procedure using the
Fk−1 × F1 merging strategy.
(b) List all candidate 4-itemsets obtained by the candidate generation procedure in Apriori.
(c) List all candidate 4-itemsets that survive the candidate pruning step of the Apriori
algorithm.
Problem 3 (10 points)
The original association rule mining formulation uses the support and confidence measures
to prune uninteresting rules.
(a) Draw a contingency table for each of the following rules using the transactions shown
in the table below.
Transaction ID Items Bought
1 {a, b, c, e}
2 {b, c, d}
3 {a, b, d, e}
4 {a, c, d, e}
5 {b, c, d, e}
6 {b, d, e}
7 {d, e}
8 {a, b, c}
9 {a, d, e}
10 {b, d}
Rules: {b} −→ {c}, {a} −→ {d}, {b} −→ {d}, {e} −→ {c}, {c} −→ {a}.
(b) Use the contingency tables in part (a) to compute and rank the rules in decreasing
order according to the following measures.
(a) Support.
(b) Confidence.
2
(c) Interest(X −→ Y ) = P(X,Y )
P(X)
P(Y ).
(d) IS(X −→ Y ) = √
P(X,Y )
P(X)P(Y )
.
(e) Klosgen(X −→ Y ) = p
P(X, Y )×(P(Y | X)−P(Y )), where P(Y | X) = P(X,Y )
P(X)
.
(f) Odds ratio(X −→ Y ) = P(X,Y )P(X,Y )
P(X,Y )P(X,Y )
.
Problem 4 (10 points)
Given the rankings you had obtained in Exercise 12, compute the correlation between the
rankings of confidence and the other five measures. Which measure is most highly correlated
with confidence? Which measure is least correlated with confidence?
Problem 5 (10 points)
Suppose we have market basket data consisting of 100 transactions and 20 items. If the
support for item a is 22%, the support for item b is 91% and the support for itemset {a, b}
is 17%. Let the support and confidence thresholds be 10% and 60%, respectively.
(a) Compute the confidence of the association rule {a} −→ {b}. Is the rule interesting
according to the confidence measure?
(b) Compute the interest measure for the association pattern {a, b}. Describe the nature
of the relationship between item a and item b in terms of the interest measure.
(c) What conclusions can you draw from the results of parts (a) and (b)?
3

More products