$30
Illinois Institute of Technology - CS528 Data Privacy and Security
Homework 2
Name:
CWID:
Notice that:
• Penalty will apply for late submissions (per our syllabus).
• If you would like to let us know how to run your program, please feel free to include
it in the report (but this is optional). Thank you.
1. Laplace Mechanism (15 points)
Using the same dataset as Homework 1: https://archive.ics.uci.edu/ml/datasets/Adult.
Query the average age of the records with age greater than 25. Inject Laplacian noise to
the query result (average age) to ensure 0.5-differential privacy and 1-differential privacy.
Tasks:
(a) Derive the global sensitivity for the query (average age). (5 points)
(b) Calculate the variance for the Laplace noise and inject noise into the average age query
result with 0.5-differential privacy and 1-differential privacy, respectively. (10 points)
Submission Part I: (1) a report including the screenshots of the major steps of each task
and your answers (named as hw2-report.pdf ), and (2) source code files – all named with the
prefix “hw2-1-” (e.g., hw2-1-laplace.java).
2. Exponential Mechanism (15 points)
Using the same dataset as Homework 1: https://archive.ics.uci.edu/ml/datasets/Adult.
Query the most frequent “Education” result. Design an exponential mechanism (randomized) to ensure -differential privacy for the query. Repeat all the procedures for Exponential
mechanism ( = 0.5 and = 1):
Tasks:
(a) Derive the global sensitivity for the query (most frequent “Education”). (5 points)
(b) Compute the probabilities and generate the noisy output result with 0.5-differential
privacy and 1-differential privacy, respectively. (10 points)
Submission Part II: (1) include the screenshots of the major steps of each task and your
answers in the same report as above hw2-report.pdf, and (2) source code files – all named
with the prefix “hw2-2-” (e.g., hw2-2-exponential.java).
1
Illinois Institute of Technology - CS528 Data Privacy and Security
3. Differentially Private Classification (35 points)
Considering the application of classifying the “iris plant” using the following dataset:
• The full dataset and description are available at:
https://archive.ics.uci.edu/ml/datasets/Iris.
• Four attributes and three classes (using iris.data).
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class: Iris Setosa, Iris Versicolour, and Iris Virginica
• It is a small-scale dataset (150 records). Consider records (1-10, 51-60, 101-110) as
the testing data for prediction, and the remaining 120 records as the training data.
Tasks:
(a) Build a Naive Bayes Classifier to predict the classes for records 1-10, 51-60, and 101-
110 (both training and prediction). See more information about Naive Bayes Classifier:
https://en.wikipedia.org/wiki/Naive Bayes classifier. (5 points)
(b) Design and implement a differentially private algorithm (satisfying -differential privacy) to train the Naive Bayes Classifier for prediction. (15 points)
Hints: you can consider the Laplace Mechanism and allocate budgets to ensure -
differential privacy for the algorithm. The algorithm consists of many queries, and you
can think about sequential composition and parallel composition for the queries.
(c) Prove -differential privacy for your designed algorithm. (5 points)
(d) Set = 0.5, 1, 2, 4, 8, and 16. Then, calculate the precision and recall of the prediction
results (of records 1-10, 51-60, and 101-110) generated by the differentially private
classifier. Note that the true results of the 30 records are given in the original dataset
for benchmarking. (10 points)
Submission Part III: (1) include the proof for differential privacy, screenshots of the
procedures and results in the same report as above hw2-report.pdf, and (2) source code files –
all named with the prefix “hw2-3-” (e.g., hw2-3-classifier.java, and hw2-3-dpclassifier.java).
4. Local Differential Privacy (35 points)
Using the same dataset as Homework 1: https://archive.ics.uci.edu/ml/datasets/Adult.
The server tries to learn the distribution of the users’ ages (each record is privately held by
one user). However, each user doesn’t trust the server and locally perturbs its age at the
client side with the privacy bound .
Tasks:
2
Illinois Institute of Technology - CS528 Data Privacy and Security
(a) Implement the following two LDP protocols: Unary Coding, and Generalized Random
Response for LDP. Note that each LDP protocol includes both private data collection
via local randomization and the frequency estimation performed on the noisy data (by
the data aggregator/server). It is unnecessary to implement the communication protocol
for 48k+ users/clients and server (simulating the local randomization algorithm for all
the users and the frequency estimation for the server would be fine). (15 points)
(b) Compare these protocols’ accuracy with different and discuss your findings. The
parameter varies from 1 to 10 with a step of 1. You can measure the relative error
in age frequency estimation by calculating the L1-distance between the actual and
estimated frequencies. You can plot the results with x axis and y axis the L1-distance.
(10 points)
(c) Compare these protocols’ accuracy with different numbers of users n and discuss your
findings with a fixed = 2. The number of users n varies from 10% to 100% of all the
users with a step of 10%. You can measure the relative error in age frequency estimation
by calculating the L1-distance between the actual and estimated frequencies. You can
plot the results with x axis n and y axis the L1-distance. (10 points)
Submission Part IV: (1) include the screenshots of the procedures and results in the
same report as above hw2-report.pdf (you can explain your findings with tables or figures),
and (2) source code files – all named with the prefix “hw2-4-” (e.g., hw2-4-unary.java).
3