Assignment 3 - Tools for clustering


The main goal of this assignment is for you to gain hands-on experience in using tools for
clustering and in applying them to real problems. For this assignment, use your favorite
software package or programming language. Note that you do not have to implement the clustering
algorithms yourself. Implementations of these algorithms are available in Matlab, Octave, R,
Python, C/C++, Java, Weka, Scikit and others – you can use any of these implementations.
You will work with the 4 datasets given in Resources (clusterincluster, halfkernel, twogaussians
and twospirals).
1. For each of the 4 datasets do the following:
a. Remove the class labels.
b. Run k-means (with Euclidean distance) and EM, where k = 2 for both algorithms.
c. Add the original labels back to the clustered points and plot them as follows: points
assigned to the same new cluster should have the same marker shape; points from the same
original class should have the same color.
2. Pick a clustering validity index discussed in class and use it to find the best value of k
for each dataset, for both k-means and EM.
3. Comment on the results obtained for all datasets and clustering algorithms. How good is a
particular clustering algorithm on a particular dataset? Why?
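
As an illustration of steps 1 and 2, here is a minimal sketch in Python, assuming scikit-learn is
the chosen tool. Several things in it are assumptions rather than part of the assignment: the
data is a synthetic stand-in generated with make_blobs (not one of the 4 datasets in Resources),
EM is taken to mean fitting a Gaussian mixture model, the silhouette coefficient stands in for
whichever validity index was discussed in class, and the helper names cluster, best_k and
plot_clusters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Stand-in data (assumption): two Gaussian blobs, not one of the course datasets.
# y_true plays the role of the original class labels; it is never passed to the
# clustering algorithms, which corresponds to "remove the class labels" (step 1a).
X, y_true = make_blobs(n_samples=300, centers=[[0, 0], [6, 6]],
                       cluster_std=1.0, random_state=0)


def cluster(X, k, seed=0):
    """Return (k-means labels, EM labels) for k clusters (step 1b)."""
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    # EM is run here by fitting a Gaussian mixture model (assumption).
    em_labels = GaussianMixture(n_components=k, random_state=seed).fit_predict(X)
    return km_labels, em_labels


def best_k(X, k_values=range(2, 7), seed=0):
    """Score each k with the silhouette coefficient and pick the maximum (step 2)."""
    scores = {"kmeans": {}, "em": {}}
    for k in k_values:
        km_labels, em_labels = cluster(X, k, seed)
        scores["kmeans"][k] = silhouette_score(X, km_labels)
        scores["em"][k] = silhouette_score(X, em_labels)
    best = {name: max(s, key=s.get) for name, s in scores.items()}
    return best, scores


def plot_clusters(X, y_true, y_pred, title):
    """Plot with marker shape = new cluster and color = original class (step 1c)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    markers = ["o", "^", "s", "x", "D", "*"]
    colors = ["tab:blue", "tab:red", "tab:green", "tab:orange"]
    fig, ax = plt.subplots()
    for c in np.unique(y_pred):
        for t in np.unique(y_true):
            mask = (y_pred == c) & (y_true == t)
            ax.scatter(X[mask, 0], X[mask, 1],
                       marker=markers[c % len(markers)],
                       color=colors[t % len(colors)],
                       label=f"cluster {c}, class {t}")
    ax.set_title(title)
    ax.legend()
    return fig


km2, em2 = cluster(X, 2)   # k = 2 for both algorithms
best, scores = best_k(X)   # best k per algorithm under the silhouette index
```

Calling plot_clusters(X, y_true, km2, "k-means, k = 2") would then produce the kind of plot
described in step 1c. For the actual datasets, X and y_true would be loaded from the files in
Resources instead of being generated.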
Submit:
1) A report (in PDF or Word) that includes all of the following required items:
a) In one paragraph, explain how you used the tools to perform the clustering.
b) All plots resulting from the clusterings.
c) Comments and comparison for each clustering scheme, and for each dataset.
2) One or two screenshots that show how you ran the algorithms in Weka, Scikit or the tool you
used.
Note: Marks will be deducted if explanations of your implementation are missing!
Upload a single Zip file that includes all the files to the Blackboard system no later than the
date/time specified as the deadline. Late submissions will be penalized by 10% per day for up
to 3 days – after the 3rd day, the mark will be zero.
