685.621 Algorithms for Data Science
Homework 3
Assigned at the start of Module 5
Due at the end of Module 6
Total Points 100/100
Collaboration groups have been set up in Blackboard. Make sure your group starts an individual
thread for each collaborative problem and subproblem. You are required to participate in each
collaborative problem and subproblem. Do not directly post a complete solution; the goal is for the
group to develop a solution after everyone has participated.
Problems for Grading
1. Problem 1
20 Points Total
In this problem, develop code to analyze the Iris data set using the test statistics listed in Table 1.
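Table 1: Data Analysis Statistics
Test Statistic        Statistical Function F(·)
Minimum               $F_{\min}(x) = \min(x) = x_{\min}$
Maximum               $F_{\max}(x) = \max(x) = x_{\max}$
Mean                  $F_{\mu}(x) = \mu(x) = \frac{1}{n} \sum_{i=1}^{n} x_i$
Trimmed Mean          $F_{\mu_t}(x) = \mu_t(x) = \frac{1}{n-2p} \sum_{i=p+1}^{n-p} x_i$
Standard Deviation    $F_{\sigma}(x) = \sigma(x) = \left( \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \mu(x) \right)^2 \right)^{1/2}$
Skewness              $F_{\gamma}(x) = \gamma(x) = \frac{\sum_{i=1}^{n} \left( x_i - \mu(x) \right)^3}{\sigma(x)^3}$
Kurtosis              $F_{\kappa}(x) = \kappa(x) = \frac{\sum_{i=1}^{n} \left( x_i - \mu(x) \right)^4}{\sigma(x)^4}$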
The analysis should be done by feature followed by class of flower type. This analysis should
provide insight into the Iris data set.
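As a starting point, the by-feature, by-class analysis could be organized roughly as in the sketch below. It is an illustration only, not a required structure. The function names, the `rows` layout, and the toy values are assumptions made for this example; the values are not real Iris measurements. Skewness and kurtosis follow Table 1 as written, so the sums are not divided by n (many textbook definitions include a 1/n factor). The trimmed mean is left out to keep the sketch short.

```python
import math
from collections import defaultdict


def mean(x):
    return sum(x) / len(x)


def std(x):
    # Population standard deviation (divides by n), as in Table 1.
    mu = mean(x)
    return math.sqrt(sum((xi - mu) ** 2 for xi in x) / len(x))


def skewness(x):
    # Table 1 as written: sum of cubed deviations over sigma^3 (no 1/n factor).
    # Raises ZeroDivisionError if every value in x is identical (sigma = 0).
    mu, sigma = mean(x), std(x)
    return sum((xi - mu) ** 3 for xi in x) / sigma ** 3


def kurtosis(x):
    # Table 1 as written: sum of fourth-power deviations over sigma^4.
    mu, sigma = mean(x), std(x)
    return sum((xi - mu) ** 4 for xi in x) / sigma ** 4


def summarize(x):
    return {
        "min": min(x),
        "max": max(x),
        "mean": mean(x),
        "std": std(x),
        "skewness": skewness(x),
        "kurtosis": kurtosis(x),
    }


# Hypothetical toy data: (feature values, class label). NOT real Iris values.
feature_names = ("f1", "f2")
rows = [
    ((1.0, 2.0), "a"),
    ((2.0, 4.0), "a"),
    ((3.0, 9.0), "a"),
    ((4.0, 1.0), "b"),
    ((6.0, 3.0), "b"),
]

# Group the observations by feature first, then by class label.
grouped = defaultdict(lambda: defaultdict(list))
for values, label in rows:
    for name, value in zip(feature_names, values):
        grouped[name][label].append(value)

# Compute every statistic for each (feature, class) pair.
results = {
    name: {label: summarize(x) for label, x in by_class.items()}
    for name, by_class in grouped.items()
}

for name, by_class in results.items():
    for label, stats in by_class.items():
        print(name, label, stats)
```

With the real data set, `rows` would be replaced by the 150 Iris observations and `feature_names` by the four measured features; the grouping and summary steps stay the same.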
Note: The trimmed mean is a variation of the mean that is calculated by removing values from the
beginning and end of a sorted set of data. The average is then taken using the remaining values.
This allows any potential outliers to be removed when calculating the statistics of the data.
Assuming the data in $x_s = [x_{1,s}, x_{2,s}, \cdots, x_{n,s}]$ is sorted, removing $p$ values from
each end gives $x_{s,p} = [x_{1+p,s}, x_{2+p,s}, \cdots, x_{n-p,s}]$. The trimmed mean therefore
prevents extreme values from influencing the mean of the data.
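The trimming step described in the note can be sketched as follows. This is a minimal illustration that assumes p counts the values dropped from each end of the sorted data, as in Table 1. The function name `trimmed_mean`, the validity check on p, and the sample values are assumptions made for this example.

```python
def trimmed_mean(x, p):
    """Mean of x after dropping the p smallest and p largest values."""
    n = len(x)
    if p < 0 or 2 * p >= n:
        raise ValueError("p must satisfy 0 <= p < n/2")
    xs = sorted(x)          # x_s: the data in ascending order
    kept = xs[p:n - p]      # x_{s,p}: elements p+1 .. n-p (1-based)
    return sum(kept) / (n - 2 * p)


# Made-up values with one obvious outlier (100.0).
data = [2.0, 100.0, 3.0, 1.0, 4.0]

print(sum(data) / len(data))   # plain mean: 22.0
print(trimmed_mean(data, 1))   # trimmed mean with p = 1: 3.0
```

With p = 1 the sorted data [1, 2, 3, 4, 100] is reduced to [2, 3, 4], so the single outlier no longer pulls the average upward. Setting p = 0 recovers the ordinary mean.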