ASSIGNMENT 1
Q1: GOAL: SIMPLE PLOTTING IN D3, MARKS: 30
Write a program that generates random real numbers x and y in the range −2 < x < 2 and −2 < y < 2 until you find 1000 points that satisfy the inequality $(x^2 + y^2 - 1)^3 - x^2 \cdot y^3 < 0$. Plot these points in a scatter plot.
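A minimal sketch of the sampling loop, assuming plain rejection sampling with Math.random() as the random source: draw (x, y) uniformly from the square and keep the point only if it satisfies the inequality. The function names heartValue and sampleHeartPoints and the threshold parameter are hypothetical; threshold = 0 reproduces the inequality above, and the value −0.07 from the hint can be passed instead. Plotting is left out, since the accepted {x, y} objects would be bound to SVG circles with D3.

```javascript
// Left-hand side of the inequality: (x^2 + y^2 - 1)^3 - x^2 * y^3
function heartValue(x, y) {
  const r = x * x + y * y - 1;
  return r * r * r - x * x * y * y * y;
}

// Rejection sampling: draw uniform points in the square and keep those with
// heartValue < threshold until `count` points have been accepted.
// `rng` defaults to Math.random; a seeded generator can be passed for reproducibility.
function sampleHeartPoints(count, threshold = 0, rng = Math.random) {
  const points = [];
  while (points.length < count) {
    // Map rng() in [0, 1) to [-2, 2); x = -2 only if rng() returns exactly 0.
    const x = rng() * 4 - 2;
    const y = rng() * 4 - 2;
    if (heartValue(x, y) < threshold) {
      points.push({ x, y });
    }
  }
  return points;
}

const points = sampleHeartPoints(1000);
console.log(points.length); // 1000
```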
Submission Instructions: You must use D3 version 5+ to create the plot. Submit your code as a single .html file named ‘Q1_NSID.html’. Opening the file must display the output in the browser.
Bonus Point: Add axes to your plot so that the point (0, 0) is clearly visible as the intersection of the x- and y-axes.
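For the bonus, one way to make the axes meet at (0, 0) is to place the x-axis at the pixel row that the y-scale assigns to 0 and the y-axis at the pixel column that the x-scale assigns to 0. In D3 v5 this is usually done with d3.scaleLinear() and d3.axisBottom / d3.axisLeft, translating each axis group to y(0) or x(0). The sketch uses a hand-written linearScale as a hypothetical stand-in for d3.scaleLinear().domain(...).range(...) so the arithmetic can be checked without loading D3; the 500×500 plot size and the absence of margins are assumptions.

```javascript
// Hypothetical stand-in for d3.scaleLinear(): maps [d0, d1] linearly onto [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

const width = 500; // assumed SVG size in pixels, no margins
const height = 500;

// SVG y grows downward, so the y range is flipped: y = 2 maps to the top (0).
const xScale = linearScale([-2, 2], [0, width]);
const yScale = linearScale([-2, 2], [height, 0]);

// Pixel positions for the axes so that they cross at data point (0, 0).
const xAxisY = yScale(0); // translate the x-axis group to (0, xAxisY)
const yAxisX = xScale(0); // translate the y-axis group to (yAxisX, 0)
console.log(xAxisY, yAxisX); // 250 250
```

In a D3 version, the same value would be used as, for example, svg.append("g").attr("transform", "translate(0," + y(0) + ")").call(d3.axisBottom(x)).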
HINT
If you do it right, you will get a heart shape. You may need to increase the number of points to get a smooth shape. You can also try the inequality $(x^2 + y^2 - 1)^3 - x^2 \cdot y^3 < -0.07$, which gives a Mickey Mouse shape.
Q2: GOAL: DATA READ IN D3, MARKS: 30
Write a program that reads the data file msl.csv, normalizes the values to the range 0 to 1, and prints the first row and the first column using console.log(). For each (x, y) coordinate, draw a circle with a radius equal to 5 times the value at that coordinate.
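A minimal sketch of the normalization, logging, and radius steps. It assumes msl.csv has already been loaded and parsed into a numeric matrix grid (for example with d3.text followed by d3.csvParseRows and a numeric conversion), with grid[y][x] holding the value at coordinate (x, y). It also assumes min-max normalization over all cells, v' = (v − min) / (max − min), which the assignment does not prescribe, and it computes each radius from the normalized value. The 3×3 grid and the name normalizeGrid are hypothetical.

```javascript
// Min-max normalize every cell of a 2-D numeric array into [0, 1].
function normalizeGrid(grid) {
  const values = grid.flat();
  // reduce avoids the argument-count limit of Math.min(...values) on large files.
  const min = values.reduce((a, b) => Math.min(a, b), Infinity);
  const max = values.reduce((a, b) => Math.max(a, b), -Infinity);
  const span = max - min;
  // If all values are equal, span is 0; map everything to 0 instead of dividing by zero.
  return grid.map((row) => row.map((v) => (span === 0 ? 0 : (v - min) / span)));
}

// Hypothetical 3x3 example standing in for the parsed contents of msl.csv.
const grid = [
  [2, 4, 6],
  [8, 10, 12],
  [14, 16, 18],
];
const normalized = normalizeGrid(grid);

console.log(normalized[0]); // first row
console.log(normalized.map((row) => row[0])); // first column

// One circle per (x, y) cell, with radius = 5 * normalized value at that cell.
const circles = normalized.flatMap((row, y) =>
  row.map((value, x) => ({ x, y, r: 5 * value }))
);
```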
Submission Instructions: You must use D3 version 5+ to create the plot. Submit your code as a single .html file named ‘Q2_NSID.html’. Opening the file must display the output in the browser.
Do not log any unnecessary values to the console.
Q3: GOAL: COMBINE STATISTICS IN D3, MARKS: 30
This is a continuation of Q2. After you have normalized the data, find the 25th, 50th, and 75th percentile values and log them to the console. Color the circles of your Q2 plot white, green, blue, or red according to whether their radii fall within [0%–25%], [25%–50%], [50%–75%], or [75%–100%], respectively.
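A sketch of the percentile and coloring step. It assumes the percentiles are taken over all normalized values and computed by linear interpolation between neighboring sorted values, which is the method d3.quantile applies to an ascending-sorted array in D3 v5. It also assumes that a value equal to a percentile boundary belongs to the lower bin, which the assignment leaves open. Because every radius is 5 times its value, binning by value gives the same bins as binning by radius. The names quantileSorted and binColor and the sample values are hypothetical.

```javascript
// Percentile of an ascending-sorted array, interpolating linearly
// between the two nearest order statistics (same rule as d3.quantile).
function quantileSorted(sorted, p) {
  const n = sorted.length;
  if (n === 0) return undefined;
  const i = (n - 1) * p;
  const i0 = Math.floor(i);
  const i1 = Math.min(i0 + 1, n - 1);
  return sorted[i0] + (sorted[i1] - sorted[i0]) * (i - i0);
}

// Map a value to its color bin; a value equal to a boundary goes to the lower bin (assumption).
function binColor(value, [q25, q50, q75]) {
  if (value <= q25) return "white";
  if (value <= q50) return "green";
  if (value <= q75) return "blue";
  return "red";
}

// Hypothetical normalized values standing in for the flattened msl.csv grid.
const values = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8];
const sorted = [...values].sort((a, b) => a - b); // numeric sort, not the default string sort
const quartiles = [0.25, 0.5, 0.75].map((p) => quantileSorted(sorted, p));
console.log(quartiles); // 25th, 50th and 75th percentiles

const colors = values.map((v) => binColor(v, quartiles));
```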
Submission Instructions: You must use D3 version 5+ to create the plot. Submit your code as a single .html file named ‘Q3_NSID.html’. Opening the file must display the output in the browser.
Do not log any unnecessary values to the console.
Q4: GOAL: SELF EXPLORATION - DATA CLEANING, MARKS: 10
Real-life data is often messy: it may contain missing entries and even unexpected errors, so we need to clean it before we visualize it. You are given a dataset in .csv format (extract it from the provided zip file).
Submission Deadline: See Moodle
Dataset: https://www.dropbox.com/s/0f8btqs4ht44wg5/birds.zip?dl=0
The file contains 4670 columns. Columns 20 to 4670 are labeled with the scientific names of birds.
The rows look like this (the middle columns are omitted):
SAMPLING_EVENT_ID  LOC_ID   LATITUDE    ...  Zosterops_samoensis
S34654             L645646  40.6514561  ...  1
S78675             L786445  ?           ...  0
S39485             ?        20.6514561  ...  0
S04837             ?        20.6514561  ...  X
S68372             L786445  X           ...  1
Note that the data may contain errors. A 1 in a bird’s column denotes a sighting of that bird; for example, the table above records two sightings of Zosterops_samoensis.
TASKS
Your task in this question is to write a program that creates a new csv file, sample.csv. To build sample.csv, take exactly 10 consecutive rows starting from row $10000 \cdot i$ for each $i = 0, 1, 2, \ldots$; that is, rows $10000i$ through $10000i + 9$.
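The row-selection rule reduces to a modulus test. This sketch assumes data rows are numbered from 0 with the header line not counted; under that convention row r is kept exactly when r mod 10000 < 10. It is written in JavaScript purely for illustration; the submission must use one of the languages listed under INSTRUCTIONS. The names isSampledRow, BLOCK_STRIDE and ROWS_PER_BLOCK are hypothetical.

```javascript
// Keep data row r (0-based, header excluded) if it lies in one of the blocks
// [10000*i, 10000*i + 9] for i = 0, 1, 2, ...
const BLOCK_STRIDE = 10000;
const ROWS_PER_BLOCK = 10;

function isSampledRow(r) {
  return r % BLOCK_STRIDE < ROWS_PER_BLOCK;
}

// Example: indices of the sampled rows among the first 20005 data rows.
const sampled = [];
for (let r = 0; r < 20005; r++) {
  if (isSampledRow(r)) sampled.push(r);
}
console.log(sampled.length); // 25
```

If the file ends partway through a block (as in the 20005-row example, which yields only rows 20000 to 20004 from the last block), that final block contains fewer than 10 rows.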
While creating the output, ignore the columns that do not represent birds. You must replace every erroneous entry in the bird columns with 0 (any non-integer value is an error).
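A sketch of the per-row cleaning, assuming each line has already been split into an array of string fields (a plain split on commas is only safe if no field contains a quoted comma). Columns 20 to 4670 in 1-based numbering are 0-based indices 19 to 4669, so slicing from index 19 keeps exactly the bird columns of a 4670-field row. A field is treated as valid only if it is an optional minus sign followed by digits; that reading of "non-integer", which also rejects values such as 1.0 or padded whitespace, is an assumption. It is written in JavaScript purely for illustration; the submission must use one of the languages listed under INSTRUCTIONS. The names are hypothetical.

```javascript
// 1-based columns 20..4670 hold bird names, i.e. 0-based indices 19..4669.
const FIRST_BIRD_INDEX = 19;

// Valid only if the whole field is an optional minus sign followed by digits.
function isIntegerString(s) {
  return /^-?\d+$/.test(s);
}

// Keep only the bird columns of one data row and replace every non-integer entry with "0".
function cleanBirdFields(fields, firstBirdIndex = FIRST_BIRD_INDEX) {
  return fields.slice(firstBirdIndex).map((s) => (isIntegerString(s) ? s : "0"));
}

// Rows from the example table, reduced to three non-bird columns plus one bird column,
// so here the bird column starts at index 3 instead of 19.
const rows = [
  ["S34654", "L645646", "40.6514561", "1"],
  ["S78675", "L786445", "?", "0"],
  ["S39485", "?", "20.6514561", "0"],
  ["S04837", "?", "20.6514561", "X"],
  ["S68372", "L786445", "X", "1"],
];
const cleaned = rows.map((r) => cleanBirdFields(r, 3));
const sightings = cleaned.filter((r) => r[0] === "1").length;
console.log(cleaned.map((r) => r.join(",")).join("\n"));
console.log(sightings); // 2
```

The header row would be sliced the same way but not cleaned, since bird names are not integers and would otherwise all become 0.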
INSTRUCTIONS
1. Unzip the birds.zip file.
2. You may use any programming language of your choice (Python, Java, C++, or C) and commonly used libraries to complete the assignment.
SUBMISSION
Do not use any software (e.g., Excel or SPSS) to read or clean the data. Write your own code, and submit your code and the two sample files on Moodle. If you were not able to complete the task, add a text file that describes your efforts and the challenges you faced while working with this real-life data.