$35
EECS 484 Project 4
Import and Query MongoDB Database
Overview
In this project, we will be using the same dataset as in Project 2 FakeBook and explore the
capability of MongoDB (NOSQL). You will first export FakeBook Database to JSON format using
JDBC, and then implement nine MongoDB queries in JavaScript. This spec will give you an
introduction to MongoDB syntax.
1. The Environment
You have to work in CAEN environment for Part A: Export Oracle Database to JSON.
Because our Oracle Database server is only accessible from the University of Michigan network,
you need to either be on-campus, or connected to UM VPN whenever you run the program.
For Part B: MongoDB Queries, we have set up MongoDB account in CAEN for every student
registered in this class. To login, refer to section 4 instruction.
If you choose to install MongoDB on your personal computer, refer to the MongoDB installation
document for instructions. You can connect to eecs484.eecs.umich.edu server from your
PC only when you are on-campus or connected to UM VPN. Make sure to test your code on
CAEN environment before submission.
2. Files Provided to You
On Canvas you will find startfile.tar.gz containing files provided to you for project4.
To complete Part A: Export Oracle database to JSON, start with the 2 Java files: Main.java
and GetData.java. We have also provided 3 jar packages: ojdbc6.jar,
json_simple-1.1.jar, json-20151123.jar. Put all the above files in the same folder
with Makefile.
To complete Part B: MongoDB Queries, set up your MongoDB database using Makefile.
Implement MongoDB query in 8 JavaScript files query1.js to query8.js. The file test.js
can be used to check partial correctness of your query results.
1) Main.java
This file provides the main function for running Part A. You can use it to run your program, but
you don’t need to turn it in.
Please only modify the oracleUserName and password static variables, replacing them with
your own Oracle username and password.
2) GetData.java
This file contains the function you need to implement for Part A. Query USERS, FRIENDS,
CITIES tables to export data from Oracle Database to JSON format. An output file named
output.json should be generated in the folder where your Java files are. Your
output.json is expected to contain the same data, but can be in entirely different order from
sample.json. An introduction to JSON format is covered in the spec for Part A.
3) Makefile
To compile, execute make in the terminal. Or you can use the following command:
javac -Xlint:-unchecked -cp
"ojdbc6.jar:json-20151123.jar:json_simple-1.1.jar:" Main.java
GetData.java
To run, execute make run in the terminal.Or you can use the following command:
java -cp "ojdbc6.jar:json-20151123.jar:json_simple-1.1.jar:" Main
To setup or clear MongoDB database for Part B, please substitute <uniquename and
<password with your MongoDB account information (NOT UM account) in Makefile. The
default MongoDB password is your uniquename. Please login and change your password
following Section 4 Part B instructions.
To prepare tar.gz file for submission, execute make submit via the terminal.
4) sample.json
This file contains the JSON data from running our official implementation of GetData. Please DO
NOT diff output.json sample.json because JSON arrays are likely to come in entirely
different order between any two runs.
Part A and Part B in this project do not have dependencies on each other. You may setup your
database for Part B using sample.json to test MongoDB queries. The AutoGrader testing on
Part B does not rely on correct results from Part A.
5) test.js
In Part B, you will implement 8 queries in the given JavaScript files. The file test.js contains
one simple test on each of the query. You may use it to check partial correctness of your
implementation. Notice an output saying “test1 correct!” does not assure your query1 will score
full on the AutoGrader. Use Make mongoquerytest to feed the test file into MongoDB, or put
the following command into a CAEN terminal (use your mongodb account and password):
mongo <uniqname -u <uniqname -p <password --host
eecs484.eecs.umich.edu < test.js
3. Part A: Export Oracle database to JSON
1) Introduction to JSON
JSON (JavaScript Object Notation) is a key-value representation, in which values can also be
JSON objects. Different from std::map in C++, the values does not have to be consistent in
terms of data types. Here is an example of JSON object (initialized in JavaScript):
var student1 = {"Name": "John Doe", "Age": 21, "Major": ["CS", "Math"]}
Name, Age and Major are the keys. Their corresponding values are string, integer and array of
strings. An example of calling certain field of value is shown below:
student1[ "Name"]; // gives John Doe
With multiple JSON objects, we can create a JSON array in JavaScript:
var students = [
{"Name": "John Doe", "Age": 21, "Major": ["CS", "Math"]},
{"Name": "Richard Roe", "Age": 22, "Major": ["CS"]},
{"Name": "Joe Public", "Age": 21, "Major": ["CE"]} ];
students [0] [ "Name"]; // gives John Doe
2) Export to JSON
From Project 2 FakeBook Database, you need to use JDBC to query USERS, FRIENDS,
CITIES and other relevant tables to export comprehensive information on each user. The results
should be a JSONArray users_info, containing 800 JSONObjects for 800 users. It is
suggested that you use multiple queries to achieve all the information. Each JSONObject should
include:
● user_id
● first_name
● last_name
● gender
● YEAR_OF_BIRTH
● MONTH_OF_BIRTH
● DATE_OF_BIRTH
● hometown (JSONObject) that contains:
o city
o state
o Country
● current_city (JSONObject) that contains:
o city
o state
o Country
● friends (JSONArray) that contains: all of the user_ids of users who are friends with the
current user, and has larger user_id than the current user
Note: It is possible that a user might have no information regarding his\her hometown or current
city in our database. In this case put an empty JSONObject({})as value under key “hometown”
or “current_city”
See the file sample.json for an example valid output.
4. Part B: MongoDB Queries
1) Introduction to MongoDB
MongoDB is a document-oriented database program. It’s comparable to SQL Oracle Databases
in many aspects. Each document in MongoDB is one JSON object, with key-value pairs of data,
just like a tuple in SQL has fields of data; each collection in MongoDB is one JSON array of
multiple JSON objects, just like a table/relation in SQL has multiple tuples. Refer to the following
table for concepts of document and collection in MongoDB, as well as queries to select certain
columns and rows.
SQL MongoDB
Tuple Document. JSON object
Relation/Table Collection. Initialized using a JSON array
SELECT * FROM users; db.users.find();
SELECT * FROM users WHERE name
= ‘John’ and age = 50;
db.users.find({name: ‘John’, age: 50});
SELECT user_id, addr FROM users
WHERE name = ‘John’;
db.users.find({name: ‘John’}, {user_id: 1,
addr: 1, _id: 0});
https://docs.mongodb.com/manual/reference/sql-comparison/ is a document comparing
MongoDB with SQL.
2) Log in to MongoDB
To perform the MongoDB queries, you will need to login to Mongo Shell. The following
instructions apply to CEAN environment. The default MongoDB password is your
uniquename.
$ module load mongodb/3.6.8
This command allows you to connect to our mongodb server hosted on
eecs484.eecs.umich.edu via mongo shell in you CAEN computer.
$ mongo <uniquename -u <uniquename -p <password --host
eecs484.eecs.umich.edu
You can update password with the following command:
db.updateUser("<uniquename", {pwd : "<newpassword"})
The new password takes effect when you logout (Ctrl+D).
3) Import JSON to MongoDB
Open the terminal in the folder where you have sample.json and/or output.json and
Makefile. Remember to load module each time you open a new terminal to perform any
MongoDB operations. Modify Makefile with your updated MongoDB account information
and put make setupsampledb in the terminal to load database from sample.json, or make
setupmydb to load database from your output.json.
Alternatively, put one of the following commands in the terminal:
$ mongoimport --host eecs484.eecs.umich.edu --username <uniquename
--password <password --collection users --db <uniquename --file
sample.json --jsonArray
$ mongoimport --host eecs484.eecs.umich.edu --username <uniquename
--password <password --collection users --db <uniquename --file
output.json --jsonArray
Please do not modify the –collection users field.
On success, you should have imported 800 user documents. Notice mongoimport command
is accumulative, meaning you will have another 800 user objects imported next time you
use mongoimport. To clear up data in the database, use make dropdatabase to drop.
There’s no need to repetitively load and drop database for each query.
4) Queries
Query 1: Townsmen
In this query, we want to find all users whose hometown city is the specified ‘city’. The result is
to be returned as a JavaScript array of user_ids, in which the order of user_ids does not matter.
You may find cursor.forEach() helpful:
https://docs.mongodb.com/v3.0/reference/method/cursor.forEach/
Query 2: flat_users
In Part A, we have created a friends array using JDBC. Each user (JSON object) has
friends (JSON array) that contains all user_ids, who are friends to the current user and has a
larger user_id. In this query, we want to restore the friendship information to pairs.
Create a collection called flat_users. Documents in the collection follow the schema:
{"user_id": xxx, "friends": xxx}
For example, if we have the following user in the users collection:
{"user_id": 100, "first_name": "John" , … "friends": [ 120, 200, 300 ]}
The query would produce 3 documents (JSON objects) and keep them in the collection
flat_users:
{"user_id": 100, "friends": 120},
{"user_id": 100, "friends": 200},
{"user_id": 100, "friends": 300},
You do not need to return anything for this query.
Hint: You may find this link on MongoDB $unwind helpful:
https://docs.mongodb.org/manual/reference/operator/aggregation/unwind/
You may use $project and $out to create the collection, or you may insert tuples into
flat_users iteratively.
Query 3: Hometown Cities collection
Similar to query 1, we want to create a collection named cities. Each document in the
collection should contain two fields: _id field holding the city name, and users field holding an
array of user_ids who live in that city.
For example, if users 10, 20 and 30 live in Bucklebury, the following document will be in the
collection cities:
{" _id": "Bucklebury", "users": [ 10, 20, 30]}
You should not return anything for this query.
Query 4: Suggest friends
Find all user_id pairs (A, B) that meet the following requirements:
i. user A is a male and user B is a female
ii. their Year_Of_Birth difference is less than year_diff, an argument passed in to the query
iii. user A and user B are not friends
iv. user A and user B are from the same city
Your query should return a JSON array of pairs; each pair is an array with two user_ids.
Hint: You may user cursor.forEach() useful.
You may use array.indexOf() in JavaScript to check for the non-friend constraints.
Query 5: Find the oldest friend
Find the oldest friend for each user who has friends. For simplicity, user only year of birth to
determine age. In case of a tie, return the friend with smallest user_id.
Notice in the users collection, each user has only information on friends whose user_id is
greater than his/hers. You will need to consider all existing friendships. The idea of Query2 and
3 may be useful.
Your query should return a JSON object: key is the user_id and the value is his/her oldest
friends user_id. The order does not matter. The schema should look like the following:
{ user_id1: user_idx, user_id2: user_idxx, …}
The number of keys should equal the number of users who have friends.
Query 6: Find average friend count
Find the average number of friends a user owns in the users collection and return a decimal
number. The average friend count on users should take into consideration those who have 0
friends. In order to make this easier, we’re treating the number of friends that a user has as
equal to the number of friends in their friend list (we aren’t counting users with lower ids, since
they aren’t in the friend list). DO NOT round the result to an integer.
Query 7: Find count of user born in each months using MapReduce
MapReduce is a powerful yet simple parallel data processing paradigm. We have set up the
MapReduce calling point in the test.js and you need to implement the mapper, reducer and
finalizer.
In this query, we are asking you to use MapReduce to find the number of users born in each
month. Hint: You need to emit a JSON object from your mapper and return a JSON object of
the exact same form from your reducer.
Note that after running test.js, running db.born_each_month.find() in Mongo Shell allows
you to bring up the collection with users born in each month. For example, if there are 200 users
born in September, the document below would be in the collection:
{" _id": 9, "value": 200}
You may find the following document helpful: https://docs.mongodb.com/v3.2/core/map-reduce/
Query 8: Find city-average friend count using MapReduce
In this query, we are asking you to use MapReduce to find average friend count per user where
the users belong to the same city. Instead of getting only one number for all users’ average
friend count, we will get have an average friend count for each hometown city.
Hint: You need to emit a JSON object from your mapper and return a JSON object of the exact
same form from your reducer. The average calculation should be performed in the finalizer.
Note that after running test.js, running db.friend_city_population.find() in Mongo
Shell allows you to bring up the collection with per city average friend count. For example, if
Bucklebury has average friend count 15.23, the document below would be in the collection:
{" _id": "Bucklebury", "value": 15.23}
5. Submission and Grading
To submit on AG, join a team first and then run make submit to generate a project4.tar.gz
Ensure that the tar.gz file contains all of the following files and no others:
1. GetData.java
2. query[1-8].js
Grading policies:
Late day policy:
Project 4 is due on Apr. 19th, 2019 at 11:55 pm EST. If you do not turn in your project by this
date, or you are unhappy with your work, you may re-submit until 4 days after the due date.
Each late day (or part thereof) on which you submit incurs a 1% deduction to your overall
course grade at the end of the semester. These deductions are cumulative for all course
projects, and the first two deductions (overall, not per project) will be reprieved.
- If you fail to join your group before submitting to the grading system, we will
assume that you are submitting on your own and any resemblance to others’ solutions
will be considered violation of the honor code, which will be reported, and you will get
0 points in the project.