SOEN 363: Data Systems for Software Engineers
Assignment 3
Weight: 5% of the overall grade.
Individual assignment. You must work strictly on your own.
Overview
In this assignment, you will create a NoSQL database of movies and their information. The
movie data are extracted directly from your Assignment 2 database and transferred into the
NoSQL database.
Implementation Platform
We use Neo4j [1] in this assignment. While you may find many tutorials online, attending
the tutorial sessions is strongly recommended. For help with programming or questions
about the platform, please see the PODs.
Data Transfer
Data transfer is done by exporting the data of each relation in the RDBMS to a CSV (or TSV)
or JSON file. You then import these files directly into Neo4j.
https://neo4j.com/developer/guide-import-csv/
https://neo4j.com/labs/apoc/4.1/import/load-json/
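As a minimal sketch of the export step, the Python script below writes one relation to a CSV file with a header row, which is the shape that LOAD CSV WITH HEADERS expects. It assumes sqlite3 as a stand-in for your Assignment 2 database, and the movie table and its columns are hypothetical. With PostgreSQL or MySQL the same idea applies through that system's own driver or built-in CSV export.

```python
import csv
import os
import sqlite3
import tempfile


def export_table(conn: sqlite3.Connection, table: str, path: str) -> int:
    """Write every row of `table` to `path` as CSV, header row first.

    Returns the number of data rows written (header excluded).
    """
    # Table names cannot be bound as SQL parameters, so `table` must come
    # from a trusted, fixed list of relation names.
    cursor = conn.execute(f"SELECT * FROM {table}")
    header = [col[0] for col in cursor.description]
    rows = 0
    # newline="" hands line-ending control to the csv module; fields that
    # contain commas or quotes (e.g. plots) are quoted automatically.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)
        for row in cursor:
            writer.writerow(row)
            rows += 1
    return rows


def demo(out_dir: str) -> str:
    """Export a tiny hypothetical `movie` relation; return the CSV path."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT, "
        "plot TEXT, rating REAL, release_year INTEGER, runtime INTEGER)"
    )
    conn.execute(
        "INSERT INTO movie VALUES "
        "(1, 'Sample Movie', 'A plot, with a comma.', 7.5, 2005, 110)"
    )
    path = os.path.join(out_dir, "movies.csv")
    export_table(conn, "movie", path)
    conn.close()
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(demo(d), encoding="utf-8") as f:
            print(f.read())
```

Calling export_table once per relation (movies, actors, and each join table) gives one file per relation. Copy the files into Neo4j's import directory so that file:/// URLs in LOAD CSV can reach them.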
Entities / Nodes
In this assignment, you create the following entities (nodes) and populate them with data:
Movies (attributes: title, description / plot (full text), rating, release year, runtime,
genres, and languages)
Actors (first name, last name)
Countries
Keywords
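One possible way to load these nodes and connect them is sketched below as Cypher LOAD CSV statements. A small Python script joins the statements into a single file that you can run with cypher-shell or paste into Neo4j Browser. The file names, the column names, the id property, and the relationship types ACTED_IN, HAS_KEYWORD, and PRODUCED_IN are all assumptions. None of them is part of the assignment specification.

```python
from typing import Sequence

# Hypothetical file, column, and relationship names; adapt them to your own
# export. Each statement reads a CSV file from Neo4j's import directory.
IMPORT_STATEMENTS = [
    # Movie nodes; LOAD CSV yields strings, so numeric columns are converted.
    """
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS row
CREATE (:Movie {
  id: toInteger(row.movie_id),
  title: row.title,
  plot: row.plot,
  rating: toFloat(row.rating),
  release_year: toInteger(row.release_year),
  runtime: toInteger(row.runtime)
})
""",
    # Actor nodes.
    """
LOAD CSV WITH HEADERS FROM 'file:///actors.csv' AS row
CREATE (:Actor {id: toInteger(row.actor_id), first_name: row.first_name, last_name: row.last_name})
""",
    # (Actor)-[:ACTED_IN]->(Movie), one CSV row per actor/movie pair.
    """
LOAD CSV WITH HEADERS FROM 'file:///movie_actor.csv' AS row
MATCH (m:Movie {id: toInteger(row.movie_id)})
MATCH (a:Actor {id: toInteger(row.actor_id)})
MERGE (a)-[:ACTED_IN]->(m)
""",
    # Keyword nodes are created on first use; MERGE avoids duplicates.
    """
LOAD CSV WITH HEADERS FROM 'file:///movie_keyword.csv' AS row
MATCH (m:Movie {id: toInteger(row.movie_id)})
MERGE (k:Keyword {name: row.keyword})
MERGE (m)-[:HAS_KEYWORD]->(k)
""",
    # Country nodes, same pattern as keywords.
    """
LOAD CSV WITH HEADERS FROM 'file:///movie_country.csv' AS row
MATCH (m:Movie {id: toInteger(row.movie_id)})
MERGE (c:Country {name: row.country})
MERGE (m)-[:PRODUCED_IN]->(c)
""",
]


def build_script(statements: Sequence[str]) -> str:
    """Join statements into one script, each terminated by a semicolon."""
    return "\n\n".join(s.strip() + ";" for s in statements) + "\n"


if __name__ == "__main__":
    print(build_script(IMPORT_STATEMENTS))
```

On large files, create an index on :Movie(id) and :Actor(id) before running the relationship statements. Otherwise each MATCH lookup scans every node with that label.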
Data Files and Scripts
[15 pts] Extract the data from your database into data files (CSV, TSV, or JSON¹).
[40 pts] Write scripts to create the database in Neo4j.
Note that Neo4j supports array (list) attributes, which in the relational model are normally represented using weak entities:
https://neo4j.com/docs/cypher-manual/current/functions/list/.
To populate such data (i.e., genres and languages), you may use a separate CSV file.
¹ Using JSON is permitted but not recommended. Relational data maps naturally onto a tabular format such as CSV or TSV.
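Here is a minimal sketch of the separate-file approach for array attributes. It assumes a hypothetical movie_genre.csv with one (movie_id, genre) row per genre, which is what a weak-entity table would export. collect() gathers each movie's values into one list, and SET then stores that list on the Movie node. The helper name list_property_import and the integer id property on Movie nodes are assumptions.

```python
def list_property_import(csv_file: str, value_column: str, prop: str) -> str:
    """Return Cypher that reads one (movie_id, value) pair per CSV row and
    stores all values of a movie as a single list property on its node.

    Assumes Movie nodes already carry an integer `id` property.
    """
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{csv_file}' AS row\n"
        # collect() groups rows by movie id, turning the weak-entity rows
        # of the relational model into one list value per movie.
        f"WITH toInteger(row.movie_id) AS id, collect(row.{value_column}) AS vals\n"
        f"MATCH (m:Movie {{id: id}})\n"
        f"SET m.{prop} = vals"
    )


if __name__ == "__main__":
    # Hypothetical file and column names.
    print(list_property_import("movie_genre.csv", "genre", "genres"))
    print(list_property_import("movie_language.csv", "language", "languages"))
```

Another option is to export a single delimited column (for example Action|Drama) in the movie file and convert it with Cypher's split() function during the main import.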
Queries
Provide the answers to the following:
A) [5 pts] Find all movies featuring a sample actor.
B) [5 pts] Find all movies that were released after the year 2000 and have a rating of at
least 5.
C) [5 pts] Find all movies that share two keywords of your choice. Make sure your query
returns more than one movie.
D) [10 pts] Find the top 2 movies with the largest number of keywords.
E) [10 pts] Find the top 10 movies (ordered by rating) in a language of your choice.
F) [5 pts] Build a full-text search index on movie plots.
G) [5 pts] Write a full-text search query that searches for sample text of your choice.
Make sure all of the above queries return data; modify the data in your database if necessary.
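Queries A-G could be written in Cypher roughly as follows. The sketch stores each query as a parameterized string in Python so you can pass it to a driver, for example the official neo4j Python driver's session.run(query, params). It assumes a hypothetical schema: (:Actor)-[:ACTED_IN]->(:Movie), (:Movie)-[:HAS_KEYWORD]->(:Keyword {name}), and Movie properties title, plot, rating, release_year, plus a list-valued languages. The parameter values are placeholders, so pick values that exist in your data. Query F uses the CREATE FULLTEXT INDEX command available from Neo4j 4.3 onward. On earlier 4.x releases, the equivalent is db.index.fulltext.createNodeIndex('moviePlots', ['Movie'], ['plot']).

```python
"""Cypher sketches for queries A-G as parameterized strings (hypothetical schema)."""

QUERIES = {
    # A) All movies featuring a sample actor.
    "A": """
MATCH (a:Actor {first_name: $first_name, last_name: $last_name})-[:ACTED_IN]->(m:Movie)
RETURN m.title AS title
""",
    # B) Released after 2000 with a rating of at least 5.
    "B": """
MATCH (m:Movie)
WHERE m.release_year > 2000 AND m.rating >= 5
RETURN m.title AS title, m.release_year AS year, m.rating AS rating
""",
    # C) Movies tagged with both chosen keywords.
    "C": """
MATCH (m:Movie)-[:HAS_KEYWORD]->(:Keyword {name: $kw1}),
      (m)-[:HAS_KEYWORD]->(:Keyword {name: $kw2})
RETURN DISTINCT m.title AS title
""",
    # D) Top 2 movies by number of keywords.
    "D": """
MATCH (m:Movie)-[:HAS_KEYWORD]->(k:Keyword)
RETURN m.title AS title, count(k) AS keywords
ORDER BY keywords DESC
LIMIT 2
""",
    # E) Top 10 movies by rating in a chosen language. Unrated movies are
    # excluded because nulls sort first in descending order.
    "E": """
MATCH (m:Movie)
WHERE $language IN m.languages AND m.rating IS NOT NULL
RETURN m.title AS title, m.rating AS rating
ORDER BY rating DESC
LIMIT 10
""",
    # F) Full-text index on plots (Neo4j 4.3+ syntax).
    "F": """
CREATE FULLTEXT INDEX moviePlots FOR (m:Movie) ON EACH [m.plot]
""",
    # G) Full-text search over the index created in F, best matches first.
    "G": """
CALL db.index.fulltext.queryNodes('moviePlots', $text) YIELD node, score
RETURN node.title AS title, score
ORDER BY score DESC
""",
}

# Hypothetical sample parameter values; replace with values present in your data.
PARAMS = {
    "A": {"first_name": "Jane", "last_name": "Doe"},
    "B": {},
    "C": {"kw1": "heist", "kw2": "police"},
    "D": {},
    "E": {"language": "English"},
    "F": {},
    "G": {"text": "bank robbery"},
}

if __name__ == "__main__":
    for name, query in QUERIES.items():
        print(f"// {name} params={PARAMS[name]}")
        print(query.strip() + ";\n")
```

Query C reads "share two keywords" as "have both of two chosen keywords". If you interpret the task as pairs of movies that share two keywords, the pattern changes accordingly.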
Submit your assignment electronically on Moodle: https://moodle.concordia.ca
Include your name and student ID in the submission. Make sure that you upload the
assignment to the correct assignment box on Moodle. No email submissions are accepted.
Assignments uploaded to the wrong system, wrong folder, or submitted via email will be
discarded and no resubmission will be allowed. Make sure you can access Moodle prior to
the submission deadline. The deadline will not be extended.
References
1. https://neo4j.com/