$30
Project #2: Fakebook Oracle JDBC
In Project #2, you will be building a Java application that executes SQL queries against a relational
database and places the results in special data structures. We provide you the majority of the
structure for the Java application and your job is to fill it with the query text and to process the
results of the queries appropriately. This project will give you additional practice with standard SQL
query practices in addition to hands-on experience with real-world database application
programming.
This project is to be done in teams of 2 students (recommended) or individually. Students may
participate in the same teams as in Project #1, or they may work with different partners. Both
members of each team will receive the same score - the highest score among all time submissions;
as such, it is not necessary for each team member to submit the assignment. To create a team on the
autograder, follow these steps before making your first submission:
1. One team member clicks the “Send group invitation” button on the Project 2 page.
2. The other team member confirms on their autograder assignment page.
Do not make any submissions before joining your team! Once you click on “I’m working
alone'', the autograder will not let you change team members. If you do need to make a
correction, the teaching staff has the ability to modify teams.
Project #2 is due on Thursday, February 27
th at 11:55pm EST. If you do not turn in your project
by that deadline, or if you are unhappy with your work, you may continue to submit up until
Tuesday, March 3
rd at 11:55m EST . Please refer to the course syllabus for more information on late
days.
1
1. The Public Data Set
The Fakebook data on which Project #2 relies is structured in the same way as the data set you
created for Project #1, except that there is neither a Messages table nor a Participants table. Use the
DESC tablename command to view the full schema of any of the public data tables.
We’d like to clarify that every row of two users (user1_id, user2_id) in the Friends table will meet the
invariant user1_id < user2_id. This enforces the constraint that users cannot be friends with
themselves, and the structure of the table prevents friendships being listed more than once.
The tables are stored under the domain of project2. The prefix for the public data set is PUBLIC (not
case-sensitive). To access the tables, you should use project2.PUBLIC_[tableName]. You
should use this access approach only when running your queries through sqlplus interactive
mode; there is a separate access mechanism when implementing your queries in Java (see section
2.5).
2. Starter Code
There is a zip file named starter_code.zip downloadable from Canvas. The compression contains a
directory named project2 and you should not need to change its organization. Read sections 2.5 and
2.6 about which files you need to modify; sections 2.8 through 2.10 explain how to compile your
code and reference the public output.
2.1 PublicFakebookOracleConstants.java
Do not modify this file. It defines Oracle domain, prefix and a series of non-modifiable table name
variables that you will use to implement your queries.
2.2 FakebookOracleUtilities.java
Do not modify this file. It defines a single utility class, Facebook ArrayList, that will be used for
storing lists of data structures built up from your query results. This utility class exists for the
purpose of customizing printing output.
2.3 FakebookOracleDataStructures.java
Do not modify this file. It defines a series of custom data structures that allow you to report your
query results. Please familiarize yourself with these various data structures so that you are
comfortable creating new instances and invoking the structures’ various augmentation functions.
Example usages of these data structures are shown in comments in StudentFakebookOracle.java.
2
2.4 FakebookOracle.java
Do not modify this file. It defines the abstract parent class from which your Java application will
derive. This base class defines the nine abstract functions that you will have to implement; these
function declarations have already been repeated for you in StudentFakebookOracle.java (section
2.5). In addition, this base class defines a series of printing functions that are used to output the
results of your queries.
2.5 StudentFakebookOracle.java
This is the file in which you should implement your SQL queries. It defines the derived query
class that implements the abstract functions defined by the parent FakebookOracle class. Each of
the nine required queries has its own function, which is commented to briefly describe the goal of
the query (a full description of the queries can be found later in the specification); additionally, each
function skeleton contains a comment showing how to use the necessary data structures for the
query. You are encouraged to follow the given structure and create additional oracle
statements/try-catch when necessary.
The bottom of the StudentFakebookOracle class defines 11 constant variables that you should use
to reference the public data set tables; these variables are consistent with definition in
PublicFakebookOracleConstants.java and have been placed in this class for ease of reference. Please
familiarize yourself with these variables, but do not modify them. Any time you wish to use the
name of a table in your query, select the appropriate variable and insert the variable into your
query string. DO NOT hard-code the table names into your queries under any circumstances:
you will fail Autograder private tests if you do so.
Do not modify the class constructor, which appears at the top of the class definition. Neither should
you remove any of the @Override directives.
2.6 FakebookOracleMain.java
You need to fill in your uniqname and Oracle password in this file to enable Oracle connection.
This is the application driver. It can be invoked from the command line (preferably by the Makefile)
and takes command line arguments that define which query/queries to execute and whether to
print the output or measure the runtime.
You will NOT submit this file, so you do not need to worry about staff members obtaining your
password. However, if you wish to change your SQL*PLUS password, use the following command:
ALTER USER uniqname IDENTIFIED BY newpassword;
2.7 ojdbc6.jar
This is a JAR file needed to compile your application. This driver has been tested with JDKs 1.7 and
1.8; we cannot guarantee its compatibility with other JDK versions.
3
2.8 Makefile
Makefile allows you to easily compile, run, and clean your code. You are responsible for ensuring
that your application compiles and runs using the unmodified Makefile, which will be utilized on the
Autograder. There is no guarantee that staff members will be able to assist you in customizing or
troubleshooting the Makefile if you choose add or modify make targets.
- To compile your Java application, navigate to your project root directory (where the
Makefile is located) and run make or make compile; this will compile silently if there are
no errors and will print any compilation problems to the command line. Fix any errors that
you have so that your code perfectly compiles; we will not be able to grade your submission
unless it compiles correctly.
- To run your queries and view the output, you have two options. If you want to run all your
queries to compare your output to the provided solution output file, run make
query-all. To run a single query to view the output, run make queryN where N is the
query number. You may redirect output for output diff.
- To run your queries and measure their runtime, you again have two options. If you want to
time all your queries, run make time-all. To measure the runtime of a single query, run
make timeN where N is the query number. Make sure that your code runs without any
error before attempting to measure its runtime.
- To remove the files generated by compilation, run make clean.
2.9 PublicSolution.txt
This file contains the expected output of each of the ten queries (nine you will implement, one
implemented for you; see below) when your Java application is executed against the public data set.
The output of running make query-all on your application should match this file exactly; any
deviation indicates an error with your code. You can separate the individual query results into their
own files if you wish so that you can test the outputs of single queries; if you choose to do this, do
not omit the trailing blank lines after the output of a query or the query result header.
2.10 PublicTime.txt
This file contains the best runtime of instructor implementation when the ten queries are executed
against the public data set on the CAEN system. To collect your runtime, use command make
time-all. Be aware that runtime for this project is extremely unstable due to JDBC Oracle
connection mechanism, and may change drastically when you switch between platforms. You do
not need to perform strictly better than our runtime to pass all tests on the autograder. Reasonable
buffer time is allowed on the autograder. Again, your best score among all time submissions will be
recorded as your score for this project.
4
3. Queries
3.1 Overview
You will implement 9 SQL queries, although some of the queries may actually consist of multiple
individual queries. They are listed below with detailed specifications as to what fields to return and
in what order. You should put your queries into the appropriate Java function in
StudentFakebookOracle.java. The results of the queries should be placed in the appropriate data
structures as demonstrated by the skeleton code comments.
It is your responsibility to ensure that your queries are correct irrespective of the data set upon
which the queries are executed. We have provided you with sample correct output based on the
public data set, but we will also be testing your implementations against a private data set to which
you are not given access. Points for each query will be split between performance on the public and
private data sets.
3.2 Implementation Approach and Rules
1. We highly recommend writing your SQL queries in plaintext and executing them
interactively through SQL*PLUS before transplanting them into the Java application
function. The SQL*PLUS CLI will provide more helpful error messages than the JRE (Java
Runtime Environment), making it easier to debug your solutions.
2. In order to meet the runtime requirements (see below), you need to understand the
tradeoff between data access and runtime. For example, you should not be attempting to
sort data in Java; rather, use an ORDER BY clause to offload that work to the DBMS.
3. When you submit to the Autograder, don’t submit partially-completed queries; this will drag
down the time it takes to grade your solution. Make sure that your file compiles with just
the necessary return statement in the incomplete query functions.
4. If you create any views as part of your implementation, you should use
stmt.executeUpdate() rather than use stmt.executeQuery().Be sure to drop
views before closing the statement at the end of the query. When testing your
application, the creation and dropping of views can get complicated if you have syntax
errors in queries in the same function. For example, a CREATE VIEW is executed
successfully, but a statement execute line causes the query to halt, skipping the
corresponding DROP VIEW statement, and the view will persist to the next time you test
your code. If you encounter a syntax error in a query where you have created a view, you
should manually drop the view by executing the appropriate SQL from SQL*PLUS CLI. You may
use SELECT view_name from user_views; to list all the views in your domain. You
are not permitted to create additional tables in your implementations.
5. If you wish to add print statements to your functions for debugging purposes, you should
use the Java equivalent of C++’s cout, which is System.out.println. This function takes a string
parameter and prints it to the standard output stream. Similarly, you can print to the
standard error stream with System.err.println. Be sure to remove any such print statements
from your file before submitting, as they will cause you to fail Autograder tests.
5
3.3 Runtime Efficiency
All of the make targets have a built-in 120-second timeout; any queries that take longer than this to
execute will automatically terminate. Such queries will receive a 0 on the Autograder. Generally,
queries that take too long will produce no output.
The efficiency tests are worth 2 points per query, public and private. You need to score correctness
in order to be evaluated for efficiency.
You should not need to worry about time performance for a majority of the queries. For certain
queries, using a different logic (processing data in Java data structures) could drastically change
time performance. Check Section 2.10 for instructors runtime reference.
Query 0: Birth Months (Provided; 0 points)
This query has been implemented for you as an example
—————————————————————————
Query 0 asks you to identify information about Fakebook users’ birth months. You should
determine in which month the most Fakebook users were born and in which month the fewest (but
at least 1) Fakebook users were born; if there are ties, pick the month that occurs earliest in the
calendar year. For each of those months, report the IDs, first names, and last names of the Fakebook
users born in that month; sort the users in ascending order by ID. You should also report the total
number of Fakebook users that have a birth month listed. You can safely assume that at least one
Fakebook user has listed a birth month.
Query 1: First Names (10 points)
Public: 5 points • Private: 5 points
——————————————————————————————————
Query 1 asks you to identify information about Fakebook users’ first names.
● We’d like to know the longest and shortest first names by length; if there are ties between
multiple names, report all tied names in alphabetical order.
● We’d also like to know what first name(s) are the most common and how many users have
that first name; again, if there are ties, report all tied names in alphabetical order.
Hint: You may consider using LENGTH( ) operation in SQL. Remember that you are allowed to
execute multiple SQL statements in one query.
Query 2: Lonely Users (10 points)
Public: 5 points • Private: 5 points
——————————————————————————————————
Query 2 asks you to identify all of the Fakebook users with no Fakebook friends. For each user
without any friends, report their ID, first name, and last name. The users should be reported in
6
ascending order by ID. If every Fakebook user has at least one Fakebook friend, you should return
an empty FakebookArrayList.
Query 3: Users who Live Away from Home (10 points)
Public: 5 points • Private: 5 points
——————————————————————————————————
Query 3 asks you to identity all of the Fakebook users that no longer live in their hometown. For
each such user, report their ID, first name, and last name; results should be sorted in ascending
order by the users’ ID. If a user does not have a current city or a hometown listed, they should not
be included in the results. If every Fakebook user still lives in his/her hometown, you should return
an empty FakebookArrayList.
Query 4: Highly-Tagged Photos (12 points)
Public: 6 points • Private: 6 points
——————————————————————————————————
Query 4 asks you to identify the most highly-tagged photos. We will pass an integer argument num
to the query function; you should return the top num photos with the most tagged users sorted in
descending order by the number of tagged users (most tagged users first). If there are fewer than
num photos with at least 1 tag, then you should return only those available photos. If more than one
photo has the same number of tagged users, list the photo with the smaller ID first. For each photo,
you should report the photo’s ID, the ID of the album containing the photo, the photo’s Fakebook
link, and the name of the album containing the photo.
For each reported photo, you should list the ID, first name, and last name of the users tagged in that
photo. Tagged users should be listed in ascending order by ID.
Query 5: Match Maker (14 points)
Public: 7 points • Private: 7 points
——————————————————————————————————
Query 5 asks you suggest possible unrealized Fakebook friendships. We will pass two integer
arguments, num and yearDiff to the query function; you should return the top num pairs of two
Fakebook users who meet each of the following conditions:
• The two users are the same gender
• The two users are tagged in at least one common photo
• The two users are not friends
• The difference in the two users’ birth years is less than or equal to yearDiff
The pairs of users should be reported in (and cut-off based on) descending order by the number of
photos in which the two users were tagged together. For each pair, report the IDs, first names, and
7
last names of the two users; list the user with the smaller ID first. If multiple pairs of users that
meet the criteria are tagged in the same number of photos, order the results in ascending order by
the smaller user ID and then in ascending order by the larger user ID. If there are fewer than num
pairs of users that meet the criteria, you should return only those pairs that are viable.
For each pair of users, you should also report the photos in which they were tagged together. The
information you should report is the photo’s ID, the photo’s Fakebook link, the ID of the album
containing the photo, and the name of the album containing the photo. List the photos in ascending
order by photo ID.
Query 6: Suggest Friends (14 points)
Public: 7 points • Private: 7 points
———————————————————————————————————
Query 6 asks you to suggest possible unrealized Fakebook friendships in a different way. We will
pass a single integer argument, num, to the query function; you should return the top num pairs of
Fakebook users with the most mutual friends who are not friends themselves. A mutual friend is
one such that A is friends with B and B is friends with C, in which case B is a mutual friend of A and
C. The IDs, first names, and last names of the two users should be returned; list the user with the
smaller ID first within the pair and rank the pairs in descending order by the number of mutual
friends. In the event of a tie between pairs, list the pair with the smaller smaller ID first; if pairs are
still tied, list the pair with the smaller larger ID first.
For each pair of users you report, you should also list the IDs, first names, and last names of all their
mutual friends. List the mutual friends in ascending order by ID.
Query 7: Event-Heavy States (10 points)
Public: 5 points • Private: 5 points
——————————————————————————————————
Query 7 asks you to identify the states in which the most Fakebook events are held. If more than
one state is tied for hosting the most Fakebook events, all states involved in the tie should be
returned, listed in ascending order by state name. You also need to report how many events are
held in that/those states. You can assume that there is at least 1 Fakebook event.
Query 8: Oldest and Youngest Friends (10 points)
Public: 5 points • Private: 5 points
——————————————————————————————————
Query 8 asks you to identify the oldest and youngest friend of a particular Fakebook user. We will
pass a single integer argument, userID, to the query function; you should return the ID, first name,
and last name of the oldest and youngest friend of the Fakebook user with that ID. Notice that you
should not type convert the date, month and year fields using TO_DATE; instead, order them just as
8
they are (numbers). If two friends of that user are born on the exact same date, report the one with
the larger user ID. You can assume that the user with the specified ID has at least 1 Fakebook friend.
Query 9: Potential Siblings (10 points)
Public: 5 points • Private: 5 points
——————————————————————————————————
Query 9 asks you to identify pairs of Fakebook users that might be siblings. Two users might be
siblings if they meet each of the following criteria:
• The two users have the same last name
• The two users have the same hometown
• The two users are friends
• The difference in the two users’ birth years is strictly less than 10 years
Each pair should be reported with the smaller user ID first and the larger user ID second. The
smaller ID should bee used to order pairs relative to one another (smaller smaller ID first); the
larger ID should be used to break ties (smaller larger ID first).
Submitting
The only deliverable for Project #2 is StudentFakebookOracle.java. After creating a team, you
may submit this file to the Autograder without compressing or modifying file name. Each team
will be allowed 3 submissions per day with feedback; any submission made in excess of those 3 will
be graded, but the results of those submissions will be hidden from the group.
This project in particular takes a lengthy amount of time to grade (up to 40 minutes). Please do not
make submissions to the AutoGrader right after your last submission, before the
feedbacks/results are provided to you. Please be patient and only contact the staff with concerns
if you have waiting more than 40 minutes without seeing the results.
Your highest-scoring overall submission will be accepted, but you will be penalized according to
the course late policy if you submit at all during the late period, even if we accept an earlier, on-time
submission as your final score. Due to Oracle JDBC connection stability issue, you may observe
connection reset exceptions for one or multiple test cases of your submissions. In compensation for
that, you may submit a regrade request after the project is due and the private performance is
released, so that the staff will rerun your latest submissions and make sure your project score is not
affected. We will post on Piazza a Google form link and instructions when the project is due.
9
Appendix
Java Syntax You Should Know
This project has been designed in such a way that you do not need to know or understand
significant aspects of the Java programming language to successfully complete it. Future sections of
the appendix contain in-depth treatments of some Java-specific tools that will be of paramount
importance in implementing your application. However, there are a small handful of major
syntactical differences between Java and C++ that you should be familiar with, as they may impact
your programming.
Java has two types of objects: primitives (which are analogous to C++ built-in data types) and
references (which are analogous to C++ pointers). There are no pointers in Java, but all
references are dynamically-allocated with the new keyword. For example, to create an
instance of a variable of type Foo, you would write Foo foo = new Foo(); You do not need to use
new to create primitive variables.
Java has garbage collection, so you do not need to manually deallocate memory even if you allocate
references with the new keyword
Java has primitive and reference versions of all simple data types. The primitive version is
firstletter lower-case (i.e. long) and the reference version is first-letter upper-case (i.e. Long).
This has important bearing on equality comparisons. We highly recommend you use the
primitive version wherever necessary, as it is more idiomatic and more akin to C++.
Java has two ways to compare equality. To compare equality of primitives or to determine if two
references are the exact same object (analogous to the same pointer in C++), use the standard
== comparator. To test if two references are logically equivalent (as defined by the class’s
particular implementation), you must use the .equals() member function. You can also use
.compareTo() if the class implements the Comparable interface, but you should not need this
aspect of the language for Project #2.
Java strings are references and are declared with a capital letter String. To create a new string, you
can type String str = ‘‘ABC’’ without the new keyword.
Java has automatic string conversion built in to every primitive and every object, so the standard
string concatenation operator + can be used to combine strings, numeric types, and objects;
an example of this is String str = ‘‘abc’’ + 2 + ‘‘def’’; in which the numeric value 2 is converted to
a string an properly concatenated.
Statements and Result Sets
The primary JDBC tools you will be using to execute your queries are Statements and ResultSets.
The appropriate Java libraries have already been imported for you, so you can simply use these
tools to perform your queries.
10
Statements are JDBC objects against which you can execute queries and updates. Each of the query
function skeletons has already created a Statement object named stmt that you can use without any
additional hassle. However, if you ever want to create a new Statement object, you can copy the
body of the try-with-resource statement, changing the name of the variable as necessary. To execute
a query against a Statement, you should use the Statement::executeQuery(String) member function,
which returns a ResultSet. To create or drop a view, you should use the
Statement::executeUpdate(String) function, which has no return type. See the implementation of
Query 0 for examples of how to execute queries.
ResultSets are essentially lists of rows that are returned by queries executed against a Statement.
To loop over the list of results, you can use the ResultSet::next() function, which returns FALSE when
you have advanced past the last result. You can also use the ResultSet::isFirst() and ResultSet::isLast()
functions to determine if a particular row is the first/last row in the query result, respectively. To
extract a value of a column from the current row of a ResultSet, use either
ResultSet::getLong(arg) or ResultSet::getString(arg); the argument to these functions can either be the
case-sensitive name of the column whose data you wish to extract or the 1-based column index of
the column whose data you wish to extract. See the implementation of Query 0 for examples of how
to navigate ResultSets and extract data therefrom.
You will be responsible for closing all of the resources you utilize in this project; specifically, you
must close all your Statements and ResultSets using the close() member function. You should always
do this last, when you no longer need the object, as doing so otherwise will make it impossible to
complete your implementation. An important thing to note is that when you reuse a Statement to
execute another query, any ResultSets generated previously from that Statement will get
automatically closed. As such, the following Java snippet (with actual query strings omitted for
brevity) will induce a runtime error:
Statement stmt = new Statement ( . . . ) ; ResultSet rst
= stmt . executeQuery ( . . . ) ; while ( rst . next ()) {
ResultSet rst2 = stmt . executeQuery ( . . . ) ; long val =
rst . getLong (1);
}
The reason is that the reuse of stmt to generate the results stored in rst2 causes rst to close, and
then the attempt to access its data will throw an exception. If you find that you want to do
something akin to this, you must create a second Statement to use for the inner query. Make sure to
create this statement outside of the loop, however, so that it doesn’t get garbage collected and
reinitialized every time through.
Additionally, closing a Statement will close any ResultSets generated by that Statement; however,
we explicitly suggest that you separately close your ResultSets before you close your Statements for
the surest resource management. See the implementation of Query 0 for an example of how to close
your resources.
11
Additional Java Libraries
You are not permitted to use any additional Java libraries for this project. All of the necessary
import statements have already been included in the StudentFakebookOracle.java file for you.
The ROWNUM Pseudo-variable
In the course of implementing the queries, you may find that you wish to select only the first N rows
of results. Oracle SQL provides a pseudo-variable that facilitates this desire, but it can be incredibly
fidgety. This pseudo-variable is ROWNUM.
The was ROWNUM works is quite simple, but not always intuitive. After the FROM clause is
evaluated (meaning all JOINs are completed and winnowed based on the ON clauses), each row is
given a monotonically-increasing integer starting at 1; this value is stored in a psuedo-column
called ROWNUM, which allows us to access the value later on. Note, however, that these values are
applied before the WHERE clause is evaluated and, more importantly, before the ORDER BY clause is
evaluated. Because the order in which SQL returns results in the absence of an ORDER BY clause is
undefined, the order in which the values are applied to the rows is likewise undefined.
Consider the following query, which is supposed to find the users named “Bob” with the 10 smallest
IDs:
SELECT User_ID
FROM Users
WHERE First_Name = ’Bob ’ AND ROWNUM <= 10
ORDER BY User_ID ;
Although this query looks exactly right, it will almost certainly not behave as we’d like. The reason
is that the rows are numbered before they are sorted, and as mentioned, that numbering is applied
in no particular order. So when the WHERE clause is evaluated, the only rows that are returned are
those for users whose first name is “Bob” that happened to be in the first 10 rows to which rows
were assigned. Not only may this query not return the 10 smallest ID’d users whose first name is
“Bob,” but it might not even return any results despite there being results to return.
Instead, we would have to write out query like this:
SELECT User_ID
FROM (
SELECT User_ID
FROM Users
WHERE First_Name = ’Bob ’
ORDER BY User_ID
)
WHERE ROWNUM <= 10;
12
Now, the inner query returns all the users whose first name is “Bob” and sorts them in the order we
want; now, when the row numbers are applied, we’re in control of the order in which they are
applied. We can then filter using the ROWNUM pseudo-variable and we get what we want.
The ROWNUM pseudo-variable does not error out if there aren’t “enough” rows to return, as it
behaves just like any other field-based conditional.
It is possible to complete this project and earn full credit without using the ROWNUM
pseudo-variable.
13