$29
%load_ext sql
%sql sqlite:///
Problem Set 2
Instructions / Notes:
Read these carefully
This problem set does not come with a dataset to load; instead, make your own instances of tables, either as solutions to the problems or for testing solutions to the problems.
You may create new IPython notebook cells to use for e.g. testing, debugging, exploring, etc.- this is encouraged in fact!- just make sure that your final answer for each question is in its own cell and clearly indicated
When you see In [*]: to the left of the cell you are executing, this means that the code / query is running.
If the cell is hanging- i.e. running for too long: To restart the SQL connection, you must restart the entire python kernel
To restart kernel using the menu bar: "Kernel >> Restart >> Clear all outputs & restart"), then re-execute the sql connection cell at top
You will also need to restart the connection if you want to load a different version of the database file
Remember:
%sql [SQL] is for single line SQL queries
%%sql [SQL] is for multi line SQL queries
See Piazza for submission instructions
Have fun!
Problem 1
[15 points total]
For each part of this problem you will need to provide a single SQL query which will check whether a certain condition holds on a specific instance of a relation, in the following way: your query should return an empty result if and only if the condition holds on the instance. (If the condition doesn't hold, your query should return something non-empty, but it doesn't matter what this is).
Note our language here: the conditions that we specify cannot be proved to hold in general without knowing the externally-defined functional dependencies; so what we mean is, check whether they could hold in general for the relation, given any specific set of tuples.
You may assume that there will be no NULL values in the tables, and you may assume that the relations are sets rather than multisets, but otherwise your query should work for general instances. We define the schemas of the tables used below for convenience, but in this problem you will need to construct your own test tables if you wish to use them to check your answers!
%%sql
DROP TABLE IF EXISTS R; CREATE TABLE R (W INT, X INT, Y INT, Z INT);
DROP TABLE IF EXISTS Cat; CREATE TABLE Cat(cat_name TEXT, breed TEXT, owner_name TEXT);
DROP TABLE IF EXISTS Owner; CREATE TABLE Owner(owner_name TEXT, ssn INT, hometown TEXT);
DROP TABLE IF EXISTS S; CREATE TABLE S (A INT, B INT, C INT, D INT, E INT);
Part (a)
[5 points]
{X,Y} is a superkey for a relation R(W,X,Y,Z) .
%%sql
Part (b)
[5 points]
The individual attributes of a relation R(W,X,Y,Z) are each keys.
%%sql
Part (c)
[5 points]
A multivalued dependency (MVD) is defined as follows: let R be a schema i.e. a set of attributes, and consider two sets of attributes A⊆R and B⊆R . We say that a multivalued dependency (MVD), written:
A↠B
holds on R if whenever there are two tuples t1,t2 such that t1[A]=t2[A] , there also exists a third tuple t3 such that:
t3[A]=t1[A]=t2[A]
t3[B]=t1[B]
t3[R∖B]=t2[R∖B]
Note that R∖B is all the attributes in R that are not in B , and that t3 need not be distinct from t1 or t2 . Note especially that an MVD holds on an entire relation, meaning that any two tuples (in any order) in the relation should satisfy the above conditions if the MVD holds. See the end of the lecture 7 slides for more on MVDs!
In this problem, write your query to check if the MVD {B}↠{D,E} holds for a relation S(A,B,C,D,E) .
%%sql
Problem 2
[20 points total]
Part (a)
[10 points]
Consider a relation T(V,W,X,Y,Z) . Provide two different sets of functional dependencies, F_1 and F_2, such that each one results in T having the largest number of distinct keys T could possibly have.
Store your lists of FDs as python lists having elements that are pairs of sets; for example to set F_1 as the set consisting of two FDs, {V,W}→{X,Y} and {W}→{X} :
F_1 = [(set(['V','W']), set(['X','Y'])), (set(['W']), set(['X']))]
*Note: the above is not necessarily one of the FDs- just an example of the syntax!
*Hint: You may use any of the functions from the lecture activities if they seem helpful! However your final answer should not involve these functions directly, nor are they necessary for this problem
F_1 = []
F_2 = []
Part (b)
[10 points]
Consider a schema T(A1,...,An) which has FDs {Ai,Ai+1}→{Ai+2} for i=1,...,n−2 . Create an instance of T , for n=4 , for which these FDs hold, and no other ones do- i.e. all other FDs are violated.
Use a series of INSERT statements below to populate the table T:
%%sql
DROP TABLE IF EXISTS T;
CREATE TABLE T (A int, B int, C int, D int);
Problem 3
[30 points total]
Consider a relation R(X,Y,Z) . In each part of this problem you will be given a condition, and you need to create the following three instances of R (as tables in SQL):
An instance A
An instance B which differs from A only in that it has one fewer row. Any row in B should also be there in A.
An instance C which differs from A only in that it has one additional row. Apart from the additional row, all the rows of C are same as A.
Part (a)
[10 points]
Create A , B and C such that each individual attribute is a key for A , but none of the individual attributes is a key for B or C . If you believe that B and/or C cannot be created, provide them as an empty table.
%%sql
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
DROP TABLE IF EXISTS C;
Part (b)
[10 points]
Create A , B and C such that ONLY the functional dependencies {Z}→{Y} and {X,Z}→{Y} hold on B , ONLY the functional dependency {X,Z}→{Y} holds on A and NO functional dependencies hold on C . If you believe B and/or C cannot be created, provide them as an empty table.
%%sql
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
DROP TABLE IF EXISTS C;
Part (c)
[10 points]
Create A , B and C such that the MVD Z↠X holds in A , but not in B or C . If you believe that B and/or C cannot be created, provide them as an empty table.
%%sql
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
DROP TABLE IF EXISTS C;