$30
Database-Management-Systems
Problem Set 1 [100 points]¶
Deliverables:
Submit your queries (and only those) using the submission_template.txt file that is posted on Canvas. Follow the instructions on the file! Upload the file at Canvas.
Instructions / Notes:
Run the top cell above to load the database hw1.db (make sure the database file, hw1.db, is in the same directory as this IPython notebook is running in)
Some of the problems involve changing this database (e.g. deleting rows)- you can always re-download hw1.db or make a copy if you want to start fresh!
You may create new IPython notebook cells to use for e.g. testing, debugging, exploring, etc.- this is encouraged in fact!- just make sure that your final answer for each question is in its own cell and clearly indicated
When you see In [*]: to the left of the cell you are executing, this means that the code / query is running.
If the cell is hanging- i.e. running for too long: To restart the SQL connection, you must restart the entire python kernel
To restart kernel using the menu bar: “Kernel >> Restart >> Clear all outputs & restart”), then re-execute the sql connection cell at top
You will also need to restart the connection if you want to load a different version of the database file
Remember:
%sql [SQL] is for single line SQL queries
%%sql [SQL] is for multi line SQL queries
Have fun!
Problem 1: Linear Algebra [30 points]
Two random 3×3 ( N=3
) matrices have been provided in tables A and B, having the following schema:
i INT: Row index
j INT: Column index
val INT: Cell value
Note: all of your answers below must work for any square matrix sizes, i.e. any value of N
.
Note how the matrices are represented – why do we choose this format? Run the following queries to see the matrices in a nice format:
%sql SELECT group_concat(val, ” , “) AS “A” FROM A GROUP BY i;
%sql SELECT group_concat(val, ” , “) AS “B” FROM B GROUP BY i;
Part (a): Matrix addition [5 points]
The sum of a matrix A
(having dimensions n×m
) and a matrix B
(having dimensions n×m
) is the matrix C
(of dimension n×m
) having cell at row i
and column j
equal to:
Cij=Aij+Bij
Write a single SQL query to get the sum of A
and B
(in the same format as A
and B
):
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
SELECT A.i as i, B.j as j, A.val+B.val as val
FROM A,B
WHERE A.i=B.i and A.j=B.j;
* sqlite:///hw1.db
Done.
i j val
0 0 16
0 1 11
0 2 18
1 0 17
1 1 13
1 2 16
2 0 3
2 1 1
2 2 12
Part (b): Dot product [5 points]
The dot product of two vectors
a=[a1a2…an]
and
b=[b1b2…bn]
is
a⋅b=∑ni=1aibi=a1b1+a2b2+⋯+anbn
Write a single SQL query to take the dot product of the second column of A
and the third column of B
.:
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
SELECT sum(A.val*B.val) AS DotProduct
FROM A,B
WHERE A.j=1 AND B.j=2 AND A.i = B.i;
* sqlite:///hw1.db
Done.
DotProduct
113
Part (c): Matrix multiplication [10 points]
The product of a matrix A
(having dimensions n×m
) and a matrix B
(having dimensions m×p
) is the matrix C
(of dimension n×p
) having cell at row i
and column j
equal to:
Cij=∑mk=1AikBkj
In other words, to multiply two matrices, get each cell of the resulting matrix C
, Cij
, by taking the dot product of the i
th row of A
and the j
th column of B
.
Write a single SQL query to get the matrix product of A
and B
(in the same format as A
and B
):
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
SELECT A.i AS i, B.j AS j, sum(A.val*B.val) AS val
FROM A,B
WHERE A.j=B.i
GROUP BY A.i, B.j;
* sqlite:///hw1.db
Done.
i j val
0 0 106
0 1 80
0 2 171
1 0 146
1 1 109
1 2 212
2 0 23
2 1 17
2 2 55
Part (d): Matrix power [10 points]
The power An
of a matrix A
is defined as the matrix product of n
copies of A
.
Write a single SQL query that computes the third power of matrix A
, in other words, A3=A⋅A⋅A
:
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
with A_squared as (
SELECT A1.i AS i, A2.j AS j, sum(A1.val*A2.val) AS val
FROM A as A1, A as A2
WHERE A1.j=A2.i
GROUP BY A1.i, A2.j
)
SELECT A_squared.i as i, A3.j as j, sum(A_squared.val * A3.val) as val
FROM A_squared, A as A3
WHERE A_squared.j=A3.i
GROUP BY A_squared.i, A3.j;
* sqlite:///hw1.db
Done.
i j val
0 0 1767
0 1 1065
0 2 2065
1 0 2396
1 1 1463
1 2 2745
2 0 350
2 1 190
2 2 467
Problem 2: The Sales Database [35 points]
We’ve prepared and loaded a dataset related to sales data from a company. The dataset has the following schema:
Holidays (WeekDate, IsHoliday)
Stores (Store, Type, Size)
TemporalData(Store, WeekDate, Temperature, FuelPrice, CPI, UnemploymentRate)
Sales (Store, Dept, WeekDate, WeeklySales)
Before you start writing queries on the database, find the schema and the constraints (keys, foreign keys).
Part (a): Sales during Holidays [10 points]
Using a single SQL query, find the store(s) with the largest overall sales during holiday weeks. Further requirements:
Use the WITH clause before the main body of the query to compute a subquery if necessary.
Return a relation with schema (Store, AllSales).
Write your query here:
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
WITH Holiday_Sales as (
SELECT Stores.store as Store, sum(Sales.WeeklySales) as HolidaySales
FROM Stores, Sales, Holidays
WHERE Sales.Store = Stores.Store and Sales.WeekDate = Holidays.WeekDate and Holidays.IsHoliday = ‘TRUE’
GROUP BY Stores.Store
)
SELECT Store, HolidaySales as AllSales
FROM Holiday_Sales
ORDER BY HolidaySales DESC
LIMIT 1;
* sqlite:///hw1.db
Done.
Store AllSales
20 22490350.81000001
Part (b): When Holidays do not help Sales [15 points]
Using a single SQL query, compute the number of non-holiday weeks that had larger sales than the overall average sales during holiday weeks. Further requirements:
Use the WITH clause before the main body of the query to compute a subquery if necessary.
Return a relation with schema (NumNonHolidays).
Write your query here:
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
WITH Non_Holiday_Sales as (
SELECT Sales.WeekDate as date, Sum(Sales.WeeklySales) as NonHolidaySales
FROM Sales, Holidays
WHERE Sales.WeekDate = Holidays.WeekDate and Holidays.IsHoliday = ‘FALSE’
GROUP BY Sales.WeekDate
), Holiday_Sales as (
SELECT Sales.WeekDate as date, Sum(Sales.WeeklySales) as HolidaySales
FROM Sales, Holidays
WHERE Sales.WeekDate = Holidays.WeekDate and Holidays.IsHoliday = ‘TRUE’
GROUP BY Sales.WeekDate
), Holiday_Avg as (
SELECT AVG(Holiday_Sales.HolidaySales) as Average_Sales
FROM Holiday_Sales
)
SELECT COUNT(*) as NumNonHolidays
FROM Non_Holiday_Sales, Holiday_Avg
WHERE Non_Holiday_Sales.NonHolidaySales > Holiday_Avg.Average_Sales;
* sqlite:///hw1.db
Done.
NumNonHolidays
8
Part (c): Total Summer Sales [10 points]
Using a single SQL query, compute the total sales during summer (months 6,7,and 8) for each type of store. Further requirements:
Return a relation with schema (type, TotalSales).
Hint: SQLite3 does not support native operations on the DATE datatype. To create a workaround, you can use the LIKE predicate and the string concatenation operator (||). You can also use the substring operator that SQLite3 supports (substr).
Write your query here:
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
SELECT Stores.Type as type, SUM(Sales.WeeklySales) as TotalSales
FROM Stores, Sales
WHERE (Sales.WeekDate LIKE ‘%-06-%’ or Sales.WeekDate LIKE ‘%-07-%’ or Sales.WeekDate LIKE ‘%-08-%’) and Stores.Store = Sales.Store
GROUP BY Stores.Type;
* sqlite:///hw1.db
Done.
type TotalSales
A 1211554899.8500097
B 561610722.3999932
C 112555450.65999933
Problem 3: The Traveling SQL Server Salesman Problem [35 points]
SQL Server salespeople are lucky as far as traveling salespeople go- they only have to sell one or two big enterprise contracts, at one or two offices in Wisconsin, in order to make their monthly quota!
Answer the following questions using the table of streets connecting company office buildings.
Note that for convenience all streets are included twice, as A→B
and B→A
. This should make some parts of the problem easier, but remember to take it into account!
%sql SELECT * FROM streets LIMIT 4;
* sqlite:///hw1.db
Done.
id direction A B d
0 F UW-Madison DooHickey Collective 7
0 R DooHickey Collective UW-Madison 7
1 F DooHickey Collective Gizmo Corp 2
1 R Gizmo Corp DooHickey Collective 2
Part (a): One-hop, two-hop, three-hop… [15 points]
Our salesperson has stopped at UW-Madison, to steal some cool new RDBMS technology from CS564-ers, and now wants to go sell it to a company _within 9 miles of UW-Madison and passing through no more than 3 distinct streets. Write a single query, not using WITH (see later on), to find all such companies.
Your query should return the schema (company, distance) where distance is cumulative from UW-Madison.
Write your query here:
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
SELECT Streets.A as company, Streets.d as distance
FROM Streets
WHERE Streets.B = ‘UW-Madison’ and Streets.d <= 9
UNION
SELECT Streets_1.A as company, (Streets_1.d + Streets_2.d) as distance
FROM Streets as Streets_1, Streets as Streets_2
WHERE Streets_2.B = ‘UW-Madison’
and (Streets_1.B = Streets_2.A)
and NOT(Streets_1.A = Streets_2.B)
and (Streets_1.d + Streets_2.d <= 9)
UNION
SELECT Streets_1.A as company, (Streets_1.d + Streets_2.d + Streets_3.d) as distance
FROM Streets as Streets_1, Streets as Streets_2, Streets as Streets_3
WHERE Streets_3.B = ‘UW-Madison’
and (Streets_1.B = Streets_2.A)
and (Streets_2.B = Streets_3.A)
and NOT(Streets_1.A = Streets_3.B)
and NOT(Streets_1.A = Streets_2.B)
and NOT(Streets_3.B = Streets_2.A)
and (Streets_1.d + Streets_2.d + Streets_3.d <= 9);
* sqlite:///hw1.db
Done.
company distance
DooHickey Collective 7
DooHickey Corp 9
Gadget Collective 9
Gadget Corp 6
Gizmo Corp 9
Part (b): A stop at the Farm [10 points]
Now, our salesperson is out in the field, and wants to see all routes- and their distances- which will take him/her from a company A
to a company B
, with the following constraints:
The route must pass through UW-Madison (in order to pick up new RDBMS tech to sell!)
A
and B
must each individually be within 2 hops of UW-Madison
A
and B
must be different companies
The total distance must be <=15
Do not use WITH
If you return a path A→B
, do not include B→A
in your answer!
In order to make your answer a bit cleaner, you may split into two queries, one of which creates a VIEW. A view is a virtual table based on the output set of a SQL query. A view can be used just like a normal table- the only difference under the hood is that the DBMS re-evaluates the query used to generate it each time a view is queried by a user (thus the data is always up-to date!)
Here’s a simple example of a view:
%%sql
DROP VIEW IF EXISTS short_streets;
CREATE VIEW short_streets AS
SELECT A, B, d FROM streets WHERE d < 3;
SELECT * FROM short_streets LIMIT 3;
* sqlite:///hw1.db
Done.
Done.
Done.
A B d
DooHickey Collective Gizmo Corp 2
Gizmo Corp DooHickey Collective 2
Gizmo Corp Widget Industries 1
Write your query or queries here:
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
SELECT DISTINCT MAX(S1.A, S2.B) as company_1, MIN(S1.A, S2.B) as company_2, (S1.d + S2.d) as distance
FROM Streets as S1, Streets as S2
Where NOT(S1.A = S2.B)
and (S1.B = ‘UW-Madison’)
and (S2.A = ‘UW-Madison’)
and (S1.d + S2.d <= 15)
UNION
SELECT DISTINCT MAX(S1.A, S2.B) as company_1, MIN(S1.A, S2.B) as company_2, (S1.d + S1_Hop.d + S2_Hop.d + S2.d) as distance
FROM Streets as S1, Streets as S2, Streets as S1_Hop, Streets as S2_Hop
Where NOT(S1.A = S2.B)
and (S1.B = S1_Hop.A)
and (S1_Hop.B = ‘UW-Madison’)
and (S2_Hop.A = ‘UW-Madison’)
and (S2_Hop.B = S2.A)
and (S1.d + S1_Hop.d + S2_Hop.d + S2.d <= 15)
Union
SELECT DISTINCT MAX(S1.A, S2.B) as company_1, MIN(S1.A, S2.B) as company_2, (S1.d + S1_Hop.d + S2.d) as distance
FROM Streets as S1, Streets as S2, Streets as S1_Hop
Where NOT(S1.A = S2.B)
and (S1.B = S1_Hop.A)
and (S1_Hop.B = ‘UW-Madison’)
and (S2.A = ‘UW-Madison’)
and (S1.d + S1_Hop.d + S2.d <= 15)
UNION
SELECT DISTINCT MAX(S1.A, S2.B) as company_1, MIN(S1.A, S2.B) as company_2, (S1.d + S2_Hop.d + S2.d) as distance
FROM Streets as S1, Streets as S2, Streets as S2_Hop
Where NOT(S1.A = S2.B)
and (S1.B = ‘UW-Madison’)
and (S2_Hop.A = ‘UW-Madison’)
and (S2_Hop.B = S2.A)
and (S1.d + S2_Hop.d + S2.d <= 15);
* sqlite:///hw1.db
Done.
company_1 company_2 distance
Gadget Corp DooHickey Collective 13
Gadget Corp DooHickey Corp 15
Gadget Corp Gadget Collective 15
Gizmo Corp Gadget Corp 15
Part (c): Finding Triangles [10 points]
Finally, our salesperson wants to find a route that goes from company A
to company B
to company C
and then back to company A
with the following constraints:
A
, B
, C
must be different companies
Do not use WITH
Output each such route that you find once (use the id’s as a way to break ties)
Output the distance of the route
Write your query here:
“””
Expected output below- don’t re-evaluate this cell!
NOTE: A valid answer must work for ALL inputs of the given type,
not just this example. I.e. do not hardcode around this answer / etc!
“””
%%sql
SELECT S1.A as A, S2.A as B, S3.A as C, (S1.d + S2.d + S3.d) as distance
FROM Streets as S1, Streets as S2, Streets as S3
WHERE (S1.B = S2.A)
and (S2.B = S3.A)
and (S3.B = S1.A)
and (S1.id < S2.id)
and (s2.id < S3.id);
* sqlite:///hw1.db
Done.
A B C distance
Thing Industries Widget Collective GadgetCo 18