Starting from:

$29.99

HW4. Relational Algebra & Query Plans


HW4. Relational Algebra & Query Plans
Objectives
In this homework assignment, you will review your knowledge in two different topics:

Relational Algebra
Query Processing & Optimization
This assignment has two sections, and a total of 20 points.

Preparation
To write a relational algebra query in a cell, the cell should be a Markdown cell. You can use LaTeX equations in a markdown cell for required algebraic notation. Here is a list of the main operators:

Selection (
)
Projection (
)
Union (
)
Intersect (
)
Set Difference (
)
Cross Product (
)
Rename (
)
Join (
)
Conjunction (
)
Disconjunction (
)
Greater Than or Equal To (
)
Less Than or Equal To (
)
You may also need 
 and 
 in the notations you use.

Consider the same bank database you have used in homework assignments 2 and 3.

Customer = {customerID, firstName, lastName, income, birthDate}
Account = {accNumber, type, balance, branchNumberFK-Branch}
Owns = {customerIDFK-Customer, accNumberFK-Account}
Transactions = {transNumber, accNumberFK-Account, amount}
Employee = {sin, firstName, lastName, salary, branchNumberFK-Branch}
Branch = {branchNumber, branchName, managerSINFK-Employee, budget}
Q1: Relational Algebra (5 points)
In this assignment, we want you to write down the relational algebraic presentations for the queries you have previously written to extract data from the bank database.

1.1. Find out names of the bank branches and first name and last name of their managers.


1.2. Show account number, account type, account balance, and transaction amount of the accounts with balance higher than 100,000 and transaction amounts higher than 15000.


1.3. Show first name, last name, and income of customers whose income is at least twice the income of any customer whose lastName is Butler.



1.4. Show Customer ID, income, account numbers and branch numbers of customers with income greater than 90,000 who own an account at both London and Latveria branches.


1.5. Customer ID of customers who have an account at the New York branch, who do not own an account at the London branch and who do not co-own an account with another customer who owns an account at the London branch.

(▷) is the symbol for Anti-join



Q2: Query Processing (15 points)
In this assignment, we want to go through the query processing steps for a query you have previously written.

2.1. (5 points). You have previously provided the SQL query to find out customer ID, first name, last name and income of customers who have income greater than $5000 and own accounts in all of the branches that Helen Morgan owns accounts in. Parse your query into a query parse tree.

Parse Tree image3.png

2.2. (5 points). Convert your parse tree to the equivalent relational algebraic representation.

image2.jpgRA

2.3. (3 points) Can you rewrite your query plan?

image1.pngQuery Plan

Explanations:

By Removing "duplicated elimination" symbol, our final answer will not have any effect by this change.
If we move "income > 5000", "c.firstName = 'Helen'" and "c.lastName = 'Morgan'" down to one of the subquery, we will have the same result and improve effienciy.
Grouping the associative/commutative operators together.
2.4. (2 points) Assume you have a million records in each of the six tables above. If you need, make necessary assumptions about your storage blocks, as well as about charactristics in the bank.db. Can you enumerate the size and cost of the intermediate tables in your query plan?

Suppose we have:

2000 people with firstname 'Helen'
2000 people with lastname 'Morgan'
1/100 people with income > 5000
Hence, for each table:

R(a,b) has T(R) = 1,000,000 records
V(R,a) = 2,000. 1,000,000/2,000 = 500, which means that there are 1000 distinct variables in each table
Let:

Subquery 1: S = 

Subquery 2: S = 

Note: we need to do the set difference between subqueires 1 and 2

For subquery 1:

Size of the natural join of 3 tables:
T(Customer
Owns
account) = T(Customer)T(Owns)T(Account)/max(V(Customer,Y), V(Owns,Y), Y(Account, Y)) 2. V(R,a) = 2000 [1000 different people], the number of tuples that satisfy income < 5000 is 1/100 of the total tuples. Hence, we could do a fair estimation on the size of S (without join):
T(R)(1-(1-1/V(R,a))(1 - 1/100)) = 1,000,000 * (1-(1-1/2,000)*(1-1/100))


For subquery 2:
The basic step is similar. Suppose after calcuation we get the size estimation: T(R1)


Hence, for the set difference (or, except), we could get the result is between T(R) - T(R1) tuples. One fair approach is to do an average: T(R) - T(R1)/2

Submission
Complete the code in this notebook.Put hw4.ipynb and your image files together in a zip file hw4.zip, submit it to through Canvas system to your HW4 activity. You can also include a pdf file where you can add your comments, thoughts, explanations about any of the questions.

More products