$30
Ex. 1 — SQL (70 Points) Below is the ER model and corresponding relational model a simple student-course enrollment system. There are various courses under different departments, a department can chose to offer a course during a certain term (such as winter 2018) or it may not offer the course. Each course is also worth a certain number of credits. A course offering can have multiple sections (because there is too many students to fit into a single class). A student can enroll in the same course in different terms (if not passed previously), however they cannot enrol in different sections of the same course in a given term. These are enforced by the application. Students receive grades at the end of the semester.
A sample schema and few records have been provided as setup.sh Please add more records into this as you may need to test various scenarios. Use your individual database accounts to work on the assignment. DO NOT use your project groups database account as it is shared between all of your team members.
Student courseOffering course enroll Of
sid
sname
grade
term section ccode credits
dept
student(sid, sname) course(ccode, credits, dept) courseoffer(term, section, ccode) ccode is foreign key to course. enroll(sid, term, section, ccode, grade) (term,section,ccode) is foreign key to courseoffer. sid is foreign key to student.
1
Important !! All the sql solutions will be evaluated by an automated system, which compares the output data produced by executing your query on our dataset with the expected output result for the correct query. So it is important that you include the correct column names, in the correct order, perform any ordering on output tuples as asked etc.
Double check your SQL for typos, for example if you spelt ‘computer scence’ instead of ‘computer science’, a query might not return the correct records and you will not get any points. While the columnn and table names are not case sensitive, the data itself can be case sensitive. So do not write ‘Computer Science’, where it was required to write ‘computer science’ this can produce no results or wrong results. For more details read the attached sql formatting guide. If you have questions about this post it in the discussion forum for assignment 2. Remember you will either get 0 or all points for a given SQL question !!
For this assignment you will not create views or intermediate tables in your solution. All your answers should be comprised of only a select query. Output ONLY the attributes in the question, following the exact order mentioned in the question. Adding attributes not mentioned can result in a 0 score !
Unless specified, your output query should not produce duplicate results in your output resultset. Use the technique taught in class to eliminate duplicate records from the output. Where an output ordering is asked for, remember to order the output records. The technique for this was also shown in class.
1. (2 Pts) List the course codes and credits of all 3 and 1 credit courses in the dept ‘computer science’, ordering the output by the decreasing order of credits and ascending order of course code. 2. (2 Pts) List all the course codes and their credits for courses offered in the term ‘winter 2018’ by the dept ‘computer science’ - without using joins. Order the output by course code. 3. (3 Pts) List all the course codes and their credits for courses offered in the term ‘winter 2018’ by the dept ‘computer science’ - using joins. Order the output by course code. 4. (3 Pts) List all the course codes and their credits for courses offered in the term ‘winter 2018’ by the dept ‘computer science’ - using a correlated subquery. Order the output by course code. 5. (2 Pts) Give the course code and credits of all courses ever enrolled by the student with student id 12345678. Order the output by course code. 6. (2 Pts) Give the course code and credits of all courses enrolled by the student with student id 12345678 in the term ‘winter 2018’. Order the output by course code. 7. (2 Pts) Give list of course codes not offered in ‘winter 2018’ but has been offered in ‘winter 2017’. Order the output by course code. 8. (3 Pts) List all the course codes and credits for courses that was taken by BOTH the students with student ids 12345678 and 12345679 in the term ‘winter 2018’ (i.e., they are classmates - even if they took different sections of the same course). Order the output by course code. 9. (3 Pts) List all the course codes and credits for courses that were taken by the student with id 12345678 but not by the student with id 12345679 in the term ‘winter 2018’ - using one of the SQL set operators taught in the class. Order the output by course code. 10. (4 Pts) List all the course codes, terms and grades for courses taken by the student with student id 12345678 which was offered by the dept ‘computer science’ - using a correlated query. Order the output by course code and term. 11. (5 Pts) List the student id and names of all the other students that have taken the same course during the same term (section could be different) as that of the student with student id 12345678. Order the output by student id. 12. (2 Pts) Find the total number of students. Give the output column the name numstudents. 13. (2 Pts) Find the total number of students enrolled for a course in the term ‘winter 2018’. Give the output column the name numstudents. 14. (3 Pts) List the names of the department and the number of courses that each of them have (irrespective of they were offered or not). Name the later, numcourses. Order the output in the decreasing order of the number of courses and then by the ascending order of the department names.
2
15. (5 Pts) List the course code and number of credits of all courses offered by dept ‘computer science’ in ‘winter 2018’ term that has at the least 5 students enrolled. Order the output by course code. 16. (6 Pts) Give the names of departments such that all the students wth a course enrollment for ‘winter 2018’ term has also enrolled in at the least one of the courses offered by the dept in the ‘winter 2018’ term. Order the output by department name. 17. (5 Pts) List the course codes and the number of students enrolled (name it numstudents) across all sections for each course in ‘winter 2018’ term. If a particular course is offered in ‘winter 2018’, but has no students enrolled in it, it should show 0 for numstudents. Write this query without using any outer joins. Order the output by course code. 18. (6 Pts) Solve the above question by using an outer join, Order the output by course code. Hint:- use the derived table method discussed in class if you want along with the trick to manipulate NULL values. 19. (4 Pts) What is the average number of students enrolled in a course (across all sections) during the ‘winter 2018’ term. Ignore course offering where no students were enrolled in any of its sections. Name the average column avgenrollment. 20. (6 Pts) List the course code and the number of students enrolled in it (across all sections), for the course(s) offered in ‘winter 2018’ term, that has the highest course enrollment in that term. If your output has multiple courses, it must be ordered by the course code. Name the number of students enrolled column as numstudents.
Ex. 2 — Indexing (30 Points) Consider the enroll table from the previous question.
enroll(sid INTEGER, term VARCHAR(15), section INTEGER, ccode VARCHAR(10), grade VARCHAR(2))
Here is some additional information. •An INTEGER has 64 bits, average size of term is 10 bytes, average size of ccode is 8 bytes, and average size of grade is 1 byte. •There are 100 departments and on an average 40 courses per department. •Each year has 2 terms. •Each term, 90% of the courses across all the departments are offered. All course offerings have only 1 section each. •enroll has 4 years worth of data. •The university has 100,000 students. (for simplicity, assume they all started together 4 years ago and no new students were added later). •On an average each course offering has about 140 students enrolled. •All students enroll in some course offering(s) each term. Now assume there exists an indirect, clustered type II B+-tree index on sid of enroll and an unclustered type II B+-tree indirect index on (term, ccode) columns of enroll. A single data entry must always fit into one leaf page (it may not spread over more than one leaf page).
Further, rids of indexes take 10 bytes, internal page pointers are 6 bytes, page size is 4000 bytes. Leaf pages are filled on an average 60%, intermediate pages can have a fill factor in the range 50 - 100%. The root might have any fill factor.
1. For both indices calculate: (a) (6 Points) the avg. number of rids per data entry, the size of the data entry and the total number of data entries. (b) (4 Points) the number of leaves. (c) (5 Points) maximum and minimum possible number of intermediate nodes in the index (for the given possible fill factor range of 50-100%) and the height of the tree in each case.
3