$30
csc343
assignment #1: relational algebra
goals
This assignment aims to help you learn to:
? read a relational scheme and analyze instances of the schema
? read and apply integrity constraints
? express queries and integrity constraints of your own
? think about the limits of what can be expressed in relational algebra
Your assignment must be typed to produce a PDF document a1.pdf (hand-written submissions are not
acceptable). You may work on the assignment in groups of 1 or 2, and submit a single assignment for
the entire group on MarkUs. You must establish your group well before the due date by submitting an
incomplete, or even empty, submission.
background
You will be working on a schema and queries for a database used by a zoological institute to track an archive
of their artifacts.
During a ?eld trip collectors gather a variety of artifacts of the animals they study, resulting in tissue
samples, images, physical models (such as casts of paw prints), or live colonies.
After arriving at the institute, artifacts must be safely stored and maintained by technicians. Some
artifacts are cited in one or more publications. In all cases the o?cial species name must be recorded, and
must appear in the Catalogue of Life database. If correct taxonomic practices are followed, each species
belongs to exactly one genus, and each genus to exactly one family. Tables COL, Genus, and Species are
derived from Catalogue of Life database.
relations
? Collection(CID, date, SID)
Tuples here represent entire collections from a ?eld trip, where CID is the collection ID, date is the
starting date of the ?eld trip, and SID is the sta? ID of the collector.
1
? Collected(CID, AN)
A tuple here represents the fact that collection CID includes artifact number AN. A single collection usually contains multiple artifacts, and a single artifact may be aggregated from more than one
collection.
? Artifact(AN, species, type, location, SID)
Tuples here represent single artifact collected in the ?eld. AN is the artifact number, species is the
scienti?c species name, type is one of tissue, image, model, or live, location is where it was collected,
and SID is the sta? number of the technician who maintains this artifact.
? Published(AN, journal, date)
A tuple here represents the fact that artifact AN was mentioned in scholarly publication journal with
publication date date.
? Sta?(SID, name, email, rank, date)
These tuples represent a member of the institute's scienti?c sta?. SID is the sta? ID, name is their
full name, email is their professional email, rank is one of: technician, student, pre-tenure, or tenured,
and date is the date when they attained that rank.
? COL(family)
A singleton tuple here means that family is a scienti?c zoological family name that appears in the
Catalogue of Life.
? Genus(genus, family)
A tuple here means that genus is in family family.
? Species(species, genus)
A tuple here means that species is in genus genus.
our constraints
For each of the following constraints give a one sentence explanation of what the constraint implies, and
why it is required.
? ?species(Artifact) ?species(Species) = ;.
? ?rank(Staff) ? f'technician', 'student', 'pre-tenure', 'tenure'g.
? ?family(Genus) ?family(COL) = ;.
? ?genus(Species) ? ?genus(Genus).
? ?CID(Collected) = ?CID(Collection).
? ?AN (Artifact) = ?AN (Collected).
? ?SID(Collection) ? ?SID(Staff).
? ?SID(Artifact) ? ?SID(Staff).
? ?type(Artifact) ? f0
tissue0
;
0
image0
;
0 model0
;
0
live0g
? ?AN (P ublished) ? ?AN (Artifact)
queries
Write relational algebra expressions for each of the queries below. You must use notations from this course
and operators:
?; ?; ?; ./; ./condition; ;\;[; ; =
You may also use constants:
today (for current date) ; (for the empty set)
In your queries pay attention to the following:
? All relations are sets, and you may only use relational algebra operators covered in Chapter 2 of the
course text.
? Do not make assumptions that are not enforced by our constraints above, so your queries should work
correctly for any database that obeys our schema and constraints.
? Other than constants such as 23 or "lupus", a select operation only examines values contained in a
tuple, not aggregated over an entire column.
? Your selection conditions can use arithmetic operators, such as +; ?; 6=; ?; ; < and friends. You can
use logical operators such as _;^, and :, and treat dates and numeric attributes as numbers that you
can perform arithmetic on.
? Use good variable names and provide lots of comments to explain your intentions.
? Return multiple tuples if that is appropriate for your query.
There may be a query or queries that cannot be expressed in the relational algebra you have been taught
so far, in which case just write \cannot be expressed." The queries below are not in any particular order.
1. Rationale: Performance reviews include seeing how current the work is of sta? who have held their
current rank for a long time.
Query: Find the most recent collection date of any artifact collected by a sta? member who has held
their current rank the longest. Keep ties.
2. Rationale: Sta? who maintain every artifact in some collection should be considered favourably in
performance reviews.
Query: Find all sta? who maintain all artifacts in at least one collection.
3. Rationale: An artifact collected and maintained by the same sta? may have some special requirements
that should be investigated.
Query: Find all artifacts that were collected by the same sta? who maintains them.
4. Rationale: Identify multi-talented ?eld workers.
Query: Find all sta? who have collected at least 3 artifacts from every species in some family.
5. Rationale: Which publications might have some specialized niche focus?
Query: Find all publications that have used exactly 2 of our artifacts.
6. Rationale: Identify motherlode locations.
Query: Find all locations where at least one artifact from every family has been collected.
7. Rationale: Exclusively tissue sample collectors may need extra support for special reagents and shipping costs.
Query: Find all sta? who have collected only tissue samples.
8. Rationale: Collection sta? who should be encouraged to diversify their network.
Query: Find all sta? pairs who have worked only with each other on collections.
9. Rationale: Track the in uence of a given sta? member.
Query: Sta? member SID1 is in uenced by sta? member SID2 if (a) they have ever worked together on
a collection or (b) if SID1 has ever worked with a sta? member who is in uenced by SID2. Find SIDs
of sta? members in uenced by SID 42.
your constraints
For each of these constraints you should derive a relational algebra expression of the form R = ;, where R
may be derived in several steps, by assigning intermediate results to a variable. If the constraint cannot be
expressed in the relational algebra you have been taught, write \cannot be expressed."
1. No species is also a genus.
2. No genus belongs to more than one family.
3. All publications must be published after all artifacts they use have been collected.
4. Students may not catalogue live artifacts.
submissions
Submit a1.pdf on MarkUs. One submission per group, whether a group is one or two people. You declare a
group by submitting an empty, or partial, ?le, and this should be done well before the due date. You may
always replace such a ?le with a better version, until the due date.
Double check that you have submitted the correct version of your ?le by downloading it from MarkUs.
marking
We mark your submission for correctness, but also for good form:
? For full marks you should add comments to describe the data, rather than technique, of your queries.
These may help you get part marks if there is a aw in your query.
? Please use the assignment operator, \:=" for intermediate results.
? Name relations and attributes in a manner that helps the reader remember their intended meaning.
? Format the algebraic expressions with line breaks and formatting that help make the meaning clear.
4