$30
CS 314 Principles of Programming Languages
Project 2: LL(1) Parser in Scheme
In this project, you will implement a Scheme functions to generate the correct PREDICT sets for random,
correctly-written, LL(1) grammars. Your functions will return the correct FIRST, FOLLOW, and PREDICT
sets for the grammar given.
1 Structure for the Grammars and FIRST, FOLLOW & PREDICT
Sets
This project will run functions based on random, correctly-generated LL(1) grammars. An LL(1) grammar
will be represented by a list of lists, where each sub-list begins with a non-terminal symbol, then subsequently
containing the right hand side of the rules corresponding to that non-terminal, which are lists comprising of
terminal and/or nonterminal symbols. So, a grammar will be of the following form:
’(
( NT1 rhsrule11 rhsrule12 ...)
( NT2 rhsrule21 rhsrule22 ...)
...
( NTk rhsrulek1 rhsrulek2 ...)
)
We will use ”eps” for the symbol ?. If the right hand side of the rule is the list ’(”eps”), then that rule
corresponds to the symbol ?. RULES CAN HAVE OTHER NON-TERMINALS IN THEM. Remember, every non-terminal will have at least one production rule. The start nonterminal is the left hand side
nonterminal of the first rule. In the case above, the start nonterminal is NT1.
The list of FIRST sets and FOLLOW sets have the following form and structure: a list of pairs consisting
of the nonterminal and its corresponding FIRST or FOLLOW set:
’(
( NT1 FIRST ( NT1 ))
( NT2 FIRST ( NT2 ))
...
( NTk FIRST ( NTk ))
)
and
’(
( NT1 FOLLOW ( NT1 ))
( NT2 FOLLOW ( NT2 ))
...
( NTk FOLLOW ( NTk ))
)
The list of FIRST sets and FOLLOW sets are both associative lists. The list of PREDICT sets have the
following form and structure: a list of pairs consisting of the rule (which is a list) and the PREDICT set for
that rule:
’(
( rule11 PREDICT ( rule11 ))
( rule12 PREDICT ( rule12 ))
...
( rulek1 PREDICT ( rulek1 ))
...
)
An example of a random LL(1) grammar is the following:
’(
(A (x B) ("eps"))
(B (y A) ("eps"))
)
1
which has the following BNF form:
A ::= xB | ?
B ::= yA | ?
where A and B are nonterminals. A is the start nonterminal. x and y are terminals, since they don’t have any
production rules. A correct output of the PREDICT sets for the grammar is as follows:
’(
((A ::= (x B)) (x))
((A ::= ("eps")) (eof)))
((B ::= (y A)) (y))
((B ::= ("eps")) (eof)))
)
The structure of the output is as follows: the number of elements in the list will be the number of nonterminals there are in the grammar. The first element of each element of the list is the non-terminal. Then,
each subsequent element lists the rule, and then its PREDICT set. In this example, based on the grammar
given above, if one is currently reducing A and encounters the symbol x, then x should be expanded with the
rule A ::= xB, whereas if one is currently reducing B and encounters the symbol y, then y should be expanded
with the rule B ::= yA. Note that in both cases, if one encounters EOF, (i.e. if one sees nothing else), then
expansion of the rules is finished, and parsing is finished. This is denoted by ’(eof).
1.1 FIRST Sets
Recall the method for constructing a FIRST set for a symbol (or collection of symbols) X:
1. If X is a terminal, then FIRST(X) = {X}.
2. If X ::= ?, then ? ∈ FIRST(X).
3. For each X as a non-terminal, initialize FIRST(X) = {}.
4. Iterate until no more terminals or ? can be added to any FIRST(X), where X = Y1Y2 . . . Yk:
(a) add a to FIRST(X) if a ∈ FIRST(Y1).
(b) add a to FIRST(X) if a ∈ FIRST(Yi) and ? ∈ FIRST(Yj ), for all 1 ≤ j < i and j ≥ 2.
(c) add ? to FIRST(X) if ? ∈ FIRST(Yi), for all i where 1 ≤ i ≤ k.
The algorithm is as follows, where NT is the set of all nonterminals in the grammar, and P is the set of
production rules:
for all A ∈ NT :
FIRST ( A ) = {}
while ( FIRST sets are still changing ) do
for each p ∈ P , of the form X → Y1 Y2 ... Yk do
temp ← FIRST ( Y1 ) - {?}
i ← 1
while ( i ≤ k - 1 and ? ∈ FIRST ( Yi))
temp ← temp ∪ ( FIRST ( Yi+1 ) - {?})
i ← i + 1
end // while loop
if i == k and ? ∈ FIRST ( Yk )
then temp ← temp ∪ {?}
end // if - then
FIRST ( X ) ← FIRST ( X ) ∪ temp
end // for loop
end // while loop
Also, refer to pg. 28 and 29 of Lecture 7 for more details.
2
1.2 FOLLOW Sets
Recall the algorithm for constructing a FOLLOW set for a non-terminal X:
1. Place EOF in FOLLOW(start nonterminal).
2. For all other nonterminals X, initialize FOLLOW(X) = {}.
3. Iterate until no more terminals can be added to any FOLLOW(X).
(a) if a rule is of the form A ::= αBβ:
i. if ? ∈ FIRST(β):
A. FOLLOW(B) = FOLLOW(B) ∪ (FOLLOW(A) ∪ (FIRST(β) - {?}))
ii. otherwise:
A. FOLLOW(B) = FIRST(β) ∪ FOLLOW(B).
(b) if a rule is of the form A ::= αB, then FOLLOW(B) = FOLLOW(B) ∪ FOLLOW(A).
The algorithm is as follows, where NT is the set of all nonterminals in the grammar, and P is the set of
production rules:
for all A ∈ NT :
FOLLOW ( A ) = {}
FOLLOW ( start nonterminal ) = { EOF }
while ( FOLLOW sets are still changing ) do
for each p ∈ P , of the form A → B1 B2 ... Bk do
TRAILER ← FOLLOW ( A )
for i ← k down to 1
if Bi ∈ NT then
FOLLOW ( Bi) ← FOLLOW ( Bi) ∪ TRAILER
if ? ∈ FIRST ( Bi)
TRAILER ← TRAILER ∪ ( FIRST ( Bi) - {?})
else TRAILER ← FIRST ( Bi)
else TRAILER ← { Bi}
Also, refer to pg. 49 of Lecture 7.
1.3 PREDICT sets
Recall the algorithm for constructing a PREDICT for a rule A ::= δ:
1. If ? ∈ FIRST(δ):
(a) PREDICT(A ::= δ) = (FIRST(δ) - {?}) ∪ FOLLOW(A)
2. otherwise:
(a) PREDICT(A ::= δ) = FIRST(δ)
For more information regarding FIRST, FOLLOW & PREDICT sets, please refer to the slides for recitation
week 3, the last two slides for recitation week 5, and lectures 7 and 8.
2 Project Description
For this project, you will have to complete the Scheme functions provided in order to generate the FIRST,
FOLLOW, and PREDICT sets properly. You will need to complete the following tasks:
1. Complete the utility functions getStartSymbol, getNonterminals, getProductions, updateAssocList,
and getInitSets.
(a) Function signatures and descriptions:
i.
(getStartSymbol grammar) returns the start nonterminal of the grammar. As defined in the
project description, the start nonterminal is the left-hand side nonterminal of the first rule of
the grammar. (This function is useful for generating initial FOLLOW sets.)
Input: grammar
Output: the start non-terminal of the grammar.
3
ii.
(getNonterminals grammar) returns a list of all nonterminals in the grammar. You may
assume that every nonterminal in the grammar will have a rule!
Input: grammar
Output: a list of all nonterminals in the grammar: ’(NT1 NT2 ... NTk)
iii.
(getProductions grammar) Returns a list of all production rules in the grammar. If a rule
happens to look like ”lhs ::= rhs1 | rhs2”, getProductions will separate those two rules into
different elements of the list.
Input: grammar
Output: a list of every rule in the grammar
Output format example:
Given an example grammar:
NT1 ::= rhs1 | rhs2
NT2 ::= rhs3
Which is of the form
’(( NT1 ( rhs1 ) ( rhs2 )) ( NT2 ( rhs3 )))
getProductions should return
’(( NT1 rhs1 ) ( NT1 rhs2 ) ( NT2 rhs3 )).
iv.
(updateAssocList assocList symbol set) takes as input an associative list, a symbol, and
the set to be union-ed with the corresponding list of the symbol.
Input: grammar
• assocList: associative list
• symbol: a symbol
• set: the set to be union-ed with the corresponding list of the symbol in assocList
Output: the updated associative list with the corresponding list of the symbol union-ed with
set.
An associative list is defined as follows: given symbols a, b, ..., k and their corresponding lists
lista, listb, ... listk, an associative list is a list of pairs of symbols and their corresponding lists:
’(
(a lista)
(b listb)
...
(k listk )
)
Each pair has the form ’(symbol list), where, for example, list can be the symbol’s FIRST
set or FOLLOW set. So, the command (updateAssocList assocList b set) with assocList
being the associative list given above, will output the following associative list:
’(
(a lista)
(b listb ∪ set )
...
(k listk )
)
v.
(getInitSets NTs startSymbol setting): Depending on the setting (either ’first or ’follow),
getInitSets returns the initialized FIRST sets or FOLLOW sets for all non-terminals in the
grammar. Inputs:
• NTS: the list of all nonterminals in the grammar
• startSymbol: the start nonterminal of the grammar
• setting: either ’first or ’follow
Output: Depending on what the setting is, initSets returns an initialized list of empty FIRST
sets or FOLLOW sets for each nonterminal of the grammar.
Output format example:
Given an example grammar:
4
NT1 ::= rhs1 | rhs2
NT2 ::= rhs3
Which is of the form
’(( NT1 ( rhs1 ) ( rhs2 )) ( NT2 ( rhs3 )))
(getInitSets ’(NT1 NT2) NT1 ’first) returns
’(( NT1 ()) ( NT2 ()))
(getInitSets ’(NT1 NT2) NT1 ’follow) returns
’(( NT1 ( eof )) ( NT2 ()))
2. Complete the functions genFirstFunc, recurseFirstSets, genFollowFunc, recurseFollowSets, getFollowSets,
and getPredictSets.
(a) Function signatures and descriptions:
i.
(genFirstFunc NTs) returns a function that computes FIRST(symbolSeq) given a sequence
of symbols (terminals and nonterminals) symbolSeq and an initial list of FIRST sets for all
nonterminals in the grammar.
Input: NTs: the list of all nonterminals in the grammar
Output: a function called first that takes as input a sequence of symbols symbolSeq, and an
initial list of FIRST sets firstSets, and outputs FIRST(symbolSeq). first has the following inputs
and outputs:
• Inputs:
– symbolSeq: sequence of symbols
– firstSets: a list of (potentially unfinished) FIRST sets for all nonterminals in the grammar
• Output: FIRST(symbolSeq), i.e. the application (first symbolSeq firstSets) given
inputs symbolSeq and firstSets.
ii.
(recurseFirstSets rules firstSets firstFunc) goes through each rule in the grammar and
updates the FIRST sets based on the current rule.
Inputs:
• rules: the list of all production rules in the grammar
• firstSets: a list of (potentially unfinished) FIRST sets
• firstFunc: a function that takes as input a sequence of symbols and list of FIRST sets, and
returns FIRST(sequence of symbols). (genFirstFunc generates this argument.)
Output: an updated firstSets after making one pass through all the rules in the grammar
iii.
(genFollowFunc NTs) returns a function that takes as input the sequence of symbols (terminals
and nonterminals) and an initial list of FOLLOW sets for all nonterminals in the grammar, along
with a variable called trailer and the completed FIRST sets, and returns the updated FOLLOW
sets.
Input: NTs: the list of all nonterminals in the grammar
Output: a function follow that takes as input a sequence of symbols symbolSeq, and an initial
list of FOLLOW sets, the variable trailer, and the list of COMPLETED FIRST sets firstSets,
and outputs the updated FOLLOW sets. follow has the following inputs and output:
• Inputs:
– symbolSeq: sequence of symbols
– followSets: a list of (potentially unfinished) FOLLOW sets for all nonterminals in the
grammar
– trailer: trailer variable (Check the algorithm on FOLLOW sets)
– firstSets: list of COMPLETED FIRST sets for all nonterminals in the grammar
• Output: The updated FOLLOW sets, i.e. the application (follow symbolSeq followSets
trailer firstSets) given inputs symbolSeq, followSets, trailer, and firstSets.
iv.
(recurseFollowSets rules followSets followFunc) goes through each rule in the grammar
and updates the FOLLOW sets based on the current rule.
Inputs:
5
• rules: a list of production rules in the grammar
• followSets: a list of (potentially unfinished) FOLLOW sets
• followFunc: a function that takes as input a sequence of symbols and list of FOLLOW sets,
and returns an updated FOLLOW sets. (genFollowFunc generates this argument.)
Output: an updated followSets after making one pass through all the rules in the grammar
v.
(getFollowSets rules followSets followFunc) returns the FOLLOW sets of all nonterminals in the grammar if the FOLLOW sets had no change.
Inputs:
• rules: a list of production rules in the grammar
• followSets: a list of (potentially unfinished) FOLLOW sets
• followFunc: a function that takes as input a sequence of symbols and list of FOLLOW sets,
and returns an updated FOLLOW sets.
Output: the updated followSets, which is a list of the FOLLOW sets of every non-terminal in
the grammar
vi.
(getPredictSets rules NTs firstSets followSets firstFunc) returns the PREDICT sets
for each rule in the grammar
Inputs:
• rules: the list of production rules of the grammar
• NTs: the list of all nonterminals in the grammar
• firstSets: the FIRST sets of all nonterminals in the grammar
• followSets: the FOLLOW sets of all nonterminals in the grammar
• firstFunc: a function that takes as input a sequence of symbols and list of FIRST sets, and
returns FIRST(sequence of symbols).
Output: a list of pairs, one for each production rule in the grammar, where the first element
is the production rule output as a list, and the second element is the PREDICT set for that
production rule
Output format example: Given an example grammar:
A ::= xB | " eps "
B ::= yA | " eps "
Which is of the form
’(( A ( x B ) ()) ( B ( y A ) ()))
getPredictSets should return
’(
(( A ::= ( x B )) ( x ))
(( A ::= (" eps ")) ( eof ))
(( B ::= ( y A )) ( y ))
(( B ::= (" eps ")) ( eof )))
You MUST use let* for this problem: How do you define A? How do you define delta? How do
you define FIRST(delta)?
2.1 Function closure
genFirstFunc and genFollowFunc requires you to return a function, rather than a list or numerical value.
If you bind certain variables inside the closure (either using let, let*, letrec, or lambda), you can pre-load those
bound variables when defining the function you wish to return. For example, given the function plusXY:
( define plusXY
( lambda ( x y ) (+ x y )))
a way to return a function plusX that is similar to plusXY with y pre-loaded can be as follows:
( define plusX
( lambda ( y )
( let (( plusXFunc ( lambda ( x ) (+ x y ))))
plusXFunc )))
Then, (plusX 2) will return the function (lambda (x) (+ x 2)). For example, ((plusX 2) 10) = 12.
6
2.2 Given Functions
For this project, most of the functions are self-contained, so there should be no need to construct new helper
functions. If you decide to use helper functions that you construct, please detail clearly what the function does
using comments. Remember to use the semi-colon for comments. Here is a list of the given utility functions
(in utilFuncs.ss) and useful built-in Scheme functions:
1. (contains? lst x) checks whether or not an element x is in the list lst. (This is a top-level check:
meaning (contains ’(1 2 3) 2) returns #t but (contains ’(1 (2 3)) 2) returns #f.)
2. (union a b) returns a new list that is a ∪ b, where a and b are sets. Since a set is defined to be a list
where no element appears twice, the new list should also contain only elements of a and b such that there
are no duplicates.
3. (removeMatch lst x) returns a new list with every instance of x removed from the list lst. If x is not in
lst at all, removeMatch just returns the original list lst with no changes.
4. (epsilon) defines ? as ”eps”. You can call this function without arguments like this: (epsilon). It will
return the symbol ”eps”.
In addition to these functions, you are also given getFirstSets.
• (getFirstSets rules firstSets firstFunc) returns the FIRST sets of all nonterminals in the grammar if the FIRST sets had no change.
Inputs:
– rules: the list of all production rules in the grammar
– firstSets: a list of (potentially unfinished) FIRST sets
– firstFunc: a function that takes as input a sequence of symbols symbolSeq and list of FIRST sets,
and returns FIRST(symbolSeq). (genFirstFunc generates this argument.)
Output: the updated firstSets, which is a list of the FIRST sets of every non-terminal in the grammar
3 Implementation & Useful Built-in Scheme Functions
For the implementation of the required functions, you can use standard built-in Scheme functions such as
append, list, let, let*, letrec, map, assoc, and reverse. Here are some useful built-in Scheme functions:
1. (map function list) is a useful built-in Scheme function that returns an updated list, where each
element of the list is applied with function. For example: (map even? ’(1 2 3 4 5)) = ’(#f #t #f
#t #f).
2. (assoc assocList symbol) is a useful built-in Scheme function. An associative list is a list of pairs
consisting of a symbol and a list corresponding to that symbol. An associative list has the following form.
’(
( a Lista)
( b Listb)
...
( k Listk )
)
(assoc symbol assocList) takes as input an associative list assocList, and a symbol, and returns the symbol
and corresponding list of the symbol as a list. In this case, given the above associative list as assocList,
(assoc ’a assocList) returns ’(a Lista).
3. (reverse lst) is a useful built-in Scheme function that returns the reverse of a list lst. For example:
(reverse ’(1 (2 3) 4 (5 6 7))) = ’((5 6 7) 4 (2 3) 1).
Some utility functions will be provided for you in utilFuncs.ss. You will use each of let, let*, letrec at
least once. You must not use assignment (set!) in Scheme.
7
4 How To Get Started
Copy the files proj2.ss and utilFuncs.ss into your own subdirectory. Both files must be in the same
subdirectory. Open proj2.ss using either DrRacket or racket from the terminal (check Recitation week 7 for
more information on how to do this). Remember to set the specific language to be R5RS. The file
proj2.ss is split into five sections: The first section contains the utility functions, the next three sections
contains the FIRST, FOLLOW, PREDICT set generators, and the last section contains the example grammars
and the tester functions.
Fill in the functions where it says ”YOUR CODE GOES HERE”. Once you have filled in the functions
with your own Scheme code, run proj2.ss.
The tester functions pre-define the variables and outputs the PREDICT sets for each grammar. In order
to swap out grammars, replace ”grammar0” with the appropriate numbered grammar in the line (define
grammar grammar<num). In the last section of the project description, we provide the FIRST, FOLLOW, and
PREDICT sets for the four example grammars given to you.
In addition to the tester functions and the example LL(1) grammars, we have also provided commands to
generate FIRST and FOLLOW sets separately. These commands are included in section 5 of proj2.ss. REMEMBER TO COMMENT OUT THE TEST COMMANDS BEFORE SUBMITTING YOUR
FILE. In another words, your proj2.ss should print nothing.
Read the proj2.ss already given to you, and understand the structure of the code. Use similar techniques
to assist you in constructing the required functions.
5 Grading
You will submit your modified version of the file proj2.ss via Sakai. Your programs will be graded primarily
on functionality, i.e. whether you have printed the correct PREDICT sets or not for each rule. Note that for
grading purposes, you have to print each set with the correct elements, i.e. the order of elements in each set
does not matter. This will be verified through automatic testing on a list of 10 correct LL(1) grammars, 4 of
which have been given to you as example grammars. You do not have to re-submit utilFuncs.ss.
6 Questions
Remember, if you have any questions regarding this project, the Sakai forum has a section provided for
questions regarding Project 2. START EARLY, since you will run into issues that will need to be resolved
on the way.
7 FIRST, FOLLOW, & PREDICT sets for the Example Grammars
There are four grammars given to you in section 5 of proj2.ss. This section lists their FIRST, FOLLOW,
and PREDICT sets, in the form that would be returned by proj2.ss. Note that the order of elements in the
FIRST, FOLLOW, and PREDICT sets do not matter. For example, for grammar1, FIRST(a) could’ve been
’(x ”eps”) instead of ’(”eps” x). Also, note that spaces between the terminals and nonterminals do not matter.
Recall the structure and form of the list of FIRST, FOLLOW, and PREDICT sets. (Refer to Section 1 of this
project description for the form and structure of the FIRST, FOLLOW, and PREDICT sets.)
• grammar0
start ::= a
FIRST sets
’(( start ( a )))
FOLLOW sets
’(( start ( eof )))
PREDICT sets
’(
(( start ::= ( a )) ( a ))
)
8
• grammar1
A ::= xB | ?
B ::= yA | ?
FIRST sets
’(( a (" eps " x ))
( b (" eps " y )))
FOLLOW sets
’(( a ( eof ))
( b ( eof )))
PREDICT sets
’(
(( a ::= ( x b )) ( x ))
(( a ::= (" eps ")) ( eof ))
(( b ::= ( y a )) ( y ))
(( b ::= (" eps ")) ( eof ))
)
• grammar2
start ::= expr
expr ::= + term term
term ::= a | b | c
FIRST sets
’(
( start ("+"))
( expr ("+"))
( term ( c b a ))
)
FOLLOW sets
’(( start ( eof ))
( expr ( eof ))
( term ( a b c eof ))
)
PREDICT sets
’(
(( start ::= ( expr )) ("+"))
(( expr ::= ("+" term term )) ("+"))
(( term ::= ( a )) ( a ))
(( term ::= ( b )) ( b ))
(( term ::= ( c )) ( c ))
)
• grammar3
start ::= stmts
stmts ::= assgn morestmts
morestmts ::= , stmts | ?
assgn ::= var = value
var ::= a | b | c | d | e
value ::= 0 | 1 | 2 | 3 | ... | 9
FIRST sets
9
’(
( start ( e d c b a ))
( stmts ( e d c b a ))
( morestmts (" eps " " ,"))
( assgn ( e d c b a ))
( var ( e d c b a ))
( value (9 8 7 6 5 4 3 2 1 0))
)
FOLLOW sets
’(
( start ( eof ))
( stmts ( eof ))
( morestmts ( eof ))
( assgn (" ," eof ))
( var ("="))
( value (" ," eof ))
)
PREDICT sets
’(
(( start ::= ( stmts )) ( e d c b a ))
(( stmts ::= ( assgn morestmts )) ( e d c b a ))
(( morestmts ::= (" ," stmts )) (" ,"))
(( morestmts ::= (" eps ")) ( eof ))
(( assgn ::= ( var "=" value )) ( e d c b a ))
(( var ::= ( a )) ( a ))
(( var ::= ( b )) ( b ))
(( var ::= ( c )) ( c ))
(( var ::= ( d )) ( d ))
(( var ::= ( e )) ( e ))
(( value ::= (0)) (0))
(( value ::= (1)) (1))
(( value ::= (2)) (2))
(( value ::= (3)) (3))
(( value ::= (4)) (4))
(( value ::= (5)) (5))
(( value ::= (6)) (6))
(( value ::= (7)) (7))
(( value ::= (8)) (8))
(( value ::= (9)) (9))
)
10