Starting from:

$29

Homework #2 Lexical Analyzer for a Subset of Scheme written in Scheme

Programming Languages

Lexical Analyzer for a Subset of Scheme written in Scheme (100 points): Recall that a lexical analyzer
takes a program and generates tokens to be used in parsing. In this homework, you will develop a
lexer for a subset of Scheme programming language in Scheme.
A Subset of Scheme Language
Terminal Symbols (, ), integer_literal, boolean_literal, string_literal, quote, lambda, if, let,
define, and, or, not, identifier
Constructs primitive-literal, atom, list
A primitive-literal is one of the following
 an integer_literal
 a boolean_literal
 a string_literal
An atom is one of the following
 a primitive-literal
 an identifier
A list has the form
 ( list-items )
where list-items is a sequence of zero or more of the following (in any
combination):
 an atom
 a list
Expressions
An expression is one of the following:
 an atom
 a list-literal
 an if-expression
 a let-expression
 a lambda-expression
 a function-application
List literals A list-literal has the form ( quote list-or-atom )
If expressions An if-expression has the form
 ( if expression expression expression )
Let expressions
A let-expression has the form
 ( let ( let-pair-list ) expression )
A let-pair-list is a sequence of zero or more occurrences of let-pair. A letpair has the form
 ( identifier expression )
Lambda expressions
A lambda-expression has the form
 ( lambda ( formals-list ) expression )
A formals-list is a sequence of zero or more occurrences of identifier.
Function application
A function application has the form
 ( expression arg-list )
An arg-list is a sequence of zero or more occurrences of expression.
Top-level items
A top-level-item is one of the following:
 an expression
 a definition
A definition has the form
 ( define identifier expression )
Program A program is a sequence of one or more occurrences of top-level-item.
Project Description
Given the language defined above, implement a lexer that generates the terminal and non-terminal
symbols for parsing. The tokens and their corresponding lexeme are given in the following table.
TOKEN LEXEME
LPAREN “(“
RPAREN “)”
INTEGER_LITERAL Any sequence of one or more digits (‘0’,…,’9’)
BOOLEAN_LITERAL “#t” or “#f”
STRING_LITERAL Formed by a single double-quote (") character, followed by any
sequence of zero or more non-double-quote characters, followed
by a single double-quote (") character
QUOTE_KEYWORD "quote"
AND_KEYWORD "and"
LAMBDA_KEYWORD "lambda"
IF_KEYWORD "if"DEFINE_KEYWORD "define"
OR_KEYWORD "or"
NOT_KEYWORD "not"
IDENTIFIER formed by one identifier-character, followed by any sequence of
zero or more identifier-character-or-digit characters, where the
entire lexeme would not match any other token type. (For
example, the lexeme "quote" is a QUOTE_KEYWORD token, not an
IDENTIFIER token.)
identifier-character a letter or any of the following characters:
! $ % & * + - . / : < = ? @ ^ _ ~
identifier-character-or-digit a character that is either an itentifier-character or a digit ('0' .. '9')
Note that space characters (space, tab, newline, etc.) are not significant, except when they occur
within a string literal.
A token has two pieces of information:
 the token type: The token type is a member of the TokenType enumeration.
 the lexeme: The lexeme is the token's sequence of characters as they appear in the input file.
The lexeme is significant because some kinds of tokens -for example, identifiers- are
represented by many possible lexemes. For example, the strings "a", "+", and "eq?" are all
identifiers.
Example
Consider the following input text:
(define factorial
(lambda (n)
(if (= n 1)
1
(* n (factorial (- n 1)))
)
)
)
When reading this input, your lexical analyzer should output the following sequence of tokens:
Token type Lexeme
LPAREN (
DEFINE_KEYWORD define
IDENTIFIER factorial
LPAREN (Token type Lexeme
LAMBDA_KEYWORD lambda
LPAREN (
IDENTIFIER n
RPAREN )
LPAREN (
IF_KEYWORD if
LPAREN (
IDENTIFIER =
IDENTIFIER n
INTEGER_LITERAL 1
RPAREN )
INTEGER_LITERAL 1
LPAREN (
IDENTIFIER *
IDENTIFIER n
LPAREN (
IDENTIFIER factorial
LPAREN (
IDENTIFIER -
IDENTIFIER n
INTEGER_LITERAL 1
RPAREN )
RPAREN )
RPAREN )
RPAREN )
RPAREN )
RPAREN )
How To Get Started
Copy the files lexer.ss and test.ss into your own subdirectory. You will implement your lexer in file
lexer.ss. File test.ss contains a few test cases for your convenience. You should use your own test
cases as well.

More products