$29.99
Homework 6
A programming project: an interactive calculator.
Build an interactive calculator that basically does the following
(with indicated point values):
1. [25] inputs expressions to be evaluated, written in an algebraic
notation (e.g. "3 * (2 - 1)"). The expression notation should support
the usual conventions of operator precedence. Input initially will be
assumed to be from stdin.
2. [10] evaluate the expressions entered in step 1
3. [5] print the result (be default on stdout)
We will assume that we are dealing with integer arithmetic, supporting
the following set of operations:
+, -, *, div, mod
plus a unary negation operator ~ (as in SML).
Once the basic functionality is working, with just simple arithmetic
expressions, we can start adding features:
(a) [15] Simple variable definitions like
let x = 3 * 5
and the use of such variables in expressions (e.g. 2 * x -1)
(b) [20] simple unary function definitions, e.g.
fun f x = x * 3 - 2
along with the use of such defined functions in expressions:
f(3) + 2.
(c) [15] relational operators like ==, < and conditional
expressions of the form "if x < 3 then 2 else g y" (but not while
loops, since these would not make sense because there are no side
effects in the calculator language). Just == and < will suffice,
but you can add others like , <=, =.
(d) [5] recursive function definitions, like factorial.
For up to 20 extra credit points, you can implement a type checker for
the calculator language (once you have function and/or boolean values
in addition to integer values, so that there are types to
discriminate).
You may not be able to complete all parts of the project, but do as
many as you can. Quality is important.
*********************************************
IMPORTANT: Please make your code as neat, well-formated, and readable
as you can. Use mneumonic function and variable names. Include
comments to explain how the code works and what your assumptions are.
Your solution should have a README file explaining the overall
structure of your implementation, how to build it, and how to run it.
*********************************************
This project can be implemented in either SML or Haskell, or both (for
30% extra credit). It is essential that your code compile correctly
-- very little credit will be given for code with syntax or type
errors. It is also important that you show that you have done some
testing to prove that your implementation works correctly, by
providing test cases and testing transcripts. You can mine code
from earlier homeworks, the simple language implementation example,
and the Hutton calculator.lhs example on the Programs page.
==================================================================
Step 1. Input and simple parsing.
---------------------------------
An example of a very simple calculator is provided in the file
IO-parsing.hs (added to the Programs page) that shows how the input
readers can be used to process characters read using the IO monad (by
the getChar action in particular).
This example uses a variant of the Reader monad given in the file
Reader.hs. At the end of Reader.hs, I define a Reader action expr
which parses and evaluates extremely simple arithmetic expressions of
the for "3", "1+2", "4+7+6", i.e. expressions formed by adding
numbers. This expr Reader is used to process the input gathered by the
top-level IO action getInput. This illustrates how to divide the task
of character input from the task of input processing.
The Reader.hs file has been rewritten mainly to define
Reader as a Monad instance from the beginning, and it uses
the usual "return" and "=" names for monad operations. This
makes it possible to use the "do" notation to build Readers.
For SML we have in stream.sml and inputfn.sml how to connect
character input to input processing. Try adding an SML version
of the expr reader to inputfn.sml.
Files provided: IO-parsing.hs, Reader.hs, stream.sml, and inputfn.sml.
Step 2. Lexical analysis (or tokenization)
-------------------------------------------
The sample expr Reader at the end of Reader.hs goes directly from
an input stream of characters to the integer result of expression
evaluation. It analyzes the character stream into meaningful units
like the "+" symbol and natural numbers (digit sequences), and
then interprets the sequence of these elements as an expression
and evaluates it as it parses.
Naively, this process looks like (ignoring the complexities of
the Reader monad):
Instream --------------------- Int
expr
To support more complicated terms and definitions, it will be
convenient to factor this process into several stages. The
parsing of input characters into expressions is typically split
into two phases
(1) lexical analysis, which splits the character stream into
meaningful segments representing operators, numbers,
identifiers, keyworks, and punctuation (e.g. parentheses).
These units are usually called "tokens". For instance,
"2+3" would break down into three tokens: (number 2,
operator +, number 3). Tokens may or may not be separated
by white space.
(2) parsing the resulting sequence of tokens into expression
structures, usually represented by values of an expression
abstract syntax. For instance, (number 2, operation +,
number 3) might be parsed into
Plus(Nat 2, Nat 3)
where Plus and Nat are data constructors for an
expression datatype. We have seen such a datatype before
in Problem 6 of Homework 1.
Once we have expressions represented by such an abstract
syntax, it is fairly straightforward to write an evaluator
for them. This evaluator can take the form of a direct
interpreter:
eval :: Expr - Int (val eval : expr - int)
or if we want to be fancy we could "compile" the expression
into a sequence of abstract machine instructions as we did
in Homework 1 with compilation of expressions to the RPN
abstract machine.
So we want to define a type of tokens and then tokenization
will be a transformation of a character stream (char list,
respectively [Char] in SML and Haskell) into a token stream.
tokenize :: Reader TokenStream (Haskell)
val tokenize : tokenstream reader (SML)
where
type TokenStream = [Token] (Haskell)
type tokenstream = token list (SML)
This will be based on a token reader:
tokenRdr :: Reader Token (Haskell)
tokenRdr : token reader (SML)
The reader tokenRdr will be defined as a multi-way choice
between readers for the various kinds of token, e.g.
324 -- natural number token
x23 -- alphanumeric identifier
+ -- plus operator (and similarly for the other operators)
( -- left parenthesis
) -- right parenthesis
These different forms of token can be represented by constructors
for a token datatype:
data Token
= ID String | ... | PLUS | ... | LPAR | RPAR
Lets assume the following set of basic arithmetic operations:
+, - additive
*, /, % multipicative (% = mod)
~ (arithmetic) negation
These are all binary except for ~, which is unary.
(3) Parsing expressions
-----------------------
We define an datatype for the abstract syntax of expressions. In
Haskell this would be something like:
type Variable = String
data Expr
= Nat Int
| ... | Plus Expr Expr | ...
Next we want to take the result of tokenization, a token stream
and write parsing actions that consume the token stream and produce
expression abstract syntax (values of the Expr type).
We can express parsing as yet another monad (e.g. in Haskell):
data Parser a = P (TokenStream - Maybe (a, TokenStream))
instance Monad Parser where
return v = P (\ toks - Just(v, toks))
P p = f = P (\ toks - ...)
fail s =
by analogy with the Reader monad defined in Reader.hs.
The Parser actions will be something like the expr :: Reader Int
reader action at the end of Reader.hs, except that instead of
computing the numeric result, they will build abstract syntax
values representing the expressions that they parse.
Here is the Haskell interface:
getToken :: Parser Token -- like inputc for Reader
parse :: Parser a - (TokenStream - Maybe(a,TokenStream))
-- perform a parse, analagous to performRdr
isToken :: Token - Parser () -- recognize a given token
-- analagous to the symbol Reader
natExpr :: Parser Expr
identExpr :: Parser Expr
expr :: Parser Expr
term :: Parser Expr
factor :: Parser Expr
Here expr is the top-level parser action, which parses a complete
top-level expression. But in order to implement the usual operator
precedence rules, we need the auxiliary expression classes given
by the following grammar (omiting the unary negation operator (~)):
expr ::= term "+" expr
expr ::= term "-" expr
expr ::= term
term ::= factor "*" term
term ::= factor "/" term -- div
term ::= factor "%" term -- mod
term ::= factor
factor ::= "(" expr ")"
factor ::= "nat" -- nat token of form: Nat n
factor ::= "id" -- id token of form: Id s
Here expr, term, and factor are known as "nonterminals",
while the quoted symbols "+", "-", "nat", "id", etc. represent
terminal symbols, which are tokens. This grammar is designed to
enforce the usual precedence rules, where the multiplicative
operators "*", "/", "%"
Using some regular expression notation, this grammar can be
written more succinctly as:
empty ::=
expr ::= term ("+" expr | "-" expr | empty)
term ::= factor ("*" term | "/" term | "%" term | empty)
factor ::= "(" expr ")" | "nat" | "id"
Now each of the nonterminals expr, term, and factor is translated
into a Parser action. Here, for example, is how the grammar production
for expr translates into a Parser action:
expr :: Parser Expr
expr = do t <- term
do isToken PLUS
e <- expr
return (Plus t e)
+++
do isToken MINUS
e <- expr
return (Minus t e)
+++
return t
Observe that the three parser actions are mutually recursive.
-------------------
SML Note: In the SML version of the parser, the parser type can
be a simple abbreviation for a function type:
type 'a parser = tokenStream - ('a * tokenStream) option
Thus the parser actions expr, term, and factor will be defined
as functions, and so can be mutually recursive:
fun expr toks =
chain term (fn t =
chain (isToken T.PLUS) (fn _ =
chain expr (fn e =
return (Plus(t, e))))
+++
chain (isToken T.MINUS) (fn _ =
chain expr (fn e =
return (Minus(t, e))))
+++
return t) toks
and term toks = (...) toks
and factor toks = (...) toks
Note that we have to "eta-expand" the functions so that they are
in the right form for using the recursive "fun" declaration.
---------------------
(4) Evaluating Expressions
--------------------------
Once we have captured an expression as an abstract syntax value
of type Expr, we can evaluate it using a simple expression
interpreter:
---------
module Eval where
import Exprs
eval :: Expr - Int
eval (Nat n) = n
...
--------
Here we are ignoring identifiers (eval(Id s) can remain undefined
initially) -- we'll come back to these later after identifier
definitions have been added.
(5) Top level calculator command
--------------------------------
Now we put all these pieces together in the top-level calculator
function, which we will call calc:
Haskell version:
------------------------------
file: Calc.hs
------------------------------
-- Calc.hs
module Calc where
import Reader
import Tokens
import Exprs
import Parser
import Eval
calc :: Instream - Maybe Int
calc instr =
case (performRdr tokenize instr) of
Nothing - Nothing
Just (toks,_) -
case (parse expr toks) of
Nothing - Nothing
Just(v,_) - Just (eval v)
------------------------------
SML version:
------------------------------
file: calc.sml
------------------------------
(* calc.sml *)
structure Calc =
struct
structure R = Reader
structure T = Tokens
structure E = Exprs
structure P = Parser
structure V = Eval
(* calc : R.instream - int option *)
fun calc (instr : R.instream) =
case (T.tokenize instr)
of NONE = NONE
| SOME (toks,_) =
(case (P.expr toks)
of NONE = NONE
| SOME(v,_) = SOME (V.eval v))
end (* structure Calc *)
------------------------------
For SML, we also need a CM description
file to build the multifile program:
------------------------------
file: calc.cm
------------------------------
Group is
$/basis.cm
instream.sig
listinstream.sml
readerfn.sml
reader.sml
tokens.sml
exprs.sml
parser.sml
eval.sml
calc.sml
------------------------------
This file is located in the same directory as the source files that it
lists, and we compile the calculator by running the command:
- CM.make "calc.cm";
Here the reader.sml file just contains the following declaration:
structure Reader : READER = ReaderFn(ListInstream)
and we could change the underlying implementation of instreams by
replacing ListInstream with, say, StreamInstream.
The calculator process can now be broken into the following chain:
instream ----------------------------- tokenStream
tokenize
tokenStream --------------------------- expr
parse
expr ---------------------------------- int
eval
(6) Adding unary negation
-------------------------
We are using "~" as the symbol denoting unary negation (to avoid some
complications involved in overloading "-" as both the subtraction and
the negation operators). We will assume that "~" should have higher
precedence, i.e. should bind tighter, than any of the binary
operations.
We can add it to the grammar by adding a new nonterminal "signed"
signed ::= "~" factor | factor
and revising the production for term to be:
term ::= signed ("*" term | "/" term | "%" term | empty)
After adding the corresponding signed Parser action to parser.sml,
we can calculate using negation:
- Calc.calc (ListInstream.fromString "2*(3+~1)");
val it = SOME 4 : int option
(7) Identifiers and definitions
-------------------------------
Let's introduce identifiers as basic elements of expressions. At
the token level, we have a token constructor ID
data Token = ... | ID String | ...
datatype toke = ... | ID of string | ...
which takes the name of the identifier as an argument. So the
identifier x is represented by the token (ID "x"). We already
included these in the expression grammar above (section (3))
as elements of factors. The parser will translate a token like
(ID "x" :: Token) into abstract syntax for an elementary expression:
(Id "x" :: Expr) (note the use of ID for the token constructor, and
Id for a variable expression constructor).
So assuming we can parse an expression like "x+3" into abstract
syntax like Plus(Id "x", Nat 3), how do we evaluate such an
expression? Clearly we need to know what integer value is
represented by the variable x. This information is obtained from
an "environment", which is a finite map from variables to their
values (in this case values will always be integers). We have
seen a suitable data structure for representing environments
before: association lists (alists) with strings for keys.
Given an environment
env = [..., ("x", 3), ...]
we can evaluate "x+2" by looking up the value of x in env, which
gives 3, and then adding 3 and 2 to get 5.
So evaluation of an expression will now require an environment as
an additional parameter:
eval :: Expr - Env - Int
val eval : (expr * env) - int
The next question is: where do environments come from? We get
variable bindings by processing declarations. We can use the
following conventional syntax for simple variable declarations:
let variable = expr
E.g.
let x = 3-1
Lexically, we need to add two new token constructors:
data Token = ... | LET | EQ
where the token reader will translate the string "let" into the
token LET, and the string "=" into EQ. The grammar needs to be
extended with a new nonterminal, "decl", and a new rule:
decl ::= "let" id "=" expr
or
decl ::= LET (ID s) EQ expr
---------------------
*** Keyword tokens ***
LET (standing for "let") is an example of a keyword token. The
tokenize function must look for such keywords before recognizing
general identifiers like "x" or "abc", because otherwise "let" would
be translated as (ID "let"), i.e. as a general identifier. This goes
for other keywords to be introduced later, like "fun" and "if",
"then", "else".
---------------------
The parser will translate this into a new abstract syntax type
for declarations, say:
data Decl = Let Variable Expr
datatype decl = Let of variable * expr
Now the calculator input can take two forms: (1) an expression to
be evaluated, or (2) a declaration to be processed. We can call
a calculator input form a "statement", and define the statement
concrete and abstract syntax by
stmt ::= decl | expr
data Stmt = D Decl | E Expr
datatype stmt = D of decl | E of expr
Finally we can combine a sequence of declarations and expressions
into a "program":
type Prog = [Stmt]
type prog = stmt list
To evaluate a statement, we have a function
evalStmt :: Stmt - Env - (Int, Env)
For instance:
evalStmt (E(Plus (Id "x") (Nat 2)), env) == (5, env)
if env maps "x" to 3. For declarations,
evalStmt (D(Decl "x" (Nat 3)), env) == (3, ("x",3):env)
i.e. the value returned for a declaration is the value of the
defining expression (the "definiens"), and the environment is
the original environment extended with the new binding of "x".
We can process programs is a couple of ways. First we can assume
that each "line" of input is going to contain one statement, and
have the top-level evaluation loop process one line after another,
passing the environment as a parameter through the loop.
Alternatively, we can add concrete syntax for sequences of
statements and modify the parser to process them into statement
lists.
prog ::= stmt (";" prog | empty)
Then a single "line" of input could have multiple statements
separated by the semicolon token.
(8) Function definitions and applications
-----------------------------------------
Lets start with a very simple form of function definition
fun f x = x+3
using the reserved token "fun" to introduce the function
definition, followed by the function name (a variable),
the parameter variable, and then "=" followed by the
function body expression. The grammar rule for declarations
then becomes
decl ::= "let" id "=" expr | "fun" id id = expr
or
decl ::= LET ID EQ expr | FUN ID ID EQ expr
(FUN is now a new basic token, like LET).
The grammar for expressions is modified by adding
a function application "f(expr)" as a new form of
factor:
factor ::= "(" expr ")" | id "(" expr ")" | nat | id
Evaluating a function definition should add a
function name to function value binding to the environment.
A function value will be a lambda-abstraction "closure":
FunVal(Fn("x", expr), env)
where the abstract syntax for a function expression
(corresponding to a lambda expression), is defined by
datatype funExpr = Fn of variable * body
and env is the current environment at the point
where the function definition is evaluated -- it should
contain bindings for any variables (other than the
parameter "x") that appear free in the body expression.
Fval is a dataconstructor for a more general "value"
datatype
datatype value = IntVal of int
| BoolVal of bool
| FunVal of funExpr * env
For instance, after
let y = 8
the declaration
fun f x = y + x
should add the binding
("f", FunVal(Fn("x", Plus(Id "y", Id "x")), ("y", IntVal 8)::env))
to the current environment env. The environment now can contain two
kinds of bindings -- function bindings and integer bindings, so we use
the value datatype defined above for bound values (the boolean values
(BoolVal) won't appear in environments, but we need them as the value
variant produced when evaluating relational expressions).
To evaluate a factor expression of the form
f(argexpr)
in and environment env we will follow the following procedure:
(i) look up the function variable f in env, obtaining a function
value funval of the form
funval = FunVal(Fn(var, body), env')
(ii) evaluate the argument argexpr in env, obtaining a value argval.
(iii) add a binding (var, argval) to env', yielding an environment env''
(this augments the closure environment env' so that it binds
_all_ the variable appearing in body, including the function
parameter var)
(iv) evaluate the body expression in env'' to obtain the value of the
application expression.
(9) Relational operators and conditional expressions
----------------------------------------------------
We'll assume that two relational operators, "==" and "<" will suffice,
since the others can be defined in terms of these, using conditional
expressions.
The conditional operators will be infix operators of lower precedence
than any of the arithmetic operators. This can be expressed in the
grammar by adding another nonterminal rexpr with the following rule:
rexpr ::= expr "==" expr | expr "<" expr
Then conditional expressions, which will become the new top-level form
of expression, can be handled by introducing one more nonterminal,
cexpr, with grammar rule
cexpr ::= "if" rexpr "then" expr "else" expr | expr
This grammar limits the role of relational expressions (rexpr) to be
used only in the condition subexpression of conditional
expressions. Allowing more general use of relational expressions would
require a more complicated grammar.
We need to introduce three new "keyword" tokens for "if" (IF), "then"
(THEN), and "else". The infix operators "==" and "<" can be
recognized by the symbol token reader as was done for the arithmetic
operators.
The relational and conditional expressions also have to be
added to the expression abstract syntax type. They can be
added as new constructors for the general expr type:
datatype expr
= Nat of int
...
| Equal of expr * expr
| Less of expr * expr
| If of expr * expr * expr
Evaluation of these new expressions is straightforward, but
the type for expression values needs to be expanded to cover
the boolean values of relational expressions.
(10) Recursive functions
Recursive functions do not need any additional syntax, either at the
grammer level or the abstract syntax level. Recursion can be
supported simply by adding the function binding as well as the
parameter binding to the closure environment before evaluating the
function body expression. In other words, we modify step (iii) in the
procedure for function application as follows:
(iii) add bindings (var, argval) and (f, funval) to env', yielding an
environment env'' (this augments the closure environment env' so that
it binds _all_ the variable appearing in body, including the function
parameter var and the called function symbol itself, so it can be
called recursively in the body).
(11) Summary
* Grammar
Here is the final grammar for the calculator language:
-- conditional expressions, the top-level form of expressions
cexpr ::= "if" rexpr "then" expr "else" expr | expr
-- relational expressions, where the arguments are arithmetic
rexpr ::= expr "==" expr | expr "<" expr
-- arithmetic expressions
empty ::=
expr ::= term ("+" expr | "-" expr | empty)
term ::= signed ("*" term | "/" term | "%" term | empty)
signed ::= "~" factor | factor
factor ::= "(" expr ")" | id "(" expr ")" | nat | id
nat ::= Nat n -- literal Nat token
id ::= Id s -- literal Id token
-- declarations, simple and function
decl ::= "let" id "=" cexpr | "fun" id id = cexpr
-- top-level "statements", which may be either (conditional)
-- expressions, or declarations
stmt ::= decl | cexpr
-- programs, sequences of statments separated by ":"
prog ::= stmt (";" prog | empty)
The prog phrase is optional. The calculator can be designed
to read one statement per line, in which case the line break
is the implicit separator.
* Top-level REPL loop
Here is a sample calculator top-level loop (SML version)
------------------------------------------------------------------------
(* calc.sml *)
structure Calc =
struct
structure S = ListInstream
structure R = Reader
structure T = Tokens
structure E = Exprs
structure P = Parser
structure V = Eval
fun printVal (V.IntVal n) =
print (Int.toString n)
| printVal (V.BoolVal b) =
print (Bool.toString b)
| printVal (V.FunVal _) =
print ("<function")
fun printStmt (V.ExprVal v) =
(print " "; printVal v; print "\n")
| printStmt (V.DeclVal(var,v)) =
(print (" "^var^" = ");
printVal v; print "\n")
(* repl : Env.env - unit
* the calculator's read-eval-print loop *)
fun repl (env) =
(* input one line *)
case TextIO.inputLine TextIO.stdIn
of NONE = () (* no input, exit *)
| SOME line =
(* tokenize the line *)
(case (T.tokenize(S.fromString line))
of NONE =
(print "Error - tokenize\n";
repl env)
| SOME (toks,_) =
(* parse token stream *)
(case (P.stmt toks)
of NONE =
(print ("Error - parse failure: "^line^"\n");
repl env)
| SOME(s,_) =
(* parsed one statement, evaluate it *)
(let val (res,env') = V.evalStmt env s
in printStmt res;
repl env'
end
handle Env.Unbound s =
(print ("Error - unbound var: "^s^"\n");
repl env))))
fun calc () = repl (Env.empty)
end (* structure Calc *)
------------------------------------------------------------------------