Starting from:

$30

Assignment 2: Java Translator Program

Assignment 2: Java Translator Program

Overview
The E programming language is similar to C, C++, Java, Pascal, and Modula-2, but it differs syntactically
and semantically, and is considerably simpler.
You are to write a Java program that translates E programs to their semantically equivalent C programs
(not C++). That is, the input to your program (henceforth, the translator) is an E program; the output from
your program is a C program (henceforth, the generated code, or simply GC). To verify that your translator
works correctly, you are to compile and execute the GC. This assignment should help you develop a better
understanding of parsing and compilation, and some familiarity and programming experience with Java.
The general approach to translation that you will use does all its work in a single pass. That is, its three
activities of syntactic analysis, semantic analysis and code generation are intermixed. As your translator
reads a token or a small group of tokens in the E program, it will determine if the token(s) is syntactically and
semantically valid, and generate code for the token(s). E is designed to facilitate one-pass translation. In
contrast, many real compilers perform multiple passes, e.g., one pass for each of the three activities.
The E Language
The following grammar partially describes the E language. In the EBNF below, nonterminals are in lowercase;
terminal symbols are enclosed in single quotes to avoid potential confusion in the use of characters that are
both meta-symbols in the grammar and terminals in E, e.g., “[”.
program ::= block
block ::= declaration_list statement_list
declaration_list ::= {declaration}
statement_list ::= {statement}
declaration ::= ’@’ id { ’,’ id }
statement ::= assignment | print | do | if
print ::= ’!’ expr
assignment ::= ref_id ’=’ expr
ref_id ::= [ ’˜’ [ number ] ] id
do ::= ’<’ guarded_command ’’
if ::= ’[’ guarded_command { ’|’ guarded_command } [ ’%’ block ] ’]’
guarded_command ::= expr ’:’ block
expr ::= term { addop term }
term ::= factor { multop factor }
factor ::= ’(’ expr ’)’ | ref_id | number
addop ::= ’+’ | ’-’
multop ::= ’*’ | ’/’
1
The following rules complete the syntactic definition of E:
• An E program is followed by an end-of-file indicator; extra text is not legal.
• The nonterminal id represents a nonempty sequence of letters. Similarly, the nonterminal number
represents a non-empty sequence of digits.
• Special characters (e.g., ’[’ and ’!’) serve the role of keywords. Hence there are no reserved words.
• As is the case in most languages, tokens are formed by taking the longest possible sequences of constituent
characters. For example, the input “abcd” represents a single identifier, not several identifiers. Whitespace
consists of one or more blanks, tabs, or newlines.
For example, “x:10” and “x : 10” are equivalent. Note that whitespace delimits tokens; e.g., “abc” is one
token whereas “a bc” is two.
• A comment in E begins with a “#” and consists of all characters up to, but not including, a newline or
end-of-file.
E’s semantics follow C’s semantics for the most part, but there are significant differences. The important
semantic points are:
• E has only one type of variable: integer; as with local variables in C, the initial value of E variables is not
defined.
• As in C, variables must be declared before use, redeclaration of a variable (in the same block) is an error,
and the declaration of a variable in an inner block hides the declaration of variable(s) of the same name
declared in outer blocks. Thus, this part of E’s scoping rules are just like C’s. Unlike C, though, E provides
the scoping operator ’˜’. Its meaning is as follows:
˜0 x: the variable x that is declared in the current block.
˜1 x: the variable x that is declared in the immediately enclosing block (i.e., 1 level up).
˜2 x: the variable x that is declared in the enclosing block 2 levels up
...: etc.
˜x: the variable x that is declared at the global level.
˜N x is an error if N is greater than the current depth of nesting. Note that ˜x acts like C++’s scope
resolution operator, i.e., ::x.
• An expression in a guard—i.e., as the expr before the ’:’ in a guarded command—is considered true if
non-positive and false if positive. For example,
i=1
<i-10: !i i=i+1
is a loop that prints out the first 10 positive integers. (The value of i is 11 after the loop.)
• As in most languages, the if statement tests its guards in lexical order, until it finds one that is true; the
corresponding block is then executed. If no guard is found true, the else block (if any) is executed. If no
else is present, then the if has no effect.
To guide you in developing your program, this assignment is divided into several parts.
2
Part 1: The Scanner
A scanner is sometimes called a lexer — since it does lexical analysis — or a tokenizer — since it breaks up
its input into tokens. For example, the GetToken procedure described in Louden is a scanner.
You will be provided with a nearly complete scanner. You need to provide a few omitted tokens (in
Scan.java and TK.java). This part is simple, but look at the scanner code to see how it works and use the
provided test scripts. In fact, use the provided test scripts before you make any changes; from their output, it
should be pretty clear what tokens are missing.
For simplicity, the scanner imposes a limit on the length of tokens. If the scanner encounters a token
whose length exceeds that limit, it prints an appropriate error message and ignores the extra characters. Also,
if the scanner encounters a character that is not in E’s character set, it prints an appropriate error message,
ignores the unknown character 1, and continues by examining the next character in the input. Finally, the
scanner simply discards whitespace and comments.
Part 2: The Parser
First, rewrite the grammar using syntax graphs. Then, determine the first sets; this should be straightforward
because the grammar is simple. Finally, translate the syntax graph into a parser. See the textbook for details.
If an error, e.g., missing “=” in an assignment statement — is encountered in parsing, give an appropriate
error message via “System.err.println” and then stop your program via “System.exit(1)”.
Test your parser to see that it recognizes syntactically legal E programs and complains about syntactically
illegal programs.
You will be provided with a small part of the parser and the main program. You will need to copy from
the previous part the Java code files (but not e2c.java) you used for your scanner.
Part 3: The Symbol Table
Up to this point, we have dealt with syntax. Now, we need to enforce the semantic constraints that deals with
variables: We must detect undeclared and redeclared variables.
A table of variables in the current scope needs to be maintained. Each time a variable is declared, the
table is checked. If the variable is already in the table entries for the current block, then it is being redeclared.
Otherwise, the variable is added to the table. Each time a variable is referenced, the table is checked to ensure
that the variable has been declared.
Here is one implementation approach. Use a stack of lists of variable names. When a new block is
entered, a new list of variable names is created to hold the variables in this new block. Each (legal) variable
declaration adds to the list for the current block. When a block is exited, the list for that block is popped from
the stack.
An undeclared variable, then, is one that does not appear anywhere in the entire stack of lists. (Thus, the
stack is used in an “impure” sense. It provides more than pop and push operations; it also provides search.) In
a real compiler, the symbol table would be searched beginning with the newest block to locate the most recent
symbol table entry with the given name; the symbol table entry would contain additional information such as
the variable’s type. Given that E’s scoping rules are like C’s, then variables should generally be located in
that fashion too. However, E (unlike C) provides the scoping operator ’˜’, whose use will affect how you
1
More precisely, such a character is treated as a whitespace as it delimits tokens.
3
represent levels and how you search in the symbol table. The numeric argument to ’˜’ specifies the exact
scoping level in which to find the given variable; it is an error if the variable is not found within that level,
even if the variable is currently in scope (i.e., appears at a different level in the symbol table).
For a redeclared variable, give an appropriate error message, and then continue by ignoring the redeclaration. For an undeclared variable (including a bad ’˜’ reference), give an appropriate error message via
“System.err.println” and then stop your program via “System.exit(1)”.
Since the number of variables in the program is neither known in advance nor bounded, your program
must use a dynamically allocated data structure for the symbol table. Break this part into subparts. First, get
everything working without handling the scoping operator. Then, handle the scoping operator.
Part 4: Generating Code
Modify your parser so that it outputs appropriate C code. For example, for the E program:
@i
i =1
<i-10 : ! i #this is a comment
i=i+1
your translator might produce something like:
main(){
int x_i;
x_i = 1;
while( (x_i-10) <= 0 ){
printf("%d\n",x_i);
x_i = x_i+1;
}
}
Note that since the scanner discards whitespace and comments, the output is not as neatly formatted as
the E program. In fact, the actual output from your translator will probably be less formatted than shown
above; e.g., it is fine (and simpler) to output a single C token per line. Also note that the translator has
prepended each E variable name with x to avoid conflicts with C reserved words. (Hint: You might want to
further “munge” identifiers for implementing levels for the ’˜’ operator.)
To test your translator, examine its C output (the GC). Then, compile the GC and execute the resulting
program. Verify that it does indeed execute as the source E program specifies.
Part 5: E Language Changes: Design and Implementation
Add a definite iteration statement (e.g., like for in C or C++) to the language definition and your translator.
This part has no one “correct” solution. Do, however, strive to be syntactically and semantically consistent
with the other constructs in E. Give, in your README file:
• the new and modified BNF rules.
• a concise description as to why you chose the particular form you did.
4
To demonstrate that your implementation of the definite iteration is correct, include several wellcommented E test program files and their “correct” output files. These tests should include tests that
demonstrate error checking.
Notes
• On the CSIF systems, be sure to use Java 1.8:
/usr/bin/javac
/usr/bin/java
Test that you are using the correct Java by
which javac java
• Do not be overly concerned with efficiency. On the other hand, do not write a grossly inefficient program.
• Strive for simplicity in your programming. Do not try to be fancy. For example, the syntax graphs translate
directly into code. Learn and use that technique—do not spend your time trying to improve that code.
You will find it most helpful to use the names of the nonterminals as the names of your parsing methods
(except “if” and “do” are keywords in Java).
• Express your program neatly. Part of your grade will be based on presentation. Indentation and comments
are important. Be sure to define each variable used with a concise comment describing its purpose. In
general, each method should have a comment at its beginning describing its purpose. In this program,
however, the parsing method do not need such introductory comments; a single explanatory comment at
their beginning should suffice.
• Write your program so that it works on standard input. Do not bother trying to get it to work on a file
whose name is specified on the command line. Thus, your program, named e2c, will be invoked as java
e2c if you want input to come from the keyboard, or as java e2c < t01.e if you want input to come
from the file t01.e; use java e2c < t01.e t01.c to take input from t01.e and to output to t01.c.
Output the generated code to stdout via “System.out.println” (or “System.out.print”). So as not to intermix
generated code and error/warning messages, output error messages to stderr via “System.err.println”.
• Your program must work with the provided Makefile and shell scripts.
• The provided shell scripts depend on the exit status of your program. If your program detects a ”serious”
error, your program should exit via “System.exit(1)”, as described earlier. If your program finishes
normally, your main method should just end normally or equivalently “System.exit(0)”.
• Be sure to use the provided test files and test scripts. Your output must match the “correct” output, except
for the wording of and the level of detail in the error messages. By “match”, we mean match exactly
character by character, including blanks, on each line; also, do not add or omit any blank lines. Your error
messages can be more or less detailed as you wish, provided they are reasonably understandable. Don’t
spend a lot of time making fancy error messages, though.
5
• If you run your program on Windows, the output might look the same, but a file comparison program
might complain due to extra \r’s in Windows’ output. Retest it on CSIF to be sure, and as is required.
• To convert String s to its integer equivalent and store the result in int num, use
num = Integer.parseInt(tok.string);
• You must follow the steps described above in developing your program. Grading will be divided as
follows:
Part Percentage
1 10
2 30
3 30
4 20
5 10
• In seeking assistance on this project, bring a listing of the last working part along with your attempt at the
next part.
• You will be expected to turn in, at least, the last working part along with your attempt at the next part. So,
be sure to save your work for each part. No credit will be given if the last working part is not turned in.
• Points will be deducted for not following instructions, such as the above.
• Get started now to avoid the last minute rush.
6

More products