CS 241 — Winter 2024 — Assignment 6 / Out of 15 marks

Assignments for CS 241
← Assignment 5	Assignment 6	Assignment 7 →
	Friday, Mar 22th at 5:00 pm
P1 • P2 • P3 • P4 • P5

In this assignment, you will implement the context-sensitive analysis phase (also known as semantic analysis) of the WLP4 compiler.

The main goal of this phase is to catch all remaning compile-time errors that were not caught by the scanner or parser. A secondary goal is to compute type information for expressions, which will be needed in the code generation phase.

Filenames

In this assignment, like the assembler assignment, you are gradually building up one large program across multiple problems. You can submit the same program to all problems if you want.

For all problems, you must submit one of the following two files:

wlp4type.cc for C++
wlp4type.rkt for Racket

You can submit additional files if you wish to divide your program into multiple files, but a file with one of the above names must be present in your submission.

Input and Output Format

The input for all problems will be a WLP4 Intermediate (.wlp4i) file representing a parse tree, that is, the output of the parsing phase. If you did not complete your own WLP4 parser in the previous assignment, you can use the provided wlp4parse tool to generate inputs. Note that wlp4parse expects input in the format produced by the scanner wlp4scan, so you will need to run your WLP4 source code through both commands to produce a .wlp4i file:

$ wlp4scan < program.wlp4 | wlp4parse > program.wlp4i

If the input program has an error in it, you simply need to produce an error message containing ERROR on standard error, as usual.

If the program is correct, should produce a WLP4 Typed Intermediate (.wlp4ti) file on standard output. This is a representation of the parse tree that has been annotated with type information. You can produce this output by traversing the tree and assigning a type to all expression nodes, then doing a second (preorder) traversal where you print out the tree along with the assigned types. An expression node is defined as follows:

A terminal node representing a NUM, NULL, or ID token.
A nonterminal node where the left hand side of the corresponding rule is expr, term, factor, or lvalue.

The wlp4type tool can be used to convert a .wlp4i file to a .wlp4ti file. It performs context sensitive analysis and produces the type-annotated parse tree. You can use this to check the correctness of your output on the assignment.

Problem 1: Simple Programs [2 marks, 1 release+secret]

In this problem, you will perform context-sensitive analysis on a heavily restricted class of WLP4 programs:

Only the wain procedure is allowed.
The body of wain does not contain declarations or statements, only a return expression.
The return expression is always a constant (a number or NULL). It cannot be a variable, and it cannot be a more complex expression.

More formally, only the following grammar rules will appear in the parse tree of the program:

start → BOF procedures EOF
procedures → main
main → INT WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE 
type → INT
type → INT STAR
dcl → type ID
dcls →
statements →
expr → term 
term → factor 
factor → NUM  
factor → NULL
factor → LPAREN expr RPAREN

Your main task is to read in a .wlp4i file (from standard input) which is a representation of the parse tree that the parser constructed, and rebuild the tree.
(Having to construct the tree in the parser and then immediately reconstruct the tree here may seem silly, but the assignment is structured this way so that people who did not finish the WLP4 parser can still do this assignment.)

After building the tree, you will need traverse it to check for semantic errors in the WLP4 program. There are only a few possible semantic errors in these restricted WLP4 programs:

The two parameters of wain have the same name.
The second parameter of wain is not int type.
The return expression of wain is not int type.

If the program contains a semantic error, you should print ERROR to standard error in all caps.

If there are no semantic errors, you are to print a .wlp4ti file representing the parse tree of the WLP4 program with type annotations to standard output.

To produce this output, you will need to assign a type to each node that represents an expression. For these restricted WLP4 programs, this should not be too difficult. The only nodes representing expressions are the ID nodes in the dcl subtrees, and the nodes in the return expression subtree. For the dcl subtrees, the type is stored right beside the ID, and for the return expression subtree, the type of each node just depends on whether there is NUM or NULL at the leaf (and having NULL at the leaf is a semantic error).

Example

For the following WLP4 program:

int wain(int a, int b) { return 241; }

The input would not be this program, but rather the corresponding .wlp4i file produced by passing the program through wlp4scan and wlp4parse:

start BOF procedures EOF
BOF BOF
procedures main
main INT WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
INT int
WAIN wain
LPAREN (
dcl type ID
type INT
INT int
ID a
COMMA ,
dcl type ID
type INT
INT int
ID b
RPAREN )
LBRACE {
dcls .EMPTY
statements .EMPTY
RETURN return
expr term
term factor
factor NUM
NUM 241
SEMI ;
RBRACE }
EOF EOF

And the output would be the corresponding type-annotated .wlp4ti file:

start BOF procedures EOF
BOF BOF
procedures main
main INT WAIN LPAREN dcl COMMA dcl RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
INT int
WAIN wain
LPAREN (
dcl type ID
type INT
INT int
ID a : int
COMMA ,
dcl type ID
type INT
INT int
ID b : int
RPAREN )
LBRACE {
dcls .EMPTY
statements .EMPTY
RETURN return
expr term : int
term factor : int
factor NUM : int
NUM 241 : int
SEMI ;
RBRACE }
EOF EOF

Notice that the expr, term, factor, ID, and NUM lines have been annotated with the corresponding type.

The wlp4type tool can be used to generate expected outputs. The input is a .wlp4i file (not WLP4 source code).

Problem 2: Building a Symbol Table [2 marks, 1 release+secret]

Extend your context-sensitive analysis to allow for programs with declarations in the body of wain, and programs where the return expression of wain is a variable.

This means the following additional rules can appear in the parse tree:

dcls → dcls dcl BECOMES NUM SEMI
dcls → dcls dcl BECOMES NULL SEMI
factor → ID

The main change to your program in this problem is that you will need to create a symbol table to keep track of which variables have been declared and their types.

Your program now must detect the following semantic errors:

The second parameter of wain is not int type.
The type of a variable declared in the body of wain does not match the type of its initialization value.
Two variables with the same name are declared in wain.
The return expression of wain is not int type.
A variable is used in the return expression of wain, but was not declared in wain.

If the program contains a semantic error, you should print ERROR to standard error in all caps.
If there are no semantic errors, you are to print a .wlp4ti file representing the parse tree of the WLP4 program with type annotations to standard output.

Problem 3: Type Checking for Operations [3 marks, 2 release+secret]

Extend your context-sensitive analysis to allow for binary arithmetic operations (+ - * / %) and unary pointer operations (* & new) in expressions.

This means you should handle the following additional grammar rules:

expr → expr PLUS term
expr → expr MINUS term
term → term STAR factor
term → term SLASH factor
term → term PCT factor
factor → AMP lvalue
factor → STAR factor
factor → NEW INT LBRACK expr RBRACK
lvalue → ID
lvalue → STAR factor
lvalue → LPAREN lvalue RPAREN

The main change to your program in this problem will be in your code for assigning types to the nodes in the return expression subtree of wain. This will become significantly more complicated.

These new rules introduce many new possible semantic errors. You will need to handle all type errors in expressions that do not involve procedure calls.

Dependencies and Part Marks

The test cases for this problem do not have declarations in the body of wain, so you can get full marks on this problem even if you have not completely solved Problem 2. In particular, this problem can be solved without a symbol table data structure. However, you will need to handle the factor → ID rule in the case where the ID is a parameter of wain, so you will need to at least keep track of those two parameter types.
The public test and the first release+secret test do not contain binary operations, so you can get up to 2 marks on this problem if you just complete the type checking for unary operations (dereference, address-of, and "new").

Problem 4: Procedures [4 marks, 2 release+secret, 1 secret]

Extend your context-sensitive analysis to allow for procedures other than wain in the program, and to allow for procedure calls in expressions. Procedures are still not allowed to contain statements.

This means you should handle the following additional grammar rules:

procedures → procedure procedures
procedure → INT ID LPAREN params RPAREN LBRACE dcls statements RETURN expr SEMI RBRACE
params →
params → paramlist
paramlist → dcl
paramlist → dcl COMMA paramlist
factor → ID LPAREN RPAREN
factor → ID LPAREN arglist RPAREN
arglist → expr
arglist → expr COMMA arglist

Your program will need several major changes:

Instead of having just one symbol table (for wain), you will need separate symbol tables for each procedure's local variables.
You will need to keep track of which procedures have been declared to check for "call before declaration" and "duplicate declaration" errors.
In addition to processing the return expression of wain, you will need to process the return expressions of every other procedure.
Your type checking for expressions will need to account for procedure calls. You will need to keep track of procedure signatures, so that when a procedure is called, you can check that the number and types of arguments match up with the signature.
You should be able to distinguish between whether an identifier refers to a variable or to a procedure in the current scope, so that you can catch errors like variable() or 1 + procedure.

Dependencies and Part Marks

The test cases for this problem do not have expressions that contain arithmetic operations and pointer operations. Expressions will only contain:
- Constants
- Variables
- Procedure calls
This means you can get full marks on this problem even if you have not solved Problem 3. However, declarations will appear in the bodies of procedures, so you need to have solved Problem 2 to do this problem.
The public test and the first release+secret test do not contain:
- Procedure declarations with parameters (other than wain)
- Procedures calls with arguments
This means you can get up to 2 marks on this problem if you just handle non-wain procedures that do not take parameters.

Problem 5: Statements [4 marks, 2 release+secret, 1 secret]

Extend your context-sensitive analysis to allow for statements. You should now handle arbitrary WLP4 programs; all rules from the WLP4 grammar can appear in the parse tree. The following grammar rules are new for this problem:

statements → statements statement
statement → lvalue BECOMES expr SEMI
statement → IF LPAREN test RPAREN LBRACE statements RBRACE ELSE LBRACE statements RBRACE
statement → WHILE LPAREN test RPAREN LBRACE statements RBRACE
statement → PRINTLN LPAREN expr RPAREN SEMI
statement → DELETE LBRACK RBRACK expr SEMI
test → expr EQ expr
test → expr NE expr
test → expr LT expr
test → expr LE expr
test → expr GE expr
test → expr GT expr

You will need to extend your analysis for expressions to process all expressions that appear in statements. You will also need to check for type errors in statements. Determining the full list of new semantic errors to check for is left as an exercise.

Dependencies and Part Marks

The public and release+secret test cases for this problem only contain the following elements:
- The wain procedure
- Simple expressions (variables and constants)
- Statements
In particular, this means you don't have to handle: declarations, expressions with operations, and non-wain procedures. This means you can get up to 3 marks on this problem even if you have not solved Problems 2, 3 or 4.
However, passing the secret-only test requires completing all previous problems.
The public test and the first release+secret test do not contain:
- While loops
- If-else statements
This means you can get up to 2 marks on this problem if you just handle assignment, print, and delete statements.