CS 241 -- Winter 2010 -- Assignment 6 -- The WL Programming Language and Scanning

This assignment is the first in a series of four that involve learning WL (short for WaterLanguage) and writing a compiler that translates WL to MIPS assembly language.

Please consult WL Programming Language Tutorial for an informal explanation of WL; The WL Language Specification should be consulted for the definitive specification of the WL language.

All solutions must be submitted to Marmoset. Your submissions for A6P1 and for A6P2 should each be a file containing a complete WL program. Your submission for A6P3 should be a DFA in the format used in A5. Your submission for A6P4 should be a Java, Scheme, or C program that implements a scanner for WL.

Due date: Friday, February 26, 5:00p.m.

The following WL program computes the sum of two integers, a and b.

//
// WL program with two integer parameters, a and b
//   returns the sum of a and b
// 
   int wain(int a, int b) {
      return a + b;   // unhelpful comment about summing a and b
   }
You may test this program on the student.cs environment by placing it in a file named test.wl and entering the following commands, which compile it to MIPS machine language and run it with the familiar mips.twoints command from A1 and A2:
   java cs241.wlc < test.wl > test.mips
   java mips.twoints test.mips
A6P1 through A6P3 familiarize you with the WL programming language. Beginning with A6P4 and through the next four assignments, you will be writing a compiler that translates WL into MIPS assembly language.

Problem A6P1 (10 marks of 60) (filename: sum.wl)

Write a WL program like the one above to sum all the numbers between a and b, inclusive, where a and b are parameters. You may assume that -10000 ≤ a,b ≤ 10000, but do not assume that a < b.

Problem A6P2 (10 marks of 60) (filename: gcd.wl)

Write a WL program that takes two parameters, x and y, and returns the greatest common divisor gcd(x,y). You may assume that 0 < x,y < 231. Hint:
http://www.google.ca/search?q=euclidean+algorithm

Problem A6P3 (20 of 60 marks) (filename: wl.dfa)

Using the
DFA description file format described in assignment 5, create a deterministic finite automaton that recognizes any valid WL token, from the list given in the WL specification. The alphabet consists of every character that may appear in any token; this does not include white space characters.

[While a lexical scanner for WL must allow WL tokens to be separated by white space so that it may distinguish between two consequtive tokens (eg 42 and 17) whose concatenation would comprise a single token (eg 4217), a DFA that accepts only input comprising a single token has no such need.]

Hint: WL tokens include keywords (such as int, wain, return, etc.), as well as identifiers. When scanning WL, two approaches are equally valid. One approach is to distinguish between keywords within the DFA, using separate states for each keyword. Another approach is to design a DFA that just recognizes identifiers and keywords the same way, and write additional code outside of the DFA to determine whether the token is a specific keyword or an identifier. For this problem and for Problem 4, you may choose either approach.

Problem A6P4 (20 of 60 marks) (filename: WLScan.java or wlscan.ss or wlscan.c or wlscan.cc)

Write a scanner (lexical analyzer) for the WL programming language. Your scanner should read an entire WL program and identify the tokens, in the order they appear in the input. For each token, your scanner should compute the name of the token and the lexeme (the string of characters making up the token), and print it to standard output, one line per token.

For example, the correct output for the example WL program above is:

INT int
WAIN wain
LPAREN (
INT int
ID a
COMMA ,
INT int
ID b
RPAREN )
LBRACE {
RETURN return
ID a
PLUS +
ID b
SEMI ;
RBRACE }

If the input cannot be scanned as a sequence of valid tokens (possibly separated by white space as detailed in the WL language specification), your program should produce exactly one line of output to System.err containing the string ERROR, in upper case. We recommend that you try to make the error message informative.

You may wish to modify Asm.java, asm.ss, or asm.c (or the revsied version of asm.c) to solve this problem. These are the scanners that were given to you for assignment 3. Note that the lexer in Asm.java, asm.ss, and asm.c is based on a DFA that repeatedly recognizes the longest prefix of a string that is a token. You should be able to replace the DFA (which recognizes MIPS assembly language tokens) by one which recognizes WL tokens (see Problem 3 above).

For this problem only, all of the Marmoset tests are public tests. There are no release tests or blind tests.