CS 241 — Winter 2024 — Assignment 4 / Out of 13 marks

Assignments for CS 241
← Assignment 3 Assignment 4 Assignment 5 →
Monday, Feb 26th at 5:00 pm
P1P2P3P4P5

On this assignment, you will create DFAs for certain regular languages, write a tool that determines if a DFA recognizes a string, then extend this tool to a Simplified Maximal Munch scanner.

Then, you will be introduced to WLP4, a simple programming language that we will incrementally write a compiler for in this course. Your compiler will translate WLP4 source code into MIPS assembly language.

You will get some practice with WLP4 to better understand its various quirks and limitations, and then you will specialize your Simplified Maximal Munch scanner to work with WLP4 programs, completing the first component of the compiler.

Problem 1: Bytes and Alfalfa (filenames: alfalfa.dfa AND byte.dfa) [2 marks]

For this problem, you are to construct two DFAs:

To allow Marmoset to check the correctness of your DFAs, you must prepare them using a custom DFA file format we have created for the course.

Submit both DFA files together in a zip file. If you submit only one, Marmoset will still run the tests but you will only get marks for the one you submitted.

Extra Practice Problems

Here are some extra optional problems if you would like more practice with DFAs. You can submit the following files to A4P1 on Marmoset, and Marmoset will check that your DFA is correct.

These problems are not worth marks. There is no bonus for completing them.

Problem 2: DFA Recognizer (filenames: dfa.cc OR dfa.rkt) [2 marks, 1 release+secret]

Write a program that reads a DFA file from standard input, and for each string in the input section of the file, determine whether the string is accepted by the DFA. Print a single line of output (to standard output) for each string:

Starter Code

To simplify your task, starter code is provided that reads a DFA file from standard input. The starter code does not actually do anything, but comments are provided at the locations where the code has finished reading certain pieces of information. You should fill in these sections with your own code that sets up appropriate data structures and implements the DFA recognition algorithm.

C++ Starter Code

Racket Starter Code

Clarifications

Testing

Compare your results to cs241.DFA. For this problem, Marmoset uses diff -Zb to compare your output, so that small differences in whitespace are ignored (e.g., extra space at the end of the line).

For instance, on the student Linux system:

$ ./dfa < input.dfa > output.txt
$ cs241.DFA < input.dfa > expect.txt
$ diff -Zb output.txt expect.txt

An input/output example is given on the DFA file format page.

Problem 3: Simplified Maximal Munch (filenames: smm.cc OR smm.rkt) [3 marks, 2 release+secret]

In this problem, you will extend your DFA recognizer to a Simplified Maximal Munch scanner. As in the previous problem, your program will read a DFA file from standard input. However, it should treat the input section as a single string, rather than multiple whitespace-separated strings. Your program should attempt to tokenize the string using the provided DFA and the Simplified Maximal Munch algorithm.

Clarifications

Example

If you are given the following input:

.ALPHABET
$ 0-9
.STATES
start $ $n!
.TRANSITIONS
start $ $
$  0-9 $n
$n 0-9 $n
.INPUT
$2$4$1$241

Then your output should be:

$2
$4
$1
$241

Testing

For this problem, there is no sample executable to compare your output with. Try testing with simple DFAs where you can easily understand how the string should be broken down into tokens.

As with the previous problem, Marmoset will use the -Zb options with diff to allow for some leniency in terms of whitespace. However, you should not really be printing whitespace other than a single line feed after each token you output.

Intermission: Introduction to WLP4

Starting from this assignment, you will begin writing a compiler that translates a C-like programming language called WLP4 into MIPS assembly language.

WLP4 is quite similar to a restricted subset of C, but it uses C++ style memory allocation (new/delete instead of malloc/free). The syntax is harshly restricted to make it easier to compile. For example:

More information about WLP4 is below:

How to Compile and Run WLP4 Programs

Brief Introduction to WLP4

Formal WLP4 Specification

Problem 4: WLP4 Practice (filename: practice.wlp4) [1 mark]

Warm-Up Exercise 1 (not submitted)

Write a WLP4 procedure that computes the absolute value of an integer. You do not need to handle the case of -231 = -2147483648 (this integer has no positive counterpart in 32-bit two's complement).

Warm-Up Exercise 2 (not submitted)

Write a WLP4 procedure called tieBreak that takes two distinct integers as parameters.

To Submit

Write a WLP4 program that takes an array of integers in the range -241 to 241 (inclusive) as input and returns the integer that occurs most frequently. If there is a tie for the most frequent integer, break the tie using the rule from the tieBreak procedure described in Warm-Up Exercise 2.

Your program should run in linear time (in the size of the input array). This likely means you will need to allocate a sufficiently large block of memory to store the number of times each integer occurs.

Examples

Suppose the input array is [1, 2, 2, 3, 3, 3]. Your program should return 3.

Suppose the input array is [4, 4, 4, 2, 2, 2, -3, -3, -3]. There's a three way tie between 4, 2 and -3. Since 2 has the least absolute value, your program should return 2.

Suppose the input array is [1, -1, 2, -2]. There's a four way tie between 1, -1, 2, and -2. Among these, 1 and -1 tie for the smallest absolute value. To break this tie, you choose the negative number, so you return -1.

Testing

Compile your program with wlp4c, then run the compiled code with mips.array. Since wlp4c produces machine code directly, you don't need to use cs241.binasm.

Alternatively, on the student Linux system:

$ wlp4c < practice.wlp4 > practice.mips
$ mips.array practice.mips

Please remember that the student Linux system is unreliable. Do your development and testing on your own system, using the web tools when possible, so that you won't be stuck when (rather than if) the student Linux servers go down.

The wain function in your WLP4 program should take an int* as the first parameter and an int as the second parameter. The first parameter will receive the value that mips.array places in $1 (a pointer to the array) and the second parameter will receive the value it places in $2 (the size of the array).

When the program terminates, the return value of the wain function will be stored in $3. Ensure the value in $3 is correct.

Problem 5: WLP4 Scanner (filenames: wlp4scan.cc OR wlp4scan.rkt) [5 marks, 2 release+secret, 2 secret]

Write a scanner for WLP4. The scanner should read a WLP4 program from standard input and tokenize it, producing a sequence of lines, each representing one token of the program. Each line should be terminated by a line feed (ASCII 0x0A) and should contain the token kind (the all-caps name indicating the type of the token, such as ID, NUM, LPAREN, etc. as defined in the WLP4 Specification), followed by a single space, followed by the token lexeme (the actual string represented by the token).

Whitespace and comments are not considered tokens and should not be part of the output. However, tokens in the input might be separated by whitespace or comments.

If the input cannot be scanned into a sequence of valid tokens as per the requirements in the WLP4 Specification, your program must print an error message containing the string ERROR in all caps to standard error and exit normally.

For programs with a valid tokenization, your output should be identical to that of the wlp4scan tool.

Clarifications

Example

If the input is this WLP4 program:

//
// WLP4 program with two integer parameters, a and b
//   returns the sum of a and b
//
   int wain(int a, int b) {
     return a + b;   // unhelpful comment about summing a and b
   }

Then the correct output is:

INT int
WAIN wain
LPAREN (
INT int
ID a
COMMA ,
INT int
ID b
RPAREN )
LBRACE {
RETURN return
ID a
PLUS +
ID b
SEMI ;
RBRACE }

Testing

Compare your results to wlp4scan. For this problem, Marmoset again uses diff -Zb, so that small differences in whitespace are ignored (e.g., extra space at the end of the line).

On the student Linux system:

$ ./wlp4scan < input.wlp4 > output.txt
$ wlp4scan < input.wlp4 > expect.txt
$ diff -Zb output.txt expect.txt

Be careful not to confuse wlp4scan (the course-provided reference implementation of the scanner) with your own program. Perhaps name your own executable something different, like mywlp4scan.