CS241 — Spring 2018 — Assignment 5

Assignments for CS 241
← Assignment 4 Assignment 5 Assignment 6 →
Wednesday, June 6 at 5:00pm Friday, June 15 at 5:00pm Friday, June 27 at 5:00pm
P1P2P3P4P5P6P7P8P9

Part I: Scanning

Problem 1 — 15% (wlp4.dfa)

Using the DFA file format described in A4, create a deterministic finite automaton that recognizes any valid WLP4 token, from the list given in the WLP4 specification. The alphabet consists of every character that may appear in any token; this does not include whitespace characters.

While a lexical scanner for WLP4 must allow WLP4 tokens to be separated by whitespace so that it may distinguish between two consecutive tokens (eg. 42 and 17) whose concatenation would comprise a single token (eg. 4217), a DFA that accepts only input comprising a single token has no such need.

Hint: WLP4 includes keywords (such as int, return, etc) as well as identifiers. When scanning WLP4, two approaches are equally valid. One approach is to distinguish between keywords within the DFA using separate states for each keyword. Another approach is to design a DFA that just recognizes identifiers and keywords the same way, and write additional code outside of the DFA to determine whether the token is a specific keyword or an identifier. For this problem and for Problem 2, you may choose either approach.

Click here to go back to the top of the page.

Problem 2 — 24% (wlp4scan.rkt or wlp4scan.cc)

Write a scanner (lexical analyzer) for the WLP4 programming language. Your scanner should read an entire WLP4 program and identify the tokens, in the order they appear in the input. For each token, your scanner should compute the name of the token and the lexeme (the string of characters making up the token), and print it to standard output, one line per token.

For example, using the example WLP4 program from A4:

//
// WLP4 program with two integer parameters, a and b
//   returns the sum of a and b
// 
int wain(int a, int b) {
   return a + b;   // unhelpful comment about summing a and b
}

The correct output is:

INT int
WAIN wain
LPAREN (
INT int
ID a
COMMA ,
INT int
ID b
RPAREN )
LBRACE {
RETURN return
ID a
PLUS +
ID b
SEMI ;
RBRACE }

If the input cannot be scanned as a sequence of valid tokens (possibly separated by whitespace as detailed in the WLP4 language specification), your program should print an error message to standard error containing the string ERROR in upper case. We recommend that you try to make the error message informative to help with debugging.

You can use the wlp4scan tool to help check the correctness of your solution.

You may wish to modify the A3 starter code (asm.rkt or asm.cc) to solve this problem. Note that the starter code is based upon a DFA that repeatedly recognizes the longest prefix of a string that is a token. You should be able to replace the DFA (which recognizes MIPS assembly language tokens) with one that recognizes WLP4 tokens (see Problem 1 above).

For this problem only, all of the Marmoset tests are public tests. There are no release tests or blind tests.

Click here to go back to the top of the page.

Part II: Context-Free Grammars

For most of the remaining questions you will create context-free grammars and derivations represented as CFG files, each of which is a textual representation of a CFG optionally followed by a textual representation of several derivations. The student.cs environment provides a tool cs241.cfgcheck which checks the validity of the CFG and derivations.

Problem 3 — 7% (a5p3.cfg)

Create a file representing the CFG with the following production rules:

S → BOF A EOF
A → x
A → A x

Click here to go back to the top of the page.

Problem 4 — 7% (a5p4.cfg)

Add derivations for the following strings to the CFG file you created for the previous question:

BOF x EOF
BOF x x EOF
BOF x x x EOF

Click here to go back to the top of the page.

Problem 5 — 7% (a5p5.cfg)

The following production rules yield an ambiguous CFG:

S → BOF A EOF
A → x
A → A x A

Find a string for which the derivation is ambiguous. Compose a CFG file representing the CFG and two different derivations for your string.

Click here to go back to the top of the page.

Problem 6 — 7% (a5p6.cfg)

Add a derivation to this context-free grammar for the following sequence of symbols: BOF id EOF.

Click here to go back to the top of the page.

Problem 7 — 7% (a5p7.cfg)

Add a derivation to this context-free grammar for the following sequence of symbols: BOF ( id - ( id - id ) ) - id EOF.

Click here to go back to the top of the page.

Problem 8 — 7% (a5p8.cfg)

Add a derivation to this augmented grammar for WLP4 for the following (augmented) sequence of WLP4 tokens:

BOF INT WAIN LPAREN INT ID COMMA INT ID RPAREN LBRACE RETURN ID PLUS ID STAR ID SEMI RBRACE EOF

Click here to go back to the top of the page.

Problem 9 — 19% (galaxy.rkt or galaxy.cc)

Write a Racket or C++ program that reads a CFG file that contains the same grammar used for Problems 6 and 7 above with the example derivation replaced by another valid derivation for the same grammar.

Assume that each occurrence of the symbol id represents the value 42, that each occurrence of - represents subtraction, and that operations associate from left to right except where overridden by parentheses. Output a single line giving the value of the derived expression.

The correct output for the sample grammar and derivation is a single line containing -42.

Notes:

Click here to go back to the top of the page.