CS 241 -- Fall 2011 -- Assignment 8

For problems A8P1 and A8P2 you are to construct reversed rightmost derivations for two problems from assignment 7. The format of your solution will be that of a CFG-R file which is identical to that of a CFG file except that derivations in CFG-R files are not indented, and represent reversed rightmost rather than forward leftmost derivations. We supply the source code for a tool (CFGrl.java or cfgrl.ss but unfortunately not yet in C++) that reads a CFG-R file containing a single derivation and outputs the equivalent CFG file. You may use this tool to verify your submissions to A8P1 and A8P2, and you may use this tool as a model for your implementations for A8P3 and A8P4.

For problem A8P3 you are to write an LR(1) parser which reads a representation of the DFA implementing the LR(1) oracle, along with an input sequence to be parsed. For problem A8P4 you are to embed an LR(1) parser for WLPP into a Java, Scheme or C++ program in much the same way that you embedded your scanner for assignment 6.

Due date: Tuesday, November 15, 2011 at 5:00 p.m.

A8P1 - 7 marks (a8p1.cfg-r)

Prepare a .cfg-r file that solves Assignment 7, Problem 5

A8P2 - 7 marks (a8p2.cfg-r)

Prepare a .cfg-r file that solves Assignment 7, Problem 6

A8P3 - 30 marks (LR.java or lr.ss or lr.cc)

Write a Java, Scheme, or C++ program that reads an LR1 file representing a context-free grammar, an LR(1) machine, and a sequence to be recognized. If the sequence is in the language, output the unindented reversed rightmost derivation, as defined in the cfg-r file format. If the input is not recognized, output "ERROR at k", followed by a newline character, to standard error, where k is one greater than the number of tokens in the longest correct prefix.

For the sample lr1 file as input, the correct output is:

term id
expr term
term id
expr term
term ( expr )
expr expr - term
term id
expr expr - term
S BOF expr EOF
If you replace the last line of this file by
BOF id - id ) - id EOF
the correct output (to standard error) is
ERROR at 5
You may also test your program with the WLPP grammar and parse tables, in .lr1 format after adding a sequence to be recognized.

A8P4 - 16 marks (WLPPParse.java or wlppparse.ss or wlppparse.cc)

Write a WLPP parser encapsulated as a Java, Scheme, or C++ program. The input to your parser is a file representing the sequence of tokens in a WLPP program, as output by the scanner specified in A6P4.

That is, each token is represented by a line containing the name of the token, followed by a space, followed by the string recognized as the token in the WLPP source program. Note: Unlike the input of A8P3, the input to your parser in this problem will not have lines containing BOF and EOF at the beginning and end.

If the input represents a syntactically valid WLPP program, the output from your program should be a .wlppi file. The format of a .wlppi represents the derivation and also the recognized token strings.

If the input does not represent a syntactically valid WLPP program, output "ERROR at k" to standard error where k is one greater the number of tokens in the longest correct prefix of the input (that is, not counting BOF).

The student.cs undergraduate enviroment includes two commands to help you test your parser:

Extra fun

Jslr -- a Simple LR(1) [SLR(1)] parser generator is a program that generates the LR(1) DFA for any SLR(1) grammar. You can use it to generate parsers for your own grammars, or to generate additional test inputs for A8P3. The input to Jslr.java is a .cfg file. If the .cfg file represents an SLR(1) grammar, the corresponding .lr1 file is output; otherwise an error message is produced.