Lacs Programming Language Specification

Lacs is a simple programming language. In Assignments 7-11, we will write a compiler from Lacs to MIPS machine language, using the code generation code that we have already implemented in Assignments 1-6. The main features of Lacs are:

Lexical specification

A Lacs program is a sequence of tokens. Every valid token is one of the following:

Tokens that contain letters are case-sensitive; for example, Int is an INT token, while int is not.

A Lacs program must not contain a pair of consecutive tokens that both come from one of the following sets:

A Lacs program may contain a pair of such tokens if they are separated by at least one other token. In particular, this other token could be a WHITESPACE or COMMENT token.

WHITESPACE and COMMENT tokens are irrelevant to the syntactic specification of Lacs and to the meaning of a valid Lacs program. Therefore, they should be removed from the sequence of tokens after a Lacs program has been determined to satisfy the lexical specification.

Syntactic specification

A context-free grammar for a valid Lacs program is:

defdefs -> defdef defdefs
defdefs -> defdef
defdef -> DEF ID LPAREN parmsopt RPAREN COLON type BECOMES LBRACE vardefsopt defdefsopt expras RBRACE
parmsopt -> parms
parmsopt -> 
parms -> vardef COMMA parms
parms -> vardef
vardef -> ID COLON type
type -> INT
type -> LPAREN typesopt RPAREN ARROW type
typesopt -> types
typesopt -> 
types -> type COMMA types
types -> type
vardefsopt -> VAR vardef SEMI vardefsopt
vardefsopt -> 
defdefsopt -> defdefs
defdefsopt -> 
expras -> expra SEMI expras
expras -> expra
expra -> ID BECOMES expr
expra -> expr
expr -> IF LPAREN test RPAREN LBRACE expras RBRACE ELSE LBRACE expras RBRACE
expr -> term
expr -> expr PLUS term
expr -> expr MINUS term
term -> factor
term -> term STAR factor
term -> term SLASH factor
term -> term PCT factor
factor -> ID
factor -> NUM
factor -> LPAREN expr RPAREN
factor -> factor LPAREN argsopt RPAREN
test -> expr NE expr
test -> expr LT expr
test -> expr LE expr
test -> expr GE expr
test -> expr GT expr
test -> expr EQ expr
argsopt -> args
argsopt -> 
args -> expr COMMA args
args -> expr

Context-sensitive specification

A symbolic representation of the context-sensitive representation in the form of type inference rules is also available.

Each sequence derived from defdef is a declaration of a procedure. Each sequence derived from vardef is a declaration of a variable. The name of the procedure or variable is the lexeme corresponding to the child ID of the defdef or vardef. Names are case sensitive; for example, FOO and foo are distinct names.

Each procedure is associated with exactly one scope. The declaration subtrees of a procedure are the sequences derived from the children parmsopt, vardefsopt, and defdefsopt of the declaration of p. A procedure q whose declaration is in the sequences derived from the declaration subtrees of p is said to be nested in p. The scope of a procedure p contains all variables and procedures whose declarations are in the sequences derived from the declaration subtrees of p, except for those variables and procedures whose declarations are in the subsequences derived from the declaration subtrees of some other procedure q that is nested in p. A scope declares the names of all of the variables and procedures that it contains.

A scope must not contain two or more declarations with the same name.

A name is said to be used in procedure q when it appears as an ID in a sequence derived from the child expras of the declaration of q. When a name n is used in procedure q, a declaring scope of the use of n is a scope that declares n, and is either the scope of q or of some other procedure p in which q is nested. Every use of a name must have some declaring scope. The closest declaring scope of a use of n is the scope of some procedure p whose scope is a declaring scope of the use of n, and such that no other procedure q nested in p has a scope that is a declaring scope of the use of n. Each use of a name denotes the variable or procedure with that name declared in the closest declaring scope of the use.

A type is a sequence derived from type. A type is called a procedure type if it is derived from LPAREN typesopt RPAREN ARROW type. The sequence of types derived from the child typesopt of the procedure type is called the parameter types. The child type of the procedure type is called the return type. The type of a variable is the sequence derived from the child type of the vardef that declares the variable. The type of a procedure p is a procedure type whose parameter types are the types of the variables declared in the sequence derived from the child parmsopt of the declaration of p, and whose return type is the sequence derived from the child type of the declaration of p.

The first procedure declared in a Lacs program is the main procedure. The type of the main procedure must be (Int, Int) => Int.

Semantics

Any Lacs program that obeys the lexical, context-free, and context-sensitive specifications above is also a sequence of valid Scala functions, with one exception. Scala requires every variable to have an initializer (e.g. var x: Int = 42;), but Lacs prohibits such initializers (e.g. var x: Int;). To transform a Lacs program into Scala functions, add the initializer 0 for every variable of type Int, and the initializer null for every variable of procedure type.

The meaning of a Lacs program is defined to be identical to the meaning of the corresponding Scala functions. A Lacs program is given two integers as input (in registers 1 and 2), and produces an integer as output (in register 3). The output value computed by the Lacs program must be the same as the return value when the first of the Scala functions is called with the same two input integers. If the execution of the Scala function aborts with an exception, then the meaning of the Lacs program is undefined.