MIPS Assembly Language (CS 241 Dialect)

Version 20170517.0

Introduction

MIPS Assembly Language is a textual human-readable representation of MIPS Machine Language. Assembly language may be translated into machine language by hand, as in assignment 1, or using a program called an assembler, as in assignment 2. In assignments 3 and 4 you will write an assembler in Racket or C++.

MIPS Machine Language

A MIPS program consists of a sequence of 32-bit instruction words, whose meanings and encoding are described in the MIPS Reference Sheet. The location of the first word is defined to be 0; the location of the next word is 4; and so on. Traditionally, locations are given in hexadecimal notation. So, for example, the location of the 11th word would be 0x28 (the hexadecimal representation of 40).

The MIPS CPU can interpret only valid MIPS instruction words; however these words, being encoded in binary, are not easily manipulated by humans. For this reason we define an assembly language, which is easier for humans to understand and manipulate. There is a direct correspondence between assembly language statements and machine language instructions.

MIPS Assembly Language

A MIPS Assembly program is a Unix text file consisting of a number of lines. Each line has the general format
    labels instruction comment
Each of these components -- labels, instruction, and comment is optional; a particular line may have all three, any two, any one, or none at all.

Every line with an instruction specifies a corresponding machine language instruction word. Lines without an instruction are called null lines and do not specify an instruction word. That is, an assembly language program with n non-null lines specifies a machine language program with n words, in 1-1 ordered correspondence.

Labels

The labels component lists zero or more labels, each followed by a colon (:). A label is a string of alphanumeric characters, the first of which must be a letter of the alphabet. For example, fred123x is a valid label but 123fred is not.

A label appearing in the labels component is said to be defined; a particular label may be defined at most once in an assembly language program. Labels are case-sensitive; that is, fred and Fred are distinct labels.

The location of a line in an assembly language program is 4n, where n is the number of non-null lines preceding it. The first line therefore has location 0. If the first line is non-null, the second line has location 4. On the other hand, if the first line is null, the location of the second line is also 0. Note that the location of any non-null line is exactly the same as the location of the machine language word that it specifies. And all null lines immediately preceding it have the same location.

The value of a label is defined to be the location of the line on which it is defined.

Comments

A comment is any sequence of characters beginning with a semicolon (;) and ending with the end-of-line character that terminates the line. Comments have meaning only to the reader; they do not contribute to the specification of the equivalent machine language program.

Instructions

An instruction takes the form of an opcode followed by one or more operands. The opcode may be add, sub, mult, multu, div, divu, mfhi, mflo, lis, lw, sw, slt, sltu, beq, bne, jr, jalr, or the pseudo-opcode .word. The number, allowed formats, and meaning of operands depend on the opcode. Operands are, unless otherwise specified, separated by commas (,). An operand may be

Operand Format - add, sub, slt, sltu

These opcodes all take three register operands; for example
   add $1, $2, $3
The first operand is $d (the destination register) as specified in the MIPS Reference Sheet. The second and third operands are $s and $t respectively. So in the example above we have d=1, s=2, and t=3, and the 5-bit representations of these values are encoded in the corresponding machine instruction.

Operand Format - mult, multu, div, divu

These opcodes take two register operands corresponding to $s and $t. For example
   mult $4, $5
specifies that s=4 and t=5 are encoded in the instruction word. $d is not used and is encoded as 0 in the instruction word.

Operand Format - mfhi, mflo, lis

These opcodes have a single register operand, $d.

Operand Format - lw, sw

These opcodes have two register operands, $s and $t, and in addition an immediate operand, i. The general format is
   opcode $t, i($s)  
For example,
   lw $4, 400($7)
$s and $t are registers, and i may be an unsigned decimal integer, a negative decimal integer, or a hexadecimal number. i is encoded as a 16-bit two's complement integer. If specified in decimal, i must be in the range -32768 through 32767. If specified as hexadecimal i must not exceed 0xffff.

Operand Format - beq, bne

These operations take three operands: registers $s and $t, and an immediate operand i. i may be specified as an unsigned decimal integer, a negative decimal integer, a hexadecimal number, or a label. If i is a decimal or hexadecimal number, i is encoded as a 16-bit two's complement number; i must therefore be in the range -32768 through 32767 if i is decimal, and must not exceed 0xffff if i is hexadecimal.

If i is a label, the value (i-L-4)/4 is encoded where i is the value of the label and L is the location of the beq or bne instruction. (i-L-4)/4 must be in the range -32768 through 32767.

Opearand Format -- jr, jalr

These operations have one register operand, $s.

Operand Format .word

.word is not a true opcode as it does not necessarily encode a MIPS instruction at all. .word has one operand i which is a 32-bit signed or unsigned number. If i is a decimal number, it must be in the range -2^31 through 2^32-1 (that is, the union of the ranges for signed and unsigned 32-bit integers). If i is hexadecimal, it must not exceed 0xffffffff. If a label is used for i, its value is encoded as a 32-bit integer.