Macquarie University, Department of Computing

COMP332 Compiler Lecture Notes: RISC Architecture

The Obr compiler generates code to run on a simple RISC architecture that was originally described by Niclaus Wirth in his book "Theory and Techniques of Compiler Constuction" (which may be downloaded from http://www-old.oberon.ethz.ch/WirthPubl/CBEAll.pdf). This, in turn, was designed to be close in spirit the MIPS architecture which was originally designed by John L. Hennessey of Stamford University.

These notes provide a brief overview of RISC architecture sufficient for understanding the code generated by the Obr compiler used in COMP332 exercises and assignments. Further detailed information about the RISC instruction set and machine execution may be gleaned by browsing the files RISC.scala and RISCISA.scala in this code bundle.

Memory and data types

RISC is a 32-bit architecture in which all general purpose registers, memory addresses and machine instructions are 32 bits wide. In the subset provided by our emulator code RISC instructions support the direct manipulation of 32-bit integers stored in twos-complement form. Furthermore, it supports the direct addressing of a memory space of upto 2^32 bytes. While addresses in the RISC architecture refer to individual bytes, all 32-bit values can only be directly accessed if they are aligned on 4 byte boundaries.

Registers

The RISC architecture provides access to 32 general purpose 32-bit integer registers named $0-$31. While most of these registers may be used for any task, some of them are allocated to special functions:

Note: the frame associated with a procedure activation is a block of memory allocated on the machine stack in which the procedure stores its local variables, temporaries, return address and so forth.

Instruction set

For our purposes the RISC architecture has instructions that can be categorised as follows:

RISC instructions themselves come in four different formats, each of which packs the instruction and information about its operands into a single 32-bit word. These formats are:

Notice here that the constant values contained in instructions cannot be 32-bit values, since all of the information for each complete instruction (its identity, register operand information and any constant values) must be packed into a single 32-bit word. In most cases these 16- and 26-bit constants are sign extended to 32-bits, although when given as operands to bitwise operations they are simply converted to 32-bit values by padding them on the left (high bits) with 0s.

Load/store instructions are the only instructions that access memory. The ldw and stw instructions load and store word/single quantities from/to integer registers.

ldw reg1, reg2, disp   ! Load word from address reg2 + disp into reg1.
stw reg1, reg2, disp   ! Store contents of reg1 to address reg2 + disp.
The RISC architecture also provides byte load/store ldb and stb and push/pop instructions push and pop (which maintain a stack discipline by appropriately updating the address stored in reg2). However, as things stand the Obr compiler does not make use of these instructions.

Here are some loading and storing examples:

ldw $2, $1, 0      ! Load from the address stored in $1 into register $2
ldw $2, $1, 8      ! Load the word from an 8 byte offset from the
                   ! address stored in $1 into register $2.
stw $31, $0, 32    ! Store the value held in the link pointer (which is
                   ! probably a return address) in memory at absolute
                   ! address 32 (because register $0 always contains 0).
Most arithmetic operations come in type 0 (operating on three registers) and type 1 (operating on two registers and a constant) versions. In each case the first operand specifies the destination register in which a calculated result will be placed and the last two operands specify the sources of the data upon which the operation will act. Notice here that there is noting to prevent an instruction from writing its result back into one of the registers it is using as a source.

Here is a complete list of the 32-bit arithmetic and bitwise operations provided by our version of the RISC architecture:

add dreg, reg1, reg2   ! add values in registers reg1 and reg2 and place result in dreg
addi dreg, reg1, im    ! add value in register reg1 to constant value im and place result in dreg
sub dreg, reg1, reg2   ! subtract value in register reg2 from that in reg1 and place result in dreg
subi dreg, reg1, im    ! subtract constant value im from value in register reg1 and place result in dreg
mul dreg, reg1, reg2   ! multiply values in registers reg1 and reg2 and place result in dreg
muli dreg, reg1, im    ! multiply value in register reg1 to constant value im and place result in dreg
div dreg, reg1, reg2   ! divide value in register reg1 by that in reg2 and place result in dreg
divi dreg, reg1, im    ! divide value in register reg1 by constant value im and place result in dreg
mod dreg, reg1, reg2   ! take remainder of division of value in register reg1 by that in reg2 and place result in dreg
modi dreg, reg1, im    ! take remainder of division of value in register reg1 byconstant value im and place result in dreg
and dreg, reg1, reg2   ! bitwise and values in registers reg1 and reg2 and place result in dreg
andi dreg, reg1, im    ! bitwise and value in register reg1 with constant value im and place result in dreg
or dreg, reg1, reg2    ! bitwise or values in registers reg1 and reg2 and place result in dreg
ori dreg, reg1, im     ! bitwise or value in register reg1 with constant value im and place result in dreg
xor dreg, reg1, reg2   ! bitwise exclusive or values in registers reg1 and reg2 and place result in dreg
xori dreg, reg1, im    ! bitwise exclusive or value in register reg1 with constant value im and place result in dreg
This architecture also combines the common operations of bit shifting, negation and the copying of values from one register to another into a single family of "move" instructions:
mov dreg reg1 reg2     ! shift the value in reg2 the number of bits specified by the value
                       ! in reg1 and put the result in dreg
movi dreg reg1 im      ! shift the constant value im the number of bits specified by the value
                       ! in reg1 and put the result in dreg
mvn dreg reg1 reg2     ! shift the value in reg2 the number of bits specified by the value
                       ! in reg1, negate that shifted number and put the result in dreg
mvni dreg reg1 im      ! shift the constant value im the number of bits specified by the value
                       ! in reg2, negate that shifted number and put the result in dreg
Note: in each of these instructions, if register reg1 contains the positive number n then the bit shifter shifts the value in the other source operand n bits to the left. On the other hand, if register reg1 contains the negative number -n then the bit shifter shifts the value in the other source operand n bits to the right.

In the Obr compiler we mainly use the move instructions to simply copy numbers from one register to another

mov $1, $0, $2         ! copies the value from register $2 into register $1 without shifting
                       ! (because register $0 always contains 0)
or to negate the value stored in a register:
mvn $12, $0, $12       ! negate the value stored in register $12
However, we also combine it with the or instruction in order to load a 32-bit constant values into a register. We are required to do this because the 3rd (constant) operand of type 1 and type 2 instructions are only 16-bit numbers. So if we wish to load a 32-bit constant we must first split it into two 16-bit pieces and load these separately into the high (using the mov instruction) and low (using the or instruction) 16-bit segments of the destination register. For example, if we have a 32-bit constant int we divide it into two 16-bit portions int_hi and int_lo and load it into register $10 using the following code:
movi $10, $0, 16       ! Load the constant 16 into register $10 - this constant
                       ! specifies the number of bits to shift int_hi as it is loaded
movi $10, $10, int_hi  ! Shift int_hi 16 bits to the left and load the result into register $10.
                       ! In other words, this loads int_hi into the high 16 bits of register $10.
ori $10, $10, int_lo   ! Bitwise or int_lo onto the low 16-bits of the contents of register $10.
Control transfer instructions alter the program counter. They are used to implement source-level control statements and procedure calls. All branch instructions are type 3 instructions, taking a single 26-bit operand which is interpreted as a relative instruction offset. So, for example. the instruction br -10 will cause execution to jump to the instruction 10 instructions prior to the current one. In RISC assembly code places that we wish to jump to are marked by symbolic labels and our "assembler" automatically turns branches to these labels into appropriate offset values.

The br (branch) instruction is an unconditional transfer of control.

br label1              ! Transfer control to label label1.
Subroutine / procedure calling is implemented bsr instruction, which saves the value of the program counter + 1 into the link pointer (register $31) before the actual branch action is performed. Typically, on entry to a procedure the contents of the link pointer is written into the frame on the call stack of this new procedure activation. On the other hand the ret instruction takes a single register operand and makes a jump to the address whose offset from the current instruction is given by the content of that register. So we may effect a procedure return by loading the stored return address from the frame of the current procedure activation into the link pointer and executing the instruction ret $31.

As a special case, if the register referred to by a ret instruction contains the value 0 then execution of that instruction will cause the machine to halt completely. So the following instruction is used by the Obr compiler to halt the execution of a program:

ret $0                ! halts the machine, since register $0 is guaranteed to contain 0
Conditional transfers are implemented with the aid of a set of two condition codes called Z for zero and N for negative. These are set by the following instructions:
cmp reg1, reg2     ! Subtract the value of reg2 from that of reg1
                   !    if the result is 0 then set Z to 1 else set it to 0
                   !    if the result is negative then set N to 1 else set it to 0
cmp reg1, im       ! Similar to cmp, but subtracts the constant value im from that in
                   ! register reg1
Once the condition codes have been set, branch instructions can be used to transfer control if particular condition code bits are set. These conditional branch instructions are:
beq label          ! branch if Z = 1
bne label          ! branch if Z = 0
blt label          ! branch if N = 1
bge label          ! branch if N = 0
ble label          ! branch if N = 1 or Z = 1
bgt label          ! branch if N = 0 and Z = 0

Terminal IO instructions are a slightly unusual feature of this architecture. Usually interations with hardware external to a processor would be achieved via some kind of memory mapped IO mechanism, where registers of the IO hardware would be mapped to memory addresses which may then be written to and read from by the processor. However, for simplicity's sake the RISC architecture provides some higher level instructions for writing and reading integers to and from the terminal:

rd reg             ! read a signed 32-bit integer in decimal format from
                   ! the terminal and store it in register reg.
wrd reg            ! write the contents of register reg to the terminal
                   ! as a signed decimal number.
wrh reg            ! write the contents of register reg to the terminal
                   ! as an (unsigned) hexadecimal number.
wrl                ! write a newline character to the terminal.

Assembler syntax

The RISC assembler syntax is fairly conventional. Instruction syntaxes are illustrated by the preceding examples. Comments extend from an exclamation mark to the end of line. Labels are declared by following them by a colon, as in:
label2:
    add $1, $1, $1            ! double value held in register $1
    cmpi $1, 256              ! compare value in register $1 with constant 256
    ble label2                ! branch back if comparison showed that value in $1 is <= 256
As an example, when the Obr compiler compiles the Obr program
PROGRAM X (x : INTEGER) : INTEGER;
VAR
    i : INTEGER;
    sum : INTEGER;
BEGIN
    sum := 0;
    FOR i := 1 TO 10 DO
        sum := sum + i;
    END
    RETURN sum;
END X.
it outputs the following RISC assembly code
    ! Prologue
    movi $27, $0, 0
    ! StW(Local(0),Read())
    rd $1
    stw $1, $27, 0
    ! StW(Local(8),IntDatum(0))
    movi $1, $0, 0
    stw $1, $27, 8
    ! StW(Local(4),IntDatum(1))
    movi $1, $0, 1
    stw $1, $27, 4
    ! StW(Local(20),IntDatum(10))
    movi $2, $0, 10
    stw $2, $27, 20
    ! Bne(CmpgtW(LdW(Local(4)),LdW(Local(20))),Label(3))
    ldw $1, $27, 4
    ldw $2, $27, 20
    cmp $1, $2
    movi $1, $0, 1
    bgt label6
    movi $1, $0, 0
label6:
    cmpi $1, 0
    bne label3
    ! Jmp(Label(2))
    br label2
    ! LabelDef(Label(4))
label4:
    ! StW(Local(4),AddW(LdW(Local(4)),IntDatum(1)))
    ldw $1, $27, 4
    movi $2, $0, 1
    add $1, $1, $2
    stw $1, $27, 4
    ! LabelDef(Label(2))
label2:
    ! StW(Local(8),AddW(LdW(Local(8)),LdW(Local(4))))
    ldw $1, $27, 8
    ldw $2, $27, 4
    add $1, $1, $2
    stw $1, $27, 8
    ! Bne(CmpltW(LdW(Local(4)),LdW(Local(20))),Label(4))
    ldw $1, $27, 4
    ldw $2, $27, 20
    cmp $1, $2
    movi $1, $0, 1
    blt label7
    movi $1, $0, 0
label7:
    cmpi $1, 0
    bne label4
    ! LabelDef(Label(3))
label3:
    ! Write(LdW(Local(8)))
    ldw $1, $27, 8
    wrd $1
    wrl
    ! Ret()
    br label5
    ! Epilogue
label5:
    ret $0

Tony Sloane and Dominic Verity
Last Modified: Tue Oct 05 17:00:54 +1100 2010
Copyright (c) 2010-2011 by Macquarie University. All rights reserved.