Department of Computing, Macquarie University

COMP332 Obr Translation into RISC Code

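This document describes the RISC target program tree used by the Obr compiler, how that tree is encoded as RISC code, and how Obr source trees are mapped to it.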

The document concludes with examples of translations of a couple of simple Obr programs.

The code that supports the tree structure and encoding described in this document can be found in RISCTree.scala and Encoder.scala in the Obr compiler. The mapping from Obr trees to RISC trees can be found in Transformation.scala.

The RISC Target Program Tree

All of the information that the translation task of the compiler provides about the target program is embodied in the target program tree. If a particular item of information cannot be accessed via this tree, then it cannot be obtained at all. Information is encoded in the "shape" of the tree and in values stored at the leaves.

This section defines the set of possible target program trees by defining all of the concepts and constructs of the target language.

RISC Concepts

Datum

A datum is a construct yielding an explicit value that can be stored or used as an operand for other operations. The encoder uses the attribute reg to associate a local machine register with each Datum to provide storage for the Datum's value; hence the transformation phase does not have to perform register allocation.

Item

An item is a construct that does not yield a value.

RISC Constructs

A RISCProg node represents a complete RISC program. This RISCProg node is the root of the target program tree, and never appears in any other position.

The following productions summarise the constructs of the RISC by giving the structure of the subtree for each construct.

RISCProg:        RISCNode: Item+

Beq:             Item: Datum Label
Bne:             Item: Datum Label
Jmp:             Item: Label
LabelDef:        Item: Label
Read:            Item: Address
Ret:             Item
StW:             Item: Address Datum
Write:           Item: Datum

AddW:            Datum: Datum Datum
Cond:            Datum: Datum Datum Datum
CmpeqW:          Datum: Datum Datum
CmpneW:          Datum: Datum Datum
CmpgtW:          Datum: Datum Datum
CmpltW:          Datum: Datum Datum
DivW:            Datum: Datum Datum
IntDatum:        Datum: Int
LdW:             Datum: Address
MulW:            Datum: Datum Datum
NegW:            Datum: Datum
Not:             Datum: Datum
RemW:            Datum: Datum Datum
SubW:            Datum: Datum Datum
SequenceDatum:   Datum: Item+ Datum

Local:           Address: Int
Indexed:         Address: Local Datum

The "W" in some of the node names means that those operations operate on word-sized values (four bytes on the RISC), which in this compiler are used to implement both integer and Boolean values.
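The productions above can be read as an algebraic data type. The following is a minimal sketch of how a subset of them might be written as Scala case classes. It is not taken from RISCTree.scala, whose definitions may differ: the object name RISCTreeSketch, the field names, and the representation of a Label as a wrapped integer (suggested by the example trees at the end of this document) are all assumptions.

```scala
object RISCTreeSketch {
  // Root type of all target tree nodes.
  sealed trait RISCNode

  // Assumption: a label is identified by an integer.
  case class Label(num: Int)

  // Address constructs.
  sealed trait Address extends RISCNode
  case class Local(offset: Int) extends Address
  case class Indexed(base: Local, index: Datum) extends Address

  // Datum constructs: each yields a value.
  sealed trait Datum extends RISCNode
  case class IntDatum(value: Int) extends Datum
  case class LdW(addr: Address) extends Datum
  case class AddW(left: Datum, right: Datum) extends Datum
  case class SubW(left: Datum, right: Datum) extends Datum
  case class CmpgtW(left: Datum, right: Datum) extends Datum
  case class Cond(test: Datum, ifNonZero: Datum, ifZero: Datum) extends Datum
  case class SequenceDatum(items: List[Item], result: Datum) extends Datum

  // Item constructs: none of these yields a value.
  sealed trait Item extends RISCNode
  case class Beq(cond: Datum, dest: Label) extends Item
  case class Bne(cond: Datum, dest: Label) extends Item
  case class Jmp(dest: Label) extends Item
  case class LabelDef(label: Label) extends Item
  case class Ret() extends Item
  case class StW(addr: Address, value: Datum) extends Item
  case class Write(value: Datum) extends Item

  // Root of the target program tree.
  case class RISCProg(items: List[Item]) extends RISCNode

  // Subtract the word at offset 4 from the word at offset 0,
  // store the result at offset 0, write it and return.
  val example: RISCProg = RISCProg(List(
    StW(Local(0), SubW(LdW(Local(0)), LdW(Local(4)))),
    Write(LdW(Local(0))),
    Ret()))
}
```

Making the traits sealed lets the Scala compiler check that a pattern match over Item or Datum covers every construct, which is useful when writing an encoder by cases.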

The following subsections describe the constructs of the table. Some of those constructs represent specific RISC instructions and others represent collections of instructions that involve related decisions about operand access.

RISCProg

RISCProg:  RISCNode: Item+

The encoding of a RISCProg construct is very simple: we encode each of the items in its body, concatenate the resulting RISC code sequences, and then add prologue (initialisation) and epilogue (termination) code. Currently the prologue simply initialises register $27 to point to the memory segment that will be used to store the values of global variables and temporaries. The epilogue begins with a standard label so that the Ret construct (see below) can transfer control to it. It then terminates the program by executing a ret $0 instruction.

Branches

Beq:       Item: Datum Label
Bne:       Item: Datum Label
Jmp:       Item: Label

A branch (Beq or Bne) is encoded as the encoding of its Datum component followed by a test and branch to the Label component. A Beq branches if the value of the Datum is equal to zero, and a Bne branches if it is not equal to zero.

A Jmp does an unconditional branch to its Label component.

LabelDef

LabelDef:  Item: Label

A LabelDef construct represents a definition of a label and is encoded by emitting that definition in the appropriate assembler syntax.

Read and Write

Read:      Item: Address
Write:     Item: Datum

These constructs are encoded using the corresponding terminal IO RISC instructions rd, wrd and wrl. In the case of Read the value read is stored in the location given by the Address component. In the case of Write the value written is that given by the Datum component which is encoded first.

Ret

Ret:       Item

A Ret construct is encoded by an unconditional jump to a label at the end of the code comprising the program (i.e., to the beginning of the epilogue). This encoding ensures that a return from any part of the program will complete necessary processing before exiting the program.
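The way the prologue, the program body, the epilogue and Ret fit together can be sketched as follows. This is only an illustration that works on lists of assembly lines. The names ProgEncodingSketch, exitLabel, encodeProg and retCode are hypothetical, and the label name label6 is simply the one that appears in the listings at the end of this document.

```scala
object ProgEncodingSketch {
  // Assumption: name of the standard label that starts the epilogue.
  val exitLabel = "label6"

  // Wrap the already-encoded body in prologue and epilogue code.
  def encodeProg(bodyCode: List[String]): List[String] =
    List("    ! Prologue", "    movi $27, $0, 0") ++
      bodyCode ++
      List("    ! Epilogue", exitLabel + ":", "    ret $0")

  // A Ret construct is an unconditional branch to the epilogue label.
  val retCode: List[String] = List("    br " + exitLabel)
}
```

Because every Ret branches to the same label, any clean-up added to the epilogue in future would automatically apply to every return in the program.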

StW

StW:       Item: Address Datum

A StW construct is encoded by encoding the Datum component followed by an instruction to store the value of the Datum into the given address.

Arithmetic operations: AddW, DivW, MulW, NegW, RemW, SubW

AddW:      Datum: Datum Datum
DivW:      Datum: Datum Datum
MulW:      Datum: Datum Datum
NegW:      Datum: Datum
RemW:      Datum: Datum Datum
SubW:      Datum: Datum Datum

Most of the arithmetic operations are encoded by encoding their Datum component(s) followed by a single instruction that performs the appropriate operation. The NegW operation is implemented by subtracting the given operand from 0 (the value of register $0).

IntDatum

IntDatum:  Datum: Int

An IntDatum construct is encoded as a move of the integer value into the location required by the Datum.

Cond and Not

Cond:      Datum: Datum Datum Datum
Not:       Datum: Datum

A Cond construct is encoded by encoding its first Datum component, followed by a sequence of instructions that evaluate the second Datum if the first Datum is non-zero, or evaluate the third Datum if the first Datum is zero. In either case the result value will be left in the location required by the Cond Datum itself. The Not construct is encoded as if it were converted to a corresponding Cond tree under the following translation:

Not (d) -> Cond (d, IntDatum(0), IntDatum(1))
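To make the rewriting concrete, here is a minimal sketch over a three-construct fragment of the Datum type, together with a small reference evaluator for checking the result. The names CondNotSketch, desugar and eval are hypothetical. The real encoder emits code rather than evaluating trees, so this shows only the intended meaning.

```scala
object CondNotSketch {
  sealed trait Datum
  case class IntDatum(value: Int) extends Datum
  case class Cond(test: Datum, ifNonZero: Datum, ifZero: Datum) extends Datum
  case class Not(operand: Datum) extends Datum

  // Rewrite every Not into the equivalent Cond tree.
  def desugar(d: Datum): Datum = d match {
    case Not(e)        => Cond(desugar(e), IntDatum(0), IntDatum(1))
    case Cond(t, a, b) => Cond(desugar(t), desugar(a), desugar(b))
    case i: IntDatum   => i
  }

  // Reference semantics: only the selected branch of a Cond is evaluated.
  def eval(d: Datum): Int = d match {
    case IntDatum(v)   => v
    case Cond(t, a, b) => if (eval(t) != 0) eval(a) else eval(b)
    case n: Not        => eval(desugar(n))
  }
}
```

Note that any non-zero operand, not just 1, is complemented to 0 under this translation.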

Comparisons: CmpeqW, CmpneW, CmpgtW, CmpltW

CmpeqW:    Datum: Datum Datum
CmpneW:    Datum: Datum Datum
CmpgtW:    Datum: Datum Datum
CmpltW:    Datum: Datum Datum

The comparison constructs CmpeqW, CmpgtW, CmpltW, and CmpneW are encoded as the encoding of their operands, followed by a comparison instruction, followed by moves and conditional branches as appropriate to establish the result value of 0 or 1 in the register associated with the given comparison Datum.
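The instruction pattern for a comparison can be sketched as a function that produces assembly lines. The shape of the sequence (compare, assume the result is 1, branch over a move of 0) is copied from the listings at the end of this document. The function name encodeCompare, its parameters and the helper reg are hypothetical, and the operands are assumed to have been encoded into the given registers already.

```scala
object CompareEncodingSketch {
  // Render a register number in assembler syntax.
  private def reg(n: Int): String = "$" + n

  // branchOp is the conditional branch taken when the comparison holds,
  // e.g. "bgt" for CmpgtW or "bne" for CmpneW. skip is a fresh label.
  def encodeCompare(branchOp: String, result: Int, left: Int, right: Int,
                    skip: String): List[String] =
    List(
      s"    cmp ${reg(left)}, ${reg(right)}",
      s"    movi ${reg(result)}, ${reg(0)}, 1", // assume the comparison holds
      s"    $branchOp $skip",                   // if it does, keep the 1
      s"    movi ${reg(result)}, ${reg(0)}, 0", // otherwise the result is 0
      s"$skip:"
    )
}
```

For example, encodeCompare("bgt", 1, 1, 2, "label7") yields the five lines that follow the two loads in the encoding of the CmpgtW in the GCD listing.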

LdW

LdW:       Datum: Address

A LdW construct is encoded as a load of a word value from the location specified by its Address component into the register associated with this LdW.

Addresses: Local, Indexed

Local:     Address: Int
Indexed:   Address: Local Datum

A Local address represents a word-sized storage location in the main block of memory that is accessible to an Obr program. Its Int child specifies the offset in bytes from the start of the memory block at which the word is located.

An Indexed address is an address that is computed as a byte offset from a local address. The offset is given by a computation expressed as a Datum.

When an address is used in another construct (i.e., an LdW or an StW) it is first encoded, then used as an operand in the load or store. Local addresses do not produce any code when they are encoded. Indexed addresses encode their Datum component.
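Putting the StW, arithmetic, IntDatum, LdW and Local encodings together gives a small encoder by cases. The sketch below handles only Local addresses and two arithmetic operations. Its register scheme (the left operand shares the result register and the right operand uses the next one) is an assumption inferred from the listings at the end of this document, not a description of how Encoder.scala computes the reg attribute. All names are hypothetical.

```scala
object EncoderSketch {
  case class Local(offset: Int)

  sealed trait Datum
  case class IntDatum(value: Int) extends Datum
  case class LdW(addr: Local) extends Datum
  case class AddW(left: Datum, right: Datum) extends Datum
  case class SubW(left: Datum, right: Datum) extends Datum

  case class StW(addr: Local, value: Datum)

  // Render a register number in assembler syntax.
  private def reg(n: Int): String = "$" + n

  // Emit code that leaves the value of d in register r.
  def encodeDatum(d: Datum, r: Int): List[String] = d match {
    case IntDatum(v)     => List(s"movi ${reg(r)}, ${reg(0)}, $v")
    case LdW(Local(off)) => List(s"ldw ${reg(r)}, ${reg(27)}, $off")
    case AddW(a, b)      => binary("add", a, b, r)
    case SubW(a, b)      => binary("sub", a, b, r)
  }

  // Operands first, then a single instruction for the operation itself.
  private def binary(op: String, a: Datum, b: Datum, r: Int): List[String] =
    encodeDatum(a, r) ++ encodeDatum(b, r + 1) ++
      List(s"$op ${reg(r)}, ${reg(r)}, ${reg(r + 1)}")

  // Encode the Datum into register 1, then store it relative to $27.
  def encodeStW(s: StW): List[String] =
    encodeDatum(s.value, 1) ++
      List(s"stw ${reg(1)}, ${reg(27)}, ${s.addr.offset}")
}
```

Applied to StW(Local(0), SubW(LdW(Local(0)), LdW(Local(4)))), this produces the same four instructions as the corresponding construct in the GCD listing.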

SequenceDatum

A SequenceDatum construct of the form
SequenceDatum Item-1 ... Item-n Datum
is implemented as follows:
Code for Item-1
...
Code for Item-n
Code for Datum
Here Item-i is the ith element of the component Item list.

Transforming Obr Source Trees to RISC Target Trees

The results of the mapping process from source to target are reflected in the properties and structure of the target tree. This section describes how Obr source data and actions are mapped to target constructs.

Obr/RISC Data Mapping

Obr programs can manipulate only integer and Boolean basic values plus structured values that are arrays and records. Both parameters and variables can be declared. Therefore, a definition of the data mapping task must specify how values of these types are implemented on the RISC, and how storage is allocated for parameters and variables.

Because there is no possibility of recursion in Obr, it is possible to implement data storage for parameters and variables statically. The Obr "parameters" really aren't parameters at all --- they are top-level variables that must be initialised by reading them from the standard input before executing the body of the Obr program. Thus their storage is implemented just like variables.

Obr constants do not need any storage since the compiler knows their value and can construct an IntDatum node that can be used directly.

An Obr integer is implemented by a RISC word (32 bits). For convenience, Boolean values are also represented by RISC words. True is represented by 1, and false is represented by 0.

Storage for all of the variables declared in an Obr program is allocated in a single area of RISC memory. During execution, register $27 contains the address of the beginning of this memory area. Thus, any variable's location can be specified as the sum of a non-negative integer offset and the contents of register $27. Since each variable occupies four bytes of memory, the offsets are all multiples of 4: the first variable is at the address held in $27, the next variable is at that address plus 4, and so on.

Arrays and records are allocated as contiguous memory as if the array elements or fields were declared as individual integer variables. (Recall that array elements and fields must be integers.) Therefore an array of N elements or a record with N fields is allocated as N contiguous words of memory.
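The allocation rule can be sketched as a fold over the declarations in order, handing out byte offsets from $27. IntVar and ArrayVar are the names this document uses for declaration constructs, but their fields here are assumptions, and RecordVar, sizeOf and allocate are hypothetical names used only for illustration.

```scala
object StorageSketch {
  sealed trait Decl { def name: String }
  case class IntVar(name: String) extends Decl
  case class ArrayVar(name: String, size: Int) extends Decl
  case class RecordVar(name: String, fields: List[String]) extends Decl

  // Every integer, array element and record field occupies one word.
  val wordSize = 4

  def sizeOf(d: Decl): Int = d match {
    case IntVar(_)            => wordSize
    case ArrayVar(_, n)       => n * wordSize
    case RecordVar(_, fields) => fields.length * wordSize
  }

  // Assign each declaration the next free byte offset from $27.
  // Returns the offset of each name and the total storage used.
  def allocate(decls: List[Decl]): (Map[String, Int], Int) =
    decls.foldLeft((Map.empty[String, Int], 0)) {
      case ((offsets, next), d) =>
        (offsets + (d.name -> next), next + sizeOf(d))
    }
}
```

For the three integers of the factorial program in the examples section (v, c and fact, in that order) this gives offsets 0, 4 and 8, which agree with the Local addresses in its target tree.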

Obr/RISC Action Mapping

Most of the Obr constructs map to RISC constructs in an obvious way. Some constructs do not generate any code (e.g., constructs like IntVar and ArrayVar that represent declarations). This section describes how the other constructs are translated into RISC target tree constructs.

Assignment

AssignStmt constructs are translated into a StW construct whose left child is the address of the variable, array element or field being assigned, and whose right child is the translation of the expression on the right-hand side of the assignment.

Boolean Expressions and Operations

A BoolExp is translated into an IntDatum where 0 is used for FALSE and 1 for TRUE.

AndExp and OrExp translate into uses of the Cond target construct in order to achieve short-circuit evaluation. They are translated as follows:

AndExp (e1, e2) -> Cond (t1, t2, IntDatum (0))
OrExp (e1, e2) -> Cond (t1, IntDatum (1), t2)

In both of these translations t1 and t2 are the translations of e1 and e2, respectively.
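These rules, together with the rules for BoolExp and NotExp, can be sketched as a recursive function over the Boolean fragment of the source tree. The source node names come from this document, but their fields are assumptions, and the object and function names are hypothetical. The actual mapping is in Transformation.scala and may be organised differently.

```scala
object ShortCircuitSketch {
  // Source side: Boolean fragment only.
  sealed trait Exp
  case class BoolExp(value: Boolean) extends Exp
  case class AndExp(left: Exp, right: Exp) extends Exp
  case class OrExp(left: Exp, right: Exp) extends Exp
  case class NotExp(exp: Exp) extends Exp

  // Target side.
  sealed trait Datum
  case class IntDatum(value: Int) extends Datum
  case class Cond(test: Datum, ifNonZero: Datum, ifZero: Datum) extends Datum
  case class Not(operand: Datum) extends Datum

  def translate(e: Exp): Datum = e match {
    case BoolExp(b)   => IntDatum(if (b) 1 else 0)
    // The right operand is only reached when the left one is non-zero.
    case AndExp(l, r) => Cond(translate(l), translate(r), IntDatum(0))
    // The right operand is only reached when the left one is zero.
    case OrExp(l, r)  => Cond(translate(l), IntDatum(1), translate(r))
    case NotExp(x)    => Not(translate(x))
  }
}
```

Short-circuit evaluation follows from the encoding of Cond, which evaluates only one of its second and third components.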

NotExp translates into a boolean complement operation using the Not target construct.

Comparison Operations

The comparison operators EqualExp, NotEqualExp, GreaterExp, and LessExp are translated to the CmpeqW, CmpneW, CmpgtW and CmpltW constructs, respectively.

ExitStmt

An ExitStmt is implemented by a jump to the terminating label of the closest containing LoopStmt. See also the description of the LoopStmt construct below.

FieldExp

A FieldExp translates to a LdW from the address of the given record field.

ForStmt

A ForStmt construct is implemented as follows:

ForStmt (id, e1, e2, s) ->
    StW (idmem, t1)
    StW (mem, t2)
    Bne (CmpgtW (LdW (idmem), LdW (mem)), L2)
    Jmp (L1)
    LabelDef (L3)
    StW (idmem, AddW (LdW (idmem), IntDatum (1)))
    LabelDef (L1)
    i
    Bne (CmpltW (LdW (idmem), LdW (mem)), L3)
    LabelDef (L2)

Here, i is the list of Item nodes that is the translation of s, t1 is the translation of e1, and t2 is the translation of e2. idmem is the storage location being used for the variable id, and mem is a new integer memory location not used elsewhere.

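Note that this scheme avoids an overflow problem when the upper bound expression e2 evaluates to the largest possible integer: id is incremented only after the test at the end of the body has established that it is less than the bound, so the increment can never overflow.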
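The overflow argument can be checked by mirroring the control flow of the scheme in ordinary Scala. The function below is a hypothetical stand-in that counts how many times the loop body would run. It is not part of the compiler, and it restructures the jumps into a while loop while keeping the same tests in the same order.

```scala
object ForLoopSketch {
  // Number of times the body of FOR id := from TO to would execute.
  def bodyExecutions(from: Int, to: Int): Int = {
    var id = from        // StW (idmem, t1)
    val bound = to       // StW (mem, t2)
    var count = 0
    if (id <= bound) {   // otherwise the first Bne skips straight to L2
      var more = true
      while (more) {
        count += 1       // i, the translated loop body
        // The increment is guarded by id < bound, so it cannot overflow.
        if (id < bound) id += 1 else more = false
      }
    }
    count
  }
}
```

A loop written as "increment, then test id <= bound" would never terminate when the bound is the largest integer, because the incremented value wraps around to a negative number. The guarded increment avoids this case.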

IdnExp

An IdnExp is translated into either an IntDatum containing the integer value of the identifier (if it denotes a constant), or a LdW from the location in which the variable is stored.

IfStmt

An IfStmt construct is implemented as follows:

IfStmt (e, s1, s2) ->
    Beq (t, L1)
    i1
    Jmp (L2)
    LabelDef (L1)
    i2
    LabelDef (L2)

Here, i1 and i2 are the lists of Item nodes that are the translations of s1 and s2, respectively, and t is the translation of e.
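As a sketch, the scheme can be written as a function that takes the already-translated pieces and allocates two fresh labels. LabelSource, translateIf and the field names are hypothetical, and the label numbers produced here need not match those chosen by the actual compiler.

```scala
object IfTranslationSketch {
  case class Label(num: Int)

  sealed trait Datum
  case class IntDatum(value: Int) extends Datum

  sealed trait Item
  case class Beq(cond: Datum, dest: Label) extends Item
  case class Jmp(dest: Label) extends Item
  case class LabelDef(label: Label) extends Item
  case class Write(value: Datum) extends Item

  // Hypothetical generator of fresh labels, numbered from 1.
  class LabelSource {
    private var last = 0
    def next(): Label = { last += 1; Label(last) }
  }

  // t is the translated condition; i1 and i2 are the translated branches.
  def translateIf(labels: LabelSource, t: Datum,
                  i1: List[Item], i2: List[Item]): List[Item] = {
    val l1 = labels.next() // start of the ELSE part
    val l2 = labels.next() // end of the whole statement
    List[Item](Beq(t, l1)) ++ i1 ++
      List[Item](Jmp(l2), LabelDef(l1)) ++ i2 ++
      List[Item](LabelDef(l2))
  }
}
```

The Beq tests for zero, so control falls through into the THEN part exactly when the condition is true (non-zero).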

IndexExp

An IndexExp translates to a LdW from the address of the given array element. In general, the index is not constant so it must be calculated as part of the address computation.

Integer Expressions and Arithmetic Operations

An IntExp is translated into an IntDatum whose value is the Int component of the IntExp.

The arithmetic target constructs are used to implement the arithmetic operators (MinusExp, NegExp, ModExp, PlusExp, SlashExp and StarExp) in the obvious way. For example, PlusExp is represented by AddW, ModExp by RemW, and NegExp by NegW.

IntParam

Parameter declarations are always represented by IntParam constructs and are translated into a Read construct whose child is the address of the storage allocated to the parameter.

LoopStmt

A LoopStmt construct is implemented as follows:

LoopStmt (s) ->
    LabelDef (L1)
    i
    Jmp (L1)
    LabelDef (L2)

Here, i is the list of Item nodes that is the translation of s. L2 is a label that can be used as the destination of jumps implementing ExitStmt constructs within the loop.
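One way to connect ExitStmt to the right label is to pass the terminating label of the closest containing loop down through the translation. The sketch below does this for a source fragment containing only LoopStmt and ExitStmt. The exit parameter, LabelSource and the field names are assumptions, and the actual compiler may track the label differently.

```scala
object LoopTranslationSketch {
  case class Label(num: Int)

  sealed trait Item
  case class Jmp(dest: Label) extends Item
  case class LabelDef(label: Label) extends Item

  sealed trait Stmt
  case class LoopStmt(body: List[Stmt]) extends Stmt
  case class ExitStmt() extends Stmt

  // Hypothetical generator of fresh labels, numbered from 1.
  class LabelSource {
    private var last = 0
    def next(): Label = { last += 1; Label(last) }
  }

  // exit is the terminating label of the closest containing loop, if any.
  def translate(s: Stmt, labels: LabelSource, exit: Option[Label]): List[Item] =
    s match {
      case LoopStmt(body) =>
        val l1 = labels.next() // top of the loop
        val l2 = labels.next() // terminating label, target of EXIT
        List[Item](LabelDef(l1)) ++
          body.flatMap(b => translate(b, labels, Some(l2))) ++
          List[Item](Jmp(l1), LabelDef(l2))
      case ExitStmt() =>
        // An EXIT outside any loop is assumed to have been rejected earlier.
        exit.map(l => Jmp(l)).toList
    }
}
```

Because each nested loop passes its own l2 to its body, an ExitStmt always jumps out of the innermost loop that contains it.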

ObrInt

The ObrInt construct is translated into a RISCProg construct whose children are the Item nodes comprising the translation of its Declaration and Statement components. The RISCProg node is also given an Int component to record the maximum size of storage used by the program.

ReturnStmt

The ReturnStmt construct is implemented by a Write construct whose child is the translation of the component Expression to be returned, followed by a Ret construct.

WhileStmt

The WhileStmt construct is implemented as follows:

WhileStmt (e, s) ->
    Jmp (L1)
    LabelDef (L2)
    i
    LabelDef (L1)
    Bne (t, L2)

Here, i is the list of Item nodes that is the translation of s, and t is the translation of e.
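The same scheme as a short function, again as a sketch with hypothetical names (LabelSource, translateWhile) and label numbers that need not match the compiler's:

```scala
object WhileTranslationSketch {
  case class Label(num: Int)

  sealed trait Datum
  case class IntDatum(value: Int) extends Datum

  sealed trait Item
  case class Bne(cond: Datum, dest: Label) extends Item
  case class Jmp(dest: Label) extends Item
  case class LabelDef(label: Label) extends Item
  case class Write(value: Datum) extends Item

  // Hypothetical generator of fresh labels, numbered from 1.
  class LabelSource {
    private var last = 0
    def next(): Label = { last += 1; Label(last) }
  }

  // t is the translated condition; i is the translated loop body.
  def translateWhile(labels: LabelSource, t: Datum, i: List[Item]): List[Item] = {
    val l1 = labels.next() // the test, placed after the body
    val l2 = labels.next() // start of the body
    List[Item](Jmp(l1), LabelDef(l2)) ++ i ++
      List[Item](LabelDef(l1), Bne(t, l2))
  }
}
```

Placing the test after the body means that each iteration executes a single conditional branch, at the cost of one extra jump on entry to the loop.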

Running Compiled Programs

The default behaviour of the Obr compiler is to execute all syntactic, semantic and code generation phases, reporting any errors that occur but doing nothing else. To alter this behaviour we provide three command line flags:

Detailed Examples

This section shows the complete RISC target trees and assembly code that would be produced for the factorial and GCD Obr programs.

Consider the Obr version of Euclid's algorithm for calculating the greatest common divisor of two numbers.

PROGRAM GCD (x : INTEGER; y : INTEGER) : INTEGER;

BEGIN
    WHILE x # y DO
        IF x > y
            THEN x := x - y;
            ELSE y := y - x;
        END
    END
    RETURN x;
END GCD.

From this code, the Obr compiler generates the following target tree:

RISCProg(
    List(
        StW(Local(0),Read()), 
        StW(Local(4),Read()), 
        Jmp(Label(2)), 
        LabelDef(Label(3)), 
        Beq(CmpgtW(LdW(Local(0)),LdW(Local(4))),Label(4)),
        StW(Local(0),SubW(LdW(Local(0)),LdW(Local(4)))), 
        Jmp(Label(5)), 
        LabelDef(Label(4)), 
        StW(Local(4),SubW(LdW(Local(4)),LdW(Local(0)))), 
        LabelDef(Label(5)), 
        LabelDef(Label(2)), 
        Bne(CmpneW(LdW(Local(0)),LdW(Local(4))),Label(3)), 
        Write(LdW(Local(0))), 
        Ret()))

From this target tree, the encoder produces the following RISC assembly code. Note that the encoder includes the target constructs as comments (starting with exclamation marks) to make the correspondence clearer.

    ! Prologue
    movi $27, $0, 0
    ! StW(Local(0),Read())
    rd $1
    stw $1, $27, 0
    ! StW(Local(4),Read())
    rd $1
    stw $1, $27, 4
    ! Jmp(Label(2))
    br label2
    ! LabelDef(Label(3))
label3:
    ! Beq(CmpgtW(LdW(Local(0)),LdW(Local(4))),Label(4))
    ldw $1, $27, 0
    ldw $2, $27, 4
    cmp $1, $2
    movi $1, $0, 1
    bgt label7
    movi $1, $0, 0
label7:
    cmpi $1, 0
    beq label4
    ! StW(Local(0),SubW(LdW(Local(0)),LdW(Local(4))))
    ldw $1, $27, 0
    ldw $2, $27, 4
    sub $1, $1, $2
    stw $1, $27, 0
    ! Jmp(Label(5))
    br label5
    ! LabelDef(Label(4))
label4:
    ! StW(Local(4),SubW(LdW(Local(4)),LdW(Local(0))))
    ldw $1, $27, 4
    ldw $2, $27, 0
    sub $1, $1, $2
    stw $1, $27, 4
    ! LabelDef(Label(5))
label5:
    ! LabelDef(Label(2))
label2:
    ! Bne(CmpneW(LdW(Local(0)),LdW(Local(4))),Label(3))
    ldw $1, $27, 0
    ldw $2, $27, 4
    cmp $1, $2
    movi $1, $0, 1
    bne label8
    movi $1, $0, 0
label8:
    cmpi $1, 0
    bne label3
    ! Write(LdW(Local(0)))
    ldw $1, $27, 0
    wrd $1
    wrl
    ! Ret()
    br label6
    ! Epilogue
label6:
    ret $0

Here is the same information for the Obr factorial program.

PROGRAM Factorial (v : INTEGER) : INTEGER;

CONST
    limit = 7;

VAR
    c : INTEGER;
    fact : INTEGER;

BEGIN
    IF (v < 0) OR (v > limit) THEN
        RETURN -1;
    ELSE
        c := 0;
        fact := 1;
        WHILE c < v DO
            c := c + 1;
            fact := fact * c;
        END
        RETURN fact;
    END
END Factorial.

From this code, the Obr compiler generates the following target tree:

RISCProg(
    List(
        StW(Local(0),Read()), 
        Beq(
            Cond(
                CmpltW(LdW(Local(0)),IntDatum(0)),
                IntDatum(1),
                CmpgtW(LdW(Local(0)),IntDatum(7))),
            Label(2)),
        Write(NegW(IntDatum(1))), 
        Ret(), 
        Jmp(Label(3)), 
        LabelDef(Label(2)), 
        StW(Local(4),IntDatum(0)), 
        StW(Local(8),IntDatum(1)), 
        Jmp(Label(4)), 
        LabelDef(Label(5)), 
        StW(Local(4),AddW(LdW(Local(4)),IntDatum(1))), 
        StW(Local(8),MulW(LdW(Local(8)),LdW(Local(4)))), 
        LabelDef(Label(4)), 
        Bne(CmpltW(LdW(Local(4)),LdW(Local(0))),Label(5)), 
        Write(LdW(Local(8))), 
        Ret(), 
        LabelDef(Label(3))))

From this target tree, the encoder produces the following RISC assembly code:

    ! Prologue
    movi $27, $0, 0
    ! StW(Local(0),Read())
    rd $1
    stw $1, $27, 0
    ! Beq(Cond(CmpltW(LdW(Local(0)),IntDatum(0)),IntDatum(1),CmpgtW(LdW(Local(0)),IntDatum(7))),Label(2))
    ldw $1, $27, 0
    movi $2, $0, 0
    cmp $1, $2
    movi $1, $0, 1
    blt label9
    movi $1, $0, 0
label9:
    cmpi $1, 0
    beq label7
    movi $1, $0, 1
    mov $1, $0, $1
    br label8
label7:
    ldw $1, $27, 0
    movi $2, $0, 7
    cmp $1, $2
    movi $1, $0, 1
    bgt label10
    movi $1, $0, 0
label10:
    mov $1, $0, $1
label8:
    cmpi $1, 0
    beq label2
    ! Write(NegW(IntDatum(1)))
    movi $1, $0, 1
    sub $1, $0, $1
    wrd $1
    wrl
    ! Ret()
    br label6
    ! Jmp(Label(3))
    br label3
    ! LabelDef(Label(2))
label2:
    ! StW(Local(4),IntDatum(0))
    movi $1, $0, 0
    stw $1, $27, 4
    ! StW(Local(8),IntDatum(1))
    movi $1, $0, 1
    stw $1, $27, 8
    ! Jmp(Label(4))
    br label4
    ! LabelDef(Label(5))
label5:
    ! StW(Local(4),AddW(LdW(Local(4)),IntDatum(1)))
    ldw $1, $27, 4
    movi $2, $0, 1
    add $1, $1, $2
    stw $1, $27, 4
    ! StW(Local(8),MulW(LdW(Local(8)),LdW(Local(4))))
    ldw $1, $27, 8
    ldw $2, $27, 4
    mul $1, $1, $2
    stw $1, $27, 8
    ! LabelDef(Label(4))
label4:
    ! Bne(CmpltW(LdW(Local(4)),LdW(Local(0))),Label(5))
    ldw $1, $27, 4
    ldw $2, $27, 0
    cmp $1, $2
    movi $1, $0, 1
    blt label11
    movi $1, $0, 0
label11:
    cmpi $1, 0
    bne label5
    ! Write(LdW(Local(8)))
    ldw $1, $27, 8
    wrd $1
    wrl
    ! Ret()
    br label6
    ! LabelDef(Label(3))
label3:
    ! Epilogue
label6:
    ret $0

Tony Sloane and Dominic Verity
Last Modified: Tue Oct 05 16:10:03 +1100 2010
Copyright (c) 1998-2011 by Macquarie University. All rights reserved.