The document concludes with examples of translations of a couple of simple Obr programs.
The code that supports the tree structure and encoding described in this document can be found in RISCTree.scala and Encoder.scala in the Obr compiler. The mapping from Obr trees to RISC trees can be found in Transformation.scala.
All of the information that the translation task of the compiler provides about the target program is embodied in the target program tree. If a particular item of information cannot be accessed via this tree, then it cannot be obtained at all. Information is encoded in the "shape" of the tree and in values stored at the leaves.
This section defines the set of possible target program trees by defining all of the concepts and constructs of the target language.
A datum is a construct yielding an explicit value that can be stored or used as an operand for other operations. The encoder uses the attribute reg to associate a local machine register with each Datum to provide storage for the Datum's value; hence the transformation phase does not have to perform register allocation.
An item is a construct that does not yield a value.
A RISCProg node represents a complete RISC program. This RISCProg node is the root of the target program tree, and never appears in any other position.
The following productions summarise the constructs of the RISC by giving the structure of the subtree for each construct.
    RISCProg:       RISCNode:  Item+

    Beq:            Item:      Datum Label
    Bne:            Item:      Datum Label
    Jmp:            Item:      Label
    LabelDef:       Item:      Label
    Read:           Item:      Address
    Ret:            Item
    StW:            Item:      Address Datum
    Write:          Item:      Datum

    AddW:           Datum:     Datum Datum
    Cond:           Datum:     Datum Datum Datum
    CmpeqW:         Datum:     Datum Datum
    CmpneW:         Datum:     Datum Datum
    CmpgtW:         Datum:     Datum Datum
    CmpltW:         Datum:     Datum Datum
    DivW:           Datum:     Datum Datum
    IntDatum:       Datum:     Int
    LdW:            Datum:     Address
    MulW:           Datum:     Datum Datum
    NegW:           Datum:     Datum
    Not:            Datum:     Datum
    RemW:           Datum:     Datum Datum
    SubW:           Datum:     Datum Datum
    SequenceDatum:  Datum:     Item+ Datum

    Local:          Address:   Int
    Indexed:        Address:   Local Datum
The "W" in some of the node names means that those operations operate on word-sized values (four bytes on the RISC) which in this compiler are used to implement both integer and Boolean values.
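The productions map naturally onto Scala case classes. The following is a minimal sketch under that assumption, not a copy of RISCTree.scala: the object name RISCTreeSketch, every field name, and the sample value are hypothetical. Read is modelled as an Item taking an Address, as the productions state.

```scala
object RISCTreeSketch {
  // Hypothetical encoding of the productions; all field names are assumptions.
  sealed trait RISCNode
  sealed trait Item extends RISCNode     // constructs that do not yield a value
  sealed trait Datum extends RISCNode    // constructs that yield a value
  sealed trait Address extends RISCNode  // storage locations

  case class Label(num: Int)

  // Root of the target program tree
  case class RISCProg(items: List[Item]) extends RISCNode

  case class Beq(cond: Datum, dest: Label) extends Item
  case class Bne(cond: Datum, dest: Label) extends Item
  case class Jmp(dest: Label) extends Item
  case class LabelDef(lab: Label) extends Item
  case class Read(mem: Address) extends Item
  case class Ret() extends Item
  case class StW(mem: Address, d: Datum) extends Item
  case class Write(d: Datum) extends Item

  case class AddW(l: Datum, r: Datum) extends Datum
  case class Cond(cond: Datum, t: Datum, f: Datum) extends Datum
  case class CmpeqW(l: Datum, r: Datum) extends Datum
  case class CmpneW(l: Datum, r: Datum) extends Datum
  case class CmpgtW(l: Datum, r: Datum) extends Datum
  case class CmpltW(l: Datum, r: Datum) extends Datum
  case class DivW(l: Datum, r: Datum) extends Datum
  case class IntDatum(num: Int) extends Datum
  case class LdW(mem: Address) extends Datum
  case class MulW(l: Datum, r: Datum) extends Datum
  case class NegW(d: Datum) extends Datum
  case class Not(d: Datum) extends Datum
  case class RemW(l: Datum, r: Datum) extends Datum
  case class SubW(l: Datum, r: Datum) extends Datum
  case class SequenceDatum(items: List[Item], d: Datum) extends Datum

  case class Local(offset: Int) extends Address
  case class Indexed(base: Local, index: Datum) extends Address

  // A tiny tree: increment the word at offset 0, write it, and return.
  val sample: RISCProg = RISCProg(List(
    StW(Local(0), AddW(LdW(Local(0)), IntDatum(1))),
    Write(LdW(Local(0))),
    Ret()))

  def main(args: Array[String]): Unit = println(sample)
}
```

Sealing the three traits lets the Scala compiler check that a pattern match over Items, Datums or Addresses covers every construct, which is convenient for an encoder written as one match per category.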
The following subsections describe the constructs of the table. Some of those constructs represent specific RISC instructions and others represent collections of instructions that involve related decisions about operand access.
RISC: RISC: Item+
The encoding of a RISC construct is very simple: we encode each of the items in its body, concatenate the resulting RISC code sequences, and then add prologue (initialisation) and epilogue (termination) code. Currently the prologue simply initialises register $27 to point to the memory segment that will be used to store the values of global variables and temporaries. The epilogue begins with a standard label so that the Ret construct (see below) can transfer control to it. It then terminates the program by executing a ret $0 instruction.
    Beq:  Item:  Datum Label
    Bne:  Item:  Datum Label
    Jmp:  Item:  Label

A branch (Beq or Bne) is encoded as the encoding of its Datum component followed by a test and branch to the Label component. A Beq does a branch on equal to zero and a Bne does a branch on not equal to zero.
A Jmp does an unconditional branch to its Label component.
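As an illustration of the branch encoding, here is a minimal sketch that emits the instruction pattern visible in the sample assembly at the end of this document (cmpi against zero followed by beq or bne, and br for an unconditional jump). The object BranchEncodeSketch and its helper names are hypothetical and are not taken from Encoder.scala; the Datum is assumed to have been encoded already, with its result in the given register.

```scala
object BranchEncodeSketch {
  // Register n is written $n in the assembly listings.
  def reg(n: Int): String = "$" + n

  // datumCode: the already-encoded Datum; datumReg: the register holding its value.
  def encodeBeq(datumCode: List[String], datumReg: Int, label: Int): List[String] =
    datumCode ++ List(s"cmpi ${reg(datumReg)}, 0", s"beq label$label")

  def encodeBne(datumCode: List[String], datumReg: Int, label: Int): List[String] =
    datumCode ++ List(s"cmpi ${reg(datumReg)}, 0", s"bne label$label")

  // An unconditional jump needs no Datum code at all.
  def encodeJmp(label: Int): List[String] =
    List(s"br label$label")

  def main(args: Array[String]): Unit =
    encodeBeq(List("ldw $1, $27, 0"), 1, 4).foreach(println)
}
```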
LabelDef
LabelDef: Item: Label
A LabelDef construct represents a definition of a label and is encoded by emitting that definition in the appropriate assembler syntax.
Read and Write

    Read:   Item:  Address
    Write:  Item:  Datum

These constructs are encoded using the corresponding terminal IO RISC instructions rd, wrd and wrl. In the case of Read the value read is stored in the location given by the Address component. In the case of Write the value written is that given by the Datum component which is encoded first.
Ret
Ret: Item
A Ret construct is encoded by an unconditional jump to a label at the end of the code comprising the program (i.e., to the beginning of the epilogue). This encoding ensures that a return from any part of the program will complete necessary processing before exiting the program.
StW
StW: Item: Address Datum
A StW construct is encoded by encoding the Datum component followed by an instruction to store the value of the Datum into the given address.
AddW, DivW, MulW, NegW, Not, RemW, SubW

    AddW:  Datum:  Datum Datum
    DivW:  Datum:  Datum Datum
    MulW:  Datum:  Datum Datum
    NegW:  Datum:  Datum
    RemW:  Datum:  Datum Datum
    SubW:  Datum:  Datum Datum

Most of the arithmetic operations are encoded by encoding their Datum component(s) followed by a single instruction that performs the appropriate operation. The NegW operation is implemented by subtracting the given operand from 0 (the value of register $0).
IntDatum: Datum: Int
An IntDatum construct is encoded as a move of the integer value into the location required by the Datum.
Cond and Not

    Cond:  Datum:  Datum Datum Datum
    Not:   Datum:  Datum

A Cond construct is encoded by encoding its first Datum component, followed by a sequence of instructions that evaluate the second Datum if the first Datum is non-zero, or evaluate the third Datum if the first Datum is zero. In either case the result value will be left in the location required by the Cond Datum itself. The Not construct is encoded as if it were converted to a corresponding Cond tree under the following translation:

    Not (d) -> Cond (d, IntDatum(0), IntDatum(1))
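The rewrite of Not into Cond can be checked with a small reference evaluator. The sketch below is illustrative only: the object CondSketch, the functions desugar and eval, and the three-construct Datum type are assumptions made for this example, and the encoder itself produces machine code rather than evaluating trees.

```scala
object CondSketch {
  sealed trait Datum
  case class IntDatum(num: Int) extends Datum
  case class Cond(cond: Datum, t: Datum, f: Datum) extends Datum
  case class Not(d: Datum) extends Datum

  // Replace every Not (d) by Cond (d, IntDatum(0), IntDatum(1)).
  def desugar(d: Datum): Datum = d match {
    case Not(e)        => Cond(desugar(e), IntDatum(0), IntDatum(1))
    case Cond(c, t, f) => Cond(desugar(c), desugar(t), desugar(f))
    case i: IntDatum   => i
  }

  // Reference semantics: Cond picks its second child when the first is non-zero.
  def eval(d: Datum): Int = d match {
    case IntDatum(n)   => n
    case Cond(c, t, f) => if (eval(c) != 0) eval(t) else eval(f)
    case Not(e)        => if (eval(e) != 0) 0 else 1
  }

  def main(args: Array[String]): Unit =
    println(desugar(Not(IntDatum(1))))
}
```

Because only one of the second and third children of a Cond is evaluated, the same construct also provides the short-circuit behaviour that Boolean operators need.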
CmpeqW, CmpneW, CmpgtW, CmpltW

    CmpeqW:  Datum:  Datum Datum
    CmpneW:  Datum:  Datum Datum
    CmpgtW:  Datum:  Datum Datum
    CmpltW:  Datum:  Datum Datum

The comparison constructs CmpeqW, CmpgtW, CmpltW, and CmpneW are encoded as the encoding of their operands, followed by a comparison instruction, followed by moves and conditional branches as appropriate to establish the result value of 0 or 1 in the register associated with the given comparison Datum.
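The sample assembly at the end of this document shows one concrete form of this pattern: compare, optimistically load 1, branch over a load of 0 when the comparison holds. The sketch below reproduces that pattern as strings. The object CmpEncodeSketch and its functions are hypothetical, and the choice of beq for CmpeqW is an assumption, since the listings only show bgt, blt and bne used this way.

```scala
object CmpEncodeSketch {
  // Register n is written $n in the assembly listings.
  def reg(n: Int): String = "$" + n

  // Branch mnemonic assumed for each comparison construct (beq is a guess).
  def branchFor(construct: String): String = construct match {
    case "CmpeqW" => "beq"
    case "CmpneW" => "bne"
    case "CmpgtW" => "bgt"
    case "CmpltW" => "blt"
    case other    => throw new IllegalArgumentException(s"not a comparison: $other")
  }

  // Operands are assumed to be in registers left and right already;
  // the 0/1 result ends up in register dst.
  def encodeCompare(construct: String, dst: Int, left: Int, right: Int, label: Int): List[String] =
    List(
      s"cmp ${reg(left)}, ${reg(right)}",
      s"movi ${reg(dst)}, ${reg(0)}, 1",          // assume the comparison holds
      s"${branchFor(construct)} label$label",     // skip the next move if it does
      s"movi ${reg(dst)}, ${reg(0)}, 0",          // otherwise the result is 0
      s"label$label:"
    )

  def main(args: Array[String]): Unit =
    encodeCompare("CmpgtW", 1, 1, 2, 7).foreach(println)
}
```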
LdW
LdW: Datum: Address
A LdW construct is encoded as a load of a word value from the location specified by its Address component into the register associated with this LdW.
Local, Indexed

    Local:    Address:  Int
    Indexed:  Address:  Local Datum

A Local address represents a word-sized storage location in the main block of memory that is accessible to an Obr program. Its Int child specifies the offset in bytes from the start of the memory block at which the word is located.
An Indexed address is an address that is computed as a byte offset from a local address. The offset is given by a computation expressed as a Datum.
When an address is used in another construct (i.e., an LdW or an StW) it is first encoded, then used as an operand in the load or store. Local addresses do not produce any code when they are encoded. Indexed addresses encode their Datum component.
SequenceDatum
A SequenceDatum construct of the form

    SequenceDatum Item-1 ... Item-n Datum

is implemented as follows:

    Code for Item-1
    ...
    Code for Item-n
    Code for Datum

Here Item-i is the ith element of the component Item list.
The results of the mapping process from source to target are reflected in the properties and structure of the target tree. This section describes how Obr source data and actions are mapped to target constructs.
Obr programs can manipulate only integer and Boolean basic values plus structured values that are arrays and records. Both parameters and variables can be declared. Therefore, a definition of the data mapping task must specify how values of these types are implemented on the RISC, and how storage is allocated for parameters and variables.
Because there is no possibility of recursion in Obr, it is possible to implement data storage for parameters and variables statically. The Obr "parameters" really aren't parameters at all --- they are top-level variables that must be initialised by reading them from the standard input before executing the body of the Obr program. Thus their storage is implemented just like variables.
Obr constants do not need any storage since the compiler knows their value and can construct an IntDatum node that can be used directly.
An Obr integer is implemented by a RISC word (32 bits). For convenience, Boolean values are also represented by RISC words. True is represented by 1, and false is represented by 0.
Storage for all of the variables declared in an Obr program is allocated in a single area of RISC memory. During execution, register $27 contains the address of the beginning of the memory area. Thus, any variable's location can be specified by the sum of a non-negative integer and the contents of register $27. Since each variable occupies four bytes of memory, the offsets from the content of register $27 are all multiples of 4: The topmost variable is in location $27, the next variable is in location $27 + 4, and so on.
Arrays and records are allocated as contiguous memory as if the array elements or fields were declared as individual integer variables. (Recall that array elements and fields must be integers.) Therefore an array of N elements or a record with N fields is allocated as N contiguous words of memory.
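The static allocation scheme amounts to handing out consecutive word offsets in declaration order. The following is a minimal sketch under that assumption; the object StorageSketch, the RecordVar class, the field names and the allocate function are hypothetical and do not describe how the Obr compiler actually records offsets.

```scala
object StorageSketch {
  // Hypothetical declaration forms; only the sizes matter here.
  sealed trait Decl { def name: String }
  case class IntVar(name: String) extends Decl
  case class ArrayVar(name: String, size: Int) extends Decl
  case class RecordVar(name: String, fields: List[String]) extends Decl

  val WordSize = 4 // bytes per RISC word

  // Integers and Booleans take one word; arrays and records take N words.
  def sizeOf(d: Decl): Int = d match {
    case IntVar(_)            => WordSize
    case ArrayVar(_, n)       => n * WordSize
    case RecordVar(_, fields) => fields.length * WordSize
  }

  // Returns each declaration's byte offset from $27 and the total storage used.
  def allocate(decls: List[Decl]): (Map[String, Int], Int) =
    decls.foldLeft((Map.empty[String, Int], 0)) {
      case ((offsets, next), d) => (offsets + (d.name -> next), next + sizeOf(d))
    }

  def main(args: Array[String]): Unit =
    println(allocate(List(IntVar("v"), IntVar("c"), IntVar("fact"))))
}
```

Applied to a parameter v followed by variables c and fact, this yields offsets 0, 4 and 8, which matches the Local offsets in the factorial example at the end of this document.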
IntVar and ArrayVar that represent declarations). This section describes how the other constructs are translated into RISC target tree constructs.
AssignStmt constructs are translated into a StW construct whose left child is the address of the variable, array element or field being assigned, and whose right child is the translation of the expression on the right-hand side of the assignment.
A BoolExp is translated into an IntDatum where 0 is used for FALSE and 1 for TRUE.
AndExp and OrExp translate into uses of the Cond target construct in order to achieve short-circuit evaluation. They are translated as follows:

    AndExp (e1, e2) -> Cond (t1, t2, 0)
    OrExp (e1, e2)  -> Cond (t1, 1, t2)

In both of these translations t1 and t2 are the translations of e1 and e2, respectively.
NotExp translates into a boolean complement operation using the Not target construct.
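The Boolean translations above can be written as a single recursive function. This is a sketch only: the object BoolTranslateSketch, the field names, and the restriction to four source constructs are assumptions for illustration, not the contents of Transformation.scala. The constants 0 and 1 in the translation rules are spelled out as IntDatum nodes.

```scala
object BoolTranslateSketch {
  // A fragment of the source tree (field names are hypothetical).
  sealed trait Expression
  case class BoolExp(value: Boolean) extends Expression
  case class AndExp(left: Expression, right: Expression) extends Expression
  case class OrExp(left: Expression, right: Expression) extends Expression
  case class NotExp(exp: Expression) extends Expression

  // The matching fragment of the target tree.
  sealed trait Datum
  case class IntDatum(num: Int) extends Datum
  case class Cond(cond: Datum, t: Datum, f: Datum) extends Datum
  case class Not(d: Datum) extends Datum

  def translate(e: Expression): Datum = e match {
    case BoolExp(b)   => IntDatum(if (b) 1 else 0)
    // e2 is only evaluated when e1 is true
    case AndExp(l, r) => Cond(translate(l), translate(r), IntDatum(0))
    // e2 is only evaluated when e1 is false
    case OrExp(l, r)  => Cond(translate(l), IntDatum(1), translate(r))
    case NotExp(x)    => Not(translate(x))
  }

  def main(args: Array[String]): Unit =
    println(translate(OrExp(BoolExp(false), NotExp(BoolExp(true)))))
}
```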
The comparison operators EqualExp, NotEqualExp, GreaterExp, and LessExp are translated to the CmpeqW, CmpneW, CmpgtW and CmpltW constructs, respectively.
ExitStmt
An ExitStmt is implemented by a jump to the terminating label of the closest containing LoopStmt. See also the description of the LoopStmt construct below.
FieldExp
A FieldExp translates to a LdW from the address of the given record field.
ForStmt
A ForStmt construct is implemented as follows:

    ForStmt (id, e1, e2, s) ->
        StW (idmem, t1),
        StW (mem, t2),
        Bne (CmpgtW (LdW (idmem), LdW (mem)), L2),
        Jmp (L1),
        LabelDef (L3),
        StW (idmem, AddW (LdW (idmem), IntDatum (1))),
        LabelDef (L1),
        i
        Bne (CmpltW (LdW (idmem), LdW (mem)), L3),
        LabelDef (L2)

Here, i is the list of Item nodes that is the translation of s, t1 is the translation of e1, and t2 is the translation of e2. idmem is the storage location being used for the variable id, and mem is a new integer memory location not used elsewhere.
Note that this scheme avoids a problem when the upper bound e2 evaluates to the maximum possible integer: id is incremented only after the test has established that it is strictly less than the bound, so the increment cannot overflow.
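To see why the increment cannot overflow, the control flow of the ForStmt scheme can be mirrored directly in Scala. The sketch below is an illustration, not compiler code: the object ForSketch and the functions runFor and collect are hypothetical names, and the comments indicate which target construct each line corresponds to.

```scala
object ForSketch {
  // Executes body for id = from, from + 1, ..., to, following the ForStmt scheme.
  def runFor(from: Int, to: Int)(body: Int => Unit): Unit = {
    var id = from            // StW (idmem, t1)
    val bound = to           // StW (mem, t2)
    if (!(id > bound)) {     // Bne (CmpgtW ...), L2: skip everything if id > bound
      body(id)               // Jmp (L1), then i: first iteration, no increment
      while (id < bound) {   // Bne (CmpltW ...), L3: continue only if id < bound
        id += 1              // increment after LabelDef (L3); safe since id < bound
        body(id)             // i after LabelDef (L1)
      }
    }
  }

  // Convenience wrapper that records the values the loop variable takes.
  def collect(from: Int, to: Int): List[Int] = {
    val seen = scala.collection.mutable.ListBuffer.empty[Int]
    runFor(from, to) { v => seen += v; () }
    seen.toList
  }

  def main(args: Array[String]): Unit =
    println(collect(Int.MaxValue - 1, Int.MaxValue))
}
```

A loop that incremented first and tested id <= bound afterwards would wrap around to a negative value when the bound is Int.MaxValue and never terminate; testing id < bound before the increment avoids that case.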
IdnExp
An IdnExp is translated into either an IntDatum containing the integer value of the identifier (if it denotes a constant), or a LdW from the location in which the variable is stored.
IfStmt
An IfStmt construct is implemented as follows:

    IfStmt (e, s1, s2) ->
        Beq (t, L1)
        i1
        Jmp (L2)
        LabelDef (L1)
        i2
        LabelDef (L2)

Here, i1 and i2 are the lists of Item nodes that are the translations of s1 and s2, respectively, and t is the translation of e.
IndexExp
An IndexExp translates to a LdW from the address of the given array element. In general, the index is not constant so it must be calculated as part of the address computation.
An IntExp is translated into an IntDatum whose value is the Int component of the IntExp.
The arithmetic target constructs are used to implement the arithmetic operators (MinusExp, NegExp, ModExp, PlusExp, SlashExp and StarExp) in the obvious way. For example, PlusExp is represented by AddW, ModExp by RemW, and NegExp by NegW.
IntParam
Parameter declarations are always represented by IntParam constructs and are translated into a Read construct whose child is the address of the storage allocated to the parameter.
LoopStmt
A LoopStmt construct is implemented as follows:

    LoopStmt (s) ->
        LabelDef (L1)
        i
        Jmp (L1)
        LabelDef (L2)

Here, i is the list of Item nodes that is the translation of s. L2 is a label that can be used as the destination of jumps implementing ExitStmt constructs within the loop.
ObrInt
The ObrInt construct is translated into a RISC construct whose children are the Item nodes comprising the translation of its Declaration and Statement components. The RISC node also is given an Int component to record the maximum size of storage used by the program.
ReturnStmt
The ReturnStmt construct is implemented by a Write construct whose child is the translation of the component Expression to be returned, followed by a Ret construct.
WhileStmt
A WhileStmt construct is implemented as follows:

    WhileStmt (e, s) ->
        Jmp (L1)
        LabelDef (L2)
        i
        LabelDef (L1)
        Bne (t, L2)

Here, i is the list of Item nodes that is the translation of s, and t is the translation of e.
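The IfStmt, LoopStmt and WhileStmt schemes all follow the same recipe: obtain two fresh labels and splice the already-translated pieces between label definitions and branches. The sketch below shows that recipe under stated assumptions. The object ControlSketch, the LabelGen class and the three functions are hypothetical, the condition and body are taken as already translated, and the label numbering (a simple counter) is a guess rather than the compiler's actual policy.

```scala
object ControlSketch {
  case class Label(num: Int)

  sealed trait Datum
  case class IntDatum(num: Int) extends Datum

  sealed trait Item
  case class Beq(cond: Datum, dest: Label) extends Item
  case class Bne(cond: Datum, dest: Label) extends Item
  case class Jmp(dest: Label) extends Item
  case class LabelDef(lab: Label) extends Item
  case class Write(d: Datum) extends Item

  // Hands out labels that are unique within one translation (numbering assumed).
  final class LabelGen(start: Int) {
    private var next = start
    def fresh(): Label = { val l = Label(next); next += 1; l }
  }

  // IfStmt (e, s1, s2): t, i1 and i2 are the translations of e, s1 and s2.
  def ifItems(t: Datum, i1: List[Item], i2: List[Item], gen: LabelGen): List[Item] = {
    val l1 = gen.fresh(); val l2 = gen.fresh()
    List(Beq(t, l1)) ++ i1 ++ List(Jmp(l2), LabelDef(l1)) ++ i2 ++ List(LabelDef(l2))
  }

  // LoopStmt (s): the body receives the exit label so ExitStmt can become Jmp (L2).
  def loopItems(body: Label => List[Item], gen: LabelGen): List[Item] = {
    val l1 = gen.fresh(); val l2 = gen.fresh()
    List(LabelDef(l1)) ++ body(l2) ++ List(Jmp(l1), LabelDef(l2))
  }

  // WhileStmt (e, s): the test sits after the body, so each iteration has one branch.
  def whileItems(t: Datum, i: List[Item], gen: LabelGen): List[Item] = {
    val l1 = gen.fresh(); val l2 = gen.fresh()
    List(Jmp(l1), LabelDef(l2)) ++ i ++ List(LabelDef(l1), Bne(t, l2))
  }

  def main(args: Array[String]): Unit =
    whileItems(IntDatum(1), List(Write(IntDatum(7))), new LabelGen(2)).foreach(println)
}
```

Placing the loop test at the bottom costs one extra Jmp on entry but means each iteration executes a single conditional branch.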
The default behaviour of the Obr compiler is to execute all syntactic, semantic and code generation phases, reporting any errors that occur but doing nothing else. To alter this behaviour we provide three command line flags:
    -t  spill the target tree constructed by the translation phase to the standard output.
    -a  spill the output of the encoding phase to the standard output as RISC assembly language code.
    -e  assemble and execute the generated RISC code in the Obr compiler's built-in RISC machine emulator.
This section shows the complete RISC target trees and assembly code that would be produced for the factorial and GCD Obr programs.
Consider the Obr version of Euclid's algorithm for calculating the greatest common divisor of two numbers.
    PROGRAM GCD (x : INTEGER; y : INTEGER) : INTEGER;
    BEGIN
        WHILE x # y DO
            IF x > y THEN
                x := x - y;
            ELSE
                y := y - x;
            END
        END
        RETURN x;
    END GCD.
From this code, the Obr compiler generates the following target tree:
    RISCProg(
        List(
            StW(Local(0),Read()),
            StW(Local(4),Read()),
            Jmp(Label(2)),
            LabelDef(Label(3)),
            Beq(CmpgtW(LdW(Local(0)),LdW(Local(4))),Label(4)),
            StW(Local(0),SubW(LdW(Local(0)),LdW(Local(4)))),
            Jmp(Label(5)),
            LabelDef(Label(4)),
            StW(Local(4),SubW(LdW(Local(4)),LdW(Local(0)))),
            LabelDef(Label(5)),
            LabelDef(Label(2)),
            Bne(CmpneW(LdW(Local(0)),LdW(Local(4))),Label(3)),
            Write(LdW(Local(0))),
            Ret()))
From this target tree, the encoder produces the following RISC assembly code. Note that the encoder includes the target constructs as comments (starting with exclamation marks) to make the correspondence clearer.
        ! Prologue
        movi $27, $0, 0
        ! StW(Local(0),Read())
        rd $1
        stw $1, $27, 0
        ! StW(Local(4),Read())
        rd $1
        stw $1, $27, 4
        ! Jmp(Label(2))
        br label2
        ! LabelDef(Label(3))
    label3:
        ! Beq(CmpgtW(LdW(Local(0)),LdW(Local(4))),Label(4))
        ldw $1, $27, 0
        ldw $2, $27, 4
        cmp $1, $2
        movi $1, $0, 1
        bgt label7
        movi $1, $0, 0
    label7:
        cmpi $1, 0
        beq label4
        ! StW(Local(0),SubW(LdW(Local(0)),LdW(Local(4))))
        ldw $1, $27, 0
        ldw $2, $27, 4
        sub $1, $1, $2
        stw $1, $27, 0
        ! Jmp(Label(5))
        br label5
        ! LabelDef(Label(4))
    label4:
        ! StW(Local(4),SubW(LdW(Local(4)),LdW(Local(0))))
        ldw $1, $27, 4
        ldw $2, $27, 0
        sub $1, $1, $2
        stw $1, $27, 4
        ! LabelDef(Label(5))
    label5:
        ! LabelDef(Label(2))
    label2:
        ! Bne(CmpneW(LdW(Local(0)),LdW(Local(4))),Label(3))
        ldw $1, $27, 0
        ldw $2, $27, 4
        cmp $1, $2
        movi $1, $0, 1
        bne label8
        movi $1, $0, 0
    label8:
        cmpi $1, 0
        bne label3
        ! Write(LdW(Local(0)))
        ldw $1, $27, 0
        wrd $1
        wrl
        ! Ret()
        br label6
        ! Epilogue
    label6:
        ret $0
Here is the same information for the Obr factorial program.
    PROGRAM Factorial (v : INTEGER) : INTEGER;
    CONST
        limit = 7;
    VAR
        c : INTEGER;
        fact : INTEGER;
    BEGIN
        IF (v < 0) OR (v > limit) THEN
            RETURN -1;
        ELSE
            c := 0;
            fact := 1;
            WHILE c < v DO
                c := c + 1;
                fact := fact * c;
            END
            RETURN fact;
        END
    END Factorial.
From this code, the Obr compiler generates the following target tree:
    RISCProg(
        List(
            StW(Local(0),Read()),
            Beq(
                Cond(
                    CmpltW(LdW(Local(0)),IntDatum(0)),
                    IntDatum(1),
                    CmpgtW(LdW(Local(0)),IntDatum(7))),
                Label(2)),
            Write(NegW(IntDatum(1))),
            Ret(),
            Jmp(Label(3)),
            LabelDef(Label(2)),
            StW(Local(4),IntDatum(0)),
            StW(Local(8),IntDatum(1)),
            Jmp(Label(4)),
            LabelDef(Label(5)),
            StW(Local(4),AddW(LdW(Local(4)),IntDatum(1))),
            StW(Local(8),MulW(LdW(Local(8)),LdW(Local(4)))),
            LabelDef(Label(4)),
            Bne(CmpltW(LdW(Local(4)),LdW(Local(0))),Label(5)),
            Write(LdW(Local(8))),
            Ret(),
            LabelDef(Label(3))))
From this target tree, the encoder produces the following RISC assembly code:
        ! Prologue
        movi $27, $0, 0
        ! StW(Local(0),Read())
        rd $1
        stw $1, $27, 0
        ! Beq(Cond(CmpltW(LdW(Local(0)),IntDatum(0)),IntDatum(1),CmpgtW(LdW(Local(0)),IntDatum(7))),Label(2))
        ldw $1, $27, 0
        movi $2, $0, 0
        cmp $1, $2
        movi $1, $0, 1
        blt label9
        movi $1, $0, 0
    label9:
        cmpi $1, 0
        beq label7
        movi $1, $0, 1
        mov $1, $0, $1
        br label8
    label7:
        ldw $1, $27, 0
        movi $2, $0, 7
        cmp $1, $2
        movi $1, $0, 1
        bgt label10
        movi $1, $0, 0
    label10:
        mov $1, $0, $1
    label8:
        cmpi $1, 0
        beq label2
        ! Write(NegW(IntDatum(1)))
        movi $1, $0, 1
        sub $1, $0, $1
        wrd $1
        wrl
        ! Ret()
        br label6
        ! Jmp(Label(3))
        br label3
        ! LabelDef(Label(2))
    label2:
        ! StW(Local(4),IntDatum(0))
        movi $1, $0, 0
        stw $1, $27, 4
        ! StW(Local(8),IntDatum(1))
        movi $1, $0, 1
        stw $1, $27, 8
        ! Jmp(Label(4))
        br label4
        ! LabelDef(Label(5))
    label5:
        ! StW(Local(4),AddW(LdW(Local(4)),IntDatum(1)))
        ldw $1, $27, 4
        movi $2, $0, 1
        add $1, $1, $2
        stw $1, $27, 4
        ! StW(Local(8),MulW(LdW(Local(8)),LdW(Local(4))))
        ldw $1, $27, 8
        ldw $2, $27, 4
        mul $1, $1, $2
        stw $1, $27, 8
        ! LabelDef(Label(4))
    label4:
        ! Bne(CmpltW(LdW(Local(4)),LdW(Local(0))),Label(5))
        ldw $1, $27, 4
        ldw $2, $27, 0
        cmp $1, $2
        movi $1, $0, 1
        blt label11
        movi $1, $0, 0
    label11:
        cmpi $1, 0
        bne label5
        ! Write(LdW(Local(8)))
        ldw $1, $27, 8
        wrd $1
        wrl
        ! Ret()
        br label6
        ! LabelDef(Label(3))
    label3:
        ! Epilogue
    label6:
        ret $0