The document concludes with examples of translations of a couple of simple Obr programs.
The code that supports the tree structure and encoding described in
this document can be found in SPARCTree.scala
and
Encoder.scala
in the Obr compiler. The mapping from Obr
trees to SPARC trees can be found in Transformation.scala
.
All of the information that the translation task of the compiler provides about the target program is embodied in the target program tree. If a particular item of information cannot be accessed via this tree, then it cannot be obtained at all. Information is encoded in the "shape" of the tree and in values stored at the leaves.
This section defines the set of possible target program trees by defining all of the concepts and constructs of the target language.
A datum is a construct yielding an explicit value that can be
stored or used as an operand for other operations. The encoder
uses the attribute reg
to associate a local machine
register with each Datum to provide storage for the Datum's
value; hence the transformation phase does not have to perform
register allocation.
An item is a construct that does not yield a value.
A SPARC is a complete SPARC program. The SPARC node is the root of the target program tree, and never appears in any other position.
The following productions summarise the constructs of the SPARC by giving the structure of the subtree for each construct.
SPARC: SPARC: Item+ Int Beq: Item: Datum Label Bne: Item: Datum Label Jmp: Item: Label LabelDef: Item: Label Read: Item: Address Ret: Item StW: Item: Address Datum Write: Item: Datum AddW: Datum: Datum Datum Cond: Datum: Datum Datum Datum CmpeqW: Datum: Datum Datum CmpneW: Datum: Datum Datum CmpgtW: Datum: Datum Datum CmpltW: Datum: Datum Datum DivW: Datum: Datum Datum IntDatum: Datum: Int LdW: Datum: Address MulW: Datum: Datum Datum NegW: Datum: Datum Not: Datum: Datum RemW: Datum: Datum Datum SubW: Datum: Datum Datum Local: Address: Int Indexed: Address: Local Datum
The "W" in some of the node names means that those operations operate on word-sized values (four bytes on the SPARC) which in this compiler are used to implement both integer and Boolean values.
The following subsections describe the constructs of the table. Some of those constructs represent specific SPARC instructions and others represent collections of instructions that involve related decisions about operand access.
SPARC: SPARC: Item+ Int
The encoding of a SPARC construct depends on the assembler and operating system in use. The encoder will first encode a standard prologue to enable the operating system to invoke the generated code and establish memory space for the program's variables. The Int component of the SPARC node is the required maximum size for this storage in bytes. After the prologue the Items in the component list are encoded in order. Finally, a standard epilogue is encoded to enable return from the program to the operating system. This epilogue is begun by a standard label to enable the "Ret" construct (see below) to transfer control to it.
Beq: Item: Datum Label Bne: Item: Datum Label Jmp: Item: Label
A branch (Beq or Bne) is encoded as the encoding of its Datum component followed by a test and branch to the Label component. A Beq does a branch on equal to zero and a Bne does a branch on not equal to zero.
A Jmp does an unconditional branch to its Label component.
LabelDef: Item: Label
A LabelDef construct represents a definition of a label and is encoded by emitting that definition in the appropriate assembler syntax.
Read: Item: Address Write: Item: Datum
These constructs are encoded as calls to C library routines to read or write a single piece of integer data. In the case of Read the value read is stored in the location given by the Address component. In the case of Write the value written is that given by the Datum component which is encoded first.
Ret: Item
A Ret construct is encoded by an unconditional jump to a label at the end of the code comprising the program (i.e., to the beginning of the epilogue). This encoding ensures that a return from any part of the program will complete necessary processing before exiting the program.
StW: Item: Address Datum
A StW construct is encoded by encoding the Datum component followed by an instruction to store the value of the Datum into the given address.
AddW: Datum: Datum Datum DivW: Datum: Datum Datum MulW: Datum: Datum Datum NegW: Datum: Datum Not: Datum: Datum RemW: Datum: Datum Datum SubW: Datum: Datum Datum
Most of the arithmetic operations are encoded by encoding their Datum component(s) followed by a single instruction that performs the appropriate operation.
The SPARC has no remainder instruction, so RemW constructs are encoded by a call to a C run-time routine that implements integer remainder.
Cond: Datum: Datum Datum Datum
A Cond construct is encoded by encoding its first Datum component, followed by a sequence of instructions that evaluate the second Datum if the first Datum is non-zero, or evaluate the third Datum if the first Datum is zero. In either case the result value will be left in the location required by the Cond Datum itself.
CmpeqW: Datum: Datum Datum CmpneW: Datum: Datum Datum CmpgtW: Datum: Datum Datum CmpltW: Datum: Datum Datum
The comparison constructs CmpeqW, CmpgtW, CmpltW, and CmpneW are encoded as the encoding of their operands, followed by a comparison instruction, followed by moves and conditional branches as appropriate to establish the result value of 0 or 1 in the location required by the comparison Datum.
IntDatum: Datum: Int
An IntDatum construct is encoded as a move of the integer value into the location required by the Datum.
LdW: Datum: Address
A LdW construct is encoded as a load of a word value from the location specified by its Address component into the location of the LdW.
Local: Address: Int Indexed: Address: Local Datum
A Local address represents a word-sized storage location in the main block of memory that is accessible to an Obr program. The Int component is the offset in bytes from the start of the memory block at which the word is located.
An Indexed address is an address that is computed as a byte offset from a local address. The offset is given by a computation expressed as a Datum.
When an address is used in another construct (i.e., an LdW or an StW) it is first encoded, then used as an operand in the load or store. Local address do not produce any code when they are encoded. Indexed addresses encode their Datum component.
The results of the mapping process from source to target are reflected in the properties and structure of the target tree. This section describes how Obr source data and actions are mapped to target constructs.
Obr programs can manipulate only integer and Boolean basic values plus structured values that are arrays and records. Both parameters and variables can be declared. Therefore, a definition of the data mapping task must specify how values of these types are implemented on the SPARC, and how storage is allocated for parameters and variables.
Because there is no possibility of recursion in Obr, it is possible to implement data storage for parameters and variables statically. The Obr "parameters" really aren't parameters at all---they are top-level variables that must be initialised by reading them from the standard input before executing the body of the Obr program. Thus their storage is implemented just like variables.
Obr constants do not need any storage since the compiler knows their value and can construct an IntDatum node that can be used directly.
An Obr integer is implemented by a SPARC word (32 bits). For convenience, Boolean values are also represented by SPARC words. True is represented by one, and false is represented by zero.
Storage for all of the variables declared in an Obr program is allocated in a single area of SPARC memory. During execution, register %l0 contains the address of the beginning of the memory area. Thus, any variable's location can be specified by the sum of a non-negative integer and the contents of register %l0. Since each variable occupies four bytes of memory, the offsets from the content of register %l0 are all multiples of 4: The topmost variable is in location [%l0], the next variable is in location [%l0+4], and so on.
Arrays and records are allocated as contiguous memory as if the array elements or fields were declared as individual integer variables. (Recall that array elements and fields must be integers.) Therefore an array of N elements or a record with N fields is allocated as N contiguous words of memory.
AssignStmt constructs are translated into a StW construct whose left child is the address of the variable, array element or field being assigned, and whose right child is the translation of the expression on the right-hand side of the assignment.
A BoolExp is translated into an IntDatum where zero is used for FALSE and 1 for TRUE.
AndExp and OrExp translate into uses of the Cond target construct in order to achieve short-circuit evaluation. They are translated as follows:
AndExp (e1, e2) -> Cond (t1, t2, 0) OrExp (e1, e2) -> Cond (t1, 1, t2)
In both of these translations t1 and t2 are the translations of e1 and e2, respectively.
NotExp translates into a Boolean complement operation using the Not target construct.
The comparison operators EqualExp, NotEqualExp, GreaterExp, and LessExp are translated to the CmpeqW, CmpneW, CmpgtW and CmpltW constructs, respectively.
An ExitStmt is implemented by a jump to the terminating label of the closest containing LoopStmt. See also the description of the LoopStmt construct below.
A FieldExp translates to a LdW from the address of the given record field.
A ForStmt construct is implemented as follows:
ForStmt (id, e1, e2, s) -> StW (idmem, t1), StW (mem, t2), Bne (CmpgtW (LdW (idmem), LdW (mem)), L2), Jmp (L1), LabelDef (L3), StW (idmem, AddW (LdW (idmem), IntDatum (1))), LabelDef (L1), i Bne (CmpltW (LdW (idmem), LdW (mem)), L3), LabelDef (L2)
Here, i is the list of Item nodes that is the translation of s, t1 is the translation of e1, and t2 is the translation of e2. idmem is the storage location being used for the variable id, and mem is a new integer memory location not used elsewhere.
Note that this scheme avoids a problem if the maximum expression e2 evaluates to the maximum integer possible, because id is not incremented unless overflow cannot happen.
An IdnExp is translated into either an IntDatum containing the integer value of the identifier (if it denotes a constant), or a LdW from the location in which the variable is stored.
An IfStmt construct is implemented as follows:
IfStmt (e, s1, s2) -> Beq (t, L1) i1 Jmp (L2) LabelDef (L1) i2 LabelDef (L2)
Here, i1 and i2 are the lists of Item nodes that are the translations of s1 and s2, respectively, and t is the translation of e.
An IndexExp translates to a LdW from the address of the given array element. In general, the index is not constant so it must be calculated as part of the address computation.
An IntExp is translated into an IntDatum whose value is the Int component of the IntExp.
The arithmetic target constructs are used to implement the arithmetic operators (MinusExp, NegExp, ModExp, PlusExp, SlashExp and StarExp) in the obvious way. For example, PlusExp is represented by AddW, ModExp by RemW, and NegExp by NegW.
Parameter declarations are always represented by IntParam constructs and are translated into a Read construct whose child is address of the storage allocated to the parameter.
A LoopStmt construct is implemented as follows:
Loop (s) -> LabelDef (L1) i Jmp (L1) LabelDef (L2)
Here, i is the list of Item nodes that is the translation of s. L2 is a label that can be used as the destination of jumps implementing ExitStmt constructs within the loop.
The ObrInt construct is translated into a SPARC construct whose children are the Item nodes comprising the translation of its Declaration and Statement components. The SPARC node also is given an Int component to record the maximum size of storage used by the program.
The ReturnStmt construct is implemented by a Write construct whose child is the translation of the component Expression to be returned, followed by a Ret construct.
WhileStmt (e, s) -> Jmp (L1) LabelDef (L2) i LabelDef (L1) Bne (t, L2)
Here, i is the list of Item nodes that is the translation of s, and t is the translation of e.
The assembly language code produced by the Obr compiler can be
assembled and run on SPARC machines such as pompeii in the
Department of Computing. It is necessary to use the
gcc
command, since the Obr code uses
scanf
and printf
from the C library. If
factorial.s
contains the code generated by the Obr
compiler for the factorial program then you can compile and run it
as follows on pompeii.
gcc -o factorial factorial.s factorial 4
This execution should print 24.
This section shows the complete SPARC target trees and assembly code that would be produced for the factorial and GCD Obr programs.
Consider the Obr version of Euclid's algorithm for calculating the greatest common divisor of two numbers.
PROGRAM GCD (x : INTEGER; y : INTEGER) : INTEGER; BEGIN WHILE x # y DO IF x > y THEN x := x - y ELSE y := y - x END END; RETURN x END GCD.
From this code, the Obr compiler generates the following target tree:
SPARC (List (Read(Local(0)), Read(Local(4)), Jmp(L1), LabelDef(L2), Beq(CmpgtW(LdW(Local(0)),LdW(Local(4))),L3), StW(Local(0),SubW(LdW(Local(0)),LdW(Local(4)))), Jmp(L4), LabelDef(L3), StW(Local(4),SubW(LdW(Local(4)),LdW(Local(0)))), LabelDef(L4), LabelDef(L1), Bne(CmpneW(LdW(Local(0)),LdW(Local(4))),L2), Write(LdW(Local(0))), Ret), 8)
From this target tree, the encoder produces the following SPARC assembly code. Note that the encoder includes the target constructs as comments (starting with exclamation marks) to make the correspondence clearer.
! Prologue .seg "data" ifmt: .asciz "%d" ofmt: .asciz "%d\n" .align 4 mem: .skip 8 .seg "text" .globl main main: save %sp, -112, %sp set mem, %l0 ! Read(Local(0)) set ifmt, %o0 add %l0, 0, %o1 call scanf nop ! Read(Local(4)) set ifmt, %o0 add %l0, 4, %o1 call scanf nop ! Jmp(L1) ba L1 nop ! LabelDef(L2) L2: ! Beq(CmpgtW(LdW(Local(0)),LdW(Local(4))),L3) ld [%l0], %l1 ld [%l0+4], %l2 cmp %l1, %l2 mov 1, %l2 bg L5 nop mov 0, %l2 L5: tst %l2 be L3 nop ! StW(Local(0),SubW(LdW(Local(0)),LdW(Local(4)))) ld [%l0], %l1 ld [%l0+4], %l2 sub %l1, %l2, %l2 st %l2, [%l0] ! Jmp(L4) ba L4 nop ! LabelDef(L3) L3: ! StW(Local(4),SubW(LdW(Local(4)),LdW(Local(0)))) ld [%l0+4], %l1 ld [%l0], %l2 sub %l1, %l2, %l2 st %l2, [%l0+4] ! LabelDef(L4) L4: ! LabelDef(L1) L1: ! Bne(CmpneW(LdW(Local(0)),LdW(Local(4))),L2) ld [%l0], %l1 ld [%l0+4], %l2 cmp %l1, %l2 mov 1, %l2 bne L6 nop mov 0, %l2 L6: tst %l2 bne L2 nop ! Write(LdW(Local(0))) ld [%l0], %l1 set ofmt, %o0 mov %l1, %o1 call printf nop ! Ret ba go nop ! Epilogue go: ret restore
Here is the same information for the Obr factorial program.
PROGRAM Factorial (v : INTEGER) : INTEGER; CONST limit = 7; VAR c : INTEGER; fact : INTEGER; BEGIN IF (v < 0) OR (v > limit) THEN RETURN -1; ELSE c := 0; fact := 1; WHILE c < v DO c := c + 1; fact := fact * c; END RETURN fact; END END Factorial.
From this code, the Obr compiler generates the following target tree:
SPARC (List (Read(Local(0)), Beq(Cond(CmpltW(LdW(Local(0)),IntDatum(0)),IntDatum(1), CmpgtW(LdW(Local(0)),IntDatum(7))),L1), Write(NegW(IntDatum(1))), Ret, Jmp(L2), LabelDef(L1), StW(Local(4),IntDatum(0)), StW(Local(8),IntDatum(1)), Jmp(L3), LabelDef(L4), StW(Local(4),AddW(LdW(Local(4)),IntDatum(1))), StW(Local(8),MulW(LdW(Local(8)),LdW(Local(4)))), LabelDef(L3), Bne(CmpltW(LdW(Local(4)),LdW(Local(0))),L4), Write(LdW(Local(8))), Ret, LabelDef(L2)), 12)
From this target tree, the encoder produces the following SPARC assembly code:
! Prologue .seg "data" ifmt: .asciz "%d" ofmt: .asciz "%d\n" .align 4 mem: .skip 12 .seg "text" .globl main main: save %sp, -112, %sp set mem, %l0 ! Read(Local(0)) set ifmt, %o0 add %l0, 0, %o1 call scanf nop ! Beq(Cond(CmpltW(LdW(Local(0)),IntDatum(0)),IntDatum(1),CmpgtW(LdW(Local(0)),IntDatum(7))),L1) ld [%l0], %l1 mov 0, %l2 cmp %l1, %l2 mov 1, %l2 bl L7 nop mov 0, %l2 L7: tst %l2 be L5 nop mov 1, %l3 mov %l3, %l2 ba L6 nop L5: ld [%l0], %l4 mov 7, %l5 cmp %l4, %l5 mov 1, %l5 bg L8 nop mov 0, %l5 L8: mov %l5, %l2 L6: tst %l2 be L1 nop ! Write(NegW(IntDatum(1))) mov 1, %l1 neg %l1, %l1 set ofmt, %o0 mov %l1, %o1 call printf nop ! Ret ba go nop ! Jmp(L2) ba L2 nop ! LabelDef(L1) L1: ! StW(Local(4),IntDatum(0)) mov 0, %l1 st %l1, [%l0+4] ! StW(Local(8),IntDatum(1)) mov 1, %l1 st %l1, [%l0+8] ! Jmp(L3) ba L3 nop ! LabelDef(L4) L4: ! StW(Local(4),AddW(LdW(Local(4)),IntDatum(1))) ld [%l0+4], %l1 mov 1, %l2 add %l1, %l2, %l2 st %l2, [%l0+4] ! StW(Local(8),MulW(LdW(Local(8)),LdW(Local(4)))) ld [%l0+8], %l1 ld [%l0+4], %l2 smul %l1, %l2, %l2 st %l2, [%l0+8] ! LabelDef(L3) L3: ! Bne(CmpltW(LdW(Local(4)),LdW(Local(0))),L4) ld [%l0+4], %l1 ld [%l0], %l2 cmp %l1, %l2 mov 1, %l2 bl L9 nop mov 0, %l2 L9: tst %l2 bne L4 nop ! Write(LdW(Local(8))) ld [%l0+8], %l1 set ofmt, %o0 mov %l1, %o1 call printf nop ! Ret ba go nop ! LabelDef(L2) L2: ! Epilogue go: ret restore