There has always been a plan to,at some point, attempt to compile lispBM programs to some kind of bytecode. I started on an attempt a while ago, then written in C and it was annoying. It feels like it should be much more pleasant to write a compiler in a language that does automatic garbage collection and where it is very easy and natural to build structures in memory such as linked lists, association lists and so on. The SICP book shows how to compile a lisp into a list of instructions, this is in the final chapter (5.5) of the book.
So for this attempt at writing a bytecode compiler for lispBM, lispBM is used as the implementation language as well. The compiler is very strongly influenced by the presentation in Chapter 5.5 of SICP, but there are differences.
This is work in progress and while the end goal is to compile to an array of bytes, as a fist step the compiler just creates a list of instructions in a way similar to how SICP does it. The instructions are at a lower level compared to the description in SICP, as I see it, and more directly amenable to bytecode translation. To translate the list of instructions into an array of bytes is future work as is implementing some machine to evaluate the bytecode. I will keep writing about this as the pieces are finished.
This lispBM compiler is the largest program written in lispBM so far. It has been really helpful to try and write a larger program, so many bugs have been found along the way!
As usual I much appreciate all constructive feedback and thanks a lot for reading!
This part of the code is very similar to SICP
chapter 5.5.1 except that lispBM does not have the cond
form. So here the main branching structure of the compiler is
implemented as nested if
conditionals. Compared to SICP,
there is a let
form branch here that, I think, is not
present in SICP as they are converted into lambdas. There are also
differences when it comes to sequences of expressions. In lispBM only
the progn
form allows sequencing of expressions. The real
differences compared to SICP are inside of the individual compiler cases
that will be shown further below.
The compiler creates a list containing three elements. The first element is a list of registers read within the code, the second is a list of registers written to and the third element is a list of instructions. Each instruction is a list of op-code and arguments.
Below is an example that shows what the output is when compiling the
simple program (+ 1 2)
.
# (compile-instr-list '(+ 1 2) 'val 'next)
> (nil (proc argl val)
((movimm proc +)
(movimm argl nil)
(movimm val 2)
(cons argl val)
(movimm val 1)
(cons argl val)
(callf)))
It does not depend on values in any registers and it updates the
proc
, argl
and val
registers. A
list of all registers and their intended usage will be shown later in
the text.
The compiler takes an expression as input together with a target register and a specified linkage. The linkage is most importance when compiler is called recursively to generate different sub-parts of the instruction list. The linkage describes how the compiled instruction list should end. Returning from a function application will be special for example, but there will be more info about this in the following sections.
The compiler looks at the passed in expression and dispatches to an
appropriate function that generates an instruction list for that case.
Each of these case specific functions may call
compile-instr-list
on subexpressions.
(define compile-instr-list
(lambda (exp target linkage)
(if (is-self-evaluating exp)
(compile-self-evaluating exp target linkage)
(if (is-quoted exp)
(compile-quoted exp target linkage)
(if (is-symbol exp)
(compile-symbol exp target linkage)
(if (is-def exp)
(compile-def exp target linkage)
(if (is-lambda exp)
(compile-lambda exp target linkage)
(if (is-progn exp)
(compile-progn (cdr exp) target linkage)
(if (is-let exp)
(compile-let exp target linkage)
(if (is-list exp)
(compile-application exp target linkage)
(print "Not recognized")))))))))))
The compile-instr-list
function makes use of a number of
predicates to decide which branch to take. The implementation of these
are mostly very similar to each other. They take an expression as
argument and check if it is of some particular shape, such as it being a
symbol or that it is a list and the first element is a
lambda
.
(define is-symbol
(lambda (exp) (= type-symbol (type-of exp))))
(define is-label
(lambda (exp) (= (car exp) 'label)))
(define is-def
(lambda (exp) (= (car exp) 'define)))
(define is-nil
(lambda (exp) (= exp 'nil)))
(define is-quoted
(lambda (exp) (= (car exp) 'quote)))
(define is-lambda
(lambda (exp) (= (car exp) 'lambda)))
(define is-let
(lambda (exp) (= (car exp) 'let)))
(define is-progn
(lambda (exp) (= (car exp) 'progn)))
(define is-list
(lambda (exp) (= type-list (type-of exp))))
(define is-last-element
(lambda (exp) (and (is-list exp) (is-nil (cdr exp)))))
Some expressions evaluate to them self, such as a number or an array.
These will be compiler into immediate values in the resulting
instruction list. The following functions are used to identify
self-evaluating
expressions.
(define is-number
(lambda (exp)
(let ((typ (type-of exp)))
(or (= typ type-i28)
(= typ type-u28)
(= typ type-i32)
(= typ type-u32)
(= typ type-float)))))
(define is-array
(lambda (exp)
(= (type-of exp) type-array)))
(define is-self-evaluating
(lambda (exp)
(or (is-number exp)
(is-array exp))))
The generated code will contain jumps and when at some later producing the actual byte code, these will be jumps to absolute addresses. When generating the instruction list, however, there will be labels with a unique number at the place for each jump target. A later pass will calculate what the "address" into the bytecode each label is at and replace jumps to label with jumps to address.
A label counter keeps track of the next label identifier to generate
and a function called mk-label
is called to generate a
fresh label.
(define label-counter 0)
(define incr-label
(lambda ()
(progn
;; Define used to redefine label-counter
(define label-counter (+ 1 label-counter))
label-counter)))
(define mk-label
(lambda (name)
(list 'label name (incr-label))))
A label is a list containing the symbol label
as first
element, the next element is a string (mostly for easier understanding
of a generated instruction list) and the third element is the label
value.
The compiler generates code for a machine with 5 special purpose registers.
(define all-regs '(env
proc
val
argl
cont))
Below is a list of instructions and their sizes in bytes (with their arguments).
(define instr-size
'((jmpcnt 1)
(jmpimm 5)
(jmpval 1)
(movimm 6)
(mov 3)
(lookup 6)
(setglbval 5)
(push 2)
(pop 2)
(bpf 5)
(exenvargl 5)
(exenvval 5)
(cons 3)
(consimm 6)
(cdr 3)
(cadr 3)
(caddr 3)
(callf 1)
(label 0)))
This is something that will most likely change a bit but for now and for simplicity this is what I am working with. Using a full byte for an op-code is a waste as there are only roughly 17 of them. Instructions that right now take a register as argument may later be replicated for each possible register that is can use, turning fewer general instructions into more and specialized but smaller in size.
A machine that can execute these instructions will also have a
program counter, pc
, register.
The jmpcnt
instruction can then be thought of as
performing the operation:
pc <- cont
As a more involved example, exenvargl <symbol>
performs the following operations:
env <- (cons (cons <symbol> (car argl)) env)
argl <- cdr argl
the output of the compiler is a triple containing the list of registers needed, the list of registers modified and the list of instructions. This section shows a set of functions to create and modify such triples.
The first of these function is mk-instr-seq
that takes
tree input and puts those three in a list. This is used to create a
snippet of output code while specifying also the registers needed and
used.
(define mk-instr-seq
(lambda (needs mods stms)
(list needs mods stms)))
A set of accessor functions can be used to extract information from an instruction sequence.
(define regs-needed
(lambda (s)
(if (is-label s)
'()
(car s))))
(define regs-modified
(lambda (s)
(if (is-label s)
'()
(car (cdr s)))))
(define statements
(lambda (s)
(if (is-label s)
(list s)
(car (cdr (cdr s))))))
The following two functions check if a specific register is used or needed by a instruction sequence.
(define needs-reg
(lambda (s r)
(mem r (regs-needed s))))
(define modifies-reg
(lambda (s r)
(mem r (regs-modified s))))
mem
is the function that checks if an element is present
in a list.
(define mem
(lambda (x xs)
(if (is-nil xs)
'nil
(if (= x (car xs))
't
(mem x (cdr xs))))))
The instr-seq-bytes
function calculates how many bytes a
compact representation (bytecode) of an instruction sequence will
require.
(define instr-seq-bytes
(lambda (s)
(let ((sum-bytes
(lambda (x acc)
(if (is-nil x) acc
(sum-bytes (cdr x) (+ (lookup (car (car x)) instr-size) acc))))))
(sum-bytes (car (cdr (cdr s))) 0))))
Combining two instruction sequences is a bit tricky as the resulting combined instruction sequence must also represent registers needed and modified correctly. Most of the code, with few exceptions, in this section is close to identical to the presentation in SICP only translated to the lispBM-dialect ;).
The following functions implements appending of two instructions
sequences, appending of an arbitrary number of sequences passed in as a
list, combination of totally independent sequences of instructions
(called parallel-instr-seqs
) and a tack-on function used to
insert a function body into the instruction sequence.
(define append-two-instr-seqs
(lambda (s1 s2)
(mk-instr-seq
(list-union (regs-needed s1)
(list-diff (regs-needed s2)
(regs-modified s1)))
(list-union (regs-modified s1)
(regs-modified s2))
(append (statements s1) (statements s2))))))
(define append-instr-seqs
(lambda (seqs)
(if (is-nil seqs)
empty-instr-seq
(append-two-instr-seqs (car seqs)
(append-instr-seqs (cdr seqs))))))
(define tack-on-instr-seq
(lambda (s b)
(mk-instr-seq (regs-needed s)
(regs-modified s)
(append (statements s) (statements b)))))
(define parallel-instr-seqs
(lambda (s1 s2)
(mk-instr-seq
(list-union (regs-needed s1)
(regs-needed s2))
(list-union (regs-modified s1)
(regs-modified s2))
(append (statements s1) (statements s2)))))
The helper functions list-union
and
list-diff
are implemented as follows in lispBM. Again this
is just a translation from the SICP lisp dialect to the
lispBM-dialect.
(define list-union
(lambda (s1 s2)
(if (is-nil s1) s2
(if (mem (car s1) s2)
(list-union (cdr s1) s2)
(cons (car s1) (list-union (cdr s1) s2))))))
(define list-diff
(lambda (s1 s2)
(if (is-nil s1) '()
(if (mem (car s1) s2) (list-diff (cdr s1) s2)
(cons (car s1) (list-diff (cdr s1) s2))))))
When generating code for things like nested function applications
(for example (+ 1 (+ 2 3) 4)
, there is a need for
preserving registers that are used on both levels. The
preserving
function combines two instruction sequences
while automatically inserting push
and pop
of
the registers specified around the first of the sequences.
(define preserving
(lambda (regs s1 s2)
(if (is-nil regs)
(append-instr-seqs (list s1 s2))
(let ((first-reg (car regs)))
(if (and (needs-reg s2 first-reg)
(modifies-reg s1 first-reg))
(preserving (cdr regs)
(mk-instr-seq
(list-union (list first-reg)
(regs-needed s1))
(list-diff (regs-modified s1)
(list first-reg))
(append `((push ,first-reg))
(append (statements s1) `((pop ,first-reg)))))
s2)
(preserving (cdr regs) s1 s2))))))
There are two different kinds of ways that a sequence of instructions
can end. an instruction sequence that represents a function body ends
with a jump to an address stored in the cont
registers.
Other instruction sequences just flow naturally into each other.
The end-with-linkage
function inserts the appropriate
code depending on an argument called linkage
that is passed
as an argument. If this linkage
is 'return
code that returns from a function call is inserted and if the
linkage
is next
nothing is added at all.
The linkage can also be a label. In this case code that jumps to that label is inserted.
(define empty-instr-seq (mk-instr-seq '() '() '()))
(define compile-linkage
(lambda (linkage)
(if (= linkage 'return)
;; jmpcnt implies usage of cont register
(mk-instr-seq '(cont) '() '((jmpcnt)))
(if (= linkage 'next)
empty-instr-seq
(mk-instr-seq '() '() `((jmpimm ,linkage)))))))
(define end-with-linkage
(lambda (linkage seq)
(preserving '(cont)
seq
(compile-linkage linkage))))
Ok! All up to this point is just plumbing. Now we cget to write compilers for the different kinds of language constructs. This part differs more from the SICP implementation than previous sections but still is quite similar. The differences come from lispBM having a bit different set of special forms compared to the SICP dialect. Another difference is the "abstraction level" of the generated instructions, here the aim is at slightly lower level.
The different compile-X
functions take three arguments,
the expression to compile, the register to store the result in and a
linkage. The output is an instruction sequence.
An expression is considered self evaluating if it is a number or an array. In this case the expression is stored into the target registers using a move immediate instruction.
(define compile-self-evaluating
(lambda (exp target linkage)
(end-with-linkage linkage
(mk-instr-seq '()
(list target)
`((movimm ,target ,exp))))))
The idea is that this can be turned into bytecode consisting of one byte for instruction, one byte for target register and 4 bytes for the value. Since we have very few registers and instructions using a full byte for each is a huge waste. But it is ok to be working in this way for now. Later a more compact representation should be used. I don't want to do this "optimization" too early, before knowing exactly what other kinds of instructions or registers that may be added later.
Compilation of a symbol generates code that performs a lookup in the environment and stores the result of that lookup in a target register.
(define compile-symbol
(lambda (exp target linkage)
(end-with-linkage linkage
;; Implied that the lookup looks in env register
(mk-instr-seq '(env)
(list target)
`((lookup ,target ,exp))))))
Now it gets a bit complicated and also very different from how SICP does it. In SICP, a quoted expression doesn't seem to be compiled at all (as I understand it). Rather the expression is just "pointed to" from the compiled instructions sequence. I do not see how this would allow that the compiled result is stored and loaded into a fresh runtime system. Being able to load compiled source into a freshly started RTS is one of the goals with the compiler implemented here. There are Other problems with generating code that can be loaded into a fresh RTS, I will try to explain these in the closing section.
So, to be able to load in the compiled code that contains quoted expressions, my feeling is that the code mush reconstruct the quoted expression on the heap. This is my first attempt at this compilation and it has not been extensively tested, so there surely are buggy corner cases.
compile-quoted
passes the expression following the
quote
symbol to a function called compile-data
that traverses the heap structure while generating code that recreates
that same structure.
compile-data
calls either compile-data-list
or compile-data-prim
depending on the type of
subexpressions (if they are lists or not).
compile-data-list
calls compile-data-nest
that
then forms a mutually recursive loop.
compile-data-nest takes care of the preservation of registers needed when dealing with nested structures. The
preserving`
function is not used, but it is likely that the code could be
reimplemented using it.
(define compile-quoted
(lambda (exp target linkage)
(end-with-linkage linkage
(compile-data (car (cdr exp)) target))))
(define compile-data
(lambda (exp target)
(if (is-list exp)
(append-two-instr-seqs
(mk-instr-seq '() (list target)
`((movimm ,target nil)))
(compile-data-list (reverse exp) target))
(compile-data-prim exp target))))
(define compile-data-nest
(lambda (exp target)
(if (is-list exp)
(append-two-instr-seqs
(mk-instr-seq (list target) '()
`((push ,target)
(movimm ,target nil)))
(append-two-instr-seqs
(compile-data-list (reverse exp) target)
(mk-instr-seq (list target) (list target 'argl)
`((mov argl ,target)
(pop ,target)
(cons ,target argl)))))
(append-two-instr-seqs
(compile-data-prim exp 'argl)
(mk-instr-seq '(argl) (list target)
`((cons ,target argl)))))))
(define compile-data-prim
(lambda (exp target)
(mk-instr-seq '() (list target)
`((movimm ,target ,exp)))))
(define compile-data-list
(lambda (exp target)
(if (is-nil exp)
empty-instr-seq
(append-instr-seqs
(map (lambda (e)
(compile-data-nest e target))
exp)))))
Definitions are compiled into code that updates the global environment.
(define compile-def
(lambda (exp target linkage)
(let ((var (car (cdr exp)))
(get-value-code
(compile-instr-list (car (cdr (cdr exp))) 'val 'next)))
(end-with-linkage linkage
(append-two-instr-seqs get-value-code
(mk-instr-seq '(val) (list target)
`((setglbval ,var)
(movimm ,target ,var))))))))
The generated code uses setglbval
to update the
environment. It can be thought of as performing the following
operations.
global-env <- (cons (cons (var val)) global-env)
The target register will hold the bound variable after executing the generated sequence.
Lambda expressions are also slightly different in lispBM. The function is just a single expression and cannot be a list of expressions as in the SICP dialect.
Lambdas are compiled into code that creates a procedure "object" and a function body. The procedure object is the result of executing the lambda. This procedure can then later be called, which should result in a jump to the function body.
The code that generates the procedure object and the function body are stored consecutively in the instruction stream.
A procedure object is a list with first element 'proc
,
second element is a jump label and third element is an environment. This
is very similar to how in the evaluator a lambda is evaluated into a
closure that can be called later.
(define compile-lambda
(lambda (exp target linkage)
(let ((proc-entry (mk-label "entry"))
(after-lambda (mk-label "after-lambda"))
(lambda-linkage (if (= linkage 'next) after-lambda linkage)))
(append-two-instr-seqs
(tack-on-instr-seq
(end-with-linkage lambda-linkage
(mk-instr-seq '(env) (list target)
`((movimm ,target nil)
(cons ,target env)
(consimm ,target ,proc-entry)
(consimm ,target proc)))) ;; put symbol proc first in list
(compile-lambda-body exp proc-entry))
after-lambda))))
(define compile-lambda-body
(lambda (exp proc-entry)
(let ((formals (car (cdr exp))))
(append-two-instr-seqs
(append-two-instr-seqs
(mk-instr-seq '(env proc argl) '(env)
`(,proc-entry
(caddr env proc)))
(append-instr-seqs
(map (lambda (p)
(mk-instr-seq '(argl) '(env)
`((exenvargl ,p))))
formals)))
(compile-instr-list (car (cdr (cdr exp))) 'val 'return)))))
The only way to write a program that sequentially executes a number
of expressions in lispBM is the progn
form. Oh, actually a
top level program is also a list of sequential expressions. But in a
function, you need to use progn
.
The code that is generated for the sequence of expressions insert
register preserving pushes and pulls. To allow progn
to be
used in a function body of a recursive function and getting
tail-recursion, the last expression in the list must be surrounded by
these pushes and pulls. This is why there is a special case for the last
element.
(define compile-progn
(lambda (exp target linkage)
(if (is-last-element exp)
(compile-instr-list (car exp) target linkage)
(preserving '(env continue)
(compile-instr-list (car exp) target 'next)
(compile-progn (cdr exp) target linkage)))))
This is a first go at compiling let expressions. I think it will not allow the definition of mutual recursion as it stands now. Must fix this.
The generated code for a let expression will contain an
evenvval
instruction per variable to bind. Between these
instructions there will be some sequence generated by
compile-instr-list
that each output their result into the
val
register.
(define compile-let
(lambda (exp target linkage)
(append-two-instr-seqs
(append-instr-seqs (map compile-binding (car (cdr exp))))
(compile-instr-list (car (cdr (cdr exp))) target linkage))))
(define compile-binding
(lambda (keyval)
(let ((get-value-code
(compile-instr-list (car (cdr keyval)) 'val 'next))
(var (car keyval)))
(append-two-instr-seqs get-value-code
(mk-instr-seq '(val) (list 'env)
`((exenvval ,var)))))))
Function application is compiled by first compiling all of the
arguments. The arguments are all appended to a list created in the
argl
register (the actual arguments are stored as a list on
the heap, a pointer to the list is in argl).
(define compile-application
(lambda (exp target linkage)
(let ((proc-code
(if (is-fundamental (car exp))
(mk-instr-seq '() '(proc)
`((movimm proc ,(car exp))))
(compile-instr-list (car exp) 'proc 'next)))
(operand-codes (map (lambda (o) (compile-instr-list o 'val 'next)) (cdr exp))))
(preserving '(env cont)
proc-code
(preserving '(proc cont)
(construct-arglist operand-codes)
(if (is-fundamental (car exp))
(compile-fundamental-proc-call target linkage)
(if (is-lambda (car exp))
(compile-lambda-proc-call target linkage)
(compile-proc-call target linkage))))))))
(define add-fst-argument
(lambda (arg-code)
(append-two-instr-seqs
arg-code
(mk-instr-seq '(val) '(argl)
'((movimm argl ())
(cons argl val))))))
(define construct-arglist
(lambda (codes)
(let ((operand-codes (reverse codes)))
(if (is-nil operand-codes)
(mk-instr-seq '() '(argl)
'((movimm argl ())))
(if (is-nil (cdr operand-codes))
(add-fst-argument (car operand-codes))
(preserving '(env)
(add-fst-argument (car operand-codes))
(get-rest-args (cdr operand-codes))))))))
(define get-rest-args
(lambda (operand-codes)
(let ((code-for-next-arg
(preserving '(argl)
(car operand-codes)
(mk-instr-seq '(val argl) '(argl)
'((cons argl val))))))
(if (is-nil (cdr operand-codes))
code-for-next-arg
(preserving '(env)
code-for-next-arg
(get-rest-args (cdr operand-codes)))))))
As an optimization, to generate small code when it is known that the
procedure is a fundamental or the result of lambda, the specific
compile-fundamental-proc-call
or
compile-lambda-proc-call
functions can be used. When it is
unknown what type of procedure is at hand, the general
´compile-proc-call` function is used.
The general function inserts code that checks what kind of procedure is going to be called and then branches either to a fundamental or a lambda case.
(define compile-fundamental-proc-call
(lambda (target linkage)
(end-with-linkage linkage
(mk-instr-seq '(proc argl)
(list target)
'((callf))))))
(define compile-lambda-proc-call
(lambda (target linkage)
(let ((after-call (mk-label "after-call"))
(compiled-linkage (if (= linkage 'next)
after-call
linkage)))
(append-two-instr-seqs
(compile-proc-appl target compiled-linkage)
after-call))))
(define compile-proc-call
(lambda (target linkage)
(let ((fund-branch (mk-label "fund-branch"))
(compiled-branch (mk-label "comp-branch"))
(after-call (mk-label "after-call"))
(compiled-linkage (if (= linkage 'next)
after-call
linkage)))
(append-instr-seqs
(list (mk-instr-seq '(proc) '()
`((bpf ,fund-branch)))
(parallel-instr-seqs
(append-two-instr-seqs
compiled-branch
(compile-proc-appl target compiled-linkage))
(append-two-instr-seqs
fund-branch
(end-with-linkage linkage
(mk-instr-seq '(proc argl)
(list target)
'((callf))))))
after-call)))))
Jumping to and how to return from a procedure call is implemented in function below. It is split up into cases depending on target register and linkage type.
If linkage is a label and target is the val register, then code is inserted that sets the cont register to the label from the linkage, the function body label is extracted from the procedure object and then a jump instruction transfers execution to that label.
The other cases are slight modifications to this flow.
(define compile-proc-appl
(lambda (target linkage)
(if (and (= target 'val) (not (= linkage 'return)))
(mk-instr-seq '(proc) all-regs
`((movimm cont ,linkage)
(cadr val proc)
(jmpval)))
(if (and (not (= target 'val))
(not (= linkage 'return)))
(let ((proc-return (mk-label "proc-return")))
(mk-instr-seq '(proc) all-regs
`((movimm cont ,proc-return)
(cadr val proc)
(jmpval)
,proc-return
(mov ,target val)
(jmpimm ,linkage))))
(if (and (= target 'val)
(= linkage 'return))
(mk-instr-seq '(proc continue) all-regs
'((cadr val proc)
(jmpval)))
'compile-error)))))
First example is an application of a fundamental operation. This
generates code that sets up the argument list in argl
and
then issues callf
that will execute the fundamental
operation stored in the proc
register.
# (compile-instr-list '(+ 1 2) 'val 'next)
> (nil (proc argl val)
((movimm proc +)
(movimm val 2)
(movimm argl nil)
(cons argl val)
(movimm val 1)
(cons argl val)
(callf)))
The next example is an application of a lambda, tis results in code
that creates a procedure object in the proc
register and a
lambda function body. After setting up the procedure object, control is
transfered to the label that is directly after the function body. Here,
the arguments are added to the argl
list, cont
is set to a label to return to after the function application and a jump
to the function body is performed.
# (compile-instr-list '((lambda (x y) y) 10 20) 'val 'next)
> ((env) (env proc val argl cont)
((movimm proc nil)
(cons proc env)
(consimm proc (label "entry" 1))
(consimm proc proc)
(jmpimm (label "after-lambda" 2))
(label "entry" 1)
(caddr env proc)
(exenvargl x)
(exenvargl y)
(lookup val y)
(jmpcnt)
(label "after-lambda" 2)
(movimm val 20)
(movimm argl nil)
(cons argl val)
(movimm val 10)
(cons argl val)
(movimm cont (label "after-call" 3))
(cadr val proc)
(jmpval)
(label "after-call" 3)))
The last example is the result of compiling a quoted expression. This
results in code that recreates the quoted expression on the heap and
points to this structure from the val
register. Here
argl
is used as a temporary register while building the
data structure.
# (compile-instr-list ''(+ 1 2) 'val 'next)
> (nil (argl val)
((movimm val nil)
(movimm argl 2)
(cons val argl)
(movimm argl 1)
(cons val argl)
(movimm argl +)
(cons val argl)))
Since the goal is to be able to generate code that can be saved to disk and then later loaded into a fresh RTS, a problem with symbols presents itself.
Say I load the lisp source containing the code
(define apa 10)
into lispBM, then a mapping is created
between the text string "apa" and a symbol identifier in the symbol
table. The actual heap representation does not contain these
symbol-strings, rather just a value that via the symbol table is mapped
to a name. So the internal representation of `(define apa 10) could be
something like ([431431] [342432] 10) where the [] illustrates that
these are numbers of a different type to the number 10.
The code generated from the compiler, as it is currently implemented, will only contain the numerical symbol values. This means that if compiled code is loaded into a fresh RTS, no binding between the numerical symbol identifier and the string "apa" will exist. This will be a problem if the compiled code is supposed to return a symbol and that symbol should be presented to the user. There would be no way to do that!
So some way to store the symbol mapping into the generated code must be designed. A first thought here is to store a symbol renumbering mapping in the code. This mapping would contain string names of symbols and the value it has in the code. Then when loading the code the symbols will be registered with the RTS and the values provided from the RTS for the symbol should be substituted into the code. This has a problem though. If the there are two symbols in the code "apa" and "bepa" with ID 42 and 43, lets say when loading this into the RTS the renumbering for apa is 43 so we replace all 42 with 43 in the code. Then "bepa" is given renumbering 768, so we replace all 43s with 768s in the code. Now there are no apas in the code, only bepas...
I currently plan to give symbols in the code IDs pulled from a completely different type compared to symbol ids. I call this type symbol-indirection
So the compiler should be augmented with functions that find symbols
used (that are not special always present symbols at unique ids, such as
+
, define
and so on) and renames it using a
symbol-indirection value. A mapping of "name" to
symbol-indirection-value will be created and stored into the code. This
should rule out that problem with renaming where a name ends up
overwritten since the replacement will be into values of a completely
different type.
This is by far the largest program written in lispBM so far and of course many bugs have been found while writing it. I'm sure there are more bugs lurking.
Thanks for reading! As usual I would be very thankful for helpful feedback.
Todos
mk-instr-seq
could be rewritten to derive the sets
of used and modified registers from the instruction sequence.