This text is a more in depth look at the code that evaluates expressions in lispBM. I hope that It can be of value to someone, just like the Lisperator code was useful to me.
This code is work in progress and some things look the way they do currently because there is some vague thought of a future direction it may go in. I will try to point out these extra complexities that currently have no direct purpose.
The goal is that the evaluator should have a certain set of properties.
Tail-calls and even infinite recursion (in tail-position) should be possible without ever-growing memory consumption.
At any point where heap space is allocated it must be possible to pause the computation, perform garbage collection (gc), and then pick up execution again where the computation was paused.
Evaluation should be correct in relation to environments (such
that the body of a let is evaluated in an environment extended with the
bindings). There are also other examples of where similar properties
should hold, such as when lambdas are evaluated into
closures.
Evaluating an expression should give the expected result.
Writing the code that does all 4 of the above is (at least to me) a bit tricky and I am not entirely sure that every path through the evaluator really maintains these properties yet. As problems shows up, they are addressed.
After writing down the overview of lispBM, some
possible simplifications was spotted. While refactoring these I also
noticed a case of violating property 1. The code that evaluates the
PROGN functionality was written in a way that led to
ever-growing memory use if involved in a infinite recursion. For
example:
(define f
(lambda (x)
(progn (print "hello")
(f (+ x 1)))))
But the problem was possible to fix. I will point out where I fell in the trap later.
Property 1, is tackled by implementing the evaluator in continuation passing style and as a while loop with explicitly managed state. For example, the explicit continuation stack keeps track of arguments to the continuation and which continuation to apply. Care has to be taken that the interactions with the stack happen in the correct order so that we do not end up with ever-growing stack usage. One way to put it is that nothing should be pushed onto the stack before a tail-call and then popped when finally the tail-call returns.
Property 2, is handled at each place where cons-cells are allocated. The exact details differ a little from one place to another, so I will point this out in the code further down. But in essence what it means is that the stack and the evaluation context is set to a previously known state and then the evaluation loop is re-entered with a perform gc flag set. The evaluator will then perform GC in the next iteration. When GC is done the next iteration of the evaluator will continue work with the evaluation context.
Property 3, is dealt with in the cases involving define,
let and lambda. Either by making changes to
the globally defined environment (in the case of define) or
by updating the local environment that the evaluation context
maintains.
Property 4, is loosely checked by a growing set of tests located
under the tests directory.
There are also a set of flags, perform_gc,
apply_continuation and done. These are set in
the evaluator loop to signal to itself what it should be doing in the
next iteration. done is set when there is an error that it
is not possible to recover from. apply_continuation is set
if the next iteration should pop information off the stack and process
that. This happens for example when evaluation results in base type
(such as a number). That number may be a part of an addition as in (+ 1
2). So in the next iteration of the loop the continuation will contain
information about what to do with that number. The
perform_gc flag is set (as hinted above), when heap is full
and we cannot proceed unless gc is executed.
There a couple of often used functions throughout the
eval_cps code that comes from other parts of lispBM. Many
of these are called enc_X where X can be
something like sym, i, u and so
on. These functions encode values from the C world into lispBM
VALUEs, for example enc_i turns a C integer
into a lispBM VALUE representing a 28Bit integer. Similarly
there are functions called dec_X that transform values in
the other direction.
Now to the code! Evaluation makes use of almost everything else. So most of the lispBM header files are included.
#include "symrepr.h"
#include "heap.h"
#include "env.h"
#include "eval_cps.h"
#include "stack.h"
#include "fundamental.h"
#include "extensions.h"
#ifdef VISUALIZE_HEAP
#include "heap_vis.h"
#endif
Below are the new set of continuation functions (after changes
applied since the earlier
text on lispBM). There are now only 7 continuation functions instead
of 9. The difference is that before there was the three continuations
FUNCTION, FUNCTION_APP and
ARG_LIST that are now replaced by just
APPLICATION and APPLICATION_ARGS. There was
also one continuation function that was no longer used at all and could
be dropped.
#define DONE 1
#define SET_GLOBAL_ENV 2
#define BIND_TO_KEY_REST 3
#define IF 4
#define PROGN_REST 5
#define APPLICATION 6
#define APPLICATION_ARGS 7
There is a little bit of global state maintained by the evaluator. It
consists of the global environment (where things defined
go) and the evaluation context. There are also some VALUEs,
NIL and NONSENSE these symbols are just given
names that are easier to write and stand out a bit as they are used
quite frequently.
static VALUE eval_cps_global_env;
static VALUE NIL;
static VALUE NONSENSE;
eval_context_t *eval_context = NULL;
Then there are a bunch of functions that handle evaluation contexts. These are not very important now. They are written in this way because of a vague plan to let the evaluator work on more than one context where each context represents a task. Since it is possible to pause the evaluation of a context it should be possible to also implement task switching by pausing one context and then activating another.
eval_context_t *eval_cps_get_current_context(void) {
return eval_context;
}
eval_context_t *eval_cps_new_context_inherit_env(VALUE program, VALUE curr_exp) {
eval_context_t *ctx = malloc(sizeof(eval_context_t));
ctx->program = program;
ctx->curr_exp = curr_exp;
ctx->curr_env = eval_context->curr_env; /* TODO: Copy environment */
ctx->K = stack_init(100, true);
ctx->next = eval_context;
eval_context = ctx;
return ctx;
}
void eval_cps_drop_top_context(void) {
eval_context_t *ctx = eval_context;
eval_context = eval_context->next;
stack_del(ctx->K);
free(ctx);
}
The eval_cps_get_env is meant to be called by the REPL
implementation in case it is interested in knowing the contents of the
environment.
VALUE eval_cps_get_env(void) {
return eval_cps_global_env;
}
The eval_cps_bi_eval function connects to the
fundamental.c module and provides the fundamental function
called eval.
VALUE eval_cps_bi_eval(VALUE exp) {
eval_context_t *ctx = eval_cps_get_current_context();
ctx->curr_exp = exp;
return run_eval(ctx);
}
The eval_cps_init functions initializes the evaluation
state. It takes an argument that decides whether or not the continuation
stack is allowed to grow or if it will be of fixed size. The global
environment is initialized and a single mapping is added that map
nil to nil. The global evaluation context is
allocated.
int eval_cps_init(bool grow_continuation_stack) {
int res = 1;
NIL = enc_sym(symrepr_nil());
NONSENSE = enc_sym(symrepr_nonsense());
eval_cps_global_env = NIL;
eval_context = (eval_context_t*)malloc(sizeof(eval_context_t));
eval_context->K = stack_init(100, grow_continuation_stack);
VALUE nil_entry = cons(NIL, NIL);
eval_cps_global_env = cons(nil_entry, eval_cps_global_env);
if (type_of(nil_entry) == VAL_TYPE_SYMBOL ||
type_of(eval_cps_global_env) == VAL_TYPE_SYMBOL) res = 0;
return res;
}
The following function frees up the allocated state used by the evaluator.
void eval_cps_del(void) {
stack_del(eval_context->K);
free(eval_context);
}
The function that is called from the REPL is called
eval_cps_program and takes a list of expressions as
argument. This function currently modifies the global evaluation context
and then loops over the list of expressions and evaluates them from
first to last using the run_eval function. the result of
the last expression is returned to the caller.
VALUE eval_cps_program(VALUE lisp) {
eval_context_t *ctx = eval_cps_get_current_context();
ctx->program = lisp;
VALUE res = NIL;
VALUE curr = lisp;
if (symrepr_is_error(dec_sym(lisp))) return lisp;
while (type_of(curr) == PTR_TYPE_CONS) {
if (ctx->K->sp > 0) {
stack_clear(ctx->K); // clear stack if garbage left from failed previous evaluation
}
ctx->curr_exp = car(curr);
ctx->curr_env = NIL;
res = run_eval(ctx);
curr = cdr(curr);
}
return res;
}
So now all the surrounding functions are covered, somewhat, and we
can go in depth with the evaluation function, run_eval.
The run eval function starts out with pushing the DONE
continuation onto the stack and defines some state needed throughout the
evaluation:
VALUE run_eval(eval_context_t *ctx){
push_u32(ctx->K, enc_u(DONE));
VALUE r = NIL;
bool done = false;
bool perform_gc = false;
bool app_cont = false;
uint32_t non_gc = 0;
The r value is what will hold the result of
evaluating the expression. r also holds intermediate
results throughout evaluation.
The done flag signals that evaluation is finished,
either successfully or ended in an error.
The perform_gc flag, if set, indicates that this
iteration of the evaluation loop will include a round of garbage
collection.
The app_cont flag signals that a continuation should
be applied at this point.
There is a counter non_gc that counts how many time
the loop iterated without the garbage collector running. So if
perform_gc is set and the counter is 0, we are really
entirely out of memory. I think the counter should be replaced with a
boolean flag that indicates if GC was run in the iteration
before.
After defining up these bookkeeping variables, the loop that runs
while (!done) is started.
while (!done) {
#ifdef VISUALIZE_HEAP
heap_vis_gen_image();
#endif
heap_vis is a module of lispBM that can be used for
debugging purposes. Every time heap_vis_gen_image() is
called an image is generated that has one pixel per cons-cell on the
heap. Different colors in this image represent things like
free, marked and in use.
Next there is a check if this iteration will include a round of garbage collection.
if (perform_gc) {
if (non_gc == 0) {
done = true;
r = enc_sym(symrepr_merror());
continue;
}
non_gc = 0;
heap_perform_gc_aux(eval_cps_global_env,
ctx->curr_env,
ctx->curr_exp,
ctx->program,
r,
ctx->K->data,
ctx->K->sp);
perform_gc = false;
} else {
non_gc ++;
}
If perform_gc is set, and we are not in the problematic
case when gc was run last iteration, a call to
heap_perform_gc_aux will happen. The arguments to this
functions are the things that the mark phase should mark as being in
use. This involves the continuation stack! The continuation stack can
hold pointers to heap structures that are needed in the future and that
are not pointed to by any environment.
If the app_cont flag is set, the execution path goes
into a function called apply_continuation that takes all
the status flags as well as the current intermediate result
r as input. The apply_continuation function
will pop the top off from the stack and use the value stored there to
decide what continuation (from the list of defined continuations) to
apply. The r value is given to this continuation but it may
also pop additional arguments from the stack.
if (app_cont) {
r = apply_continuation(ctx, r, &done, &perform_gc, &app_cont);
continue;
}
Now it is time for the big switch statement that
evaluates each of the different language constructs. The value
head is used later when the current expression is a list,
depending on what the head of that list is evaluation will take
different paths. The value variable is a temporary used
throughout.
VALUE head;
VALUE value = enc_sym(symrepr_eerror());
switch (type_of(ctx->curr_exp)) {
The first case deals with symbols. When a symbol is evaluated it is
looked up in the environment, local and global. A symbol can, however,
also refer to a fundamental function (such as +) or to an
extension (user provided c function). If the symbol is not found in any
environment and does not represent a fundamental or extension an
evaluation error is returned and the done flag is set.
case VAL_TYPE_SYMBOL:
value = env_lookup(ctx->curr_exp, ctx->curr_env);
if (type_of(value) == VAL_TYPE_SYMBOL &&
dec_sym(value) == symrepr_not_found()) {
value = env_lookup(ctx->curr_exp, eval_cps_global_env);
if (type_of(value) == VAL_TYPE_SYMBOL &&
dec_sym(value) == symrepr_not_found()) {
if (is_fundamental(ctx->curr_exp)) {
value = ctx->curr_exp;
} else if (extensions_lookup(dec_sym(ctx->curr_exp)) == NULL) {
r = enc_sym(symrepr_eerror());
done = true;
continue;
} else {
value = ctx->curr_exp; // symbol representing extension
// evaluates to itself at this stage.
}
}
}
app_cont = true;
r = value;
break;
Next the switch statement deals with the other basic types, numbers
and arrays, that don't need further evaluation but rather just that they
will be passed as argument to some continuation. This is done by setting
the app_cont flag and setting r to the what we
want passed to the continuation as argument.
case PTR_TYPE_BOXED_F:
case PTR_TYPE_BOXED_U:
case PTR_TYPE_BOXED_I:
case VAL_TYPE_I:
case VAL_TYPE_U:
case VAL_TYPE_CHAR:
case PTR_TYPE_ARRAY:
app_cont = true;
r = ctx->curr_exp;
break;
The next two cases ref and stream are not
implemented and are currently just vague ideas.
case PTR_TYPE_REF:
case PTR_TYPE_STREAM:
r = enc_sym(symrepr_eerror());
done = true;
break;
So, if the expression is not a symbol or other basic type, it should be a list. The next case checks if the expression is a list and then depending on what the head of that list is the evaluation takes different paths.
case PTR_TYPE_CONS:
head = car(ctx->curr_exp);
The head of the list is stored in the head variable for
easy access.
if (type_of(head) == VAL_TYPE_SYMBOL) {
// Special form: QUOTE
if (dec_sym(head) == symrepr_quote()) {
r = car(cdr(ctx->curr_exp));
app_cont = true;
continue;
}
The QUOTE case is easy. It corresponds to an expression
such as for example '(+ 1 2), so r is set to
(+ 1 2) and then the continuation is applied.
The define form takes two arguments, a symbol and an
expression. The expression is evaluated and then the result of that
evaluation is stored in a mapping from the symbol to the value in the
environment. This will be the first case where a continuation is
created. This function also performs a check to see that the programmer
is not rebinding nil.
// Special form: DEFINE
if (dec_sym(head) == symrepr_define()) {
VALUE key = car(cdr(ctx->curr_exp));
VALUE val_exp = car(cdr(cdr(ctx->curr_exp)));
if (type_of(key) != VAL_TYPE_SYMBOL ||
key == NIL) {
done = true;
r = enc_sym(symrepr_eerror());
continue;
}
push_u32_2(ctx->K, key, enc_u(SET_GLOBAL_ENV));
ctx->curr_exp = val_exp;
continue;
}
The key and val_exp variables are set to
the arguments to define. Then at the end, the
key and the continuation SET_GLOBAL_ENV are
pushed to the continuation stack. The current expression is set to
val_exp. This means that in the next iteration the
evaluation loop will evaluate the val_exp. When evaluating
val_exp is done, the continuation will be called with the
result of that evaluation and an additional argument, the
key, that is already on the stack. So when entering into
the SET_GLOBAL_ENV continuation all the data is available
to create the global binding.
The progn takes a sequence of expressions as arguments
that are all supposed to be evaluated in turn, from the first to the
last. The result of evaluating the last expression in the sequence is
the result of the whole thing. There is a check for an empty sequence of
expressions which results in nil otherwise a continuation
is created in this case as well.
// Special form: PROGN
if (dec_sym(head) == symrepr_progn()) {
VALUE exps = cdr(ctx->curr_exp);
if (type_of(exps) == VAL_TYPE_SYMBOL && exps == NIL) {
r = enc_sym(symrepr_nil());
app_cont = true;
continue;
}
if (symrepr_is_error(exps)) {
r = exps;
done = true;
continue;
}
push_u32_2(ctx->K, cdr(exps), enc_u(PROGN_REST));
ctx->curr_exp = car(exps);
continue;
}
The progn case sets up for the continuation by setting
the current expression to the head of the sequence of expressions. the
pointer to the rest of the list is pushed onto the stack as
well as the PROGN_REST continuation. The
PROGN_REST continuation thus has two arguments to work with
once it gets called. The result of the previously evaluated expression
(head) and the rest of the sequence. Later in this text we will see
exactly what the continuation functions does in all of these cases.
The lambda case does not create a continuation but it is
interesting for another reason. This is the first example of a case that
allocates heap memory and thus must be pausable and restartable in the
case that there is no free heap space.
// Special form: LAMBDA
if (dec_sym(head) == symrepr_lambda()) {
VALUE env_cpy = env_copy_shallow(ctx->curr_env);
if (type_of(env_cpy) == VAL_TYPE_SYMBOL &&
dec_sym(env_cpy) == symrepr_merror()) {
perform_gc = true;
app_cont = false;
continue; // perform gc and resume evaluation at same expression
}
VALUE env_end;
VALUE body;
VALUE params;
VALUE closure;
env_end = cons(env_cpy,NIL);
body = cons(car(cdr(cdr(ctx->curr_exp))), env_end);
params = cons(car(cdr(ctx->curr_exp)), body);
closure = cons(enc_sym(symrepr_closure()), params);
if (type_of(env_end) == VAL_TYPE_SYMBOL ||
type_of(body) == VAL_TYPE_SYMBOL ||
type_of(params) == VAL_TYPE_SYMBOL ||
type_of(closure) == VAL_TYPE_SYMBOL) {
perform_gc = true;
app_cont = false;
continue; // perform gc and resume evaluation at same expression
}
app_cont = true;
r = closure;
continue;
}
The lambda starts out by doing a shallow copy of the
local environment, this means that skeleton of the environment is
traversed and copied but the copy will point to the same mappings as the
original. So only enough new cons-cells are used to for the
skeleton-structure of the environment, none for the data itself. This
consing can fail with an memory error,
symrepr_merror(), symbol as result in which case we are out
of heap. In this case the perform_gc flag is set and
app_cont is cleared. An important detail is that it does
not alter the current expression which means that once the evaluation
starts up again the current expression will still be the same lambda
case. Thus evaluation will resume. There are more complicated cases of
this where changes to the stack have been made and we have to restore
this to proper state before continuing, this will appear later.
After copying the environment, a closure is created from
the environment copy and the contents of the lambda. This
again uses cons and allocates heap space and is followed by a similar
check for a memory error. If it all works out though,
app_cont is set and r is set to the
closure and evaluation continues.
Now the if case:
// Special form: IF
if (dec_sym(head) == symrepr_if()) {
push_u32_3(ctx->K,
car(cdr(cdr(cdr(ctx->curr_exp)))), // Else branch
car(cdr(cdr(ctx->curr_exp))), // Then branch
enc_u(IF));
ctx->curr_exp = car(cdr(ctx->curr_exp));
continue;
}
The if case sets up for a continuation that takes the
then and else branch expressions as arguments. Then the current
expression is set to be the boolean expression that is used to select
then or else. So once the boolean has been evaluated we will enter into
the continuation that can then set up the system for either evaluating
the then or the else branch.
The let takes two arguments, a list of bindings to
extend the local environment with and an expression to evaluate in the
extended environment. So, to begin with these pieces are bound to
variables orig_env, binds and exp
for easy access.
// Special form: LET
if (dec_sym(head) == symrepr_let()) {
VALUE orig_env = ctx->curr_env;
VALUE binds = car(cdr(ctx->curr_exp)); // key value pairs.
VALUE exp = car(cdr(cdr(ctx->curr_exp))); // exp to evaluate in the new env.
VALUE curr = binds;
VALUE new_env = orig_env;
if (type_of(binds) != PTR_TYPE_CONS) {
// binds better be nil or there is a programmer error.
ctx->curr_exp = exp;
continue;
}
// Implements letrec by "preallocating" the key parts
while (type_of(curr) == PTR_TYPE_CONS) {
VALUE key = car(car(curr));
VALUE val = NIL;
VALUE binding;
binding = cons(key, val);
new_env = cons(binding, new_env);
if (type_of(binding) == VAL_TYPE_SYMBOL ||
type_of(new_env) == VAL_TYPE_SYMBOL) {
perform_gc = true;
app_cont = false;
continue;
}
curr = cdr(curr);
}
VALUE key0 = car(car(binds));
VALUE val0_exp = car(cdr(car(binds)));
push_u32_5(ctx->K, exp, cdr(binds), new_env,
key0, enc_u(BIND_TO_KEY_REST));
ctx->curr_exp = val0_exp;
ctx->curr_env = new_env;
continue;
}
The kind of let implemented here allows bindings defined
early in the list of bindings to be used in the definition of the later
ones. This is accomplished by a two phase approach where first the
environment containing the bindings is created using just the keys (and
mapping each of the keys to nil). Then the expressions whose results are
to be bound to the keys are evaluated one after the other, in order, in
this new environment. Once an expression is evaluated the mapping for
that key (that currently is nil) is updated. Of course it
is not possible to use a later binding in a currently evaluated
expression as that binding would return 'nil` rather than what it is
supposed to.
So after setting up the environment with key -
nil bindings, a continuation is created and the current
expression is set to evaluate the first value to be bound.
Now we have dealt with all special forms (but one) the application form. Once getting to this point we just assume that if the current expression is a list, then it is a function application. This may turn out to be wrong (as it may also be a programmer mistake) in which case evaluation results in an error.
} // If head is symbol
push_u32_4(ctx->K,
ctx->curr_env,
enc_u(0),
cdr(ctx->curr_exp),
enc_u(APPLICATION_ARGS));
ctx->curr_exp = head; // evaluate the function
continue;
This case creates a continuation that needs to have access to the
current environment, a counter enc_u(0) (to count
arguments), the list of arguments cdr(ctx->curr_exp) and
the continuation itself is called APPLICATION_ARGS. The
current expression is set to the head of the list. This will be
evaluated next and will result in either a function object of
some kind or an error. A function object could for example be a closure
or a symbol pointing out a fundamental operation or an extension.
This concludes the evaluation function. The rest just checks for some
serious errors and if the loop is exited r is returned as
the result of evaluation.
default:
// BUG No applicable case!
done = true;
r = enc_sym(symrepr_eerror());
break;
}
} // while (!done)
return r;
}
So far there have been a number of examples of the creation of
continuations but what happens when these are applied is still untold.
The apply_continuation function, that takes care of this
continuation application, is about the same size as
run_eval and really they interact at a quite deep level.
run_eval calls apply_continuation and
apply_continuation changes the evaluation context meaning
it influences what the next iteration of the evaluation loop does.
VALUE apply_continuation(eval_context_t *ctx, VALUE arg, bool *done, bool *perform_gc, bool *app_cont){
VALUE k;
pop_u32(ctx->K, &k);
VALUE res;
*app_cont = false;
apply_continuation starts out by popping of the top of
the stack that should contain one of the continuation function
identifiers defined close to the top of this document. The rest of the
function is, just like the evaluation loop, a large switch
statement.
switch(dec_u(k)) {
case DONE:
*done = true;
return arg;
In case we have reached the DONE continuation, pushed
onto the stack when entering run_eval, we are done, the
done flag is set and the argument to the continuation
arg is returned. arg corresponds to the
variable r in the evaluator.
case SET_GLOBAL_ENV:
res = cont_set_global_env(ctx, arg, done, perform_gc);
if (!(*done))
*app_cont = true;
return res;
The SET_GLOBAL_ENV continuation is broken out into a
separate function. I cannot really decide if I should break all of these
cases out into functions or not. The cont_set_global_env
function is shown further down.
Now, the PROGN continuation is where a property one
breaking mistake was made. Towards the end of the function there is a
comment // allow for tail recursion right before a
conditional that checks if the rest of the sequence of expressions is
nil. If it is nil, we are currently in tail-call position and should not
push anything to the stack that we have to potentially wait a long time
before it is freed. Forgetting about this special case means a nil is
pushed to the stack, only to be dealt with once the tail-call is
handled.
case PROGN_REST: {
VALUE rest;
pop_u32(ctx->K, &rest);
if (type_of(rest) == VAL_TYPE_SYMBOL && rest == NIL) {
res = arg;
*app_cont = true;
return res;
}
if (symrepr_is_error(rest)) {
res = rest;
*done = true;
return res;
}
// allow for tail recursion
if (type_of(cdr(rest)) == VAL_TYPE_SYMBOL &&
cdr(rest) == NIL) {
ctx->curr_exp = car(rest);
return NONSENSE;
}
// Else create a continuation
push_u32_2(ctx->K, cdr(rest), enc_u(PROGN_REST));
ctx->curr_exp = car(rest);
return NONSENSE;
}
The PROGN case sets up the head of the sequence of
expressions to evaluated in the next iteration of the evaluation loop
and pushes the rest of the sequence along with the same
PROGN_REST continuation onto the stack.
The APPLICATION case is a bit long and messy but really
tricky compared to the other cases. It just gets a bit messy because
there are a number of things that can be applied. There are
closures, fundamental operations and
extensions and each of these get a special case in the
APPLICATION continuation.
When entering into the APPLICATION case, all the
arguments, a counter and the function are present of the stack. in the
case of fundamental or extension operations
this knowledge is utilized and a pointer to the first argument on the
stack is passed to the function. When it comes to the closures however,
the arguments are extracted from the stack and added to an augmented
environment for the closures body to evaluate in.
It is important that after running a fundamental or an
extension the stack is cleared, or we will fall into the
property 1 issue again.
case APPLICATION: {
VALUE count;
pop_u32(ctx->K, &count);
UINT *fun_args = stack_ptr(ctx->K, dec_u(count)+1);
VALUE fun = fun_args[0];
if (type_of(fun) == PTR_TYPE_CONS) { // a closure (it better be)
VALUE args = NIL;
for (UINT i = dec_u(count); i > 0; i --) {
args = cons(fun_args[i], args);
if (type_of(args) == VAL_TYPE_SYMBOL) {
push_u32_2(ctx->K, count, enc_u(APPLICATION));
*perform_gc = true;
*app_cont = true;
return fun;
}
}
VALUE params = car(cdr(fun));
VALUE exp = car(cdr(cdr(fun)));
VALUE clo_env = car(cdr(cdr(cdr(fun))));
if (length(params) != length(args)) { // programmer error
*done = true;
return enc_sym(symrepr_eerror());
}
VALUE local_env = env_build_params_args(params, args, clo_env);
if (type_of(local_env) == VAL_TYPE_SYMBOL) {
if (dec_sym(local_env) == symrepr_merror() ) {
push_u32_2(ctx->K, count, enc_u(APPLICATION));
*perform_gc = true;
*app_cont = true;
return fun;
}
if (dec_sym(local_env) == symrepr_fatal_error()) {
return local_env;
}
}
stack_drop(ctx->K, dec_u(count)+1);
ctx->curr_exp = exp;
ctx->curr_env = local_env;
return NONSENSE;
} else if (type_of(fun) == VAL_TYPE_SYMBOL) {
VALUE res;
if (is_fundamental(fun)) {
res = fundamental_exec(&fun_args[1], dec_u(count), fun);
if (type_of(res) == VAL_TYPE_SYMBOL &&
dec_sym(res) == symrepr_eerror()) {
*done = true;
return res;
} else if (type_of(res) == VAL_TYPE_SYMBOL &&
dec_sym(res) == symrepr_merror()) {
push_u32_2(ctx->K, count, enc_u(APPLICATION));
*perform_gc = true;
*app_cont = true;
return fun;
}
stack_drop(ctx->K, dec_u(count) + 1);
*app_cont = true;
return res;
}
}
// It may be an extension
extension_fptr f = extensions_lookup(dec_sym(fun));
if (f == NULL) {
*done = true;
return enc_sym(symrepr_eerror());
}
VALUE ext_res = f(&fun_args[1] , dec_u(count));
if (type_of(ext_res) == VAL_TYPE_SYMBOL &&
(dec_sym(ext_res) == symrepr_merror())) {
push_u32_2(ctx->K, count, enc_u(APPLICATION));
*perform_gc = true;
*app_cont = true;
return fun;
}
stack_drop(ctx->K, dec_u(count) + 1);
*app_cont = true;
return ext_res;
}
The `APPLICATION_ARGS' continuation handles evaluation of the list of arguments to a function.
case APPLICATION_ARGS: {
VALUE count;
VALUE env;
VALUE rest;
pop_u32_3(ctx->K, &rest, &count, &env);
push_u32(ctx->K, arg);
if (type_of(rest) == VAL_TYPE_SYMBOL &&
rest == NIL) {
// no more arguments
push_u32_2(ctx->K, count, enc_u(APPLICATION));
*app_cont = true;
return NONSENSE;
}
push_u32_4(ctx->K, env, enc_u(dec_u(count) + 1), cdr(rest), enc_u(APPLICATION_ARGS));
ctx->curr_exp = car(rest);
ctx->curr_env = env;
return NONSENSE;
}
Here the current expression is set up to evaluate the head of the list of the rest of the arguments. A continuation is created for evaluating the rest of the arguments. When creating this continuation the number-of-arguments counter is incremented.
Each time APPLICATION_ARGS is entered the previously
evaluated argument (or the function in case of the first call to
APPLICATION_ARGS) is pushed onto the stack. This means that
the stack is now also the vehicle for transport of arguments to the
functions. When the list of arguments have been fully evaluated, the
stack will contain the function, all of the arguments and the counter
value and a APPLICATION continuation is created.
The BIND_TO_KEY_REST continuation, that is created in
the let case of the evaluation loop, takes care of
evaluating all the expressions to be bound to variables in the local
environment.
case BIND_TO_KEY_REST:{
VALUE key;
VALUE env;
VALUE rest;
pop_u32_3(ctx->K, &key, &env, &rest);
env_modify_binding(env, key, arg);
if ( type_of(rest) == PTR_TYPE_CONS ){
VALUE keyn = car(car(rest));
VALUE valn_exp = car(cdr(car(rest)));
push_u32_4(ctx->K, cdr(rest), env, keyn, enc_u(BIND_TO_KEY_REST));
ctx->curr_exp = valn_exp;
ctx->curr_env = env;
return NONSENSE;
}
// Otherwise evaluate the expression in the populated env
VALUE exp;
pop_u32(ctx->K, &exp);
ctx->curr_exp = exp;
ctx->curr_env = env;
return NONSENSE;
}
Each time BIND_TO_KEY_REST is called the environment is
modified with the recently evaluated value. Once all bindings have been
dealt with, the let body is set up to be evaluated within the newly
formed local environment.
The IF continuation gets the result of evaluating the
boolean condition as an argument. It then sets up for evaluation of
either the then or the else branch in the evaluation loop.
case IF: {
VALUE then_branch;
VALUE else_branch;
pop_u32_2(ctx->K, &then_branch, &else_branch);
if (type_of(arg) == VAL_TYPE_SYMBOL && dec_sym(arg) == symrepr_true()) {
ctx->curr_exp = then_branch;
} else {
ctx->curr_exp = else_branch;
}
return NONSENSE;
}
Now, if we do reach the end of this switch, something is very wrong and an error is return and evaluation is aborted.
} // end switch
*done = true;
return enc_sym(symrepr_eerror());
}
Back to that broken out function that sets the global environment. It
does the work needed for the SET_GLOBAL_ENV continuation.
The key to use is on the stack and the value is passed as
argument to the continuation. Not we set up a new environment which is
the key - val pair consed onto the old global
environment. This consing (that is done inside of the
env_set function) can fail as usual and we need to be able
to be able to pause the computation and perform garbage collection.
However, if all works out fine, the global variable
eval_cps_global_env is updated with the value of the newly
created one.
VALUE cont_set_global_env(eval_context_t *ctx, VALUE val, bool *done, bool *perform_gc){
VALUE key;
pop_u32(ctx->K, &key);
VALUE new_env = env_set(eval_cps_global_env,key,val);
if (type_of(new_env) == VAL_TYPE_SYMBOL) {
if (dec_sym(new_env) == symrepr_merror()) {
push_u32_2(ctx->K, key, enc_u(SET_GLOBAL_ENV));
*perform_gc = true;
return val;
}
if (dec_sym(new_env) == symrepr_fatal_error()) {
*done = true;
return new_env;
}
}
eval_cps_global_env = new_env;
return enc_sym(symrepr_true());
}
In order to see what happens when we fail to honor property 1, the
push_u32 function that is used in all functions that push
onto the stack is augmented with a print statement. This print statement
prints the stack pointer sp and the value that is being
pushed onto the stack. This experiment is run with a stack that does not
allow growing, this is checked internally in the function
stack_grow and it will not reallocate the stack if this
flag is set.
int push_u32(stack *s, UINT val) {
int res = 1;
s->data[s->sp] = val;
/* Added printf */
printf("Stack sp %d : ",s->sp); simple_print(val); printf("\n");
s->sp++;
if ( s->sp >= s->size) {
res = stack_grow(s);
}
return res;
}
The PROGN_REST continuation is a good target to
introduce this error in. Below is what it looks like with the code that
makes it //allow for tail recursion.
case PROGN_REST: {
VALUE rest;
pop_u32(ctx->K, &rest);
if (type_of(rest) == VAL_TYPE_SYMBOL && rest == NIL) {
res = arg;
*app_cont = true;
return res;
}
if (symrepr_is_error(rest)) {
res = rest;
*done = true;
return res;
}
// allow for tail recursion
if (type_of(cdr(rest)) == VAL_TYPE_SYMBOL &&
cdr(rest) == NIL) {
ctx->curr_exp = car(rest);
return NONSENSE;
}
// Else create a continuation
push_u32_2(ctx->K, cdr(rest), enc_u(PROGN_REST));
ctx->curr_exp = car(rest);
return NONSENSE;
}
So let's modify the code by removing that little block. Now it looks as follows.
case PROGN_REST: {
VALUE rest;
pop_u32(ctx->K, &rest);
if (type_of(rest) == VAL_TYPE_SYMBOL && rest == NIL) {
res = arg;
*app_cont = true;
return res;
}
if (symrepr_is_error(rest)) {
res = rest;
*done = true;
return res;
}
// Else create a continuation
push_u32_2(ctx->K, cdr(rest), enc_u(PROGN_REST));
ctx->curr_exp = car(rest);
return NONSENSE;
}
Now if we build lispBM and the REPL with this faulty code and enter the following program which is a tail-recursive run-forever program.
(define f
(lambda (x)
(progn (print "hello")
(f (+ x 1)))))
The following is the result:
Lisp REPL started!
Type :quit to exit.
:info for statistics.
# (define f (lambda (x) (progn (print "hello") (f (+ x 1)))))
Stack sp 0 : 1
Stack sp 1 : f
Stack sp 2 : 2
> t
# (f 0)
Stack sp 0 : 1
Stack sp 1 : nil
Stack sp 2 : 0
Stack sp 3 : (0 nil)
Stack sp 4 : 8
Stack sp 1 : (closure ((x nil) ((progn ((print ("hello" nil)) ((f ((+ (x (1 nil))) nil)) nil))) (nil nil))))
Stack sp 2 : nil
Stack sp 3 : 1
Stack sp 4 : nil
Stack sp 5 : 8
Stack sp 2 : 0
Stack sp 3 : 1
Stack sp 4 : 7
Stack sp 1 : ((f ((+ (x (1 nil))) nil)) nil)
Stack sp 2 : 6
Stack sp 3 : ((x 0) nil)
Stack sp 4 : 0
...
Stack sp 98 : 48
Stack sp 99 : 1
Stack sp 100 : 7
Stack sp 97 : ((f ((+ (x (1 nil))) nil)) nil)
Stack sp 98 : 6
Stack sp 99 : ((x 48) nil)
Stack sp 100 : 0
Stack sp 101 : ("hello" nil)
Stack sp 102 : 8
Segmentation fault
The stack just keeps growing until it hits about 100 elements (the
size of the stack) and a segfault is triggered. We can also see that
x has reached the value 48, so 48 iterations of the
infinite recursion have been executed and I guess we can conclude that
the stack is growing with approximately 2 element per iteration.
When the REPL runs with the fixed code and the same program it keeps running:
Stack sp 1 : (closure ((x nil) ((progn ((print ("hello" nil)) ((f ((+ (x (1 nil))) nil)) nil))) (nil nil))))
Stack sp 2 : ((x 3159) nil)
Stack sp 3 : 1
Stack sp 4 : nil
Stack sp 5 : 7
Stack sp 6 : ((x 3159) nil)
Stack sp 7 : 0
Stack sp 8 : (x (1 nil))
Stack sp 9 : 7
Stack sp 6 : +
Stack sp 7 : ((x 3159) nil)
Stack sp 8 : 1
Stack sp 9 : (1 nil)
Stack sp 10 : 7
Stack sp 7 : 3159
Stack sp 8 : ((x 3159) nil)
Stack sp 9 : 2
Stack sp 10 : nil
Stack sp 11 : 7
Stack sp 8 : 1
Stack sp 9 : 2
Stack sp 10 : 6
Stack sp 2 : 3160
Stack sp 3 : 1
Stack sp 4 : 6
Stack sp 1 : ((f ((+ (x (1 nil))) nil)) nil)
Stack sp 2 : 5
Stack sp 3 : ((x 3160) nil)
Stack sp 4 : 0
Stack sp 5 : ("hello" nil)
Stack sp 6 : 7
Stack sp 3 : print
Here execution was stopped after a short while with ctrl+c and we can
see that the stack pointer is at most 11 and also that we have
incremented x to 3160 by now. Looks a lot better!