
1 Abstract

We demonstrate how to model the following common programming constructs in terms of an applicative order language similar to LISP:

  • Simple Recursion
  • Iteration
  • Compound Statements and Expressions
  • GO TO and Assignment
  • Continuation-Passing
  • Escape Expressions
  • Fluid Variables
  • Call by Name, Call by Need, and Call by Reference

The models require only (possibly self-referent) lambda application, conditionals, and (rarely) assignment. No complex data structures such as stacks are used. The models are transparent, involving only local syntactic transformations. Some of these models, such as those for GO TO and assignment, are already well known, and appear in the work of Landin, Reynolds, and others. The models for escape expressions, fluid variables, and call by need with side effects are new. This paper is partly tutorial in intent, gathering all the models together for purposes of context. This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-75-C-0643.

2 Introduction

We catalogue a number of common programming constructs. For each construct we examine "typical" usage in well-known programming languages, and then capture the essence of the semantics of the construct in terms of a common meta-language. The lambda calculus {Note Alonzowins} is often used as such a meta-language. Lambda calculus offers clean semantics, but it is clumsy because it was designed to be a minimal language rather than a convenient one. All lambda calculus "functions" must take exactly one "argument"; the only "data type" is lambda expressions; and the only "primitive operation" is variable substitution. While its utter simplicity makes lambda calculus ideal for logicians, it is too primitive for use by programmers.

The meta-language we use is a programming language called SCHEME {Note Schemepaper} which is based on lambda calculus. SCHEME is a dialect of LISP. [McCarthy 62] It is an expression-oriented, applicative order, interpreter-based language which allows one to manipulate programs as data. It differs from most current dialects of LISP in that it closes all lambda expressions in the environment of their definition or declaration, rather than in the execution environment. {Note Closures} This preserves the substitution semantics of lambda calculus, and has the consequence that all variables are lexically scoped, as in ALGOL. [Naur 63] Another difference is that SCHEME is implemented in such a way that tail-recursions execute without net growth of the interpreter stack. {Note Schemenote}

We have chosen to use LISP syntax rather than, say, ALGOL syntax because we want to treat programs as data for the purpose of describing transformations on the code. LISP supplies names for the parts of an executable expression and standard operators for constructing expressions and extracting their components. The use of LISP syntax makes the structure of such expressions manifest. We use ALGOL as an expository language, because it is familiar to many people, but ALGOL is not sufficiently powerful to express the necessary concepts; in particular, it does not allow functions to return functions as values. We are thus forced to use a dialect of LISP in many cases.

We will consider various complex programming language constructs and show how to model them in terms of only a few simple ones. As far as possible we will use only three control constructs from SCHEME: LAMBDA expressions, as in LISP, which are just functions with lexically scoped free variables; LABELS, which allows declaration of mutually recursive procedures {Note Labelsdef}; and IF, a primitive conditional expression. For more complex modelling we will introduce an assignment primitive (ASET). We will freely assume the existence of other common primitives, such as arithmetic functions.

The constructs we will examine are divided into four broad classes. The first is Simple Loops; this contains simple recursions and iterations, and an introduction to the notion of continuations. The second is Imperative Constructs; this includes compound statements, GO TO, and simple variable assignments. The third is Continuations, which encompasses the distinction between statements and expressions, escape operators (such as Landin's J-operator [Landin 65] and Reynolds's escape expression [Reynolds 72]), and fluid (dynamically bound) variables. The fourth is Parameter Passing Mechanisms, such as ALGOL call-by-name and FORTRAN call-by-location.
Some of the models presented here are already well-known, particularly those for GO TO and assignment. [McCarthy 60] [Landin 65] [Reynolds 72] Those for escape operators, fluid variables, and call-by-need with side effects are new.

2.1 Simple Loops

By simple loops we mean constructs which enable programs to execute the same piece of code repeatedly in a controlled manner. Variables may be made to take on different values during each repetition, and the number of repetitions may depend on data given to the program.

2.1.1 Simple Recursion

One of the easiest ways to produce a looping control structure is to use a recursive function, one which calls itself to perform a subcomputation. For example, the familiar factorial function may be written recursively in ALGOL:

integer procedure fact(n); value n; integer n;
fact := if n=0 then 1 else n*fact(n-1);

The invocation fact(n) computes the product of the integers from 1 to n using the identity n! = n(n-1)! (n>0). If \(n\) is zero, 1 is returned; otherwise fact calls itself recursively to compute \((n-1)!\), then multiplies the result by \(n\) and returns it.

This same function may be written in Clojure as follows:

(ns categorical.imperative)
(defn fact [n]
  (if (= n 0) 1 (* n (fact (dec n)))))

Clojure does not require an assignment to the ``variable'' fact to return a value as ALGOL does. The IF primitive is the ALGOL if-then-else rendered in LISP syntax. Note that the arithmetic primitives are prefix operators in SCHEME.

2.1.2 Iteration

There are many other ways to compute factorial. One important way is through the use of iteration. A common iterative construct is the DO loop. The most general form we have seen in any programming language is the MacLISP DO [Moon 74]. It permits the simultaneous initialization of any number of control variables and the simultaneous stepping of these variables by arbitrary functions at each iteration step. The loop is terminated by an arbitrary predicate, and an arbitrary value may be returned. The DO loop may have a body, a series of expressions executed for effect on each iteration. A version of the MacLISP DO construct has been adopted in SCHEME.

The general form of a SCHEME DO is:

(DO ((<var1> <init1> <step1>)
     (<var2> <init2> <step2>)
     ...
     (<varn> <initn> <stepn>))
    (<pred> <value>)
    <optional body>)

The semantics of this are that the variables are bound and initialized to the values of the <initi> expressions, which must all be evaluated in the environment outside the DO; then the predicate <pred> is evaluated in the new environment, and if TRUE, the <value> is evaluated and returned. Otherwise the <optional body> is evaluated, then each of the steppers <stepi> is evaluated in the current environment, all the variables are made to have the results as their values, the predicate is evaluated again, and so on. Using DO loops in both ALGOL and SCHEME, we may express FACT by means of iteration.

integer procedure fact(n); value n; integer n;
begin
integer m, ans;
ans := 1;
for m := n step -1 until 1 do ans := m*ans;
fact := ans;
end;

(in-ns 'categorical.imperative)
(defn fact-do [n]
  ;; loop/recur plays the role of the MacLISP-style DO here: both variables
  ;; are stepped simultaneously and the loop itself has no body.
  (loop [m n, ans 1]
    (if (zero? m) ans (recur (dec m) (* m ans)))))

Note that the SCHEME DO loop in FACT has no body – the stepping functions do all the work. The ALGOL DO loop has an assignment in its body; because an ALGOL DO loop can step only one variable, we need the assignment to step the variable "manually". In reality the SCHEME DO construct is not a primitive; it is a macro which expands into a function which performs the iteration by tail-recursion. Consider the following definition of FACT in SCHEME. Although it appears to be recursive, since it "calls itself", it is entirely equivalent to the DO loop given above, for it is the code that the DO macro expands into! It captures the essence of our intuitive notion of iteration, because execution of this program will not produce internal structures (e.g. stacks or variable bindings) which increase in size with the number of iteration steps.

(in-ns 'categorical.imperative)
(defn fact-do-expand[n]
  (let [fact1
        (fn[m ans]
          (if (zero? m) ans
              (recur (dec m) (* m ans))))]
    (fact1 n 1)))

From this we can infer a general way to express iterations in SCHEME in a manner isomorphic to the MacLISP DO. The expansion of the general DO loop

(DO ((<var1> <init1> <step1>)
     (<var2> <init2> <step2>)
     ...
     (<varn> <initn> <stepn>))
    (<pred> <value>)
    <body>)

is this:

(let [doloop
      (fn [dummy <var1> <var2> ... <varn>]
        (if <pred> <value>
            (recur <body> <step1> <step2> ... <stepn>)))]
  (doloop nil <init1> <init2> ... <initn>))

The identifiers doloop and dummy are chosen so as not to conflict with any other identifiers in the program. Note that, unlike most implementations of DO, there are no side effects in the steppings of the iteration variables. DO loops are usually modelled using assignment statements. For example:

for x := a step b until c do <statement>;

can be modelled as follows: [Naur 63]

begin
x := a;
L: if (x-c)*sign(b) > 0 then go to Endloop;
<statement>;
x := x+b;
go to L;
Endloop:
end;

Later we will see how such assignment statements can in general be modelled without using side effects.
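As a preview of that technique, here is a concrete Clojure sketch (our own, not from the original paper) of the loop for x := 0 step 2 until 10 do print(x), with the control variable rebound on each iteration rather than assigned:

(loop [x 0]
  (if (> (* (- x 10) (Math/signum 2.0)) 0)
    nil                    ; go to Endloop
    (do (println x)        ; <statement>
        (recur (+ x 2))))) ; x := x+b; go to L

This prints 0 2 4 6 8 10 and then falls out of the loop.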

3 Imperative Programming

Lambda calculus (and related languages, such as ``pure LISP'') is often used for modelling the applicative constructs of programming languages. However, such languages are generally thought of as inappropriate for modelling imperative constructs. In this section we show how imperative constructs may be modelled by applicative SCHEME constructs.

3.1 Compound Statements

The simplest kind of imperative construct is the statement sequencer, for example the compound statement in ALGOL:

begin
S1;
S2;
end

This construct has two interesting properties:

  • (1) It performs statement S1 before S2, and so may be used for sequencing.
  • (2) S1 is useful only for its side effects. (In ALGOL, S2 must also be a statement, and so is also useful only for side effects, but other languages have compound expressions containing a statement followed by an expression.)

The ALGOL compound statement may actually contain any number of statements, but such statements can be expressed as a series of nested two-statement compounds. That is:

begin
S1;
S2;
...
Sn-1;
Sn;
end

is equivalent to:

begin
S1;
begin
S2;
begin
...

begin
Sn-1;
Sn;
end;
end;
end;
end

It is not immediately apparent that this sequencing can be expressed in a purely applicative language. We can, however, take advantage of the implicit sequencing of applicative order evaluation. Thus, for example, we may write a two-statement sequence as follows:

((fn[dummy] S2) S1)

where dummy is an identifier not used in S2. From this it is manifest that the value of S1 is ignored, and so is useful only for side effects. (Note that we did not claim that S1 is expressed in a purely applicative language, but only that the sequencing can be so expressed.) From now on we will use the form (BLOCK S1 S2) as an abbreviation for this expression, and (BLOCK S1 S2...Sn-1 Sn) as an abbreviation for

(BLOCK S1 (BLOCK S2 (BLOCK ... (BLOCK Sn-1 Sn)...))) 
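As a concrete Clojure instance of this expansion (our own example, using println calls as the statements), (BLOCK (println 1) (println 2) (println 3)) becomes two nested single-argument applications:

((fn [dummy]
   ((fn [dummy] (println 3))
    (println 2)))
 (println 1))

This prints 1, 2, and 3 in that order; the value of each inner println is bound to dummy and ignored.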

3.2 The GO TO Statement

A more general imperative structure is the compound statement with labels and GO TOs. Consider the following code fragment due to Jacopini, taken from Knuth: [Knuth 74]

begin 
L1: if B1 then go to L2; S1;
    if B2 then go to L2; S2;
    go to L1;
L2: S3;
end

It is perhaps surprising that this piece of code can be syntactically transformed into a purely applicative style. For example, in SCHEME we could write:

(in-ns 'categorical.imperative)
(letfn [(L1 []
          (if B1 (L2)
              (do S1
                  (if B2 (L2) (do S2 (L1))))))
        (L2 [] S3)]
  (L1))

As with the DO loop, this transformation depends critically on SCHEME's treatment of tail-recursion and on lexical scoping of variables. The labels are names of functions of no arguments. In order to "go to" the labeled code, we merely call the function named by that label.

3.3 Simple Assignment

Of course, all this sequencing of statements is useless unless the statements have side effects. An important side effect is assignment. For example, one often uses assignment to place intermediate results in a named location (i.e. a variable) so that they may be used more than once later without recomputing them:

begin
real a2, sqrtdisc;
a2 := 2*a;
sqrtdisc := sqrt(b^2 - 4*a*c);
root1 := (- b + sqrtdisc) / a2;
root2 := (- b - sqrtdisc) / a2;
print(root1);
print(root2);
print(root1 + root2);
end

It is well known that such naming of intermediate results may be accomplished by calling a function and binding the formal parameter variables to the results:

((fn [a2 sqrtdisc]
   ((fn [root1 root2]
      (do (println root1)
          (println root2)
          (println (+ root1 root2))))
    (/ (+ (- b) sqrtdisc) a2)
    (/ (- (- b) sqrtdisc) a2)))
 (* 2 a)
 (Math/sqrt (- (* b b) (* 4 a c))))

This technique can be extended to handle all simple variable assignments which appear as statements in blocks, even if arbitrary GO TO statements also appear in such blocks. {Note Mccarthywins}

For example, here is a program which uses GO TO statements in the form presented before; it determines the parity of a non-negative integer by counting it down until it reaches zero.

begin
L1: if a = 0 then begin parity := 0; go to L2; end;
    a := a - 1;
    if a = 0 then begin parity := 1; go to L2; end;
    a := a - 1;
    go to L1;
L2: print(parity);
end

This can be expressed in SCHEME:

(letfn [(L1 [a parity]
          (if (zero? a) (L2 a 0)
              (L3 (dec a) parity)))
        (L3 [a parity]
          (if (zero? a) (L2 a 1)
              (L1 (dec a) parity)))
        (L2 [a parity]
          (println parity))]
  (L1 a parity))

The trick is to pass the set of variables which may be altered as arguments to the label functions. {Note Flowgraph} It may be necessary to introduce new labels (such as L3) so that an assignment may be transformed into the binding for a function call. At worst, one may need as many labels as there are statements (not counting the eliminated assignment and GO TO statements).
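Wrapped as a function (our own wrapper, not from the original), the transformed parity program can be run directly. Note that, unlike SCHEME, Clojure does not eliminate these mutual tail calls, so the stack grows with a here; a trampoline would be needed to recover the space behavior the paper describes.

(in-ns 'categorical.imperative)
(defn print-parity
  "Prints 0 if a is even, 1 if a is odd, by counting a down to zero."
  [a]
  (letfn [(L1 [a parity] (if (zero? a) (L2 a 0) (L3 (dec a) parity)))
          (L3 [a parity] (if (zero? a) (L2 a 1) (L1 (dec a) parity)))
          (L2 [a parity] (println parity))]
    (L1 a nil)))

(print-parity 5) ; prints 1
(print-parity 6) ; prints 0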

3.4 Compound Expressions

At this point we are almost in a position to model the most general form of compound statement. In LISP, this is called the "PROG feature". In addition to statement sequencing and GO TO statements, one can return a value from a PROG by using the RETURN statement.

Let us first consider the simplest compound statement, which in SCHEME we call BLOCK. Recall that (BLOCK S1 S2) is defined to be ((LAMBDA (DUMMY) S2) S1).

Notice that this not only performs S1 before S2, but also returns the value of S2. Furthermore, we defined (BLOCK S1 S2 ... Sn) so that it returns the value of Sn. Thus BLOCK may be used as a compound expression, and models a LISP PROGN, which is a PROG with no GO TO statements and an implicit RETURN of the last ``statement'' (really an expression).
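A small Clojure check of this point (our own example): the expansion of (BLOCK S1 S2) evaluates to the value of S2.

((fn [dummy] (* 6 7)) (println "S1 runs first"))
; prints "S1 runs first", then evaluates to 42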

Most LISP compilers compile DO expressions by macro-expansion. We have already seen one way to do this in SCHEME using only variable binding. A more common technique is to expand the DO into a PROG, using variable assignments instead of bindings. Thus the iterative factorial program:

(DEFINE FACT
  (LAMBDA (N)
    (DO ((M N (- M 1))
         (ANS 1 (* M ANS)))
        ((= M 0) ANS))))

would expand into:

(DEFINE FACT
  (LAMBDA (N)
    (PROG (M ANS)
          (SSETQ M N
                 ANS 1)
       LP (IF (= M 0) (RETURN ANS))
          (SSETQ M (- M 1)
                 ANS (* M ANS))
          (GO LP))))

where SSETQ is a simultaneous multiple assignment operator. (SSETQ is not a SCHEME (or LISP) primitive. It can be defined in terms of a single assignment operator, but we are more interested here in RETURN than in simultaneous assignment. The SSETQ's will all be removed anyway and modelled by lambda binding.) We can apply the same technique we used before to eliminate GO TO statements and assignments from compound statements:

(DEFINE FACT
  (LAMBDA (N)
    (LABELS ((L1 (LAMBDA (M ANS)
                   (LP N 1)))
             (LP (LAMBDA (M ANS)
                   (IF (= M 0) (RETURN ANS)
                       (L2 M ANS))))
             (L2 (LAMBDA (M ANS)
                   (LP (- M 1) (* M ANS)))))
      (L1 NIL NIL))))


We still haven't done anything about RETURN. Let's see…

  • ==> the value of (FACT 0) is the value of (L1 NIL NIL)
  • ==> which is the value of (LP 0 1)
  • ==> which is the value of (IF (= 0 0) (RETURN 1) (L2 0 1))
  • ==> which is the value of (RETURN 1)

Notice that if RETURN were the identity function (LAMBDA (X) X), we would get the right answer. This is in fact a general truth: if we just replace a call to RETURN with its argument, then our old transformation on compound statements extends to general compound expressions, i.e. PROG.
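A minimal Clojure rendering of this observation (our own translation, under the hypothetical name fact-prog; RETURN is simply bound to the identity function):

(in-ns 'categorical.imperative)
(defn fact-prog
  "The PROG-expanded factorial, with RETURN modelled as identity."
  [n]
  (let [RETURN identity]
    (letfn [(L1 [m ans] (LP n 1))
            (LP [m ans] (if (= m 0) (RETURN ans) (L2 m ans)))
            (L2 [m ans] (LP (dec m) (* m ans)))]
      (L1 nil nil))))

(fact-prog 5) ; => 120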

4 Continuations

Up to now we have thought of SCHEME's LAMBDA expressions as functions, and of a combination such as (G (F X Y)) as meaning ``apply the function F to the values of X and Y, and return a value so that the function G can be applied and return a value …'' But notice that we have seldom used LAMBDA expressions as functions. Rather, we have used them as control structures and environment modifiers. For example, consider the expression: (BLOCK (PRINT 3) (PRINT 4))

This is defined to be an abbreviation for: ((LAMBDA (DUMMY) (PRINT 4)) (PRINT 3))

We do not care about the value of this BLOCK expression; it follows that we do not care about the value of the (LAMBDA (DUMMY) ...). We are not using LAMBDA as a function at all.

It is possible to write useful programs in terms of LAMBDA expressions in which we never care about the value of any lambda expression. We have already demonstrated how one could represent any "FORTRAN" program in these terms: all one needs is PROG (with GO and SETQ), and PRINT to get the answers out. The ultimate generalization of this imperative programming style is continuation-passing. {Note Churchwins}

4.1 Continuation-Passing Recursion

Consider the following alternative definition of FACT. It has an extra argument, the continuation, which is a function to call with the answer, when we have it, rather than return a value.

procedure fact(n, c); value n, c;
integer n; procedure c(integer value);
if n=0 then c(1) else
begin
procedure temp(a); value a; integer a;
c(n*a);
fact(n-1, temp);
end;

(in-ns 'categorical.imperative)
(defn fact-cps[n k]
  (if (zero? n) (k 1)
      (recur (dec n) (fn[a] (k (* n a))))))

It is fairly clumsy to use this version of FACT in ALGOL; it is necessary to do something like this:

begin
integer ans;
procedure temp(x); value x; integer x;
ans := x;
fact(3, temp);
comment Now the variable "ans" has 6;
end;

Procedure fact does not return a value, nor does temp; we must use a side effect to get the answer out.
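The same pattern can be mimicked in Clojure with an atom standing in for the variable ans (a deliberately clumsy sketch of our own, just to mirror the side-effecting ALGOL style):

(in-ns 'categorical.imperative)
(def ans (atom nil))
(fact-cps 3 (fn [x] (reset! ans x)))
@ans ; => 6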

FACT is somewhat easier to use in SCHEME. We can call it like an ordinary function in SCHEME if we supply an identity function as the second argument. For example, (FACT 3 (LAMBDA (X) X)) returns 6. Alternatively, we could write (FACT 3 (LAMBDA (X) (PRINT X))); we do not care about the value of this, but about what gets printed. A third way to use the value is to write (FACT 3 (LAMBDA (X) (SQRT X))) instead of (SQRT (FACT 3 (LAMBDA (X) X))).

In any of these cases we care about the value of the continuation given to FACT. Thus we care about the value of FACT if and only if we care about the value of its continuation!
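The same three uses, written against the Clojure fact-cps defined above:

(fact-cps 3 identity)                ; => 6
(fact-cps 3 println)                 ; prints 6; the value does not matter
(fact-cps 3 (fn [x] (Math/sqrt x)))  ; => 2.449..., i.e. (sqrt 6)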

We can redefine other functions to take continuations in the same way. For example, suppose we had arithmetic primitives which took continuations; to prevent confusion, call the version of "+" which takes a continuation "+&", etc. Instead of writing (- (* B B) (* 4 A C)) we can write

(in-ns 'categorical.imperative)
(defn enable-continuation
  "Makes f take a continuation as an additional (last) argument."
  [f]
  (fn [& args] ((last args) (apply f (butlast args)))))
(def +& (enable-continuation +))
(def *& (enable-continuation *))
(def -& (enable-continuation -))


(defn quadratic [a b c k]
  (*& b b
      (fn [x] (*& 4 a c
                  (fn [y]
                    (-& x y k))))))

where k is the continuation for the entire expression.
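For example, with a=1, b=2, c=3 and println as the final continuation (our own usage example), the discriminant is computed and printed:

(quadratic 1 2 3 println) ; prints -8, i.e. (- (* 2 2) (* 4 1 3))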

This is an obscure way to write an algebraic expression, and we would not advise writing code this way in general, but continuation-passing brings out certain important features of the computation:

  • 1 The operations to be performed appear in the order in which they are performed. In fact, they must be performed in this order. Continuation-passing removes the need for the rule about left-to-right argument evaluation. {Note Evalorder}

  • 2 In the usual applicative expression there are two implicit temporary values: those of (* B B) and (* 4 A C). The first of these values must be preserved over the computation of the second, whereas the second is used as soon as it is produced. These facts are manifest in the appearance and use of the variables X and Y in the continuation-passing version.

In short, the continuation-passing version specifies exactly and explicitly what steps are necessary to compute the value of the expression. One can think of conventional functional application for value as being an abbreviation for the more explicit continuation-passing style. Alternatively, one can think of the interpreter as supplying to each function an implicit default continuation of one argument. This continuation will receive the value of the function as its argument, and then carry on the computation. In an interpreter this implicit continuation is represented by the control stack mechanism for function returns. Now consider what computational steps are implied by:

(LAMBDA (A B C ...) (F X Y Z ...))

When we "apply" the LAMBDA expression we have some values to apply it to; we let the names A, B, C ... refer to these values. We then determine the values of X, Y, Z ... and pass these values (along with "the buck", i.e. control!) to the lambda expression F (F is either a lambda expression or a name for one). Passing control to F is an unconditional transfer. {Note Jrsthack} {Note Hewitthack}

Note that we want values from X, Y, Z, ... If these are simple expressions, such as variables, constants, or LAMBDA expressions, the evaluation process is trivial, in that no temporary storage is required. In pure continuation-passing style, all evaluations are trivial: no combination is nested within another, and therefore no "hidden temporaries" are required. But if X is a combination, say (G P Q), then we want to think of G as a function, because we want a value from it, and we will need an implicit temporary place to keep the result while evaluating Y and Z. (An interpreter usually keeps these temporary places in the control stack!) On the other hand, we do not necessarily need a value from F. This is what we mean by tail-recursion: F is called tail-recursively, whereas G is not. A better name for tail-recursion would be "tail-transfer", since no real recursion is implied.

This is why we have made such a fuss about tail-recursion: it can be used for transfer of control without making any commitment about whether the expression is expected to return a value. Thus it can be used to model statement-like control structures. Put another way, tail-recursion does not require a control stack as nested recursion does. In our models of iteration and imperative style all the LAMBDA expressions used for control (to simulate GO statements, for example) are called in tail-recursive fashion.

4.2 Escape Expressions

Reynolds [Reynolds 72] defines the construction ``escape x in r''

to evaluate the expression \(r\) in an environment such that the variable \(x\) is bound to an escape function. If the escape function is never applied, then the value of the escape expression is the value of \(r\). If the escape function is applied to an argument \(a\), however, then evaluation of \(r\) is aborted and the escape expression returns \(a\). {Note J-operator} (Reynolds points out that this definition is not quite accurate, since the escape function may be called even after the escape expression has returned a value; if this happens, it "returns again"!)

As an example of the use of an escape expression, consider this procedure to compute the harmonic mean of an array of numbers. If any of the numbers is zero, we want the answer to be zero. We have a function harmsum which will sum the reciprocals of the numbers in an array, or call an escape function with zero if any of the numbers is zero. (The implementation shown here is awkward because ALGOL requires that a function return its value by assignment.)

begin
real procedure harmsum(a, n, escfun);
real array a; integer n; real procedure escfun(real);
begin
real sum;
sum := 0;
for i := 0 step 1 until n-1 do
begin
if a[i]=0 then escfun(0);
sum := sum + 1/a[i];
end;
harmsum := sum;
end;
real array b[0:99];
print(escape x in 100/harmsum(b, 100, x));
end

If harmsum exits normally, the number of elements is divided by the sum and printed. Otherwise, zero is returned from the escape expression and printed without the division ever occurring. This program can be written in SCHEME using the built-in escape operator CATCH:

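The SCHEME text for that program is not reproduced here. As a rough sketch of the same idea in Clojure (our own construction, not from the paper: Clojure has no SCHEME-style CATCH, so the escape function is simulated by throwing and catching a tagged exception; harmsum and with-escape are hypothetical names):

(in-ns 'categorical.imperative)

(defn harmsum
  "Sums the reciprocals of the numbers in coll, calling the escape
  function escfun with 0 if any element is zero."
  [coll escfun]
  (reduce (fn [sum x]
            (if (zero? x) (escfun 0) (+ sum (/ 1 x))))
          0 coll))

(defn with-escape
  "Calls body-fn with an escape function; if the escape function is
  applied to a value v, that v becomes the result of the whole call,
  as in Reynolds's escape expression."
  [body-fn]
  (let [tag (Object.)]
    (try (body-fn (fn [v] (throw (ex-info "escape" {:tag tag :val v}))))
         (catch clojure.lang.ExceptionInfo e
           (if (identical? tag (:tag (ex-data e)))
             (:val (ex-data e))
             (throw e))))))

;; Harmonic mean of b, or 0 if any element is zero.
(let [b [1 2 4]]
  (println (with-escape (fn [x] (/ (count b) (harmsum b x))))))
; prints 12/7

(let [b [1 0 4]]
  (println (with-escape (fn [x] (/ (count b) (harmsum b x))))))
; prints 0 without ever attempting the division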


Date: 2011-07-15 02:49:04 EDT

Author: Guy Lewis Steele Jr. and Gerald Jay Sussman
