Tail call
In computer science, a tail call is a subroutine call performed as the final action of a procedure. If a tail call might lead to the same subroutine being called again later in the call chain, the subroutine is said to be tail-recursive, which is a special case of recursion. Tail recursion is particularly useful, and often easy to handle in implementations.
Tail calls can be implemented without adding a new stack frame to the call stack. Most of the frame of the current procedure is not needed any more, and it can be replaced by the frame of the tail call, modified as appropriate (similar to overlay for processes, but for function calls). The program can then jump to the called subroutine. Producing such code instead of a standard call sequence is called tail call elimination. Tail call elimination allows procedure calls in tail position to be implemented as efficiently as goto statements, thus allowing efficient structured programming. In the words of Guy L. Steele, "in general procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly coded as [machine code] JUMP instructions."
Traditionally, tail call elimination is optional. However, in functional programming languages, tail call elimination is often guaranteed by the language standard, and this guarantee allows using recursion, in particular tail recursion, in place of loops. In such cases, it is not correct (though it may be customary) to refer to it as an optimization. The special case of tail recursive calls, when a function calls itself, may be more amenable to call elimination than general tail calls.
Description
When a function is called, the computer must "remember" the place it was called from, the return address, so that it can return to that location with the result once the call is complete. Typically, this information is saved on the call stack, a simple list of return locations in order of the times that the call locations they describe were reached. For tail calls, there is no need to remember the place we are calling from – instead, we can perform tail call elimination by leaving the stack alone (except possibly for function arguments and local variables[1]), and the newly called function will return its result directly to the original caller. Note that the tail call doesn't have to appear lexically after all other statements in the source code; it is only important that the calling function return immediately after the tail call, returning the tail call's result if any, since the calling function will never get a chance to do anything after the call if the optimization is performed.
For non-recursive function calls, this is usually an optimization that saves little time and space, since there are not that many different functions available to call. When dealing with recursive or mutually recursive functions where recursion happens through tail calls, however, the stack space and the number of returns saved can grow to be very significant, since a function can call itself, directly or indirectly, creating new call stacks each iteration. In fact, it often asymptotically reduces stack space requirements from linear, or O(n), to constant, or O(1). Tail call elimination is thus required by the standard definitions of some programming languages, such as Scheme,[2][3] and languages in the ML family among others. In the case of Scheme, the language definition formalizes the intuitive notion of tail position exactly, by specifying which syntactic forms allow having results in tail context.[4] Implementations allowing an unlimited number of tail calls to be active at the same moment, thanks to tail call elimination, can also be called 'properly tail-recursive'.[2]
Besides space and execution efficiency, tail call elimination is important in the functional programming idiom known as continuation passing style (CPS), which would otherwise quickly run out of stack space.
Syntactic form
A tail call can be located just before the syntactical end of a subroutine:
function foo(data) { a(data); return b(data); }
Here, both a(data)
and b(data)
are calls, but b
is the last thing the procedure executes before returning and is thus in tail position. However, not all tail calls are necessarily located at the syntactical end of a subroutine. Consider:
function bar(data) { if ( a(data) ) { return b(data); } return c(data); }
Here, both calls to b
and c
are in tail position. This is because each of them lies in the end of if-branch respectively, even though the first one is not syntactically at the end of bar
's body.
Now consider this code:
function foo1(data) { return a(data) + 1; }
function foo2(data) { var ret = a(data); return ret; }
function foo3(data) { var ret = a(data); return (ret == 0) ? 1 : ret; }
Here, the call to a(data)
is in tail position in foo2
, but it is not in tail position either in foo1
or in foo3
, because control must return to the caller to allow it to inspect or modify the return value before returning it.
Example programs
Take this Scheme program as an example:
;; factorial : number -> number ;; to calculate the product of all positive ;; integers less than or equal to n. (define (factorial n) (if (= n 0) 1 (* n (factorial (- n 1)))))
This program is not written in a tail recursion style. Now take this Scheme program as an example:
;; factorial : number -> number ;; to calculate the product of all positive ;; integers less than or equal to n. (define (factorial n) (let fact ([i n] [acc 1]) (if (zero? i) acc (fact (- i 1) (* acc i)))))
The inner procedure fact
calls itself last in the control flow. This allows an interpreter or compiler to reorganize the execution which would ordinarily look like this:
call factorial (3) call fact (3 1) call fact (2 3) call fact (1 6) call fact (0 6) return 6 return 6 return 6 return 6 return 6
into the more efficient variant, in terms of both space and time:
call factorial (3) call fact (3 1) replace arguments with (2 3) replace arguments with (1 6) replace arguments with (0 6) return 6 return 6
This reorganization saves space because no state except for the calling function's address needs to be saved, either on the stack or on the heap, and the call stack frame for fact
is reused for the intermediate results storage. This also means that the programmer need not worry about running out of stack or heap space for extremely deep recursions. It is also worth noting that, in typical implementations, the tail recursive variant will be substantially faster than the other variant, but only by a constant factor.
Some programmers working in functional languages will rewrite recursive code to be tail-recursive so they can take advantage of this feature. This often requires addition of an "accumulator" argument (acc
in the above example) to the function. In some cases (such as filtering lists) and in some languages, full tail recursion may require a function that was previously purely functional to be written such that it mutates references stored in other variables.
Tail recursion modulo cons
Tail recursion modulo cons is a generalization of tail recursion optimization introduced by David H. D. Warren[5] in the context of compilation of Prolog, seen as an explicitly set once language. It was described (though not named) by Daniel P. Friedman and David S. Wise in 1974[6] as a LISP compilation technique. As the name suggests, it applies when the only operation left to perform after a recursive call is to prepend a known value in front of a list returned from it (or to perform a constant number of simple data-constructing operations, in general). This call would thus be a tail call save for the said cons operation. But prefixing a value at the start of a list on exit from a recursive call is the same as appending this value at the end of the growing list on entry into the recursive call, thus building the list as a side effect, as if in an implicit accumulator parameter. The following Prolog fragment illustrates the concept:
% tail recursive modulo cons: partition([], _, [], []). partition([X|Xs], Pivot, [X|Rest], Bigs) :- X @< Pivot, !, partition(Xs, Pivot, Rest, Bigs). partition([X|Xs], Pivot, Smalls, [X|Rest]) :- partition(Xs, Pivot, Smalls, Rest). |
-- In Haskell, guarded recursion: partition [] _ = ([],[]) partition (x:xs) p | x < p = (x:a,b) | otherwise = (a,x:b) where (a,b) = partition xs p |
% With explicit unifications: % non-tail recursive translation: partition([], _, [], []). partition(L, Pivot, Smalls, Bigs) :- L=[X|Xs], ( X @< Pivot -> partition(Xs,Pivot,Rest,Bigs), Smalls=[X|Rest] ; partition(Xs,Pivot,Smalls,Rest), Bigs=[X|Rest] ). |
% tail recursive translation: partition([], _, [], []). partition(L, Pivot, Smalls, Bigs) :- L=[X|Xs], ( X @< Pivot -> Smalls=[X|Rest], partition(Xs,Pivot,Rest,Bigs) ; Bigs=[X|Rest], partition(Xs,Pivot,Smalls,Rest) ). |
Thus in tail recursive translation such a call is transformed into first creating a new list node and setting its first
field, and then making a tail call with the pointer to the node's rest
field as argument, to be filled recursively.
As another example, consider a recursive function in C that duplicates a linked list:
list *duplicate(const list *ls) { list *head = NULL; if (ls != NULL) { list *p = duplicate(ls->next); head = malloc(sizeof *head); head->value = ls->value; head->next = p; } return head; } |
;; in Scheme, (define (duplicate ls) (if (not (null? ls)) (cons (car ls) (duplicate (cdr ls))) '())) |
%% in Prolog, dup([X|Xs],[X|Ys]):- dup(Xs,Ys). dup([],[]). |
In this form the function is not tail-recursive, because control returns to the caller after the recursive call duplicates the rest of the input list. Even if it were to allocate the head node before duplicating the rest, it would still need to plug in the result of the recursive call into the next
field after the call.[lower-alpha 1]
So the function is almost tail-recursive. Warren's method pushes the responsibility of filling the next
field into the recursive call itself, which thus becomes tail call:
list *duplicate(const list *ls) { list head; duplicate_aux(ls, &head); return head.next; } void duplicate_aux(const list *ls, list *end) { if (ls != NULL) { end->next = malloc(sizeof *end); end->next->value = ls->value; duplicate_aux( ls->next, end->next); } else { end->next = NULL; } } |
;; in Scheme, (define (duplicate ls) (let ((head (list 1))) (let dup ((ls ls) (end head)) (if (not (null? ls)) (begin (set-cdr! end (list (car ls))) (dup (cdr ls) (cdr end))))) (cdr head))) |
%% in Prolog, dup([X|Xs],R):- R=[X|Ys], dup(Xs,Ys). dup([],[]). |
Note how the callee now appends to the end of the growing list, rather than have the caller prepend to the beginning of the returned list. The work is now done on the way forward from the list's start, before the recursive call which then proceeds further, a.o.t. backward from the list's end, after the recursive call has returned its result. It is thus similar to accumulating parameter technique, turning a recursive computation into an iterative one.
Characteristically for this technique, a parent frame is created here on the execution call stack, which calls the tail-recursive callee which can reuse its own call frame if the tail-call optimization is present.
The properly tail-recursive implementation can now be converted into an explicitly iterative form, i.e. an accumulating loop:
list *duplicate(const list *ls) { list head, *end; end = &head; while (ls != NULL) { end->next = malloc(sizeof *end); end->next->value = ls->value; ls = ls->next; end = end->next; } end->next = NULL; return head.next; } |
;; in Scheme, (define (duplicate ls) (let ((head (list 1))) (do ((end head (cdr end)) (ls ls (cdr ls ))) ((null? ls) (cdr head)) (set-cdr! end (list (car ls)))))) |
History
In a paper delivered to the ACM conference in Seattle in 1977, Guy L. Steele summarized the debate over the GOTO and structured programming, and observed that procedure calls in the tail position of a procedure can be best treated as a direct transfer of control to the called procedure, typically eliminating unnecessary stack manipulation operations.[7] Since such "tail calls" are very common in Lisp, a language where procedure calls are ubiquitous, this form of optimization considerably reduces the cost of a procedure call compared to other implementations. Steele argued that poorly implemented procedure calls had led to an artificial perception that the GOTO was cheap compared to the procedure call. Steele further argued that "in general procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can be uniformly coded as [machine code] JUMP instructions", with the machine code stack manipulation instructions "considered an optimization (rather than vice versa!)".[7] Steele cited evidence that well optimized numerical algorithms in Lisp could execute faster than code produced by then-available commercial Fortran compilers because the cost of a procedure call in Lisp was much lower. In Scheme, a Lisp dialect developed by Steele with Gerald Jay Sussman, tail call elimination is mandatory.[8]
Implementation methods
Tail recursion is important to some high-level languages, especially functional and logic languages and members of the Lisp family. In these languages, tail recursion is the most commonly used way (and sometimes the only way available) of implementing iteration. The language specification of Scheme requires that tail calls are to be optimized so as not to grow the stack. Tail calls can be made explicitly in Perl, with a variant of the "goto" statement that takes a function name: goto &NAME;
[9]
Implementing tail call elimination only for tail recursion, rather than for all tail calls, is something significantly easier. For example, in the Java Virtual Machine (JVM), tail-recursive calls can be eliminated (as this reuses the existing call stack), but general tail calls cannot be (as this changes the call stack).[10][11] As a result, functional languages such as Scala that target the JVM can efficiently implement direct tail recursion, but not mutual tail recursion.
Various implementation methods are available.
In assembly
Tail calls are often optimized by interpreters and compilers of functional programming and logic programming languages to more efficient forms of iteration. For example, Scheme programmers commonly express while loops as calls to procedures in tail position and rely on the Scheme compiler or interpreter to substitute the tail calls with more efficient jump instructions.[12]
For compilers generating assembly directly, tail call elimination is easy: it suffices to replace a call opcode with a jump one, after fixing parameters on the stack. From a compiler's perspective, the first example above is initially translated into pseudo-assembly language (in fact, this is valid x86 assembly):
foo: call B call A ret
Tail call elimination replaces the last two lines with a single jump instruction:
foo: call B jmp A
After subroutine A
completes, it will then return directly to the return address of foo
, omitting the unnecessary ret
statement.
Typically, the subroutines being called need to be supplied with parameters. The generated code thus needs to make sure that the call frame for A is properly set up before jumping to the tail-called subroutine. For instance, on platforms where the call stack does not just contain the return address, but also the parameters for the subroutine, the compiler may need to emit instructions to adjust the call stack. On such a platform, consider the code:
function foo(data1, data2) B(data1) return A(data2)
where data1
and data2
are parameters. A compiler might translate that to the following pseudo assembly code[lower-alpha 2]:
-
foo:
-
mov reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
-
push reg ; put data1 on stack where B expects it
-
call B ; B uses data1
-
pop ; remove data1 from stack
-
mov reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
-
push reg ; put data2 on stack where A expects it
-
call A ; A uses data2
-
pop ; remove data2 from stack.
-
ret
A tail call optimizer could then change the code to:
-
foo:
-
mov reg,[sp+data1] ; fetch data1 from stack (sp) parameter into a scratch register.
-
push reg ; put data1 on stack where B expects it
-
call B ; B uses data1
-
pop ; remove data1 from stack
-
mov reg,[sp+data2] ; fetch data2 from stack (sp) parameter into a scratch register.
-
mov [sp+data1],reg ; put data2 where A expects it
-
jmp A ; A uses data2 and returns immediately to caller.
This changed code is more efficient both in terms of execution speed and use of stack space.
Through trampolining
However, since many Scheme compilers use C as an intermediate target code, the problem comes down to coding tail recursion in C without growing the stack, even if the back-end compiler does not optimize tail calls. Many implementations achieve this by using a device known as a trampoline, a piece of code that repeatedly calls functions. All functions are entered via the trampoline. When a function has to call another, instead of calling it directly it returns the address of the function to be called, the arguments to be used, and so on, to the trampoline. This ensures that the C stack does not grow and iteration can continue indefinitely.
It is possible to implement trampolining using higher-order functions in languages that support them, such as Groovy, Visual Basic .NET and C#.[13]
Using a trampoline for all function calls is rather more expensive than the normal C function call, so at least one Scheme compiler, Chicken, uses a technique first described by Henry Baker from an unpublished suggestion by Andrew Appel,[14] in which normal C calls are used but the stack size is checked before every call. When the stack reaches its maximum permitted size, objects on the stack are garbage-collected using the Cheney algorithm by moving all live data into a separate heap. Following this, the stack is unwound ("popped") and the program resumes from the state saved just before the garbage collection. Baker says "Appel's method avoids making a large number of small trampoline bounces by occasionally jumping off the Empire State Building."[14] The garbage collection ensures that mutual tail recursion can continue indefinitely. However, this approach requires that no C function call ever returns, since there is no guarantee that its caller's stack frame still exists; therefore, it involves a much more dramatic internal rewriting of the program code: continuation-passing style.
Relation to while construct
Tail recursion can be related to the while control flow operator by means of a transformation such as the following:
function foo(x) is: if predicate(x) then return foo(bar(x)) else return baz(x)
The above construct transforms to:
function foo(x) is: while predicate(x) do: x ← bar(x) return baz(x)
In the preceding, x may be a tuple involving more than one variable: if so, care must be taken in designing the assignment statement x ← bar(x) so that dependencies are respected. One may need to introduce auxiliary variables or use a swap construct.
More general uses of tail recursion may be related to control flow operators such as break and continue, as in the following:
function foo(x) is: if p(x) then return bar(x) else if q(x) then return baz(x) ... else if t(x) then return foo(quux(x)) ... else return foo(quuux(x))
where bar and baz are direct return calls, whereas quux and quuux involve a recursive tail call to foo. A translation is given as follows:
function foo(x) is: do: if p(x) then x ← bar(x) break else if q(x) then x ← baz(x) break ... else if t(x) then x ← quux(x) continue ... else x ← quuux(x) continue loop return x
By language
- Python - Language inventor Guido van Rossum contends that stack traces are altered by tail call elimination making debugging harder, and prefers that programmers use explicit iteration instead.[15]
- Scheme - Required by the language definition.[16][17]
- Lua - Tail recursion is performed by the reference implementation.[18]
- Tcl - Since Tcl 8.6, Tcl has a tailcall command.[19]
- JavaScript - ECMAScript 6.0 will have tail calls.[20]
See also
- Course-of-values recursion
- Recursion (computer science)
- Inline expansion
- Leaf subroutine
- Corecursion
Notes
- ↑ Like this:
if (ls != NULL) { head = malloc( sizeof *head); head->value = ls->value; head->next = duplicate( ls->next); }
- ↑ The
call
instruction first pushes the current code location onto the stack and then performs an unconditional jump to the code location indicated by the label. Theret
instruction first pops a code location off the stack, then performs an unconditional jump to the retrieved code location.
References
Look up tail recursion in Wiktionary, the free dictionary. |
- ↑ "recursion - Stack memory usage for tail calls - Theoretical Computer Science". Cstheory.stackexchange.com. 2011-07-29. Retrieved 2013-03-21.
- ↑ 2.0 2.1 "Revised^6 Report on the Algorithmic Language Scheme". R6rs.org. Retrieved 2013-03-21.
- ↑ "Revised^6 Report on the Algorithmic Language Scheme - Rationale". R6rs.org. Retrieved 2013-03-21.
- ↑ "Revised^6 Report on the Algorithmic Language Scheme". R6rs.org. Retrieved 2013-03-21.
- ↑ D. H. D. Warren, DAI Research Report 141, University of Edinburgh, 1980.
- ↑ Daniel P. Friedman and David S. Wise, Technical Report TR19: Unwinding Structured Recursions into Iterations, Indiana University, Dec. 1974.
- ↑ 7.0 7.1 Steele, Guy Lewis (1977). "Debunking the “expensive procedure call” myth or, procedure call implementations considered harmful or, LAMBDA: The Ultimate GOTO". Proceedings of the 1977 annual conference on - ACM '77. pp. 153–162. doi:10.1145/800179.810196. ISBN 978-1-4503-2308-6. hdl:1721.1/5753.
- ↑ R5RS Sec. 3.5, Richard Kelsey, William Clinger, Jonathan Rees et al. (August 1998). "Revised5 Report on the Algorithmic Language Scheme". Higher-Order and Symbolic Computation 11 (1): 7–105. doi:10.1023/A:1010051815785.
- ↑ Contact details. "goto". perldoc.perl.org. Retrieved 2013-03-21.
- ↑ "What is difference between tail calls and tail recursion?", Stack Overflow
- ↑ "What limitations does the JVM impose on tail-call optimization", Programmers Stack Exchange
- ↑ Probst, Mark (20 July 2000). "proper tail recursion for gcc". GCC Project. Retrieved 10 March 2015.
- ↑ Samuel Jack, Bouncing on your tail. Functional Fun. April 9, 2008.
- ↑ 14.0 14.1 Henry Baker, "CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A."
- ↑ http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html
- ↑ http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-6.html#%_sec_3.5
- ↑ http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-8.html#node_sec_5.11
- ↑ http://www.lua.org/manual/5.2/manual.html#3.4.9
- ↑ http://www.tcl.tk/man/tcl/TclCmd/tailcall.htm
- ↑ http://bdadam.com/blog/video-douglas-crockford-about-the-new-good-parts.html
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.