Tail Recursion (C)
What It Is
This is actually pretty logical if you think about it for a second, but I never really did. As we all know, function calls are pricy because they require context-switches. A function that uses recursion is inheritly quite limited because the stack will quickly run out of space to store all the function calls. Enter tail recursion: when the last operation in a function is another function call, there is no need to store the current context. Suppose function a calls function b, which in turn only calls function c. Instead of making c return to b when it’s done, it might just as well return directly to a. During recursion this can save a lot of time if the function calls itself very often, even making it at all possible to recurse a nearly infinite amount of times (at least you will not be bound by the stack size anymore).
So, if you write a time-critical function that uses recursion, try to make the recurring call the last thing the function does. Compiler optimization should take care of the rest.
Example: recurse.c
#include <limits.h> #include <stdio.h> volatile int sink = INT_MAX; void recurse(void) { if (--sink) recurse(); } int main() { recurse(); printf("Success\n"); return 0; }
$ gcc recurse.c -o recurse_normal $ gcc recurse.c -O3 -o recurse_optimized $ ./recurse_normal zsh: segmentation fault ./recurse_normal $ ./recurse_optimized Success
You can see this in the assembly output (GCC’s -S flag):
recurse: @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. ldr r3, .L6 @@ r3 := &sink .L2: ldr r2, [r3, #0] @@ r2 := *r3 sub r2, r2, #1 @@ r2 := r2 - 1 str r2, [r3, #0] @@ *r3 := r2 ldr r2, [r3, #0] @@ r2 := *r3 cmp r2, #0 @@ r2 = 0 ? bne .L2 @@ no: goto .L2 bx lr @@ yes: goto *lr
Single-@ comments are GCC’s, double-@ are mine. As you can see, GCC explicitly mentions that it enabled tail recursion (“link register save eliminated”). The critical point here is the next-to-last line. There, instead of doing a full branch-load instruction (bl) which saves the current instruction pointer in the return register, a simple branch instruction is made that saves nothing. This is what you would expect from a normal loop, and it explains is why there is no stack overflow.