Simple Tree

Concurrent data structures, they’re all the rage. But what are they? That’s not the question I’ll answer in this post.

Instead, I’m going to peacock a very simple concurrent tree that I dreamed of a few years ago. It’s so simple that nobody ever talks about it, so I don’t know what the official name is. We’ll call it Simple Tree.

Simple Tree Sketch

The requirements for this tree are as follows:

In particular, this non-requirement makes life easier:

This is useful in situations where multiple concurrent workers independently work on eachother’s results, and the tree stores result sets as nodes. While working on it, they require a snapshot of the tree, although they don’t particularly care if they miss extensions during that period.

The obvious choice is to store pointers to parent nodes in each child node. Not the other way around. This leaves one open problem: how do you store a reference to the entire tree? If you keep a list of all leaf nodes, you’re in trouble: how do you update the tree, now? If you update that list in-place, you’ll need to copy it for every snapshot. Woops, O ( N ) . If you make a copy of the leaf nodes for each update, now suddenly your append operation is O ( N ) !

Instead, a linked list of all nodes, not just leaf nodes, is kept in sync with the tree itself. Because the tree is append-only, no nodes need ever be removed from this linked list, which makes every reference to a node in that list inherently a snapshot. O ( 1 ) . Appending remains O ( 1 ) , and lock-free.

Let’s cover the steps it took to build up the example tree illustrated earlier. There are five values in this tree: A, B, C, D and E. They were added in that order: A is the root, B, E and D are the leaves.

An empty tree is just a NULL pointer:

Empty Simple Tree

Adding the first element, the root. Create a tree-node, store the value (A) and set the parent to NULL. Then create a new linked list node, set its next pointer to NULL and its tree pointer to the A node:

Simple tree step 1

Don’t forget to update the tree pointer, *T. Setting that to point to the new linked list element, which points to A, is what actually “commits” the tree.

Adding a new node is done by first extending the tree: create a B node and set its parent to A. Then update the linked list by prepending a node pointing to B, and finally update the entire tree by setting *T to that node:

Simple tree with two nodes

Updating a linked list can be done lock-free (with a CAS, or other atomic operation). When a thread loses a race to update, it deletes its fresh list node (not the tree node, that remains) and starts over.

Now, just continue this way for every node:

Simple tree with three nodes

If you want to snapshot the tree at this point, just save a reference to the current value of *T. No matter what happens afterwards, you can traverse the tree at will and its structure will remain the same.

Even when other threads continue to extend it:

Simple tree with four nodes

At this point, anyone who holds an old copy of *T will not see node D. But that old copy of *T, which points to C, is still a valid tree. It’s just old.

Simple tree with five nodes

And that’s how you get a Simple Tree!

P.S.: If someone knows the official name of this data structure, please do tell.

Date: 2015-02-09

Copyright © 2015 Hraban Luyat