Concurrent data structures, they’re all the rage. But what are they? That’s not the question I’ll answer in this post.
Instead, I’m going to peacock a very simple concurrent tree that I dreamed of a few years ago. It’s so simple that nobody ever talks about it, so I don’t know what the official name is. We’ll call it Simple Tree.
The requirements for this tree are as follows:
- Immutable: once a node is added, it will never be removed
- Easily extendable: adding a node should be (of course)
- Easy ancestry fetching: getting a parent node should be
- Lock-free (that’s good for concurrency)
- Cheap snap-shotting: make a persistent “copy” of the tree structure in
In particular, this non-requirement makes life easier:
- No easy access to child nodes required:
This is useful in situations where multiple concurrent workers independently work on eachother’s results, and the tree stores result sets as nodes. While working on it, they require a snapshot of the tree, although they don’t particularly care if they miss extensions during that period.
The obvious choice is to store pointers to parent nodes in each child node. Not the other way around. This leaves one open problem: how do you store a reference to the entire tree? If you keep a list of all leaf nodes, you’re in trouble: how do you update the tree, now? If you update that list in-place, you’ll need to copy it for every snapshot. Woops, . If you make a copy of the leaf nodes for each update, now suddenly your append operation is !
Instead, a linked list of all nodes, not just leaf nodes, is kept in sync with the tree itself. Because the tree is append-only, no nodes need ever be removed from this linked list, which makes every reference to a node in that list inherently a snapshot. . Appending remains , and lock-free.
Let’s cover the steps it took to build up the example tree illustrated earlier. There are five values in this tree:
E. They were added in that order:
A is the root,
D are the leaves.
An empty tree is just a
Adding the first element, the root. Create a tree-node, store the value (
A) and set the parent to
NULL. Then create a new linked list node, set its next pointer to
NULL and its tree pointer to the
Don’t forget to update the tree pointer,
*T. Setting that to point to the new linked list element, which points to
A, is what actually “commits” the tree.
Adding a new node is done by first extending the tree: create a
B node and set its parent to
A. Then update the linked list by prepending a node pointing to
B, and finally update the entire tree by setting
*T to that node:
Updating a linked list can be done lock-free (with a CAS, or other atomic operation). When a thread loses a race to update, it deletes its fresh list node (not the tree node, that remains) and starts over.
Now, just continue this way for every node:
If you want to snapshot the tree at this point, just save a reference to the current value of
*T. No matter what happens afterwards, you can traverse the tree at will and its structure will remain the same.
Even when other threads continue to extend it:
At this point, anyone who holds an old copy of
*T will not see node
D. But that old copy of
*T, which points to
C, is still a valid tree. It’s just old.
And that’s how you get a Simple Tree!
P.S.: If someone knows the official name of this data structure, please do tell.