Cache Craftiness for Fast Multicore Key-Value Storage
The Big Idea
Consider the problem of storing, in memory, millions of (key, value) pairs, where each key is a variable-length string. If we just wanted to support point lookup, we'd use a hash table. But assuming we want to support range queries, some kind of tree structure is probably required. One candidate might be a traditional B+-tree.
In such a B+-tree, the number of levels of the tree is kept small thanks to the fact that each node has a high fan-out. However, that means that a large number of keys are packed into a single node, and so there's still a large number of key comparisons to perform when searching through the tree.
This is further exacerbated by variable-length keys (e.g. strings), where the cost of key comparisons can be quite high. If the keys are really long they can each occupy multiple cache lines, and so comparing two of them can really mess up your cache locality.
This paper proposes an efficient tree data structure that relies on splitting variable-length keys into a variable number of fixed-length keys called slices. As you go down the tree, you compare the first slice of each key, then the second, then the third and so on, but each comparison has constant cost.
For example, think about the string "the quick brown fox jumps over the lazy dog". This string consists of the following 8-byte slices: "the quic", "k brown " (with a trailing space), "fox jump", "s over t", "he lazy " (trailing space again), and finally "dog". To find a string in a tree, you can look for all strings that match the first slice first, and then look for the second slice only in strings that matched the first slice, and so on - only comparing a fixed-size subset of the key at any time. This is much more efficient than
comparing long strings to one another over and over again. The trick is to design a structure that takes advantage of the cache benefits of doing these fixed-size comparisons without paying for the large cardinality of the slice 'alphabet': there are 2^64 possible 8-byte slices, so a naive trie with one child per slice value is out of the question. Enter the Masstree.
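The slicing idea above can be sketched in a few lines. This is a minimal illustration in Python (the paper's implementation is in C++), and the helper names `slices` and `slice_compare` are my own, not the paper's:

```python
SLICE_LEN = 8  # fixed slice width, as in the paper's 8-byte slices

def slices(key: bytes):
    """Split a key into fixed-length 8-byte slices.

    The final slice may be shorter than 8 bytes; a real implementation
    would pack each slice into a 64-bit integer (big-endian, zero-padded,
    with the length tracked separately) so that comparing two slices is
    a single machine-word comparison.
    """
    return [key[i:i + SLICE_LEN] for i in range(0, len(key), SLICE_LEN)]

def slice_compare(a: bytes, b: bytes) -> int:
    """Compare two keys one fixed-size slice at a time.

    Each step touches at most 8 bytes of each key; in the common case
    the comparison is decided by the very first slice, without ever
    reading the rest of either key.
    """
    for sa, sb in zip(slices(a), slices(b)):
        if sa != sb:
            return -1 if sa < sb else 1
    # All shared slices matched: the shorter key is a prefix of the longer.
    return (len(a) > len(b)) - (len(a) < len(b))

key = b"the quick brown fox jumps over the lazy dog"
print(slices(key))
# [b'the quic', b'k brown ', b'fox jump', b's over t', b'he lazy ', b'dog']
```

A tree built over these slices only descends to the second slice for keys whose first slice already matched, which is exactly the search pattern described above.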