- Published at
Everything you need to know about B-Trees
B-Trees are a common type of self-balancing data structure used for storing and retrieving data. In this post, I'll explain what B-Tree indexes are, how they work, and why they are used.
Table of Contents
Many modern databases are very fast. This is because they maintain indexes to query data. These indexes are stored in some type of data structure and stored in memory for faster access.
One of the most common data structure that is used to store indexes are called B-Trees. Conceptually B-Trees and Binary Trees are similar to each other. They both store information in a tree like structure where finding the correct data requires traversing the tree starting from the top most node. But thats where the similarities end. B-Trees have a very different and more complicated mechanism of writing and retrieving information.
B-Trees are able to store many keys in a single node. This can make the comparisons costly but this significantly reduces the depth of the tree. This can be helpful because in most systems because comparisons done by the processor are cheaper and faster than retrieving the next node. In general, modern databases store several hundred keys on a single node.
Properties of B-Trees
There are a few important properties of B-Trees that you should be aware of. These, properties become more important when we discuss operations like insertion and deletion.
- The tree should always be balanced. One side of the B-Tree should not be deeper than another side of the tree. A tree with n keys has a depth of O(log n).
- Each node should have a minimum and maximum limit for keys. The maximum value can be set depending on the requirement of the system, but the minimum value should always be half the max value. For a tree with maximum limit of 10, the minimum value is 5 (10/2). The number can be rounded down if necessary.
- The number of subtrees for a node is set to the number of keys in that node + 1. If a node stores 10 keys, then the number of subtrees of that node should be 11.
- All the keys in the node should be sorted in ascending order. For any key in a particular node, all keys in its left subtree should be smaller than that key and all keys in its right subtree should be greater than that node. This is because each key acts as a separator for its subtrees. If the target key is greater than that key you should traverse to the right subtree and if the target key is smaller you should traverse left.
Operations on B-Trees
Now, let’s discuss how common operations like find, insert, update and delete work on B-Trees.
Find
Finding keys is relatively simple. You start at the parent node and make comparisons with each key in that node. Each key in the node acts as a separator for its child nodes. Keys smaller than the corresponding key is located to the left and keys greater than the corresponding key is located to the right. If a key falls between two keys, then we look for nodes between those keys.
Insert
When inserting keys into a B-Tree we need to make sure that it respects its properties. We first start inserting keys at the root node. When the node is full, split the node into two parts and take the middle key to create a parent key that references the two nodes. This algorithm of splitting the children node and pushing the middle key to top node is recursive.
Delete
What about deleting data? To delete a key from the tree, we start from the root node as usual and search for the key want to delete, and then delete that key. However, this creates a few problems we might want to look at.
Let’s suppose we are deleting a key from the leaf (bottom most) node. But, what happens if the node we are deleting from already has the minimum key count. Then we cannot just remove the key, as that violates the property of the B-Tree.
In such case, we can borrow a key from our sibling node. But this means that the parent key referencing our node is now incorrect. In order to solve this problem, we let the key from sibling become the new parent and take the parent key and move it to the new node. This preserves the property of our B-Tree. Keys to the left of the node, should be smaller than that seperator and keys to the right of the node should be greater than the seperator.
But what if the sibling node that we want to borrow from already has the minimum key count. Then, we cannot just borrow from the sibling. In such case, we merge the two sibling nodes and the parent seperator key into one node.
Now, lets look at another scenario where we want to delete a key from a parent node that points to multiple sub trees. If we delete a key from the node, then the child nodes referred by that key becomes invalid. In that case, we have the option to either get the largest key from the left subtree or the smallest key from the right subtree.
In this post we talked about what B-Trees are and how common operations work on them. Knowing how B-Trees work can be very helpful to understand the underlying data structure of many systems.