'Bulk loading in a B-Tree
https://en.wikipedia.org/wiki/B-tree#Initial_construction
Currently I know of 2 ways for building a B-Tree : bulkloading and inserting key after key.
In the wiki example the keys are sorted, which is a precondition for bulkloading.
What is the advantage of bulkloading if the keys are unsorted ? hence I have to sort them myself still resulting in O(nlogn) , same as inserting key after key in the B-Tree.
Thanks.
Solution 1:[1]
Consider the following scenarios:
- If the data is already sorted, then you don't need to sort the data yourself. This may result in O(n) loading (I'm no expert in bulk loading).
- If the tree is very large and stored on disk or on multiple machines, then memory locality may play a role. Bulk loading avoids 'loading' parts of the tree into memory before adding something.
Solution 2:[2]
A difference a regular say, red-black tree, and a 4-tree is one of hysteresis; the B-tree reserves space for future use. Only when the node has too many keys does it split in half. The inefficiency arises when one submits keys in order; that is, half-filled nodes that never become full because one is always adding to one side.
This is an example of a 3-tree (using Knuth's definition) that I've inserted the numbers 1 -- 8 in ascending order. Most of the tree is at the low-end of occupancy. The expected value of the number of nodes to access data is 2.5.
Bulk loading is a process by which we ignore the rules of splitting and pack it in as tight as possible, knowing that we probably have more to come from the right. It also helps avoid unnecessary copying, at the expense of maybe having to fix the tree to restore the B-tree invariants on the right. In this, I've bulk-loaded the same data.
Though the asymptotic space and runtime are the same, instead of using little-more than half the space, it uses almost all the space. Thus, the average lookup cost is less at this point, expected value of 1.75. This is more useful when loading a large volume of data that remains relatively static.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TilmannZ |
| Solution 2 |


