Re: [Boost-users] [boost] Interest in B-tree library for Boost?

16 Sep 2010

      On Wed, Sep 15, 2010 at 10:40 PM, Cory Nelson <phrosty@gmail.com> wrote:
...
On Wed, Sep 15, 2010 at 6:08 PM, Beman Dawes <bdawes@acm.org> wrote:
...
On Wed, Sep 15, 2010 at 7:26 PM, Cory Nelson <phrosty@gmail.com> wrote:
...
- Will this support multiple B-trees in one file?
Not currently. That would be relatively easy to support. What is the
motivating use case?
Just ease of distribution, really.
I'm currently using a SQLite database to distribute a few tables of
read-only data that doesn't need SQL, relational, or ACID properties
-- it's just the best thing available in terms of ease of distribution
and portability.  I'd love to remove all the unneeded layers and just
use a B-tree directly.
Sounds quite useful. I can think of several past apps that would have benefited.

I've added this to the Wish List:

Multiple B-trees in one file. Requested by Cory Nelson. Possible
design: Initial btree_map in the file is an index of its children.
Key: name, Data: page id of the child's header page. Recursion
allowed; the header page pointed could itself be an index header.
Current interface might gain additional constructor and open()
overload taking a ref to the parent index and the desired name.
...
...
...
- Will this implement ACID properties?
No. The model is a standard library associative container, not a
database. ACID properties can be built on top of B-trees, but the
B-trees themselves don't have ACID properties, other than a few
aspects like atomic writes.
Alright, that clears up my understanding about the scope of this.
Still sounds great!
...
...
- Can this be adapted for in-memory use as well, with full non-POD support?
No current plans for that. Why wouldn't you just a standard library
associative container for that?
Storing keys together improves cache usage a lot for small keys, and
allows the CPU to detect ordered iteration to prefetch the next cache
line.  It also reduces fragmentation and the maximum number of pages
that must be touched to find a result, which in turn reduces TLB
misses and page faults.  The tests here indicate that this can be very
significant with large collections:
That's also the point made by the article Thorsten referenced,
http://queue.acm.org/detail.cfm?id=1814327, although it is claiming
much greater speedups IIRC.
...
http://idlebox.net/2007/stx-btree/stx-btree-0.8.3/doxygen-html/speedtest.htm...
Interesting. So someone has already done the work for drop in STL
associative container replacements.

The timings only covered small trees. 16,000 was the number of
elements mentioned. I'm testing with up to 100 million elements, and
plan to expand that to a few billion elements.

Thanks,

--Beman