[boost] Boost Storytellers: Unordered

21 Jun 2023

      Hey everyone,

I figured I'd take some time to talk about Unordered and where it's at.

At the time of writing, Unordered's develop branch has been updated to include
a new container: `boost::concurrent_flat_map`. A high-level description of the
container and its algorithms can be found here:
https://www.boost.org/doc/libs/develop/libs/unordered/doc/html/unordered.htm...

It's been an honor and a privilege to develop alongside our own Joaquín M López
Muñoz and Peter Dimov.

While the advancements done to the algorithms themselves are note-worthy, what's
not often talked about is taming implementation complexity. After all,
algorithms
don't mean much until they're translated to code.

Boost.Unordered offers several different containers, all each a
similar flavor of
one another. The broad categorization is closed-addressing containerss,
open-addressing containers and finally the concurrent containers, which are in
actuality a subset of the open-addressing ones.

The closed-adressing (CA) containers are the most straight-forward to
abstract over
as an `unordered_map` and `unordered_multimap` only differ in terms of how
insertion is performed. Internally, the CA containers simply wrap a
`table` struct
that's templated over a Types policy-like which is used for pulling relevant
typedefs from. Though I must confess, the `table` itself does contain myriad
member functions such as `equals_unique` and `equal_equiv` which are used when
comparing the tables themselves via `operator==`.

The open-addressing (OA) containers are spiritually similar to the CA containers
in that they wrap a simple common table type with a type policy
template parameter
used for pulling typedefs from in the relevant spots.

But what makes the OA containers a good bit more nuanced is that there's both a
`unordered_flat_map` and a `unordered_node_map`. The implementation complexity
increases a good amount here.

The CA containers all operate in terms of a simple `node`-like type
that contains
the appropriate `value_type` but the OA containers must abstract away
a good bit more.

The flat OA containers store their `value_type`s directly in a contiguous
allocation so that lookup operations benefit from spatial locality in
addition to
being much more simple to manage from an implementation perspective. To support
`unordered_node_map`, we have to update the type that's stored to be a
`unique_ptr`-like.

This essentially further extends the type policies already used by the
containers
to abstract over the storage type, the actual value type and how those
values are
accessed from a pointer to the storage type. This includes
construction, destruction
and dereferencing.

The concurrent map adds yet another layer of abstraction but this
time, in a much
different spot and required a much larger holistic refactor of the codebase. The
fundamental differences in container types require abstracting over
how the table
reads and inserts elements itself so we introduce an extra table layer.

Previously there was but a single table type that supported all of the
OA containers
but now there's two table types that instead wrap a common
`table_core` which contains
common table primitives. The `concurrent_table` is responsible for
managing its own
locking schemes and then dispatches internally to its `table_core` and
similarly for the
`table` used by the non-concurrent containers.

The OA containers are essentially implemented in 3 layers with the `table_core`
being the lowest layer and the user-facing containers as the highest, with the
specific middle layer being responsible for managing atomicity for the
concurrent map
and for providing things like iterators for the flat/node containers
(the concurrent map
offers no iterators).

On paper it sounds relatively complex but in practice, navigating the
code with this
kind of foresight is easy and straight-forward, even more so now that
GitHub's UI
allows you to view and jump to struct definitions.

Taming such implementation complexity into a neat and organized set of
traits was
definitely something beyond my skilset so I'm happy Joaquin was able
to navigate such
waters and I have a humbled understanding of policy-based design and
how powerful it can
be when applied properly.

There's a couple of other things to note about the implementation that
Boost authors
may like.

One is that we do allocator-aware constrution of a stack-local in
emplace(). This
is so that we can avoid a heap-allocation when doing lookup in the node-based OA
containers. Originally, we actually didn't allocator-construct the
stack-local but
this turned out to be wholly incorrect because we'd move the
stack-local to the heap
when the key wasn't present in the container. This could lead to
subtle bugs like
not passing the memory resource when a user is using `std::pmr`.

Another interesting aspect is that we recently updated the
`erase(const_iterator)`
method of the OA containers to return a proxy type. For performance reasons, the
signature was originally `void erase(const_iterator)` but users found
this cumbersome
and annoying to work with as they'd often times swap containers
entirely via a typedef.
We now return a proxy type that lazily increments the internal
iterator via implicit
conversion to the container's iterator type. To this end, we're able
to essentially
have our cake and eat it too at the cost of making always-auto subtly confusing.

I'm more than happy to talk about any other implementation strategies or answer
questions if anyone is interested in the nitty gritty and can offer potential
optimizations.

- Christian

[boost] Boost Storytellers: Unordered

Christian Mazakas