[review][json] Boost.JSON review

19 Sep 2020

      This is my review of Boost.JSON by Vinnie Falco and Krystian Stasiowski.

First, a disclaimer: I have contributed to the library, mostly with design 
advice, occasionally to the implementation.

Second, let's start with the end. I think that Boost.JSON should be 
ACCEPTED. All of its aspects are of Boost quality, its maintainers are 
qualified and responsive, the lack of a JSON library in Boost is frankly 
embarrassing, and the json::value type serves as a standard variant for 
scripting and transferring language-independent scalar and structured 
values, which opens many possibilities for further integration with future 
parts of the Boost ecosystem (I'll show an example of that later on.)

To evaluate the library (I was already broadly, but not intimately, familiar 
with the structure and the interface), I started with the documentation, 
specifically with the Quick Look and Usage sections.

It's very well written, and meets and exceeds the usual Boost level. I only 
have a few remarks to make.

In Using Numbers, the documentation should explain more clearly and 
prominently that v.as_double() does not succeed when v contains an integer, 
but throws an exception. This is a common user expectation.

Instead of focusing on number_cast, It should advertise value_to<double>(v) 
as the way to obtain a double from v, converting if necessary; and 
similarly, value_to<int>(v) as the way to obtain an int, range-checking if 
necessary.

number_cast should be considered legacy at this point, and not featured. I 
understand that it offers an error_code& overload for those who don't want 
exceptions, but the throwing version is entirely superseded by value_to.

In Parsing, the examples using parse_options look like this:

    parse_options opt;
    opt.allow_comments = true;
    opt.allow_trailing_commas = true;
    opt.allow_invalid_utf8 = true;

    value jv = parse( "[1,2,3,] // comment, extra comma ", storage_ptr(), 
opt );

That's fine and is indeed the C++11 standard-compatible way to do things. 
However, when designated initializers are available, such as on g++ or 
clang++, or on VS2019 with /std:c++latest, the above can be written as

    value jv = parse( "[1,2,3,] // comment, extra comma ", {},
        { .allow_comments = true, .allow_invalid_commas = true,
          .allow_invalid_utf8 = true } );

and it might be useful to show this style too.

In Value Conversion, Converting to Foreign Types, it's shown that defining a 
constructor

    customer::customer( value const& jv );

enables value_to<customer>(jv) to work. This is, in my opinion, a legacy 
mechanism made obsolete by tag_invoke and should not be supported. This 
example should be removed from the documentation, and only the tag_invoke 
way should be featured.

And since we're on this page, it's odd that the StringLike requirement wants 
construction from (char*, std::size_t) - pointer to non-const characters. I 
don't think that any string type has such a constructor, they all take char 
const*.

With the documentation carefully read (as in, quickly scanned), I proceeded 
to try to use the library. A previous reviewer (Reiner Deyke) had brought up 
the subject of CBOR parsing and serialization, and I had looked into CBOR as 
a format. It seemed regular and easy to parse and serialize, so I decided to 
write a CBOR serializer and parser for boost::json::value. How hard can that 
be, really?

Not that hard. The serializer was quickly up and running, and it seemed 
quite performant:

Serializing canada.json to CBOR: 1056200 bytes, 4840 us
Serializing canada.json to JSON: 2306988 bytes, 13036 us

(You can find the code at 
https://github.com/pdimov/boost_json_review/blob/master/boost_json_cbor.cpp, 
the serializer is the first 140 or so lines.)

As far as using the library went, it was completely painless and 
straightforward.

Next up, the parser. I decided to write it using the json::value interface, 
like a reference implementation. This, too, went well, and the parser was 
soon parsing, and even producing the same value as the one serialized. The 
performance, though...

Parsing canada.json from JSON: 10548 us
Parsing canada.json from CBOR: 33941 us, successful roundtrip

This wasn't good. Switching to a binary format such as CBOR is supposed to 
be faster, not 3.4 times slower!

Before you say "well your parser is written like you were a college student 
who was taught Java", be aware that I actually refactored the parser a 
number of times; sometimes it was written in one style, sometimes in 
another, functions went from being hand-inlined to extracted and back a few 
times, but none of that made a dent on those 34ms.

To find out where the time was going, I looked into way more detail I needed 
to. For instance, I saw that assigning f.ex. 1.0 to a default-constructed 
json::value executes much more code than it needs to, and makes non-inlined 
calls into libboost_json. One would expect this to take one check and branch 
(do we need to deallocate anything? no), and two assignments (kind = 
kind_double; dbl_ = 1.0).

Turns out that the efficient way to assign 1.0 is not `v = 1.0;`, but 
`v.emplace_double() = 1.0;`. I of course replaced the assignments with this 
supposedly efficient form, and it of course made no difference at all.

I then in desperation actually removed all the assignments. And it barely 
made any difference.

At this point I decided that the json::value interface is not the right fit 
for a performant parser, and switched to using boost::json::value_stack. 
This is what Boost.JSON's parser itself uses, so if that's not fast, well, I 
don't know what will be.

Updating the parser to use value_stack was basically trivial:

https://github.com/pdimov/boost_json_review/commit/7aa8d2dc8c706c3f4204d6fe7...

and it worked. The performance was what I expected:

Parsing canada.json from JSON: 10044 us
Parsing canada.json from CBOR: 6332 us, successful roundtrip

(You can find the value_stack parser at 
https://github.com/pdimov/boost_json_review/blob/master/boost_json_cbor_st.c... 
and the results at 
https://github.com/pdimov/boost_json_review/blob/master/boost_json_cbor_st.m....)

(Since now changes to the parser code actually affected performance, I added 
two optimizations to the array parsing - a fast path for arrays containing 
only doubles, and a fast path for arrays containing only integers.)

Is value hopeless for parsing, then? I went back to the other parser.

This is how the part that parses objects looks like, with the irrelevant 
parts omitted:

boost::json::object & o = v.emplace_object();
o.reserve( n );

for( std::size_t i = 0; i < n; ++i )
{
    // key string
    boost::json::string_view sv( ... );

    // value
    boost::json::value w( v.storage() );
    first = parse_cbor_value( first, last, w );

    o.insert( boost::json::key_value_pair( sv, std::move( w ) ) );
}

The reason I use o.insert here is that I wanted the CBOR parser to match the 
behavior of the JSON parser with respect to key order and duplicate keys. 
The JSON parser preserves key order, which is both user-friendly and better 
from a security perspective (observing key order may leak the seed of the 
hash function.)

It also handles duplicates in an unspecified manner. Well, whatever that 
unspecified manner was, I wanted to do the same, and using `insert` looked 
like the way to do it.

There however was no `insert` taking a string_view and a value, so I had to 
create a key_value_pair. There was `insert_or_assign` that did take a 
string_view and a value, but it overwrites duplicates, and I didn't want 
that.

I consulted Vinnie Falco about the lack of `insert(string_view, value)`, and 
he pointed me at `emplace`. This sounded exactly like what my hypothetical 
`insert` would do. However, hearing `emplace` gave me an idea. In the same 
way `v.emplace_object()` gives me `object&`, `o.emplace(sv)` could have 
given me `value&`, which I could then pass directly to `parse_cbor_value`:

    first = parse_cbor_value( first, last, o.emplace(sv) );

I shared this brilliant observation with Vinnie, and he replied "obj[sv]".

obj[sv] indeed. Sometimes we forget that C++98 did have a useful feature or 
two.

This doesn't exactly work as I wanted it to, because it neither guarantees 
insertion order, nor has the "right" duplicate handling. Nevertheless, I did 
make the change:

https://github.com/pdimov/boost_json_review/commit/e6baf8f1ce81966534ac93c50...

and guess what?

Parsing canada.json from JSON: 10261 us
Parsing canada.json from CBOR: 6761 us, successful roundtrip

So the problem the entire time was that one single line:

    o.insert( boost::json::key_value_pair( sv, std::move( w ) ) );

I haven't looked into it, but I suspect that it for some reason makes a copy 
of `w`, even though it's passed using `std::move` and uses the right 
`storage_ptr`. Oh well. At least it works now.

So, there is nothing that much wrong with the `value` interface, and it's 
possible to build values the straightforward way, instead of using a 
value_stack, without significant loss in performance.

These CBOR parsers are entirely functional, and only have the following 
limitations:

* Binary strings are not supported (as they're not representable in 
json::value);
* Infinite sequences are not supported (because I was too lazy);
* 32 bit platforms need to check the string/array/object sizes for possible 
size_t overflow (because I was way too lazy.)

I had also looked at, and tried, the value conversion interface, in the 
course of developing the examples of an unreleased library of mine, 
Describe. The value_from example is

https://pdimov.github.io/describe/doc/html/describe.html#example_to_json

value_to is at

https://pdimov.github.io/describe/doc/html/describe.html#example_from_json

and a simple JSON-RPC example is at

https://pdimov.github.io/describe/doc/html/describe.html#example_json_rpc

These all worked without any trouble, except I had to link with 
Boost.Container for one of them, which didn't seem necessary. They are also 
a good illustration of the possibilities that having a standard json::value 
type presents, in terms of further library development. (One could f.ex. 
imagine a Boost.Script interpreter that uses json::value as its data type, 
and can be used to script C++ code.)

In conclusion, the library works, is well documented, and should be 
ACCEPTED. I have no acceptance conditions. Obviously, the performance 
problem concerning object::insert needs to be addressed, and more generally, 
the value interface needs a bit more performance scrutiny because it's not 
currently exercised by the benchmarks, but I trust that this will be done.

Solid work, thanks to the authors for their submission, and thank you for 
reading this far.

Peter Dimov

Vinnie Falco

tags

participants (2)