[review][json] Boost.JSON review
This is my review of Boost.JSON by Vinnie Falco and Krystian Stasiowski. First, a disclaimer: I have contributed to the library, mostly with design advice, occasionally to the implementation. Second, let's start with the end. I think that Boost.JSON should be ACCEPTED. All of its aspects are of Boost quality, its maintainers are qualified and responsive, the lack of a JSON library in Boost is frankly embarrassing, and the json::value type serves as a standard variant for scripting and transferring language-independent scalar and structured values, which opens many possibilities for further integration with future parts of the Boost ecosystem (I'll show an example of that later on.) To evaluate the library (I was already broadly, but not intimately, familiar with the structure and the interface), I started with the documentation, specifically with the Quick Look and Usage sections. It's very well written, and meets and exceeds the usual Boost level. I only have a few remarks to make. In Using Numbers, the documentation should explain more clearly and prominently that v.as_double() does not succeed when v contains an integer, but throws an exception. This is a common user expectation. Instead of focusing on number_cast, It should advertise value_to<double>(v) as the way to obtain a double from v, converting if necessary; and similarly, value_to<int>(v) as the way to obtain an int, range-checking if necessary. number_cast should be considered legacy at this point, and not featured. I understand that it offers an error_code& overload for those who don't want exceptions, but the throwing version is entirely superseded by value_to. In Parsing, the examples using parse_options look like this: parse_options opt; opt.allow_comments = true; opt.allow_trailing_commas = true; opt.allow_invalid_utf8 = true; value jv = parse( "[1,2,3,] // comment, extra comma ", storage_ptr(), opt ); That's fine and is indeed the C++11 standard-compatible way to do things. However, when designated initializers are available, such as on g++ or clang++, or on VS2019 with /std:c++latest, the above can be written as value jv = parse( "[1,2,3,] // comment, extra comma ", {}, { .allow_comments = true, .allow_invalid_commas = true, .allow_invalid_utf8 = true } ); and it might be useful to show this style too. In Value Conversion, Converting to Foreign Types, it's shown that defining a constructor customer::customer( value const& jv ); enables value_to<customer>(jv) to work. This is, in my opinion, a legacy mechanism made obsolete by tag_invoke and should not be supported. This example should be removed from the documentation, and only the tag_invoke way should be featured. And since we're on this page, it's odd that the StringLike requirement wants construction from (char*, std::size_t) - pointer to non-const characters. I don't think that any string type has such a constructor, they all take char const*. With the documentation carefully read (as in, quickly scanned), I proceeded to try to use the library. A previous reviewer (Reiner Deyke) had brought up the subject of CBOR parsing and serialization, and I had looked into CBOR as a format. It seemed regular and easy to parse and serialize, so I decided to write a CBOR serializer and parser for boost::json::value. How hard can that be, really? Not that hard. The serializer was quickly up and running, and it seemed quite performant: Serializing canada.json to CBOR: 1056200 bytes, 4840 us Serializing canada.json to JSON: 2306988 bytes, 13036 us (You can find the code at https://github.com/pdimov/boost_json_review/blob/master/boost_json_cbor.cpp, the serializer is the first 140 or so lines.) As far as using the library went, it was completely painless and straightforward. Next up, the parser. I decided to write it using the json::value interface, like a reference implementation. This, too, went well, and the parser was soon parsing, and even producing the same value as the one serialized. The performance, though... Parsing canada.json from JSON: 10548 us Parsing canada.json from CBOR: 33941 us, successful roundtrip This wasn't good. Switching to a binary format such as CBOR is supposed to be faster, not 3.4 times slower! Before you say "well your parser is written like you were a college student who was taught Java", be aware that I actually refactored the parser a number of times; sometimes it was written in one style, sometimes in another, functions went from being hand-inlined to extracted and back a few times, but none of that made a dent on those 34ms. To find out where the time was going, I looked into way more detail I needed to. For instance, I saw that assigning f.ex. 1.0 to a default-constructed json::value executes much more code than it needs to, and makes non-inlined calls into libboost_json. One would expect this to take one check and branch (do we need to deallocate anything? no), and two assignments (kind = kind_double; dbl_ = 1.0). Turns out that the efficient way to assign 1.0 is not `v = 1.0;`, but `v.emplace_double() = 1.0;`. I of course replaced the assignments with this supposedly efficient form, and it of course made no difference at all. I then in desperation actually removed all the assignments. And it barely made any difference. At this point I decided that the json::value interface is not the right fit for a performant parser, and switched to using boost::json::value_stack. This is what Boost.JSON's parser itself uses, so if that's not fast, well, I don't know what will be. Updating the parser to use value_stack was basically trivial: https://github.com/pdimov/boost_json_review/commit/7aa8d2dc8c706c3f4204d6fe7... and it worked. The performance was what I expected: Parsing canada.json from JSON: 10044 us Parsing canada.json from CBOR: 6332 us, successful roundtrip (You can find the value_stack parser at https://github.com/pdimov/boost_json_review/blob/master/boost_json_cbor_st.c... and the results at https://github.com/pdimov/boost_json_review/blob/master/boost_json_cbor_st.m....) (Since now changes to the parser code actually affected performance, I added two optimizations to the array parsing - a fast path for arrays containing only doubles, and a fast path for arrays containing only integers.) Is value hopeless for parsing, then? I went back to the other parser. This is how the part that parses objects looks like, with the irrelevant parts omitted: boost::json::object & o = v.emplace_object(); o.reserve( n ); for( std::size_t i = 0; i < n; ++i ) { // key string boost::json::string_view sv( ... ); // value boost::json::value w( v.storage() ); first = parse_cbor_value( first, last, w ); o.insert( boost::json::key_value_pair( sv, std::move( w ) ) ); } The reason I use o.insert here is that I wanted the CBOR parser to match the behavior of the JSON parser with respect to key order and duplicate keys. The JSON parser preserves key order, which is both user-friendly and better from a security perspective (observing key order may leak the seed of the hash function.) It also handles duplicates in an unspecified manner. Well, whatever that unspecified manner was, I wanted to do the same, and using `insert` looked like the way to do it. There however was no `insert` taking a string_view and a value, so I had to create a key_value_pair. There was `insert_or_assign` that did take a string_view and a value, but it overwrites duplicates, and I didn't want that. I consulted Vinnie Falco about the lack of `insert(string_view, value)`, and he pointed me at `emplace`. This sounded exactly like what my hypothetical `insert` would do. However, hearing `emplace` gave me an idea. In the same way `v.emplace_object()` gives me `object&`, `o.emplace(sv)` could have given me `value&`, which I could then pass directly to `parse_cbor_value`: first = parse_cbor_value( first, last, o.emplace(sv) ); I shared this brilliant observation with Vinnie, and he replied "obj[sv]". obj[sv] indeed. Sometimes we forget that C++98 did have a useful feature or two. This doesn't exactly work as I wanted it to, because it neither guarantees insertion order, nor has the "right" duplicate handling. Nevertheless, I did make the change: https://github.com/pdimov/boost_json_review/commit/e6baf8f1ce81966534ac93c50... and guess what? Parsing canada.json from JSON: 10261 us Parsing canada.json from CBOR: 6761 us, successful roundtrip So the problem the entire time was that one single line: o.insert( boost::json::key_value_pair( sv, std::move( w ) ) ); I haven't looked into it, but I suspect that it for some reason makes a copy of `w`, even though it's passed using `std::move` and uses the right `storage_ptr`. Oh well. At least it works now. So, there is nothing that much wrong with the `value` interface, and it's possible to build values the straightforward way, instead of using a value_stack, without significant loss in performance. These CBOR parsers are entirely functional, and only have the following limitations: * Binary strings are not supported (as they're not representable in json::value); * Infinite sequences are not supported (because I was too lazy); * 32 bit platforms need to check the string/array/object sizes for possible size_t overflow (because I was way too lazy.) I had also looked at, and tried, the value conversion interface, in the course of developing the examples of an unreleased library of mine, Describe. The value_from example is https://pdimov.github.io/describe/doc/html/describe.html#example_to_json value_to is at https://pdimov.github.io/describe/doc/html/describe.html#example_from_json and a simple JSON-RPC example is at https://pdimov.github.io/describe/doc/html/describe.html#example_json_rpc These all worked without any trouble, except I had to link with Boost.Container for one of them, which didn't seem necessary. They are also a good illustration of the possibilities that having a standard json::value type presents, in terms of further library development. (One could f.ex. imagine a Boost.Script interpreter that uses json::value as its data type, and can be used to script C++ code.) In conclusion, the library works, is well documented, and should be ACCEPTED. I have no acceptance conditions. Obviously, the performance problem concerning object::insert needs to be addressed, and more generally, the value interface needs a bit more performance scrutiny because it's not currently exercised by the benchmarks, but I trust that this will be done. Solid work, thanks to the authors for their submission, and thank you for reading this far.
participants (2)
-
Peter Dimov
-
Vinnie Falco