[JSON][Review] Boost Json Review
My review is primarily from the perspective of a library consumer. I use boost libraries and JSON on a daily basis. But neither did I ever write a boost library nor a JSON library. # Boost JSON Review
Please provide in your review information you think is valuable to understand your choice to ACCEPT or REJECT including JSON as a Boost library. Please be explicit about your decision (ACCEPT or REJECT).
I recommend ACCEPTing the library into boost. It's well designed, professionally implemented and fills an important use case. The library is well maintained and I trust the authors to keep doing so. The documentation is good, but I the reference documentation could use some improvements in a few spots. Individual reference sections appear copy-pasted a bit too aggressively, which is clearly good for the doc writer but not so much for the doc reader. I consider none of the issues below show-stoppers that would block acceptance into boost. This is also based in the experience that the authors are responsive in maintaining the library (and other libraries) and I trust them that they will use their best judgement and whether and how to address any issues. Brief side note: I'm a user of boost libraries and I consider boost a very important part of the C++ library ecosystem, in particular, for building reliable and long lasting software components and systems. It sets a high bar for acceptance. When using individual boost libraries, an important consideration is whether the library has aged well with the evolution of the C++ standard, whether it is currently well maintained and whether it will remain well maintained in the future. Here boost sometimes appears to be a bit of a mixed bag. So I consider the maintenance promise to be quite important and in particular more important than any of the issues raise below. I want to thank all the people involved in boost for keeping it alive, useful and high quality.
Some other questions you might want to consider answering:
- What is your evaluation of the design?
I evaluated primarily the DOM interface, that is the value container, but didn't look into the parser and serializer in detail. The design is solid and clear. Smaller issues raised on slack during the review have been addressed. I particularly like that it provides an easy to use, automatic ownership allocator interface. It relieves me from the pain of manually maintaining the life-time of non-global memory pools etc. Minor comments: - The `value::is_xxx` functions inconsistently return bools or pointers. This can have unexpected consequences if they are use from a context that does not implicitly convert to bool. Also I really can't remember which one returns what. For example, `is_null`, `is_bool`, `is_number`, `is_string`, `is_structured` returns a really nice mix. (Fixed on develop) - The library provides `initializer_list` construction for `value`. I personally would prefer it wouldn't do that. Too many pitfalls with a missing pair of braces causing the meaning to change quite dramatically. Still I think many expect them in modern JSON libs. So it's up to the authors. - The library is inconsistent about `shrink_to_fit`. `object does not have it, `array` has it. Also the documentation does not state what `shrink_to_fit` does, basically repeating the errors of `std::vector`. I firmly believe that the function should either be removed or clear and precise semantics should be documented. In it's current state the following question comes to mind: Why would I ever want to call it?
- What is your evaluation of the implementation?
I looked at the implementation of the `value` container. I did not look into the parser and serializer. The implementation appears clean and professional. The authors responded quickly to issues raised while studying the source code.
- What is your evaluation of the documentation?
The documentation is generally good and well structured. However, the reference is occasionally pretty terse. Some parts of it are written with the assumption that any user going to the reference section has recently read the entire documentation including examples upfront. Some comments (mostly against master and develop from early Sep 20): * `value::storage()` should say: Return a reference to the `storage_ptr` to the `memory_resource` associated with the value. The same thing applies to a few other classes. * The reference for the `initializer_list` constructor could use some improvements: e.g. "Construct an object or array from an `initializer_list`". I find this hugely important since just the mere presence of an `initializer_list` constructor changes the meaning of brace initialization. * Additionally, I doesn't mention how nested `initializer_list` how are treated. Most importantly it should show or link to some examples on how the `initializer_list` are converted to objects, arrays, and non structured values because some developers (if not most) might not immediately realize from just looking at the reference documentation how nested `initializer_list` are parsed. Maybe also reference to the `initializer_list` constructors of `array` and `object`. * `object::insert` is not well-documented. I did not understand what it does when I insert an already existing key, in particular, whether assignment takes place or not. The same is true for `object::emplace`. * The `object` initializer constructor is listed under "Copy constructor". * The `object::erase` return value documentation is wrong. * `object::reserve` says "This inserts an element into the container." * `object::swap(object)` and the friend version `swap(object,object)` have rather inconsistent briefs. * The initializer_list constructor for array is listed under move constructors. * The pilfer constructors are not quite explained. There's a reference to the paper, but I think the library should have a page that concisely explains what it does and what a user of the library can expect from pilfered objects. In particular, how the thing differs from move and why a user would ever want to use it instead of move. Also the reference pages for pilfer and so on don't explain what it does. * `array::reserve` shows `BOOST_FORCEINLINE` in the docs. * `value_to` is unclear about the precedence of the three listed options (constructor, library provided conversion or `tag_invoke`) if multiple are present provided this is actually possible. The same is true for `value_from`. * `storage_ptr` thread safety is not documented. That's a real bummer for a `shared_ptr`-like construct. * I found it rather difficult to figure out the difference in the behavior of `parser::write` and `parser::write_some` just from reading the documentation. Either reference documentation should clearly state what the difference is. I had to open both pages side by side to actually spot the wording difference. For example, I think that the documentation of `write_some` should clearly state that it's intended use case is for parsing multiple JSON values from a byte stream presented through one or more buffers whereas `write` is intended for parsing one JSON value from one or more buffers. At least that's what I concluded from the documentation after some time. * The `parser` documentation should either link to or inline provide one or more examples how the parser is supposed to be used. In particular `write_some` would benefit hugely from an example that shows how to properly extract a stream of JSON values from a sequence of buffers. * If I read the documentation correctly, `basic_parser::write` behave more like `parser::write_some` than `parser::write`. I would prefer the call it `basic_parser::write_some`. * The `basic_parser` reference documentation would benefit from either a link to an example or an inline example.
- What is your evaluation of the potential usefulness of the library?
I think the library supports a very narrow yet widespread use-case. It offers a modern interface and high performance. So I will definitely use and I believe it will also be useful to a wide audience.
- Did you try to use the library? With which compiler(s)? Did you have any problems?
I played with it in its STANDALONE mode with gcc 9.3.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Roughly 5 hours playing with small examples, but mostly looking into the documentation and into the code of `json::value` and related parts of the DOM implementation.
- Are you knowledgeable about the problem domain?
I use JSON on a regular basis across different languages including c++ (mostly nlohmann's up to date). I've not implemented a JSON library myself. Kind regards, Max
On Sun, Sep 20, 2020 at 4:07 PM Maximilian Riemensberger via Boost
I recommend ACCEPTing the library into boost.
Thank you for your thoughtful review and the time you spent evaluating the library.
- The `value::is_xxx` functions inconsistently return bools or pointers.
Yes this was a design mistake, which we have reverted in develop. `value::if_*`, `array::if_*` and `object::if_*` all consistently return pointers, while `value::is_*` returns a bool. Your feedback helped us understand the flaw :) Thanks
On Sep 20, 2020, at 6:52 PM, Maximilian Riemensberger via Boost
wrote: - The library is inconsistent about `shrink_to_fit`. `object does not have it, `array` has it. Also the documentation does not state what `shrink_to_fit` does, basically repeating the errors of `std::vector`. I firmly believe that the function should either be removed or clear and precise semantics should be documented. In it's current state the following question comes to mind: Why would I ever want to call it?
Personally I prefer it keep it. In my day job I deal with very large json structures (sometimes over 1GB), and they get copied around. Boost.JSON is already more efficient than what we currently use, but I’m greedy. :) As an example, Boost.JSON's serializer allocates a lot more memory in the returned string than is actually used, in my testing. Using std::string::shrink_to_fit() it dropped about 50% of the memory usage. If std::string didn’t have such a function, I wouldn’t get that benefit. But I agree it should be clear what it does in the docs. -hadriel
On Sun, Sep 20, 2020 at 5:57 PM Hadriel Kaplan via Boost
As an example, Boost.JSON's serializer allocates a lot more memory in the returned string than is actually used, in my testing.
Actually `json::serializer` doesn't allocate anything, it is the conversion to string algorithm that grows the string. And it uses the native growth policy using the statement `s.resize( s.capacity() + 1 );` here: https://github.com/CPPAlliance/json/blob/6ddddfb16f9b54407d153abdda11a29f746... The algorithm first tries to fit the output into a local 4kb stack buffer. For serialized JSONS smaller than this size it should result in a std::string that allocates only what is needed. I am guessing that your serialized JSONs are larger (or else the resulting string would be perfectly sized). The library implementation tries to provide good general purpose performance but there will always be use-cases that it handles sub-optimally. If you have additional information about your typical JSONs, you might see a benefit to writing your own string conversion algorithm by making a copy of the `serialize_impl` function and adjusting its heuristics to better match your data. This is precisely why `json::serializer` is exposed as a public interface. Thanks
On Sep 20, 2020, at 9:19 PM, Vinnie Falco
wrote: The algorithm first tries to fit the output into a local 4kb stack buffer. For serialized JSONS smaller than this size it should result in a std::string that allocates only what is needed. I am guessing that your serialized JSONs are larger (or else the resulting string would be perfectly sized).
Yeah it was several hundred MB in the test I was trying.
The library implementation tries to provide good general purpose performance but there will always be use-cases that it handles sub-optimally. If you have additional information about your typical JSONs, you might see a benefit to writing your own string conversion algorithm by making a copy of the `serialize_impl` function and adjusting its heuristics to better match your data. This is precisely why `json::serializer` is exposed as a public interface.
Nice. Thanks! -hadriel
participants (3)
-
Hadriel Kaplan
-
Maximilian Riemensberger
-
Vinnie Falco