[JSON][review] Design: what is expected of a "JSON library"?
Hi Everyone,

This is not a complete Boost Review yet. In this email I wanted to discuss the design goals of the library. The point that I would like to raise is the tension between having relatively small types (int64_t, uint64_t, double) represent numbers in json::value on the one hand, and the ECMA specification allowing arbitrarily big/precise numbers in the JSON format on the other.

Can I expect of a JSON library that when it converts JSON content into its internal representation and then back to JSON content, I get the same content (modulo white space)? This would be possible if the internal representation of JSON numbers were an arbitrary-precision decimal type or a string. But when we need to squeeze any number into 64 bits, we soon get to the point where the integer `100000000000000000000001` comes back from the round trip as `1E23`. Is this acceptable for a JSON library?

But maybe that is not a valid goal? Maybe the goal of the JSON format is to have objects already created in internal representation converted to text and then back to objects (assuming the recipient program runs in the same environment as the sender, with no differences in word size or maximum int representation)? That is, as long as you agree to the constraints of json::value, whatever you manage to put inside, we guarantee that you get the same value when you serialize it and then parse it back. Boost.JSON does this nicely.

In that other view, if I have objects of type `boost::multiprecision::cpp_int`, my only option is to pass them as strings in a JSON protocol. But I can pass any number as a string anyway, so what is the use of numbers in the JSON format? Unless it is just practical: you can choose to use numbers and then internal representations of JSON may be smaller.

Do you see the concern I am describing? I mean, I have used JSON libraries before, and this has never been a practical problem. But the Boost Review puts the bar high for libraries, so I guess this question should be answered: what guarantees can a JSON library give us with respect to the accuracy of numbers?

Regards,
&rzej;
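To make the concern concrete, here is a minimal sketch using Boost.JSON's parse() and serialize() free functions (the exact text the serializer produces for the double is my assumption; the loss of the trailing digits is the point):

#include <boost/json.hpp>
#include <iostream>

int main()
{
    // 24 decimal digits: too big for int64_t/uint64_t, so the parser
    // has to fall back to double and the low digits are lost.
    boost::json::value jv = boost::json::parse("100000000000000000000001");

    // Prints a double rendering such as "1e+23", not the original digits.
    std::cout << boost::json::serialize(jv) << '\n';
}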
On Tue, Sep 15, 2020 at 8:47 AM Andrzej Krzemienski via Boost wrote:
...the Boost Review puts the bar high for libraries, so I guess this question should be answered: what guarantees can a JSON library give us with respect to the accuracy of numbers?
This is a great question. My philosophy is that there can be no single JSON library which satisfies all use-cases, because there are requirements which oppose each other. Some people for example want to parse comments and be able to serialize the comments back out. This would be a significant challenge to implement in the json::value container without impacting performance. It needs to be said up-front, that there are use-cases for which Boost.JSON will be ill-suited, and that's OK. The library targets a specific segment of use-cases and tries to excel for those cases. In particular, Boost.JSON is designed to be a competitor to JSON for Modern C++ ("nlohmann's json") and RapidJSON. Both of these libraries are wildly popular.

Support for extended or arbitrary precision numbers is something that we can consider. It could be added as a new "kind", with a custom data type. By necessity this would require dynamic allocation to store the mantissa and exponent, which is fine. However note that the resulting serialized JSON from these arbitrary precision numbers is likely to be rejected by many implementations. In particular, Javascript in the browser and Node.js in the server would reject such numbers.

As a goal of the library is suitability as a vocabulary type, homogeneity of interface (same integer and floating point representation on all platforms) is prioritized over min/maxing (using the largest representation possible). The cousin to homogeneity is compatibility - we would like to say that ANY serialized output of the library will be recognizable by most JSON implementations in the wild. If we support arbitrary precision numbers, some fraction of outputs will no longer be recognized. Here we have the aforementioned tension between features and usability. Increasing one decreases the other.

Regards
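To illustrate, one purely hypothetical shape for such a "kind" (none of these names exist in Boost.JSON; the idea is that numbers which fit none of the three built-in types keep their decimal text in dynamically allocated storage):

#include <cstdint>
#include <string>
#include <variant>

// Hypothetical: a fourth numeric alternative preserving the original
// decimal text of numbers that fit none of the built-in types.
struct big_number
{
    std::string digits; // e.g. "100000000000000000000001"
};

using number = std::variant<std::int64_t, std::uint64_t, double, big_number>;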
On Tue, Sep 15, 2020 at 6:09 PM Vinnie Falco wrote:
On Tue, Sep 15, 2020 at 8:47 AM Andrzej Krzemienski via Boost wrote:
...the Boost Review puts the bar high for libraries, so I guess this question should be answered: what guarantees can a JSON library give us with respect to the accuracy of numbers?
This is a great question. My philosophy is that there can be no single JSON library which satisfies all use-cases, because there are requirements which oppose each other. Some people for example want to parse comments and be able to serialize the comments back out. This would be a significant challenge to implement in the json::value container without impacting performance. It needs to be said up-front, that there are use-cases for which Boost.JSON will be ill-suited, and that's OK. The library targets a specific segment of use-cases and tries to excel for those cases. In particular, Boost.JSON is designed to be a competitor to JSON for Modern C++ ("nlohmann's json") and RapidJSON. Both of these libraries are wildly popular.
Support for extended or arbitrary precision numbers is something that we can consider. It could be added as a new "kind", with a custom data type. By necessity this would require dynamic allocation to store the mantissa and exponent, which is fine. However note that the resulting serialized JSON from these arbitrary precision numbers is likely to be rejected by many implementations. In particular, Javascript in the browser and Node.js in the server would reject such numbers.
As a goal of the library is suitability as a vocabulary type, homogeneity of interface (same integer and floating point representation on all platforms) is prioritized over min/maxing (using the largest representation possible). The cousin to homogeneity is compatibility - we would like to say that ANY serialized output of the library will be recognizable by most JSON implementations in the wild. If we support arbitrary precision numbers, some fraction of outputs will no longer be recognized. Here we have the aforementioned tension between features and usability. Increasing one decreases the other.
So, I can see the design goals and where they come from. For the record, I am not requesting support for arbitrary-precision numbers. This is just my way of trying to determine the scope of this library. I would appreciate it if you said something similar in the docs in some "design decisions" section. To me, the sentence "This library provides containers and algorithms which implement JSON" followed by a reference to Standard ECMA-262 https://www.ecma-international.org/ecma-262/10.0/index.html somehow implied that you are able to parse just *any* JSON input.

That high-level contract -- as I understand it -- is:

1. Any json::value that you can build can be serialized and then deserialized, and you are guaranteed that the resulting json::value will be equal to the original.
2. JSON inputs where number values cannot be represented losslessly in uint64_t, int64_t and double may render different values when parsed and then serialized back, and for extremely big number values can even fail to parse.
3. Whatever JSON output you can produce with this library, we guarantee it can be parsed by any common JSON implementation (probably also based on a uint64_t+int64_t+double representation).

Regards,
&rzej;
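A minimal sketch of guarantee 1 in action, using the public object/parse/serialize API (the values here are arbitrary):

#include <boost/json.hpp>
#include <cassert>
#include <cstdint>

int main()
{
    boost::json::object obj;
    obj["pi"]  = 3.141592653589793;
    obj["big"] = std::int64_t(-9007199254740993); // one beyond double's 2^53

    boost::json::value jv = obj;

    // Guarantee 1: serialize-then-parse yields an equal json::value.
    assert(jv == boost::json::parse(boost::json::serialize(jv)));
}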
On Tue, Sep 15, 2020 at 4:00 PM Andrzej Krzemienski
This is just my way of trying to determine the scope of this library. I would appreciate it if you said something similar in the docs in some "design decisions" section.
That high-level contract -- as I understand it -- is:

1. Any json::value that you can build can be serialized and then deserialized, and you are guaranteed that the resulting json::value will be equal to the original.
2. JSON inputs where number values cannot be represented losslessly in uint64_t, int64_t and double may render different values when parsed and then serialized back, and for extremely big number values can even fail to parse.
3. Whatever JSON output you can produce with this library, we guarantee it can be parsed by any common JSON implementation (probably also based on a uint64_t+int64_t+double representation).
Yes, this sounds about right, and I agree the docs should make this clear. Thanks
Do you see the concern I am describing? I mean, I have used JSON libraries before, and this has never been a practical problem. But the Boost Review puts the bar high for libraries, so I guess this question should be answered: what guarantees can a JSON library give us with respect to the accuracy of numbers?
Regards, &rzej;
I would have liked to give users the freedom to choose the accuracy through templates. Something like:

// i want speed
boost::json::parser< int > p_int;

// i want accuracy
boost::json::parser< boost::multiprecision::cpp_int > p_infinity;

Eduardo
On Wed, Sep 16, 2020 at 7:27 AM Eduardo Quintana via Boost
I would have liked to give users the freedom to choose the accuracy through templates.
The library avoids templates, especially for the use-case you envision. But that said, there are plans to improve the floating point conversion algorithms using modern techniques. What is there now is fast and reasonably accurate already, so in a sense you are already getting both of the options you want. Thanks
On Wed, Sep 16, 2020 at 11:27 AM Eduardo Quintana via Boost wrote:
I would have liked to give users the freedom to choose the accuracy through templates. Something like:
// i want speed
boost::json::parser< int > p_int;

// i want accuracy
boost::json::parser< boost::multiprecision::cpp_int > p_infinity;
By this example, can we assume you only care about this flexibility at the parsing level and not necessarily in the DOM-like object (json::value)?

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
// i want speed
boost::json::parser< int > p_int;

// i want accuracy
boost::json::parser< boost::multiprecision::cpp_int > p_infinity;

By this example, can we assume you only care about this flexibility at the parsing level and not necessarily in the DOM-like object (json::value)?
That was just for illustration purposes. I see that there are several objects involved.

My point is: the concept of JSON is independent of the underlying types used for representing strings and numbers. That is, in my opinion, a strong argument for using templates, and a solution to the "precision vs speed" conundrum.

Of course a json::parser<T> has to produce a json::value<T>, and yet a json::value<T> can exist independently of a json::parser<T> or json::serializer<T>. It is up to the developers to choose how the abstraction is implemented.

Anyway, that was just a comment and a suggestion. If I were writing a JSON library I would have used templates.
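A rough sketch of the shape being suggested here (entirely hypothetical, not Boost.JSON's design; cpp_int appears only because it was the example above, and note that it covers integers but not fractional values):

#include <boost/multiprecision/cpp_int.hpp>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical: the numeric representation is a template parameter,
// so one container serves both the fast and the exact use-case.
template <class Number>
struct basic_value
{
    // std::vector supports incomplete element types since C++17,
    // which makes this recursive definition legal.
    using array  = std::vector<basic_value>;
    using object = std::vector<std::pair<std::string, basic_value>>;

    std::variant<std::nullptr_t, bool, Number, std::string, array, object> v;
};

using fast_value  = basic_value<double>;                         // i want speed
using exact_value = basic_value<boost::multiprecision::cpp_int>; // i want accuracy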
On Fri, 18 Sep 2020 at 09:40, Eduardo Quintana via Boost <boost@lists.boost.org> wrote:
// i want speed
boost::json::parser< int > p_int;

// i want accuracy
boost::json::parser< boost::multiprecision::cpp_int > p_infinity;

By this example, can we assume you only care about this flexibility at the parsing level and not necessarily in the DOM-like object (json::value)?
That was just for illustration purposes. I see that there are several objects involved.

My point is: the concept of JSON is independent of the underlying types used for representing strings and numbers. That is, in my opinion, a strong argument for using templates, and a solution to the "precision vs speed" conundrum.

Of course a json::parser<T> has to produce a json::value<T>, and yet a json::value<T> can exist independently of a json::parser<T> or json::serializer<T>. It is up to the developers to choose how the abstraction is implemented.

Anyway, that was just a comment and a suggestion. If I were writing a JSON library I would have used templates.
Vinnie might have covered this already but here's my 2c.

JSON is in the main used as a language and platform agnostic data interchange format. The *de-facto* reality is that every other platform and language implementation that I can think of limits real numbers to doubles and integers to the range +-2^63, so it seems to me that going beyond that will achieve little in terms of general utility. This is the situation today.

I understand the argument that this library could be an opportunity to push the boundaries and lead other languages to better practice. On the other hand, this is not the stated intent of the library - its intent (in summary) is to meet the needs of the majority of C++ programmers who wish to interoperate with the world wide web today.

Along with others here I am sure, I have significant experience in dealing with financial and cryptocurrency data interchange using JSON. The *reality* is that one learns *not to rely on the various interpretations of number values at all*. If you're sending an important value (such as a price) you very quickly learn to encode it as a JSON string representing the exact value and precision you want. In reality, JSON would be a perfectly useful data format if it did not have numeric, boolean or null types at all.

For this reason, it seems to me that arguing over the precision of the API representation of these types is not likely to lead to increased utility in practice. Although I appreciate that some niche use cases would benefit from the ability, it is my impression that these will be rare and in any case can be worked around with strings.

R
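A minimal sketch of the price-as-string idiom described above, using Boost.JSON's accessors (the field names and values are made up):

#include <boost/json.hpp>
#include <string>

int main()
{
    // The exact decimal text survives every implementation unchanged,
    // because it is never converted to a binary floating-point type.
    boost::json::value jv = boost::json::parse(
        R"({ "symbol": "BTC/USDT", "price": "10441.2800000001" })");

    // The receiver decides how (and whether) to interpret the digits.
    std::string price = jv.at("price").as_string().c_str();
}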
On Fri, 18 Sep 2020 at 09:18, Richard Hodges via Boost
JSON is in the main used as a language and platform agnostic data interchange format. The *de-facto* reality is that every other platform and language implementation that I can think of limits real numbers to doubles and integers to the range +-2^63, so it seems to me that going beyond that will achieve little in terms of general utility.
This is the situation today.
That is just not true. There are many implementations of JSON under the sun, some quite low-key and homemade (it's pretty trivial after all), and quite a lot do it correctly. Also many people doing it wrong is not an argument.
I understand the argument that this library could be an opportunity to push the boundaries and lead other languages to better practice.
On the other hand, this is not the stated intent of the library - its intent (in summary) is to meet the needs of the majority of C++ programmers who wish to interoperate with the world wide web today.
Along with others here I am sure, I have significant experience in dealing with financial and cryptocurrency data interchange using JSON. The *reality* is that one learns *not to rely on the various interpretations of number values at all*. If you're sending an important value (such as a price) you very quickly learn to encode it as a JSON string representing the exact value and precision you want.
Well I've been working in high-frequency trading for years, and I have the opposite experience. Representation of numbers is serious business, be they fixed- or floating-precision, binary or decimal. If you can't even get that right and have to use strings, you're in for a lot of problems.
On Fri, 18 Sep 2020 at 11:35, Mathias Gaunard
On Fri, 18 Sep 2020 at 09:18, Richard Hodges via Boost wrote:
JSON is in the main used as a language and platform agnostic data interchange format. The *de-facto* reality is that every other platform and language implementation that I can think of limits real numbers to doubles and integers to the range +-2^63, so it seems to me that going beyond that will achieve little in terms of general utility.
This is the situation today.
That is just not true. There are many implementations of JSON under the sun, some quite low-key and homemade (it's pretty trivial after all), and quite a lot do it correctly.
Also many people doing it wrong is not an argument.
I think it's fair to say that Boost.JSON's audience might be, as you put it so succinctly, "many people doing it wrong". In other words, the target users would be people looking for easy interoperation with internet-facing services and the like, rather than specialist applications.
I understand the argument that this library could be an opportunity to push the boundaries and lead other languages to better practice.

On the other hand, this is not the stated intent of the library - its intent (in summary) is to meet the needs of the majority of C++ programmers who wish to interoperate with the world wide web today.
Along with others here I am sure, I have significant experience in dealing with financial and cryptocurrency data interchange using JSON. The *reality* is that one learns *not to rely on the various interpretations of number values at all*. If you're sending an important value (such as a price) you very quickly learn to encode it as a JSON string representing the exact value and precision you want.
Well I've been working in high-frequency trading for years, and I have the opposite experience. Representation of numbers is serious business, be they fixed- or floating-precision, binary or decimal. If you can't even get that right and have to use strings, you're in for a lot of problems.
I think in my mind, HFT would come under the heading of "specialist application", whereas say, distributing tick data to the millions of young cryptocurrency enthusiasts connected by app, browser, python bot, etc. (or consuming said data) would come under the heading of "general use".

In the general case, representation of real numbers is by no means standardised and cannot be relied upon. In this case, strings are often chosen to represent numbers because then the actual precision of the price of, say, BTC/USDT is not in doubt. I completely understand that this is suboptimal for HFT, but then in fairness, so is the choice of JSON as an encoding format. For example, for internal communication I might prefer FlatBuffers or even unadulterated binary if performance were the major consideration.

Nevertheless, I take your point that there are some applications for which Boost.JSON would not be suitable. I don't think the authors would argue with that. What I do think is that for me, Boost is the go-to repository of functionality that is missing from the standard library. And with that in mind, Boost.JSON fills a gaping hole in the Boost suite of libraries in that it gives its consumers a tool that developers using other languages have taken for granted for years. In that sense, for me, it would be a major step forward.
On Tue, 15 Sep 2020 at 16:46, Andrzej Krzemienski via Boost
The point that I would like to raise is the tension between having relatively small types (int64_t, uint64_t, double) represent numbers in json::value on the one hand, and the ECMA specification allowing arbitrarily big/precise numbers in the JSON format on the other.
As far as I'm concerned any JSON library that doesn't address this is defective. It's not rocket science: just allow access to the actual string representation, don't try to convert it to a bunch of random types. This library doesn't even provide it in its low-level API and forces undesired conversion to int64/double onto users. Parsing text in general should only be in charge of segmenting the text, not synthesizing attributes from those segments.
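Something like the following hypothetical callback interface would satisfy that requirement (illustrative only, not an existing Boost.JSON API): the parser segments the input and hands the untouched number token to the application.

#include <string_view>

// Hypothetical: a parse-event handler that receives number tokens
// as raw text instead of values pre-converted to int64/double.
struct handler
{
    // `text` is the exact character sequence from the JSON input,
    // e.g. "100000000000000000000001"; the application chooses
    // whether to convert it, and to which type.
    void on_number(std::string_view text);
    void on_string(std::string_view text);
    // ... similar events for objects, arrays, bools, nulls ...
};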
In that other view, if I have objects of type `boost::multiprecision::cpp_int`, my only option is to pass them as strings in a JSON protocol. But I can pass any number as a string anyway, so what is the use of numbers in the JSON format? Unless it is just practical: you can choose to use numbers and then internal representations of JSON may be smaller.
I personally do encode my decimal numbers (which have values that are not representable as double) as numbers in JSON.
participants (6):
- Andrzej Krzemienski
- Eduardo Quintana
- Mathias Gaunard
- Richard Hodges
- Vinnie Falco
- Vinícius dos Santos Oliveira