Re: [boost] Interest in a Boost.JSON library?

22 Dec 2008

      OvermindDL1 wrote:
...
On Sun, Dec 21, 2008 at 3:49 AM, Kirit Sælensminde
<kirit.saelensminde@gmail.com> wrote:
Interesting timing.  If you have been watching the spirit lists, a few
days ago I created a JSON parser in Spirit2x to both test for the vast
speed enhancements of Spirit2x and to donate as an example.  Mine is
templated on the string type and allows you to override the character
class parsing with any Spirit-supported type (Unicode is not included
yet, though we have been talking about adding it later; wide_chars
are currently supported).
I'm not even sure that I knew there was a Spirit list. Do you have a 
link to yours?

One complication with JSON is that although the transmission can be 
UTF-8, UTF-16 or UTF-32, the Unicode escaping in the strings is always 
UTF-16, which is why I build into a UTF-16 buffer and then convert it 
from there. My string class uses UTF-16 on Windows and UTF-8 on Linux.

So far nearly all of the JSON that I've seen in the wild uses ASCII,
with everything above 0x7f escaped using the \uXXXX notation. This is
what my unparser does too.
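
To make the escaping rule concrete, here is a minimal sketch of an
unparser step that emits everything above 0x7f as \uXXXX, splitting
code points above U+FFFF into a UTF-16 surrogate pair. The function
names are hypothetical (they are not taken from either library in this
thread), and the input is assumed to be a sequence of valid Unicode
code points.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Append one code point as a JSON \uXXXX escape. Code points above
// U+FFFF become a UTF-16 surrogate pair, because JSON escapes are
// always expressed in UTF-16 code units.
// Hypothetical helper for illustration only.
inline void append_json_escape(std::string& out, std::uint32_t cp) {
    char buf[13];  // room for "\uXXXX\uXXXX" plus the terminator
    if (cp >= 0x10000) {
        std::uint32_t v = cp - 0x10000;
        unsigned hi = 0xD800 + static_cast<unsigned>(v >> 10);
        unsigned lo = 0xDC00 + static_cast<unsigned>(v & 0x3FF);
        std::snprintf(buf, sizeof buf, "\\u%04X\\u%04X", hi, lo);
    } else {
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
    }
    out += buf;
}

// Serialise a string of code points as an ASCII-only JSON string.
// Assumes every element of `text` is a valid Unicode code point.
inline std::string to_json_string(const std::u32string& text) {
    std::string out = "\"";
    for (char32_t c : text) {
        std::uint32_t cp = static_cast<std::uint32_t>(c);
        if (cp == '"' || cp == '\\') {
            out += '\\';                    // quote and backslash get a backslash
            out += static_cast<char>(cp);
        } else if (cp < 0x20 || cp > 0x7F) {
            append_json_escape(out, cp);    // control characters and non-ASCII
        } else {
            out += static_cast<char>(cp);   // plain ASCII passes through
        }
    }
    out += '"';
    return out;
}
```

For example, U+1D11E (the G clef used as the example in RFC 4627)
comes out as the pair \uD834\uDD1E.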
...
One question, you have int64_t as a supported type, but from my
research the Number type in the current JSON spec is a floating-point
type with a 52-bit mantissa and 11-bit exponent, double in other words.
I don't think that the JSON specification determines what the 
representation should be for any of the stored values, for example, 
there is no limit on the number of digits that make up the integer part 
of a number. It is true, though, that JavaScript has only a
floating-point number type, with all integer operations being emulated.

Here is the number definition from RFC 4627:

          number = [ minus ] int [ frac ] [ exp ]
          decimal-point = %x2E       ; .
          digit1-9 = %x31-39         ; 1-9
          e = %x65 / %x45            ; e E
          exp = e [ minus / plus ] 1*DIGIT
          frac = decimal-point 1*DIGIT
          int = zero / ( digit1-9 *DIGIT )
          minus = %x2D               ; -
          plus = %x2B                ; +
          zero = %x30                ; 0
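
As a rough illustration of how little this grammar constrains the
stored representation, here is a hand-written recogniser for the
production above. It only checks the syntax and places no limit on the
number of digits. The function name is hypothetical, and this is a
sketch rather than the code from either parser discussed here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if the whole of `s` matches the RFC 4627 `number`
// production. Hypothetical helper for illustration only.
inline bool is_json_number(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();

    // 1*DIGIT: consume one or more digits, report whether any were found.
    auto digits = [&]() {
        std::size_t start = i;
        while (i < n && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
        return i > start;
    };

    if (i < n && s[i] == '-') ++i;            // [ minus ]

    if (i < n && s[i] == '0') {
        ++i;                                  // int = zero
    } else if (!digits()) {                   // int = digit1-9 *DIGIT
        return false;                         // (first digit cannot be 0 here)
    }

    if (i < n && s[i] == '.') {               // [ frac ]
        ++i;
        if (!digits()) return false;
    }

    if (i < n && (s[i] == 'e' || s[i] == 'E')) {   // [ exp ]
        ++i;
        if (i < n && (s[i] == '+' || s[i] == '-')) ++i;
        if (!digits()) return false;
    }

    return i == n;                            // nothing may follow the number
}
```

Note that a 23-digit integer is accepted just as readily as "0"; what
type it ends up in is entirely the parser's decision.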
...
Mine is basic, a single
header file, and it stuffs it all into a Value type, which is a
Boost.Variant of null_type, false_type, true_type (empty structs I
made; true and false are specified as types in the JSON standard, not
as bools), double, StringType (whatever the templated String type
is), Object (a boost::unordered_map, since the JSON standard states
that it is an unordered map), and Array (which is just a
std::vector).  None of the types are used in any special way, so just
changing their declarations should keep compatibility with the rest of
the code.  My code just returns the Value directly, with no fancy
wrapper for pulling things out, but I left that open: it would just
require a one-line change of code to wrap it.  I figured I might just
do open functions instead, allowing for a class-style wrapper later;
that way it could be exported in a C style as well.
It seems to me that the parser is always going to be quite closely 
coupled with its output type, although some sort of skeleton parser 
could be envisaged that would be able to talk to an API for building the 
internal JSON representation.

This is one of the reasons I posit mine as a JSON library with the 
parser as just one part of it.

I suppose it ought to be possible though to decompose the parser enough 
that the components could be used to write to a number of different 
internal representations.
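
A minimal sketch of that kind of decomposition, under heavy
simplifying assumptions: the toy parser below understands only nested
arrays of unsigned integers with no whitespace, and reports what it
sees to a builder object instead of constructing a value itself. All
names (parse_value, SumBuilder, DepthBuilder and the three callbacks)
are hypothetical and chosen for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy event-driven parser. It never builds a value; it only calls
// begin_array(), end_array() and number() on the builder it is given.
// Supports nested arrays of unsigned integers only, no whitespace.
template <typename Builder>
bool parse_value(const std::string& s, std::size_t& i, Builder& b) {
    if (i < s.size() && s[i] == '[') {
        ++i;
        b.begin_array();
        if (i < s.size() && s[i] == ']') { ++i; b.end_array(); return true; }
        for (;;) {
            if (!parse_value(s, i, b)) return false;
            if (i < s.size() && s[i] == ',') { ++i; continue; }
            if (i < s.size() && s[i] == ']') { ++i; b.end_array(); return true; }
            return false;   // missing comma or closing bracket
        }
    }
    std::size_t start = i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
    if (i == start) return false;
    // The raw text is passed on, so the builder chooses the number type.
    b.number(s.substr(start, i - start));
    return true;
}

// One possible builder: adds up every number it is told about.
struct SumBuilder {
    long long total = 0;
    void begin_array() {}
    void end_array() {}
    void number(const std::string& text) { total += std::stoll(text); }
};

// Another builder for the same parser: records the deepest nesting.
struct DepthBuilder {
    int depth = 0;
    int max_depth = 0;
    void begin_array() { if (++depth > max_depth) max_depth = depth; }
    void end_array() { --depth; }
    void number(const std::string&) {}
};
```

Handing the builder the number as text also sidesteps the int64_t
versus double question above, since each representation can convert
it however it likes.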
...
I did not make mine to be a 'real' library though, as stated, just
example code, but I did try to make it as accurate to the spec as I
saw it.
As soon as karma is finished for Spirit2x I was planning to make a
writer for my Value object as well, both as a condensed (efficient)
printer and a pretty printer.
I have a pretty printer, which I'm not unhappy with, but I also think
it would be better in some ways to separate the pretty-printing
strategy more cleanly from the structure walking.
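
One way that separation might look, as a sketch only: a single walker
that knows the structure, parameterised on a small style object that
knows only about layout. The Node type here is a deliberately tiny
stand-in (arrays and pre-serialised scalars), not the representation
used by either library, and all names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Tiny stand-in for a JSON value: an array of nodes, or scalar text
// that is already in its serialised form.
struct Node {
    bool is_array = false;
    std::string scalar;
    std::vector<Node> items;
};

inline Node make_scalar(std::string text) {
    Node n; n.scalar = std::move(text); return n;
}
inline Node make_array(std::vector<Node> items) {
    Node n; n.is_array = true; n.items = std::move(items); return n;
}

// Structure walking: the only code that knows how a Node is laid out.
// Every layout decision is delegated to `style`.
template <typename Style>
void write_json(const Node& n, const Style& style, std::string& out,
                int depth = 0) {
    if (!n.is_array) { out += n.scalar; return; }
    out += '[';
    for (std::size_t k = 0; k < n.items.size(); ++k) {
        if (k) out += ',';
        style.before_item(out, depth + 1);
        write_json(n.items[k], style, out, depth + 1);
    }
    if (!n.items.empty()) style.before_close(out, depth);
    out += ']';
}

// Condensed output: no extra characters at all.
struct Compact {
    void before_item(std::string&, int) const {}
    void before_close(std::string&, int) const {}
};

// Pretty output: each item on its own line, two spaces per level.
struct Indented {
    void before_item(std::string& out, int depth) const {
        out += '\n';
        out.append(static_cast<std::size_t>(2 * depth), ' ');
    }
    void before_close(std::string& out, int depth) const {
        before_item(out, depth);
    }
};
```

Adding another layout then means writing another small style struct,
without touching the walker.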
...
As for comments, in my version it would be simple to change, the
whitespace skipping parser could easily be extended to catch other
things, such as comments, which would always be saved out.  As stated,
I was just making it as an example of the magic of Spirit2x.
I'm not sure that I can see where the comments would be stored in my 
structure so that it made any sense. To have the parser skip them as 
whitespace is certainly doable. For that I guess the JavaScript grammar 
is the place to look for a specification.
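
For what it's worth, skipping them as whitespace could look roughly
like the sketch below. It assumes the two JavaScript comment forms
(// to end of line and /* ... */) and treats an unterminated block
comment as an error. The function name is hypothetical, and nothing is
stored: the comments are simply discarded.

```cpp
#include <cstddef>
#include <string>

// Advance `i` past JSON whitespace and JavaScript-style comments.
// Returns false only when a /* comment is never closed.
// Hypothetical helper for illustration only.
inline bool skip_ws_and_comments(const std::string& s, std::size_t& i) {
    const std::size_t n = s.size();
    while (i < n) {
        const char c = s[i];
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            ++i;                                    // ordinary JSON whitespace
        } else if (c == '/' && i + 1 < n && s[i + 1] == '/') {
            while (i < n && s[i] != '\n') ++i;      // line comment: to end of line
        } else if (c == '/' && i + 1 < n && s[i + 1] == '*') {
            const std::size_t end = s.find("*/", i + 2);
            if (end == std::string::npos) return false;
            i = end + 2;                            // step over the closing */
        } else {
            break;                                  // start of a real token
        }
    }
    return true;
}
```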

K