
OvermindDL1 wrote:
On Sun, Dec 21, 2008 at 3:49 AM, Kirit Sælensminde <kirit.saelensminde@gmail.com> wrote:
Interesting timing. If you have been watching the Spirit lists, a few days ago I created a JSON parser in Spirit2x, both to test the vast speed enhancements of Spirit2x and to donate as an example. Mine is templated on the string type and lets you override the character class parsing with any Spirit-supported type. That does not include Unicode yet (we have been talking about adding it later), but it does currently support wide_chars.
I'm not even sure that I knew there was a Spirit list. Do you have a link to yours? One complication with JSON is that although the transmission can be UTF-8, UTF-16 or UTF-32, the Unicode escaping in the strings is always UTF-16, which is why I build into a UTF-16 buffer and then convert it from there. My string class uses UTF-16 on Windows and UTF-8 on Linux. So far nearly all of the JSON that I've seen in the wild uses ASCII, with everything above 0x7f escaped using the \uXXXX notation. This is what my unparser does too.
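As a rough illustration of the two steps described here (this is not code from either poster's library), the sketch below converts a buffer of UTF-16 code units to UTF-8, combining surrogate pairs, and writes every code unit above 0x7f as a \uXXXX escape. The function names utf16_to_utf8 and escape_non_ascii are made up for the example, and the escaper deliberately ignores quotes, backslashes and control characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Convert UTF-16 code units to UTF-8, combining surrogate pairs on the way.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a partner
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Write ASCII through unchanged and every other code unit as \uXXXX.
// Quotes, backslashes and control characters are NOT handled in this sketch.
std::string escape_non_ascii(const std::u16string& in) {
    std::string out;
    char buf[7];  // backslash, 'u', four hex digits, terminator
    for (char16_t unit : in) {
        if (unit <= 0x7F) {
            out += static_cast<char>(unit);
        } else {
            std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(unit));
            out += buf;
        }
    }
    return out;
}
```

Because the escapes are UTF-16 code units, a character outside the Basic Multilingual Plane comes out of the escaper as two \uXXXX sequences, one per surrogate, with no extra work.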
One question: you have int64_t as a supported type, but from my research the Number type in the current JSON spec is a 52/12-bit floating-point type, double in other words.
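To make the int64_t versus double distinction concrete, here is a small check, assuming an IEEE 754 binary64 double (53 bits of significand precision). The helper name fits_in_double is hypothetical and not part of either parser.

```cpp
#include <cstdint>

// True when `n` survives a round trip through double unchanged.
// Assumes double is IEEE 754 binary64.
bool fits_in_double(std::int64_t n) {
    const double d = static_cast<double>(n);
    // Guard the cast back: a value rounded up to 2^63 would overflow int64_t.
    if (d >= 9223372036854775808.0 || d < -9223372036854775808.0) return false;
    return static_cast<std::int64_t>(d) == n;
}
```

Under that assumption, 2^53 round-trips but 2^53 + 1 does not, so a parser that stores every number as a double cannot return all 64-bit integers exactly.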
I don't think that the JSON specification determines what the representation should be for any of the stored values, for example, there is no limit on the number of digits that make up the integer part of a number. It is true that JavaScript has only a floating point type though with all integer operations being emulated. Here is the number definition from RFC4627:
    number = [ minus ] int [ frac ] [ exp ]
    decimal-point = %x2E       ; .
    digit1-9 = %x31-39         ; 1-9
    e = %x65 / %x45            ; e E
    exp = e [ minus / plus ] 1*DIGIT
    frac = decimal-point 1*DIGIT
    int = zero / ( digit1-9 *DIGIT )
    minus = %x2D               ; -
    plus = %x2B                ; +
    zero = %x30                ; 0
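The RFC 4627 number production can be followed almost literally in code. The hand-written recogniser below is only a sketch (the name is_json_number is invented for this example, and it is not how a Spirit grammar would express it); it checks syntax only and says nothing about how the value is stored.

```cpp
#include <cstddef>
#include <string_view>

// Returns true when the whole of `s` matches the RFC 4627 `number` rule.
bool is_json_number(std::string_view s) {
    std::size_t i = 0;
    auto digit = [&](std::size_t k) {
        return k < s.size() && s[k] >= '0' && s[k] <= '9';
    };

    if (i < s.size() && s[i] == '-') ++i;        // [ minus ]
    if (!digit(i)) return false;                 // int needs at least one digit
    if (s[i] == '0') {
        ++i;                                     // zero (no further int digits)
    } else {
        while (digit(i)) ++i;                    // digit1-9 *DIGIT
    }
    if (i < s.size() && s[i] == '.') {           // [ frac ]
        ++i;
        if (!digit(i)) return false;             // 1*DIGIT
        while (digit(i)) ++i;
    }
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {  // [ exp ]
        ++i;
        if (i < s.size() && (s[i] == '-' || s[i] == '+')) ++i;
        if (!digit(i)) return false;             // 1*DIGIT
        while (digit(i)) ++i;
    }
    return i == s.size();
}
```

Note that a thirty-digit integer is accepted, since the grammar puts no bound on the digit count, while a leading zero followed by more digits, a leading plus sign, and a bare decimal point are all rejected.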
Mine is basic, a single header file, and it stuffs everything into a Value type, which is a Boost.Variant of null_type, false_type, true_type (empty structs I made; those are specified as types in the JSON standard, not bools for true/false), double, StringType (whatever the templated String type is), Object (a boost::unordered_map, since the JSON standard states that it is an unordered map), and Array (which is just a std::vector). None of the types is used in any special way, and just changing their declarations should keep compatibility with the rest of the code. My code just returns the Value directly, with no fancy wrapper for pulling things out. I left that open: it would only require a one-line change to wrap it, but I figured I might just provide free functions instead, allowing for a class-style wrapper later, so that it could be exported in a C style as well.
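The shape of such a Value type can be sketched as follows. This is an approximation and not the posted header: it swaps in C++17 std::variant and std::unordered_map for Boost.Variant and boost::unordered_map, fixes the string type to std::string instead of templating on it, and holds object members through std::shared_ptr because std::unordered_map is not guaranteed to accept an incomplete mapped type. The helper kind is invented for the example.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Empty tag types standing in for the JSON literals null, false and true.
struct null_type {};
struct false_type {};
struct true_type {};

struct Value;  // forward declaration: arrays and objects contain Values

// std::vector may be instantiated with an incomplete element type since C++17.
using Array = std::vector<Value>;
// Assumption of this sketch: members are held by pointer to stay portable.
using Object = std::unordered_map<std::string, std::shared_ptr<Value>>;

struct Value {
    std::variant<null_type, false_type, true_type, double, std::string,
                 Array, Object> data;
};

// Free function in the spirit of "open functions" over a bare Value:
// reports which alternative is currently stored.
const char* kind(const Value& v) {
    struct Visitor {
        const char* operator()(const null_type&) const { return "null"; }
        const char* operator()(const false_type&) const { return "false"; }
        const char* operator()(const true_type&) const { return "true"; }
        const char* operator()(double) const { return "number"; }
        const char* operator()(const std::string&) const { return "string"; }
        const char* operator()(const Array&) const { return "array"; }
        const char* operator()(const Object&) const { return "object"; }
    };
    return std::visit(Visitor{}, v.data);
}
```

Adding an int64_t alternative, as discussed earlier in the thread, would be one more entry in the variant's type list plus one more overload in the visitor.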
It seems to me that the parser is always going to be quite closely coupled with its output type, although some sort of skeleton parser could be envisaged that would be able to talk to an API for building the internal JSON representation. This is one of the reasons I position mine as a JSON library with the parser as just one part of it. I suppose it ought to be possible, though, to decompose the parser enough that the components could be used to write to a number of different internal representations.
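One way to picture such a skeleton parser is a template that reports events to a handler and knows nothing about the representation being built. The sketch below is hypothetical throughout (SkeletonParser, EventLog and the callback names are invented here) and covers only arrays, numbers and the three literals; strings and objects are left out to keep it short, and std::strtod is laxer than the JSON number grammar.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Example handler: records events instead of building a tree.
// Any type providing these five members can be plugged into the parser.
struct EventLog {
    std::vector<std::string> events;
    void on_null() { events.push_back("null"); }
    void on_bool(bool b) { events.push_back(b ? "true" : "false"); }
    void on_number(double) { events.push_back("number"); }
    void begin_array() { events.push_back("["); }
    void end_array() { events.push_back("]"); }
};

template <typename Handler>
class SkeletonParser {
public:
    // `text` must outlive the parser; only a reference is stored.
    SkeletonParser(const std::string& text, Handler& h) : s_(text), h_(h) {}

    bool parse() { return value() && (skip_ws(), pos_ == s_.size()); }

private:
    void skip_ws() {
        while (pos_ < s_.size() && std::strchr(" \t\r\n", s_[pos_]) != nullptr &&
               s_[pos_] != '\0')
            ++pos_;
    }
    bool literal(const char* word) {
        const std::size_t n = std::strlen(word);
        if (s_.compare(pos_, n, word) != 0) return false;
        pos_ += n;
        return true;
    }
    bool value() {
        skip_ws();
        if (pos_ >= s_.size()) return false;
        if (literal("null")) { h_.on_null(); return true; }
        if (literal("true")) { h_.on_bool(true); return true; }
        if (literal("false")) { h_.on_bool(false); return true; }
        if (s_[pos_] == '[') return array();
        return number();
    }
    bool array() {
        ++pos_;  // consume '['
        h_.begin_array();
        skip_ws();
        if (pos_ < s_.size() && s_[pos_] == ']') { ++pos_; h_.end_array(); return true; }
        for (;;) {
            if (!value()) return false;
            skip_ws();
            if (pos_ >= s_.size()) return false;
            if (s_[pos_] == ',') { ++pos_; continue; }
            if (s_[pos_] == ']') { ++pos_; h_.end_array(); return true; }
            return false;
        }
    }
    bool number() {
        const char* start = s_.c_str() + pos_;
        char* end = nullptr;
        const double d = std::strtod(start, &end);  // accepts more than JSON allows
        if (end == start) return false;
        pos_ += static_cast<std::size_t>(end - start);
        h_.on_number(d);
        return true;
    }

    const std::string& s_;
    Handler& h_;
    std::size_t pos_ = 0;
};
```

A second handler with the same five members could build a variant-based tree, a DOM in another library, or nothing at all (pure validation), which is the decoupling the paragraph above speculates about.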
I did not make mine to be a 'real' library though; as stated, it is just example code, but I did try to make it as accurate to the spec as I could.
As soon as karma is finished for Spirit2x I was planning to make a writer for my Value object as well, both as a condensed (efficient) printer and a pretty printer.
I have a pretty printer, which I'm not unhappy with, but I also think it would be better in some ways to separate the pretty-printing strategy more cleanly from the structure walking.
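A minimal sketch of that separation, under the assumption that the layout decisions can be reduced to a small policy object: the walker below knows the tree structure, and the Style argument decides line breaks and separators. Node, Compact, Pretty and write are all names invented for this example, and the tree is cut down to leaves and arrays.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal tree: a leaf carries ready-to-print text,
// an array node carries child nodes in `items`.
struct Node {
    std::string leaf;
    std::vector<Node> items;
    bool is_array = false;
};

// Condensed output: no line breaks at all.
struct Compact {
    std::string newline(int) const { return ""; }
    std::string separator() const { return ","; }
};

// Pretty output: one element per line, two spaces per nesting level.
struct Pretty {
    std::string newline(int depth) const {
        return "\n" + std::string(static_cast<std::size_t>(depth) * 2, ' ');
    }
    std::string separator() const { return ","; }
};

// The structure walk: identical for every style.
template <typename Style>
void write(const Node& n, const Style& style, std::string& out, int depth = 0) {
    if (!n.is_array) {
        out += n.leaf;
        return;
    }
    out += '[';
    for (std::size_t i = 0; i < n.items.size(); ++i) {
        if (i != 0) out += style.separator();
        out += style.newline(depth + 1);
        write(n.items[i], style, out, depth + 1);
    }
    if (!n.items.empty()) out += style.newline(depth);
    out += ']';
}
```

With this split, the condensed printer and the pretty printer mentioned above are the same walk with two different style objects.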
As for comments, in my version it would be simple to change: the whitespace-skipping parser could easily be extended to catch other things, such as comments, which would always be saved out. As stated, I was just making it as an example of the magic of Spirit2x.
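The idea of folding comments into the whitespace skipper can be sketched in plain C++ (not as a Spirit2x skip grammar, which is what the poster has in mind). The comment syntax here, JavaScript-style // and /* */, is an assumption, since JSON itself defines no comments, and the function name skip_ws_and_comments is invented. This version discards comments and does not save them.

```cpp
#include <cstddef>
#include <string_view>

// Returns the index of the first character at or after `pos` that is neither
// whitespace nor inside a comment. An unterminated /* comment swallows the
// rest of the input.
std::size_t skip_ws_and_comments(std::string_view s, std::size_t pos) {
    for (;;) {
        while (pos < s.size() &&
               (s[pos] == ' ' || s[pos] == '\t' || s[pos] == '\n' || s[pos] == '\r'))
            ++pos;
        if (s.substr(pos, 2) == "//") {
            // Line comment: stop at the newline, which the next pass treats
            // as ordinary whitespace.
            const std::size_t eol = s.find('\n', pos);
            pos = (eol == std::string_view::npos) ? s.size() : eol;
        } else if (s.substr(pos, 2) == "/*") {
            // Block comment: resume just after the closing delimiter.
            const std::size_t close = s.find("*/", pos + 2);
            pos = (close == std::string_view::npos) ? s.size() : close + 2;
        } else {
            return pos;
        }
    }
}
```

A parser would call this wherever it currently skips whitespace, so the value grammar itself stays untouched.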
I'm not sure that I can see where the comments would be stored in my structure so that it made any sense. To have the parser skip them as whitespace is certainly doable. For that I guess the JavaScript grammar is the place to look for a specification.
K