There are two kinds of incremental parsers: push parsers (SAX) and pull parsers (approximately StAX). Briefly put, a push parser traverses the input automatically and generates an event for each token it finds, whereas a pull parser is advanced manually, like an iterator, and the current token can be queried.
My library is something of a push-pull hybrid. You can ask the parser to parse a single event (an event being the smallest unit the input format can be parsed into), and the parser then pushes the result to the output handler as one or more writes. The trouble is that where the parser stops is format-dependent, which for now limits the pull side to "event-loop" style parsing.
Pull parsers have some significant advantages over push parsers:
* It is straightforward to implement a push parser on top of a pull parser. This involves a loop and a switch statement (see [1] for a complete example). Going in the other direction involves the use of coroutines, most likely stateful coroutines.
Most of these features are not currently available in cppdatalib because individual tokens are not accessible the way they are in a pull parser. If I refactored a few things, I might be able to get a full pull parser framework.
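To make the loop-and-switch idea concrete, here is a minimal sketch of a push interface layered on a pull parser. Everything in it is hypothetical and for illustration only: the toy format (unsigned integers inside square brackets, with commas treated as whitespace) and the names pull_parser, token, recording_handler and push_parse belong to neither cppdatalib nor trial.protocol.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token kinds for a toy format: integers inside [ ... ].
enum class token { begin_array, end_array, integer, end_of_input, error };

// Toy pull parser: the caller advances it one token at a time.
class pull_parser {
public:
    explicit pull_parser(std::string input) : input_(std::move(input)) {}

    // Advance to the next token and return its kind.
    token next() {
        // Simplification: spaces and commas are both skipped as separators.
        while (pos_ < input_.size() &&
               (input_[pos_] == ' ' || input_[pos_] == ','))
            ++pos_;
        if (pos_ == input_.size()) return token::end_of_input;
        char c = input_[pos_];
        if (c == '[') { ++pos_; return token::begin_array; }
        if (c == ']') { ++pos_; return token::end_array; }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            value_ = 0;
            while (pos_ < input_.size() &&
                   std::isdigit(static_cast<unsigned char>(input_[pos_])))
                value_ = value_ * 10 + (input_[pos_++] - '0');
            return token::integer;
        }
        return token::error;
    }

    // Value of the most recent integer token.
    long value() const { return value_; }

private:
    std::string input_;
    std::size_t pos_ = 0;
    long value_ = 0;
};

// Example event handler: records each event it receives as text.
struct recording_handler {
    std::string log;
    void on_begin_array() { log += "["; }
    void on_end_array() { log += "]"; }
    void on_integer(long v) { log += "i" + std::to_string(v); }
};

// The push parser: a loop and a switch over the pull parser's tokens.
// Returns true at end of input, false on the first unrecognized character.
template <typename Handler>
bool push_parse(pull_parser& parser, Handler& handler) {
    for (;;) {
        switch (parser.next()) {
        case token::begin_array:  handler.on_begin_array(); break;
        case token::end_array:    handler.on_end_array(); break;
        case token::integer:      handler.on_integer(parser.value()); break;
        case token::end_of_input: return true;
        case token::error:        return false;
        }
    }
}
```

Feeding "[1, 22, [3]]" through push_parse with a recording_handler delivers the events in input order. The adapter itself holds no state beyond the parser it drives, which is why this direction is the easy one.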
* Contextual parsing can be done directly, unlike push parsers where you have to maintain contextual state in the event handler.
Right now, contextual parsing is implemented in a base class of the output handler, so it's still isolated from the end user. Kind of hackish, though, since the parser queries the output handler for the structure of the data it's already read.
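As a rough illustration of what "directly" means here, the sketch below keeps the context (the nesting depth) in a function argument on the call stack rather than in handler state. The toy format (words and parentheses) and the names pull_parser, token and parse_list are made up for this example and do not reflect the cppdatalib or trial.protocol APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

enum class token { open, close, word, end_of_input };

// Toy pull parser over words and parentheses, e.g. "a (b (c) d) e".
class pull_parser {
public:
    explicit pull_parser(std::string input) : input_(std::move(input)) {}

    // Advance to the next token and return its kind.
    token next() {
        while (pos_ < input_.size() && input_[pos_] == ' ') ++pos_;
        if (pos_ == input_.size()) return token::end_of_input;
        char c = input_[pos_];
        if (c == '(') { ++pos_; return token::open; }
        if (c == ')') { ++pos_; return token::close; }
        text_.clear();
        while (pos_ < input_.size() && input_[pos_] != ' ' &&
               input_[pos_] != '(' && input_[pos_] != ')')
            text_ += input_[pos_++];
        return token::word;
    }

    // Text of the most recent word token.
    const std::string& text() const { return text_; }

private:
    std::string input_;
    std::size_t pos_ = 0;
    std::string text_;
};

// Consume tokens until the current list ends. The context (how deeply
// nested we are) is simply the 'depth' argument; no separate state object
// is needed. Returns false if the parentheses are unbalanced.
bool parse_list(pull_parser& p, int depth, std::string& out) {
    for (;;) {
        switch (p.next()) {
        case token::word:
            out += p.text() + "@" + std::to_string(depth) + " ";
            break;
        case token::open:
            if (!parse_list(p, depth + 1, out)) return false;
            break;
        case token::close:
            return depth > 0;   // a ")" at depth 0 has no matching "("
        case token::end_of_input:
            return depth == 0;  // input ended inside an open list otherwise
        }
    }
}
```

A push-style handler doing the same job would need to keep its own depth counter or stack between callbacks, which is the bookkeeping my output handler base class currently does.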
* Pull parsers can be used directly in Boost.Serialization archives.
* Pull parsers are composable. For instance, you could insert a URL pull parser directly into an HTTP pull parser.
Composability is a big issue with push parsers, so removing obstacles to
that would greatly simplify some things. For certain types of information,
though, it doesn't seem like composition is important.
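For what it's worth, here is a small sketch of the kind of composition being described: an outer pull parser for a request line that hands its target over to an inner pull parser for path segments and forwards the inner parser's tokens as its own. The names path_parser and request_line_parser are hypothetical, the input is assumed to be a well-formed "METHOD TARGET VERSION" line, and the outer parser pre-splits the line for brevity, so this is not a real HTTP or URL parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Inner pull parser: yields the segments of a path such as "/a/b".
class path_parser {
public:
    explicit path_parser(std::string path) : path_(std::move(path)) {}

    // Stores the next segment and returns true, or returns false when
    // no segments remain.
    bool next(std::string& segment) {
        while (pos_ < path_.size() && path_[pos_] == '/') ++pos_;
        if (pos_ == path_.size()) return false;
        std::size_t end = path_.find('/', pos_);
        if (end == std::string::npos) end = path_.size();
        segment = path_.substr(pos_, end - pos_);
        pos_ = end;
        return true;
    }

private:
    std::string path_;
    std::size_t pos_ = 0;
};

enum class token { method, path_segment, version, end_of_input };

// Outer pull parser for a line of the form "METHOD TARGET VERSION".
// Assumption: the line contains exactly two single-space separators.
class request_line_parser {
public:
    explicit request_line_parser(const std::string& line) {
        std::size_t first = line.find(' ');
        std::size_t last = line.rfind(' ');
        method_ = line.substr(0, first);
        path_ = path_parser(line.substr(first + 1, last - first - 1));
        version_ = line.substr(last + 1);
    }

    // Advance to the next token; while the target is being read, the
    // inner parser's results are forwarded as path_segment tokens.
    token next() {
        switch (state_) {
        case 0:
            text_ = method_;
            state_ = 1;
            return token::method;
        case 1:
            if (path_.next(text_)) return token::path_segment;
            text_ = version_;
            state_ = 2;
            return token::version;
        default:
            return token::end_of_input;
        }
    }

    // Text of the most recent token.
    const std::string& text() const { return text_; }

private:
    std::string method_;
    std::string version_;
    std::string text_;
    path_parser path_{std::string()};
    int state_ = 0;
};
```

The caller sees a single token stream and never needs to know that a second parser is involved. Doing the same with two push parsers would mean routing the inner parser's callbacks through the outer handler.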
On Jan 13, 2018 5:05 AM, "Bjorn Reese via Boost"
community. It is basically an event-driven parsing/serialization library for common formats using a standard internal representation or simple pass-through conversions. Would anyone be interested in something like this being added to Boost?
For a pull parser framework see: https://github.com/breese/trial.protocol
The documentation is a bit old though.
[1] http://breese.github.io/trial/protocol/trial_protocol/json/tutorial/push_parser.html