Re: [Boost-users] spirit and xml

13 Feb 2007


Boris Kolpackov wrote:
...
Hi Abir,
abir basak <abirbasak@gmail.com> writes:
...
Now I am looking to use Spirit to parse a specific XML file (a W3C
InkML file). So my intention is not to write a generic XML parser, but
rather a specific one (the format also has some BNF grammar). Has anyone
used Spirit for domain-specific XML parsing?
Trust me, you don't want to go this route. Parsing XML is a lot more
than finding opening and closing tags. To implement a conforming XML
parser you will need to handle namespaces, entity references, CDATA,
etc. This is a lot harder to get right than most people think.
Yes, I know the full XML grammar is really hard to implement. I had a
tough time implementing it in ANTLR :(
Here my intention is not to use the full XML grammar but to take a
subset of it and test how it performs, especially since I know all the
tags and which attributes each one can contain. So generalized validation
is not needed either, as the grammar itself will validate the file.
Moreover, the file is not purely XML; it also contains content described
by a BNF grammar (like SVG, or the example I gave).
I will certainly use a full-fledged XML parser if the situation demands
it. But for now I want to experiment with this particular subset of XML
(a W3C format known as InkML, or even a subset of InkML).
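To make "a grammar for a known subset" concrete, here is a minimal sketch of a recursive-descent recognizer written in plain standard C++ rather than Spirit, so it builds without Boost. The subset is hypothetical: an `<ink>` root holding only attribute-free `<trace>` elements with text content. It deliberately ignores the XML declaration, comments, CDATA, entity references, attributes and namespaces, which is exactly why this kind of parser is only safe when you control every instance.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical subset grammar (NOT full XML and NOT the InkML schema):
//   document := ws "<ink>" ws { trace ws } "</ink>" ws
//   trace    := "<trace>" text "</trace>"
//   text     := any characters except '<' and '&'
class SubsetParser {
public:
    explicit SubsetParser(std::string_view in) : in_(in) {}

    // Returns true and appends each trace's text to `traces`
    // only if the whole input matches the grammar above.
    bool parse(std::vector<std::string>& traces) {
        skip_ws();
        if (!literal("<ink>")) return false;
        skip_ws();
        while (peek_literal("<trace>")) {
            std::string text;
            if (!trace(text)) return false;
            traces.push_back(std::move(text));
            skip_ws();
        }
        if (!literal("</ink>")) return false;
        skip_ws();
        return pos_ == in_.size();  // trailing content is an error
    }

private:
    bool trace(std::string& out) {
        if (!literal("<trace>")) return false;
        const std::size_t start = pos_;
        while (pos_ < in_.size() && in_[pos_] != '<' && in_[pos_] != '&') ++pos_;
        out.assign(in_.data() + start, pos_ - start);
        return literal("</trace>");  // anything else inside <trace> is rejected
    }

    bool peek_literal(std::string_view s) const {
        return in_.substr(pos_, s.size()) == s;
    }

    bool literal(std::string_view s) {
        if (!peek_literal(s)) return false;
        pos_ += s.size();
        return true;
    }

    void skip_ws() {
        while (pos_ < in_.size() &&
               (in_[pos_] == ' ' || in_[pos_] == '\t' ||
                in_[pos_] == '\n' || in_[pos_] == '\r'))
            ++pos_;
    }

    std::string_view in_;
    std::size_t pos_ = 0;
};
```

Any input outside the subset (for example `<trace id="t1">`) makes `parse` return false, so the grammar doubles as the validator.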
...
The only time it makes sense to have a domain specific XML parser is
when you have control over all your XML instances and can make sure
that only a subset of XML 1.0 is used. This is normally done for
performance reasons.
Yes, the grammar of the file format is specific; just as with XHTML or
MathML, it doesn't need to match every possible node.
...
...
I believe using Spirit will make it faster.
Highly unlikely since most of the XML parsers are hand-coded.
Not sure why! My domain-specific XML parsers in ANTLR (the widely used
language recognition tool) have always been faster than generic ones.
...
...
Also, I am interested in parsing only a portion of the document at a
time and generating data from that portion only, rather than generating
data for the whole DOM (the files are large, typically 4-20 MB).
My XML file is something like:
[...]
Note that inside <trace> the content follows a BNF grammar (mostly
comma-separated pairs of floats).
You can use a SAX2 parser (e.g., Expat or Xerces-C++) to handle XML and
then use Spirit-based parser to handle the data.
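For the data inside `<trace>`, a Spirit Qi grammar would be roughly a list rule like `(double_ >> double_) % ','` run through `phrase_parse` with a space skipper. The sketch below hand-codes the same shape with the standard library only, assuming (hypothetically) that points are separated by commas and the two coordinates of a point by whitespace, e.g. `10 0, 9 14, 8 28`. The names `parse_trace` and `Point` are illustrative, not from any library.

```cpp
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

struct Point { double x, y; };

// Assumed trace grammar (hypothetical):
//   trace := ws [ point ws { ',' ws point ws } ]
//   point := number ws number
// Returns std::nullopt on any syntax error instead of guessing.
// Note: std::strtod also accepts forms such as "inf", "nan" and hex floats.
std::optional<std::vector<Point>> parse_trace(const std::string& text) {
    const char* p = text.c_str();
    const char* end = p + text.size();

    auto skip_ws = [&] {
        while (p < end && std::isspace(static_cast<unsigned char>(*p))) ++p;
    };
    auto number = [&](double& out) {
        char* stop = nullptr;
        out = std::strtod(p, &stop);  // strtod skips leading whitespace itself
        if (stop == p) return false;  // nothing was consumed: not a number
        p = stop;
        return true;
    };

    std::vector<Point> points;
    skip_ws();
    if (p == end) return points;  // an empty trace is accepted by this sketch
    for (;;) {
        Point pt{};
        if (!number(pt.x)) return std::nullopt;
        skip_ws();
        if (!number(pt.y)) return std::nullopt;
        points.push_back(pt);
        skip_ws();
        if (p == end) return points;     // clean end after a point
        if (*p != ',') return std::nullopt;
        ++p;                             // consume the separator
        skip_ws();                       // a trailing ',' fails on the next number
    }
}
```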
At present I am using the Qt SAX parser, which is a good one. The Spirit
idea is specific to this particular task.
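One way to process only part of a large document at a time is to let a SAX parser drive a small handler that buffers text only while inside `<trace>` and hands each finished trace off immediately, so no DOM is ever built. This is a hypothetical sketch: `TraceCollector` and its simplified `startElement`/`characters`/`endElement` signatures are not the Qt, Expat or Xerces-C++ APIs, and a real program would forward those parsers' callbacks to it. It also assumes `<trace>` elements are not nested.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical SAX-style handler. Method names mirror the usual SAX2
// callbacks, but the signatures are simplified and do not match any real
// parser's API; an adapter would forward real events to these methods.
class TraceCollector {
public:
    using Sink = std::function<void(const std::string&)>;

    explicit TraceCollector(Sink sink) : sink_(std::move(sink)) {}

    void startElement(const std::string& name) {
        if (name == "trace") {
            inTrace_ = true;
            buffer_.clear();
        }
    }

    // SAX parsers may split one text node into several chunks, so append.
    void characters(const std::string& chunk) {
        if (inTrace_) buffer_ += chunk;
    }

    void endElement(const std::string& name) {
        if (name == "trace" && inTrace_) {
            inTrace_ = false;
            sink_(buffer_);  // hand off one trace; only this buffer is retained
        }
    }

private:
    Sink sink_;
    bool inTrace_ = false;
    std::string buffer_;
};
```

Each string passed to the sink could then be handed to a Spirit-based or hand-written parser for the point data and discarded before the next trace arrives, which keeps memory bounded by the largest single trace rather than the whole file.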
...
hth,
-boris
And thanks for the suggestions...


-- 
Abir Basak, Member IEEE
Software Engineer, Read Ink Technologies
B. Tech, IIT Kharagpur
email: abir@abirbasak.com
homepage: www.abirbasak.com