Re: [Boost-users] Fast XML Parser

14 Dec 2008

...
Thanks for responding. I've never used XML before and have been itching to 
learn XML lately.
http://www.w3.org/TR/REC-xml/#NT-prolog

If speed is what you're really after, you might try writing your own code
generator, even from something as simple as the spec document.
It turns out you can grep and sed the grammar productions out of it quite well
and get a decent skeleton.
There are of course plenty of code generators, and I'm hoping someone with
experience will comment.
I ended up with code suited to my immediate needs, with each state having its
own method. I had to fill in most of the bodies by hand,
but the code was pretty simple for what I needed.
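The post did this with grep and sed. As a hedged sketch of the same idea in C++ (std::regex swapped in for grep/sed), the function below turns one production line, assumed to be copied from the spec as plain text in the form `[22] prolog ::= ...`, into a stub in the post's style. `make_stub` is a hypothetical name; `parse_api_type`, `STATESIG` and `ds` are the post's own placeholders and are only emitted as text.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: turn one production line from a plain-text copy of the
// spec, e.g. "[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?", into a stub
// in the style of the post. Returns "" for lines that are not productions.
// parse_api_type, STATESIG and ds are emitted as text only (the post's placeholders).
std::string make_stub(const std::string& line) {
    // Groups: 1 = production number, 2 = production name, 3 = right-hand side.
    static const std::regex production(R"(^\s*\[(\d+)\]\s+(\w+)\s*::=\s*(.*)$)");
    std::smatch m;
    if (!std::regex_match(line, m, production)) return "";
    std::ostringstream out;
    out << "//[" << m[1] << "] " << m[2] << " ::= " << m[3] << "\n"
        << "parse_api_type state_" << m[2] << "(STATESIG)\n"
        << "{\n"
        << "  ds->enter(" << m[1] << ");\n"
        << "  // TODO: fill in the body by hand\n"
        << "  ds->exit(" << m[1] << ");\n"
        << "  return false;\n"
        << "}\n";
    return out.str();
}
```

Feed each line of the spec text to `make_stub` and print whatever comes back non-empty. Productions with a letter suffix such as `[4a]` do not match `\d+` and are skipped, so they would still need a hand-written stub.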

I ended up with a lot of code like the following, which presumably would inline
fairly well. I created maps for the character classes etc., but you get the
idea.

//[20]  CData   ::= (Char* - (Char* ']]>' Char*))
parse_api_type state_CData(STATESIG);

//[22]  prolog  ::= XMLDecl? Misc* (doctypedecl Misc*)?
parse_api_type state_prolog(STATESIG)
{
  ds->enter(22);
  state_XMLDecl(ps, ds);            // XMLDecl? : optional, so the result is ignored
  while (state_Misc(ps, ds));       // Misc*
  if (state_doctypedecl(ps, ds))    // (doctypedecl Misc*)? : at most one doctypedecl
    while (state_Misc(ps, ds));
  ds->exit(22);
  return true;                      // every part is optional, so prolog always matches
}
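To make the one-method-per-production pattern concrete, here is a small, self-contained C++17 sketch that actually runs. It is not the post's code: `ParseState`, `Trace`, `Scope` and `skip_construct` are hypothetical stand-ins for the post's `ps`, `ds` and helpers. The character-class map covers only `S`, PI is left out of `Misc`, and `XMLDecl`/`doctypedecl` are skipped rather than validated (a `<?xml-stylesheet` PI would be misread as an XMLDecl, and a DOCTYPE with an internal subset is not handled).

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the post's "ds": +N on entering production [N], -N on leaving.
struct Trace {
    std::vector<int> log;
    void enter(int n) { log.push_back(n); }
    void exit(int n) { log.push_back(-n); }
};

// RAII guard so exit() runs on every return path, including early returns.
struct Scope {
    Trace& ds;
    int n;
    Scope(Trace& d, int production) : ds(d), n(production) { ds.enter(n); }
    ~Scope() { ds.exit(n); }
};

// Hypothetical stand-in for the post's "ps": the input plus the current offset.
struct ParseState {
    std::string_view in;
    std::size_t pos = 0;
    bool starts_with(std::string_view s) const { return in.compare(pos, s.size(), s) == 0; }
};

// Character-class map: one table lookup per byte instead of a chain of comparisons.
constexpr std::array<bool, 256> make_space_table() {
    std::array<bool, 256> t{};
    t[0x20] = t[0x09] = t[0x0D] = t[0x0A] = true;
    return t;
}
constexpr std::array<bool, 256> kIsSpace = make_space_table();

// Simplified helper: if the input starts with `open`, skip past the next `close`.
bool skip_construct(ParseState& ps, std::string_view open, std::string_view close) {
    if (!ps.starts_with(open)) return false;
    std::size_t end = ps.in.find(close, ps.pos + open.size());
    if (end == std::string_view::npos) return false;
    ps.pos = end + close.size();
    return true;
}

// [3] S ::= (#x20 | #x9 | #xD | #xA)+
bool state_S(ParseState& ps, Trace& ds) {
    Scope scope(ds, 3);
    std::size_t start = ps.pos;
    while (ps.pos < ps.in.size() && kIsSpace[static_cast<unsigned char>(ps.in[ps.pos])]) ++ps.pos;
    return ps.pos > start;
}

// [15] Comment, simplified: "<!--", then no "--" until the closing "-->".
bool state_Comment(ParseState& ps, Trace& ds) {
    Scope scope(ds, 15);
    if (!ps.starts_with("<!--")) return false;
    std::size_t end = ps.in.find("--", ps.pos + 4);
    if (end == std::string_view::npos || ps.in.compare(end, 3, "-->") != 0) return false;
    ps.pos = end + 3;
    return true;
}

// [27] Misc ::= Comment | PI | S   (PI omitted in this sketch)
bool state_Misc(ParseState& ps, Trace& ds) {
    Scope scope(ds, 27);
    return state_Comment(ps, ds) || state_S(ps, ds);
}

// [23] XMLDecl and [28] doctypedecl, both only skipped (not validated) here.
bool state_XMLDecl(ParseState& ps, Trace& ds) {
    Scope scope(ds, 23);
    return skip_construct(ps, "<?xml", "?>");
}
bool state_doctypedecl(ParseState& ps, Trace& ds) {
    Scope scope(ds, 28);
    return skip_construct(ps, "<!DOCTYPE", ">");
}

// [22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
bool state_prolog(ParseState& ps, Trace& ds) {
    Scope scope(ds, 22);
    state_XMLDecl(ps, ds);              // XMLDecl? : optional
    while (state_Misc(ps, ds)) {}       // Misc* : each success consumes input, so this ends
    if (state_doctypedecl(ps, ds))      // (doctypedecl Misc*)? : at most once
        while (state_Misc(ps, ds)) {}
    return true;                        // every part is optional
}
```

After `state_prolog` returns, `ps.pos` points at the first character the prolog did not consume (for a typical document, the root element's `<`). The RAII guard replaces the explicit `enter`/`exit` pair so that `exit` still runs on an early `return false`.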

On the few test cases I ran, mostly from here,

http://www.sec.gov/Archives/edgar/xbrl.html

it seemed to perform quite well for what I was after. 

Of course, there are plenty of SOAP- or RSS-type examples of
what you can do with XML, but I would
point to some others that may be of more immediate interest.
Since I wasn't doing much over Thanksgiving, I thought I would send
these folks a few comments in favor of computers:

http://www.ots.treas.gov/?p=OpenComment&Topic_id=c0316a9e-1e0b-8562-ebd0-1ae5298909e2

http://www.federalreserve.gov/generalinfo/FOIA/index.cfm?doc_id=OP-1338&doc_ver=1&ShowAll=Yes

(essentially the same tirade at both locations).

I summarized some existing computer facilities (NCBI has some XML options,
and the FDA AERS is, IIRC, SGML) and made some suggestions for new XML databases.
And of course their comment window is still open if you have an agenda to promote too. LOL.

Mike Marchywka
...
To: boost-users@lists.boost.org
From: jeff_j_dunlap@yahoo.com
Date: Sun, 14 Dec 2008 15:20:50 -0600
Subject: Re: [Boost-users] Fast XML Parser
"Alan M. Carroll"  wrote in message 
news:7.0.0.16.2.20081214143626.00ef62c0@network-geographics.com...
...
Let me start by saying that I am very happy with rapidXML. In fact, we 
have converted most of our XML parsing from various other libraries to 
rapidXML and have committed to a complete conversion over time (i.e., 
using rapidXML as our only XML parsing library, including replacing 
Expat). We use XML almost exclusively as a serialization format and 
rapidXML is excellent for that use case.
*However*, I would not recommend rapidXML if you are going to do 
non-trivial editing of in-place DOM trees. It is not, IMHO, well suited 
for that. If you're going to do a lot of editing, parsing speed shouldn't 
be your primary concern. You will want a much richer API as you go on and 
rapidXML just doesn't provide that. You could build one on top of 
rapidXML, but why bother when there are things just as good already out
there?
That said, I have some wrapper code that makes rapidXML even nicer, if
you're interested, but it doesn't perform any edit, delete, or add
operations, since my code base doesn't need any of those.
Hi Alan,
Thanks for responding. I've never used XML before and have been itching to 
learn XML lately.