I've been looking at a bunch of C++ libraries for parsing XML, and RapidXML
looks very interesting to me. Its site shows some very impressive benchmarks,
which it attributes to being an in-situ parser:
http://rapidxml.sourceforge.net/ (way faster than most other XML parsers).
I think this could be very useful for many of us. I know that existing Boost
libs can get this done, but they are probably more complex to use than a
dedicated XML parsing library.
Anyway, I cannot find samples anywhere and I am not experienced enough to
know how to use the lib. This is the working code that I have so far
(important comments included):
#include "../test_utils.hpp"
#include "../../rapidxml.hpp"
#include "../../rapidxml_utils.hpp"
#include "../../rapidxml_print.hpp" // provides operator<< for nodes
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <stdexcept>
#include <cstring>
#include <sstream>
using namespace std;

void test()
{
    // The buffer must be writable: the parser modifies it in place.
    char xml[] = "<?xml version=\"1.0\" encoding=\"latin-1\"?>"
                 "<book>"
                 "</book>";

    // Parse the original document
    rapidxml::xml_document<> doc; // character type defaults to char
    doc.parse<0>(xml);            // 0 means default parse flags

    // doc object is now a root of DOM tree containing representation of the
    // parsed XML.
    // Because all RapidXml interface is contained inside namespace rapidxml,
    // users must either bring contents of this namespace into scope, or fully
    // qualify all the names.
    // Class xml_document represents a root of the DOM hierarchy. By means of
    // public inheritance, it is also an xml_node and a memory_pool.
    // Template parameter of xml_document::parse() function is used to specify
    // parsing flags, with which you can fine-tune behaviour of the parser.
    // Note that flags must be a compile-time constant.

    // To access the DOM tree, use methods of xml_node and xml_attribute
    // classes:
    std::cout << "Name of my first node is: " << doc.first_node()->name()
              << "\n"; // Name of my first node is: book

    // This works but is less descriptive than 'Create proper node type' below
    //rapidxml::xml_node<> *node;

    // Create proper node type
    rapidxml::xml_node<char> *node = 0;
    node = doc.allocate_node(/*rapidxml::node_declaration*/
                             rapidxml::node_element, "author", "John Doe");
    doc.first_node()->append_node(node);
    node = doc.allocate_node(rapidxml::node_element, "author", "Jane Doe");
    doc.first_node()->append_node(node);
    node = doc.allocate_node(rapidxml::node_element, "author", "Bob Doe");
    doc.first_node()->append_node(node);

    // Print the <book> element and its children into a string
    std::stringstream ss;
    ss << *doc.first_node();
    std::string result_xml = ss.str();
    std::cout <
Let me start by saying that I am very happy with rapidXML. In fact, we have converted most of our XML parsing from various other libraries to rapidXML and have committed to a complete conversion over time (i.e., using rapidXML as our only XML parsing library, including replacing Expat). We use XML almost exclusively as a serialization format and rapidXML is excellent for that use case.

*However*, I would not recommend rapidXML if you are going to do non-trivial editing of in-place DOM trees. It is not, IMHO, well suited for that. If you're going to do a lot of editing, parsing speed shouldn't be your primary concern. You will want a much richer API as you go on and rapidXML just doesn't provide that. You could build one on top of rapidXML, but why bother when there's things just as good already out there?

That said, I have some wrapper code that makes rapidXML even nicer, if you're interested, but it doesn't perform any edit, delete, or add operations since my code base does not perform any of those.

At 09:33 PM 12/13/2008, Jeff Dunlap wrote:
Anyway, I cannot find samples anywhere and I am not experienced enough to know how to use the lib. This is the working code that I have so far (important comments included):

I just don't have the skill to figure out how to use this. If anyone can provide some beginner snippets on how to use this library, it would be much appreciated. Idiot-proof snippets on how to add, edit, and delete, with comments, would be awesome.
Hi Alan,

Thanks for responding. I've never used XML before and have been itching to learn it lately. I figured my first task would be to create a web-based log allowing web visitors to comment on various articles I have posted on my website.

I know that user comments can be stored in a database, but I'd like a simpler approach: there are very few comments per article, so I can create one XML file for each article's comments. To me, this seems simpler and easier to maintain than setting up a database for such a trivial task.

After looking at RapidXML, it seems more complex to use than I had initially thought, especially without beginner-type samples. I'm starting to think that I should just put visitor comments into a database instead of banging my head against XML. What are your thoughts?

Best Regards,
Jeff
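For the "one XML file of comments per article" idea, writing the file needs no parser at all; the one thing that must not be skipped is escaping visitor-supplied text. Below is a minimal sketch using only the standard library. The `Comment` struct, the function names, and the `<comments>`/`<comment author="...">` layout are assumptions made for illustration, not anything proposed in the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one visitor comment.
struct Comment {
    std::string author;
    std::string text;
};

// Replaces the five characters that have special meaning in XML with their
// predefined entities, so visitor input cannot break the document structure.
std::string escape_xml(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Serializes all comments for one article into a small XML document.
// The element and attribute names are an assumed layout.
std::string comments_to_xml(const std::vector<Comment>& comments)
{
    std::string xml = "<comments>\n";
    for (const Comment& c : comments) {
        xml += "  <comment author=\"" + escape_xml(c.author) + "\">"
             + escape_xml(c.text) + "</comment>\n";
    }
    xml += "</comments>\n";
    return xml;
}
```

The resulting string can be written out with `std::ofstream`. Reading it back is where a parser such as RapidXML would come in.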
http://www.w3.org/TR/REC-xml/#NT-prolog

If you are really into this for speed, you might want to try writing your own code generator from even something as simple as the spec document. It turns out you can grep and sed it quite well and get a decent skeleton. There are of course plenty of code generators, and I'm hoping someone with experience will comment.

I ended up with code suited to my immediate needs, with each state having its own method. I had to fill in most of the bodies by hand, but the code was pretty simple for what I needed. I ended up with a bunch of stuff like this, which presumably would inline fairly well. I created maps for the char classes etc., but you get the idea.

    //20 CData ::= Char* - Char* '>'Char*))
    parse_api_type state_CData(STATESIG)

    //22 prolog ::= XMLDecl?Misc* doctypedeclMisc*)?
    //[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
    parse_api_type state_prolog(STATESIG)
    {
        ds->enter(22);
        state_XMLDecl(ps,ds); //return false;
        while (state_Misc(ps,ds));
        while (state_doctypedecl(ps,ds))
            while (state_Misc(ps,ds));
        ds->exit(22);

On the few test cases I ran, mostly from here,

http://www.sec.gov/Archives/edgar/xbrl.html

it seemed to perform quite well for what I was after.

Of course there are plenty of SOAP or RSS type examples of things you can do with XML, but I would point to some others that may be of immediate specific interest. As I wasn't doing much over Thanksgiving, I thought I would put in a few comments in favor of computers to these folks,

http://www.ots.treas.gov/?p=OpenComment&Topic_id=c0316a9e-1e0b-8562-ebd0-1ae5298909e2
http://www.federalreserve.gov/generalinfo/FOIA/index.cfm?doc_id=OP-1338&doc_ver=1&ShowAll=Yes

(essentially the same tirade at both locations). I summarized some existing computer facilities (NCBI has some XML options and the FDA AERS is, IIRC, SGML) and made some suggestions for new XML databases. And of course their comment window is still open if you have an agenda to promote too. LOL.

Mike Marchywka
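The one-function-per-production approach described above can be sketched as a small self-contained recognizer. This is not Mike's code: `ParseState`, `match_delimited`, and the function bodies are assumptions made for illustration, and the terminals are heavily simplified (`Misc` here covers only whitespace and comments, not processing instructions, and `doctypedecl` has no internal subset).

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical parse state: the input text and the current read position.
struct ParseState {
    std::string input;
    std::size_t pos;
};

// If the input at pos starts with `open`, consumes everything up to and
// including the next `close` and returns true; otherwise leaves pos alone.
static bool match_delimited(ParseState& ps, const char* open, const char* close)
{
    const std::size_t open_len = std::strlen(open);
    if (ps.input.compare(ps.pos, open_len, open) != 0)
        return false;
    const std::size_t end = ps.input.find(close, ps.pos + open_len);
    if (end == std::string::npos)
        return false;
    ps.pos = end + std::strlen(close);
    return true;
}

// Simplified XMLDecl: "<?xml" ... "?>"
bool state_XMLDecl(ParseState& ps) { return match_delimited(ps, "<?xml", "?>"); }

// Simplified doctypedecl: "<!DOCTYPE" ... ">" (no internal subset)
bool state_doctypedecl(ParseState& ps) { return match_delimited(ps, "<!DOCTYPE", ">"); }

// Simplified Misc: one comment or one run of whitespace.
bool state_Misc(ParseState& ps)
{
    if (match_delimited(ps, "<!--", "-->"))
        return true;
    const std::size_t start = ps.pos;
    while (ps.pos < ps.input.size() &&
           std::isspace(static_cast<unsigned char>(ps.input[ps.pos])))
        ++ps.pos;
    return ps.pos > start;
}

// [22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
bool state_prolog(ParseState& ps)
{
    state_XMLDecl(ps);              // XMLDecl?  (optional, result ignored)
    while (state_Misc(ps)) {}       // Misc*
    if (state_doctypedecl(ps))      // (doctypedecl Misc*)?
        while (state_Misc(ps)) {}
    return true;                    // every part is optional
}
```

Each grammar operator maps onto a control structure: `?` becomes a call whose result is ignored or an `if`, and `*` becomes a `while` loop. After `state_prolog` returns, `ps.pos` points at the first character of the root element.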
To: boost-users@lists.boost.org
From: jeff_j_dunlap@yahoo.com
Date: Sun, 14 Dec 2008 15:20:50 -0600
Subject: Re: [Boost-users] Fast XML Parser
Alan,

I would be very interested in looking at your wrapper. I found that the file ../test_interface/main.cpp demonstrates how to add, edit, and remove nodes, but I have not found anything on how to navigate the tree and find information in nodes. Do you know how this is done?

Thanks again,
Jeff
On Sun, Dec 14, 2008 at 9:43 PM, Alan M. Carroll wrote:
Let me start by saying that I am very happy with rapidXML. In fact, we have converted most of our XML parsing from various other libraries to rapidXML and have committed to a complete conversion over time (i.e., using rapidXML as our only XML parsing library, including replacing Expat). We use XML almost exclusively as a serialization format and rapidXML is excellent for that use case.
Yes, rapidXML is quite nice. It was inspired by pugixml (http://code.google.com/p/pugixml/), which is even nicer because it provides XPath support. I personally find rapidXML not as useful/nice without XPath support. It's also quite easy to migrate from either library to the other.

regards
jose
participants (4)
- Alan M. Carroll
- Jeff Dunlap
- Jose
- Mike Marchywka