
Hi,

Since there's a lot of discussion about XML parsers going on at the moment, I thought I'd mention the axemill XML parser that I've been working on. Development stalled a few years ago, but I've just revamped it to work with Boost 1.34 and added a few more features. It's not ready yet, but I would like to submit it to Boost when it is. For those interested, it's at http://www.sf.net/projects/axemill

It's a full validating parser, and it builds a model of the DTD as the DTD is parsed. It then uses that model when building the DOM: elements don't store their name directly; instead they store a reference to the corresponding entry in the model, which provides the name along with additional information such as the permitted attributes and the content model. This means that multiple elements with the same name each incur only a 4-byte overhead (on my system), rather than each storing the full name.

There's currently no support for XPath, XSLT or schemas, but it should be possible to add them without too much hassle.

The same model can be reused for multiple documents, and the structure allows a freshly-built document to be validated as it is constructed: it won't let you create an element that isn't in the model, or add an attribute to an element that the model doesn't specify. That's all the on-the-fly validation it does at the moment, but it wouldn't be too hard to add attribute value checking and element content checking (though the latter might be better suited to a post-construction phase).

Nodes within the tree are referenced by shared_ptr, though you can construct standalone nodes. Child nodes use a boost::variant to store the possible data types (comment, text, CDATA, PI, element), and the interface allows iteration through the child elements. The intention is to also allow separate iteration through all the nodes, as well as to provide a content() function to retrieve the text content.
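To make the idea concrete, here is a minimal sketch of the model-sharing and construction-time validation described above. All names here (ElementInfo, Model, declareElement, etc.) are hypothetical and not axemill's actual API; it also uses std::shared_ptr rather than boost::shared_ptr so the example stands alone:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical names for illustration; axemill's real API may differ.
struct ElementInfo {
    std::string name;                 // element name, stored once in the model
    std::set<std::string> attributes; // attributes the DTD permits
};

class Model {
    std::map<std::string, std::shared_ptr<ElementInfo>> entries_;
public:
    // Would normally be populated by the DTD parser.
    void declareElement(const std::string& name, std::set<std::string> attrs) {
        auto info = std::make_shared<ElementInfo>();
        info->name = name;
        info->attributes = std::move(attrs);
        entries_[name] = info;
    }
    std::shared_ptr<ElementInfo> find(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : it->second;
    }
};

class Element {
    std::shared_ptr<ElementInfo> info_; // one pointer, not a copy of the name
    std::map<std::string, std::string> attrs_;
public:
    // Refuses to construct an element the model doesn't declare.
    explicit Element(std::shared_ptr<ElementInfo> info) : info_(std::move(info)) {
        if (!info_)
            throw std::runtime_error("element not declared in model");
    }
    const std::string& name() const { return info_->name; }
    // Refuses attributes the model doesn't permit on this element.
    void setAttribute(const std::string& key, const std::string& value) {
        if (!info_->attributes.count(key))
            throw std::runtime_error("attribute not permitted by model");
        attrs_[key] = value;
    }
};
```

Because every element with the same name shares one ElementInfo, each element pays only for the pointer, and validation at construction time is just a lookup against the shared entry.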
Internally, everything is processed as 32-bit Unicode, so input and output have to be converted. I provide convertTo<encoding> and convertFrom<encoding> functions to do this, as well as overloads of some functions that take std::string parameters and convert them using the "default" encoding (currently ASCII).

There's also scope for reading data off the web, using the URIs from public identifiers to retrieve the DTD, for example, but currently the only URI scheme supported is file://.

Like I said, it's not ready yet, but it might provide an interesting alternative direction.

Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL