
On Wed, Dec 10, 2008 at 1:23 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Themis Vassiliadis wrote:
I have been working on a C++ library like Apache Digester (http://commons.apache.org/digester). I intend to convert it to follow the Boost policies described in Requirements and Guidelines.
What are the chances of it becoming a Boost library?
Personally I would like to see something like RapidXML in Boost.
It seems that Apache Digester provides an element matching infrastructure. This could be useful, as manually iterating through the parse tree that something like RapidXML generates can be a bit tiresome. It should probably be layered on top of a lower-level XML parser.
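To make the element-matching idea concrete, here is a minimal sketch of a rule set layered on top of whatever events a lower-level parser emits. Everything in it is hypothetical (the `digester_sketch` namespace, `rule_set`, `add_rule`, and the three event functions are illustrative names only, not the API of Apache Digester, RapidXML, or the proposed library). It assumes rules are keyed by a slash-separated element path and that a rule fires with the element's text when the element closes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace digester_sketch {

// Hypothetical rule set: maps an element path such as "catalog/book/title"
// to a callback that receives the text collected inside that element.
class rule_set {
public:
    using handler = std::function<void(const std::string&)>;

    void add_rule(const std::string& path, handler h) {
        rules_[path] = std::move(h);
    }

    // The three functions below are meant to be driven by a lower-level
    // parser (SAX-style events, or a walk over an in-memory tree).
    void start_element(const std::string& name) {
        stack_.push_back(name);
        text_.clear();
    }

    void characters(const std::string& text) { text_ += text; }

    void end_element() {
        if (stack_.empty()) return;
        auto it = rules_.find(current_path());
        if (it != rules_.end()) it->second(text_);  // fire the matching rule
        stack_.pop_back();
        text_.clear();
    }

private:
    // Joins the open-element stack into a path like "catalog/book/title".
    std::string current_path() const {
        std::string path;
        for (const auto& name : stack_) {
            if (!path.empty()) path += '/';
            path += name;
        }
        return path;
    }

    std::map<std::string, handler> rules_;
    std::vector<std::string> stack_;
    std::string text_;  // simplification: text of the innermost element only
};

}  // namespace digester_sketch
```

With this shape, user code registers a callback per path and never walks the tree by hand. Mixed content and wildcard patterns are deliberately left out of the sketch.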
I have a low-level iterator-based parser here: http://svn.int64.org/viewvc/int64/xml/

The design I've been taking is something like this:

parser.hpp (xml::parser): the lowest level. Given two UTF-32 compatible forward iterators, it returns one of (ok, done, need_more, error), a node type (element/xmldecl/etc.), and an iterator range. This parser performs no allocations and, as a result, does minimal structural checking. It does, however, have full character validation if you so choose (via a template parameter). Really this does only slightly more than a lexer, and is available if you need top performance and don't need full XML compliance and validation.

reader.hpp (xml::reader): the next level. A UTF-32 push parser that is fully XML 1.0 and 1.1 compliant, capable of validating the document, tracking line/column numbers, entity substitution, and the other normal things you'd expect from a parser.

document.hpp (xml::document): a full in-memory document. There is a modifiable version, and a constant version which uses an arena allocator to stay as compact as possible.

As of now, only xml::parser is usable; everything but DTD parsing is complete. I have been really busy these past few months and haven't had a chance to complete it.

The main goal I had when beginning this was to have something I/O agnostic that can drop out when it finds an incomplete stream and be resumed later. It was really important that it work just as fantastically when parsing from memory, blocking I/O, or async I/O. It should also be very performant, which it is: with the parser being so lightweight, UTF-8 decoding is actually a huge bottleneck in my tests, which led me to allow the parser (via a template parameter) to work directly with UTF-8 if you don't require full compliance.

-- Cory Nelson
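For readers who want a feel for the lowest layer described above, here is a rough sketch of a non-allocating, resumable tokenizer over a pair of forward iterators. It is not the actual xml::parser interface: the `lexer_sketch` namespace, `token`, `next_token`, and the `next` resume iterator are all assumptions made for illustration. It only recognises plain start tags, end tags, and text, and it performs no character validation.

```cpp
#include <cassert>
#include <string>

namespace lexer_sketch {

enum class status { ok, done, need_more, error };
enum class node_type { none, element_start, element_end, text };

// Result of one scanning step. [first, last) is the token's content
// (the tag name or the text run); next is where scanning should resume.
template <typename Iter>
struct token {
    status st;
    node_type type;
    Iter first;
    Iter last;
    Iter next;
};

// Scans one token from [first, last) without allocating. On need_more the
// returned next points at the start of the incomplete token, so the caller
// can keep that tail, append more input, and call again.
template <typename Iter>
token<Iter> next_token(Iter first, Iter last) {
    if (first == last)
        return {status::done, node_type::none, first, first, first};

    if (*first != '<') {
        Iter end = first;
        while (end != last && *end != '<') ++end;
        if (end == last)  // the text run may continue in the next chunk
            return {status::need_more, node_type::text, first, end, first};
        return {status::ok, node_type::text, first, end, end};
    }

    Iter name = first;
    ++name;  // skip '<'
    node_type type = node_type::element_start;
    if (name != last && *name == '/') {
        type = node_type::element_end;
        ++name;  // skip '/'
    }

    Iter close = name;
    while (close != last && *close != '>') {
        if (*close == '<')  // a second '<' inside a tag is malformed
            return {status::error, type, first, close, first};
        ++close;
    }
    if (close == last)  // tag is cut off: ask the caller for more input
        return {status::need_more, type, first, close, first};
    if (close == name)  // "<>" or "</>" has no name
        return {status::error, type, first, close, first};

    Iter after = close;
    ++after;  // skip '>'
    return {status::ok, type, name, close, after};
}

}  // namespace lexer_sketch
```

Because the function holds no state and only reports iterators, the same loop works for a buffer in memory, a blocking read loop, or an async completion handler: on need_more the caller keeps the unconsumed tail, appends the next chunk, and calls `next_token` again.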