
Hi, I have been working on a C++ library similar to Apache Digester (http://commons.apache.org/digester). I intend to adapt it to follow the Boost policies described in Requirements and Guidelines. What are the chances of it becoming a Boost library? -- Themis Vassiliadis

On Tue, 09 Dec 2008 22:05:55 +0100, Themis Vassiliadis <tvassiliadis@gmail.com> wrote:
Hi,
I have been working on a C++ library similar to Apache Digester (http://commons.apache.org/digester). I intend to adapt it to follow the Boost policies described in Requirements and Guidelines.
What are the chances of it becoming a Boost library?
Is your digester an XML library, or is it built on top of an XML library? Boris

on Tue Dec 09 2008, "Themis Vassiliadis" <tvassiliadis-AT-gmail.com> wrote:
Hi,
I have been working on a C++ library similar to Apache Digester (http://commons.apache.org/digester). I intend to adapt it to follow the Boost policies described in Requirements and Guidelines.
What are the chances of it becoming a Boost library?
It's hard to say without seeing some of what you've done. Boost really could use an XML library; if your work is good I'm sure many will look on it favorably. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Themis Vassiliadis wrote:
What are the chances of it becoming a Boost library?
Take a look at the Boost guidelines http://www.boost.org/development/requirements.html to see how to adapt your code. Manuel Fiorelli

Themis Vassiliadis wrote:
Hi,
I have been working on a C++ library similar to Apache Digester (http://commons.apache.org/digester). I intend to adapt it to follow the Boost policies described in Requirements and Guidelines.
What are the chances of it becoming a Boost library?
Digester is a curious library, extremely well suited for some domains, but largely unusable in many. Let's not forget that it was originally developed as part of Tomcat, to translate the configuration into an object tree. I think it might be accepted, but if you don't take care to clearly communicate the limitations of the digester approach, you risk many people expecting the wrong thing. Sebastian

Sebastian Redl wrote:
Digester is a curious library, extremely well suited for some domains, but largely unusable in many. Let's not forget that it was originally developed as part of Tomcat, to translate the configuration into an object tree.
There is also the property_tree library (I don't know what its status within Boost is) that is quite similar to that, except it loads the whole document into memory rather than invoking some callbacks.

Themis Vassiliadis wrote:
I have been working on a C++ library similar to Apache Digester (http://commons.apache.org/digester). I intend to adapt it to follow the Boost policies described in Requirements and Guidelines.
What are the chances of it becoming a Boost library?
Personally I would like to see something like RapidXML in Boost. It seems that Apache Digester provides an element-matching infrastructure. This could be useful, as manually iterating through the parse tree that something like RapidXML generates can be a bit tiresome. It should probably be layered on top of a lower-level XML parser.
I have found this example at http://www.javaworld.com/javaworld/jw-10-2002/jw-1025-opensourceprofile.html... :
    import java.io.IOException;
    import org.apache.commons.digester.Digester;
    import org.xml.sax.SAXException;

    public class SampleDigester {
        public void run() throws IOException, SAXException {
            Digester digester = new Digester();
            // This method pushes this (SampleDigester) object onto the Digester's
            // object stack, making its methods available to processing rules.
            digester.push(this);
            // This set of rules calls the addDataSource method and passes
            // in five parameters to the method.
            digester.addCallMethod("datasources/datasource", "addDataSource", 5);
            digester.addCallParam("datasources/datasource/name", 0);
            digester.addCallParam("datasources/datasource/driver", 1);
            digester.addCallParam("datasources/datasource/url", 2);
            digester.addCallParam("datasources/datasource/username", 3);
            digester.addCallParam("datasources/datasource/password", 4);
            // This method starts the parsing of the document.
            digester.parse("datasource.xml");
        }

        // Example method called by Digester.
        public void addDataSource(String name, String driver, String url,
                                  String userName, String password) {
            // create DataSource and add to collection...
        }
    }
It parses XML like this:
    <datasources>
      <datasource>
        <name>HsqlDataSource</name>
        <driver>org.hsqldb.jdbcDriver</driver>
        <url>jdbc:hsqldb:hsql://localhost</url>
        <username>sa</username>
        <password></password>
      </datasource>
    ....
This seems to be worthwhile, but I would like to think that we could improve on the details. In particular, the positional-parameter numbering looks a bit clunky, and it would be nice to avoid run-time parsing of the pattern expressions. I also note that the patterns look like XPath expressions, but aren't.
Please tell us more about your proposal. Cheers, Phil.
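To make the positional-parameter concern concrete, here is a minimal C++ sketch of path-based rule matching in which each rule binds a named struct member instead of a numbered argument. Everything in it is an assumption for illustration: `MiniDigester`, `DataSource`, `on_text`, `on_end`, and `digest_sample` are hypothetical names, not the API of Themis's library or of Apache Digester, and the element events are fed by hand where a lower-level XML parser would normally supply them. The sketch still looks up pattern strings at run time, so it addresses only the numbering issue.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record filled in by the rules; fields are bound by name,
// not by positional index.
struct DataSource {
    std::string name, driver, url, username, password;
};

// Hypothetical mini "digester": it receives begin/text/end events and
// fires the rule registered for the current element path, if any.
class MiniDigester {
public:
    using TextRule = std::function<void(const std::string&)>;
    using EndRule = std::function<void()>;

    void on_text(const std::string& path, TextRule rule) { text_rules_[path] = std::move(rule); }
    void on_end(const std::string& path, EndRule rule) { end_rules_[path] = std::move(rule); }

    // Events that an underlying XML parser would normally generate.
    void begin_element(const std::string& name) {
        path_ += path_.empty() ? name : "/" + name;
    }
    void text(const std::string& value) {
        auto it = text_rules_.find(path_);
        if (it != text_rules_.end()) it->second(value);
    }
    void end_element() {
        assert(!path_.empty());  // an end event needs a matching begin event
        auto it = end_rules_.find(path_);
        if (it != end_rules_.end()) it->second();
        auto slash = path_.rfind('/');
        path_.erase(slash == std::string::npos ? 0 : slash);  // pop last segment
    }

private:
    std::string path_;  // current element path, e.g. "datasources/datasource"
    std::map<std::string, TextRule> text_rules_;
    std::map<std::string, EndRule> end_rules_;
};

// Registers the rules and replays the events for the sample document.
std::vector<DataSource> digest_sample() {
    std::vector<DataSource> out;
    DataSource current;
    MiniDigester d;
    const std::string base = "datasources/datasource";

    // Bind a child element's text to a member of DataSource.
    auto bind = [&](const char* leaf, std::string DataSource::*field) {
        d.on_text(base + "/" + leaf,
                  [&current, field](const std::string& v) { current.*field = v; });
    };
    bind("name", &DataSource::name);
    bind("driver", &DataSource::driver);
    bind("url", &DataSource::url);
    bind("username", &DataSource::username);
    bind("password", &DataSource::password);
    d.on_end(base, [&] { out.push_back(current); current = DataSource(); });

    // Hand-written events standing in for a real parser's output.
    auto leaf = [&](const char* name, const char* value) {
        d.begin_element(name); d.text(value); d.end_element();
    };
    d.begin_element("datasources");
    d.begin_element("datasource");
    leaf("name", "HsqlDataSource");
    leaf("driver", "org.hsqldb.jdbcDriver");
    leaf("url", "jdbc:hsqldb:hsql://localhost");
    leaf("username", "sa");
    leaf("password", "");
    d.end_element();  // </datasource>
    d.end_element();  // </datasources>
    return out;
}
```

Calling `digest_sample()` yields one `DataSource` per `<datasource>` element. Binding pointers to members means a misspelled field is a compile error, whereas a wrong positional index in the Java sample would only surface at run time.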

On Wed, Dec 10, 2008 at 1:23 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Themis Vassiliadis wrote:
I have been working on a C++ library similar to Apache Digester (http://commons.apache.org/digester). I intend to adapt it to follow the Boost policies described in Requirements and Guidelines.
What are the chances of it becoming a Boost library?
Personally I would like to see something like RapidXML in Boost.
It seems that Apache Digester provides an element matching infrastructure. This could be useful, as manually iterating through the parse tree that something like RapidXML generates can be a bit tiresome. It should probably be layered on top of a lower-level XML parser.
I have a low-level iterator-based parser here: http://svn.int64.org/viewvc/int64/xml/
The design I've been taking is something like this:
parser.hpp (xml::parser): the lowest level. Given two UTF-32 compatible forward iterators, it returns one of (ok, done, need_more, error), a node type (element/xmldecl/etc.), and an iterator range. This parser performs no allocations, and as such does minimal structural checking. It does however have full character validation, if you so choose (by a template parameter). Really this does only slightly more than a lexer, and is available if you need top performance and don't need full XML compliance and validation.
reader.hpp (xml::reader): the next level. A UTF-32 push parser that is fully XML 1.0 and 1.1 compliant, capable of validating the document, tracking line/column numbers, entity substitution, and other normal things you'd expect from a parser.
document.hpp (xml::document): a full in-memory document. A modifiable version, and a constant version which uses an arena allocator to stay as compact as possible.
As of now, only xml::parser is usable - everything but DTD parsing is complete. I have been really busy these past few months and haven't had a chance to complete it. The main goal I had when beginning this was to have something I/O agnostic, that can drop out when it finds an incomplete stream and be resumed later. It was really important that it work just as well when parsing from memory, blocking I/O, or async I/O.
It should also be very performant, which it is: with the parser being very lightweight, UTF-8 decoding is actually a huge bottleneck in my tests, which led me to allow the parser (via a template parameter) to work directly with UTF-8 if you don't require full compliance.
-- Cory Nelson
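The lowest layer described above (two iterators in; a status, a node type, and an iterator range out; no allocation; resumable on incomplete input) might look roughly like the following sketch. It is an illustration under stated assumptions, not Cory's actual xml::parser: the names `status`, `node_type`, `token`, `next_token`, and `text_of` are hypothetical, only start tags, end tags, and text are recognized, and attributes, self-closing tags, declarations, and character validation are ignored.

```cpp
#include <cassert>
#include <string>

// Hypothetical result codes, mirroring the (ok, done, need_more, error)
// set mentioned in the message above.
enum class status { ok, done, need_more, error };
enum class node_type { none, start_tag, end_tag, text };

template <typename Iter>
struct token {
    status st;
    node_type type;
    Iter first, last;  // range of the tag name or text within the input
};

// Scans one token from [first, last) without allocating.
// On ok, `first` is advanced past the token. On need_more, `first` is left
// unchanged so the caller can append input and retry from the same spot.
// Text that reaches the end of the buffer is returned as-is, so a text run
// may be delivered in several pieces when input arrives in chunks.
template <typename Iter>
token<Iter> next_token(Iter& first, Iter last) {
    if (first == last) return {status::done, node_type::none, first, first};
    Iter it = first;
    if (*it == '<') {
        ++it;
        if (it == last) return {status::need_more, node_type::none, first, first};
        node_type type = node_type::start_tag;
        if (*it == '/') { type = node_type::end_tag; ++it; }
        Iter name_begin = it;
        while (it != last && *it != '>') {
            if (*it == '<') return {status::error, node_type::none, it, it};
            ++it;
        }
        if (it == last) return {status::need_more, node_type::none, first, first};
        if (it == name_begin) return {status::error, node_type::none, it, it};  // "<>" or "</>"
        token<Iter> t{status::ok, type, name_begin, it};
        first = ++it;  // skip the closing '>'
        return t;
    }
    // Text runs up to the next '<' or the end of the available input.
    while (it != last && *it != '<') ++it;
    token<Iter> t{status::ok, node_type::text, first, it};
    first = it;
    return t;
}

// Convenience for callers that do want a copy of the token's characters.
template <typename Iter>
std::string text_of(const token<Iter>& t) {
    assert(t.st == status::ok);  // only ok tokens carry a meaningful range
    return std::string(t.first, t.last);
}
```

A caller reading from a socket would loop on `next_token`, and on `need_more` would append the next chunk to its buffer and call again from the same position. Because the token only refers to the caller's buffer, the caller must recompute iterators after growing a container that may reallocate.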

Cory, On Wednesday 10 December 2008 15:13:32 Cory Nelson wrote:
I have a low level iterator-based parser here: http://svn.int64.org/viewvc/int64/xml/
The design I've been taking is something like this:
parser.hpp (xml::parser): the lowest level. Given two UTF-32 compatible forward iterators, it returns one of (ok, done, need_more, error), a node type (element/xmldecl/etc.), and an iterator range. This parser performs no allocations, and as such does minimal structural checking. It does however have full character validation, if you so choose (by a template parameter). Really this does only slightly more than a lexer, and is available if you need top performance and don't need full XML compliance and validation.
reader.hpp (xml::reader): the next level. A UTF-32 push parser that is fully XML 1.0 and 1.1 compliant, capable of validating the document, tracking line/column numbers, entity substitution, and other normal things you'd expect from a parser.
document.hpp (xml::document): a full in-memory document. A modifiable version, and constant version which uses an arena allocator to stay as compact as possible.
As of now, only xml::parser is usable- everything but DTD parsing is complete. I have been really busy these past few months and haven't got a chance to complete it. The main goal I had when beginning this is to have something I/O agnostic, that can drop out when it finds an incomplete stream and be resumed later. It was really important that it work just as fantastically with parsing from memory, blocking I/O, or async I/O.
It should also be very performant, which it is: the parser being very lightweight, UTF-8 decoding is actually a huge bottleneck in my tests which led me to allow the parser (via template parameter) to work directly with UTF-8 if you don't require full compliance.
Thanks for the link! I would love to see something like this added as a Boost library. It is lightweight, and it looks very useful already (FYI, with a few minor mods, I ran the test.cpp application on Linux).
I like the idea of having policies to tailor the fidelity of the parser. It's nice to not have to pay for what you don't need.
It seems that your parser could be put up for review. If it is accepted, other layers could be added over time. I would really like to see a data-binding layer complete with schema validation down the road.
Thanks, Justin
participants (9): Boris, Cory Nelson, David Abrahams, KSpam, Manuel Fiorelli, Mathias Gaunard, Phil Endecott, Sebastian Redl, Themis Vassiliadis