
On Wed, September 6, 2006 7:30 pm, Stefan Seefeld wrote:
> Have you followed the discussions around my proposal for an XML API in Boost that I implemented on top of libxml2 (http://xmlsoft.org/)?
I wasn't around for the early discussions, but have caught up on them now. It's an interesting discussion, though I'm not sure how relevant it is. I'll elaborate later.
> I think that starting a new implementation from scratch is the wrong way to approach this (rather big) topic.
> This is particularly true since an 'XML library' shouldn't just provide ways to de- and encode XML documents into generic tree structures, but instead needs to provide quite a substantial amount of functionality in order to be considered complete (even if you approach this in a modular way). As an example, imagine querying your DOM-like structure with an XPath expression. Think about all that this involves: regular expression handling, XPath pattern matching, HTTP lookup, entity handling, Unicode, etc.
> This is why I don't think that you should think about such a project one step at a time (e.g. the 'XML reading side of things').
It seems to me that your earlier proposal was mainly about a few API specifications that were then supposed to be implemented somehow, preferably on top of an existing XML library, in order to avoid reinventing the wheel. This idea certainly has a lot of merit, but it also has some distinct disadvantages.

First, an API specification is nice for standardization, but not very usable within the context of Boost. In order to be useful, there must be at least one implementation of the API. Otherwise, the specification is worth nothing to the end user. This implementation must exist within Boost, i.e. it must be completely contained within Boost. Libraries like Regex and Iostreams offer enhanced functionality if certain external libraries are available, but they will work without them, too. Obviously, the Xml library could not work without the external XML implementation if it is just a wrapper around it.

This means that, if the library is a wrapper around an external one, the external library (let's for argument's sake assume libxml2, which seems to bring less licensing trouble than Xerces, the only other sufficiently complete XML library I can think of) must be distributed with Boost. What does this entail? The library must build as part of Boost. I haven't checked, but I assume libxml2's build system right now is based on automake. That would have to be translated to Boost.Build. As part of this process, configuration macros might need to be translated. This could easily lead to a real fork of the code base. Unless Boost wants to rely on the regression testing done by the authors of libxml2, regression tests, portability tests and everything else must be written and maintained.

And last but certainly not least, there's the licensing issue. Boost is working hard to get all code under the Boost license. Would we want an external library under any other license, no matter how permissive, in that code base? Or would the authors of libxml2 permit relicensing of the source? (As a programmer, I'd rather reimplement a library than pursue such goals. ;) )

Second, the recommendation focused on a DOM-style API. As at least two people [1][2] pointed out, DOM-style APIs are not as universally useful as other APIs. That said, I do intend to provide a DOM-style API, but only after having completed the event-based API and thought long and hard about what a DOM-style API means in C++. Still, this is one of the main reasons why I asked for real-world use cases. My own uses of XML have usually been satisfied by SAX, although I would have preferred a pull-style API. I'd love to hear how other people use XML. I know that two Boost-internal uses could work with a pull API very well: Property Tree's XML reader and the Serialization XML archive.

To sum up, I do believe we should reinvent the wheel here. But we should create an improved wheel, and I think the Boost community is uniquely suited to create a wheel that works particularly well with C++.

To maintain thread integrity, I'll reply to each post individually.

[1] http://lists.boost.org/Archives/boost/2005/11/96131.php
[2] http://lists.boost.org/Archives/boost/2005/11/96521.php