
* Stefan Seefeld <seefeld@sympatico.ca> [2005-10-31 22:47]:
Some years ago I proposed XML-related APIs for inclusion in boost (http://lists.boost.org/Archives/boost/2003/06/48955.php). Everybody agreed that such APIs would be very useful as part of boost. Unfortunately, though, after some weeks of discussing a number of details, I got distracted, and so I never managed to submit an enhancement.
I've now started to look into this topic again, and wrote down the start of a DOM-like API that I believe would be suitable for boost.
Here are some highlights:
* Users access DOM nodes as <>_ptr objects. All memory management is hidden, and only the document itself needs to be managed.
* The API is modular to allow incremental addition of new modules with minimal impact on existing ones. This implies that implementations may only support a subset of the API. (For example, no validation.)
* All classes are parametrized on the (Unicode) string type, so the code can be bound to arbitrary Unicode libraries, or even std::string, if all potential input will be ASCII only.
* The implementation uses an existing library (libxml2, to be specific), since writing an XML DOM library requires substantial effort.
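To make the first and third highlights concrete, here is a minimal sketch of how a string-parametrized document might hand out non-owning node handles. All names (`document`, `element_ptr`, `node_data`, `create_element`) are hypothetical and are not taken from the file vault sketch; it also assumes C++14 and leaves out libxml2 entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the proposed API.
template <typename S>
class document;

// Storage for one node; S is the (Unicode) string type chosen by the user.
template <typename S>
struct node_data {
    S name;
    S text;
};

// Lightweight, non-owning handle standing in for the "<>_ptr" objects.
template <typename S>
class element_ptr {
public:
    element_ptr() = default;
    explicit operator bool() const { return data_ != nullptr; }
    const S& name() const { return data_->name; }
    const S& text() const { return data_->text; }
    void set_text(const S& t) { data_->text = t; }

private:
    friend class document<S>;
    explicit element_ptr(node_data<S>* d) : data_(d) {}
    node_data<S>* data_ = nullptr;
};

// The document owns every node; handles stay valid only while it is alive.
template <typename S>
class document {
public:
    element_ptr<S> create_element(const S& name) {
        nodes_.push_back(
            std::make_unique<node_data<S>>(node_data<S>{name, S()}));
        return element_ptr<S>(nodes_.back().get());
    }
    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<std::unique_ptr<node_data<S>>> nodes_;
};
```

With this shape, `document<std::string>` and `document<std::wstring>` are separate instantiations of the same code, and the user only ever manages the lifetime of the document object.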
A first sketch of an XML API has been submitted to the boost file vault under the 'Programming Interfaces' category. It contains demo code, as well as some auto-generated documentation.
I'm aware that this requires some more work before I can attempt a formal submission. This is simply to see whether there is still any interest in such an API, and to get some discussion on the design.
I'm going to respond off the cuff, so excuse me if what I mention is covered in your sketch.
Simply put, the Java APIs have moved away from W3C DOM. In that language, developers have moved to JDOM, DOM4J, or XOM for node surgery. The W3C DOM predates namespaces, and its namespace support feels kludgy. It permits the construction of documents that are unlikely in the wild; most documents conform to XML Namespaces. Of the alternate object models noted above, only DOM4J separates interface from implementation as rigidly as W3C DOM, using the factory pattern to create all nodes.
More recent object models in Java, like XMLBeans, move away from modeling XML as a tree of nodes connected by links, and instead model XML as a target node with a set of axes that are traversed by iterators rather than node references. This model is the most C++-like.
There are also document object models coming out of XPath and XSLT that are not as well known but are all axis based: Saxon's NodeInfo model, Jaxen's Navigator model, and Groovy's GPath model. All of these models are immutable. They support transformations and queries. For many applications this is all that is necessary. XQuery, XSLT, and XPath all generate new documents from immutable documents. The need for document surgery in many in-memory applications is not as common as one might think. Transformation is often easier to express.
I'd suggest, in any language-wide implementation of XML, attempting to separate transformation and query from update. They are two very different applications. I'd suggest starting by supporting XML documents that conform to the XPath and XQuery data model, and working backwards as the need arises. It makes for a much more concise library, and removes a lot of methods for rarely needed, often pathological, mutations. Implementing an object model would be much easier if you implement the 95% that is most frequently used, and if you separate the complexity of document mutation from the relative simplicity of iteration and transformation.
Cheers.
-- Alan Gutierrez - alan@engrm.com - http://engrm.com/blogometer/
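As a rough illustration of the axis-and-iterator idea above, here is a minimal C++ sketch of a read-only tree whose child axis is walked with an iterator instead of node references. Everything in it (`node`, `child_axis`, `child_iterator`, `child_names`, `sample_tree`) is a hypothetical name made up for this example; it is not the API of Saxon, Jaxen, XMLBeans, or the proposed boost library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical read-only tree: nodes live in a flat vector and are never
// modified after construction. Links are indices; -1 means "none".
struct node {
    std::string name;
    int first_child;
    int next_sibling;
};

// Iterator over the child axis of one node.
class child_iterator {
public:
    child_iterator(const std::vector<node>* nodes, int index)
        : nodes_(nodes), index_(index) {}
    const node& operator*() const {
        return (*nodes_)[static_cast<std::size_t>(index_)];
    }
    child_iterator& operator++() {
        index_ = (*nodes_)[static_cast<std::size_t>(index_)].next_sibling;
        return *this;
    }
    bool operator!=(const child_iterator& other) const {
        return index_ != other.index_;
    }

private:
    const std::vector<node>* nodes_;
    int index_;
};

// The axis itself is just a range: begin() is the first child, end() is -1.
class child_axis {
public:
    child_axis(const std::vector<node>& nodes, int parent)
        : nodes_(&nodes), parent_(parent) {}
    child_iterator begin() const {
        return child_iterator(
            nodes_, (*nodes_)[static_cast<std::size_t>(parent_)].first_child);
    }
    child_iterator end() const { return child_iterator(nodes_, -1); }

private:
    const std::vector<node>* nodes_;
    int parent_;
};

// A query builds a new result and leaves the source tree untouched.
std::vector<std::string> child_names(const std::vector<node>& nodes,
                                     int parent) {
    std::vector<std::string> out;
    for (const node& n : child_axis(nodes, parent)) out.push_back(n.name);
    return out;
}

// Sample data for <book><title/><author/><author/></book>.
std::vector<node> sample_tree() {
    return {
        {"book", 1, -1},
        {"title", -1, 2},
        {"author", -1, 3},
        {"author", -1, -1},
    };
}
```

Nothing here can mutate the tree, so an update interface could be layered on separately later without complicating the query side.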