Re: [boost] Proposal: XML APIs in boost

1 Nov 2005


      * Stefan Seefeld <seefeld@sympatico.ca> [2005-10-31 22:47]:
...
Some years ago I proposed some XML-related APIs for inclusion
into boost (http://lists.boost.org/Archives/boost/2003/06/48955.php).
Everybody agreed that such APIs would be very useful as part of boost.
Unfortunately, though, after some weeks of discussing a number of details,
I got distracted, and so I never managed to submit an enhancement.
I've now started to look into this topic again, and wrote down the start
of a DOM-like API that I believe would be suitable for boost.
Here are some highlights:
* Users access DOM nodes as <>_ptr objects. All memory management is
   hidden, and only requires the document itself to be managed.
* The API is modular to allow incremental addition of new modules with
   minimal impact on existing ones. This implies that implementations
   may only support a subset of the API. (For example, no validation.)
* All classes are parametrized around the (unicode) string type, so
   the code can be bound to arbitrary unicode libraries, or even std::string,
   if all potential input will be ASCII only.
* The implementation uses existing libraries (libxml2, to be specific),
   since writing an XML DOM library requires substantial effort.
A first sketch of an XML API is submitted to the boost file vault under
the 'Programming Interfaces' category. It contains demo code, as well
as some auto-generated documentation.
I'm aware that this requires some more work before I can attempt
a formal submission. This is simply to see whether there is still
any interest in such an API, and to get some discussion on the
design.
    I'm going to respond off-the-cuff, so excuse me if what I
    mention is covered in your sketch.

    Simply, the Java APIs have moved away from W3C DOM. In that
    language, developers have moved to JDOM, DOM4J, or XOM for node
    surgery.
    
    The W3C DOM predates namespaces, and its namespace support feels
    kludgy. It permits the construction of documents that are
    unlikely in the wild, since most documents conform to XML
    Namespaces.

    Of those alternate object models noted above, only DOM4J
    separates interface from implementation as rigidly as W3C DOM,
    using the factory pattern to create all nodes.

    More recent object models in Java, like XMLBeans, move away from
    modeling XML as a tree of nodes connected by links, and instead
    model XML as a target node with a set of axes that are
    traversed by iterators rather than node references.

    This model is the most C++-like.
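
    To make the axis idea concrete, here is a rough sketch of a
    "child" axis exposed as an iterator range. The names (Node,
    Document, ChildAxis, child_names) are made up for illustration
    and are not taken from XMLBeans or from the proposed boost API;
    the flat index-based storage is just one assumption that keeps
    the example short.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical read-only document: nodes live in one vector and refer
// to each other by index, so a position is just (document, index).
struct Node {
    std::string name;
    int parent;        // -1 for the root
    int first_child;   // -1 if the node has no children
    int next_sibling;  // -1 if this is the last sibling
};

using Document = std::vector<Node>;

// Forward iterator over the children of one context node.
class ChildIterator {
public:
    ChildIterator(const Document* doc, int index) : doc_(doc), index_(index) {}
    const Node& operator*() const { return (*doc_)[index_]; }
    ChildIterator& operator++() {
        index_ = (*doc_)[index_].next_sibling;  // step along the sibling chain
        return *this;
    }
    bool operator!=(const ChildIterator& other) const { return index_ != other.index_; }
private:
    const Document* doc_;
    int index_;
};

// The axis is a range, so it can be used in a range-based for loop.
struct ChildAxis {
    const Document* doc;
    int context;
    ChildIterator begin() const { return ChildIterator(doc, (*doc)[context].first_child); }
    ChildIterator end() const { return ChildIterator(doc, -1); }
};

// Collect the names found along the child axis of the context node.
std::vector<std::string> child_names(const Document& doc, int context) {
    assert(context >= 0 && context < static_cast<int>(doc.size()));
    std::vector<std::string> names;
    for (const Node& n : ChildAxis{&doc, context}) names.push_back(n.name);
    return names;
}
```

    Other axes (descendant, following-sibling, ancestor) would be
    further range types over the same read-only storage; client
    code never holds a mutable node reference.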

    There are also document object models coming out of XPath and
    XSLT that are not as well known but are all axis based: Saxon's
    NodeInfo model, Jaxen's Navigator model, and Groovy's GPath model.

    All of these models are immutable. They support transformations
    and queries. For many applications this is all that is necessary.

    XQuery, XSLT, and XPath all generate new documents from
    immutable documents. The need for document surgery in many
    in-memory applications is not as common as one might think.

    Transformation is often easier to express. 

    I'd suggest, in any language-wide implementation of XML,
    attempting to separate transformation and query from update.
    They are two very different applications.
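
    As a sketch of what that separation might look like in C++,
    assume a tree that is only ever shared through pointers to
    const. Queries read it, and a transformation returns a new
    tree instead of editing the old one. Element, count_named, and
    rename_elements are hypothetical names for this example only;
    an update module would be a separate piece layered on top.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical element type. It is handed out only as pointer-to-const,
// so holders of an ElementPtr cannot modify the tree.
struct Element {
    std::string name;
    std::string text;
    std::vector<std::shared_ptr<const Element>> children;
};
using ElementPtr = std::shared_ptr<const Element>;

// Query: read-only, counts elements with the given name in the subtree.
std::size_t count_named(const ElementPtr& root, const std::string& name) {
    std::size_t n = (root->name == name) ? 1 : 0;
    for (const ElementPtr& child : root->children) n += count_named(child, name);
    return n;
}

// Transformation: builds a new tree with elements renamed, leaving the
// input tree untouched.
ElementPtr rename_elements(const ElementPtr& root,
                           const std::string& from,
                           const std::string& to) {
    auto copy = std::make_shared<Element>();
    copy->name = (root->name == from) ? to : root->name;
    copy->text = root->text;
    for (const ElementPtr& child : root->children)
        copy->children.push_back(rename_elements(child, from, to));
    return copy;
}
```

    Nothing here needs insert, remove, or replace methods on the
    node type, which is where most of the API surface of a mutable
    DOM goes.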
    
    I'd suggest starting with supporting XML documents that conform
    to the XPath and XQuery data model, and working backwards as the
    need arises. It makes for a much more concise library, and
    removes a lot of methods for rarely needed, often pathological,
    mutations.

    Implementing an object model would be much easier if you
    implement the 95% that is most frequently used, and if you
    separate the complexity of document mutation from the relative
    simplicity of iteration and transformation.

    Cheers.

--
Alan Gutierrez - alan@engrm.com - http://engrm.com/blogometer/