Re: [boost] Proposal: XML APIs in boost

2 Nov 2005


      * Stefan Seefeld <seefeld@sympatico.ca> [2005-11-01 10:18]:
...
Alan,
thank you for your interesting points. The API I suggest is not
modeled after the W3C DOM IDL, nor after its Java implementation.
Many people have expressed discomfort both with the W3C DOM API
and with the idea of simply transcribing the Java API to C++.
Therefore, the API I suggest here is (so I hope) as C++-like as
it can be, while still giving full flexibility to operate on
(i.e. inspect as well as modify) XML documents.
From the little I could gather about the alternatives you mention,
it sounds like they would make very nice access layers on top of
the base API (axis-oriented iterators, say).
...
I'd suggest, in any language-wide implementation of XML, to
    attempt to separate transformation and query, from update. They
    are two very different applications.
I'm not sure I understand what you mean by transformation. How
is it different from update? Or is the former simply a (coarse-grained)
special case of the latter, using a particular language to express
the mapping (such as XSLT)?
Transformation engines are XQuery, XSLT, STX, and Groovy GPath.

    They do not update the document provided. They produce a new
    document. That is what I mean by transformation. The input XML
    document is not changed, it is read, and a new document is emitted.

    The document object model does not need to be mutable. Thus you
    can perform all sorts of optimizations for navigation.

    The ability to add or remove a node makes a document object
    model far more complex.

    Many people prefer this mode of operation over adding and
    removing nodes.
    
    Node insert/remove appears to be a common operation, because of
    web programming, where changing the DOM in the browser changes
    the display of the page.
    
    When you are not programming for the pretty side-effects, node
    surgery becomes a real pain. Reading the document in, shuffling
    nodes, writing it back out is cumbersome. A lot of repetitious
    code is spent on the add and remove.
    
    It's much easier to express an XML operation in terms of a
    query that returns a document, or as a reactor to a set of
    events.
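
    A minimal sketch of that transformation style in C++: the input
    tree is only read, and the operation returns a new tree. The
    Element type and the rename_elements function are hypothetical
    names invented for this illustration, not part of the proposed API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical immutable element node: once built, it is never modified.
struct Element {
    std::string name;
    std::string text;
    std::vector<std::shared_ptr<const Element>> children;
};

using ElementPtr = std::shared_ptr<const Element>;

// Transformation: read `in` and emit a new tree in which every element
// named `from` is renamed to `to`. The input tree is left untouched.
ElementPtr rename_elements(const ElementPtr& in,
                           const std::string& from,
                           const std::string& to) {
    assert(in != nullptr);
    auto out = std::make_shared<Element>();
    out->name = (in->name == from) ? to : in->name;
    out->text = in->text;
    for (const auto& child : in->children)
        out->children.push_back(rename_elements(child, from, to));
    return out;
}
```

    Because nothing here mutates the input, the same input tree can be
    shared freely between several queries.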
...
...
I'd suggest starting with supporting XML documents that conform
    to the XPath and Query data model, and working backwards as the
    need arises. It makes for a much more concise library, and
    removes a lot of methods for rarely needed, often pathological,
    mutations.
There are clearly very different use cases to be considered. We should
collect them and try to make sure that all of them can be expressed
in a concise way. I'm not sure all of them operate on the same API layer.
I'm sure they could, but I'm sure it would make a heavier API
    than necessary.

    XSLT, XQuery, and XPath simply do not require "removeChild".
...
The code I posted supports xpath queries. While the result of an xpath
query can have different types, right now only node-sets are supported
Which is cool, since in XPath an atomic value is the same thing
    as a node set that contains only that atomic value.
...
(Maybe boost::variant would be good to describe all of the possible types).
Types are described by a qualified name in XPath. Someone who is
    implementing a host language for XPath, like XQuery or XSLT,
    will require a named type.
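
    For illustration, the four result types of an XPath 1.0
    expression (node-set, boolean, number, string) could be held in a
    variant, along the lines of the boost::variant suggestion. This
    sketch assumes C++17 std::variant as a stand-in for boost::variant;
    Node, NodeSet, XPathValue, type_name and as_node_set are
    hypothetical names. The type names are plain strings here; a host
    language such as XQuery would need qualified names instead.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

struct Node { std::string name; };            // hypothetical node handle
using NodeSet = std::vector<const Node*>;

// The four result types of an XPath 1.0 expression.
using XPathValue = std::variant<NodeSet, bool, double, std::string>;

// Visitor that maps each alternative to a human-readable type name.
struct TypeName {
    const char* operator()(const NodeSet&) const { return "node-set"; }
    const char* operator()(bool) const { return "boolean"; }
    const char* operator()(double) const { return "number"; }
    const char* operator()(const std::string&) const { return "string"; }
};

std::string type_name(const XPathValue& v) {
    return std::visit(TypeName{}, v);
}

// Returns the node-set alternative; the caller must check the type first.
const NodeSet& as_node_set(const XPathValue& v) {
    assert(std::holds_alternative<NodeSet>(v));
    return std::get<NodeSet>(v);
}
```

    Wrap string results in std::string explicitly when building an
    XPathValue: depending on the compiler and language version, a bare
    string literal may select the bool alternative.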
...
I'm not quite sure I understand what you mean by 'XPath data model'.
http://www.w3.org/TR/xpath-datamodel/
...
...
Implementing an object model would be much easier, if you
    implement the 95% that is most frequently used. And if you
    separate the complexity of document mutation from the relative
    simplicity of iteration and transformation.
...
Could you show an example of both, what you consider (overly) complex
as well as simple? While the API in my code is certainly not complete
(namespaces are missing, notably), I find it quite simple and intuitive.
I don't think it needs to become much more complex to be complete.
You are right on the money with W3C DOM. That is an overly
    complex object model.
    
    It allows for the creation of documents that do not adhere to
    XML Namespaces. If it were up to me, I'd create a document
    object model that was an XML Namespaces document object model,
    instead of an XML document object model.

    W3C DOM is designed to accept <a:b:c/> as a valid element name.
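
    A namespace-aware model could reject such a name when the element
    is created. The following is a simplified sketch under stated
    assumptions: it checks only the colon structure (at most one
    colon, no empty prefix or local part), not the full NCName
    character rules of Namespaces in XML. QName, is_plausible_qname
    and split_qname are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Simplified check: at most one colon, and no empty prefix or local part.
// The NCName character rules are deliberately not checked in this sketch.
bool is_plausible_qname(const std::string& name) {
    if (name.empty()) return false;
    const std::string::size_type colon = name.find(':');
    if (colon == std::string::npos) return true;                 // unprefixed
    if (colon == 0 || colon + 1 == name.size()) return false;    // ":b", "a:"
    return name.find(':', colon + 1) == std::string::npos;       // "a:b:c"
}

struct QName {
    std::string prefix;   // empty for an unprefixed name
    std::string local;
};

// Splits a name that has already passed is_plausible_qname.
QName split_qname(const std::string& name) {
    assert(is_plausible_qname(name));
    const std::string::size_type colon = name.find(':');
    if (colon == std::string::npos) return {"", name};
    return {name.substr(0, colon), name.substr(colon + 1)};
}
```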

    For a good example of production code, I'd look at Saxon's
    NodeInfo object. The code is wooly, but describes the subset of
    data used in XPath, XQuery and XSLT, and the implementation gotchas.
    
    It really is an implementation of the XPath data model, and
    probably the best open source example of how to implement it.
...
In particular, I'm hoping that we can make the API modular, so document
access and document validation are kept separate (for example). Maybe that
is what you mean, I'm not sure.
Yes. There are different breakdowns. Validation is something
    that people will want to do without, for the sake of
    performance.
    
    An XML document can be very useful read-only. I find that in my
    work, I don't have call to update nodes. Since the XML comes
    from Atom feeds or SQL databases, replacing nodes makes little sense.

    http://www.w3.org/TR/xml-infoset/

    I'd start by modeling the information, then move on to a
    separate interface for mutating it.
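
    One way to picture that split, as a sketch only: a read-only view
    that models the information, and a separate editor that only
    update code depends on. NodeData, NodeView and NodeEditor are
    hypothetical names, not taken from the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Storage shared by the two interfaces below.
struct NodeData {
    std::string name;
    std::vector<NodeData> children;
};

// Read-only interface: exposes the information, offers no way to change it.
class NodeView {
public:
    explicit NodeView(const NodeData& d) : data_(&d) {}
    const std::string& name() const { return data_->name; }
    std::size_t child_count() const { return data_->children.size(); }
    NodeView child(std::size_t i) const {
        assert(i < data_->children.size());
        return NodeView(data_->children[i]);
    }
private:
    const NodeData* data_;
};

// Separate mutation interface: only code that needs updates uses it.
class NodeEditor {
public:
    explicit NodeEditor(NodeData& d) : data_(&d) {}
    void append_child(const std::string& name) {
        data_->children.push_back(NodeData{name, {}});
    }
    void remove_child(std::size_t i) {
        assert(i < data_->children.size());
        data_->children.erase(data_->children.begin() +
                              static_cast<std::ptrdiff_t>(i));
    }
private:
    NodeData* data_;
};
```

    Query and transformation code would take a NodeView and so cannot
    add or remove nodes by accident.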
    
    I'd put axes high on the list, since that is how XML has come to
    be seen by many, and they are a natural for the C++ STL.
    
    That strikes me as the best way to work with XML in C++, using
    C++ STL algorithms as a query language, navigating a very
    efficient XML document object model, emitting a new document.
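
    As a rough sketch of what that could look like, assuming the
    child axis is exposed as an ordinary container: standard
    algorithms act as the query, and the result is a new element
    rather than an edit of the input. Element, select_children,
    count_children and first_child are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical element; `children` is the child axis in document order.
struct Element {
    std::string name;
    std::string text;
    std::vector<Element> children;
};

// Query: copy the children of `parent` named `name` into a new element
// called `result_name`. The input is only read, never modified.
Element select_children(const Element& parent,
                        const std::string& name,
                        const std::string& result_name) {
    Element result{result_name, "", {}};
    std::copy_if(parent.children.begin(), parent.children.end(),
                 std::back_inserter(result.children),
                 [&name](const Element& e) { return e.name == name; });
    return result;
}

// Count matches on the child axis without building anything.
std::size_t count_children(const Element& parent, const std::string& name) {
    return static_cast<std::size_t>(
        std::count_if(parent.children.begin(), parent.children.end(),
                      [&name](const Element& e) { return e.name == name; }));
}

// First child named `name`; the caller guarantees that one exists.
const Element& first_child(const Element& parent, const std::string& name) {
    auto it = std::find_if(parent.children.begin(), parent.children.end(),
                           [&name](const Element& e) { return e.name == name; });
    assert(it != parent.children.end());
    return *it;
}
```

    Other axes (descendant, following-sibling and so on) would need
    their own iterator types; only the child axis is shown here.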

    Cheers.

--
Alan Gutierrez - alan@engrm.com - http://engrm.com/blogometer/