
* Stefan Seefeld <seefeld@sympatico.ca> [2005-11-01 10:18]:
Alan,
thank you for your interesting points. The API I suggest is not modeled after the W3C DOM IDL, nor after its Java implementation.
Many people have expressed discomfort both with the W3C DOM API and with the idea of simply transcribing the Java API to C++.
Therefore, the API I suggest here is (so I hope) as C++-like as it can be, while still giving full flexibility to operate on (i.e. inspect as well as modify) XML documents.
From the little I could gather about the alternatives you mention, it sounds like they would make very nice access layers on top of the base API (axis-oriented iterators, say).
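To make the "axis-oriented iterators" idea concrete, here is a minimal sketch of a child axis exposed as an iterator range, so that a standard algorithm can act as the query. Everything in it is hypothetical (Element, ChildAxis, count_children_named); none of these names come from the posted code, and a real base API would carry attributes, text and namespaces as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an element of the base API.
struct Element {
    std::string name;
    std::vector<Element> children;
};

// Hypothetical "child axis": a read-only range over an element's children.
// It only exposes begin()/end(), so it cannot be used to modify the tree.
class ChildAxis {
public:
    explicit ChildAxis(const Element& e) : e_(&e) {}
    std::vector<Element>::const_iterator begin() const { return e_->children.begin(); }
    std::vector<Element>::const_iterator end() const { return e_->children.end(); }
private:
    const Element* e_;
};

// A query written with a standard algorithm: count the children with a given name.
std::size_t count_children_named(const Element& parent, const std::string& name) {
    ChildAxis axis(parent);
    return static_cast<std::size_t>(std::count_if(
        axis.begin(), axis.end(),
        [&](const Element& child) { return child.name == name; }));
}
```

Other axes (descendant, following-sibling, and so on) could presumably be layered on in the same way, each as its own range type over the same underlying nodes.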
I'd suggest, in any language-wide implementation of XML, attempting to separate transformation and query from update. They are two very different applications.
I'm not sure I understand what you mean by transformation. How is it different from update? Or is the former simply a (coarse-grained) special case of the latter, using a particular language (such as XSLT) to express the mapping?
Transformation engines include XQuery, XSLT, STX, and Groovy GPath. They do not update the document provided. They produce a new document. That is what I mean by transformation. The input XML document is not changed; it is read, and a new document is emitted. The document object model does not need to be mutable, so you can perform all sorts of optimizations for navigation. The ability to add or remove a node makes a document object model far more complex. Many people prefer this mode of operation over adding and removing nodes. Node insert/remove appears to be a common operation because of web programming, where changing the DOM in the browser changes the display of the page. When you are not programming for the pretty side effects, node surgery becomes a real pain. Reading the document in, shuffling nodes, and writing it back out is cumbersome. A lot of repetitious code is spent on adding and removing. It's much easier to express an XML operation in terms of a query that returns a document, or as a reactor to a set of events.
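As a rough illustration of transformation in this sense, here is a minimal sketch in which the input tree is immutable and the operation returns a new tree. The names (Node, NodePtr, make_node, rename_elements) are hypothetical and stand in for whatever a real library would provide; attributes and namespaces are left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical read-only node: it is only ever reached through pointers to
// const, so once built nothing can be added to it or removed from it.
struct Node {
    std::string name;
    std::string text;
    std::vector<std::shared_ptr<const Node>> children;
};
using NodePtr = std::shared_ptr<const Node>;

NodePtr make_node(std::string name, std::string text = "",
                  std::vector<NodePtr> children = {}) {
    return std::make_shared<const Node>(
        Node{std::move(name), std::move(text), std::move(children)});
}

// A transformation: reads the input and emits a new document in which every
// element named `from` is named `to`. The input document is left untouched.
NodePtr rename_elements(const NodePtr& in, const std::string& from,
                        const std::string& to) {
    std::vector<NodePtr> kids;
    kids.reserve(in->children.size());
    for (const auto& child : in->children) {
        kids.push_back(rename_elements(child, from, to));
    }
    return make_node(in->name == from ? to : in->name, in->text, std::move(kids));
}
```

Because nothing is ever mutated, an implementation along these lines would be free to share unchanged subtrees between the input and the output instead of copying them, which is one of the optimizations a read-only model allows.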
I'd suggest starting with supporting XML documents that conform to the XPath and XQuery data model, and working backwards as the need arises. It makes for a much more concise library, and removes a lot of methods for rarely needed, often pathological, mutations.
There are clearly very different use cases to be considered. We should collect them and try to make sure that all of them can be expressed in a concise way. I'm not sure all of them operate on the same API layer.
I'm sure they could, but I'm sure it would make for a heavier API than necessary. XSLT, XQuery, and XPath simply do not require "removeChild".
The code I posted supports XPath queries. While the result of an XPath query can have different types, right now only node-sets are supported.
Which is cool, since in XPath an atomic value is the same thing as a node set that contains only that atomic value.
(Maybe boost::variant would be good to describe all of the possible types).
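A minimal sketch of what such a variant-based result could look like, covering the four XPath 1.0 result types (node-set, boolean, number, string). To keep the example self-contained it uses std::variant from C++17 as a stand-in for boost::variant; NodeRef, NodeSet, XPathResult and describe are hypothetical names, not part of the posted code.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical placeholder for a reference to a node in the document.
struct NodeRef {
    std::string name;
};
using NodeSet = std::vector<NodeRef>;

// One alternative per XPath 1.0 result type.
using XPathResult = std::variant<NodeSet, bool, double, std::string>;

// Visitor that reports which alternative a result holds.
struct Describe {
    std::string operator()(const NodeSet& ns) const {
        return "node-set(" + std::to_string(ns.size()) + ")";
    }
    std::string operator()(bool b) const {
        return b ? "boolean(true)" : "boolean(false)";
    }
    std::string operator()(double) const { return "number"; }
    std::string operator()(const std::string& s) const {
        return "string(" + s + ")";
    }
};

std::string describe(const XPathResult& result) {
    return std::visit(Describe{}, result);
}
```

With boost::variant the same shape would use boost::static_visitor and boost::apply_visitor instead of std::visit.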
Types are described by a qualified name in XPath. Someone who is implementing a host language for XPath, like XQuery or XSLT, will require a named type.
I'm not quite sure I understand what you mean by 'XPath data model'.
http://www.w3.org/TR/xpath-datamodel/
Implementing an object model would be much easier if you implement the 95% that is most frequently used, and if you separate the complexity of document mutation from the relative simplicity of iteration and transformation.
Could you show an example of both, what you consider (overly) complex as well as simple? While the API in my code is certainly not complete (namespaces are missing, notably), I find it quite simple and intuitive. I don't think it needs to become much more complex to be complete.
You are right on the money with W3C DOM. That is an overly complex object model. It allows for the creation of documents that do not adhere to XML Namespaces. If it were up to me, I'd create a document object model that was an XML Namespaces document object model, instead of an XML document object model. W3C DOM is designed to accept <a:b:c/> as a valid element name. For a good example of production code, I'd look at Saxon's NodeInfo object. The code is woolly, but it describes the subset of data used in XPath, XQuery and XSLT, and the implementation gotchas. It really is an implementation of the XPath data model, and probably the best open-source example of how to implement it.
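To show what "namespace-aware by construction" could mean in practice, here is a minimal sketch of a qualified-name parser that refuses names such as a:b:c. QName and parse_qname are hypothetical names, and the check is deliberately simplified: it only looks at colons and empty parts, and does not validate the characters allowed in an NCName.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical qualified name: an optional prefix plus a local part.
struct QName {
    std::string prefix;  // empty when the name has no prefix
    std::string local;
};

// Accepts "local" or "prefix:local"; rejects anything else.
std::optional<QName> parse_qname(const std::string& raw) {
    const auto colon = raw.find(':');
    if (colon == std::string::npos) {
        if (raw.empty()) return std::nullopt;
        return QName{"", raw};
    }
    // A second colon (as in "a:b:c") is not a valid qualified name.
    if (raw.find(':', colon + 1) != std::string::npos) return std::nullopt;
    // Neither the prefix nor the local part may be empty.
    if (colon == 0 || colon + 1 == raw.size()) return std::nullopt;
    return QName{raw.substr(0, colon), raw.substr(colon + 1)};
}
```

If element and attribute names can only be built through something like parse_qname, a document that violates XML Namespaces cannot be represented at all, rather than having to be detected later.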
In particular, I'm hoping that we can make the API modular, so that document access and document validation are kept separate (for example). Maybe that is what you mean, I'm not sure.
Yes. There are different breakdowns. Validation is something that people will want to do without, for the sake of performance. An XML document can be very useful read-only. I find that in my work I don't have call to update nodes: since the XML comes from Atom feeds or SQL databases, replacing nodes makes little sense.
http://www.w3.org/TR/xml-infoset/
I'd start by modeling the information, then move on to a separate interface for mutating it. I'd put axes high on the list, since that is how XML has come to be seen by many, and they are a natural fit for the C++ STL. That strikes me as the best way to work with XML in C++: using C++ STL algorithms as a query language, navigating a very efficient XML document object model, emitting a new document.
Cheers.
--
Alan Gutierrez - alan@engrm.com - http://engrm.com/blogometer/