This is a fun topic. How should C++ play catch-up with other languages on XML handling? What applications will develop from such an XML API? XML editors and XML creators/modifiers? Data flow and communication between apps, web services? What can be leveraged in C++ to do something new or faster with XML? If there were a way to dynamically load a shared library compiled at run time, then some pretty nifty things could be achieved with metaprogramming and expression templates.

I'm not sure there are any strong backend candidates that would satisfy C++ developers and users at this time, but there have to be needs besides mine. Xerces is poor at large XML documents. As far as DOM goes, is rearranging XML elements/attributes being pursued? http://xalan.apache.org/ is XSLT 1.0, and after 2.0 no one wants to go back to 1.0.

Binding is an important area for me. xmlbeanscxx, which is based on Xerces, couldn't satisfy my needs for binding data into my applications (because the underlying DOM wasn't helpful in the task of binding). XML Schema constraints are a must for binding. http://sourceforge.net/projects/pion/ could really use a binder inside its RESTful web service. In other languages, compact http://relaxng.org/ is getting addressed too. I just saw http://code.google.com/p/xplus-xsd2cpp/ recently and have yet to test it. (If you do try it, do so outside of any of your own code and in its own folder.)

To give examples, I use CML, MathML, GraphML, SVG, BibTeXML and a number of custom XML formats. Each of these has its quirks and is difficult to bind. I haven't tried http://vtd-xml.sourceforge.net/ for a while because its license doesn't work for my company. With custom code I've been doing something similar for simply reading data from XML documents.

On 05/09/2013 10:26 AM, Stefan Seefeld wrote:
Bjorn,
we are going in circles, which is in part because we still are talking past each other.
In particular, it seems you aren't distinguishing between users and developers.
On 05/09/2013 06:00 AM, Bjorn Reese wrote:
On 05/08/2013 02:08 PM, Stefan Seefeld wrote:
You are evading the question. A user may not even care how boost.xml is implemented, as long as the functionality is there. If I'm such a user, I don't want to be confronted with the question of what backend to pick.

Then create a 'boost-xml-standalone' package without dependencies, and let the 'boost-xml' package depend on the 'boost-xml-standalone' and 'libxml2' packages. Problem solved.

Sorry, what problem is solved?
Right. But again, I think you are making life much harder than it needs to be for users. As a user I want to use the boost.xml library in my own project. Do you really anticipate a bunch of different backends being offered to an end-user to pick from, depending on what functionality he requires? What a drag! Just give him a single

I thought that this was part of the GSoC proposal, which states: [...]
You are citing out of context. Implementing multiple backends has many benefits for *developers*, for example as it helps to guarantee that the API isn't tied to a particular backend. It should not affect in any way *users*, who will only use the boost.xml API (and library), without any concern for any particular implementation choice.
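To make that separation concrete, here is a minimal sketch of an API that stays the same whichever backend sits underneath. Everything in it is hypothetical and only for illustration: `backend_sketch`, `standalone_backend`, `wrapped_backend` and `document` are invented names, not part of any actual boost.xml proposal, and the "parser" only extracts the root element name.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace backend_sketch {

// Hypothetical backend with no external dependencies.
// Simplification: it does not understand comments or CDATA that contain '<'.
struct standalone_backend {
    static const char* name() { return "standalone"; }

    // Returns the name of the first element in `xml`, or "" if there is none.
    static std::string root_name(const std::string& xml) {
        std::size_t pos = 0;
        while ((pos = xml.find('<', pos)) != std::string::npos) {
            ++pos;
            // Skip declarations (<?...), comments/doctype (<!...) and end tags (</...).
            if (pos < xml.size() && xml[pos] != '?' && xml[pos] != '!' && xml[pos] != '/') {
                const std::size_t end = xml.find_first_of(" \t\r\n/>", pos);
                return xml.substr(pos, end == std::string::npos ? end : end - pos);
            }
        }
        return "";
    }
};

// Hypothetical second backend, standing in for a wrapper around a
// third-party library. A real wrapper would call into that library here;
// this placeholder simply reuses the scanner above.
struct wrapped_backend {
    static const char* name() { return "wrapped"; }
    static std::string root_name(const std::string& xml) {
        return standalone_backend::root_name(xml);
    }
};

// The only type a user touches. The backend is a defaulted template
// argument, so `document<>` works without the user making any choice.
template <class Backend = standalone_backend>
class document {
public:
    explicit document(std::string xml) : xml_(std::move(xml)) {}

    std::string root_name() const { return Backend::root_name(xml_); }
    static const char* backend_name() { return Backend::name(); }

private:
    std::string xml_;
};

} // namespace backend_sketch
```

A user would write `backend_sketch::document<> doc("<svg/>");` and never mention a backend; a developer testing the API against a second implementation would write `backend_sketch::document<backend_sketch::wrapped_backend>` and expect identical results.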
Having said that, with the proper defaults, the user does not have to do anything. Only if he wants to do something different does he need to include another header, pass an extra argument, or whatever. This is how the rest of Boost handles variation. Why has this suddenly become much harder?

It hasn't, and when expressed that way, I actually agree. What I don't agree with is this:
Start with an XML lexer. This simply returns the next token (start tag, attribute, data, etc.) when called.
Put the XML lexer in a loop, and you get a SAX parser.
Pair the XML lexer with a parent stack, and you get an XmlReader.
Base the DOM parser on the SAX parser to create its tree. This is how libxml2 does it, and how it reuses the tree generator for parsing other formats such as HTML and DocBook.
By default, I would provide our own tree, although this is not terribly important.

While the layering you describe pretty much matches a typical implementation, this doesn't have any consequences for users, as these layers can't be exchanged. You can't mix a layer from one backend and combine it with another layer from a different backend. So why care, on an API level?
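For readers following along, the first three layers quoted above (lexer, SAX loop, reader with a parent stack) can be sketched roughly as follows. This is a toy under stated assumptions, not how libxml2 or any proposed boost.xml backend does it: all names (`layer_sketch`, `lexer`, `sax_parse`, `reader`, `element_counter`) are hypothetical, and the lexer ignores attributes, comments, CDATA, entities and empty-element tags.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace layer_sketch {

enum class token_kind { start_tag, end_tag, text, end_of_input };

struct token {
    token_kind kind;
    std::string value;  // element name, or character data for text tokens
};

// Layer 1: the lexer returns the next token each time it is called.
class lexer {
public:
    explicit lexer(std::string input) : in_(std::move(input)) {}

    token next() {
        if (pos_ >= in_.size()) return {token_kind::end_of_input, ""};
        if (in_[pos_] == '<') {
            std::size_t close = in_.find('>', pos_);
            if (close == std::string::npos) close = in_.size();  // unterminated tag
            const bool is_end = pos_ + 1 < close && in_[pos_ + 1] == '/';
            const std::size_t name_begin = pos_ + (is_end ? 2 : 1);
            std::string name = in_.substr(name_begin, close - name_begin);
            pos_ = close < in_.size() ? close + 1 : close;
            return {is_end ? token_kind::end_tag : token_kind::start_tag, name};
        }
        std::size_t lt = in_.find('<', pos_);
        if (lt == std::string::npos) lt = in_.size();
        std::string data = in_.substr(pos_, lt - pos_);
        pos_ = lt;
        return {token_kind::text, data};
    }

private:
    std::string in_;
    std::size_t pos_ = 0;
};

// Layer 2: put the lexer in a loop and push each token to a handler (SAX style).
template <class Handler>
void sax_parse(const std::string& input, Handler& handler) {
    lexer lex(input);
    for (token t = lex.next(); t.kind != token_kind::end_of_input; t = lex.next()) {
        switch (t.kind) {
            case token_kind::start_tag:    handler.start_element(t.value); break;
            case token_kind::end_tag:      handler.end_element(t.value);   break;
            case token_kind::text:         handler.characters(t.value);    break;
            case token_kind::end_of_input: break;
        }
    }
}

// Layer 3: pair the lexer with a parent stack to get a pull-style reader.
class reader {
public:
    explicit reader(std::string input) : lex_(std::move(input)) {}

    // Advances to the next token; returns false at end of input.
    bool read() {
        current_ = lex_.next();
        if (current_.kind == token_kind::start_tag) {
            parents_.push_back(current_.value);
        } else if (current_.kind == token_kind::end_tag && !parents_.empty()) {
            parents_.pop_back();
        }
        return current_.kind != token_kind::end_of_input;
    }

    const token& current() const { return current_; }
    std::size_t depth() const { return parents_.size(); }  // number of open elements

private:
    lexer lex_;
    token current_{token_kind::end_of_input, ""};
    std::vector<std::string> parents_;
};

// Example SAX handler: counts elements and concatenates character data.
struct element_counter {
    int elements = 0;
    std::string text;
    void start_element(const std::string&) { ++elements; }
    void end_element(const std::string&) {}
    void characters(const std::string& data) { text += data; }
};

} // namespace layer_sketch
```

A tree builder would be one more handler passed to `sax_parse`. Note that both upper layers here are hard-wired to this one `lexer`, which is the situation described above: the layers exist inside a backend, but nothing lets a user swap one of them for a layer from another backend.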
I believe your point was that you want to be able to implement only the "XML lexer", but neither the SAX nor DOM APIs, and still be able to call the result "boost.xml", yes? I still think this is a bad idea. Otherwise, as long as the full functionality is provided, I don't care about the implementation, and in particular, whether someone fancies rewriting it "natively" instead of building on top of existing third-party libs.
Stefan