[boost] RFC: Boost.XML API prototype in the sandbox

12 Jul 2007

Peter Dimov wrote:
...
Phil Endecott wrote:
...
overhead of about 150 bytes per node.
...
After such an experience my thoughts would be less oriented towards changing 
the XML in-memory class and more towards refactoring the application to not 
build the entire XML document in memory.
Yes - but...
...
this would likely mean no XSLT.
..which was exactly what I needed to do.
That brings up another question; which of these approaches do people prefer:

- Use a C++ XML library that runs on a libxml2 backend, so that it can 
also use libxslt to do XSLT transformations.

- Use a standalone C++ XML library that is incompatible with libxslt, 
and instead do XSLT-like transformations in C++.

This question brings us back a bit closer to the features of Stefan's 
proposal (which I think could be extended to do XSLT using libxslt).  I 
think that if I were starting another project of the sort that I 
described before I would probably avoid XSLT - with hindsight I 
overstretched it.  But what would ideal C++ XML transforming (or just 
XML reading) code look like?  As gchen writes: "creating xml does't 
seem a big problem, but writing xml-reader [..] is a time-consuming 
task".  Boost.Spirit can match things; can we use something vaguely 
like Spirit syntax to match XML fragments, and define actions to apply 
to them?

rule input = * person;

rule person = element("person")(name, birth, father, mother)
                                              // meaning all needed but in any order
               [                              // this is a Spirit-style "action" [].
                 return h::html[ h::body [    // these are declarative-XML [].
                                              // h:: is an html element namespace
                   h::h1("Timeline for "+_1), // _1 refers to 'name' above; did Spirit 2 add this?
                   _2, _3, _4
                 ] ]
               ];

rule name = element("name")(firstname, surname)
             [
               return x::textnode(firstname+" "+surname);
             ];

etc. etc.

Maybe that could be made to work, but writing out the example above has 
made me a bit less optimistic about it.  How would it compare with XSLT 
in terms of capabilities, performance, syntax, and so on?

Here's another approach.  Say I have

<library>
   <document>
     <author>..interesting stuff..</author>
     ...lots and lots of uninteresting stuff...
   </document>
   ...more documents...
</library>

and I just want to extract the authors' names.  So, I start off by 
parsing it into a tree of generic xml elements, and I then (somehow) 
convert those element objects into element-name-specific subclasses.  
Or maybe I parse directly into the subclasses, it doesn't matter.  
These subclasses implement an extract_authors virtual method; for the 
library and document classes, they recurse into their children, for 
author it returns the content, and for all other subclasses it returns 
without doing anything.  So I can just call root.extract_authors().
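
A minimal sketch of that virtual-method idea, assuming the tree has already 
been converted into element-name-specific subclasses.  All the names here 
(element, container, other, make, add_document, sample_authors) are 
hypothetical and only for illustration; nothing is taken from an existing 
library, and the tree is built by hand rather than parsed.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class for every parsed element.
struct element {
    std::vector<std::unique_ptr<element>> children;
    std::string text;  // character content, if any
    virtual ~element() = default;
    // Default: uninteresting elements return without doing anything.
    virtual void extract_authors(std::vector<std::string>&) const {}
};

// library and document simply recurse into their children.
struct container : element {
    void extract_authors(std::vector<std::string>& out) const override {
        for (const auto& c : children) c->extract_authors(out);
    }
};
struct library : container {};
struct document : container {};

// author returns its content.
struct author : element {
    void extract_authors(std::vector<std::string>& out) const override {
        out.push_back(text);
    }
};

// Stand-in for "all other subclasses"; inherits the no-op.
struct other : element {};

// Helper (assumption): build a node of type T with the given text.
template <class T>
std::unique_ptr<element> make(const std::string& text = "") {
    std::unique_ptr<element> e(new T);
    e->text = text;
    return e;
}

// Helper (assumption): append one <document> with an author and some filler.
void add_document(element& lib, const std::string& name) {
    std::unique_ptr<element> doc = make<document>();
    doc->children.push_back(make<author>(name));
    doc->children.push_back(make<other>("uninteresting stuff"));
    lib.children.push_back(std::move(doc));
}

// Builds a two-document library by hand and collects the authors.
std::vector<std::string> sample_authors() {
    library root;
    add_document(root, "A. Author");
    add_document(root, "B. Writer");
    std::vector<std::string> out;
    root.extract_authors(out);
    return out;
}
```

The obvious cost is that every new query needs another virtual method on 
every class, which is where a visitor or an XPath-like matcher might start 
to look more attractive.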

Peter Dimov also wrote:
...
Doing an XSL transform on a "virtual" document would require an abstract node interface that you 
implement on top of your existing data to provide an XML view for it
I wonder if any serialisation or introspection experts have any 
suggestions?  I think someone else has also mentioned using XPath-like 
expressions for exploring non-XML tree structures.
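
To make the "abstract node interface" idea concrete, here is a rough sketch 
of what such a view might look like.  Everything in it is an assumption made 
for illustration: node_view, text_view, person_view, to_xml and the person 
struct are invented names, and no existing XSLT processor consumes an 
interface like this.  It only shows existing data being presented as a tree 
of named nodes without building an XML document first.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical read-only node interface; not from any existing library.
struct node_view {
    virtual ~node_view() = default;
    virtual std::string name() const = 0;
    virtual std::string text() const = 0;
    virtual std::size_t child_count() const = 0;
    virtual std::unique_ptr<node_view> child(std::size_t i) const = 0;
};

// Existing application data, with no XML in sight.
struct person {
    std::string name;
    int birth_year;
};

// Leaf view: an element with text content and no children.
struct text_view : node_view {
    std::string tag, value;
    text_view(std::string t, std::string v)
        : tag(std::move(t)), value(std::move(v)) {}
    std::string name() const override { return tag; }
    std::string text() const override { return value; }
    std::size_t child_count() const override { return 0; }
    std::unique_ptr<node_view> child(std::size_t) const override {
        return nullptr;
    }
};

// Presents a person as <person><name>..</name><birth>..</birth></person>.
// Child views are created on demand; nothing is stored as XML.
struct person_view : node_view {
    const person& p;
    explicit person_view(const person& p_) : p(p_) {}
    std::string name() const override { return "person"; }
    std::string text() const override { return ""; }
    std::size_t child_count() const override { return 2; }
    std::unique_ptr<node_view> child(std::size_t i) const override {
        if (i == 0)
            return std::unique_ptr<node_view>(new text_view("name", p.name));
        return std::unique_ptr<node_view>(
            new text_view("birth", std::to_string(p.birth_year)));
    }
};

// A generic consumer that sees only the interface.  No escaping is done,
// so this is not a real serialiser.
std::string to_xml(const node_view& n) {
    std::string out = "<" + n.name() + ">" + n.text();
    for (std::size_t i = 0; i < n.child_count(); ++i)
        out += to_xml(*n.child(i));
    return out + "</" + n.name() + ">";
}
```

A transform engine or an XPath-like matcher written against node_view would 
then work on any data for which someone has written a view, which I think is 
the point Peter was making.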

Regards,

Phil.