Re: [boost] [ANN] libstudxml - modern XML API for C++

21 May 2014

Hi Stefan,

In gmane.comp.lib.boost.devel you write:
...
> Does it support a DOM-like API, i.e. an in-memory representation of the
> document?
No, it does not. I spent quite a bit of time on the in-memory vs
streaming debate in my talk. How I wish the video was already
available...

Until then, to summarize the key points:

* Most people think they need DOM. I believe that is not because in-memory
  is conceptually better but because the existing streaming APIs (like
  SAX) are really awful and inconvenient. So I tried to convince the
  audience that a well-designed streaming pull API is actually sufficient
  for the majority of cases. I didn't hear many objections.

  Take a look at the API Introduction[1], it shows how to handle everything
  from converters/filters that don't care about the data, to applications
  that process the data without creating any kind of in-memory object
  model, to C++ classes that know how to persist themselves in XML.

* On that last point (C++ class persistence): a lot of applications
  extract XML data into some kind of object model (C++ classes that
  correspond to the XML vocabulary). Creating an intermediate
  representation of XML (DOM) just to throw it away moments later
  seems kind of pointless.

* Of course there will always be applications that need to revisit
  the bulk of raw XML data, and for them in-memory will probably
  always be the better choice.

* Which brings us to this point: it is easy to go from streaming to
  in-memory but not the other way around.

* In fact, an even better approach would be to support hybrid, partially
  streaming/partially in-memory parsing and serialization (also discussed
  in the talk). Fully in-memory would then simply be a special case.

* libstudxml has the ‘hybrid’ example which shows how to implement this
  hybrid approach. You would be shocked how short and simple the code
  is (I know I was once I wrote it ;-)).
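
To make the points above a bit more concrete, here is a minimal sketch.
It deliberately does not use the libstudxml API: toy_pull_parser, token,
person, and build_tree are hypothetical names invented for illustration,
and the "parser" just walks a pre-tokenized vector. It shows a class
constructing itself straight from pull events, and how little code it
takes to build an in-memory tree on top of the same stream.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a pull parser: the caller asks for the next event.
// This is NOT the libstudxml API, only an illustration of the pull style.
enum class event { start, text, end, eof };

// For start/end events, value is the element name; for text, the content.
struct token { event type; std::string value; };

class toy_pull_parser {
public:
  explicit toy_pull_parser(std::vector<token> t) : tokens_(std::move(t)) {}

  token next() {
    return pos_ < tokens_.size() ? tokens_[pos_++] : token{event::eof, ""};
  }

  // Consume the next event and check that it is what the caller expects.
  std::string expect(event e, const std::string& name = "") {
    token t = next();
    if (t.type != e || (e != event::text && t.value != name))
      throw std::runtime_error("unexpected XML event");
    return t.value;
  }

private:
  std::vector<token> tokens_;
  std::size_t pos_ = 0;
};

// A class that knows how to load itself from XML; no intermediate DOM.
struct person {
  std::string name;
  int age = 0;

  explicit person(toy_pull_parser& p) {
    p.expect(event::start, "person");
    p.expect(event::start, "name");
    name = p.expect(event::text);
    p.expect(event::end, "name");
    p.expect(event::start, "age");
    age = std::stoi(p.expect(event::text));
    p.expect(event::end, "age");
    p.expect(event::end, "person");
  }
};

// Generic in-memory node, for fragments that need to be revisited.
struct node {
  std::string name, text;
  std::vector<node> children;
};

// Streaming -> in-memory: build a tree for one element.
// Assumes the start event of that element has already been consumed.
node build_tree(toy_pull_parser& p, const std::string& name) {
  node n{name, "", {}};
  for (;;) {
    token t = p.next();
    if (t.type == event::start)
      n.children.push_back(build_tree(p, t.value));
    else if (t.type == event::text)
      n.text += t.value;
    else if (t.type == event::end)
      return n;
    else
      throw std::runtime_error("unexpected end of input");
  }
}
```

In this sketch a hybrid reader would call expect() for the parts it
streams and build_tree() only for the fragments it needs to keep around.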

[1] http://www.codesynthesis.com/projects/libstudxml/doc/intro.xhtml#2
...
> I have always strongly argued against the idea that an "XML API" was
> only about parsing XML data, as there are many useful features that
> involve manipulation of XML data (including transformations between
> documents, xpath-based search, etc.).
You need to start somewhere. And support for (relatively) low-level XML
parsing and serialization seems like a good place.
...
> In fact, I believe such an API should be robust enough to be able to
> wrap different backends, rather than depending on a particular
> implementation choice.
I don't think it will be robust. I think it will be awful and inconvenient.
Try adapting a straight SAX API to anything other than callback-based with
inversion of control (i.e., SAX again).
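
One way to see the asymmetry is the sketch below. It uses toy types only
(toy_push_parse, toy_pull_parser, pull_to_push, and push_to_pull are
hypothetical names, not part of SAX or libstudxml) and assumes a
single-threaded adapter with no coroutines. Feeding callbacks from a pull
source is a plain loop, while pulling from a callback source forces the
adapter to buffer every event first.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class event { start, text, end };
struct token { event type; std::string value; };

// Push (SAX-like) source: it owns the loop and calls back into user code.
using callback = std::function<void(const token&)>;

void toy_push_parse(const std::vector<token>& doc, const callback& cb) {
  for (const token& t : doc) cb(t);
}

// Pull source: the caller owns the loop.
class toy_pull_parser {
public:
  explicit toy_pull_parser(std::vector<token> d) : doc_(std::move(d)) {}

  // Returns false once the input is exhausted.
  bool next(token& out) {
    if (pos_ == doc_.size()) return false;
    out = doc_[pos_++];
    return true;
  }

private:
  std::vector<token> doc_;
  std::size_t pos_ = 0;
};

// Pull -> push: a plain loop, nothing is buffered.
void pull_to_push(toy_pull_parser& p, const callback& cb) {
  token t;
  while (p.next(t)) cb(t);
}

// Push -> pull: without threads or coroutines, the adapter has to collect
// every event before the caller can pull the first one.
toy_pull_parser push_to_pull(const std::vector<token>& doc) {
  std::vector<token> buffered;
  toy_push_parse(doc, [&](const token& t) { buffered.push_back(t); });
  return toy_pull_parser(std::move(buffered));
}
```

Note that push_to_pull has in effect built an in-memory copy of the whole
document, which is exactly what a streaming API was supposed to avoid.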

Boris

Boris Kolpackov