
Sebastian Redl wrote:
Robert Ramey wrote:
Given the availability of Spirit, Dan Nuffer's XML grammar, and probably some sort of tree structure (Adobe, multi-index, or roll your own from STL), I'm very surprised no one has submitted a DOM and/or SAX XML parser. It seems to me that this would be a straightforward composition of these three high-quality components.
Not quite. The fourth, and very important, high-quality component that is missing is Boost.Unicode (or Boost.Recode or whatever). XML requires a lot of support for international character sets, at the very, very least UTF-8 and UTF-16, BE and LE. In practice, this means passing Spirit iterators that convert the encoding automatically. Do we have such iterators?
I had to make my own such iterators in order to make xml_archives. These are described in the serialization library under the misc/dataflow iterators. I'm not sure what BE or LE refer to. But I did address UTF-16, and UTF-8 was addressed with Ron Garcia's UTF-8 codecvt facet. This is also documented in the misc section of the serialization library. A "complete unicode solution" has been discussed here at some length. The requirements kept growing until the job was undoable. Fortunately, I don't think a "complete unicode solution" is required to make a serviceable XML parser that we can build on.
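For context, BE and LE in the exchange above denote big-endian and little-endian byte order for UTF-16 code units. The following is only a minimal sketch of the kind of conversion a converting iterator would have to perform, written as a plain function rather than an iterator. The names `ByteOrder` and `decode_utf16` are hypothetical and are not part of Boost, Spirit, or the serialization library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical names for illustration only; not a Boost or Spirit API.
enum class ByteOrder { Big, Little };

// Decode raw UTF-16 bytes into Unicode code points.
// Throws std::runtime_error on malformed input.
std::vector<std::uint32_t> decode_utf16(const std::string& bytes, ByteOrder order) {
    if (bytes.size() % 2 != 0) {
        throw std::runtime_error("odd number of bytes in UTF-16 input");
    }

    // Combine two bytes into one 16-bit code unit according to the byte order.
    auto unit_at = [&](std::size_t i) -> std::uint32_t {
        const std::uint32_t b0 = static_cast<unsigned char>(bytes[i]);
        const std::uint32_t b1 = static_cast<unsigned char>(bytes[i + 1]);
        return order == ByteOrder::Big ? (b0 << 8) | b1 : (b1 << 8) | b0;
    };

    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        std::uint32_t u = unit_at(i);
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 4 > bytes.size()) {
                throw std::runtime_error("truncated surrogate pair");
            }
            const std::uint32_t lo = unit_at(i + 2);
            if (lo < 0xDC00 || lo > 0xDFFF) {
                throw std::runtime_error("high surrogate without low surrogate");
            }
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;  // skip the low surrogate just consumed
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        out.push_back(u);
    }
    return out;
}
```

The same two bytes decode to different characters depending on the byte order, which is why an XML parser has to know (or detect) which of the two it is reading before handing characters to the grammar.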
If I had that, I could start in the sandbox right now in my free time.
Great - when do you think you'll be done?

Robert Ramey