XML

Adam Badura

8 Jan 2008

10:29 p.m.

It seems to me that Boost lacks a "typical XML parser" (by that I mean something offering DOM and SAX parsing as well as validation and maybe some other features). I did not look at Boost.Serialization that much, but I suspect that it does not offer such features. Why is that? Are there technical reasons? Or maybe "political" reasons? Or perhaps simply no one has done it? Adam Badura


Jonathan Turkanis

8 Jan

10:46 p.m.

Adam Badura wrote:

...

It seems to me that Boost lacks a "typical XML parser" (by that I mean something offering DOM and SAX parsing as well as validation and maybe some other features). I did not look at Boost.Serialization that much, but I suspect that it does not offer such features. Why is that? Are there technical reasons? Or maybe "political" reasons? Or perhaps simply no one has done it?

There is extensive discussion of this issue in the archives. Try searching for "Boost.Xml"

...

Adam Badura

-- Jonathan Turkanis CodeRage http://www.coderage.com

Adam Badura

10:52 p.m.

Oh... I had not looked in the archives. My mistake. Adam Badura

Stefan Seefeld

11:11 p.m.

Jonathan Turkanis wrote:

...

Adam Badura wrote:

...
It seems to me that Boost lacks a "typical XML parser" (by that I mean something offering DOM and SAX parsing as well as validation and maybe some other features). I did not look at Boost.Serialization that much, but I suspect that it does not offer such features. Why is that? Are there technical reasons? Or maybe "political" reasons? Or perhaps simply no one has done it?

There is extensive discussion of this issue in the archives. Try searching for "Boost.Xml"

...and then check out the boost.xml sandbox project:

http://svn.boost.org/trac/boost/browser/sandbox/xml

I'd be glad to get some motivation to work on it some more. :-)

Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

Adam Badura

11:17 p.m.

...

...and then check out the boost.xml sandbox project:

http://svn.boost.org/trac/boost/browser/sandbox/xml

I'd be glad to get some motivation to work on it some more. :-)

It may seem like a stupid question, but how do I download this? I looked for an appropriate link (that seemed most logical) but did not find any. How many people work on this project? Adam Badura

Stefan Seefeld

11:25 p.m.

Adam Badura wrote:

...

...
...and then check out the boost.xml sandbox project:

http://svn.boost.org/trac/boost/browser/sandbox/xml

I'd be glad to get some motivation to work on it some more. :-)

It may seem like a stupid question, but how do I download this? I looked

You need a Subversion client; then check out the code from this URL: http://svn.boost.org/svn/boost/sandbox/xml

...

for an appropriate link (that seemed most logical) but did not find any. How many people work on this project?

I have written it alone, but I'm happy to collaborate. It's a relatively thin layer on top of libxml2 (http://xmlsoft.org/) that offers a DOM-like and an XMLReader-like interface. Look at the examples (http://svn.boost.org/trac/boost/browser/sandbox/xml/libs/xml/example) to see the functionality that is already implemented. Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin...
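An "XMLReader-like" interface is a pull model: the caller advances a cursor one node at a time, rather than receiving callbacks (SAX) or a complete tree (DOM). The following is a minimal sketch of that shape in standard C++17, assuming a tiny non-validating subset of XML (elements and text only; attributes are skipped, and entities, comments, CDATA and processing instructions are not handled). The class `pull_reader` and its members are hypothetical names for illustration; they are not the boost.xml sandbox API, and libxml2's own reader is the C `xmlTextReader` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical toy pull reader over a tiny, non-validating XML subset.
// Attributes are skipped; entities, comments, CDATA and PIs are not handled.
class pull_reader {
public:
    enum class node_type { none, start_element, end_element, text };

    explicit pull_reader(std::string_view doc) : doc_(doc) {}

    // Advances to the next node; returns false at end of input or on a
    // truncated tag.
    bool read() {
        if (pending_end_) {  // report the implicit end of <name/>
            pending_end_ = false;
            type_ = node_type::end_element;
            return true;
        }
        while (pos_ < doc_.size()) {
            if (doc_[pos_] == '<') {
                std::size_t close = doc_.find('>', pos_);
                if (close == npos) return false;  // truncated tag
                std::string_view tag = doc_.substr(pos_ + 1, close - pos_ - 1);
                pos_ = close + 1;
                value_ = {};
                if (!tag.empty() && tag.front() == '/') {  // </name>
                    type_ = node_type::end_element;
                    name_ = tag.substr(1);
                    return true;
                }
                pending_end_ = !tag.empty() && tag.back() == '/';
                if (pending_end_) tag.remove_suffix(1);
                name_ = tag.substr(0, tag.find_first_of(" \t\r\n"));  // drop attributes
                type_ = node_type::start_element;
                return true;
            }
            std::size_t next = doc_.find('<', pos_);
            if (next == npos) next = doc_.size();
            std::string_view text = doc_.substr(pos_, next - pos_);
            pos_ = next;
            if (text.find_first_not_of(" \t\r\n") != npos) {  // skip blank runs
                type_ = node_type::text;
                name_ = {};
                value_ = text;
                return true;
            }
        }
        type_ = node_type::none;
        return false;
    }

    node_type type() const { return type_; }
    std::string_view name() const { return name_; }    // element name
    std::string_view value() const { return value_; }  // text content

private:
    static constexpr std::size_t npos = std::string_view::npos;
    std::string_view doc_;
    std::size_t pos_ = 0;
    node_type type_ = node_type::none;
    std::string_view name_, value_;
    bool pending_end_ = false;
};
```

The caller drives the loop itself (`while (r.read()) { ... }`), which is what makes a pull interface convenient for streaming through large documents without first building a tree.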

Adam Badura

11:20 p.m.

...

There is extensive discussion of this issue in the archives. Try searching for "Boost.Xml"

I did some searching in the archives but did not find much. Yes, sure, there were a few discussions, but they were mainly wish lists and arguments about which technology and methodology would be best. I did not find any actual reason why the library is still not in Boost. It seems, however (after Stefan's post), that some work is being done on this subject right now. Good to hear (read) that. Adam Badura

Barco You

9 Jan

3:14 a.m.

Hi, there are Xerces and miniXML... I think that's the real reason: there is no point in so much redundancy. :) On 1/9/08, Adam Badura <abadura@o2.pl> wrote:

...

It seems to me that Boost lacks a "typical XML parser" (by that I mean something offering DOM and SAX parsing as well as validation and maybe some other features). I did not look at Boost.Serialization that much, but I suspect that it does not offer such features. Why is that? Are there technical reasons? Or maybe "political" reasons? Or perhaps simply no one has done it?

Adam Badura

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

-- ------------------------------- Enjoy life! Barco You

Phil Endecott

12:09 p.m.

Barco You wrote:

...

There are Xerces and miniXML... I think that's the real reason: there is no point in so much redundancy. :)

There's also RapidXML by Marcin Kalicinski (Boost license), which I wasn't aware of when Stefan presented his libxml2-based library:

http://rapidxml.sourceforge.net/
http://rapidxml.sourceforge.net/manual.html

Quote: "RapidXml is an attempt to create the fastest XML DOM parser possible, while retaining useability, portability and reasonable W3C compatibility. It is an in-situ parser written in C++, with parsing speed approaching that of strlen() function executed on the same data."

It achieves its high performance, IIUC, by not copying the XML as it parses; instead it records pointers into the source text. This is an approach that I have used with other data formats (I recently mentioned a const_string_facade class that I have written for this), and it works well for me.

It would be great to see some real-life feature-set, performance and usability comparisons of this approach and a more traditional parser. (Actually there are some numbers in the rapidxml manual linked above, but they don't include libxml2.)

Regards, Phil.
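To make the in-situ idea concrete, here is a minimal sketch (our own illustration, not RapidXML's code) that parses a single start tag without copying any character data: every name and value is a `std::string_view` pointing into the caller's buffer, so the buffer must outlive the result. It assumes double-quoted attribute values and performs no entity decoding or validation; `parse_start_tag`, `start_tag` and `attribute` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Every string_view below points into the caller's buffer: no character
// data is copied, so the buffer must outlive the parse result.
struct attribute {
    std::string_view name;
    std::string_view value;  // without the surrounding quotes
};

struct start_tag {
    std::string_view name;
    std::vector<attribute> attributes;
};

namespace detail_sketch {
constexpr std::string_view ws = " \t\r\n";

// Returns a narrower view of s; never copies.
std::string_view trim(std::string_view s) {
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string_view::npos) return {};
    std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}
}  // namespace detail_sketch

// Parses one start tag such as <node id="42" kind="leaf"/>.
// Assumes double-quoted attribute values; entities are left undecoded.
bool parse_start_tag(std::string_view src, start_tag& out) {
    using detail_sketch::ws;
    constexpr auto npos = std::string_view::npos;
    if (src.size() < 2 || src.front() != '<') return false;
    std::size_t end = src.find('>');
    if (end == npos) return false;
    std::string_view body = src.substr(1, end - 1);  // between '<' and '>'
    if (!body.empty() && body.back() == '/') body.remove_suffix(1);

    std::size_t name_end = body.find_first_of(ws);
    out.name = body.substr(0, name_end);
    out.attributes.clear();
    if (name_end == npos) return !out.name.empty();

    std::string_view rest = body.substr(name_end);
    std::size_t pos = 0;
    while (true) {
        std::size_t eq = rest.find('=', pos);
        if (eq == npos) break;
        std::size_t open = rest.find('"', eq + 1);
        if (open == npos) break;
        std::size_t close = rest.find('"', open + 1);
        if (close == npos) break;
        out.attributes.push_back({detail_sketch::trim(rest.substr(pos, eq - pos)),
                                  rest.substr(open + 1, close - open - 1)});
        pos = close + 1;
    }
    return !out.name.empty();
}
```

A real in-situ parser also has to handle entity references such as `&amp;`, which cannot be decoded into a read-only view of the source; this sketch simply leaves them undecoded.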

Richard Webb

12:39 p.m.

Phil Endecott <spam_from_boost_dev <at> chezphil.org> writes:

...

Barco You wrote:

...
There are Xerces and miniXML... I think that's the real reason: there is no point in so much redundancy. :)

There's also RapidXML by Marcin Kalicinski (Boost license), which I wasn't aware of when Stefan presented his libxml2-based library:

I've been doing a bit of testing recently with Arabica (http://www.jezuk.co.uk/cgi-bin/view/arabica). It's a bit more 'heavy duty' than some of the previously mentioned libraries, but it can be configured to use a number of different XML parsers, including libxml2, Xerces and MSXML. It also uses Boost internally and has a BSD-style license.

Stefano Delli Ponti

1:28 p.m.

Phil Endecott wrote:

...

Barco You wrote:

...
There are Xerces and miniXML... I think that's the real reason: there is no point in so much redundancy. :)

There's also RapidXML by Marcin Kalicinski (Boost license), which I wasn't aware of when Stefan presented his libxml2-based library:

http://rapidxml.sourceforge.net/ http://rapidxml.sourceforge.net/manual.html

By the way, it seems that RapidXML will be included, without much fanfare, in the upcoming 1.35, as a detail of the property-tree library. Is this correct? Sted

Jose

3:17 p.m.

Hi, I have used both RapidXML and pugixml (which inspired RapidXML, http://code.google.com/p/pugixml/), and they are the best C++ XML libraries I have found (if you don't need a validating XML parser). RapidXML is (or will be) supported by the Boost property-tree library, but that does not mean it can be included without a review. It would be great if the authors could put forward one of the libraries (or a combined one) for a Boost review. It would be a great addition! Regards, Jose On Jan 9, 2008 2:28 PM, Stefano Delli Ponti <stefano.delliponti@gmail.com> wrote:

...

Phil Endecott wrote:

...
Barco You wrote:

...
There are Xerces and miniXML... I think that's the real reason: there is no point in so much redundancy. :)

There's also RapidXML by Marcin Kalicinski (Boost license), which I wasn't aware of when Stefan presented his libxml2-based library:

http://rapidxml.sourceforge.net/ http://rapidxml.sourceforge.net/manual.html

By the way, it seems that RapidXML will be included, without much fanfare, in the upcoming 1.35, as a detail of the property-tree library. Is this correct?

Sted


Stefan Seefeld

2:01 p.m.

Phil Endecott wrote:

...

It would be great to see some real-life feature-set, performance and usability comparisons of this approach and a more traditional parser. (Actually there are some numbers in the rapidxml manual linked above, but they don't include libxml2).

Yes, being able to compare side by side would certainly help. Please note that my goal in writing the boost.xml API was not to endorse one particular backend API or another, but rather to use an existing library (since, as we have discussed numerous times, reinventing the wheel would be rather naive) and hook it up to a *backend-independent* API. The API itself must not rely on any backend-specific details! Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin...
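The requirement that the public API must not expose backend details can be illustrated with a small abstract interface: client code is written only against the interface, and each backend (libxml2, Xerces, RapidXML, ...) would sit behind its own implementation. This is a minimal sketch under that assumption; the names `xml_node`, `count_elements` and `toy_node` are hypothetical and are not the actual boost.xml sandbox API, and the toy backend is a plain in-memory tree standing in for a real parser.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Backend-neutral, read-only view of an element (hypothetical interface).
class xml_node {
public:
    virtual ~xml_node() = default;
    virtual std::string name() const = 0;
    virtual std::size_t child_count() const = 0;
    virtual const xml_node& child(std::size_t i) const = 0;
};

// Client code: sees only xml_node, never a backend type.
std::size_t count_elements(const xml_node& n, const std::string& wanted) {
    std::size_t total = (n.name() == wanted) ? 1 : 0;
    for (std::size_t i = 0; i < n.child_count(); ++i)
        total += count_elements(n.child(i), wanted);
    return total;
}

// One possible backend: a plain in-memory tree. A libxml2- or Xerces-based
// backend would wrap that library's node type behind the same interface.
class toy_node : public xml_node {
public:
    explicit toy_node(std::string name) : name_(std::move(name)) {}

    // Appends a child and returns it, so trees can be built by chaining.
    toy_node& add(std::string child_name) {
        children_.push_back(std::make_unique<toy_node>(std::move(child_name)));
        return *children_.back();
    }

    std::string name() const override { return name_; }
    std::size_t child_count() const override { return children_.size(); }
    const xml_node& child(std::size_t i) const override { return *children_.at(i); }

private:
    std::string name_;
    std::vector<std::unique_ptr<toy_node>> children_;
};
```

A compile-time alternative would make the backend a template parameter instead of using virtual functions; either way, `count_elements` compiles without seeing any backend header.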

Stuart Dootson

11 Jan

8:20 p.m.

On 09/01/2008, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:

...

It would be great to see some real-life feature-set, performance and usability comparisons of this approach and a more traditional parser. (Actually there are some numbers in the rapidxml manual linked above, but they don't include libxml2).

Regards, Phil.

Phil, I did a quick performance test of libxml2 vs. RapidXML 1.1 today. I used a 12 MB XML file, which I pre-loaded before doing an in-memory parse with both libraries. RapidXML was repeatably 20x faster than libxml2. Scarily quick, in fact: it parsed my 12 MB file in about 100 ms... I still need to verify that both present the same set of nodes, attributes, etc., but it's a promising showing by RapidXML... Stuart Dootson
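The methodology described above (load the file first, then time only the in-memory parse) can be captured in a small harness. For scale, 12 MB in about 100 ms is roughly 120 MB/s, and a 20x gap implies on the order of 2 s for libxml2 on the same file. The sketch below is an illustration, not the actual test code: `load_file`, `time_parse` and `throughput_mb_per_s` are hypothetical helpers, the parser under test is passed in as any callable taking a `std::string&`, and each run parses a fresh copy because an in-situ parser such as RapidXML may modify its input buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file up front so that disk I/O is excluded from the
// timing. Returns an empty string if the file cannot be read.
std::string load_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// Times `parse` on an in-memory buffer and returns the fastest of `runs`
// runs in milliseconds. The copy is made outside the timed region.
template <class Parser>
double time_parse(const std::string& xml, Parser parse, int runs = 5) {
    double best = 1e300;
    for (int i = 0; i < runs; ++i) {
        std::string copy = xml;  // fresh input for parsers that write into it
        auto start = std::chrono::steady_clock::now();
        parse(copy);
        auto stop = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return best;
}

// Throughput in MB/s, taking 1 MB = 1e6 bytes.
double throughput_mb_per_s(std::size_t bytes, double ms) {
    return (bytes / 1e6) / (ms / 1e3);
}
```

In a real comparison the callable would invoke, for example, `rapidxml::xml_document<>::parse` on the buffer or libxml2's `xmlReadMemory`, and, as noted above, both results should be checked for the same nodes and attributes before the timings are compared.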


participants (9)

Adam Badura
Barco You
Jonathan Turkanis
Jose
Phil Endecott
Richard Webb
Stefan Seefeld
Stefano Delli Ponti
Stuart Dootson
