
Hi,

I saw there was some discussion on this list regarding an XML library. I recently wrote a C++ XML library as part of the work I do on, among other things, MRS, a full-text retrieval system for biological databanks. (See http://mrs.cmbi.ru.nl/ )

I extracted the XML and SOAP code from the MRS project and wrapped it into a new library. This library is called libzeep and I've put the code on Berlios (http://libzeep.berlios.de/). The license is Boost.

The library currently consists of:
- a SAX parser with validation support based on DTD.
- a simple XML library for nodes, elements, etc.
- an XML writing utility/wrapper.
- an xpath implementation

In addition, the code contains a full high-performance SOAP server implementation. This code generates all SOAP related code and WSDL generator code based on the signature of exported server methods. There is no need to run a separate code generating tool. Network code is based on asio and I've implemented an optional pre-forked mode of operation.

A previous version of libzeep has been in production use for over a year now, and I recently added the parser and xpath code to be able to drop libxml2 from the requirements for MRS.

The code comes with quite a few test sets. Documentation is not updated yet for the parser and xpath, but there is some for the SOAP part. I need to do a bit more testing before I can release a new version.

My question now is, would this library be of interest to Boost? Perhaps it can be used, or parts of it can be recycled. And if there is interest, I'm sure there must be suggestions for improvement.

You can browse the code at https://svn.cmbi.ru.nl/libzeep/trunk/ (There's also a copy at Berlios, but that may be out-of-date.) The application zeep-test is a simple SOAP server; in ./tests/ you can find two simple test drivers, one for the parser and one for xpaths.

Best regards,
-maarten hekkelman

Maarten L. Hekkelman wrote:
I wrote a C++ based XML library recently as part of the work I do on e.g. MRS, a full-text retrieval system for biological databanks. (See http://mrs.cmbi.ru.nl/ )
I extracted the XML and SOAP code from the MRS project and wrapped it into a new library. This library is called libzeep and I've put the code on Berlios (http://libzeep.berlios.de/). License is Boost.
The library currently consists of:
- a SAX parser with validation support based on DTD.
- a simple XML library for nodes, elements, etc.
- an XML writing utility/wrapper.
- an xpath implementation
Hi Maarten,

Very interesting. Just one observation for now. I see that you have a node class that points to its neighbours and parent. I think this is quite a common pattern for XML, but it differs from e.g. the standard containers. So it's not possible to use standard algorithms to iterate through your XML structure.

I am interested in how an "XML container concept" can be made to look more like a standard container. For example, are there any inherent reasons why the pattern that you have used is better suited to XML? Is there anything that it can do that is hard to express with e.g. a begin(), end(), iterator style?

Regards, Phil.

On 08-04-10 17:22, Phil Endecott wrote:
Very interesting.
Thanks.
Just one observation for now. I see that you have a node class that points to its neighbours and parent. I think this is quite a common pattern for XML, but it differs from e.g. the standard containers. So it's not possible to use standard algorithms to iterate through your XML structure.
I started with containers, but ended up with the current structure since it is so much more practical. You can still iterate in container style over children though. The member 'children()' of a node returns a std::list of pointers to children nodes. Therefore you can write code like:
foreach (xml::element* e, node->children<xml::element>()) ;

As I said, I started with simply nodes that had children in a member STL list. This seemed the /right/ way to do it. But then I ran into problems when I had to create processing instruction nodes, comment nodes and even attribute nodes. These are needed for a correct (and straightforward) xpath implementation. So I had to create a common base class, node, and element now is a subclass of node.

Fortunately, not all is lost. The children method above can be used for traversal, but it is even more convenient to use xpath.evaluate() to create a selection and iterate that:
foreach (xml::element* e, xpath("//my-node").evaluate<xml::element*>(n)) ;
Or even (as a shortcut):
foreach (xml::element* e, xmlDoc->find("//my-node")) ;
The find method of element always returns a set of elements, stripping out the other nodes.

Hope this explains the design a bit.
-maarten
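The design described above (a common node base class, with elements owning children of mixed kinds and a typed children<T>() accessor) can be sketched roughly as follows. This is only an illustration under assumptions: every name here (node, comment, element, append, count_named) is hypothetical and is not taken from libzeep's actual headers, and a plain range-for stands in for the foreach macro.

```cpp
#include <list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not libzeep's actual API.
struct node {
    virtual ~node() = default;  // polymorphic base, so dynamic_cast works
};

struct comment : node {
    explicit comment(std::string t) : text(std::move(t)) {}
    std::string text;
};

struct element : node {
    explicit element(std::string n) : name(std::move(n)) {}

    // Takes ownership of a child node of any kind and returns a reference to it.
    template <typename T>
    T& append(std::unique_ptr<T> child) {
        T& ref = *child;
        owned.push_back(std::move(child));
        return ref;
    }

    // Returns only the children whose dynamic type is T (or derived from T).
    template <typename T>
    std::list<T*> children() const {
        std::list<T*> result;
        for (const auto& c : owned)
            if (auto* p = dynamic_cast<T*>(c.get()))
                result.push_back(p);
        return result;
    }

    std::string name;
    std::vector<std::unique_ptr<node>> owned;
};

// Counts direct child elements with a given name, iterating in container style.
inline int count_named(const element& parent, const std::string& name) {
    int n = 0;
    for (element* e : parent.children<element>())
        if (e->name == name) ++n;
    return n;
}
```

Because children<T>() returns an ordinary std::list, standard algorithms can be applied to the result, even though the tree itself is not a standard container.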

Maarten L. Hekkelman wrote:
On 08-04-10 17:22, Phil Endecott wrote:
Just one observation for now. I see that you have a node class that points to its neighbours and parent. I think this is quite a common pattern for XML, but it differs from e.g. the standard containers. So it's not possible to use standard algorithms to iterate through your XML structure.
I started with containers, but ended up with the current structure since it is so much more practical.
You can still iterate in container style over children though. The member 'children()' of a node returns a std::list of pointers to children nodes. Therefore you can write code like:
foreach (xml::element* e, node->children<xml::element>()) ;
Ah, I missed that.
As I said, I started with simply nodes that had children in a member STL list. This seemed the /right/ way to do it. But then I ran into problems when I had to create processing instruction nodes, comment nodes and even attribute nodes. These are needed for a correct (and straightforward) xpath implementation. So I had to create a common base class, node, and element now is a subclass of node.
Fortunately, not all is lost. The children method above can be used for traversal, but it is even more convenient to use xpath.evaluate() to create a selection and iterate that:
foreach (xml::element* e, xpath("//my-node").evaluate<xml::element*>(n)) ;
So could your XPath implementation be decoupled? What interface does it use to traverse the XML tree? Could it be used on top of a different XML tree implementation, such as a lazy one that stores the text of the XML document, if it provided a similar interface? Regards, Phil.

On 08-04-10 19:23, Phil Endecott wrote:
So could your XPath implementation be decoupled? What interface does it use to traverse the XML tree? Could it be used on top of a different XML tree implementation, such as a lazy one that stores the text of the XML document, if it provided a similar interface?
I think most of it should work. However, the entire XPath specification is built upon the idea that everything is a node and a node does not have to be an element. That's why I changed my object hierarchy.
-maarten

On 04/08/2010 01:50 PM, Maarten L. Hekkelman wrote:
On 08-04-10 19:23, Phil Endecott wrote:
So could your XPath implementation be decoupled? What interface does it use to traverse the XML tree? Could it be used on top of a different XML tree implementation, such as a lazy one that stores the text of the XML document, if it provided a similar interface?
I think most of it should work. However, the entire XPath specification is built upon the idea that everything is a node and a node does not have to be an element. That's why I changed my object hierarchy.
For the boost.xml library I'm working on I plan to use something akin to boost.variant as the return type of an xpath query. I don't think that the XPath specification should dictate a type hierarchy on a C++ implementation. FWIW, Stefan -- ...ich hab' noch einen Koffer in Berlin...

On 04/08/10 12:58, Stefan Seefeld wrote: [snip]
For the boost.xml library I'm working on I plan to use something akin to boost.variant as the return type of an xpath query.
What does boost.variant lack that leads you to create something akin to it?
I don't think that the XPath specification should dictate a type hierarchy on a C++ implementation.
What is there about the XPath specification that makes any type hierarchy for modelling it less suitable than using something akin to boost.variant?

You see, I'm wondering because using type hierarchies and virtual functions has been touted as a great advantage of OO programming; yet, it apparently lacks something which you need. I'd like to understand what that is. Apparently you're not the only one that's found a similar lack, as shown by the following thread on the spirit-general ml around Mar 4, 2010: http://preview.tinyurl.com/y4fk5rf

Thanks Stefan.
-regards, Larry

On 04/09/2010 11:14 AM, Larry Evans wrote:
On 04/08/10 12:58, Stefan Seefeld wrote: [snip]
For the boost.xml library I'm working on I plan to use something akin to boost.variant as the return type of an xpath query.
What does boost.variant lack that leads you to create something akin to it?
Sorry, I'm not a native English speaker. By "akin to" I didn't mean to imply that it necessarily is something else. Just that boost.variant looks functionally like what I want, but that I haven't fully made up my mind about what the best interface is for this.
I don't think that the XPath specification should dictate a type hierarchy on a C++ implementation.
What is there about the XPath specification that makes any type hierarchy for modelling it less suitable than using something akin to boost.variant?
XPath queries may yield very different results, from mere integral numbers ("count(...)") to node-sets. I don't think it is meaningful or even possible to capture all those types in a single hierarchy (at least if by "hierarchy" we mean a common base class).
You see, I'm wondering because using type hierarchies and virtual functions has been touted as a great advantage of OO programming; yet, it apparently lacks something which you need.
Indeed. Not everything can be captured with OO. Especially if you take that to the extreme of a single-rooted type hierarchy. Stefan -- ...ich hab' noch einen Koffer in Berlin...
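To make the variant idea concrete, here is a minimal sketch of an XPath result type. It is an assumption-laden illustration, not the interface of boost.xml or libzeep: std::variant (C++17) stands in for boost.variant, and xml_node, node_set, xpath_result and describe are hypothetical names. The four alternatives follow the four basic value types of XPath 1.0 (node-set, boolean, number, string), which share no useful common interface.

```cpp
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

struct xml_node { std::string name; };           // hypothetical node type
using node_set = std::vector<const xml_node*>;   // hypothetical node-set type

// One alternative per XPath 1.0 value type; no common base class is needed.
using xpath_result = std::variant<node_set, bool, double, std::string>;

// Describes a result by visiting whichever alternative is currently held.
inline std::string describe(const xpath_result& r) {
    return std::visit([](const auto& v) -> std::string {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, node_set>)
            return "node-set of " + std::to_string(v.size());
        else if constexpr (std::is_same_v<T, bool>)
            return v ? "boolean true" : "boolean false";
        else if constexpr (std::is_same_v<T, double>)
            return "number";
        else
            return "string \"" + v + "\"";
    }, r);
}
```

A query such as count(...) would then yield the double alternative, while a path expression would yield the node_set alternative, and callers handle each case explicitly.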

Larry Evans wrote:
What is there about the XPath specification that makes any type hierarchy for modelling it less suitable than using something akin to boost.variant?
You see, I'm wondering because using type hierarchies and virtual functions has been touted as a great advantage of OO programming; yet, it apparently lacks something which you need. I'd like to understand what that is.
Some could argue that the point of a base class is moot if you have to downcast it to make anything useful with it, and that algebraic data types (variant-like things) are a much more elegant solution when you need to visit the different cases.

On 04/09/10 10:48, Mathias Gaunard wrote:
Larry Evans wrote:
What is there about the XPath specification that makes any type hierarchy for modelling it less suitable than using something akin to boost.variant?
You see, I'm wondering because using type hierarchies and virtual functions has been touted as a great advantage of OO programming; yet, it apparently lacks something which you need. I'd like to understand what that is.
Some could argue that the point of a base class is moot if you have to downcast it to make anything useful with it,
But boost.variant has to do the equivalent of downcasting based on the discriminant (the value returned by which()). You could argue variant does that automatically, but then it might throw an exception if the target of the assignment (e.g.) was the wrong type. Of course, you could check the discriminant before the assignment, but then, how is that different from what has to be done with virtual functions (using dynamic_cast)? Of course, one could argue that the variant library's apply_visitor does all this checking for you before sending the correct type to your actual visitor; however, apply_visitor is little different from the elements' virtual accept functions (using the term from the visitor pattern, http://en.wikipedia.org/wiki/Visitor_pattern ).
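The comparison drawn here can be sketched side by side. The following is an illustration under assumptions only: all type and function names are hypothetical, and std::visit (C++17) stands in for boost::apply_visitor. The first half is the classic visitor pattern with virtual accept functions; the second half dispatches on a variant's stored discriminant.

```cpp
#include <string>
#include <variant>

// --- Classic visitor pattern: a closed hierarchy with virtual accept() ---
struct text_node;
struct comment_node;

struct node_visitor {
    virtual ~node_visitor() = default;
    virtual void visit(const text_node&) = 0;
    virtual void visit(const comment_node&) = 0;
};

struct node_base {
    virtual ~node_base() = default;
    virtual void accept(node_visitor& v) const = 0;
};

struct text_node : node_base {
    void accept(node_visitor& v) const override { v.visit(*this); }
};

struct comment_node : node_base {
    void accept(node_visitor& v) const override { v.visit(*this); }
};

struct kind_visitor : node_visitor {
    std::string kind;
    void visit(const text_node&) override { kind = "text"; }
    void visit(const comment_node&) override { kind = "comment"; }
};

// Double dispatch: accept() selects the matching visit() overload.
inline std::string kind_via_accept(const node_base& n) {
    kind_visitor v;
    n.accept(v);
    return v.kind;
}

// --- Variant: unrelated types, dispatch on the stored discriminant ---
struct text_value {};
struct comment_value {};
using any_value = std::variant<text_value, comment_value>;

struct kind_functor {
    std::string operator()(const text_value&) const { return "text"; }
    std::string operator()(const comment_value&) const { return "comment"; }
};

inline std::string kind_via_visit(const any_value& v) {
    return std::visit(kind_functor{}, v);
}
```

Both versions route a call to the handler for the concrete type. The visible difference is that the variant alternatives need no common base class and no virtual functions of their own.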
and that algebraic data types (variant-like things) are a much more elegant solution when you need to visit the different cases.
I'm still not seeing it :( I thought algebraic data types were one thing OO programming did well. For example, a stack is an ADT and the stl library has a stack. AFAICT, every component in boost.variant is like an element role in the visitor_pattern.
-confusedly yours, Larry

On 04/09/2010 12:34 PM, Larry Evans wrote:
On 04/09/10 10:48, Mathias Gaunard wrote:
Larry Evans wrote:
What is there about the XPath specification that makes any type hierarchy for modelling it less suitable than using something akin to boost.variant?
You see, I'm wondering because using type hierarchies and virtual functions has been touted as a great advantage of OO programming; yet, it apparently lacks something which you need. I'd like to understand what that is.
Some could argue that the point of a base class is moot if you have to downcast it to make anything useful with it,
But boost.variant has to do the equivalent of downcasting based on discriminant (the value returned by which()).
No, accessing a member of a union is not really a cast, neither conceptually nor technically. Stefan -- ...ich hab' noch einen Koffer in Berlin...

Stefan Seefeld wrote:
On 04/09/2010 12:34 PM, Larry Evans wrote:
On 04/09/10 10:48, Mathias Gaunard wrote:
Some could argue that the point of a base class is moot if you have to downcast it to make anything useful with it,
But boost.variant has to do the equivalent of downcasting based on discriminant (the value returned by which()).
No, accessing a member of a union is not really a cast, neither conceptually nor technically.
Conceptually, they are doing the same: determining whether the requested type is appropriate and returning the appropriately typed value. However, dynamic_cast does so by comparing RTTI objects, which may be more expensive than the discriminator comparison, and then by applying fixups to the this pointer to account for MI, virtual bases, etc.
What's more, once the dynamic_cast has been done, the various function calls may still wind up being virtual, depending upon how things are implemented. When using Boost.Variant, the types used could have no virtual functions.
_____
Rob Stewart robert.stewart@sig.com
Software Engineer, Core Software using std::disclaimer;
Susquehanna International Group, LLP http://www.sig.com
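The two mechanisms being compared can be put next to each other in a small sketch. It rests on assumptions: the result classes and function names are hypothetical, and std::get_if on a std::variant (C++17) stands in for boost::get on a boost.variant. The hierarchy version recovers the concrete type through RTTI; the variant version compares the stored discriminant and the alternatives need no virtual functions.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hierarchy version: hypothetical result classes sharing an otherwise empty base.
struct result_base {
    virtual ~result_base() = default;
};

struct number_result : result_base {
    explicit number_result(double v) : value(v) {}
    double value;
};

struct string_result : result_base {
    explicit string_result(std::string v) : value(std::move(v)) {}
    std::string value;
};

// Uses RTTI to check whether the object really is a number_result.
inline std::optional<double> number_from_base(const result_base& r) {
    if (const auto* n = dynamic_cast<const number_result*>(&r))
        return n->value;
    return std::nullopt;
}

// Variant version: plain types with no virtual functions at all.
using result_variant = std::variant<double, std::string>;

// Checks the discriminant; returns the value only if a double is stored.
inline std::optional<double> number_from_variant(const result_variant& r) {
    if (const double* d = std::get_if<double>(&r))
        return *d;
    return std::nullopt;
}
```

Whether the RTTI comparison is measurably slower than the discriminant check depends on the compiler and the hierarchy, so this sketch makes no performance claim.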

On 04/09/2010 12:46 PM, Stewart, Robert wrote:
Stefan Seefeld wrote:
On 04/09/2010 12:34 PM, Larry Evans wrote:
On 04/09/10 10:48, Mathias Gaunard wrote:
Some could argue that the point of a base class is moot if you have to downcast it to make anything useful with it,
But boost.variant has to do the equivalent of downcasting based on discriminant (the value returned by which()).
No, accessing a member of a union is not really a cast, neither conceptually nor technically.
Conceptually, they are doing the same: determining whether the requested type is appropriate and returning the appropriately typed value.
Right, but that's not the same as casting (in the sense of narrowing down a more generic to a more specific type). The difference is that in one case both types have a common ancestor / super-type (implying a common interface of some form), while in the other case they don't.
However, dynamic_cast does so by comparing RTTI objects, which may be more expensive than the discriminator comparison, and then by applying fixups to the this pointer to account for MI, virtual bases, etc.
What's more, once the dynamic_cast has been done, the various function calls may still wind up being virtual, depending upon how things are implemented. When using Boost.Variant, the types used could have no virtual functions.
I'm not sure why you are saying all this. The point is really that there is no common interface that integral / numerical types share with nodes and node sets. Thus there is no sense in having these share some common interface that would justify a common base class, making a union-like accessor (such as boost.variant) the obvious accessor for XPath query return values. Stefan -- ...ich hab' noch einen Koffer in Berlin...

On 04/09/10 11:46, Stewart, Robert wrote:
Stefan Seefeld wrote:
On 04/09/2010 12:34 PM, Larry Evans wrote:
On 04/09/10 10:48, Mathias Gaunard wrote:
Some could argue that the point of a base class is moot if you have to downcast it to make anything useful with it, But boost.variant has to do the equivalent of downcasting based on discriminant (the value returned by which()). No, accessing a member of a union is not really a cast, neither conceptually nor technically.
Conceptually, they are doing the same: determining whether the requested type is appropriate and returning the appropriately typed value. However, dynamic_cast does so by comparing RTTI objects, which may be more expensive than the discriminator comparison, and then by applying fixups to the this pointer to account for MI, virtual bases, etc.
What's more, once the dynamic_cast has been done, the various function calls may still wind up being virtual, depending upon how things are implemented. When using Boost.Variant, the types used could have no virtual functions.
However if the operation to be performed is implemented as a virtual function, then no dynamic_casting is needed. Just call the virtual function. I thought this was one of the main selling points of virtual functions vs. using a switch statement to cover all elements in a type hierarchy and then performing the operation at the appropriate case clause.
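The contrast described here, a virtual function versus a switch over a type tag, can be sketched as follows. This is only an illustration: the node kinds, the serialize operation and all names are hypothetical choices made for the example, not part of any library discussed in this thread.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Virtual-function version: the operation lives in the hierarchy,
// so callers never need to know the concrete type.
struct xml_node {
    virtual ~xml_node() = default;
    virtual std::string serialize() const = 0;
};

struct text_node : xml_node {
    explicit text_node(std::string v) : value(std::move(v)) {}
    std::string serialize() const override { return value; }
    std::string value;
};

struct comment_node : xml_node {
    explicit comment_node(std::string v) : value(std::move(v)) {}
    std::string serialize() const override { return "<!--" + value + "-->"; }
    std::string value;
};

inline std::string serialize_all(const std::vector<std::unique_ptr<xml_node>>& nodes) {
    std::string out;
    for (const auto& n : nodes)
        out += n->serialize();  // no cast, no switch
    return out;
}

// Switch version: one struct with a tag, and the operation enumerates the cases.
enum class node_kind { text, comment };

struct tagged_node {
    node_kind kind;
    std::string value;
};

inline std::string serialize_tagged(const tagged_node& n) {
    switch (n.kind) {
        case node_kind::text:    return n.value;
        case node_kind::comment: return "<!--" + n.value + "-->";
    }
    return {};  // unreachable for valid tags
}
```

The virtual version works well when the operation is meaningful for every type in the hierarchy, which is exactly the condition under debate for XPath results.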

On 04/09/2010 01:50 PM, Larry Evans wrote:
On 04/09/10 11:46, Stewart, Robert wrote:
Stefan Seefeld wrote:
On 04/09/2010 12:34 PM, Larry Evans wrote:
On 04/09/10 10:48, Mathias Gaunard wrote:
Some could argue that the point of a base class is moot if you have to downcast it to make anything useful with it, But boost.variant has to do the equivalent of downcasting based on discriminant (the value returned by which()). No, accessing a member of a union is not really a cast, neither conceptually nor technically.
Conceptually, they are doing the same: determining whether the requested type is appropriate and returning the appropriately typed value. However, dynamic_cast does so by comparing RTTI objects, which may be more expensive than the discriminator comparison, and then by applying fixups to the this pointer to account for MI, virtual bases, etc.
What's more, once the dynamic_cast has been done, the various function calls may still wind up being virtual, depending upon how things are implemented. When using Boost.Variant, the types used could have no virtual functions.
However if the operation to be performed is implemented as a virtual function, then no dynamic_casting is needed. Just call the virtual function. I thought this was one of the main selling points of virtual functions vs. using a switch statement to cover all elements in a type hierarchy and then performing the operation at the appropriate case clause.
For avoidance of doubt: Does this discussion of virtual methods and their virtues wrt. common interfaces have anything to do with the original question about whether or not to capture the set of types that may be returned by an XPath query in a common base class / type hierarchy ? Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin...

Stefan Seefeld wrote:
On 04/09/2010 01:50 PM, Larry Evans wrote:
On 04/09/10 11:46, Stewart, Robert wrote:
Stefan Seefeld wrote:
On 04/09/2010 12:34 PM, Larry Evans wrote:
On 04/09/10 10:48, Mathias Gaunard wrote:
Some could argue that the point of a base class is moot if you have to downcast it to make anything useful with it,
But boost.variant has to do the equivalent of downcasting based on discriminant (the value returned by which()).
No, accessing a member of a union is not really a cast, neither conceptually nor technically.
Conceptually, they are doing the same: determining whether the requested type is appropriate and returning the appropriately typed value. However, dynamic_cast does so by comparing RTTI objects, which may be more expensive than the discriminator comparison, and then by applying fixups to the this pointer to account for MI, virtual bases, etc.
What's more, once the dynamic_cast has been done, the various function calls may still wind up being virtual, depending upon how things are implemented. When using Boost.Variant, the types used could have no virtual functions.
However if the operation to be performed is implemented as a virtual function, then no dynamic_casting is needed. Just call the virtual function. I thought this was one of the main selling points of virtual functions vs. using a switch statement to cover all elements in a type hierarchy and then performing the operation at the appropriate case clause.
Quite right, if your condition holds. See more below.
For avoidance of doubt: Does this discussion of virtual methods and their virtues wrt. common interfaces have anything to do with the original question about whether or not to capture the set of types that may be returned by an XPath query in a common base class / type hierarchy ?
Yes. You suggested the use of Boost.Variant and Larry questioned whether a Node ABC wouldn't be better. The issue, as you pointed out, is that there is no reasonable common base class for the types returned. Integers, elements, etc. have no operations in common that would make a common base class useful. Thus, Larry's condition doesn't hold and dynamic_cast becomes necessary to do anything useful with the object returned.
_____
Rob Stewart

On 9 April 2010 12:34, Larry Evans <cppljevans@suddenlink.net> wrote:
I'm still not seeing it :( I thought algebraic data types were one thing OO programming did well. For example, a stack is an ADT and the stl library has a stack.
The problem there is that while the words are right, I don't know of any case where those two examples are implemented as the words imply.
class stack : public ADT {};
What goes in the ADT base class that's useful? Similarly, the library isn't written like this:
class stl_library { stack stack_; };
Since the usual OO version of "HAS-A" doesn't work for libraries and types.

Larry Evans wrote:
I'm still not seeing it :( I thought algebraic data types were one thing OO programming did well. For example, a stack is an ADT and the stl library has a stack.
ADT stands for Abstract Data Type. It's something else entirely. I suggest you take a look at wikipedia: <http://en.wikipedia.org/wiki/Algebraic_data_type>

On 04/09/10 12:10, Mathias Gaunard wrote:
Larry Evans wrote:
I'm still not seeing it :( I thought algebraic data types were one thing OO programming did well. For example, a stack is an ADT and the stl library has a stack.
ADT stands for Abstract Data Type. It's something else entirely.
I suggest you take a look at wikipedia: <http://en.wikipedia.org/wiki/Algebraic_data_type>
OOPS. Sorry. Yet the example shown on the wiki page looks like an abstract syntax tree, for which a class hierarchy is entirely suitable. For example, each constructor alternative in:
data Expression = Number Int
                | Add Expression Expression
                | Minus Expression
                | Mult Expression Expression
                | Divide Expression Expression
would be a subclass of an abstract base class, Expression. So, I'm "still" still not seeing the advantage of a boost.variant over a type hierarchy.
-regards, Larry
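The mapping proposed here, one subclass per constructor alternative under an abstract Expression base, could look roughly like the sketch below. The class names follow the Haskell declaration quoted above; the eval operation, the ExprPtr alias and the num helper are assumptions added for illustration, with Minus treated as unary negation.

```cpp
#include <memory>
#include <utility>

// Abstract base class; each Haskell constructor becomes one subclass.
struct Expression {
    virtual ~Expression() = default;
    virtual int eval() const = 0;
};
using ExprPtr = std::unique_ptr<Expression>;

struct Number : Expression {
    explicit Number(int v) : value(v) {}
    int eval() const override { return value; }
    int value;
};

struct Add : Expression {
    Add(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    int eval() const override { return lhs->eval() + rhs->eval(); }
    ExprPtr lhs, rhs;
};

struct Minus : Expression {  // unary negation
    explicit Minus(ExprPtr e) : operand(std::move(e)) {}
    int eval() const override { return -operand->eval(); }
    ExprPtr operand;
};

struct Mult : Expression {
    Mult(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    int eval() const override { return lhs->eval() * rhs->eval(); }
    ExprPtr lhs, rhs;
};

struct Divide : Expression {
    Divide(ExprPtr l, ExprPtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    // Integer division; the caller must ensure the divisor is not zero.
    int eval() const override { return lhs->eval() / rhs->eval(); }
    ExprPtr lhs, rhs;
};

// Hypothetical convenience helper for building leaf nodes.
inline ExprPtr num(int v) { return std::make_unique<Number>(v); }
```

This works because every alternative supports the same operation. The trade-off usually cited is that adding a new operation later means touching every subclass, whereas a variant with a visitor adds the operation in one place but makes adding a new alternative the invasive change.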

About boost.variant: boost.variant is not compatible with various other Boost libraries (e.g. graph), mostly because of conflicts with boost::get. Also, the sender offers a library which uses a different approach, which may be worth considering. And I still think a pluggable solution would be best, where you can choose from different implementations of parsers and data structures, for example.
regards, Jens Weller
-------- Original Message --------
Date: Thu, 08 Apr 2010 13:58:07 -0400
From: Stefan Seefeld <seefeld@sympatico.ca>
To: boost@lists.boost.org
Subject: Re: [boost] xml?
On 04/08/2010 01:50 PM, Maarten L. Hekkelman wrote:
On 08-04-10 19:23, Phil Endecott wrote:
So could your XPath implementation be decoupled? What interface does it use to traverse the XML tree? Could it be used on top of a different XML tree implementation, such as a lazy one that stores the text of the XML document, if it provided a similar interface?
I think most of it should work. However, the entire XPath specification is built upon the idea that everything is a node and a node does not have to be an element. That's why I changed my object hierarchy.
For the boost.xml library I'm working on I plan to use something akin to boost.variant as the return type of an xpath query. I don't think that the XPath specification should dictate a type hierarchy on a C++ implementation.
FWIW, Stefan
--
...ich hab' noch einen Koffer in Berlin...

Phil Endecott wrote:
Maarten L. Hekkelman wrote:
I wrote a C++ based XML library recently as part of the work I do on e.g. MRS, a full-text retrieval system for biological databanks. (See http://mrs.cmbi.ru.nl/ )
The 1.x spirit package included an XML parser grammar.
Has anyone made such a grammar for spirit2? Does anyone have any idea what it would take to port a spirit 1.x parser (and semantics) to spirit2?
More specifically, what would it take to convert the xml_archive implementation in the serialization library to spirit2 qi/karma, and what benefit would be obtained?
Robert Ramey

Robert Ramey wrote:
Phil Endecott wrote:
Maarten L. Hekkelman wrote:
I wrote a C++ based XML library recently as part of the work I do on e.g. MRS, a full-text retrieval system for biological databanks. (See http://mrs.cmbi.ru.nl/ )
The 1.x spirit package included an XML parser grammar.
Has anyone made such a grammar for spirit2? Does anyone have any idea what it would take to port a spirit 1.x parser (and semantics) to spirit2?
More specifically, what would it take to convert the xml_archive implementation in the serialization library to spirit2 qi/karma and what benefit would be obtained?
Robert Ramey
Hi Robert - I have a spirit2 implementation that may work well for the needs of the serialization library. I was hoping to discuss it with you at boostcon. michael -- ---------------------------------- Michael Caisse Object Modeling Designs www.objectmodelingdesigns.com
participants (11)
- Jens Weller
- Larry Evans
- Maarten L. Hekkelman
- Mathias Gaunard
- Michael Caisse
- Phil Endecott
- Robert Ramey
- Scott McMurray
- Stefan Seefeld
- Steven Watanabe
- Stewart, Robert