Status of property_tree library

I'll try to shed some light on what has happened to property_tree since review and why it's been dragging. Since review I had some feedback about the library. The issue raised most frequently was unsatisfactory performance. Ptree does too many memory allocations, and thus is fairly slow and unsuitable for many users. Plus some of the original parsers (XML parser specifically) were really slow and took ages to compile, especially on gcc. Because this is headers only library that is supposed to be lightweight, this was a serious problem. To solve it I spent some months implementing a very fast in-situ XML parser for the library (called rapidxml - see rapidxml project on sourceforge). This was finished in August last year and integrated with property tree. It works very well and is so fast that time to parse XML is now totally insignificant compared to the time it takes to build a ptree from the parsed data. The allocations problem remains. I have been unable to come up with a scheme that would reduce the number of allocations without compromising simplicty of the library. The key point is that I want to maintain validity of iterators in presence of insertions/erases. This rules out array based containers. The best I can think of is a custom list implementation. That has potential to reduce number of allocations by roughly 30%, which is not enough IMO. On top of that I'm not sure whether the heavy type-parametrization of the library (i.e. lots of template parameters everywhere) is a good thing. It definitely makes it harder to document, use and understand. In all the the feedback I got there is no evidence of anyone actually using these template parameters. So IMO these should be reduced to just the character type. Otherwise the library pretends to be a generic tree container, which it wasn't supposed to be. Finally, probably the main reason the library is dragging is docs. The original docs (still available at http://kaalus.atspace.com/ptree/doc/index.html) were written in a gargantuan effort manually in HTML. This was a horrible experience that I do not wish to repeat. Some sort of automated docs generation is needed. I know about QuickBook and I like its wiki syntax. About a year ago I even managed to setup the toolchain correctly (including doxygen). The output of [code->doxygen->quickbook->boostbook->xml->xslt->html] pipe was not very satisfactory though, the reference entries were badly formatted and scattered all over the place in a disorganized fashion. I wouldn't like to give up the library because I still work on it occasionally. But as I wasn't able to solve major problems (docs tools, performance) satisfactorily, the work towards the release has ground to a halt. IMO the issues that should be addressed before release are: - Simplification by removing some of the template arguments (arguable - maybe someone is using that?) - Figure out which parsers are unneccessary and get rid of them (cmdline parser?) - Get some docs toolchain to work properly and provide up-to-date docs. Lots of the text can be ported over (and my monkey english corrected :-) from the original docs. - Speedup through use of a custom container (to replace std::list, needs some profiling work if it's even worth it) If someone could help with the docs toolchain (Quickbook experts? Any alternatives?) I'm happy to do the rest. I should probably give up on trying to speed it up further and concentrate on getting it into releasable state. thanks, Marcin

Marcin Kalicinski wrote: [snip]
I wouldn't like to give up the library because I still work on it occasionally. But as I wasn't able to solve major problems (docs tools, performance) satisfactorily, the work towards the release has ground to a halt.
Maybe you and Sebastian could maintain the library together?
IMO the issues that should be addressed before release are: - Simplification by removing some of the template arguments (arguable - maybe someone is using that?)
Agree. I've written some code that uses generic property trees. Dealing with all the template parameters was just a pain. Being able to select the character type might be enough. And thanks for submitting this great library. --Johan

Marcin Kalicinski wrote:
I'll try to shed some light on what has happened to property_tree since review and why it's been dragging.
[...] I am already using the property_tree in boost svn, it very handy. I am using it for saving/loading application settings, so the memory allocation and speed are not problems for me.
Finally, probably the main reason the library is dragging is docs. The original docs (still available at http://kaalus.atspace.com/ptree/doc/index.html) >
[...] It seems this doc is the same as in boost/sandbox, it still has registry read/write. But in boost svn, there is no registry_parser.hpp, so will ptree drop supporting registry in final release?

Marcin Kalicinski skrev:
I'll try to shed some light on what has happened to property_tree since review and why it's been dragging.
The allocations problem remains. I have been unable to come up with a scheme that would reduce the number of allocations without compromising simplicty of the library. The key point is that I want to maintain validity of iterators in presence of insertions/erases. This rules out array based containers. The best I can think of is a custom list implementation. That has potential to reduce number of allocations by roughly 30%, which is not enough IMO.
For now, it doesn't really matter IMO. People often find it useful anyway, and move-semantics will probably help a lot when it arrives. I can't remember why it is important to remain iterator validity, but cnosider if it is absolutely necessary. Perhaps validity of references to elements is enough.
On top of that I'm not sure whether the heavy type-parametrization of the library (i.e. lots of template parameters everywhere) is a good thing. It definitely makes it harder to document, use and understand. In all the the feedback I got there is no evidence of anyone actually using these template parameters. So IMO these should be reduced to just the character type. Otherwise the library pretends to be a generic tree container, which it wasn't supposed to be.
Well, it's been a long time since the review, but I think that ew should respect the decisions made after the review. -Thorsten

Thorsten Ottosen skrev:
Marcin Kalicinski skrev:
I'll try to shed some light on what has happened to property_tree since review and why it's been dragging.
The allocations problem remains. I have been unable to come up with a scheme that would reduce the number of allocations without compromising simplicty of the library. The key point is that I want to maintain validity of iterators in presence of insertions/erases. This rules out array based containers. The best I can think of is a custom list implementation. That has potential to reduce number of allocations by roughly 30%, which is not enough IMO.
For now, it doesn't really matter IMO. People often find it useful anyway, and move-semantics will probably help a lot when it arrives.
I just remembered this: I think you can make a very well performing linked list using Boost.Intrusive which basically allows you to store the nodes in a vector: http://igaztanaga.drivehq.com/intrusive/ AFAICT, this will almost completely remove dynamic allocations, exceptio for growing the vector (a deque might be more appropriate). -Thorsten

I just remembered this: I think you can make a very well performing linked list using Boost.Intrusive which basically allows you to store the nodes in a vector:
http://igaztanaga.drivehq.com/intrusive/
AFAICT, this will almost completely remove dynamic allocations, exceptio for growing the vector (a deque might be more appropriate).
Yes, I thought about it. That was pretty much what I meant when I said "custom list implementation". But at them moment I think I should skip all further optimization efforts and concentrate on getting it into releasable state. This means getting some sort of docs pipeline in place. I'll give Quickbook another try. Marcin

Marcin Kalicinski wrote:
But at them moment I think I should skip all further optimization efforts and concentrate on getting it into releasable state. This means getting some sort of docs pipeline in place. I'll give Quickbook another try.
I you need any proofreading, copy-editing, or even writing, don't hesitate to contact me. Well ... hesitate until March 22nd, because I won't be available until then. Sebastian

Marcin Kalicinski wrote:
I just remembered this: I think you can make a very well performing linked list using Boost.Intrusive which basically allows you to store the nodes in a vector:
http://igaztanaga.drivehq.com/intrusive/
AFAICT, this will almost completely remove dynamic allocations, exceptio for growing the vector (a deque might be more appropriate).
Yes, I thought about it. That was pretty much what I meant when I said "custom list implementation".
But at them moment I think I should skip all further optimization efforts and concentrate on getting it into releasable state. This means getting some sort of docs pipeline in place. I'll give Quickbook another try.
Once you have the toolchain working, it's actually lovely to use: unfortunately it's the bits we don't maintain like the docbook stylesheets and XSLT processors that tend to give the most hassle in setting up :-( If you run into any problems then post a message on the boost-docs list and someone will try to help I'm sure. John.

Marcin Kalicinski wrote:
The allocations problem remains. I have been unable to come up with a scheme that would reduce the number of allocations without compromising simplicty of the library. The key point is that I want to maintain validity of iterators in presence of insertions/erases. This rules out array based containers. The best I can think of is a custom list implementation. That has potential to reduce number of allocations by roughly 30%, which is not enough IMO.
Just a thought... Maybe a deque with gaps would suffice these requirements? I'm not familiar with the library but I had a similar task once and such approach did well enough.

On Thu, Mar 13, 2008 at 5:44 AM, Marcin Kalicinski <kalita@poczta.onet.pl> wrote:
IMO the issues that should be addressed before release are: - Simplification by removing some of the template arguments (arguable - maybe someone is using that?) - Figure out which parsers are unneccessary and get rid of them (cmdline parser?) - Get some docs toolchain to work properly and provide up-to-date docs. Lots of the text can be ported over (and my monkey english corrected :-) from the original docs. - Speedup through use of a custom container (to replace std::list, needs some profiling work if it's even worth it)
If someone could help with the docs toolchain (Quickbook experts? Any alternatives?) I'm happy to do the rest. I should probably give up on trying to speed it up further and concentrate on getting it into releasable state.
Since this is a library that has already gone under review and requires some help in reaching release status, wouldn't it be appropriate to put it up on the Ideas page for the GSoC? Maybe some student would like to take it up for the summer in case Boost is accepted as a GSoC mentoring organization. Regards, Divye Kapoor -- Whether the chicken crossed the road or the road moved beneath the chicken depends upon your frame of reference. My official web site: http://www.iitr.ernet.in/studenthomepages/homepages/061305

Divye Kapoor wrote:
Since this is a library that has already gone under review and requires some help in reaching release status, wouldn't it be appropriate to put it up on the Ideas page for the GSoC? Maybe some student would like to take it up for the summer in case Boost is accepted as a GSoC mentoring organization.
Possibly, but I would just note that GSoC students can't be hired purely for documentation writing and the like. Optimizising the library for performance could be a SOC project though. John.

Marcin Kalicinski wrote:
If someone could help with the docs toolchain (Quickbook experts? Any alternatives?) I'm happy to do the rest. I should probably give up on trying to speed it up further and concentrate on getting it into releasable state.
I'll be happy to be an editor for the documentation. I have some talent in spotting errors and a good grasp of English grammar and spelling. I can also try to assist in setting up the doc toolchain - mine just worked, though, so I don't know much in the way of troubleshooting. Any other help you need, I'll do my best there, too. Sebastian

on Wed Mar 12 2008, "Marcin Kalicinski" <kalita-AT-poczta.onet.pl> wrote:
I'll try to shed some light on what has happened to property_tree since review and why it's been dragging.
Since review I had some feedback about the library. The issue raised most frequently was unsatisfactory performance. Ptree does too many memory allocations, and thus is fairly slow and unsuitable for many users. Plus some of the original parsers (XML parser specifically) were really slow and took ages to compile, especially on gcc. Because this is headers only library that is supposed to be lightweight, this was a serious problem.
To solve it I spent some months implementing a very fast in-situ XML parser for the library (called rapidxml - see rapidxml project on sourceforge). This was finished in August last year and integrated with property tree. It works very well and is so fast that time to parse XML is now totally insignificant compared to the time it takes to build a ptree from the parsed data.
The allocations problem remains. I have been unable to come up with a scheme that would reduce the number of allocations without compromising simplicty of the library. The key point is that I want to maintain validity of iterators in presence of insertions/erases. This rules out array based containers. The best I can think of is a custom list implementation. That has potential to reduce number of allocations by roughly 30%, which is not enough IMO.
Did you think about using something like http://svn.boost.org/trac/boost/browser/trunk/boost/detail/quick_allocator.h... ?
On top of that I'm not sure whether the heavy type-parametrization of the library (i.e. lots of template parameters everywhere) is a good thing. It definitely makes it harder to document, use and understand. In all the the feedback I got there is no evidence of anyone actually using these template parameters.
That's a good sign that they're not solving any real problems. But we've all made that mistake at one time or another; it can be hard to remember that generic design proceeds from concrete (non-generic) use cases. <snip>
If someone could help with the docs toolchain (Quickbook experts? Any alternatives?) I'm happy to do the rest. I should probably give up on trying to speed it up further and concentrate on getting it into releasable state.
Do you have the help you need, now? -- Dave Abrahams Boost Consulting http://boost-consulting.com
participants (9)
-
Andrey Semashev
-
David Abrahams
-
Divye Kapoor
-
gchen
-
Johan Råde
-
John Maddock
-
Marcin Kalicinski
-
Sebastian Redl
-
Thorsten Ottosen