Proposal: property_tree (property_tree.zip uploaded to vault)

Hi Everybody,

Property tree is a data structure: a tree of (key, value) pairs. It differs from its cousin, the "usual" PropertyMap, in that it is hierarchical, not linear. It is thus more like a minimalistic Document Object Model, but not bound to any specific file format. It can store the contents of XML files, the Windows registry, INI files, even command line parameters. The library contains parsers for all these formats, and more.

I tried to make the library as high quality as possible before posting here, so it contains (hopefully) full documentation and a comprehensive set of regression tests, more than 300 test cases.

It is by no means a substitute for the (upcoming?) Boost XML library: it supports only a fairly narrow subset of XML and focuses instead on being format-independent.

I uploaded it to the vault root dir; the file name is property_tree.zip.

For the more curious, an excerpt from the doc:

Property tree is a recursive data structure that stores a single data string and an ordered list of (key, value) pairs, where each value is a property tree itself. It therefore forms a tree, hence the name. It is a versatile structure that can store, in a uniform way, data coming from various sources, such as XML or INI files, the Windows registry, the program command line etc. The property tree interface is similar to the interface of a standard C++ container. It supports iterators, insertion, erasing, searching etc. One can think of it as a sort of Document Object Model which is minimalistic, not bound to any specific file format, designed to be easy to use, and which comes as a Boost-compatible, headers-only C++ library.

Many software projects develop a similar tool at some point in their lifetime, and property tree originated the same way long ago. I hope the library can save many from reinventing the wheel.

cheers,
Marcin Kalicinski
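To make the recursive definition in the excerpt concrete, here is a minimal sketch of such a structure in standard C++. It is not the library's class: `tree_sketch`, `add_child`, `find_child`, `count_nodes` and `make_example` are hypothetical names chosen for illustration, and the real interface is container-like and considerably richer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration; not the library's actual class.
struct tree_sketch {
    std::string data;                                           // the single data string
    std::vector<std::pair<std::string, tree_sketch>> children;  // ordered (key, subtree) pairs
};

// Appends a (key, subtree) pair and returns the new subtree.
// The reference is invalidated by later insertions into the same parent.
inline tree_sketch& add_child(tree_sketch& parent, const std::string& key,
                              const std::string& data) {
    parent.children.emplace_back(key, tree_sketch{data, {}});
    return parent.children.back().second;
}

// Returns the first direct child stored under `key`, or a null pointer.
inline const tree_sketch* find_child(const tree_sketch& parent, const std::string& key) {
    for (const auto& kv : parent.children)
        if (kv.first == key) return &kv.second;
    return nullptr;
}

// Counts this node plus all of its descendants.
inline std::size_t count_nodes(const tree_sketch& t) {
    std::size_t n = 1;
    for (const auto& kv : t.children) n += count_nodes(kv.second);
    return n;
}

// Usage: builds a root with one "window" child holding "width" and "height".
inline tree_sketch make_example() {
    tree_sketch root;
    tree_sketch& window = add_child(root, "window", "");
    add_child(window, "width", "640");
    add_child(window, "height", "480");
    assert(count_nodes(root) == 4);  // root + window + width + height
    return root;
}
```

Keeping the children in a vector of pairs preserves insertion order and allows duplicate keys, which matches the "ordered list" wording of the excerpt; whether the library stores them this way is not stated in the post.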

Marcin Kalicinski wrote:
I uploaded it to the vault root dir, file name is property_tree.zip.
This looks highly useful to me. I am using a custom XML library with similar methods for retrieving data, which has been very convenient but nowhere near as flexible as this. I would be glad to be able to replace it.

After some very quick testing:

The function boost::property_tree::xml_parser::validate_flags(int) is defined in a header but is neither a template nor declared inline, causing multiple definition errors. I did not check for more problems of the same kind and only tried the XML and info parsers. Making it inline fixes the problem, and it is certainly short and simple enough.

Defaulting to case-insensitive matching is surprising to me. It is true that some formats for which parsers are supplied are case insensitive, but XML and (typically) command lines are not. The parsers are independent of the tree in any case. Is there a rationale for this default?

I would prefer if the third (bool pointer) parameter to ptree::get_d was optional and defaulted to 0. I would use this function in order to not have to care about the presence of the value, and I don't imagine I would use the flag much, if ever. I never saw a need to provide it in my own library.

Overall it looks good. I will experiment more.

The function boost::property_tree::xml_parser::validate_flags(int) is defined in a header but not a template or declared inline, causing multiple definition errors.
True. Additionally, some internal functions in registry_parser.hpp suffer from the same problem. I'll fix it. The current regression tests do not detect it, because it only happens if you include the XML parser or the registry parser in more than one translation unit. I need to add a regression test for it.
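For readers unfamiliar with this class of error, the following is a minimal sketch of a header that stays safe when included from several translation units. The namespace, flag values and function bodies are assumptions made up for illustration; only the name validate_flags and its int parameter come from the thread, and the return type is a guess.

```cpp
// parser_sketch.hpp (hypothetical header, included from several .cpp files)
#ifndef PARSER_SKETCH_HPP
#define PARSER_SKETCH_HPP

#include <cassert>

namespace parser_sketch {

// Hypothetical flag values. Namespace-scope const ints have internal linkage
// in C++, so defining them in a header is fine.
const int no_comments = 1;
const int trim_whitespace = 2;

// Without `inline`, every translation unit that includes this header would
// emit its own external definition, and the linker would report a multiple
// definition error. `inline` permits identical definitions in several units.
inline bool validate_flags(int flags) {
    return (flags & ~(no_comments | trim_whitespace)) == 0;
}

// A second header-defined helper; it needs `inline` for the same reason.
inline int checked_flags(int flags) {
    assert(validate_flags(flags));
    return flags;
}

}  // namespace parser_sketch

#endif  // PARSER_SKETCH_HPP
```

Include guards do not help here, because they only prevent double inclusion within one translation unit. A regression test for the problem would have to compile at least two source files that both include the header and link them together.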
Defaulting to case-insensitive matching is surprising to me. It is true that some formats for which parsers are supplied are case insensitive, but xml and (typically) command lines are not. The parsers are independent of the tree in any case. Is there a rationale for this default?
Case-insensitive matching applies only to key comparisons. The keys keep their case during all read and write operations. So it will cause problems only if a key has subkeys that differ only by case. On the other hand, case-insensitive matching is easy on non-programmers, who often write and edit scripts. When working for my previous employer I found it impossible to persuade some of the script writers to use correct indentation, let alone consistent case. If that is a problem, one solution would be to add iptree and wiptree typedefs that are case insensitive, while ptree and wptree are case sensitive.
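The behaviour described in this reply (keys stored verbatim, compared without regard to case) can be sketched as follows. All names here (`entry`, `entry_list`, `equal_exact`, `equal_nocase`, `find_key`, `case_matching_demo`) are hypothetical, the comparison handles ASCII case only, and nothing below is taken from the library's implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> entry;  // (key, data); key stored verbatim
typedef std::vector<entry> entry_list;              // ordered, duplicates allowed

inline bool equal_exact(const std::string& a, const std::string& b) { return a == b; }

// Compares keys ignoring ASCII case; the stored keys are never modified.
inline bool equal_nocase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Returns the first entry whose key matches under `equal`, or a null pointer.
template <class KeyEqual>
const entry* find_key(const entry_list& entries, const std::string& key, KeyEqual equal) {
    for (std::size_t i = 0; i < entries.size(); ++i)
        if (equal(entries[i].first, key)) return &entries[i];
    return nullptr;
}

// Usage: two sibling keys that differ only by case.
inline void case_matching_demo() {
    entry_list entries;
    entries.push_back(entry("Width", "640"));
    entries.push_back(entry("width", "800"));

    // Case-insensitive lookup returns the first match; its key keeps its case.
    const entry* hit = find_key(entries, "WIDTH", equal_nocase);
    assert(hit && hit->first == "Width" && hit->second == "640");

    // The second sibling is reachable only with case-sensitive matching.
    hit = find_key(entries, "width", equal_exact);
    assert(hit && hit->second == "800");
}
```

The demo shows the one problematic case named in the reply: with case-insensitive matching, the second of two siblings that differ only by case is shadowed by the first. Passing the comparison as a parameter is roughly what a pair of typedefs such as iptree and ptree would select at compile time.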
I would prefer if the third (bool pointer) parameter to ptree::get_d was optional and defaulted to 0. I would use this function in order to not have to care about the presence of the value and don't imagine I would use the flag much, if ever. I never saw a need to provide it in my own library.
The reason it does not default to zero is that previous versions of get did not have the distinguishing suffixes "_b" and "_d", so a defaulted parameter caused ambiguities. This probably no longer applies, and the parameter can be given a default as soon as I make sure that expressions like these are not ambiguous:

    bool b1 = get_d("key", false, 0);
    bool b2 = get_d('/', "key", false);
    int i1 = get_d("key", 0, 0);
    int i2 = get_d('/', "key", 0);

cheers,
Marcin Kalicinski
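One way to check the four expressions is to write the two overloads down and let the compiler decide. The sketch below assumes signatures that are not given in the thread: a path-only overload and a separator-plus-path overload, both templated on the value type, with a trailing `bool*` defaulted to a null pointer. `flat_tree_sketch` is a hypothetical stand-in that maps whole paths to strings and ignores the separator.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for a tree: a flat map from path to data string.
struct flat_tree_sketch {
    std::map<std::string, std::string> values;

    // Path-only overload; `found` is optional and defaults to a null pointer.
    template <class T>
    T get_d(const std::string& path, const T& default_value, bool* found = nullptr) const {
        T result = default_value;
        bool ok = false;
        std::map<std::string, std::string>::const_iterator it = values.find(path);
        if (it != values.end()) {
            std::istringstream in(it->second);
            T parsed = T();
            if (in >> parsed) {
                result = parsed;
                ok = true;
            }
        }
        if (found) *found = ok;
        return result;
    }

    // Separator overload; the separator is ignored in this flat sketch.
    template <class T>
    T get_d(char /*separator*/, const std::string& path, const T& default_value,
            bool* found = nullptr) const {
        return get_d(path, default_value, found);
    }
};

// Usage: the four expressions from the message, written as member calls.
inline void overload_demo() {
    flat_tree_sketch t;
    t.values["key"] = "1";
    bool b1 = t.get_d("key", false, 0);    // path overload, T = bool, null flag
    bool b2 = t.get_d('/', "key", false);  // separator overload, T = bool
    int i1 = t.get_d("key", 0, 0);         // path overload, T = int, null flag
    int i2 = t.get_d('/', "key", 0);       // separator overload, T = int
    assert(b1 && b2 && i1 == 1 && i2 == 1);
}
```

With these assumed signatures each call has exactly one viable overload: a string literal does not convert to `char`, and a `char` does not convert to `std::string`. The result says nothing definite about the library's real overload set, which may contain further overloads (wide-character paths, for instance) that change the picture.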

Seems like a very useful library. I haven't tested it yet, but I have a few comments:

1. The get and put functions use operator<< and operator>>, but I don't see any way of specifying a locale. I would prefer some kind of root imbue function, but since the root isn't special I assume the get/put functions need a locale argument.
2. Would it be possible to change the template arguments to container and less predicate? Then you can use any string type and easily get case insensitivity with string_algo iless.
3. Why pointers instead of references?
4. I assume paths are constants most of the time. A (w_)char or range overload would avoid a string copy (and memory allocation) for each operation.

1. The get and put functions use operator<< and operator>>, but I don't see any way of specifying a locale. I would prefer some kind of root imbue function, but since the root isn't special I assume the get/put functions need a locale argument.
Currently they just use the global locale, so it is possible to use a different one by changing it. But obviously that is not very convenient, especially in a multithreaded environment. Some time ago I tried various ways of specifying the locale, and I think the best way would indeed be an extra parameter to the get/put functions. But I'm not sure it should be just a locale: some people might also want to change the default formatting flags of the stream, such as hex or boolalpha. Maybe it would be a better idea to allow passing not a locale but a whole stream to be used for the conversion?
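The two options weighed in this answer can be sketched with standard streams only. `convert_with_locale` takes an explicit locale, `convert_with_stream` takes a caller-owned stream whose locale and format flags are reused. Both function names, the `comma_decimal` facet and the demo are hypothetical and do not reflect any interface the library has or will have.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Illustrative facet that uses ',' as the decimal point.
struct comma_decimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Option 1: the caller passes a locale; the global locale is only the default.
template <class T>
bool convert_with_locale(const std::string& text, T& out,
                         const std::locale& loc = std::locale()) {
    std::istringstream in(text);
    in.imbue(loc);
    T value = T();
    if (!(in >> value)) return false;
    out = value;
    return true;
}

// Option 2: the caller passes a whole stream, so its locale and its
// formatting flags (hex, boolalpha, ...) both apply to the conversion.
template <class T>
bool convert_with_stream(const std::string& text, T& out, std::stringstream& conv) {
    conv.clear();    // reset any error state left by a previous conversion
    conv.str(text);  // replace the buffer; locale and flags are kept
    T value = T();
    if (!(conv >> value)) return false;
    out = value;
    return true;
}

// Usage: a comma as decimal point, then a hexadecimal integer.
inline void conversion_demo() {
    double d = 0.0;
    const std::locale comma(std::locale::classic(), new comma_decimal);
    const bool ok1 = convert_with_locale("1,5", d, comma);
    assert(ok1 && d == 1.5);

    int n = 0;
    std::stringstream conv;
    conv.setf(std::ios_base::hex, std::ios_base::basefield);
    const bool ok2 = convert_with_stream("ff", n, conv);
    assert(ok2 && n == 255);
}
```

The stream variant covers the locale case as well, since the caller can imbue the stream before passing it, at the cost of a slightly heavier call site.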
2. Would it be possible to change the template arguments to container and less predicate? Then you can use any string type and easily get case insensitivity with string_algo iless.
I think it would be easier to get case insensitivity by just providing another version of ptree_traits, than allowing use of other string classes.
3. Why pointers instead of references?
I use pointers in many places so that NULL can be used to specify default behaviour. For example:

    bool exists = pt.get_b("a", NULL); // Checks for presence of "a"

or

    ptree *pt = get_child_d("a", NULL, NULL); // If "a" is not found, returns NULL

With references this would be impossible.
4. I assume paths are constants most of the time. A (w_)char or range overload would avoid a string copy (and memory allocation) for each operation.
That's a valid point, but I'm not sure if the gain is big enough to justify addition of another set of overloads. Searching the path is not a simple operation, and making a single allocation and a copy is not a huge overhead. On some implementations short paths will not cause allocation anyway. regards, Marcin

On 11/9/05, Marcin Kalicinski <kalita@poczta.onet.pl> wrote:
Property tree is a data structure - a tree of (key, value) pairs. It
Here's a small patch that changes the way XML attributes are handled. Instead of creating the node node/<xmlattr>/attr, the node is created as node/@attr, which allows a more XPath-style query to be used (e.g. path.to.@attr). In either case the caller needs to know a little something about the source data (e.g. to search for <xmlattr>/attr or @attr), and I personally prefer the more XPath-like naming. Take it or leave it, it's just an idea.

Thanks for this nifty little library with its extensive docs!

--
Caleb Epstein
caleb dot epstein at gmail dot com

Instead of creating the node node/<xmlattr>/attr, the node is created as node/@attr which allows a more XPath-style of query to be used (e.g. path.to.@attr). In either case the caller needs to know a little something about the source data (e.g. to search for <xmlattr>/attr or @attr) and I personally prefer the more XPath-like naming.
XPath naming is indeed nicer, but it has a small disadvantage: you cannot iterate easily over all attributes or all keys because they are mixed together. I might add it as another option to XML parser. thanks for the patch, Marcin
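The trade-off mentioned in this reply can be illustrated with a small sketch of both layouts. `xml_node_sketch` and the two helper functions are hypothetical; the key names "<xmlattr>" and the "@" prefix are the ones discussed in the thread, but how the actual XML parser builds its tree is not shown here.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: a data string plus ordered (key, subtree) pairs.
struct xml_node_sketch {
    std::string data;
    std::vector<std::pair<std::string, xml_node_sketch>> children;
};

// Layout node/<xmlattr>/attr: attributes sit under one dedicated child, so
// listing them means iterating that child's children with no filtering.
inline std::vector<std::string> attributes_grouped(const xml_node_sketch& element) {
    std::vector<std::string> names;
    for (const auto& child : element.children) {
        if (child.first != "<xmlattr>") continue;
        for (const auto& attr : child.second.children) names.push_back(attr.first);
    }
    return names;
}

// Layout node/@attr: attributes are mixed with child elements, so every
// key has to be inspected and filtered by its '@' prefix.
inline std::vector<std::string> attributes_prefixed(const xml_node_sketch& element) {
    std::vector<std::string> names;
    for (const auto& child : element.children) {
        if (!child.first.empty() && child.first[0] == '@')
            names.push_back(child.first.substr(1));
    }
    return names;
}

// Usage: <item id="7" lang="en"><title>x</title></item> in both layouts.
inline void attribute_layout_demo() {
    xml_node_sketch attrs;
    attrs.children.emplace_back("id", xml_node_sketch{"7", {}});
    attrs.children.emplace_back("lang", xml_node_sketch{"en", {}});
    xml_node_sketch grouped;
    grouped.children.emplace_back("<xmlattr>", attrs);
    grouped.children.emplace_back("title", xml_node_sketch{"x", {}});

    xml_node_sketch prefixed;
    prefixed.children.emplace_back("@id", xml_node_sketch{"7", {}});
    prefixed.children.emplace_back("title", xml_node_sketch{"x", {}});
    prefixed.children.emplace_back("@lang", xml_node_sketch{"en", {}});

    const std::vector<std::string> expected{"id", "lang"};
    assert(attributes_grouped(grouped) == expected);
    assert(attributes_prefixed(prefixed) == expected);
}
```

Both layouts yield the same attribute names, but in the prefixed layout any code that wants only the child elements has to apply the inverse filter as well.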
participants (4)
- Caleb Epstein
- Daniel Wesslén
- Marcin Kalicinski
- Martin