[Property Tree] Some initial Review Thoughts / Questions

This isn't my full review, just a few issues I see. I think this library has great potential and value, but I'm going to stick to the issues for now. I posted most of this just before the review, but got no response: http://lists.boost.org/Archives/boost/2006/04/103305.php I see 3 serious issues at this point: 1) Find is under documented / doesn't work for me ptree pt; read_ini("test.ini", pt); ptree::const_iterator ci = pt.find("value2"); if (ci != pt.end()) { std::cout << ci->second.data() << std::endl; } //test.ini [User Data] ; This is a comment value1 = v1 value2 = v2 date = 2006-Apr-18 I tried various keys in the find including: ptree::const_iterator ci = pt.find("User Data.value2"); All returned null iterator. So there are two problems. From the docs it wasn't clear what syntax to use with find and it doesn't appear to work. 2) Iterators are underspecified It was totally unclear to me from the docs what the interface to iterators returned from the interface. I finally found a test or example and copied. std::cout << ci->second.data() << std::endl; 3) Write operations are not 'whitespace' and comment preserving To me this is a critical point not fulfilled by the library. Most real-world configuration files have comments. If the application needs to read/write the files it needs to maintain the comments as human written. In my experiment if I start with this: //test.ini [User Data] ; This is a comment value1 = v1 value2 = v2 date = 2006-Apr-18 After I write it back I get: [User Data] value1=v1 value2=v2 date=2006-Apr-18 I realize that this requirement might increase the resource usage of property_tree, so I'm fine with this being an optionally enabled feature. Jeff

Hi Jeff,
1) Find is under documented / doesn't work for me [...] I tried various keys in the find including:
ptree::const_iterator ci = pt.find("User Data.value2");
find() is a standard-container type function, it does not accept paths, only single keys. Use get_child instead. This is excerpt from docs on find(): <snip> Finds direct child stored at specified key. If there is more than one child with the same key, the first one is returned. Time complexity is O(log n). Keys equivalence is tested with a predicate supplied as basic_ptree template parameter. If child is not found, returns end(). Both const and non-const versions are provided. To find non-direct children use get_child function. <snip> I hope it states clearly enough it only finds direct children, and also gives you a colored, underlined link to a function which works with paths (get_child). Btw. all std-container functions work only on direct children. Only extra functions get/put etc. recongnize paths. This is by design. As far as I remember, it is nowhere directly said in docs, which definitely is a nasty omission.
2) Iterators are underspecified
It was totally unclear to me from the docs what the interface to iterators returned from the interface. I finally found a test or example and copied.
std::cout << ci->second.data() << std::endl;
Maybe the problem is that it is understated that basic_ptree is a standard container? All containers have value_type member that defines what is stored, and ptree is no exception. You only have to look it up in reference. Additionally there's whole section "Property tree as a container" which talks, among other things, about value type stored in ptree. I'm open to suggestions, please let me know what you think should be added and where to make it more clear.
3) Write operations are not 'whitespace' and comment preserving
To me this is a critical point not fulfilled by the library. Most real-world configuration files have comments. If the application needs to read/write the files it needs to maintain the comments as human written.
I'm afraid I must disagree with you. What is manipulated by property_tree is data. Data in case of INI file are sections, keys and values. If you want comments, you must use another format that explicitly has support for comments. Ptree at the moment supports only one such format: XML. That said, how would you imagine data about comments and layout should be stored/manipulated? Should the library distinguish between tabs and spaces, count lines between entries etc? Would it be possible to manipulate this metadata on par with normal data? What would be use of that if you deleted/added keys in runtime. For example: [Fruit] ;Citrus 1=Orange 2=Lemon ; Tropical 3=Banana Now you add "Grapefruit". How would you know you should add it after Lemon and before Banana, because otherwise it would violate your comments? There's a whole pandora box of problems. Should comments be attached to nodes? What about comments in between nodes? What about whitespace, should it be attached to nodes before it, or after it, or maybe split in half? You must also remember we are now using INI files as example. This format has really simple layout. What about JSON, where you can have multiple keys in one line, and many more variations? I think preserving layout in presence of additions/deletions or even node modifications is impractical. In case there are no modifications, there is no need to save. Even if there was, you could just store contents of the file along with the tree. Please let me know if you have idea on how whitespace/layout/comments could be preserved easily. If there is no such way I can only say this functionality does not belong to this library. It's rather text editor job. One solution to this problem that comes to my mind is to create another parser, called e.g. meta_ini_parser, that would not only extract real data, but everything. Every comment would have its own node. Every continuous stream of whitespace would have its node as well. Actually this is quite possible to do, but the tree that you got from it would not be extremely practical. Kind regards, Marcin

On Sat, 22 Apr 2006 19:41:09 +0100, Marcin Kalicinski wrote
Hi Jeff,
1) Find is under documented / doesn't work for me [...] I tried various keys in the find including:
ptree::const_iterator ci = pt.find("User Data.value2");
find() is a standard-container type function, it does not accept paths, only single keys.
Use get_child instead. This is excerpt from docs on find():
<snip> Finds direct child stored at specified key. If there is more than one child with the same key, the first one is returned. Time complexity is O(log n). Keys equivalence is tested with a predicate supplied as basic_ptree template parameter. If child is not found, returns end(). Both const and non-const versions are provided. To find non-direct children use get_child function. <snip>
I hope it states clearly enough it only finds direct children, and also gives you a colored, underlined link to a function which works with paths (get_child). Btw. all std-container functions work only on direct children. Only extra functions get/put etc. recongnize paths. This is by design. As far as I remember, it is nowhere directly said in docs, which definitely is a nasty omission.
Ok, that would certainly clarify things. I would argue for an example of using find versus get_child -- that would make it really clear.
2) Iterators are underspecified
It was totally unclear to me from the docs what the interface to iterators returned from the interface. I finally found a test or example and copied.
std::cout << ci->second.data() << std::endl;
Maybe the problem is that it is understated that basic_ptree is a standard container?
All containers have value_type member that defines what is stored, and ptree is no exception. You only have to look it up in reference. Additionally there's whole section "Property tree as a container" which talks, among other things, about value type stored in ptree. I'm open to suggestions, please let me know what you think should be added and where to make it more clear.
Well I sorta got that, but there are all kinds of standard containers. It wasn't clear that the iterators model associative container iterators. And again, there isn't a simple example in the docs so I had to hunt one down in the test/examples.
3) Write operations are not 'whitespace' and comment preserving
To me this is a critical point not fulfilled by the library. Most real-world configuration files have comments. If the application needs to read/write the files it needs to maintain the comments as human written.
I'm afraid I must disagree with you. What is manipulated by property_tree is data. Data in case of INI file are sections, keys and values. If you want comments, you must use another format that explicitly has support for comments. Ptree at the moment supports only one such format: XML.
Hmm the INI parser seems to support skipping of comments, don't see why it can't take the same approach as XML? And are you saying that the comments would be preserved in an XML config?
That said, how would you imagine data about comments and layout should be stored/manipulated? Should the library distinguish between tabs and spaces, count lines between entries etc? Would it be possible to manipulate this metadata on par with normal data? What would be use of that if you deleted/added keys in runtime. For example:
[Fruit] ;Citrus 1=Orange 2=Lemon ; Tropical 3=Banana
Now you add "Grapefruit". How would you know you should add it after Lemon and before Banana, because otherwise it would violate your comments? There's a whole pandora box of problems. Should comments be attached to nodes? What about comments in between nodes? What about whitespace, should it be attached to nodes before it, or after it, or maybe split in half?
You must also remember we are now using INI files as example. This format has really simple layout. What about JSON, where you can have multiple keys in one line, and many more variations? I think preserving layout in presence of additions/deletions or even node modifications is impractical. In case there are no modifications, there is no need to save. Even if there was, you could just store contents of the file along with the tree.
Please let me know if you have idea on how whitespace/layout/comments could be preserved easily. If there is no such way I can only say this functionality does not belong to this library. It's rather text editor job.
I'm running out the door so I'll have to answer your questions later, but I'll say that I think the adding a new node is pretty easy. It doesn't exist so there is nothing to preserve. As for comments, they need to fit into the property_tree in some way so that they can be written back. I could probably live without perfect whitespace preservation, but no comments on write is a dealbreaker for me. Reasonably complex configs always need comments for the human. If those are lost then all my work creating a nicely commented config is wiped out and I can't use your library in those cases. Jeff

Hmm the INI parser seems to support skipping of comments, don't see why it can't take the same approach as XML? And are you saying that the comments would be preserved in an XML config?
The reason why comments are preserved in XML parser is because XML standard supports comment nodes. They are part of the data.
I'm running out the door so I'll have to answer your questions later, but I'll say that I think the adding a new node is pretty easy. It doesn't exist so there is nothing to preserve. As for comments, they need to fit into the property_tree in some way so that they can be written back. I could probably live without perfect whitespace preservation, but no comments on write is a dealbreaker for me.
I understand that. Comments seem to be much easier to handle than whitespace, but still I think there are plenty of corner cases that we would need to consider, even in such a simple format as INI. For example, how about multiline comments, should they be concatenated into one node, or should each separate line get its own node? How about comments like that: [Section] key1 = value1 ;comment1 ;comment2 key2 = value2 Where would we store information that comment1 uses the same line as key1? Without it after save it would go to the next line, and be incorrectly associated (by human) with key2. One solution might be to create a mapping between comments and non-comment nodes, so that every comment is associated with a node. During save we will use this information to appropriately position comments. This mapping would be in form of another set of nodes stored somewhere in the tree. Best regards, Marcin

"Marcin Kalicinski" wrote:
I understand that. Comments seem to be much easier to handle than whitespace, but still I think there are plenty of corner cases that we would need to consider, even in such a simple format as INI. For example, how about multiline comments, should they be concatenated into one node, or should each separate line get its own node? How about comments like that:
[Section] key1 = value1 ;comment1 ;comment2 key2 = value2
Where would we store information that comment1 uses the same line as key1? Without it after save it would go to the next line, and be incorrectly associated (by human) with key2.
The general rule is to preserve as much of comments/whitespace layout as only possible. It is not just for human eyes but it gets handy when different versions of the file are compared. The tool should make guesses to which data item a comment belongs - for INI the rules are quite straightforward (I wrote such a tool once). The comments should not be accessible from the application - they are comments and not something application uses. However, it may be useful to be able to /add/ a comment (as "value added on 2006/01/23") programatically. /Pavel

Marcin Kalicinski wrote:
[Section] key1 = value1 ;comment1 ;comment2 key2 = value2
Where would we store information that comment1 uses the same line as key1? Without it after save it would go to the next line, and be incorrectly associated (by human) with key2.
Code a node have a comments field? So key1 node would contain a value of 'value1' and comments of 'comment1' 'comment2' would belong to an un-named node with no value etc. Cheers Russell
participants (4)
-
Jeff Garland
-
Marcin Kalicinski
-
Pavel Vozenilek
-
Russell Hind