
"Thorsten Ottosen" <thorsten.ottosen@dezide.com> wrote in message news:44453A4C.2040302@dezide.com... : * What is your evaluation of the design? First I apologize for having no awareness of the lib's history. This is also my first review [ I usually don't have time for that, does it suffice to say that I have 4 small kids? ] I will highlight with ** (on the beginning of the line) each question/point for which I would appreciate an answer or discussion or action. I'd say at first glance that it felt a bit XML-centered [ which is neither good nor bad, but the reflection of some design choices, and wanting to be very general ] My vision of a property-tree is more of a "node" base class with three subclasses: - Leaf -- which stores actual data ~= string - Array -- which includes an array ~= vector<node> - Record/Dictionary/Object -- named fields ~= map<string,node> This is how most scripting languages support as building block for any data structure -- and why you find this at the core of JSON ( http://www.json.org/, probably a must-read for any reviewer). I dislike the fact that any 'tree' has a default value, that 'leaves' are trees, that a node can be an hybrid of an arrray and a record ( contain both unnamed or other repeated fields, and structured named fields). I prefer a design that imposes a minimal level of validation of the structure of the data (like I prefer xhtml to the excessively 'loose' original html) -- I think that the result is more maintainable. An example issue: ** [Based on Tutorial] what happens when I call pt.get<std::string>("debug.filename"); if there is two 'filename' fields under 'debug' ? This should generate a failure (throw an exception), and this should be explained in the documentation. Also, I would have preferred to have a better built-in support for arrays. As it is, one can use a "path" to directly access a child -- except if an array is involved. In my use of related libraries, arrays are very common, either as object lists, or as "leaves" (e.g. vector coords). I find this asymmetry (support for Records > Arrays) disturbing. Regarding iteration: In similar libraries I have been using, I liked having support for 'streaming' data or visiting the whole tree as a whole. Basically being able to flatten the 'tree' into a sequence of data fields. Having such an infrastructure at the core of the library makes it trivial to implement a number of i/o formats - see the reference to the DataTree design discussed below. This will also simplify cross-tree operations such as merge/intersect/override -- which you suggest as extensions. : * What is your evaluation of the implementation? Independently of the above design decisions, there are some things I would do differently for the library's interface: ** I would prefer the member-function interface to be minimal. ptree reminds me much of std::string, with too many overloads. Overall, a ptree is nothing more that a collection of named sub-tree, and its 'own' data field. I would only include getters and setters for these two fields, and leave all the rest to non-member functions, because different users may want to implement different interfaces. Keep things orthogonal and avoid multiplying overloads. ** 'separator' should not be a data member. This mixed usage of default/non-default separators only feels like a mess. Choice of separator is an access/interface method choice, not a property of the stored data. ** 'find-by-path' functions would be a set of non-member functions. Implement them with a default-separator choice (or make an internal but documented version that takes the separator as a param for those who want to customise their separator and use their own inline-forwarder functions). ** By the way, why not include support for arrays in: your paths - for example: get(pt,"debug.modules[0]"); ** the get/get_default/get_optional should be a *single* set of non-member functions applying to a ptree, or ideally maybe an optional<ptree&>. ** I have a similar library that provided template-overloads of node >> T and node << T , which can be a convenient notation for conversion/extraction. Specializing these template overloads is how you define custom converters for your built-in types. ** There should be a way to specialize get/put functions in a way that translate e.g. vec3f (3 float coordinates) to/from a ptree ( which I will implement as a 3-value array). Or from a struct to/from a ptree? How do I do implement this? The least is to have a plan defined for supporting this. The above would make the interface much cleaner IMO. It offers increased flexibility/customizability to end-users. I don't think that this has to impair usability. If I try to rewrite the "tutorial" based on the interface I envision, it could look like: reading: using boost::property_tree::ptree; ptree const pt = read_xml( filename ); //BTW: Should a value-returning version of read_xml() // not be provided? Some care more for convenience // than performance. And hopefully we will have // R-value references soon... m_file = get<std::string>( access(pt,"debug.filename") ); //or: get(pt,"debug.filename") >> m_file; //or: pt % "debug.filename" >> m_file; // too funky ?? m_level = tryget( access(pt,"debug.level") , 0 ); // for ex. writing: // simple overall anyway... put( pt, "debug.filename", m_file ); // or whatever... : * What is your evaluation of the documentation? Besides the error-handling for ambiguous paths mentioned early on, I would like to see: ** a description of the mechanism used to convert ordinal values into 'data'. I assume this is simply ostringstream, but I haven't seen this specified anywhere. ** Can you provide examples of the representation of common types (e.g. bool(true/false), char, int, double(precision?) ) ** If I want to implement a custom 'translator' (value to text and text to value) for a custom type (e.g. "BigInt"), what template functions do I need to specialize? Can I specialize this independently from istream/ostream ? [ and again, can I define a struct<->ptree conversion ? ] : * What is your evaluation of the potential usefulness : of the library? Very high, obviously. As it is, this lib is already much better (IMO) than many other similar libraries I've seen (xml and other). : * Did you try to use the library? With what compiler? : Did you have any problems? No, just reviewed the docs and sources. Too little time, sorry. : * How much effort did you put into your evaluation? : A glance? A quick reading? In-depth study? In-depth look for few hours. : * Are you knowledgeable about the problem domain? I have personally written a similar C++ library about 12 years ago, which I have refined and iterated several times. I had been calling it "DataTree" (but PrepertyTree is a much better name). The three companies I previoiusly worked at still use it... I even once started writing a version of this library in C, which is available at http://ivan.vecerina.com/code/datatree/ , and illustrates some of the core concepts (but obviously not the interface itself, which can be much nicer with C++ -- beware of the ugly C idioms in this posted code...). I would say that the C++ version of DataTree has pros and cons compared to ptree. [ sorry, I don't see how to post the C++ version as it relies on a number of proprietary libraries... ] Some of the key design differences include: - A Tree streaming or traversal system is built-in A tree can be transferred as a sequence of "tree steps". A tree can be built from such a sequence, or traversed as such a sequence (using a visitor pattern). A 'validator' class is able to verify the consistency of the sequence of tree steps (on-the-fly). This makes it *very easy* to create a reader/writer for a number of file formats (you don't need to check the grammar/consistency in each parser, it is all done in one place).
This would be a nice-to-have; it eventually could be implemented later as an internal add-on to the current design.
- 3 types of nodes are distinguished (leaf,array,record) Basically, these are 3 subclasses of a common 'node' base class. The fields of a record are (unique and ) sorted by name -- which might be better for performance if large records are used. I prefer having this more restrictive distinction between leaves, arrays, and records. But being more 'loose' also helps ptree support closer compatibility with some formats (Windows registry, XML). Is this desirable? YMMV.
Having either an std::map<string,node> or an std::vector<node> could afford better performance, or be easier to implement. Do potiential users of ptree want this additional flexibility? I guess so. Should it be possible to check that the structure of a ptree instance obeys stricter structural rules? Maybe, but not critical, probably
- all node types use built-in reference counting (you can therefore make shallow copies of a tree if desired, and sub-trees can be passed to other objects that store them without copying).
This made sense with polymorphic base class approach I took, but ptree's current approach is good as it is.
- I implemented a couple of binary and scrambled file formats too. This *is* convenient when you don't want users to mess with a configuration file. Checksums are also a requirement when storing potentially critical data (I do medical stuff).
Adding new formats to ptree is easy for end-users to do.
- I don't use iostreams for I/O but always byte-oriented streams. I consider the internal format to be fixed and byte-oriented. I also use memory-mapping and platform-specific tricks for most of the file I/O, because performance mattered in my applications. The encoding from C++ values to bytes is determined by the top-level value i/o functions. The default converters use stringstream, but they can be specialized at well for custom types. Users may also use a different set of converters (it would be possible to use binary formatters instead of text).
ptree's approach is more conform to the boost approach.
- With the above choices, the core library is not templated. Only the value<->node conversions are done in template code, and only this can be specialized. Implementation details are hidden, compile times are faster.
ptree's approach is more conform to the boost approach. ( LOL )
- like 'archives'(IIRC) in boost::serialize, I have a system allowing to write a single template function that will either export or import values to a tree.
This would be an interesting future add-on for ptree.
- I never bothered to implement path-based access to heirs. I always wanted to do it. But in practice, I have never needed it. All in all, my C++ DataTree library design is more 'classic': more OOD, less templates (except for interface/usability magic), more performance-oriented (did matter for me). But less general. : And finally, every review should answer this question: : : * Do you think the library should be accepted as a Boost library? : Be sure to say this explicitly so that your other comments : don't obscure your overall opinion. YES - but with requests for modifications. ptree is superior to most efforts I have seen. As expressed in my above comments (**), I feel that the interface needs to be made leaner, to improve flexibility/customizability. I haven't gone into the details of how to break-up the orthogonal aspects of the interface ( 3-kinds of get, child access, path access). But I would strongly recommend to explore & do this, as it will matter to make the library more 'evolvable'. In the same line, I also would like to see a clear documentation of how the value<->tree conversions are made, and how they can be specialized. My gripe regarding the 'too loose' structure is a personal peeve, I understand well that many would not care. : In particular, consider if you can answer the following questions: : : - was the library suitable for your daily xml-tasks? If not, what was : missing? It's good that XML is supported as a text format too. But honestly I don't care. : - was the library's performance good enough? If not, can you suggest : improvements? Performance should be ok for most users. Just as for iostreams, it might be a gripe that will come up periodically. But it shouldn't be a showstopper. : - was the library's design flexible enough? If not, how would you : suggest it should be redesigned to broaden its scope of use? Too flexible in my personal opinion. This is my very first review, and this is the end of a long work day. I don't have the strengh to review what I wrote -- please forgive me. Kudos to Marcin Kalicinski, and to any others who have contributed to this excellent work. Only because it can be so important, this library is worth perfecting... Kind regards, Ivan -- http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com