Re: [boost] [Boost Review] Property Tree Library

19 Apr 2006

      "Thorsten Ottosen" <thorsten.ottosen@dezide.com> wrote in message
news:44453A4C.2040302@dezide.com...
:      * What is your evaluation of the design?

First I apologize for having no awareness of the lib's history.
This is also my first review [ I usually don't have time
for that, does it suffice to say that I have 4 small kids? ]

I will highlight with ** (on the beginning of the line)
each question/point for which I would appreciate an
answer or discussion or action.

I'd say at first glance that it felt a bit XML-centered
[ which is neither good nor bad, but the reflection of
 some design choices, and wanting to be very general ]

My vision of a property-tree is more of a "node" base class
with three subclasses:
 - Leaf -- which stores actual data ~= string
 - Array -- which includes an array ~= vector<node>
 - Record/Dictionary/Object -- named fields ~= map<string,node>
This is how most scripting languages support as building
block for any data structure -- and why you find this at
the core of JSON ( http://www.json.org/, probably a must-read
for any reviewer).

I dislike the fact that any 'tree' has a default value,
that 'leaves' are trees, that a node can be an hybrid of
an arrray and a record ( contain both unnamed or other
repeated fields, and structured named fields).
I prefer a design that imposes a minimal level of
validation of the structure of the data (like I prefer
xhtml to the excessively 'loose' original html) --
I think that the result is more maintainable.

An example issue:

** [Based on Tutorial] what happens when I call
     pt.get<std::string>("debug.filename");
  if there is two 'filename' fields under 'debug' ?
  This should generate a failure (throw an exception),
   and this should be explained in the documentation.

Also, I would have preferred to have a better built-in
support for arrays.  As it is, one can use a "path" to
directly access a child -- except if an array is involved.
In my use of related libraries, arrays are very common,
either as object lists, or as "leaves" (e.g. vector coords).
I find this asymmetry (support for Records > Arrays) disturbing.

Regarding iteration: In similar libraries I have been using,
I liked having support for 'streaming' data or visiting the
whole tree as a whole.  Basically being able to flatten
the 'tree' into a sequence of data fields.
Having such an infrastructure at the core of the library
makes it trivial to implement a number of i/o formats - see
the reference to the DataTree design discussed below.
This will also simplify cross-tree operations such
as merge/intersect/override -- which you suggest as extensions.

:      * What is your evaluation of the implementation?

Independently of the above design decisions, there are some
things I would do differently for the library's interface:

** I would prefer the member-function interface to be minimal.
ptree reminds me much of std::string, with too many overloads.
Overall, a ptree is nothing more that a collection of named
sub-tree, and its 'own' data field.
I would only include getters and setters for these two fields,
and leave all the rest to non-member functions, because
different users may want to implement different interfaces.
Keep things orthogonal and avoid multiplying overloads.

** 'separator' should not be a data member. This mixed usage
   of default/non-default separators only feels like a mess.
   Choice of separator is an access/interface method choice,
   not a property of the stored data.

** 'find-by-path' functions would be a set of non-member
   functions.  Implement them with a default-separator
   choice (or make an internal but documented version
   that takes the separator as a param for those who
   want to customise their separator and use their
   own inline-forwarder functions).
** By the way, why not include support for arrays in:
   your paths - for example: get(pt,"debug.modules[0]");

** the get/get_default/get_optional should be a *single*
   set of non-member functions applying to a ptree,
   or ideally maybe an  optional<ptree&>.

** I have a similar library that provided template-overloads
   of   node >> T   and   node << T   , which can be
   a convenient notation for conversion/extraction.
   Specializing these template overloads is how you define
   custom converters for your built-in types.

** There should be a way to specialize get/put functions
   in a way that translate e.g. vec3f (3 float coordinates)
   to/from a ptree  ( which I will implement as a 3-value
   array).  Or from a struct to/from a ptree?
   How do I do implement this?
   The least is to have a plan defined for supporting this.

The above would make the interface much cleaner IMO.
It offers increased flexibility/customizability to
end-users.  I don't think that this has to impair
usability.
If I try to rewrite the "tutorial" based on the interface
I envision, it could look like:

reading:

    using boost::property_tree::ptree;
    ptree const pt = read_xml( filename );
     //BTW:  Should a value-returning version of read_xml()
     //  not be provided?   Some care more for convenience
     //  than performance. And hopefully we will have
     //  R-value references soon...

    m_file = get<std::string>( access(pt,"debug.filename") );
    //or:  get(pt,"debug.filename")  >>  m_file;
    //or:  pt % "debug.filename" >> m_file;   // too funky ??

    m_level = tryget( access(pt,"debug.level") , 0 ); // for ex.

writing:  // simple overall anyway...
    put( pt, "debug.filename", m_file );  // or whatever...

:      * What is your evaluation of the documentation?

Besides the error-handling for ambiguous paths mentioned early
on, I would like to see:

** a description of the mechanism used to convert ordinal
   values into 'data'. I assume this is simply ostringstream,
   but I haven't seen this specified anywhere.

** Can you provide examples of the representation of
   common types (e.g. bool(true/false), char, int,
   double(precision?) )

** If I want to implement a custom 'translator' (value to
   text and text to value) for a custom type (e.g. "BigInt"),
   what template functions do I need to specialize?
   Can I specialize this independently from istream/ostream ?
   [ and again, can I define a struct<->ptree conversion ? ]

:      * What is your evaluation of the potential usefulness
:        of the library?
Very high, obviously.
As it is, this lib is already much better (IMO) than many
other similar libraries I've seen (xml and other).

:      * Did you try to use the library? With what compiler?
:        Did you have any problems?
No, just reviewed the docs and sources.  Too little time, sorry.

:      * How much effort did you put into your evaluation?
:        A glance? A quick reading? In-depth study?
In-depth look for few hours.

:      * Are you knowledgeable about the problem domain?

I have personally written a similar C++ library about 12 years ago,
which I have refined and iterated several times.  I had been calling
it "DataTree"  (but PrepertyTree is a much better name).
The three companies I previoiusly worked at still use it...

I even once started writing a version of this library in C,
which is available at  http://ivan.vecerina.com/code/datatree/ ,
and illustrates some of the core concepts (but obviously not the
interface itself, which can be much nicer with C++ -- beware
of the ugly C idioms in this posted code...).

I would say that the C++ version of DataTree has pros and cons
compared to ptree.   [ sorry, I don't see how to post the C++
version as it relies on a number of proprietary libraries... ]
Some of the key design differences include:

 - A Tree streaming or traversal system is built-in
 A tree can be transferred as a sequence of "tree steps". 
 A tree can be built from such a sequence, or traversed as such
 a sequence (using a visitor pattern). A 'validator' class is able
 to verify the consistency of the sequence of tree steps (on-the-fly).
 This makes it *very easy* to create a reader/writer for a number
 of file formats (you don't need to check the grammar/consistency
 in each parser, it is all done in one place).
...
...
This would be a nice-to-have; it eventually could be implemented
later as an internal add-on to the current design.
- 3 types of nodes are distinguished (leaf,array,record)
   Basically, these are 3 subclasses of a common 'node' base class.
   The fields of a record are (unique and ) sorted by name -- which
   might be better for performance if large records are used.
   I prefer having this more restrictive distinction between
   leaves, arrays, and records.  But being more 'loose'
   also helps ptree support closer compatibility with some
   formats (Windows registry, XML).
   Is this desirable?  YMMV.
...
...
Having either an std::map<string,node> or an std::vector<node>
   could afford better performance, or be easier to implement.
   Do potiential users of ptree want this additional flexibility?
   I guess so.
   Should it be possible to check that the structure of a ptree
   instance obeys stricter structural rules?
   Maybe, but not critical, probably
- all node types use built-in reference counting (you can
   therefore make shallow copies of a tree if desired, and
   sub-trees can be passed to other objects that store them
   without copying).
...
...
This made sense with polymorphic base class approach I took,
   but ptree's current approach is good as it is.
- I implemented a couple of binary and scrambled file formats too.
   This *is* convenient when you don't want users to mess with
   a configuration file.  Checksums are also a requirement when
   storing potentially critical data (I do medical stuff).
...
...
Adding new formats to ptree is easy for end-users to do.
- I don't use iostreams for I/O but always byte-oriented streams.
 I consider the internal format to be fixed and byte-oriented.
 I also use memory-mapping and platform-specific tricks for most
 of the file I/O, because performance mattered in my applications.
 The encoding from C++ values to bytes is determined by the top-level
 value i/o functions.  The default converters use stringstream,
 but they can be specialized at well for custom types.
 Users may also use a different set of converters
 (it would be possible to use binary formatters instead of text).
...
...
ptree's approach is more conform to the boost approach.
- With the above choices, the core library is not templated.
   Only the value<->node conversions are done in template code,
   and only this can be specialized.
   Implementation details are hidden, compile times are faster.
...
...
ptree's approach is more conform to the boost approach. ( LOL )
- like 'archives'(IIRC) in boost::serialize, I have a system
   allowing to write a single template function that will
   either export or import values to a tree.
...
...
This would be an interesting future add-on for ptree.
- I never bothered to implement path-based access to heirs.
   I always wanted to do it. But in practice, I have never needed it.

All in all, my C++ DataTree library design is more 'classic':
more OOD, less templates (except for interface/usability magic),
more performance-oriented (did matter for me).
But less general.

: And finally, every review should answer this question:
: 
:      * Do you think the library should be accepted as a Boost library?
:        Be sure to say this explicitly so that your other comments
:        don't obscure your overall opinion.

YES - but with requests for modifications.

ptree is superior to most efforts I have seen.
As expressed in my above comments (**), I feel that the interface
needs to be made leaner, to improve flexibility/customizability.

I haven't gone into the details of how to break-up the orthogonal
aspects of the interface ( 3-kinds of get, child access, path
access).  But I would strongly recommend to explore & do this,
as it will matter to make the library more 'evolvable'.

In the same line, I also would like to see a clear documentation
of how the value<->tree conversions are made, and how they can
be specialized.

My gripe regarding the 'too loose' structure is a personal peeve,
I understand well that many would not care.

: In particular, consider if you can answer the following questions:
: 
: - was the library suitable for your daily xml-tasks? If not, what was 
: missing?

It's good that XML is supported as a text format too.
But honestly I don't care.

: - was the library's performance good enough? If not, can you suggest 
: improvements?

Performance should be ok for most users.  Just as for iostreams,
it might be a gripe that will come up periodically.  But it shouldn't
be a showstopper.

: - was the library's design flexible enough? If not, how would you 
: suggest it should be redesigned to broaden its scope of use?

Too flexible in my personal opinion.

This is my very first review, and this is the end of a long work day.
I don't have the strengh to review what I wrote -- please forgive me.

Kudos to Marcin Kalicinski, and to any others who have contributed
to this excellent work.  Only because it can be so important,
this library is worth perfecting...

Kind regards,
Ivan
-- 
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com

Re: [boost] [Boost Review] Property Tree Library

Ivan Vecerina