Overlap of property_trees with other boost libraries

The question was raised of possible overlap between the proposed property_tree library and the existing serialization and program_options libraries. My view on this matter:

The primary purpose of the program_options library is to provide an input mechanism for simple (flat) data, specifically for supplying configuration options to programs that are configured via the command line or config files. The need for this library arises because those mechanisms for entering configuration options are common in command-line tools.

The primary purpose of the serialization library is to provide a way to convert live C++ data structures into a form that can be transported to another program (or to the same program at a later time) and there reconstituted into a clone of the original data structure. The need for this library arises because communication with other processes and storage of application data outside of the application process are common across many problem domains.

The fact that the serialization library in its current form provides only conversions to byte streams is not important. This is just an implementation detail that arises from the fact that most common forms of interprocess communication and external storage use byte streams, and so conversion of C++ data to byte streams easily addresses both of those problems.

Let us assume that there exists a data representation other than a byte stream that is suitable for external data representation or interprocess communication. Such a representation would be a reasonable archive format for the serialization library. A user of the serialization library could (at least in principle) take advantage of the strengths of this data representation.
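To make the last point concrete, here is a minimal sketch, using only the standard library, of one description of a type being driven by archives with different representations: a text byte stream and a key/value map. All names here (`WindowSettings`, `describe`, `StreamWriter`, `MapWriter`, `MapReader`) are hypothetical and chosen for illustration; this is only loosely in the spirit of an archive concept and is not the serialization library's actual interface.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical application type; not taken from any Boost library.
struct WindowSettings {
    int width = 0;
    int height = 0;
    std::string title;
};

// A single description of the type, written against an abstract "archive":
// anything that offers field(name, value) overloads for the member types.
template <class Archive>
void describe(Archive& ar, WindowSettings& s) {
    ar.field("width", s.width);
    ar.field("height", s.height);
    ar.field("title", s.title);
}

// Archive A: writes to a byte stream; field names are not stored.
struct StreamWriter {
    std::ostringstream out;
    void field(const char*, int& v) { out << v << '\n'; }
    void field(const char*, std::string& v) { out << v.size() << ' ' << v << '\n'; }
};

// Archive B (output): a key/value map instead of a byte stream.
struct MapWriter {
    std::map<std::string, std::string> values;
    void field(const char* name, int& v) { values[name] = std::to_string(v); }
    void field(const char* name, std::string& v) { values[name] = v; }
};

// Archive B (input): reconstitutes an object from the same key/value map.
struct MapReader {
    const std::map<std::string, std::string>& values;
    void field(const char* name, int& v) { v = std::stoi(values.at(name)); }
    void field(const char* name, std::string& v) { v = values.at(name); }
};

// Converts the object to the byte-stream representation.
std::string to_byte_stream(WindowSettings s) {
    StreamWriter writer;
    describe(writer, s);
    return writer.out.str();
}

// Saves to the key/value representation and rebuilds a clone from it.
WindowSettings round_trip_through_map(WindowSettings original) {
    MapWriter writer;
    describe(writer, original);
    assert(writer.values.size() == 3);  // all three fields were recorded
    MapReader reader{writer.values};
    WindowSettings copy;
    describe(reader, copy);
    return copy;
}
```

The same `describe` function serves both representations, so the choice of external format is confined to the archive types rather than to the description of the data.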
Regardless of the data representation used by the serialization library, one must treat the serialized format as opaque; there is a substantial amount of metadata embedded in it, and while the format can be made human-readable, it's predictably difficult to make it human-editable and even harder to make it human-writable. It is not possible to eliminate this metadata and retain other features provided by the serialization library (such as pointer tracking and class versioning).

Now consider those design goals and constraints: the program_options library is an input method only; the serialization library is an input and output method, with features that prevent the data format from being human-editable. It is clear that in any case in which a human-editable input and output file format is needed, neither of these libraries will satisfy the requirement.

As it turns out, the domain of persistent configuration data edited both by humans and by programs (such as configuration data of most GUI programs) is such a case, and the need for persistent user-editable configuration data is not addressed by either program_options or serialization.

Enter property_trees. This library attempts to fill this need (persistent user-editable data) and as a result overlaps with some functionality already provided by serialization and program_options. If all you need is data input, you could use either property_trees or program_options, as long as your data structures are simple enough to be representable by program_options. If you need input and output, you could use either serialization or property_trees, with different data format and code complexity tradeoffs.

Here are some questions that I think need to be answered in order to decide whether this overlap is a good idea:

1) Is a property tree in-memory data structure a valuable abstraction, regardless of how it is serialized?

I think the answer to this is yes. Representation of a hierarchy of key-value pairs is useful across the board, whether you are writing a compiler (symbol tables) or a GUI FTP client (cached per-directory data).

2) Is the serialization of such a data structure to formats less rich than those needed by the serialization library useful?

I think the answer to this is yes. The obvious motivation for this is in serialization of configuration data to a user-editable form.

3) Could such serialization be done by the serialization library?

I believe so. Provided that the serialization library has the notion of an archive that doesn't allow for class versioning or pointer tracking and may have limited ability to represent hierarchical data, many common configuration file formats (including Windows INI files, Mac OS X property lists, UNIX key<delimiter>value formats) can be produced by the serialization library.

4) Should such serialization be done by the serialization library?

I believe so. The serialization library already has an interface for defining a way to convert C++ objects to an external representation, and I think it would be a good idea to maintain one interface for that purpose in boost.

5) Could a serialized form of property trees be used to input program options?

Yes.

So, here is one way that I see to resolve the property_trees conundrum:

1) A library providing an in-memory property tree abstraction should be submitted.

2) A library providing serialization of property trees to some common configuration file formats should be submitted. This could be part of the same library as in part 1. As part of this work, the serialization library may need to be modified to allow for archive formats that cannot represent the full range of object and class tracking metadata.

3) Ideally, this should result in the ability to serialize arbitrary C++ objects directly to a configuration file format -- in-memory data shouldn't have to be in the explicit form of a property tree in order to be serializable to a configuration file.

4) Property tree file formats should be allowed as inputs to the program options library (and the existing program options configuration file format should be subsumed by the library from part 2 above).

This would result in orthogonal components for each of the following responsibilities:

1) In-memory property tree representation
2) External property tree representation
3) Conversion of C++ objects, including in-memory property tree representation, to external property tree representation
4) Interpretation of property trees as command line program options
5) Parsing of command line program options from argv

For what it's worth, Mac OS X has a native property tree API, and I have implemented a serialization archive format that produces such a property tree from C++ data structures using the serialization library. My understanding of the various abstractions and their interactions is based on that experience.

Ben

--
I changed my name: <http://periodic-kingdom.org/People/NameChange.php>
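As a rough illustration of the first two responsibilities listed above, the following sketch keeps the in-memory tree separate from one external representation of it. It uses only the standard library and assumes C++17; `PropertyTree`, `put`, `get`, and `write_ini` are hypothetical names for this example and should not be read as the interface of the proposed property_tree library.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// In-memory representation: each node has a key, a value, and ordered
// children. Hypothetical structure, for illustration only.
struct PropertyTree {
    std::string key;
    std::string value;
    std::vector<PropertyTree> children;

    // Returns the child named `name`, creating it if absent.
    PropertyTree& child(const std::string& name) {
        for (auto& c : children)
            if (c.key == name) return c;
        children.push_back(PropertyTree{name, "", {}});
        return children.back();
    }

    // Stores `v` at a dot-separated path such as "window.width".
    void put(const std::string& path, const std::string& v) {
        assert(!path.empty());
        PropertyTree* node = this;
        std::size_t start = 0;
        while (true) {
            const std::size_t dot = path.find('.', start);
            node = &node->child(path.substr(start, dot - start));
            if (dot == std::string::npos) break;
            start = dot + 1;
        }
        node->value = v;
    }

    // Returns the value at `path`, or `fallback` if any segment is missing.
    std::string get(const std::string& path, const std::string& fallback) const {
        const PropertyTree* node = this;
        std::size_t start = 0;
        while (true) {
            const std::size_t dot = path.find('.', start);
            const std::string name = path.substr(start, dot - start);
            const PropertyTree* next = nullptr;
            for (const auto& c : node->children)
                if (c.key == name) { next = &c; break; }
            if (!next) return fallback;
            node = next;
            if (dot == std::string::npos) return node->value;
            start = dot + 1;
        }
    }
};

// External representation: writes the top two levels of the tree as
// INI-style text. Deeper nesting is ignored in this sketch, since a flat
// section/key format cannot express it.
std::string write_ini(const PropertyTree& root) {
    std::ostringstream out;
    for (const auto& section : root.children) {
        out << '[' << section.key << "]\n";
        for (const auto& entry : section.children)
            out << entry.key << '=' << entry.value << '\n';
    }
    return out.str();
}
```

Because `write_ini` only reads the tree, other writers or readers for other formats could be added without touching `PropertyTree`, which is the kind of separation the component list describes.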

Ben Artin wrote:
> The question was raised of possible overlap between the proposed property_tree library and the existing serialization and program_options libraries. My view on this matter:
There is already a thread for this topic. Please post replies there.

Thanks

-Thorsten

Ben Artin <macdev@artins.org> writes:
> So, here is one way that I see to resolve the property_trees conundrum:
> 1) A library providing an in-memory property tree abstraction should be submitted.
> 2) A library providing serialization of property trees to some common configuration file formats should be submitted. This could be part of the same library as in part 1. As part of this work, serialization library may need to be modified to allow for archive formats that cannot represent the full range of object and class tracking metadata.
> 3) Ideally, this should result in the ability to serialize arbitrary C++ objects directly to a configuration file format -- in-memory data shouldn't have to be in the explicit form of a property tree in order to be serializable to a configuration file.
> 4) Property tree file formats should be allowed as inputs to the program options library (and the existing program options configuration file format should be subsumed by the library from part 2 above).
This might be the best approach in the long run. However, IMO:

a. It's unlikely to happen unless this library is accepted into Boost, because it requires too much coordination among libraries with too much speculation: the serialization and program options authors are unlikely to have time to rework their libraries to support another library that *might*, someday, be accepted.

b. It shouldn't be a prerequisite for inclusion. Primarily, we review and accept *interfaces* and *documentation*, and only secondarily the implementation details. Once accepted, a library's implementation can be, and often is, changed.

I'm not coming out in favor of or against acceptance (I haven't looked at the library in enough detail yet), but IMO we should be asking ourselves whether it has the right interfaces for a library in its domain. If we like the interfaces, and the implementation gets the job done, we should accept. Otherwise, ... reject.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

In article <ulktwv10f.fsf@boost-consulting.com>, David Abrahams <dave@boost-consulting.com> wrote:
> Ben Artin <macdev@artins.org> writes:
>> So, here is one way that I see to resolve the property_trees conundrum:
> This might be the best approach in the long run. However, IMO:
> a. It's unlikely to happen unless this library is accepted into Boost, because it requires too much coordination among libraries with too much speculation: the serialization and program options authors are unlikely to have time to rework their libraries to support another library that *might*, someday, be accepted.
> b. It shouldn't be a prerequisite for inclusion.
I agree on both counts. My main purpose was to clarify (IMO, anyway) some of the issues concerning scope overlap, and propose a general long-term direction -- in part to counterbalance the review discussion which has (IMO, again) had a very strong near-term focus.

Ben

--
I changed my name: <http://periodic-kingdom.org/People/NameChange.php>

participants (3): Ben Artin, David Abrahams, Thorsten Ottosen