Overlap of property_trees with other boost libraries

22 Apr 2006

The question has been raised whether the proposed property_tree library overlaps
with the existing serialization and program_options libraries. My view on this
matter:

The primary purpose of the program_options library is to provide an input 
mechanism for simple (flat) data, specifically for the purpose of providing 
configuration options to programs that are configured via the command line or 
config files. The need for this library arises because those mechanisms for 
entering configuration options are common in command-line tools. 

The primary purpose of the serialization library is to provide a way to convert 
live C++ data structures into a form that can be transported to another program 
(or to the same program at a later time) and there reconstituted into a clone of 
the original data structure. This library is needed because communication with
other processes and storage of application data outside of the application
process are common requirements across many problem domains.

The fact that the serialization library in its current form provides only 
conversions to byte streams is not important. This is just an implementation 
detail that arises from the fact that most common forms of interprocess 
communication and external storage use byte streams, and so conversion of C++ 
data to byte streams easily addresses both of those problems.

Let us assume there exists a data representation other than a byte stream
that is suitable for external data representation or interprocess communication. 
Such a representation would be a reasonable archive format for the serialization 
library. A user of the serialization library could (at least in principle) take 
advantage of the strengths of this data representation.

Regardless of the data representation used by the serialization library, one 
must treat the serialized format as opaque; there is a substantial amount of 
metadata embedded in it, and while the format can be made to be human-readable, 
it's predictably difficult to make it human-editable and even harder to make it 
human-writable. It is not possible to eliminate this metadata and retain other
features provided by the serialization library (such as pointer tracking and 
class versioning). 

Now consider those design goals and constraints: the program_options library is 
an input method only; the serialization library is an input and output method, 
with features that prevent the data format from being human-editable.

It is clear that in any case in which a human-editable input and output file 
format is needed, neither of these libraries will satisfy the requirement. 

As it turns out, the domain of persistent configuration data edited both by
humans and by programs (such as the configuration data of most GUI programs) is
such a case: the need for persistent user-editable configuration data is
addressed by neither program_options nor serialization.

Enter property_trees. This library attempts to fill this need (persistent 
user-editable data) and as a result overlaps with some functionality already 
provided by serialization and program_options. If all you need is data input, 
you could use either property_trees or program_options, as long as your data 
structures are simple enough to be representable by program_options. If you need 
input and output, you could use either serialization or property_trees, with
different tradeoffs in data format and code complexity.

Here are some questions that I think need to be answered in order to decide 
whether this overlap is a good idea:

1) Is an in-memory property tree data structure a valuable abstraction,
regardless of how it is serialized?

I think the answer to this is yes. Representation of a hierarchy of key-value 
pairs is useful across the board, whether you are writing a compiler (symbol 
tables) or a GUI FTP client (cached per-directory data).
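To make the abstraction concrete, here is a minimal sketch of an in-memory
property tree in C++. PropertyNode and its put/get members are hypothetical
names chosen for illustration, not the interface of the proposed property_tree
library; keys are dotted paths such as "ftp.cache.dir".

```cpp
#include <map>
#include <memory>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal property tree node: a string value plus named children.
// Children are held through unique_ptr so the recursive type stays valid.
struct PropertyNode {
    std::string value;
    std::map<std::string, std::unique_ptr<PropertyNode>> children;

    // Split a dotted path such as "ftp.cache.dir" into its components.
    static std::vector<std::string> split(const std::string& path) {
        std::vector<std::string> parts;
        std::istringstream in(path);
        std::string part;
        while (std::getline(in, part, '.')) parts.push_back(part);
        return parts;
    }

    // Set the value at path, creating intermediate nodes as needed.
    void put(const std::string& path, const std::string& v) {
        PropertyNode* node = this;
        for (const auto& key : split(path)) {
            auto& child = node->children[key];
            if (!child) child = std::make_unique<PropertyNode>();
            node = child.get();
        }
        node->value = v;
    }

    // Return the value at path, or std::nullopt if any component is missing.
    std::optional<std::string> get(const std::string& path) const {
        const PropertyNode* node = this;
        for (const auto& key : split(path)) {
            auto it = node->children.find(key);
            if (it == node->children.end()) return std::nullopt;
            node = it->second.get();
        }
        return node->value;
    }
};
```

Putting "ftp.server" and "ftp.cache.dir" creates a single "ftp" node at the
root with two subtrees beneath it, which is the kind of hierarchy of key-value
pairs described above; nothing in the structure depends on how it is later
written out.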

2) Is the serialization of such a data structure to formats less rich than those 
needed by the serialization library useful?

I think the answer to this is yes. The obvious motivation for this is in 
serialization of configuration data to a user-editable form.

3) Could such serialization be done by the serialization library?

I believe so. Provided that the serialization library has the notion of an 
archive that doesn't allow for class versioning or pointer tracking and may have 
limited ability to represent hierarchical data, many common configuration file 
formats (including Windows INI files, Mac OS X property lists, UNIX 
key<delimiter>value formats) can be produced by the serialization library. 
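As a rough illustration of such a restricted archive, the sketch below writes
members as key=value lines. It records no class versions, does no pointer
tracking, and flattens nested objects into dotted keys as one way of coping with
limited support for hierarchy. KeyValueOArchive, Named, named and the
version-less serialize(Archive&) signature are all hypothetical; they are loosely
modeled on the serialization library's style but are not its actual API (whose
serialize functions also take a version argument).

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical name-value pair: a member name plus a reference to the member.
template <class T>
struct Named {
    const char* name;
    T& value;
};

template <class T>
Named<T> named(const char* name, T& value) { return Named<T>{name, value}; }

// Hypothetical output archive that writes "key=value" lines. It stores no
// class versions and does no pointer tracking, so the text stays editable.
// Nested objects are flattened into dotted keys such as "window.width".
class KeyValueOArchive {
public:
    explicit KeyValueOArchive(std::ostream& out) : out_(out) {}

    template <class T>
    KeyValueOArchive& operator&(const Named<T>& nv) {
        using U = std::remove_cv_t<T>;
        if constexpr (std::is_arithmetic_v<U> || std::is_same_v<U, std::string>) {
            // Primitive member: emit one line under the current prefix.
            out_ << prefix_ << nv.name << '=' << nv.value << '\n';
        } else {
            // Class member: recurse into its serialize() with a longer prefix.
            const std::string saved = prefix_;
            prefix_ += std::string(nv.name) + '.';
            nv.value.serialize(*this);
            prefix_ = saved;
        }
        return *this;
    }

private:
    std::ostream& out_;
    std::string prefix_;
};

// Example user types, written in a version-less serialize() style.
struct WindowSettings {
    int width = 800;
    int height = 600;

    template <class Archive>
    void serialize(Archive& ar) {
        ar & named("width", width) & named("height", height);
    }
};

struct AppSettings {
    std::string title = "demo";
    WindowSettings window;

    template <class Archive>
    void serialize(Archive& ar) {
        ar & named("title", title) & named("window", window);
    }
};
```

With width set to 1024 and the other defaults left alone, calling serialize on
an AppSettings object produces the lines title=demo, window.width=1024 and
window.height=600. The user code never mentions the file format, which is the
point of routing this through an archive interface.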

4) Should such serialization be done by the serialization library?

I believe so. The serialization library already has an interface for defining a 
way to convert C++ objects to an external representation, and I think it would 
be a good idea to maintain one interface for that purpose in boost.

5) Could a serialized form of property trees be used to input program options? 

Yes.
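For example, assuming a configuration archive that emits flat key=value lines,
a small reader can turn that text back into named option values for an options
layer to interpret. parse_key_value is a hypothetical helper for illustration,
not part of program_options.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical reader for flat "key=value" text. Blank lines, lines starting
// with '#', and lines without '=' are skipped; whitespace is not trimmed.
std::map<std::string, std::string> parse_key_value(std::istream& in) {
    std::map<std::string, std::string> options;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // ignore malformed lines
        options[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return options;
}
```

Given the lines title=demo and window.width=1024, the map holds those two
entries and the caller can convert window.width to an integer. Typed
conversion, defaults and validation are what an options library would add on
top of this.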

So, here is one way that I see to resolve the property_trees conundrum:

1) A library providing an in-memory property tree abstraction should be 
submitted.

2) A library providing serialization of property trees to some common 
configuration file formats should be submitted. This could be part of the same 
library as in part 1. As part of this work, the serialization library may need to be
modified to allow for archive formats that cannot represent the full range of 
object and class tracking metadata.

3) Ideally, this should result in the ability to serialize arbitrary C++ objects 
directly to a configuration file format; in-memory data shouldn't have to be
in the explicit form of a property tree in order to be serializable to a 
configuration file.

4) Property tree file formats should be allowed as inputs to the program options 
library (and the existing program options configuration file format should be 
subsumed by the library from part 2 above).

This would result in orthogonal components for each of the following 
responsibilities:

1) In-memory property tree representation
2) External property tree representation
3) Conversion of C++ objects, including in-memory property tree representation, 
to external property tree representation
4) Interpretation of property trees as command line program options
5) Parsing of command line program options from argv

For what it's worth, Mac OS X has a native property tree API, and I have 
implemented a serialization archive format that produces such a property tree 
from C++ data structures using the serialization library. My understanding of 
the various abstractions and their interactions is based on that experience.

Ben

-- 
I changed my name: <http://periodic-kingdom.org/People/NameChange.php>

Ben Artin