[BoostCon07][testing] Session reminder.

Boost Community, the Boost Conference will be here in very short order, and I need feedback for the "Testing Boost" sprint we'll be doing <http://boostcon.com/program/sessions#rivera-testing-sprint>. If you have any ideas, concerns, off-the-wall comments, etc., at *minimum* send them directly to me at "rrivera/acm.org". Why that account? Because that is the one account I can guarantee I will have access to during the conference, so people can send me stuff up to the last minute. Not that I'm encouraging lateness, but I know how busy we all are ;-) If you send them to the Boost dev list, you will also want to eventually send me a summary follow-up, if needed, from the ensuing discussions. Looking forward to seeing some of you at Aspen!
-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. - http://redshift-software.com
-- rrivera/acm.org - grafik/redshift-software.com
-- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Hi, I do plan to attend this session. I've got some ideas on the subject; I pitched them a while ago. I still believe that synchronization is the root of all evil, and the only real solution to break this deadlock is independent library versioning. This should resolve both our release and testing issues (which are closely connected IMO).

INDEPENDENT LIBRARY VERSIONING

The idea is to "bite the bullet", completely separate Boost into multiple independent components, and use independent versioning/testing for each one.

I. Split

Physically this can be done either as a straightforward directory restructuring or using some kind of smart tagging. Directory restructuring would be a change from

  boost/
    lib1/
    lib2/
    lib3/
    ...
  libs/
    lib1/
    lib2/
    lib3/
    ...

to

  lib1/
    boost/
    libs/
  lib2/
    boost/
    libs/
  ...

II. Independent versioning

Each library/component is independently versioned and released (see below how). The Boost release is just a thin layer on top (see below).

III. Regression tester setup

Instead of a single Boost tree, regression testers will have to maintain something like this:

  lib1/
    <version1>/
      boost/
      libs/
    <version2>/
      boost/
      libs/
  lib2/
    <version1>/
      boost/
      libs/
    <version2>/
      boost/
      libs/
  ...

Why? See below.

IV. Makefile/Jamfile changes

Instead of -I $(BOOST_HOME) (in general terms; I know bjam works a bit differently), we introduce the notion of a versioned dependency. For every library A we determine which libraries it depends on (let's say B, C and D), and somewhere in the makefile/Jamfile we add the statement:

  A.dependent_libs = B:<version of B> C:<version of C> D:<version of D>

For example:

  A.dependent_libs = B:1.2.1 C:1.5.0 D:1.4.3

The statement above works two ways:

1. During compilation it causes -I B/<version of B> ... to be added to the compilation options.
2. For unit tests it adds the appropriate library objects to link against.

Now every library is developed and tested independently of other libraries' development.
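Not actual Boost.Build syntax, but the -I mapping described in section IV above could be sketched roughly like this (paths, helper names and the dictionary layout are all invented for illustration):

```python
# Sketch of the versioned-dependency idea from section IV (not real
# Boost.Build code): each library declares the versions it depends on,
# and the build system turns that into -I include options pointing at
# the versioned trees the regression testers keep (section III).

def compile_flags(deps, root="/testers"):
    """Map {library: version} dependencies to -I include options."""
    flags = []
    for name, version in sorted(deps.items()):
        # each dependency lives in its own versioned tree:
        #   <root>/<name>/<version>/boost/...
        flags.append(f"-I{root}/{name}/{version}")
    return flags

# Library A depends on B, C and D at pinned versions:
a_dependent_libs = {"B": "1.2.1", "C": "1.5.0", "D": "1.4.3"}
print(compile_flags(a_dependent_libs))
# ['-I/testers/B/1.2.1', '-I/testers/C/1.5.0', '-I/testers/D/1.4.3']
```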
I may even stick to an older version of component D even though a newer one is available. "<version of X>" is the version of component X and may be one of:

1. <major>.<minor>.<patch> - points to a specific version of component X.
2. <major>.<minor> - points to the latest patch of component X in the <major>.<minor> series.
3. LATEST - points to the latest released version.
4. Boost:<version of boost> - points to the version of component X that belongs to the given version of Boost (see below for the Boost release procedure).
5. DEV - points to the development version of component X (cvs/svn HEAD). To be used to test against the development version of another component; not recommended, and it would prevent releasing your library.

V. Single component/library release

As soon as the regression tests for your library are green, you are free to make a release. You don't have to synchronize with or wait for anyone. It's done by a single script that performs the following tasks:

1. Checks that all unit tests are green.
2. Checks that you don't depend on development versions of other components.
3. Tags your library appropriately.
4. Saves the versions of all dependent components. Even if you point at LATEST, this records the actual version at the time of release.
5. Posts an announcement "Library A version V is released" on the dev list.

No binaries are released unless we invent another procedure for independent component packaging.

VI. Boost release

The Boost release is done automatically every predefined period of time (say, three months). It requires NO testing, NO branching and NO merging/reverting. It's done by a single script (a bit of an oversimplification, but close) that performs these tasks:

1. It iterates through all components and checks for any component that was released since the previous Boost release. Very important: a component should NOT depend on an older version of a component that is being released.
For example, if library A depends on version 1.23 of library B and version 1.24 of library B was released, library A won't become part of the next Boost release. If your library depends on an older version of a component that is also being released, it's your responsibility to test it against the new release of the dependent component and release a new version. This may become a problem if a core library - a library many others depend on - starts frequent releases. There are a couple of ways to deal with this:

a) The core library author shouldn't do this ;)
b) Don't depend on a fixed version of the component; use LATEST instead. This way you are always testing against the latest released version of the dependent component, which should minimize the chance that your library is released with a dependency on an older component. It still may happen, though.
c) A library can be released as "not latest", IOW without changing its LATEST version. This can be done if the library author wishes to release a new version for some users but does not want to disturb the upcoming release.

2. Once the list of updated components is created, the script iterates through it: if any library was released with a <minor> version update, this will be a <minor> version update of Boost. If all libraries were released with <patch> version updates, it's going to be a <patch> release of Boost.

3. All the Boost components are merged into a single tree:

  boost/
    lib1/
    lib2/
    lib3/
    ...
  libs/
    lib1/
    lib2/
    lib3/
    ...

At this stage we should check that no two components are using the same header.

4. The rest is as usual.

VII. Testing optimizations

IMO there are some testing procedure optimizations that can be done:

1. If space is a concern, instead of keeping multiple copies of each component, the build system should support retrieving them on request. IOW, if during library A's compilation we determine that we need version V of component X and it's not present, we get it automatically from the source control system. Once testing is completed the dependent library can be removed.

2. Development-only testing. We don't really need to test libraries that have nothing new since their previous release. IOW, if there were no changes in a library since its last release, and none of the dependent libraries that are referenced with LATEST have changed, there is no need to run its tests.

3. Request-only testing. We may introduce a request-only test system: if a library author wishes it to be tested, they indicate it somehow, for example by tagging the development version with a for_test tag. Testing is performed until the request is reverted.

I must admit this proposal requires some changes to Boost.Build and other Boost procedures. It might also require more disk space on regression testers' sites. But I believe this is the only direction that can really lead us to salvation.

Regards,
Gennadiy
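The version specifiers listed earlier in the proposal (exact version, <major>.<minor> series, LATEST, DEV) could be resolved along these lines. This is an illustrative sketch with made-up data; resolving Boost:<version> would need the saved release manifests and is omitted here:

```python
# Sketch of resolving a version specifier against a component's set of
# released versions (illustrative only, not part of the proposal's
# actual tooling).

def resolve(spec, released, latest, dev="HEAD"):
    """released: list of "major.minor.patch" strings, oldest first.
    latest: the version currently tagged LATEST."""
    if spec == "DEV":
        return dev                    # cvs/svn HEAD; would block release
    if spec == "LATEST":
        return latest
    parts = spec.split(".")
    if len(parts) == 3:               # exact <major>.<minor>.<patch>
        return spec
    # <major>.<minor>: the newest patch release in that series
    series = [v for v in released if v.rsplit(".", 1)[0] == spec]
    return series[-1]

released = ["1.4.0", "1.4.1", "1.4.3", "1.5.0"]
print(resolve("1.4", released, latest="1.5.0"))     # 1.4.3
print(resolve("LATEST", released, latest="1.5.0"))  # 1.5.0
```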

"Gennadiy Rozental" <gennadiy.rozental@thomson.com> writes: [snipped lots of sensible stuff]
VI Boost release
The Boost release is done automatically every predefined period of time (say, three months). It requires NO testing, NO branching and NO merging/reverting. It's done by a single script (a bit of an oversimplification, but close) that performs these tasks:
1. It iterates through all components and checks for any component that was released since the previous Boost release. Very important: a component should NOT depend on an older version of a component that is being released. For example, if library A depends on version 1.23 of library B and version 1.24 of library B was released, library A won't become part of the next Boost release.
Here I don't understand. Suppose Boost release X contains library A version 1.4, and library B version 1.23, and library A depends on library B. The developer of library B then releases a new version (1.24), and the author of library A doesn't. The next Boost release (Y) comes along, and now library A is no longer part of the Boost release? That strikes me as a bad plan --- the contents of Boost will vary from release to release as developers update their libraries at different rates. As an alternative, how about this: if library A depends on version xyz of library B, then library B is pinned at version xyz for Boost releases until library A is updated. If library A is not updated for n consecutive Boost releases, library A is dropped from Boost as unmaintained. How about this, also: a library developer can only release their library if it is built against the latest released version of all its dependent libraries. That way if a core library is updated, all other libraries will have to use the new version before they can release.

Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
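Anthony's pinning alternative could be sketched like this, just to make the rule concrete (the helper and its data are invented, not anyone's actual tooling): B stays pinned at the version A depends on, and after n Boost releases without an update A is dropped as unmaintained.

```python
# Sketch of the pinning rule: library A depends on a_pin of library B;
# B's latest release is b_latest; stale_count is how many consecutive
# Boost releases A has gone without an update.

def plan_release(a_pin, b_latest, stale_count, n=3):
    if a_pin == b_latest:
        return ("ship", b_latest)    # A is up to date; ship latest B
    if stale_count >= n:
        return ("drop A", b_latest)  # A unmaintained; B is unpinned
    return ("ship", a_pin)           # keep B pinned at A's version

print(plan_release("1.23", "1.24", stale_count=1))  # ('ship', '1.23')
print(plan_release("1.23", "1.24", stale_count=3))  # ('drop A', '1.24')
```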

On 5/3/07, Anthony Williams <anthony_w.geo@yahoo.com> wrote:
Here I don't understand. Suppose Boost release X contains library A version 1.4, and library B version 1.23, and library A depends on library B. The developer of library B then releases a new version (1.24), and the author of library A doesn't. The next Boost release (Y) comes along, and now library A is no longer part of the Boost release? That strikes me as a bad plan --- the contents of Boost will vary from release to release as developers update their libraries at different rates.
As an alternative, how about this: if library A depends on version xyz of library B, then library B is pinned at version xyz for Boost releases until library A is updated. If library A is not updated for n consecutive Boost releases, library A is dropped from Boost as unmaintained.
How about this, also: a library developer can only release their library if it is built against the latest released version of all its dependent libraries. That way if a core library is updated, all other libraries will have to use the new version before they can release.
Didn't Beman's proposal address most or all of these issues? --Michael Fawcett

"Michael Fawcett" <michael.fawcett@gmail.com> writes:
On 5/3/07, Anthony Williams <anthony_w.geo@yahoo.com> wrote:
Here I don't understand. Suppose Boost release X contains library A version 1.4, and library B version 1.23, and library A depends on library B. The developer of library B then releases a new version (1.24), and the author of library A doesn't. The next Boost release (Y) comes along, and now library A is no longer part of the Boost release? That strikes me as a bad plan --- the contents of Boost will vary from release to release as developers update their libraries at different rates.
As an alternative, how about this: if library A depends on version xyz of library B, then library B is pinned at version xyz for Boost releases until library A is updated. If library A is not updated for n consecutive Boost releases, library A is dropped from Boost as unmaintained.
How about this, also: a library developer can only release their library if it is built against the latest released version of all its dependent libraries. That way if a core library is updated, all other libraries will have to use the new version before they can release.
Didn't Beman's proposal address most or all of these issues?
Probably, I haven't seen his revised proposal yet. I was just responding to Gennadiy's proposal.

Anthony
--
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

"Anthony Williams" <anthony_w.geo@yahoo.com> wrote in message news:ejlxj095.fsf@yahoo.com...
"Gennadiy Rozental" <gennadiy.rozental@thomson.com> writes:
[snipped lots of sensible stuff]
VI Boost release
The Boost release is done automatically every predefined period of time (say, three months). It requires NO testing, NO branching and NO merging/reverting. It's done by a single script (a bit of an oversimplification, but close) that performs these tasks:
1. It iterates through all components and checks for any component that was released since the previous Boost release. Very important: a component should NOT depend on an older version of a component that is being released. For example, if library A depends on version 1.23 of library B and version 1.24 of library B was released, library A won't become part of the next Boost release.
Here I don't understand. Suppose Boost release X contains library A version 1.4, and library B version 1.23, and library A depends on library B. The developer of library B then releases a new version (1.24), and the author of library A doesn't. The next Boost release (Y) comes along, and now library A is no longer part of the Boost release? That strikes me as a bad plan --- the contents of Boost will vary from release to release as developers update their libraries at different rates.
As an alternative, how about this: if library A depends on version xyz of library B, then library B is pinned at version xyz for Boost releases until library A is updated. If library A is not updated for n consecutive Boost releases, library A is dropped from Boost as unmaintained.
Hmm. All good points. How about this:

1. If a) library A depends on a concrete version xyz of library B, b) library B is released, and c) library A is not updated, then library A is indeed not included, since it intentionally points to a particular release of library B that is no longer part of the new version of Boost.

2. If library A depends on the LATEST version of library B: as soon as B is released, library A's regression tests will have to be rerun as well (according to the procedure). If by the time we are ready to release Boost library A's regression is all green, we do "automatic promotion/patching": we release a patch version of library A that now depends on the newly released library B. This process can be automatic. If regression failures are present, we need to postpone library B's inclusion in the Boost release, and library B's release is automatically marked "not LATEST".
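The two cases discussed above - a library pinned to an older version of a just-released component being excluded, and a LATEST-tracking library getting an automatic patch release when its regression is green - could be sketched like this (data structures invented purely for illustration):

```python
# Sketch of the two release-time cases (not real Boost infrastructure):
# 1. a library pinned to an older version of a just-released component
#    is excluded from the Boost release;
# 2. a library tracking LATEST gets an automatic patch release if its
#    regression against the new component is green.

def eligible(deps, new_releases):
    """deps: {component: pinned_version or "LATEST"}.
    new_releases: {component: version} released since the last Boost."""
    return all(pinned == "LATEST" or dep not in new_releases
               or pinned == new_releases[dep]
               for dep, pinned in deps.items())

def auto_promote(version, regression_green):
    if not regression_green:
        return None                    # postpone; B marked "not LATEST"
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"  # automatic patch release

new = {"B": "1.24"}
print(eligible({"B": "1.23"}, new))    # False: case 1, A stays out
print(eligible({"B": "LATEST"}, new))  # True, pending green regression
print(auto_promote("1.4.3", True))     # 1.4.4
```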
How about this, also: a library developer can only release their library if it is built against the latest released version of all its dependent libraries. That way if a core library is updated, all other libraries will have to use the new version before they can release.
No. I do not want to restrict developers in how they do their development and which versions they depend on. If you opt to depend on a concrete version of library B, you need to understand the consequences: if B is released, you will have to do your own release manually, or your library will not be included in the umbrella Boost release. Gennadiy

Gennadiy Rozental wrote:
Hi,
I do plan to attend this session. I've got some ideas on the subject; I pitched them a while ago. I still believe that synchronization is the root of all evil. The only real solution to break this deadlock is independent library versioning. This should resolve both our release and testing issues (which are closely connected IMO).
This is on the right track - but seems way too complicated. Here is what I plan to do from now on:

a) I will load the latest released Boost on my machine.
b) I will make a (CVS or SVN or whatever) tree on my machine which includes only the serialization library files.
c) I will tweak the build process so that it looks into my serialization library tree before it looks into the latest official Boost release.
d) I will make changes in my tree as convenient, and test on my local system against the latest Boost release.
e) When it passes all my tests on the compilers I have, I will do the following:
   i) check in my changes into whatever tree Boost decides it wants;
   ii) zip up the files which differ from the last Boost release and upload the zipped file to a place on my website. The website will contain instructions on how to set up one's include paths so that the latest validated serialization library can be used;
   iii) the version number isn't critical for me. Easiest would be the date of the upload. The serialization library would be "validated against the latest released version of Boost".
f) I will include better and more complete instructions for users to test the library on their own systems. Any users who want help with compilers I haven't tested will have to run the complete test suite and report the results.

This has the following features from my perspective:

a) I believe this system in no way conflicts with any of the proposals that have been suggested.
b) It's much less work for me to deal with.
c) It will permit users to benefit from bug fixes and enhancements MUCH sooner.

In conjunction with this, I will reduce the default set of tests run by the Boost testers. I'm sure this will be greeted with much relief, as the serialization library consumes testing time way out of proportion to the importance of the library and the value the tests bring me as a developer.

Robert Ramey
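Step (c) of the plan above is essentially include-path ordering: the local serialization tree goes before the official release tree, so its headers shadow the released ones. A minimal sketch, with invented directory names:

```python
# Sketch of the overlay idea from step (c): the local library tree is
# listed before the official Boost release on the include path, so the
# compiler finds the newer headers first (paths invented for
# illustration).

def include_options(local_tree, boost_release):
    # order matters: the compiler takes the first match it finds
    return [f"-I{local_tree}", f"-I{boost_release}"]

print(include_options("/home/ramey/serialization", "/opt/boost_1_34_0"))
# ['-I/home/ramey/serialization', '-I/opt/boost_1_34_0']
```

Skipping the upgrade is then just dropping the first -I option, which matches the "no worse off than now" argument made later in the thread.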

Also, I think all the libraries would benefit from this approach, but it doesn't require that everyone agree; others are free to continue with the current "tightly coupled" system. The only problem is if the serialization library depends on another library whose author makes an interface-breaking change. This should happen very, very rarely - and in fact doesn't happen often. If it happens too often, I'll reduce my dependency on such libraries. Robert Ramey

Robert Ramey said: (by the date of Thu, 3 May 2007 22:32:52 -0700)
<snip> [ Robert's plan to release latest and greatest CVS-HEAD version of serialization library so that it can be used with boost 1.34 when it's released ]
Hooray! Many thanks. It will be much easier to submit patches for you :) -- Janek Kozicki

Robert Ramey <ramey <at> rrsd.com> writes:
Gennadiy Rozental wrote: [...]
I pitched them a while ago. I still believe that synchronization is the root of all evil. The only real solution to break this deadlock is independent library versioning. This should resolve both our release and testing issues (which are closely connected IMO).
This is on the right track - but seems way too complicated.
Here is what I plan to do from now on: [something along the lines of: develop locally against the latest Boost Official release]
e) When it passes all my tests on the compilers I have, I will do the following: i) check in my changes into whatever tree Boost decides it wants; ii) zip up the files which differ from the last Boost release and upload the zipped file to a place on my website. The website will contain instructions on how to set up one's include paths so that the latest validated serialization library can be used.
This is going to make it incredibly hard to use both serialization and any of the libraries that changed since you set up your environment in the same project.

I believe there are two problems in Gennadiy's proposal: the granularity is too fine, and the constraint of releasing Boost as a single whole is going to make things unnecessarily hard. A better approach would be to separate some of the larger libraries from core Boost once and for all. I'm thinking of Serialization, Spirit, Python, possibly a few others. In most cases other libraries should not depend on these (i.e. Preprocessor is not a good candidate). Where dependencies exist they should either be removed or moved to reside within the separate library (e.g. serialization support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).

These libraries should be developed, tested and released separately, against the most recent release of core. It will be up to each library maintainers' team to decide whether to "port" one or more released versions of their library to new releases of Boost Core, while they work on a new major release. This will help shorten the Core release cycle, while allowing developers of the large libraries to proceed at their own pace. Users will also benefit from not having to download/install/build libraries they do not need.

Cheers, Nicola Musatti

Nicola Musatti wrote:
Robert Ramey <ramey <at> rrsd.com> writes:
e) When it passes all my tests on the compilers I have, I will do the following: i) check in my changes into whatever tree Boost decides it wants; ii) zip up the files which differ from the last Boost release and upload the zipped file to a place on my website. The website will contain instructions on how to set up one's include paths so that the latest validated serialization library can be used.
This is going to make it incredibly hard to use both serialization and any of the libraries that changed since you set up your environment in the same project.
Why? One would have on his machine either a) the last Boost release, or b) the last Boost release plus validated upgrades. I don't see how this would make anything harder. If for some reason it does, you can always just skip the upgrade and wait for the next Boost release. That is, one can be no worse off than he is now.
A better approach would be to separate from core Boost some of the larger libraries once and for all. I'm thinking of Serialization, Spirit, Python, possibly a few others. In most cases other libraries should not depend on these (i.e. Preprocessor is not a good candidate).
It seems that this is functionally equivalent to my proposal.
(e.g. serialization
support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).
Well, the current situation is a hodgepodge, as it always will be. I've supplied serialization support for standard collections, and Matthias Troyer has enhanced it. I've done it for a couple of types which I thought were important but which it seemed weren't getting done any other way (e.g. shared_ptr), some were contributed (e.g. variant), and some have been done by the respective authors (e.g. date/time, multi-index). Right now they are in the namespace of the author - which seems kind of arbitrary to me. And some authors have included serialization headers as part of "convenience headers", which I'm not crazy about. On the upside, though this "hodgepodge" lacks a certain aesthetic consistency, it doesn't create many real problems, and trying to get everyone on the same page would be so hard as to not be worth the trouble.
These libraries should be developed, tested and released separately, against the most recent release of core. It will be up to each library mantainers' team to decide whether to "port" one or more released versions of their library to new releases of Boost Core, while they work on a new major release.
This will help shorten the Core release cycle, while allowing large libraries developers to proceed at their own pace.
Users will also benefit from not having to download/install/build libraries they do not need.
So, you're agreeing with me? or not? Robert Ramey
Cheers, Nicola Musatti

Robert Ramey wrote:
Nicola Musatti wrote: [...]
This is going to make it incredibly hard to use both serialization and any of the libraries that changed since you set up your environment in the same project.
Why? One would have on his machine either
a) the last Boost release, or
b) the last Boost release plus validated upgrades.
The problems of using a separate version of Spirit make me weary. [...]
A better approach would be to separate from core Boost some of the larger libraries once and for all. I'm thinking of Serialization, Spirit, Python, possibly a few others. In most cases other libraries should not depend on these (i.e. Preprocessor is not a good candidate).
It seems that this is functionally equivalent to my proposal.
Except that this state of things would be officially recognized across Boost and there would be no separate patches. [...]
So, you're agreeing with me? or not?
I do agree with the idea of developing Serialization (and possibly other libraries) separately from what I called Boost Core. I wish this could happen in a more official, agreed upon way. Cheers, Nicola Musatti

"Nicola Musatti" <Nicola.Musatti@gmail.com> wrote in message news:loom.20070504T111058-122@post.gmane.org...
Robert Ramey <ramey <at> rrsd.com> writes:
Gennadiy Rozental wrote: [...]
I pitched them a while ago. I still believe that synchronization is the root of all evil. The only real solution to break this deadlock is independent library versioning. This should resolve both our release and testing issues (which are closely connected IMO).
This is on the right track - but seems way too complicated.
Here is what I plan to do from now on: [something along the lines of: develop locally against the latest Boost Official release]
e) When it passes all my tests on the compilers I have, I will do the following: i) check in my changes into whatever tree Boost decides it wants; ii) zip up the files which differ from the last Boost release and upload the zipped file to a place on my website. The website will contain instructions on how to set up one's include paths so that the latest validated serialization library can be used.
This is going to make it incredibly hard to use both serialization and any of the libraries that changed since you set up your environment in the same project.
I agree.
I believe there are two problems in Gennadiy's proposal: the granularity is too fine
It's a natural separation into independent libraries.
and the constraint of releasing Boost in a single whole is going to make things unnecessarily hard.
Which constraint?
A better approach would be to separate from core Boost some of the larger libraries once and for all. I'm thinking of Serialization, Spirit, Python, possibly a few others. In most cases other libraries should not depend on these (i.e. Preprocessor is not a good candidate). Where dependencies exist they should either be removed or moved to reside within the separate library (e.g. serialization support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).
These libraries should be developed, tested and released separately, against the most recent release of core. It will be up to each library mantainers' team to decide whether to "port" one or more released versions of their library to new releases of Boost Core, while they work on a new major release.
1. This in no way addresses the problem of developing and releasing libraries that others depend on, and this is the biggest problem IMO. 2. Your proposition leads to the separated libraries being potentially unusable with the latest Boost release. This is not a good thing IMO. 3. Who makes this decision? Which libraries are "core" and which are standalone? Gennadiy

Gennadiy Rozental wrote: [...]
I believe there are two problems in Gennadiy's proposal: the granularity is too fine
It's a natural separation into independent libraries.
I might agree if they really were independent, but in many cases they are not. On the other hand there are a number of rather large libraries that have fewer dependent ones.
and the constraint of releasing Boost in a single whole is going to make things unnecessarily hard.
Which constraint?
I'm under the impression that in your scheme you expect to be able to assemble a complete Boost release by choosing the appropriate releases of all the component libraries.
A better approach would be to separate from core Boost some of the larger libraries once and for all. I'm thinking of Serialization, Spirit, Python, possibly a few others. In most cases other libraries should not depend on these (i.e. Preprocessor is not a good candidate). Where dependencies exist they should either be removed or moved to reside within the separate library (e.g. serialization support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).
These libraries should be developed, tested and released separately, against the most recent release of core. It will be up to each library mantainers' team to decide whether to "port" one or more released versions of their library to new releases of Boost Core, while they work on a new major release.
1. This in no way addresses the problem of developing and releasing libraries that others depend on, and this is the biggest problem IMO.
All that is needed is to shift the release date of the split libraries to some 2-3 months after the release of core, assuming a six-month release cycle. Core developers will take advantage of the more manageable size of the library collection they work on, while the split libraries' developers will gain from the resulting period of Core's guaranteed stability.
2. Your proposition leads to the separated libraries being potentially unusable with the latest Boost release. This is not a good thing IMO.
People who only use core will be able to switch to a new release immediately; those who need one or more of the separated libraries will have to wait up to three months. On the other hand, by reducing Core Boost to a much more manageable size than the whole of Boost, the chances of hitting planned release dates should increase. Just consider how long people have been waiting for the libraries that were introduced/improved in 1.34, not to mention those that are expected for 1.35...
3. Who make this decision? Which libraries are "core" and which are standalone?
This will have to be agreed upon, considering size, dependencies and breadth of applicability. Ideally library authors should offer to split off their libraries if they think it reasonable. In a way Robert Ramey is already heading in a similar direction with Serialization. I think he should be encouraged to do so, but within an agreed upon setup, rather than in total independence, so that other authors may benefit from the experience gained. Cheers, Nicola Musatti

Sorry for not following the thread properly, just a quick note:
(e.g. serialization support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).
This is not right. Serialization support for class X should be provided by X.hpp and it must be possible to do this without depending on any other header.

Peter Dimov wrote:
Sorry for not following the thread properly, just a quick note:
(e.g. serialization support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).
This is not right. Serialization support for class X should be provided by X.hpp and it must be possible to do this without depending on any other header.
Apologies from me, too, as now we are getting even further from the original topic. I have seen at least one case where it was indeed X.hpp that introduced the dependency on serialization code. However, this generates another problem: if X is usable without serialization, users shouldn't be forced to also link to the serialization library just because of this dependency. (In the particular case I have in mind this dependency was controlled by a preprocessor macro. That's not very practical, since packagers surely won't provide two sets of packages for X, one with and one without this dependency.) Thus, I'd suggest encapsulating the X-serialization functionality into a separate library (maybe header-only), such as X_serialization.hpp etc. Then I can still use X stand-alone, and drag in the rest whenever I need it. Thanks, Stefan

PS: the library I'm thinking of is Boost.Wave, where serialization is dragged in whenever BOOST_WAVE_SERIALIZATION is defined. -- ...ich hab' noch einen Koffer in Berlin...

If X is usable without serialization, users shouldn't be forced to also
link to the serialization library, just because of this dependency. (In the particular case I have in mind this dependency was controlled by a preprocessor macro. That's not very practical, since packagers surely won't provide two sets of packages for X, one with and one without this dependency.)
Thus, I'd suggest encapsulating the X-serialization functionality in a separate library (maybe header-only), such as X_serialization.hpp etc. Then I can still use X stand-alone, and drag in the rest whenever I need it.
This has been discussed before. You don't need X_serialization.hpp if you don't use BOOST_CLASS_EXPORT. See http://www.archivesat.com/Boost_developers/thread2900871.htm. Emil Dotchevski

Emil Dotchevski wrote:
If X is usable without serialization, users shouldn't be forced to also
link to the serialization library, just because of this dependency. (In the particular case I have in mind this dependency was controlled by a preprocessor macro. That's not very practical, since packagers surely won't provide two sets of packages for X, one with and one without this dependency.)
Thus, I'd suggest encapsulating the X-serialization functionality in a separate library (maybe header-only), such as X_serialization.hpp etc. Then I can still use X stand-alone, and drag in the rest whenever I need it.
This has been discussed before. You don't need X_serialization.hpp if you don't use BOOST_CLASS_EXPORT. See http://www.archivesat.com/Boost_developers/thread2900871.htm.
I'm not sure how relevant that is. If X.hpp contains serialization-related code, it surely needs to include serialization header files, too, to drag in any relevant declarations. Thus there is a dependency from X to serialization. Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

Emil Dotchevski wrote:
If X is usable without serialization, users shouldn't be forced to also
link to the serialization library, just because of this dependency. (In the particular case I have in mind this dependency was controlled by a preprocessor macro. That's not very practical, since packagers surely won't provide two sets of packages for X, one with and one without this dependency.)
Thus, I'd suggest encapsulating the X-serialization functionality in a separate library (maybe header-only), such as X_serialization.hpp etc. Then I can still use X stand-alone, and drag in the rest whenever I need it.
This has been discussed before. You don't need X_serialization.hpp if you don't use BOOST_CLASS_EXPORT. See http://www.archivesat.com/Boost_developers/thread2900871.htm.
I'm not sure how relevant that is. If X.hpp contains serialization-related code, it surely needs to include serialization header files, too, to drag in any relevant declarations. Thus there is a dependency from X to serialization.
You need no declarations because the serialization is implemented as a function template:

class foo;

template<class Archive>
void serialize(Archive & ar, foo & x, const unsigned int version)
{
    ar & x.bar;
    ....
}

The above code compiles without including any boost serialization headers. I have not used boost serialization (I have my own serialization library), but as far as I can see the reason why foo.hpp that uses boost serialization needs to include serialization headers is to be able to use BOOST_CLASS_EXPORT. For me this is good enough reason not to use BOOST_CLASS_EXPORT. Emil Dotchevski
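Emil's point can be checked with a self-contained sketch: the serialize template below compiles without any serialization headers, and only its instantiation needs a concrete archive type. The TextOutArchive here is a hypothetical stand-in for a real Boost archive, not Boost.Serialization API:

```cpp
#include <cassert>
#include <sstream>

// The class and its non-intrusive serialize template, written without
// including any serialization headers. Archive is only a template
// parameter, so nothing serialization-related needs to be declared here.
struct foo { int bar; };

template<class Archive>
void serialize(Archive& ar, foo& x, const unsigned int /*version*/) {
    ar & x.bar;
}

// Hypothetical stand-in for an output archive; client code that
// actually serializes supplies a real archive type instead.
struct TextOutArchive {
    std::ostringstream os;
    TextOutArchive& operator&(int v) { os << v << ' '; return *this; }
};
```

The template is only instantiated when a client passes a concrete archive, so merely including foo's header creates no dependency on a serialization library.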

Emil Dotchevski wrote:
Emil Dotchevski wrote:
If X is usable without serialization, users shouldn't be forced to also
link to the serialization library, just because of this dependency. (In the particular case I have in mind this dependency was controlled by a preprocessor macro. That's not very practical, since packagers surely won't provide two sets of packages for X, one with and one without this dependency.)
Thus, I'd suggest encapsulating the X-serialization functionality in a separate library (maybe header-only), such as X_serialization.hpp etc. Then I can still use X stand-alone, and drag in the rest whenever I need it.
This has been discussed before. You don't need X_serialization.hpp if you don't use BOOST_CLASS_EXPORT. See http://www.archivesat.com/Boost_developers/thread2900871.htm.
I'm not sure how relevant that is. If X.hpp contains serialization-related code, it surely needs to include serialization header files, too, to drag in any relevant declarations. Thus there is a dependency from X to serialization.
You need no declarations because the serialization is implemented as a function template:
class foo;
template<class Archive>
void serialize(Archive & ar, foo & x, const unsigned int version)
{
    ar & x.bar;
    ....
}
The above code compiles without including any boost serialization headers.
I have not used boost serialization (I have my own serialization library), but as far as I can see the reason why foo.hpp that uses boost serialization needs to include serialization headers is to be able to use BOOST_CLASS_EXPORT.
If you serialize classes with base classes, you need to include base_object.hpp, and if you want your serialization to be compatible with XML archives, you have to include nvp.hpp, so it's not possible to completely avoid including serialization headers.
For me this is good enough reason not to use BOOST_CLASS_EXPORT.
If you're about to serialize derived classes via pointers to base classes, BOOST_CLASS_EXPORT is pretty much a must. - Volodya

Emil Dotchevski wrote:
Emil Dotchevski wrote:
If X is usable without serialization, users shouldn't be forced to also
link to the serialization library, just because of this dependency. (In the particular case I have in mind this dependency was controlled by a preprocessor macro. That's not very practical, since packagers surely won't provide two sets of packages for X, one with and one without this dependency.)
Thus, I'd suggest encapsulating the X-serialization functionality in a separate library (maybe header-only), such as X_serialization.hpp etc. Then I can still use X stand-alone, and drag in the rest whenever I need it.
This has been discussed before. You don't need X_serialization.hpp if you don't use BOOST_CLASS_EXPORT. See http://www.archivesat.com/Boost_developers/thread2900871.htm.
I'm not sure how relevant that is. If X.hpp contains serialization-related code, it surely needs to include serialization header files, too, to drag in any relevant declarations. Thus there is a dependency from X to serialization.
You need no declarations because the serialization is implemented as a function template:
class foo;
template<class Archive>
void serialize(Archive & ar, foo & x, const unsigned int version)
{
    ar & x.bar;
    ....
}
The above code compiles without including any boost serialization headers.
I have not used boost serialization (I have my own serialization library), but as far as I can see the reason why foo.hpp that uses boost serialization needs to include serialization headers is to be able to use BOOST_CLASS_EXPORT.
If you serialize classes with base classes, you need to include base_object.hpp, and if you want your serialization to be compatible with XML archives, you have to include nvp.hpp, so it's not possible to completely avoid including serialization headers.
For me this is good enough reason not to use BOOST_CLASS_EXPORT.
If you're about to serialize derived classes via pointers to base classes, BOOST_CLASS_EXPORT is pretty much a must.
BOOST_CLASS_EXPORT is designed to make the necessary registrations automatic, but in doing so it introduces physical coupling that is undesirable for anyone who includes a particular class' header but does not need to serialize objects of that class. I also think that having BOOST_CLASS_EXPORT deduce the persistent class name automatically inhibits maintainability, as any refactoring that changes class names would break all serialized data. I prefer having one place in my program where I register all classes with their persistent names, so I can make separate decisions when to rename classes and when to change persistent names. Emil Dotchevski
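Emil's preferred scheme, a single registration point with explicitly chosen persistent names, can be sketched without Boost as a toy factory registry. All class, function, and name strings here are hypothetical illustrations, not Boost.Serialization API:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct base { virtual ~base() = default; virtual int id() const = 0; };
struct circle : base { int id() const override { return 1; } };
struct square : base { int id() const override { return 2; } };

// One central registry mapping persistent name -> factory. Renaming a
// C++ class does not break archives, because the persistent name is
// chosen here, independently of the class name.
std::map<std::string, std::function<std::unique_ptr<base>()>>& registry() {
    static std::map<std::string, std::function<std::unique_ptr<base>()>> r;
    return r;
}

template<class T>
void register_class(const std::string& persistent_name) {
    registry()[persistent_name] = [] { return std::unique_ptr<base>(new T); };
}
```

Because all registrations live in one translation unit, renaming circle in the code is a one-line change there, and previously written archives keep deserializing under the old persistent name.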

Stefan,
Apologies from me, too, as now we are getting even further from the original topic.
I have seen at least one case where it was indeed X.hpp that introduced the dependency on serialization code. However, this generates another problem:
If X is usable without serialization, users shouldn't be forced to also link to the serialization library, just because of this dependency. (In the particular case I have in mind this dependency was controlled by a preprocessor macro. That's not very practical, since packagers surely won't provide two sets of packages for X, one with and one without this dependency.)
Thus, I'd suggest encapsulating the X-serialization functionality in a separate library (maybe header-only), such as X_serialization.hpp etc. Then I can still use X stand-alone, and drag in the rest whenever I need it.
I'm not sure if this is always possible.
PS: the library I'm thinking of is boost.wave, and there serialization was dragged in whenever BOOST_WAVE_SERIALIZATION is defined.
As far as Wave is concerned, the mentioned macro has no influence on the code generated for the library. You can always define BOOST_WAVE_SERIALIZATION to zero when compiling your code, even if the library was compiled with BOOST_WAVE_SERIALIZATION=1. The generated library has no dependency on Boost.Serialization. HTH Regards Hartmut

Hartmut Kaiser wrote:
Stefan,
Apologies from me, too, as now we are getting even further from the original topic.
I have seen at least one case where it was indeed X.hpp that introduced the dependency on serialization code. However, this generates another problem:
If X is usable without serialization, users shouldn't be forced to also link to the serialization library, just because of this dependency. (In the particular case I have in mind this dependency was controlled by a preprocessor macro. That's not very practical, since packagers surely won't provide two sets of packages for X, one with and one without this dependency.)
Thus, I'd suggest encapsulating the X-serialization functionality in a separate library (maybe header-only), such as X_serialization.hpp etc. Then I can still use X stand-alone, and drag in the rest whenever I need it.
I'm not sure if this is always possible.
This isn't a new debate, but I believe a separate user specified header is the correct solution. This is how date-time is done and others should be as well. As I recall prior discussions, not all libraries have been factored that way -- I think multi-index isn't -- Joaquin had some good reasons. Anyway, it's a bit more effort, but because of the serialization design it should always be possible to have external serialization functions. In general serialization requires a pile of extra includes that should be avoided if possible.
PS: the library I'm thinking of is boost.wave, and there serialization was dragged in whenever BOOST_WAVE_SERIALIZATION is defined.
As far as Wave is concerned, the mentioned macro has no influence on the code generated for the library. You can always define BOOST_WAVE_SERIALIZATION to zero when compiling your code, even if the library was compiled with BOOST_WAVE_SERIALIZATION=1. The generated library has no dependency on Boost.Serialization.
Seems to me that unless it's essential to library function, serialization support should always be turned off by default. Jeff

As far as Wave is concerned, the mentioned macro has no influence on the code generated for the library. You can always define BOOST_WAVE_SERIALIZATION to zero when compiling your code, even if the library was compiled with BOOST_WAVE_SERIALIZATION=1. The generated library has no dependency on Boost.Serialization.
Seems to me that unless it's essential to library function, serialization support should always be turned off by default.
The job of a load function is to leave the object being loaded in a well-defined state -- its invariants intact -- much like a constructor. Therefore, turning this functionality on or off doesn't make any more sense than turning a particular constructor on or off. You can define loading and saving without including any headers from boost serialization. Emil Dotchevski
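Emil's analogy between a load function and a constructor can be illustrated with a minimal sketch. The class, member names, and VecInArchive below are hypothetical stand-ins; the point is that load re-establishes the class invariant no matter what order the archive delivers:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat input archive standing in for a real one.
struct VecInArchive {
    std::vector<long long> buf;
    std::size_t pos = 0;
    template<class T>
    VecInArchive& operator&(T& v) { v = static_cast<T>(buf[pos++]); return *this; }
};

// A class whose invariant is "values_ is sorted ascending". Like a
// constructor, load must leave that invariant intact regardless of the
// archive's contents.
class sorted_ints {
    std::vector<int> values_;   // invariant: ascending order
public:
    void insert(int v) {
        values_.insert(std::upper_bound(values_.begin(), values_.end(), v), v);
    }
    const std::vector<int>& values() const { return values_; }

    template<class Archive>
    void load(Archive& ar, unsigned /*version*/) {
        std::size_t n = 0;
        ar & n;
        values_.clear();
        for (std::size_t i = 0; i < n; ++i) { int v = 0; ar & v; insert(v); }
    }
};
```

Loading goes through insert(), the same invariant-preserving path the public interface uses, so even an out-of-order archive yields a valid object.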

Jeff Garland wrote:
I'm not sure if this is always possible.
This isn't a new debate, but I believe a separate user specified header is the correct solution. This is how date-time is done and others should be as well. As I recall prior discussions, not all libraries have been factored that way -- I think multi-index isn't -- Joaquin had some good reasons. Anyway, it's a bit more effort, but because of the serialization design it should always be possible to have external serialization functions. In general serialization requires a pile of extra includes that should be avoided if possible.
The way it's implemented in Wave is that the library itself doesn't include any serialization headers and the serialization code for the different classes is disabled by default. If the user needs serialization support (which is strictly optional), he/she needs to include the serialization headers and to define BOOST_WAVE_SERIALIZATION=1 to enable the serialization code. This ensures no serialization dependency is generated by default.
PS: the library I'm thinking of is boost.wave, and there serialization was dragged in whenever BOOST_WAVE_SERIALIZATION is defined.
As far as Wave is concerned, the mentioned macro has no influence on the code generated for the library. You can always define BOOST_WAVE_SERIALIZATION to zero when compiling your code, even if the library was compiled with BOOST_WAVE_SERIALIZATION=1. The generated library has no dependency on Boost.Serialization.
Seems to me that unless it's essential to library function, serialization support should always be turned off by default.
In fact, it is disabled by default. Regards Hartmut
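The opt-in scheme Hartmut describes can be sketched as a macro guard around the serialization code. The macro and struct names below are hypothetical (Wave uses BOOST_WAVE_SERIALIZATION in the same role); the key property is that the guarded code defaults to off:

```cpp
#include <cassert>

// Opt-in pattern: serialization code is compiled out unless the user
// defines the switch to 1 before including this header.
#ifndef MY_LIB_SERIALIZATION
#define MY_LIB_SERIALIZATION 0   // disabled by default
#endif

struct token { int id; };

#if MY_LIB_SERIALIZATION != 0
// Only compiled when the user opts in; by then the user is expected to
// have included the serialization headers this code relies on.
template<class Archive>
void serialize(Archive& ar, token& t, unsigned /*version*/) { ar & t.id; }
#endif

// Compile-time view of the switch, handy for checking the default.
constexpr bool serialization_enabled = MY_LIB_SERIALIZATION != 0;
```

With the default setting, no serialization code exists in the translation unit at all, so no dependency on any serialization library can arise.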

One more thing. I just had a look at the Wave code. By default serialization support is switched off. So serialization dependencies are generated only if you explicitly specify BOOST_WAVE_SERIALIZATION=1. Regards Hartmut

One more thing.
I just had a look at the Wave code. By default serialization support is switched off. So serialization dependencies are generated only if you explicitly specify BOOST_WAVE_SERIALIZATION=1.
Is it possible to refactor the Wave code headers such that they do not invoke BOOST_CLASS_EXPORT, leaving this job to client code that does need to serialize Wave objects? In that case, would you still need to include serialization headers in the Wave code headers? Emil Dotchevski

Emil Dotchevski wrote:
One more thing.
I just had a look at the Wave code. By default serialization support is switched off. So serialization dependencies are generated only if you explicitly specify BOOST_WAVE_SERIALIZATION=1.
Is it possible to refactor the Wave code headers such that they do not invoke BOOST_CLASS_EXPORT, leaving this job to client code that does need to serialize Wave objects? In that case, would you still need to include serialization headers in the Wave code headers?
Wave code headers do not include serialization headers. Regards Hartmut

Is it possible to refactor the Wave code headers such that they do not invoke BOOST_CLASS_EXPORT, leaving this job to client code that does need to serialize Wave objects? In that case, would you still need to include serialization headers in the Wave code headers?
Wave code headers do not include serialization headers.
Regards Hartmut
So what does BOOST_WAVE_SERIALIZATION disable? A few function templates that wouldn't be instantiated unless the client code calls them? Emil Dotchevski

Emil Dotchevski wrote:
Is it possible to refactor the Wave code headers such that they do not invoke BOOST_CLASS_EXPORT, leaving this job to client code that does need to serialize Wave objects? In that case, would you still need to include serialization headers in the Wave code headers?
Wave code headers do not include serialization headers.
Regards Hartmut
So what does BOOST_WAVE_SERIALIZATION disable? A few function templates that wouldn't be instantiated unless the client code calls them?
Yes, mainly. Additionally it disables supporting code, such as versioning etc., i.e. code not needed without serialization support. Regards Hartmut

Hartmut Kaiser wrote:
Wave code headers do not include serialization headers.
http://boost.cvs.sourceforge.net/boost/boost/boost/wave/cpp_context.hpp?revision=1.35&view=markup conditionally includes boost/serialization/serialization.hpp Yes, I understand that this include directive is only seen if BOOST_WAVE_SERIALIZATION is defined to something other than 0. But still, I very much doubt packagers would package boost.wave without accounting for that dependency on boost.serialization. Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

Stefan Seefeld wrote:
Wave code headers do not include serialization headers.
http://boost.cvs.sourceforge.net/boost/boost/boost/wave/cpp_context.hpp?revision=1.35&view=markup
conditionally includes boost/serialization/serialization.hpp
Yes, I understand that this include directive is only seen if BOOST_WAVE_SERIALIZATION is defined to something other than 0. But still, I very much doubt packagers would package boost.wave without accounting for that dependency on boost.serialization.
Even if BOOST_WAVE_SERIALIZATION is defined to != 0 during compilation of the library (the *.cpp files) there will be no dependency on Boost.Serialization at runtime as long as the user doesn't define this constant during the compilation of his application. But I'll have a look and try to factor out the direct dependencies on serialization headers into separate headers. Regards Hartmut

Stefan Seefeld wrote:
If X is usable without serialization, users shouldn't be forced to also link to the serialization library,
I 100% agree with this.
Thus, I'd suggest encapsulating the X-serialization functionality in a separate library (maybe header-only), such as X_serialization.hpp etc. Then I can still use X stand-alone, and drag in the rest whenever I need it.
and 100% with this as well. Robert Ramey

Peter Dimov wrote:
Sorry for not following the thread properly, just a quick note:
(e.g. serialization support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).
This is not right. Serialization support for class X should be provided by X.hpp and it must be possible to do this without depending on any other header.
I don't agree, except in the rare cases where serialization is one of the core responsibilities of the class. The one thing that I realize is often unavoidable is providing the serialization mechanism access to a class's private state and I consider this rather unfortunate. Cheers, Nicola Musatti

Nicola Musatti wrote:
Peter Dimov wrote:
Sorry for not following the thread properly, just a quick note:
(e.g. serialization support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).
This is not right. Serialization support for class X should be provided by X.hpp and it must be possible to do this without depending on any other header.
I don't agree, except in the rare cases where serialization is one of the core responsibilities of the class. The one thing that I realize is often unavoidable is providing the serialization mechanism access to a class's private state and I consider this rather unfortunate.
If your serialization methods
- allow creation of objects whose invariants do not hold, or
- expose implementation details in the external representation,
they are broken.

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Peter Dimov
If your serialization methods
- allow creation of objects whose invariants do not hold, or - expose implementation details in the external representation,
What do you mean by the second line? Sohail

Sohail Somani wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Peter Dimov
If your serialization methods
- allow creation of objects whose invariants do not hold, or - expose implementation details in the external representation,
What do you mean by the second line?
Consider for example a set<int>. The proper way to serialize it is as a sequence of values. This is what the user sees, and this external format is robust against changes in the implementation. If you serialize it as a tree of Node classes, this would make it quite hard to switch to a skip list representation later.
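Peter's set<int> example can be made concrete: save the user-visible sequence of values, never the node structure. The flat archives below are hypothetical stand-ins for real ones; the external format is just the size followed by the elements in order:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical flat archives standing in for real ones.
struct OutArchive {
    std::vector<long long> buf;
    template<class T>
    OutArchive& operator&(const T& v) {
        buf.push_back(static_cast<long long>(v));
        return *this;
    }
};
struct InArchive {
    std::vector<long long> buf;
    std::size_t pos = 0;
    template<class T>
    InArchive& operator&(T& v) { v = static_cast<T>(buf[pos++]); return *this; }
};

// External representation: size, then the elements in sorted order.
// This is what the user of set<int> can observe, so it survives any
// change of the container's internal node layout (tree, skip list, ...).
template<class Archive>
void save(Archive& ar, const std::set<int>& s) {
    std::size_t n = s.size();
    ar & n;
    for (int v : s) ar & v;
}
template<class Archive>
void load(Archive& ar, std::set<int>& s) {
    std::size_t n = 0;
    ar & n;
    s.clear();
    for (std::size_t i = 0; i < n; ++i) { int v = 0; ar & v; s.insert(v); }
}
```

A later switch from a red-black tree to a skip list changes neither save nor load, because neither function mentions nodes.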

----- Original Message ----- From: Peter Dimov <pdimov@mmltd.net> Date: Saturday, May 5, 2007 12:17 pm Subject: Re: [boost] Serialization support, Was: [BoostCon07][testing] Session reminder. To: boost@lists.boost.org
Sohail Somani wrote:
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Peter Dimov
If your serialization methods
- allow creation of objects whose invariants do not hold, or - expose implementation details in the external representation,
What do you mean by the second line?
Consider for example a set<int>. The proper way to serialize it is as a sequence of values. This is what the user sees, and this external format is robust against changes in the implementation. If you serialize it as a tree of Node classes, this would make it quite hard to switch to a skip list representation later.
I'd like to join this intrusive vs. non-intrusive serialization discussion from my experience in providing serialization support for Boost.MultiIndex. As I see things, there are two related but different ways in which serialization support can be said to be intrusive:

a) data intrusive: the stuff serialized reflects the internal structure of the class. The set<int> example proposed by Peter above illustrates a case of (unwise) data intrusive serialization.

b) interface intrusive: the serialization algorithms cannot be implemented by using the class public interface alone. For instance, up to Boost 1.33 (if my memory serves me well), serialization of shared_ptrs was provided in an interface intrusive way because no non-intrusive approach was found --this has fortunately been corrected now.

Usually, data intrusive serialization implies interface intrusive serialization, but *not* the other way around: Boost.MultiIndex serialization support is indeed interface intrusive but not data intrusive; let me explain why I had to do things this way.

No one doubts non-intrusive serialization is the preferred approach, when feasible. Now consider the process of serializing a multi_index_container: we naturally want deserialization to reconstruct the order in which the elements were arranged in *every* index of the container (hashed indices excluded from this guarantee, as relying on the arbitrary order they provide is unsound to begin with). If we follow the natural approach of saving the sequence of values as traversed by one of the indices (let's say index #0), this guarantee does not hold. Let's see the problems encountered with each index type:

* Ordered indices: no problems with *unique* ordered indices, since the order there is strictly determined by the values contained. But what about non-unique ordered indices? Consider this container:

multi_index_container<
  int,
  indexed_by<
    ordered_non_unique<int_modulo_3_extractor>,
    ordered_non_unique<int_modulo_3_extractor>
  >
>
where the elements are deemed equivalent if they have the same value modulo 3. Now suppose we have the values 0, 3 and 6 in the container, and index #0 lists them in the following order:

0 3 6

What can we say from this info about the traversal when done through index #1? The answer is: absolutely nothing; any permutation of these elements could be validly exposed by index #1. These variations from index #0 result from the fact that the values could have been inserted with hinted insert() through either of the two indices. The final state is a convoluted function of the insertion history.

* Sequenced and random-access indices: here it is even clearer that the order maintained by these indices cannot be inferred from those of other indices, since after insertion elements can be freely relocated.

So, in order to provide our desired level of functionality (perfect reconstruction of every index traversal order) we cannot just save the elements as traversed by index #0; we must somehow codify the variations of every other index with respect to the traversal order implied by index #0, when these variations are not unique --this is indeed what's done, in an efficient way, using LIS (longest increasing subsequence) algorithms. So far, this approach is not data intrusive, since the information stored does not reveal any particular implementation detail and reflects only the nature of traversal orders allowable by index semantics. Now the remaining question is: can we use this info to reconstruct the traversal orders in a non interface intrusive way, i.e. by resorting only to the public interface of each index type?
* Ordered indices: No; once an element is inserted into a multi-index container, there is no way (with public interface methods) of changing its order with respect to other equivalent elements in a non-unique ordered index, short of extracting and reinserting the element again, which, besides being terribly inefficient, destroys the relative element position gotten so far in other indices, in a sort of catch-22 situation.

* Sequenced indices: Yes. splice() (or relocate()) facilities can be used to alter the traversal order of a sequenced index without touching the rest of the indices.

* Random-access indices: Yes, with problems. splice() member functions are available here as for sequenced indices, but relocating elements around is an O(n) operation, implying O(n*n) complexity for the task of reconstructing the entire index. An interface intrusive scheme can do this in linear time.

So, I had no other choice but to implement serialization support for Boost.MultiIndex in an interface intrusive way. The moral of the story is: for rich-state classes where the exact state of an object depends heavily on its past history, non-intrusive serialization can be either algorithmically unfeasible (it is hard or impossible to reconstruct the history from the current state) or potentially less efficient than an interface intrusive approach. This does not mean that the class interface or the serialization support implementation are "broken". Just my 2c, sorry about the long post. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

"JOAQUIN LOPEZ MUÑOZ" wrote: [...]
a) data intrusive: the stuff serialized reflects the internal structure of the class. The set<int> example proposed by Peter above illustrates a case of (unwise) data intrusive serialization. b) interface intrusive: the serialization algorithms cannot be implemented by using the class public interface alone. For instance, up to Boost 1.33 (if my memory serves me well), serialization of shared_ptrs was provided in an interface intrusive way because no non-intrusive approach was found --this has fortunately been corrected now.
There is no such thing as "interface intrusive" serialization if you consider the serialization support a part of the documented interface of the class. I admit that this requires additional investment and is rarely done, but it's the only way to do it right. An opaque external representation (or serialization algorithm) always leads to problems in the long run. (shared_ptr's 1.32 serialization was "data intrusive".) [...]
So, I had no other choice but to implement serialization support for Boost.MultiIndex in an interface intrusive way. The moral of the story is: for rich-state classes where the exact state of an object depends heavily on its past history, non-intrusive serialization can be either algorithmically unfeasible (it is hard or impossible to reconstruct the history from the current state) or potentially less efficient than an interface intrusive approach. This does not mean that the class interface or the serialization support implementation are "broken".
If the ability to reconstruct a MultiIndex container in a particular exact state is important, there should exist a documented way to do that. Deserialization is not a special case. (This doesn't mean that the deserialization support cannot be this documented way, of course.)

----- Original Message ----- From: Peter Dimov <pdimov@mmltd.net> Date: Saturday, May 5, 2007 3:40 pm Subject: Re: [boost] Serialization support, Was: [BoostCon07][testing] Session reminder. To: boost@lists.boost.org
"JOAQUIN LOPEZ MUÑOZ" wrote:
[...]
a) data intrusive: the stuff serialized reflects the internal structure of the class. The set<int> example proposed by Peter above illustrates a case of (unwise) data intrusive serialization. b) interface intrusive: the serialization algorithms cannot be implemented by using the class public interface alone. For instance, up to Boost 1.33 (if my memory serves me well), serialization of shared_ptrs was provided in an interface intrusive way because no non-intrusive approach was found --this has fortunately been corrected now.
There is no such thing as "interface intrusive" serialization if you consider the serialization support a part of the documented interface of the class. I admit that this requires additional investment and is rarely done, but it's the only way to do it right.
I'm not getting you: of course if you provide serialization support for a given class T, this support is usually documented and thus becomes part of T's interface; what additional investment are you referring to? What more do I have to document except saying "T is serializable through Boost.Serialization"?
An opaque external representation (or serialization algorithm) always leads to problems in the long run.
What kind of problems? In which sense is treating the external representation as an opaque entity different from treating the *internal* representation as an implementation detail?
(shared_ptr's 1.32 serialization was "data intrusive".)
I stand corrected.
[...]
So, I had no other choice but to implement serialization support for Boost.MultiIndex in an interface intrusive way. The moral of the story is: for rich-state classes where the exact state of an object depends heavily on its past history, non-intrusive serialization can be either algorithmically unfeasible (it is hard or impossible to reconstruct the history from the current state) or potentially less efficient than an interface intrusive approach. This does not mean that the class interface or the serialization support implementation are "broken".
If the ability to reconstruct a MultiIndex container in a particular exact state is important, there should exist a documented way to do that. Deserialization is not a special case.
On the contrary, I think deserialization is synonymous with "reconstructing an object in a particular exact state". The only specificity about this is that one chooses Boost.Serialization as the reconstructing API instead of something else. But this is as good a choice as any other API (or better, if Boost.Serialization gains momentum as the de facto standard serialization interface, which would be great). So, I think I basically agree with what you say next:
(This doesn't mean that the deserialization support cannot be this documented way, of course.)
Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

"JOAQUIN LOPEZ MUÑOZ" wrote:
From: Peter Dimov <pdimov@mmltd.net>
"JOAQUIN LOPEZ MUÑOZ" wrote:
[...]
a) data intrusive: the stuff serialized reflects the internal structure of the class. The set<int> example proposed by Peter above illustrates a case of (unwise) data intrusive serialization. b) interface intrusive: the serialization algorithms cannot be implemented by using the class public interface alone. For instance, up to Boost 1.33 (if my memory serves me well), serialization of shared_ptrs was provided in an interface intrusive way because no non-intrusive approach was found --this has fortunately been corrected now.
There is no such thing as "interface intrusive" serialization if you consider the serialization support a part of the documented interface of the class. I admit that this requires additional investment and is rarely done, but it's the only way to do it right.
I'm not getting you: of course if you provide serialization support for a given class T, this support is usually documented and thus becomes part of T's interface; what additional investment are you referring to? What more do I have to document except saying "T is serializable through Boost.Serialization"?
Well... here's what I would document given a simple class:

template<class A> void serialize( A & a, unsigned );

Effects: a & x; a & y;

where x and y are the values returned by x() and y(), respectively. "T is serializable through Boost.Serialization" doesn't tell me how to manipulate the result. I can neither construct a suitable input file nor read the output.
An opaque external representation (or serialization algorithm) always leads to problems in the long run.
What kind of problems?
Those Microsoft is having with its file formats. ;-)
In which sense is treating the external representation as an opaque entity different from treating the *internal* representation as an implementation detail?
The external representation is observable and persistent. It can outlive both Boost.Serialization and Boost.MultiIndex (or their specific implementations) and it should be possible for an alternate implementation of the MultiIndex interface to read/write the same format. If you use serialization to send a Boost.MI object over the wire to another machine, it should be possible for the non-C++ program on the other end to reconstruct the data structure. Think about how people would feel when they serialize a std::multiindex with implementation A and can't read it with implementation B, as will happen if the requirement for an implementation is just to be able to read back what it created, not create a particular documented external representation.

Hi Peter, ----- Original Message ----- From: Peter Dimov <pdimov@mmltd.net> Date: Saturday, May 5, 2007 4:47 pm Subject: Re: [boost] Serialization support To: boost@lists.boost.org
"JOAQUIN LOPEZ MUÑOZ" wrote: [...]
I'm not getting you: of course if you provide serialization support for a given class T, this support is usually documented and thus becomes part of T's interface; what additional investment are you referring to? What more do I have to document except saying "T is serializable through Boost.Serialization"?
Well... here's what I would document given a simple class:
template<class A> void serialize( A & a, unsigned );
Effects: a & x; a & y; where x and y are the values returned by x() and y(), respectively.
"T is serializable through Boost.Serialization" doesn't tell me how to manipulate the result. I can neither construct a suitable input file nor read the output. [...] The external representation is observable and persistent. It can outlive both Boost.Serialization and Boost.MultiIndex (or their specific implementations) and it should be possible for an alternate implementation of the MultiIndex interface to read/write the same format. If you use serialization to send a Boost.MI object over the wire to another machine, it should be possible for the non-C++ program on the other end to reconstruct the data structure.
Think about how people would feel when they serialize a std::multiindex with implementation A and can't read it with implementation B, as will happen if the requirement for an implementation is just to be able to read back what it created, not create a particular documented external representation.
Ok now I understand, and basically share, your concerns about the lack of documentation on what goes into the serialization stream. Of course I can document my part in the case of B.MI, but for this to constitute a real cross-platform, cross-implementation spec one would additionally need that: 1. Other serializable types (serialization primitive and otherwise) do the same and document their way of stuffing themselves down the wire. 2. Boost.Serialization exposes its internal mechanisms for serialization --I'm thinking here of things like versioning and pointer tracking-- at least to the extent that they reflect on what gets actually saved to the stream. This initiative is interesting (and would be almost mandatory if B.S was ever proposed for the standard) but its scope is community-wide and beyond the realm of an isolated class' definition IMHO. What does Robert think about this? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

JOAQUIN LOPEZ MUÑOZ wrote:
Ok now I understand, and basically share, your concerns about the lack of documentation on what goes into the serialization stream.
I'm not sure you see the full scope of what Peter is raising, or perhaps I'm reading more into it :-) It is not enough to document the serialization format, nor the serialization procedure, as that leaves out non-Boost Serialization implementations. I happen to have one of those non-Boost implementations, and it's implemented non-intrusively. In thinking about the scope of designing a serializable class, I consider the case of writing an external copy algorithm. If I can't write a copy function that given one instance will create an equivalent instance, then the class isn't usefully serializable. This aspect isn't important just for serialization, but also for general algorithm design. If you can't get at a consistent, reproducible state of an instance, it limits the number of algorithms you can apply to it. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

----- Original Message ----- From: Rene Rivera <grafikrobot@gmail.com> Date: Saturday, May 5, 2007 7:05 pm Subject: Re: [boost] Serialization support To: boost@lists.boost.org
JOAQUIN LOPEZ MUÑOZ wrote:
Ok now I understand, and basically share, your concerns about the lack of documentation on what goes into the serialization stream.
I'm not sure you see the full scope of what Peter is raising, or perhaps I'm reading more into it :-) It is not enough to document the serialization format, nor the serialization procedure, as that leaves out non-Boost Serialization implementations.
Well the idea is, you can leverage the B.S interface to add your non-Boost serialization support. At its crudest, you can do the following:

void non_boost_save(const T& t, non_boost_archive& a)
{
    std::ostringstream oss;
    {
        boost::archive::text_oarchive oa(oss);
        oa << t;
    }
    a.save(oss.str());
}

Something more sensible could be done by defining your own utility B.S Archive class, you get the idea.
I happen to have one of those non-Boost implementations, and it's implemented non-intrusively. In thinking about the scope of designing a serializable class, I consider the case of writing an external copy algorithm. If I can't write a copy function that given one instance will create an equivalent instance, then the class isn't usefully serializable.
Here you lost me. I know you're not referring to the following, but from your description it looks like you're asking for a function T create_copy(const T& t) { return t; } which of course is readily available whenever T is copy-constructible. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

JOAQUIN LOPEZ MUÑOZ wrote:
From: Rene Rivera
JOAQUIN LOPEZ MUÑOZ wrote:
Ok now I understand, and basically share, your concerns about the lack of documentation on what goes into the serialization stream. I'm not sure you see the full scope of what Peter is raising, or perhaps I'm reading more into it :-) It is not enough to document the serialization format, nor the serialization procedure, as that leaves out non-Boost Serialization implementations.
Well the idea is, you can leverage the B.S interface to add your non-Boost serialization support. At its crudest, you can do the following:
void non_boost_save(const T& t,non_boost_archive& a) { std::ostringstream oss; { boost::archive::text_oarchive oa(oss); oa<<t; } a.save(oss.str()); }
Something more sensible could be done by defining your own utility B.S Archive class, you get the idea.
Yes, I'm familiar with the idea. Your example of course is still using B-S. The point of my use case is that my idea of serialization may be very different from both the B-S idea, and the B-MI idea of serialization. For example I may be willing to lose some data, or I may want to serialize as a different structural representation.
I happen to have one of those non-Boost implementations, and it's implemented non-intrusively. In thinking about the scope of designing a serializable class, I consider the case of writing an external copy algorithm. If I can't write a copy function that given one instance will create an equivalent instance, then the class isn't usefully serializable.
Here you lost me. I know you're not referring to the following, but from your description it looks like you're asking for a function
T create_copy(const T& t) { return t; }
which of course is readily available whenever T is copy-constructible.
Not what I was referring to at all... As that just forwards the copy to the internal implementation ;-) What I was referring to is an "equivalent copy", where equivalent is use case dependent. For example, AFAICR, all std containers have the property that they can be reinterpreted as a different container (within the limits of impedance between the containers) by only using the public interfaces. The kind of copy I'm talking about is minimally seen with:

tuple< int, shared_array<int> > create_copy(vector<int> const & v)
{
    tuple< int, shared_array<int> > result(v.size(), shared_array<int>(new int[v.size()]));
    std::copy(v.begin(), v.end(), result.get<1>().get());
    return result;
}

The result of that function is an equivalent container to the vector passed. It has the same information, in the same order, but not with the same interface. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

"JOAQUIN LOPEZ MUÑOZ" wrote:
Ok now I understand, and basically share, your concerns about the lack of documentation on what goes into the serialization stream. Of course I can document my part in the case of B.MI, but for this to constitute a real cross-platform, cross- implementation spec one would additionally need that:
1. Other serializable types (serialization primitive and otherwise) do the same and document their way of stuffing themselves down the wire. 2. Boost.Serialization exposes its internal mechanism for serialization --I'm thinking here of things like versioning and pointer tracking-- at least to the extent that they reflect on what gets actually saved to the stream.
This initiative is interesting (and would be almost mandatory if B.S was ever proposed to the standard) but its scope is community-wide and beyond the realm of an isolated class' definition IMHO. What does Robert think about this?
My stated goal was to permit the serialization and de-serialization of any group of C++ data structures in the most expedient way. I wanted the system to be simple to use, complete and efficient. To this end, I made a determined effort to separate the external aspects in the "archives" and the description of the class attributes related to serialization in "serialization". I refrained from an a-priori definition and description of the external format for a couple of reasons: a) it was too hard - it would require a huge amount of work and foresight. b) I believed - and still believe - that it conflicted with my stated goals above. c) I didn't think that the investment of time would be worth it in bringing the package "to market". After some experience in seeing how the package is being used and how hard/easy it is to use, I don't regret my decisions. If one wants to add an externally defined language independent format to the above goals, I think one will be doomed to failure. Of course I could be wrong and anyone is free to take a crack at it. Lots of people have. I'm not sure how all the other systems out there compare to boost serialization these days. So I don't see an externally documented format for this boost serialization. Hence I don't see anything like boost serialization ever appearing in the standard. Perhaps some system which might be functionally similar, but I think it would have to be grown from scratch with a different set of goals and priorities. Which is the reason that I think the whole concept of library standards has been over-applied and is even detrimental to the future success of C++. Robert Ramey

Robert Ramey wrote:
If one wants to add an externally defined language independent format to the above goals, I think one will be doomed to failure. Of course I could be wrong and anyone is free to take a crack at it. Lots of people have. I'm not sure how all the other systems out there compare to boost serialization these days.
So I don't see an externally documented format for this boost serialization. Hence I don't see anything like boost serialization ever appearing in the standard. Perhaps some system which might be functionally similar, but I think it would have to be grown from scratch with a different set of goals and priorities.
I think you're being too pessimistic -- standards that could work have been developed multiple times and places. No doubt, though, this is too much for a single person and I agree with you that boost.serialization might never have been completed. Off the top of my head I think the following would be good, well-documented options:

- CDR for truly portable binary data http://www.omg.org/cgi-bin/doc?formal/02-06-51
- JSON text format - http://www.json.org/
- YAML text format - http://www.yaml.org/

So, if someone would spend a couple of days writing new archives we'd be in business. BTW, I have seen the lack of a documented format become a reason to not use Boost serialization on a project.
Which is the reason that I think the whole concept of library standards has been over-applied and is even detrimental to the future success of C++.
This too I believe is wrong. Every human system of significance rests on standards. You and I couldn't be conversing now if we didn't have a pile of IEEE standards, posix standards, W3C standards, and yes, programming standards. For C++, there is a real effect of having something in the standard -- companies that refuse to use Boost will insist on the use of ISO standard C++. Jeff

Jeff Garland wrote:
BTW, I have seen the lack of a documented format become a reason to not use Boost serialization on a project.
No doubt this has happened - and for good reason. If such a format is required, the "right choice" is probably not boost serialization. Boost Serialization has (in my view) lots of appeal - and it's applicable to a very wide class of problems. But it can't be all things to all people (and all host languages). Sorry, but I think a different tool developed with a different set of priority goals is required for that.
Which is the reason that I think the whole concept of library standards has been over-applied and is even detrimental to the future success of C++.
This too I believe is wrong. Every human system of significance rests on standards. You and I couldn't be conversing now if we didn't have a pile of IEEE standards, posix standards, W3C standards, and yes, programming standards. For C++, there is a real effect of having something in the standard -- companies that refuse to use Boost will insist on the use of ISO standard C++.
I know my view on this subject isn't widely held. We've touched upon it before and maybe we will again - but for now we're frying other fish. Robert Ramey

I'd like to add my support for separating out serialization headers. I just recently decided that I'd try out ptr_containers. When I go to include ptr_vector.hpp I suddenly have dependencies on serialization headers when I care nothing about serialization at the moment. There doesn't even appear to be a macro to disable these dependencies so I'm going to have to hack at the sources to remove these dependencies. Seems rather silly. Thanks, Michael Marcin

Peter Dimov wrote: If you use
serialization to send a Boost.MI object over the wire to another machine, it should be possible for the non-C++ program on the other end to reconstruct the data structure.
Not every language can support all the data structures that C++ supports. Robert Ramey

JOAQUIN LOPEZ MUÑOZ wrote:
So, I had no other choice but to implement serialization support for Boost.MultiIndex in an interface intrusive way. The moral of the story is: for rich-state classes where the exact state of an object depends heavily on its past history, non-intrusive serialization can be either algorithmically unfeasible (it is hard or impossible to reconstruct the history from the current state) or potentially less efficient than an interface intrusive approach. This does not mean that the class interface or the serialization support implementation are "broken".
Thx for explaining this. I'd like to go back to the header inclusion dependency issue. The need for internal access isn't a reason why the serialization headers need to be 'included by default'. A friend class/function in a separate serialization header could implement the serialization using the internal mechanisms that are otherwise an implementation detail. Jeff

----- Original Message ----- From: Jeff Garland <jeff@crystalclearsoftware.com> Date: Saturday, May 5, 2007 5:14 pm Subject: Re: [boost] Serialization support, Was: [BoostCon07][testing] Session reminder. To: boost@lists.boost.org
JOAQUIN LOPEZ MUÑOZ wrote:
So, I had no other choice but to implement serialization support for Boost.MultiIndex in an interface intrusive way. [...] Thx for explaining this. I'd like to go back to the header inclusion dependency issue. The need for internal access isn't a reason why the serialization headers need to be 'included by default'. A friend class/function in a separate serialization header could implement the serialization using the internal mechanisms that are otherwise an implementation detail.
Hello Jeff, Yep, the need for intrusive serialization support does not preclude its relocation to separate serialization headers. In the discussion we had two months ago I gave my rationales for not using separate headers, let me quote that for the readers' convenience:

<quote> Well, I thought about this kind of problems when designing the serialization support of B.MI, and what you've got is the best I came up with. There is a rationale for not having boost/multi_index/serialization/*_index.hpp headers, let me explain: When you serialize a multi_index_container comprised of N indices, every index gets involved in the serialization process; so, if you have something like:

typedef multi_index_container<
  element,
  indexed_by<
    ordered_unique<...>,
    hashed_non_unique<...>,
    sequenced<...>
  >
> mic_t;

and want to serialize objects of type mic_t, you'd have (according to the serialization header model) to include the following:

#include <boost/multi_index/serialization.hpp>
#include <boost/multi_index/serialization/ordered_index.hpp>
#include <boost/multi_index/serialization/hashed_index.hpp>
#include <boost/multi_index/serialization/sequenced_index.hpp>

Which looks excessively cumbersome to me. Failing to include one of the headers won't result in less-capable serialization support, only in a compile-time error when trying to stream mic_t's. A twist on this could be: why not embed all the serialization support in one centralized header?

#include <boost/multi_index/serialization.hpp>

This is convenient from the user's point of view, but doesn't scale up well internally, because adding a new type of index would cause this header to grow regardless of whether the new type of index is used or not. As things stand now, index types are entirely orthogonal with each other in terms of code base, which is good. So, my decision was to embed serialization support for each index (and the multi_index_container wrapper itself) directly in their corresponding headers, and I left the disabling macro just in case. Faced with the dilemma of whether this support should be on or off by default, I opted for "on" because, thanks to the autolinking feature, when the serialization capabilities are not invoked the user does not notice anything, perhaps some theoretical slowdown in compile times. </quote>

What's your stance on this? If there's some agreement that the actual approach should be changed I can of course do it --it'd only raise small backwards-compatibility glitches. BTW, if the separate header is agreed on as the preferred mechanism for bringing in serialization capabilities, I think it'd be great to reach a consensus on the names of those convenience headers; there are various options:

1. boost/lib/T_serialization.hpp
2. boost/lib/T_serialize.hpp (date_time approach)
3. boost/lib/serialization/T.hpp (consistent with B.S approach)

I like 3 best because _serialization or _serialize suffixes tend to give long file names, and we've got the 31 char limit rule still in effect. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

JOAQUIN LOPEZ MUÑOZ wrote:
Hello Jeff,
Yep, the need for intrusive serialization support does not preclude its relocation to separate serialization headers. In the discussion we had two months ago I gave my rationales for not using separate headers, let me quote that for the readers' convenience:
Thx...seemed like it was longer ;-)
<quote> ...snip... serialization header model) to include the following:
#include <boost/multi_index/serialization.hpp> #include <boost/multi_index/serialization/ordered_index.hpp> #include <boost/multi_index/serialization/hashed_index.hpp> #include <boost/multi_index/serialization/sequenced_index.hpp>
Which looks excessively cumbersome to me. Failing to include
I agree, but it could be explained for the header minimalists in a couple paragraphs.
A twist on this could be: why not embed all the serialization support in one centralized header?
#include <boost/multi_index/serialization.hpp>
This is convenient from the user's point of view, but doesn't scale up well internally, because adding a new type of index would cause this header to grow regardless of whether the new type of index is used or not. As things stand now, index types are entirely orthogonal with each other in terms of code base, which is good.
Here's where I disagree with your decision. I think the 'scale-up' is a non-issue. Most of the time the extra headers won't matter, so the combined header is the easy way for users. But really, you could easily offer both options if you wanted.
So, my decision was to embed serialization support for each index (and the multi_index_container wrapper itself) directly in their corresponding headers, and I left the disabling macro just in case. Faced with the dilemma of whether this support should be on or off by default, I opted for "on" because, thanks to the autolinking feature, when the serialization capabilities are not invoked the user does not notice anything, perhaps some theoretical slowdown in compile times.
What's your stance on this? If there's some agreement that the actual approach should be changed I can of course do it --it'd only raise small backwards-compatibility glitches.
Here's another use case. If I try to subset Boost with multi-index only, I either need to know about the macro or carry along the serialization headers and whatever they include. I'd like to have a smarter version of bcp where I can say something like: bcp --no-serialization lib1 lib2 and it would make me a subset that ignored the serialization dependency.
BTW, if the separate header is agreed on as the preferred mechanism for bringing in serialization capabilities, I think it'd be great to reach a consensus on the names of those convenience headers; there are various options:
1. boost/lib/T_serialization.hpp
2. boost/lib/T_serialize.hpp (date_time approach)
3. boost/lib/serialization/T.hpp (consistent with B.S approach)
Fair enough, but I guess I don't like number 3 much as it interferes with other subdirectory structure that the library might have. So I'd have to have boost/date_time/gregorian/serialization/ -- I guess I could live with it as long as I could have a date_time/serializ(e)(tion).hpp that wrapped up all of the library. It would sure make that bcp thing easy.
I like 3 best because _serialization or _serialize suffixes tend to give long file names, and we've got the 31 char limit rule still in effect.
True, but don't we have a 512 char path too ;-) Anyway, I'd be willing to switch if people prefer 1 or 3. Overall, I think Boost is big enough and mature enough that we should settle some of these things for the goal of overall Boost usability. Even things like the lack of an 'all in one' header throws users off track....which is a shame. Jeff

Peter Dimov wrote: [...]
If your serialization methods
- allow creation of objects whose invariants do not hold, or
Of course. This is analogous to writing a buggy copy constructor.
- expose implementation details in the external representation,
This is more tricky: you may want to serialize an object which is in a state that in normal circumstances is not the direct outcome of construction, but rather is the result of some activity. How can you manage that? Either you provide a special purpose constructor or you open up the object's internal state. I would say that while a "serializable" class should be independent of the serialization machinery, the provision of a specific serialization interface must be explicitly considered. This interface is indeed part of the class. Cheers, Nicola Musatti

Nicola Musatti wrote:
Peter Dimov wrote: [...]
If your serialization methods
- allow creation of objects whose invariants do not hold, or
Of course. This is analogous to writing a buggy copy constructor.
- expose implementation details in the external representation,
This is more tricky: you may want to serialize an object which is in a state that in normal circumstances is not the direct outcome of construction, but rather is the result of some activity. How can you manage that? Either you provide a special purpose constructor or you open up the object's internal state.
You store information that allows you to reconstruct the state. There is no need to open the object's internal state, and it is not entirely clear what you mean by that.

Peter Dimov wrote:
Nicola Musatti wrote:
Peter Dimov wrote: [...]
If your serialization methods
- allow creation of objects whose invariants do not hold, or Of course. This is analogous to writing a buggy copy constructor.
- expose implementation details in the external representation, This is more tricky: you may want to serialize an object which is in a state that in normal circumstances is not the direct outcome of construction, but rather is the result of some activity. How can you manage that? Either you provide a special purpose constructor or you open up the object's internal state.
You store information that allows you to reconstruct the state. There is no need to open the object's internal state, and it is not entirely clear what you mean by that.
I don't think we're saying very different things; in order to make use of the information you mention above it is likely that your class will need specific member functions, that are not strictly required to fulfill the class's primary responsibility. I believe this is what Joaquin refers to as "interface intrusive". Elsewhere you argue for an external representation that captures a class's conceptual behaviour without exposing the details of a specific implementation. While I'd agree that this is a sensible objective for a general purpose library, and even more for a standardized serialization approach, in application development it might be overkill, and sometimes just dumping a subset of a class's data members may prove more effective. Cheers, Nicola Musatti

"Nicola Musatti" <Nicola.Musatti@gmail.com> wrote in message news:f1frd3$jm8$1@sea.gmane.org...
Gennadiy Rozental wrote: [...]
I believe there are two problems in Gennadiy's proposal: the granularity is too fine
It's a natural separation by independent libraries
I might agree if they really were independent, but in many cases they are not. On the other hand there are a number of rather large libraries that have fewer dependent ones.
Sorry. I was a bit unclear. By independent I mean "independently developed".
and the constraint of releasing Boost in a single whole is going to make things unnecessarily hard.
Which constraint?
I'm under the impression that in your scheme you expect to be able to assemble a complete Boost release by choosing the appropriate releases of all the component libraries.
Yes. So what is the problem?
A better approach would be to separate from core Boost some of the larger libraries once and for all. I'm thinking of Serialization, Spirit, Python, possibly a few others. In most cases other libraries should not depend on these (i.e. Preprocessor is not a good candidate). Where dependencies exist they should either be removed or moved to reside within the separate library (e.g. serialization support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).
These libraries should be developed, tested and released separately, against the most recent release of core. It will be up to each library maintainers' team to decide whether to "port" one or more released versions of their library to new releases of Boost Core, while they work on a new major release.
1. This in no way addresses the problem of developing and releasing libraries that others depend on. And this is the biggest problem IMO
All that is needed is to shift the release date of the split libraries some 2-3 months after the release of core, assuming a six month release
What if we want to release "core" every 3 months? But what is more important, there are independently developed libraries within core.
cycle. Core developers will take advantage of the more manageable size of the library collection they work on, while split libraries'
What advantage? Each developer really cares about one's own library size and dependencies. Why do you believe that by splitting a couple of libraries we achieve anything?
developers will gain from the resulting period of Core's guaranteed stability.
What stability? Core is stable for 6 months; in the next release library A is updated. For 6 months the separated library B is tested against an outdated version, and it becomes unusable once the next "core" release occurs. It takes another couple of months to make another release of library B compatible with A, only to become invalid again in a month or so when the next version of "core" is released.
2. Your proposition leads to the separated libraries being potentially unusable with the latest boost release. This is not a good thing IMO.
People that only use core will be able to switch to a new release immediately; those who need one or more of the separated libraries will have to wait up to three months. On the other hand by reducing Core
And something will always be missing: either an update to a separated library or a new feature from a "core" one.
Boost to a much more manageable size, the chances of hitting planned release dates should increase. If you
How will splitting a couple of libs make it "much more manageable"? And if you split, let's say, 2/3 of the libs, how is that different from what I propose?
consider how long people have been waiting for the libraries that were introduced/improved in 1.34, not to mention those that are expected for 1.35...
3. Who makes this decision? Which libraries are "core" and which are standalone?
This will have to be agreed upon, considering size, dependencies and breadth of applicability.
This is never going to be agreed upon.
Ideally library authors should offer to split off their libraries if they think it reasonable. In a way Robert Ramey is already heading in a similar direction with Serialization. I think he should be encouraged to do so, but within an agreed upon setup, rather than in total independence, so that other authors may benefit from the experience gained.
There is no real incentive for a library author to do the extra work of keeping up with independent "core" releases. In my scheme you can't add a "core" library release to the boost release until all the libs that depend on the LATEST version of that library are tested against it. What you propose essentially leaves every developer of separated libraries on their own, while at the same time developers of the "core" libraries still face the current interdependency problem. Gennadiy
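Gennadiy's release rule ("you can't add a library release until the libs that depend on it are tested against it") can be stated mechanically: a combined Boost release is consistent only when every library's pinned dependency versions match the versions actually chosen for the release. A minimal sketch of that check, with a hypothetical manifest format and made-up library names:

```python
# Sketch of the release-assembly consistency rule. The manifest format
# and the library names below are hypothetical illustrations, not any
# real Boost tooling.

def find_conflicts(chosen, pins):
    """chosen: {lib: version selected for the combined release}
    pins: {lib: {dep: version that lib was built and tested against}}
    Returns (lib, dep, pinned, chosen) tuples for every mismatch."""
    conflicts = []
    for lib, deps in pins.items():
        for dep, pinned in deps.items():
            if chosen.get(dep) != pinned:
                conflicts.append((lib, dep, pinned, chosen.get(dep)))
    return conflicts

chosen = {"bind": "1.2.1", "function": "1.5.0", "signals": "0.9.9"}
pins = {
    "signals": {"bind": "1.2.1", "function": "1.5.0"},  # consistent
    "function": {"bind": "1.2.0"},  # tested against an older bind
}

# "function" was last tested against bind 1.2.0, so this release
# assembly is rejected until function is re-tested against 1.2.1.
print(find_conflicts(chosen, pins))
```

Under this rule the release manager does no testing at release time; the manifest either has an empty conflict list or the offending library is left at its previous release.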

Gennadiy Rozental wrote:
> "Nicola Musatti" <Nicola.Musatti@gmail.com> wrote in message news:f1frd3$jm8$1@sea.gmane.org...
[...]
> Sorry. I was a bit unclear. By independent I mean "independently developed".

OK

>>>> and the constraint of releasing Boost in a single whole is going to make things unnecessarily hard.
>>> Which constraint?
>> I'm under the impression that in your scheme you expect to be able to assemble a complete Boost release by choosing the appropriate releases of all the component libraries.
> Yes. So what is the problem?

I'm convinced that one of the problems that led to 1.34 not being released for over a year beyond the original schedule has to do with the size and complexity of Boost as a whole and coordinating the personal schedules of many developers to reach a single coordinated goal. The one advantage of this situation is that each developer has been able to react to changes as soon as s/he realized that her/his library was impacted. By splitting at the library level, but expecting to still issue one coherent whole, you don't solve the problem, but you give up the one advantage. I don't see what your approach would gain us.

>>>> A better approach would be to separate from core Boost some of the larger libraries once and for all. I'm thinking of Serialization, Spirit, Python, possibly a few others. In most cases other libraries should not depend on these (i.e. Preprocessor is not a good candidate). Where dependencies exist they should either be removed or moved to reside within the separate library (e.g. serialization support for core libraries should be supplied by serialization and not core [as I believe it is now, at least in many cases]).
>>>>
>>>> These libraries should be developed, tested and released separately, against the most recent release of core. It will be up to each library maintainers' team to decide whether to "port" one or more released versions of their library to new releases of Boost Core, while they work on a new major release.
>>> 1. This in no way addresses the problem of developing and releasing libraries that others depend on. And this is the biggest problem IMO
>> All that is needed is to shift the release date of the split libraries some 2-3 months after the release of core, assuming a six month release
> What if we want to release "core" every 3 months? But what is more important, there are independently developed libraries within core.

So? At worst users of specific libraries will be one core release behind. They'll have to face the trade off of being able to use the one library they need, but having to wait some more time before they can adopt some newly introduced core feature. Assuming a more realistic six month release cycle for core plus an extra three months for split libraries, these people would still be much better off than those that right now would like to use a library that was reviewed a year ago.

>> cycle. Core developers will take advantage of the more manageable size of the library collection they work on, while split libraries'
> What advantage? Each developer really cares about one's own library's size and dependencies. Why do you believe that by splitting a couple of libraries we achieve anything?

For instance people not using those libraries won't have to wait for them to reach stability before a release can be made. By just setting apart Serialization, regression testing would take half the time, leading to a much shorter turn around time. The problem is rather that the split libraries would need testing resources too.

>> developers will gain from the resulting period of Core's guaranteed stability.
> What stability? Core is stable for 6 months; in the next release library A is updated. For 6 months the separated library B is tested against an outdated version, and it becomes unusable once the next "core" release occurs. It takes another couple of months to make another release of library B compatible with A, only to become invalid again in a month or so when the next version of "core" is released.

Again, this is a problem we keep having to face whenever we have to use a piece of software that depends on others.

>>> 2. Your proposition leads to the separated libraries being potentially unusable with the latest boost release. This is not a good thing IMO.
>> People that only use core will be able to switch to a new release immediately; those who need one or more of the separated libraries will have to wait up to three months. On the other hand by reducing Core
> And something will always be missing: either an update to a separated library or a new feature from a "core" one.

If the result is a shorter overall cycle than we have now, this will be a non problem.

>> Boost to a much more manageable size, the chances of hitting planned release dates should increase. If you
> How will splitting a couple of libs make it "much more manageable"? And if you split, let's say, 2/3 of the libs, how is that different from what I propose?

Again, just as an example, consider splitting off Serialization, Spirit, Wave, Python, Mpi. This is not just a couple of libs, it's a significant portion of the Boost code base, and you still don't have to find out which release of Bind goes with which release of Function and so on.

>> consider how long people have been waiting for the libraries that were introduced/improved in 1.34, not to mention those that are expected for 1.35...
>>
>>> 3. Who makes this decision? Which libraries are "core" and which are standalone?
>> This will have to be agreed upon, considering size, dependencies and breadth of applicability.
> This is never going to be agreed upon.

Why do you think so? Spirit is already being developed independently and Robert R. clearly intends to do the same with Serialization. I'm sure that for them it would be easier to release their libraries against a stable release of the rest of Boost than against a moving target.

>> Ideally library authors should offer to split off their libraries if they think it reasonable. In a way Robert Ramey is already heading in a similar direction with Serialization. I think he should be encouraged to do so, but within an agreed upon setup, rather than in total independence, so that other authors may benefit from the experience gained.
> There is no real incentive for a library author to do the extra work of keeping up with independent "core" releases. In my scheme you can't add a "core" library release to the boost release until all the libs that depend on the LATEST version of that library are tested against it. What you propose essentially leaves every developer of separated libraries on their own, while at the same time developers of the "core" libraries still face the current interdependency problem.

Keeping up with independent core releases is *less* work than keeping up with a continuously moving target, and the reason I'm writing all this is because I don't *want* split library authors to be left on their own or to go their separate ways. Rather I want Boost to be split into a small number of packages, and I'm convinced that this would improve noticeably the manageability of the whole lot.

Cheers,
Nicola Musatti

Nicola Musatti wrote:
Gennadiy Rozental wrote:
Why do you think so? Spirit is already being developed independently and Robert R. clearly intends to do the same with Serialization. I'm sure that for them it would be easier to release their libraries against a stable release of the rest of Boost than against a moving target.
I wouldn't call what I plan to do developing independently. I just want to run my own tests against a stable code base that everyone has and make the changes available to anyone who wants them. I expect to check in changes with whatever system boost decides to use. A side effect of this is that I make the default test set smaller and add the tests that I now need to add but cannot because they take too long. DOES ANYONE OBJECT TO MY DOING THIS? WHY? Does this in any way conflict with any of the proposals made so far? Anyone who doesn't agree with this can just wait for the normal boost process - he will see no change and won't have to be aware that I've tested stuff on my own machine and made the latest version available to those who are interested. Robert Ramey

on Fri May 04 2007, "Robert Ramey" <ramey-AT-rrsd.com> wrote:
Nicola Musatti wrote:
Gennadiy Rozental wrote:
Why do you think so? Spirit is already being developed independently and Robert R. clearly intends to do the same with Serialization. I'm sure that for them it would be easier to release their libraries against a stable release of the rest of Boost than against a moving target.
I wouldn't call what I plan to do developing independently. I just want to run my own tests against a stable code base that everyone has and make the changes available to anyone who wants them. I expect to check in changes with whatever system boost decides to use. A side effect of this is that I make the default test set smaller and add the tests that I now need to add but cannot because they take too long.
DOES ANYONE OBJECT TO MY DOING THIS? WHY?
If you are going to "make the changes available to anyone who wants them" by checking them into a branch in the Boost SVN repo, I think it's fine. In fact, I think it will pretty much line up with Beman's proposal. -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

David Abrahams wrote:
DOES ANYONE OBJECT TO MY DOING THIS? WHY?
If you are going to "make the changes available to anyone who wants them" by checking them into a branch in the Boost SVN repo, I think it's fine. In fact, I think it will pretty much line up with Beman's proposal.
Actually, what I was thinking of was to create a *.zip file of the changed files. As I explained before, and as has been suggested by other authors, local changes/updates/modifications of Boost can be handled by including a "shadow" tree of include files before the tree of the official boost release. Robert Ramey
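The "shadow tree" Robert describes works because the compiler searches -I directories left to right: a patched header in the shadow tree wins, and headers absent from it fall through to the official release. A minimal Python sketch of that resolution order (the directory names are made up for illustration):

```python
# Mimic how -Ishadow -Iboost_release resolves an #include: the first
# directory on the search path that contains the header wins.
import os
import tempfile

def resolve_header(name, search_path):
    """Return the path of `name` in the first directory that has it."""
    for d in search_path:
        candidate = os.path.join(d, name)
        if os.path.exists(candidate):
            return candidate
    return None

root = tempfile.mkdtemp()
shadow = os.path.join(root, "shadow", "boost")       # local overrides
release = os.path.join(root, "boost_1_34_0", "boost")  # official tree
os.makedirs(shadow)
os.makedirs(release)

with open(os.path.join(release, "version.hpp"), "w") as f:
    f.write("// official\n")
with open(os.path.join(release, "config.hpp"), "w") as f:
    f.write("// official\n")
with open(os.path.join(shadow, "version.hpp"), "w") as f:
    f.write("// patched\n")

# Shadow tree listed first, so its version.hpp shadows the official
# copy; config.hpp falls through to the release tree.
print(resolve_header("version.hpp", [shadow, release]))
print(resolve_header("config.hpp", [shadow, release]))
```

With a real compiler the same effect comes from listing the shadow tree's include root before the release root on the command line.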

on Mon May 07 2007, "Robert Ramey" <ramey-AT-rrsd.com> wrote:
David Abrahams wrote:
DOES ANYONE OBJECT TO MY DOING THIS? WHY?
If you are going to "make the changes available to anyone who wants them" by checking them into a branch in the Boost SVN repo, I think it's fine. In fact, I think it will pretty much line up with Beman's proposal.
Actually, what I was thinking of was to create a *.zip file of the changed files.
Then I object. Please use the source control tools, if only so we have a historical record. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

David Abrahams wrote:
on Mon May 07 2007, "Robert Ramey" <ramey-AT-rrsd.com> wrote:
David Abrahams wrote:
DOES ANYONE OBJECT TO MY DOING THIS? WHY?
If you are going to "make the changes available to anyone who wants them" by checking them into a branch in the Boost SVN repo, I think it's fine. In fact, I think it will pretty much line up with Beman's proposal.
Actually, what I was thinking of was to create a *.zip file of the changed files.
Then I object. Please use the source control tools, if only so we have a historical record.
I meant making the *.zip file in addition to checking into the head. The HEAD and its history would be as it always has been. Robert Ramey

on Wed May 09 2007, "Robert Ramey" <ramey-AT-rrsd.com> wrote:
David Abrahams wrote:
on Mon May 07 2007, "Robert Ramey" <ramey-AT-rrsd.com> wrote:
David Abrahams wrote:
DOES ANYONE OBJECT TO MY DOING THIS? WHY?
If you are going to "make the changes available to anyone who wants them" by checking them into a branch in the Boost SVN repo, I think it's fine. In fact, I think it will pretty much line up with Beman's proposal.
Actually, what I was thinking of was to create a *.zip file of the changed files.
I meant making the *.zip file in addition to checking into the head. The HEAD and its history would be as it always has been.
I want you to not only develop in HEAD, but create tags that mark each "released" .zip file. That means "copying" head to a subdirectory of your tags directory. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

"Nicola Musatti" <Nicola.Musatti@gmail.com> wrote in message news:f1gb29$b3s$1@sea.gmane.org... > I'm convinced that one of the problems that lead to 1.34 not being > released for over a year beyond the original schedule has to do with the > size and complexity of Boost as a whole and coordinating the personal > schedules of many developers to reach a single coordinated goal. The one My proposal doesn't require ANY coordination during boost release. And that's the key. Once time has come you collect what's ready at this time, combine and release. No testing is done at the time of boost release whatsoever. And no waiting. > advantage of this situation is that each developer has been able to > react to changes as soon as s/he realized that her/his library was > impacted. > > By splitting at the library level, but expecting to still issue one > coherent whole you don't solve the problem, but you give up the one > advantage. I don't see what your approach would gain us. I am still in a dark: what exactly did I give up? >>>>> A better approach would be to separate from core Boost some of the >>>>> larger >>>>> libraries once and for all. I'm thinking of Serialization, Spirit, >>>>> Python, >>>>> possibly a few others. In most cases other libraries should not depend >>>>> on >>>>> these >>>>> (i.e. Preprocessor is not a good candidate). Where dependencies exist >>>>> they >>>>> should either be removed or moved to reside within the separate >>>>> library >>>>> (e.g. >>>>> serialization support for core libraries should be supplied by >>>>> serialization and >>>>> not core [as I believe it is now, at least in many cases]). >>>>> >>>>> These libraries should be developed, tested and released separately, >>>>> against the >>>>> most recent release of core. 
It will be up to each library mantainers' >>>>> team to >>>>> decide whether to "port" one or more released versions of their >>>>> library >>>>> to >>>>> new >>>>> releases of Boost Core, while they work on a new major release. >>>> 1. This in no way address problem developing and releasing libraries >>>> that >>>> other depend on. And this is biggest problem IMO >>> All that is needed is to shift the release date of the split libraries >>> some 2-3 months after the release of core, assuming a six month release >> >> What if we want to release "core" every 3 month? But what is more >> important >> there is independently developed libraries within core. > > So? At worst users of specific libraries will be one core release > behind. They'll have to face the trade off of being able to use the one > library they need, but having to wait some more time before they can > adopt some newly introduced core feature. Assuming a more realistic six > month release cycle for core plus an extra three month for split > libraries these people would still be much better of than those that > right now would like to use a library that was reviewed a year ago. No. It will never work from user's prospective IMO. I am using boost xyz. I need lib A. Now I am stack and can't upgrade just because library A developer has no incentive (or just busy) to build against next version of boost. >>> cycle. Core developers will take advantage of the more manageable size >>> of the library collection they work on, while split libraries' >> >> What advantage? Each developer really care about one's own library size >> and >> dependencies. Why do you believe that be splitting couple libraries we >> achieve anything? > > For instance people not using those libraries won't have to wait for > them to reach stability before a release can be made. What about those who do need them? They will always be in "one (at least) release too late" position. 
> By just setting > apart Serialization regression testing would take half the time, leading Today serialization. Tomorrow GUI lib, XML parser, HTTP server etc. > to a much shorter turn around time. The problem is rather that the split > libraries would need testing resources too. Yes. Testing. You did not address it. It's completely unclear how do you plan to test separated library that depends on some long back release. >>> developers will gain from the resulting period of Core's guaranteed >>> stability. >> >> What stability? Core is stable to 6 month next release library A is >> updated. >> For 6 month separated library B tested against outdated version and it >> become unusable once next "core" release occurs. It takes another couple >> month to make another release of library B compatible with A, only to >> become >> invalid again in a month or so when next version of "core" is released. > > Again, this a problem we keep having to face whenever we have to use a > piece of software that depends on others. No. Boost release *has to be consistent*. Otherwise separated component is just another third-party library we don't care about. >>>> 2. You proposition leads to the separated libraries to be potentially >>>> unusable with latest boost release. This is not a good thing IMO. >>> People that only use core will be able to switch to a new release >>> immediately; those who need one or more of the separated libraries will >>> have two wait up to three months. On the other hand by reducing Core >> >> And something will always be missing: either update to separate library >> or >> new feature from "core" one. > > If the result is a shorter overall cycle than we have now,, this will be > a non problem. It is. Large number of users tend to stick to single version and use it until they jump to another. You propose me to have 10 different versions of boost "core" for 10 separated libraries developed at different speed. 
>>> Boost to a much more manageable size than whole Boost is nice, the >>> chances of hitting planned release dates should increase. If you >> >> How splitting couple libs will make it "much more manageable"? And if you >> split let's say 2/3 of libs how is it different from what I propose? > > Again, just as an example, consider splitting off Serialization, Spirit, > Wave, Python, Mpi. This not just a couple of libs, it's a significant > portion of the Boost code base, and you still don't have to find out > which release of Bind goes with which release of Function and so on. 1. What about long term? Big libraries will continue to appear in boost. 2. How do you decide which library qualify for separation? Does Multi-index big-enough or complex enough? How about Date/Time? Or MPL? Regex? Thread? One of the biggest is Preprocessor. Does it qualify? On the other hand there many small components that other libraries has very little dependency upon. Why not separate them? >>> consider how long people have been waiting for the libraries that were >>> introduced/improved in 1.34, not to mention those that are expected for >>> 1.35... >>> >>>> 3. Who make this decision? Which libraries are "core" and which are >>>> standalone? >>> This will have to be agreed upon, considering size, dependencies and >>> breadth of applicability. >> >> This will never going to be agreed upon. > > Why do you think so? Spirit is already been developed independently and > Robert R. clearly intends to do the same with Serialization. I'm sure > that for them it would be easier to release their libraries against a > stable release of the rest of Boost than against a moving target. IMO what you proposing is a moving target ;). The examples you provide, just prove my point: independent library development is the way to go. The only difference is that you propose strange (IMO) combination of existing mess and proper way. 
If we decide that independent development is the way to go, why not make it a rule: all libraries are developed independently. >>> Ideally library authors should offer to split >>> off their libraries if they think it reasonable. In a way Robert Ramey >>> is already heading in a similar direction with Serialization. I think he >>> should be encouraged to do so, but within an agreed upon setup, rather >>> than in total independence, so that other authors may benefit from the >>> experience gained. >> >> There is no real incentive for a library author to do extra work of >> keeping >> up with independent "core" releases. In my scheme you can't add "core" >> library release to the boost release until all the libs that depend on >> LATEST version of the library are tested against it. What you propose >> essentially leave every developer of separated libraries on their own. >> While >> the same time developers of the "core" libraries are still faces with >> current interdependencies problem. > > Keeping up with independent core releases is *less* work than keeping up > with a continuously moving target and the reason I'm writing all this > is because I don't *want* split library authors to be left on their own > or to go their separate ways. Rather I want Boost to be split into a > small number of packages and I'm convinced that this would improve > noticeably the manageability of the whole lot. Where did you see a moving target in my proposal? In my makefile I specify: my lib depends on A:abc B:xyz. It can be concrete versions for a long time. Once my development is completed, I may switch to the LATEST versions of dependent components. With some luck it just compiles, otherwise I had to make fixes to comply. Alternatively I may decide that I do not want to continue development and don't switch to LATEST. In later case my library is not included in next version of boost. 
The only tough case is if library A that depends on LATEST version of library B and it was not fixed. Who is at fault here? My proposal is postpone library A release. There will be come complications here. But I believe they all can be resolved. The key advantage above is separation of my own development from development of dependent libraries. Gennadiy
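Gennadiy's "my lib depends on A:abc B:xyz" statement can be read as a tiny spec that expands into one -I option per pinned dependency, each pointing into that library's versioned checkout (the lib/<version>/boost layout he sketches for regression testers). The spec syntax and library names below are illustrations, not real bjam:

```python
# Expand a versioned-dependency spec like "B:1.5.0 C:2.0.1" into
# compiler -I options. The lib/<version> directory layout mirrors the
# regression-tester setup proposed in the thread; everything here is a
# hypothetical sketch.

def include_flags(spec, root="."):
    """'bind:1.2.1 function:1.5.0' -> ['-I./bind/1.2.1', ...]"""
    flags = []
    for item in spec.split():
        lib, _, version = item.partition(":")
        flags.append("-I%s/%s/%s" % (root, lib, version))
    return flags

print(include_flags("bind:1.2.1 function:1.5.0 serialization:1.4.3"))
```

Switching a library to LATEST is then just editing one version string in its own spec; no other library's build is affected until its maintainer does the same.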

Rereading your proposal I realize that what I have in mind is not all that different: First, I think that there's a set of libraries that appear to be too closely interconnected to handle them as separate components; I'm thinking of bind/function/signals, but there may be others. Second, I wish that Boost was also made available as a small number of packages, so that users could choose to only install those that they actually need. For the rest, I'm convinced that the techniques you propose stand an even better chance of working at a coarser degree of granularity. Cheers, Nicola Musatti

"Nicola Musatti" <Nicola.Musatti@gmail.com> wrote in message news:f1hdjg$pnq$1@sea.gmane.org...
Rereading your proposal I realize that what I have in mind is not all that different:
First, I think that there's a set of libraries that appear to be too closely interconnected to handle them as separate components; I'm thinking of bind/function/signals, but there may be others.
Yes, I agree. I also thought that some small number of the most widely used components should constitute a Boost.Core component. Boost.Signals I would split off, though; it's a special-purpose (though generic) library. Smart_ptr/config/utility/type_traits are good candidates for core.
Second, I wish that Boost was also made available as a small number of packages, so that users could choose to only install those that they actually need.
Yes. I believe we will need to consider a procedure for packaging independent library releases. This is a second stage though.
For the rest, I'm convinced that the techniques you propose stand an even better chance of working at a coarser degree of granularity.
Gennadiy

"Robert Ramey" <ramey@rrsd.com> wrote in message news:f1egi4$rqi$1@sea.gmane.org...
Gennadiy Rozental wrote:
Hi,
I do plan to attend this session. I've got some ideas on the subject; I pitched them once a while ago. And I still believe that synchronization is the root of all evil. The only real solution to break this deadlock is independent library versioning. This should resolve both our release and testing issues (which are closely connected IMO).
This is on the right track - but seems way too complicated.
Here is what I plan to do from now on:
a) I will load the latest released Boost on my machine.
b) I will make a (CVS or SVN or whatever) tree on my machine which includes only the serialization library files.
c) I will tweak the build process so it will look into my serialization library tree before it looks into the latest Boost official release.
d) I will make changes in my tree as is convenient. I will test on my local system against the latest boost release.
e) When it passes all my tests on the compilers I have, I will do the following:
   i) check in my changes into whatever tree boost decides it wants to use;
   ii) zip up the files which differ from the last boost release and upload the zipped file to a place on my website. The website will contain instructions on how to set up one's include paths so that the latest validated serialization library can be used;
   iii) a version number isn't critical for me. Easiest would be the date of the upload. The serialization library would be "validated against the latest released version of boost".
f) I will include better and more complete instructions for users to test the library on their own systems. Any users who want help with compilers I haven't tested will have to run the complete test suite and report the results.
This addresses neither testing against new versions of your library by developers of other libraries, nor testing against updates of components your library depends on. Formalization of independent versioning should allow us to deal with both. Gennadiy
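Step (e)(ii) of Robert's plan, packaging only the files that differ from the last official release, can be sketched directly: walk the working tree and archive every file that is new or whose contents differ from its counterpart in the release tree. Directory names here are hypothetical:

```python
# Build a patch zip containing only files that are new or changed
# relative to an official release tree, for use as a "shadow" overlay.
# filecmp.cmp(..., shallow=False) compares actual file contents.
import filecmp
import os
import tempfile
import zipfile

def zip_changed(release_dir, work_dir, out_zip):
    """Archive every file under work_dir that is absent from, or
    differs from, the corresponding file under release_dir."""
    with zipfile.ZipFile(out_zip, "w") as zf:
        for dirpath, _, files in os.walk(work_dir):
            for name in files:
                wf = os.path.join(dirpath, name)
                rel = os.path.relpath(wf, work_dir)
                rf = os.path.join(release_dir, rel)
                if not os.path.exists(rf) or not filecmp.cmp(wf, rf, shallow=False):
                    zf.write(wf, rel)
    return out_zip

root = tempfile.mkdtemp()
rel_dir = os.path.join(root, "boost_1_34_0")
work = os.path.join(root, "work")
os.makedirs(rel_dir)
os.makedirs(work)
for d, name, text in [(rel_dir, "a.hpp", "old"), (work, "a.hpp", "new"),
                      (work, "b.hpp", "added"),
                      (rel_dir, "c.hpp", "same"), (work, "c.hpp", "same")]:
    with open(os.path.join(d, name), "w") as f:
        f.write(text + "\n")

out = zip_changed(rel_dir, work, os.path.join(root, "patch.zip"))
# a.hpp (modified) and b.hpp (new) are packaged; c.hpp (unchanged) is not.
print(sorted(zipfile.ZipFile(out).namelist()))
```

This is exactly the artifact Dave later objects to distributing *instead of* source control; as the thread resolves, the zip would be produced in addition to checking into HEAD and tagging.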

Gennadiy Rozental wrote:
"Robert Ramey" <ramey@rrsd.com> wrote in message This doesn't address neither testing against new version of your library by developers of other libraies
Sure it does - they can either test against the HEAD as they do now, or update their local installation
nor testing against updates of components your library depends on.
This amounts to using the tests for one library as tests of the dependent library. There are two problems with this. First, it's very inefficient and arbitrary: the tests in my library presume that the rest of boost works, so they aren't designed to test the rest of boost. Second, if an error occurs, it has to be determined which of the components, libraries, build system, etc. that are simultaneously changing caused the failure. This wastes a huge amount of time. That being said, if the rest of boost thinks that the serialization library is a good way to test the other libraries - fine with me. It's not an issue for me at all.
Formalization of independent versioning should allow us deal with it.
I think boost as a concept has to change from a tightly coupled group of "standard" libraries to a loosely coupled group of "interoperable" libraries. At its core I think this is the issue. The good news is that it's not something we have to agree upon. It's happening now, as it must, as boost gets beyond the scale where it can be digested as a whole. Robert Ramey

"Robert Ramey" <ramey@rrsd.com> wrote in message news:f1gnvf$bv8$1@sea.gmane.org...
Gennadiy Rozental wrote:
"Robert Ramey" <ramey@rrsd.com> wrote in message This doesn't address neither testing against new version of your library by developers of other libraies
Sure it does - they can either test against the HEAD as they do now or they update their local installation
Who is going to do this? Regression testers?
nor testing against updates of components your library depends on.
This amounts to using the tests for one library as tests of the dependent library. There are two problems with this. First, it's very inefficient and arbitrary: the tests in my library presume that the rest of boost works, so they aren't designed to test the rest of boost. Second, if an error occurs, it has to be determined which of the components, libraries, build system, etc. that are simultaneously changing caused the failure. This wastes a huge amount of time.
That being said, if the rest of boost thinks that the serialization library is a good way to test the other libraries - fine with me. It's not an issue for me at all.
I am not sure how your reply relates to my comment. What I meant is: how can YOU test that your library works with the next version of, let's say, Boost.Test that I am planning to include in the next release? Let's say I do my development in branch new_dev. How could you test your lib against this branch, but allow all other libraries still to be tested against HEAD?
Formalization of independent versioning should allow us deal with it.
I think boost as a concept has to change from tightly coupled group of "standard" libraries to a loosely coupled group of "interoperable" libraries. At its core I think this is the issue. Good news is that its not something that we have to agree upon. Its happening now as it must as boost get beyond the scale where it can be digested as a whole.
This comment I completely agree with. I just propose adding some formalization to this process.
Robert Ramey

on Thu May 03 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
Hi,
I do plan to attend this session. I've got some ideas on the subject; I pitched them once a while ago. And I still believe that synchronization is the root of all evil. The only real solution to break this deadlock is independent library versioning. This should resolve both our release and testing issues (which are closely connected IMO).
INDEPENDENT LIBRARY VERTIONING
I just want to point out that the BoostCon "Testing Boost" sprint is about our testing processes and infrastructure, not about the structure of Boost itself and its release process. I agree that the latter is an important topic and I hope we'll discuss it at BoostCon, but it's a much tougher issue (to form consensus on, and to solve) than is the topic of the sprint. In order to ensure that the sprint is successful, I strongly request that in Rene's session, we concentrate on the topic at hand. Thank you, -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

on Fri May 04 2007, David Abrahams <dave-AT-boost-consulting.com> wrote:
on Thu May 03 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
Hi,
I do plan to attend this session. I've got some ideas on the subject. I pitched them once a while ago. And I still believe that synchronization is the root of all evil. The only real solution to break this deadlock is independent library versioning. This should resolve both our release and testing issues (which are closely connected IMO)
INDEPENDENT LIBRARY VERSIONING
I just want to point out that the BoostCon "Testing Boost" sprint is about our testing processes and infrastructure, not about the structure of Boost itself and its release process. I agree that the latter is an important topic and I hope we'll discuss it at BoostCon, but it's a much tougher issue (to form consensus on, and to solve) than is the topic of the sprint. In order to ensure that the sprint is successful,
I strongly request that in Rene's session, we concentrate on the topic at hand.
Thank you,
Oh, and I request that those discussing the topic Gennadiy raises start a new thread or at minimum, change the subject line. Thanks again, -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

"David Abrahams" <dave@boost-consulting.com> wrote in message news:87lkg4d5sj.fsf@valverde.peloton...
on Thu May 03 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
Hi,
I do plan to attend this session. I've got some ideas on the subject. I pitched them once a while ago. And I still believe that synchronization is the root of all evil. The only real solution to break this deadlock is independent library versioning. This should resolve both our release and testing issues (which are closely connected IMO)
INDEPENDENT LIBRARY VERSIONING
I just want to point out that the BoostCon "Testing Boost" sprint is about our testing processes and infrastructure, not about the structure of Boost itself and its release process. I agree that the latter is an important topic and I hope we'll discuss it at BoostCon, but it's a much tougher issue (to form consensus on, and to solve) than is the topic of the sprint. In order to ensure that the sprint is successful,
I strongly request that in Rene's session, we concentrate on the topic at hand.
While in general I agree, IMO discussing how we are going to test the libraries before deciding how the process is organized in general is like putting the cart before the horse. Part of my proposal directly affects the way testing needs to be organized. Gennadiy.

on Fri May 04 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:87lkg4d5sj.fsf@valverde.peloton...
I just want to point out that the BoostCon "Testing Boost" sprint is about our testing processes and infrastructure, not about the structure of Boost itself and its release process. I agree that the latter is an important topic and I hope we'll discuss it at BoostCon, but it's a much tougher issue (to form consensus on, and to solve) than is the topic of the sprint. In order to ensure that the sprint is successful,
I strongly request that in Rene's session, we concentrate on the topic at hand.
While in general I agree, IMO discussing how we are going to test the libraries before deciding how the process is organized in general is like putting the cart before the horse. Part of my proposal directly affects the way testing needs to be organized.
Clearly. And a whole bunch of other things. It's a radical, sweeping change to how we do things that raises lots of knotty questions. This cart is already being driven by an ox, as it were, and we can't afford to buy a horse yet, nor can we easily agree that your horse is the best way to pull the cart. I hope we can upgrade the cart much sooner than we could agree on all that, and I know the process of doing so will stand us in good stead no matter how we decide to change the release process. There seems to be, more or less, a consensus on the list that in the near term we'll be working with a variation of the plan Beman posted some time ago, which is already quite different from what we're doing now. We only have a short time to work on the testing problem in Aspen, and IMO much too little time there to agree on your plan. Again, in order to avoid derailing the sprint, I ask that we limit the scope of what we're considering there to the topic of the sprint. -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

There are a great many things that could (and should) be discussed with respect to the boost infrastructure, as well as the development process. This is about testing, though, so I'd like to restrict my arguments to that as much as possible.

I hear (and share) various complaints about the existing testing procedure:

* Test runs require a lot of resources.
* Test runs take a lot of time.
* There is no clear (visual) association between test results and code revisions.
* There are (sometimes) multiple test runs for the same platform, so the absolute number of failures has no meaning (worse, two test runs for the same platform may result in differing outcomes, because some environment variables differ and are not accounted for).

Now let me contrast that to some utopian boost testing harness with the following characteristics:

* The boost repository stores code, as well as a description of the platforms and configurations that the code should be tested on.
* The overall space of tests is chunked by some local harness into small-scale test suites, accessible for volunteers to run.
* Contributors subscribe by providing some well-controlled environment in which such test suites can run. The whole works somewhat like seti@home (say), i.e. users merely install some 'slave' that then contacts the master to retrieve individual tasks, sending back results as they are ready.
* The master harness then collects results, generates reports, and otherwise postprocesses the incoming data. For example, individual slaves may be associated with some confidence ('trust') about the validity of the results (after all, there is always that last bit of uncontrolled environment potentially affecting test runs...)

What does it take to get there? I think there are different paths to pursue, more or less independently.

1) The test run procedure should be made more and more autonomous, requiring less hand-holding by the user.
The fewer parameters there are for users to set, the less error-prone (or at least, subject to interpretation) the results become. This also implies a much enhanced facility to report platform characteristics from the user's platform as part of the test run results. (In fact, this should be reported up front, as these data determine what part of the mosaic the slave will actually execute.)

2) The smaller tasks, as well as the more convenient handling, should increase parallelism, leading to a shorter turn-around. That, together with better annotation, should allow the report generator to more correctly associate test results with code versions, helping developers to better understand what changeset a regression relates to.

I think that a good tool to use for 1) is buildbot (http://buildbot.net/trac). It allows us to formalize the build process. The only remaining unknown is the environment seen by the buildslaves when they are started. However, a) all environment variables are reported, and b) we can encapsulate the slave startup further to control the environment variables seen by the build process.

As far as the size of tasks (test suites) is concerned, this question is related to the discussion concerning modularity. Individual test runs should at most run a single toolchain on a single library, but may be even less (a single build variant, say). Keeping modularity at that level also allows us to parametrize test sub-suites. For example, the boost.python testsuite may need to be tested against different python versions, while boost.mpi needs to be tested against different MPI backends / versions. Etc.

Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...
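Stefan's seti@home-style division of labour can be sketched in a few lines. Everything below is hypothetical (no such harness exists; all names are illustrative): it only shows how the (library x toolchain x variant) space might be chunked into small tasks that volunteer slaves claim one at a time from the master.

```python
# Hypothetical sketch of a chunked test-distribution master.
# Each task is one (library, toolchain, variant) cell of the test space.
from itertools import product

def make_tasks(libraries, toolchains, variants):
    """Enumerate one task per (library, toolchain, variant) combination."""
    return [
        {"library": lib, "toolchain": tc, "variant": var}
        for lib, tc, var in product(libraries, toolchains, variants)
    ]

def claim_task(tasks, completed):
    """Master side: hand a slave the first task nobody has claimed yet."""
    for task in tasks:
        key = (task["library"], task["toolchain"], task["variant"])
        if key not in completed:
            completed.add(key)
            return task
    return None  # the whole mosaic has been covered

tasks = make_tasks(["python", "mpi"], ["gcc-4.1", "msvc-8.0"], ["debug", "release"])
```

A real harness would of course persist the claimed/completed state and attach results and a trust score per slave; the point here is only that the unit of work becomes one small cell rather than "all of Boost with one toolset".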

on Wed May 02 2007, Rene Rivera <grafikrobot-AT-gmail.com> wrote:
Boost Community,
The Boost Conference will be here in very short order! And I need to get feedback for the "Testing Boost" sprint we'll be doing <http://boostcon.com/program/sessions#rivera-testing-sprint>. If you have any ideas, concerns, off the wall comments, etc. at *minimum* send them directly to me at "rrivera/acm.org". Why that account? Well because that will be the one account I can guarantee I will access to during the conference, and hence people can send me stuff up to the last minute. Not that I'm encouraging lateness, but I know how busy we all are ;-)
If you are going to send it to the Boost dev list you will also want to eventually send me a summary follow up, if needed, from the ensuing discussions.
Okay, here are some issues I think ought to be solved, in no particular order, some (much) more important than others:

* inaccurate header dependency tracing impedes useful incremental testing
* report usability is poor for most constituencies. This point could be explored in much more detail.
* no mechanism to pinpoint the checkin that caused a regression
* no mechanism for communicating that tests are known not to work on some platform (toolset+os) to the build system, and so avoid building them
* generating XML by parsing the jam log is fragile and prevents the use of multiple build processes (-jN). This one should be almost embarrassingly easy to fix.
* The smallest granularity of test that any tester can contribute is the entire Boost suite for one toolset, which makes for longer turnaround.
* No way for a developer to request testing results for a specific branch
* No "nanny system" that bothers the developer who causes a regression until it is fixed.

I think I've thought of others from time to time; these are just the ones off the top of my head this morning. -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

David Abrahams wrote:
Okay, here are some issues I think ought to be solved, in no particular order, some (much) more important than others:
David, I'm happy to see you bring up much the same points I mentioned. Here is one I haven't yet got to, and would like to explore a bit further:
* generating XML by parsing the jam log is fragile and prevents the use of multiple build processes (-jN). This one should be almost embarassingly easy to fix.
I think there is more to this than only the ability to run tests in parallel. For example, it would help to robustify the testing harness if the 'test database' could be inspected without actually executing any test. By that I mean the ability to:

* See all tests, as part of the test database structure (i.e. their organization into test suites)
* See metadata associated with tests, such as
  - what kind of test
  - expected outcome, per platform
  - dependencies, prerequisites, etc.

In fact, this wishlist is influenced by the fact that I'm working on QMTest (http://www.codesourcery.com/qmtest), where these aspects are an important part of the overall design. (In fact, I have been trying to convince Vladimir to make it possible to hook bbv2 (at least the part related to testing) up with QMTest in support of the above features. Note also that test report generation and a graphical user interface are an integral part of QMTest.)

Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

on Fri May 04 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
David Abrahams wrote:
Okay, here are some issues I think ought to be solved, in no particular order, some (much) more important than others:
David,
I'm happy to see you bring up much the same points I mentioned. Here is one I haven't yet got to, and would like to explore a bit further:
* generating XML by parsing the jam log is fragile and prevents the use of multiple build processes (-jN). This one should be almost embarrassingly easy to fix.
I think there is more to this than only the ability to run tests in parallel. For example, it would help to robustify the testing harness if the 'test database' could be inspected without actually executing any test. By that I mean the ability to:
* See all tests, as part of the test database structure (i.e. their organization into test suites)
* See meta data associated with tests, such as
- what kind of test
- expected outcome, per platform
- dependencies, prerequisites, etc.
How would that help with robustness? -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

David Abrahams wrote:
* See all tests, as part of the test database structure (i.e. their organization into test suites)
* See meta data associated with tests, such as
- what kind of test
- expected outcome, per platform
- dependencies, prerequisites, etc.
How would that help with robustness?
There have been a number of cases where the regression harness picked up (and reported) stale results, just because old executables / results were lying around in the build tree. It didn't 'know' that the test actually had been removed from the test suite. But even in a somewhat broader sense, being able to inspect the test database with all its metadata IMO contributes to robustness. Robustness through transparency... Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

Stefan Seefeld wrote:
David Abrahams wrote:
* See all tests, as part of the test database structure (i.e. their organization into test suites)
* See meta data associated with tests, such as
- what kind of test
- expected outcome, per platform
- dependencies, prerequisites, etc.
How would that help with robustness?
There have been a number of cases where the regression harness picked up (and reported) stale results, just because old executables / results were lying around in the build tree. It didn't 'know' that the test actually had been removed from the test suite.
Boost.Build outputs the list of all tests. The only reason regression.py does not use that information to prune test results is that nobody has implemented such pruning. Just to clarify... - Volodya
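The pruning Vladimir says nobody has implemented could be sketched like this. The function below is purely illustrative (regression.py contains no such code), and it assumes result files are named <test>.xml while Boost.Build's list of known tests is available as a set of names:

```python
# Hypothetical stale-result pruning: drop result files for tests that
# no longer exist in the test database reported by Boost.Build.
import os

def prune_stale_results(known_tests, results_dir):
    """Remove <test>.xml files whose test is no longer known; return names removed."""
    removed = []
    for name in os.listdir(results_dir):
        test_name, ext = os.path.splitext(name)
        if ext == ".xml" and test_name not in known_tests:
            os.remove(os.path.join(results_dir, name))
            removed.append(test_name)
    return removed
```

This directly addresses the stale-executable problem Stefan describes: a removed test's old results can no longer be picked up and reported.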

(sorry, dropped the cc by mistake) David Abrahams wrote:
Okay, here are some issues I think ought to be solved, in no particular order, some (much) more important than others:
[...] * No way for a test to fail with a "skipped" status (gray box). This is similar to
* no mechanism for communicating that tests are known not to work on some platform (toolset+os) to the build system, and so avoid building them
but it can rely on Boost.Config macros, which may not be easily accessible to the build system.

on Fri May 04 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:876478d51y.fsf@valverde.peloton...
* No way for a developer to request testing results for a specific branch
And against a particular branch.
How is "for a specific branch" different from "against a particular branch"? It seems as though you're repeating what I meant to say (?)
That's what independent versioning should address.
I don't see a relationship between the two issues. The ability to request test results against a specific branch has obvious utility; you can try out changes and see how portable they are without disturbing general test results. Independent versioning, by itself, does not address that need. You'd still need to implement all the same testing infrastructure, so I don't see the relevance here. -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

"David Abrahams" <dave@boost-consulting.com> wrote in message news:87zm4k87rw.fsf@valverde.peloton...
on Fri May 04 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:876478d51y.fsf@valverde.peloton...
* No way for a developer to request testing results for a specific branch
And against a particular branch.
How is "for a specific branch" different from "against a particular branch"? It seems as though you're repeating what I meant to say (?)
My library depends on libs A, B, and C. I want to test against the new_dev branch of A, the HEAD branch of B, and the 1_33 branch of C.
That's what independent versioning should address.
I don't see a relationship between the two issues. The ability to request test results against a specific branch has obvious utility; you can try out changes and see how portable they are without disturbing general test results. Independent versioning, by it self, does not address that need. You'd still need to implement all the same testing infrastructure, so I don't see the relevance here.
Yes, independent versioning has little to do with "testing for a specific branch". But it has everything to do with "testing against a particular branch" ;) Gennadiy
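Gennadiy's dependant_libs idea from the head of this thread can be illustrated with a small sketch. The entry format (lib:version) and the <root>/<lib>/<version>/ directory layout follow his proposal in this thread; none of this exists in bjam today, so everything here is hypothetical:

```python
# Hypothetical translation of a "dependant_libs" declaration into the
# -I options a compiler invocation would receive, per the proposal's
# versioned tester layout: <root>/<lib>/<version>/boost, libs.

def include_flags(dependant_libs, root="."):
    """Translate 'B:1.5.0'-style entries into -I<root>/B/1.5.0 options."""
    flags = []
    for entry in dependant_libs.split():
        lib, version = entry.split(":")
        flags.append("-I%s/%s/%s" % (root, lib, version))
    return flags

# Gennadiy's mixed-branch scenario: new_dev of A, HEAD of B, 1_33 of C.
flags = include_flags("A:new_dev B:HEAD C:1_33", root="/testers")
```

Under this scheme a branch name is just another "version", which is why independent versioning and "testing against a particular branch" coincide in his argument.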

on Sat May 05 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:87zm4k87rw.fsf@valverde.peloton...
on Fri May 04 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:876478d51y.fsf@valverde.peloton...
* No way for a developer to request testing results for a specific branch
And against a particular branch.
How is "for a specific branch" different from "against a particular branch"? It seems as though you're repeating what I meant to say (?)
My library depends on libs A, B, and C. I want to test against the new_dev branch of A, the HEAD branch of B, and the 1_33 branch of C.
So assemble a branch that contains everything you want, check it in, and request the test. SVN is really good for that sort of thing because of the way it copies "by reference".
That's what independent versioning should address.
I don't see a relationship between the two issues. The ability to request test results against a specific branch has obvious utility; you can try out changes and see how portable they are without disturbing general test results. Independent versioning, by it self, does not address that need. You'd still need to implement all the same testing infrastructure, so I don't see the relevance here.
Yes, independent versioning has little to do with "testing for a specific branch". But it has everything to do with "testing against a particular branch" ;)
I still don't see it, sorry. -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

2007/5/5, David Abrahams <dave@boost-consulting.com>:
on Sat May 05 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
My library depends on libs A, B, and C. I want to test against the new_dev branch of A, the HEAD branch of B, and the 1_33 branch of C.
So assemble a branch that contains everything you want, check it in, and request the test. SVN is really good for that sort of thing because of the way it copies "by reference".
Is it possible to easily find out what such a branch contains? I.e., is there a command that will give the result "new_dev branch of A, HEAD branch of B and 1_33 branch of C"? /$

David Abrahams wrote:
on Sat May 05 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:87zm4k87rw.fsf@valverde.peloton...
on Fri May 04 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:876478d51y.fsf@valverde.peloton...
* No way for a developer to request testing results for a specific branch

And against a particular branch.

How is "for a specific branch" different from "against a particular branch"? It seems as though you're repeating what I meant to say (?)

My library depends on libs A, B, and C. I want to test against the new_dev branch of A, the HEAD branch of B, and the 1_33 branch of C.
So assemble a branch that contains everything you want, check it in, and request the test. SVN is really good for that sort of thing because of the way it copies "by reference".
Switching branches for testing practically implies non-incremental testing. That can't be done efficiently without significant changes to how we run tests or to the amount of hardware available for testing. Regards, m

"David Abrahams" <dave@boost-consulting.com> wrote in message news:87tzur78cq.fsf@valverde.peloton...
on Sat May 05 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
My library depends on libs A, B, and C. I want to test against the new_dev branch of A, the HEAD branch of B, and the 1_33 branch of C.
So assemble a branch that contains everything you want, check it in, and request the test. SVN is really good for that sort of thing because of the way it copies "by reference".
Does it mean that my branch will keep track of changes made in the referenced branches? If not, it's useless in practice. Gennadiy

on Sun May 06 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
"David Abrahams" <dave@boost-consulting.com> wrote in message news:87tzur78cq.fsf@valverde.peloton...
on Sat May 05 2007, "Gennadiy Rozental" <gennadiy.rozental-AT-thomson.com> wrote:
My library depends on libs A, B, and C. I want to test against the new_dev branch of A, the HEAD branch of B, and the 1_33 branch of C.
So assemble a branch that contains everything you want, check it in, and request the test. SVN is really good for that sort of thing because of the way it copies "by reference".
Does it mean that my branch will keep track of changes made in the referenced branches? If not, it's useless in practice.
You could do that with svn:externals.
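For reference, the svn:externals arrangement Dave suggests might look roughly like this; the repository URLs and directory names below are made up for illustration:

```
# Contents of the svn:externals property on the assembled test branch
# (pre-1.5 "dir URL" format); each svn update then also updates the
# referenced branches, which answers Gennadiy's tracking concern:
lib_a  http://svn.example.org/boost/branches/new_dev/lib_a
lib_b  http://svn.example.org/boost/trunk/lib_b
lib_c  http://svn.example.org/boost/branches/1_33/lib_c
```

The property would be set on the assembled branch with something like `svn propset svn:externals -F externals.txt .`; note that svn also allows pinning an external to a fixed revision with `-rREV` when you do not want it to track the branch head.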

David Abrahams writes:
Okay, here are some issues I think ought to be solved, in no particular order, some (much) more important than others:
* inaccurate header dependency tracing
... and the fact that Jamfiles/rule files themselves are not included as dependencies (see "Incremental testing is not reliable" on http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Boost.Testing) ...
impedes useful incremental testing
[...]
* generating XML by parsing the jam log is fragile and prevents the use of multiple build processes (-jN). This one should be almost embarrassingly easy to fix.
And enormously beneficial, IMO. -- Aleksey Gurtovoy MetaCommunications Engineering
participants (20)
- "JOAQUIN LOPEZ MUÑOZ"
- Aleksey Gurtovoy
- Anthony Williams
- David Abrahams
- Emil Dotchevski
- Gennadiy Rozental
- Hartmut Kaiser
- Henrik Sundberg
- Janek Kozicki
- Jeff Garland
- Martin Wille
- Michael Fawcett
- Michael Marcin
- Nicola Musatti
- Peter Dimov
- Rene Rivera
- Robert Ramey
- Sohail Somani
- Stefan Seefeld
- Vladimir Prus