[ANN] Format Lite

Jonathan Turkanis

7 Nov 2004

6:41 a.m.

Dear All,

I've just finished documenting a small library, available here:

http://home.comcast.net/~jturkanis/format_lite/

It's a lightweight version of Reece Dunne's recently reviewed Output Formatters library. It was inspired by the discussion of Reece's library on this list and by the discussion of TR1 tuple i/o at the recent standards committee meeting in Redmond.

-----

Format Lite is a lightweight, easy-to-learn framework for formatting data structures such as standard library containers and Boost tuples. It provides a reasonable degree of customizability with an emphasis on human-readable output formats useful for testing and debugging.

A future version of Format Lite could be a candidate for incorporation into C++0x, to compensate for the lack of standard iostreams inserters and extractors for standard library containers, and to make existing inserters and extractors - such as those provided in <complex> - more flexible.

Format Lite provides three function templates:

* boost::io::punctuate, used to specify punctuation sequences and options for line-breaks and indentation.
* operator<<, used to insert ranges and tuple-like objects into standard output streams.
* operator>>, used to extract ranges and tuple-like objects from standard input streams.

-----

Best Regards,
Jonathan

Thorsten Ottosen

7 Nov

4:17 p.m.

Hi Jonathan,

A small comment. Could syntax like

    << punctuate< vector<string> >("{ ", ", ", " }")

be made into

    << punctuate( my_variable )( "{ ", ", ", " }" )

to deduce the arguments instead of specifying them?

-Thorsten

Jonathan Turkanis

4:49 p.m.

"Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message news:cmlhsv$63u$1@sea.gmane.org...

...

Hi Jonathan,

A small comment.

Could syntax like

    << punctuate< vector<string> >("{ ", ", ", " }")

be made into

    << punctuate( my_variable )( "{ ", ", ", " }" )

to deduce the arguments instead of specifying them?

-Thorsten

Hi Thorsten,

Yes, it could work this way. But one of the features of this library -- which I realize now I never pointed out in the docs -- is that formatting options for a particular type or collection of types can be set once and then used many times.

    cout << punctuate< vector<_> >("{ ", ", ", " }");
    ....
    vector<string> v1 = list_of(...);
    cout << v1;
    ...
    vector< list<string> > v2 = list_of(...);
    cout << v2;
    ....

At the time you set the formatting options you might not have an instance of the type lying around. Even if you have an instance of the outer type, you might want to specify how some deeply nested types are formatted, and it may be hard to get to them.

Finally, I'm not sure how you would indicate whether the formatting options apply to all specializations of a given template or only to exact matches, without introducing more complex notation.

I guess I could add more overloads of punctuate and let users choose.

BTW, I used the Assign library in the examples and regression tests, and I don't think I could have done without it. Thanks!

Jonathan
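The "set once, use many times" behaviour described above can be illustrated with the standard pword mechanism that is mentioned later in this thread. What follows is only a minimal sketch and not Format Lite's implementation: the namespace `sketch`, the `slot` and `lookup` helpers, and this simplified `punctuate` are hypothetical names. The sketch applies one punctuation to every vector (it has no type selectors such as `vector<_>`), and it assumes the punctuation strings are literals that outlive the stream.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Three per-stream storage slots (open, separator, close), allocated once.
inline int slot(int which) {
    static const int slots[3] = { std::ios_base::xalloc(),
                                  std::ios_base::xalloc(),
                                  std::ios_base::xalloc() };
    return slots[which];
}

// Hypothetical stand-in for a punctuation manipulator. It stores only
// pointers, so the strings must outlive the stream (e.g. string literals).
struct punctuate {
    const char* open;
    const char* sep;
    const char* close;
    punctuate(const char* o, const char* s, const char* c)
        : open(o), sep(s), close(c) {}
};

// Inserting the manipulator records the punctuation in the stream itself.
inline std::ostream& operator<<(std::ostream& out, const punctuate& p) {
    out.pword(slot(0)) = const_cast<char*>(p.open);
    out.pword(slot(1)) = const_cast<char*>(p.sep);
    out.pword(slot(2)) = const_cast<char*>(p.close);
    return out;
}

// Returns the stored string, or a fallback if none was ever set.
inline const char* lookup(std::ostream& out, int which, const char* fallback) {
    void* p = out.pword(slot(which));
    return p ? static_cast<const char*>(p) : fallback;
}

// Inserter for vectors; nested vectors reuse the same stored punctuation.
template<typename T>
std::ostream& operator<<(std::ostream& out, const std::vector<T>& v) {
    out << lookup(out, 0, "[");
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) out << lookup(out, 1, ", ");
        out << v[i];
    }
    return out << lookup(out, 2, "]");
}

inline std::string demo() {
    std::ostringstream out;
    out << punctuate("{ ", ", ", " }");      // set once, no instance needed
    std::vector<int> a{1, 2};
    std::vector<std::vector<int>> b{a, a};
    out << a << ' ' << b;                    // used for both objects
    return out.str();
}

} // namespace sketch
```

Because the options live in the stream, no instance of the formatted type is needed when they are set, which is the point made in the message above.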

Thorsten Ottosen

5:50 p.m.

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cmlj0d$8o9$1@sea.gmane.org...

| BTW, I used the Assign library in the examples and regression tests, and I don't
| think I could have done without it. Thanks!

You're welcome. A small nitpick:

    list_of( pair_type("Sofa", "Living Room") )
           ( pair_type("Stove", "Kitchen") )

can be done as

    list_of<pair_type>( "Sofa", "Living Room")( "Stove", "Kitchen")

-Thorsten
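For readers unfamiliar with the chained-call style, here is a rough sketch of how a builder can forward two arguments to the element's constructor on each call. It is not Boost.Assign's implementation: `sketch::list_of`, `list_builder`, and the explicit `to<>()` member are hypothetical (the real library converts implicitly to the target container).

```cpp
#include <list>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Collects elements of type T; each call appends one more element.
template<typename T>
class list_builder {
public:
    // Construct a T from the two arguments and append it.
    template<typename A, typename B>
    list_builder& operator()(const A& a, const B& b) {
        items_.push_back(T(a, b));
        return *this;
    }

    // Copy the collected elements into any container that can be
    // constructed from an iterator range.
    template<typename Container>
    Container to() const {
        return Container(items_.begin(), items_.end());
    }

private:
    std::vector<T> items_;
};

// Starts a chain with its first element.
template<typename T, typename A, typename B>
list_builder<T> list_of(const A& a, const B& b) {
    list_builder<T> builder;
    builder(a, b);
    return builder;
}

typedef std::pair<std::string, std::string> pair_type;

inline std::list<pair_type> demo() {
    // No explicit pair_type(...) is needed around each element.
    return list_of<pair_type>("Sofa", "Living Room")
                             ("Stove", "Kitchen")
           .to<std::list<pair_type> >();
}

} // namespace sketch
```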

Jonathan Turkanis

6:44 p.m.

"Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message news:cmlnb5$j8s$1@sea.gmane.org...

...

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cmlj0d$8o9$1@sea.gmane.org...

| BTW, I used the Assign library in the examples and regression tests, and I don't | think I could have done without it. Thanks!

you're welcome. A small nitpick:

list_of( pair_type("Sofa", "Living Room") ) ( pair_type("Stove", "Kitchen") ) can be done as list_of<pair_type>( "Sofa", "Living Room")( "Stove", "Kitchen") -Thorsten

Thanks. So

    // pseudocode initialization
    vector< list< pair<string, string> > > test =
        { { { "London", "England"}, { "Paris", "France"} },
          { { "Sofa", "Living Room"}, { "Stove", "Kitchen"} },
          { { "Brain", "Skull"}, { "Appendix", "Abdomen"} } };

translates to

    typedef pair<string, string> pair_type;
    typedef list<pair_type> list_type;

    vector<list_type> test =
        list_of<list_type>(
            list_of<pair_type>("London", "England")
                ("Paris", "France")
        )
        (
            list_of<pair_type>("Sofa", "Living Room")
                ("Stove", "Kitchen")
        )
        (
            list_of<pair_type>("Brain", "Skull")
                ("Appendix", "Abdomen")
        );

?

Jonathan

Thorsten Ottosen

8:36 p.m.

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cmlpo2$p7t$1@sea.gmane.org...

| "Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message
| news:cmlnb5$j8s$1@sea.gmane.org...
| > "Jonathan Turkanis" <technews@kangaroologic.com> wrote in message
| > news:cmlj0d$8o9$1@sea.gmane.org...
| >
| > | BTW, I used the Assign library in the examples and regression tests, and I
| > | don't think I could have done without it. Thanks!
| >
| > you're welcome. A small nitpick:
| >
| > list_of( pair_type("Sofa", "Living Room") )
| >        ( pair_type("Stove", "Kitchen") )
| > can be done as list_of<pair_type>( "Sofa", "Living Room")( "Stove", "Kitchen")
| > -Thorsten
|
| Thanks. So
|
| // pseudocode initialization
| vector< list< pair<string, string> > > test =
|     { { { "London", "England"}, { "Paris", "France"} },
|       { { "Sofa", "Living Room"}, { "Stove", "Kitchen"} },
|       { { "Brain", "Skull"}, { "Appendix", "Abdomen"} } };
|
| translates to
|
| typedef pair<string, string> pair_type;
| typedef list<pair_type> list_type;
|
| vector< pair<string, string> > test =
|     list_of<list_type>(
|         list_of<pair_type>("London", "England")
|             ("Paris", "France")
|     )
|     (
|         list_of<pair_type>("Sofa", "Living Room")
|             ("Stove", "Kitchen")
|     )
|     (
|         list_of<pair_type>("Brain", "Skull")
|             ("Appendix", "Abdomen")
|     );
|
| ?

Well, yes. If everything worked as I would expect for a conforming implementation, you can even say

    = list_of( map_list_of( "London", "England" )( "Sofa", "Living Room" ) )
             ( map_list_of( ... ) );

but, as said, g++ and/or g++'s standard library has problems, either in pair or in the iterator range constructors.

-Thorsten

Martin

8 Nov

7:46 a.m.

Just a small question. Why didn't you use a custom locale facet (e.g. sequence_punct) instead of the punctuate object? With the current implementation you need to have access to the stream to be able to specify the format. If a facet is used, you can use the formatter with lexical_cast and the program options library.

Jonathan Turkanis

7:26 p.m.

"Martin" <adrianm@touchdown.se> wrote in message news:loom.20041108T084035-80@post.gmane.org...

...

Just a small question.

Why didn't you use a custom locale facet (e.g. sequence_punct) instead of the punctuate object.

...

With the current implementation you need to have access to the stream to be able to specify the format.

Originally, I was undecided about whether to use facets or pword, so I provided two implementations and a preprocessor symbol BOOST_FORMAT_LITE_USE_PWORD which the user could define. At some point, I decided it was bad manners to keep changing a stream's locale. By the way, there would be one facet for each selector, e.g., sequence_punc< vector<_>, char >.

...

If a facet is used you can use the formatter with lexical_cast and the program options library.

I'm not yet familiar with the program options library. With lexical_cast, there's no way to specify a locale, so I guess you're talking about setting the global locale. Right? I'm afraid this might be bad manners, too.

Maybe I can provide both options.

Jonathan

Vladimir Prus

9 Nov

6:39 a.m.

Jonathan Turkanis wrote:

...

...
If a facet is used you can use the formatter with lexical_cast and the program options library.

I'm not yet familiar with the program options library. With lexical_cast, there's no way to specify a locale, so I guess you're talking about setting the global locale. Right? I'm afraid this might be bad manners, too. Maybe I can provide both options.

Or maybe lexical_cast is just a bad solution. I recently wanted to read a hex number from a stream, which lexical_cast does not support. I ended up writing my own from_string class template which can be used like this:

    string s = "0x1A";
    int i = from_string<int>(std::hex)(s);

and the same class can be used to pass a punctuator as well:

    from_string<vector<int> >(punctuator(....))(s);

The obvious implementation is at http://zigzag.cs.msu.su/~ghost/from_string.hpp

- Volodya
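The file at the URL above is not reproduced in this thread, so the following is only a guess at what such a class might look like, restricted to standard `std::ios_base` manipulators such as `std::hex`. The namespace `sketch`, the member names, and the choice to throw `std::invalid_argument` on failure are assumptions; passing a punctuator would need a more general manipulator parameter than the function-pointer type used here.

```cpp
#include <ios>
#include <sstream>
#include <stdexcept>
#include <string>

namespace sketch {

// Parses a T from a string after applying a stream manipulator.
template<typename T>
class from_string {
public:
    // Signature shared by std::hex, std::oct, std::dec and similar.
    typedef std::ios_base& (*manipulator)(std::ios_base&);

    explicit from_string(manipulator m) : manip_(m) {}

    T operator()(const std::string& s) const {
        std::istringstream in(s);
        T value;
        // Apply the manipulator, then extract; report failure by throwing.
        if (!(in >> manip_ >> value))
            throw std::invalid_argument("from_string: cannot parse '" + s + "'");
        return value;
    }

private:
    manipulator manip_;
};

} // namespace sketch
```

A call such as `sketch::from_string<int>(std::hex)(s)` then reads `s` as a hexadecimal number.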

Jonathan Turkanis

7:26 p.m.

"Vladimir Prus" <ghost@cs.msu.su> wrote in message news:cmponu$27l$1@sea.gmane.org...

...

Jonathan Turkanis wrote:

...

...
I'm not yet familiar with the program options library. With lexical_cast, there's no way to specify a locale, so I guess you're talking about setting the global locale. Right? I'm afraid this might be bad manners, too. Maybe I can provide both options.

Or maybe lexical_cast is just bad solution. I recently wanted to read hex number from stream, which lexical_cast does not support. I ended up writing my own from_string class template which can be used like this:

string s = "0x1A"; int i = from_string<int>(std::hex)(s);

and the same class can be used to pass punctuator, as well:

from_string<vector<int> >(punctuator(....))(s);

Nice. This should work out-of-the-box.

...

- Volodya

Jonathan

Martin

7:42 a.m.

...

the user could define. At some point, I decided it was bad manners to keep changing a stream's locale.

Why is it bad manners to set the stream's locale? The default locale is "classic" but I always change it to the user's preferred locale. For some streams (e.g. to generate SQL statements) I specify the classic locale to avoid locale-based formatting. A custom facet doesn't change the locale, it just adds new functionality to the existing locale, just as pword adds new functionality to streams. Both the tribool and date_time libraries use custom facets for formatting.

...

By the way, there would be one facet for each selector, e.g., sequence_punc< vector<_>, char >.

Yes, but that is not different from your manipulator implementation. To make it easy you could have a facet whose constructor accepts a punctuate object. It would work something like this:

    std::locale mylocale(std::locale(), new sequence_punct(punctuate< pair<string, string> >("[ ", " : ", " ]")));
    stream.imbue(mylocale);
    stream << xxx

...

I'm not yet familiar with the program options library. With lexical_cast, there's no way to specify a locale, so I guess you're talking about setting the global locale. Right? I'm afraid this might be bad manners, too.

See above about bad manners, but you are right that the global locale needs to be changed. I suggested that the program_options library should include an imbue method, but I don't think the author agreed. lexical_cast is just broken.

...

Maybe I can provide both options.

Should be easy, just check if the facet exists and if not check the pword.
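The facet-based alternative and the "check if the facet exists" step can be sketched with the standard `std::has_facet` and `std::use_facet` templates. This is only an illustration under stated assumptions: this non-template `sequence_punct` and the `write` helper are hypothetical, the facet holds plain strings rather than a punctuate object, and the fallback here is a hard-coded default where a real implementation would consult the pword slot.

```cpp
#include <cstddef>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical facet carrying the punctuation for sequences.
class sequence_punct : public std::locale::facet {
public:
    static std::locale::id id;   // required of every facet type

    sequence_punct(const std::string& open, const std::string& sep,
                   const std::string& close)
        : std::locale::facet(0), open_(open), sep_(sep), close_(close) {}

    const std::string& open() const { return open_; }
    const std::string& sep() const { return sep_; }
    const std::string& close() const { return close_; }

private:
    std::string open_, sep_, close_;
};

std::locale::id sequence_punct::id;

// Writes a vector using the facet if the stream's locale has one.
template<typename T>
void write(std::ostream& out, const std::vector<T>& v) {
    std::string open = "[", sep = ", ", close = "]";   // default punctuation
    const std::locale loc = out.getloc();
    if (std::has_facet<sequence_punct>(loc)) {
        const sequence_punct& p = std::use_facet<sequence_punct>(loc);
        open = p.open();
        sep = p.sep();
        close = p.close();
    }
    // else: this is where a pword lookup could be tried instead.
    out << open;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) out << sep;
        out << v[i];
    }
    out << close;
}

inline std::string demo() {
    std::vector<int> v{1, 2, 3};
    std::ostringstream out;
    write(out, v);                       // no facet installed yet
    out << '|';
    // The new locale is a copy of the old one plus the extra facet.
    out.imbue(std::locale(out.getloc(), new sequence_punct("{ ", "; ", " }")));
    write(out, v);
    return out.str();
}

} // namespace sketch
```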

Jonathan Turkanis

5:24 p.m.

"Martin" <adrianm@touchdown.se> wrote in message news:loom.20041109T081650-836@post.gmane.org...

...

...
the user could define. At some point, I decided it was bad manners to keep changing a stream's locale.

Why is it bad manners to set the streams locale? The default locale is "classic" but I always change it to the user's preferred locale. For some streams (e.g. to generate SQL statements) I specify classic locale to avoid locale based formatting.

Personally I don't feel strongly about it. But P.J. Plauger, in his proposal to add additional code conversion components to the standard library (http://tinyurl.com/5hal5), mentions the fact that the components do not modify a stream's locale as a feature. I thought others might share this view, and didn't see any reason to prefer a locale-based solution, so I used pword instead. Of course this can be reconsidered.

...

A custom facet doesn't change the locale, it just adds new functionality to the existing locale just as pword adds new functionality to streams.

It does change the locale, but the new locale is mostly a copy of the old one. (I'm sure you know this.)

...

Both tribool and date_time library uses custom facets for the formatting.

Okay, then this change might improve consistency with the rest of Boost.

...

...
By the way, there would be one facet for each selector, e.g., sequence_punc< vector<_>, char >.

Yes, but that is not different from your manipulator implementation.

I'm aware of that.

...

To make it easy you could have a facet where the constructor accepts a punctuate object. It would work something like this:

    std::locale mylocale(std::locale(), new sequence_punct(punctuate< pair<string, string> >("[ ", " : ", " ]")));
    stream.imbue(mylocale);
    stream << xxx

sequence_punct would need some template parameter, so it would be better to use object generators, as in

    std::locale mylocale(std::locale(), punctuate_locale< pair<string, string> >( ... ));

Or how about: std::locale mylocale = std::locale() + punctuate< pair<string, string> >(...); :-) ?

...

...
I'm not yet familiar with the program options library. With lexical_cast, there's no way to specify a locale, so I guess you're talking about setting the global locale. Right? I'm afraid this might be bad manners, too.

See above about bad manners but you are right about that the global locale needs to be changed. I suggested that program_options library should include an imbue method but I don't think the author agreed.

lexical_cast is just broken.

...
Maybe I can provide both options.

Should be easy, just check if the facet exists and if not check the pword.

Right, but get_punctuation() for tuple-like objects already checks for three different facets using pword, and it really should be four (since I forgot to allow users to specify a default punctuation sequence for all types.) Doubling this would lead to a lot of checking. Best Regards, Jonathan

Pavel Vozenilek

10 Nov

3:19 p.m.

...

I've just finished documenting a small library, available here:

http://home.comcast.net/~jturkanis/format_lite/

I took a (brief) look and have a question about the feasibility of another idea:

- would it be possible to combine format_lite functionality with Boost.Serialization to take advantage of both libs?

Imagine a solution like:

    // the formatting info is provided via boost::archive compatible object
    formatted_text_oarchive arch(a_ostream, default_formatting_settings ....);

    arch << my_data;

    class MyObject {
        template<class Archive>
        void serialize(Archive& ar, const unsigned)
        {
            .... normal serialization code, used when we DO NOT do formatting output
        }

        // specialization for debug formatting
        template<>
        void serialize<formatted_text_oarchive>(....)
        {
            ar << my_vector; // default formatting will apply
            ar << "some info text...";
            ar.increase_indentation();

            // use different formatting for next vector
            punctuate<vector<...> >(ar)(....);
            ar << my_other_vector;
        }
    };

The advantages I see:

- the whole infrastructure of Boost.Serialization is available and ready and it handles all situations like cycles. format_lite could concentrate on just formatting.
- the debugging output can be separated from other serialization types (but doesn't need to be)
- formatting directives can be "inherited" from "higher level of data" to "lower levels". Newly added data would not need formatting of its own by default. A change on a higher level would propagate itself "down".
- indentation for pretty printing could be handled (semi)automatically by the formatting archive.
- multiple formatting styles could be provided for any class.

My experience is that Serialization is quite easy to use and lightweight enough, so I do not consider it any disadvantage for practical use.

/Pavel

Jonathan Turkanis

11 Nov

7:09 p.m.

"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:cmu1n0$p1j$1@sea.gmane.org...

...

...
I've just finished documenting a small library, available here:

http://home.comcast.net/~jturkanis/format_lite/

I took (brief) look and have question about feasibility of other idea:

- would it be possible to combine format_lite functionality with Boost.Serialization to take advantage of both libs?

Imagine solution like:

// the formatting info is provided via boost::archive compatible object formatted_text_oarchive arch(a_ostream, default_formatting_settings ....);

arch << my_data;

class MyObject { template<class Archive> void serialize(Archive& ar, const unsigned) { .... normal serialization code, used when we DO NOT do formatting output } // specialization for debug formatting template<> void serialize<formatted_text_oarchive>(....) {

I believe this specialization is illegal. You could write void serialize(formatted_text_oarchive&ar, const unsigned) but I can't say whether this will work (I seem to remember Robert saying somewhere in the documentation that he was relying on the fact that non-templates are better matches than templates.)

...

ar << my_vector; // default formatting will apply

ar << "some info text..."; ar.increase_indentation();

// use different formatting for next vector punctuate<vector<...>(ar)(....); ar << my_other_vector; } };

I have two separate ideas for formatting libraries:

- one lightweight, which I posted, for input and output of ranges and tuple-like objects
- one for output only, which allows much more customization; I see this as an inverse of Spirit

Your suggestion looks similar to the second (except that you want to support input), so let me sketch my idea (which has improved since I sketched it here last time), and then ask some questions about yours.

----

The motivation for both libraries is that the mechanism provided by the standard library for formatted output (overloading operator<<) leaves all the work to the designer of the type to be formatted. The designer determines the extent to which formatting is customizable, if at all. In particular, the user has no say about how nested objects are formatted, unless they are formatted by delegating to ostream::operator<<, in which case the user is at the mercy of the designer of the nested type.

----

The main limitation of Format Lite is that at the point where an object is formatted:

    out << obj

the static type of obj is available, but the static type of any fancy formatting information added to out has been collapsed to some concrete type known in advance.

My idea is to solve this problem with an ostream wrapper that has an ostream-compatible interface and contains stylistic information as part of its static type:

    ostream out;
    catfish fish;
    styled_ostream<cajun_style> cajun_out(out);
    cajun_out << fish; // formats fish using cajun_style

The main template is

    template<typename Style, typename Ch, typename Tr>
    struct styled_ostream;

Concepts:

Formatter
- provides access to a boolean metafunction which returns true for types which can be formatted by its instances
- defines a templated operator() like so

    struct Formatter {
        /* ... */
        template<typename StyledOstream, typename T>
        void operator()(StyledOstream& out, const T& t) const
        {
            // writes t to out, using out's ostream interfaces
            // as well as its additional styled_ostream members
        }
    };

- formatters for sequences or tuple-like types can be specified with expression templates, e.g.,

    str("[") << _2 << ":" << _1 << ")"

Style
- default constructible
- provides access to a collection of Formatters and Styles
- for a given type, can search its collections and produce an appropriate formatter, or a default formatter if none has been specified for that type
- the manner in which Formatters and Styles are composed to produce additional Styles yields a cascading effect similar to CSS

The advantages of this approach are:

- an arbitrary amount of contextual information, such as indentation and numbering, can be stored in the styled_stream and accessed directly by formatters
- arbitrary user-defined types can be formatted non-intrusively
- flexible formatting is built-in for sequences and tuple-like types (and user-defined types can choose to present themselves as sequences or tuple-like types to take advantage of this feature.)

----

"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote:

...

The advantages I see: - the whole infrastructure of Boost.Serialization is available and ready and it handles all situations like cycles. format_lite could concentrate on just formatting.

This is a big plus, obviously. (However, I remember Robert saying he preferred to keep formatting and serialization separate.)

...

- the debugging output can be separated from other serialization types (but doesn't need to be)

- formatting directives can be "inherited" from "higher level of data" to "lower levels". Newly added data would not need formatting of its own by default. Change on higher level would propagate itself "down".

Can you explain how this works?

...

- indentation for pretty printing could be handled (semi)automatically by formatting archive.

Would this involve modifying the archive interface? I'd like a formatter for a given type (or an overloaded serialize function) to be able to access these properties directly.

...

- multiple formatting styles could be provided for any class.

It would be one formatting style for each archive type for which serialize has been specialized, correct? Would this allow styles for various types to be mixed freely?

...

My experience is that Serialization is quite easy to use and lightweight enough so I do not consider it any disadvantage for practical use.

I've read the Serialization documentation, but haven't used it yet. I've noticed it takes a long time to build, but this is probably because of the various combinations of threading, debugging and linking options.

My inclination is to keep formatting separate from serialization, though, because they have different aims. If you believe they can be integrated, without making serialization harder to learn or sacrificing flexibility of formatting options, it should definitely be considered.

...

/Pavel

Jonathan

Pavel Vozenilek

9:19 p.m.

"Jonathan Turkanis" wrote:

...

...
// specialization for debug formatting template<> void serialize<formatted_text_oarchive>(....) {

I believe this specialization is illegal. You could write

void serialize(formatted_text_oarchive&ar, const unsigned)

but I can't say whether this will work (I seem to remember Robert saying somewhere in the documentation that he was relying on the fact that non-templates are better matches than templates.)

Yes, this (nontemplated overloaded function) will work. I just wrote down an idea in haste.

...

I have two separate ideas for formatting libraries:

- one lightweight, which I posted, for input and output of ranges and tuples-like objects - one for output only, which allows much more customization; I see this as an inverse of Spirit

Your suggestion looks similar to the second (except that you want to support input), so let me sketch my idea (which has improved since I sketched it here last time), and then ask some questions about yours.

Formatted input would be optional (and maybe not practical). I do not understand what the advantages of the "lightweight" approach are (except compile time). Is the switch between the lightweight and heavyweight solutions easy? [snip]

...

The advantages of this approach are:

- an arbitrary amount of contextual information, such as indentation and numbering, can be stored in the styled_stream and accessed directly by formatters - arbitrary user-defined types can be formatted non-intrusively - flexible formatting is built-in for sequences and tuple-like types (and user-defined types can choose to present themselves as sequences or tuple-like types to take advantage of this feature.)

I feel having a formatting descendant of boost::archive stream could be made with the same features.

...

The advantages I see: - the whole infrastructure of Boost.Serialization is available and ready and it handles all situations like cycles. format_lite could concentrate on just formatting.

This is a big plus, obviously. (However, I remember Robert saying he preferred to keep formatting and serialization separate.)

Formatting, as I see it, would just use Serialization as infrastructure. There would be no impact on Serialization from Formatting.

...

...
- formatting directives can be "inherited" from "higher level of data" to "lower levels". Newly added data would not need formatting of its own by default. Change on higher level would propagate itself "down".

Can you explain how this works?

I mean a trick with RAII (it's not really a Serialization feature):

    void serialize(formatting_archive& ar, const unsigned)
    {
        // change currently used formatting style
        formatting_setter raii(ar, "... formatting directives...");
        ar & data; // <<== new formatting style will be used
        // destructor of raii object will revert formatting back
    }

...

- indentation for pretty printing could be handled (semi)automatically by formatting archive.

Would this involve modifying the archive interface? I'd like a formatter for a given type (or an overloaded serialize function) to be able to access these properties directly.

Yes, formatting archive could have any additional interface.

...

...
- multiple formatting styles could be provided for any class.

It would be one formatting style for each archive type for which serialize has been specialized, correct? Would this allow styles for various types to be mixed freely?

Yes, serialize() function would be specialized.

I see three ways to customize output:

1. Formatting archive has its own configuration how to output data. This keeps overall style coherent and should be enough for most uses.

2. Specialization of serialize() could change formatting style. This may be used to fine tune the output here or there.

3. Specializations of serialize() may generate different outputs altogether. E.g. if you have archives:

    class high_level_formatting_archive {...}
    class all_details_formatting_archive { ... }

you can omit details in

    void serialize(high_level_formatting_archive& ar, const unsigned);

and use them all in

    void serialize(all_details_formatting_archive& ar, const unsigned);

I think this (option 3) is not possible now with format_lite.

...

...
My experience is that Serialization is quite easy to use and lightweight enough so I do not consider it any disadvantage for practical use.

I've read the Serialization documentation, but haven't used it yet. I've noticed it takes a long time to build, but this is probably because of the various combinations of threading, debugging and linking options.

I do use Serialization (with BCB). It can be put into precompiled header and this makes compilation as fast as w/o any Serialization support.

...

My inclination is to keep formatting separate from serialization, though, because they have different aims. If you believe they can be integrated, without making serialization harder to learn or sacrificing flexibility of formatting options, it should definitely be considered.

I see Serialization as just vehicle, ready and handy and almost invisible to Formatting users. Simple data structures are very easy with Serialization and should be as easy as with format_lite now. If user tries to format tricky structures (e.g. pImpl) he would need to dig into Serialization docs but at least there will be chance to make the whole thing work. The Serialization goes to great lengths to work under any situation and configuration. It is also likely other Boost libs will have support for Serialization and Formatting could use it as is. /Pavel

Jonathan Turkanis

12 Nov

4:54 p.m.

[I apologize if this message shows up twice -- I sent it last night and it still hasn't appeared] "Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:cn0l08$cul$1@sea.gmane.org...

...

"Jonathan Turkanis" wrote:

...
...
// specialization for debug formatting template<> void serialize<formatted_text_oarchive>(....) {

I believe this specialization is illegal. You could write

void serialize(formatted_text_oarchive&ar, const unsigned)

but I can't say whether this will work (I seem to remember Robert saying somewhere in the documentation that he was relying on the fact that non-templates are better matches than templates.)

Yes, this (nontemplated overloaded function) will work. I just wrote down an idea in haste.

...
I have two separate ideas for formatting libraries:

- one lightweight, which I posted, for input and output of ranges and tuples-like objects - one for output only, which allows much more customization; I see this as an inverse of Spirit

Your suggestion looks similar to the second (except that you want to support input), so let me sketch my idea (which has improved since I sketched it here last time), and then ask some questions about yours.

...

Formatted input would be optional (and maybe not practical).

I do not understand what are advantages of the "lightweight" approach (except compile time).

The aim of Format Lite was to present a facility which would be a candidate for standardization, filling a need that was expressed at the most recent library working group meeting. Towards this end,

1. I tried to keep the implementation as small as possible
2. I introduced just one new function template -- punctuate() -- in the public interface
3. I used the same syntax currently recommended for formatting user-defined types (overloading iostreams operators >> and <<)
4. I introduced no new class templates in the public interface
5. I introduced no new concepts. (Strictly speaking, I use Single Pass Range and Extensible Range, but these could be replaced by the standard library container concepts.)

If people like the interface (and so far there's not much evidence), I think Format Lite would stand a reasonable chance of making it into TR2. To get Serialization standardized would require a much bigger push, IMO, although I'd like to see it happen -- perhaps with additional language support.

...

Is switch between lightweight and heavyweight solution easy?

Yes, since formatting with Format Lite would still be the default when a Style provides no specific formatting options for a given range or tuple-like type. In the following

    vector< string > v = list_of( ... );
    ostream out;
    styled_ostream<cajun_style> cajun_out(out);
    cajun_out << v;

if none of the Styles or Formatters associated with cajun_style knows how to format a vector, cajun_out will delegate formatting to the underlying ostream out, which will use the operator<< from Format Lite.
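The delegation rule described above (use the Style when it knows the type, otherwise fall back to the underlying ostream) can be sketched with overload resolution. This is a simplified illustration and not the design proposed in the thread: here a Style is assumed to expose static `format(std::ostream&, const T&)` overloads, the wrapper omits the character and traits parameters, and `detail::write` is a hypothetical helper.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

// Wrapper whose static type carries the Style.
template<typename Style>
class styled_ostream {
public:
    explicit styled_ostream(std::ostream& out) : out_(out) {}
    std::ostream& base() { return out_; }
private:
    std::ostream& out_;
};

namespace detail {

// Preferred overload: viable only if Style::format(out, t) compiles.
template<typename Style, typename T>
auto write(styled_ostream<Style>& out, const T& t, int)
    -> decltype(Style::format(out.base(), t), void())
{
    Style::format(out.base(), t);
}

// Fallback: delegate to the underlying ostream's own operator<<.
template<typename Style, typename T>
void write(styled_ostream<Style>& out, const T& t, long)
{
    out.base() << t;
}

} // namespace detail

template<typename Style, typename T>
styled_ostream<Style>& operator<<(styled_ostream<Style>& out, const T& t)
{
    detail::write(out, t, 0);   // the literal 0 prefers the int overload
    return out;
}

struct catfish { std::string name; };

// A Style that only knows how to format catfish.
struct cajun_style {
    static void format(std::ostream& out, const catfish& f) {
        out << "blackened " << f.name;
    }
};

inline std::string demo() {
    std::ostringstream buf;
    styled_ostream<cajun_style> cajun_out(buf);
    // catfish goes through cajun_style; char and int fall back to buf.
    cajun_out << catfish{"catfish"} << ' ' << 42;
    return buf.str();
}

} // namespace sketch
```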

...

[snip]

...
The advantages of this approach are:

- an arbitrary amount of contextual information, such as indentation and numbering, can be stored in the styled_stream and accessed directly by formatters - arbitrary user-defined types can be formatted non-intrusively - flexible formatting is built-in for sequences and tuple-like types (and user-defined types can choose to present themselves as sequences or tuple-like types to take advantage of this feature.)

I feel having formatting descendant of boost::archive stream could be made with the same features.

Good.

...

...
...
The advantages I see: - the whole infrastructure of Boost.Serialization is available and ready and it handles all situations like cycles. format_lite could concentrate on just formatting.

This is a big plus, obviously. (However, I remember Robert saying he preferred to keep formatting and serialization separate.)

Formatting, as I see it, would just use Serialization as infrastructure. There would be no impact on Serialization from Formatting.

You mean no changes to the library code -- we would just define additional archive concepts and types?

...

...
...
- formatting directives can be "inherited" from "higher level of data" to "lower levels". Newly added data would not need formatting of its own by default. Change on higher level would propagate itself "down".

Can you explain how this works?

I mean the trick with RAII (it's not really a Serialization feature):

    void serialize(formatting_archive& ar, const unsigned)
    {
        // change currently used formatting style
        formatting_setter raii(ar, "... formatting directives...");
        ar & data; // <<== new formatting style will be used
        // destructor of raii object will revert formatting back
    }
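A minimal sketch of the RAII idea described above, under stated assumptions: this formatting_archive is a hypothetical stand-in (not a Boost.Serialization archive) that only tracks an indentation string, and formatting_setter, Inner, Outer and demo_raii are illustrative names. It shows the setter changing the style for the nested data and the destructor restoring it.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical archive: writes each value on its own line, prefixed by
// the current indentation string.
class formatting_archive {
public:
    explicit formatting_archive(std::ostream& os) : os_(os) {}

    const std::string& indent() const { return indent_; }
    void set_indent(std::string s) { indent_ = std::move(s); }

    template<typename T>
    formatting_archive& operator&(const T& value) {
        os_ << indent_ << value << '\n';
        return *this;
    }

private:
    std::ostream& os_;
    std::string indent_;
};

// RAII helper: installs a new indentation, restores the old one on scope exit.
class formatting_setter {
public:
    formatting_setter(formatting_archive& ar, std::string new_indent)
        : ar_(ar), saved_(ar.indent()) {
        ar_.set_indent(std::move(new_indent));
    }
    ~formatting_setter() { ar_.set_indent(saved_); }

    formatting_setter(const formatting_setter&) = delete;
    formatting_setter& operator=(const formatting_setter&) = delete;

private:
    formatting_archive& ar_;
    std::string saved_;
};

struct Inner { int x; };
struct Outer { int head; Inner inner; int tail; };

void serialize(formatting_archive& ar, const Inner& v) { ar & v.x; }

void serialize(formatting_archive& ar, const Outer& v) {
    ar & v.head;
    {
        // Everything serialized in this scope inherits the deeper indentation.
        formatting_setter raii(ar, ar.indent() + "  ");
        serialize(ar, v.inner);
    }
    ar & v.tail;  // previous indentation is back in effect here
}

std::string demo_raii() {
    std::ostringstream os;
    formatting_archive ar(os);
    serialize(ar, Outer{1, Inner{2}, 3});
    return os.str();
}
```

Note that Inner needs no formatting code of its own: it picks up whatever style the enclosing serialize call installed, which is the "inheritance" of directives from higher to lower levels.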

I see. I think this is a characteristic of all schemes where stylistic info is stored in the stream or stream-like object.

...

...
...
- indentation for pretty printing could be handled (semi)automatically by formatting archive.

Would this involve modifying the archive interface? I'd like a formatter for a given type (or an overloaded serialize function) to be able to access these properties directly.

Yes, formatting archive could have any additional interface.

I see. But no changes to existing archive types.

...

...
...
- multiple formatting styles could be provided for any class.

It would be one formatting style for each archive type for which serialize has been specialized, correct? Would this allow styles for various types to be mixed freely?

Yes, serialize() function would be specialized.

What I meant to ask can be illustrated by an example. Suppose you have two classes, Duck and Goose. Duck and Goose each have two associated formatting styles. The choice of styles should be independent, so we would need four archive types to handle the various combinations. Now my question is: would Duck need four specializations of serialize, or just two? In my system, formatting options for Duck and Goose could be added to a Style independently; I want to know if overloading serialize can handle this.

...

I see three ways to customize output:

1. Formatting archive has its own configuration how to output data. This keeps overall style coherent and should be enough for most uses.

2. Specialization of serialize() could change formatting style. This may be used to fine tune the output here or there.

3. Specializations of serialize() may generate different outputs altogether.

E.g. if you have archives:

    class high_level_formatting_archive {...}
    class all_details_formatting_archive { ... }

you can omit details in

void serialize(high_level_formatting_archive& ar, const unsigned);

and use them all in

void serialize(all_details_formatting_archive& ar, const unsigned);

I think this (option 3) is not possible now with format_lite.
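The third option can be sketched with plain overloads, one per archive type. This is only an illustration: the two archive structs here are hypothetical stand-ins that wrap a string stream, and Account and format_with are made-up names, not part of Boost.Serialization or Format Lite.

```cpp
#include <sstream>
#include <string>

// Hypothetical archive types: each just collects text.
struct high_level_formatting_archive { std::ostringstream out; };
struct all_details_formatting_archive { std::ostringstream out; };

struct Account {
    std::string owner;
    int id;
    double balance;
};

// High-level view: omit the details.
void serialize(high_level_formatting_archive& ar, const Account& a) {
    ar.out << "Account(" << a.owner << ")";
}

// Detailed view: print every member.
void serialize(all_details_formatting_archive& ar, const Account& a) {
    ar.out << "Account(owner=" << a.owner
           << ", id=" << a.id
           << ", balance=" << a.balance << ")";
}

// Helper for the example: format one Account with the chosen archive type.
template<typename Archive>
std::string format_with(const Account& a) {
    Archive ar;
    serialize(ar, a);
    return ar.out.str();
}
```

The archive type acts as the selector: the same Account produces different output depending only on which archive it is passed to.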

In fact, it only supports 1. That's part of what makes it 'lite'. If a class already provides standard library inserters and extractors (corresponding to 2, above), those provided by Format Lite will not be called.

I have a couple of questions:

1. Is your idea flexible enough to allow pairs (a,b) to be formatted with the elements in reverse order?
2. If a type defines a member function serialize, can it be bypassed altogether in favor of an end-user supplied formatting style?

...

...
My inclination is to keep formatting separate from serialization, though, because they have different aims. If you believe they can be integrated, without making serialization harder to learn or sacrificing flexibility of formatting options, it should definitely be considered.

...

I see Serialization as just vehicle, ready and handy and almost invisible to Formatting users.

...

Simple data structures are very easy with Serialization and should be as easy as with format_lite now.

...

If a user tries to format tricky structures (e.g. pImpl) he would need to dig into the Serialization docs, but at least there will be a chance to make the whole thing work. Serialization goes to great lengths to work under any situation and configuration.

This sounds quite reasonable, provided it is sufficiently flexible. I wonder how much of the Serialization infrastructure is really needed, though. Detecting cycles is definitely not something I want to reimplement; OTOH, I'm not sure it's needed for pretty-printing. I haven't looked at the Serialization implementation, but I did read the Java serialization specification several years ago. IIRC, when an object was encountered for the second time, some sort of placeholder would be inserted in the stream referencing the already serialized data. I assume the Serialization library does something like this. Would this really be desirable for human-readable output? Perhaps the formatting library should concern itself only with cycle-free data structures.

...

/Pavel

Jonathan

Robert Ramey

5:38 p.m.

...

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cn2p63$97u$1@sea.gmane.org...

...

If people like the interface (and so far there's not much evidence), I think Format Lite would stand a reasonable chance of making it into TR2.

For what it's worth, I like the interface. I much appreciate a small library whose usage can be completely described in a couple of pages of documentation. I could easily imagine someone using this to implement a custom archive in the serialization system.

...

To get Serialization standardized would require a much bigger push, IMO, although I'd like to see it happen -- perhaps with additional language support.

I would never expect such a thing to happen for a number of reasons. However, it could occur that the standards (compiler and library) might be enhanced to better support serialization. This includes things like better support for type traits, reflection, guid assignment and export, options to override/require code inclusion. It might also impose requirements that the libraries expose enough information to permit serialization. All these issues created difficulties in making the serialization library.

...

You mean no changes to the library code -- we would just define additional archive concepts and types?

additional archive types.

...

What I meant to ask can be illustrated by an example. Suppose you have two classes, Duck and Goose. Duck and Goose each have two associated formatting styles. The choice of styles should be independent, so we would need four archive types to handle the various combinations.

Now my question is: would Duck need four specializations of serialize, or just two? In my system, formatting options for Duck and Goose could be added to a Style independently; I want to know if overloading serialize can handle this.

It's not clear to me how this would be done with your proposal. What would you condition the code on? A macro, commenting out code, some execution-time switch? Whatever method you use to implement this idea with the formatting library would carry over to the implementation of the same idea with the serialization library. What Pavel's idea does is to introduce the idea of an archive format selector which would choose between different desired formats. Of course the format library could easily do the same thing by defining derivations from output streams and implementing different versions of operator<< for each one. The usage would be identical.

...

1. Is your idea flexible enough to allow pairs (a,b) to be formatted with the elements in reverse order?

very much so

...

2. If a type defines a member function serialize, can it be bypassed altogether in favor of an end-user supplied formatting style?

one would define a member function serialize for a specific archive class. This would bypass the standard templated one generally defined.

...

I wonder how much of the Serialization infrastructure is really needed, though. Detecting cycles is definitely not something I want to reimplement; OTOH, I'm not sure it's needed for pretty-printing. I haven't looked at the Serialization implementation, but I did read the Java serialization specification several years ago. IIRC, when an object was encountered for the second time, some sort of placeholder would be inserted in the stream referencing the already serialized data. I assume the Serialization library does something like this.

That is correct. However, the behavior can be suppressed with class serialization traits. In the next version there will exist the ability to suppress this and a couple of other facilities on an archive-by-archive basis. This will be implemented to make the serialization library more useful for such things as transaction rollback/recovery and debug logging.

...

Would this really be desirable for human-readable output? Perhaps the formatting library should concern itself only with cycle-free data structures.

Your library is very attractive and easy to use. I can imagine that people will find it attractive and start to use it. Then you start to get requests like, can I use it for pointers?, what about multiple copies, what about built in support for arrays, etc on and on and on. Then your very readable document starts to get cluttered up with special cases (e.g. cycles, pointers, etc), and your code starts to get complicated with code to detect violations (e.g. detecting cycles, etc). You really can't get the monkey off your back until you end up covering just about everything. oh - but then it's not simple anymore and you have to go back and try to re-rationalize it. The point is that the serialization library has already been through that mill - and more or less emerged intact.

Robert Ramey

Jonathan Turkanis

7:21 p.m.

"Robert Ramey" <ramey@rrsd.com> wrote in message news:cn2se7$j8n$1@sea.gmane.org...

...

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cn2p63$97u$1@sea.gmane.org...

...
If people like the interface (and so far there's not much evidence), I think Format Lite would stand a reasonable chance of making it into TR2.

For what it's worth, I like the interface. I much appreciate a small library whose usage can be completely described in a couple of pages of documentation.

Thanks.

...

...
To get Serialization standardized would require a much bigger push, IMO, although I'd like to see it happen -- perhaps with additional language support.

I would never expect such a thing to happen for a number of reasons.

I'm not expecting it. But I'd still like to have a standard serialization library, since it fills such a common need, and Boost.Serialization is the definitive serialization library for C++.

...

However, it could occur that the standards (compiler and library) might be enhanced to better support serialization. This includes things like better support for type traits, reflection, guid assignment and export, options to override/require code inclusion. It might also impose requirements the libraries expose enough information to permit serialization. All these issues created difficulties in making the serialization library.

This is what I meant by additional language support. It's a higher priority for me than new standard libraries, because it increases the scope of what portable libraries can do.

...

...
You mean no changes to the library code -- we would just define additional archive concepts and types?

additional archive types.

If the archives are to have additional member functions, as Pavel suggested, it would probably be good to codify them with a new archive concept.

...

...
What I meant to ask can be illustrated by an example. Suppose you have two classes, Duck and Goose. Duck and Goose each have two associated formatting styles. The choice of styles should be independent, so we would need four archive types to handle the various combinations.

Now my question is: would Duck need four specializations of serialize, or just two? In my system, formatting options for Duck and Goose could be added to a Style independently; I want to know if overloading serialize can handle this.

Its not clear to me how this would be done with your proposal.

It would look something like this:

    struct Duck;
    struct Goose;

    struct DuckStuffedWithPork : single_class_formatter<Duck> {
        template<typename StyledOstream>
        void operator()(StyledOstream& out, const Duck& d) { /**/ }
    };

    struct BlackenedGoose : single_class_formatter<Goose> {
        template<typename StyledOstream>
        void operator()(StyledOstream& out, const Goose& d) { /**/ }
    };

    struct cajun_style
        : style< use<Duck, DuckStuffedWithPork>,
                 use<Goose, BlackenedGoose> >
        { };

    styled_ostream<cajun_style> cajun_out(cout);

If DuckStuffedWithPork and BlackenedGoose weren't default constructible, I'd have to write a cajun_style constructor specifying instances of these types.

...

Whatever method you use to implement this idea with the formatting library would carry over to the implementation of the same idea with the serialization library.

...

From what you and Pavel are saying, I'm hoping the above could be implemented just by defining a templated archive type parameterized by a Style:

template<typename Style, ... > class styled_oarchive;

...

What Pavel's idea does is to introduce the idea of an archive format selector which would choose between different desired formats.

Of course the format library could easily do the same thing by defining derivations from output streams and implementing different versions of operator<< for each one. The usage would be identical.

I prefer to define an ostream wrapper with an ostream-compatible interface. This way you can use several different wrappers with the same ostream and there's no possibility of passing it to a function taking an ostream& argument, which would cause the formatting information to be lost.
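A rough sketch of the wrapper approach, with assumptions labeled: this styled_ostream is a simplified, non-template stand-in whose only formatting state is a separator string, and demo_wrappers is an illustrative helper. It is not the styled_ostream<Style> design discussed in the thread, only a demonstration of wrapping rather than deriving.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: refers to an existing ostream and carries its own
// formatting state. It deliberately does NOT derive from std::ostream.
class styled_ostream {
public:
    styled_ostream(std::ostream& os, std::string sep)
        : os_(os), sep_(std::move(sep)) {}

    std::ostream& base() { return os_; }
    const std::string& separator() const { return sep_; }

private:
    std::ostream& os_;
    std::string sep_;
};

// Fallback: anything without special handling goes to the wrapped stream.
template<typename T>
styled_ostream& operator<<(styled_ostream& out, const T& value) {
    out.base() << value;
    return out;
}

// Sequences use the separator stored in the wrapper.
template<typename T>
styled_ostream& operator<<(styled_ostream& out, const std::vector<T>& v) {
    out.base() << "[";
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) out.base() << out.separator();
        out << v[i];
    }
    out.base() << "]";
    return out;
}

std::string demo_wrappers() {
    std::ostringstream os;
    styled_ostream comma(os, ", ");   // two wrappers sharing one ostream
    styled_ostream bar(os, " | ");
    std::vector<int> v{1, 2, 3};
    comma << v << " ";
    bar << v;
    return os.str();
}
```

Because styled_ostream is not an ostream, passing comma to a function that takes std::ostream& does not compile, so the formatting state cannot be silently dropped.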

...

...
1. Is your idea flexible enough to allow pairs (a,b) to be formatted with the elements in reverse order?

very much so

...
2. If a type defines a member function serialize, can it be bypassed altogether in favor of an end-user supplied formatting style?

one would define a member function serialize for a specific archive class. This would bypass the standard templated one generally defined.

Would the member function be a member of the archive or the type to be formatted? I'd like to be able to tell an archive to format a Duck in a particular way whether the Duck likes it or not.

...

...
I wonder how much of the Serialization infrastructure is really needed, though. Detecting cycles is definitely not something I want to reimplement; OTOH, I'm not sure it's needed for pretty-printing. I haven't looked at the Serialization implementation, but I did read the Java serialization specification several years ago. IIRC, when an object was encountered for the second time, some sort of placeholder would be inserted in the stream referencing the already serialized data. I assume the Serialization library does something like this.

That is correct. However, the behavior can be suppressed with class serialization traits. In the next version there will exist the ability to suppress this and a couple of other facilities on an archive-by-archive basis. This will be implemented to make the serialization library more useful for such things as transaction rollback/recovery and debug logging.

Sounds good.

...

...
Would this really be desirable for human-readable output? Perhaps the formatting library should concern itself only with cycle-free data structures.

Your library is very attractive and easy to use. I can imagine that people will find it attractive and start to use it. Then you start to get requests like, can I use it for pointers?, what about multiple copies, what about built in support for arrays, etc on and on and on. Then your very readable document starts to get cluttered up with special cases (e.g. cycles, pointers, etc), and your code starts to get complicated with code to detect violations (e.g. detecting cycles, etc). You really can't get the monkey off your back until you end up covering just about everything. oh - but then it's not simple anymore and you have to go back and try to re-rationalize it. The point is that the serialization library has already been through that mill - and more or less emerged intact.

Okay, I get the picture ;-) I have no intention of adding stuff like this to Format Lite, but I can easily imagine it happening with the more ambitious version. So I'm now inclined to try to reuse the serialization machinery, once I see how it can be done. But first I'd like to get a better sense of the problems that might arise.

Regarding pointers, would you agree that most of the thorniest issues relating to pointers disappear if you concern yourself with output only, since you wouldn't have to detect or record the fully derived type or have a mechanism for constructing new objects? Would cycle-detection be the only remaining issue?

Outputting arrays should be no problem. Format Lite should be able to handle them except that I forgot to add a specialization of is_single_pass_range for arrays.

By multiple copies, do you mean multiple pointers that point to the same object? I think an output formatting library should just output each as if the other didn't exist.

Thanks for your help.

...

Robert Ramey

Jonathan

Robert Ramey

8:28 p.m.

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cn31ps$64u$1@sea.gmane.org...

...

...
...
You mean no changes to the library code -- we would just define additional archive concepts and types?

additional archive types.

If the archives are to have additional member functions, as Pavel suggested, it would probably be good to codify them with a new archive concept.

I understand Pavel's proposals to be more specialized serializations. This can be either more specialized member templates or more specialized free templates. The idea that an archive is totally independent of the types it serializes would remain intact. What would happen is that now the serialization of selected classes would be dependent upon the type of archive used. If we wanted a special type of archive, we could derive from an existing one just to get a new type which can be used to specialize the templated serialize function.

...

...
...
What I meant to ask can be illustrated by an example. Suppose you have two classes, Duck and Goose. Duck and Goose each have two associated formatting styles. The choice of styles should be independent, so we would need four archive types to handle the various combinations.

Now my question is: would Duck need four specializations of serialize, or just two? In my system, formatting options for Duck and Goose could be added to a Style independently; I want to know if overloading serialize can handle this.

Its not clear to me how this would be done with your proposal.

It would look something like this:

struct Duck; struct Goose;

struct DuckStuffedWithPork : single_class_formatter<Duck> { template<typename StyledOstream> void operator()(StyledOstream& out, const Duck& d) { /**/ } };

struct BlackenedGoose : single_class_formatter<Goose> { template<typename StyledOstream> void operator()(StyledOstream& out, const Goose& d) { /**/ } };

struct cajun_style : style< use<Duck, DuckStuffedWithPork>, use<Goose, BlackenedGoose> > { };

styled_ostream<cajun_style> cajun_out(cout);

If DuckStuffedWithPork and BlackenedGoose weren't default constructible, I'd have to write a cajun_style constructor specifying instances of these types.

...
Whatever method you use to implement this idea with the formatting library would carry over to the implementation of the same idea with the serialization library.

...
From what you and Pavel are saying, I'm hoping the above could be implemented just by defining a templated archive type parameterized by a Style:

template<typename Style, ... > class styled_oarchive;

...
What Pavel's idea does is to introduce the idea of an archive format selector which would choose between different desired formats.

Of course the format library could easily do the same thing by defining derivations from output streams and implementing different versions of operator<< for each one. The usage would be identical.

I don't really understand the above example but here is what I think you want to do using free serialization functions.

    // default serialization
    template<class Archive>
    void serialize(Archive &ar, Duck &d){
        ar & d;
    }

    // serialization cajun style
    template<>
    void serialize(cajun_style_oarchive &ar, Duck &d){
        ar & punctuation("&*^#@#@&") & d;
    }

    // define a type for cajun style archives
    // this is just a named wrapper for text archives.
    class cajun_style_oarchive : public text_archive {};

    // serialize normal style
    Duck d;
    text_oarchive tar;
    tar << d;

    // serialize cajun style
    cajun_style_oarchive car;
    car << d;

That would be about it. Of course your serialize template specialization could be as elaborate as you wish to accommodate some program-driven customization. Notice that none of this requires alteration of either the classes to be serialized or the archives used.

...

I prefer to define an ostream wrapper with an ostream-compatible interface. This way you can use several different wrappers with the same ostream and there's no possibility of passing it to a function taking an ostream& argument, which would cause the formatting information to be lost.

I think that's what I meant to say.

...

...
...
2. If a type defines a member function serialize, can it be bypassed altogether in favor of an end-user supplied formatting style?

one would define a member function serialize for a specific archive class. This would bypass the standard templated one generally defined.

...

Would the member function be a member of the archive or the type to be formatted?

of the type to be serialized/formatted not of the archive. see above

...

I'd like to be able to tell an archive to format a Duck in a particular way whether the Duck likes it or not.

Then you wouldn't specialize the duck serialization for this particular type of archive and it would just use the default one. Or you would

...

Would this really be desirable for human-readable output? Perhaps the formatting library should concern itself only with cycle-free data structures.

Your library is very attractive and easy to use. I can imagine that people will find it attractive and start to use it. Then you start to get requests like, can I use it for pointers?, what about multiple copies, what about built in support for arrays, etc on and on and on. Then your very readable document starts to get cluttered up with special cases (e.g. cycles, pointers, etc), and your code starts to get complicated with code to detect violations (e.g. detecting cycles, etc). You really can't get the monkey off your back until you end up covering just about everything. oh - but then it's not simple anymore and you have to go back and try to re-rationalize it. The point is that the serialization library has already been through that mill - and more or less emerged intact.

Okay, I get the picture ;-)

Regarding pointers, would you agree that most of the thorniest issues relating to pointers disappear if you concern yourself with output only, since you wouldn't have to detect or record the fully derived type or have a mechanism for constructing new objects?

Hmm - maybe - about 5 seconds of reflection raises questions like: what about pointers to abstract base classes? Should they serialize the named class, or the most derived class? Should the library user be permitted to choose? If so, how would he do this? What about systems that don't support RTTI?

...

Would cycle-detection be the only remaining issue?

I have no idea. Note the question of cycles is not limited to pointers. References could also cycle.

...

By multiple copies, do you mean multiple pointers that point to the same object? I think an output formatting library should just output each as if the other didn't exist.

If you want to avoid an infinite loop if a cycle occurs, then you'll automatically have multiple detection. Will some user ask for a method to inhibit/enable this?

As I've said before, it's not really a technical issue. It's a question of what happens when one supplies an elegant solution to part of a hard problem. Depending on the problem domain, that may be just fine. With serialization it's exceedingly annoying to start using the system and start to depend upon it only to find that there's an area where you can't use it. Then you're faced with making some sort of kludge to get around it, and the whole appeal of having a "definitive" solution goes out the window and users howl.

It's a slippery slope. Once you start it's hard to stop until you get to the end.

Robert Ramey

Jonathan Turkanis

9:21 p.m.

"Robert Ramey" <ramey@rrsd.com> wrote in message news:cn36dn$j1i$1@sea.gmane.org...

...

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message

...

...
If the archives are to have additional member functions, as Pavel suggested, it would probably be good to codify them with a new archive concept.

I understand Pavel's proposals to be more specialized serializations. This can be either more specialized member templates or more specialized free templates. The idea that an archive is totally independent of the types it serializes would remain intact. What would happen is that now the serialization of selected classes would be dependent upon the type of archive used.

If we wanted a special type of archive, we could derive from an existing one just to get a new type which can be used to specialize the templated serialize function.

Okay.

...

...
It would look something like this:

struct Duck; struct Goose;

struct DuckStuffedWithPork : single_class_formatter<Duck> { template<typename StyledOstream> void operator()(StyledOstream& out, const Duck& d) { /**/ } };

struct BlackenedGoose : single_class_formatter<Goose> { template<typename StyledOstream> void operator()(StyledOstream& out, const Goose& d) { /**/ } };

struct cajun_style : style< use<Duck, DuckStuffedWithPork>, use<Goose, BlackenedGoose> > { };

styled_ostream<cajun_style> cajun_out(cout);

...

I don't really understand the above example

The point is that you can define several formatters for Duck and several for Goose, and then combine them freely.

...

but here is what I think you want to do using free serialization functions.

// default serialization template<class Archive> void serialize(Archive &ar, Duck &d){ ar & d; }

// serialization cajun style template<> void serialize(cajun_style_oarchive &ar, Duck &d){ ar & punctuation("&*^#@#@&") & d; }

// define a type for cajun style archives // this is just a named wrapper for text archives. class cajun_style_oarchive : public text_archive {};

// serialize normal style Duck d; text_oarchive tar; tar << d;

// serialize cajun style cajun_style_oarchive car; car << d;

That would be about it.

The problem comes here (quoting from the above):

    void serialize(cajun_style_oarchive &ar, Duck &d){
        <snip>
    }

I don't want the way Duck is formatted by a cajun_style_oarchive to involve cajun_style_oarchive at all. It should involve just a Duck formatter which can be combined with other formatters to form any number of styled archives. For example,

    struct style1
        : style< use<Duck, DuckStuffedWithPork>,
                 use<Goose, BlackenedGoose> >
        { };

    struct style2
        : style< use<Duck, DuckStuffedWithPork>,
                 use<Goose, DeepFriedGoose> >
        { };

    struct style3
        : style< use<Duck, DuckStuffedWithPork>,
                 use<Goose, SteamedGooseWithWalnuts> >
        { };

Here the code for DuckStuffedWithPork needs to be written just once. I wouldn't want to have to write separate specializations

    class style1_oarchive : public text_archive { };
    class style2_oarchive : public text_archive { };
    class style3_oarchive : public text_archive { };

    template<>
    void serialize(style1_oarchive &ar, Duck &d){
        // ..
    }

    template<>
    void serialize(style2_oarchive &ar, Duck &d){
        // ..
    }

    template<>
    void serialize(style3_oarchive &ar, Duck &d){
        // ..
    }

Here I have to repeat the commented-out code three times. If you want to vary styles for two or more types independently this leads to a combinatorial explosion.
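One way the composition described above could be sketched in self-contained C++ (C++11 or later) is shown below. Everything here is an assumption made for illustration: use, style, formatter_for, stream_inserter and this styled_ostream are simplified stand-ins, styles are written as type aliases rather than derived structs to keep the lookup short, and the formatter bodies are invented.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct Duck  { std::string name; };
struct Goose { std::string name; };

// Each formatter is written exactly once.
struct DuckStuffedWithPork {
    void operator()(std::ostream& out, const Duck& d) const {
        out << d.name << " stuffed with pork";
    }
};
struct BlackenedGoose {
    void operator()(std::ostream& out, const Goose& g) const {
        out << "blackened " << g.name;
    }
};
struct DeepFriedGoose {
    void operator()(std::ostream& out, const Goose& g) const {
        out << "deep-fried " << g.name;
    }
};

// Fallback used when a style says nothing about a type.
struct stream_inserter {
    template<typename T>
    void operator()(std::ostream& out, const T& v) const { out << v; }
};

template<typename T, typename Formatter> struct use {};
template<typename... Uses> struct style {};

// Compile-time lookup of the formatter a style assigns to T.
template<typename T, typename Style> struct formatter_for;

template<typename T>
struct formatter_for<T, style<>> { using type = stream_inserter; };

template<typename T, typename F, typename... Rest>
struct formatter_for<T, style<use<T, F>, Rest...>> { using type = F; };

template<typename T, typename U, typename F, typename... Rest>
struct formatter_for<T, style<use<U, F>, Rest...>>
    : formatter_for<T, style<Rest...>> {};

template<typename Style>
class styled_ostream {
public:
    explicit styled_ostream(std::ostream& os) : os_(os) {}

    template<typename T>
    styled_ostream& operator<<(const T& value) {
        typename formatter_for<T, Style>::type f;
        f(os_, value);
        return *this;
    }

private:
    std::ostream& os_;
};

// Styles combine existing formatters; no Duck code is repeated.
using style1 = style<use<Duck, DuckStuffedWithPork>, use<Goose, BlackenedGoose>>;
using style2 = style<use<Duck, DuckStuffedWithPork>, use<Goose, DeepFriedGoose>>;

template<typename Style>
std::string render() {
    std::ostringstream os;
    styled_ostream<Style> out(os);
    out << Duck{"Donald"} << " and " << Goose{"Gus"};
    return os.str();
}
```

Adding a third goose style means writing one new formatter and one new alias; the Duck formatter is untouched, which is the independence the example above asks for. The string literal in render() goes through the fallback, mirroring the delegation to the underlying ostream mentioned earlier in the thread.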

...

Of course your serialize template specialization could be as elaborate as you wish to accommodate some program-driven customization. Notice that none of this requires alteration of either the classes to be serialized or the archives used.

...

...
Regarding pointers, would you agree that most of the thorniest issues relating to pointers disappear if you concern yourself with output only, since you wouldn't have to detect or record the fully derived type or have a mechanism for constructing new objects?

Hmm - maybe - about 5 seconds of reflection raises questions like: what about pointers to abstract base classes? Should they serialize the named class, or the most derived class? Should the library user be permitted to choose? If so, how would he do this? What about systems that don't support RTTI?

I think these are bigger problems for serialization than for formatting. My feeling is that formatting pointers according to their static type would be sufficient for a formatting library, even for abstract base classes, since you don't need to be able to recover the lost information. A user can always supply a formatter for the abstract base class which does exactly what she wants.

...

If you want to avoid an infinite loop if a cycle occurs, then you'll automatically have multiple detection. Will some user ask for a method to inhibit/enable this?

Yes, I understand that cycle detection remains an issue. The hard part would seem to be implementing cycle detection; providing a switch to turn it off seems simple. I guess your point is that the serialization library has already made these choices. I'm really interested to see what all the issues are.
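For output only, cycle detection can be as small as remembering the addresses already printed. The sketch below is an illustration under that assumption, not how Boost.Serialization tracks objects; Node, print and demo_cycle are made-up names.

```cpp
#include <cstddef>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct Node {
    std::string name;
    std::vector<const Node*> links;
};

// Prints a node and everything reachable from it. An object reached a
// second time is replaced by a placeholder, which both breaks cycles and
// (as a side effect) detects multiple pointers to the same object.
void print(std::ostream& out, const Node& n, std::set<const Node*>& seen) {
    if (!seen.insert(&n).second) {
        out << "<" << n.name << " already printed>";
        return;
    }
    out << n.name << "{";
    for (std::size_t i = 0; i < n.links.size(); ++i) {
        if (i != 0) out << ", ";
        print(out, *n.links[i], seen);
    }
    out << "}";
}

std::string demo_cycle() {
    Node a{"a", {}};
    Node b{"b", {}};
    a.links.push_back(&b);
    b.links.push_back(&a);  // a -> b -> a forms a cycle

    std::ostringstream os;
    std::set<const Node*> seen;
    print(os, a, seen);
    return os.str();
}
```

This makes Robert's point concrete: the same bookkeeping that stops the infinite loop also changes how shared objects are printed, so a switch to disable it would change both behaviors at once.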

...

As I've said before, it's not really a technical issue. It's a question of what happens when one supplies an elegant solution to part of a hard problem. Depending on the problem domain, that may be just fine. With serialization it's exceedingly annoying to start using the system and start to depend upon it only to find that there's an area where you can't use it. Then you're faced with making some sort of kludge to get around it, and the whole appeal of having a "definitive" solution goes out the window and users howl.

It's a slippery slope. Once you start it's hard to stop until you get to the end.

I think a good way to proceed would be to write a quick sample implementation, using ostream wrappers, ignoring some of the messy issues such as formatting pointers. Then I can ask you if it is possible to imitate the behavior using the serialization infrastructure. This will be at least a few months away, however.

...

Robert Ramey

Jonathan

Pavel Vozenilek

13 Nov 13 Nov

9:27 a.m.

"Jonathan Turkanis" wrote:

...

Here the code for DuckStuffedWithPork needs to be written just once. I wouldn't want to have to write separate specializations

class style1_oarchive : public text_archive { }; class style2_oarchive : public text_archive { }; class style3_oarchive : public text_archive { };

template<> void serialize(style1_oarchive &ar, Duck &d){ // .. }

template<> void serialize(style2_oarchive &ar, Duck &d){ // .. }

template<> void serialize(style3_oarchive &ar, Duck &d){ // .. }

Here I have to repeat the commented-out code three times. If you want to vary styles for two or more types independently this leads to a combinatorial explosion.

This could be solved by:

    class common_style_oarchive : public text_archive {...};
    class style1_oarchive : public common_style_oarchive { };
    class style2_oarchive : public common_style_oarchive { };
    class style3_oarchive : public common_style_oarchive { };

    template<>
    void serialize(common_style_oarchive &ar, Duck &d){
        // ..
    }

This should cover typical situations. When more combinations are needed there could be something like:

    class common_style_oarchive : public text_archive {...};
    class style1_oarchive : public common_style_oarchive { };
    class style2_oarchive : public common_style_oarchive { };
    class style3_oarchive : public common_style_oarchive { };

    template<>
    void serialize(common_style_oarchive &ar, Duck &d) {
        // ....
    }

    template<>
    void serialize(style1_oarchive &ar, Duck &d){
        // switch temporarily style for Duck or whatever
        style_changer_raii_like<Duck> changer(ar, "......");
        serialize<common_style_oarchive>(ar, d);
    }

    template<>
    void serialize(style2_oarchive &ar, Duck &d){
        style_changer_raii_like<Duck> changer(ar, "......");
        serialize<common_style_oarchive>(ar, d);
    }

    template<>
    void serialize(style3_oarchive &ar, Duck &d){
        style_changer_raii_like<Duck> changer(ar, "......");
        serialize<common_style_oarchive>(ar, d);
    }

Would this be enough to keep code bloat down?

/Pavel
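The shape of this proposal can be tried out with ordinary overloads instead of template specializations. The following is a sketch only: the archive types are hypothetical stand-ins holding a string stream and a two-character bracket style, and style_changer and render_duck are illustrative names rather than anything from Boost.Serialization.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical common base: collects text and holds the current style,
// here just an opening and a closing bracket character.
struct common_style_oarchive {
    std::ostringstream out;
    std::string bracket = "()";
};
struct style1_oarchive : common_style_oarchive {};
struct style2_oarchive : common_style_oarchive {};

// RAII helper: swaps in a bracket style, restores the old one on exit.
class style_changer {
public:
    style_changer(common_style_oarchive& ar, std::string b)
        : ar_(ar), saved_(ar.bracket) {
        ar_.bracket = std::move(b);
    }
    ~style_changer() { ar_.bracket = saved_; }

    style_changer(const style_changer&) = delete;
    style_changer& operator=(const style_changer&) = delete;

private:
    common_style_oarchive& ar_;
    std::string saved_;
};

struct Duck { std::string name; };

// The real formatting logic lives here, written once.
void serialize(common_style_oarchive& ar, const Duck& d) {
    ar.out << ar.bracket[0] << d.name << ar.bracket[1];
}

// Per-style overloads only adjust the style and forward to the common code.
void serialize(style1_oarchive& ar, const Duck& d) {
    style_changer changer(ar, "[]");
    serialize(static_cast<common_style_oarchive&>(ar), d);
}
void serialize(style2_oarchive& ar, const Duck& d) {
    style_changer changer(ar, "{}");
    serialize(static_cast<common_style_oarchive&>(ar), d);
}

template<typename Archive>
std::string render_duck() {
    Archive ar;
    serialize(ar, Duck{"Daffy"});
    return ar.out.str();
}
```

The per-style functions stay one or two lines each, but there is still one of them per (archive, type) pair that deviates from the common style, which is the part of the bloat question this sketch leaves open.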

Robert Ramey

12 Nov 12 Nov

5:55 a.m.

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cn0cmq$ja8$1@sea.gmane.org...

...

I've read the Serialization documentation, but haven't used it yet. I've noticed it takes a long time to build,

I believe this is mostly due to the usage of spirit for XML archive parsing. It's a one-time cost - when the library is built.

Robert Ramey

Robert Ramey

6:35 a.m.

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cn0cmq$ja8$1@sea.gmane.org...

...

but I can't say whether this will work (I seem to remember Robert saying somewhere in the documentation that he was relying on the fact that non-templates are better matches than templates.)

...

Hmm - I think I remember saying something like that myself. But I can't find it now and I can't remember the exact context. I think it was related to some issues in the implementation of certain archives and is not relevant to Pavel's proposal.

...

This is a big plus, obviously. (However, I remember Robert saying he preferred to keep formatting and serialization separate.)

I do. And Pavel's idea maintains that. The roles of the various parts of the serialization library are:

a) template<class Archive, class my_class> void serialize(Archive &ar, my_class &t) - specify the information which is required to save and restore (load) the state of the class.

b) class archive - specify how data is to be stored.

This means that any/all serializations can function with any archive. Any combination is known to be valid and to function correctly.

Adding Pavel's idea:

c) template<> void serialize(special_archive &ar, my_class &t) - specify special output for this combination of special_archive and my_class.

This maintains the independence of serialization and archive while permitting customization for any special combinations of archive/class serialization. The serialization library can thus be used with any formatting library (in much the same way it uses spirit to parse xml input). Thus there is no real conflict with the serialization library.
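The three roles can be sketched in a few lines. This is only an illustration under assumptions: plain_oarchive, debug_oarchive and point are hypothetical toy types, not Boost.Serialization classes, and role (c) is written as a non-template overload rather than an explicit specialization. It relies on the ordinary C++ rule that a non-template function is preferred over a function template when both match equally well.

```cpp
#include <sstream>
#include <string>

// (b) Toy archives sharing one interface (operator&); hypothetical, not Boost's.
class plain_oarchive {
public:
    template <class T>
    plain_oarchive& operator&(const T& value) { out_ << value << ' '; return *this; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

class debug_oarchive {
public:
    template <class T>
    debug_oarchive& operator&(const T& value) { out_ << value; return *this; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

struct point { int x; int y; };

// (a) Generic description of point's state: works with any archive.
template <class Archive>
void serialize(Archive& ar, point& p) {
    ar & p.x & p.y;
}

// (c) Special output for the (debug_oarchive, point) pair only.
// As a non-template, it wins overload resolution against the template above.
void serialize(debug_oarchive& ar, point& p) {
    ar & "point(x=" & p.x & ", y=" & p.y & ")";
}
```

With these definitions, serializing a point into a plain_oarchive goes through the generic template, while the same call on a debug_oarchive picks the special overload; neither archive class had to change.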

...

...
- indentation for pretty printing could be handled (semi)automatically by formatting archive.

Would this involve modifying the archive interface?

All archives have the same interface. It is never modified for particular classes. This is a fundamental idea from which the serialization library derives a large portion of its utility. This is why the same programs can produce binary, text or xml archives with no changes other than selecting a different archive class.
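To make the "same interface, different archive class" point concrete, here is a small sketch. The two archive classes and their field(name, value) member are assumptions invented for this example; the real library's archives have a different interface. The point is only that employee::serialize is written once and the output format is chosen by naming another archive type.

```cpp
#include <sstream>
#include <string>

// Two toy archives exposing the same interface: field(name, value) and str().
class toy_text_oarchive {
public:
    template <class T>
    void field(const std::string&, const T& value) { out_ << value << ' '; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

class toy_xml_oarchive {
public:
    template <class T>
    void field(const std::string& name, const T& value) {
        out_ << '<' << name << '>' << value << "</" << name << '>';
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

struct employee {
    std::string name;
    int age;

    // Written once; knows nothing about the concrete archive type.
    template <class Archive>
    void serialize(Archive& ar) {
        ar.field("name", name);
        ar.field("age", age);
    }
};

// Selecting the output format is just a matter of naming another archive class.
template <class Archive>
std::string save(employee e) {
    Archive ar;
    e.serialize(ar);
    return ar.str();
}
```

Calling save<toy_text_oarchive>(e) and save<toy_xml_oarchive>(e) on the same object yields the two formats without touching employee.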

...

...
- multiple formatting styles could be provided for any class.

It would be one formatting style for each archive type for which serialize has been specialized, correct? Would this allow styles for various types to be mixed freely?

If I understand your question correctly, the answer is yes.

...

I've read the Serialization documentation, but haven't used it yet. I've noticed it takes a long time to build, but this is probably because of the various combinations of threading, debugging and linking options.

This is most probably due to the compilation of the xml_archives class, which uses spirit to parse XML input. Not that it matters, as it's just a one-time cost in any case. In fact, this is the very reason that it's a library rather than a header.

User programs which use the serialization library generally compile quite fast. No one has reported that serialization adds a disproportionate amount to compile time. I did get one report from one user with 80 classes that he was starting to get ICEs from VC 7.1. But that's it.

I believe that the size of the library has led some to have reservations regarding the performance of the serialization library - compile time, link time, memory size, or execution speed - or its ease of use. I think that such reservations are unfounded. No one who has actually used the library has voiced such reservations to me.

...

My inclination is to keep formatting separate from serialization, though, because they have different aims. If you believe they can be integrated, without making serialization harder to learn or sacrificing flexibility of formatting options, it should definitely be considered.

Serialization can use any formatting. A formatting library doesn't really have use for or depend upon serialization. However, the direction of your efforts looks like you might be accidentally re-inventing the serialization library. I would guess this will sort itself out eventually.

Robert Ramey

Christoph Ludwig

9:45 a.m.

On Thu, Nov 11, 2004 at 10:35:45PM -0800, Robert Ramey wrote: [...]

...

User programs which use the serialization library generally compile quite fast. No one has reported that serialization adds a disproportionate amount to compile time. I did get one report from one user with 80 classes that he was starting to get ICEs from VC 7.1. But that's it.

I believe that the size of the library has led some to have reservations regarding the performance of the serialization library - compile time, link time, memory size, or execution speed - or its ease of use. I think that such reservations are unfounded. No one who has actually used the library has voiced such reservations to me.

Here is some user experience: "Fast" is not exactly the word I'd choose.

I use Boost.Serialization in a library that is templated on arbitrary precision integer and floating point types. I think the library qualifies as medium sized. (I don't have the LOCs, but the source code is about 6 MByte.) Since most classes are serialized through (shared) pointers to base types, I need to register my specializations with BOOST_CLASS_EXPORT.

Earlier versions of the library made the compiler allocate up to 800 MByte of RAM, which caused my system to thrash. After some refactoring of the code (and the installation of additional memory) I have the following situation I can live with:

For each (integer, fpa) pair I have a separate translation unit that does nothing but register the respective specializations. Note that the class templates are explicitly specialized in extra TUs; here I only register the classes (which also means instantiation of the respective serialize member templates). The compilation of each of those translation units takes about 90 seconds, and the compiler (gcc 3.4.2) occupies up to 400 MByte of RAM. This is on a notebook with a 1.7 GHz Pentium 4 mobile CPU and 1 GByte RAM.

I am sure this is mostly due to a suboptimal handling of template instantiation in the compiler. This is certainly no problem if you do not need to register your classes. And I guess that the serialization code of non-template classes will compile reasonably fast. But if I were to develop an even larger template library than my current project, then I would hesitate to use Boost.Serialization unless I am convinced my compiler performs well when the number of template instantiations grows.

Besides the problems with the resource usage of the compiler there's another point that should IMHO be addressed before the next release: The current solution for serializing shared pointers is quite fragile.

For example, I abandoned attempts to access my library from Python through Boost.Python when the requirements of both libraries conflicted: boost/serialization/shared_ptr.hpp has to be included before boost/shared_ptr.hpp. And boost/serialization/shared_ptr.hpp somehow includes (via config.hpp?) system headers. On the other hand, the Python headers need to be included before any system headers. But the Boost.Python headers include boost/shared_ptr.hpp... (There is an inclusion sequence of system and boost headers that satisfies the requirements of all libraries involved, I think. But I did not have the time to look into it back then.)

I don't recall the exact reasons, but I also had to manually register the specializations of boost::detail::sp_counted_base_impl. That's an implementation detail of shared_ptr that I (as a mere user) don't want to know about.

Christoph
--
http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/cludwig.html
LiDIA: http://www.informatik.tu-darmstadt.de/TI/LiDIA/Welcome.html

Robert Ramey

4:16 p.m.

...

"Christoph Ludwig" <cludwig@cdc.informatik.tu-darmstadt.de> wrote in message news:20041112094527.GA14367@cdc-ws9.cdc.informatik.tu-darmstadt.de...

On Thu, Nov 11, 2004 at 10:35:45PM -0800, Robert Ramey wrote: [...]

...

User programs which use the serialization library generally compile quite fast. No one has reported that serialization adds a disproportionate amount to compile time. I did get one report from one user with 80 classes that he was starting to get ICEs from VC 7.1. But that's it.

I believe that the size of the library has led some to have reservations regarding the performance of the serialization library - compile time, link time, memory size, or execution speed - or its ease of use. I think that such reservations are unfounded. No one who has actually used the library has voiced such reservations to me.

Here is some user experience:

[snip]

Hmm - I guess I'll have to retract the following statement

...

User programs which use the serialization library generally compile quite fast.

Maybe I can say "reasonably fast". Actually my view was formed by the lack of complaints and the experience with test programs, which do compile quite fast. Of course the test code doesn't approach the (6 MB / 40 chars per line) ~150K lines of code of this application.

...

Besides the problems with the resource usage of the compiler there's another point that should IMHO be addressed before the next release: The current solution for serializing shared pointers is quite fragile.

For example, I abandoned attempts to access my library from Python through Boost.Python when the requirements of both libraries conflicted: boost/serialization/shared_ptr.hpp has to be included before boost/shared_ptr.hpp. And boost/serialization/shared_ptr.hpp somehow includes (via config.hpp?) system headers. On the other hand, the Python headers need to be included before any system headers. But the Boost.Python headers include boost/shared_ptr.hpp... (There is an inclusion sequence of system and boost headers that satisfies the requirements of all libraries involved, I think. But I did not have the time to look into it back then.)

...

I don't recall the exact reasons, but I also had to manually register the specializations of boost::detail::sp_counted_base_impl. That's an implementation detail of shared_ptr that I (as a mere user) don't want to know about.

I am aware that the implementation of serialization of shared_ptr isn't truly satisfactory. The problems described are due to a hack used in its implementation to get access to shared_ptr's private variables. It's a current topic of discussion as to how serialization of shared_ptr should be implemented. I suspect that to accommodate serialization with this or any other serialization library, shared_ptr will have to be enhanced. The current situation seemed to be the best I could do without messing with shared_ptr myself. Not a good idea. I'm hopeful that this will be improved in the future.

Robert Ramey

Jeff Flinn

2:03 p.m.

Robert,

"Robert Ramey" <ramey@rrsd.com> wrote in message news:cn1lk0$ail$1@sea.gmane.org...

...

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cn0cmq$ja8$1@sea.gmane.org...

...

...
I've read the Serialization documentation, but haven't used it yet. I've noticed it takes a long time to build, but this is probably because of the various combinations of threading, debugging and linking options.

This is most probably due to the compilation of the xml_archives class, which uses spirit to parse XML input. Not that it matters, as it's just a one-time cost in any case. In fact, this is the very reason that it's a library rather than a header.

User programs which use the serialization library generally compile quite fast. No one has reported that serialization adds a disproportionate amount to compile time. I did get one report from one user with 80 classes that he was starting to get ICEs from VC 7.1. But that's it.

FYI.

I've hit VC7.1 "internal structure limits" with far fewer than 80 classes (about a dozen) in a single translation unit. I use both xml and binary archives to support serialization to disk and clipboard/drag/drop respectively. In particular, I have separate compilation units with only 2-3 BOOST_SHARED_POINTER_EXPORT calls. Any additional calls exceed VC7.1 limits.

-----------------
Jeff Flinn
Applied Dynamics, International

troy d. straszheim

3:47 p.m.

Also just FYI:

Maybe this is already in the bug list, but on my OSX 10.3.6 box with the latest apple-distributed gcc, we seem to hit some fatal compiler bug when building xml_grammar.o in release mode only: the compile never exits, load stays high, and the compiler's memory usage grows steadily and without bound... Debug mode builds fine and in a reasonable amount of time.

-t

Jeff Flinn writes:

...

Robert,

"Robert Ramey" <ramey@rrsd.com> wrote in message news:cn1lk0$ail$1@sea.gmane.org...

...
"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:cn0cmq$ja8$1@sea.gmane.org...

...

...
...
I've read the Serialization documentation, but haven't used it yet. I've noticed it takes a long time to build, but this is probably because of the various combinations of threading, debugging and linking options.

This is most probably due to the compilation of the xml_archives class, which uses spirit to parse XML input. Not that it matters, as it's just a one-time cost in any case. In fact, this is the very reason that it's a library rather than a header.

User programs which use the serialization library generally compile quite fast. No one has reported that serialization adds a disproportionate amount to compile time. I did get one report from one user with 80 classes that he was starting to get ICEs from VC 7.1. But that's it.

FYI.

I've hit VC7.1 "internal structure limits" with far fewer than 80 classes (about a dozen) in a single translation unit. I use both xml and binary archives to support serialization to disk and clipboard/drag/drop respectively. In particular, I have separate compilation units with only 2-3 BOOST_SHARED_POINTER_EXPORT calls. Any additional calls exceed VC7.1 limits.

----------------- Jeff Flinn Applied Dynamics, International

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Robert Ramey

5:52 a.m.

Pavel,

I've been watching this thread with mild interest. Your idea to specify specializations for specific pairs of archives / class serializations never occurred to me. Before reading this I hadn't figured out how to do this.

This is really, really, really cool. It preserves the extreme simplicity of the usage of serialization, its power and completeness, while permitting infinite flexibility.

I was planning to include a demo/test of the usage of serialization for debug and transaction logs. Frankly, I was missing the magic piece you just supplied.

This will permit the serialization library to dovetail with any variety of formatting libraries.

Good Work

Robert Ramey

"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:cmu1n0$p1j$1@sea.gmane.org...

...

I've just finished documenting a small library, available here:

http://home.comcast.net/~jturkanis/format_lite/

I took a (brief) look and have a question about the feasibility of another idea:

- would it be possible to combine format_lite functionality with Boost.Serialization to take advantage of both libs?

Imagine a solution like:

// the formatting info is provided via boost::archive compatible object
formatted_text_oarchive arch(a_ostream, default_formatting_settings ....);

arch << my_data;

class MyObject {
    template<class Archive>
    void serialize(Archive& ar, const unsigned) {
        .... normal serialization code, used when we DO NOT do formatting output
    }

    // specialization for debug formatting
    template<>
    void serialize<formatted_text_oarchive>(....) {
        ar << my_vector; // default formatting will apply

        ar << "some info text...";
        ar.increase_indentation();

        // use different formatting for next vector
        punctuate<vector<...> >(ar)(....);
        ar << my_other_vector;
    }
};

The advantages I see: - the whole infrastructure of Boost.Serialization is available and ready and it handles all situations like cycles. format_lite could concentrate on just formatting.

- the debugging output can be separated from other serialization types (but doesn't need to be)

- formatting directives can be "inherited" from "higher level of data" to "lower levels". Newly added data would not need formatting of its own by default. Change on higher level would propagate itself "down".

- indentation for pretty printing could be handled (semi)automatically by formatting archive.

- multiple formatting styles could be provided for any class.

My experience is that Serialization is quite easy to use and lightweight enough so I do not consider it any disadvantage for practical use.

/Pavel
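A toy sketch of what such a formatting archive might look like, to make the indentation and per-sequence punctuation ideas concrete. Everything here is assumed for illustration: the class borrows the names formatted_text_oarchive and increase_indentation from the message above, but its members (punctuate as a member function, decrease_indentation, the default "[", ", ", "]" punctuation) are invented and are neither Format Lite's nor Boost.Serialization's API.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy formatting archive: hypothetical, loosely following the names in the message.
class formatted_text_oarchive {
public:
    explicit formatted_text_oarchive(std::ostream& os) : os_(os) {}

    void increase_indentation() { ++depth_; }
    void decrease_indentation() { if (depth_ > 0) --depth_; }

    // Punctuation used for all sequences written after this call.
    void punctuate(std::string open, std::string sep, std::string close) {
        open_ = std::move(open);
        sep_ = std::move(sep);
        close_ = std::move(close);
    }

    // Scalars and strings: one indented line each.
    template <class T>
    formatted_text_oarchive& operator<<(const T& value) {
        indent();
        os_ << value << '\n';
        return *this;
    }

    // Sequences: one indented line using the current punctuation.
    // This overload is more specialized, so it is chosen for std::vector.
    template <class T>
    formatted_text_oarchive& operator<<(const std::vector<T>& v) {
        indent();
        os_ << open_;
        for (std::size_t i = 0; i < v.size(); ++i) {
            if (i != 0) os_ << sep_;
            os_ << v[i];
        }
        os_ << close_ << '\n';
        return *this;
    }

private:
    void indent() { os_ << std::string(2 * depth_, ' '); }

    std::ostream& os_;
    std::size_t depth_ = 0;
    std::string open_ = "[", sep_ = ", ", close_ = "]";
};

// Mirrors the flow of the quoted example: default style, text, indent, new style.
std::string demo() {
    std::ostringstream out;
    formatted_text_oarchive ar(out);
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{4, 5};
    ar << a;                          // default formatting applies
    ar << "some info text...";
    ar.increase_indentation();
    ar.punctuate("{ ", "; ", " }");   // different formatting for the next vector
    ar << b;
    return out.str();
}
```

Because the indentation depth and punctuation live in the archive object, nested data written later inherits them without any formatting code of its own, which is the "inherited" behaviour listed among the advantages above.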

