[serialization] use of unsigned int instead of size_type

File collections_save_imp.hpp uses unsigned int to represent the number of elements in an STL container. Shouldn't this be replaced with "typename Container::size_type" (or at least with size_t) to be more portable? At least it would stop an annoying warning from MSVC. The relevant part of the file:

template<class Archive, class Container>
inline void save_collection(Archive & ar, const Container &s)
{
    // record number of elements
    unsigned int count = s.size();   // <-- here and also below
    ar << make_nvp("count", const_cast<const unsigned int &>(count));
    /* ...truncated... */

Best regards, Marcin
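A minimal sketch of the change Marcin is suggesting (illustrative only; as the rest of the thread shows, the library may settle on a different fix):

template<class Archive, class Container>
inline void save_collection(Archive & ar, const Container & s)
{
    // record number of elements using the container's own size type,
    // avoiding the narrowing conversion that MSVC warns about
    const typename Container::size_type count = s.size();
    ar << make_nvp("count", count);
    // ... iteration over the elements elided, as in the original ...
}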

This has been mentioned from time to time and will eventually be changed. The original rationale was that on some platforms it was 64 bits, which seemed wasteful for binary archives - that is, 2 G objects seemed enough. Now it seems that we'll really need a special type for collection count. In any case, it can't just be changed without making existing archives obsolete - so it's kind of a slow process. Robert Ramey

Marcin Kalicinski wrote:
File collections_save_imp.hpp uses unsigned int to represent number of elements in STL container. Shouldn't this be replaced with "typename Container::size_type" (or at least with size_t) to be more portable? At least it would stop annoying warning from MSVC.
The part of file that is relevant:
template<class Archive, class Container>
inline void save_collection(Archive & ar, const Container &s)
{
    // record number of elements
    unsigned int count = s.size();   // <-- here and also below
    ar << make_nvp("count", const_cast<const unsigned int &>(count));
/* ...truncated... */
Best regards, Marcin

On Feb 8, 2006, at 7:33 PM, Robert Ramey wrote:
This has been mentioned from time to time and will eventually be changed.
The original rationale was that in some platforms it was 64 bits which seemed wasteful for binary archives - that is 2 G Objects seemed enough.
Now it seems that we'll really need a special type for collection count.
In any case, it can't just be changed without making existing archives obsolete - so it's kind of a slow process.
It can still be changed and be backward-compatible by bumping the version number. Matthias

Matthias Troyer wrote:
On Feb 8, 2006, at 7:33 PM, Robert Ramey wrote:
This has been mentioned from time to time and will eventually be changed.
The original rationale was that in some platforms it was 64 bits which seemed wasteful for binary archives - that is 2 G Objects seemed enough.
Now it seems that we'll really need a special type for collection count.
In anycase, it can't be just changed without making obsolete existing archives - so its kind of a slow process.
It can still be changed and be backward-compatible by bumping the version number.
That's what I was referring to. That effectively means that it can only be changed between boost versions - one can't just make the fix on his particular system. It's still not clear what this should be changed to - if in fact it should be changed at all. std::size_t is a candidate - but I was under the impression that there might be interest in defining a special type for this - like collection_size_t or ? Robert Ramey

On Feb 9, 2006, at 8:14 AM, Robert Ramey wrote:
Matthias Troyer wrote:
On Feb 8, 2006, at 7:33 PM, Robert Ramey wrote:
This has been mentioned from time to time and will eventually be changed.
The original rationale was that in some platforms it was 64 bits which seemed wasteful for binary archives - that is 2 G Objects seemed enough.
Now it seems that we'll really need a special type for collection count.
In anycase, it can't be just changed without making obsolete existing archives - so its kind of a slow process.
It can still be changed and be backward-compatible by bumping the version number.
That's what I was referring to. That effectively means that it can only be changed between boost versions - one can't just make the fix on his particular system.
It's still not clear what this should be changed to - if in fact it should be changed at all. std::size_t is a candidate - but I was under the impression that there might be interest in defining a special type for this - like collection_size_t or ?
Indeed that's what's needed, and I have all the patches ready that would need to be applied to do it. Matthias

Matthias Troyer <troyer@itp.phys.ethz.ch> writes:
It's still not clear what this should be changed to - if in fact it should be changed at all. std::size_t is a candidate - but I was under the impression that there might be interest in defining a special type for this - like collection_size_t or ?
Indeed that's what's needed, and I have all the patches ready that would need to be applied to do it.
Please, guys, get this into 1.34. It's embarrassing and a little frustrating that this problem has persisted so long. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
Matthias Troyer <troyer@itp.phys.ethz.ch> writes:
It's still not clear what this should be changed to - if in fact it should be changed at all. std::size_t is a candidate - but I was under the impression that there might be interest in defining a special type for this - like collection_size_t or ?
Indeed that's what's needed, and I have all the patches ready that would need to be applied to do it.
Please, guys, get this into 1.34. It's embarrassing and a little frustrating that this problem has persisted so long.
It turns out that the internal library version number is going to be bumped from 3 to 4 in the next release. This has been necessary to implement a correction to versioning of items of collections. So it's not a bad time to make such a change if that is indeed what is necessary. My question is - what is the urgency? The current system would inhibit the serialization of collections of greater than 2 G objects. But as far as I know no one has yet run into that problem. So I'm curious - has the usage of a 32 bit count of objects created some problem somewhere else? Robert Ramey

On Feb 10, 2006, at 8:39 AM, Robert Ramey wrote:
David Abrahams wrote:
Matthias Troyer <troyer@itp.phys.ethz.ch> writes:
It's still not clear what this should be changed to - if in fact it should be changed at all. std::size_t is a candidate - but I was under the impression that there might be interest in defining a special type for this - like collection_size_t or ?
Indeed that's what's needed, and I have all the patches ready that would need to be applied to do it.
Please, guys, get this into 1.34. It's embarassing and a little frustrating that this problem has persisted so long.
It turns out that the internal library version number is going to be bumped from 3 to 4 in the next release. This has been necessary to implement a correction to versioning of items of collections. So its not a bad time to make such a change if that is indeed what is necessary.
My question is - what is the urgency. The current system would inhibit the serialization of collections of greater than 2 G objects. But as far as I know no one has yet run into that problem. So I'm curious - has the usage of 32 bit count of objects created some problem somewhere else?
We have some vectors that are larger and which we cannot serialize using Boost.Serialization at the moment Matthias

I was expecting to see your enhancements for arrays get included and for collection_size (or whatever) to be part of this. I'm not sure what the status of this is and/or if it conflicts with the recent feature freeze. I've been working only on things that don't affect features. In particular:
a) fixing a bug in versioning of collection members
b) performance tweaks
c) keeping up with changes in other parts of boost that show up as test failures in the serialization library
d) trying to address failures that come with new compilers and standard libraries (stlport 5.0?)
Things I would like to do if I had the time would be:
a) investigate and clarify some issues regarding DLLs and serialization
b) set up execution profiling for the library
c) add a section to the manual discussing various approaches to extending the library
And finally, in order to permit the serialization library to be tested on the platforms that it currently supports, a different test framework will have to be used. This is not urgent but it will require some effort. On a related note - moving to bjam v2 has been mentioned. Setting up the Jamfile for serialization caused me lots of pain. It permitted me to skip many tests that didn't make sense for particular environments. I can only speculate whether this is transferable to V2 and what kind of effort that might require. Robert Ramey
Matthias Troyer wrote:
On Feb 10, 2006, at 8:39 AM, Robert Ramey wrote:
David Abrahams wrote:
Matthias Troyer <troyer@itp.phys.ethz.ch> writes:
It's still not clear what this should be changed to - if in fact it should be changed at all. std::size_t is a candidate - but I was under the impression that there might be interest in defining a special type for this - like collection_size_t or ?
Indeed that's what's needed, and I have all the patches ready that would need to be applied to do it.
Please, guys, get this into 1.34. It's embarassing and a little frustrating that this problem has persisted so long.
It turns out that the internal library version number is going to be bumped from 3 to 4 in the next release. This has been necessary to implement a correction to versioning of items of collections. So its not a bad time to make such a change if that is indeed what is necessary.
My question is - what is the urgency. The current system would inhibit the serialization of collections of greater than 2 G objects. But as far as I know no one has yet run into that problem. So I'm curious - has the usage of 32 bit count of objects created some problem somewhere else?
We have some vectors that are larger and which we cannot serialize using Boost.Serialization at the moment
Matthias

"Robert Ramey" <ramey@rrsd.com> writes:
I was expecting to see your enhancements for arrays to get included and for collection_size (or whatever) to be part of this. I'm not sure what status of this is and/or if it conflicts with the recent feature freeze.
IMO the size_type change should be considered a bugfix, as it was not possible to portably serialize collections without it. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
I was expecting to see your enhancements for arrays to get included and for collection_size (or whatever) to be part of this. I'm not sure what status of this is and/or if it conflicts with the recent feature freeze.
IMO the size_type change should be considered a bugfix, as it was not possible to portably serialize collections
of over 4 G objects
without it.
We could change it to std::size_t, but I'm not sure that is what we really want for this. The number of items in a collection is sort of a different thing than the size that something consumes in memory, which is what I interpret std::size_t to be. So a special collection_count_type or something like that has been considered, but I haven't investigated the implications of this. Such a change affects archives forever in the future. For these reasons, I have been reluctant to "just do it". Robert Ramey

At 11:52 AM -0800 2/11/06, Robert Ramey wrote:
David Abrahams wrote:
IMO the size_type change should be considered a bugfix, as it was not possible to portably serialize collections
of over 4 G objects
Strictly speaking, of any size. And changing the type of the count from "unsigned int" to std::size_t would actually be worse, in a practical sense. The representation size (how many bits of data will appear in the archive) must be the same on all platforms.

sizeof(unsigned int) is commonly 4 on 32bit platforms
sizeof(unsigned int) is (I think) commonly 4 on 64bit platforms
sizeof(std::size_t) is commonly 4 on 32bit platforms
sizeof(std::size_t) is commonly 8 on 64bit platforms

(I'm knowingly and intentionally ignoring DSP's and the like in the above.)

The count should be some type that has a fixed (up to byte-order issues) representation, i.e. something like uint64_t. Note that a portable archive doesn't help here, because the choice of a portable (or not) representation type for the count is made within the serialization routine. So unless the serialization routine can query the archive about what type it should use for the count (ick!), the serialization routine must use a type with a consistent cross-platform representation. The archive can then deal with byte-order issues. Without this, cross-platform(*) archive portability of collections is pretty hopeless, even if the library user is really careful to pedantically use portable types everywhere (such as std::vector<int16_t> or the like).

(*) At least for current commodity platforms; some DSP's and the like add enough additional restrictions that a user of the serialization library might quite reasonably decide that they are outside the scope of portability that said user cares about.
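To make the fixed-representation idea concrete, a minimal sketch of writing a count as exactly eight bytes; the function name and the choice of little-endian byte order are assumptions for the example, not anything the library actually does:

#include <cstddef>
#include <vector>
#include <boost/cstdint.hpp>

// Write a collection count as exactly 8 little-endian bytes, regardless
// of what std::size_t happens to be on the writing platform.
inline void save_count_fixed64(std::vector<unsigned char> & out, std::size_t count)
{
    boost::uint64_t n = count;
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<unsigned char>(n & 0xff));
        n >>= 8;
    }
}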

Kim Barrett <kab@irobot.com> writes:
At 11:52 AM -0800 2/11/06, Robert Ramey wrote:
David Abrahams wrote:
IMO the size_type change should be considered a bugfix, as it was not possible to portably serialize collections
of over 4 G objects
Strictly speaking, of any size.
Right. Unsigned int isn't required to be more than 16 bits, and there *are* compilers where it is a 16 bit number.
And changing the type of the count from "unsigned int" to std::size_t would actually be worse, in a practical sense. The representation size (how many bits of data will appear in the archive) must be the same on all platforms.
sizeof(unsigned int) is commonly 4 on 32bit platforms
sizeof(unsigned int) is (I think) commonly 4 on 64bit platforms
sizeof(std::size_t) is commonly 4 on 32bit platforms
sizeof(std::size_t) is commonly 8 on 64bit platforms
(I'm knowingly and intentionally ignoring DSP's and the like in the above).
The count should be some type that has a fixed (up to byte-order issues) representation, i.e. something like uint64_t.
Or a variable-length representation, which is what I *think* Matthias chose. -- Dave Abrahams Boost Consulting www.boost-consulting.com

At 5:33 PM -0500 2/11/06, David Abrahams wrote:
The count should be some type that has a fixed (up to byte-order issues) representation, i.e. something like uint64_t.
Or a variable-length representation, which is what I *think* Matthias chose.
I think that would be even better.

Just a little information that might be helpful to those concerned about what type the serialization system should use for storing the count of items in a collection.
a) The serialization system maintains the concept of "archive version", which gets incremented when changes are made in the library which could break existing archives. This version number - not to be confused with the version number for any particular class - is currently at 3 and will be incremented to 4 for 1.34. If a change in the collection count type is made, the reading of this number will depend on the archive version under which the archive was created and adjust accordingly (a sketch of such version-conditional loading follows the quoted text below).
b) For text based archives it will be hard to notice any change. Numbers are rendered as text, which is an inherently variable length format.
c) For native binary archives, the length of the count as stored in the archive will change. Currently the length is the size of unsigned int for the machine that creates the archive. If a change is made to std::size_t, then the length will change to the size of this type on the machine which created the archive. Native binary archives meet no requirements for portability across platforms. If portability is a consideration, then native binary archives are not suitable and a different archive class must be used. There is an example of a portable binary archive which stores integers in a variable length format, and these are portable across machines as long as the receiving machine has the capability to represent the numbers actually written to the archive.
d) The natural candidate for the collection count is std::size_t, as that is what the STL uses to specify the count of its members. I believe the question has been raised as to whether the serialization system should use this or some special type for this purpose. I don't have a strong opinion.
I am surprised that apparently people are using the serialization system with more than 4G objects. Even 4G doubles comes to 32 GB. I would have guessed that the intersection of people doing that and using the serialization system is a null set. So the only questions are: a) is this a big enough deal to even bother? b) If it is, should the change be made to std::size_t or is there some reason that it should be made to some other type? c) If so, what other type and why?
Robert Ramey
Kim Barrett wrote:
At 5:33 PM -0500 2/11/06, David Abrahams wrote:
The count should be some type that has a fixed (up to byte-order issues) representation, i.e. something like uint64_t.
Or a variable-length representation, which is what I *think* Matthias chose.
I think that would be even better.
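A rough sketch of the backward-compatible loading Robert describes in point (a) above. The accessor get_library_version() and the surrounding names are written from memory here and are only meant to illustrate the shape of the conditional code, not the library's exact implementation:

template<class Archive, class Container>
inline void load_collection(Archive & ar, Container & s)
{
    std::size_t count;
    if (ar.get_library_version() < 4) {
        // archives written before the change stored the count as unsigned int
        unsigned int old_count;
        ar >> make_nvp("count", old_count);
        count = old_count;
    }
    else {
        // newer archives store whatever type the library settles on;
        // std::size_t is used here purely for the sketch
        std::size_t new_count;
        ar >> make_nvp("count", new_count);
        count = new_count;
    }
    // ... clear s and load 'count' elements, as the real implementation does ...
}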

On Feb 11, 2006, at 2:16 PM, Kim Barrett wrote:
At 11:52 AM -0800 2/11/06, Robert Ramey wrote:
David Abrahams wrote:
IMO the size_type change should be considered a bugfix, as it was not possible to portably serialize collections
of over 4 G objects
Strictly speaking, of any size. And changing the type of the count from "unsigned int" to std::size_t would actually be worse, in a practical sense. The representation size (how many bits of data will appear in the archive) must be the same on all platforms.
sizeof(unsigned int) is commonly 4 on 32bit platforms
sizeof(unsigned int) is (I think) commonly 4 on 64bit platforms
no, it can be 4 or 8 depending on the platform
sizeof(std::size_t) is commonly 4 on 32bit platforms
sizeof(std::size_t) is commonly 8 on 64bit platforms
(I'm knowingly and intentionally ignoring DSP's and the like in the above).
The count should be some type that has a fixed (up to byte-order issues) representation, i.e. something like uint64_t.
Or one can leave it up to the archive to decide how to store the count. That's why I proposed to introduce a collection_size_type object storing the count. Any archive can then decide for itself how to serialize it, whether as unsigned int, uint64_t or whatever you like. Matthias
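A minimal sketch of what such a dedicated count type could look like. It is hand-rolled here so the example stands alone; the proposal discussed later in the thread uses BOOST_STRONG_TYPEDEF instead:

#include <cstddef>

// A dedicated type for "number of elements in a collection", distinct from
// plain std::size_t, so an archive can overload on it and choose its own
// on-disk representation.
class collection_size_type
{
public:
    collection_size_type() : value_(0) {}
    explicit collection_size_type(std::size_t n) : value_(n) {}
    operator std::size_t() const { return value_; }
private:
    std::size_t value_;
};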

On Sat, Feb 11, 2006 at 03:21:55PM -0800, Matthias Troyer wrote:
On Feb 11, 2006, at 2:16 PM, Kim Barrett wrote:
The count should be some type that has a fixed (up to byte-order issues) representation, i.e. something like uint64_t.
Or one can leave it up to the archive to decide how to store the count. That's why I proposed to introduce a collection_size_type object storing the count. Any archive can then decide for itself how to serialize it, whether as unsigned int, uint64_t or whatever you like.
I'm for this as well... this issue comes up in basic_binary_i/oprimitive.ipp as well as in collections_save_imp.hpp:

template<class Archive, class OStream>
BOOST_ARCHIVE_OR_WARCHIVE_DECL(void)
basic_binary_oprimitive<Archive, OStream>::save(const char * s)
{
    std::size_t l = std::strlen(s);
    this->This()->save(l);
    save_binary(s, l);
}

One can fix this by implementing one's own portable binary primitives (which is what we've done), but that does involve duplicating a lot of library code in order to change a few lines. The special type for collection size (if applied consistently) seems cleaner. -t

I don't see a reason not to use size_t. After all, there's no way to load a collection that needs more than 32 bit address space on a machine which is limited to 32bit address space. Kevin

troy d. straszheim wrote:
template<class Archive, class OStream>
BOOST_ARCHIVE_OR_WARCHIVE_DECL(void)
basic_binary_oprimitive<Archive, OStream>::save(const char * s)
{
    std::size_t l = std::strlen(s);
    this->This()->save(l);
    save_binary(s, l);
}
One can fix this by implementing one's own portable binary primitives (which is what we've done), but that does involve duplicating a lot of library code in order to change a few lines. The special type for collection size (if applied consistently) seems cleaner.
I'm curious as to what problem using std::size_t created. That is, why did you feel you had to change it? Robert Ramey

On Sun, Feb 12, 2006 at 03:43:12PM -0800, Robert Ramey wrote:
troy d. straszheim wrote:
template<class Archive, class OStream>
BOOST_ARCHIVE_OR_WARCHIVE_DECL(void)
basic_binary_oprimitive<Archive, OStream>::save(const char * s)
{
    std::size_t l = std::strlen(s);
    this->This()->save(l);
    save_binary(s, l);
}
One can fix this by implementing one's own portable binary primitives (which is what we've done), but that does involve duplicating a lot of library code in order to change a few lines. The special type for collection size (if applied consistently) seems cleaner.
I'm curious as to what problem using std::size_t created. That is, why did you feel you had to change it.
Gah. Looking at it again, I probably felt I had to change it in the binary primitives due to sleep deprivation, or just from having stared at the code for too long. There's no reason to implement separate portable primitives... You can simply add save/load for const char*, const std::string&, etc, in the most-derived portable_binary_(i|o)archive, and use whatever type you like to represent the string's size. For the platforms that I need to be portable across, this is workable.

The save_collection stuff,

template<class Archive, class Container>
inline void save_collection(Archive & ar, const Container &s)
{
    // record number of elements
    unsigned int count = s.size();
    ar << make_nvp("count", const_cast<const unsigned int &>(count));

is fine for my portability purposes, since unsigned int is the same size everywhere I have to run. But in general, the size_t problem I've hit looks like this: if you feel that you can't afford the extra time and space overhead of storing a 1 byte size with every single int/long/longlong in the archive, then you must decide how much storage each primitive gets when the archive is created and record it in the archive header or something. Take these 3 platforms:

                    intel32/glibc    intel64/glibc    ppc/darwin
sizeof(int)               4                4               4
sizeof(long)              4                8               4
sizeof(long long)         8                8               8
uint32_t               u int            u int           u int
uint64_t            u long long        u long        u long long
size_t                 u int           u long           u long

Even if you could afford to keep size_t at 32 bits because your containers are never that big, or you could afford to bump size_t up to 64 bits because we're not too short on disk space, you have problems. There's no way to do something consistent with size_t if the archive doesn't know it is size_t. If you make a decision about how big size_t will be on disk when the archive is created, then you have to shrink or expand all types that size_t might be before you write them and after you read them. You can either shrink it to 32 bits when saving on 64 bit machines (say using numeric_limits<> to range-check and throw if something is out of range), or save as 64 bits and shrink to 32 bits when loading on 32 bit platforms... If you reduce "unsigned long" to 32 bits in the archive you do get a consistent size for size_t across all platforms... but then you have no way to save 64 bit ints, because on intel64, uint64_t is unsigned long. If you increase unsigned long to 64 bits on disk, then intel64 and ppc have consistent container size in the archive, but intel32 doesn't, as there size_t is unsigned int. You could bump up unsigned int *and* unsigned long to 64 bits, then you have consistent container size across all three, but you have even more space overhead than the size-byte-per-primitive approach. There may be some other cases, I dunno. The whole thing is kinda messy. -t
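A sketch of the override troy mentions, i.e. intercepting string saves in the most-derived portable archive and fixing the on-disk width of the length there. The class below is a placeholder, not the library's real archive hierarchy; the byte layout is just one possible choice:

#include <cstring>
#include <vector>
#include <boost/cstdint.hpp>

// Placeholder for "the most-derived portable binary oarchive".
class my_portable_binary_oarchive
{
public:
    void save(const char * s)
    {
        // length always written as 8 little-endian bytes, then the raw
        // characters, so 32 and 64 bit writers produce the same bytes
        boost::uint64_t l = std::strlen(s);
        const std::size_t n = static_cast<std::size_t>(l);
        for (int i = 0; i < 8; ++i) {
            buffer_.push_back(static_cast<char>(l & 0xff));
            l >>= 8;
        }
        buffer_.insert(buffer_.end(), s, s + n);
    }
private:
    std::vector<char> buffer_;   // stands in for the real output stream
};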

David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
I was expecting to see your enhancements for arrays to get included and for collection_size (or whatever) to be part of this. I'm not sure what status of this is and/or if it conflicts with the recent feature freeze.
IMO the size_type change should be considered a bugfix, as it was not possible to portably serialize collections without it.
The change is not trivial. Using a size_t typedef means that program A can write an unsigned int and program B can read an unsigned long. This will appear to work at first because the serialization library doesn't include archives where this is significant. Yet.

"Peter Dimov" <pdimov@mmltd.net> writes:
David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
I was expecting to see your enhancements for arrays to get included and for collection_size (or whatever) to be part of this. I'm not sure what status of this is and/or if it conflicts with the recent feature freeze.
IMO the size_type change should be considered a bugfix, as it was not possible to portably serialize collections without it.
The change is not trivial. Using a size_t typedef means that program A can write an unsigned int and program B can read an unsigned long. This will appear to work at first because the serialization library doesn't include archives where this is significant. Yet.
I don't know exactly how to interpret your remarks. It sounds like you're describing the status quo here, which would seem to argue that the change is necessary. For the record, I didn't claim it was a trivial change, only that it should be considered a bugfix and not a new feature. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
I was expecting to see your enhancements for arrays to get included and for collection_size (or whatever) to be part of this. I'm not sure what status of this is and/or if it conflicts with the recent feature freeze.
IMO the size_type change should be considered a bugfix, as it was not possible to portably serialize collections without it.
The change is not trivial. Using a size_t typedef means that program A can write an unsigned int and program B can read an unsigned long. This will appear to work at first because the serialization library doesn't include archives where this is significant. Yet.
I don't know exactly how to interpret your remarks. It sounds like you're describing the status quo here, which would seem to argue that the change is necessary.
The status quo is that the size of the container is consistently written or read as an unsigned int, is it not? Consider the simplistic example:

void f( unsigned int );  // #1
void f( unsigned long ); // #2

void g( std::vector<int> & v )
{
    unsigned int n1 = v.size();
    f( n1 ); // #1

    size_t n2 = v.size();
    f( n2 ); // ???
}

"Peter Dimov" <pdimov@mmltd.net> writes:
David Abrahams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
I was expecting to see your enhancements for arrays to get included and for collection_size (or whatever) to be part of this. I'm not sure what status of this is and/or if it conflicts with the recent feature freeze.
IMO the size_type change should be considered a bugfix, as it was not possible to portably serialize collections without it.
The change is not trivial. Using a size_t typedef means that program A can write an unsigned int and program B can read an unsigned long. This will appear to work at first because the serialization library doesn't include archives where this is significant. Yet.
I don't know exactly how to interpret your remarks. It sounds like you're describing the status quo here, which would seem to argue that the change is necessary.
The status quo is that the size of the container is consistently written or read as an unsigned int, is it not?
I think so, though I could be mistaken.
Consider the simplistic example:
void f( unsigned int );  // #1
void f( unsigned long ); // #2

void g( std::vector<int> & v )
{
    unsigned int n1 = v.size();
    f( n1 ); // #1

    size_t n2 = v.size();
    f( n2 ); // ???
}
Sure. So how does this relate to serialization? -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
The status quo is that the size of the container is consistently written or read as an unsigned int, is it not?
I think so, though I could be mistaken.
Consider the simplistic example:
void f( unsigned int );  // #1
void f( unsigned long ); // #2

void g( std::vector<int> & v )
{
    unsigned int n1 = v.size();
    f( n1 ); // #1

    size_t n2 = v.size();
    f( n2 ); // ???
}
Sure. So how does this relate to serialization?
Consider an archive where unsigned int and unsigned long have a different internal representation. When a value of a size_t type is written on platform A, where size_t == unsigned int, platform B, where size_t == unsigned long, won't be able to read the file.

Peter Dimov wrote:
Consider an archive where unsigned int and unsigned long have a different internal representation. When a value of a size_t type is written on platform A, where size_t == unsigned int, platform B, where size_t == unsigned long, won't be able to read the file.
native binary archives are not guaranteed or even expected to be portable across platforms. text archives don't have this problem. Robert Ramey

Robert Ramey wrote:
Peter Dimov wrote:
Consider an archive where unsigned int and unsigned long have a different internal representation. When a value of a size_t type is written on platform A, where size_t == unsigned int, platform B, where size_t == unsigned long, won't be able to read the file.
native binary archives are not guaranteed or even expected to be portable across platforms.
text archives don't have this problem.
That's why I said that currently the serialization library will appear to work with a size_t. Only after an archive is added where unsigned int and unsigned long differ in their representation will the problem manifest itself.

"Peter Dimov" <pdimov@mmltd.net> writes:
David Abrahams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
The status quo is that the size of the container is consistently written or read as an unsigned int, is it not?
I think so, though I could be mistaken.
Consider the simplistic example:
void f( unsigned int );  // #1
void f( unsigned long ); // #2

void g( std::vector<int> & v )
{
    unsigned int n1 = v.size();
    f( n1 ); // #1

    size_t n2 = v.size();
    f( n2 ); // ???
}
Sure. So how does this relate to serialization?
Consider an archive where unsigned int and unsigned long have a different internal representation. When a value of a size_t type is written on platform A, where size_t == unsigned int, platform B, where size_t == unsigned long, won't be able to read the file.
Sure, but I don't see what that has to do with the ambiguity in overload resolution you're pointing at above. I don't think anyone is suggesting that we use size_t; int has the same problem, after all. I thought Matthias was using a variable-length representation, but on inspection it looks like he's just using a "strong typedef" around std::size_t, which should work adequately for the purposes we're discussing. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
I don't think anyone is suggesting that we use size_t;
Actually, that's what I would use. I still haven't heard a valid reason why it shouldn't be used. I'm not saying there isn't one - but no one has stated one.
int has the same problem, after all. I thought Matthias was using a variable-length representation, but on inspection it looks like he's just using a "strong typedef" around std::size_t,
I for one am curious as to the motivation for this.
which should work adequately for the purposes we're discussing.
well, so would size_t for what has been discussed so far. Robert Ramey

On Feb 11, 2006, at 9:21 PM, Robert Ramey wrote:
David Abrahams wrote:
I don't think anyone is suggesting that we use size_t;
Actually, that's what I would use. I still haven't heard a valid reason why it shouldn't be used. I'm not saying there isn't one - but no one has stated one.
int has the same problem, after all. I thought Matthias was using a variable-length representation, but on inspection it looks like he's just using a "strong typedef" around std::size_t,
I for one am curious as to the motivation for this.
That's easy to answer: to achieve separation between archive format and serialization, which is, I believe, what you advocate. Leave it to the archive to decide how to serialize sizes of collections - it could be different than serializing the integer type used for std::size_t.
which should work adequately for the purposes we're discussing.
well, so would size_t for what has been dicussed so far.
size_t has problems with portable binary archives.

Matthias Troyer wrote:
That's easy to answer: to achieve separation between archive format and serialization, which is, I believe what you advocate. Leave it to the archive to decide how to serialize sizes of collections - it could be different than serializing the integer type used for std::sized_t
I don't dispute this - it's just not clear how using std::size_t creates a problem.
size_t has problems with portable binary archives.
Well, lots of types have problems with portable binary archives. In any case, using a strong typedef would be fine with me. I believe that you want to use a BOOST_STRONG_TYPEDEF for this purpose, and that sounds just fine to me. I still don't see why it's needed, but I haven't looked at the particular issues regarding collections in the depth that you have, so I'm willing to take your word for the proposition that distinguishing it is a good thing. No downside, and it distinguishes this particular usage. Robert Ramey

"Peter Dimov" <pdimov@mmltd.net> writes:
David Abrahams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
The status quo is that the size of the container is consistently written or read as an unsigned int, is it not?
I think so, though I could be mistaken.
Consider the simplistic example:
void f( unsigned int );  // #1
void f( unsigned long ); // #2

void g( std::vector<int> & v )
{
    unsigned int n1 = v.size();
    f( n1 ); // #1

    size_t n2 = v.size();
    f( n2 ); // ???
}
Sure. So how does this relate to serialization?
Consider an archive where unsigned int and unsigned long have a different internal representation. When a value of a size_t type is written on platform A, where size_t == unsigned int, platform B, where size_t == unsigned long, won't be able to read the file.
Sure, but I don't see what that has to do with the ambiguity in overload resolution you're pointing at above. I don't think anyone is suggesting that we use size_t; int has a similar problem, after all. I thought Matthias was using a variable-length representation, but on inspection it looks like he's just using a "strong typedef" around std::size_t, which should work adequately for the purposes we're discussing. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
Consider an archive where unsigned int and unsigned long have a different internal representation. When a value of a size_t type is written on platform A, where size_t == unsigned int, platform B, where size_t == unsigned long, won't be able to read the file.
Sure, but I don't see what that has to do with the ambiguity in overload resolution you're pointing at above.
Not ambiguity, just that different overloads will be called on different platforms.
I don't think anyone is suggesting that we use size_t; int has a similar problem, after all. I thought Matthias was using a variable-length representation, but on inspection it looks like he's just using a "strong typedef" around std::size_t, which should work adequately for the purposes we're discussing.
A strong typedef should work, if all archives implement its serialization.

On Feb 12, 2006, at 2:59 AM, Peter Dimov wrote:
David Abrahams wrote:
Consider an archive where unsigned int and unsigned long have a different internal representation. When a value of a size_t type is written on platform A, where size_t == unsigned int, platform B, where size_t == unsigned long, won't be able to read the file.
Sure, but I don't see what that has to do with the ambiguity in overload resolution you're pointing at above.
Not ambiguity, just that different overloads will be called on different platforms.
I don't think anyone is suggesting that we use size_t; int has a similar problem, after all. I thought Matthias was using a variable-length representation, but on inspection it looks like he's just using a "strong typedef" around std::size_t, which should work adequately for the purposes we're discussing.
A strong typedef should work, if all archives implement its serialization.
As far as I understand there is a default implementation of serialization for strong typedefs, which just serializes the underlying type. Only archives needing different behavior will have to implement its serialization. Matthias

Matthias Troyer wrote:
On Feb 12, 2006, at 2:59 AM, Peter Dimov wrote:
David Abrahams wrote: A strong typedef should work, if all archives implement its serialization.
as a general rule, the included archives only implement serialization for primitives. This permits all serializations to work with all archives. It's been a constant battle to keep these things decoupled in order to preserve this feature. serializations should be attributes of the type NOT any particular archive.
As far as I understand there is a default implementation of serialization for strong typedefs, which just serializes the underlying type.
LOL - I'm embarrassed to say that off hand I don't know the answer to this!! I used BOOST_STRONG_TYPEDEF for the special types needed for the library internals and implemented them in the archive. BOOST_STRONG_TYPEDEF will convert to the underlying type when used in an arithmetic context, and I believe that this might be enough to get serialization for free. Or maybe not. BOOST_STRONG_TYPEDEF should be considered for promotion out of the serialization library into the core of boost itself. Robert Ramey

At 9:14 AM -0800 2/12/06, Robert Ramey wrote:
David Abrahams wrote: A strong typedef should work, if all archives implement its serialization.
as a general rule, the included archives only implement serialization for primitives. This permits all serializations to work with all archives. It's been a constant battle to keep these things decoupled in order to preserve this feature.
serializations should be attributes of the type NOT any particular archive.
This is the objection I expected to hear from Robert much earlier in this discussion. A strong typedef for this purpose effectively widens the archive concept by adding a new thing that needs to be supported. That conceptual widening occurs even if the default archive behavior for a strong typedef is to just serialize the underlying type. It still needs to be documented as part of the archive concept, and anyone defining a new kind of archive ought to consider whether that default is the correct behavior for this new archive type, or whether something else is needed. And I think he would have a pretty good rationale for feeling that way. Keeping the archive interface narrow and the coupling between serialization and archives minimal is, I think, one of the strengths of the serialization library's design. I would be in full agreement with Robert here, except that all of the alternatives I can think of seem worse to me.

1. std::size_t

This causes major problems for portable binary archives. I'm aware that portable binary archives are tricky (and perhaps not truly possible in the most general sense of "portable"). In particular, they require that users of such archives be very careful about the types that they include in such archives, avoiding all explicit use of primitive types with implementation-defined representations in favor of types with a fixed representation. So no int's or long's, only int32_t and the like. Floating point types add their own complexity. Some (potential) users of the serialization library (such as us) are already doing that, and have been working under such restrictions for a long time (long before we ever heard of boost.serialization), because cross-platform serialization is important to us.

The problem for portable binary archives caused by using std::size_t as the container count is that it is buried inside the container serializer, where the library client has no control over it. All the archive gets out of the serializer is the underlying primitive type, with which it does whatever it does on the given platform. The semantic information that this is a container count is lost by the time the archive sees the value, so there's nothing a "portable" archive can do to address this loss of information. And this occurs no matter how careful clients are in their own adherence to use of fixed representation value types. This leaves a client needing a portable binary archive with several unappealing options (in no particular order):
- Modify the library to use one of the other options.
- Override all of the needed container serializers specifically for the portable archive.
- Don't attempt to serialize any standard container types.

2. standard fixed-size type

We already know that uint32_t is inadequate; there are users with very large containers. Maybe uint64_t is big enough, though of course predictions of that sort have a history of proving false, sometimes in a surprisingly short time. And uint64_t isn't truly portable anyway, since an implementation might not have that type at all. Also, some might object to always requiring 8 bytes of count information, even when the actual size will never be anything like that large. This was my preferred approach before I learned about the strong typedef approach, in spite of the stated problems.

3. some self-contained variable-size type

This might be possible, but the additional complexity is questionable. Also, all of the ideas I've thought of along this line make the various text-based archives less human readable, which some might reasonably find objectionable.

4. something else

I haven't thought of any. Anybody else?

So it appears to me that all of the available options have downsides. While my initial reaction to the strong typedef approach was rather ambivalent because of the associated expansion of the archive concept, it seems to me to be the best of the available options.

Kim Barrett wrote:
At 9:14 AM -0800 2/12/06, Robert Ramey wrote:
David Abrahams wrote: A strong typedef should work, if all archives implement its serialization.
This is the objection I expected to hear from Robert much earlier in this discussion. A strong typedef for this purpose effectively widens the archive concept by adding a new thing that needs to be supported. That conceptual widening occurs even if the default archive behavior for a strong typedef is to just serialize the underlying type. It still needs to be documented as part of the archive concept, and anyone defining a new kind of archive ought to consider whether that default is the correct behavior for this new archive type, or whether something else is needed.
Even if a strong type is used, it is neither necessary nor is it desirable to add it to every archive. The procedure would be: create a header boost/collection_size.hpp which would contain something like

namespace boost {

BOOST_STRONG_TYPE(collection_size_t, std::size_t)

// now we have a collection count type; no versioning, for efficiency reasons
BOOST_CLASS_IMPLEMENTION_LEVEL(collection_size_t, object)

template<class Archive>
void serialize(Archive & ar, collection_size_t & t, const unsigned int version){
    ar & t;                              // if it's converted automatically to size_t
    // or
    ar & static_cast<std::size_t &>(t);  // if not converted automatically
}
And I think he would have a pretty good rationale for feeling that way. Keeping the archive interface narrow and minimizing the coupling between serialization and archives minimal is, I think, one of the strengths of the serialization library's design.
Hallelujah!!!
I would be in full agreement with Robert here, except that all of the alternatives I can think of seem worse to me.
1. std::size_t
This causes major problems for portable binary archives. I'm aware that portable binary archives are tricky (and perhaps not truly possible in the most general sense of "portable"). In particular, they require that users of such archives be very careful about the types that they include in such archives, avoiding all explicit use of primitive types with implementation-defined representations in favor of types with a fixed representation. So no int's or long's, only int32_t and the like. Floating point types add their own complexity.
A portable binary archive comes down to serializing primitives in a portable way. This is what the example included with the serialization library does. The example isn't complete but it does illustrate this point.
Some (potential) users of the serialization library (such as us) are already doing that, and have been working under such restrictions for a long time (long before we ever heard of boost.serialization), because cross-platform serialization is important to us.
Hmm - well, maybe you want to just finish the example in the package by adding floats and doubles - and you're done!!
The problem for portable binary archives caused by using std::size_t as the container count is that it is buried inside the container serializer, where the library client has no control over it. All the archive gets out of the serializer is the underlying primitive type, with which it does whatever it does on the given platform. The semantic information that this is a container count is lost by the time the archive sees the value, so there's nothing a "portable" archive can do to address this loss of information. And this occurs no matter how careful clients are in their own adherence to use of fixed representation value types.
This is all true. But I'm not convinced that it's necessary to know where the primitive came from to handle it. But I don't really need to be convinced. I would be happy to go along with it if someone who does think this is necessary is willing to address all the minor little things that will add up to kind of a pain. This includes:
a) Selecting a type that will please everyone.
b) Carefully setting up the appropriate serialization traits for such a type.
c) Tweaking the collection serialization to use the new type.
d) Making sure that existing archives can still be read - this entails having a little bit of conditional code in the collection loading functions.
I believe that a BOOST_STRONG_TYPE is a very good candidate for this - but that would suggest it might be a good idea to take a critical look at BOOST_STRONG_TYPE. So, if it's done correctly, it's more than a trivial "bug fix".
This leaves a client needing a portable binary archive with several unappealing options (in no particular order)
- Modify the library to use one of the other options.
- Override all of the needed container serializers specifically for the portable archive.
- Don't attempt to serialize any standard container types.
As I said - I don't agree at all here. To illustrate my point, I point to the example in the documentation and code demo_portable_binary
2. standard fixed-size type
We already know that uint32_t is inadequate; there are users with very large containers. Maybe uint64_t is big enough, though of course predictions of that sort have a history of proving false, sometimes in a surprisingly short time. And uint64_t isn't truly portable anyway, since an implementation might not have that type at all. Also, some might object to always requiring 8 bytes of count information, even when the actual size will never be anything like that large. This was my preferred approach before I learned about the strong typedef approach, in spite of the stated problems.
3. some self-contained variable-size type
This might be possible, but the additional complexity is questionable. Also, all of the ideas I've thought of along this line make the various text-based archives less human readable, which some might reasonably find objectionable.
text archives present no problem. Numbers coded as a string of decimal characters have no finite limit as to the numbers they can represent. "Portable binary" archives must also have some sort of way to code numbers in a variable length format (a sketch of one such variable-length coding appears after this message). The only problem arises with the native_binary archive - and it is explicitly exempt from any portability requirement.

So it appears to me that all of the available options have downsides.
While my initial reaction to the strong typedef approach was rather ambivalent because of the associated expansion of the archive concept, it seems to me to be the best of the available options.
Noooooo - and you were on a roll. Robert Ramey
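For reference, one common way to code an unsigned count in a variable-length, platform-independent form is a base-128 scheme, sketched below. This is only an illustration; it is not necessarily what the library's demo_portable_binary example or Matthias's patches actually do:

#include <vector>
#include <boost/cstdint.hpp>

// Base-128 variable-length coding of an unsigned count: 7 bits per byte,
// least significant group first, high bit set when more bytes follow.
inline void save_varint(std::vector<unsigned char> & out, boost::uint64_t n)
{
    do {
        unsigned char byte = static_cast<unsigned char>(n & 0x7f);
        n >>= 7;
        if (n != 0)
            byte |= 0x80;   // more bytes follow
        out.push_back(byte);
    } while (n != 0);
}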

Dear Robert, dear all,

Let me try to stop the explosive growth of this thread by summarizing the problems, and let me state that I think that Robert's strong typedef proposal is the best solution, and argue why.

The first problem with the current state is that it does not allow for more than 4G elements in a collection to be serialized. Another serious problem is that it does not allow an archive to treat a collection size differently from the integral type used to represent it. That feature is useful for portable binary archives, and is absolutely essential for serialization using MPI archives. For MPI archives we need to treat size types differently than integers (I don't want to go into details here since that will only distract from the discussion).

Needing to distinguish size types from integers in some archives rules out choosing any other integer type to represent the sizes. Furthermore, there will never be a consensus as to which integral type is best. If I want to store a 4G+ collection, I will vote for a 64 bit integer type, while if I want to serialize millions of short containers, I would hate to waste the memory needed for 64 bit size types.

Fortunately there is an elegant solution: use a "strong typedef" to distinguish container sizes from an unsigned int (or std::size_t), and let the archive decide how to represent it, just as Robert suggests:

On Feb 12, 2006, at 10:42 PM, Robert Ramey wrote:
Even if a strong type is used, it is neither necessary nor is it desireable to add it to every archive.
The procedures would be:
create a header boost/collection_size.hpp which would contain something like
namespace boost {

BOOST_STRONG_TYPE(collection_size_t, std::size_t)

// now we have a collection count type; no versioning, for efficiency reasons
BOOST_CLASS_IMPLEMENTION_LEVEL(collection_size_t, object)
This will work with all existing archives, and the serialize function:
template<class Archive>
void serialize(Archive & ar, collection_size_t & t, const unsigned int version){
    ar & t;                              // if it's converted automatically to size_t
    // or
    ar & static_cast<std::size_t &>(t);  // if not converted automatically
}
is actually not needed in my experience. I have actually implemented this solution and it passes all regression tests. With this solution, existing archives will continue to work, and any programmers who want or need to serialize size types differently from std::size_t can overload the serialization of collection_size_t in their archive. Thus anybody's wishes can be granted with this solution and I think we should go for it, as we had already discussed last November. Matthias
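As an illustration of that last point, an archive that wants its own representation for counts could provide an overload along these lines. The types and class below are hand-rolled stand-ins so the sketch is self-contained; they are not the library's actual classes:

#include <cstddef>
#include <sstream>
#include <string>

// Stand-in for the strong typedef discussed above.
struct collection_size_t
{
    std::size_t t;
    explicit collection_size_t(std::size_t n = 0) : t(n) {}
    operator std::size_t() const { return t; }
};

// A hypothetical text-style archive that chooses its own representation
// for counts: plain decimal text, whatever the platform's std::size_t is.
class toy_text_oarchive
{
public:
    void save(const collection_size_t & c)
    {
        os_ << static_cast<std::size_t>(c) << ' ';
    }
    std::string str() const { return os_.str(); }
private:
    std::ostringstream os_;
};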

Matthias Troyer <troyer@itp.phys.ethz.ch> writes:
Another serious problem is that it does not allow an archive to treat a collection size differently from the integral type used to represent it. That feature is useful for portable binary archives, and is absolutely essential for serialization using MPI archives. For MPI archives we need to treat size types differently than integers (I don't want to go into details here since that will only distract from the discussion).
I think it's probably worth a couple paragraphs to describe why that feature is useful for portable binary archives and essential for MPI archives. There seems to be lots of mystery in the air about that. -- Dave Abrahams Boost Consulting www.boost-consulting.com

Abject apologies. I seem to have become confused and forgot how the portable archive works, instead incorrectly ascribing to it misfeatures of some alternative approaches that I'm attempting to quash. Please ignore pretty much everything I've said previously in this thread.

Up until yesterday, I had accepted this as the word on this topic and had resolved to make a strong_type for collection size - or encourage Matthias to do so - depending on what was convenient. I just came upon the fact that each stl collection C has its own size_type predefined, as in C::size_type. I came upon this quite by accident in reviewing the SGI stl documentation. It never occurred to me to look there - as it would never occur to me that different collections might have different types for the size. But now that this is there - what implications does it have? I expect that C::size_type is usually or always implemented as a typedef - NOT a strong type - so using it might be problematic. I just thought I would throw that into the pot. I don't have a strongly held opinion on this particular subject. But I'm bumping the archive implementation version from 3 to 4 with this release, and collections will have a tiny bit of conditional code to handle older archives in any case, so now would be a convenient time to make changes. Robert Ramey
Matthias Troyer wrote:
Dear Robert, dear all,
Let me try to stop the explosive growth of this thread by summarizing the problems and let me state that I think that Robert's strong typedef proposal is the best solution, and argue why.
The first problem with the current state is that it does not allow for more than 4G elements in a collection to be serialized. Another serious problem is that it does not allow an archive to treat a collection size differently from the integral type used to represent it. That feature is useful for portable binary archives, and is absolutely essential for serialization using MPI archives. For MPI archives we need to treat size types differently than integers (I don't want to go into details here since that will only distract from the discussion).
Needing to distinguish size types from integers in some archives, rules out choosing any other integer type to represent the sizes. Furthermore, there will never be a consensus as to which integral type is best. If I want to store a 4G+ collection, I will vote for a 64 bit integer type, while if I want to serialize millions of short containers, I would hate to waste the memory needed for 64 bit size types.
Fortunately there is an elegant solution: use a "strong typedef" to distinguish container sizes from an unsigned int (or std::size_t), and let the archive decide how to represent it, just as Robert suggests:
On Feb 12, 2006, at 10:42 PM, Robert Ramey wrote:
Even if a strong type is used, it is neither necessary nor is it desireable to add it to every archive.
The procedures would be:
create a header boost/collection_size.hpp which would contain something like
namespace boost {

BOOST_STRONG_TYPE(collection_size_t, std::size_t)

// now we have a collection count type; no versioning, for efficiency reasons
BOOST_CLASS_IMPLEMENTION_LEVEL(collection_size_t, object)
This will work with all existing archives, and the serialize function:
template<class Archive>
void serialize(Archive & ar, collection_size_t & t, const unsigned int version){
    ar & t;                              // if it's converted automatically to size_t
    // or
    ar & static_cast<std::size_t &>(t);  // if not converted automatically
}
is actually not needed in my experience. I have actually implemented this solution and it passes all regression tests.
With this solution, existing archives will continue to work, and any programmers who want or need to serialize size types differently from std::size_t can overload the serialization of collection_size_t in their archive. Thus anybody's wishes can be granted with this solution and I think we should go for it, as we had already discussed last November.
Matthias

Kim Barrett wrote:
1. std::size_t
This causes major problems for portable binary archives. I'm aware that portable binary archives are tricky (and perhaps not truly possible in the most general sense of "portable"). In particular, they require that users of such archives be very careful about the types that they include in such archives, avoiding all explicit use of primitive types with implementation-defined representations in favor of types with a fixed representation. So no int's or long's, only int32_t and the like. Floating point types add their own complexity.
I don't understand where all this talk of portable binary archives being tricky is coming from. I have such an archive, and it works. int32_t's of course don't work for the same reason that size_t doesn't work, they are typedefs that may map to different types on different platforms. Floating point formats _are_ tricky, though. There are ways to convert FP to portable external representation but I haven't used them; the hack of assuming IEEE and same endianness as an int32/int64 works for me so far.

Peter Dimov <pdimov <at> mmltd.net> writes:
There are ways to convert FP to portable external representation but I haven't used them; the hack of assuming IEEE and same endianness as an int32/int64 works for me so far.
I think the code below is portable - but not terribly efficient. I haven't included the actual serialization of the unpacked_real. This is something I'm using for ASN.1 CER (I haven't tried to support all the encodings allowed by BER!) and not with boost.serialization. The support for infinity seems to be working for me, not sure if it is strictly correct. I haven't tried to support any other special values; ASN.1 doesn't support them and I don't need them. Presumably ASN.1 BER would make a reasonable portable binary archive format (basically a binary equivalent of XML) and includes variable length sequence size encoding in the format. Is there any interest in this? I would expect ASN.1 BER to be slower to encode/decode than a more "raw" binary format and I'm not sure that it really makes much sense when used in an environment where the syntax is really defined by the data, not by an ASN.1 or other specification.

#include <cmath>    // frexp, ldexp
#include <limits>

struct unpacked_real
{
    bool negative_, inf_;
    unsigned long mantissa;
    int exp;
    enum { scale = sizeof(unsigned long)*8-1 };

    bool is_zero() { return (mantissa == 0 && exp == 0); }
    bool is_inf() { return inf_; }

    unpacked_real() : negative_(false), inf_(false), mantissa(0), exp(0) {}

    template <class Real>
    unpacked_real(const Real &x)
    {
        Real m;
        negative_ = x < 0.0;
        if (inf_ = test_inf(x))
            return;
        if (x == 0.0) {
            mantissa = 0;
            exp = 0;
            return;
        }
        if (negative_)
            m = frexp(-x, &exp);
        else
            m = frexp(x, &exp);
        mantissa = (unsigned long)ldexp(m, scale);
        exp -= scale;
        while (!(mantissa & 1)) {
            mantissa >>= 1;
            exp += 1;
        }
    }

    operator float()
    {
        if (inf_ && negative_) return -std::numeric_limits<float>::infinity();
        if (inf_ && !negative_) return std::numeric_limits<float>::infinity();
        float v = mantissa;
        v = ldexp(v, exp);
        if (negative_) v = -v;
        return v;
    }

    operator double()
    {
        if (inf_ && negative_) return -std::numeric_limits<double>::infinity();
        if (inf_ && !negative_) return std::numeric_limits<double>::infinity();
        double v = mantissa;
        v = ldexp(v, exp);
        if (negative_) v = -v;
        return v;
    }

private:
    template <class T>
    bool test_inf(const T &x)
    {
        return ( x > std::numeric_limits<T>::max() || x < -std::numeric_limits<T>::max() );
    }
};
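A small usage sketch for the struct above, round-tripping a double (it assumes the unpacked_real definition above is in scope; note that because the mantissa is held in an unsigned long, the round trip loses precision for double on platforms where unsigned long is 32 bits):

#include <cstdio>

int main()
{
    double x = 3.14159265358979;
    unpacked_real u(x);   // decompose into sign / mantissa / exponent
    double y = u;         // reassemble via operator double()
    std::printf("%.15g -> %.15g\n", x, y);
    return 0;
}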

On Feb 11, 2006, at 10:53 AM, Robert Ramey wrote:
I was expecting to see your enhancements for arrays to get included and for collection_size (or whatever) to be part of this. I'm not sure what status of this is and/or if it conflicts with the recent feature freeze.
I was hesitating to commit anything as long as there were still hundreds of regression failures appearing. Matthias
participants (9):
- Darryl Green
- David Abrahams
- Kevin Sopp
- Kim Barrett
- Marcin Kalicinski
- Matthias Troyer
- Peter Dimov
- Robert Ramey
- troy d. straszheim