[serialization] object tracking for shared strings
Hi all,
I hope one of you will be able to help me with this: I am working with a data structure that contains a large number of (mostly identical) strings. To save memory I put together a shared memory string implementation, which ensures that each string is kept in memory only once (implemented through a global map of all strings). However, once I write them to disk, I again have n copies of each string. I would really prefer to avoid that space overhead.
At first glance, object tracking seems to be the way to go. If my strings were tracked, there would automatically be only one copy of each string. However, I can't quite figure out how to do that. I don't really want to change the global tracking behavior for std::string, since that might end up affecting code all over the place. Serializing pointers would be an option, but then I would have to deal with the issue of when to deallocate the object. I suppose I could serialize shared pointers, but then I would have to change my implementation to use shared pointers internally. Alternatively, I could wrap a struct around the strings internally and turn on tracking for that struct, but again I would have to complicate my implementation for the sake of serialization.
What I was hoping to find was some way of telling the archive, at the point where I am serializing the object, that it should track this one object. I am thinking of something like
    ar & BOOST_SERIALIZATION_TRACK_THIS(mystring);
I was looking for something like that in the documentation, but wasn't very successful. Any ideas?
Thanks in advance for any help,
Nils
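For readers following along, the "global map" design described in the post could look roughly like the sketch below. This is only an illustration in plain C++: the names StringPool and SharedString are hypothetical, and the original implementation was not posted, so details such as the use of std::set are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical interning pool: every distinct string value is stored once.
class StringPool {
public:
    // Returns a pointer to the single stored copy of `s`.
    // Pointers to std::set elements stay valid when other elements are inserted.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }
    std::size_t size() const { return pool_.size(); }
private:
    std::set<std::string> pool_;
};

// Hypothetical handle type: copying it copies only a pointer.
class SharedString {
public:
    explicit SharedString(const std::string& s) : p_(pool().intern(s)) {}
    const std::string& get() const {
        assert(p_ != nullptr);  // invariant: a handle always refers to a pooled string
        return *p_;
    }
    // True when both handles refer to the same stored copy.
    bool same_storage(const SharedString& other) const { return p_ == other.p_; }
    static StringPool& pool() {
        static StringPool instance;  // plays the role of the "global map"
        return instance;
    }
private:
    const std::string* p_;
};
```

Two handles built from equal strings then share one stored copy, which is exactly the property that is lost if each handle is written to disk by value.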
Nils Krumnack wrote:
> Hi all, At first glance, object tracking seems to be the way to go. If my strings were tracked, there would automatically be only one copy of each string. However I can't quite figure out how. I don't really want to change the global tracking behavior for std::string, since that might end up affecting code all over the place.
> I was looking for something like that in the documentation, but wasn't very successful. Any ideas?
Your instinct is correct. It's just that std::string is special (it's the only "special" type with respect to Boost.Serialization). It is marked "primitive", so it will never be tracked.
For your situation the solution is to:
a) make your own string type:
    #include <boost/serialization/base_object.hpp>
    #include <boost/serialization/string.hpp>
    #include <boost/serialization/tracking.hpp>

    struct my_string : public std::string {
        template<class Archive>
        void serialize(Archive & ar, const unsigned int version){
            // serialize the std::string base part
            ar & boost::serialization::base_object<std::string>(*this);
        }
    };
    BOOST_CLASS_TRACKING(my_string, boost::serialization::track_always)
Now you have a special string that will be the same as std::string, except that this one will be tracked during serialization. Note that this keeps normal strings separate from these "special" strings, so you don't have to worry about weird surprises in other parts of your executable.
If a lot of your strings are duplicated, I think boost::flyweight might bear looking into.
Robert Ramey
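To make the idea of tracking concrete, here is a toy illustration in plain C++ of what tracking by address achieves: the first time an address is saved its contents are written, and later saves of the same address write only a small id. This is not Boost.Serialization's implementation or archive format; TrackingWriter, TrackingReader and the "N"/"R" record tags are hypothetical names made up for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy writer: new objects are written as "N <length> <bytes> ",
// repeated addresses as "R <id> ".
class TrackingWriter {
public:
    explicit TrackingWriter(std::ostream& os) : os_(os) {}
    void save(const std::string* p) {
        auto it = ids_.find(p);
        if (it != ids_.end()) {            // address seen before: write id only
            os_ << "R " << it->second << ' ';
            return;
        }
        const std::size_t id = ids_.size();
        ids_[p] = id;                      // remember the address
        os_ << "N " << p->size() << ' ' << *p << ' ';
    }
private:
    std::ostream& os_;
    std::map<const std::string*, std::size_t> ids_;
};

// Toy reader: load() returns an index into objects();
// equal indices mean "the same object was saved".
class TrackingReader {
public:
    explicit TrackingReader(std::istream& is) : is_(is) {}
    std::size_t load() {
        char tag = 0;
        is_ >> tag;
        if (tag == 'R') {
            std::size_t id = 0;
            is_ >> id;
            assert(id < objects_.size());
            return id;
        }
        assert(tag == 'N');
        std::size_t len = 0;
        is_ >> len;
        is_.get();                         // skip the single separator space
        std::string s(len, '\0');
        if (len > 0) is_.read(&s[0], static_cast<std::streamsize>(len));
        objects_.push_back(s);
        return objects_.size() - 1;
    }
    const std::vector<std::string>& objects() const { return objects_; }
private:
    std::istream& is_;
    std::vector<std::string> objects_;
};
```

In this sketch deduplication is keyed on the address, not on the value: two distinct strings with equal contents would each be written in full. That is why it only pays off when all users refer to a single shared copy of each string.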
On Dec 16, 2009, at 6:55 PM, Robert Ramey wrote:
> Nils Krumnack wrote:
>> Hi all, At first glance, object tracking seems to be the way to go. If my strings were tracked, there would automatically be only one copy of each string. However I can't quite figure out how. I don't really want to change the global tracking behavior for std::string, since that might end up affecting code all over the place.
>> I was looking for something like that in the documentation, but wasn't very successful. Any ideas?
> Your instinct is correct. It's just that std::string is special (it's the only "special" type with respect to Boost.Serialization). It is marked "primitive", so it will never be tracked.
> For your situation the solution is to:
> a) make your own string type:
>     struct my_string : public std::string {
>         template<class Archive>
>         void serialize(Archive & ar, const unsigned int version){
>             ar & boost::serialization::base_object<std::string>(*this);
>         }
>     };
>     BOOST_CLASS_TRACKING(my_string, boost::serialization::track_always)
> Now you have a special string that will be the same as std::string, except that this one will be tracked during serialization. Note that this keeps normal strings separate from these "special" strings, so you don't have to worry about weird surprises in other parts of your executable.
Thanks, I will try to do that.
> If a lot of your strings are duplicated, I think boost::flyweight might bear looking into.
Thanks, that would have saved some work. It is just what I needed. But it appears it doesn't have serialization support (as of 1.39).
Is that the usual thing where we pass the buck between the developers of the different libraries as to who has to implement this functionality? Particularly for flyweights this would be nice to have, given the aforementioned problem.
Thanks again,
Nils
> Robert Ramey
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Nils Krumnack wrote:
> On Dec 16, 2009, at 6:55 PM, Robert Ramey wrote:
>> If a lot of your strings are duplicated, I think boost::flyweight might bear looking into.
> Thanks, that would have saved some work. Right what I needed. But it appears it doesn't have serialization support (as of 1.39).
Hmm - I thought it did
> Is that the usual thing where we pass the buck between the developers of the different libraries as to who has to implement this functionality? Particularly for flyweights this would be nice to have, given the aforementioned problem.
Well, excuse me, I was only trying to help. Robert Ramey
On Dec 17, 2009, at 10:59 AM, Robert Ramey wrote:
> Nils Krumnack wrote:
>> On Dec 16, 2009, at 6:55 PM, Robert Ramey wrote:
>>> If a lot of your strings are duplicated, I think boost::flyweight might bear looking into.
>> Thanks, that would have saved some work. Right what I needed. But it appears it doesn't have serialization support (as of 1.39).
> Hmm - I thought it did
Well, I'll try to update to the latest boost version and see if it is there.
>> Is that the usual thing where we pass the buck between the developers of the different libraries as to who has to implement this functionality? Particularly for flyweights this would be nice to have, given the aforementioned problem.
> Well, excuse me, I was only trying to help.
Oh, I didn't mean to cause any offense. I greatly appreciate the Boost libraries in general, and the serialization library in particular. It is a very nice piece of work and I use it all the time.
Without taking anything away from that, I would prefer it if the other Boost libraries provided serialization support for their objects. However, it seems to fall between the cracks, which I think is kind of a shame.
I'd be happy to help by writing serialization code where it does not exist and I know the library. I am just not sure whether it would get accepted or would just be a waste of effort. I remember someone floated code for serializing filesystem::path objects on this mailing list an eternity ago, which was never incorporated into a stable release.
Thanks again,
Nils
> Robert Ramey
> I'd be happy to help by writing serialization code where it does not exist and I know the library. I am just not sure whether it would get accepted or would just be a waste of effort. I remember someone floated code for serializing filesystem::path objects on this mailing list an eternity ago, which was never incorporated into a stable release.
If you need to serialize something, you have to write the code. If you like what you've done, you can upload it to the vault. A few people have done this.
Adding to a stable release is a whole 'nother thing. It requires that the package work for all archives, tests which pass on almost all compilers, and documentation. Turns out this is a lot more effort than most people can justify. But all this doesn't apply if it's just added to the vault.
There is serialization for flyweight in the sandbox. However, it was implemented in a way which required changes in the base library and its interface. I couldn't accept this. But the code is still there if anyone cares to check it out.
Robert Ramey
>> I'd be happy to help by writing serialization code where it does not exist and I know the library. I am just not sure whether it would get accepted or would just be a waste of effort. I remember someone floated code for serializing filesystem::path objects on this mailing list an eternity ago, which was never incorporated into a stable release.
> If you need to serialize something, you have to write the code. If you like what you've done, you can upload it to the vault. A few people have done this.
> Adding to a stable release is a whole 'nother thing. It requires that the package work for all archives, tests which pass on almost all compilers, and documentation. Turns out this is a lot more effort than most people can justify. But all this doesn't apply if it's just added to the vault.
> There is serialization for flyweight in the sandbox. However, it was implemented in a way which required changes in the base library and its interface. I couldn't accept this. But the code is still there if anyone cares to check it out.
I might come back to you on that. I have been thinking for a while of contributing something to boost, and this would be much more manageable than a whole library. Cheers, Nils
> Robert Ramey
Nils Krumnack
>> There is serialization for flyweight in the sandbox. However, it was implemented in a way which required changes in the base library and its interface. I couldn't accept this. But the code is still there if anyone cares to check it out.
> I might come back to you on that. I have been thinking for a while of contributing something to boost, and this would be much more manageable than a whole library.
Hi Nils,
The serialization code you can find in the sandbox is probably of little use to you, since it requires deep modifications in Boost.Serialization itself that are not part of the official library.
An obvious way to serialize flyweight<T> objects is to simply serialize their underlying T values (you'll also need to use reset_object_address if T is of the tracked kind). This has the disadvantage that, when T is not tracked, you'll get many repetitions of the saved values, and thus big archives. The sandbox code tries to remedy this, but alas at the expense of modifications to Boost.Serialization, as mentioned above. I couldn't find a better solution, but I'd be delighted if someone finds one. Count on me for help/guidance if you decide to tackle the task.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
participants (3)
- Joaquin M Lopez Munoz
- Nils Krumnack
- Robert Ramey