[serialization] deserializing asynchronously serialized types

I have an unusual use case for boost.serialization, and I was wondering if it would be possible to adapt it to my needs:

- I have a set of over 100 types, and instances of each are generated asynchronously, then serialized to a file in that order.
- The most interesting serialized data will be written just before the power is unexpectedly cut.
- I need to load and run on as much data as possible when reading the serialized data back, ignoring incomplete data at the end (due to a power cut).
- The basic Boost serialization examples require you to know the type of the next piece of data to be loaded when reading. Since these types are generated asynchronously, they are not known in advance.
- I need to write the data out immediately when it arrives because of the power issue.
- Files will grow to around 150 GB as binary archives, so the data can't be marshaled in memory; it needs to be written immediately, even if that is redundant.

Is there a way to read such a serialized file back using the facilities provided in boost.serialization?

Here are my current ideas:

- I tried using boost.variant, but it loses its will to compile when I increase the typelist limit to around ~60 types on gcc 4.4, and I have more than a hundred.
- Use preprocessor metaprogramming to do something equivalent to boost.variant, but I would very much prefer a more pleasant option.
- Serialize an index or custom header indicating the next type to appear.
- One way of achieving some of these goals is writing one piece at a time using a binary archive to an fstream, with the index mentioned above separating data types (see the sketch after this message).

I don't know which aspects of my requirements will prove to be a problem, so if anyone can provide advice that would help me avoid a major pitfall, it would be greatly appreciated.

Thanks for your thoughts.

Cheers! Andrew Hundt
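A minimal sketch of the last idea above (an index plus an immediate write per record), assuming Boost binary archives over a std::ofstream; the sample_a/sample_b types and the write_record helper are hypothetical stand-ins for the real types, not part of the original message:

#include <fstream>
#include <boost/archive/binary_oarchive.hpp>

// Hypothetical stand-ins for two of the ~100 real types.
struct sample_a {
    int x;
    template<class Archive> void serialize(Archive &ar, unsigned /*version*/) { ar & x; }
};
struct sample_b {
    double y;
    template<class Archive> void serialize(Archive &ar, unsigned /*version*/) { ar & y; }
};

// Write one record as (type index, value) and flush right away, so everything
// up to the last complete record is on its way to disk before a power cut.
template<class T>
void write_record(boost::archive::binary_oarchive &ar, std::ofstream &os,
                  unsigned type_index, const T &value)
{
    ar << type_index << value;
    os.flush();
}

int main()
{
    std::ofstream os("log.bin", std::ios::binary);
    boost::archive::binary_oarchive ar(os);

    sample_a a = { 42 };
    sample_b b = { 3.14 };
    write_record(ar, os, 1, a);  // in the real system these arrive asynchronously
    write_record(ar, os, 2, b);
}

Note that flushing the ofstream only pushes the bytes to the OS; surviving a hard power cut may additionally require an fsync-style call on the underlying file descriptor.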

Andrew Hundt wrote:
I have an unusual use case for boost.serialization, and I was wondering if it would be possible to adapt it to my needs:
- I have a set of over 100 types, and instances of each are generated asynchronously, then serialized to a file in that order.
- The most interesting serialized data will be written just before the power is unexpectedly cut.
- I need to load and run on as much data as possible when reading the serialized data back, ignoring incomplete data at the end (due to a power cut).
- The basic Boost serialization examples require you to know the type of the next piece of data to be loaded when reading. Since these types are generated asynchronously, they are not known in advance.
- I need to write the data out immediately when it arrives because of the power issue.
- Files will grow to around 150 GB as binary archives, so the data can't be marshaled in memory; it needs to be written immediately, even if that is redundant.
Is there a way to read in that serialized file using the facilities provided in boost.serialization?
Here are my current ideas:
- I tried using boost.variant, but it loses its will to compile when I increase the typelist limit to around ~60 types on gcc 4.4, and I have more than a hundred.
- Use preprocessor metaprogramming to do something equivalent to boost.variant, but I would very much prefer a more pleasant option.
- Serialize an index or custom header indicating the next type to appear.
- One way of achieving some of these goals is writing one piece at a time using a binary archive to an fstream, with the index mentioned above separating data types.
I don't know what aspects of my requirements will prove to be a problem, so if anyone can provide advice that would help me avoid a major pitfall, it would be greatly appreciated.
Thanks for your thoughts.
Cheers! Andrew Hundt
You could still make your own "special purpose variant". Look at the section "serialization wrappers".

struct my_wrapper {
    unsigned m_i;
    union {                      // pointers rather than references, so the anonymous union is legal C++
        type1 *m_t1;
        type2 *m_t2;
        // ....
    };
    my_wrapper(type1 &t1) : m_i(1), m_t1(&t1) {}
    my_wrapper(type2 &t2) : m_i(2), m_t2(&t2) {}
    // ...
};

template<class Archive>
void save(Archive &ar, const my_wrapper &w, unsigned int version){
    ar << w.m_i;
    switch(w.m_i){
    case 1: ar << *w.m_t1; break;
    case 2: ar << *w.m_t2; break;
    // ....
    }
}

template<class Archive>
void load(Archive &ar, my_wrapper &w, unsigned int version){
    ar >> w.m_i;
    switch(w.m_i){
    case 1: ar >> *w.m_t1; break;
    case 2: ar >> *w.m_t2; break;
    // ....
    }
}

So now you could just use:

ar << my_wrapper(t); // where t is any one of the 100 types

This is basically a poor man's variant which doesn't use compile time coding. Another idea - trickier - would be to use a variant of variants to get around the compiler limitations.

Robert Ramey
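On the reading side (not shown in the messages above), a loop along these lines could recover everything up to the truncated tail. It assumes each record was written as a type index followed by the value, as in the wrapper's save() above; the sample_a/sample_b types are the same hypothetical stand-ins used in the earlier writer sketch:

#include <fstream>
#include <iostream>
#include <boost/archive/binary_iarchive.hpp>

// Same hypothetical record types as in the writer sketch.
struct sample_a {
    int x;
    template<class Archive> void serialize(Archive &ar, unsigned /*version*/) { ar & x; }
};
struct sample_b {
    double y;
    template<class Archive> void serialize(Archive &ar, unsigned /*version*/) { ar & y; }
};

int main()
{
    std::ifstream is("log.bin", std::ios::binary);
    boost::archive::binary_iarchive ar(is);

    // Read (index, value) pairs until the archive throws, which happens either
    // at a clean end of file or at a half-written record left by the power cut.
    try {
        for (;;) {
            unsigned type_index = 0;
            ar >> type_index;
            switch (type_index) {
            case 1: { sample_a a; ar >> a; /* use a */ break; }
            case 2: { sample_b b; ar >> b; /* use b */ break; }
            default:
                std::cerr << "unknown type index " << type_index << ", stopping\n";
                return 1;
            }
        }
    } catch (const std::exception &e) {
        std::cerr << "stopped reading: " << e.what() << "\n";
    }
    return 0;
}

The same loop works with the my_wrapper approach: the switch on the index simply moves into load(), and the reader stops cleanly when the archive throws at the incomplete tail.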
participants (2)
-
Andrew Hundt
-
Robert Ramey