Interest in serialization library

I recently wrote a small C++11 template library to unify the specification and construction of a serializable object. The library introduces a simple template markup language used to specify the layout of data on the wire, using that specification to construct the type in question much like a named tuple. A sample of the syntax used by the library is provided below:
    struct Object : serializable <Object,
        value <NAME("Field 1"), little_endian <uint32_t>>,
        value <NAME("Field 2"), big_endian <uint32_t>>>
    {
    };
The library is available under the MIT license at:
    http://www.github.com/molw5/framework
with associated documentation located at:
    http://molw5.github.com/framework
Is there any interest in including this library in Boost?
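To make the idea of a wire-layout specification concrete, here is a minimal sketch of how wrappers such as little_endian and big_endian could own the byte order while a single type list drives both writing and reading. It is not the library's implementation: Buffer, layout, and the read/write signatures are assumptions made for this illustration, and field names are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

typedef std::vector<std::uint8_t> Buffer;  // hypothetical byte sink/source

// Writes and reads an unsigned integer most significant byte first.
template <typename T>
struct big_endian {
    static_assert(std::is_unsigned<T>::value, "sketch handles unsigned integers only");
    static void write(T v, Buffer& out) {
        for (std::size_t i = sizeof(T); i-- > 0;)
            out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
    static T read(Buffer const& in, std::size_t& pos) {  // no bounds checking
        T v = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            v = static_cast<T>((v << 8) | in[pos++]);
        return v;
    }
};

// Writes and reads an unsigned integer least significant byte first.
template <typename T>
struct little_endian {
    static_assert(std::is_unsigned<T>::value, "sketch handles unsigned integers only");
    static void write(T v, Buffer& out) {
        for (std::size_t i = 0; i < sizeof(T); ++i)
            out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
    static T read(Buffer const& in, std::size_t& pos) {  // no bounds checking
        T v = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            v = static_cast<T>(v | (static_cast<T>(in[pos++]) << (8 * i)));
        return v;
    }
};

// Hypothetical field list: the order of wrappers is the order on the wire.
template <typename... Wire> struct layout;

template <> struct layout<> {
    static void write(Buffer&) {}
    static void read(Buffer const&, std::size_t&) {}
};

template <typename First, typename... Rest>
struct layout<First, Rest...> {
    template <typename T, typename... Ts>
    static void write(Buffer& out, T const& first, Ts const&... rest) {
        First::write(first, out);
        layout<Rest...>::write(out, rest...);
    }
    template <typename T, typename... Ts>
    static void read(Buffer const& in, std::size_t& pos, T& first, Ts&... rest) {
        first = First::read(in, pos);
        layout<Rest...>::read(in, pos, rest...);
    }
};
```

With `layout<little_endian<uint32_t>, big_endian<uint32_t>>`, the same type drives both directions, so the byte order of each field is stated once rather than once per function.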

On December 28, 2012 3:14:23 AM iwg molw5 <iwg.molw5@gmail.com> wrote:
I recently wrote a small C++11 template library to unify the specification and construction of a serializable object. The library introduces a simple template markup language used to specify the layout of data on the wire, using that specification to construct the type in question much like a named tuple. A sample of the syntax used by the library is provided below:
struct Object : serializable <Object, value <NAME("Field 1"), little_endian <uint32_t>>, value <NAME("Field 2"), big_endian <uint32_t>>> { };
The library is available under the MIT license at:
http://www.github.com/molw5/framework
with associated documentation located at:
http://molw5.github.com/framework
Is there any interest in including this library in Boost?
Does your library support non-intrusive approach of adding serialization support to user's classes? How does it compare to Boost.Serialization in terms of features and performance?

As requested, this post summarizes the major differences between Boost.Serialization and the "serializable" library. To begin, I will examine the two libraries in the context of this library's motivating use-case: protocol specification. It is not uncommon for an application to require full control over how its data is laid out on the wire; common examples include network protocols and proprietary archive formats. Native C++ implementations generally follow this pattern:
    struct Object
    {
        uint32_t x;
        uint32_t y;
    };
    void read (Stream& in, Object& out)
    {
        in >> out.x >> out.y;
        out.x = ntohl(out.x);
        out.y = ntohl(out.y);
    }
    void write (Object const& in, Stream& out)
    {
        out << htonl(in.x) << htonl(in.y);
    }
The primary advantage of this library over an implementation similar to the above is zero duplication of intent. A reformulation of the above follows:
    struct Object : serializable <Object,
        value <NAME("x"), big_endian <uint32_t>>,
        value <NAME("y"), big_endian <uint32_t>>>
    {
    };
Boost.Serialization is difficult to compare here - it is not really intended for this sort of application. A naive implementation using a binary archive, or similar, would likely result in code very similar to the basic read/write pattern given above, especially if the byte ordering is inconsistent across the message. On the other hand, it is not at all clear to me that a superficial extension of Boost.Serialization could not be used here to simplify this task dramatically - comments are welcome.
With the above in mind, the serializable library can accommodate other use-cases. For example, Boost.Serialization excels at applications (with some caveats of course) where the layout of data may be uniformly defined across the archive. Here the two are much easier to compare - specifically, this library allows for:
1. Zero duplication of intent
2. Little or no overhead over a native C implementation
3. Generic access to the container
Point 3 above is worth expanding on. Because an object's specification is contained within its type information, generic methods may be written that interact with the object's fields, not unlike Boost.Fusion. In particular, the exposed field names are used to implement a set of generic member-wise comparison methods between a pair of arbitrary serializable objects:
    inline_object <value <NAME("x"), int>> o1 {0};
    inline_object <value <NAME("y"), float>> o2 {1};
    assert(less_than(o1, o2));
On the other hand, Boost.Serialization retains some serious advantages here, such as:
1. Feature rich (native versioning/pointer/... support)
2. Portability (serializable requires C++11)
3. No member-access syntax overhead
Again, the third point above is worth expanding on - this library identifies variables by typename. Specifically, the NAME macro maps a string literal to a unique identifier used to access the associated variable. As a result, there is a syntax overhead over a raw C structure. Some examples:
    struct Object : serializable <Object, value <NAME("x"), int>>
    {
        ...
        int foo () { return base <NAME("x")>::get() + 1; }
    };
    int foo (Object& t) { return get <NAME("x")> (t) + 1; }
To summarize then - the performance of the two libraries varies dramatically depending on the use-case in question. I hope the above sufficiently illustrates the major differences between the two - if not, comments are welcome.
On Thu, Dec 27, 2012 at 8:42 PM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
Does your library support non-intrusive approach of adding serialization support to user's classes? How does it compare to Boost.Serialization in terms of features and performance?
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Non-intrusive serialization is supported in a manner similar to that used by Boost.Serialization. The feature set provided by the library is fairly limited compared to Boost.Serialization, particularly in areas where Boost.Serialization excels - see above. The performance of the library is difficult to comment on. The serializable library does not introduce any intrinsic overhead over an equivalent set of operations performed on a raw C structure - does that answer your question?
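The access-by-typename scheme mentioned in this message can be sketched as follows. This is an illustration only, not the library's code: plain tag structs (x_name, y_name) stand in for NAME("x") and NAME("y"), because the source does not show how the NAME macro turns a string literal into a type, and value, object, and get here are simplified stand-ins for the library's own templates.

```cpp
#include <cassert>

// Hypothetical tag types standing in for NAME("x") and NAME("y").
struct x_name {};
struct y_name {};

// A field: a name type paired with the stored value.
template <typename Name, typename T>
struct value {
    T data = T();
};

// An object is the set of its fields, inherited as base classes.
template <typename... Fields>
struct object : Fields... {};

// Name is given explicitly; T is deduced from the one base class whose
// first template argument matches Name.
template <typename Name, typename T>
T& get(value<Name, T>& field) { return field.data; }

template <typename Name, typename T>
T const& get(value<Name, T> const& field) { return field.data; }
```

A call such as `get<x_name>(o)` resolves entirely at compile time to a member access, which is consistent with the claim of no intrinsic overhead, while the extra syntax compared with `o.x` is the member-access cost listed above.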

On Thu, Dec 27, 2012 at 9:39 PM, iwg molw5 <iwg.molw5@gmail.com> wrote:
struct Object : serializable <Object, value <NAME("x"), big_endian <uint32_t>>, value <NAME("y"), big_endian <uint32_t>>> { };
A "serialization" library is concerned with writing and reading the state of objects depending on their type. In general, this is not equivalent to simply writing and reading of the objects' members.
What you're defining seems to be a dictionary-type of structure: type-safe map of names to objects. It can be implemented as follows:
    struct dictionary;
    shared_ptr<dictionary> create_empty_dictionary();
    template <class T> T & bind_element( dictionary &, name const & );
Given a dictionary d, we can create/bind elements it contains:
    uint32_t & x=bind_element<uint32_t>(d,name("x"));
    uint32_t & y=bind_element<uint32_t>(d,name("y"));
Serialization can then be supported through a separate library (for example Boost Serialization) in terms of loading and saving dictionary objects (implemented in terms of loading and saving the types they contain, according to the serialization library interface).
Emil Dotchevski
Reverge Studios, Inc.
http://www.revergestudios.com/reblog/index.php?n=ReCode
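One possible way to fill in the dictionary interface declared in this message is sketched below. The three declarations come from the message; everything inside them is an assumption: name is taken to wrap a std::string, elements are stored behind shared_ptr<void>, and the type check happens at run time through std::type_index.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Assumed representation of a name: a thin wrapper around a string.
struct name {
    std::string text;
    explicit name(std::string s) : text(std::move(s)) {}
};

struct dictionary {
    struct entry {
        std::type_index type;           // type the element was first bound with
        std::shared_ptr<void> storage;  // owns the element
    };
    std::map<std::string, entry> entries;
};

inline std::shared_ptr<dictionary> create_empty_dictionary() {
    return std::make_shared<dictionary>();
}

// Returns the element called n, creating it (value-initialized) on first use.
// Binding an existing name with a different type throws.
template <class T>
T& bind_element(dictionary& d, name const& n) {
    auto it = d.entries.find(n.text);
    if (it == d.entries.end()) {
        dictionary::entry e{std::type_index(typeid(T)), std::make_shared<T>()};
        it = d.entries.insert(std::make_pair(n.text, e)).first;
    } else if (it->second.type != std::type_index(typeid(T))) {
        throw std::runtime_error("element bound with a different type: " + n.text);
    }
    return *static_cast<T*>(it->second.storage.get());
}
```

Unlike the template specification discussed in the thread, this variant resolves names and checks types at run time, so the set of fields is not visible in the object's type.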

On Thu, Dec 27, 2012 at 11:23 PM, Emil Dotchevski <emildotchevski@gmail.com> wrote:
A "serialization" library is concerned with writing and reading the state of objects depending on their type. In general, this is not equivalent to simply writing and reading of the objects' members.
What you're defining seems to be a dictionary-type of structure: type-safe map of names to objects. It can be implemented as follows:
struct dictionary; shared_ptr<dictionary> create_empty_dictionary();
template <class T> T & bind_element( dictionary &, name const & );
Given a dictionary d, we can create/bind elements it contains:
uint32_t & x=bind_element<uint32_t>(d,name("x")); uint32_t & y=bind_element<uint32_t>(d,name("y"));
Serialization can then be supported through a separate library (for example Boost Serialization) in terms of loading and saving dictionary objects (implemented in terms of loading and saving the types they contain, according to the serialization library interface.)
Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode
The library introduces a simple mark-up language used to map common serialization routines to data members - the two are logically independent. For example, you may want to serialize x and y as members of a bit field:
    struct Object : serializable <Object,
        bit_field <little_endian <uint16_t>,
            value <NAME("x"), bit_value <4, uint32_t>>,
            value <NAME("y"), bit_value <10, uint32_t>>>>
    {
    };
The specification above should be read as:
- Provide a word-aligned bit field for the serialization of x and y
- Read the first 4 bits in the bit field into x, stored as uint32_t
- Read the next 10 bits in the bit field into y, stored as uint32_t
- Discard the remaining two bits
Note - I'm still not entirely happy with the bit field implementations I've put together. The above is intended to be illustrative of the syntax only - a bit_field wrapper is not yet present in the library. The value structure above defines the type of x (in this case uint32_t, or rather a simple wrapper around that type); the remaining information is used to define the serialization of Object.
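The four steps of that reading can be written out by hand as a small sketch. Since the message says no bit_field wrapper exists in the library yet, this is purely illustrative. It assumes the 16-bit word is little-endian on the wire and that "first" means the least significant bits; the message does not specify the bit order, and Fields, pack, and unpack are names invented here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain struct holding the decoded values.
struct Fields {
    std::uint32_t x;  // 4 bits on the wire
    std::uint32_t y;  // 10 bits on the wire
};

// Decodes a little-endian 16-bit word; fields are taken from bit 0 upward.
inline Fields unpack(std::uint8_t const bytes[2]) {
    std::uint16_t word = static_cast<std::uint16_t>(bytes[0] | (bytes[1] << 8));
    Fields f;
    f.x = word & 0xFu;           // first 4 bits
    f.y = (word >> 4) & 0x3FFu;  // next 10 bits
    return f;                    // remaining 2 bits are discarded
}

// Encodes the fields back into the same layout; the 2 unused bits are zero.
inline void pack(Fields const& f, std::uint8_t bytes[2]) {
    std::uint16_t word =
        static_cast<std::uint16_t>((f.x & 0xFu) | ((f.y & 0x3FFu) << 4));
    bytes[0] = static_cast<std::uint8_t>(word & 0xFFu);
    bytes[1] = static_cast<std::uint8_t>(word >> 8);
}
```

Under the opposite bit-order convention (most significant bits first), only the shift amounts and masks change.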

On Fri, Dec 28, 2012 at 9:39 AM, iwg molw5 <iwg.molw5@gmail.com> wrote:
Non-intrusive serialization is supported in a manner similar to that used by Boost.Serialization.
Could you elaborate? A short example would certainly help.
The performance of the library is difficult to comment on. The serializable library does not introduce any intrinsic overhead over an equivalent set of operations performed on a raw C structure - does that answer your question?
Does the library control the validity of the serialized data when deserializing? Is there an associated overhead?

On Fri, Dec 28, 2012 at 12:38 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
Could you elaborate? A short example would certainly help.
    struct S
    {
        int x;
    };
    namespace framework { namespace serializable {
        template <>
        struct serializable_specification <S>
        {
            template <typename Input>
            static bool read (Input& in, S& out)
            {
                return serializable_specification <little_endian <int>>::read(in, out.x);
            }
            template <typename Output>
            static bool write (S const& in, Output& out)
            {
                return serializable_specification <little_endian <int>>::write(in.x, out);
            }
        };
    }}
Admittedly, the above is not particularly palatable - I wanted to introduce nested serializable_specification calls and avoid referencing operator types above. The library does not constrain the stream type - the above could just as easily assume the presence of stream operators and use those. See:
    framework/serializable/custom_serialization.cpp
for a more complete example and:
    framework/serializable/custom_serialization_boost.cpp
for a syntax closer to that used by Boost.
Does the library control the validity of the serialized data when deserializing? Is there an associated overhead?
The library supports custom wrapper types (such as little_endian) and custom implementations (not introduced above - see custom_implementation.cpp or the documentation's tutorial) depending on the type of input validation required.
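As a rough illustration of validation through a custom wrapper type, the sketch below layers a range check over a wire-format wrapper, following the `static bool read (Input&, T&)` convention from the example earlier in this message. None of these names come from the library: ByteInput, le_u16, and bounded are hypothetical, and how the library itself reports failure is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input type: a cursor over a byte buffer.
struct ByteInput {
    std::vector<std::uint8_t> data;
    std::size_t pos;
    bool get(std::uint8_t& b) {
        if (pos >= data.size()) return false;
        b = data[pos++];
        return true;
    }
};

// Hypothetical wire wrapper: unsigned 16-bit value, little-endian.
struct le_u16 {
    typedef std::uint16_t value_type;
    template <typename Input>
    static bool read(Input& in, value_type& out) {
        std::uint8_t lo = 0, hi = 0;
        if (!in.get(lo) || !in.get(hi)) return false;  // truncated input
        out = static_cast<value_type>(lo | (hi << 8));
        return true;
    }
};

// Hypothetical validating wrapper: delegates to Wire, then range-checks.
// The output is left untouched when reading or validation fails.
template <typename Wire, typename Wire::value_type Min, typename Wire::value_type Max>
struct bounded {
    typedef typename Wire::value_type value_type;
    template <typename Input>
    static bool read(Input& in, value_type& out) {
        value_type tmp = 0;
        if (!Wire::read(in, tmp)) return false;
        if (tmp < Min || tmp > Max) return false;  // value outside [Min, Max]
        out = tmp;
        return true;
    }
};
```

In a design like this the cost of validation is the comparison itself, paid only by fields that opt into the wrapper; fields read through the bare wire wrapper are unaffected.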

On Fri, Dec 28, 2012 at 1:07 AM, iwg molw5 <iwg.molw5@gmail.com> wrote:
framework/serializable/custom_serialization.cpp
for a more complete example and:
framework/serializable/custom_serialization_boost.cpp
Slight correction - the paths above were missing the "examples" directory:
    framework/examples/serializable/custom_serialization.cpp
    framework/examples/serializable/custom_serialization_boost.cpp

On Fri, Dec 28, 2012 at 12:07 PM, iwg molw5 <iwg.molw5@gmail.com> wrote:
On Fri, Dec 28, 2012 at 12:38 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
Could you elaborate? A short example would certainly help.
struct S { int x; };
namespace framework { namespace serializable { template <> struct serializable_specification <S> { template <typename Input> static bool read (Input& in, S& out) { return serializable_specification <little_endian <int>>::read(in, out.x); }
template <typename Output> static bool write (S const& in, Output& out) { return serializable_specification <little_endian <int>>::write(in.x, out); } }; }}
Admittedly, the above is not particularly palatable - I wanted to introduce nested serializable_specification calls and avoid referencing operator types above. The library does not constrain the stream type - the above could just as easily assume the presence of stream operators and use those. See:
framework/serializable/custom_serialization.cpp
for a more complete example and:
framework/serializable/custom_serialization_boost.cpp
for a syntax closer to that used by boost.
I see. It would be great if the library provided a way to define the serialization format as a grammar separately from the object definition - perhaps similarly to how Boost.Fusion adapts structs or in pure C++ syntax without macros in Boost.Spirit style; whichever works best. The serializable_specification approach can be useful for tricky cases but as a general solution it doesn't provide a good readable description of the target format.
One more question. Does the library provide a portable binary format for floating point numbers?

On Fri, Dec 28, 2012 at 1:45 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
I see. It would be great if the library provided a way to define the serialization format as a grammar separately from the object definition - perhaps similarly to how Boost.Fusion adapts structs or in pure C++ syntax without macros in Boost.Spirit style; whichever works best. The serializable_specification approach can be useful for tricky cases but as a general solution it doesn't provide a good readable description of the target format.
I agree - I've toyed with similar approaches, but I have not yet found one I'm satisfied with.
One more question. Does the library provide a portable binary format for floating point numbers?
No, nothing reliable. Technically the little_endian and big_endian wrappers do use a floating-point host endianness flag, FRAMEWORK_HOST_ENDIANNESS_FLOAT, and adjust their behaviour accordingly; however, the conversion implementations are extremely rudimentary at the moment. Ideally I'd like to tie the implementation of those wrappers to Boost.Endian or similar; I don't believe a library-specific solution is suitable here.
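For reference, one common way to get a fixed wire format for float is to copy its bit pattern into an integer and emit the bytes in a fixed order. The sketch below does that and is not what the library does. It assumes the host float is IEEE 754 binary32 (checked at compile time) and that the host stores floats and 32-bit integers with the same byte order, which is exactly the kind of assumption a flag such as FRAMEWORK_HOST_ENDIANNESS_FLOAT exists to question. The function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes an IEEE 754 binary32 float");

// Writes the float's bit pattern as 4 bytes, least significant byte first.
inline void write_float_le(float value, std::uint8_t out[4]) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined type punning
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<std::uint8_t>(bits >> (8 * i));
}

// Rebuilds the bit pattern from 4 little-endian bytes and copies it back.
inline float read_float_le(std::uint8_t const in[4]) {
    std::uint32_t bits = 0;
    for (int i = 0; i < 4; ++i)
        bits |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Hosts whose floating-point format is not IEEE 754 would need a real conversion rather than a byte shuffle, which this sketch does not attempt.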

On Fri, Dec 28, 2012 at 1:45 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
I see. It would be great if the library provided a way to define the serialization format as a grammar separately from the object definition - perhaps similarly to how Boost.Fusion adapts structs or in pure C++ syntax without macros in Boost.Spirit style; whichever works best. The serializable_specification approach can be useful for tricky cases but as a general solution it doesn't provide a good readable description of the target format.
I added some sample code that accomplishes the above - see:
    examples/serializable/non_intrusive_wrappers_1.cpp
Sample syntax:
    struct Object
    {
        uint32_t x;
        uint32_t y;
        uint32_t z;
    };
    using ObjectSpecification = alias <
        link <LINK(Object::x), little_endian <uint32_t>>,
        link <LINK(Object::y), little_endian <uint32_t>>,
        link <LINK(Object::z), little_endian <uint32_t>>>;
    BIND(ObjectSpecification, Object)
I'm not particularly happy with that implementation. The above syntax is tolerable and the implementation is light; however, that kind of detached specification is not really compatible with generic member access. Another possible approach involves binding the members of Object to a temporary serializable object; this would avoid the preceding problems, but it comes with its own set of drawbacks. I'll add that implementation to a similar example under non_intrusive_wrappers_2.cpp sometime this weekend. Perhaps one of these was what you were looking for - if not, comments are welcome.
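A detached specification of this kind can be approximated with pointer-to-member template arguments, as in the sketch below. It is not the code in non_intrusive_wrappers_1.cpp. The LINK and BIND macros are not reproduced, so each link spells out the class, member type, and member pointer by hand; le_u32, be_u32, and Buffer are hypothetical helpers; and everything lives in a sketch namespace to avoid clashing with the POSIX link() function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

typedef std::vector<std::uint8_t> Buffer;  // hypothetical byte sink

// Hypothetical wire wrappers for a 32-bit unsigned value.
struct le_u32 {
    static void write(std::uint32_t v, Buffer& out) {
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
};
struct be_u32 {
    static void write(std::uint32_t v, Buffer& out) {
        for (int i = 3; i >= 0; --i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
};

// Ties one existing data member to a wire format without touching the class.
template <typename Class, typename T, T Class::*Member, typename Wire>
struct link {
    static void write(Class const& obj, Buffer& out) { Wire::write(obj.*Member, out); }
};

// An ordered list of links; the order is the order on the wire.
template <typename... Links> struct alias;

template <> struct alias<> {
    template <typename Class>
    static void write(Class const&, Buffer&) {}
};

template <typename First, typename... Rest>
struct alias<First, Rest...> {
    template <typename Class>
    static void write(Class const& obj, Buffer& out) {
        First::write(obj, out);
        alias<Rest...>::write(obj, out);
    }
};

// The user's class stays a plain struct.
struct Object {
    std::uint32_t x;
    std::uint32_t y;
};

typedef alias<
    link<Object, std::uint32_t, &Object::x, le_u32>,
    link<Object, std::uint32_t, &Object::y, be_u32> > ObjectSpecification;

}  // namespace sketch
```

Because the specification only refers to members by pointer, the field names never become part of Object's type, which matches the concern about generic member access raised in this message.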
participants (3)
- Andrey Semashev
- Emil Dotchevski
- iwg molw5