
Here's my review of the library. It's based on reading docs and source for several hours, as well as using almost every previous version of the library on my project. I think the library should be accepted. I voted "no" the previous time, but I think library was considerably improved. I'd also like to thank Robert for pushing this effort forward! Detailed comments are below. Some of them are marked as "critical" -- it means I believe they should be addressed before the library is added to official distribution. ** General issues ** -- (critical) The primary issue with the library is lack of reference documentation -- exact description of semantics for every method (preferrable in C++ std style) and overview for all important classes. The library is non-trivial, and without such documentation user will have to experiment to find out that exact semantic. Better internal docs (comments) are also needed. For example, I'm looking at save_access::end_preamble and don't know what it does. Comment for basic_oarchive::end_preamble says what this method is for, but does not say when this method is called, which makes it hard to do anything. Again, implementation is non-trivial. Lack of comments make it somewhat dangerous to use the library -- if I hit a bug while Robert is on vacation, I'm not sure I'll be able to quickly fix it locally. As an illustraction, I've already had a couple of bugs each costing several hours to debug. -- (critical) While BOOST_CLASS_EXPORT is a big step forward, I think it can be further improved. Now, if I export a class I need to include some archive headers before invoking BOOST_CLASS_EXPORT. The class can be only serialized to archives which were included. This means that to really export a class, I need to include *all* archive headers. That's not good. What I suggest is - create new class 'polymorphic_archive', which will have virtual functions for saving/loading and create templated 'polymorphic_wrapper', which can be used to add this polymorphic interface to existing archive. - always register all exported classes with polymorphic archive. When serializing in a specific archive, check if class is registered with that archive. If not, serialize it via polymprphic wrapper. In fact, I'd be willing to help with polymorphic archive implementation, but some changes inside serialization library will be required. -- (critical) I suspect BOOST_CLASS_EXPORT does not play nice with templates. Say I'm implementing RPC system: class function_call_base { virtual ~function_call_base() {} } ; template<class Arg> class function_call1 : public function_call_base {}; // in some code rpc_server s; s.register("foo", &foo); function_call_base* c; while(archive >> c) (*c)(); The idea is the caller will serialize instances of function_call1 and the callee will deserialize it and call the proper function. To make this work, we have to BOOST_CLASS_EXPORT all used instantinations of function_call1. We can assume that all functions (and therefore all argument types) are passed to the 'register' function and can function can invoke BOOST_CLASS_EXPORT. Now the problem: BOOST_CLASS_EXPORT can be used only at top-level of a translation unit. There's not way 'register' and automatically export a class. The solution I'd like is: template<class T> class export_class { public: static void instantiate(); }; So that the object of this type can be created anywhere. -- One usage of the serialization library I have in mind is taking "debug dumps" for an application. After the run, a separate tool can be used to look at those dumps. The size of data can potentially be rather large (e.g. 100*640*480 or maybe more), so I'd rather not to load all data, but build some index. So the question is: how is it possible to access specific part of archive without reading it all. I understand it's not possible in general, but in some cases (e.g. BOOST_CLASS_EXPORT for everything) it should work. -- The documentation of 'make_binary_object' is very incomplete and does not tell the exact semantic. As far as I can tell, the function can be only used to load/store objects which have fixed size and will crash when saving/loading new[]-d array -- which seemed an obvious use-case for me. I would suggest that we have two functions: make_pod_wrapper -- which is templated, has one parameter -- object, and stores sizeof(T) bytes. Another name -- make_static_array_wrapper. make_dynarray_wrapper -- which can be used to store/load arrays allocated by new[] -- The library requires exact match between static types for saving and loading. For example: D* d = new D; ia << d; B* b = new B; ia >> b; does not work. Moreover, it does not work even if 'D' is BOOST_CLASS_EXPORT-ed. I'd suggest that this case should work, since it's very easy to save derived classes using pointer to derived class, without casting to base class. -- It appears that XML archive require member/variable names in all cases. I'm not sure that's reasonable -- given that the library requires exact correspondence between loading and saving order even for XML archives. It would load archive back even if I change all the names manually in XML file right? Then why requiring names? -- I have some concerns about XML archives. The point of XML, in general is that it's somewhat readable by humans, and readable by other applications written using different libraries or in different languages. However, when I look at: <c2 class_id_reference="1" object_id="_2"> <i>10</i> <j>4</j> <base object_id="_3"></base> </c2> it's not clear if any of those benefits are there. This actually corresponds to a class "C". To find out this, one has to find "class_id" attribute with value of "1" somewhere above. If the above was: <C name="c" ...> finding all "C" elements would be trivial with XPath, or even with more simple methods. As it stands now, even finding all instances of "C" is hard. There may be a number of similiar issues. If the goal is to produce XML which can be processed by other programs, then at least the class name should become element name. -- I believe that library should not change codecvt facet of the stream. There are two reasons. First, if user has already imbued some facet (say utf-8), he might have good reasons to do so (say, he believes utf-8 is the most space-efficient for his data). So the serializaton library should not silently change codecvt to ucs-2. Second, the whole locale stuff is complex, and it's not even clear if it's possible to change codecvt facet on the fly, and when the change will take effect, and so on. I believe requiring user to 'imbue' proper facet himself is better than some unexpected internal effects. -- I've tried to play with "implementation level" and created the example at http://zigzag.cs.msu.su:7813/tracking.cpp It stores a pointer to a single class with two data members and a single base class. I get the following archive content. 1 1 C 0 10 4 1 0 1 Is seems too much for a class with implementation level of "object serializable". 10 and 4 is the data itself. "1 C" is probably class name, but what other data is? ** Implementation issues ** -- I suggest that all MPL expressions in the library are refactored for better readability. For example, the expression in save_non_pointer_type::invoke is 28 lines long. I think it should be possible to extract some of the subexressions and create a typedef with good name -- looks like instantination of those expression in all cases should not cause compile errors. ** Custom archive creation ** I've tried to create a custom archive class (in fact, polymorphic_oarchive), and got some comments. - Documentation says the derived class must provide operator<< and method 'load'. It also says there common_iarchive::operator<<. I wonder why the derived class should provide operator<<. Can't base operator call 'load' directly? - Even if not, why derived operator<< should have return * This(); not plain "return *this"? After all, we're in most derived object - It's needed to add friend class boost::serialization::load_access; to class declaration. However, it's not stated where this type is declared, and in fact in declared in *internal* header boost/archive/detail/iserializer.hpp And, finally, the right syntax seems to be: friend class boost::archive::load_access; (Note the namespace!) - The example uses: boost::serialization::load(* This(), t); Again, what header is this function in? The answer is that it's in the same header as above, and the line should be: boost::archive::load(* This(), t); - The documentation should really state the minimal set of 'load'/'save' overloads which will make archive usable. For example, it's probably not necessary to provide separate overload for 'bool', right? - (critical) It should be explained how the newly-created class can be tested. As it stands, one should either do superficial tests or try to integrate the new archive into existing tests. As for latter -- I don't know how to do that, and also, I think it's not necessary to run all the tests with the new archive. For example, serialization of vector and map is not likely to be broken with the new type. What I'm looking for is a test which saves/loads all possible types and verifies results. ** Various comments ** -- The BOOST_CLASS_EXPORT macros defines an explicit instantiation of a function. gcc rejests this when BOOST_CLASS_EXPORT is used inside some namespace. I've asked on comp.std.c++: http://groups.google.com/groups?q=Explicit+instantiation&hl=en&lr=&ie=UTF-8&safe=off&selm=u5boi1-7te.ln1% 40zigzag.lvk.cs.msu.su&rnum=3 and got a reply directing me to: http://std.dkuug.dk/jtc1/sc22/wg21/docs/cwg_defects.html#275 Which means gcc is mostly right. So, the macro should either be changed somehow, or at least the restriction should bes documented -- using BOOST_CLASS_EXPORT outside of any namespaces is OK. -- Docs and code use "const unsigned long int version" everywhere. Is that really necessary, as opposed to "unsigned version"? -- The library saves char* as arrays of integers. The rationale section says: input of char * and wchar_t * could be altered to create a buffer overrun and potential security problem. I don't understand this point at all. How storing the elements as integer values helps with the buffer overrun? Moreover, library stores char* internally (e.g. for archive signature). What if that string is modified? Will that cause buffer overrun? -- The following program (yes, of two lines), causes a compile error: #include <boost/archive/text_oarchive.hpp> #include <boost/serialization/export.hpp> It's because export.hpp uses 'instantiate_pointer_iserializer' but does not include the header where that class is defined. 'text_oarchive.hpp' does not include it either. The error goes away if I include #include <boost/archive/text_iarchive.hpp> but a better diagnostic is desirable. - Volodya
participants (1)
-
Vladimir Prus