[serialization] Problem with multiple archives and Intel compiler
I am having great difficulty architecting my Boost serialization
solution with the Intel compiler.
The problem is this: I need to support text, xml, and binary archive
formats. I allow the user to select which format to save in, by
default using the portable text format. If the user wants greater
speed, they can choose binary, or if they want to visually inspect the
data, they can choose xml. On load, I dynamically detect which
archive is used and call the appropriate load routine (text, xml,
binary).
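For concreteness, the load-time dispatch is along these lines (a minimal sketch; the function names and the byte-sniffing heuristic are illustrative, not my actual code):

// Sketch: peek at the first byte of the file and pick the matching
// archive type.  Assumes the default archive headers: "<?xml" for xml
// archives, a leading digit for text archives, anything else binary.
#include <fstream>
#include <string>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/nvp.hpp>

template <class Archive, class T>
void load_with(std::istream& is, T& obj) {
    Archive ia(is);
    // the nvp wrapper is required by xml archives, ignored by the others
    ia >> boost::serialization::make_nvp("obj", obj);
}

template <class T>
void detect_and_load(const std::string& filename, T& obj) {
    std::ifstream is(filename.c_str(), std::ios::binary);
    int c = is.peek();                       // look at first byte only
    if (c == '<')                            // xml archives begin "<?xml"
        load_with<boost::archive::xml_iarchive>(is, obj);
    else if (c >= '0' && c <= '9')           // text archives begin with a digit
        load_with<boost::archive::text_iarchive>(is, obj);
    else                                     // otherwise assume binary
        load_with<boost::archive::binary_iarchive>(is, obj);
}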
However, if I include all three archives in the same compilation unit,
the default (text) serialization slows down dramatically, with runtimes
increasing by approximately 50% (from 200 to 300 seconds, and this on a
fast 64-bit Opteron box).
I tried breaking things apart, putting my archive includes and
BOOST_CLASS_EXPORT calls into separate compilation units.
I have the following code structure:
class AbstractBase {
public:
    virtual ~AbstractBase() {}
    virtual void method() = 0;
};
BOOST_IS_ABSTRACT(AbstractBase)

class Derived : public AbstractBase {
public:
    virtual void method() {}
};

class A {
    friend class boost::serialization::access;
    AbstractBase* def;   // serialized through the base-class pointer
    template <class Archive>
    void serialize(Archive& ar, const unsigned int version) {
        ar & def;
    }
};
I break my serialization code into three different files, one for
each type of archive, each of which includes a common file that has
a templated method for loading a serialized file, along with
BOOST_CLASS_EXPORT for the derived class.
Here is the text version:
#include
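In outline, the common file and the text version look something like this (a sketch: the file and function names are illustrative, since the original listing is truncated above):

// common_load.hpp -- the shared include.
// NOTE: each including .cpp must include its archive header first, so
// that BOOST_CLASS_EXPORT registers Derived for that archive type.
#ifndef COMMON_LOAD_HPP
#define COMMON_LOAD_HPP
#include <fstream>
#include <boost/serialization/export.hpp>
#include <boost/serialization/nvp.hpp>
#include "a.hpp"                       // AbstractBase, Derived, A

BOOST_CLASS_EXPORT(Derived)

template <class Archive>
void load_file(const char* filename, A& a) {
    std::ifstream ifs(filename);
    Archive ia(ifs);
    ia >> boost::serialization::make_nvp("a", a);  // nvp keeps xml happy
}
#endif

// load_text.cpp -- the text version; one such translation unit per archive
#include <boost/archive/text_iarchive.hpp>   // before common_load.hpp!
#include "common_load.hpp"

void load_text(const char* filename, A& a) {
    load_file<boost::archive::text_iarchive>(filename, a);
}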
Bill Lear wrote:
Is there any way at all to fix this? I have run a profiler on the Intel code and it simply seems to be "doing more" than the non-Intel version (more comparison operators, chiefly).
There are two ways to approach a problem like this: a) try to work around it; b) try to fix it. My approach as a library user is to first try a), and often that does the trick. Well, you've tried a) and it turns out that it isn't easy. At some point one has to concede that the workaround is going to be more painful than trying to fix the problem. I think you've come to that point.

So I would go back to the profiler. It would seem that something is being called way too often. I would try starting up the debugger and putting a trap in that something, then check the backtrace to see what's going on. That is, you're basically debugging the STL implementation. This should reveal what one might do to fix this. For example, it's possible that for some reason the lookup is just very slow, and having more entries in the static tables makes that worse. I would hope that this kind of sleuthing would reveal the exact place to fix the problem once and for all.

I'm sorry I can't give you a better answer, but sometimes there is no other. If you do find this, let us know so we can roll it into the library.

Robert Ramey
On Monday, June 26, 2006 at 21:13:42 (-0700) Robert Ramey writes:
Bill Lear wrote: ... I'm sorry I can't give you a better answer, but sometimes there is no other. If you do find this, let us know so we can roll it into the library.
Not a problem. I did do some quick profiling, and all I found was that "operator <" was at the top of the list for the Intel compiler, but not for gcc. I'll see if I can gather more info.

I'm curious: for object tracking, do all objects that are tracked by pointer get put into the same STL collection (I think it's a set)?

This may sound crazy, but would it be possible to use a templated set based on the object type? So if I have 6,000 types of objects, I have 6,000 relatively small sets instead of one set (see the sketch below). I was just thinking that there is no point in saving all pointers in one set (aside from simplicity, a big reason of course). Perhaps this is just a red herring, but if that were possible, for very large data sets (we have HUUUUGE data sets), this might significantly improve performance. I'm sure it would be TRIVIAL to try this out :-).

Next question: after serializing the object and destroying all of the archive variables, does boost get rid of all of the data structures with object pointers? So, if I do:

{
    ofstream ofs("file", ios::binary);
    text_oarchive oa(ofs);
    oa << object;
}

when the scope exits, is the set tracking the object pointers destroyed?

Thanks for all of your help.

Bill
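Concretely, the per-type idea might look something like this (a hypothetical sketch with invented names, not a proposed patch to the library):

// One function-local static set per type T instead of a single
// set keyed on (address, type).
#include <set>

template <class T>
std::set<const T*>& tracked_set() {
    static std::set<const T*> s;   // 6,000 types -> 6,000 small sets
    return s;
}

template <class T>
bool track(const T* p) {           // true the first time p is seen
    return tracked_set<T>().insert(p).second;
}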
Bill Lear wrote:
On Monday, June 26, 2006 at 21:13:42 (-0700) Robert Ramey writes:
Bill Lear wrote: ... I'm sorry I can't give you a better answer, but sometimes there is no other. If you do find this, let us know so we can roll it into the library.
Not a problem. I did do some quick profiling and all I found was that "operator <" was at the top of the list for the Intel compiler, but not for gcc. I'll see if I can gather more info.
I'm curious: for object tracking, do all objects that are tracked by pointer get put into the same STL collection (I think it's a set)?
yes
This may sound crazy, but would it be possible to use a templated set based on the object type? So if I have 6,000 types of objects, I have 6,000 relatively small sets instead of one set. I was just thinking that there is no point in saving all pointers in one set (aside from simplicity, a big reason of course). Perhaps this is just a red herring, but if that were possible, for very large data sets (we have HUUUUGE data sets), this might significantly improve performance.
My thinking is exactly the opposite. I believe that putting all the tracking into one collection will facilitate "load-leveling".

Here are some random observations on serialization performance which are "on my list" but have yet to be implemented. This is largely because they won't affect most users, and they can't just be implemented without a lot of experimentation and profiling. So the programming isn't the bottleneck; it's determining what to program.

a) Currently, I believe (without the benefit of having profiled the code) that for small data sets, most of the time is consumed in setting up the STL collections used for tracking. This has been reported by one user who graciously took the time to run some good tests. So for lots of small serializations (like for inter-process communication), there are benefits to streamlining archive creation and perhaps re-use.

b) Currently, there are two slightly different sets used for tracking pointers. One is used to implement delete_created_pointers. These could be consolidated into one, which would help a lot for loading pointers.

c) For the STL collections used by the archives, there is no way to specify optimizations, which can have a huge impact. Longer term, one would like to be able to reserve memory in large chunks and/or to specify a custom allocator (see the sketch below). This would make a huge difference for saving/loading tracked information and permit "tuning" based on experimentation. Experimentation-based tuning is about the only way to do it, since results are necessarily going to depend upon the standard library implementation. Also, one size fits all can't do the job: setting large reserve sizes for collections is going to conflict with other users who want to invoke lots of tiny serializations.

Implementing this would require lots of testing, updates to the API and its documentation, etc. And currently: i) testing serialization is already consuming too many resources, and we're looking to cut it back; ii) enhancing the API and its documentation has already exhausted my patience, and I'm trying to cut back there as well.
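As a standalone illustration of the sort of tuning point (c) has in mind (this is not the library's internal code; the "tracked" record and the typedef are invented for the example):

// A node-based tracking set whose nodes come from a boost pool instead
// of one heap allocation per insert.
#include <set>
#include <boost/pool/pool_alloc.hpp>

struct tracked {
    const void* address;               // stand-in for the tracked object id
};
inline bool operator<(const tracked& l, const tracked& r) {
    return l.address < r.address;      // the operator< a profiler would see
}

// fast_pool_allocator hands out nodes from large chunks, so growing the
// set for a huge object graph avoids most of the per-node malloc cost.
typedef std::set<tracked, std::less<tracked>,
                 boost::fast_pool_allocator<tracked> > tracked_pool_set;

A pool-backed set like this trades per-node allocation for chunked allocation, which is exactly the kind of knob that would have to be exposed and then tuned per standard-library implementation.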
I'm sure it would be TRIVIAL to try this out:-).
great - let us know how it works out.
Next question: after serializing the object and destroying all of the archive variables, does boost get rid of all of the data structures with object pointers?
yes
So, if I do:

{
    ofstream ofs("file", ios::binary);
    text_oarchive oa(ofs);
    oa << object;
}
when the scope exits, is the set tracking the object pointers destroyed?
yes

Finally, I got the impression from your previous posts that your profiler experiments suggested that the problem was with the lookup into the static variable sets which contain one entry for each type and/or type-archive combination. There are a lot of lookups into these tables, but the tables are small, so it shouldn't consume a lot of time. On the other hand, the symptom you note (much slower when code is instantiated but never called) makes the tables bigger (still very small) but doesn't change anything else. Note that this is ENTIRELY different from the issue of time consumed by object tracking, which would be affected only by the archives actually used.

I believe that as programmers, we're too quick to fix stuff like this based on speculation. "To a man with a hammer in his hand, the whole world looks like a nail." I think the best thing is to step back and get more factual data. I would like to see the Jamfiles updated to add a few comprehensive profiling tests to the serialization library tests. These would take too much time to be run by the testers, so they would be commented out or conditioned on some environmental variable. Also, VC 7.1 has had the statistical profiler removed, so it would only work with GCC (mingw?). This would be a huge help.
Thanks for all of your help.
LOL - I don't think I helped you much - but you're welcome anyway.
Bill
Robert Ramey