
Bill Lear wrote:
On Monday, June 26, 2006 at 21:13:42 (-0700) Robert Ramey writes:
Bill Lear wrote: ... I'm sorry I can't give you a better answer, but sometimes there is no other. If you do find the cause, let us know so we can roll the fix into the library.
Not a problem. I did some quick profiling, and all I found was that operator< was at the top of the list for the Intel compiler, but not for gcc. I'll see if I can gather more info.
I'm curious: for object tracking, do all objects that are tracked by pointer get put into the same STL collection (I think it's a set)?
yes
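Roughly speaking - and this is just an illustration of the idea, not the library's actual internals - the tracking amounts to something like:

    #include <set>

    struct archive_state {
        // one set per archive, shared by all object types
        std::set<const void *> tracked;

        // returns true the first time a pointer is seen by this archive;
        // the operator< on the addresses is what shows up in a profile
        bool track(const void * p){
            return tracked.insert(p).second;
        }
    };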
This may sound crazy, but would it be possible to use a templated set based on the object type? Then if I have 6,000 types of objects, I'd have 6,000 relatively small sets instead of one big set. I was just thinking that there is no point in saving all pointers in one set (aside from simplicity, a big reason of course). Perhaps this is just a red herring, but if it were possible, this might significantly improve performance for very large data sets (we have HUUUUGE data sets).
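Something like this hypothetical sketch is what I have in mind (not a library API, just the shape of the idea):

    #include <set>

    // one small static set per tracked type T, selected at compile time
    template<class T>
    std::set<const T *> & tracked_for(){
        static std::set<const T *> s;
        return s;
    }

    template<class T>
    bool track(const T * p){
        return tracked_for<T>().insert(p).second;
    }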
My thinking is exactly the opposite: I believe that putting all the tracking into one collection will facilitate "load-leveling".

Here are some random observations on serialization performance which are "on my list" but have yet to be implemented. This is largely because they won't affect most users and they can't be implemented without a lot of experimentation and profiling. So the programming isn't the bottleneck - it's determining what to program.

a) Currently, I believe - without the benefit of having profiled the code - that for small data sets most of the time is consumed in setting up the STL collections used for tracking. This has been reported by one user who graciously took the time to run some good tests. So for lots of small serializations (like for inter-process communication) there would be benefits to streamlining archive creation and perhaps re-use.

b) Currently, there are two slightly different sets used for tracking pointers. One is used to implement delete_created_pointers. These could be consolidated into one, which would help a lot when loading pointers.

c) For the STL collections used by the archives, there is no way to specify optimizations, which can have a huge impact. Longer term, one would like to be able to reserve memory in large chunks and/or specify a custom allocator (see the sketch after this list). This would make a huge difference for saving/loading tracked information and would permit "tuning" based on experimentation. Experimentation-based tuning is about the only way to do it, since the results are necessarily going to depend upon the standard library implementation. Also, one size fits all can't do the job: setting large reserve sizes for collections is going to conflict with other users who want to invoke lots of tiny serializations.

Implementing this would require lots of testing, updates to the API and its documentation, etc. And currently:

i) testing serialization is already consuming too many resources and we're looking to cut it back.

ii) enhancing the API and its documentation has already exhausted my patience and I'm trying to cut back here as well.
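As a sketch of point c) - purely hypothetical, since no such hook exists today - the tracking collection might be parameterized on its allocator. boost::fast_pool_allocator is a real Boost.Pool component suited to node-based containers like std::set:

    #include <set>
    #include <functional>
    #include <boost/pool/pool_alloc.hpp>

    // hypothetical: a tracking set that grabs memory in large chunks
    // instead of allocating node-by-node
    typedef std::set<
        const void *,
        std::less<const void *>,
        boost::fast_pool_allocator<const void *>
    > tracked_set;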
I'm sure it would be TRIVIAL to try this out :-).
great - let us know how it works out.
Next question: after serializing the object and destroying all of the archive variables, does Boost get rid of all of the data structures holding the object pointers?
yes
So, if I do:

    {
        ofstream ofs("file", ios::binary);
        text_oarchive oa(ofs);
        oa << object;
    }
when the scope exits, is the set tracking the object pointers destroyed?
yes

Finally, I got the impression from your previous posts that your profiler experiments suggested that the problem was with the lookup into the static variable sets which contain one entry for each type and/or type-archive combination. There are a lot of lookups into these tables, but the tables are small, so they shouldn't consume a lot of time. On the other hand, the symptom you note - much slower when code is instantiated but never called - makes the tables bigger (still very small) but doesn't change anything else. Note that this is ENTIRELY different from the issue of time consumed by object tracking, which would be affected only by the archives actually used.

I believe that as programmers, we're too quick to fix stuff like this based on speculation. "To a man with a hammer in his hand, the whole world looks like a nail." I think the best thing is to step back and get more factual data. I would like to see the Jamfiles updated to add a few comprehensive profiling tests to the serialization library tests. These would take too much time to be run by the testers, so they would be commented out or conditioned on some environmental variable. Also, VC 7.1 has had the statistical profiler removed, so this would only work with GCC (mingw?). This would be a huge help.
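For example - and this is purely a hypothetical sketch, not an existing test - such a profiling test might time the saving of a large set of tracked pointers:

    #include <ctime>
    #include <fstream>
    #include <iostream>
    #include <vector>
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/serialization/vector.hpp>

    struct node {
        int value;
        template<class Archive>
        void serialize(Archive & ar, const unsigned int /*version*/){
            ar & value;
        }
    };

    int main(){
        // a large data set of pointers so that object tracking is exercised
        std::vector<node *> data(1000000);
        for (std::size_t i = 0; i < data.size(); ++i){
            data[i] = new node();
            data[i]->value = static_cast<int>(i);
        }

        std::clock_t start = std::clock();
        {
            std::ofstream ofs("profile_test.txt");
            boost::archive::text_oarchive oa(ofs);
            const std::vector<node *> & cdata = data;
            oa << cdata;   // every pointer passes through the tracking set
        }   // archive - and its tracking collections - destroyed here
        std::cout << "save: "
                  << double(std::clock() - start) / CLOCKS_PER_SEC
                  << " seconds\n";
        return 0;
    }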
Thanks for all of your help.
LOL - I don't think I helped you much - but you're welcome anyway.
Bill
Robert Ramey