
(brought over from fast array serialization thread) Robert Ramey wrote:
We will get to that. I'm interested in incorporating your improved testing. But I do have one concern: I test on Windows platforms, including Borland and MSVC. These can be quite different from just testing with gcc and can suck up a lot of time. It may not be a big issue here, but it means you'll have to be aware not to do anything too tricky.
Sure, I had this in mind. The changes involve only reducing duplicated work; there aren't any tricks in there that are platform specific. Things will probably need some tweaking on platforms I haven't tested on, and mileage may vary. BTW, the best I can get out of it overall is a factor-of-two speedup (I got a factor of ~4, but that gain is only available in about half the tests, so you net about two). Of course this kind of reorganizing doesn't address the "real" underlying MxNxK problem; I think going after that requires a better understanding of the problem than I have at the moment. For instance, with a one-line change to the Jamfile you could cut testing time in half by running dll tests only, if you could establish that any given dll test succeeds if and only if the corresponding static test succeeds, which I only guess is the case. Anyhow, such tweaking is very easy to do.
Since you're interested in this, I would suggest making a few new directories in your personal boost/libs/serialization tree. I see each of these directories having its own Jamfile, so we could invoke runtest from any of the test suites just by changing to the desired directory.
a) old_test - rename the current test directory to this
b) test - the current tests with your changes to use the unit_test library. You might send me the source to one of your changed tests so I can see whether I want to comment on it before too much effort is invested.
c) test_compatibility - includes your backward-compatibility tests
My hope was to avoid fragmenting the testing like this and instead make the testing "modes" switchable from the command line; a) - c) can be accomplished pretty easily in one directory with one Jamfile. One of the more important goals, it seems to me, is to leverage the for-all-archives tests (test_array.cpp, test_set.cpp, test_variant.cpp, etc.) as portability tests ("portability" both in the sense of cross-platform portability for portable archives and in the sense of backwards compatibility), and to make them easy to reuse for portability verification.
d) test_performance - I want to include a few tests that measure times for things like serializing different primitives, opening/closing archives, etc. This would be similar to the current setup, so I could generate a sort of table showing which combinations of features and archives are bottlenecks. It's the hope that this would help detect really dumb oversights like recreating an xml character translation table for each xml character serialized!
I'd also like to see stress testing. As I mentioned in some previous thread, we're going to be running terabytes of data through this stuff, and I'm not going to sleep well until we've done it several times successfully. This one does sound to me like a job for a separate testing directory.

Anyhow, on to those changes. They're not polished up, but they'll give an idea of how things work. Download http://www.resophonic.com/test.tar.gz and untar it in libs/serialization (delete test/ first). First, an explanation of the changes w.r.t. unit tests and how they make speedups possible, followed by an explanation of the changes for portability testing.

-- Look at test_simple_class.cpp. test_main() has been converted to BOOST_AUTO_UNIT_TEST(unique_identifier), and a couple of #includes have been changed (a rough sketch of the shape of this change is below). There is a corresponding change of lib in the Jamfile. That's it. If you look at test_map.cpp, you'll see that many of these unit tests can go in the same translation unit.

-- Look at test_for_all_archives.cpp. This is where the testing speedup is: test_for_all_archives.cpp gets built once per archive type. This technique can bite you, of course, if your compiler requires too much memory and goes to swap. My testing shows the compiler topping out at about 460M for this test, which I would think is still smaller than some other parts of boost. At any rate, the file could easily be broken into two. One consequence of #including everything together was a lot of name collisions between the different test_*.cpp files, each of which I chased down and resolved by changing names. This could probably have been handled more elegantly in some cases with namespaces. See the classes unregistered_polymorphic_base, null_ptr_polymorphic_base, SplitA, SplitB, TestSharedPtrA, etc.

-- Look at the Jamfile, at the test-suite "serialization". There you see test_for_all_archives.cpp and a test_for_one_archive.cpp. I have not checked how nicely the testing framework displays failures inside individual unit tests; I've assumed the granularity is good. If it isn't, the test_for_all_archives.cpp business can just be tossed out and the unit tests compiled/linked/run one at a time, as in the current system. Notice also the use of rule templates to provide the demo tests with the exec monitor lib, and the unit tests with the unit test framework lib.

Now the changes relating to portability testing:

-- Look at test_simple_class.cpp. A reseed() has been added at the top of the test, tmpnam(NULL) has been changed to TESTFILE("unique_identifier"), and remove(const char*) has been changed to finish(const char*).

-- Now look at the top of the Jamfile. The switch --portability turns on the #define BOOST_SERIALIZATION_TEST_PORTABILITY, which affects the behavior of TESTFILE() and finish(). This (almost) gets you the ability to test portability in various ways. (There are a few more changes required; I'll get to them.)

-- Look at test_tools.cpp. If BOOST_SERIALIZATION_TEST_PORTABILITY is *on*, finish() is a no-op and TESTFILE("something") returns a path get_tmpdir()/P/archive-type, where P is a path that identifies the compiler, platform, and boost version. TESTFILE("nvp1"), for example, could return /tmp/Mac_OS/103300/gcc_version_something/portable_binary_archive.nvp1. If --portability is not specified, TESTFILE() works like tmpnam(NULL) and finish(filename) calls std::remove(filename), which is the "old" functionality. (This switch is also sketched below.)
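To make the first item concrete, here's roughly the shape of the test_main() to BOOST_AUTO_UNIT_TEST conversion. This is a sketch rather than anything copied from the tarball; the identifier and the BOOST_CHECK are placeholders, and the includes may not match test_simple_class.cpp exactly:

    // old style: one test_main() per test executable
    //
    //   #include <boost/test/test_tools.hpp>
    //
    //   int test_main(int /* argc */, char * /* argv */ [])
    //   {
    //       // ... test body ...
    //       return EXIT_SUCCESS;
    //   }
    //
    // new style: the test registers itself with the unit test framework,
    // so several of these can share one translation unit (cf. test_map.cpp)

    #include <boost/test/test_tools.hpp>
    #include <boost/test/auto_unit_test.hpp>

    BOOST_AUTO_UNIT_TEST(some_unique_identifier)
    {
        // ... test body ...
        BOOST_CHECK(true);
    }

The corresponding Jamfile change is just linking these against the unit test framework lib.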
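And here, roughly, is the idea behind TESTFILE()/finish() in test_tools.cpp. Again a sketch, not the actual code: the helper functions are made up, the real TESTFILE may well be a macro, and I'm ignoring a possible NULL return from tmpnam.

    #include <cstdio>     // std::tmpnam, std::remove
    #include <string>

    // hypothetical stand-ins for whatever test_tools.cpp really provides
    std::string get_tmpdir()     { return "/tmp"; }
    std::string platform_path()  { return "Mac_OS/103300/gcc_version_something"; }
    std::string archive_type()   { return "portable_binary_archive"; }

    std::string TESTFILE(const char * name)
    {
    #ifdef BOOST_SERIALIZATION_TEST_PORTABILITY
        // stable, per-platform/version/compiler path, e.g.
        // /tmp/Mac_OS/103300/gcc_version_something/portable_binary_archive.nvp1
        return get_tmpdir() + "/" + platform_path() + "/"
               + archive_type() + "." + name;
    #else
        return std::tmpnam(NULL);        // the "old" behavior
    #endif
    }

    void finish(const char * filename)
    {
    #ifdef BOOST_SERIALIZATION_TEST_PORTABILITY
        (void) filename;                 // no-op: leave the archive around for comparison
    #else
        std::remove(filename);           // the "old" behavior
    #endif
    }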
In this way, if each of your testing runs points to the same $TMP, each platform/version/compiler's serialized testing data will be "overlaid" in a directory structure in such a way that you can easily walk the $TMP hierarchy comparing checksums of files with the same name.

-- Look at A.hpp. There are now two A's, one portable, one nonportable. In other places I've made similar changes to other classes. The portable version contains only portable types and uses boost random number generators (maybe we want to nix the nonportable one completely and put serialization of nonportable types into their own test somewhere). std::rand() will of course generate different numbers on one architecture than on others, and we need all platforms to generate archives containing A's with exactly the same numbers. (I cannot begin to explain what a thrill it was, as my testing strategy appeared to be on the rocks, to discover that the problem was already solved right there in boost::random.) The reseed() that appears at the top of test_simple_class.cpp reseeds the boost random rngs.

So those changes get you switchable portability testing. You just need a utility that walks the hierarchy at $TMP and compares files. I've been using a perl script; you could pretty easily code one up with boost::crc and boost::filesystem (see the P.S. below for a rough sketch). There's a filesystem-walking routine hanging around in test_tools.cpp.

Some minor stuff that I stumbled across and had to resolve in the process, and the open issues that come to mind:

-- For platform portability testing, one also has to be careful about containers on some platforms making more temporary copies of A than on others. You create as many A's as you're going to insert into your container, and then insert them one at a time. You can't just call e.g. mymap.insert(A()); multiple times, as you don't know how many times A::A() will get called inside that call to insert(). This will get you serialized maps, for instance, where only the first-inserted entry matches (the P.P.S. below shows the pattern in miniature). Took a while to track down, but they're all fixed.

-- The Jamfile is revamped per Rene's suggestions using rule templates. I'm sure there are a couple of toolset requirements that I've managed to drop, but fixing that should just be a matter of putting them back in some places. Overall I think it's more flexible/maintainable, but of course it isn't finished.

-- test_class_info_save and test_class_info_load always write their data to one of these platform/version/compiler directories suitable for portability testing. This needs a little housekeeping, or maybe the whole $TMP/platform/version/compiler setup is OK for general use; your call.

-- These changes are all against boost release 1.33.0. Dunno if things are broken w.r.t. the trunk.

-- The test_tools.hpp and test_tools.cpp stuff is messy at the moment. It should probably just be broken out into a separate lib.

-- I'm not 100% clear on my use of rule templates in the Jamfile. Somebody might want to take a look at this. Specifically, it isn't clear to me between which of the three colons <define>WHATEVER should go, and where toolset::required-something-or-other should go. I can verify that things work OK for gcc, but I don't have a windows box here to test with.

-- The DEPENDS $(saving-tests) : $(loading-tests) business is still there. I don't recall whether this was deprecated or not.

Well, let me know what you think.

-t
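P.S. Here's roughly what I mean by coding up the checksum walker with boost::crc and boost::filesystem. Untested sketch against the 1.33-era filesystem interface, names made up; run it once per platform/version/compiler directory under $TMP and diff the outputs (link against boost_filesystem):

    #include <boost/crc.hpp>
    #include <boost/filesystem/path.hpp>
    #include <boost/filesystem/operations.hpp>
    #include <fstream>
    #include <iostream>

    namespace fs = boost::filesystem;

    // checksum of one file's contents
    unsigned int file_crc(const fs::path & p)
    {
        std::ifstream is(p.string().c_str(), std::ios::binary);
        boost::crc_32_type crc;
        char buf[4096];
        while (is.read(buf, sizeof(buf)) || is.gcount())
            crc.process_bytes(buf, is.gcount());
        return crc.checksum();
    }

    // print "filename checksum" for every regular file in one directory
    void print_crcs(const fs::path & dir)
    {
        for (fs::directory_iterator i(dir), end; i != end; ++i)
            if (!fs::is_directory(*i))
                std::cout << i->leaf() << " " << file_crc(*i) << "\n";
    }

    int main(int argc, char * argv[])
    {
        if (argc > 1)
            print_crcs(fs::path(argv[1]));
        return 0;
    }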
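P.P.S. The container business in miniature, since it took a while to track down. The A below is a hypothetical stand-in for the portable A in A.hpp (assume its default constructor draws from the reseeded boost generators, so every A() advances the random sequence); the point is just the fill pattern, not the class itself:

    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/uniform_int.hpp>
    #include <set>
    #include <vector>
    #include <cstddef>

    boost::mt19937 rng;                        // reseed() would reset this
    int next_random_int() { return boost::uniform_int<>(0, 1000)(rng); }

    struct A
    {
        int x;
        A() : x(next_random_int()) {}          // consumes one random draw
        bool operator<(const A & rhs) const { return x < rhs.x; }
    };

    void fill_portably(std::set<A> & s, std::size_t n)
    {
        // Not like this -- per the note above, you don't know how many times
        // A's constructors run inside insert(), and that can differ between
        // library implementations, so different platforms can consume the
        // random stream differently:
        //
        //     for (std::size_t i = 0; i < n; ++i)
        //         s.insert(A());
        //
        // Instead, create exactly n A's up front, then insert them one at a time:
        std::vector<A> as;
        for (std::size_t i = 0; i < n; ++i)
            as.push_back(A());
        for (std::size_t i = 0; i < n; ++i)
            s.insert(as[i]);
    }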