random_test regressions (and some solutions)

I've tracked down the cause of the random_test failures with some Win32 compilers (Borland and Intel 8 and 9 certainly).
Within the test, the random number generator gets passed around by value when constructing a variate_generator. However, there are a couple of generators that contain large arrays of longs (the Mersenne twister is the first to be tested that has this problem). It's reasonably well known that on Windows, copying large arrays on the stack can overrun Windows' stack protection and growth mechanism, resulting in the program trying to write to an invalid address.
For the Borland compiler, adding:
  <linkflags>-lS:8000000
  <linkflags>-lSc:8000000
fixes the regression. For Intel, adding:
  <linkflags>-STACK:8000000:8000000
works most of the time, but seems to be a bit capricious (it appears you have to delete all temporary linker files before attempting the link, otherwise strange things happen).
I'm not even sure if this is the right fix: should we really be passing such large arrays around on the stack in the first place?
John.

John Maddock wrote:
I've tracked down the cause of the random_test failures with some Win32 compilers (Borland and Intel 8 and 9 certainly).
Thanks a lot.
Within the test, the random number generator gets passed around by value when constructing a variate_generator.
Indeed, after all, random_test claims to be a test. :-)
However, there are a couple of generators that contain large arrays of longs (the Mersenne twister is the first to be tested that has this problem).
sizeof(boost::mt19937) is 4996. Funny that this already causes problems.
It's reasonably well known that on Windows, copying large arrays on the stack can overrun Windows' stack protection and growth mechanism, resulting in the program trying to write to an invalid address.
[Linker flags for Borland and Intel omitted.]
I'm not even sure if this is the right fix: should we really be passing such large arrays around on the stack in the first place?
Frankly, I'm completely puzzled that apparently relatively modern Windows versions can't handle medium-sized (5 KB) stack allocations. After all, the wonders of modern MMUs enable operating systems to handle similar tasks such as demand-paging just fine. And Unix-style operating systems do handle arbitrary stack growth just fine.
Looks like the only option is to allocate the data area on the heap. Which increases the overhead for creating and destroying a mersenne_twister quite a lot.
Jens Maurer

[Linker flags for Borland and Intel omitted.]
I'm not even sure if this is the right fix: should we really be passing such large arrays around on the stack in the first place?
Frankly, I'm completely puzzled that apparently relatively modern Windows versions can't handle medium-sized (5 KB) stack allocations. After all, the wonders of modern MMUs enable operating systems to handle similar tasks such as demand-paging just fine. And Unix-style operating systems do handle arbitrary stack growth just fine.
I'm going by memory here, because I couldn't find the relevant MSDN pages on a quick search, but I believe what happens is this: there is only one guard page, and it's 4K in size, so a 5K object that starts in valid memory can still "straddle" the entire guard page. Since the stack grows downward, attempting to access the start of the object will actually access the part that's just "dropped off" the bottom of the guard page, and bang goes your application.
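To make the straddling argument concrete, here is a rough model of it, as a sketch only. It does not call any Windows API; the 4K page size comes from the description above, while the addresses and the names `classify` and `first_touch_of_object` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model (not a Windows API): the stack grows downward,
// committed pages end at `guard_top`, and a single guard page covers
// [guard_top - page_size, guard_top). Everything below is uncommitted.
enum class touch_result { committed, guard_page, below_guard };

const std::size_t page_size = 4096; // assumed 4K page, as described above

// Classify a single memory access by the region it lands in.
touch_result classify(std::uintptr_t addr, std::uintptr_t guard_top)
{
    if (addr >= guard_top) return touch_result::committed;
    if (addr >= guard_top - page_size) return touch_result::guard_page;
    return touch_result::below_guard;
}

// An object of `size` bytes is placed directly below `stack_pointer`.
// Its start is its lowest address, which is assumed to be touched first.
touch_result first_touch_of_object(std::uintptr_t stack_pointer,
                                   std::size_t size,
                                   std::uintptr_t guard_top)
{
    return classify(stack_pointer - size, guard_top);
}
```

With only 100 bytes of committed stack left above the guard page, a 2000-byte object starts inside the guard page (so the stack could be grown), whereas a 4996-byte object starts below it, which is the case that would fault in this model.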
Looks like the only option is to allocate the data area on the heap. Which increases the overhead for creating and destroying a mersenne_twister quite a lot.
Feels like it should be a PIMPL to me, but that's only personal preference. Even though it will slow your construction, a copy-on-write design should give you much faster copy-assign-swap etc. My gut feeling is that kind of change should wait till after 1.33 though - we're only a few days from release after all!
HTH,
John.
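A minimal sketch of what a heap-held, copy-on-write state could look like. This is not Boost's mersenne_twister: `big_state` and `cow_engine` are hypothetical names, the seeding and `next()` are placeholders rather than the real algorithm, and `std::shared_ptr` is used only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a generator's large state; not Boost's layout.
struct big_state {
    unsigned long x[624];
    std::size_t i;
};

class cow_engine {
public:
    explicit cow_engine(unsigned long seed)
        : state_(std::make_shared<big_state>())
    {
        // Placeholder fill, not the real Mersenne Twister seeding.
        for (std::size_t k = 0; k < 624; ++k)
            state_->x[k] = seed + static_cast<unsigned long>(k);
        state_->i = 0;
    }

    // The compiler-generated copy operations copy only the pointer,
    // so copies are cheap and keep the large array off the stack.

    unsigned long next()
    {
        detach(); // get a private copy of the state before mutating it
        unsigned long v = state_->x[state_->i];
        state_->i = (state_->i + 1) % 624;
        return v;
    }

    bool shares_state_with(const cow_engine& other) const
    {
        return state_ == other.state_;
    }

private:
    // Clone the state if another engine still refers to it.
    // Note: this use_count() check is not safe across threads.
    void detach()
    {
        if (state_.use_count() > 1)
            state_ = std::make_shared<big_state>(*state_);
    }

    std::shared_ptr<big_state> state_;
};
```

The trade-off is the one raised in the thread: construction pays for a heap allocation, and the first mutation after a copy pays for cloning the state, but the copy itself only moves a pointer.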