[Boost][Interprocess] Lengthy test runtimes

Hi,

The named_recursive test in boost.interprocess takes an excessive amount of time to run. Here you can see it's been running for 45+ minutes:

  13008 kbelco 25 0 14988 1360 1176 R 99.9 0.0 46:28.97 named_recursive
  13014 kbelco 25 0 15040 1380 1188 R 99.8 0.0 46:10.09 named_recursive

on fast hardware:

  processor  : 3
  vendor_id  : GenuineIntel
  cpu family : 6
  model      : 15
  model name : Intel(R) Xeon(R) CPU 5160 @ 3.00GHz

Is there something wrong with this test, or my system? I'd like to request that this test be removed or somehow scaled back so it runs a bit faster. Is this possible?

Thanks.

-- Noel Belcourt

K. Noel Belcourt wrote:
Hi,
The named_recursive test in boost.interprocess takes an excessive amount of time to run. Here you can see it's been running for 45+ minutes
This sounds very strange. In my poor AMD this takes only a few seconds ;-) Can you give any additional hint on what can be happening? Regards, Ion

On Jul 25, 2007, at 12:39 AM, Ion Gaztañaga wrote:
K. Noel Belcourt wrote:
The named_recursive test in boost.interprocess takes an excessive amount of time to run. Here you can see it's been running for 45+ minutes
This sounds very strange. In my poor AMD this takes only a few seconds ;-) Can you give any additional hint on what can be happening?
Well it doesn't exhibit this hang in any consistent fashion. Some nights it goes right through, other times it hangs (like tonight). I hooked gdb up to named_recursive and dumped the call stack; it's attached. Anything else I can do to help debug this?

-- Noel

K. Noel Belcourt wrote:
On Jul 25, 2007, at 12:39 AM, Ion Gaztañaga wrote:
K. Noel Belcourt wrote:
The named_recursive test in boost.interprocess takes an excessive amount of time to run. Here you can see it's been running for 45+ minutes
This sounds very strange. In my poor AMD this takes only a few seconds ;-) Can you give any additional hint on what can be happening?
Well it doesn't exhibit this hang in any consistent fashion. Some nights it goes right through, other times it hangs (like tonight). I hooked gdb up to named_recursive and dumped the call stack; it's attached. Anything else I can do to help debug this?
Ion, this looks suspiciously like the problems I ran into long ago with notify() vs. notify_all(), where the shared queue filled up. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera wrote:
Ion, this looks suspiciously like the problems I ran into long ago with notify() vs. notify_all(), where the shared queue filled up.
I don't know... I think the shared queue problem was a programming error because a blocked thread missed a notification. I think that problem was solved by notifying the blocked thread every time. In this test no condition variable is being used... Regards, Ion

Hi Ion, On Aug 6, 2007, at 10:08 PM, K. Noel Belcourt wrote:
On Jul 25, 2007, at 12:39 AM, Ion Gaztañaga wrote:
K. Noel Belcourt wrote:
The named_recursive test in boost.interprocess takes an excessive amount of time to run.
This sounds very strange. In my poor AMD this takes only a few seconds ;-) Can you give any additional hint on what can be happening?
Well it doesn't exhibit this hang in any consistent fashion. Some nights it goes right through, other times it hangs (like tonight). I hooked gdb up to named_recursive and dumped the call stack; it's attached. Anything else I can do to help debug this?
I should have mentioned that I ran ipcs, here's the output.

  [~]$ ipcs -a

  ------ Shared Memory Segments --------
  key        shmid  owner  perms  bytes  nattch  status
  0x00000000 0      root   777    94208  0
  0x00000000 65537  root   777    94208  1

  ------ Semaphore Arrays --------
  key  semid  owner  perms  nsems

  ------ Message Queues --------
  key  msqid  owner  perms  used-bytes  messages

What kind of IPC object does this test create? Why is there no IPC object in the kernel data structures? I don't believe there's anything running on my system that would call ipcrm.

There are two copies of this test currently running on my system (gcc-3.4.3 and gcc-4.0.1), would that matter?

  19303 kbelco 25 0 14944 1380 1188 R 98.9 0.0 40:00.53 named_recursive
  19328 kbelco 25 0 14988 1360 1176 R 98.9 0.0 34:25.73 named_recursive

-- Noel

K. Noel Belcourt wrote:
I should have mentioned that I ran ipcs, here's the output.
[~]$ ipcs -a
------ Shared Memory Segments --------
key        shmid  owner  perms  bytes  nattch  status
0x00000000 0      root   777    94208  0
0x00000000 65537  root   777    94208  1
------ Semaphore Arrays -------- key semid owner perms nsems
------ Message Queues -------- key msqid owner perms used-bytes messages
What kind of IPC object does this test create? Why is there no IPC object in the kernel data structures? I don't believe there's anything running on my system that would call ipcrm.
This depends on the system, but for the moment it creates a shared memory segment with an anonymous mutex. Interprocess uses POSIX primitives (see /dev/shm or /dev/sem for POSIX resources) instead of System V ones, and emulates them with memory-mapped files in the temp directory if POSIX shared memory is not provided (unistd.h does not define _POSIX_SHARED_MEMORY_OBJECTS > 0). Your stack trace shows that the thread is blocked in:

  while(value == InitializingSegment || value == UninitializedSegment){
     detail::thread_yield();
     value = detail::atomic_read32(patomic_word);
  }

which means that the thread has found the shared memory already created, but the flag that makes the opening thread wait until the creating thread has finished initializing the segment is never updated. This could mean that the thread that created the segment was killed before it could properly initialize the segment.
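[Editorial note: a minimal sketch of the create/open handshake described above, in case the mechanism is unclear. This is not Boost.Interprocess source; the type names, flag values, and std::atomic usage are stand-ins for the library's internal detail::atomic_* helpers.]

  // Sketch of the flag-based publish/wait handshake between the process
  // that creates the segment and the one that opens it. Illustrative only.
  #include <atomic>
  #include <cstdint>
  #include <thread>

  enum SegmentState : std::uint32_t {
     UninitializedSegment = 0,
     InitializingSegment  = 1,
     InitializedSegment   = 2
  };

  // The first word of the shared segment holds the state flag.
  struct SegmentHeader {
     std::atomic<std::uint32_t> state;
     // ... mutex and other bookkeeping would follow here ...
  };

  // Creator: mark the segment as initializing, build its contents,
  // then publish it by flipping the flag.
  void create_side(SegmentHeader *hdr) {
     hdr->state.store(InitializingSegment);
     // ... construct the anonymous mutex and metadata ...
     hdr->state.store(InitializedSegment);   // openers may now proceed
  }

  // Opener: spin (yielding) until the creator publishes the segment.
  // If the creator dies before publishing, this loop never exits,
  // which matches the hang shown in the attached stack trace.
  void open_side(SegmentHeader *hdr) {
     std::uint32_t value = hdr->state.load();
     while (value == InitializingSegment || value == UninitializedSegment) {
        std::this_thread::yield();
        value = hdr->state.load();
     }
  }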
There's two copies of this test currently running on my system (gcc-3.4.3 and gcc-4.0.1), would that matter?
19303 kbelco 25 0 14944 1380 1188 R 98.9 0.0 40:00.53 named_recursive
19328 kbelco 25 0 14988 1360 1176 R 98.9 0.0 34:25.73 named_recursive
This is really nasty. You should only have one instance, since the test is just single-threaded. This can be causing several problems. Can you investigate a bit why this is happening? I'm thinking about the use of shared memory + anonymous synchronization objects to emulate named synchronization objects, and I think I should use native named semaphores on systems that provide them. I think that would use fewer resources. Added to my to-do list. Regards, Ion
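[Editorial note: the "native named semaphores" mentioned above are the POSIX sem_open() family. A minimal sketch of that API follows, purely for context; it is not Interprocess code, and the semaphore name is made up.]

  // Sketch of the POSIX named-semaphore API (link with -pthread, and -lrt
  // on older Linux systems). Not library code; illustrative only.
  #include <semaphore.h>
  #include <fcntl.h>     // O_CREAT
  #include <cstdio>

  int main() {
     // Create (or open) a kernel-named semaphore with initial count 1.
     // On systems that expose POSIX semaphores through a filesystem it
     // shows up under e.g. /dev/shm.
     sem_t *sem = sem_open("/boost_ipc_demo", O_CREAT, 0644, 1);
     if (sem == SEM_FAILED) { std::perror("sem_open"); return 1; }

     sem_wait(sem);                  // lock across processes
     // ... critical section ...
     sem_post(sem);                  // unlock

     sem_close(sem);
     sem_unlink("/boost_ipc_demo");  // remove the name once finished
     return 0;
  }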

On Aug 7, 2007, at 1:41 AM, Ion Gaztañaga wrote:
K. Noel Belcourt wrote:
There's two copies of this test currently running on my system (gcc-3.4.3 and gcc-4.0.1), would that matter?
19303 kbelco 25 0 14944 1380 1188 R 98.9 0.0 40:00.53 named_recursive
19328 kbelco 25 0 14988 1360 1176 R 98.9 0.0 34:25.73 named_recursive
This is really nasty. You should only have one instance, since the test is just single-threaded. This can be causing several problems. Can you investigate a bit why this is happening?
Because I run the boost regression tests like this:

  python regression.py --runner="Sandia-gcc" --mail=kbelco@sandia.gov --bjam-toolset=gcc --pjl-toolset=gcc --toolsets="gcc-3.4.3,gcc-4.0.1" --bjam-options=-j4

So bjam is clearly building / running both named_recursive tests at the same time. Am I not permitted to run the boost regression tests for multiple toolsets in a single invocation?

-- Noel

K. Noel Belcourt wrote:
Because I run the boost regression tests like this
python regression.py --runner="Sandia-gcc" --mail=kbelco@sandia.gov --bjam-toolset=gcc --pjl-toolset=gcc --toolsets="gcc-3.4.3,gcc-4.0.1" --bjam-options=-j4
So bjam is clearly building / running both named_recursive tests at the same time. Am I not permitted to run the boost regression tests for multiple toolsets in a single invocation?
Ummm. Both tests might be writing to the same shared memory, overwriting each other's contents. Can you try to run them separately? Otherwise I might need to create a unique shared memory name (I create a new name for each compiler name, but not for different versions of the same compiler). Regards, Ion

Ion Gaztañaga wrote:
K. Noel Belcourt wrote:
Because I run the boost regression tests like this
python regression.py --runner="Sandia-gcc" --mail=kbelco@sandia.gov --bjam-toolset=gcc --pjl-toolset=gcc --toolsets="gcc-3.4.3,gcc-4.0.1" --bjam-options=-j4
So bjam is clearly building / running both named_recursive tests at the same time. Am I not permitted to run the boost regression tests for multiple toolsets in a single invocation?
Ummm. Both tests might be writing to the same shared memory, overwriting each other's contents. Can you try to run them separately? Otherwise I might need to create a unique shared memory name (I create a new name for each compiler name, but not for different versions of the same compiler).
Ion, you'll have to fix the names. We went through a fair amount of work to make it possible to run tests in parallel. It's an important feature as it reduces testing time considerably, even if you only have 1 CPU. Note, it may not only be for different compiler versions. If you have different tests that can clobber each other, it's possible they will clobber each other within a single compiler's test run if the tests run at the same time. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

FWIW, three interprocess library tests hang on HP-UX/ia64:

. barrier_test
. condition_test
. upgradable_mutex_test

I did not have a chance to investigate it and, probably, won't have for a while. When I start regression tests, I also start a script which waits for these tests and kills them ('/usr/sbin/fuser -ku test_executable'). And no, I don't run tests for multiple toolsets.

Thanks,
Boris

Boris Gubenko wrote:
FWIW, three interprocess library tests hang on HP-UX/ia64:
. barrier_test
. condition_test
. upgradable_mutex_test
I did not have a chance to investigate it and, probably, won't have for a while. When I start regression tests, I also start a script which waits for these tests and kills them ('/usr/sbin/fuser -ku test_executable'). And no, I don't run tests for multiple toolsets.
Thanks for the info. I don't have access to HP-UX systems but I'll try to guess something. Which POSIX features does HP-UX support (I'm referring to those announced at compile time in unistd.h)? Regards, Ion
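[Editorial note: the features Ion is asking about are the compile-time feature macros in <unistd.h>. A small sketch that dumps the relevant ones follows; the macro names come from POSIX, and which values HP-UX actually announces is exactly the open question.]

  // Prints the <unistd.h> feature macros relevant to Interprocess.
  // Nothing here is HP-UX specific; it just reports what the headers say.
  #include <unistd.h>
  #include <cstdio>

  #define IPC_STR2(x) #x
  #define IPC_STR(x)  IPC_STR2(x)   // stringize, even if the macro is empty

  int main() {
  #if defined(_POSIX_SHARED_MEMORY_OBJECTS)
     std::printf("_POSIX_SHARED_MEMORY_OBJECTS = %s\n", IPC_STR(_POSIX_SHARED_MEMORY_OBJECTS));
  #else
     std::printf("_POSIX_SHARED_MEMORY_OBJECTS is not defined\n");
  #endif
  #if defined(_POSIX_SEMAPHORES)
     std::printf("_POSIX_SEMAPHORES = %s\n", IPC_STR(_POSIX_SEMAPHORES));
  #else
     std::printf("_POSIX_SEMAPHORES is not defined\n");
  #endif
  #if defined(_POSIX_MAPPED_FILES)
     std::printf("_POSIX_MAPPED_FILES = %s\n", IPC_STR(_POSIX_MAPPED_FILES));
  #else
     std::printf("_POSIX_MAPPED_FILES is not defined\n");
  #endif
     return 0;
  }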

Ion Gaztanaga wrote:
Thanks for the info. I don't have access to HP-UX systems but I'll try to guess something. Which POSIX features does HP-UX support (I'm referring to those announced at compile time in unistd.h)?
I'm sending unistd.h as an attachment. As for access to an HP-UX/ia64 system, you can get an account on a machine managed by the HP TestDrive Program. For more info, go to <www.testdrive.hp.com>. See "Current systems" under "Useful information" for the list of available machines. Thanks, Boris

Rene Rivera wrote:
Ion, you'll have to fix the names. We went through a fair amount of work to make it possible to run tests in parallel. It's an important feature as it reduces testing time considerably, even if you only have 1 CPU.
Note, it may not only be for different compiler versions. If you have different tests that can clobber each other, it's possible they will clobber each other within a single compiler's test run if the tests run at the same time.
Ummm. What do you suggest, temporary names created on the fly (using the system clock, perhaps)? The serialization library uses temporary files but that's not enough, since I have to create unique resource names. Regards, Ion

Ion Gaztañaga wrote:
Ummm. What do you suggest, temporary names created on the fly (using the system clock, perhaps)? The serialization library uses temporary files but that's not enough, since I have to create unique resource names.
If you have access to it, or can figure out how to get it, using the process ID is the safest. Other than that, getting a temp filename might also work; you can use the basename and not bother creating the file itself. Although creating the file might be safer, even if you don't use the file. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo
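[Editorial note: a minimal sketch of the per-process naming Rene suggests. The helper, the name format, and the Boost.Interprocess usage shown in the comment are invented for illustration; this is not what the test suite actually does.]

  // Derive a resource name unique to the current process so concurrent
  // test runs (e.g. two toolsets launched by the same bjam invocation)
  // cannot open each other's shared memory.
  #include <string>
  #include <sstream>
  #include <unistd.h>   // getpid()

  std::string unique_resource_name(const std::string &base)
  {
     std::ostringstream os;
     os << base << "_" << static_cast<long>(getpid());
     return os.str();               // e.g. "named_recursive_test_19303"
  }

  // Hypothetical usage with Boost.Interprocess:
  //   using namespace boost::interprocess;
  //   shared_memory_object shm(create_only,
  //                            unique_resource_name("named_recursive_test").c_str(),
  //                            read_write);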

On Aug 7, 2007, at 7:49 PM, Rene Rivera wrote:
Ion Gaztañaga wrote:
Ummm. What do you suggest, temporary names created on the fly (using the system clock, perhaps)? The serialization library uses temporary files but that's not enough, since I have to create unique resource names.
If you have access to it, or can figure out how to get it, using the process ID is the safest.
hostid will get you the processor id, but I'm not sure that will solve the problem. If you have a system with one processor and start two IPC jobs, the first job will create the resource, the second job will use the resource created by the first job, and the first job to terminate will remove it, leaving the other one in limbo. -- Noel

K. Noel Belcourt wrote:
On Aug 7, 2007, at 7:49 PM, Rene Rivera wrote:
Ion Gaztañaga wrote:
Ummm. What do you suggest, temporary names created on the fly (using the system clock, perhaps)? The serialization library uses temporary files but that's not enough, since I have to create unique resource names.
If you have access to it, or can figure out how to get it, using the process ID is the safest.
hostid will get you the processor id, but I'm not sure that will solve the problem.
Did you really mean "processor"? I was thinking of the ID the scheduler gives each running program. Which is obviously unique for all concurrent processes. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

On Aug 7, 2007, at 9:35 PM, Rene Rivera wrote:
K. Noel Belcourt wrote:
On Aug 7, 2007, at 7:49 PM, Rene Rivera wrote:
Ion Gaztañaga wrote:
Ummm. What do you suggest, temporary names created on the fly (using the system clock, perhaps)? The serialization library uses temporary files but that's not enough, since I have to create unique resource names.
If you have access to it, or can figure out how to get it, using the process ID is the safest.
hostid will get you the processor id, but I'm not sure that will solve the problem.
Did you really mean "processor"?
I need to read a bit more carefully, sorry about the noise.
I was thinking of the ID the scheduler gives each running program. Which is obviously unique for all concurrent processes.
Yes, you are right. -- Noel

K. Noel Belcourt wrote:
On Aug 7, 2007, at 1:41 AM, Ion Gaztañaga wrote:
K. Noel Belcourt wrote:
There's two copies of this test currently running on my system (gcc-3.4.3 and gcc-4.0.1), would that matter?
19303 kbelco 25 0 14944 1380 1188 R 98.9 0.0 40:00.53 named_recursive
19328 kbelco 25 0 14988 1360 1176 R 98.9 0.0 34:25.73 named_recursive
This is really nasty. You should only have one instance, since the test is just single-threaded. This can be causing several problems. Can you investigate a bit why this is happening?
Because I run the boost regression tests like this
python regression.py --runner="Sandia-gcc" --mail=kbelco@sandia.gov --bjam-toolset=gcc --pjl-toolset=gcc --toolsets="gcc-3.4.3,gcc-4.0.1" --bjam-options=-j4
So bjam is clearly building / running both named_recursive tests at the same time. Am I not permitted to run the boost regression tests for multiple toolsets in a single invocation?
This will confuse process_jam_logs. - Volodya
participants (5)
- Boris Gubenko
- Ion Gaztañaga
- K. Noel Belcourt
- Rene Rivera
- Vladimir Prus