Boost regex multi-processor scalability and user-defined allocators
I'm using boost-regex library, and I'm running into multi-threading scalability issues. When tested on a machine with several CPUs, my performance tests show improvement of about 60-70% when moving from a single thread to 2 threads.
Question 1: Is anyone aware of this problem? Is there a known solution?
From some testing we've done, it seems that the cause of the scalability issue is problematic implementation of std::allocator. This is a known issue on some operating systems (e.g., Solaris 8). Some of the boost::regex classes (e.g., match_results) accept a user-defined allocator. However, as far as I can understand there's no way to completely override the use of std::allocator.
Question 2: is there a way to completely prevent boost / regex from using std::allocator? If no, can such a capability be added?
(One solution to this problem would be globally overriding the "new" and "delete" operators, but for various reasons this cannot be done in the application I'm developing).
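For readers who want to see how much of the allocation traffic a user-defined allocator on match_results can actually intercept, here is a minimal sketch. It is not from the thread: `CountingAllocator`, `counted_allocations` and `allocations_during_search` are hypothetical names, and `std::regex` is used as a stand-in so the snippet builds with the standard library alone. `boost::match_results` takes its allocator as the second template parameter in the same way. The number reported is implementation-dependent, and anything allocated during regex construction or inside the matcher bypasses this allocator, which is the limitation described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <regex>
#include <string>

// Hypothetical helper: process-wide count of allocations routed through
// CountingAllocator.
inline std::atomic<std::size_t>& counted_allocations() {
    static std::atomic<std::size_t> count{0};
    return count;
}

// Hypothetical minimal allocator: counts each allocate() call, then forwards
// to std::allocator. A real replacement would forward to a per-thread pool.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() noexcept = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++counted_allocations();
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

using Iter = std::string::const_iterator;
using CountedMatch =
    std::match_results<Iter, CountingAllocator<std::sub_match<Iter>>>;

// Returns how many allocations went through CountingAllocator during one
// search. Allocations made elsewhere by the regex library are not counted.
std::size_t allocations_during_search(const std::string& text,
                                      const std::regex& re) {
    CountedMatch m;             // default construction allocates nothing here
    counted_allocations() = 0;  // count only what the search itself requests
    const bool found = std::regex_search(text, m, re);
    assert(found && "this sketch expects the pattern to match");
    (void)found;
    return counted_allocations().load();
}
```

Calling `allocations_during_search("key=value", std::regex("(\\w+)=(\\w+)"))` reports only the requests that reached the custom allocator; comparing that figure with a profiler's total shows how much is left on the default allocation path.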
Roy Emek wrote:
I'm using boost-regex library, and I'm running into multi-threading scalability issues. When tested on a machine with several CPUs, my performance tests show improvement of about 60-70% when moving from a single thread to 2 threads.
60-70% doesn't sound too bad to me?
Question 1: Is anyone aware of this problem? Is there a known solution?
From some testing we've done, it seems that the cause of the scalability issue is problematic implementation of std::allocator. This is a known issue on some operating systems (e.g., Solaris 8). Some of the boost::regex classes (e.g., match_results) accept a user-defined allocator. However, as far as I can understand there's no way to completely override the use of std::allocator.
Question 2: is there a way to completely prevent boost / regex from using std::allocator? If no, can such a capability be added?
Actually it used to be there but I was asked to remove it :-(
The question you need to ask is which part of the regex lib is causing problems, there are three main areas that use memory allocation:
1) Regex construction: uses std::basic_string and other STL classes along with shared_ptr etc etc, there's no way to change the allocator here. However, the question you need to ask is "do my threads need to construct regexes at all?" Boost.Regex is designed so that multiple threads can share the same immutable regex object.
2) Regex matching: the library needs a stack for the FSM to work on. On Unix-like systems this memory is cached and obtained from the routines near the end of regex.cpp, you could replace these with thread-specific allocators if this is the overhead. Or.... you could define BOOST_REGEX_RECURSIVE and rebuild everything: regex will then use a program-stack-recursive algorithm that saves on memory allocations, but runs the risk of stack overflow. If you are on a platform that can protect you from stack overruns then this can speed things up a little for single-threaded apps, and maybe rather more for multithreaded ones.
3) The final match_results allocation: only once per match/search operation does match_results actually allocate any memory - right at the end when the object is written to with the results. You can avoid even that, if you reuse match_results objects so that they already contain a large enough buffer when they're actually used.
HTH, John.
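Points 1 and 3 above can be combined in a few lines. The following is a minimal sketch, not code from the thread: `count_matches_parallel` is a hypothetical helper, and `std::regex` / `std::smatch` stand in for `boost::regex` / `boost::smatch` so that it compiles without Boost (the call shapes of `regex_search` are the same in both). One regex is built once and shared read-only by all workers, and each worker keeps a single match_results object that it reuses for every search. Whether reuse really avoids the final allocation depends on the library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: counts all matches of `re` across `inputs`, with the
// inputs distributed round-robin over `num_threads` worker threads.
std::size_t count_matches_parallel(const std::regex& re,
                                   const std::vector<std::string>& inputs,
                                   unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::size_t> partial(num_threads, 0);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Point 1: `re` was constructed once by the caller and is only
            // read here, so no thread pays for regex construction.
            // Point 3: one match_results per thread, reused for every search.
            std::smatch m;
            std::size_t local = 0;
            for (std::size_t i = t; i < inputs.size(); i += num_threads) {
                auto first = inputs[i].cbegin();
                const auto last = inputs[i].cend();
                while (first != last && std::regex_search(first, last, m, re)) {
                    ++local;
                    const bool empty_match = (m[0].first == m[0].second);
                    first = m[0].second;      // continue after this match
                    if (empty_match) {
                        if (first == last) break;
                        ++first;              // step past an empty match
                    }
                }
            }
            partial[t] = local;  // each worker writes only its own slot
        });
    }
    for (auto& w : workers) w.join();

    std::size_t total = 0;
    for (std::size_t c : partial) total += c;
    return total;
}
```

A call such as `count_matches_parallel(std::regex("[0-9]+"), inputs, 4)` keeps regex construction out of the worker threads entirely, so the remaining allocation traffic comes from the matcher's internal stack (point 2), which this sketch does not address.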
I would like to briefly share my experience with boost::regex and multi-threading. I used the defer library from the Boost Vault to create scanning jobs and execute them later on.
Configuration: Boost 1.33, Visual Studio 8.0 Express Edition on a single-CPU Intel P4 3.2 GHz with 1 GB RAM (to be honest, I don't know if this CPU has hyperthreading).
I ran a scanner on cpp source files which were really big, and an initial scan with a single thread showed that I needed around 30 minutes to make a complete scan. After using the defer library for jobs and creating a maximum of 200 threads, I was able to accomplish the scan within 6 minutes. 200 threads might sound very high, since a thread quantum on Windows is by default around 100 ms, but my practical measurements have shown that this value was the most effective on this system. I was able to improve performance by a factor of 5.
My suggestion would be to try using defer or a similar library and scan from different "really multiple" threads.
With Kind Regards,
Ovanes Markarian
On Tue, August 22, 2006 12:51, John Maddock wrote:
Roy Emek wrote:
I'm using boost-regex library, and I'm running into multi-threading scalability issues. When tested on a machine with several CPUs, my performance tests show improvement of about 60-70% when moving from a single thread to 2 threads.
60-70% doesn't sound too bad to me?
Question 1: Is anyone aware of this problem? Is there a known solution?
From some testing we've done, it seems that the cause of the scalability issue is problematic implementation of std::allocator. This is a known issue on some operating systems (e.g., Solaris 8). Some of the boost::regex classes (e.g., match_results) accept a user-defined allocator. However, as far as I can understand there's no way to completely override the use of std::allocator.
Question 2: is there a way to completely prevent boost / regex from using std::allocator? If no, can such a capability be added?
Actually it used to be there but I was asked to remove it :-(
The question you need to ask is which part of the regex lib is causing problems, there are three main areas that use memory allocation:
1) Regex construction: uses std::basic_string and other STL classes along with shared_ptr etc etc, there's no way to change the allocator here. However, the question you need to ask is "do my threads need to construct regexes at all?" Boost.Regex is designed so that multiple threads can share the same immutable regex object.
2) Regex matching: the library needs a stack for the FSM to work on. On Unix like systems this memory is cached and obtained from the routines near the end of regex.cpp, you could replace these with thread specific allocators if this is the overhead. Or.... you could define BOOST_REGEX_RECURSIVE and rebuild everything: regex will then use a program-stack-recursive algorithm that saves on memory allocations, but runs the risk of stack overflow. If you are on a platform that can protect you from stack overruns then this can speed things up a little for single threaded apps, and maybe rather more for multithreaded ones.
3) The final match_results allocation: only once per match/search operation does match_results actually allocate any memory - right at the end when the object is written to with the results. You can avoid even that, if you reuse match_results objects so that they already contain a large enough buffer when they're actually used.
HTH, John.
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (3)
- John Maddock
- Ovanes Markarian
- Roy Emek