Hello,

I am sorry if this is a stupid question, or simply inappropriate for this list. I have been struggling with this bug for far too long, so I decided to ask on this mailing list.

I am currently trying to find out where a segfault occurs, and how to fix it. I am using the Boost regex library to parse my regexes, and for some reason, when multiple threads use the library, something goes wrong and it generates segfaults (it works perfectly with only one thread). Since the crash doesn't always occur in the same place, and doesn't even occur every time, I strongly suspect it has something to do with thread safety.

On the Boost website I was able to find out that the Boost regex library should be thread-safe when BOOST_HAS_THREADS is defined; I've tested this, and it worked. ( http://www.boost.org/libs/regex/doc/thread_safety.html )

However, I was also able to find a mailing list message that gives some instructions on how to make certain Boost regex functions thread-safe: ( http://lists.boost.org/MailArchives/boost/msg59110.php )

Now I am confused: is the Boost regex library thread-safe or not? The solution provided in that message (moving the regex_replace() call into its own scope) can't be applied here... :(

I've posted the stack trace from when the problem occurs, and the source of the code 'that matters', below. The problem also occurs in another function, which I've posted too. I hope someone has experience with this or is able to help.

Thanks in advance!
Regards,
Leon Mergen

One stack trace:

---
#0 std::string::compare(char const*) const (this=0x0, __s=0x40338b81 "C") at /root/gcc-3.3.2/i686-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:257
#1 0x4004b101 in boost::c_regex_traits<char>::update() () from /usr/local/lib/libboost_regex-gcc-1_31.so.1.31.0
#2 0x40084cd9 in boost::reg_expression<char, boost::regex_traits<char>, std::allocator<char> >::set_expression(char const*, char const*, unsigned) () from /usr/local/lib/libboost_regex-gcc-1_31.so.1.31.0
#3 0x080648aa in unsigned boost::reg_expression<char, boost::regex_traits<char>, std::allocator<char> >::set_expression<std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned) (this=0x413d29cc, p=@0x413d2adc, f=34055) at basic_regex.hpp:110
#4 0x080647bf in reg_expression<std::char_traits<char>, std::allocator<char> > (this=0x413d29cc, p=@0x413d2adc, f=33031, a=@0x413d29bc) at basic_regex.hpp:114
#5 0x0806474c in basic_regex<std::char_traits<char>, std::allocator<char> > (this=0x413d29cc, p=@0x413d2adc, f=33031, a=@0x413d29bc) at basic_regex.hpp:361
#6 0x080633ee in RegexEngine::parseReplaceRegex(std::string*, std::string*, std::string*) (this=0x80d1160, regex=Cannot look up value of a typedef) at RegexEngine.cc:8
#7 0x0808070a in ModuleDigitsToText::applyModule(std::string*, std::string) (this=0x80d3190, data=0x80d16e0, language={static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 0x80d244c "en"}, static _S_empty_rep_storage = {0, 0, 77, 0}}) at ModuleDigitsToText.cc:565
#8 0x08080270 in ModuleDigitsToText::applyModule(InputData*) (this=0x80d3190, inputData=0x80d16d8) at ModuleDigitsToText.cc:527
#9 0x08062200 in ModuleParser::parse(InputData*) (this=0x80d1150, inputData=0x80d16d8) at ModuleParser.cc:105
#10 0x0805e1db in Normaliser::normalise() (this=0x80d1698) at Normaliser.cc:107
#11 0x080831f4 in Client::operator()() (this=0x80ca808) at Client.cc:44
#12 0x080542bf in boost::detail::function::void_function_obj_invoker0<Client, void>::invoke(boost::detail::function::any_pointer) (function_obj_ptr={obj_ptr = 0x80ca808, const_obj_ptr = 0x80ca808, func_ptr = 0x80ca808, data = "\b"}) at function_template.hpp:128
#13 0x400a69f4 in boost::thread_group::join_all() () from /usr/local/lib/libboost_thread-gcc-mt-1_31.so.1.31.0
#14 0x401cff60 in pthread_start_thread () from /lib/i686/libpthread.so.0
#15 0x401d00fe in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#16 0x402f7327 in clone () from /lib/i686/libc.so.6
---

And these are the two functions where the segfault occurs. Line 8 of RegexEngine.cc is the "boost::regex expression(*regex);" line of the parseReplaceRegex() function.

---
std::string RegexEngine::parseReplaceRegex (const std::string * regex, const std::string * replace, const std::string * subject)
{
    // Create an expression out of the regex
    boost::regex expression(*regex);

    // Return the replaced regex
    {
        return boost::regex_replace(*subject, expression, *replace, boost::match_default | boost::format_sed);
    }
}

bool RegexEngine::parseMatchRegex (std::string * regex, std::string * subject, RegexResultSet * what)
{
#ifdef DEBUG
    std::cout << "+ RegexEngine::parseMatchRegex[long] ( " << *regex << ", " << *subject << " )" << std::endl;
#endif // DEBUG

    // Prepare result set
    boost::match_results <std::string::const_iterator> results;

    // Create string iterators
    std::string::const_iterator start = subject->begin();
    std::string::const_iterator end = subject->end();

    // Initialise boolean to know whether a match was reached
    bool success = false;

    // Create an expression out of the regex
    boost::regex expression(*regex);

    std::vector < std::string > regexResults;

    // Iterate over the results
    while (boost::regex_search(start, end, results, expression, boost::match_default))
    {
        regexResults.clear();

        for (boost::match_results<std::string::const_iterator>::iterator i = results.begin(); i != results.end(); ++i)
        {
            regexResults.push_back(*i);
        }

        // Push back a result
        what->push_back(regexResults);

        // Set new string iterator
        start = results[0].second;

        // Mark that a match was reached
        success = true;
    }

#ifdef DEBUG
    std::cout << "- RegexEngine::parseMatchRegex ()" << std::endl;
#endif // DEBUG

    // Return whether a match was reached
    return success;
}
---
On Wed, Jan 26, 2005 at 04:16:25PM +0100, Leon Mergen wrote:
I've posted the stack trace from when the problem occurs, and the source of the code 'that matters', below. The problem also occurs in another function, which I've posted too. I hope someone has experience with this or is able to help.
As an addition (sorry for the double post), I also get assertion failures from the regex library:

---
normserv: /usr/local/include/boost-1_31/boost/regex/v4/match_results.hpp:252: void boost::match_results<RandomAccessIterator, Allocator>::set_first(BidiIterator, typename Allocator::size_type) [with BidiIterator = __gnu_cxx::__normal_iterator<const char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, Allocator = std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<const char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >]: Assertion `pos+2 < m_subs.size()' failed.
---

This generated a SIGABRT, and this is the stack trace from when it happened:

---
#0 0x40244b71 in kill () from /lib/i686/libc.so.6
#1 0x401d2cf1 in pthread_kill () from /lib/i686/libpthread.so.0
#2 0x401d300b in raise () from /lib/i686/libpthread.so.0
#3 0x40244904 in raise () from /lib/i686/libc.so.6
#4 0x40245e8c in abort () from /lib/i686/libc.so.6
#5 0x4023de84 in __assert_fail () from /lib/i686/libc.so.6
#6 0x08069fe8 in boost::match_results<__gnu_cxx::__normal_iterator<char const*, std::string>, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<char const*, std::string> > > >::set_first(__gnu_cxx::__normal_iterator<char const*, std::string>, unsigned) (this=0x413d28dc, i={<iterator<std::random_access_iterator_tag,char,int,const char*,const char&>> = {<No data fields>}, _M_current = 0x2 <Address 0x2 out of bounds>}, pos=7) at match_results.hpp:252
#7 0x080677b5 in boost::re_detail::perl_matcher<__gnu_cxx::__normal_iterator<char const*, std::string>, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<char const*, std::string> > >, boost::regex_traits<char>, std::allocator<char> >::unwind_paren(bool) (this=0x413d285c, have_match=false) at perl_matcher_non_recursive.hpp:836
#8 0x08065c8e in boost::re_detail::perl_matcher<__gnu_cxx::__normal_iterator<char const*, std::string>, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<char const*, std::string> > >, boost::regex_traits<char>, std::allocator<char> >::unwind(bool) (this=0x413d285c, have_match=false) at perl_matcher_non_recursive.hpp:814
#9 0x0806a894 in boost::re_detail::perl_matcher<__gnu_cxx::__normal_iterator<char const*, std::string>, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<char const*, std::string> > >, boost::regex_traits<char>, std::allocator<char> >::match_all_states() (this=0x413d285c) at perl_matcher_non_recursive.hpp:158
#10 0x08066e4a in boost::re_detail::perl_matcher<__gnu_cxx::__normal_iterator<char const*, std::string>, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<char const*, std::string> > >, boost::regex_traits<char>, std::allocator<char> >::match_prefix() (this=0x413d285c) at perl_matcher_common.hpp:260
#11 0x08066f9b in boost::re_detail::perl_matcher<__gnu_cxx::__normal_iterator<char const*, std::string>, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<char const*, std::string> > >, boost::regex_traits<char>, std::allocator<char> >::find_restart_any() (this=0x413d285c) at perl_matcher_common.hpp:679
#12 0x0806588a in boost::re_detail::perl_matcher<__gnu_cxx::__normal_iterator<char const*, std::string>, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<char const*, std::string> > >, boost::regex_traits<char>, std::allocator<char> >::find_imp() (this=0x413d285c) at perl_matcher_common.hpp:237
#13 0x08063f55 in boost::re_detail::perl_matcher<__gnu_cxx::__normal_iterator<char const*, std::string>, std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator<char const*, std::string> > >, boost::regex_traits<char>, std::allocator<char> >::find() (this=0x413d285c) at perl_matcher_common.hpp:167
#14 0x0806671b in regex_grep<boost::re_detail::merge_out_predicate<boost::re_detail::string_out_iterator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<const char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, char, std::allocator<char>, boost::regex_traits<char> >, __gnu_cxx::__normal_iterator<const char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, char, boost::regex_traits<char>, std::allocator<char> > (foo={out = 0x413d29a0, last = 0x413d2988, fmt = 0x414716ec "\\1\\3\n<s> \\2 </s>", flags = format_sed, pt = 0x413d2a20}, first={<iterator<std::random_access_iterator_tag,char,int,const char*,const char&>> = {<No data fields>}, _M_current = 0x4145afac "<s> om zijn brood te verdienen</s>\n<s>Wetenschappelijke ambitie is hem </s>"}, last={<iterator<std::random_access_iterator_tag,char,int,const char*,const char&>> = {<No data fields>}, _M_current = 0x4145aff8 ""}, e=@0x413d2a1c, flags=format_sed) at regex_grep.hpp:49
#15 0x08064b33 in boost::re_detail::string_out_iterator<std::string> boost::regex_replace<boost::re_detail::string_out_iterator<std::string>, __gnu_cxx::__normal_iterator<char const*, std::string>, boost::regex_traits<char>, std::allocator<char>, char>(boost::re_detail::string_out_iterator<std::string>, __gnu_cxx::__normal_iterator<char const*, std::string>, boost::re_detail::string_out_iterator<std::string>, boost::reg_expression<char, boost::regex_traits<char>, std::allocator<char> > const&, boost::re_detail::string_out_iterator<std::string> const*, boost::regex_constants::_match_flags) (out={out = 0x413d2a9c}, first={<iterator<std::random_access_iterator_tag,char,int,const char*,const char&>> = {<No data fields>}, _M_current = 0x4145afac "<s> om zijn brood te verdienen</s>\n<s> Wetenschappelijke ambitie is hem </s>"}, last={<iterator<std::random_access_iterator_tag,char,int,const char*,const char&>> = {<No data fields>}, _M_current = 0x4145aff8 ""}, e=@0x413d2a1c, fmt=0x414716ec "\\1 \\3\n<s> \\2</s>", flags=format_sed) at regex_replace.hpp:41
#16 0x08063b6c in std::basic_string<char, std::char_traits<char>, std::allocator<char> > boost::regex_replace<boost::regex_traits<char>, std::allocator<char>, char>(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::reg_expression<char, boost::regex_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::regex_constants::_match_flags) (s=@0x413d2b3c, e=@0x413d2a1c, fmt=@0x413d2aac, flags=format_sed) at regex_replace.hpp:76
#17 0x08063452 in RegexEngine::parseReplaceRegex(std::string const*, std::string const*, std::string const*) (this=0x80c96c0, regex=Cannot look up value of a typedef ) at RegexEngine.cc:11
#18 0x0808efef in ModuleSplitSentences::findQuotes(std::string) (this=0x80ce818, data={static npos = 4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p = 0x4145afac "<s> om zijn brood te verdienen</s>\n<s> Wetenschappelijke ambitie is hem </s>"}, static _S_empty_rep_storage = {0, 0, 529, 0}}) at ModuleSplitSentences.cc:141
#19 0x0808e73b in ModuleSplitSentences::applyModule(InputData*) (this=0x80ce818, inputData=0x80d1420) at ModuleSplitSentences.cc:37
#20 0x08062200 in ModuleParser::parse(InputData*) (this=0x80ce388, inputData=0x80d1420) at ModuleParser.cc:105
#21 0x0805e1db in Normaliser::normalise() (this=0x80d13e0) at Normaliser.cc:107
#22 0x080831f4 in Client::operator()() (this=0x80c88a8) at Client.cc:44
#23 0x080542bf in boost::detail::function::void_function_obj_invoker0<Client, void>::invoke(boost::detail::function::any_pointer) (function_obj_ptr={obj_ptr = 0x80c88a8, const_obj_ptr = 0x80c88a8, func_ptr = 0x80c88a8, data = "�"}) at function_template.hpp:128
#24 0x400a69f4 in boost::thread_group::join_all() () from /usr/local/lib/libboost_thread-gcc-mt-1_31.so.1.31.0
#25 0x401cff60 in pthread_start_thread () from /lib/i686/libpthread.so.0
#26 0x401d00fe in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#27 0x402f7327 in clone () from /lib/i686/libc.so.6
---

Since this concerned a failing assertion, I thought it might help in seeing what's going on...

Thanks in advance!

Regards,
Leon Mergen
As an addition (sorry for the double post), I also get assertion failures from the regex library:
That's more worrying. Do you have a test case that reproduces the issue (preferably a single-threaded one!)?

Thanks,
John.
On Thu, Jan 27, 2005 at 10:59:03AM -0000, John Maddock wrote:
As an addition (sorry for the double post), I also get assertion failures from the regex library:
That's more worrying. Do you have a test case that reproduces the issue (preferably a single-threaded one!)?
Hello John,

Well, the problem is that I've only seen this fault a few times; it's certainly not a consistent bug (as in, I could run the program three times and the problem would occur only once), *and* as far as I am aware these problems only occur after a few minutes, when two threads are doing heavy processing (think of around 50 calls a second to this function)...

... since I added the mutexes I haven't had this problem at all anymore; however, if I see it again and it's reproducible, I'll cut it down to a test case and post the source code.

Thanks for your help,

Regards,
Leon Mergen
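John asked for a test case, and Leon notes that the fault only shows up under sustained load from two threads. A hypothetical stress harness for that kind of reproduction could look like the sketch below. It assumes a C++11 compiler and uses std::thread and std::regex in place of the Boost 1.31 classes from this thread; stressTest() and replaceDigitsOnce() are illustrative names, not part of the original code.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical harness: runs `work` `iterations` times on each of
// `threadCount` threads and returns how many calls reported failure.
int stressTest(unsigned threadCount, unsigned iterations,
               const std::function<bool()>& work)
{
    std::atomic<int> failures{0};
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) {
        threads.emplace_back([&failures, &work, iterations] {
            for (unsigned i = 0; i < iterations; ++i) {
                if (!work())
                    ++failures;  // atomic increment, safe across threads
            }
        });
    }
    for (std::thread& th : threads)
        th.join();  // wait for every worker before reading the counter
    return failures.load();
}

// Hypothetical workload loosely modelled on parseReplaceRegex(): each call
// builds its own local (non-static) regex, so threads share no regex object.
bool replaceDigitsOnce()
{
    const std::regex expression("([0-9]+)");
    const std::string result =
        std::regex_replace(std::string("room 101"), expression, std::string("<$1>"));
    return result == "room <101>";
}
```

To narrow down a race, one would run the same workload with threadCount set to 1 and then to 2 or more; a failure or crash that only appears in the multithreaded runs points at shared state rather than at the pattern itself.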
... since I added the mutexes I haven't had this problem at all anymore; however, if I see it again and it's reproducible, I'll cut it down to a test case and post the source code.
Sounds like it may be race-condition related again: if one thread tries to use a regex that is still being constructed by another thread, then its sub-expression count may suddenly change, leading to your assertion. Just guessing at this point, though...

John.
Leon Mergen wrote:
Hello,
I am sorry if this is a stupid question, or simply inappropriate for this list. I have been struggling with this bug for far too long, so I decided to ask on this mailing list.
I am currently trying to find out where a segfault occurs, and how to fix it. I am using the Boost regex library to parse my regexes, and for some reason, when multiple threads use the library, something goes wrong and it generates segfaults (it works perfectly with only one thread). Since the crash doesn't always occur in the same place, and doesn't even occur every time, I strongly suspect it has something to do with thread safety.
On the Boost website I was able to find out that the Boost regex library should be thread-safe when BOOST_HAS_THREADS is defined; I've tested this, and it worked. ( http://www.boost.org/libs/regex/doc/thread_safety.html )
However, I was also able to find a mailing list message that gives some instructions on how to make certain Boost regex functions thread-safe: ( http://lists.boost.org/MailArchives/boost/msg59110.php ) <snip>
That is not specifically about thread-safety in Boost.Regex; the initialisation of local static variables is generally not done in a thread-safe way even by implementations that are intended to support multithreaded programs, and this goes for variables of any type. I can tell you for certain, though, that std::string is not thread-safe in libstdc++ (see <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10350>). That might be the source of the problem. Ben.
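Ben's point about local statics can be made concrete. On older compilers (likely including the GCC 3.3.2 visible in the traces above), initialising a function-local static was not guaranteed to be thread-safe; C++11 later made it so. The sketch below shows explicit one-time initialisation under the assumption of a C++11 compiler: std::call_once and std::once_flag stand in for Boost.Thread's boost::call_once (whose exact signature has varied between Boost versions), and sentencePattern() and its pattern are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <regex>
#include <string>

// The hazard described above, shown as a comment rather than code:
//
//   const boost::regex& pattern() {
//       static boost::regex expression("...");  // initialisation may race
//       return expression;                       // on pre-C++11 compilers
//   }
//
// Explicit one-time initialisation avoids relying on the compiler. Both
// statics below are constant-initialised (no constructor runs on first
// call), so only std::call_once decides which thread builds the regex.
const std::regex& sentencePattern()
{
    static std::once_flag flag;
    static const std::regex* expression = nullptr;
    std::call_once(flag, [] {
        // Built exactly once; deliberately never deleted so it stays valid
        // for the whole lifetime of the program.
        expression = new std::regex("<s>(.*?)</s>");
    });
    // call_once guarantees the initialisation above is visible here.
    return *expression;
}
```

With a C++11 compiler, a plain `static const std::regex expression(...)` inside the function would already be initialised safely; the explicit form mainly matters for older toolchains like the one in this thread.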
On Wed, Jan 26, 2005 at 08:14:36PM +0000, Ben Hutchings wrote:
That is not specifically about thread-safety in Boost.Regex; the initialisation of local static variables is generally not done in a thread-safe way even by implementations that are intended to support multithreaded programs, and this goes for variables of any type.
I can tell you for certain, though, that std::string is not thread-safe in libstdc++ (see <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10350>). That might be the source of the problem.
Hi Ben, Ok, I will look into it tomorrow when I'm back at work. Thanks for your reply. Regards, Leon Mergen
On Wed, Jan 26, 2005 at 08:14:36PM +0000, Ben Hutchings wrote:
That is not specifically about thread-safety in Boost.Regex; the initialisation of local static variables is generally not done in a thread-safe way even by implementations that are intended to support multithreaded programs, and this goes for variables of any type.
Hi Ben,

OK, I've now wrapped both functions completely in a boost::mutex scoped lock, and it doesn't generate any segmentation faults anymore; so basically, it works now. For some reason I have the feeling this whole problem could be solved *much* more elegantly, but since I have a deadline in 1.5 weeks I think this will do for now...

I found this page explaining in detail why the initialisation is thread-unsafe, in case anyone is interested: http://weblogs.asp.net/oldnewthing/archive/2004/03/08/85901.aspx

Thanks for your help,
Leon Mergen
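Leon's fix serialises every call behind one lock. A minimal sketch of that coarse-grained approach follows, assuming C++11: std::mutex and std::lock_guard stand in for boost::mutex and boost::mutex::scoped_lock, and LockedRegexEngine is a hypothetical class. Its method borrows the name of his parseReplaceRegex() but takes references and uses the default ECMAScript format syntax ($1) rather than format_sed.

```cpp
#include <cassert>
#include <mutex>
#include <regex>
#include <string>

// Hypothetical engine that serialises all regex work with one mutex,
// mirroring the "scoped lock around the whole function" fix.
class LockedRegexEngine {
public:
    std::string parseReplaceRegex(const std::string& regex,
                                  const std::string& replace,
                                  const std::string& subject)
    {
        std::lock_guard<std::mutex> lock(mutex_);  // released at end of scope
        std::regex expression(regex);              // compiled on every call
        return std::regex_replace(subject, expression, replace);
    }

private:
    std::mutex mutex_;  // one lock shared by all callers of this engine
};
```

The lock makes the whole function safe, but only one thread can compile or run a regex at any moment, and every call still recompiles the pattern, which may be part of why the fix feels inelegant.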
I am sorry if this is a stupid question, or simply inappropriate for this list. I have been struggling with this bug for far too long, so I decided to ask on this mailing list.
I am currently trying to find out where a segfault occurs, and how to fix it. I am using the Boost regex library to parse my regexes, and for some reason, when multiple threads use the library, something goes wrong and it generates segfaults (it works perfectly with only one thread). Since the crash doesn't always occur in the same place, and doesn't even occur every time, I strongly suspect it has something to do with thread safety.
On the Boost website I was able to find out that the Boost regex library should be thread-safe when BOOST_HAS_THREADS is defined; I've tested this, and it worked. ( http://www.boost.org/libs/regex/doc/thread_safety.html )
However, I was also able to find a mailing list message that gives some instructions on how to make certain Boost regex functions thread-safe: ( http://lists.boost.org/MailArchives/boost/msg59110.php )
Now I am confused: is the Boost regex library thread-safe or not? The solution provided in that message (moving the regex_replace() call into its own scope) can't be applied here... :(
I've posted the stack trace from when the problem occurs, and the source of the code 'that matters', below. The problem also occurs in another function, which I've posted too. I hope someone has experience with this or is able to help.
Regex is thread-safe in the following sense: given a const regex object, you can safely share that object between multiple threads.

However, your backtrace indicates that the problem occurs during regex construction: this is a situation *you* have to deal with (no matter what the data type). The thread-safety guarantees only kick in once the object has been constructed; it's up to you to ensure that no race condition occurs during construction. Your second link above indicates a couple of methods you could use to achieve this.

Regards,
John.
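John's distinction between construction and shared use can be sketched as follows: finish constructing the regex before any worker thread exists, then let threads touch it only through a const reference. This assumes C++11, where the standard library likewise guarantees that concurrent calls taking only const references to a library object do not race; std::regex and std::thread stand in for boost::regex and boost::thread, and replaceInParallel() is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: one regex, constructed once, shared read-only by
// several threads. Each thread writes only to its own output slot.
std::vector<std::string> replaceInParallel(const std::vector<std::string>& inputs)
{
    const std::regex digits("([0-9]+)");  // fully constructed before any thread starts
    std::vector<std::string> outputs(inputs.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        // Threads only read `digits` through a const reference.
        workers.emplace_back([&digits, &inputs, &outputs, i] {
            outputs[i] = std::regex_replace(inputs[i], digits, std::string("<$1>"));
        });
    }
    for (std::thread& w : workers)
        w.join();  // the regex outlives every thread that uses it
    return outputs;
}
```

Because construction happens on one thread before the workers start (and starting a std::thread synchronises with the beginning of its function), no worker can observe a half-built regex, which is the kind of race John suspects behind the set_first() assertion.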
On Thu, Jan 27, 2005 at 10:53:31AM -0000, John Maddock wrote:
Regex is thread-safe in the following sense: given a const regex object, you can safely share that object between multiple threads.
However, your backtrace indicates that the problem occurs during regex construction: this is a situation *you* have to deal with (no matter what the data type). The thread-safety guarantees only kick in once the object has been constructed; it's up to you to ensure that no race condition occurs during construction. Your second link above indicates a couple of methods you could use to achieve this.
Hello John,

You are indeed correct, and I also came to the conclusion that this is something serious I didn't take into account at all; it is not Boost-related at all, and I apologise for posting these off-topic questions to this mailing list.

Thanks for your replies,
Leon Mergen
John Maddock wrote:
Regex is thread-safe in the following sense: given a const regex object, you can safely share that object between multiple threads.
However, your backtrace indicates that the problem occurs during regex construction: this is a situation *you* have to deal with (no matter what the data type). The thread-safety guarantees only kick in once the object has been constructed; it's up to you to ensure that no race condition occurs during construction. Your second link above indicates a couple of methods you could use to achieve this.
Sorry for jumping into this discussion, since I am not very knowledgeable about the regex library. But the buzzwords "construction", "thread" and "local static" alerted me. There was a bug in the Spirit library that seemed to have the same characteristics: a function contained a local static object. If the regex library does this too, it should be changed, since such usage is not thread-safe. If this is not the case, please simply ignore my post, since it does not apply.

Roland
A function contained a local static object. If the regex library does this too, it should be changed, since such usage is not thread-safe. If this is not the case, please simply ignore my post, since it does not apply.
The local static is in user code, not the library (the library does use some global state, but it's all carefully mutexed...) John.
participants (4)

- Ben Hutchings
- John Maddock
- Leon Mergen
- Roland Schwarz