Terrimane Pritchett wrote:
I suspect my continued problem lies with boost::tokenizer... for more on that, skip to the bottom of this post.
First I want to respond to some of the valid concerns presented to me.
The input data *shouldn't* be shared. It shouldn't be, but since something is wrong here I must validate all assumptions.
I am essentially using a boost::thread to process each file, for 1 to N files. After I get it working I'll control how many files are processed at once, etc. What is relevant here is that each file is processed in a separate thread and there is no communication or sharing of data between threads.
Okay, great. I agree: if you've got one thread per file, then it sounds like they can't stomp on each other.
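For reference, the one-thread-per-file layout described above might look roughly like the sketch below. It is only an illustration under assumptions: std::thread stands in for boost::thread, the "files" are in-memory strings, and process_file and process_all are made-up names, not anything from the original code.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-file worker: counts whitespace-separated tokens.
// It only touches its own arguments and locals, so nothing is shared.
void process_file(const std::string& contents, std::size_t& token_count) {
    std::istringstream in(contents);  // local stream, private to this call
    std::string token;
    token_count = 0;
    while (in >> token) {
        ++token_count;
    }
}

// Starts one thread per "file" and returns the per-file token counts.
std::vector<std::size_t> process_all(const std::vector<std::string>& files) {
    std::vector<std::size_t> counts(files.size(), 0);
    std::vector<std::thread> workers;
    workers.reserve(files.size());
    for (std::size_t i = 0; i < files.size(); ++i) {
        // Thread i reads files[i] and writes counts[i], and nothing else.
        workers.emplace_back(process_file, std::cref(files[i]), std::ref(counts[i]));
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return counts;
}
```

If the real program has this shape and still sees stomped data, the sharing has to be hiding inside something the workers call rather than in the workers themselves.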
(Don't misinterpret this as criticism... I LOVE all things boost, believe me!) If boost::lexical_cast were 100% thread safe I would have no need to guard calls to it. Igor R mentioned that boost::lexical_cast is thread safe because the function has its own local streambuf instance. I need to do more research, but my understanding is that this isn't the case. I was led to believe boost::lexical_cast was not thread safe mostly by reading this discussion:
Okay, fair enough. As I said my opinion was based on a cursory glance at the lexical_cast source and the assumption that it didn't use any shared state.
Back to my problem. It's been suggested that some other library is spawning threads behind my back. I can report that is not the case. I have moved back to the single-threaded implementation and *ONLY* my main thread is a part of my application's process. When I move to the multi-threaded implementation that leverages boost::thread I now have new threads appearing... some I spawned... others I did not. The only difference between the two implementations is the addition of Boost Threads to the project. If some library were spawning threads behind my back I would see them in the single-threaded application at some point and also be able to decipher what kind of threads they were. Neither MSVC nor the OS detects anything other than my main thread running in my single-threaded implementation.
Something must explain where the extra threads are coming from.
I agree. Something must explain it. My next suggestion for your threading issue would be to take your current program and keep cutting it down to a smaller and smaller program that still exhibits the problem. When you've got it as small as possible, it will either a) be clear what you removed to make it work as expected, or b) be a small enough program that you can post it here for others to test.
I have full confidence that boost::lexical_cast is getting bad data. The question for me is how is that possible? Data *shouldn't* be shared. I *shouldn't* be seeing more threads than I explicitly create.
Well, it's somewhere in the program, of course. I think your suggestion that the tokeniser may be to blame is possibly the right one. If you are able to guard it with a mutex as a quick test, that would be quite cool. Otherwise I'd suggest doing the same as above and sending us a working program that exhibits the problem. If that is not possible for some reason, and there are many reasons why it might not be, then I'd suggest guarding the tokeniser first and seeing how it goes. It's hard to make suggestions without a fair amount of context about the actual code.
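The quick mutex test could be as small as the sketch below. Everything in it is an assumption for illustration: tokenise is a made-up stand-in for whatever the program does with boost::tokenizer, and std::mutex stands in for boost::mutex.

```cpp
#include <mutex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the real tokenising step.
// Splits on a single delimiter and keeps any whitespace inside the fields.
std::vector<std::string> tokenise(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// One process-wide lock, purely as a diagnostic.
std::mutex g_tokenise_mutex;

// Same call, but only one thread at a time can be inside it.
std::vector<std::string> tokenise_guarded(const std::string& line, char delim) {
    std::lock_guard<std::mutex> lock(g_tokenise_mutex);
    return tokenise(line, delim);
}
```

Swapping the worker threads over to the guarded call serialises the tokenising step. If the bad data goes away, that points at the tokenising step; if it doesn't, the stomping is happening somewhere else.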
*There is a trailing space at the end of the input string to the casting function*
*Other random anomalies cause problems but they are expected when the data is being stomped somewhere*
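The trailing-space symptom can be checked in isolation with something like the sketch below. It is an approximation under assumptions: parse_int_strict and trim_trailing are made-up helpers, and std::stoi plus a "whole string consumed" check stands in for boost::lexical_cast, which likewise rejects input with characters left over after the number.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Strict conversion: succeeds only if the number runs to the end of the string.
// Note that std::stoi skips leading whitespace, so only the tail is checked here.
bool parse_int_strict(const std::string& s, int& out) {
    try {
        std::size_t used = 0;
        const int value = std::stoi(s, &used);
        if (used != s.size()) {
            return false;  // leftover characters, e.g. a trailing space
        }
        out = value;
        return true;
    } catch (const std::logic_error&) {
        // std::invalid_argument or std::out_of_range from std::stoi
        return false;
    }
}

// Returns a copy of s with trailing spaces, tabs, CR and LF removed.
std::string trim_trailing(std::string s) {
    const std::size_t last = s.find_last_not_of(" \t\r\n");
    s.erase(last == std::string::npos ? 0 : last + 1);
    return s;
}
```

Logging each token that fails the strict check, before and after trimming, would show whether the trailing space is the only corruption or just the most visible one.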
Well, the extra space is why lexical_cast is throwing... the question, of course, is why the tokeniser strips the space in your single-threaded version (you already said you were using the same data set in both runs) but not in the multi-threaded version. Good luck, I can't wait to see what happens with your next experiment.
Regards,
Nigel