Re: [Boost-users] mapped_region locks on multithread

9 Jun 2012

On Sat, Jun 9, 2012 at 1:34 AM, Mikhail Eremin <meremin@gmail.com> wrote:
Hello,
SETTING:
- There is an application, written using the Boost libraries, meant for
QUICK processing of bulk text files (approx. 50-100 GB each).
- There is a huge, quick and expensive piece of hardware with a HUGE amount
of RAM and multiple CPUs.
- The target is [theoretically] any UNIX-like OS; even Microsoft
Windows(R) is being considered.
- The Boost Thread Pool extension is used; previously memory-mapped files
through memory_segment were used, but we have now got rid of
Boost::interprocess entirely.
- There are NO explicit data items in the application's algorithm to be
shared by threads, each has its own piece of input file, thus - there is NO
explicit concurrency.
PROBLEM:
- Ensure fast processing without locks and without threads sleeping.
Currently the threads sleep on some internal mutex. We thought it was
boost::interprocess (specifically - mmap, wrapped by a mutex), but
apparently it isn't.
SPECIFIC QUESTION:
- How could we get rid of Boost locks?
Mike

Okay, so you have enough memory to map an entire file into memory at
once?  Are the files read-only?  Where are you using a Boost lock that
you want to get rid of?  Probably the threadpool library uses a lock on
a queue somewhere?

You could certainly write this without a threadpool.  I'd imagine that
the cost of launching threads will be insignificant compared to
running the algorithm on these regions:

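#include <boost/atomic.hpp>
#include <boost/thread.hpp>
#include <stdint.h>
#include <utility>
#include <vector>

// One entry per region of the mapped file; shared read-only by all threads.
std::vector< std::pair<uint64_t, uint64_t> > regions;
// Index of the next unclaimed region; the only state the threads share.
boost::atomic<size_t> nextRegion;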

struct RegionThread {
    /*mmap info variables*/
    RegionThread(/*mmap info*/) : /*mmapinfo member(mmap info) */{}
    void operator() () {
        while(true) {
            // atomically claim the next unprocessed region
            size_t next = nextRegion.fetch_add(1);
            if(next >= regions.size()) { break; }
            std::pair<uint64_t, uint64_t> const &region = regions[next];
            /*perform algorithm on region of mmapped file...*/
        }
    }
};

void operateOnFile(/*some mmap info*/) {
    regions.clear();
    // set up regions for this file
    nextRegion = 0;  // reset the shared counter before the threads start
    boost::thread_group tg;
    for(size_t i = 0; i < boost::thread::hardware_concurrency(); ++i) {
        tg.create_thread(RegionThread(/*mmap info*/));
    }
    tg.join_all();
}
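
The "set up regions for this file" step is left open above.  As a minimal
sketch, assuming each pair is meant as (offset, length) in bytes, which the
post does not actually say, a hypothetical helper makeRegions (not a Boost
function) could cut the file into fixed-size pieces:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: cut [0, fileSize) into consecutive (offset, length)
// pairs of at most regionSize bytes each. The last region may be shorter.
std::vector< std::pair<uint64_t, uint64_t> >
makeRegions(uint64_t fileSize, uint64_t regionSize) {
    std::vector< std::pair<uint64_t, uint64_t> > out;
    if (regionSize == 0) { return out; }  // avoid an endless loop
    for (uint64_t offset = 0; offset < fileSize; offset += regionSize) {
        uint64_t length = std::min(regionSize, fileSize - offset);
        out.push_back(std::make_pair(offset, length));
    }
    return out;
}
```

For text files one would presumably also have to move each boundary to the
next newline so that no line is split between two threads; that is not shown
here.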

If you can statically schedule the work into sets of regions that each
thread will work on, this is even easier, and can be done without even
an atomic variable.
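
A rough sketch of that static variant, under a few assumptions: it uses
std::thread from C++11 in place of boost::thread so that it builds without
Boost, processRegion is a hypothetical stand-in for the real algorithm, and
regions are dealt out round-robin (thread t takes regions t, t + n, t + 2n,
...).  Each thread writes only to its own slot of the result vector, so
nothing is shared and nothing needs to be atomic:

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

typedef std::pair<uint64_t, uint64_t> Region;

// Hypothetical stand-in for the real per-region work: returns the second
// member of the pair so that the result is easy to check.
uint64_t processRegion(const Region &r) { return r.second; }

// Statically assign regions to threads: thread t handles indices
// t, t + numThreads, t + 2 * numThreads, ... and stores its total in
// results[t]. No lock and no atomic counter is involved.
std::vector<uint64_t> runStatic(const std::vector<Region> &regions,
                                unsigned numThreads) {
    if (numThreads == 0) { numThreads = 1; }  // hardware_concurrency() may report 0
    std::vector<uint64_t> results(numThreads, 0);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) {
        threads.push_back(std::thread([&regions, &results, t, numThreads]() {
            uint64_t local = 0;  // accumulate locally, write the slot once
            for (std::size_t i = t; i < regions.size(); i += numThreads) {
                local += processRegion(regions[i]);
            }
            results[t] = local;
        }));
    }
    for (std::size_t t = 0; t < threads.size(); ++t) { threads[t].join(); }
    return results;
}
```

The price of dropping the counter is load balance: if some regions take much
longer than others, a thread that finishes early sits idle, whereas with the
shared counter it would simply claim the next region.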

  Brian

Brian Budge