Designing a multi-threaded file parser
Hello, I hope this is not off-topic, but I am interested in hearing about Boost classes that will help me design the following:

1) I am designing a multi-threaded file parser using C++11 threads. The file format I am interested in parsing is a tagged format, with some tags requiring further processing and others not. My working model right now is to have one thread read through the file, and when it hits a tag that requires further processing, push the file offset of that tag into a queue to be processed by a thread pool. The read thread will spend a lot of time waiting for file IO to complete. Would it be faster to use the ASIO classes to do the reading?

2) I would like to design a thread scheduling library to help process the tags. The library would have the following features:

1. the smallest unit of work for a thread is a job
2. jobs have dependencies - job FOO cannot complete until job BAR has completed
3. threads are divided into groups
4. groups can be divided into bundles (a bundle can share the same processor affinity, for example)
5. groups are assigned to domains - domains contain jobs with similar characteristics, for example slow I/O activity
6. domains contain queues of jobs
7. threads in a group assigned to a domain can execute any jobs in any of the domain's queues

Does Boost support this type of thread scheduling framework? Can Boost help me manage dependencies between jobs?

Sorry for these broad design questions - just hoping for some insight into the best design here.

Many Thanks,
Aaron
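For concreteness, here is a minimal sketch (plain C++11, no Boost) of the working model in part 1: one reader thread pushes tag offsets into a shared queue, and a small pool of threads drains it. TagQueue, parse_tag, and the pool-size fallback are hypothetical names for illustration, not part of any existing library:

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Shared work queue: the reader pushes offsets of tags that need further
// processing; pool threads block in pop() until work arrives or the
// reader closes the queue at end of file.
class TagQueue {
public:
    void push(std::uint64_t offset) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(offset); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
    bool pop(std::uint64_t& offset) {  // returns false once closed and drained
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        offset = q_.front();
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::uint64_t> q_;
    bool closed_ = false;
};

void parse_tag(std::uint64_t offset) { (void)offset; /* decode the tag here */ }

int main() {
    TagQueue queue;
    unsigned n = std::thread::hardware_concurrency();
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < (n ? n : 2); ++i)
        pool.emplace_back([&queue] {
            std::uint64_t off;
            while (queue.pop(off)) parse_tag(off);
        });
    // ... the reader thread scans the file here, calling queue.push(offset)
    // for each tag that needs further processing ...
    queue.close();
    for (auto& t : pool) t.join();
}
```

The job dependencies in part 2 are commonly handled the way many task schedulers do it: give each job an atomic count of unfinished prerequisites, and when a job completes, decrement the counters of its dependents and enqueue any that reach zero.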
On 21 Apr 2016 at 20:47, Aaron Boxer wrote:
The read thread will spend a lot of time waiting for file IO to complete. Would it be faster to use the ASIO classes to do the reading?
No, it won't.

Memory mapped file i/o is almost guaranteed to be the correct technique to use here. Simply map the entire file into memory, and fire threads at processing it. Let the kernel figure out how best to do the i/o.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
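A minimal POSIX sketch of what Niall describes, assuming the whole file fits in the address space; the file name, the even slicing, and parse_range are placeholders, and error handling is kept to a bare minimum:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <thread>
#include <vector>

int main() {
    int fd = ::open("input.dat", O_RDONLY);   // placeholder file name
    if (fd < 0) return 1;
    struct stat st;
    if (::fstat(fd, &st) != 0) return 1;
    const size_t len = static_cast<size_t>(st.st_size);

    void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;
    ::close(fd);                   // the mapping keeps the pages reachable
    const char* base = static_cast<const char*>(p);

    // Fire one thread per hardware core at a slice of the mapping and let
    // the kernel fault pages in as they are touched.
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 2;
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i)
        pool.emplace_back([base, len, i, n] {
            const char* begin = base + len * i / n;
            const char* end   = base + len * (i + 1) / n;
            (void)begin; (void)end;  // parse_range(begin, end) -- hypothetical
        });
    for (auto& t : pool) t.join();
    ::munmap(p, len);
}
```

Real code would need to align the slice boundaries with tag boundaries; even slicing is just for illustration.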
Hi Niall,
On Fri, Apr 22, 2016 at 8:18 AM, Niall Douglas wrote:
On 21 Apr 2016 at 20:47, Aaron Boxer wrote:
The read thread will spend a lot of time waiting for file IO to complete. Would it be faster to use the ASIO classes to do the reading?
No, it won't.
Memory mapped file i/o is almost guaranteed to be the correct technique to use here. Simply map the entire file into memory, and fire threads at processing it. Let the kernel figure out how best to do the i/o.
Thanks. I do have a memory mapped file interface in my library for the compressed file, so I can certainly try that.

My impression is that memory mapping is best when reading a file more than once, because the first read gets cached in the virtual memory system, so subsequent reads don't have to go to disk. Also, it eliminates system calls, using simple buffer access instead.

Since memory mapping acts as a cache, it can create memory pressure on the virtual memory system, as pages need to be recycled for the next usage. And this can slow things down, particularly when reading files whose total size meets or exceeds current physical memory.

In my case, I am reading the file only once, so I think the normal file IO methods will be better. Don't know until I benchmark.

Thanks again,
Aaron
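As an aside (a suggestion beyond what the thread discusses): if the single-pass read() route does win the benchmark, POSIX systems let you declare the access pattern up front, which encourages aggressive read-ahead and earlier recycling of pages behind the read position. A small sketch, with open_for_single_pass a hypothetical helper:

```cpp
#include <fcntl.h>
#include <unistd.h>

// Open a file for one sequential pass. POSIX_FADV_SEQUENTIAL is a hint
// only: the kernel may read ahead more aggressively and evict pages
// sooner, limiting the cache pressure a one-shot read creates.
int open_for_single_pass(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd >= 0)
        ::posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);  // len 0 = to EOF
    return fd;
}
```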
On 22 Apr 2016 at 10:31, Aaron Boxer wrote:
My impression is that memory mapping is best when reading a file more than once, because the first read gets cached in the virtual memory system, so subsequent reads don't have to go to disk. Also, it eliminates system calls, using simple buffer access instead.
Since memory mapping acts as a cache, it can create memory pressure on the virtual memory system, as pages need to be recycled for the next usage. And this can slow things down, particularly when reading files whose total size meets or exceeds current physical memory.
In my case, I am reading the file only once, so I think the normal file IO methods will be better. Don't know until I benchmark.
You appear to have a flawed understanding of unified page cache kernels (pretty much all OSs nowadays apart from QNX and OpenBSD).

Unless O_DIRECT is on, *all* reads and writes are memcpy()ied from/to the page cache. *Always*. mmap() simply wires parts of the page cache into your process unmodified. Memory mapped i/o therefore saves on a memcpy(), and is therefore the most efficient cached i/o you can do.

If you are not on Linux, a read() or write() of >= 4Kb on a 4Kb aligned boundary may be optimised into a page steal by the kernel of that memory page into the page cache such that DMA can be directed immediately into userspace. But, technically speaking, this is still DMA into the kernel page cache as normal, it's just the page is wired into userspace already.

So basically you only slow down your code using read() or write(). Use mapped files unless the cost of the memcpy() done by the read() is lower than a mmap(). This is typically 16Kb or so, but it depends on memory bandwidth pressure and processor architecture. That part you should benchmark.

Obviously all the above is with O_DIRECT off. Turning it on is a whole other kettle of fish, and I wouldn't recommend you do that unless you have many months of time to hand to write and optimise your own caching algorithm, and even then 99% of the time you won't beat the kernel's implementation which has had decades of tuning and optimisation.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
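A rough sketch of the benchmark suggested above (POSIX; the 64Kb chunk size is a guess to tune, and the running checksum just keeps the compiler from optimising the loops away). Cold-cache numbers require dropping the page cache between runs:

```cpp
#include <chrono>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Touch every byte via read() into a buffer, then via a mapping, and
// report both wall-clock times. For cold-cache results, drop the page
// cache between passes (e.g. echo 3 > /proc/sys/vm/drop_caches on Linux).
int main(int argc, char** argv) {
    if (argc < 2) return 1;
    using clock = std::chrono::steady_clock;
    unsigned long sum = 0;

    // Pass 1: plain read() in 64Kb chunks.
    int fd = ::open(argv[1], O_RDONLY);
    if (fd < 0) return 1;
    static char buf[64 * 1024];
    auto t0 = clock::now();
    for (ssize_t n; (n = ::read(fd, buf, sizeof buf)) > 0;)
        for (ssize_t i = 0; i < n; ++i) sum += (unsigned char)buf[i];
    auto t1 = clock::now();

    // Pass 2: mmap() and walk the mapping.
    struct stat st;
    if (::fstat(fd, &st) != 0) return 1;
    size_t len = (size_t)st.st_size;
    void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return 1;
    const unsigned char* base = (const unsigned char*)p;
    auto t2 = clock::now();
    for (size_t i = 0; i < len; ++i) sum += base[i];
    auto t3 = clock::now();

    using ms = std::chrono::milliseconds;
    std::printf("read(): %lld ms, mmap(): %lld ms (sum=%lu)\n",
                (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
                (long long)std::chrono::duration_cast<ms>(t3 - t2).count(),
                sum);
    ::munmap(p, len);
    ::close(fd);
}
```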
On Fri, Apr 22, 2016 at 2:18 PM, Niall Douglas wrote:
So basically you only slow down your code using read() or write(). Use mapped files unless the cost of the memcpy() done by the read() is lower than a mmap(). This is typically 16Kb or so, but it depends on memory bandwidth pressure and processor architecture. That part you should benchmark.
Thanks a lot for the detailed explanation.

I tested this on Windows: fread/fwrite and memory mapping both gave the same performance in my use case. So it doesn't look like memory mapping will make much of a difference on Windows for my case. Need to test this on Linux.

Aaron
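For reference, a Windows counterpart to the POSIX mapping sketch earlier in the thread (Win32 API; map_whole_file is a hypothetical helper and error handling is minimal):

```cpp
#include <windows.h>

// Map a whole file read-only on Windows; returns the base pointer, or
// nullptr on failure. The caller unmaps with UnmapViewOfFile().
const char* map_whole_file(const wchar_t* path, size_t& len) {
    HANDLE file = ::CreateFileW(path, GENERIC_READ, FILE_SHARE_READ,
                                nullptr, OPEN_EXISTING,
                                FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return nullptr;
    LARGE_INTEGER size;
    if (!::GetFileSizeEx(file, &size)) { ::CloseHandle(file); return nullptr; }
    HANDLE mapping = ::CreateFileMappingW(file, nullptr, PAGE_READONLY,
                                          0, 0, nullptr);
    ::CloseHandle(file);     // the mapping object keeps the file open
    if (!mapping) return nullptr;
    const void* view = ::MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    ::CloseHandle(mapping);  // the view keeps the mapping alive
    len = static_cast<size_t>(size.QuadPart);
    return static_cast<const char*>(view);
}
```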
Niall,
A correction: it turns out the memory mapping logic was different from the fread/fwrite logic in my program. Once the two paths were made equivalent, the timing was exactly the same using both methods.
Thanks again for your help,
Aaron
participants (2): Aaron Boxer, Niall Douglas