On Tue, May 26, 2015 at 5:29 PM, Hartmut Kaiser wrote:
It can be important for O_DIRECT AIO operations. I agree that for buffered I/O, the filesystem overhead will dominate (and, on Linux, you don't have a way to implement futures over buffered I/O without resorting to threads, which will slow things down further).
Actually, on recent Linux kernels with ext4, reads from page-cached data are now wait-free under kernel AIO (kaio). That makes a big difference on a warm-cache filesystem when you're doing lots of small reads. Writes, unfortunately, still take a lock, and moreover exclude readers. Linux has a very long way to go to catch up with the BSDs, and especially Windows, for async I/O.
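For concreteness, below is a minimal sketch of the kind of kernel AIO (libaio) read with O_DIRECT being discussed. The file name, block size and (omitted) error handling are placeholders; it only shows where the submission and completion steps sit, not any particular library's API:

    // build: g++ -O2 aio_read.cpp -laio
    #include <libaio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdlib>
    #include <cstdio>

    int main()
    {
        // O_DIRECT requires buffer, offset and length to be block-aligned;
        // 4096 bytes is a safe choice on most filesystems.
        void* buf = nullptr;
        if (posix_memalign(&buf, 4096, 4096) != 0) return 1;

        int fd = open("data.bin", O_RDONLY | O_DIRECT);  // placeholder file
        if (fd < 0) return 1;

        io_context_t ctx = nullptr;          // must be zeroed before io_setup
        if (io_setup(1, &ctx) != 0) return 1;

        iocb cb;
        iocb* cbs[1] = { &cb };
        io_prep_pread(&cb, fd, buf, 4096, 0);   // 4 KiB read at offset 0
        io_submit(ctx, 1, cbs);                 // hand the request to the kernel

        io_event ev;
        io_getevents(ctx, 1, 1, &ev, nullptr);  // wait for completion
        std::printf("read %ld bytes\n", (long)ev.res);

        io_destroy(ctx);
        close(fd);
        std::free(buf);
        return 0;
    }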
Optimizing away one allocation to create a promise/future pair (or just a future, in the case of make_ready_future) will have no measurable impact in the context of any I/O, be it wait-free, asynchronous, or both.
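To make it concrete what that single allocation is, here is a sketch. The inline_ready_future type is purely hypothetical (it is not std::future and not any proposed Boost API); it only shows what eliding the shared-state allocation would buy:

    #include <future>
    #include <utility>

    // The allocation under discussion: a make_ready_future written over
    // std::promise typically pays one heap allocation for the shared state.
    std::future<int> make_ready_future_naive(int v)
    {
        std::promise<int> p;      // usually allocates the shared state
        p.set_value(v);
        return p.get_future();
    }

    // Hypothetical allocation-free alternative: a trivial "ready" wrapper
    // that stores the value inline instead of in heap-allocated shared state.
    template <class T>
    struct inline_ready_future
    {
        T value;
        bool is_ready() const { return true; }
        T get()               { return std::move(value); }
    };

    template <class T>
    inline_ready_future<T> make_ready_future_inline(T v)
    {
        return inline_ready_future<T>{ std::move(v) };   // no dynamic allocation
    }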
In general, all I'm hearing on this thread is 'it could be helpful', 'it should be faster', 'it can be important', 'it makes a big difference', etc. I was hoping that we as a Boost community could do better!
Nobody so far has shown the impact of this optimization technique on a real-world application (with measurements), or at least measurement results from artificial benchmarks under heavy concurrency (using decent multi-threaded allocators like jemalloc or tcmalloc). I'd venture to say that there will be no measurable speedup (unless proven otherwise).
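For reference, an artificial benchmark of the kind being asked for could be as simple as the sketch below: several threads hammering promise/future creation so that the allocator becomes the contended resource, run once against glibc malloc and once against jemalloc or tcmalloc (e.g. via LD_PRELOAD). The iteration count is arbitrary:

    #include <future>
    #include <thread>
    #include <vector>
    #include <chrono>
    #include <cstdio>

    int main()
    {
        const unsigned threads = std::thread::hardware_concurrency();
        const int iterations = 1'000'000;

        // Each worker creates and completes promise/future pairs in a tight
        // loop, so almost all the work is shared-state allocation/release.
        auto worker = [iterations]
        {
            for (int i = 0; i < iterations; ++i)
            {
                std::promise<int> p;               // allocates shared state
                std::future<int> f = p.get_future();
                p.set_value(i);
                (void)f.get();
            }
        };

        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        for (unsigned t = 0; t < threads; ++t)
            pool.emplace_back(worker);
        for (auto& t : pool)
            t.join();
        auto stop = std::chrono::steady_clock::now();

        std::printf("%u threads x %d pairs each: %lld ms\n",
                    threads, iterations,
                    (long long)std::chrono::duration_cast<
                        std::chrono::milliseconds>(stop - start).count());
        return 0;
    }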
+1. I'd add that not only should there be a measurable improvement, it should be a measurable improvement compared to other reasonable optimizations; for example, allocations can be optimized through custom allocators. As an analogy, it's not sufficient to show that shared_ptr is "too slow" or "allocates too much" compared to some other smart pointer type -- it must also be shown that the slowness can't be trivially dealt with by implementing a custom allocator for some shared_ptr instances, if needed.

--
Emil Dotchevski
Reverge Studios, Inc.
http://www.revergestudios.com/reblog/index.php?n=ReCode
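To illustrate the custom-allocator point, here is a minimal sketch of routing a shared_ptr through std::allocate_shared with a trivial arena allocator. The arena is deliberately simplistic (fixed buffer, no per-object deallocation, not thread-safe) and all names are made up for illustration; it needs C++17 for the inline variables:

    #include <memory>
    #include <cstddef>
    #include <new>

    namespace arena
    {
        alignas(std::max_align_t) inline char buffer[64 * 1024];
        inline std::size_t used = 0;

        inline void* allocate(std::size_t bytes)
        {
            // Round up so later allocations stay suitably aligned.
            bytes = (bytes + alignof(std::max_align_t) - 1)
                    & ~(alignof(std::max_align_t) - 1);
            if (used + bytes > sizeof(buffer)) throw std::bad_alloc();
            void* p = buffer + used;
            used += bytes;
            return p;
        }
    }

    template <class T>
    struct arena_allocator
    {
        using value_type = T;
        arena_allocator() = default;
        template <class U> arena_allocator(const arena_allocator<U>&) {}

        T* allocate(std::size_t n)
        {
            return static_cast<T*>(arena::allocate(n * sizeof(T)));
        }
        void deallocate(T*, std::size_t) {}   // whole arena released at once
    };

    template <class T, class U>
    bool operator==(const arena_allocator<T>&, const arena_allocator<U>&) { return true; }
    template <class T, class U>
    bool operator!=(const arena_allocator<T>&, const arena_allocator<U>&) { return false; }

    int main()
    {
        // Object and control block come from the arena in one combined
        // allocation; the global allocator is never touched.
        auto sp = std::allocate_shared<int>(arena_allocator<int>{}, 42);
        return *sp == 42 ? 0 : 1;
    }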