
2015-08-25 0:10 GMT+03:00 Niall Douglas <s_sourceforge@nedprod.com>:
On 24 Aug 2015 at 21:46, Antony Polukhin wrote:
2) "A less toy example: Concatenating files"
That example is not nice in multiple ways. First of all, it combines explicit usage of `boost::afio::dispatcher` with implicit `current_dispatcher_guard`. It would be good to have a single way of dealing with dispatchers in the example. `current_dispatcher_guard` seems simpler (and that's what I like in examples :) )
I'm not sure what you mean here. That example creates a dispatcher. It then sets the current thread-local dispatcher via RAII of the guard variable.
Let me clarify. In that example you use two approaches, one of which uses `dispatcher` explicitly (for example `dispatcher->file(ihs_reqs)`) while the other uses `dispatcher` implicitly (for example `async_truncate(oh, offset)`). The example would look more solid if only one approach were used. The implicit dispatcher looks simpler, which I think is good for examples.
Also, a thread count of 1 seems more correct and less error prone (by default it would require no thread synchronization from users).
I'm not sure which thread count you mean here.
By default a thread_pool is created that spawns some performance-optimal number of threads. I'm proposing to spawn a minimal number of threads (1 thread) by default. That way a novice user could experiment with the library without any need to think about thread synchronization. Such a change (1 thread by default) would lower the library's entry barrier. But that's only my opinion, it's not a requirement :)
Additionally, this example is hard to understand and requires a lot of code while solving a very common problem. Could AFIO reuse POSIX splice/vmsplice and provide an async_copy function? If yes, the amount of code could be significantly reduced and no memory allocations would be required.
splice/vmsplice are Linux only, not POSIX. FreeBSD implements a subset of them for sockets only.
Unfortunately they have an annoyingly broken API which is incompatible with the ASIO reactor. The problem is SPLICE_F_NONBLOCK because you cannot feed any fd into an alertable wait state for ASIO. That means you need to dedicate a thread to block on the kernel-side copy. I really wish Linux were more like BSD - on BSD you can have a kevent notify you when an async i/o operation changes its state. Lovely API, really efficient too. And is ASIO compatible.
I don't discount looking into them again sometime in the future, but I didn't find any benefit of splice() over read()/write() until blocks of 64KB or bigger, and even then it is not a huge benefit. I suspect the cost of copying memory is low in the single-threaded case relative to the cost of the Linux kernel file page cache. Of course, at big thread counts you end up evicting much useful data from the L1/L2/L3 caches with file i/o on Linux, but we're getting into premature optimisation here I think.
You've convinced me. But I still insist on an `async_copy` method ready out of the box. This could be really useful + will untie your hands for future modifications (for example you could try to call splice/tee and, if that blocks, do the copying using memory allocations).

OK, good news: I've finally understood the docs and all the interface decisions that were made. This leads to my decision to review AFIO in two parts: nonbatch and batch. This is still not a review, just some discussion. Let's start from the nonbatch operations...

NONBATCH

I like the interface, I love it. Some comments:

Most of the functions in the table "AFIO single page cheat sheet for the very impatient" have synchronous analogs in Boost.Filesystem http://www.boost.org/doc/libs/1_59_0/libs/filesystem/doc/reference.html How about changing the names to be closer to Boost.Filesystem? For example:

    async_truncate -> async_resize_file
    truncate -> resize_file
    async_dir -> async_directory
    async_rmfile + async_rmsymlink -> async_remove
    async_rmdir -> async_remove_all

I'm not a huge fan of the Boost.Filesystem names, but they are almost a part of the C++ Standard now. Keeping close to them will simplify users' lives by not forcing them to learn new names. It will also make the migration from sync Filesystem code to async AFIO easier.

Additionally, use the Boost.Interprocess naming for file_flags:

    create_only_if_not_exist -> create_only
    create -> open_or_create
    add open_only

Probably rename `file_flags` into `open_flags`, because those flags are also usable for directories and are used only at open.

Also describe all the `T` in `async_` operations. For example:

    template<class T, typename> future async_file(T _path, file_flags _flags = file_flags::none)

    T _path - the file system path to use. Could be filesystem::path, const char*, ...

Also, the sync operations that return `handle_ptr` do not seem right. Just remove them if possible.
This will simplify the codebase and will hide the strange `handle_ptr` type from users' eyes.

The problem that I see is the implementation of nonbatch operations in terms of batch operations. `std::vector<path_req>(1, std::move(req))).front()` in every nonbatch operation looks really bad. This must be fixed; no vector constructions should happen here.

I do not like the `afio::path` class. It is almost the same as `filesystem::path` and only does some path modifications on the Windows platform. It's much better to use `filesystem::path` all around the code and fix the paths directly near the system call. That way the user won't be bothered by an additional class that does almost nothing. It's also overkill to have a duplicate of `filesystem::path` just for minor path fixes.

BATCH

* The TOC has no mention of batch operations. Please describe in a separate chapter what benefits batch operations provide, how fast they are, and so forth.
* The implementation is not perfect. Too many memory allocations and shared_ptrs. In an ideal world there would be only two memory allocations for a batch operation: one in the `std::vector<future<>>` constructor, another one in making the shared state for the future.
* The interface is not ideal, and I cannot fully understand how to improve it. But here are some ideas.

Minor improvements:

* make all the async batch operations start with an `async_batch_` prefix (or at least `async_`).
* follow the C++ Standard naming for `std::packaged_task`-like stuff. Rename all the `*_req` to `*_task`: `io_req` -> `io_task`, `path_req` -> `path_task`, ...

Problems that I see:

* std::vector as an input container is not a perfect solution
* std::vector as an output container is OK for some cases, but some users may require nonallocating output
* too many helper task-like structures (`io_req`, `path_req`)
* the result of a batch operation must be converted to be suitable for the next batch operation.
In other words, if I want to create 100 files and then resize and write something to each file, I'll need to move a lot of `future<>`s from each batch output into `*_req` structures.
* empty futures for the first batch operation are not something that I'd wish to have around

How hard would it be to:

* allow use of any container for providing the batch tasks
* return `boost::array<future<>, N>` if the input data size is known at compile time
* move the `completion` field from each of the `*_req` structures to the batch method signature

This would give the following interface:

    array<future<>, 7> operation = async_batch_file(std::initializer_list<path_task>{
        path_task("file1.txt", open_flags::create),
        path_task("file2.txt", open_flags::create),
        path_task("file3.txt", open_flags::create),
        path_task("file4.txt", open_flags::create),
        path_task("file5.txt", open_flags::create),
        path_task("file6.txt", open_flags::create),
        path_task("file7.txt", open_flags::create)
    });
    const off_t offsets[7] = { 4096, 4096, 4096, 4096, 4096, 4096, 4096 };
    operation = async_batch_resize_file(std::move(operation), offsets);
    operation = async_batch_write(std::move(operation), io_task[]{ /* ... */ });
    operation = async_batch_remove(std::move(operation));
    when_all_p(std::move(operation)).wait();

--
Best regards,
Antony Polukhin