Re: [boost] [afio] Formal review of Boost.AFIO

On 27 Aug 2015, at 04:45, boost-request@lists.boost.org wrote:

> Ok, let's revisit the original pattern code I mentioned:
>
> EXAMPLE A:
>
>     shared_future h = async_file("niall.txt");
>     // Perform these in any order the OS thinks best
>     for(size_t n = 0; n < 100; n++)
>         async_read(h, buffer[n], 1, n*4096);

Niall,

is parallel read (parallel writing maybe?) the only use case where you want a shared_future?

If I understand Thomas correctly, he doubts you need the shared_future semantics because one async operation hands down a single handle to the next continuation. Essentially something like:

    async_file("niall.txt")
        .async_read(buffer, length_to_read)
        .async_truncate(length_to_read)
        .async_close()

Your counter-example was an asynchronous *and* parallel read where you need to share the file handle (or rather the future<handle>) between parallel reads. Shouldn't this be abstracted away in the API somehow? I can't think of many file operations you want to do N times in parallel. Truncating a file in parallel several times doesn't seem to make much sense :-) So why not make it:

    async_file("niall.txt")
        // Read 100 times asynchronously and in parallel; provide a lambda
        // returning the n-th buffer and offset:
        .async_parallel_read(100, [&](int n) { return std::make_pair(buffer[n], n*4096); })
        .async_truncate(length_to_read)
        .async_close()

The 100 reads are internally not ordered, but they can only begin once the file has been opened; they consume this handle together, and only after all reads are complete can we truncate. Is this not what you're trying to do?

Regards
Sebastian

--
Dr. Sebastian Theophil | stheophil@think-cell.com
think-cell Software GmbH, Chausseestr. 8/E, 10115 Berlin, Germany

On 27 Aug 2015 at 8:25, Sebastian Theophil wrote:
Ok, let's revisit the original pattern code I mentioned:
EXAMPLE A:
shared_future h = async_file("niall.txt");
// Perform these in any order the OS thinks best
for(size_t n = 0; n < 100; n++)
    async_read(h, buffer[n], 1, n*4096);
Niall,
is parallel read (parallel writing maybe?) the only use-case where you want a shared_future?
The problem is continuations. You may only add one continuation to a unique future. If you schedule 100 continuations onto the same future, that is a problem.
If I understand Thomas correctly, he doubts you need the shared_future semantics because one async operation hands down a single handle to the next continuation.
Essentially something like:
async_file("niall.txt")
    .async_read(buffer, length_to_read)
    .async_truncate(length_to_read)
    .async_close()
Your counter-example was an asynchronous *and* parallel read where you need to share the file handle (or rather the future<handle>) between parallel reads. Shouldn't this be abstracted away in the API somehow? I can't think of many file operations you want to do N times in parallel. Truncating a file in parallel several times doesn't seem to make much sense :-)
You want to relax ordering on file i/o as much as possible for optimum performance. It should therefore be assumed to be a very common pattern, and during post-debugging optimisation you, the programmer, will be _relaxing_ ordering as much as possible without breaking your concurrency visibility.
So why not make it:
async_file("niall.txt")
    // Read 100 times asynchronously and in parallel; provide a lambda
    // returning the n-th buffer and offset:
    .async_parallel_read(100, [&](int n) { return std::make_pair(buffer[n], n*4096); })
    .async_truncate(length_to_read)
    .async_close()
The 100 reads are internally not ordered but they can only begin once the file has been opened, they consume this handle together, and only after all reads are complete can we truncate.
But isn't this just shared_future?
Is this not what you’re trying to do?
I'm all for unique futures over shared futures where the normal use case is a 1:1 mapping of inputs to outputs. However, the case of one future to many futures, and many futures to one future, comes up a lot in file system programming. Sufficiently so that I chose non-consuming (shared future) semantics as the default for afio::future<>.get_handle(), but left consuming (unique future) semantics for fetching any T in afio::future<T>.get().

Even in another thread with Gavin Lambert I showed how non-consuming futures are needed to ignore errors, as AFIO defaults to error propagation. As futures carry either a handle or an error, as I showed in that thread you can use the depends(precondition, output) function to substitute a future output when a future precondition becomes ready. That lets you always close or delete files even if an error occurs.

If AFIO defaulted to unique future (consuming) semantics for get_handle(), the user would have to explicitly convert the future to a shared future in their code. I am questioning whether that extra hassle for the user is worth it.

Thomas has also argued that each async_* function should return only a future specific to its return type, so instead of afio::future<T> always transporting a future handle in addition to any future T, you must supply (I would suppose) two futures to every async_* function: one for the handle, the other as the precondition. I asked if he could propose an API based on that model which is substantially superior to the one used by AFIO, because if he can I'll take it.

I can imagine several API models based on unique futures as Thomas advocated. Indeed, I whiteboarded design options about a year ago, as I did genuinely try quite hard to make the current API entirely unique future based. I came to the conclusion that it forced a whole load of extra typing onto the library user for what I felt was very little gain, and it made the AFIO API considerably more complex to both understand and use correctly, which had bad consequences for client code complexity and maintainability. I therefore eliminated that API design from consideration.

Now, Thomas and his former mentor are world experts in this field. I take their opinions seriously, even if his former mentor has never given my opinions or code on this stuff anything but ridicule and mockery, as you have seen in other threads. But I also suspect that their disagreement with my approach comes down to theory versus boots-on-the-ground, and my view is that C++ is a dirty mongrel language and therefore gets dirty mongrel design patterns.

In the end, I ask this of C++ library design (in order of priority):

(i) Is it fun to program?
(ii) Will it bite you in the ass?
(iii) Is it very testable, easy to debug, and maintainable?
(iv) Is it flexibly reusable to solve unanticipated problems?
(v) Does it perform well in space and time?

I don't care if I trample all over the principles of textbook proper library design if I get a "yes" to all five questions. In the end, the C++ compiler to me is just a fancy assembler macro engine, and hence why I upset some people with my "mongrel" design principles.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/

On 28/08/2015 13:47, Niall Douglas wrote:
Thomas has also argued that each async_* function should return only a future specific to its return type, so instead of afio::future<T> always transporting a future handle in addition to any future T, you must supply (I would suppose) two futures to every async_* function, one for the handle, the other as the precondition. I asked if he could propose an API based on that model which is substantially superior to the one used by AFIO, because if he can I'll take it.
I think rather than transporting separate futures he was suggesting something more like using afio::future<afio::handle_tuple<T>>, where "future" is a traditional Concurrency future that transports T or errors and no other state, and "handle_tuple" is a tuple type that transports a handle (and any other internal state that APIs might want, e.g. I think you mentioned at one point that it also transports the filename of that handle, for the rmfile case?), plus the desired T for the operation in question. I.e. separation of types rather than conflation.

Personally, I'm not sure if the extra typing is worth it (and I haven't looked into any other tradeoffs this might bring), but I can see his point (assuming that I am correctly interpreting it).

This does of course assume that the handle and the T become ready at the same time and that they don't have separate error states, but given the comments in the docs that preconditions are treated as future<void> anyway and your own comments about default construction on error, I don't think this is a hardship. Please correct me if I've missed something.

Perhaps a reasonable compromise might be that a handle-transporting future should be called "handle_future" or "future_handle" or "hfuture" or similar instead of plain "future", to avoid overloading the buzzword. But this might just be bikeshedding again.

Although on a related note, having a single future class with a handle-part that's non-consuming and a T-part that's consuming seems potentially confusing. I assume there is some performance argument behind this design choice..?

On 28 Aug 2015 at 14:48, Gavin Lambert wrote:
On 28/08/2015 13:47, Niall Douglas wrote:
Thomas has also argued that each async_* function should return only a future specific to its return type, so instead of afio::future<T> always transporting a future handle in addition to any future T, you must supply (I would suppose) two futures to every async_* function, one for the handle, the other as the precondition. I asked if he could propose an API based on that model which is substantially superior to the one used by AFIO, because if he can I'll take it.
I think rather than transporting separate futures he was suggesting something more like using afio::future<afio::handle_tuple<T>>, where "future" is a traditional Concurrency future that transports T or errors and no other state, and "handle_tuple" is a tuple type that transports a handle (and any other internal state that APIs might want, eg. I think you mentioned at one point that it also transports the filename of that handle, for the rmfile case?), plus the desired T for the operation in question.
BTW where the file is open and the platform supports it, AFIO deletes via file descriptor not by name. A third party process could be looping renames of that file and it would still work perfectly.
ie. separation of types rather than conflation. Personally, I'm not sure if the extra typing is worth it (and I haven't looked into any other tradeoffs this might bring), but I can see his point (assuming that I am correctly interpreting it).
I can see his point too, hence the engagement. Using a tuple handle + T transport is interesting. Definitely better than separate futures.
This does of course assume that the handle and the T become ready at the same time and that they don't have separate error states, but given the comments in the docs that preconditions are treated as future<void> anyway and your own comments about default construction on error, I don't think this is a hardship. Please correct me if I've missed something.
afio::future<> has a single errored or exceptioned state. I suppose you're right, it's not far from being a std::future<std::tuple<afio::handle_ptr, T>>. The big problem with std::future<std::tuple<afio::handle_ptr, T>> is it's ABI poison because of the unknown T passing through an ABI boundary, so you'd have to type erase which means an extra memory allocation before crossing the ABI. With the AFIO API design we never send an unknown T across an ABI boundary.
Although on a related note, having a single future class with a handle-part that's non-consuming and a T-part that's consuming seems potentially confusing. I assume there is some performance argument behind this design choice..?
Not at all. It led to less code being typed by the end user, that's all. Normally whenever that happens I hear alarm bells, because less typing often opens unintended consequences. But I am unaware of any related to the split consuming-nonconsuming future semantics in the AFIO API design. I do think there are some question marks elsewhere in the AFIO API design, as I've mentioned in other threads. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/

On 28/08/2015 15:16, Niall Douglas wrote:
afio::future<> has a single errored or exceptioned state.
Then FYI, the wording of the first Description paragraph in https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/fut... is potentially confusing in that regard, since it seems to be discussing error states for "both".
I suppose you're right, it's not far from being a std::future<std::tuple<afio::handle_ptr, T>>. The big problem with std::future<std::tuple<afio::handle_ptr, T>> is it's ABI poison because of the unknown T passing through an ABI boundary, so you'd have to type erase which means an extra memory allocation before crossing the ABI.
Perhaps I'm missing something again, but where do you need to do that? If you mean when passing preconditions back, that could still just discard T to void as you're doing now -- assuming you can convert a tuple<handle_ptr, T> to tuple<handle_ptr, void> or tuple<handle_ptr>. Again, I'm not sure how complex this is or whether it would be "worth it", but it does seem like a cleaner design to treat the handle and other data as the composite value of the future.

Another possibility would be to explicitly separate the handle_ptr. Most functions would then change from accepting a future<void> to accepting a future<handle_ptr>, since that's the value they actually need to do the work. For dependency order tracking you can just use the depends() helper you mentioned earlier, eg:

    afio::future<handle_ptr> openfile = afio::async_file(...);
    afio::future<void> resizedfile = afio::async_truncate(openfile, 12);
    ...
    afio::future<void> written = afio::async_write(depends(resizedfile, openfile), buffers, 0);
    ...

So the write depends on resizedfile being done, but that *doesn't* carry a file handle -- the only one that does is openfile. (depends, of course, would tolerate different Ts for its parameter futures, and will return the second one's type.)

This is perhaps more like how people are used to working with futures, rather than having them carry around extra values.

The only possibly tricky part that I foresee is that anything calling then() will need to know the T of the specific future, but hopefully that's trivial in most cases. (And other than depends()' first parameter, you can probably get away with assuming that it's either void or handle_ptr most of the time.)

On 28 Aug 2015 at 17:18, Gavin Lambert wrote:
afio::future<> has a single errored or exceptioned state.
Then FYI, the wording of the first Description paragraph in https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/fut... is potentially confusing in that regard, since it seems to be discussing error states for "both".
It means both get() and get_handle().
I suppose you're right, it's not far from being a std::future<std::tuple<afio::handle_ptr, T>>. The big problem with std::future<std::tuple<afio::handle_ptr, T>> is it's ABI poison because of the unknown T passing through an ABI boundary, so you'd have to type erase which means an extra memory allocation before crossing the ABI.
Perhaps I'm missing something again, but where do you need to do that?
If you mean when passing preconditions back, that could still just discard T to void as you're doing now -- assuming you can convert a tuple<handle_ptr, T> to tuple<handle_ptr, void> or tuple<handle_ptr>.
Again, I'm not sure how complex this is or whether it would be "worth it", but it does seem like a cleaner design to treat the handle and other data as the composite value of the future.
Maybe. Where I was coming from was this: if you go with std::future<std::tuple<afio::handle_ptr, T>>, it implies you are able to send some T past the AFIO ABI, which you cannot without losing ABI stability. I would worry that by hiding the implicit conversion from std::future<std::tuple<afio::handle_ptr, T>> into std::future<std::tuple<afio::handle_ptr>> somewhere in metaprogramming land, instead of explicitly declaring in your public API that you accept std::future<std::tuple<afio::handle_ptr>> and nothing else, so that the conversion happens in the end user's code, you are opening up a can of worms regarding end user expectations. In other words, if all APIs accept only std::future<std::tuple<afio::handle_ptr>>, there is no risk of disappointment.
Another possibility would be to explicitly separate the handle_ptr. Most functions would then change from accepting a future<void> to accepting a future<handle_ptr>, since that's the value they actually need to do the work. For dependency order tracking you can just use the depends() helper you mentioned earlier, eg:
afio::future<handle_ptr> openfile = afio::async_file(...);
afio::future<void> resizedfile = afio::async_truncate(openfile, 12);
...
afio::future<void> written = afio::async_write(depends(resizedfile, openfile), buffers, 0);
...
So the write depends on resizedfile being done, but that *doesn't* carry a file handle -- the only one that does is openfile. (depends, of course, would tolerate different Ts for its parameter futures, and will return the second one's type.)
This is perhaps more like how people are used to working with futures, rather than having them carry around extra values.
I'd live comfortably with what you just proposed. In fact, it was my second preference API design after the one I chose. I felt originally that making the end user write depends() all over the place would be irritating, but maybe it's better to force the end user to always be explicit about what operations relate to which others. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/

On 29/08/2015 10:36, Niall Douglas wrote:
On 28 Aug 2015 at 17:18, Gavin Lambert wrote:
afio::future<> has a single errored or exceptioned state.
Then FYI, the wording of the first Description paragraph in https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/fut... is potentially confusing in that regard, since it seems to be discussing error states for "both".
It means both get() and get_handle().
I realise that, but it also seems to imply that each have separate error state. The clause about error state being "identical for both" is perhaps an attempt to defuse that interpretation, but I'm not sure it works. Though again maybe I'm just being too nitpicky.
Another possibility would be to explicitly separate the handle_ptr. Most functions would then change from accepting a future<void> to accepting a future<handle_ptr>, since that's the value they actually need to do the work. For dependency order tracking you can just use the depends() helper you mentioned earlier, eg:
afio::future<handle_ptr> openfile = afio::async_file(...);
afio::future<void> resizedfile = afio::async_truncate(openfile, 12);
...
afio::future<void> written = afio::async_write(depends(resizedfile, openfile), buffers, 0);
...
So the write depends on resizedfile being done, but that *doesn't* carry a file handle -- the only one that does is openfile. (depends, of course, would tolerate different Ts for its parameter futures, and will return the second one's type.)
This is perhaps more like how people are used to working with futures, rather than having them carry around extra values.
I'd live comfortably with what you just proposed. In fact, it was my second preference API design after the one I chose. I felt originally that making the end user write depends() all over the place would be irritating, but maybe it's better to force the end user to always be explicit about what operations relate to which others.
I think this might end up being a little cleaner, despite being more verbose at first glance. In theory all the functions that accept preconditions would then just be very thin wrappers that simply "return precondition.then(actual operation)" (or similar; I'm not sure if you need an extra layer to cope with "depends" returning an unready future).

If the "actual operation" API were also public then this might also satisfy the folks in the other thread who want to avoid having precondition parameters at all.

Bear in mind that once coroutines or async/await are better established in the language, there would likely be no need for precondition parameters, as it would be simpler to attach continuations at the top level.

On 31 Aug 2015 at 12:10, Gavin Lambert wrote:
I'd live comfortably with what you just proposed. In fact, it was my second preference API design after the one I chose. I felt originally that making the end user write depends() all over the place would be irritating, but maybe it's better to force the end user to always be explicit about what operations relate to which others.
I think this might end up being a little cleaner, despite being more verbose at first glance. In theory then all the functions that accept preconditions would just be very thin wrappers that simply "return precondition.then(actual operation)" (or similar; I'm not sure if you need an extra layer to cope with "depends" returning an unready future).
If the "actual operation" API were also public then this might also satisfy the folks in the other thread that want to avoid having precondition parameters at all.
Bear in mind that once coroutines or async/await are better established in the language, it's likely that there would be no need for precondition parameters as it would be simpler to attach continuations at the top level.
I think I can make the above work well. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
participants (3)
- Gavin Lambert
- Niall Douglas
- Sebastian Theophil