
On 24 Aug 2015 at 13:33, Thomas Heller wrote:
This is my review of AFIO.
DISCLAIMER: I am not on a personal vendetta or hold any grudge against the author. I spent valuable family time in order to review the presented work and opinions expressed so far.
Firstly, I wish to publicly say that I consider any past interactions between us to be water under the bridge. You have been very useful off-list, and you were one of the first to take a good look at lightweight future-promise and I found your feedback valuable. I want to say I greatly appreciate your support and help and feedback. Secondly, your review is exactly why I came here for a review, and I can tell how much effort you invested because you spotted some of the more interesting design tradeoffs. Thank you Thomas, genunely. Your review is very useful to me.
- The author clearly states that performance is not one of his goals (http://thread.gmane.org/gmane.comp.lib.boost.devel/261914/focus=262478). Yet, the documentation, especially the introduction, puts a great emphasize on that, while the presented benchmarks do not show any of those promises or compare apples with oranges.
I place correctness and lock free file system programming before absolute maximum performance yes. The comparing of apples to oranges is a fair point.
- Portability, the author claims to present a portable solution to the ACID problem however goes into great length in explaining the differences between various OSes and FSes from which I concluded that users of the library still have to implement special code paths for some systems. This comes especially obvious when looking at example code provided in the documentation.
There is a lot of variability in conformance and behaviour in filing systems and operating systems - even versions of the Linux kernel. I could simply remove all mention of platform specific workarounds in the code examples and the docs. But that would not be useful to end users - they *want* to know the platform specific quirks and workarounds.
- "Readiness": The docs list tons and tons of features which aren't really there yet. One can get the impression that it is only half done. Especially concerning is that running the unit tests might actually break your hard drive (https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/workshop/e...) which is not very assuring in the context of the promised feature set.
It is unfortunate that you interpreted my well defined future roadmap for AFIO as meaning "it isn't finished". I could simply remove all mention of the roadmap completely, and then AFIO would appear to be "finished". But it would be a lie. I personally highly value a well planned documented future evolution of a library. It helps me to decide if locking myself into a library is a good idea. That's why the roadmap is so publicly well defined - the end user knows exactly where the library is at, and where it is expected to go next.
It looks like it would have been better to let the dust settle after the final additions to the documentation and maybe even properly implement the praised backend before aiming for review. I think it would also be best to only include the latest version of the library for inclusion to boost, is the community really supposed to also review the apparently inferior V1 of the library?
Those are ABI versions. If you build and link against v1, I give you the promise your end user code will build and link against v1 for the lifetime of v1. For any break in ABI, I'll iterate to v2 (as has already happened), or v3 and so on.
2. What is your evaluation of the design?
The design of the documented, user facing API isn't what I expected from a modern C++ API. Following are some points, in no particular ordering, that struck me most.
- afio::future<T = void>:
As far as I can see, there is not a single function documented that returns anything than some equivalent to "shared_future<handle_ptr>".
The synchronous single shot functions return an afio::handle_ptr or some type appropriate for that function (e.g. enumerate() returns a vector of directory_entry). The asynchronous single shot functions return an afio::future<T> where T is usually void. afio::future<T>.get_handle() returns an afio::handle_ptr, while afio::future<T>.get() returns whatever T is (which might be void). The batch functions return vectors of afio::future<T>.
The usage of afio::future<void> (explicitly added the default parameter) is, IMHO, very confusing as soon as you pass them as dependencies to another operation. The void template parameter always looks like "just a signal" instead most of the time, the actual *handle* associated with that future is the dependency, and from time to time even needed as an argument to the async operation.
I am unaware of any occasion where an async operation consumes a handle_ptr. If there are any, I would expect that to be a bug needing to be fixed. You may not be aware that afio::future<void> implicitly converts from an afio::handle_ptr. It's equivalent to make_ready_future(handle). This is effectively free of cost under lightweight futures, hence it being used.
In addition, "shared_future<handle_ptr>" has two indirections, one to the shared state of the future and one to the actual handle.
You have spotted one of those unusual design choices I mentioned earlier. It turns out that shared_ptr<shared_ptr<handle>> (which is effectively what this is) is much faster than shared_ptr<handle>. This is because the outer shared_ptr is local to the operation in question and therefore its use count is not contended across threads, whereas the inner shared_ptr manages lifetime for the handle and is global across operations and its use count gets very much contended across threads. I agree the choice is not obvious, and indeed this was not the design in very early AFIO. I came to it through being puzzled about poor benchmark performance. Is there a better alternative to this? To the next question ...
Moreover, I disagree that afio::future<T> should have shared future semantics by default. I am a proponent of the std::future/std::shared_future design and believe this should be followed by all libraries dealing with some sort of futures. I think most of the design decisions are based on shared_future only semantics, which is probably the reason why I have problems with them.
Did you notice that afio::future<T> has non-shared semantics for T? That's why it's called afio::future<>, not afio::shared_future<>. You are correct that the implicit handle_ptr available from afio::future<>.get_handle() always has shared future semantics. That's because handle_ptr is a shared_ptr, so you are already referring to shared state, and therefore shared future semantics would seem to fit best. Should afio::future<>.get_handle() always have shared future semantics? This is a very profound question, and the entire of the AFIO design hangs around that design choice. It was not an easy decision back in 2012, and it is still not an easy decision in 2015 even after hundreds of hours working on AFIO. I share completely your unease with the design choice, and I have the same problems you do. I can only tell you that on balance, I think that shared future semantics for the handle_ptr only make for a much easier implementation, and I chose ease of maintenance over strictness. I don't think that this design choice impacts end users. No end user code is going to get broken or misdesigned (in my opinion) by handle_ptr always having shared future semantics. It is, in my opinion, a quality of internal implementation decision. I think the true controversial design choice is the concept of a future<> type transporting more than one thing each with differing semantics. Especially when future<T> is allowed and expected to type slice into future<void> when entering the AFIO DLL boundary. These controversial design choices are exactly why I wanted a Boost peer review now. If someone can think of a better viable solution which allows a stable DLL ABI, I am more than happy for AFIO to be rejected now and I reimplement a fundamentally better design in a new reborn AFIO.
- Extensive use of shared_ptr:
Some functions return some shared_ptr (specifically handle and dispatcher), therefore probably always relying on some sort of memory allocation. This seems to be due to the fact that only opaque abstract base classes are returned from those API functions. Why not return unique_ptr or even by value of the distinct type?
The answer is lifetime and stability of DLL ABI. AFIO v1.0 didn't make as much use of shared_ptr, and it had race issues and a bad balance of locking as a result. I rewrote the core engine in v1.2 to use shared_ptr to manage lifetime and therefore remove all but two locks per operation because the shared_ptr can be relied upon to keep lifetime without locking. Performance was immensely improved. Can a lighter weight AFIO be implemented? Yes. Should it be? No [1]. The overhead of a shared_ptr is immaterial to a 1m CPU cycle i/o operation. And shared_ptr makes the code much easier to maintain, and much harder to accidentally introduce races during a modification. I haven't introduced a race (according to the thread sanitiser) into AFIO in well over a year thanks to pervasive use of shared_ptr. [1]: This isn't an opinion - I tested this empirically. I spent two days hacking out the shared_ptr to see what the effect would be on performance of the benchmark code. It improved by 0.5%. Not worth it.
3. What is your evaluation of the implementation?
I refrained from actually looking at the implementation. This is actually a pretty severe point. The meat of the implementation is split into 4 headers only, with two headers (afio.hpp and detail/impl/afio.ipp) having the majority code with over 6000 lines of codes each! Despite the actual content, I believe this is a maintenance nightmare. I don't want to review such large blobs of text.
It is true I only orthonongally split files when I see a point because I like as few files as possible, as you can tell. I also like really long lines. Most Boost libraries keep a header per type. I dislike that - I prefer a header per orthogonal reusable collection of stuff. On that basis there is some splitting of afio.hpp which is probably a good idea. I've recorded that as https://github.com/BoostGSoC13/boost.afio/issues/90.
4. What is your evaluation of the documentation?
- I have no idea what the table in https://boostgsoc13.github.io/boost.afio/doc/html/afio/overview.html is trying to tell me
Can you suggest what would be better?
- The design introduction contains lots of text that seems like unrelated anecdotes - The reference section doesn't cross link between the different related API functions/classes that are needed for one specific class/function.
Can you explain what you mean exactly? Do you mean links on each reference page? I had thought the categorisation of functions by functionality implemented made it easy to find what you needed quickly? How would you improve the reference organisation?
On the incomplete part: - The workshop tutorial is not finished. - Some references are not completely documented, for example:
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dir...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dir...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dis...
I'm confused. What is not completely documented? Do you mean that trivial APIs do not have their own full reference page?
On the examples do not match reference documentation part: - https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/m...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/m...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/m... The example mentioned in the above entries do not mention the make_io_req function at all.
You are absolutely right. I apologise for the mistake (it was a typo). I have logged these documentation errors at https://github.com/BoostGSoC13/boost.afio/issues/91.
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/d... Why does this example not focus on the presented function?
I felt that a single example showing the batch use and the free function use shown comparatively alongside other functions you would likely be using was more useful. As I mention below, it looks like it is instead confusing and I have logged an issue for fixing it.
On the examples containing errors part: - https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/workshop/n... " // Boost.Monad futures are also monads, so this implies a make_ready_future() return std::current_exception(); " The return type is shared_future<data_store::ostream>. What does this have to do with Boost.Monad and any of its behavior? Which shared_future is meant here that has a non-explicit ctor that takes a std::exception_ptr?
You may not have realised that all shared_future<T> mentioned by the documentation is boost::monad::lightweight_futures::shared_future<T>. Not std::shared_future<T>. boost::monad::lightweight_futures::shared_future<T> will implicitly convert from a std::exception_ptr into an already ready future. That's what the comment was explaining, as that is not normal shared_future<T> behaviour.
- For example (there are many more):
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/f... documents dispatcher::file taking a vector of path_req, in the example, the vector is constructed, but only the front is passed to the function. This seems very odd.
The batch API is exampled further below under the section commented "Batch". I think you were looking at the top part. I'm getting from this that a unified code example for both batch and single shot functions on both batch and single shot pages is confusing. Logged at https://github.com/BoostGSoC13/boost.afio/issues/92.
Other documentation flaws: - https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/p... https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flag... looks odd, there are no parameters or anything there, so why does this page contain a section for it?
Sadly out of my control. It's a bug in the doxygen_xml2qbk tool.
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flag... the synopsis is very hard to read.
Can you explain how it is hard to read? It tells you what each flag does. How could it be improved?
(same goes for: https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flag... and https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flag...) - Why document undocumented features?
Unfortunately doxygen_xml2qbk seems to insist on outputting documentation for undocumented members, or maybe I don't understand the tool or maybe I've misconfigured it. So I can have a blank documentation entry like for afio::file_flags::os_lockable which seems inferior to saying "Don't use this".
https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/mini_progr... - Can you please strip example code from commented sections of code that obviously don't add anything to the example (or "#if 0"ed sections or "legacy" behavior)?
Logged to https://github.com/BoostGSoC13/boost.afio/issues/93.
- It would have been nice to see all examples (at least in a non advanced section) to not rely on platform dependent macro definitions.
If you are referring to https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/mini _programs/filesystem_races.html, one is trapped by substantial differences in Windows filesystem semantics. I still think it more important to real life end users of AFIO to demonstrate and show those platform specific differences than pretend they don't exist.
In addition, I personally dislike the "Try AFIO now in online web compiler" button. Why does it have to appear on *every* single page (there is also a bug that it doesn't appear in some of the reference doc sections)? Besides, it doesn't really match the style of the rest of the documentation. Which brings me to the "Comments" part, doesn't match the style either, which makes it look totally out of place especially since it comes from some third party provider, and I am not sure I can trust that. It looks especially odd when javascript is disabled. Isn't discussing a library and its documentation on a mailing list not enough?
Regarding the online web compiler button, I can remove it from every page, and place it just on one or two pages? Regarding the comments, I had thought there was consensus here that per-page commenting was a good idea, so that's why they are there. Thanks hugely for the review Thomas. I found it very valuable. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/