Re: [boost] [afio] Formal review of Boost.AFIO

24 Aug 2015

      On 24 Aug 2015 at 13:33, Thomas Heller wrote:
...
This is my review of AFIO.
DISCLAIMER: I am not on a personal vendetta or hold any grudge against
the author. I spent valuable family time in order to review the
presented work and opinions expressed so far.
Firstly, I wish to publicly say that I consider any past interactions 
between us to be water under the bridge. You have been very useful 
off-list, and you were one of the first to take a good look at 
lightweight future-promise and I found your feedback valuable. I want 
to say I greatly appreciate your support and help and feedback.

Secondly, your review is exactly why I came here for a review, and I 
can tell how much effort you invested because you spotted some of the 
more interesting design tradeoffs. Thank you Thomas, genunely. Your 
review is very useful to me.
...
- The author clearly states that performance is not one of his goals
(http://thread.gmane.org/gmane.comp.lib.boost.devel/261914/focus=262478). Yet,
the documentation, especially the introduction, puts a great emphasize
on that, while the presented benchmarks do not show any of those
promises or compare apples with oranges.
I place correctness and lock free file system programming before 
absolute maximum performance yes.

The comparing of apples to oranges is a fair point.
...
- Portability, the author claims to present a portable solution to the
ACID problem however goes into great length in explaining the
differences between various OSes and FSes from which I concluded that
users of the library still have to implement special code paths for some
systems. This comes especially obvious when looking at example code
provided in the documentation.
There is a lot of variability in conformance and behaviour in filing 
systems and operating systems - even versions of the Linux kernel.

I could simply remove all mention of platform specific workarounds in 
the code examples and the docs. But that would not be useful to end 
users - they *want* to know the platform specific quirks and 
workarounds.
...
- "Readiness": The docs list tons and tons of features which aren't
really there yet. One can get the impression that it is only half done.
Especially concerning is that running the unit tests might actually
break your hard drive
(https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/workshop/e...)
which is not very assuring in the context of the promised feature set.
It is unfortunate that you interpreted my well defined future roadmap 
for AFIO as meaning "it isn't finished". I could simply remove all 
mention of the roadmap completely, and then AFIO would appear to be 
"finished".

But it would be a lie. I personally highly value a well planned 
documented future evolution of a library. It helps me to decide if 
locking myself into a library is a good idea. That's why the roadmap 
is so publicly well defined - the end user knows exactly where the 
library is at, and where it is expected to go next.
...
It looks like it would have been better to let the dust settle after the
final additions to the documentation and maybe even properly implement
the praised backend before aiming for review. I think it would also be
best to only include the latest version of the library for inclusion to
boost, is the community really supposed to also review the apparently
inferior V1 of the library?
Those are ABI versions. If you build and link against v1, I give you 
the promise your end user code will build and link against v1 for the 
lifetime of v1. For any break in ABI, I'll iterate to v2 (as has 
already happened), or v3 and so on.
...
...
2. What is your evaluation of the design?
The design of the documented, user facing API isn't what I expected from
a modern C++ API. Following are some points, in no particular ordering,
that struck me most.
- afio::future<T = void>:
As far as I can see, there is not a single function documented that
returns anything than some equivalent to "shared_future<handle_ptr>".
The synchronous single shot functions return an afio::handle_ptr or 
some type appropriate for that function (e.g. enumerate() returns a 
vector of directory_entry).

The asynchronous single shot functions return an afio::future<T> 
where T is usually void. afio::future<T>.get_handle() returns an 
afio::handle_ptr, while afio::future<T>.get() returns whatever T is 
(which might be void).

The batch functions return vectors of afio::future<T>.
...
The usage of afio::future<void> (explicitly added the default parameter)
is, IMHO, very confusing as soon as you pass them as dependencies to
another operation. The void template parameter always looks like "just a
signal" instead most of the time, the actual *handle* associated with
that future is the dependency, and from time to time even needed as an
argument to the async operation.
I am unaware of any occasion where an async operation consumes a 
handle_ptr. If there are any, I would expect that to be a bug needing 
to be fixed.

You may not be aware that afio::future<void> implicitly converts from 
an afio::handle_ptr. It's equivalent to make_ready_future(handle). 
This is effectively free of cost under lightweight futures, hence it 
being used.
...
In addition,
"shared_future<handle_ptr>" has two indirections, one to the shared
state of the future and one to the actual handle.
You have spotted one of those unusual design choices I mentioned 
earlier.

It turns out that shared_ptr<shared_ptr<handle>> (which is 
effectively what this is) is much faster than shared_ptr<handle>. 
This is because the outer shared_ptr is local to the operation in 
question and therefore its use count is not contended across threads, 
whereas the inner shared_ptr manages lifetime for the handle and is 
global across operations and its use count gets very much contended 
across threads.

I agree the choice is not obvious, and indeed this was not the design 
in very early AFIO. I came to it through being puzzled about poor 
benchmark performance.

Is there a better alternative to this? To the next question ...
...
Moreover, I disagree that afio::future<T> should have shared future
semantics by default. I am a proponent of the
std::future/std::shared_future design and believe this should be
followed by all libraries dealing with some sort of futures. I think
most of the design decisions are based on shared_future only semantics,
which is probably the reason why I have problems with them.
Did you notice that afio::future<T> has non-shared semantics for T? 
That's why it's called afio::future<>, not afio::shared_future<>.

You are correct that the implicit handle_ptr available from 
afio::future<>.get_handle() always has shared future semantics. 
That's because handle_ptr is a shared_ptr, so you are already 
referring to shared state, and therefore shared future semantics 
would seem to fit best.

Should afio::future<>.get_handle() always have shared future 
semantics? This is a very profound question, and the entire of the 
AFIO design hangs around that design choice. It was not an easy 
decision back in 2012, and it is still not an easy decision in 2015 
even after hundreds of hours working on AFIO.

I share completely your unease with the design choice, and I have the 
same problems you do. I can only tell you that on balance, I think 
that shared future semantics for the handle_ptr only make for a much 
easier implementation, and I chose ease of maintenance over 
strictness.

I don't think that this design choice impacts end users. No end user 
code is going to get broken or misdesigned (in my opinion) by 
handle_ptr always having shared future semantics. It is, in my 
opinion, a quality of internal implementation decision.

I think the true controversial design choice is the concept of a 
future<> type transporting more than one thing each with differing 
semantics. Especially when future<T> is allowed and expected to type 
slice into future<void> when entering the AFIO DLL boundary.

These controversial design choices are exactly why I wanted a Boost 
peer review now. If someone can think of a better viable solution 
which allows a stable DLL ABI, I am more than happy for AFIO to be 
rejected now and I reimplement a fundamentally better design in a new 
reborn AFIO.
...
- Extensive use of shared_ptr:
Some functions return some shared_ptr (specifically handle and
dispatcher), therefore probably always relying on some sort of memory
allocation. This seems to be due to the fact that only opaque abstract
base classes are returned from those API functions. Why not return
unique_ptr or even by value of the distinct type?
The answer is lifetime and stability of DLL ABI. AFIO v1.0 didn't 
make as much use of shared_ptr, and it had race issues and a bad 
balance of locking as a result. I rewrote the core engine in v1.2 to 
use shared_ptr to manage lifetime and therefore remove all but two 
locks per operation because the shared_ptr can be relied upon to keep 
lifetime without locking. Performance was immensely improved.

Can a lighter weight AFIO be implemented? Yes. Should it be? No [1]. 
The overhead of a shared_ptr is immaterial to a 1m CPU cycle i/o 
operation. And shared_ptr makes the code much easier to maintain, and 
much harder to accidentally introduce races during a modification.

I haven't introduced a race (according to the thread sanitiser) into 
AFIO in well over a year thanks to pervasive use of shared_ptr.

[1]: This isn't an opinion - I tested this empirically. I spent two 
days hacking out the shared_ptr to see what the effect would be on 
performance of the benchmark code. It improved by 0.5%. Not worth it.
...
...
3. What is your evaluation of the implementation?
I refrained from actually looking at the implementation. This is
actually a pretty severe point. The meat of the implementation is split
into 4 headers only, with two headers (afio.hpp and
detail/impl/afio.ipp) having the majority code with over 6000 lines of
codes each! Despite the actual content, I believe this is a maintenance
nightmare. I don't want to review such large blobs of text.
It is true I only orthonongally split files when I see a point 
because I like as few files as possible, as you can tell. I also like 
really long lines.

Most Boost libraries keep a header per type. I dislike that - I 
prefer a header per orthogonal reusable collection of stuff. On that 
basis there is some splitting of afio.hpp which is probably a good 
idea. I've recorded that as 
https://github.com/BoostGSoC13/boost.afio/issues/90.
...
...
4. What is your evaluation of the documentation?
- I have no idea what the table in
https://boostgsoc13.github.io/boost.afio/doc/html/afio/overview.html is
trying to tell me
Can you suggest what would be better?
...
- The design introduction contains lots of text that seems like
unrelated anecdotes
 - The reference section doesn't cross link between the different
related API functions/classes that are needed for one specific
class/function.
Can you explain what you mean exactly? Do you mean links on each 
reference page?

I had thought the categorisation of functions by functionality 
implemented made it easy to find what you needed quickly? How would 
you improve the reference organisation?
...
On the incomplete part:
 - The workshop tutorial is not finished.
 - Some references are not completely documented, for example:
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dir...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dir...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dis...
I'm confused. What is not completely documented?

Do you mean that trivial APIs do not have their own full reference 
page?
...
On the examples do not match reference documentation part:
 -
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/m...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/m...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/m...
   The example mentioned in the above entries do not mention the
make_io_req function at all.
You are absolutely right. I apologise for the mistake (it was a 
typo). I have logged these documentation errors at 
https://github.com/BoostGSoC13/boost.afio/issues/91.
...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/d...
   Why does this example not focus on the presented function?
I felt that a single example showing the batch use and the free 
function use shown comparatively alongside other functions you would 
likely be using was more useful. As I mention below, it looks like it 
is instead confusing and I have logged an issue for fixing it.
...
On the examples containing errors part:
 -
https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/workshop/n...
   "
   // Boost.Monad futures are also monads, so this implies a
make_ready_future()
   return std::current_exception();
   "
   The return type is shared_future<data_store::ostream>. What does this
have to do with Boost.Monad and any of its behavior? Which shared_future
is meant here that has a non-explicit ctor
   that takes a std::exception_ptr?
You may not have realised that all shared_future<T> mentioned by the 
documentation is boost::monad::lightweight_futures::shared_future<T>. 
Not std::shared_future<T>.

boost::monad::lightweight_futures::shared_future<T> will implicitly 
convert from a std::exception_ptr into an already ready future. 
That's what the comment was explaining, as that is not normal 
shared_future<T> behaviour.
...
- For example (there are many more):
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/f...
   documents dispatcher::file taking a vector of path_req, in the
example, the vector is constructed, but only the front is passed to the
function. This seems very odd.
The batch API is exampled further below under the section commented 
"Batch". I think you were looking at the top part.

I'm getting from this that a unified code example for both batch and 
single shot functions on both batch and single shot pages is 
confusing. Logged at 
https://github.com/BoostGSoC13/boost.afio/issues/92.
...
Other documentation flaws:
 -
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/p...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flag...
looks odd, there are no parameters or anything there, so why does this
page contain a section for it?
Sadly out of my control. It's a bug in the doxygen_xml2qbk tool.
...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flag...
the synopsis is very hard to read.
Can you explain how it is hard to read? It tells you what each flag 
does. How could it be improved?
...
(same goes for:
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flag...
and
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flag...)
 - Why document undocumented features?
Unfortunately doxygen_xml2qbk seems to insist on outputting 
documentation for undocumented members, or maybe I don't understand 
the tool or maybe I've misconfigured it. So I can have a blank 
documentation entry like for afio::file_flags::os_lockable which 
seems inferior to saying "Don't use this".
...
https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/mini_progr...
 - Can you please strip example code from commented sections of code
that obviously don't add anything to the example (or "#if 0"ed sections
or "legacy" behavior)?
Logged to https://github.com/BoostGSoC13/boost.afio/issues/93.
...
- It would have been nice to see all examples (at least in a non
advanced section) to not rely on platform dependent macro definitions.
If you are referring to 
https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/mini
_programs/filesystem_races.html, one is trapped by substantial 
differences in Windows filesystem semantics. I still think it more 
important to real life end users of AFIO to demonstrate and show 
those platform specific differences than pretend they don't exist.
...
In addition, I personally dislike the "Try AFIO now in online web
compiler" button. Why does it have to appear on *every* single page
(there is also a bug that it doesn't appear in some of the reference doc
sections)? Besides, it doesn't really match the style of the rest of the
documentation. Which brings me to the "Comments" part, doesn't match the
style either, which makes it look totally out of place especially since
it comes from some third party provider, and I am not sure I can trust
that. It looks especially odd when javascript is disabled. Isn't
discussing a library and its documentation on a mailing list not enough?
Regarding the online web compiler button, I can remove it from every 
page, and place it just on one or two pages?

Regarding the comments, I had thought there was consensus here that 
per-page commenting was a good idea, so that's why they are there.

Thanks hugely for the review Thomas. I found it very valuable.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/

Re: [boost] [afio] Formal review of Boost.AFIO

Niall Douglas