Re: [boost] [AFIO] Review (or lack of it)

30 Aug 2015

      On 30 Aug 2015 at 10:19, Hartmut Kaiser wrote:
...
...
I did a bit of mocking up of what the refactored data_store interface
for that AFIO tutorial will look like:
https://gist.github.com/ned14/163ac57c937bda61e5c9
Ok, so it's no longer the world's simplest key-value store API and I
deliberately dropped the STL iostreams interop, but the big
difference is that this one is real-world ready and probably quite
close to a final API design for such a thing.
*Looks* much better, you seem to listen after all. I wouldn't call that to
be real-world-ready without a thorough specification of the semantics,
though.
This is the first post you have written which contained anything 
useful. You also wrote civilly in a productive tone. Thank you.
...
- What happens if commit() was called on your transaction, the returned
future is not ready yet, but the ~transaction() is called?
Nothing happens. commit() transfers the state held by the transaction 
object into an asynchronous operation bound by the returned future. 
The transaction object's destructor becomes do-nothing.

You are probably now about to ask if the returned future's destructor 
is blocking. It is not. The transferred state is kept around by the 
AFIO engine until success or failure and disposed of internally.
...
- Why is your transaction object not RAII?
You may have a different definition to me. The idea is when you begin 
a transaction, you pin a copy of the internally held key-value index 
to that transaction. You then say what you are going to modify to the 
transaction object via add(). If an exception throw occurs before the 
commit(), the transaction destructor will throw away any 
modifications you have added and it unpins the internally held 
key-value index.
...
- Why do you need begin_transaction?
It pins a copy of the internally held key-value index to the 
transaction object. Later on, during commit, it should be able to 
match off key lookups with key updates to decide if two concurrent 
commits conflict or not.

This interface is purely for the AFIO tutorial only. Its concurrent 
transaction handling policy is to always abort if any concurrent 
update happened since the index used for the transaction was pinned. 
This is deliberately simple for the tutorial.
...
- Why do you need two load_blob() functions?
One loads a blob into a set of user supplied scatter buffers.

The other memory maps the blob and returns it as a set of potentially 
discontinuous buffers. This saves you having to allocate your own 
buffers.

The reason there are two forms is that there is limited free address 
space on 32 bit architectures.
...
- Why is your buffertype so over-specified, shouldn't any range do?
std::vector is ABI stable and friendly to C and Microsoft COM 
wrappers.
...
- Why can't the user specify/provide the type of the hash? std::hash<>
anybody?
For a content addressible data store, std::hash<> collides too 
frequently.

The choice between Spooky (non-crypto) and Blake2b (crypto) is all 
anyone needs. There is a good argument for renaming them to "fast" 
and "quality" respectively which I would do if this were not a 
tutorial example.
...
- Why does one of the load_blob returns a
future<std::shared_ptr<buffers_type>>? Handle to handle to handle to data
this time? Shared ownership again, really?
shared_ptr can opaquely call an internally supplied destruction 
callback to de-memory map the blob. You could return some custom 
smart pointer type which does the same. I felt it was much of a 
muchness. I know you'll disagree with that decision, but it's a 
tutorial, and an additional custom smart pointer type is extra screen 
verbiage.
...
- Can this blob-store be distributed over many machines?
It can be concurrently used from many machines, including via CIFS 
network shares. It would be trivial to extend to be distributed 
across multiple machines, but that is an implementation detail.
...
Can it be in memory only (with transparent disk backup if needed)?
The kernel file page cache and extents based design makes it 
effectively memory only with very lazy but write ordered flushing to 
disk. If used with scatter gather buffers allocated via 
afio::page_allocator, it also is zero memory copy on Windows and BSD.
...
- Why do you specify undefined behavior into your API (see your union). Why
do you need this in the first place?
The compiler will correctly align a __m128i on all the three major 
compilers which allows for the hash implementations to be more 
efficient. I left out the #ifdef SSE2 for brevity, but the final 
edition would have it.
...
- What's the concurrency/thread-safety guarantees of each of the functions?
Undecided. I currently expect thread safety for everything will 
naturally emerge without even needing a mutex, but my local 
implementation here isn't finished yet.
...
etc. many more open questions remain. Please, don't get me wrong, I don't
need/want answers to those questions right away, I'm just listing those for
you to think about.
I however suggest that you redesign AFIO first (including the proposed AFIO
API), before you even start thinking about submitting this to Boost review.
This is purely for the front AFIO tutorial only and what I would call 
an unguaranteed solution. It has an over engineered interface for a 
tutorial, but I also want to demonstrate how AFIO's features deliver 
the solution in just 1000 lines of code.

A production ready solution with well tested power loss behaviour I 
doubt I could bring for Boost review until early 2017 unless someone 
sponsors me to work on this full time, in which case Spring 2016 
would a maximum. I have had enough of working 60-80 hour weeks which 
has been the case most weeks since February and has contributed in 
large part to my recent grumpiness.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/

Re: [boost] [AFIO] Review (or lack of it)

Niall Douglas