
On 30 Aug 2015 at 10:19, Hartmut Kaiser wrote:
I did a bit of mocking up of what the refactored data_store interface for that AFIO tutorial will look like:
https://gist.github.com/ned14/163ac57c937bda61e5c9
Ok, so it's no longer the world's simplest key-value store API and I deliberately dropped the STL iostreams interop, but the big difference is that this one is real-world ready and probably quite close to a final API design for such a thing.
*Looks* much better, you seem to listen after all. I wouldn't call that to be real-world-ready without a thorough specification of the semantics, though.
This is the first post you have written which contained anything useful. You also wrote civilly in a productive tone. Thank you.
- What happens if commit() was called on your transaction, the returned future is not ready yet, but the ~transaction() is called?
Nothing happens. commit() transfers the state held by the transaction object into an asynchronous operation bound by the returned future. The transaction object's destructor becomes do-nothing. You are probably now about to ask if the returned future's destructor is blocking. It is not. The transferred state is kept around by the AFIO engine until success or failure and disposed of internally.
- Why is your transaction object not RAII?
You may have a different definition to me. The idea is when you begin a transaction, you pin a copy of the internally held key-value index to that transaction. You then say what you are going to modify to the transaction object via add(). If an exception throw occurs before the commit(), the transaction destructor will throw away any modifications you have added and it unpins the internally held key-value index.
- Why do you need begin_transaction?
It pins a copy of the internally held key-value index to the transaction object. Later on, during commit, it should be able to match off key lookups with key updates to decide if two concurrent commits conflict or not. This interface is purely for the AFIO tutorial only. Its concurrent transaction handling policy is to always abort if any concurrent update happened since the index used for the transaction was pinned. This is deliberately simple for the tutorial.
- Why do you need two load_blob() functions?
One loads a blob into a set of user supplied scatter buffers. The other memory maps the blob and returns it as a set of potentially discontinuous buffers. This saves you having to allocate your own buffers. The reason there are two forms is that there is limited free address space on 32 bit architectures.
- Why is your buffertype so over-specified, shouldn't any range do?
std::vector is ABI stable and friendly to C and Microsoft COM wrappers.
- Why can't the user specify/provide the type of the hash? std::hash<> anybody?
For a content addressible data store, std::hash<> collides too frequently. The choice between Spooky (non-crypto) and Blake2b (crypto) is all anyone needs. There is a good argument for renaming them to "fast" and "quality" respectively which I would do if this were not a tutorial example.
- Why does one of the load_blob returns a future<std::shared_ptr<buffers_type>>? Handle to handle to handle to data this time? Shared ownership again, really?
shared_ptr can opaquely call an internally supplied destruction callback to de-memory map the blob. You could return some custom smart pointer type which does the same. I felt it was much of a muchness. I know you'll disagree with that decision, but it's a tutorial, and an additional custom smart pointer type is extra screen verbiage.
- Can this blob-store be distributed over many machines?
It can be concurrently used from many machines, including via CIFS network shares. It would be trivial to extend to be distributed across multiple machines, but that is an implementation detail.
Can it be in memory only (with transparent disk backup if needed)?
The kernel file page cache and extents based design makes it effectively memory only with very lazy but write ordered flushing to disk. If used with scatter gather buffers allocated via afio::page_allocator, it also is zero memory copy on Windows and BSD.
- Why do you specify undefined behavior into your API (see your union). Why do you need this in the first place?
The compiler will correctly align a __m128i on all the three major compilers which allows for the hash implementations to be more efficient. I left out the #ifdef SSE2 for brevity, but the final edition would have it.
- What's the concurrency/thread-safety guarantees of each of the functions?
Undecided. I currently expect thread safety for everything will naturally emerge without even needing a mutex, but my local implementation here isn't finished yet.
etc. many more open questions remain. Please, don't get me wrong, I don't need/want answers to those questions right away, I'm just listing those for you to think about.
I however suggest that you redesign AFIO first (including the proposed AFIO API), before you even start thinking about submitting this to Boost review.
This is purely for the front AFIO tutorial only and what I would call an unguaranteed solution. It has an over engineered interface for a tutorial, but I also want to demonstrate how AFIO's features deliver the solution in just 1000 lines of code. A production ready solution with well tested power loss behaviour I doubt I could bring for Boost review until early 2017 unless someone sponsors me to work on this full time, in which case Spring 2016 would a maximum. I have had enough of working 60-80 hour weeks which has been the case most weeks since February and has contributed in large part to my recent grumpiness. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/