On 22/03/2017 10:50, Olaf van der Spek wrote:
On Wed, Mar 22, 2017 at 11:43 AM, Niall Douglas via Boost wrote:
I'm picking an example at random, as it was too long ago that I examined your source, but a classic mistake is to assume this is sequentially consistent:
    int keyfd, storefd;
    write(storefd, data)
    fsync(storefd)
    write(keyfd, key)
    fsync(keyfd)
Here the programmer writes the value being stored, persists it, then writes the key for the newly stored value and persists that. Most programmers unfamiliar with filing systems will assume that the fsync to the storefd cannot happen after the fsync to the keyfd. They are wrong: that is a permitted reorder. fsyncs are only guaranteed to be sequentially consistent *on the same file descriptor*, not across different file descriptors.
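
For concreteness, the pattern described above might look roughly like this in real POSIX C. This is only a sketch: the names write_all, store_then_key and store_example are made up for illustration and do not come from anyone's actual source. Every call is error-checked, yet per the point being made above, a successful return still says nothing about the order in which the two fsyncs take effect on storage.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error (errno is set by write). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write and fsync the value first, then write
   and fsync the key that refers to it. Each step is checked, but
   returning 0 does not prove that the two fsyncs reached storage
   in this order, because they are on different descriptors. */
int store_then_key(int storefd, int keyfd,
                   const void *data, size_t data_len,
                   const void *key, size_t key_len)
{
    if (write_all(storefd, data, data_len) != 0) return -1;
    if (fsync(storefd) != 0) return -1;
    if (write_all(keyfd, key, key_len) != 0) return -1;
    if (fsync(keyfd) != 0) return -1;
    return 0;
}

/* Hypothetical usage: append one value and one key to two files. */
int store_example(const char *store_path, const char *key_path)
{
    int rc = -1;
    int storefd = open(store_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    int keyfd = open(key_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (storefd >= 0 && keyfd >= 0)
        rc = store_then_key(storefd, keyfd, "value", 5, "key", 3);
    if (storefd >= 0) close(storefd);
    if (keyfd >= 0) close(keyfd);
    return rc;
}
```

A caller would use it as store_example("store.bin", "keys.idx") and treat a non-zero return as failure. The code is "correct" in the sense that nothing is left unchecked, which is exactly why the mistake is easy to make.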
Just curious, how is that permitted?
Isn't fsync() supposed to ensure data is on durable storage before it returns?
A common misconception. Here is the POSIX wording:

"The fsync() function shall request that all data for the open file descriptor named by fildes is to be transferred to the storage device associated with the file described by fildes. The nature of the transfer is implementation-defined. The fsync() function shall not return until the system has completed that action or until an error is detected.

[SIO] [Option Start] If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall force all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronized I/O completion state. All I/O operations shall be completed as defined for synchronized I/O file integrity completion. [Option End]"

So, without _POSIX_SYNCHRONIZED_IO, all fsync() guarantees is that it will not return until the *request* for the transfer of outstanding data to storage has completed. In other words, it pokes the OS to start flushing data now rather than later, and returns immediately. OS X implements this sort of fsync(), for example.

With _POSIX_SYNCHRONIZED_IO, you get the stronger guarantee that, upon return from the syscall, "synchronized I/O file integrity completion" has occurred. Linux infamously claims _POSIX_SYNCHRONIZED_IO, yet ext2/ext3/ext4 don't implement it fully and will happily reorder fsyncs of the metadata needed to later retrieve a fsynced write of data. So when fsync returns, the data itself has been written in a sequentially consistent way, but the metadata needed to later retrieve it has not necessarily been: that can be reordered with respect to other fsyncs.

AFIO v1 and v2 take care of this sort of stuff for you. If you tell AFIO you want a handle to a file to write reliably, AFIO does what is needed to make it reliable.
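
If you want to see what a given platform claims, something like the following sketch may help. The function names claimed_synchronized_io and flush_to_storage are invented for illustration. sysconf(_SC_SYNCHRONIZED_IO) is the standard runtime query for the option. The F_FULLFSYNC branch is an addition that is not part of the discussion above: it is the fcntl() command Apple documents for asking the drive to flush its cache, and it is only compiled in where the macro exists. Note that a positive answer from sysconf() is only a claim by the system, which, as described above, is not the same as a guarantee honoured by every filing system.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Report what the system claims about synchronized I/O.
   Returns the advertised option value (positive if claimed) or -1
   if the option is unsupported or cannot be queried. */
long claimed_synchronized_io(void)
{
#ifdef _SC_SYNCHRONIZED_IO
    return sysconf(_SC_SYNCHRONIZED_IO);
#else
    return -1;
#endif
}

/* Hypothetical helper: ask for the strongest flush this platform
   offers. Where F_FULLFSYNC is defined (assumed here to mean macOS),
   try it first and fall back to plain fsync() if it fails, since
   not every filing system supports it. */
int flush_to_storage(int fd)
{
    assert(fd >= 0);
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
#endif
    return fsync(fd);
}
```

Neither function makes cross-descriptor ordering any safer; they only show where the weaker and stronger behaviours surface in the API.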
Be prepared to give up lots of performance for that reliability, however. This is where async file i/o starts to become very useful: you can queue up lots of writes, and completion handlers will fire when each write really has reached storage in a way that is always retrievable in the future (excluding bugs in the kernel, filing system, storage device etc).

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/