On 29/03/2017 10:13, Asbjørn via Boost wrote:
On 29.03.2017 08:18, Niall Douglas via Boost wrote:
Whatever is lost is lost, the *key* feature is that damaged data doesn't cause further data loss.
I'm struggling to see how you can guarantee that without _any_ guarantees from the OS or hardware.
The lack of guarantees only refers to post-power-loss data integrity. And, as you've mentioned, it's only a portability concern. Specific combinations of OS kernel version, SCSI controller, SSD etc have excellent guarantees. The trouble is you can't know whether your particular combination works reliably or not, or whether it is still working reliably or not. For the implementation of Durability, one can assume that everything works perfectly in between power loss events. That in itself is a bit risky due to storage bit rot, cosmic-ray bitflips and so on, but that's a separate matter from Durability. (Incidentally, AFIO v2 provides a fast templated SECDED class letting you repair bitflips from parity info, handy for mature cold storage.)
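This is not AFIO's actual implementation, but the SECDED idea itself is small enough to sketch: Hamming(7,4) plus an overall parity bit gives single-error correction and double-error detection for a 4-bit payload. The function names here are mine, not AFIO's API:

```cpp
#include <cassert>
#include <cstdint>

// Encode a 4-bit nibble into an 8-bit SECDED codeword:
// Hamming(7,4) in bit positions 1..7, overall parity in bit 0.
uint8_t secded_encode(uint8_t nibble) {
    uint8_t d1 = (nibble >> 0) & 1, d2 = (nibble >> 1) & 1,
            d3 = (nibble >> 2) & 1, d4 = (nibble >> 3) & 1;
    uint8_t p1 = d1 ^ d2 ^ d4;  // covers positions 3,5,7
    uint8_t p2 = d1 ^ d3 ^ d4;  // covers positions 3,6,7
    uint8_t p3 = d2 ^ d3 ^ d4;  // covers positions 5,6,7
    uint8_t code = (uint8_t)((p1 << 1) | (p2 << 2) | (d1 << 3) |
                             (p3 << 4) | (d2 << 5) | (d3 << 6) | (d4 << 7));
    uint8_t overall = p1 ^ p2 ^ d1 ^ p3 ^ d2 ^ d3 ^ d4;
    return code | overall;  // even parity over all 8 bits
}

// Returns the (corrected) nibble; sets *double_error when two
// bitflips are detected, which is uncorrectable.
uint8_t secded_decode(uint8_t code, bool *double_error) {
    auto bit = [&](int i) { return (code >> i) & 1; };
    uint8_t s1 = bit(1) ^ bit(3) ^ bit(5) ^ bit(7);
    uint8_t s2 = bit(2) ^ bit(3) ^ bit(6) ^ bit(7);
    uint8_t s3 = bit(4) ^ bit(5) ^ bit(6) ^ bit(7);
    uint8_t syndrome = (uint8_t)(s1 | (s2 << 1) | (s3 << 2));  // flipped position 1..7
    uint8_t parity = 0;
    for (int i = 0; i < 8; i++) parity ^= bit(i);
    *double_error = (syndrome != 0 && parity == 0);
    if (syndrome != 0 && parity == 1)
        code ^= (uint8_t)(1u << syndrome);  // repair the single bitflip
    else if (syndrome == 0 && parity == 1)
        code ^= 1;                          // the overall parity bit itself flipped
    return (uint8_t)(((code >> 3) & 1) | (((code >> 5) & 1) << 1) |
                     (((code >> 6) & 1) << 2) | (((code >> 7) & 1) << 3));
}
```

AFIO's class works over much larger blocks, but the syndrome logic scales the same way: any single bitflip is repaired in place, and any two flips are reported rather than silently "corrected" into garbage.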
If so, why throw it all away? Maybe the user has an OS, a filesystem and some hardware which can guarantee this?
Because a proper implementation of durability should be able to use no fsync and no O_SYNC at all. In that case, you get "late durability" where minutes of recent writes get lost after power loss. For users where that is unacceptable, O_SYNC should be turned on and you now have "early durability" where only seconds may be lost. You pay for that early durability with much reduced performance.
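The trade-off is literally one open() flag. As an illustration (a hypothetical helper, not AFIO code):

```cpp
#include <fcntl.h>
#include <unistd.h>

// Open a log file with either "late durability" (kernel writeback
// decides when data reaches storage: fast, but minutes of recent
// writes may be lost on power loss) or "early durability" (O_SYNC:
// each write() returns only after the kernel has pushed the data
// towards the device, so at most seconds are lost, at a large
// performance cost).
int open_log(const char *path, bool early_durability) {
    int flags = O_CREAT | O_WRONLY | O_APPEND;
    if (early_durability)
        flags |= O_SYNC;  // pay for the tighter loss window
    return open(path, flags, 0644);
}
```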
Without O_SYNC and fsync, replace "minutes" with "hours" or "days". This may be entirely unacceptable. With O_SYNC you get horrible performance as you note, which may be entirely unacceptable.
A filing system which takes hours to send dirty blocks to storage is buggy or misconfigured. Most will write out dirty blocks within 30 seconds of modification whatever the situation. You are probably referring to "live blocks": in the past, especially on Linux, if you repeatedly modified the same block frequently its age counter would be reset, so it might never be written out. I don't believe any recent Linux kernel has that problem any more; "first dirtied" timestamps are now kept separately from "last dirtied", and if a dirtied block is too old it gets flushed. There is still, I believe, a problem on Windows with FAT where live blocks may take far too long to hit storage. But most USB disks are mounted with write-through semantics, so on most modern systems, which don't tend to have FAT drives with writeback caching, you shouldn't see that problem.
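On Linux the 30-second figure comes straight from the kernel writeback tunables, which you can inspect (and retune) with sysctl:

```shell
# Dirty pages older than this are written out on the next writeback
# pass. Value is in centiseconds; the default 3000 is the 30 seconds
# mentioned above.
sysctl vm.dirty_expire_centisecs

# How often the writeback threads wake up to scan for expired dirty
# pages. Default 500 centiseconds, i.e. every 5 seconds.
sysctl vm.dirty_writeback_centisecs
```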
Also, I'm assuming the hardware may ignore the O_SYNC as much as it can ignore the fsync, in which case you're SOL anyway.
Oh yes. You should also assume O_SYNC does nothing. On some systems, or some configurations of systems (e.g. inside lxc containers), it really does do nothing. Thankfully, most lxc containers I've seen in the wild only disable fsync, not O_SYNC. Which is another very good reason not to use fsync - people running your code inside an lxc container get a false sense of security.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
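If you want some evidence about whether fsync is being swallowed in a given environment, a crude timing probe can hint at it. This is a heuristic sketch only (all names are mine): an honest fsync to a spinning disk takes milliseconds, so a call that consistently returns in a few microseconds is suspicious - though a fast NVMe drive can also legitimately return quickly.

```cpp
#include <chrono>
#include <fcntl.h>
#include <unistd.h>

// Time one fsync of freshly written data, in microseconds.
// Returns a negative value if the probe file could not be created.
double time_fsync_usec(const char *path) {
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) return -1.0;
    char buf[4096] = {0};
    (void)write(fd, buf, sizeof buf);  // dirty one page
    auto t0 = std::chrono::steady_clock::now();
    fsync(fd);                         // the call under test
    auto t1 = std::chrono::steady_clock::now();
    close(fd);
    unlink(path);
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}
```

Run it a few dozen times and look at the minimum: a floor of near-zero microseconds across many runs suggests the call is being turned into a no-op somewhere between you and the platters.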