The hack is the above. We cache the address of the canonical singleton, ...
Hm. I must be missing something.
BOOSTLITE_NOINLINE inline const std::error_category &_generic_category()
{
  const std::error_category &c = stl11::generic_category();
  return c;
}
This doesn't cache anything. It just calls stl11::generic_category() and returns the result.
Yes, you're right. It looks like I removed the static storage at some point. I assume I found that I no longer needed it to get the desired reduction in code bloat.
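For anyone following along, here is a minimal sketch of the two variants under discussion: the plain non-inlined wrapper, which is what the function quoted above actually does, and a version with static storage, which is what "cache the address" would have implied. The SKETCH_NOINLINE macro and both function names are hypothetical stand-ins (the definition of BOOSTLITE_NOINLINE is not shown in this thread), and std::generic_category() is used directly in place of stl11::generic_category(). This is an illustration, not the library's code.

```cpp
#include <system_error>

// Hypothetical stand-in for BOOSTLITE_NOINLINE; the real macro is not shown here.
#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Variant 1: no caching. The wrapper only forwards the call, but because it is
// never inlined, each call site sees a single opaque call.
SKETCH_NOINLINE inline const std::error_category &sketch_generic_category()
{
  return std::generic_category();
}

// Variant 2: with static storage. The reference is bound once, on first use,
// and later calls return the stored reference.
SKETCH_NOINLINE inline const std::error_category &sketch_cached_generic_category()
{
  static const std::error_category &c = std::generic_category();
  return c;
}
```

Both variants return the same singleton, so callers cannot tell them apart by behaviour. Any difference would only show up in the generated code.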
The generated assembly is greatly improved on MSVC: a single result<T> shrinks from ~260 opcodes to fewer than 5.
It would be nice if we could see that demonstrated on godbolt.org.
It's an atomic fence for the *compiler*, not for the CPU, so it can't easily be demonstrated online. You need sequences of operations that an optimiser would otherwise fold, where the fence inhibits that folding. You can have a look at the git history of https://github.com/ned14/boost.outcome/blob/develop/test/constexprs/msvc.csv if you really need proof.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/