
Dear All,

I have humbly taken a look at the "conversion.hpp" section of Beman Dawes' library, and I was able to learn a lot. Thank you. I was able to learn a lot from Tymofey's and Phil Endecott's code too, so thank you too!

I am just a programmer-wannabe. I do hope that my message is not understood as disrespect, for it is not meant as such, not at all.

I have conducted a small and non-representative benchmark, the ugly source code of which I have uploaded here:

http://preview.tinyurl.com/4yhcc8t

It compares several versions of code meant to "bswap" 16-bit, 32-bit and 64-bit integer values. In the tables below, I have used the following notations:

RC-1:      This is endian::revert from the release-candidate (non-final, I think) version of Beman Dawes' library.

Tymofey:   A combination of my own unworthy hack for uint16_t, Phil Endecott's version for uint32_t (for USE_TYMOFEY) and Tymofey's suggestion for uint64_t.

Imbecil-0: Boost-wannabe.RawMemory compiled with BOOST_RAW_MEMORY_TRICKS #define'd as 0.

Imbecil-1: Boost-wannabe.RawMemory compiled with BOOST_RAW_MEMORY_TRICKS #define'd as 1 (the default).

Here are the approximate results, on the same computer that is described here:

http://adder.iworks.ro/Boost/RawMemory/#Benchmarks
(Ctrl+F: "And the name of the computer is").

(Much to my shame, I have not employed Windows' "high-performance counter". Also, I did not run too many repetitions, in order to avoid overheating the lovely computer or the benchmarks turning against me.)

(For the 64-bit integers -- what a devious choice ! --, I have also noted the approximate number of machine code bytes that were generated.)

Borland C++Builder 5.5:

  uint16_t  RC-1       1765
            Tymofey     265
            Imbecil-0   250
            Imbecil-1   250

  uint32_t  RC-1       1922
            Tymofey     453
            Imbecil-0   469
            Imbecil-1   453

  uint64_t  RC-1       2360  ( 72 bytes of code)
            Tymofey    3375  (225 bytes of code)
            Imbecil-0  2234  (169 bytes of code)
            Imbecil-1   797  (7 bytes for the caller, 15 bytes for the callee)

Digital Mars C++:

  uint16_t  RC-1        360
            Tymofey     250
            Imbecil-0   265
            Imbecil-1   250

  uint32_t  RC-1       1828
            Tymofey     391
            Imbecil-0  1203
            Imbecil-1   453  <-- I am such a noob !

  uint64_t  RC-1       2797  ( 84 bytes of code)
            Tymofey    2875  (292 bytes of code)
            Imbecil-0  2453  (202 bytes of code)
            Imbecil-1   609  (7 bytes for the caller, 15 bytes for the callee)

GCC 4.3.4 (20090804):

  uint16_t  RC-1       1750
            Tymofey     188
            Imbecil-0   187
            Imbecil-1   187  <-- Of course, I chose the run that favours me !

  uint32_t  RC-1       1328
            Tymofey     453
            Imbecil-0   438
            Imbecil-1   188

  uint64_t  RC-1       2781  ( 67 bytes of code)
            Tymofey    1578  ( 96 bytes of code)
            Imbecil-0   578  ( 46 bytes of code)
            Imbecil-1   250  (  8 bytes of code)

Visual C++ 2003:

  uint16_t  RC-1        984
            Tymofey     187
            Imbecil-0   125
            Imbecil-1   203  <-- I have to investigate this !

  uint32_t  RC-1       1563
            Tymofey     375
            Imbecil-0   515
            Imbecil-1   125

  uint64_t  RC-1       2328  ( 63 bytes of code)
            Tymofey    3047  (154 bytes of code)
            Imbecil-0  1110  (122 bytes of code)
            Imbecil-1   438  ( 12 bytes of code, but I had to "WorkAround.cpp"
                              an Internal Compiler Error by avoiding inlining
                              the function; I have to investigate this !)

Visual C++ 2005:

  uint16_t  RC-1        906
            Tymofey     188
            Imbecil-0   187
            Imbecil-1   203

  uint32_t  RC-1       1906
            Tymofey     391
            Imbecil-0   328
            Imbecil-1   125

  uint64_t  RC-1       3437  ( 65 bytes of code)
            Tymofey    3047  (151 bytes of code)
            Imbecil-0   641  ( 61 bytes of code)
            Imbecil-1   188  (  8 bytes of code)

Visual C++ 2005 for x64:

  uint16_t  RC-1       1687
            Tymofey     203
            Imbecil-0   188
            Imbecil-1   203

  uint32_t  RC-1       1390
            Tymofey     328
            Imbecil-0   329
            Imbecil-1   188

  uint64_t  RC-1       2406  ( 72 bytes of code)
            Tymofey     703  (210 bytes of code; the loop was unrolled 1:2)
            Imbecil-0   531  (162 bytes of code; the loop was unrolled 1:2)
            Imbecil-1   187  (  3 bytes of code)

The "Tymofey" optimizations are included in the previous emails from Tymofey and Phil Endecott. The "Imbecil" optimizations (and pessimizations) are described and included here:

http://adder.iworks.ro/Boost/RawMemory

If Assembler, compiler intrinsics, __fastcall, compiler-specific tuning, etc. sound interesting, you are welcome to have a closer look.

Thank you for your time and for your work...

--
Yours truly,
Adder

On 9/6/11, Beman Dawes <bdawes@acm.org> wrote:
On Mon, Sep 5, 2011 at 6:38 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
I've just done some quick benchmarks of Beman's proposed byte-swapping code...
What do people see on other platforms?
Twenty plus years ago I put a lot of effort into finding optimum code for a C language endian library, but real-world application tests showed that what was optimum for one compiler was a dog on another compiler, that compiler switches could change what was optimum code, and then for the next release of the compiler we had to do it all over again.
That said, if we can come up with a benchmark representative of real-world use cases, and can come up with robust optimizations that have some staying power, I'll gladly include them in the code.
--Beman
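
For readers who have not met the operation being benchmarked in this thread, here is a minimal, portable sketch of a shift-and-mask "bswap" for the three widths discussed. The names swap16, swap32, swap64 and self_check are hypothetical; this is not the code of RC-1, of Tymofey's or Phil Endecott's versions, or of Boost-wannabe.RawMemory, and it says nothing about how any of those would perform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from any of the
// libraries compared in this thread.

// Reverse the two bytes of a 16-bit value.
inline std::uint16_t swap16(std::uint16_t x) {
    // x is promoted to int before shifting; the cast drops the extra bits.
    return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Reverse the four bytes of a 32-bit value.
inline std::uint32_t swap32(std::uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

// Reverse the eight bytes of a 64-bit value by byte-reversing each
// 32-bit half and exchanging the halves.
inline std::uint64_t swap64(std::uint64_t x) {
    const std::uint32_t lo = static_cast<std::uint32_t>(x);
    const std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);
    return (static_cast<std::uint64_t>(swap32(lo)) << 32) | swap32(hi);
}

// Small sanity check: known values, plus "swapping twice is the identity".
inline void self_check() {
    assert(swap16(0x1234) == 0x3412);
    assert(swap32(0x12345678u) == 0x78563412u);
    assert(swap64(0x0102030405060708ull) == 0x0807060504030201ull);
    assert(swap64(swap64(0xDEADBEEFCAFEF00Dull)) == 0xDEADBEEFCAFEF00Dull);
}
```

Whether a given compiler turns code of this shape into a single bswap instruction is exactly the kind of thing that varies between compilers and switches, so any tuned variant would need to be measured on each target of interest.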