[Endian] library proposal

https://bitbucket.org/davidstone/endian A while back, I had the somewhat common problem of networking in C++. Libraries like Boost.Bind and Boost.Asio proved immensely helpful, but there wasn't anything in Boost that would take care of the byte ordering. Fortunately, I could use the hton family of functions (htonl, htons, ntohl, ntohs) because all of my networking code was 4 bytes or smaller. Later on, I was writing some code that created hashes (specifically MD5 and SHA-2). In the quest for more performance, I tried to maximize the amount of memory I accessed at once (grabbing 4, 8, or even sizeof (uintmax_t) bytes at a time, instead of 1). This also meant that I had to make sure I manipulated those bytes in the proper order, so I went to my networking code and tried to reuse that. However, I quickly ran into some problems, the largest of which is that there is no htonll or similar for 8-byte integers. So I did what any self- respecting programmer would do: I wrote my own using bit shifting. My work paid off, and I was rewarded with a few % increase in performance in my library. I naturally wondered if I could do better, and I found out that yes, I could do better than dynamic endian determination and manual bit shifting. This led me to make a complete library, with the goal of submitting it to Boost. Imagine my surprise to find out that just a few months ago, Beman Dawes submitted his own version of such a library! However, I feel my version offers a few improvements over parts of Dawes's library. My library does not provide any extra integer types. All mine does is convert the representation. For this reason, I feel it would probably be best for the final Boost.Endian library to combine my library with Dawes's. Here is an overview of the features of my library, with emphasis given on the relative strengths of what I have written (in no particular order) * My library uses compiler intrinsics where they exist. Currently, it uses intrinsics with GCC, Clang, MSVC, and (according to the documentation, I haven't tested) ICC. On GCC, where I have done most of my testing, using intrinsics gave a small (but measurable) performance boost. I believe my timings suggested about a 3.5% increase in speed using the intrinsic version vs. using a manual bit- shifting version with compiler options: -Ofast -march=native -funsafe-loop-optimizations which I believe is close to the strongest set of optimizations gcc supports. * My library uses static determination of endianness where possible (as defined in boost/detail/endian.hpp), and dynamic determination of endianness only when that fails. Static determination gave me about a 1.25% increase in speed over dynamic determination (and has slightly smaller memory requirements, but they are probably on the order of one word or two). * My library uses templates to determine which byte reordering function to call. This genericity was essential to support my SHA-2 solution, for instance, because the SHA-2 hashing algorithm is actually a family of 4 functions, half of which use a 32-bit word size, and half of which use a 64-bit word size. I had already minimized code duplication by use of templates, and a template-based solution works much better with generic code (while not inconveniencing non- template code) than functions like htonl / htons. * My library returns results via return value, rather than by reference. This is more natural and likely leads to greater performance (due to not having to reference / dereference built-in types). It also allows const-correct programming. * My library includes thorough unit testing that doubles as a performance test. Compiling the .cpp file and calling it normally is a simple unit test that fully exercises its capabilities (it calls multiple versions of conversion functions and verifies dynamic byte order determination is working as expected). It also accepts an optional command-line argument as an integer that states how many iterations of the unit tests to run. The program then reports how long each test took. * Like Dawes's library, my library does not (yet) include support for floating point numbers. The reason for this is that floating point endianness is not well documented or as consistent. It's possible for there to be little endian floating point numbers and big endian integers on the same machine. Rather than suggest that the problem is solved by adding in support for floating point numbers, attempting to use them will result in a compile-time error. However, thanks to my library's use of type traits, changing around most of the functions is as simple as changing is_integral with is_numeric. I'm not even sure if traditional endianness can describe floating point numbers as a whole, or if (for example) the mantissa and the exponent are each stored in a particular (hopefully identical) endianness. This could lead to numbers with byte ordering like DCBA HGEF (where the first part represents the mantissa and the second the exponent, or perhaps it's the other way around). Basic byte-swapping functions will not cope with this, and as such, I'll need to do more research. * My library supports big-endian, little-endian, and PDP-endian. * I have tested the library with gcc 4.6.2 on Fedora 16 64-bit and Fedora 16 32-bit. I have tested with clang 3.0 on Fedora 16 64-bit. I have tested with MSVC 9 and 11 (I believe those were the versions I used) on Windows 7 64-bit. My code compiles with no warnings in GCC, even when set to the paranoid warning levels described in this post: http://stackoverflow.com/a/9862800 . It nearly compiles cleanly in Visual Studio at /W4, except that it complains about a conditional expression being constant in a template (it's always true for some instantiations, and always false for others). Even with /Wall, it compiled cleanly (other than library code), with the exception of the above mentioned warning about constant conditional expressions, and a few confusing signed / unsigned mismatch warnings. Unfortunately, I do not have a big-endian or pdp- endian machine to test on. However, my MD5 and SHA-2 tests still validate (which they would fail to do if my big-endian routines were wrong). * I'm a little uncertain about my dynamic byte ordering. I currently determine it once and save the result to a global const bool. I'm not sure if that's the best way (but it did give a few % increase in performance over computing it every time). For these reasons, I hope you consider my library for inclusion in Boost. https://bitbucket.org/davidstone/endian

According to a message I received, the ARM CPU uses little-endian 32-bit integers, and 64-bit integers are created by combining two little-endian integers in big-endian order, which by my calculations gives us a byte order of DCBAHGFE. I actually defined PDP-endian to do this byte ordering for 64-bit words simply because I couldn't forbid it due to dynamic byte ordering concerns. This page: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0278b/Cegbbbab.html suggests that the ARM can actually switch between big-endian, little-endian, and middle-endian! However, this brings up a concern of the best way to name things. The PDP-11 didn't have 64-bit integers (unless there's some extension somewhere that I'm not aware of), so calling this PDP-endian is really a misnomer for 64-bit integers. The ARM processor is either big-endian or little-endian for 32-bit access, so it can't even be generalized to some middle-endian version, I would need something like PDP-endian and ARM-endian. I'm assuming that all multi-endian (the term bi-endian doesn't apply to the ARM, which appears to have three endian modes) processors require dynamic endian determination.
participants (1)
-
David Stone