Re: [boost] [Submission] RawMemory (24-bit integers and runtime-dispatch)

Hi there ! And thank you for your time and for your questions.

Before I answer the first question regarding 24-bit integers, etc., please allow me to kindly ask you: Where are you going to store the 24-bit integer that you read from "raw memory" (= untyped memory = array of bytes) ? Or where is the 24-bit integer that you plan to write to "raw memory" stored ? (In what type of variable ?)

Regarding the second question, about run-time dispatching of operations (as opposed to compile-time dispatching), I will start with a couple of examples that support your point even more: it is not just the endianness that might not be known at compile time. The alignment of the data might also not be known before explicitly checking it at run time. Let me quote from the ugly documentation that I have written (http://adder.iworks.ro/Boost/RawMemory/#UsersManual):

"When we are dealing with an array of values (i.e. contiguous values) and performance is critical, we might want to check whether the data is aligned and specifically demand unchecked_memory_xfer if it is."

Ultimately, even the number of bytes used to represent the integer may be variable. Such is the case with the "mapping pair offsets" used by NTFS to locate the contiguous runs of clusters that make up the (non-MFT-resident portion of the) file: in order to save precious MFT (Master File Table) entry space, signed offsets are stored in variable-sized (1- to 8-byte) integers.

My answer is probably obvious by now. I am thinking about letting the programmer use the ultimate weapon she has in her toolchest: the mighty "if" instruction.

For the specific example you kindly gave, I am thinking that the following syntax might be useful for reading UCS-2 characters from a file saved with Notepad (in "Unicode" or "Unicode Big Endian" format):

    const uint16_t nByteOrderMask = raw_memory <uint16_t, little_endian>::peek (pFileBuffer);
    const bool bBigEndian = (nByteOrderMask == 0xFFFE); // the bytes FE FF, seen through little-endian glasses

    ... // The loop:
    {
        const uint16_t cCurrent = bBigEndian
            ? raw_memory <uint16_t, big_endian>::peek (pCurrent)
            : raw_memory <uint16_t, little_endian>::peek (pCurrent);
        ...
    }
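To make that concrete, here is a small self-contained sketch of the same idea. The peek_u16_le / peek_u16_be helpers below are only stand-ins for the raw_memory calls above, written with plain shifts so that the sketch compiles on its own; they are not the proposed interface.

    #include <cstdint>
    #include <cstddef>
    #include <cstdio>

    // Stand-ins for raw_memory <uint16_t, little_endian>::peek and
    // raw_memory <uint16_t, big_endian>::peek.
    static std::uint16_t peek_u16_le (const unsigned char *p)
    {
        return static_cast <std::uint16_t> (p [0] | (p [1] << 8));
    }

    static std::uint16_t peek_u16_be (const unsigned char *p)
    {
        return static_cast <std::uint16_t> ((p [0] << 8) | p [1]);
    }

    int main ()
    {
        // A tiny in-memory "file": the BOM FE FF (big-endian), followed by "Hi".
        const unsigned char pFileBuffer [] = { 0xFE, 0xFF, 0x00, 0x48, 0x00, 0x69 };
        const std::size_t cbFile = sizeof pFileBuffer;

        // Decide the byte order once, at run time.
        const std::uint16_t nByteOrderMask = peek_u16_le (pFileBuffer);
        const bool bBigEndian = (nByteOrderMask == 0xFFFE);

        // The loop: one ordinary branch per character.
        for (std::size_t i = 2; i + 1 < cbFile; i += 2)
        {
            const std::uint16_t cCurrent = bBigEndian
                ? peek_u16_be (pFileBuffer + i)
                : peek_u16_le (pFileBuffer + i);

            std::printf ("U+%04X\n", static_cast <unsigned> (cCurrent));
        }

        return 0;
    }

The point is simply that the endianness test happens once, and every subsequent read is dispatched by an ordinary branch that the caller controls.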
I await your thoughts... Thank you again ! (-:

--
Yours truly,
Adder

On 9/7/11, Joe Mucchiello <jmucchiello@yahoo.com> wrote:
For a library designed to deal with the weirdness of binary file formats, I'm surprised it does not support 24-bit integers of arbitrary alignment/sign.
uint32_t x = raw_memory <uint32_t, big_endian>::peek (&vbBuffer [5]);
Do you also support a non-template "endianness" parameter? Sometimes you don't know the endian-style of the file until you open it and read some value out of it.
Joe

My apologies for the useless message that just went to the list....
Before I answer the first question regarding 24-bit integers, etc., please allow me to kindly ask you:
Where are you going to store the 24-bit integer that you read from "raw memory" (= untyped memory = array of bytes) ? Or where is the 24-bit integer that you plan to write to "raw memory" stored ?
(In what type of variable ?)
It could be a vanilla int. Or a boost::integer::uint24_t. How about: struct foo { int rgb:24; int bar:3; int baz:4; int alpha:1; }; The point of your library is to interface with low-level file formats. Anything goes down there.
My answer is probably obvious by now. I am thinking about letting the programmer use the ultimate weapon she has in her toolchest:
The mighty "if" instruction.
I didn't need the long-winded fluff.

On 9/8/11, Joe Mucchiello <jmucchiello@yahoo.com> wrote:
Where are you going to store the 24-bit integer that you read from "raw memory" (= untyped memory = array of bytes) ? Or where is the 24-bit integer that you plan to write to "raw memory" stored ?
(In what type of variable ?)
It could be a vanilla int.
I would like to ask you: How would you read the 3 bytes of data into a 32-bit integer ? For example:
- Read 16 bits, then read 8 bits, and put them together ?
- Read 32 bits and discard the extra bits ?
- Read 3 groups of 8 bits each ?
- Check the alignment before reading and branch on the result, hoping for better performance ?

And how would you write the 32-bit integer value into 3 bytes of data ? For example:
- Write 16 bits, then write 8 bits ?
- Write 32 bits ?
- Write 3 groups of 8 bits each ?

(Two of the read variants are spelled out just below.)
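The names read_u24_le_a and read_u24_le_b below are purely illustrative; both variants look equally "simple", yet they rest on different assumptions:

    #include <cstdint>
    #include <cstring>

    // Variant (a): one 16-bit transfer plus one 8-bit transfer.  Hidden
    // assumption: the 16-bit load yields the low half of the little-endian
    // value only if the host itself is little-endian.
    std::uint32_t read_u24_le_a (const unsigned char *p)
    {
        std::uint16_t lo;
        std::memcpy (&lo, p, sizeof lo);
        return static_cast <std::uint32_t> (lo)
             | (static_cast <std::uint32_t> (p [2]) << 16);
    }

    // Variant (b): three 8-bit reads assembled with shifts.  Portable on
    // any host, but three separate loads.
    std::uint32_t read_u24_le_b (const unsigned char *p)
    {
        return  static_cast <std::uint32_t> (p [0])
             | (static_cast <std::uint32_t> (p [1]) << 8)
             | (static_cast <std::uint32_t> (p [2]) << 16);
    }

And that is before we even touch sign extension, alignment or the write direction.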
Or a boost::integer::uint24_t.
Is uint24_t a way to represent an integer in 24 bits and perform operations (arithmetic, input/output, etc. -- similarly to using a fundamental type) ?
How about:
struct foo { int rgb:24; int bar:3; int baz:4; int alpha:1; };
Forgive me. I believe that this approach is fundamentally wrong.
The point of your library is to interface with low-level file formats. Anything goes down there.
Precisely because anything can go down there, I believe that the burden has to be shared between the maintainer(s) of the library and the programmers who use it. It cannot be otherwise, or I would be creating a monster. The purpose of my questions has been to illustrate that, beyond a certain point, the possibilities fork exponentially, and thus they should not all be handled inside the library: that would artificially limit the choices and the performance available to the final application.
My answer is probably obvious by now. I am thinking about letting the programmer use the ultimate weapon she has in her toolchest:
The mighty "if" instruction.
I didn't need the long-winded fluff.

From: Adder <adder.thief@gmail.com>
How would you read the 3 bytes of data into a 32-bit integer ?
With a lot of code I expect your library to mitigate for me. One way:

    // little endian
    int32_t i = 0;
    unsigned char c;
    std::istream &is = ....
    is >> std::noskipws;   // binary data: do not skip "whitespace" bytes
    is >> c;
    i = c;
    is >> c;
    i += (c << 8);
    is >> c;
    i += (c << 16);
    if (i & 0x00800000)
        i |= 0xFF000000;   // sign extend the result

I assume you can figure out the big endian method.

What I don't understand is the question. For many years now people have written code similar to the above to deal with weird file formats. Do you really think it is "unheard of" to have to read 3 bytes into a 4-byte integer? What is the point of a Raw Memory library that does not provide simple functions to do this?
I wrote:
Or a boost::integer::uint24_t.
Is uint24_t a way to represent an integer in 24 bits and perform operations (arithmetic, input/output, etc. -- similarly to using a fundamental type) ?
I found it in boost::endian
How about:
struct foo { int rgb:24; int bar:3; int baz:4; int alpha:1; };
Forgive me. I believe that this approach is fundamentally wrong.
You don't deal with legacy code much? What do you do when a legacy process has written struct foo to a file 100 times and you need to read that file on another system? I can't call up the programmer who made the monstrosity 20 years ago and tell him his approach is fundamentally wrong.

How would you read the 3 bytes of data into a 32-bit integer ?
With a lot of code I expect your library to mitigate for me. One way: [...]
That which you desire, you shall have. For I could not forgive myself if I let people write that kind of code in the future.
I assume you can figure out the big endian method. What I don't understand is the question. For many years now people have written code similar to the above to deal with weird file formats.
For many years, people have been doing it wrong.
Do you really think it is "unheard of" to have to read 3 bytes into a 4-byte integer? What is the point of a Raw Memory library that does not provide simple functions to do this?
There are no "simple" functions to do this. Not unless you just want a wrapper (over std::reverse, std::reverse_copy or std::accumulate) that you never benchmark.
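Just so we mean the same thing by such a wrapper: it would look roughly like the code below. (peek_uint_be is an illustrative name, not part of the library.)

    #include <cstdint>
    #include <cstddef>
    #include <numeric>

    static std::uint32_t shift_in_byte (std::uint32_t acc, unsigned char b)
    {
        return (acc << 8) | b;
    }

    // A generic "simple function": fold n bytes, most significant first,
    // into an unsigned value via std::accumulate.
    inline std::uint32_t peek_uint_be (const unsigned char *p, std::size_t n)
    {
        return std::accumulate (p, p + n, std::uint32_t (0), shift_in_byte);
    }

For your 24-bit example this would be called as peek_uint_be (&vbBuffer [5], 3), assuming vbBuffer holds unsigned char. Correct and generic -- and still a byte-by-byte loop that nobody ever benchmarks against the straight-line loads.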
I wrote:
Or a boost::integer::uint24_t.
Is uint24_t a way to represent an integer in 24 bits and perform operations (arithmetic, input/output, etc. -- similarly to using a fundamental type) ?
I found it in boost::endian
Whatever. Boost-wannabe.RawMemory is not about converting data that cannot be processed natively to another kind of data that cannot be processed natively. Once I give you the uint32_t that you so much desire, you will be able to place it in any bitfield or limited-size integer you choose. And once you give me the uint32_t, I will poke it back to "raw memory".
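For the 24-bit case, that poke can be spelled out as simply as this (poke_u24_le is only an illustrative name, and little-endian layout is assumed):

    #include <cstdint>

    // Store the low 24 bits of a 32-bit value into three bytes of raw
    // memory, least significant byte first.
    inline void poke_u24_le (unsigned char *p, std::uint32_t value)
    {
        p [0] = static_cast <unsigned char> (value);
        p [1] = static_cast <unsigned char> (value >> 8);
        p [2] = static_cast <unsigned char> (value >> 16);
    }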
How about:
struct foo { int rgb:24; int bar:3; int baz:4; int alpha:1; };
Forgive me. I believe that this approach is fundamentally wrong.
You don't deal with legacy code much?
I am your regular graduate newbie.
What do you do with a legacy process that writes struct foo to a file 100 times and then you need to read that file on another system? I can't call up the programmer who made the monstrosity 20 years ago and tell him his approach is fundamentally wrong.
We might have to study monstrosities (in order to obtain the spec of the binary data format or to understand legacy code). But we should not write monstrosities again, nor encourage that via our libraries.

I know that you know how to deal portably with the "foo" struct above. Otherwise, I would point to http://adder.iworks.ro/Boost/RawMemory/#TheLastThing .
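For anyone following along, "portably" means something along these lines: read the four bytes explicitly and extract each field with shifts and masks, instead of trusting the compiler's bit-field layout. The field order and the little-endian storage below are assumptions -- the on-disk layout is exactly what the original struct declaration does not pin down.

    #include <cstdint>

    struct foo_fields
    {
        std::int32_t rgb;    // 24 bits, sign-extended
        unsigned     bar;    // 3 bits
        unsigned     baz;    // 4 bits
        unsigned     alpha;  // 1 bit
    };

    // Unpack one record, assuming it is stored as a 32-bit little-endian
    // word with rgb in the low 24 bits, then bar, baz and alpha.
    inline foo_fields unpack_foo (const unsigned char *p)
    {
        const std::uint32_t w =  static_cast <std::uint32_t> (p [0])
                              | (static_cast <std::uint32_t> (p [1]) << 8)
                              | (static_cast <std::uint32_t> (p [2]) << 16)
                              | (static_cast <std::uint32_t> (p [3]) << 24);

        foo_fields f;
        f.rgb = static_cast <std::int32_t> (w & 0xFFFFFF);
        if (f.rgb & 0x800000)
            f.rgb -= 0x1000000;          // sign-extend the 24-bit field
        f.bar   = (w >> 24) & 0x7;
        f.baz   = (w >> 27) & 0xF;
        f.alpha = (w >> 31) & 0x1;
        return f;
    }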
Thank you truly, for you are one of the few who cares to help me improve my unworthy library, whether I like it/you or not.

--
Sincerely,
Adder

participants (2)
- Adder
- Joe Mucchiello