Re: [boost] Re: static sized strings

Rob Stewart wrote:
From: John Nagle
Reece Dunn wrote:
Rob Stewart wrote:
From: John Nagle
You're comparing apples to oranges. Either you mean for the buffer to be exactly four bytes or you don't. If you do, then there's no room for a null terminator. You're not dealing with a "C string" (a null terminated string), you're dealing with a four byte buffer. If you allocated four bytes for a "C string" and write "ABCDE" to it, or even strcat() "ABCD" to it, you overrun the buffer. Why would we want to provide consistent semantics for that behavior?
The idea is to prevent overrun when this type of situation occurs. Thus, it will be a 3 character buffer with the extra character being a null-terminator.
IOW, make it a runtime error to fix the capacity too small to permit null termination when calling c_str(). That still leaves room for things like the 4-character file signature to which you referred, and yet prevents buffer overrun, but doesn't require foregoing flexibility.
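A minimal sketch of the overrun-safe behaviour discussed above, assuming the reading in which the template argument is the total buffer size, so that a 4-byte buffer holds 3 characters plus the terminator. The name fixed_string and its truncating append() are hypothetical and are not taken from either proposed implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch, not the proposed Boost class. N is the total buffer
// size in bytes, so at most N - 1 characters fit before the terminator.
template <std::size_t N>
class fixed_string {
    static_assert(N > 0, "need room for the null terminator");
    char buf_[N];
    std::size_t len_;
public:
    fixed_string() : len_(0) { buf_[0] = '\0'; }

    // Appends as much of s as fits and keeps the buffer terminated.
    // Returns the number of characters that had to be dropped.
    std::size_t append(const char* s) {
        std::size_t want = std::strlen(s);
        std::size_t room = (N - 1) - len_;
        std::size_t take = want < room ? want : room;
        std::memcpy(buf_ + len_, s, take);
        len_ += take;
        buf_[len_] = '\0';
        return want - take;
    }

    const char* c_str() const { return buf_; }
    std::size_t length() const { return len_; }
    static std::size_t capacity() { return N - 1; }
};
```

Whether a dropped character should be reported through a return value, as here, or raised as a runtime error is exactly the design choice under discussion; the sketch only shows that the buffer itself cannot be overrun.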
If you want a 4-char file signature, you can use "boost::array<char,4>", which does that job. Is there any real need for that functionality in char_string?
Isn't boost::array meant for generic arrays? With a variant of char_string, you can use string functions that are optimized for string operations. This is the main reason for using a special string class.
char_string might have some convenience template functions to interconvert "boost::array" and "boost::char_string".
That is a good idea. It will mean keeping track of the string length,
Yes, that seems to be necessary.
The question is, which is better: char_string<4> or boost::array<char,4>? I suggest that the latter is better. In C code, and similar C++ code, arrays of char are used as buffers of fixed length and as memory for strings. Code will be clearer if one uses boost::array<char,N> for the former and char_string<N> for the latter. Once you make that distinction of purpose, null termination can be integral to char_string without complication or unwanted overhead.
That makes sense.
What about the base class issue? There's a need to be able to write something like "char_string_base& s" when you want to pass around fixed-capacity strings of more than one capacity.
If that is a necessary feature, then the length can be stored in the base class. However, such a base class means that the dtor must be virtual. Is vtable overhead acceptable in such a class? Perhaps two types are needed.
My design does not use a virtual base class, so that isn't an issue. John Nagle's version does, so that is where the problem arises. I have several issues with the use of a virtual base class:
[1] If you want to operate on a variable length character string specifically, why not templatize the function: template< int n > void myfn( boost::char_array< n > & s ){ ... }
[2] How do you deal with wide-character strings? My update generalizes to support char and wchar_t based buffers, but with a virtual base class, you are limited to char buffers.
[3] One of the reasons for having a virtual class is to supply custom string operations, e.g. using Windows-specific string functions instead of the standard library ones. This can also be solved with a policy template like that found in basic_string. My current version uses this approach, improving interoperability with basic_strings.
Regards,
Reece

From: "Reece Dunn" <msclrhd@hotmail.com>
Rob Stewart wrote:
From: John Nagle
Reece Dunn wrote:
You're comparing apples to oranges. Either you mean for the buffer to be exactly four bytes or you don't. If you do, then there's no room for a null terminator. You're not dealing with a "C string" (a null terminated string), you're dealing with a four byte buffer. If you allocated four bytes for a "C string" and write "ABCDE" to it, or even strcat() "ABCD" to it, you overrun the buffer. Why would we want to provide consistent semantics for that behavior?
The idea is to prevent overrun when this type of situation occurs. Thus, it will be a 3 character buffer with the extra character being a null-terminator.
I think we're miscommunicating, and it's probably me misinterpreting something. The four byte matter was raised because someone wanted to be able to peek into a buffer of data read from a file. The 5th byte wasn't a null terminator and it shouldn't be changed, so I took that to suggest effectively overlaying a char_string on that file's contents in the buffer. Another interpretation of that need is to pass the buffer to a char_string<4> and expect that only the first four bytes be copied.
IOW, make it a runtime error to fix the capacity too small to permit null termination when calling c_str(). That still leaves room for things like the 4-character file signature to which you referred, and yet prevents buffer overrun, but doesn't require foregoing flexibility.
If you want a 4-char file signature, you can use "boost::array<char,4>", which does that job. Is there any real need for that functionality in char_string?
Isn't boost::array meant for generic arrays? With a variant of char_string, you can use string functions that are optimized for string operations. This is the main reason for using a special string class.
You're right that using boost::array doesn't offer any string facilities. Perhaps what we need is a string library that uses all namespace scope functions and type generators to generalize the notion of a string (this is not unlike Thorsten's CollectionTraits library). Then, a boost::array<char,N>, a std::string, even a C string can be treated generically as a string. With such a facility, boost::array would work just fine for the file signature example and char_string would be relieved from needing to handle the "no terminator" case.
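A rough sketch of what such namespace-scope functions might look like. The names str_data, str_length, and str_equal are hypothetical and do not come from any existing Boost library, and std::array stands in for boost::array so the example is self-contained.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical free functions; one overload set per "string-like" type.
inline const char* str_data(const std::string& s) { return s.data(); }
inline std::size_t str_length(const std::string& s) { return s.size(); }

inline const char* str_data(const char* s) { return s; }
inline std::size_t str_length(const char* s) { return std::strlen(s); }

// A fixed array has no terminator: every element counts as a character.
template <std::size_t N>
const char* str_data(const std::array<char, N>& a) { return a.data(); }
template <std::size_t N>
std::size_t str_length(const std::array<char, N>&) { return N; }

// A generic operation written only in terms of the free functions above.
template <class A, class B>
bool str_equal(const A& a, const B& b) {
    std::size_t n = str_length(a);
    return n == str_length(b) && std::memcmp(str_data(a), str_data(b), n) == 0;
}
```

With this arrangement a four-character signature held in an array can be compared against a C string or a std::string without the array ever needing a terminator.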
What about the base class issue? There's a need to be able to write something like "char_string_base& s" when you want to pass around fixed-capacity strings of more than one capacity.
If that is a necessary feature, then the length can be stored in the base class. However, such a base class means that the dtor must be virtual. Is vtable overhead acceptable in such a class? Perhaps two types are needed.
My design does not use a virtual base class, so that isn't an issue. John Nagle's version does, so that is where the problem arises. I have several issues with the use of a virtual base class:
A "virtual base class" or a base class with a virtual function?
[1] If you want to operate on a variable length character string specifically, why not templatize the function: template< int n > void myfn( boost::char_array< n > & s ){ ... }
The issue had to do with being able to create collections of variable sized string objects.
[2] How do you deal with wide-character strings? My update generalizes to support char and wchar_t based buffers, but with a virtual base class, you are limited to char buffers.
That's true only if the character type is encoded in the base class. Why would it be?
[3] One of the reasons for having a virtual class is to supply custom string operations, e.g. using Windows-specific string functions instead of the standard library ones. This can also be solved with a policy template like that found in basic_string. My current version uses this approach, improving interoperability with basic_strings.
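A minimal sketch of the policy idea in [2] and [3], assuming the policy is a traits class with the same interface as std::char_traits. The name basic_fixed_string is hypothetical and this is not Reece Dunn's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: the character type and the traits policy are both
// template parameters, as they are for std::basic_string.
template <class CharT, std::size_t N, class Traits = std::char_traits<CharT> >
class basic_fixed_string {
    CharT buf_[N + 1];      // N characters plus the terminator
    std::size_t len_;
public:
    basic_fixed_string() : len_(0) { Traits::assign(buf_[0], CharT()); }

    // Truncating assignment. Traits supplies length() and copy() for
    // whichever character type is in use, so no strlen/wcslen split is needed.
    void assign(const CharT* s) {
        std::size_t n = Traits::length(s);
        len_ = n < N ? n : N;
        Traits::copy(buf_, s, len_);
        Traits::assign(buf_[len_], CharT());
    }

    const CharT* c_str() const { return buf_; }
    std::size_t length() const { return len_; }

    // Conversion to a basic_string that shares the same traits type.
    std::basic_string<CharT, Traits> str() const {
        return std::basic_string<CharT, Traits>(buf_, len_);
    }
};
```

A platform-specific traits class could be substituted for the default to route the operations through other string functions, which is the customization point [3] describes.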
You can certainly design an ABC with many pure virtual functions that the derived types implement, but that was never the intent of the base class idea. Your policy approach will permit a lot of customization, but perhaps the better approach is to externalize all operations.
--
Rob Stewart stewart@sig.com
Software Engineer http://www.sig.com
Susquehanna International Group, LLP using std::disclaimer;

Reece Dunn wrote:
Rob Stewart wrote:
From: John Nagle
Reece Dunn wrote:
Rob Stewart wrote:
From: John Nagle
The question is, which is better: char_string<4> or boost::array<char,4>? I suggest that the latter is better. In C code, and similar C++ code, arrays of char are used as buffers of fixed length and as memory for strings. Code will be clearer if one uses boost::array<char,N> for the former and char_string<N> for the latter. Once you make that distinction of purpose, null termination can be integral to char_string without complication or unwanted overhead.
That makes sense.
Agreed. If you're doing something that requires an array of characters of a specific size, it's often part of some structure defined by some external standard, like a Macintosh file signature or an ISO CD header. There, you need something that contains exactly the characters specified, and nothing else. "boost::array<char,N>" is thus the right tool for that job.
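As an illustration of the fixed-size case, here is a small sketch that reads a four-byte signature from a buffer without touching the fifth byte. std::array is used as a stand-in for boost::array<char,4>, and the helper names read_signature and has_signature are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// std::array stands in for boost::array<char,4> in this sketch.
typedef std::array<char, 4> signature;

// Copies exactly four bytes from a file buffer. No terminator is added and
// the byte after the signature is never read or written. The caller must
// guarantee that data points at 4 or more readable bytes.
inline signature read_signature(const char* data) {
    signature sig;
    std::memcpy(sig.data(), data, sig.size());
    return sig;
}

// Compares against a 4-character literal; the literal's own terminator
// (its fifth element) is ignored.
inline bool has_signature(const char* data, const char (&expected)[5]) {
    signature sig = read_signature(data);
    return std::memcmp(sig.data(), expected, sig.size()) == 0;
}
```

The array carries its length in its type, so nothing here depends on null termination.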
What about the base class issue? There's a need to be able to write something like "char_string_base& s" when you want to pass around fixed-capacity strings of more than one capacity.
If that is a necessary feature, then the length can be stored in the base class. However, such a base class means that the dtor must be virtual. Is vtable overhead acceptable in such a class? Perhaps two types are needed.
My design does not use a virtual base class, so that isn't an issue. John Nagle's version does, so that is where the problem arises. I have several issues with the use of a virtual base class:
[1] If you want to operate on a variable length character string specifically, why not templatize the function: template< int n > void myfn( boost::char_string< n > & s ){ ... }
That leads to templatizing every function that has a char_string argument, with the associated headaches and overhead. It's not easy to retrofit that to existing code without a major rewrite. With a base class approach, you can just replace "char *" with "char_string_base&", then fix any compile errors, and your program becomes buffer overflow safe. That's easy to do.
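One way the base class approach could look is sketched below. It is not John Nagle's actual code: the member names are assumptions, and the base is given a protected non-virtual destructor and no virtual functions as one possible answer to the vtable concern raised earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch. The base knows only a pointer to the derived
// object's storage, its capacity, and the current length.
class char_string_base {
    char* buf_;
    std::size_t cap_;   // characters, excluding the terminator
    std::size_t len_;
protected:
    char_string_base(char* buf, std::size_t cap) : buf_(buf), cap_(cap), len_(0) {}
    ~char_string_base() {}  // protected, non-virtual: no delete through the base
public:
    // Copying is omitted in this sketch because buf_ must never alias
    // another object's storage.
    char_string_base(const char_string_base&) = delete;
    char_string_base& operator=(const char_string_base&) = delete;

    // Truncating append; returns false if any characters were dropped.
    bool append(const char* s) {
        std::size_t want = std::strlen(s);
        std::size_t room = cap_ - len_;
        std::size_t take = want < room ? want : room;
        std::memcpy(buf_ + len_, s, take);
        len_ += take;
        buf_[len_] = '\0';
        return take == want;
    }
    const char* c_str() const { return buf_; }
    std::size_t length() const { return len_; }
    std::size_t capacity() const { return cap_; }
};

template <std::size_t N>
class char_string : public char_string_base {
    char storage_[N + 1];
public:
    char_string() : char_string_base(storage_, N) { storage_[0] = '\0'; }
};

// A single non-template function serves strings of every capacity.
inline bool add_extension(char_string_base& name) { return name.append(".txt"); }
```

Because add_extension takes a char_string_base&, existing functions that took a char * can be converted one at a time without becoming templates.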
[2] How do you deal with wide-character strings? My update generalizes to support char and wchar_t based buffers, but with a virtual base class, you are limited to char buffers.
I'm assuming that wchar_string will be much like char_string, but may be a separate file. The functions it has to call (strncat vs. wcsncat) have different names, so you can't just crank out both forms with a template anyway.
[3] One of the reasons for having a virtual class is to supply custom string operations, e.g. using Windows-specific string functions instead of the standard library ones. This can also be solved with a policy template like that found in basic_string. My current version uses this approach, improving interoperability with basic_strings.
Did you ever put this in the Boost sandbox?
John Nagle
participants (3)
- John Nagle
- Reece Dunn
- Rob Stewart