Re: [boost] Re: static sized strings

Rob Stewart wrote:
From: John Nagle <nagle@animats.com>
STL strings are variable-sized, with no limit. When you add characters, "size()" increases. Should fixed-size
^^^^ I presume you're referring to fixed-capacity strings which can have a variable size, right?
yup. The idea is to make it a buffer-overflow safe replacement for C style character buffers, e.g.:
char buf[ 100 ]; ::sprintf( buf, "...", ... );
while allowing it to behave like a basic_string.
strings be similar? If the STL string operations are to be meaningful, they have to be.
Only if you want to apply mutating algorithms that modify size. IOW, it isn't unreasonable to choose either fixed-size or fixed-capacity strings. If the former, then there are no mutating algorithms that affect the size. If the latter, the size can change, but an exception occurs when trying to exceed the capacity.
That's the way I see it, but instead of throwing an exception, the class simply prevents overrun and fills the available space, e.g.:
char_string< 5 > buf;
buf.copy( "Meine Grosse Welt!" );
std::cout << buf << '\n'; // prints "Mein" (null terminated)
In STL strings, "size" doesn't include a trailing null. Using "c_str" can result in expanding the string to add a trailing null. With fixed-size strings, we don't have that option.
If c_str is provided, there must always be space for a trailing null. So you can't use char_string when you need a specific fixed length, as for Macintosh 4-character file signatures. Perhaps "boost::array<char>" is the way to go for that sort of thing. Trying to do both null-terminated and non-null-terminated strings in the same class will get ugly.
If the string if fixed-capacity *and* there remains sufficient capacity, c_str() can null terminate the buffer. If there isn't sufficient remaining capacity, then throw an exception.
IOW, make it a runtime error to fix the capacity too small to permit null termination when calling c_str(). That still leaves room for things like the 4-character file signature to which you referred, and yet prevents buffer overrun, but doesn't require foregoing flexibility.
That is a good idea. It will mean keeping track of the string length, but it does permit both versions without requiring complex internals/semantics or separate classes. In order to keep interoperability with C style usage (i.e. operator const char *()), we can have the following:
inline const char_type * c_str()
{
   // null-terminate buffer:
   str[ len ] = char_type( '\0' );
   return( str );
}
inline operator const char *()
{
   return( c_str());
}
Regards, Reece
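
The variant discussed above (track the length, let the buffer fill completely, and make c_str() fail when there is no room left for the terminator) could be sketched roughly as follows. This is only an illustration under assumptions: the class name fixed_string, its members, and the choice of std::length_error are all hypothetical and not taken from any posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical sketch; fixed_string and its members are illustrative names.
template <std::size_t N>
class fixed_string {
    static_assert(N > 0, "capacity must be non-zero");

public:
    fixed_string() : len_(0) {}

    // Replace the contents with at most N characters; excess input is dropped.
    void copy(const char* src) {
        assert(src != nullptr);
        std::size_t n = std::strlen(src);
        if (n > N) n = N;
        std::memcpy(buf_, src, n);
        len_ = n;
    }

    std::size_t size() const { return len_; }
    std::size_t capacity() const { return N; }

    // Raw access for fixed-length uses such as a 4-character signature.
    const char* data() const { return buf_; }

    // Null-terminate on demand; fail if the buffer is completely full.
    const char* c_str() {
        if (len_ == N)
            throw std::length_error("no room for null terminator");
        buf_[len_] = '\0';
        return buf_;
    }

private:
    char buf_[N];      // exactly N bytes, no extra slot for a terminator
    std::size_t len_;  // number of characters currently stored
};
```

With this sketch, a fixed_string<4> holding "ABCD" is usable through data() and size(), while c_str() on it throws; a fixed_string<8> holding "abc" can be null-terminated in place.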

"Reece Dunn" <msclrhd@hotmail.com> writes:
I presume you're referring to fixed-capacity strings which can have a variable size, right?
yup. The idea is to make it a buffer-overflow safe replacement for C style character buffers, e.g.:
char buf[ 100 ]; ::sprintf( buf, "...", ... );
while allowing it to behave like a basic_string.
Why not make an unlimited-size string with a parameterized internal "small string optimization" buffer? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
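
For readers unfamiliar with the idea, a string with a parameterized internal buffer might look something like the sketch below. It is a simplified illustration, not anyone's proposed implementation: small_string, on_heap(), and the use of std::string as the overflow storage are assumptions made for brevity (a real std::string may itself avoid the heap for short contents).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical sketch; small_string and its members are illustrative names.
template <std::size_t N>
class small_string {
public:
    small_string() : len_(0), on_heap_(false) { buf_[0] = '\0'; }

    void append(const char* src) {
        assert(src != nullptr);
        const std::size_t n = std::strlen(src);
        if (!on_heap_ && len_ + n <= N) {
            // Fits in the inline buffer: no allocation needed.
            std::memcpy(buf_ + len_, src, n + 1);  // copies the terminator too
            len_ += n;
            return;
        }
        if (!on_heap_) {
            // First overflow: move the inline contents to the growable storage.
            heap_.assign(buf_, len_);
            on_heap_ = true;
        }
        heap_.append(src, n);
        len_ += n;
    }

    std::size_t size() const { return len_; }
    bool on_heap() const { return on_heap_; }
    const char* c_str() const { return on_heap_ ? heap_.c_str() : buf_; }

private:
    char buf_[N + 1];   // inline storage plus room for the terminator
    std::string heap_;  // stands in for heap storage once the buffer overflows
    std::size_t len_;
    bool on_heap_;
};
```

The point of the design is that input up to N characters never allocates, and longer input degrades to ordinary dynamic growth instead of truncating or throwing.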

If you're willing to accept a call to "new", you can just use std::string and reserve some initial size. So you don't need that capability in char_string. char_string is for situations when you don't want to invoke dynamic allocation at all. In desktop applications, that's not too common, but in real time work, inside operating systems, and in embedded applications it's not unusual. char_string is also useful for retrofitting old code to protect it against buffer overflows. John Nagle Team Overbot
David Abrahams wrote:
"Reece Dunn" <msclrhd@hotmail.com> writes:
I presume you're referring to fixed-capacity strings which can have a variable size, right?
yup. The idea is to make it a buffer-overflow safe replacement for C style character buffers, e.g.:
char buf[ 100 ]; ::sprintf( buf, "...", ... );
while allowing it to behave like a basic_string.
Why not make an unlimited-size string with a parameterized internal "small string optimization" buffer?
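
The "std::string and reserve" alternative mentioned at the top of this message amounts to something like the following. The helper name make_greeting is hypothetical; only std::string::reserve and std::string::capacity are standard calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build a greeting in a string whose storage is reserved up front.
// make_greeting is a hypothetical helper used only for illustration.
std::string make_greeting(const std::string& name, std::size_t initial_capacity) {
    std::string s;
    s.reserve(initial_capacity);  // typically one allocation, made here
    assert(s.capacity() >= initial_capacity);
    s += "Hello, ";
    s += name;  // still grows (and may reallocate) past the reserved size
    return s;
}
```

Note that the reserved size is only a starting point: nothing stops the string from growing beyond it, which is exactly the property the two sides of this thread disagree about.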

John Nagle <nagle@overbot.com> writes:
If you're willing to accept a call to "new", you can just use std::string and reserve some initial size.
It's not the same thing. You might only be willing to accept a call to "new" if things go beyond a certain length. That's *usually* the only reason I'd ever have a fixed-size buffer. Very occasionally I'll be in code where I don't want to throw an exception, but then a string that responds to overflow by throwing is no better.
So you don't need that capability in char_string. char_string is for situations when you don't want to invoke dynamic allocation at all. In desktop applications, that's not too common, but in real time work, inside operating systems, and in embedded applications it's not unusual.
I'm saying, it's almost always better to degrade performance gracefully as the program's input grows than it is to introduce an execution discontinuity like an exception (if possible). In many of those scenarios you've named, exceptions, like dynamic allocation, have been banned anyway for similar real and/or imaginary reasons. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

From: David Abrahams <dave@boost-consulting.com>
John Nagle <nagle@overbot.com> writes:
If you're willing to accept a call to "new", you can just use std::string and reserve some initial size.
It's not the same thing. You might only be willing to accept a call to "new" if things go beyond a certain length. That's *usually* the only reason I'd ever have a fixed-size buffer. Very occasionally I'll be in code where I don't want to throw an exception, but then a string that responds to overflow by throwing is no better.
char_string can throw a more meaningful exception than std::bad_alloc. In the small string optimization or std::string with reserve approaches, the buffer grows when the input is too large, and other, downstream code may overflow or behave badly. Stopping the overflow early may be better and having an automated mechanism for preventing overflow is wise.
So you don't need that capability in char_string. char_string is for situations when you don't want to invoke dynamic allocation at all. In desktop applications, that's not too common, but in real time work, inside operating systems, and in embedded applications it's not unusual.
I'm saying, it's almost always better to degrade performance gracefully as the program's input grows than it is to introduce an execution discontinuity like an exception (if possible). In many of those scenarios you've named, exceptions, like dynamic allocation, have been banned anyway for similar real and/or imaginary reasons.
There's probably room and need for both types of strings. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

David Abrahams wrote:
John Nagle <nagle@overbot.com> writes:
If you're willing to accept a call to "new", you can just use std::string and reserve some initial size.
char_string should truncate and maintain null-termination for the C-type operations, including strcat, strcpy, sprintf, etc. An exception should only occur for explicit subscripting out of range:
char_string<10> s; s[11] = 'x';
Remember, a major purpose of char_string is to stop buffer overflow attacks. It's primarily for retrofit to old code. New code should use <string>. This should be a simple, all-inline, no library .hpp file. John Nagle Animats
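
The semantics proposed here (truncate silently, stay null-terminated, throw only on an out-of-range subscript) could be sketched as below. Several details are assumptions rather than anything stated in the post: N is taken to count characters excluding the terminator, assign() and append() are hypothetical stand-ins for the strcpy-like and strcat-like operations, and the subscript is checked against the current length the way std::string::at does (checking against the capacity would be an equally plausible reading).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical sketch of the proposed semantics; names are illustrative.
template <std::size_t N>
class char_string {
public:
    char_string() : len_(0) { buf_[0] = '\0'; }

    // strcpy-like: replace the contents, truncating to N characters.
    void assign(const char* src) {
        len_ = 0;
        append(src);
    }

    // strcat-like: append as much as fits, always leaving a terminator.
    void append(const char* src) {
        assert(src != nullptr);
        std::size_t n = std::strlen(src);
        if (n > N - len_) n = N - len_;  // truncate instead of overflowing
        std::memcpy(buf_ + len_, src, n);
        len_ += n;
        buf_[len_] = '\0';
    }

    // Explicit subscripting out of range is the one case that throws.
    char& operator[](std::size_t i) {
        if (i >= len_) throw std::out_of_range("char_string index");
        return buf_[i];
    }

    std::size_t size() const { return len_; }
    const char* c_str() const { return buf_; }

private:
    char buf_[N + 1];  // N characters plus the terminator
    std::size_t len_;
};
```

Because the terminator slot is always reserved, c_str() never has to write or throw, which keeps the class close to classic <string.h> behaviour.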

From: John Nagle <nagle@animats.com>
Remember, a major purpose of char_string is to stop buffer overflow attacks. It's primarily for retrofit to old code. New code should use <string>.
A buffer overflow occurs when data overwrites allocated memory. A small string optimization class won't encounter that problem unless there is insufficient memory for allocation. Thus, a "never grows" class and a small string optimization class permitting you to determine the stack allocation size solve the overflow problem. (Whether permitting arbitrarily large buffers of incoming data without overflow bugs is beneficial is another matter.) -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

Reece Dunn wrote:
Rob Stewart wrote:
From: John Nagle <nagle@animats.com>
STL strings are variable-sized, with no limit. When you add characters, "size()" increases. Should fixed-size
^^^^ I presume you're referring to fixed-capacity strings which can have a variable size, right?
Yes.
yup. The idea is to make it a buffer-overflow safe replacement for C style character buffers, e.g.:
char buf[ 100 ]; ::sprintf( buf, "...", ... );
Yes.
If the string if (sic) fixed-capacity *and* there remains sufficient capacity, c_str() can null terminate the buffer. If there isn't sufficient remaining capacity, then throw an exception.
Done that way, if you write
char_string<4> s;
strcat(s,"ABCDE"); // truncates at "ABCD", no null.
printf("s=%s\n",s.c_str()); // c_str throws exception
which is quite different from classic <string.h> semantics. You couldn't use that as a drop-in replacement for C strings. Nor is it compatible with STL basic_string semantics. I'd suggest consistent null-terminated semantics for char_string.
IOW, make it a runtime error to fix the capacity too small to permit null termination when calling c_str(). That still leaves room for things like the 4-character file signature to which you referred, and yet prevents buffer overrun, but doesn't require foregoing flexibility.
If you want a 4-char file signature, you can use "boost::array<char,4>", which does that job. Is there any real need for that functionality in char_string? char_string might have some convenience template functions to interconvert "boost::array" and "boost::char_string".
That is a good idea. It will mean keeping track of the string length,
Yes, that seems to be necessary. What about the base class issue? There's a need to be able to write something like "char_string_base& s" when you want to pass around fixed-capacity strings of more than one capacity.
Regards, Reece

From: John Nagle <nagle@animats.com>
Reece Dunn wrote:
Rob Stewart wrote:
From: John Nagle <nagle@animats.com>
If the string if (sic) fixed-capacity *and* there remains sufficient capacity, c_str() can null terminate the buffer. If there isn't sufficient remaining capacity, then throw an exception.
Done that way, if you write
char_string<4> s;
strcat(s,"ABCDE"); // truncates at "ABCD", no null.
printf("s=%s\n",s.c_str()); // c_str throws exception
which is quite different from classic <string.h> semantics. You couldn't use that as a drop-in replacement for C strings. Nor is it compatible with STL basic_string semantics. I'd suggest consistent null-terminated semantics for char_string.
You're comparing apples to oranges. Either you mean for the buffer to be exactly four bytes or you don't. If you do, then there's no room for a null terminator. You're not dealing with a "C string" (a null terminated string), you're dealing with a four byte buffer. If you allocated four bytes for a "C string" and write "ABCDE" to it, or even strcat() "ABCD" to it, you overrun the buffer. Why would we want to provide consistent semantics for that behavior?
IOW, make it a runtime error to fix the capacity too small to permit null termination when calling c_str(). That still leaves room for things like the 4-character file signature to which you referred, and yet prevents buffer overrun, but doesn't require foregoing flexibility.
If you want a 4-char file signature, you can use "boost::array<char,4>", which does that job. Is there any real need for that functionality in char_string? char_string might have some convenience template functions to interconvert "boost::array" and "boost::char_string".
That is a good idea. It will mean keeping track of the string length,
Yes, that seems to be necessary.
The question is, which is better: char_string<4> or boost::array<char,4>? I suggest that the latter is better. In C code, and similar C++ code, arrays of char are used as buffers of fixed length and as memory for strings. Code will be clearer if one uses boost::array<char,N> for the former and char_string<N> for the latter. Once you make that distinction of purpose, null termination can be integral to char_string without complication or unwanted overhead.
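
To make the distinction concrete, a fixed-length, non-terminated buffer such as a 4-character file signature might be handled like this. std::array is used purely as a stand-in for boost::array so that the sketch needs no external library, and file_signature and make_signature are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Exactly four bytes, no terminator: a buffer, not a string.
// std::array is used here as a stand-in for boost::array.
typedef std::array<char, 4> file_signature;

// make_signature is a hypothetical helper for illustration.
file_signature make_signature(const char* four_chars) {
    assert(four_chars != nullptr && std::strlen(four_chars) == 4);
    file_signature sig;
    std::memcpy(sig.data(), four_chars, sig.size());  // copies 4 bytes, no '\0'
    return sig;
}
```

Nothing in this type promises or needs a null terminator, so it never has to answer the c_str() question at all.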
What about the base class issue? There's a need to be able to write something like "char_string_base& s" when you want to pass around fixed-capacity strings of more than one capacity.
If that is a necessary feature, then the length can be stored in the base class. However, such a base class means that the dtor must be virtual. Is vtable overhead acceptable in such a class? Perhaps two types are needed. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
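
One possible shape for such a base class is sketched below. It rests on an assumption that may not hold for every use: if nobody ever deletes a string through a base pointer, the destructor can be protected and non-virtual, which avoids the vtable. All names here (sized_char_string, add_suffix, the member layout) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch; names and layout are illustrative assumptions.
class char_string_base {
public:
    std::size_t size() const { return len_; }
    std::size_t capacity() const { return cap_; }
    const char* c_str() const { return buf_; }

    // Truncating append shared by every capacity.
    void append(const char* src) {
        assert(src != nullptr);
        std::size_t n = std::strlen(src);
        if (n > cap_ - len_) n = cap_ - len_;
        std::memcpy(buf_ + len_, src, n);
        len_ += n;
        buf_[len_] = '\0';
    }

    // Copying is disabled in this sketch: a copied pointer would alias the source.
    char_string_base(const char_string_base&) = delete;
    char_string_base& operator=(const char_string_base&) = delete;

protected:
    // The base only refers to storage owned by the derived class.
    char_string_base(char* buf, std::size_t cap) : buf_(buf), len_(0), cap_(cap) {}
    // Protected and non-virtual: no delete through a base pointer, no vtable.
    ~char_string_base() {}

private:
    char* buf_;
    std::size_t len_;
    std::size_t cap_;
};

template <std::size_t N>
class sized_char_string : public char_string_base {
public:
    sized_char_string() : char_string_base(storage_, N) { storage_[0] = '\0'; }

private:
    char storage_[N + 1];  // N characters plus the terminator
};

// Works for any capacity without being a template itself.
void add_suffix(char_string_base& s) { s.append(".txt"); }
```

The cost moves from a vtable pointer to a stored buffer pointer and capacity in every object, so whether this is an improvement depends on the same overhead question raised above.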
participants (5)
- David Abrahams
- John Nagle
- John Nagle
- Reece Dunn
- Rob Stewart