Re: [boost] Re: static sized strings

Rob Stewart wrote:
From: "Reece Dunn" <msclrhd@hotmail.com>
Rob Stewart wrote:
From: John Nagle > Reece Dunn wrote:
The idea is to prevent overrun when this type of situation occurs. Thus, it will be a 3 character buffer with the extra character being a null-terminator.
I think we're miscommunicating, and it's probably me misinterpreting something.
The four byte matter was raised because someone wanted to be able to peek into a buffer of data read from a file. The 5th byte
That was me :). I was suggesting that one possible application for this type of class was in managing header files of binary data, e.g. tar files, doing things like:

   struct tar_header
   {
      ...
      boost::char_string< 6 > magic;
   } hdr;

   if( hdr.magic == "ustar\x20\x20\0" )
      // process GNU TAR file

wasn't a null terminator and it shouldn't be changed, so I took that to suggest effectively overlaying a char_string on that file's contents in the buffer.
I have revised my initial idea of supporting both null-terminated and non-terminated string buffers in response to the points that (a) boost::array< char, N > is a good candidate for the latter, and (b) adding a boolean template parameter to select null termination, along with the associated code, made the logic too complex.
IOW, make it a runtime error to fix the capacity too small to permit null termination when calling c_str(). That still leaves room for things like the 4-character file signature to which you referred, and yet prevents buffer overrun, but doesn't require foregoing flexibility.
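A minimal sketch of the run-time check described above, assuming a buffer of exactly N characters with a separately stored length. The class name fixed_chars and its members are hypothetical and are not taken from any of the proposed implementations. Writes are clamped to the capacity, and c_str() fails only when the buffer is completely full and no slot is left for the terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical fixed-capacity buffer: N characters, no reserved terminator slot.
template <std::size_t N>
class fixed_chars
{
    static_assert(N > 0, "capacity must be non-zero");

public:
    fixed_chars() : len_(0) {}

    // Copies at most N characters; excess input is truncated, never overrun.
    void assign(const char* s, std::size_t n)
    {
        len_ = n < N ? n : N;
        std::memcpy(buf_, s, len_);
    }

    std::size_t size() const { return len_; }
    const char* data() const { return buf_; }   // not necessarily terminated

    // Needs one spare slot for '\0', so it throws when the buffer is full.
    const char* c_str()
    {
        if (len_ == N)
            throw std::length_error("no room for null terminator");
        buf_[len_] = '\0';
        return buf_;
    }

private:
    char buf_[N];
    std::size_t len_;
};
```

With this layout a fixed_chars<4> can hold a full 4-character file signature and expose it through data(), while c_str() is usable only for contents of 3 characters or fewer.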
If you want a 4-char file signature, you can use "boost::array<char,4>", which does that job. Is there any real need for that functionality in char_string?
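As an illustration of the array-based alternative, here is a small sketch that uses std::array as a stand-in for boost::array (an assumption made to keep the example self-contained). The helper has_signature is hypothetical. It compares an unterminated N-byte field against a string literal of exactly N characters.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical helper: compare a fixed-size, unterminated field against a
// literal of exactly N characters. The literal's implicit '\0' is ignored.
template <std::size_t N>
bool has_signature(const std::array<char, N>& field, const char (&sig)[N + 1])
{
    // N is deduced from the array; a literal of any other length fails to compile.
    return std::memcmp(field.data(), sig, N) == 0;
}
```

A call such as has_signature(hdr, "RIFF") works for a std::array<char, 4>, but note that nothing else string-like (searching, appending, conversion) comes with the array type itself.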
Isn't boost::array meant for generic arrays, whereas with a variant of char_string you can use functions that are optimized for string operations? That is the main reason for using a special string class.
You're right that using boost::array doesn't offer any string facilities. Perhaps what we need is a string library that uses all namespace scope functions and type generators to generalize the notion of a string (this is not unlike Thorsten's CollectionTraits library). Then, a boost::array<char,N>, a std::string, even a C string can be treated generically as a string. With such a facility, boost::array would work just fine for the file signature example and char_string would be relieved from needing to handle the "no terminator" case.
Doesn't the string algorithms library do just that?
What about the base class issue? There's a need to be able to write something like "char_string_base& s" when you want to pass around fixed-capacity strings of more than one capacity.
If that is a necessary feature, then the length can be stored in the base class. However, such a base class means that the dtor must be virtual. Is vtable overhead acceptable in such a class? Perhaps two types are needed.
My design does not use a virtual base class, so that isn't an issue. John Nagle's version does, so that is where the problem arises. I have several issues with the use of a virtual base class:
A "virtual base class" or a base class with a virtual function?
Base class with a set of virtual functions (my mistake).
[1] If you want to operate on a variable length character string specifically, why not templatize the function:

   template< int n >
   void myfn( boost::char_array< n > & s ){ ... }

The issue had to do with being able to create collections of variable sized string objects.
Hmmm. That could be tricky :(.
[2] How do you deal with wide-character strings? My update generalizes to support char and wchar_t based buffers, but with a virtual base class, you are limited to char buffers.
That's true only if the character type is encoded in the base class. Why would it be?
You need to know the character type you are operating on in the base class. This is the basic idea that John Nagle's approach takes:

   class char_string_base
   {
   public:
      virtual std::size_t length() const;
      virtual const char * c_str() const;
      virtual void copy( const char * );
   };

But I agree with John's comments that it may be necessary to have a wchar_string_base to support wide characters.
[3] One of the reasons for having a virtual class is to supply custom string operations, e.g. using Windows-specific string functions instead of the standard library ones. This can also be solved with a policy template like that found in basic_string. My current version uses this approach, improving interoperability with basic_strings.
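A rough sketch of the policy idea, assuming the policy is a traits class with the same interface as std::char_traits. The name basic_fixed_string and its members are hypothetical and do not reflect the actual code under discussion. All character operations go through Traits, so a custom traits class could substitute platform-specific routines, and the same template covers char and wchar_t.

```cpp
#include <cstddef>
#include <string>   // std::char_traits

// Hypothetical fixed-capacity string parameterised on character type and
// traits policy, in the style of std::basic_string.
template <std::size_t N, typename CharT = char,
          typename Traits = std::char_traits<CharT> >
class basic_fixed_string
{
public:
    basic_fixed_string() : len_(0) { buf_[0] = CharT(); }

    // Copies at most N characters from a null-terminated source.
    void assign(const CharT* s)
    {
        std::size_t n = Traits::length(s);
        len_ = n < N ? n : N;
        Traits::copy(buf_, s, len_);
        buf_[len_] = CharT();          // always terminated
    }

    std::size_t size() const { return len_; }
    const CharT* c_str() const { return buf_; }

private:
    CharT buf_[N + 1];                 // one extra slot for the terminator
    std::size_t len_;
};
```

Because no virtual functions are involved, the customization is resolved at compile time, at the cost that strings of different capacities remain unrelated types.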
You can certainly design an ABC with many pure virtual functions that the derived types implement, but that was never the intent of the base class idea.
Your policy approach will permit a lot of customization, but perhaps the better approach is to externalize all operations.
Can you expand on what you mean by externalize.
Regards, Reece

From: "Reece Dunn" <msclrhd@hotmail.com>
Rob Stewart wrote:
From: "Reece Dunn" <msclrhd@hotmail.com>
Rob Stewart wrote:
From: John Nagle > Reece Dunn wrote:
I have revised my initial idea of supporting both null-terminated and non-terminated string buffers in response to the points that (a) boost::array< char, N > is a good candidate for the latter, and (b) adding a boolean template parameter to select null termination, along with the associated code, made the logic too complex.
Ah, I didn't realize that you did that.
You're right that using boost::array doesn't offer any string facilities. Perhaps what we need is a string library that uses all namespace scope functions and type generators to generalize the notion of a string (this is not unlike Thorsten's CollectionTraits library). Then, a boost::array<char,N>, a std::string, even a C string can be treated generically as a string. With such a facility, boost::array would work just fine for the file signature example and char_string would be relieved from needing to handle the "no terminator" case.
Doesn't the string algorithms library do just that?
I don't know, does it?
[2] How do you deal with wide-character strings? My update generalizes to support char and wchar_t based buffers, but with a virtual base class, you are limited to char buffers.
That's true only if the character type is encoded in the base class. Why would it be?
You need to know the character type you are operating on in the base class. This is the basic idea that John Nagle's approach takes:

   class char_string_base
   {
   public:
      virtual std::size_t length() const;
      virtual const char * c_str() const;
      virtual void copy( const char * );
   };

It depends upon the purpose of the base class. I was thinking more along the lines of a hook permitting them to be stored heterogeneously, but a little while after I sent my last reply, it occurred to me that there would be no good way to regain the derived type to do anything with the object. Thus, John's approach is the only one that would be useful.
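To make the heterogeneous-storage point concrete, here is a sketch built on the char_string_base interface quoted above. The derived template sized_string and the function total_length are hypothetical, and the virtual destructor and pure-virtual specifiers are additions for the sake of a compilable example. Strings of different capacities can share one container only because every operation is reachable through the base class.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Interface as quoted in the thread, made abstract for this sketch.
class char_string_base
{
public:
    virtual ~char_string_base() {}
    virtual std::size_t length() const = 0;
    virtual const char* c_str() const = 0;
    virtual void copy(const char* s) = 0;
};

// Hypothetical derived type: capacity N plus a terminator slot.
template <std::size_t N>
class sized_string : public char_string_base
{
public:
    sized_string() { buf_[0] = '\0'; }

    std::size_t length() const override { return std::strlen(buf_); }
    const char* c_str() const override { return buf_; }

    // Truncates to N characters instead of overrunning the buffer.
    void copy(const char* s) override
    {
        std::size_t n = std::strlen(s);
        if (n > N) n = N;
        std::memcpy(buf_, s, n);
        buf_[n] = '\0';
    }

private:
    char buf_[N + 1];
};

// Works on any mix of capacities without knowing the derived types.
std::size_t total_length(const std::vector<char_string_base*>& v)
{
    std::size_t sum = 0;
    for (const char_string_base* s : v)
        sum += s->length();
    return sum;
}
```

The interface is tied to char, which is why a parallel wchar_string_base would be needed for wide characters.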
But I agree with John's comments that it may be necessary to have a wchar_string_base to support wide characters.
Definitely.
[3] One of the reasons for having a virtual class is to supply custom string operations, e.g. using Windows-specific string functions instead of the standard library ones. This can also be solved with a policy template like that found in basic_string. My current version uses this approach, improving interoperability with basic_strings.
You can certainly design an ABC with many pure virtual functions that the derived types implement, but that was never the intent of the base class idea.
Actually, it was his idea.
Your policy approach will permit a lot of customization, but perhaps the better approach is to externalize all operations.
Can you expand on what you mean by externalize.
Just the idea that you would use size(s), append(s, t), etc. rather than member functions. That approach means that s and t can be of many different types unified under a single syntactic umbrella.
--
Rob Stewart stewart@sig.com
Software Engineer http://www.sig.com
Susquehanna International Group, LLP using std::disclaimer;
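One way to picture externalized operations is a small set of overloaded namespace-scope functions. This is only a sketch: the namespace strs, the data() helper, and the choice of std::array as a stand-in for boost::array are all assumptions, and only size() and append() are named in the message above.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string>

namespace strs {   // hypothetical namespace

// size(): one name, several string-like types.
inline std::size_t size(const std::string& s) { return s.size(); }
inline std::size_t size(const char* s) { return std::strlen(s); }

// For a fixed array, the first '\0' (if any) ends the string; otherwise
// the whole array counts, which covers the "no terminator" case.
template <std::size_t N>
std::size_t size(const std::array<char, N>& a)
{
    const void* p = std::memchr(a.data(), '\0', N);
    return p ? static_cast<std::size_t>(static_cast<const char*>(p) - a.data())
             : N;
}

// data(): pointer to the first character of each type.
inline const char* data(const std::string& s) { return s.data(); }
inline const char* data(const char* s) { return s; }
template <std::size_t N>
const char* data(const std::array<char, N>& a) { return a.data(); }

// append(): written once in terms of size() and data().
template <typename T>
void append(std::string& dst, const T& src)
{
    dst.append(strs::data(src), strs::size(src));
}

} // namespace strs
```

Supporting another string type then means adding size() and data() overloads for it, without touching append() or the type itself.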

This isn't something you need to customize heavily, because it's intended as a replacement for something old and well-defined. We're not likely to see any new operations in the C string strcat/strcpy/etc. family.
Is the code in the Boost sandbox yet?
John Nagle
Reece Dunn wrote:
Rob Stewart wrote:
From: "Reece Dunn" <msclrhd@hotmail.com>
Rob Stewart wrote:
From: John Nagle > Reece Dunn wrote:
Your policy approach will permit a lot of customization, but perhaps the better approach is to externalize all operations.
Can you expand on what you mean by externalize.
Regards, Reece
participants (3)
- John Nagle
- Reece Dunn
- Rob Stewart