Re: [boost] Re: static sized strings

Rob Stewart wrote:
From: "Reece Dunn" <msclrhd@hotmail.com>
Rob Stewart wrote:
From: "Reece Dunn" <msclrhd@hotmail.com>
John Nagle wrote: [snip] I think you misunderstand what I am trying to say. See my comments below.
Actually, I think you misunderstood my points.
ok.
[1] You need to specify the size of the buffer. You cannot declare an object of type fixed_string_base (as it is abstract), you can only have references and pointers to it so you can operate on variable-capacity strings.
Ah, but you missed that I was declaring a fixed_string_base, where string capacity is not known. I was relying on make_fixed_string() to deduce the array length and create an appropriate fixed_string for it.
I didn't miss it, just misunderstood what you intended.
[2] This is the point that we are discussing. Should this line be:
fixed_string< LENGTH > t;
as will work with the current sandbox implementation (my suggestion #2), or should it be:
fixed_string< LENGTH + 1 > t;
And that's exactly my point. It should be the former as shown by my equivalencies.
Different programmers favour the different semantics, so I ask: why not parameterise it, providing a default behaviour?
I'm not so sure they do and I'm not sure you need to allow for it. If you do, I suspect you'll encourage off-by-one mistakes. Granted, those mistakes won't result in overruns, but they will be annoying. Furthermore that will result in different fixed_string types; will you provide for conversions among them? (That is, for example, from a fixed_string that adds one to the capacity to one that doesn't?)
It is the former in the implementation I have, but if you have
char data[ 100 ];
// manipulate and use the data buffer
In this example, the buffer is 100 characters, but only 99 are writable, with the 100th being a null character (assuming this is a C-style string). Using the current implementation, this would lead to
fixed_string< 100 > data;
having an extra character. This would transfer to the other model.
(I know there is no make_fixed_string() -- yet -- but such a facility would be appropriate.)
Er... the fixed_string constructors! When you declare the object, you should know the size of the buffer you are using, e.g.:
fixed_string< 100 > data;
The problem is that you don't know the length of the string. That is, with this code:
char s[] = "1234";
s is a fixed size array of char with five elements. The programmer didn't have to write:
char s[5] = "1234";
Instead, the compiler figured out the length.
That's what I'm trying to provide an equivalence for:
fixed_string_base & s = make_fixed_string("1234");
With that approach, the client doesn't need to precompute the length of the literal and can rely on the compiler (and template magic) to deduce it and use it to create a fixed_string<5>.
I've shown returning by value (with the temporary bound to the reference s), but it could return a smart pointer just as well.
Your original example didn't have fixed_string_base as a reference. It should be easy to implement a make_fixed_string function.
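A minimal sketch of how such a make_fixed_string() could deduce the length from a literal. Everything here is an assumption for illustration: this fixed_string is a small stand-in, not the sandbox class, and it follows the "N counts characters, the class adds the terminator" convention, so "1234" deduces N = 4 rather than the fixed_string<5> mentioned above. Standard C++ also requires a const reference to bind the returned temporary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the thread's fixed_string: N counts characters,
// and the class itself reserves one extra byte for the terminator.
template <std::size_t N>
class fixed_string {
public:
    explicit fixed_string(const char* s) : len_(0) {
        // Copy at most N characters, then terminate.
        while (len_ < N && s[len_] != '\0') {
            buf_[len_] = s[len_];
            ++len_;
        }
        buf_[len_] = '\0';
    }
    const char* c_str() const { return buf_; }
    std::size_t length() const { return len_; }
    std::size_t capacity() const { return N; }

private:
    char buf_[N + 1];
    std::size_t len_;
};

// The array-reference parameter lets the compiler deduce M, the size of the
// literal's array including its terminator ("1234" gives M == 5).
template <std::size_t M>
fixed_string<M - 1> make_fixed_string(const char (&literal)[M]) {
    return fixed_string<M - 1>(literal);
}

// Usage: no length is written by hand at the call site.
inline void make_fixed_string_demo() {
    const fixed_string<4>& s = make_fixed_string("1234");
    assert(s.capacity() == 4);
    assert(s.length() == 4);
    assert(std::strcmp(s.c_str(), "1234") == 0);
}
```

Under the other convention (the terminator counted in the size), the return type would be fixed_string<M> instead; the deduction itself is unchanged.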
I think this clearer reveals that fixed_string's size parameter
             ^^^^^^^
That should have been "clearly."
should specify the number of characters. Remember, one can use boost::array to manage a fixed size, non-string buffer. If buffer overrun protection is insufficient in boost::array, that should be fixed (or a new class should be added to Boost). Thus, fixed_string can ignore that usage.
I am not disputing this. The question is whether the null terminator is counted as a character. For the example you show, it is, but for buffers (see the example above) it isn't.
I haven't got a conversion between the two models because I am only implementing the one model at the moment, but conversion would be automatic because the functions that take strings as arguments take fixed_string_base & arguments, so they should be usable regardless. Consider:
boost::fixed_string< 20 > str1( "This is a long string!" );
boost::fixed_string< 10 > str2;
str2.assign( str1 );
This will work (cutting the string short in str2) even though the strings are of different capacities.
Regards,
Reece

From: "Reece Dunn" <msclrhd@hotmail.com>
Rob Stewart wrote:
From: "Reece Dunn" <msclrhd@hotmail.com>
Rob Stewart wrote:
From: "Reece Dunn" <msclrhd@hotmail.com>
John Nagle wrote:
char data[ 100 ];
// manipulate and use the data buffer
In this example, the buffer is 100 characters, but only 99 are writable, with the 100th being a null character (assuming this is a C-style string).
Using the current implementation, this would lead to fixed_string< 100 > data; having an extra character.
That's the usage I suggested should be relegated to boost::array: it is a data buffer, not a string.
The problem is that you don't know the length of the string. That is, with this code:
char s[] = "1234";
s is a fixed size array of char with five elements. The programmer didn't have to write:
char s[5] = "1234";
Instead, the compiler figured out the length.
That's what I'm trying to provide an equivalence for:
fixed_string_base & s = make_fixed_string("1234");
With that approach, the client doesn't need to precompute the length of the literal and can rely on the compiler (and template magic) to deduce it and use it to create a fixed_string<5>.
I've shown returning by value (with the temporary bound to the reference s), but it could return a smart pointer just as well.
Your original example didn't have fixed_string_base as a reference. It should be easy to implement a make_fixed_string function.
I'm not surprised. I wasn't concentrating on real semantics so much as suggestive ones. Note, however, that one has to prevent copying among fixed_string_bases to avoid slicing.
I think this clearer reveals that fixed_string's size parameter
             ^^^^^^^
That should have been "clearly."
should specify the number of characters. Remember, one can use boost::array to manage a fixed size, non-string buffer. If buffer overrun protection is insufficient in boost::array, that should be fixed (or a new class should be added to Boost). Thus, fixed_string can ignore that usage.
I am not disputing this. The question is, is the null terminator counted as a character. For the example you show, it is, but for buffers (see the example above) it isn't.
Right. I'm emphatically voting for declaring the type with the length of the string; the class adds space for the terminator.
I haven't got a conversion between the two models because I am only implementing the one model at the moment, but conversion would be automatic because the functions that take strings as arguments take fixed_string_base & arguments, so they should be usable regardless. Consider:
Makes sense.
boost::fixed_string< 20 > str1( "This is a long string!" );
boost::fixed_string< 10 > str2;
str2.assign( str1 );
This will work (cutting the string short in str2) even though the strings are of different capacities.
Good, that's exactly what should happen.
--
Rob Stewart stewart@sig.com
Software Engineer http://www.sig.com
Susquehanna International Group, LLP using std::disclaimer;
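The cross-capacity assign agreed on above, together with the earlier point about preventing copies through fixed_string_base, might look roughly like this. It is a sketch under assumptions, not the sandbox code: the member names, the do_assign() hook, and the protected copy operations are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical capacity-independent interface, loosely modelled on the
// fixed_string_base described in the thread.
class fixed_string_base {
public:
    virtual ~fixed_string_base() {}
    virtual std::size_t capacity() const = 0;
    virtual const char* c_str() const = 0;

    void assign(const char* s) { do_assign(s); }
    // Works across capacities because it only uses the interface.
    void assign(const fixed_string_base& other) { do_assign(other.c_str()); }

protected:
    fixed_string_base() {}
    // Copying is only reachable from derived classes, so client code cannot
    // copy (and thereby slice) through a base reference.
    fixed_string_base(const fixed_string_base&) {}
    fixed_string_base& operator=(const fixed_string_base&) { return *this; }

private:
    virtual void do_assign(const char* s) = 0;
};

template <std::size_t N>
class fixed_string : public fixed_string_base {
public:
    fixed_string() { buf_[0] = '\0'; }
    explicit fixed_string(const char* s) { do_assign(s); }
    std::size_t capacity() const { return N; }
    const char* c_str() const { return buf_; }

private:
    // Copy at most N characters and always terminate.
    void do_assign(const char* s) {
        std::size_t i = 0;
        for (; i < N && s[i] != '\0'; ++i) buf_[i] = s[i];
        buf_[i] = '\0';
    }
    char buf_[N + 1];
};

inline void truncating_assign_demo() {
    fixed_string<20> str1("This is a long string!");  // 22 characters offered
    fixed_string<10> str2;
    fixed_string_base& dst = str2;  // callee sees only the base
    dst.assign(str1);
    assert(std::strcmp(str1.c_str(), "This is a long strin") == 0);
    assert(std::strcmp(str2.c_str(), "This is a ") == 0);
}
```

Note that the literal in the example is 22 characters long, so in this sketch str1 is itself cut to 20 characters before str2 takes the first 10.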

I've done some cleanup on my version of fixed_string at http://www.animats.com/source
This now conforms to Dunn's decision on length, i.e. "fixed_string<char,4> s" now has space for four characters plus a trailing null.
I decided that "operator[]" had to let users reach the trailing null without a subscript error, so that the classic C idiom of a loop stopped by the trailing null
for (i=0; s[i]; i++)
would still work. Because "operator[]" can't tell a read from a write, this offers the possibility of overstoring the trailing null. But c_str() will always reset the trailing null if necessary.
The functions of fixed_string are now all accessible via fixed_string_base, as virtual functions, which was an omission in the previous version.
"insert", "erase", and the various substring functions still aren't in. Soon.
We're approaching convergence on this.
John Nagle
Animats
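The behaviour described in this message can be sketched as follows. This is an illustration only, not the code at the URL above: the class is reduced to a single size parameter, and the zero-filled buffer and the exception type are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical sketch; simplified to one template parameter.
template <std::size_t N>
class fixed_string {
public:
    explicit fixed_string(const char* s) {
        std::memset(buf_, 0, sizeof buf_);  // every unused slot reads as null
        for (std::size_t i = 0; i < N && s[i] != '\0'; ++i) buf_[i] = s[i];
    }

    // Indices 0..N are accepted; index N is the terminator slot.
    // Returning char& means a caller can also write to that slot.
    char& operator[](std::size_t i) {
        if (i > N) throw std::out_of_range("fixed_string subscript");
        return buf_[i];
    }

    // Restore the final terminator before handing the buffer to C code.
    const char* c_str() {
        buf_[N] = '\0';
        return buf_;
    }

private:
    char buf_[N + 1];
};

inline void trailing_null_demo() {
    fixed_string<4> s("abcd");
    std::size_t i;
    for (i = 0; s[i]; i++) {}  // the classic idiom; stops at s[4]
    assert(i == 4);
    s[4] = 'X';                // overstores the trailing null
    assert(std::strcmp(s.c_str(), "abcd") == 0);  // c_str() put it back
}
```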

From: John Nagle <nagle@animats.com>
I've done some cleanup on my version of fixed_string at
This now conforms to Dunn's decision on length, i.e.
"fixed_string<char,4> s" now has space for four characters plus a trailing null.
Good.
I decided that "operator[]" had to let users reach the trailing null without a subscript error, so that the classic C idiom of a loop stopped by the trailing null
for (i=0; s[i]; i++)
Good idea.
would still work. Because "operator[]" can't tell a read from a write, this offers the possibility of overstoring the
Sure it can: just return a proxy. Assignment through the proxy is a write and can be checked against exceeding N. Conversion of the proxy to the character type is a read and can permit accessing element N + 1.
trailing null. But c_str() will always reset the trailing null if necessary.
Will you still need to reset the trailing null if using the proxy?
The functions of fixed_string are now all accessible via fixed_string_base, as virtual functions, which was an omission in the previous version.
Good.
--
Rob Stewart
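A rough sketch of the proxy idea suggested in this reply: operator[] returns a small object, reading converts it to char, and writing goes through an assignment that can be refused for the terminator slot. The names char_proxy and writable_, and the choice to throw on a refused write, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical proxy returned by operator[].
class char_proxy {
public:
    char_proxy(char* slot, bool writable) : slot_(slot), writable_(writable) {}

    // Read: conversion to the character type.
    operator char() const { return *slot_; }

    // Write: assignment through the proxy, checked.
    char_proxy& operator=(char c) {
        if (!writable_) throw std::out_of_range("write to terminator slot");
        *slot_ = c;
        return *this;
    }
    // s[i] = s[j] should copy the character, not rebind the proxy.
    char_proxy& operator=(const char_proxy& other) {
        return *this = static_cast<char>(other);
    }

private:
    char* slot_;
    bool writable_;
};

template <std::size_t N>
class fixed_string {
public:
    explicit fixed_string(const char* s) {
        std::memset(buf_, 0, sizeof buf_);
        for (std::size_t i = 0; i < N && s[i] != '\0'; ++i) buf_[i] = s[i];
    }
    // Index N (the terminator) may be read but not written.
    char_proxy operator[](std::size_t i) {
        if (i > N) throw std::out_of_range("fixed_string subscript");
        return char_proxy(&buf_[i], i < N);
    }
    const char* c_str() const { return buf_; }  // no reset needed here

private:
    char buf_[N + 1];
};

inline void proxy_demo() {
    fixed_string<4> s("abcd");
    std::size_t i;
    for (i = 0; s[i]; i++) {}  // reads, including the terminator at s[4]
    assert(i == 4);
    s[0] = 'A';                // ordinary write
    bool rejected = false;
    try { s[4] = 'X'; } catch (const std::out_of_range&) { rejected = true; }
    assert(rejected);          // the terminator cannot be overstored
    assert(std::strcmp(s.c_str(), "Abcd") == 0);
}
```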

Rob Stewart wrote:
From: John Nagle <nagle@animats.com>
I've done some cleanup on my version of fixed_string at
This now conforms to Dunn's decision on length, i.e.
"fixed_string<char,4> s" now has space for four characters plus a trailing null.
Good.
I decided that "operator[]" had to let users reach the trailing null without a subscript error, so that the classic C idiom of a loop stopped by the trailing null
for (i=0; s[i]; i++)
Good idea.
would still work. Because "operator[]" can't tell a read from a write, this offers the possibility of overstoring the
Sure it can: just return a proxy. Assignment through the proxy is a write and can be checked against exceeding N. Conversion of the proxy to the character type is a read and can permit accessing element N + 1.
Returning a proxy object adds complexity to a very low level operation, which may not be a good idea. I'm struggling to keep this rather simple object as simple as possible.
There's also the problem that the proxy object would have to be on the stack. Consider
char_string<16> s("0123456789ABCDEF");
char& hexchar(int n) { return(s[n]); }
Either we pass back a proxy object by value, which is expensive, or we pass a reference to a local object, which is wrong. Or we just pass a char, which is what people expect. I don't want to get too clever here.
Will you still need to reset the trailing null if using the proxy?
Some trailing null maintenance is necessary, because you can mix C++ and C operations. For example:
char_string<72> s = "Hello";
s += '.'; // add period
printf("%s\n",s.c_str());
Currently, every use of "operator[]" invalidates the length.
John Nagle
Team Overbot
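The hexchar() concern above can be made concrete with a small sketch. It assumes a hypothetical char_proxy holding one pointer, and shows that such a proxy converts to char but cannot be returned as char&, so a function like hexchar() would have to return the proxy by value. Whether that cost is acceptable is the judgment call under discussion.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical proxy: a pointer to one character slot.
class char_proxy {
public:
    explicit char_proxy(char* slot) : slot_(slot) {}
    operator char() const { return *slot_; }                     // read
    char_proxy& operator=(char c) { *slot_ = c; return *this; }  // write
    char_proxy& operator=(const char_proxy& other) {
        return *this = static_cast<char>(other);  // copy the character
    }

private:
    char* slot_;
};

// Reading through the proxy works; binding it to char& does not.
static_assert(std::is_convertible<char_proxy, char>::value,
              "a proxy can be read as a char");
static_assert(!std::is_convertible<char_proxy, char&>::value,
              "a proxy cannot be returned as char&");

static char hex_digits[] = "0123456789ABCDEF";

// hexchar() from the message, rewritten to return the proxy by value.
inline char_proxy hexchar(int n) { return char_proxy(&hex_digits[n]); }

inline void hexchar_demo() {
    hexchar(10) = 'a';     // write through the returned proxy
    char c = hexchar(10);  // read through the returned proxy
    assert(c == 'a');
    assert(hex_digits[10] == 'a');
}
```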

From: John Nagle <nagle@animats.com>
Rob Stewart wrote:
From: John Nagle <nagle@animats.com>
would still work. Because "operator[]" can't tell a read from a write, this offers the possibility of overstoring the
Sure it can: just return a proxy. Assignment through the proxy is a write and can be checked against exceeding N. Conversion of the proxy to the character type is a read and can permit accessing element N + 1.
Returning a proxy object adds complexity to a very low level operation, which may not be a good idea. I'm struggling to keep this rather simple object as simple as possible.
The complexity is in the implementation, not the use. There also doesn't have to be much, if any, overhead.
There's also the problem that the proxy object would have to be on the stack. Consider
char_string<16> s("0123456789ABCDEF");
char& hexchar(int n) { return(s[n]); }
Either we pass back a proxy object by value, which is expensive, or we pass a reference to a local object, which is wrong. Or we just pass a char, which is what people expect. I don't want to get too clever here.
Do we really want to allow for such things? I'm not sure a function like hexchar() is a good idea in any application. That's not to say that it doesn't occur, but should we preclude efficiency or safety to permit such code?
Will you still need to reset the trailing null if using the proxy?
Some trailing null maintenance is necessary, because you can mix C++ and C operations. For example:
char_string<72> s = "Hello";
s += '.'; // add period
printf("%s\n",s.c_str());
Currently, every use of "operator[]" invalidates the length.
I'm apparently missing something. In the above example, construction will ensure that no more than 72 characters are copied to the internal buffer and that there is a trailing null. In the += operator, you'll check to see whether there's room for another character (within the allotted 72), and will add the character plus null. Finally, calling c_str() returns const access to the internal buffer.
Calling operator [] is range checked, so you can't exceed 72 or 73. With the proxy approach, you can ensure that no write is permitted if the index is 73, so there is no need to reset the null.
In summary, any mutating operation, except operator [], writes the null, so a proxy returning operator [] only needs to prevent writes to character N + 1.
--
Rob Stewart
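The invariant summarised in this reply (every mutating operation writes its own terminator, so c_str() can stay const) could be sketched like this. The class is a hypothetical stand-in for char_string, and dropping a character when the string is full is an assumption; the thread does not say what should happen in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the char_string used in the example.
template <std::size_t N>
class char_string {
public:
    // Implicit on purpose, so that `char_string<72> s = "Hello";` compiles.
    char_string(const char* s) : len_(0) {
        while (len_ < N && s[len_] != '\0') {
            buf_[len_] = s[len_];
            ++len_;
        }
        buf_[len_] = '\0';  // construction leaves a terminator in place
    }

    char_string& operator+=(char c) {
        if (len_ < N) {     // room within the allotted N characters?
            buf_[len_++] = c;
            buf_[len_] = '\0';  // the append writes its own terminator
        }
        return *this;
    }

    // Const access only; nothing needs resetting here.
    const char* c_str() const { return buf_; }
    std::size_t length() const { return len_; }

private:
    char buf_[N + 1];
    std::size_t len_;
};

inline void append_demo() {
    char_string<72> s = "Hello";
    s += '.';  // add period
    assert(std::strcmp(s.c_str(), "Hello.") == 0);
    assert(s.length() == 6);

    char_string<2> full = "ab";
    full += 'c';  // no room: dropped in this sketch
    assert(std::strcmp(full.c_str(), "ab") == 0);
}
```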
participants (3): John Nagle, Reece Dunn, Rob Stewart