[Review] Fixed Strings review period extended

Hartmut Kaiser

29 Jan 2006 29 Jan '06

11:30 p.m.

Hi all, normally, the review period of the Fixed Strings library written by Reece Dunn ended today. All we got so far are 3 reviews, which is too less to make a proper decision. I really would like to encourage you to submit a review for this library to get a representative result. For this reason I'ld like to extend the review period for another week until February 5th to allow for more reviews. I'm convinced, that a Fixed Strings library would make a good addition to Boost and in the end it should be useful to a lot more of people than to the 3 of us who send in a review. Regards Hartmut Review Manager

Show replies by date

Andy Little

30 Jan 30 Jan

2:55 a.m.

"Hartmut Kaiser" wrote

...

Hi all,

normally, the review period of the Fixed Strings library written by Reece Dunn ended today. All we got so far are 3 reviews, which is too less to make a proper decision. I really would like to encourage you to submit a review for this library to get a representative result. For this reason I'ld like to extend the review period for another week until February 5th to allow for more reviews. I'm convinced, that a Fixed Strings library would make a good addition to Boost and in the end it should be useful to a lot more of people than to the 3 of us who send in a review.

A Fixed Strings library but this one? . What interests me is that the fixed_string library is way ahead as the top download in the Vault download list, but few reviewers have resulted. I feel the mystery can be explained by the reviews so far. looking through them we have all said pretty much the same thing. Surprise that there is no overflow policy. Concern at the lack of detail in the documentation and the way the documentation and the class itself are laid out. IOW as currently implemented fixed_string isnt delivering what it seems to promise. I hope Reece takes this in his stride by the way . I would guess that has to be the reason why so many downloads have resulted in so little response. It must be worth analysing that lack of response as a first step in revisiting the design. There must be some of the 421 downloaders of fixed_string that can shed some light on this? regards Andy Little

Hartmut Kaiser

3:18 a.m.

Andy Little wrote:

...

...
normally, the review period of the Fixed Strings library written by Reece Dunn ended today. All we got so far are 3 reviews, which is too less to make a proper decision. I really would like to encourage you to submit a review for this library to get a representative result. For this reason I'ld like to extend the review period for another week until February 5th to allow for more reviews. I'm convinced, that a Fixed Strings library would make a good addition to Boost and in the end it should be useful to a lot more of people than to the 3 of us who send in a review.

A Fixed Strings library but this one? .

If you read my mail carefully - I never said 'this one'.

...

What interests me is that the fixed_string library is way ahead as the top download in the Vault download list, but few reviewers have resulted. I feel the mystery can be explained by the reviews so far. looking through them we have all said pretty much the same thing. Surprise that there is no overflow policy. Concern at the lack of detail in the documentation and the way the documentation and the class itself are laid out. IOW as currently implemented fixed_string isnt delivering what it seems to promise. I hope Reece takes this in his stride by the way . I would guess that has to be the reason why so many downloads have resulted in so little response. It must be worth analysing that lack of response as a first step in revisiting the design.

As you might have expected, I read through the reviews _very_ carefully. There are two reasons, why I proposed to extend this review. 1) The library was downloaded by more than 400 interested people and I don't expect the 3 reviews to represent all of them. 2) Even if the currently reviewed library won't be accepted to Boost mainly for the reasons stated by the three reviews, I think we need a Fixed Strings library in Boost and the more dicussion we get the better for a future library. Regards Hartmut

John Maddock

11:03 a.m.

...

As you might have expected, I read through the reviews _very_ carefully.

There are two reasons, why I proposed to extend this review. 1) The library was downloaded by more than 400 interested people and I don't expect the 3 reviews to represent all of them. 2) Even if the currently reviewed library won't be accepted to Boost mainly for the reasons stated by the three reviews, I think we need a Fixed Strings library in Boost and the more dicussion we get the better for a future library.

As one as the silent downloaders, here's my non-review: 1) The introduction and rationale to the docs are very encouraging, they lead me to think that there's a good little library here, but... 2) Once you get past the introduction things go downhill (sorry Reece). Really the docs have all the problems that lead me to dislike Doxygen docs in general: too much detail in all the places you don't need it (implementation details) and too little detail where required: for example the fixed_string class has next to no docs - just a list of constructors and assign member functions. Where are all the other basic_string interfaces? The introduction seems to suggets that they should be there, but there's nothing in the docs. Could be that they're present in a base class, but as a potential user *I don't want to know that*, I just want "give me the interface, and give it to me fast". Doc's asside let me throw in a couple of design questions, these are all open issues BTW and just me thinking out loud really: 1) Should the fixed_string be an aggregate? Currently we have Boost.Array that is an aggregate, but this design is not? If it were fixed_strings could be statically initialised: fixed_string<char, 10> s = { "abc" }; Potentially this leads to problems with the string length not being initialised (or set wrongly), but I believe there may be solutions to that (use the value zero as a "not set" value, and lazy initialise the length on demand). 2) Do we need all the these policies? There seems to be a lot of detail about various policies (formatting for example) that I would have expected to "just work", while the one policy that most people seem to want (overflow control) is missing). Those were this issues that stood out to me, but with apologies to Reece, I should stress that I haven't been able to give this the time it deserves. Regards, John.

Pavel Antokolsky aka Zigmar

7:36 a.m.

On 1/30/06, Andy Little <andy@servocomm.freeserve.co.uk> wrote:

...

"Hartmut Kaiser" wrote

...
Hi all,

normally, the review period of the Fixed Strings library written by Reece Dunn ended today. All we got so far are 3 reviews, which is too less to make a proper decision. I really would like to encourage you to submit a review for this library to get a representative result. For this reason I'ld like to extend the review period for another week until February 5th to allow for more reviews. I'm convinced, that a Fixed Strings library would make a good addition to Boost and in the end it should be useful to a lot more of people than to the 3 of us who send in a review.

A Fixed Strings library but this one? . What interests me is that the fixed_string library is way ahead as the top download in the Vault download list, but few reviewers have resulted. I feel the mystery can be explained by the reviews so far. looking through them we have all said pretty much the same thing.

For me personally, the reason I didn't submit any review, that I'm not completely understand the purpose of the library. The documentation (as few mentioned) lacks motivation and general description. So how can I review the library, if I can't understand what the library is trying to solve and how. I can't say for others, but for it is the reason. I suggest that even if documentation can't be updated during formal review, that the author will give a little more extended explanation here, at mail list. -- Best regards, Zigmar

Reece Dunn

9:02 a.m.

Pavel Antokolsky aka Zigmar wrote:

...

For me personally, the reason I didn't submit any review, that I'm not completely understand the purpose of the library. The documentation (as few mentioned) lacks motivation and general description. So how can I review the library, if I can't understand what the library is trying to solve and how. I can't say for others, but for it is the reason. I suggest that even if documentation can't be updated during formal review, that the author will give a little more extended explanation here, at mail list.

The main aim of the library was presented in the "The Problem"/"The Solution" sections (the first ones) of the documentation. Normally, in C (or even some C++ code), you have constructs that look like this: char buffer[ 15 ]; sprintf( buffer, "Some %s text", "verly long" ); The problem is that the above would cause a buffer overrun, which is the most common cause of denial of service attacks and other security holes in major applications. The variant that allows you to specify the size of the string buffer is better, but not prefect. Consider the following: wchar_t buffer[ 5 ]; wcsncpy( buffer, sizeof(buffer), L"12345678" ); At first glance, this code looks safe, but will also cause a buffer overrun. The fixed_string class is designed to solve this problem. The above examples would be: fixed_string< 15, char > buffer; sprintf( buffer, "Some %s text", "verly long" ); and: fixed_string< 5, wchar_t > buffer; wcscpy( buffer, L"12345678" ); If the introduction does not make this clear, I have written the documentation wrong. *This* is why feedback is useful - to know what people don't understand about the documentation, so I can address those issues for the next review. Otherwise, the documentation will most likely still suffer from the same problems. Also, the C and C++ string implementations do not prevent you passing in null strings. Thus, in the msvc 7.1 implementation, the following will cause a run-time crash: std::string foo( 0 ); std::ostringstream ss; ss << "Crash! Boom! Bang!" << static_cast< const char * >( 0 ) << std::endl; - Reece

David Abrahams

9:51 a.m.

"Reece Dunn" <msclrhd@hotmail.com> writes:

...

Pavel Antokolsky aka Zigmar wrote:

...
For me personally, the reason I didn't submit any review, that I'm not completely understand the purpose of the library. The documentation (as few mentioned) lacks motivation and general description. So how can I review the library, if I can't understand what the library is trying to solve and how. I can't say for others, but for it is the reason. I suggest that even if documentation can't be updated during formal review, that the author will give a little more extended explanation here, at mail list.

The main aim of the library was presented in the "The Problem"/"The Solution" sections (the first ones) of the documentation.

Normally, in C (or even some C++ code), you have constructs that look like this:

char buffer[ 15 ]; sprintf( buffer, "Some %s text", "verly long" );

The problem is that the above would cause a buffer overrun, which is the most common cause of denial of service attacks and other security holes in major applications. The variant that allows you to specify the size of the string buffer is better, but not prefect. Consider the following:

wchar_t buffer[ 5 ]; wcsncpy( buffer, sizeof(buffer), L"12345678" );

At first glance, this code looks safe, but will also cause a buffer overrun.

The fixed_string class is designed to solve this problem. The above examples would be:

fixed_string< 15, char > buffer; sprintf( buffer, "Some %s text", "verly long" );

and:

fixed_string< 5, wchar_t > buffer; wcscpy( buffer, L"12345678" );

The reason I didn't submit a review was that I didn't have time. But more generally, I don't think fixed strings are the right tool for this job 99% of the time. A variable-sized string with small string optimization will serve nicely, and if used properly, will virtually eliminate the problem of buffer overruns. I don't think we should be encouraging people to patch their way around unsafe code in a way that's still unsafe. It's still unsafe because the code's original author expects the code to work, not for it to throw an exception or truncate the input or whatever the proposed library does. -- Dave Abrahams Boost Consulting www.boost-consulting.com

Pavel Antokolsky aka Zigmar

9:52 a.m.

Hi, Reece On 1/30/06, Reece Dunn <msclrhd@hotmail.com> wrote:

...

...
For me personally, the reason I didn't submit any review, that I'm not completely understand the purpose of the library. The documentation (as few mentioned) lacks motivation and general description. So how can I review the library, if I can't understand what the library is trying to solve and how. I can't say for others, but for it is the reason. I suggest that even if documentation can't be updated during formal review, that the author will give a little more extended explanation here, at mail list. [...] If the introduction does not make this clear, I have written the documentation wrong. *This* is why feedback is useful - to know what people don't understand about the documentation, so I can address those issues for

Pavel Antokolsky aka Zigmar wrote: the next review. Otherwise, the documentation will most likely still suffer from the same problems. Unfortunatly, it is not clear. Now, I'm beginning to see the point. Originally, after the reading introcution, especially first sentence ( "There is a lot of C (and even C++) code, that uses character buffers and the C-style string functions for various reasons: updating legacy C code, wanting to keep executable size down without linking to a large dynamic library or wanting to tune performance by keeping the strings on the stack." ) I've got the idea, that the main purpose it to create _fast_ stack-based strings with C++ interface (something like boost::array), and buffer overrun protection are just nice feature of such strings. As far as I understand now from you description, previews reviews and my look at the code, perfomance is not a main goal at all, and the buffer protection is the target. But then, why string should be "fixed" (and, AFAIU, it is not exactly fixed)? Isn't it just, say, safe_string? I'm not criticizing, just trying to understand the rationale... -- Best regards, Zigmar

Jim Douglas

9:54 a.m.

Reece Dunn wrote:

...

Normally, in C (or even some C++ code), you have constructs that look like this:

char buffer[ 15 ]; sprintf( buffer, "Some %s text", "verly long" );

No self-respecting coding standard would allow you to write the code above. Rather it would insist that at least you wrote: snprintf( buffer, 15, "Some %s text", "verly long" ); Does that not solve the problem of overruns? Jim

John Maddock

11:06 a.m.

...

No self-respecting coding standard would allow you to write the code above. Rather it would insist that at least you wrote:

snprintf( buffer, 15, "Some %s text", "verly long" );

Does that not solve the problem of overruns?

Um, which portable standard defines snprintf ? :-> John.

Markus Schöpflin

11:27 a.m.

John Maddock wrote:

...

...
No self-respecting coding standard would allow you to write the code above. Rather it would insist that at least you wrote:

snprintf( buffer, 15, "Some %s text", "verly long" );

Does that not solve the problem of overruns?

Um, which portable standard defines snprintf ? :->

IEEE Std 1003.1, 2004 Edition Markus

John Maddock

6:03 p.m.

...

IEEE Std 1003.1, 2004 Edition

Doh! And it's in C99 as well. Oh well I'll just shut up now... John.

Reece Dunn

11:07 a.m.

Jim Douglas wrote:

...

...
Normally, in C (or even some C++ code), you have constructs that look

Reece Dunn wrote: like

...
this:

char buffer[ 15 ]; sprintf( buffer, "Some %s text", "verly long" );

No self-respecting coding standard would allow you to write the code above. Rather it would insist that at least you wrote:

snprintf( buffer, 15, "Some %s text", "verly long" );

Does that not solve the problem of overruns?

Coinsider: wsnprintf( buffer, sizeof(buffer), L"Some %s text", L"verly long" ); as the second example demonstrated. You are using the safe version of the string API, but passing in an incorrect size due to an incorrect sizeof() calculation. Where the above should be: wsnprintf( buffer, sizeof(buffer)/sizeof(buffer[0]), L"Some %s text", L"verly long" ); - Reece

Reece Dunn

8:44 a.m.

Andy Little wrote:

...

A Fixed Strings library but this one? . What interests me is that the fixed_string library is way ahead as the top download in the Vault download list, but few reviewers have resulted. I feel the mystery can be explained by the reviews so far. looking through them we have all said pretty much the same thing. Surprise that there is no overflow policy.

I intend to fix this for the next review :).

...

Concern at the lack of detail in the documentation and the way the documentation and the class itself are laid out.

I hope to improve the documentation so that users can follow it better.

...

IOW as currently implemented fixed_string isnt delivering what it seems to promise. I hope Reece takes this in his stride by the way . I would guess that has to be the reason why so many downloads have resulted in so little response. It must be worth analysing that lack of response as a first step in revisiting the design.

I can only fix issues that are being raised. That said, I intend to move the basic_string_impl class out of the detail namespace as this is more useful in other contexts. That class is based on the flex_string class written by Andrei Alexandrescu. However, this class - like the basic_string class - is *way* too large. I have experimented with splitting this class into several smaller ones based on the grouping in the standard documentation (iterators, capacity, etc.) so you can combine them. This way, it will be like the iterator classes. These have had some limited success and I intend on working on this after fixing the issues raised in the review. With this change, I intend to move the fixed_string class into boost/string so that other string classes like const_string can be placed there as well. - Reece

Andy Little

10:26 a.m.

"Reece Dunn" wrote

...

Andy Little wrote:

...
A Fixed Strings library but this one? . What interests me is that the fixed_string library is way ahead as the top download in the Vault download list, but few reviewers have resulted. I feel the mystery can be explained by the reviews so far. looking through them we have all said pretty much the same thing. Surprise that there is no overflow policy.

I intend to fix this for the next review :).

The other surprise for me is that its possible to modify the capacity of a fixed_string. That means the class name fixed_string is misleading. Unless fixed means 'not broken' string implying that std::string is flawed. Its that sense in which it doesnt promise what it delivers. I would expect a fixed_string to have a fixed length. This fits best with the rationale of fixed_string to be a char array replacement. I suspect that is what potential users expected, but nothing like what they got so far. The solution now is to give users what they expected in the first place.

...

...
Concern at the lack of detail in the documentation and the way the documentation and the class itself are laid out.

I hope to improve the documentation so that users can follow it better.

Ok but there is also a lack of information. Class function signatures are documented but there is no information on what they do. As a potential user I get no sense that you as the author are keen about the class and are interested in trying to help me to get to grips with it. It looks like the documentation was sketched in and then just never worked on after that. The good thing though is that you have a potential user-base of 400 to get this right for next time. It would be worth going through the documentation of other (boost) libraries that you like to see how the documentation is laid out. There are also some examples of poor documentation in boost, so use those as examples of what to avoid.

...

I can only fix issues that are being raised.

Whatever... You need to figure why no issues are being raised. Answer I think is that, because there is little information provided , then it is difficult to raise any points about it by telepathy so to speak.

...

That said, I intend to move the basic_string_impl class out of the detail namespace as this is more useful in other contexts. That class is based on the flex_string class written by Andrei Alexandrescu. However, this class - like the basic_string class - is *way* too large.

I have experimented with splitting this class into several smaller ones based on the grouping in the standard documentation (iterators, capacity, etc.) so you can combine them. This way, it will be like the iterator classes. These have had some limited success and I intend on working on this after fixing the issues raised in the review.

With this change, I intend to move the fixed_string class into boost/string so that other string classes like const_string can be placed there as well.

OK. Bear in mind that you have 400 potential users who wanted a fixed_string. Its probably worth trying to find out what they thought a 'fixed_string' is and how it differs from what you provided. OTOH It sounds now like you have opted to design something completely different again, which seems a bit of a shame as you will lose many of the potential fixed_string users. regards Andy Little

Reece Dunn

11:56 a.m.

Andy Little wrote:

...

The other surprise for me is that its possible to modify the capacity of a fixed_string.

Are you sure? You can write algorithms that don't rely on a specific capacity string. Thus, instead of: template< int n > int strlen( const fixed_string< n, char >& str ) { return str.length(); } you can write: int strlen( const fixed_string_base< char >& str ) { return str.length(); }

...

That means the class name fixed_string is misleading. Unless fixed means 'not broken' string implying that std::string is flawed.

The std::string interface has several flaws, such as it having too many methods (see Herb Sutter's comments on this topic) and string types that have different CharTraits not being compatible with each other. However, fixed_string does not attempt to fix these. It is short for "fixed capacity string".

...

Its that sense in which it doesnt promise what it delivers. I would expect a fixed_string to have a fixed length.

It has a fixed capacity, but variable length.

...

This fits best with the rationale of fixed_string to be a char array replacement. I suspect that is what potential users expected, but nothing like what they got so far. The solution now is to give users what they expected in the first place.

The above strlen example (and the rationale behind it) was that some users (during the initial development) wanted to be able to write operations that used fixed-capacity string such that they could redistribute the binaries only. Thus the use of the fixed_string_base. An earlier reviewer has shown me how to split these so you get the performance from using fixed_string and the ability to operate on any fixed_string< n > object without templatizing the method on n.

...

...
...
Concern at the lack of detail in the documentation and the way the documentation and the class itself are laid out.

I hope to improve the documentation so that users can follow it better.

Ok but there is also a lack of information. Class function signatures are documented but there is no information on what they do. As a potential user I get no sense that you as the author are keen about the class and are interested in trying to help me to get to grips with it. It looks like the documentation was sketched in and then just never worked on after that. The good thing though is that you have a potential user-base of 400 to get this right for next time.

Noted.

...

It would be worth going through the documentation of other (boost) libraries that you like to see how the documentation is laid out. There are also some examples of poor documentation in boost, so use those as examples of what to avoid.

The Spirit docs are very good :).

...

...
I can only fix issues that are being raised.

Whatever... You need to figure why no issues are being raised. Answer I think is that, because there is little information provided , then it is difficult to raise any points about it by telepathy so to speak.

You are thinking of a circle ;). The reviews that there have been have raised most of these points, so I will address these for the next review.

...

...
That said, I intend to move the basic_string_impl class out of the detail namespace as this is more useful in other contexts. That class is based on the flex_string class written by Andrei Alexandrescu. However, this class - like the basic_string class - is *way* too large.

I have experimented with splitting this class into several smaller ones based on the grouping in the standard documentation (iterators, capacity, etc.) so you can combine them. This way, it will be like the iterator classes. These have had some limited success and I intend on working on this after fixing the issues raised in the review.

With this change, I intend to move the fixed_string class into boost/string so that other string classes like const_string can be placed there as well.

OK. Bear in mind that you have 400 potential users who wanted a fixed_string. Its probably worth trying to find out what they thought a 'fixed_string' is and how it differs from what you provided. OTOH It sounds now like you have opted to design something completely different again, which seems a bit of a shame as you will lose many of the potential fixed_string users.

I intend to keep the behaviour of the fixed_string class (modified, according to the review feedback). This is what the unit tests are for :). It is just *how* that is implemented that will change. - Reece

Andy Little

12:44 p.m.

"Reece Dunn" <msclrhd@hotmail.com> wrote

...

Andy Little wrote:

...
The other surprise for me is that its possible to modify the capacity of a fixed_string.

Are you sure? You can write algorithms that don't rely on a specific capacity string.

OK I was under the impression you could (modify the capacity). Spell it out for me in the documentation in this case.

...

Thus, instead of:

template< int n > int strlen( const fixed_string< n, char >& str ) { return str.length(); }

you can write:

int strlen( const fixed_string_base< char >& str ) { return str.length(); }

Its possible to do this for fixed_string directly too. #include <boost/fixed_string/fixed_string.hpp> #include <boost/mpl/bool.hpp> #include <boost/utility/enable_if.hpp> #include <string> namespace boost{ template <typename T> struct is_fixed_string : mpl::false_{}; template <int N, typename C, typename P1, typename P2> struct is_fixed_string<fixed_string<N,C,P1,P2> > : mpl::true_{}; template <typename T> typename boost::enable_if< is_fixed_string<T>, int >::type strlen( const T& str ) { return str.length(); } } int main() { // check doesnt interfere here const char arr[]= "hello"; std::cout << strlen(arr) <<'\n'; boost::fixed_string<20> str = "hello"; std::cout << strlen(str); int n =9; // std::cout << strlen(n); //error because not a fixed_string or char array }

...

...
That means the class name fixed_string is misleading. Unless fixed means 'not broken' string implying that std::string is flawed.

The std::string interface has several flaws, such as it having too many methods (see Herb Sutter's comments on this topic) and string types that have different CharTraits not being compatible with each other. However, fixed_string does not attempt to fix these. It is short for "fixed capacity string".

...
Its that sense in which it doesnt promise what it delivers. I would expect a fixed_string to have a fixed length.

It has a fixed capacity, but variable length.

OK. [...]

...

strlen example (and the rationale behind it) was that some users (during the initial development) wanted to be able to write operations that used fixed-capacity string such that they could redistribute the binaries only. Thus the use of the fixed_string_base. An earlier reviewer has shown me how to split these so you get the performance from using fixed_string and the ability to operate on any fixed_string< n > object without templatizing the method on n.

OK . [...]

...

The Spirit docs are very good :).

[...] OK thats a good model . fixed_string should be easier to describe than Spirit too.

...

I intend to keep the behaviour of the fixed_string class (modified, according to the review feedback). This is what the unit tests are for :). It is just *how* that is implemented that will change.

What about the (lack of) overrun policy? regards Andy Little

Reece Dunn

5:44 p.m.

Andy Little wrote:

...

"Reece Dunn" <msclrhd@hotmail.com> wrote

...
Are you sure? You can write algorithms that don't rely on a specific capacity string.

OK I was under the impression you could (modify the capacity). Spell it out for me in the documentation in this case.

Noted.

...

...
you can write:

int strlen( const fixed_string_base< char >& str ) { return str.length(); }

Its possible to do this for fixed_string directly too.

template <typename T> typename boost::enable_if< is_fixed_string<T>, int >::type strlen( const T& str ) { return str.length(); }

But the solution is still using templates. You don't need to have a template to use std::[w]string. Thus, if you wanted to move: __declspec(dllexport) void some_function( const std::string & str ); to use fixed_string with the implementation you mention, you would need to expose the implementation of some_function to your customers and thus cannot use it as part of a library (.lib) or DLL.

...

...
The Spirit docs are very good :).

OK thats a good model . fixed_string should be easier to describe than Spirit too.

...

...
I intend to keep the behaviour of the fixed_string class (modified, according to the review feedback). This is what the unit tests are for :). It is just *how* that is implemented that will change.

What about the (lack of) overrun policy?

Coming soon to a fixed string library near you :). The current unit tests will use the default overrun policy, which will be to truncate the string if overrun occurs. - Reece

Andy Little

8 p.m.

"Reece Dunn" wrote

...

But the solution is still using templates. You don't need to have a template to use std::[w]string. Thus, if you wanted to move:

__declspec(dllexport) void some_function( const std::string & str );

to use fixed_string with the implementation you mention, you would need to expose the implementation of some_function to your customers and thus cannot use it as part of a library (.lib) or DLL.

The limitations of templates are widely known. The advantages are too. If the advantages dont appear in this case then there is little point in using templates. If it needs to be implementable as part of a DLL then it obviously cant practically have compile time fixed capacity, but that is the trade off you have to make ( OTOH maybe you could instantiate a few useful sizes, so maybe its not impossible). It strikes me that the design radically changed midstream and you now have a strange hybrid. This is the source of a lot of confusion with the result that potential users throw up their hands and move on. Are you designing a string with a compile time fixed size or are you dessigning a string with a construction time fixed capacity, if so then why not a runtime variable capacity? What requirements must the class satisfy? What are the limitations imposed by these requirements? What then are the advantages over std::string etc, etc. regards Andy Little

Andy Little

5:53 p.m.

"Andy Little" wrote

...

"Reece Dunn" <msclrhd@hotmail.com> wrote

[...]

...

...
you can write:

int strlen( const fixed_string_base< char >& str ) { return str.length(); }

Its possible to do this for fixed_string directly too.

[cut example coide] Sorry.. that example was completely unnecessary. I dont see whats wrong with just writing template< int n > int strlen( const fixed_string< n, char >& str ) { return str.length(); } Whats wrong with it? regards Andy Little

Reece Dunn

8:11 p.m.

Andy Little wrote:

...

Sorry.. that example was completely unnecessary.

I dont see whats wrong with just writing

template< int n > int strlen( const fixed_string< n, char >& str ) { return str.length(); }

Whats wrong with it?

Nothing... in that case. Now, what happened if you have: template< int n > void make_ip_address( unsigned long ip, fixed_string< n, char > & ) but you wanted to put this function in a library or DLL either to hide a proprietry implementation or because the implementation of this is too long to sensibly be a template (or even both). I was just giving a simple example of functionality that some people want. I wasn't saying that was the best way to implement strlen. - Reece

Andy Little

9:50 p.m.

"Reece Dunn" wrote Andy Little wrote

...

...
Whats wrong with it?

Nothing... in that case. Now, what happened if you have:

template< int n > void make_ip_address( unsigned long ip, fixed_string< n, char > & )

but you wanted to put this function in a library or DLL either to hide a proprietry implementation or because the implementation of this is too long to sensibly be a template (or even both).

Sure dont make it a template . OTOH BTW Why would you want to choose fixed_string for this application?. String manipulation time will be dwarfed by look up time anyway.

...

I was just giving a simple example of functionality that some people want. I wasn't saying that was the best way to implement strlen.

I think you might find the non template version starts to look quite like a (maybe improved) version of std::string. Fact is once the capacity is changeable theres no reason not to provide the whole interface. OTOH maybe you are trying to make an alternative to std::string, if so then you need to show demostrable benefits, but then I hope it wouldnt be called fixed_string ! regards Andy Little

Paul A Bristow

3:17 p.m.

| -----Original Message----- | From: boost-bounces@lists.boost.org | [mailto:boost-bounces@lists.boost.org] On Behalf Of Hartmut Kaiser | Sent: 29 January 2006 23:31 | To: boost@lists.boost.org | Subject: [boost] [Review] Fixed Strings review period extended As another silent downloader, I just want to agree that we really do need a fixed_string, (a less than ideal name for what I would call a bounded_capacity string). This one looks reasonable but I was unclear that it was the right one. I suggest that the answer is to reject for now, but ask Reece to do a LOT more work on the documentation. It left me confused, and although this is not an uncommon state ;-) it would appear that I am not alone. Doxygen is NOT helping at all - it is deluding the author in thinking the job is done automatically.. What we need here is a full discussion of the rationale - pros and cons - for why this design is the least worst, at least. (In the end, the language is at fault - it doesn't have built-in array-bound checked arrays where the compiler at least knows the fixed maximum capacity. While it might have been better if we hadn't started from there, we did, and we are now seeking as good a bolt-on fix as possible.) Paul -- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB Phone and SMS text +44 1539 561830, Mobile and SMS text +44 7714 330204 mailto: pbristow@hetp.u-net.com http://www.hetp.u-net.com/index.html http://www.hetp.u-net.com/Paul%20A%20Bristow%20info.html

David Abrahams

3:40 p.m.

"Paul A Bristow" <pbristow@hetp.u-net.com> writes:

...

| -----Original Message----- | From: boost-bounces@lists.boost.org | [mailto:boost-bounces@lists.boost.org] On Behalf Of Hartmut Kaiser | Sent: 29 January 2006 23:31 | To: boost@lists.boost.org | Subject: [boost] [Review] Fixed Strings review period extended

As another silent downloader, I just want to agree that we really do need a fixed_string,

Why? -- Dave Abrahams Boost Consulting www.boost-consulting.com

Gennadiy Rozental

5:06 p.m.

"David Abrahams" <dave@boost-consulting.com> wrote in message news:87slr5bu6z.fsf@boost-consulting.com...

...

"Paul A Bristow" <pbristow@hetp.u-net.com> writes:

...
| -----Original Message----- | From: boost-bounces@lists.boost.org | [mailto:boost-bounces@lists.boost.org] On Behalf Of Hartmut Kaiser | Sent: 29 January 2006 23:31 | To: boost@lists.boost.org | Subject: [boost] [Review] Fixed Strings review period extended

As another silent downloader, I just want to agree that we really do need a fixed_string,

Why?

Usage of big stack buffer for string formatting was (is?) usual practice in C programming. Especially for error message formatting. Nowadays need for stack_string (which IMO would be a better name) is unclear. In practice it's just an attempt on performance improvements. I guess it could be useful in performance critical areas where need for dynamic allocation could hurt. An alternative is having std::string somewhere cashed. But we a) may not place for it b) still do not want to depend on performance on dynamic clear/resize methods. Gennadiy

David Abrahams

10:15 p.m.

"Gennadiy Rozental" <gennadiy.rozental@thomson.com> writes:

...

"David Abrahams" <dave@boost-consulting.com> wrote in message news:87slr5bu6z.fsf@boost-consulting.com...

...
"Paul A Bristow" <pbristow@hetp.u-net.com> writes:

...
| -----Original Message----- | From: boost-bounces@lists.boost.org | [mailto:boost-bounces@lists.boost.org] On Behalf Of Hartmut Kaiser | Sent: 29 January 2006 23:31 | To: boost@lists.boost.org | Subject: [boost] [Review] Fixed Strings review period extended

As another silent downloader, I just want to agree that we really do need a fixed_string,

Why?

Usage of big stack buffer for string formatting was (is?) usual practice in C programming.

Sure. That doesn't make it right. It's the cause of innumerable bugs, and not just overrun errors. The bjam sources (written in 'C') used to do this, and it made bjam completely non-scalable. I ripped all the fixed buffers out and replaced them with a proper variable-sized string "class" (written in 'C') that uses the small string optimization, and it works great. If I had used some kind of fixed-sized buffer there would have remained all kinds of edge cases where the code stopped working. If you're going to go through code and replace fixed-sized char arrays on the stack with something more reliable, it makes no sense at all to go halfway and use something that is still going to give you problems when your input data is larger than you might have anticipated (or whatever). Just use a class that can grow as necessary.

...

Especially for error message formatting. Nowadays need for stack_string (which IMO would be a better name) is unclear. In practice it's just an attempt on performance improvements.

Right. Start with the SSO on a general-purpose string class. If you find yourself needing larger statically-allocated buffers for efficiency reasons, add a template parameter that allows you to specify the static buffer length. How many applications do you think would find that arrangement to be still too inefficient?

...

I guess it could be useful in performance critical areas where need for dynamic allocation could hurt.

This is a .01% optimization. Profile first and prove that dynamic allocation hurts. Use a large SSO buffer so you have dynamic allocation as insurance. If, after you've gone through all that care to determine that you really need a fixed-sized character array, you can afford to be careful enough with those few lines of code to make it right.

...

An alternative is having std::string somewhere cashed.

...

But we a) may not place for it b) still do not want to depend on performance on dynamic clear/resize methods.

I don't understand. -- Dave Abrahams Boost Consulting www.boost-consulting.com

Gennadiy Rozental

31 Jan 31 Jan

1:06 a.m.

...

...
An alternative is having std::string somewhere cashed.

??

...
But we a) may not place for it b) still do not want to depend on performance on dynamic clear/resize methods.

I don't understand.

Let's see. Here is frequently called function: .. if( ... ) { // here we need string to format a message } ... if( ... ) { // here we need string to format a message again } stack_string you could just put inside each if branch. No need to care that it gets destroyed. With any string that employ dynamic allocation one might find it wasteful. So usually we are trying to reuse the same variable, right? Now where do we put it? If this is member function we could put in class body, which may or may not be convenient. Or we could make it static. But then clear/resize methods for string should be efficient and not include and memory deallocation. And MT could be an issue. This is the case I was talking about. Gennadiy

Eric Niebler

1:34 a.m.

David Abrahams wrote:

...

If you're going to go through code and replace fixed-sized char arrays on the stack with something more reliable, it makes no sense at all to go halfway and use something that is still going to give you problems when your input data is larger than you might have anticipated (or whatever).

I'm going to play devil's advocate here and disagree for the sake of discussion. You're ignoring failure mode. A string truncation bug is, in all likelyhood, less severe than a buffer overrun. If, for instance, there were a large monopolistic corporation trying to eliminate remote exploits from their ENORMOUS codebase, a switch to something like fixed_string might be significantly cheaper than reengineering their code to work with a growable string class. A different approach to achieve the same ends might be to define overloads of the C string APIs that worked with std::string, so the migration of legacy code can actually eliminate bugs instead of making them less severe. Has this approach been considered? -- Eric Niebler Boost Consulting www.boost-consulting.com

David Abrahams

2:20 a.m.

"Eric Niebler" <eric@boost-consulting.com> writes:

...

David Abrahams wrote:

...
If you're going to go through code and replace fixed-sized char arrays on the stack with something more reliable, it makes no sense at all to go halfway and use something that is still going to give you problems when your input data is larger than you might have anticipated (or whatever).

I'm going to play devil's advocate here and disagree for the sake of discussion. You're ignoring failure mode. A string truncation bug is, in all likelyhood, less severe than a buffer overrun.

Maybe. Probably. So what? If you're going to change the code anyway, why not change it to something that avoids both, reliably?

...

If, for instance, there were a large monopolistic corporation trying to eliminate remote exploits from their ENORMOUS codebase, a switch to something like fixed_string might be significantly cheaper than reengineering their code to work with a growable string class.

Seems to me that if you can write strcpy(fixed_string, some_source_string); you can just as easily write strcpy(growable_string, some_source_string); instead.

...

A different approach to achieve the same ends might be to define overloads of the C string APIs that worked with std::string,

Gee that smells like a familiar idea ;-)

...

so the migration of legacy code can actually eliminate bugs instead of making them less severe. Has this approach been considered?

I think that's my point, Eric :) -- Dave Abrahams Boost Consulting www.boost-consulting.com

Eric Niebler

4:12 a.m.

David Abrahams wrote:

...

"Eric Niebler" <eric@boost-consulting.com> writes:

...
A different approach to achieve the same ends might be to define overloads of the C string APIs that worked with std::string,

Gee that smells like a familiar idea ;-)

...
so the migration of legacy code can actually eliminate bugs instead of making them less severe. Has this approach been considered?

I think that's my point, Eric :)

:-P That's what I get for only skimming the thread -- I didn't see you suggest this approach. Sorry for the noise, and consider this an endorsement of Dave's suggestion. -- Eric Niebler Boost Consulting www.boost-consulting.com

David Abrahams

3:21 p.m.

"Eric Niebler" <eric@boost-consulting.com> writes:

...

David Abrahams wrote:

...
"Eric Niebler" <eric@boost-consulting.com> writes:

...
A different approach to achieve the same ends might be to define overloads of the C string APIs that worked with std::string,

Gee that smells like a familiar idea ;-)

...
so the migration of legacy code can actually eliminate bugs instead of making them less severe. Has this approach been considered?

I think that's my point, Eric :)

:-P That's what I get for only skimming the thread -- I didn't see you suggest this approach. Sorry for the noise, and consider this an endorsement of Dave's suggestion.

To be fair, I didn't suggest that specifically until forced into it by Gennadiy. My fundamental position is that you should probably *not* be using a fixed-sized string for this sort of transition. The details of how you use a variable length string are less important to me. If you want to keep a <cstring>-like interface, that's fine, but ideally all that code should be revisited anyway and you could easily end up with better code by translating it to a more natural and expressive interface. Regards, Dave -- Dave Abrahams Boost Consulting www.boost-consulting.com

Martin

7:49 a.m.

David Abrahams <dave <at> boost-consulting.com> writes:

...

"Paul A Bristow" <pbristow <at> hetp.u-net.com> writes:

...
As another silent downloader, I just want to agree that we really do need a fixed_string,

Why?

I found it useful when working with C APIs. With a fixed_string type I can do: boost::fixed_string<30> str; someoldcapi(str.buffer()); str.find(...) instead of first using a char array and then copy it to a basic_string. Second use for me is to have a basic_string compatible buffer to use in shared memory/IPC. struct commandlineargs { boost::fixed_string<20> arg1; boost::filesystem::basic_path<boost::fixed_string<256> > path1 }; Unfortuntaly the review version of the fixed_string doesn't work with any of the above cases (first case needs a setlength() call). overrun protection isn't important for me.

Reece Dunn

8:59 a.m.

Martin wrote:

...

I found it useful when working with C APIs.

With a fixed_string type I can do:

boost::fixed_string<30> str; someoldcapi(str.buffer()); str.find(...)

instead of first using a char array and then copy it to a basic_string.

Second use for me is to have a basic_string compatible buffer to use in shared memory/IPC.

Don't you mean char[] compatible buffer? Yes... the current version has broken that, but I now know how to rework it so that this should work properly:

...

struct commandlineargs { boost::fixed_string<20> arg1; boost::filesystem::basic_path<boost::fixed_string<256> > path1 };

Unfortuntaly the review version of the fixed_string doesn't work with any of the above cases (first case needs a setlength() call).

In the new version, buffer() will return an object that will recalculate the string size when it is destroyed, so you won't need to call setlength(). I will provide a bufferex() or raw_buffer() that won't update the length, so you could have: fixed_string< 100 > str; some_fn( str.buffer()); some_fn2( str.raw_buffer(), str.length_calculator()); str.setlength( some_fn2( str.raw_buffer())); depending on the prototype of the function.

...

overrun protection isn't important for me.

Ok. - Reece

David Abrahams

3:37 p.m.

Martin <adrianm@touchdown.se> writes:

...

David Abrahams <dave <at> boost-consulting.com> writes:

...
"Paul A Bristow" <pbristow <at> hetp.u-net.com> writes:

...
As another silent downloader, I just want to agree that we really do need a fixed_string,

Why?

I found it useful when working with C APIs.

With a fixed_string type I can do:

boost::fixed_string<30> str; someoldcapi(str.buffer()); str.find(...)

instead of first using a char array and then copy it to a basic_string.

std::string str(30, 0); someoldcapi(&str[0]); str.resize(str.find(0)); str.find(...) If we need a class to help with such a transition, we should design it to make some of the operations more natural. But even that is not probably worth the trouble.

...

Second use for me is to have a basic_string compatible buffer to use in shared memory/IPC.

struct commandlineargs { boost::fixed_string<20> arg1; boost::filesystem::basic_path<boost::fixed_string<256> > path1 };

Isn't that what the allocator argument to basic_string is for?

...

Unfortuntaly the review version of the fixed_string doesn't work with any of the above cases (first case needs a setlength() call).

Speaks volumes. -- Dave Abrahams Boost Consulting www.boost-consulting.com

Ion Gaztañaga

5:30 p.m.

...

...
Second use for me is to have a basic_string compatible buffer to use in shared memory/IPC.

struct commandlineargs { boost::fixed_string<20> arg1; boost::filesystem::basic_path<boost::fixed_string<256> > path1 };

Isn't that what the allocator argument to basic_string is for?

I agree with Dave, I don't see the need of a fixed string for IPC: a fixed_buffer_allocator for basic_string should do the job. You have two possibilities: -> Use an allocator that has the buffer as member. The only drawback is that the string will create a temporary default constructed allocator and will assign it to the basic_string's member allocator. So you have a extra stack when constructing the basic_string. And the allocators have private, per-instance memory, and can't be swapped (like current Howard Hinnant's proposal says), so they are stateful allocators (which I think are not standard). -> Use an allocator that holds the pointer and size of a buffer. This is more flexible but you can't freely pass the string because the lifetime of the buffer must be the same as the lifetime of the basic_string. But sometimes this could be very useful and you can swap the strings. char buffer [100]; c_buffer_allocator a (buffer, 100); basic_string<char, std::char_traits<char>, a> str (a); In both cases when trying to reallocate memory, the allocator would throw an exception, so we have buffer protection, and all basic_string uses. -> Another alternative to work with fixed buffers is to create a buffer formatting iostream. For example, in Boost candidate Shmem you have bufferstream classes: basic_bufferstream<char, std::char_traits<char> > bufstream; bufferstream mybufstream(my_cstring, 100*5); bufstream << int(1) << "text" << std::ends; If you read past-end the buffer, that will trigger eofbit and if you try to right past-end you will have badbit activated. You can activate also exceptions, like with other iostreams. Regards, Ion

Martin Adrian

1 Feb 1 Feb

7:46 a.m.

Ion Gaztañaga <igaztanaga <at> gmail.com> writes:

...

...
Isn't that what the allocator argument to basic_string is for?

I agree with Dave, I don't see the need of a fixed string for IPC: a fixed_buffer_allocator for basic_string should do the job. You have two possibilities:

I also agree that it is a job for the allocator but I have tried and never got it to work properly. The new allocator proposal look promising but we are not there yet.

...

-> Use an allocator that has the buffer as member. The only drawback is that the string will create a temporary default constructed allocator and will assign it to the basic_string's member allocator. So you have a extra stack when constructing the basic_string. And the allocators have private, per-instance memory, and can't be swapped (like current Howard Hinnant's proposal says), so they are stateful allocators (which I think are not standard).

How do you set the size of the buffer. AFAIK the basic_string implementation is free to ask for any size memory as first allocation. e.g. gcc allocates extra space for internal data and also use some algorithm to optimize memory allocation. (If I remember correctly it will always round up to nearest multiple of 128 bytes except for small sizes.)

...

-> Use an allocator that holds the pointer and size of a buffer. This is more flexible but you can't freely pass the string because the lifetime of the buffer must be the same as the lifetime of the basic_string. But sometimes this could be very useful and you can swap the strings.

char buffer [100]; c_buffer_allocator a (buffer, 100); basic_string<char, std::char_traits<char>, a> str (a);

you probably mean: basic_string<char, std::char_traits<char>, c_buffer_allocator> str(a); Now explain how I use it with boost::filesystem or the string_algo library.

Ion Gaztañaga

8:08 p.m.

...

How do you set the size of the buffer. AFAIK the basic_string implementation is free to ask for any size memory as first allocation.

e.g. gcc allocates extra space for internal data and also use some algorithm to optimize memory allocation. (If I remember correctly it will always round up to nearest multiple of 128 bytes except for small sizes.)

Good point. You are right. The allocation strategy will almost never request the exact byte quantity. Regards, Ion

Reece Dunn

31 Jan 31 Jan

9:13 a.m.

Paul A Bristow wrote:

...

As another silent downloader, I just want to agree that we really do need a fixed_string, (a less than ideal name for what I would call a bounded_capacity string). This one looks reasonable but I was unclear that it was the right one.

There are currently some design flaws that I intend to fix for the next review that have been discussed in other posts.

...

I suggest that the answer is to reject for now, but ask Reece to do a LOT more work on the documentation. It left me confused, and although this is not an uncommon state ;-) it would appear that I am not alone.

Hopefully, for the next review, the documentation should be a lot better.

...

Doxygen is NOT helping at all - it is deluding the author in thinking the job is done automatically..

Doxygen is useful for creating a reference. It is not a substitute for a tutorial/overview/how to use guide, a la the Spirit documentation.

...

What we need here is a full discussion of the rationale - pros and cons - for why this design is the least worst, at least.

I will come up with a list and post it later on today.

...

(In the end, the language is at fault - it doesn't have built-in array-bound checked arrays where the compiler at least knows the fixed maximum capacity. While it might have been better if we hadn't started from there, we did, and we are now seeking as good a bolt-on fix as possible.)

The C/C++ language was designed so that you didn't pay for (in performance and memory) what you didn't use. Bounds checking and overrun protection fall into this category. Also, as Dave has pointed out, in some applications it is bad to clip the string contents or throw an exception. That said, slightly OT, if design by contract support becomes available in the language, you could add that bounds checking as pre/post conditions or class invariants. - Reece

David Abrahams

3:44 p.m.

"Hartmut Kaiser" <hartmut.kaiser@gmail.com> writes:

...

Hi all,

normally, the review period of the Fixed Strings library written by Reece Dunn ended today. All we got so far are 3 reviews, which is too less to make a proper decision. I really would like to encourage you to submit a review for this library to get a representative result. For this reason I'ld like to extend the review period for another week until February 5th to allow for more reviews. I'm convinced, that a Fixed Strings library would make a good addition to Boost and in the end it should be useful to a lot more of people than to the 3 of us who send in a review.

For what it's worth, if there's another review I intend to vote strongly against. I have nothign against Reece or against this particular design for a fixed strings library. As we discuss this issue I become more and more convinced that using fixed sized strings at all is just wrong in principle 99.9% of the time. We could get a lot more mileage out of a fixed-capacity allocator, when such a thing is needed. And it would be lots more effective if allocators had a realloc interface as has often been proposed, and if basic_string were forced to use it use it. -- Dave Abrahams Boost Consulting www.boost-consulting.com

7111

Age (days ago)

7114

Last active (days ago)

List overview

Download

38 comments

14 participants

participants (14)

Andy Little
David Abrahams
Eric Niebler
Gennadiy Rozental
Hartmut Kaiser
Ion Gaztañaga
Jim Douglas
John Maddock
Markus Schöpflin
Martin
Martin Adrian
Paul A Bristow
Pavel Antokolsky aka Zigmar
Reece Dunn