Re: [boost] Re: Formal Review: FOREACH macro

Peter Dimov wrote:
Eric Niebler wrote:
Yes, but I was refering to native (C-style) strings, as in: BOOST_FOREACH(char ch, "hello world")
This is a good opportunity to ask: what does this do?
char const a[] = { 0, 1, 2 };
Or for extra fun, drop the 0.
BOOST_FOREACH( int x, a ) { std::cout << x << std::end; }

Peter Dimov wrote:
Peter Dimov wrote:
Eric Niebler wrote:
Yes, but I was refering to native (C-style) strings, as in: BOOST_FOREACH(char ch, "hello world")
This is a good opportunity to ask: what does this do?
char const a[] = { 0, 1, 2 };
Or for extra fun, drop the 0.
BOOST_FOREACH( int x, a ) { std::cout << x << std::end; }
The short answer is: it is consistent with how Boost.Range handles it. The longer answer is that it is treated as a null-terminated string. In the first case, the loop immediately terminates and in the second, behavior is undefined -- you read off the end of the array because it is not null-terminated. I think this is OK. If you think this is a problem, you should discuss it with Thorsten. Whatever we decide, FOREACH and Range should be consistent. -- Eric Niebler Boost Consulting www.boost-consulting.com

Eric Niebler wrote:
I think this [size(char[N]) == N-1] is OK.
It's very much a non-problem today, both because the majority of char[] arrays are indeed string literals, and because BOOST_FOREACH on a string literal seems pretty pointless. Things may become more interesting in the future, when on platform X BOOST_FOREACH on int[] could decide to skip the last element of the array because utf32_t happened to map to int.
If you think this is a problem, you should discuss it with Thorsten.
I'm not looking forward to that.

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00a101c550d8$39178d50$6401a8c0@pdimov2... | Eric Niebler wrote: | | > I think this [size(char[N]) == N-1] is OK. | | It's very much a non-problem today, both because the majority of char[] | arrays are indeed string literals, and because BOOST_FOREACH on a string | literal seems pretty pointless. | | Things may become more interesting in the future, when on platform X | BOOST_FOREACH on int[] could decide to skip the last element of the array | because utf32_t happened to map to int. | | > If you think this is a problem, you should discuss it with Thorsten. | | I'm not looking forward to that. Please do anyway. If you think there is a major problem in the defaults of the Range library, please say so. I'll be writing a proposal about it in a few months time. best regards -Thorsten

Thorsten Ottosen wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00a101c550d8$39178d50$6401a8c0@pdimov2...
If you think this is a problem, you should discuss it with Thorsten.
I'm not looking forward to that.
Please do anyway. If you think there is a major problem in the defaults of the Range library, please say so.
I think that defining size( T[N] ) as N or N-1, depending on whether T is a character type, will cause problems in generic code and preclude the widespread use of the library. Looking at the actual contents of the array for a NULL terminator can be even worse, as it may easily lead to undefined behavior. In addition, this definition will not adapt well if other character types are added as part of Unicode-ification, in particular, if these character types are typedefs. I, for one, will be avoiding the library because of these issues. "Do the right thing" is a design principle that I usually advocate, but in this specific case I believe that doing the predictable and consistent thing is better.

Hi, On Tue, May 10, 2005 at 10:53:16PM +0300, Peter Dimov wrote:
Thorsten Ottosen wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00a101c550d8$39178d50$6401a8c0@pdimov2...
If you think this is a problem, you should discuss it with Thorsten.
I'm not looking forward to that.
Please do anyway. If you think there is a major problem in the defaults of the Range library, please say so.
I think that defining size( T[N] ) as N or N-1, depending on whether T is a character type, will cause problems in generic code and preclude the widespread use of the library.
Looking at the actual contents of the array for a NULL terminator can be even worse, as it may easily lead to undefined behavior.
In addition, this definition will not adapt well if other character types are added as part of Unicode-ification, in particular, if these character types are typedefs.
I, for one, will be avoiding the library because of these issues. "Do the right thing" is a design principle that I usually advocate, but in this specific case I believe that doing the predictable and consistent thing is better.
Actualy I'm the author of this idea so I can give an explanation. I came up to this problem when I was adopting first version of container_traits (predecessor of Boost.Range) to use with StringAlgo library. While playing with different range type I have found an unexpected behaviour when a parameter was of this type: char str[]="Hello" Actual type of str is char[6] so it also contains terminating zero. And this is the actual problem. If we apply the default behavior, on this type (ie, range will include this terminator), it will make char[] useless. More importantly, it will differ from the behaviour of char* and wchar_t*, std::basic_string and probably any possible string class. So I'm not sure what is consistent and predicable, but from the user perspective, char[] is first of all a string constant. So it should be threated like a string not like an 'array'. I consider it very unexpected when boost::end("hello") will not designate end of "hello" rather end of "hello\0" Yet I understand you reservations, but the only way out of this problem would be to define different boost::end() for string ranges. I'm not sure if this is an acceptable way. However I can see a way how to resolve a problem with future unicode character classes. Simple trait class can do the job nicely. Regards, Pavol

Pavol Droba wrote:
However I can see a way how to resolve a problem with future unicode character classes. Simple trait class can do the job nicely.
This works if the designated character types are classes, or new integral types. But if they're typedefs for existing types, e.g. unsigned int, traits classes won't be able to determine the intended use. Jonathan

On Tue, May 10, 2005 at 02:56:21PM -0600, Jonathan Turkanis wrote:
Pavol Droba wrote:
However I can see a way how to resolve a problem with future unicode character classes. Simple trait class can do the job nicely.
This works if the designated character types are classes, or new integral types. But if they're typedefs for existing types, e.g. unsigned int, traits classes won't be able to determine the intended use.
That's true. Maybe a proposed "strong typedef" can help here in the furure. Pavol

Pavol Droba wrote:
Yet I understand you reservations, but the only way out of this problem would be to define different boost::end() for string ranges. I'm not sure if this is an acceptable way.
This is pretty much what I was saying. Looking for a NULL terminator is only acceptable when you know that you are dealing with NULL-terminated strings. If I'm writing generic code that works on C arrays, I'll use N as the size of T[N]. If I'm writing generic code that works on strings, I'll use N-1 or strlen. I won't use the Range library in either of these cases because it tries to guess which one I need.

On Wed, May 11, 2005 at 12:27:45AM +0300, Peter Dimov wrote:
Pavol Droba wrote:
Yet I understand you reservations, but the only way out of this problem would be to define different boost::end() for string ranges. I'm not sure if this is an acceptable way.
This is pretty much what I was saying. Looking for a NULL terminator is only acceptable when you know that you are dealing with NULL-terminated strings.
If I'm writing generic code that works on C arrays, I'll use N as the size of T[N]. If I'm writing generic code that works on strings, I'll use N-1 or strlen. I won't use the Range library in either of these cases because it tries to guess which one I need.
Actualy I think, that Range works pretty well for string applications. But I'm all in favor to make things more consistent. What about introducing str_end() functions. The default implementation would fall-back to end(), but it can be overriden for special cases (like char[]). If we go this way however, we must remove direct suppor for null-terminated strings like char* and leave it only to str_end(). Since otherwise there will still be a bit of confusion. Naturaly, there will be need for str_size, and possibly str_begin (for consistency). So far it seems easy. But there is a catch as usualy. There are generic facilities like FOREACH, that iterate over an arbitrary range. Question is "what is an arbitrary range?". If we split the range interface into two distinct groups, same will have to be done for FOREACH and any such facility. And things can get even worse if by change we come up to another range branch. Let's try another approach then. It is possible to create an adaptor, that will create an iterator_range out of a problematic range. So if you instead of direclty passing the value, you will encapsulate it first and pass it like this: char[] ch={'h', 'e', 'l', 'l', 'o', 0 ,'b', 'y', 'e'}; array(str) // array like approach -> result [hello\0bye] str(ch) // string like approach -> result [hello] Advantage of this is that it leave open most of the future possibilities, disadvantage is that it requires more writting. However, this conversion can be hidden inside the library function, if its intention is unambigous. Having this facility at hand we can decide what will be default for the plain range. I'm in favor of the current approach since it is used more frequently IMHO. Regards, Pavol

Pavol Droba wrote:
What about introducing str_end() functions. The default implementation would fall-back to end(), but it can be overriden for special cases (like char[]). If we go this way however, we must remove direct suppor for null-terminated strings like char* and leave it only to str_end(). Since otherwise there will still be a bit of confusion. Naturaly, there will be need for str_size, and possibly str_begin (for consistency).
I'd name it strlen, actually. And I wouldn't fall back to end(); a string type should be marked as such by providing an appropriate overload for strlen (and strbegin/strend).
So far it seems easy. But there is a catch as usualy. There are generic facilities like FOREACH, that iterate over an arbitrary range.
It would be perfectly reasonable for a language-level foreach to omit the trailing zero when iterating over a character literal. But I don't think that this case is possible to detect. I wouldn't be surprised if it is, though, given the current wizardry in BOOST_FOREACH (somehow exploiting the fact that literals are const but convertible to char*.) But this is a special case. In general, when you see char[4] (or wchar_t[4]), you have absolutely no license to replace the 4 with undefined behavior. std::swprintf( buffer, size(buffer), L"%d", i );

On Wed, May 11, 2005 at 02:27:43PM +0300, Peter Dimov wrote:
Pavol Droba wrote:
What about introducing str_end() functions. The default implementation would fall-back to end(), but it can be overriden for special cases (like char[]). If we go this way however, we must remove direct suppor for null-terminated strings like char* and leave it only to str_end(). Since otherwise there will still be a bit of confusion. Naturaly, there will be need for str_size, and possibly str_begin (for consistency).
I'd name it strlen, actually. And I wouldn't fall back to end(); a string type should be marked as such by providing an appropriate overload for strlen (and strbegin/strend).
I think that this is not a reasonable way of thinking. It is not up to library developer to restrict a user in this way. For instance it is quite frequent to use vector<char> as a storage for textual data. Why would you like to restrict a usage of string-oriented algorithms for this particular container.
So far it seems easy. But there is a catch as usualy. There are generic facilities like FOREACH, that iterate over an arbitrary range.
It would be perfectly reasonable for a language-level foreach to omit the trailing zero when iterating over a character literal. But I don't think that this case is possible to detect. I wouldn't be surprised if it is, though, given the current wizardry in BOOST_FOREACH (somehow exploiting the fact that literals are const but convertible to char*.)
But this is a special case. In general, when you see char[4] (or wchar_t[4]), you have absolutely no license to replace the 4 with undefined behavior.
It is not a special case at all. Exactly the same goes for StringAlgo library and I'm sure that for almost everything, that relies on Boost.Range. As I see it the problem has not been spotted because until now, the only real test case for Boost.Range in Boost was StringAlgo library (correct me if I'm wrong). And the current behaviour is perfectly reasonable in its context. I'm pretty sure, that another requirement on container classes, that are supposed to be used via Range interface is inacceptable. What exactly you don't like on my second proposal? It allows user to decide what he/she intends to do. To sumarize, I fully understand you concerns, but you have to try to understand mine.
From the perspective of string processing, that current approach is perfectly legal and I dare to say, that it is on par with usage you trying to achive. So let's try to come to some solution, that will be acceptable for both parties.
Regards, Pavol

Pavol Droba wrote:
To sumarize, I fully understand you concerns, but you have to try to understand mine.
From the perspective of string processing, that current approach is perfectly legal
and I dare to say, that it is on par with usage you trying to achive. So let's try to come to some solution, that will be acceptable for both parties.
I am not interested in compromises or acceptability to "both parties". If I were designing a string library, I wouldn't attempt to redesign a lower-level range library so that it fits my needs. I can easily implement whatever hacks make sense at my level rather than pushing them a level down. That's just my opinion, of course. I may be wrong.

On Wed, May 11, 2005 at 04:11:59PM +0300, Peter Dimov wrote:
Pavol Droba wrote:
To sumarize, I fully understand you concerns, but you have to try to understand mine.
From the perspective of string processing, that current approach is perfectly legal
and I dare to say, that it is on par with usage you trying to achive. So let's try to come to some solution, that will be acceptable for both parties.
I am not interested in compromises or acceptability to "both parties". If I were designing a string library, I wouldn't attempt to redesign a lower-level range library so that it fits my needs. I can easily implement whatever hacks make sense at my level rather than pushing them a level down.
That's just my opinion, of course. I may be wrong.
It is hard to argue if the other party is not trying to understand. You are trying to leverage the purity of design to the actual usefulnes. It is nice goal, but it is not suitable for real people. We can make Range library to work as you request, but it would make it fully unusable on string literals. And correct me if I'm wrong, but from the C++ programmer perspecive, char[] is in 90% cases used as a string literal. Arbitrary character array is used mostly as legacy api interface. For all other usages, there is std::vector and alike. I'm not saying that the current design is perfect, but until now, nobody has shown any complaints nor reported any problems. Given the fact, that Boost.Range is in CVS for a year I consider it as small proof, that it is not broken that much. Now a small remark towards the string library. I strogly suggest that you look more closely to Boost.Range and Boost.StringAlgo and the way they are tied togheter. StringAlgo is not a "string" library, rather an "algorithm" library. It can deal with any container that Range supports. Since Boost.Range is interfacing with a container, me as a library writter have a limited number of oportunies to introduces "hacks". Not mentioning that I don't want to. So it is up to Boost.Range to handle this issues. I will gladly make any fixes in the StringAlgo lib to accomodate a possible changes in the Range interface. However, it must be officialy possible. As I have already stated, I don't think, that StringAlgo library is somehow special in this regard. Any library that will use Range will be affected. I pretty sure, that majority of users will be suprised by the fact that statement: a_generic_algorith("hello") will process "hello\0" rather then "hello" Best Regards, Pavol

Pavol Droba wrote:
And correct me if I'm wrong, but from the C++ programmer perspecive, char[] is in 90% cases used as a string literal.
This has not been my experience. char[] is usually a char[] in my code, a character buffer with a fixed size, as I tried to show with the swprintf example. (Sometimes it's a short array of characters, typically 4, but this is rare.) String literals typically decay to char const* the moment they are used. Of course my experience may be unusual.

Peter Dimov wrote:
It would be perfectly reasonable for a language-level foreach to omit the trailing zero when iterating over a character literal. But I don't think that this case is possible to detect. I wouldn't be surprised if it is, though, given the current wizardry in BOOST_FOREACH (somehow exploiting the fact that literals are const but convertible to char*.)
It does? What makes you think that? I wonder if the fact that a const char[N] is convertible to a (non-const) char* can be used to detect whether a given character array should be treated as a null-terminated string. It's an interesting question, but I don't think it's a good idea, becuase it wouldn't handle cases like: const char sz[] = "hello"; boost::size(sz); // 5 or 6? boost::size("hello"); // better give the same answer as above FWIW, I agree with Peter and Dave A. that character arrays should be treated as arrays and not as null-terminated strings. The truth is, you just don't know, and you need the user's guidance to pick. I'd like to see something like: template<typename T, std::size_t N> iterator_range<T const *> ignore_trailing_null(T const (&rg)[N]) { BOOST_ASSERT(T(0) == rg[N-1]); return make_iterator_range(&rg[0], &rg[N-1]); } That way: boost::size("hello"); // 6 boost::size(ignore_trailing_null("hello")); // 5 Perhaps there is a better is a better name than "ignore_trailing_null". -- Eric Niebler Boost Consulting www.boost-consulting.com

Pavol Droba <droba@topmail.sk> writes:
However I can see a way how to resolve a problem with future unicode character classes. Simple trait class can do the job nicely.
The way to handle this is with a simple range wrapper: string_literal("Hello, world") -- Dave Abrahams Boost Consulting www.boost-consulting.com

On Wed, May 11, 2005 at 08:59:59AM -0400, David Abrahams wrote:
Pavol Droba <droba@topmail.sk> writes:
However I can see a way how to resolve a problem with future unicode character classes. Simple trait class can do the job nicely.
The way to handle this is with a simple range wrapper:
string_literal("Hello, world")
This is what I have actualy suggested in later posts. Regards, Pavol

Pavol Droba <droba@topmail.sk> writes:
On Wed, May 11, 2005 at 08:59:59AM -0400, David Abrahams wrote:
Pavol Droba <droba@topmail.sk> writes:
However I can see a way how to resolve a problem with future unicode character classes. Simple trait class can do the job nicely.
The way to handle this is with a simple range wrapper:
string_literal("Hello, world")
This is what I have actualy suggested in later posts.
I'm confused, then. When I read
We can make Range library to work as you request, but it would make it fully unusable on string literals.
I get the impression you think the range library should work on string literals directly. I am with Peter in the opinion that string literals can only properly be treated by a range library as an array of characters, including the null terminator. -- Dave Abrahams Boost Consulting www.boost-consulting.com

On Wed, May 11, 2005 at 11:38:24AM -0400, David Abrahams wrote:
Pavol Droba <droba@topmail.sk> writes:
On Wed, May 11, 2005 at 08:59:59AM -0400, David Abrahams wrote:
Pavol Droba <droba@topmail.sk> writes:
However I can see a way how to resolve a problem with future unicode character classes. Simple trait class can do the job nicely.
The way to handle this is with a simple range wrapper:
string_literal("Hello, world")
This is what I have actualy suggested in later posts.
I'm confused, then. When I read
My proposal was here: http://article.gmane.org/gmane.comp.lib.boost.devel/123953/ <cite> Let's try another approach then. It is possible to create an adaptor, that will create an iterator_range out of a problematic range. So if you instead of direclty passing the value, you will encapsulate it first and pass it like this: char[] ch={'h', 'e', 'l', 'l', 'o', 0 ,'b', 'y', 'e'}; array(str) // array like approach -> result [hello\0bye] str(ch) // string like approach -> result [hello] Advantage of this is that it leave open most of the future possibilities, disadvantage is that it requires more writting. However, this conversion can be hidden inside the library function, if its intention is unambigous. Having this facility at hand we can decide what will be default for the plain range. I'm in favor of the current approach since it is used more frequently IMHO. </cite> First, the adaptors should be natural part of Boost.Range library, since only Range library is the interface to a container/range that a library uses. Idea is that a user will always be able to choose his prefered type, while library writes will be able to define default, that better suits the usage of the library. For example, a string algorithm would look like: template<typename RangeT> ... AnAlgorithm(const RangeT& aRange) { boost::sub_range<RangeT> StrRange=str(aRange); // Do something with StrRange } while binary processing algorithm would use 'array' construct. Imporant aspect is, that neither of 'str' or 'array' constucts should change a type if one of them is already applied. Regardles of what default will range provide, user will always be able to select what he/she wants. So what about this? Regards, Pavol

"Peter Dimov" <pdimov@mmltd.net> writes:
Thorsten Ottosen wrote:
"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00a101c550d8$39178d50$6401a8c0@pdimov2...
If you think this is a problem, you should discuss it with Thorsten.
I'm not looking forward to that.
Please do anyway. If you think there is a major problem in the defaults of the Range library, please say so.
I think that defining size( T[N] ) as N or N-1, depending on whether T is a character type, will cause problems in generic code and preclude the widespread use of the library.
vector<bool>, anybody?? -- Dave Abrahams Boost Consulting www.boost-consulting.com
participants (6)
-
David Abrahams
-
Eric Niebler
-
Jonathan Turkanis
-
Pavol Droba
-
Peter Dimov
-
Thorsten Ottosen