Re: [Boost-users] Report from Mont Tremblant C++ Committee meeting
On Sun, 9 Oct 2005 21:32:30 -0400, "Beman Dawes"
String-algo : Interested, but concern over interface and choice of functions, generic vs basic_string.5.3 separate proposal
Any details on this? I love this library, and I would love to see it standardized in some form. Overall I've found the interface and function choices to be excellent (and I'd love to see even more), and I love that it's generic and not limited to basic_string. I use it on vectors and other containers for different kinds of network protocol parsing. -- Be seeing you.
"Thore Karlsen"
On Sun, 9 Oct 2005 21:32:30 -0400, "Beman Dawes"
wrote: [...]
String-algo : Interested, but concern over interface and choice of functions, generic vs basic_string.5.3 separate proposal
Any details on this? I love this library, and I would love to see it standardized in some form. Overall I've found the interface and function choices to be excellent (and I'd love to see even more), and I love that it's generic and not limited to basic_string. I use it on vectors and other containers for different kinds of network protocol parsing.
Thorsten Ottosen acted as champion for the paper, but I'll try to recall the discussion. The concern was that at least some of the algorithms were only useful in the context of strings, and so it would be an over-generalization to supply them as free algorithms. One way to counter that argument would be to identify the functions you have found useful on other containers, and to provide some use cases to buttress the argument. --Beman
On Mon, 10 Oct 2005 15:14:19 -0400, "Beman Dawes"
String-algo : Interested, but concern over interface and choice of functions, generic vs basic_string.5.3 separate proposal
Any details on this? I love this library, and I would love to see it standardized in some form. Overall I've found the interface and function choices to be excellent (and I'd love to see even more), and I love that it's generic and not limited to basic_string. I use it on vectors and other containers for different kinds of network protocol parsing.
Thorsten Ottosen acted as champion for the paper, but I'll try to recall the discussion.
The concern was that at least some of the algorithms were only useful in the context of strings, and so it would be an over-generalization to supply them as free algorithms.
One way to counter that argument would be to identify the functions you have found useful on other containers, and to provide some use cases to buttress the argument.
Well, one thing I use it for is parsing HTTP directly in the read buffer, which is a vector. If the interfaces weren't generic, I'd either have to write my own functions to duplicate the functionality, or I'd have to copy the incoming data to a string. The first seems silly, and the latter would have unacceptable overhead in my case. The HTTP I'm parsing is streaming video from several dozen cameras at once, so I have to work with the buffers directly. I also use this library on plain old C strings, which wouldn't be possible if it were locked to basic_string. Some changes may make sense, but I really like the way it is now. -- Be seeing you.
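To make the use case above concrete, here is a minimal sketch of searching a read buffer in place. It uses only the standard library (std::search), not string_algo itself, and the names find_offset and header_end are hypothetical helpers invented for this illustration. The point is only that an iterator-based algorithm runs unchanged over a vector and over a plain C string, with no copy into a basic_string.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of string_algo): offset of the first
// occurrence of [pat_first, pat_last) in [first, last), or -1 if absent.
template <typename Iter, typename PatIter>
std::ptrdiff_t find_offset(Iter first, Iter last, PatIter pat_first, PatIter pat_last) {
    Iter hit = std::search(first, last, pat_first, pat_last);
    return hit == last ? -1 : std::distance(first, hit);
}

// Locate the blank line that ends an HTTP header block, directly in the buffer.
inline std::ptrdiff_t header_end(const std::vector<char>& buffer) {
    static const char sep[] = "\r\n\r\n";
    return find_offset(buffer.begin(), buffer.end(), sep, sep + 4);
}

// The same helper applied to a plain C string.
inline std::ptrdiff_t header_end(const char* text) {
    static const char sep[] = "\r\n\r\n";
    return find_offset(text, text + std::strlen(text), sep, sep + 4);
}
```

Both overloads forward to the same iterator-pair function, which is the property the generic interface is being praised for here.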
The concern was that at least some of the algorithms were only useful in the context of strings, and so it would be an over-generalization to supply them as free algorithms.
One way to counter that argument would be to identify the functions you have found useful on other containers, and to provide some use cases to buttress the argument.
Well, one thing I use it for is parsing HTTP directly in the read buffer, which is a vector. If the interfaces weren't generic, I'd either have to write my own functions to duplicate the functionality, or I'd have to copy the incoming data to a string. The first seems silly, and the latter would have unacceptable overhead in my case. The HTTP I'm parsing is streaming video from several dozen cameras at once, so I have to work with the buffers directly.
I also use this library on plain old C strings, which wouldn't be possible if it were locked to basic_string.
Some changes may make sense, but I really like the way it is now.
The way I see it, all string algorithms should be using a class like const_string in their interfaces. basic_string should implement the const_string interface. I really think we need to provide this within Boost (I know there is something in the queue - let's see where it will go). Gennadiy
Hi, On Tue, Oct 11, 2005 at 03:59:23PM -0400, Gennadiy Rozental wrote:
The concern was that at least some of the algorithms were only useful in the context of strings, and so it would be an over-generalization to supply them as free algorithms.
One way to counter that argument would be to identify the functions you have found useful on other containers, and to provide some use cases to buttress the argument.
Well, one thing I use it for is parsing HTTP directly in the read buffer, which is a vector. If the interfaces weren't generic, I'd either have to write my own functions to duplicate the functionality, or I'd have to copy the incoming data to a string. The first seems silly, and the latter would have unacceptable overhead in my case. The HTTP I'm parsing is streaming video from several dozen cameras at once, so I have to work with the buffers directly.
I also use this library on plain old C strings, which wouldn't be possible if it were locked to basic_string.
Some changes may make sense, but I really like the way it is now.
The way I see it, all string algorithms should be using a class like const_string in their interfaces. basic_string should implement the const_string interface. I really think we need to provide this within Boost (I know there is something in the queue - let's see where it will go).
Are you 100% sure that const_string is the only reasonable string representation? There are already several others (FlexiString, sgi's rope) and several are already announced (e.g. unicode_string). There have been so many discussions concluding that the monolithic approach is wrong, yet some people still argue in favor of it. The original basic_string is a mistake, IMHO. It has an overblown interface, and yet it is still not complete enough. Regards, Pavol.
The way I see it, all string algorithms should be using a class like const_string in their interfaces. basic_string should implement the const_string interface. I really think we need to provide this within Boost (I know there is something in the queue - let's see where it will go).
Are you 100% sure that const_string is the only reasonable string representation?
const_string is more an interface than a representation. Some design variations are possible here.
There are already several others (FlexiString, sgi's rope) and several are already
None of them is standard, so no need to pay attention. Any *string* class that wants to conform to the string_algo interface needs to satisfy the const_string one.
announced (e.g. unicode_string).
As for the Unicode string, it's a separate issue IMO. I am personally quite sure that none of the string algorithms will be applicable anyway. But this is a topic for a separate discussion based on some real submission.
There have been so many discussions concluding that the monolithic approach is wrong, yet some people still argue in favor of it.
It all depends on what you mean by monolithic. IMO string algorithm design should be based on CharType/StringType, not iterator type.
The original basic_string is a mistake, IMHO. It has an overblown interface, and yet it is still not complete enough.
This is a completely separate issue. Gennadiy
On Tue, Oct 11, 2005 at 05:07:44PM -0400, Gennadiy Rozental wrote:
The way I see it, all string algorithms should be using a class like const_string in their interfaces. basic_string should implement the const_string interface. I really think we need to provide this within Boost (I know there is something in the queue - let's see where it will go).
Are you 100% sure that const_string is the only reasonable string representation?
const_string is more an interface than a representation. Some design variations are possible here.
This seems better, but I don't see any advantage in using it here instead of iterators.
There are already several others (FlexiString, sgi's rope) and several are already
None of them is standard, so no need to pay attention. Any *string* class that wants to conform to the string_algo interface needs to satisfy the const_string one.
None of the programs or classes I write are part of the standard, yet the standard library is designed to be used within them. Why are the algorithms in the STL not tied only to the containers presented there? Sorry, but I think that this view is very short-sighted.
announced (e.g. unicode_string).
As for the Unicode string, it's a separate issue IMO. I am personally quite sure that none of the string algorithms will be applicable anyway. But this is a topic for a separate discussion based on some real submission.
There have been so many discussions concluding that the monolithic approach is wrong, yet some people still argue in favor of it.
It all depends on what you mean by monolithic. IMO string algorithm design should be based on CharType/StringType, not iterator type.
I still don't see any good reason for it. String algorithms are primarily based on *strings* and not on *char type*. A string is first of all a container. As such, there are several options for its internal representation, none of them superior to the others in all possible use cases. The only reasonable abstraction we have so far is through iterators. I'm not saying that this is the only possible abstraction, but it has proven adequate. Regards, Pavol
Are you 100% sure that const_string is the only reasonable string representation?
const_string is more an interface than a representation. Some design variations are possible here.
This seems better, but I don't see any advantage in using it here instead of iterators.
An advantage is that I do not need to care about iterator types.
There are already several others (FlexiString, sgi's rope) and several are already
None of them is standard, so no need to pay attention. Any *string* class that wants to conform to the string_algo interface needs to satisfy the const_string one.
None of the programs or classes I write are part of the standard, yet the standard library is designed to be used within them.
Why are the algorithms in the STL not tied only to the containers presented there?
Sorry, but I think that this view is very short-sighted.
The STL does specify iterator and collection concepts that a conforming class is *required* to comply with. The same situation applies here.
announced (e.g. unicode_string).
As for the Unicode string, it's a separate issue IMO. I am personally quite sure that none of the string algorithms will be applicable anyway. But this is a topic for a separate discussion based on some real submission.
There have been so many discussions concluding that the monolithic approach is wrong, yet some people still argue in favor of it.
It all depends on what you mean by monolithic. IMO string algorithm design should be based on CharType/StringType, not iterator type.
I still don't see any good reason for it. String algorithms are primarily based on *strings* and not on *char type*. A string is first of all a container.
Exactly - they are based on strings, not string iterators. But if we agree that all strings (that we are interested in covering with this library) are parameterized by character type, we could use char type parameterization.
As such, there are several options for its internal representation, none of them superior to the others in all possible use cases.
The only reasonable abstraction we have so far is through iterators.
IMO the iterator abstraction is unnatural for strings. Unless we are talking about Unicode, in 99.99% of cases a string is just some_template
I'm not saying that this is the only possible abstraction, but it has proven adequate.
It's proven implementable not adequate IMO
Regards, Pavol
Regards, Gennadiy
On Tue, Oct 11, 2005 at 11:57:09PM -0400, Gennadiy Rozental wrote: [snip]
I still don't see any good reason for it. String algorithms are primarily based on *strings* and not on *char type*. A string is first of all a container.
Exactly - they are based on strings, not string iterators. But if we agree that all strings (that we are interested in covering with this library) are parameterized by character type, we could use char type parameterization.
As such, there are several options for its internal representation, none of them superior to the others in all possible use cases.
The only reasonable abstraction we have so far is through iterators.
IMO the iterator abstraction is unnatural for strings. Unless we are talking about Unicode, in 99.99% of cases a string is just some_template
This is very true. A string is really in most cases some_template
I'm not saying that this is the only possible abstraction, but it has proven adequate.
It's proven implementable not adequate IMO
But I would really like to know what leads you to the idea that it is not adequate. What is missing, and what should be done differently? Can you provide a simple example of an algorithm that is written your way? Regards, Pavol
An advantage is that I do not need to care about iterator types.
At the expense of making many useful operations *impossible*. Let me give you one use case - in fact the motivating reason for creating an iterator-based regex lib:
Imagine you have a text editor; it contains text in a container-of-containers, representing a vector of lines (I'm talking logically not literally here, you don't actually have to have a vector of vectors, but it looks like that in the interface). You can easily create a composite iterator that enumerates all the characters in the whole text, but to do anything with that you need an iterator-agnostic string algorithm library.
Or... Imagine you have a text file that's too large to fit into program memory. You could create a custom iterator that loads on demand the part of the file it's pointing at, but looks to the outside world like a continuous sequence of several gigabytes of characters. And yes, the regex lib has been used to search multi-gigabyte archives like this. In the early days of the lib (before Boost, or BB for short <g>), I also used this technique to search large files under MS-DOS without breaking through the 64K segment limit.
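The text-editor case can be sketched roughly as follows. This is only an illustration under stated assumptions: flat_char_iterator, flat_begin, flat_end and find_text are hypothetical names, the lines are held in a literal vector of std::string, and std::search stands in for whatever iterator-agnostic algorithm the library would provide. It is not how the regex lib itself is implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical forward iterator that walks every character of a
// vector-of-lines as if it were one contiguous sequence.
class flat_char_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    flat_char_iterator() = default;
    flat_char_iterator(const std::vector<std::string>* lines, std::size_t line)
        : lines_(lines), line_(line), col_(0) { skip_exhausted(); }

    reference operator*() const { return (*lines_)[line_][col_]; }
    flat_char_iterator& operator++() { ++col_; skip_exhausted(); return *this; }
    flat_char_iterator operator++(int) { flat_char_iterator old = *this; ++*this; return old; }

    friend bool operator==(const flat_char_iterator& a, const flat_char_iterator& b) {
        return a.lines_ == b.lines_ && a.line_ == b.line_ && a.col_ == b.col_;
    }
    friend bool operator!=(const flat_char_iterator& a, const flat_char_iterator& b) {
        return !(a == b);
    }

    std::size_t line() const { return line_; }
    std::size_t column() const { return col_; }

private:
    // Move to the next line whenever the current one is used up (or empty).
    void skip_exhausted() {
        while (line_ < lines_->size() && col_ >= (*lines_)[line_].size()) {
            ++line_;
            col_ = 0;
        }
    }
    const std::vector<std::string>* lines_ = nullptr;
    std::size_t line_ = 0;
    std::size_t col_ = 0;
};

inline flat_char_iterator flat_begin(const std::vector<std::string>& lines) {
    return flat_char_iterator(&lines, 0);
}
inline flat_char_iterator flat_end(const std::vector<std::string>& lines) {
    return flat_char_iterator(&lines, lines.size());
}

// Find a pattern that may straddle a line boundary, without copying the text.
inline flat_char_iterator find_text(const std::vector<std::string>& lines,
                                    const std::string& pattern) {
    return std::search(flat_begin(lines), flat_end(lines), pattern.begin(), pattern.end());
}
```

An algorithm that accepted only a string class could not be applied here without first concatenating all the lines into one buffer.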
I still don't see any good reason for it. String algorithms are primarily based on *strings* and not on *char type*. A string is first of all a container.
Exactly - they are based on strings, not string iterators. But if we agree that all strings (that we are interested in covering with this library) are parameterized by character type, we could use char type parameterization.
As such, there are several options for its internal representation, none of them superior to the others in all possible use cases.
The only reasonable abstraction we have so far is through iterators.
IMO the iterator abstraction is unnatural for strings. Unless we are talking about Unicode, in 99.99% of cases a string is just some_template
Sorry, but that's rubbish, at least as far as legacy string types are concerned, and as I said users are pretty much forced to use these in many situations. That's not to say that we can't have algorithm overloads that accept a string type rather than an iterator pair as argument (regex works this way), given a suitable string_traits<> or whatever the library could then adapt to almost any string type. In fact I have never found a string class that can't be converted to an iterator pair (and believe me I've seen a lot of string classes!) John.
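The idea of string-type overloads layered on an iterator-pair core might look something like the sketch below. Everything named here is an assumption made for illustration: LegacyString is a made-up stand-in for a proprietary string class, string_traits is a hypothetical traits template (the post only says "string_traits<> or whatever"), and count_spaces is a toy algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a proprietary string class with its own API.
class LegacyString {
public:
    explicit LegacyString(const char* s) : data_(s) {}
    const char* GetBuffer() const { return data_.c_str(); }
    std::size_t GetLength() const { return data_.size(); }
private:
    std::string data_;
};

// Hypothetical traits: map a string type onto an iterator pair.
template <typename String>
struct string_traits {
    using const_iterator = typename String::const_iterator;
    static const_iterator begin(const String& s) { return s.begin(); }
    static const_iterator end(const String& s) { return s.end(); }
};

// Adapting the legacy type takes one specialization and no copying.
template <>
struct string_traits<LegacyString> {
    using const_iterator = const char*;
    static const_iterator begin(const LegacyString& s) { return s.GetBuffer(); }
    static const_iterator end(const LegacyString& s) { return s.GetBuffer() + s.GetLength(); }
};

// Iterator-pair core algorithm.
template <typename Iter>
std::size_t count_spaces(Iter first, Iter last) {
    return static_cast<std::size_t>(std::count(first, last, ' '));
}

// Convenience overload for whole strings, forwarding through the traits.
template <typename String>
std::size_t count_spaces(const String& s) {
    return count_spaces(string_traits<String>::begin(s), string_traits<String>::end(s));
}
```

The design choice is that the algorithm is written once against iterators, while each string type pays only for a small adapter.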
There are already several others (FlexiString, sgi's rope) and several are already
None of them is standard, so no need to pay attention. Any *string* class that wants to conform to the string_algo interface needs to satisfy the const_string one.
And back in the real world users use all kinds of proprietary string types: CString, UnicodeString, AnsiString, WideString, need I go on? Users *have* to use these types to interact with their native GUIs and other class libraries; having a string-type-neutral algorithm library allows them to interact with all of these without copying data all the time. Oh, and what about memory-mapped files, should we be able to support those as well? And for those who think that in-place modification is the way to go: you do realise that it's often *slower* than building a new string, don't you? John.
On Wed, 12 Oct 2005 10:38:25 +0100, "John Maddock"
And for those who think that in-place modification is the way to go: you do realise that it's often *slower* than building a new string don't you?
Often, but not always, which is why I like the current string_algo, which allows you to do it whichever way is appropriate for the specific situation. -- Be seeing you.
"John Maddock"
There are already several others (FlexiString, sgi's rope) and several are already
None of them is standard, so no need to pay attention. Any *string* class that wants to conform to the string_algo interface needs to satisfy the const_string one.
And back in the real world users use all kinds of proprietary string types: CString, UnicodeString, AnsiString, WideString, need I go on? Users *have* to use these types to interact with their native GUIs and other class libraries; having a string-type-neutral algorithm library allows them to interact with all of these without copying data all the time. Oh, and what about memory-mapped files, should we be able to support those as well?
IMO all these non-standard string classes are beside the point. It's maybe OK for a Boost library to try to accommodate everything. But String Algorithms as part of the STL should define some specific concepts that users' strings are expected to support, and should be primarily oriented to work nicely with the standard string. Gennadiy
On Wed, Oct 12, 2005 at 01:58:10PM -0400, Gennadiy Rozental wrote:
"John Maddock"
wrote in message news:01ab01c5cf13$49900270$fc220d52@fuji... There are already several others (FlexiString, sgi's rope) and several are already
None of them is standard, so no need to pay attention. Any *string* class that wants to conform to the string_algo interface needs to satisfy the const_string one.
And back in the real world users use all kinds of proprietary string types: CString, UnicodeString, AnsiString, WideString, need I go on? Users *have* to use these types to interact with their native GUIs and other class libraries; having a string-type-neutral algorithm library allows them to interact with all of these without copying data all the time. Oh, and what about memory-mapped files, should we be able to support those as well?
IMO all these non-standard string classes are beside the point. It's maybe OK for a Boost library to try to accommodate everything. But String Algorithms as part of the STL should define some specific concepts that users' strings are expected to support, and should be primarily oriented to work nicely with the standard string.
Will you please provide us with some information where the current implementation does *not* work nicely with the standard string? Regards, Pavol
On Wed, 12 Oct 2005 13:58:10 -0400, Gennadiy Rozental wrote
"John Maddock"
wrote in message news:01ab01c5cf13$49900270$fc220d52@fuji... There are already several others (FlexiString, sgi's rope) and several are already
None of them is standard, so no need to pay attention. Any *string* class that wants to conform to the string_algo interface needs to satisfy the const_string one.
And back in the real world users use all kinds of proprietary string types: CString, UnicodeString, AnsiString, WideString, need I go on? Users *have* to use these types to interact with their native GUIs and other class libraries; having a string-type-neutral algorithm library allows them to interact with all of these without copying data all the time. Oh, and what about memory-mapped files, should we be able to support those as well?
IMO all these non-standard string classes are beside the point. It's maybe OK for a Boost library to try to accommodate everything. But String Algorithms as part of the STL should define some specific concepts that users' strings are expected to support, and should be primarily oriented to work nicely with the standard string.
I disagree strongly. Surely it would have been easier to specify Regex for basic_string only -- but John didn't, and the committee accepted it. There are really good reasons why users might not use standard string types, and we want to do the same things to these strings as we do with std strings. Remember the interface to this library is very natural:
  std::string str1("HeLlO WoRld!");
  std::to_upper(str1); // str1=="HELLO WORLD!"
  MyFancyString str2("HeLlO WoRld!");
  std::to_upper(str2); // str2=="HELLO WORLD!"
Makes no sense to me to exclude the latter capability... Jeff
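A rough sketch of how one function template could serve both calls in the message above, assuming only that the string type exposes begin() and end(). The function name to_upper_sketch and this particular MyFancyString are invented for the example; this is not the proposed std::to_upper, and it applies plain C toupper rules one character at a time.

```cpp
#include <cctype>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical user-defined string: all it offers the algorithm is begin()/end().
class MyFancyString {
public:
    explicit MyFancyString(const char* s) { while (*s) chars_.push_back(*s++); }
    std::vector<char>::iterator begin() { return chars_.begin(); }
    std::vector<char>::iterator end() { return chars_.end(); }
    std::string str() const { return std::string(chars_.begin(), chars_.end()); }
private:
    std::vector<char> chars_;
};

// Sketch of a range-based, in-place upper-casing algorithm.
template <typename Range>
void to_upper_sketch(Range& r) {
    for (auto it = std::begin(r); it != std::end(r); ++it) {
        // Cast through unsigned char: std::toupper is undefined for negative values.
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    }
}
```

Nothing in to_upper_sketch mentions std::string, which is why the second call comes for free.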
Jeff Garland wrote:
std::string str1("HeLlO WoRld!"); std::to_upper(str1); // str1=="HELLO WORLD!"
Opening the to_upper can of worms may not be a good idea. ;-)
  std::string str1("weiß");
  std::to_upper(str1); // str1=="WEISS"?
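The can of worms is easy to reproduce with a naive implementation. The sketch below assumes the string is stored as single-byte Latin-1 (so ß is the byte 0xDF) and that the program runs in the default "C" locale; naive_upper is a hypothetical name. Because it maps one character to one character, it can never grow "weiß" into the five-letter "WEISS".

```cpp
#include <cctype>
#include <string>

// Naive per-character upper-casing, in the spirit of 'just does what C does'.
inline std::string naive_upper(std::string s) {
    for (char& c : s) {
        // One input character always yields exactly one output character.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```

With the input "wei\xDF" the result still has four characters, so a correct German upper-casing would need an interface that is allowed to change the length of the string.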
On Sat, 15 Oct 2005 19:30:15 +0300, Peter Dimov wrote
Jeff Garland wrote:
std::string str1("HeLlO WoRld!"); std::to_upper(str1); // str1=="HELLO WORLD!"
Opening the to_upper can of worms may not be a good idea. ;-) std::string str1("weiß"); std::to_upper(str1); // str1=="WEISS"?
Actually, I think we should -- even if it 'just does what C does'. I don't want to have to adapt my C++ back to C every time I need to uppercase a string. And now for a small bit of humor (found on the Web, see P-38 Can Opener):
CAN OPENER DIRECTIONS
Open blade. Place opener as shown in diagram. Twist down to puncture slot in can top inside rim. Cut top by advancing opener with rocking motion. Take small bites.
I especially like the last bit of advice in the directions ;-) Jeff
Peter Dimov wrote:
Jeff Garland wrote:
std::string str1("HeLlO WoRld!"); std::to_upper(str1); // str1=="HELLO WORLD!"
Opening the to_upper can of worms may not be a good idea. ;-)
std::string str1("weiß"); std::to_upper(str1); // str1=="WEISS"?
You could have string::string() grab the locale() :D
On 10/12/05 5:38 AM, "John Maddock"
And for those who think that in-place modification is the way to go: you do realise that it's often *slower* than building a new string don't you?
And what if the string uses more than half of the available space? (I'm not advocating in-place modification, but just mentioning that "return a copy" has its flaws too.) -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
And what if the string uses more than half of the available space? (I'm not advocating in-place modification, but just mentioning that "return a copy" has its flaws too.)
Then you're probably going to have to start paging to file and other complicated things; it's not an easy one to solve. But a good point nonetheless. John.
On 10/14/05 5:35 AM, "John Maddock"
And what if the string uses more than half of the available space? (I'm not advocating in-place modification, but just mentioning that "return a copy" has its flaws too.)
Then you're probably going to have to start paging to file and other complicated things; it's not an easy one to solve. But a good point nonetheless.
I was originally going to write "memory space," but I left it out since I realized that disk space could run low too. Maybe I should have been explicit, to block your "just use virtual memory/paging" save. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com
On Tue, 11 Oct 2005 15:59:23 -0400, "Gennadiy Rozental"
Well, one thing I use it for is parsing HTTP directly in the read buffer, which is a vector. If the interfaces weren't generic, I'd either have to write my own functions to duplicate the functionality, or I'd have to copy the incoming data to a string. The first seems silly, and the latter would have unacceptable overhead in my case. The HTTP I'm parsing is streaming video from several dozen cameras at once, so I have to work with the buffers directly.
I also use this library on plain old C strings, which wouldn't be possible if it were locked to basic_string.
Some changes may make sense, but I really like the way it is now.
The way I see it, all string algorithms should be using a class like const_string in their interfaces.
Maybe I'm misunderstanding you, but are you suggesting that no string algorithms should modify a string in-place? That would be unacceptable to me. I wouldn't use such a library. -- Be seeing you.
"Thore Karlsen"
On Tue, 11 Oct 2005 15:59:23 -0400, "Gennadiy Rozental"
wrote: Well, one thing I use it for is parsing HTTP directly in the read buffer, which is a vector. If the interfaces weren't generic, I'd either have to write my own functions to duplicate the functionality, or I'd have to copy the incoming data to a string. The first seems silly, and the latter would have unacceptable overhead in my case. The HTTP I'm parsing is streaming video from several dozen cameras at once, so I have to work with the buffers directly.
I also use this library on plain old C strings, which wouldn't be possible if it were locked to basic_string.
Some changes may make sense, but I really like the way it is now.
The way I see it, all string algorithms should be using a class like const_string in their interfaces.
Maybe I'm misunderstanding you, but are you suggesting that no string algorithms should modify a string in-place? That would be unacceptable to me. I wouldn't use such a library.
Actually, what I had in mind is similar to the basic_cstring class I am using in Boost.Test, which supports both const and mutating versions. But most/many of the string algorithms would use the const version. Gennadiy
String-algo : Interested, but concern over interface and choice of functions, generic vs basic_string.5.3 separate proposal
Any details on this? I love this library, and I would love to see it standardized in some form. Overall I've found the interface and function choices to be excellent (and I'd love to see even more), and I love that it's generic and not limited to basic_string. I use it on vectors and other containers for different kinds of network protocol parsing.
Thorsten Ottosen acted as champion for the paper, but I'll try to recall the discussion.
The concern was that at least some of the algorithms were only useful in the context of strings, and so it would be an over-generalization to supply them as free algorithms.
My point exactly. I believe the string_algo library went the (completely?) wrong way. It should never use iterator parameterization, only char type. If there exists any algorithm that is useful beyond strings, it doesn't belong in this library. Gennadiy
On Mon, 10 Oct 2005 10:46:35 -0500, Thore Karlsen wrote
On Sun, 9 Oct 2005 21:32:30 -0400, "Beman Dawes"
wrote: [...]
String-algo : Interested, but concern over interface and choice of functions, generic vs basic_string.5.3 separate proposal
Any details on this? I love this library, and I would love to see it standardized in some form. Overall I've found the interface and function choices to be excellent (and I'd love to see even more), and I love that it's generic and not limited to basic_string. I use it on vectors and other containers for different kinds of network protocol parsing.
My take on the discussion is that the concerns can be overcome with some additional selling to the committee. As Beman indicated, concern was raised and some discussion happened, but the time for discussion was pretty short (~30 minutes). Both Thorsten and I gave some explanations, but we didn't really have a crisp prepared response for the particular issues raised. In particular, what I wish we had had more time to do is systematically step through the various algorithms with a justification for each. For example:
  trim_right ---> see Perl chomp, see Icon trim, RWCString strip()
  replace ---> see Perl s///, see Icon replace, RWCString replace()
  etc...
(For those that don't know, Icon is a programming language descended from SNOBOL4 -- both are known for particularly good string processing capabilities.) Bottom line is that I think most of these can be easily justified with a bit of research into existing practice.
As for the basic_string versus generic discussion, I have to say, there were some really well known generic programmers in the room that were shockingly in favor of adding these to basic_string directly. The basic argument was to make things more usable for 'normal' programmers -- essentially to serve the 95% case. To me, though, the generic algorithms are far superior because I want to use the algorithms with new string types (utfstring, boost::fixed_string, sgi::ropes, boost::const_string, etc). And I think that with the range interfaces in this library the function interface is very easy for beginners to use. Again, I think some follow-up explanation with clear examples of the advantages might change minds.
Finally, I think there needs to be a clear presentation of why all the parts of the library go together. That is, without the classifiers part, the algorithms aren't easy to use, etc... Jeff
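As one possible illustration of why the classifiers and the algorithms belong together, here is a small standard-library-only sketch of a right trim driven by a predicate, in both in-place and copying forms. The names is_space_sketch, trim_right_if_sketch and trim_right_copy_if_sketch are hypothetical and only loosely modeled on the library's naming; the real interfaces are range-based and more general than this std::string-only version.

```cpp
#include <cctype>
#include <string>

// Hypothetical classifier: a predicate deciding which characters get stripped.
struct is_space_sketch {
    bool operator()(char c) const {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
};

// In-place right trim: drop trailing characters for which pred is true.
template <typename Pred>
void trim_right_if_sketch(std::string& s, Pred pred) {
    std::string::size_type n = s.size();
    while (n > 0 && pred(s[n - 1])) {
        --n;
    }
    s.erase(n);
}

// Copying variant: the argument is taken by value and returned trimmed.
template <typename Pred>
std::string trim_right_copy_if_sketch(std::string s, Pred pred) {
    trim_right_if_sketch(s, pred);
    return s;
}
```

With a whitespace classifier this behaves like a chomp-style trim, while passing a different predicate (for example one matching '.') reuses the same algorithm unchanged.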
participants (9)
- Beman Dawes
- Daryle Walker
- Gennadiy Rozental
- Jeff Garland
- John Maddock
- Pavol Droba
- Peter Dimov
- Simon Buchan
- Thore Karlsen