Hi, I am aware that string_algo has an if_iequal functor. Is there also an equivalent for a less-than comparison? I have to compare strings ignoring the case but taking locales into account. Right now I'm calling boost::algorithm::to_lower on two copies of the original strings. I'm not so sure if this is very efficient when I'm comparing thousands of strings. Here's the functor as it is now: struct FileLessNameA : std::binary_function<const File&,const File&,bool> { bool operator() (const File& lhs, const File& rhs) const { std::string path1( lhs.get_name().c_str() ); std::string path2( rhs.get_name().c_str() ); boost::algorithm::to_lower( path1 ); boost::algorithm::to_lower( path2 ); return path1 < path2; } }; Any suggestions for improvements appreciated :) Regards, Matthias -- Matthias Kaeppler
Hi, On Sat, Jul 16, 2005 at 03:02:11PM +0200, Matthias Kaeppler wrote:
Hi,
I am aware that string_algo has an if_iequal functor. Is there also an equivalent for a less-than comparison? I have to compare strings ignoring the case but taking locales into account. Right now I'm calling boost::algorithm::to_lower on two copies of the original strings. I'm not so sure if this is very efficient when I'm comparing thousands of strings.
Here's the functor as it is now:
struct FileLessNameA : std::binary_function<const File&,const File&,bool> { bool operator() (const File& lhs, const File& rhs) const { std::string path1( lhs.get_name().c_str() ); std::string path2( rhs.get_name().c_str() ); boost::algorithm::to_lower( path1 ); boost::algorithm::to_lower( path2 ); return path1 < path2; } };
There is no such a function in string_algo lib. But you can easily use std::lexicographical_compare. There is a variant with a custom comparison predicate. For that you can use boost::is_iequal defined in boost/algorithm/string/compare.hpp. Best regards, Pavol.
Hi Pavol,
There is no such a function in string_algo lib. But you can easily use std::lexicographical_compare.
There is a variant with a custom comparison predicate. For that you can use boost::is_iequal defined in boost/algorithm/string/compare.hpp.
According to this paper (http://lafstern.org/matt/col2_new.pdf) by Matt Austern, using lexicographical_compare (alone) is not a good idea, because it doesn't take locales into account and thus case conversion may be incorrect. He proposed a solution in this article, but since Matt Austern has already contributed to Boost, I thought this functor might already be part of the Boost libraries. -- Matthias Kaeppler
On Sat, Jul 16, 2005 at 08:11:45PM +0200, Matthias Kaeppler wrote:
Hi Pavol,
There is no such a function in string_algo lib. But you can easily use std::lexicographical_compare.
There is a variant with a custom comparison predicate. For that you can use boost::is_iequal defined in boost/algorithm/string/compare.hpp.
According to this paper (http://lafstern.org/matt/col2_new.pdf) by Matt Austern, using lexicographical_compare (alone) is not a good idea, because it doesn't take locales into account and thus case conversion may be incorrect.
He proposed a solution in this article, but since Matt Austern has already contributed to Boost, I thought this functor might already be part of the Boost libraries.
Well I dare to say that the solution I have proposed to you is correct. I'm fully aware that the case-conversion is locale awere. If you have a look at the boost::is_iequal predicate, you will see that it takes a locale as an agrument. If you provide correct locales, the comparison will be correct as well. Actualy it is exactly the solution proposed in the paper you mentioned. Best regards, Pavol.
On Sat, 16 Jul 2005 18:46:33 +0200, Pavol Droba <droba@topmail.sk> wrote:
I am aware that string_algo has an if_iequal functor. Is there also an equivalent for a less-than comparison? I have to compare strings ignoring the case but taking locales into account. Right now I'm calling boost::algorithm::to_lower on two copies of the original strings. I'm not so sure if this is very efficient when I'm comparing thousands of strings.
Here's the functor as it is now:
struct FileLessNameA : std::binary_function<const File&,const File&,bool> { bool operator() (const File& lhs, const File& rhs) const { std::string path1( lhs.get_name().c_str() ); std::string path2( rhs.get_name().c_str() ); boost::algorithm::to_lower( path1 ); boost::algorithm::to_lower( path2 ); return path1 < path2; } };
There is no such a function in string_algo lib. But you can easily use std::lexicographical_compare.
There is a variant with a custom comparison predicate. For that you can use boost::is_iequal defined in boost/algorithm/string/compare.hpp.
But is_iequal only tests for equality, it doesn't provide an ordering of the characters. He's looking for something like is_iless, which I think would be a nice addition too string_algo. -- Be seeing you.
Thore Karlsen wrote:
But is_iequal only tests for equality, it doesn't provide an ordering of the characters. He's looking for something like is_iless, which I think would be a nice addition too string_algo.
That's exactly what I mean. Calling lexicographical_compare with is_iequal as the predicate would return true if they are, well, equal. That's not what I need. I need an ordering predicate. -- Matthias Kaeppler
On Sat, Jul 16, 2005 at 06:17:28PM -0500, Thore Karlsen wrote:
On Sat, 16 Jul 2005 18:46:33 +0200, Pavol Droba <droba@topmail.sk> wrote:
I am aware that string_algo has an if_iequal functor. Is there also an equivalent for a less-than comparison? I have to compare strings ignoring the case but taking locales into account. Right now I'm calling boost::algorithm::to_lower on two copies of the original strings. I'm not so sure if this is very efficient when I'm comparing thousands of strings.
Here's the functor as it is now:
struct FileLessNameA : std::binary_function<const File&,const File&,bool> { bool operator() (const File& lhs, const File& rhs) const { std::string path1( lhs.get_name().c_str() ); std::string path2( rhs.get_name().c_str() ); boost::algorithm::to_lower( path1 ); boost::algorithm::to_lower( path2 ); return path1 < path2; } };
There is no such a function in string_algo lib. But you can easily use std::lexicographical_compare.
There is a variant with a custom comparison predicate. For that you can use boost::is_iequal defined in boost/algorithm/string/compare.hpp.
But is_iequal only tests for equality, it doesn't provide an ordering of the characters. He's looking for something like is_iless, which I think would be a nice addition too string_algo.
I see. Sorry about that. Somehow I have overlooked this obvious aspect. You are right, such a function can be usefull. I will add it (but it will have to wait after the release) Regards, Pavol
Pavol Droba wrote:
You are right, such a function can be usefull. I will add it (but it will have to wait after the release)
That's the spirit! Thanks a lot Pavol. If I can help you with anything, just drop me a mail at matthias_at_digitalraid_dot_com. Best regards, Matthias -- Matthias Kaeppler
Hm, by the way (this is more a general question): Since it will obviously take some time for this functor to make it into boost officially, what would you recommend for me to do now? I just edited compare.hpp already to feature an is_iless functor (no big deal) and it works just fine. But using a self-modified version of a library file in my program is not such a good idea right? If you were me, would you put that functor into a separate file until it's officialy in boost, or just edit the original boost file in /usr/include/boost/<...> and "pretend" it would already be in the library? -- Matthias Kaeppler
On Sun, Jul 17, 2005 at 02:23:11PM +0200, Matthias Kaeppler wrote:
Hm, by the way (this is more a general question):
Since it will obviously take some time for this functor to make it into boost officially, what would you recommend for me to do now? I just edited compare.hpp already to feature an is_iless functor (no big deal) and it works just fine. But using a self-modified version of a library file in my program is not such a good idea right?
If you were me, would you put that functor into a separate file until it's officialy in boost, or just edit the original boost file in /usr/include/boost/<...> and "pretend" it would already be in the library?
It is your code tree, so it is purely up to you to make decision. But I would suggest to make your functor unrelated to a boost tree and potentialy to replace it in the future with the boost version. This way you have the best probability to avoid conflicts. Regards, Pavol
"Pavol Droba" <droba@topmail.sk> wrote in message news:20050717105715.GM17311@lenin.felcer.sk...
On Sat, Jul 16, 2005 at 06:17:28PM -0500, Thore Karlsen wrote:
On Sat, 16 Jul 2005 18:46:33 +0200, Pavol Droba <droba@topmail.sk> wrote: ...
But is_iequal only tests for equality, it doesn't provide an ordering of the characters. He's looking for something like is_iless, which I think would be a nice addition too string_algo.
I see. Sorry about that. Somehow I have overlooked this obvious aspect.
You are right, such a function can be usefull. I will add it (but it will have to wait after the release)
Any thoughts on carrying this one step further. I've the case where I need not just case insensitivity, but more of an interleaved case sensitivity. For example using the following: int CompareInterleavedCaseChar( char a1, char a2 ) { if( std::toupper(a1) < std::toupper(a2) ) return -1; else if( std::toupper(a2) < std::toupper(a1) ) return 1; else if( a1 < a2 ) return -1; else if( a2 < a1 ) return 1; else return 0; } results in the following ordering: ABCdef ABcDef AbcDef Abcdef in other words A < a < B < b < C ... Thanks, Jeff Flinn
Pavol Droba wrote:
There is no such a function in string_algo lib. But you can easily use std::lexicographical_compare.
There is a variant with a custom comparison predicate. For that you can use boost::is_iequal defined in boost/algorithm/string/compare.hpp. But is_iequal only tests for equality, it doesn't provide an ordering of the characters. He's looking for something like is_iless, which I think would be a nice addition too string_algo.
I see. Sorry about that. Somehow I have overlooked this obvious aspect.
You are right, such a function can be usefull. I will add it (but it will have to wait after the release)
Hi Pavol, What's the status of this feature? I've been looking for it too. :) Olaf
On Thu, Mar 02, 2006 at 09:41:17PM +0100, Olaf van der Spek wrote:
Pavol Droba wrote:
There is no such a function in string_algo lib. But you can easily use std::lexicographical_compare.
There is a variant with a custom comparison predicate. For that you can use boost::is_iequal defined in boost/algorithm/string/compare.hpp. But is_iequal only tests for equality, it doesn't provide an ordering of the characters. He's looking for something like is_iless, which I think would be a nice addition too string_algo.
I see. Sorry about that. Somehow I have overlooked this obvious aspect.
You are right, such a function can be usefull. I will add it (but it will have to wait after the release)
Hi Pavol,
What's the status of this feature? I've been looking for it too. :)
Hi, The functionality is in cvs. I have added lexicographical_compare (also in i variant) and comparison predicates is_(i)less, is_not_(i)greater. Check it out. Regards, Pavol
Pavol Droba wrote:
The functionality is in cvs. I have added lexicographical_compare (also in i variant) and comparison predicates is_(i)less, is_not_(i)greater.
Check it out.
struct is_not_greater { template< typename T1, typename T2 > bool operator()( const T1& Arg1, const T2& Arg2 ) const { return Arg1>=Arg2; } };
=, isn't that is_not_less?
struct is_iless { is_iless( const std::locale& Loc=std::locale() ) : m_Loc( Loc ) {} template< typename T1, typename T2 > bool operator()( const T1& Arg1, const T2& Arg2 ) const { #if defined(__BORLANDC__) && (__BORLANDC__ >= 0x560) && (__BORLANDC__ <= 0x564) && !defined(_USE_OLD_RW_STL) return std::toupper(Arg1)<std::toupper(Arg2); #else return std::toupper(Arg1,m_Loc)<std::toupper(Arg2,m_Loc); #endif } } What about 'A' vs 'a'? Shouldn't 'A' < 'a'? I'm also wondering how expensive std::toupper is. Would it make sense to do a == first to avoid the std::toupper calls in certain cases?
Hi, On Fri, Mar 03, 2006 at 02:42:14PM +0100, Olaf van der Spek wrote:
Pavol Droba wrote:
The functionality is in cvs. I have added lexicographical_compare (also in i variant) and comparison predicates is_(i)less, is_not_(i)greater.
Check it out.
struct is_not_greater { template< typename T1, typename T2 > bool operator()( const T1& Arg1, const T2& Arg2 ) const { return Arg1>=Arg2; } };
=, isn't that is_not_less?
Oh, this is what happens when the regression tests are not complete. Thanks for notice, you are right of couse. The operator should be reversed.
struct is_iless { is_iless( const std::locale& Loc=std::locale() ) : m_Loc( Loc ) {}
template< typename T1, typename T2 > bool operator()( const T1& Arg1, const T2& Arg2 ) const { #if defined(__BORLANDC__) && (__BORLANDC__ >= 0x560) && (__BORLANDC__ <= 0x564) && !defined(_USE_OLD_RW_STL) return std::toupper(Arg1)<std::toupper(Arg2); #else return std::toupper(Arg1,m_Loc)<std::toupper(Arg2,m_Loc); #endif } }
What about 'A' vs 'a'? Shouldn't 'A' < 'a'?
Hmm, interesting point. But it is not fully correct. Take this example: str1="Aab" str2="aAa" in my opinion following should hold str1>str2, however if 'A' < 'a' then the comparison would yield the oposite. Such a comparison make sense only if both strings are equal (except the case). Current implementation of ilexicographical_compare does not perform this extra step since it tries to be compliant with std::lexicographical_compare.
I'm also wondering how expensive std::toupper is. Would it make sense to do a == first to avoid the std::toupper calls in certain cases?
Well, this is a more complicated point. Small optimization you are proposing does not really make any difference. I have read a paper (I don't remember where), where author proposes to cache the results of tolower during the comparison. This approch is complex and it effectively disables the possibility to use user-defined comparison predicates. So I decided to go for a simple solution. Best regards, Pavol
On 3/3/06, Pavol Droba <droba@topmail.sk> wrote:
What about 'A' vs 'a'? Shouldn't 'A' < 'a'?
Hmm, interesting point. But it is not fully correct. Take this example:
str1="Aab" str2="aAa"
in my opinion following should hold str1>str2, however if 'A' < 'a' then the comparison would yield the oposite. Such a comparison make sense only if both strings are equal (except the case).
True, I only realized that later.
Current implementation of ilexicographical_compare does not perform this extra step since it tries to be compliant with std::lexicographical_compare.
I'm also wondering how expensive std::toupper is. Would it make sense to do a == first to avoid the std::toupper calls in certain cases?
Well, this is a more complicated point. Small optimization you are proposing does not really make any difference. I have read a paper (I don't remember where),
Why not? Isn't a simple compare much faster than two table lookup?
where author proposes to cache the results of tolower during the comparison.
I'm not sure how that would work, could you explain?
This approch is complex and it effectively disables the possibility to use user-defined comparison predicates. So I decided to go for a simple solution.
Best regards, Pavol _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
On Fri, Mar 03, 2006 at 11:29:58PM +0100, Olaf van der Spek wrote:
Well, this is a more complicated point. Small optimization you are proposing does not really make any difference. I have read a paper (I don't remember where),
Why not? Isn't a simple compare much faster than two table lookup?
Ok, maybe you are right. One compare will not cost much and it can speedup few cases.
where author proposes to cache the results of tolower during the comparison.
I'm not sure how that would work, could you explain?
I don't remember the details. The idea was to eliminate some of the virual calls, that take place when dealing with locales. The algorihtm was described on several pages. Unfortunaltely, I forgot where I have seen that paper. Regards, Pavol
Pavol Droba <droba@topmail.sk> wrote in news:20060307091753.GU793@lenin.felcer.sk:
On Fri, Mar 03, 2006 at 11:29:58PM +0100, Olaf van der Spek wrote:
Well, this is a more complicated point. Small optimization you are proposing does not really make any difference. I have read a paper (I don't remember where),
Why not? Isn't a simple compare much faster than two table lookup?
Ok, maybe you are right. One compare will not cost much and it can speedup few cases.
where author proposes to cache the results of tolower during the comparison.
I'm not sure how that would work, could you explain?
I don't remember the details. The idea was to eliminate some of the virual calls, that take place when dealing with locales. The algorihtm was described on several pages. Unfortunaltely, I forgot where I have seen that paper.
Regards, Pavol
I believe what you are looking for is found in Effective STL by Scott Meyers (pages 229-237). On page 236-237 he writes a case insensitive string comparison functor that has this optimization, here it is: struct lt_str_2 : public std::binary_function<std::string, std::string, bool> { struct lt_char { const char* tab; lt_char(const char* t) : tab(t) {} bool operator()(char x, char y) const { return tab[x-CHAR_MIN] < tab[y-CHAR_MIN]; } }; char tab[CHAR_MAX - CHAR_MIN + 1]; lt_str_2(const std::locale& L = std::locale::classic()) { const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(L); for (int i=CHAR_MIN; i<=CHAR_MAX; ++i) tab[i-CHAR_MIN] = (char)i; ct.toupper(tab, tab + (CHAR_MAX - CHAR_MIN + 1)); } bool operator()(const std::string& x, const std::string& y) const { return std::lexicographical_compare(x.begin(), x.end(), y.begin(), y.end(), lt_char(tab)); } }; He then talks about the pros and cons of this. "This isn't a fully general solution ..., if your were working with 32-bit UCS-4 characters." "This might be slower if you're creating an lt_str_2 function object, using it to compare a few short string, and then throwing it away." "For any substantial use, though, lt_str_2 will be noticeable faster" I hope that I am allowed to reproduce this. If I am not, I humbly ask Scott Meyers for forgiveness. Andy
participants (7)
-
Andy
-
Jeff Flinn
-
Matthias Kaeppler
-
Olaf van der Spek
-
Olaf van der Spek
-
Pavol Droba
-
Thore Karlsen