Hello!
When comparing the UTF-8 string pairs below using Boost.Locale (any
backend) at the "identical" level (accents are relevant) with a UTF-8
locale (I tried de_DE.utf-8) on Debian Testing (Boost 1.49), I get a
result that does not make sense to me.
"Muller" is considered less than "Müller" (as expected), but "Muller 2"
is considered greater than "Müller 1", even though the names alone
compare the other way round.
Do I have a bug in my code, in the underlying libraries or in my
expectations?
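#include <clocale>
#include <iostream>
#include <string>
#include <vector>

#include <boost/algorithm/string/join.hpp>
#include <boost/assign/list_of.hpp>
#include <boost/foreach.hpp>
#include <boost/locale.hpp>
#include <boost/tuple/tuple.hpp>

int main(int argc, char **argv)
{
    setlocale(LC_ALL, "");

    // global() returns a copy of the backend manager, so select the backend
    // on the copy and install it again as the global manager.
    boost::locale::localization_backend_manager mgr =
        boost::locale::localization_backend_manager::global();
    std::cout << "backends: " <<
        boost::join(mgr.get_all_backends(), ", ") << std::endl;
    mgr.select(argc > 2 ? argv[2] : "icu");
    boost::locale::localization_backend_manager::global(mgr);

    // First argument: locale name, second argument: backend name.
    std::locale loc = boost::locale::generator()(argc > 1 ? argv[1] : "de_DE.UTF-8");

    typedef boost::tuple<std::string, std::string> string_pair_t;
    std::vector<string_pair_t> pairs =
        boost::assign::tuple_list_of("Muller", "Müller")
                                    ("Muller 2", "Müller 1")
                                    ("Muller B", "Müller A");

    BOOST_FOREACH (const string_pair_t &pair, pairs) {
        const std::string &a = boost::get<0>(pair),
                          &b = boost::get<1>(pair);
        // Compare at the "identical" level, so accents are significant.
        int cmp = std::use_facet<boost::locale::collator<char> >(loc).
            compare(boost::locale::collator_base::identical, a, b);
        std::cout <<
            a << " and " << b <<
            " are " <<
            (cmp == 0 ? "identical" : "different") <<
            " (" <<
            (cmp < 0 ? '<' :
             cmp > 0 ? '>' : '=') <<
            ")" << std::endl;
    }
    return 0;
}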
The output on my system:
$ /tmp/mueller de_DE.utf-8 icu
backends: icu, posix, std
Muller and Müller are different (<)
Muller 2 and Müller 1 are different (>)
Muller B and Müller A are different (>)
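To make my expectation concrete, here is a minimal sketch of a single-pass comparison in which the first differing position decides the result. It does not use Boost.Locale at all; code_unit_sign is a hypothetical helper I made up for illustration, and it compares raw UTF-8 bytes rather than doing any locale-aware collation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Boost.Locale): compares two strings one
// byte at a time and returns -1, 0 or 1. The first differing byte decides.
int code_unit_sign(const std::string &a, const std::string &b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Compare as unsigned so that UTF-8 lead bytes (>= 0x80) sort after ASCII.
        const unsigned char ca = static_cast<unsigned char>(a[i]);
        const unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    // One string is a prefix of the other: the shorter one sorts first.
    if (a.size() == b.size())
        return 0;
    return a.size() < b.size() ? -1 : 1;
}
```

With this kind of comparison all three pairs from my test come out as "<", because the "u" versus "ü" difference is reached before the trailing "2"/"1" or "B"/"A". That is the consistency I expected from the collator as well.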
Bye, Patrick