[string_algo] ilexicographical_compare()/is_iless() bug?

I want to share a problem I ran into with ilexicographical_compare().

I think that ilexicographical_compare() should compare two strings in "alphabetical" order, that is, the order in which letters appear in the alphabet. But ilexicographical_compare() uses is_iless() to compare letters, and that comparison looks like this:

    std::toupper<T1>(Arg1,m_Loc)<std::toupper<T2>(Arg2,m_Loc);

So, in fact, letters are compared by their position in the character set, which does not always match alphabetical order. Examples are the Cyrillic letters "Io" and "i" (Unicode 0451 and 0456).

I think the right solution is to compare like this:

    T1 Ch1 = std::toupper<T1>(Arg1,m_Loc);
    T2 Ch2 = std::toupper<T2>(Arg2,m_Loc);
    return std::use_facet< std::collate<typename CharType> > (m_Loc).compare(&Ch1, &Ch1 + 1, &Ch2, &Ch2 + 1);
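A minimal sketch of the proposed comparison, written as a standalone functor so it can be tried without touching Boost. The names collate_iless and collate_ilexicographical_compare are hypothetical and are not part of Boost.StringAlgo. Since std::collate::compare() returns a negative, zero, or positive int rather than a bool, the sketch tests the result against zero. It also assumes both characters have the same type, and it still collates one character at a time, which is only an approximation of locale-aware string collation.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Hypothetical replacement for is_iless: upper-cases both characters, then
// orders them with the locale's collate facet instead of by code value.
template <typename CharT>
class collate_iless {
public:
    explicit collate_iless(const std::locale& loc = std::locale())
        : m_Loc(loc) {}

    bool operator()(CharT a, CharT b) const {
        const CharT ua = std::toupper(a, m_Loc);
        const CharT ub = std::toupper(b, m_Loc);
        // compare() yields -1, 0 or 1, so check the sign explicitly.
        return std::use_facet<std::collate<CharT> >(m_Loc)
                   .compare(&ua, &ua + 1, &ub, &ub + 1) < 0;
    }

private:
    std::locale m_Loc;
};

// Case-insensitive "less than" for whole strings, built on the functor above.
template <typename CharT>
bool collate_ilexicographical_compare(const std::basic_string<CharT>& lhs,
                                      const std::basic_string<CharT>& rhs,
                                      const std::locale& loc = std::locale()) {
    return std::lexicographical_compare(lhs.begin(), lhs.end(),
                                        rhs.begin(), rhs.end(),
                                        collate_iless<CharT>(loc));
}
```

To see a difference from the code-value comparison for the Cyrillic examples, the functor would need wide strings and a locale that actually provides the relevant collation rules; whether such a locale is installed depends on the platform.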

On Sun, Apr 8, 2012 at 3:46 PM, Dmitry Vinogradov <sraider@yandex.ru> wrote:
> I want to share a problem I ran into with ilexicographical_compare().
> I think that ilexicographical_compare() should compare two strings in "alphabetical" order, that is, the order in which letters appear in the alphabet. But ilexicographical_compare() uses is_iless() to compare letters, and that comparison looks like this:
>
>     std::toupper<T1>(Arg1,m_Loc)<std::toupper<T2>(Arg2,m_Loc);

I think this isn't right either. If you've got both "A" and "a", the order appears to be undefined.

> So, in fact, letters are compared by their position in the character set, which does not always match alphabetical order. Examples are the Cyrillic letters "Io" and "i" (Unicode 0451 and 0456).
> I think the right solution is to compare like this:
>
>     T1 Ch1 = std::toupper<T1>(Arg1,m_Loc);
>     T2 Ch2 = std::toupper<T2>(Arg2,m_Loc);
>     return std::use_facet< std::collate<typename CharType> > (m_Loc).compare(&Ch1, &Ch1 + 1, &Ch2, &Ch2 + 1);

The idea sounds reasonable.

Olaf
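One possible reading of the "A" versus "a" remark: both characters upper-case to the same value, so neither is less than the other. Under a strict weak ordering that makes them equivalent, and a sort using this comparison leaves their relative order unspecified (std::stable_sort would keep the input order). The sketch below illustrates that with a hand-written stand-in for the comparison quoted above; toupper_less and is_equivalent are hypothetical names, and the functor is not copied from the Boost sources.

```cpp
#include <cassert>
#include <locale>

// Stand-in for the comparison described in the thread: upper-case both
// arguments and compare the resulting code values.
struct toupper_less {
    explicit toupper_less(const std::locale& loc = std::locale())
        : m_Loc(loc) {}

    template <typename T1, typename T2>
    bool operator()(const T1& a, const T2& b) const {
        return std::toupper<T1>(a, m_Loc) < std::toupper<T2>(b, m_Loc);
    }

    std::locale m_Loc;
};

// Two values are equivalent under an ordering when neither is less than
// the other.
template <typename Less, typename T>
bool is_equivalent(const Less& less, const T& a, const T& b) {
    return !less(a, b) && !less(b, a);
}
```

With the classic locale, is_equivalent(toupper_less(std::locale::classic()), 'A', 'a') holds, while 'a' and 'b' are ordered as expected.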

On Sun, Apr 08, 2012 at 06:50:50PM +0200, Olaf van der Spek wrote:
> On Sun, Apr 8, 2012 at 3:46 PM, Dmitry Vinogradov <sraider@yandex.ru> wrote:
> > I want to share a problem I ran into with ilexicographical_compare().
> > I think that ilexicographical_compare() should compare two strings in "alphabetical" order, that is, the order in which letters appear in the alphabet. But ilexicographical_compare() uses is_iless() to compare letters, and that comparison looks like this:
> >
> >     std::toupper<T1>(Arg1,m_Loc)<std::toupper<T2>(Arg2,m_Loc);
>
> I think this isn't right either. If you've got both "A" and "a", the order appears to be undefined.
>
> > So, in fact, letters are compared by their position in the character set, which does not always match alphabetical order. Examples are the Cyrillic letters "Io" and "i" (Unicode 0451 and 0456).
> > I think the right solution is to compare like this:
> >
> >     T1 Ch1 = std::toupper<T1>(Arg1,m_Loc);
> >     T2 Ch2 = std::toupper<T2>(Arg2,m_Loc);
> >     return std::use_facet< std::collate<typename CharType> > (m_Loc).compare(&Ch1, &Ch1 + 1, &Ch2, &Ch2 + 1);

So how does all this interact with amusing locales like, say, Turkish, where the dottedness of i is preserved when casing, or with glyphs that do not have upper/lower-case forms?

--
Lars Viklund | zao@acc.umu.se
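A small sketch for probing those two cases, offered as an aid to experimentation rather than an answer. std::toupper() and std::tolower() return the character unchanged when the locale's ctype facet has no case mapping for it, so caseless glyphs pass through the upper-casing step untouched. The helper names has_case_mapping and locale_or_classic are hypothetical, and the locale name "tr_TR.UTF-8" used in the note below is an assumption about what the platform has installed.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// True when the locale's ctype facet maps ch to a different upper- or
// lower-case form; false for digits, punctuation and other caseless glyphs.
template <typename CharT>
bool has_case_mapping(CharT ch, const std::locale& loc = std::locale()) {
    return std::toupper(ch, loc) != ch || std::tolower(ch, loc) != ch;
}

// Constructs the named locale, falling back to the classic "C" locale when
// the name is not available on this system.
inline std::locale locale_or_classic(const std::string& name) {
    try {
        return std::locale(name.c_str());
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

Where a Turkish locale is available, std::toupper(L'i', locale_or_classic("tr_TR.UTF-8")) may yield the dotted capital U+0130 instead of L'I', in which case "i" and "I" would no longer compare as equal under any comparison built on upper-casing. The exact behaviour depends on the platform's locale data.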
participants (3): Dmitry Vinogradov, Lars Viklund, Olaf van der Spek