
Hello, I was looking on Google for a good means of dealing with UTF encoded text files, and I ran across the old postings about Alberto Barbati's UTF library for Boost. While I know that the old files are still around, is there any progress being made on this front? Or is this a dead-end? Because it's exactly what I was looking for! -Owen Anderson

I would recommend the IBM ICU library[1] as I believe it is currently the best C++ Unicode support library. Its interface does not match that of Boost libraries or the C++ Standard library, but I don't believe there are any decent Unicode libraries with a better interface. There has been some discussion on this list regarding creating a Boost Unicode library, but currently such a Boost library does not exist.
[1] http://www-306.ibm.com/software/globalization/icu/index.jsp
-- Jeremy Maitin-Shepard

I would also recommend ICU, and point out that Boost.Regex will now work with ICU and handle UTF-8, UTF-16, or UTF-32 input via some new "Unicode aware" interfaces. There are also some unofficial iterator adapters in boost/regex/pending/unicode_iterator.hpp that convert between the various Unicode encodings "on the fly": for example, they can be used to make a UTF-8 sequence appear "as if" it were a UTF-32 sequence, and so on. HTH, John.
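To make the "on the fly" idea concrete, here is a minimal stand-alone sketch of such an adapter. It is not the code from boost/regex/pending/unicode_iterator.hpp: the class name utf8_code_point_iterator and the helper decode_all are hypothetical, it uses only the standard library, and it assumes the input is already well-formed UTF-8 (no validation, no handling of truncated sequences).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical adapter (not the Boost one): presents a UTF-8 std::string
// as a sequence of UTF-32 code points, decoding lazily on dereference.
// Assumption: the underlying bytes are well-formed UTF-8.
class utf8_code_point_iterator {
public:
    utf8_code_point_iterator(const std::string& s, std::size_t pos)
        : s_(&s), pos_(pos) {}

    // Decode the code point that starts at the current byte position.
    char32_t operator*() const {
        const unsigned char lead = byte(pos_);
        const std::size_t len = length(lead);
        // Strip the length-marker bits from the lead byte.
        char32_t cp = (len == 1) ? lead : (lead & (0x7Fu >> len));
        // Each continuation byte contributes its low six bits.
        for (std::size_t i = 1; i < len; ++i)
            cp = (cp << 6) | (byte(pos_ + i) & 0x3Fu);
        return cp;
    }

    // Advance by one code point, i.e. by one to four bytes.
    utf8_code_point_iterator& operator++() {
        pos_ += length(byte(pos_));
        return *this;
    }

    bool operator!=(const utf8_code_point_iterator& other) const {
        return pos_ != other.pos_;
    }

private:
    unsigned char byte(std::size_t i) const {
        return static_cast<unsigned char>((*s_)[i]);
    }

    // Sequence length implied by the lead byte.
    static std::size_t length(unsigned char lead) {
        if (lead < 0x80u) return 1;
        if ((lead & 0xE0u) == 0xC0u) return 2;
        if ((lead & 0xF0u) == 0xE0u) return 3;
        assert((lead & 0xF8u) == 0xF0u);  // debug-only check for a 4-byte lead
        return 4;
    }

    const std::string* s_;
    std::size_t pos_;
};

// Convenience helper: run the adapter over a whole string.
inline std::u32string decode_all(const std::string& utf8) {
    std::u32string out;
    const utf8_code_point_iterator end(utf8, utf8.size());
    for (utf8_code_point_iterator it(utf8, 0); it != end; ++it)
        out.push_back(*it);
    return out;
}
```

Code that expects UTF-32 can then walk the iterator directly, without first copying the whole text into a second buffer; decode_all is only there to show the adapter in use.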

There has been a good deal of talk about a Unicode library, but I haven't followed it much. I've been using http://dev.int64.org/snips/utf8.hpp for dealing with UTF-8. (Caveat: it expects wchar_t to be UTF-16 or UTF-32.)
-- Cory Nelson http://www.int64.org

On Sat, 13 Aug 2005 06:15:02 +0400, Cory Nelson <phrosty@gmail.com> wrote:
There has been a good deal of talk on a Unicode library, but I haven't followed it much. I've been using http://dev.int64.org/snips/utf8.hpp for dealing with UTF-8. (caveat: expects wchar_t to be UTF-16 or UTF-32)
I've just checked out the source file and spotted:
ret=( (((wchar_t)((*iter++)&0x0F)) << 12) | (((wchar_t)((*iter++)&0x3F)) << 6) | ((wchar_t)((*iter++)&0x3F)) );
Aren't these increments within a single expression a source of UB, since there are no sequence points between them?
-- Maxim Yegorushkin

"Maxim Yegorushkin" <maxim.yegorushkin@gmail.com> writes:
I've just checked out the source file and spotted:
ret=( (((wchar_t)((*iter++)&0x0F)) << 12) | (((wchar_t)((*iter++)&0x3F)) << 6) | ((wchar_t)((*iter++)&0x3F)) );
Aren't these increments in the expression source of UB because there are no sequence points between them?
Yes. -- Dave Abrahams Boost Consulting www.boost-consulting.com
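One way to avoid the problem is to read each byte in its own statement, so that every increment is complete before the next one begins. The following is only a sketch of that idea, not a patch to utf8.hpp: the function name decode_three_byte is hypothetical, the result type is std::uint32_t rather than wchar_t, and the caller is assumed to have checked that three well-formed bytes are available.

```cpp
#include <cassert>
#include <cstdint>

// Decode one three-byte UTF-8 sequence (lead byte 1110xxxx) starting at iter,
// leaving iter just past the sequence.
// Each *iter++ sits in a separate full expression, so the reads happen in a
// well-defined order, unlike the single-expression version quoted above.
// Assumption: three well-formed bytes are available at iter.
template <typename Iter>
std::uint32_t decode_three_byte(Iter& iter)
{
    const unsigned char lead = static_cast<unsigned char>(*iter++);
    assert((lead & 0xF0u) == 0xE0u);  // debug-only check of the lead byte

    const std::uint32_t b0 = lead & 0x0Fu;                               // top 4 bits
    const std::uint32_t b1 = static_cast<unsigned char>(*iter++) & 0x3Fu; // middle 6 bits
    const std::uint32_t b2 = static_cast<unsigned char>(*iter++) & 0x3Fu; // low 6 bits

    return (b0 << 12) | (b1 << 6) | b2;
}
```

The shifts and masks are the same as in the quoted line; only the sequencing of the three reads changes.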

On 8/13/05, Owen Anderson <resistor@mac.com> wrote:
I was looking on Google for a good means of dealing with UTF encoded text files ...
Work on a Unicode library is progressing. boost/detail/utf8_codecvt_facet.hpp and libs/detail/utf8_codecvt_facet.cpp (available from your local Boost distribution) look OK for what you want. Regards, Rogier
participants (7)
- Cory Nelson
- David Abrahams
- Jeremy Maitin-Shepard
- John Maddock
- Maxim Yegorushkin
- Owen Anderson
- Rogier van Dalen