Idea to support unicode strings

27 Sep 2006

Hi,

I wrote a wrapper around John Maddock's Unicode iterators (thanks to Tomas for the hint).
It provides a std::string-like interface for accessing UTF-8, UTF-16 or UTF-32 encoded strings.

<example>
   utf8_string u8("unicode string"); // construct from a UTF-8 encoded char[]
   u8 += 0x0020; // append code points: U+0020 is a space
   u8 += 0x0391; // Alpha
   u8 += 0x0392; // Beta
   u8 += 0x0393; // Gamma
   std::cout << u8.raw() << std::endl; // access encoded string
   std::copy(u8.begin(), u8.end(), std::ostream_iterator<utf32_char>(std::cout, ", "));
   std::cout << std::endl;
   utf32_string u32=u8; // assign and convert to utf32;
   std::copy(u32.begin(), u32.end(), std::ostream_iterator<utf32_char>(std::cout, ", "));
   std::cout << std::endl;
</example>
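
To illustrate what an operation like `u8 += 0x0391` has to do internally, here is a minimal sketch of appending a single code point to a UTF-8 encoded std::string. The helper name `append_utf8` is hypothetical and is not taken from unicode.h; the wrapper itself delegates this work to the underlying Unicode iterators, so this is only an assumption about the general technique, not the actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of unicode.h): append one Unicode code
// point to a std::string that holds UTF-8 encoded data.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    // Surrogates and values above U+10FFFF are not valid scalar values.
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));

    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

With this helper, appending U+0020 and then U+0391 to an empty string yields the three bytes 0x20 0xCE 0x91: the space takes one byte and the Greek letter takes two.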

The wrapper can be extended to support additional encodings, such as Latin-1 or Windows-1252,
by providing the corresponding encode and decode iterators.
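
As a rough idea of what such an extension could look like, the sketch below shows a decode iterator and an encode function for Latin-1. The names `latin1_decode_iterator`, `decode_latin1` and `encode_latin1` are made up for illustration, and the interface that unicode.h actually expects from its encode and decode iterators may well differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical sketch (names are not from unicode.h): an input iterator
// adaptor that reads Latin-1 bytes and yields UTF-32 code points.
template <typename BaseIt>
class latin1_decode_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::uint32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::uint32_t*;
    using reference = std::uint32_t;

    explicit latin1_decode_iterator(BaseIt it) : it_(it) {}

    // Latin-1 byte values coincide with code points U+0000..U+00FF, so
    // decoding only needs to avoid sign extension of the char.
    std::uint32_t operator*() const { return static_cast<unsigned char>(*it_); }

    latin1_decode_iterator& operator++() { ++it_; return *this; }
    latin1_decode_iterator operator++(int) { auto tmp = *this; ++it_; return tmp; }

    bool operator==(const latin1_decode_iterator& o) const { return it_ == o.it_; }
    bool operator!=(const latin1_decode_iterator& o) const { return it_ != o.it_; }

private:
    BaseIt it_;
};

// Convenience function: decode a whole Latin-1 string into code points.
inline std::vector<std::uint32_t> decode_latin1(const std::string& bytes)
{
    using It = latin1_decode_iterator<std::string::const_iterator>;
    return std::vector<std::uint32_t>(It(bytes.begin()), It(bytes.end()));
}

// Encoding direction: returns false if the code point has no Latin-1
// representation, in which case `out` is left unchanged.
inline bool encode_latin1(std::uint32_t cp, char& out)
{
    if (cp > 0xFF) {
        return false;
    }
    out = static_cast<char>(cp);
    return true;
}
```

Windows-1252 would need more than this, because it assigns printable characters to the bytes 0x80 to 0x9F (for example, 0x80 is the euro sign U+20AC), so both directions would require a small lookup table for that range.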

The source for the wrapper:
http://opensource.nicai-systems.com/unicode/unicode.h

And some test code:
http://opensource.nicai-systems.com/unicode/test_unicode.cpp

If there is any interest, I can extend the code to support more std::basic_string
methods and add additional encodings.

Regards,
Nils

Nils Springob
