RE: [boost] Re: Any interest in adding unicode support to boost?

From: Erik Wien [mailto:wien@start.no] Sent: October 20, 2004 14:37
- Why would the user want to change the encoding? Especially between UTF-16 and UTF-32?
Well... Different people have different needs. If you are mostly using ASCII characters and require small size, UTF-8 would fit the bill. If you need the best general performance on most operations, use UTF-16. If you need fast iteration over code points and size doesn't matter, use UTF-32.
Another argument is legacy/compatibility. I've written an XML library in C++ that essentially wraps a C library (libxml2, for the insiders), so I'd like to avoid unnecessary string copies when passing data around. While the inside uses UTF-8, I want the wrapper API to adapt to user-side Unicode APIs. Thus the Unicode string type has become a template parameter, and conversion is done by a trait class. Regards, Stefan
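For readers who want to picture the design described above, here is a minimal sketch of a wrapper parameterized on the user's string type, with a trait class doing the conversion. It is not taken from the actual library: `unicode_traits`, `basic_element`, `to_utf8`, `from_utf8`, and `raw_name` are hypothetical names, and the UTF-8 decoder assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical trait: converts between the user's string type and the
// UTF-8 that the wrapped C library is assumed to use internally.
template <typename String>
struct unicode_traits;

// User already holds UTF-8 in std::string: nothing to transcode.
template <>
struct unicode_traits<std::string> {
    static std::string to_utf8(const std::string& s) { return s; }
    static std::string from_utf8(const std::string& s) { return s; }
};

// User holds UTF-32 in std::u32string: transcode in both directions.
template <>
struct unicode_traits<std::u32string> {
    static std::string to_utf8(const std::u32string& s) {
        std::string out;
        for (char32_t cp : s) {
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }

    // Assumes well-formed UTF-8; no validation is attempted.
    static std::u32string from_utf8(const std::string& s) {
        std::u32string out;
        std::size_t i = 0;
        while (i < s.size()) {
            const unsigned char lead = static_cast<unsigned char>(s[i]);
            // Number of continuation bytes implied by the lead byte.
            const int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
            char32_t cp = static_cast<char32_t>(
                extra == 0 ? lead : (lead & (0x3F >> extra)));
            for (int k = 1; k <= extra && i + k < s.size(); ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
            out += cp;
            i += extra + 1;
        }
        return out;
    }
};

// Hypothetical wrapper: the string type is fixed as part of the class type.
template <typename String>
class basic_element {
public:
    explicit basic_element(const String& name)
        : name_utf8_(unicode_traits<String>::to_utf8(name)) {}

    // Converts back to the user's string type on every call.
    String name() const { return unicode_traits<String>::from_utf8(name_utf8_); }

    // The UTF-8 form that would be handed to the C library.
    const std::string& raw_name() const { return name_utf8_; }

private:
    std::string name_utf8_;
};
```

With this shape, `basic_element<std::string>` and `basic_element<std::u32string>` are distinct types, so the encoding choice is baked into every signature that mentions the wrapper.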

Stefan Seefeld wrote:
Another argument is legacy/compatibility. I've written an XML library in C++ that essentially wraps a C library (libxml2, for the insiders), so I'd like to avoid unnecessary string copies when passing data around. While the inside uses UTF-8, I want the wrapper API to adapt to user-side Unicode APIs. Thus the Unicode string type has become a template parameter, and conversion is done by a trait class.
If the inside uses UTF-8, you should just accept UTF-8. The standard way to avoid copies is to provide a function/constructor that takes an iterator range. Users can now pass anything and aren't forced to exclusively choose one encoding, which becomes a part of the component type and can never be changed. To paraphrase, unnecessary parameterization is the root of most evil nowadays. Parameters are a liability, not an asset.
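As a rough illustration of the iterator-range suggestion, the sketch below shows a non-template class whose constructor alone is templated on the iterator type. The class name `element` and its members are hypothetical and not part of any existing library; the only assumption about the input is that the range yields UTF-8 code units.

```cpp
#include <string>

// Hypothetical wrapper type; the encoding is not part of the class type.
class element {
public:
    // Accepts any input iterator range whose elements are UTF-8 code units.
    // The units are copied once, directly into the internal storage.
    template <typename InputIt>
    element(InputIt first, InputIt last) : name_(first, last) {}

    const std::string& name() const { return name_; }

private:
    std::string name_;  // UTF-8, as assumed for the wrapped C library
};
```

A caller holding a `std::string` can pass `s.begin(), s.end()`, a caller holding a raw buffer can pass two pointers, and a caller holding UTF-16 or UTF-32 could pass a pair of transcoding iterator adaptors (not shown here), all without the encoding appearing in the type of `element`.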
participants (2)
- Peter Dimov
- Stefan Seefeld