Re: [boost] Re: [Unicode strings] We're off

17 Mar 2005


Erik Wien wrote:
...
Thorsten Ottosen wrote:
...
hm...the function is only going to be used by 3 different classes, right?
If so, the code is at most 3 times the size of a virtual function solution.
No, 5 I think: UTF-8, plus UTF-16 and UTF-32 in both endiannesses. The ones 
in the platform's reversed endianness would only really be used for file 
parsing though, whenever we get around to that...
...
v-tables fill up too; and virtual functions in a class template
can have *large* code size impact if not all virtual functions
are used. (So are they?)
The idea is to keep the virtual interface to a bare minimum, and let the 
string class itself create its own complex interface by combining these 
virtual functions. Basically the implementation would just have functions 
for setting, getting and iteration, meaning they should all be used 
frequently.
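
A minimal sketch of that split, assuming hypothetical names (string_impl, utf32_impl, unicode_string) that are not taken from the prototype. The implementation exposes only a handful of virtual functions, and the string class composes find and replace_all from them. The append function is an extra assumption added so the example can build a string at all.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using code_point = char32_t;  // hypothetical alias for a Unicode code point

// Minimal virtual interface: getting, setting, and what iteration needs (size).
class string_impl {
public:
    virtual ~string_impl() = default;
    virtual std::size_t size() const = 0;                   // length in code points
    virtual code_point get(std::size_t i) const = 0;        // read one code point
    virtual void set(std::size_t i, code_point value) = 0;  // overwrite one code point
    virtual void append(code_point value) = 0;              // assumption: needed to grow
};

// One concrete implementation. UTF-32 is used only to keep the sketch short.
class utf32_impl final : public string_impl {
    std::vector<code_point> data_;
public:
    std::size_t size() const override { return data_.size(); }
    code_point get(std::size_t i) const override { return data_.at(i); }
    void set(std::size_t i, code_point value) override { data_.at(i) = value; }
    void append(code_point value) override { data_.push_back(value); }
};

// The string class builds its richer, non-virtual interface on top.
class unicode_string {
    std::unique_ptr<string_impl> impl_;
public:
    explicit unicode_string(std::unique_ptr<string_impl> impl)
        : impl_(std::move(impl)) {}

    void push_back(code_point c) { impl_->append(c); }
    std::size_t size() const { return impl_->size(); }
    code_point operator[](std::size_t i) const { return impl_->get(i); }

    // Returns the index of the first match, or size() if there is none.
    std::size_t find(code_point c) const {
        for (std::size_t i = 0; i < impl_->size(); ++i)
            if (impl_->get(i) == c) return i;
        return impl_->size();
    }

    // Replaces every occurrence of `from` with `to` using only get/set.
    void replace_all(code_point from, code_point to) {
        for (std::size_t i = 0; i < impl_->size(); ++i)
            if (impl_->get(i) == from) impl_->set(i, to);
    }
};

// Usage: every virtual function is exercised by ordinary string operations.
inline void demo_minimal_interface() {
    unicode_string s(std::unique_ptr<string_impl>(new utf32_impl));
    s.push_back(U'a');
    s.push_back(U'b');
    s.replace_all(U'a', U'c');
    assert(s.find(U'b') == 1u);
    assert(s[0] == U'c');
}
```

Since find and replace_all are not virtual, only the four small functions end up in the v-table, which is the point being made about code size.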
...
sometimes strong typesafety is good; sometimes it's not
Yep. What we need to decide is whether it is good more often than it is 
not. :)
...
ok, that seems to motivate that some form of dynamic types should be 
there.
That's what I thought too a while ago, but I'm not that sure anymore. 
I'll admit I'm no iostream wizard, but wouldn't it be possible to create 
some kind of unicode_stream by making a specialization of char_traits 
for unsigned ints (Unicode code points), and then create some facets (I 
forget which ones, codecvt and ctype I guess) that enable these streams 
to read all Unicode encoding forms from their buffer, and transcode into 
a sequence of Unicode code points before returning them to the user? 
This would mean that the users would not have to know what kind of 
encoding is used in the file they are reading. It would be totally 
transparent to them.
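
To make the transcoding step concrete, here is a rough sketch that is deliberately not a codecvt facet: a plain function (read_utf8_code_points, a made-up name) that pulls UTF-8 bytes from a std::istream and hands back code points. It only handles UTF-8, replaces malformed sequences with U+FFFD, and does not reject overlong forms or surrogates, so treat it as an illustration of what such a stream would do internally rather than a proposal.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: decodes UTF-8 from a stream into code points.
// Malformed input becomes U+FFFD; overlong forms are NOT rejected here.
inline std::u32string read_utf8_code_points(std::istream& in) {
    std::u32string out;
    char ch;
    while (in.get(ch)) {
        unsigned char lead = static_cast<unsigned char>(ch);
        char32_t cp = 0;
        int trailing = 0;
        if (lead < 0x80)                { cp = lead;        trailing = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trailing = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trailing = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trailing = 3; }
        else {
            out.push_back(char32_t(0xFFFD));  // stray continuation or invalid lead byte
            continue;
        }
        bool ok = true;
        for (int i = 0; i < trailing; ++i) {
            int next = in.peek();
            if (next == std::char_traits<char>::eof() || (next & 0xC0) != 0x80) {
                ok = false;  // sequence ended early
                break;
            }
            in.get(ch);
            cp = (cp << 6) | (static_cast<unsigned char>(ch) & 0x3F);
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}

// Usage: the caller only ever sees code points, never the encoding.
inline std::u32string decode_utf8(const std::string& bytes) {
    std::istringstream in(bytes);
    return read_utf8_code_points(in);
}

inline void demo_decode() {
    // 'a', U+00E9 (two bytes), U+20AC (three bytes)
    std::u32string cps = decode_utf8("a\xC3\xA9\xE2\x82\xAC");
    assert(cps.size() == 3u);
    assert(cps[1] == char32_t(0xE9));
}
```

A real unicode_stream would need one such decoder per encoding form and some way to pick between them, which is where the facets would come in.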
...
It seems to me that we then need four classes
utf8_string
utf16_string
utf32_string
utf_string  // the dynamic one
The first three could be created by having one class template 
parameterized on encoding, and have it use the encoding_traits classes from 
the current prototype. I have tried this before, and it works fine. The 
necessity of the last one would depend on whether the iostream 
functionality I mentioned above would work or not. If it is possible, I 
don't really see the need for a dynamic string class either.
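
Roughly what the one-template approach could look like. The encoding_traits name comes from the discussion, but its members here (code_unit, encode) and the tag types and encoded_string class are guesses made for illustration, not the prototype's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag types selecting an encoding at compile time.
struct utf8_tag {};
struct utf16_tag {};
struct utf32_tag {};

// Assumed shape of encoding_traits: a code unit type plus an encoder.
template <class Encoding> struct encoding_traits;

template <> struct encoding_traits<utf32_tag> {
    using code_unit = char32_t;
    static void encode(char32_t cp, std::vector<code_unit>& out) { out.push_back(cp); }
};

template <> struct encoding_traits<utf16_tag> {
    using code_unit = char16_t;
    static void encode(char32_t cp, std::vector<code_unit>& out) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
};

template <> struct encoding_traits<utf8_tag> {
    using code_unit = unsigned char;
    static void encode(char32_t cp, std::vector<code_unit>& out) {
        auto put = [&out](char32_t v) { out.push_back(static_cast<code_unit>(v)); };
        if (cp < 0x80) {
            put(cp);
        } else if (cp < 0x800) {
            put(0xC0 | (cp >> 6));
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            put(0xE0 | (cp >> 12));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {
            put(0xF0 | (cp >> 18));
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
    }
};

// One class template; the three string types are just typedefs of it.
template <class Encoding>
class encoded_string {
    using traits = encoding_traits<Encoding>;
    std::vector<typename traits::code_unit> units_;
public:
    void push_back(char32_t cp) { traits::encode(cp, units_); }
    std::size_t code_unit_count() const { return units_.size(); }
    const std::vector<typename traits::code_unit>& code_units() const { return units_; }
};

using utf8_string  = encoded_string<utf8_tag>;
using utf16_string = encoded_string<utf16_tag>;
using utf32_string = encoded_string<utf32_tag>;

// Usage: the same code point takes 4, 2, and 1 code units respectively.
inline void demo_encoded_strings() {
    utf8_string a; utf16_string b; utf32_string c;
    a.push_back(char32_t(0x1F600));
    b.push_back(char32_t(0x1F600));
    c.push_back(char32_t(0x1F600));
    assert(a.code_unit_count() == 4u);
    assert(b.code_unit_count() == 2u);
    assert(c.code_unit_count() == 1u);
}
```

The encoding is part of the type here, so mixing utf8_string and utf16_string by accident is a compile error. That is the strong typesafety trade-off discussed earlier in the thread.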
I think it would be desirable to have the dynamic class even without 
such iostream functionality. Sometimes we don't know which UTF encoding we 
are going to use, for example when the text is read from somewhere else or 
built through some low-level mechanism. But having iostreams read and 
write Unicode would be awesome. Maybe having some kind of stringstream 
would be great too, but I think it would be much more work than was 
planned.
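
One case where the encoding is only known at run time is a buffer that starts with a byte order mark. As a small sketch of that situation (the enum and function names are invented for this example), the encoding can be picked by inspecting the first bytes. The BOM byte sequences themselves are the standard ones; everything else is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical run-time tag a dynamic utf_string could carry.
enum class utf_encoding { utf8, utf16_le, utf16_be, utf32_le, utf32_be, unknown };

// True if `bytes` begins with the n bytes at `prefix` (which may contain zeros).
inline bool has_prefix(const std::string& bytes, const char* prefix, std::size_t n) {
    return bytes.size() >= n && bytes.compare(0, n, prefix, n) == 0;
}

// Picks an encoding from a leading byte order mark, if there is one.
// The UTF-32 checks come first because the UTF-32LE mark (FF FE 00 00)
// starts with the UTF-16LE mark (FF FE). UTF-16LE text whose first
// character is U+0000 would therefore be misread; the sketch ignores that.
inline utf_encoding detect_encoding(const std::string& bytes) {
    if (has_prefix(bytes, "\x00\x00\xFE\xFF", 4)) return utf_encoding::utf32_be;
    if (has_prefix(bytes, "\xFF\xFE\x00\x00", 4)) return utf_encoding::utf32_le;
    if (has_prefix(bytes, "\xEF\xBB\xBF", 3))     return utf_encoding::utf8;
    if (has_prefix(bytes, "\xFE\xFF", 2))         return utf_encoding::utf16_be;
    if (has_prefix(bytes, "\xFF\xFE", 2))         return utf_encoding::utf16_le;
    return utf_encoding::unknown;  // no BOM: the caller has to decide
}

// Usage: the result is a run-time value, so it cannot select a template
// argument directly. This is the gap a dynamic string class would fill.
inline void demo_detect() {
    assert(detect_encoding(std::string("\xEF\xBB\xBFhi", 5)) == utf_encoding::utf8);
    assert(detect_encoding("plain") == utf_encoding::unknown);
}
```

Note that the five recognised values line up with the five encodings Erik listed above.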
...
- Erik
-- 
    Felipe Magno de Almeida