Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter

16 Aug 2011

On Tue, Aug 16, 2011, Phil Endecott wrote:
...
Soares Chen Ruo Fei wrote:
...
I think it'll be easier to just remove the
decrement function completely.
No, don't do that.  (That would be like removing random access from
std::vector because std::list can't implement it efficiently.)
I'm not familiar with the algorithms requiring bidirectional access that
Artyom mentions, but a standard way to make them work with iterators for
various different encodings would be to specialise the algorithms.  You
would have a main implementation that requires the bidirectional (or random
access) iterator, and a forwarding implementation that looks like this:
template <typename FORWARD_ITER>
void algorithm(FORWARD_ITER begin, FORWARD_ITER end)
{
 // Make a copy of the range into a bidirectional container
 // (iterator_traits, from <iterator>, also works for raw pointers):
 std::vector< typename std::iterator_traits<FORWARD_ITER>::value_type > v(begin,end);
 // Call the other specialisation (the one taking bidirectional iterators):
 algorithm(v.begin(),v.end());
}
That is the standard time-vs-space complexity trade-off.
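One way to select between the two implementations at compile time is tag dispatch on the iterator category. The following is only a sketch under assumptions: is_palindrome, is_palindrome_impl, demo_forward_only and the sketch namespace are hypothetical names chosen for illustration and are not part of Boost.Ustr, and the palindrome check merely stands in for any algorithm that needs operator--.

```cpp
#include <forward_list>
#include <iterator>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of Boost.Ustr

// Main implementation: requires operator-- on the iterator.
template <typename BidirIt>
bool is_palindrome_impl(BidirIt first, BidirIt last,
                        std::bidirectional_iterator_tag)
{
    while (first != last) {
        --last;
        if (first == last) break;          // met in the middle (odd length)
        if (*first != *last) return false;
        ++first;
    }
    return true;
}

// Forwarding implementation: copies the range into a vector, then calls
// the bidirectional version. This trades memory for the missing operator--.
template <typename ForwardIt>
bool is_palindrome_impl(ForwardIt first, ForwardIt last,
                        std::forward_iterator_tag)
{
    typedef typename std::iterator_traits<ForwardIt>::value_type value_type;
    std::vector<value_type> copy(first, last);
    return is_palindrome_impl(copy.begin(), copy.end(),
                              std::bidirectional_iterator_tag());
}

// Entry point: picks an overload from the iterator's category tag.
// Random access iterators select the bidirectional overload because their
// tag derives from std::bidirectional_iterator_tag.
template <typename It>
bool is_palindrome(It first, It last)
{
    typedef typename std::iterator_traits<It>::iterator_category category;
    return is_palindrome_impl(first, last, category());
}

// Usage with a forward-only range (std::forward_list has no operator--).
inline bool demo_forward_only()
{
    std::forward_list<char32_t> text;
    text.push_front(U'a');
    text.push_front(U'b');
    text.push_front(U'a');
    return is_palindrome(text.begin(), text.end());
}

}  // namespace sketch
```

With this arrangement the algorithm author writes the real logic once; the forward-only overload is a few lines of boilerplate per algorithm.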
Well, I don't think forcing all generic Unicode algorithms to provide
a specialized version for forward-only iterators is any better than
providing a less efficient bidirectional iterator. Such a burden is
too high for the algorithm developers. Perhaps a better decision is
to simply let the compiler yield a (friendly?) error when the generic
algorithm uses the decrement/random access operator, and find a way to
tell the user to convert the string to a standard UTF string before
passing it to the Unicode algorithms.

Or perhaps I could find a way to let template instances of
unicode_string_adapter with an MBCS encoding convert the string to a
UTF string during construction and store the UTF-encoded string
instead. The only problem with this is that, when converting back to
the raw string, the string adapter would have to reconvert the
internally stored UTF-encoded string to the MBCS-encoded string.
This can be expensive if the user regularly wants to access the raw
string, unless we store two smart pointers within the string adapter,
one for the MBCS string and one for the converted UTF string, but
doing so would waste storage space as well.
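The two-pointer variant could be sketched roughly as follows, with the UTF copy created lazily so that the extra storage is only paid for when it is actually requested. Everything here is an assumption made for illustration: dual_string_adapter and latin1_to_utf8 are hypothetical names, and Latin-1 is used only as a simple stand-in for a real MBCS encoding. The lazy cache is not thread-safe as written.

```cpp
#include <memory>
#include <string>

namespace sketch {  // hypothetical namespace, not part of Boost.Ustr

// Stand-in conversion (assumption): Latin-1 bytes to UTF-8.
// A real adapter would call the conversion for its actual MBCS encoding.
inline std::string latin1_to_utf8(const std::string& raw)
{
    std::string out;
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(raw[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            // Code points 0x80..0xFF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Holds the raw string and, once requested, a cached UTF-8 copy.
class dual_string_adapter {
public:
    explicit dual_string_adapter(const std::string& raw)
        : raw_(std::make_shared<const std::string>(raw)) {}

    // Cheap: no reconversion is needed to get the raw string back.
    const std::string& raw() const { return *raw_; }

    // Converts on first use, then reuses the cached copy.
    const std::string& utf8() const
    {
        if (!utf8_) {
            utf8_ = std::make_shared<const std::string>(latin1_to_utf8(*raw_));
        }
        return *utf8_;
    }

    bool utf8_cached() const { return static_cast<bool>(utf8_); }

private:
    std::shared_ptr<const std::string> raw_;
    mutable std::shared_ptr<const std::string> utf8_;  // empty until needed
};

}  // namespace sketch
```

Copies of such an adapter share both buffers, so the storage cost is per distinct string rather than per adapter object.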
...
...
(Actually it's also because I don't know if there is any way to
conditionally let the code point iterator inherit from either
std::forward_iterator or std::bidirectional_iterator)
You don't mean "inherit from".  You mean "be a model of".  See Artyom's
"VERY BAD DESIGN" post.  There should not be any virtual methods anywhere in
this library.  If you don't understand how that can be done, we should
discuss that urgently.
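For illustration, an iterator can model either concept without inheritance or virtual functions by selecting its iterator_category typedef at compile time. The sketch below assumes C++11 (std::conditional; boost::mpl::if_c would play the same role in C++03). The names utf8_traits_stub, mbcs_traits_stub, can_decrement and codepoint_iterator_stub are hypothetical, and the iterator simply walks an array of already-decoded code points instead of decoding anything.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of Boost.Ustr

// Hypothetical encoding traits: can the encoding be decoded backwards?
struct utf8_traits_stub { static const bool can_decrement = true; };
struct mbcs_traits_stub { static const bool can_decrement = false; };

template <typename EncodingTraits>
class codepoint_iterator_stub {
public:
    // The category is chosen from the traits; nothing is inherited.
    typedef typename std::conditional<
        EncodingTraits::can_decrement,
        std::bidirectional_iterator_tag,
        std::forward_iterator_tag>::type iterator_category;
    typedef char32_t value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const char32_t* pointer;
    typedef const char32_t& reference;

    explicit codepoint_iterator_stub(const char32_t* p = nullptr) : p_(p) {}

    reference operator*() const { return *p_; }
    codepoint_iterator_stub& operator++() { ++p_; return *this; }
    codepoint_iterator_stub operator++(int)
    {
        codepoint_iterator_stub old(*this);
        ++p_;
        return old;
    }

    // Only instantiated if somebody actually calls it, so forward-only
    // encodings compile fine until operator-- is used.
    codepoint_iterator_stub& operator--()
    {
        static_assert(EncodingTraits::can_decrement,
                      "this encoding cannot be decoded backwards");
        --p_;
        return *this;
    }

    friend bool operator==(const codepoint_iterator_stub& a,
                           const codepoint_iterator_stub& b)
    { return a.p_ == b.p_; }
    friend bool operator!=(const codepoint_iterator_stub& a,
                           const codepoint_iterator_stub& b)
    { return a.p_ != b.p_; }

private:
    const char32_t* p_;
};

}  // namespace sketch
```

Generic code then reads std::iterator_traits<It>::iterator_category as usual and sees a bidirectional or forward iterator depending on the encoding.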
The virtual functions are used in my prototype file
dynamic_unicode_string.hpp. The design hasn't gone through much
thought and I wrote it just to demonstrate to Artyom that dynamic
encoded strings can be implemented at a higher layer by using virtual
functions. There might be more efficient ways to do so but I'll leave
it for another discussion thread.

Soares