Re: [boost] [locale] Review

20 Apr 2011

On 19/04/2011 22:40, Artyom wrote:
...
Text processing, not localization,
apart there is a stream charset
conversion...
What is localization if not text processing?
...
I mean binary code.
When you have
template<typename Type>
class foo {
    void bar() { something type independent }
}
And then use:
foo<char>
and
foo<wchar_t>
bar would eventually be duplicated in the binary
code as
void foo<char>::bar();
void foo<wchar_t>::bar();
Except that foo<T>::bar() would mostly just call T::baz() and 
T::whatamacallit(), so it will be completely different for T or U.
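
As a rough illustration of that point, here is a minimal sketch in which 
the member function does nothing but forward to something that depends 
on the character type. The name text_ops is hypothetical and only 
std::char_traits is a real library component.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper, named for illustration only.
template <typename Char>
struct text_ops
{
    // Forwards straight to char_traits<Char>::length. The instantiations for
    // char and wchar_t call different functions on elements of different
    // width, so there is little type-independent code left to share.
    static std::size_t length(const Char* s)
    {
        assert(s != 0);  // precondition: a null-terminated string
        return std::char_traits<Char>::length(s);
    }
};
```

Both text_ops<char>::length("abc") and text_ops<wchar_t>::length(L"abcd") 
are tiny forwarding functions that a compiler can normally inline away.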

Generating very small functions that keep jumping and doing indirections 
between each other is even worse than duplicating a bit of binary code.
Inlining gives very good performance benefits.

It appears that you are reluctant to use templates because you're not 
comfortable with template techniques and still believe age-old myths.
Some people here have deployed template code in very constrained 
environments, such as microcontrollers with only a few KBs of memory, 
and it works just fine.

I don't think that kind of template fear is positive for a Boost 
library. But then why not, maybe Boost libraries really overuse 
templates as some claim.
...
This would not happen. It is not a fancy
header-only library that runs some small
functions character by character.
This library uses a dozen different APIs...
Do you really think it is possible to
do it without a single new?
Yes, no problem at all.
As I said, just take an output iterator or a container that you append 
to. This way the allocation is handled by whatever the user gives for 
the output.
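
A minimal sketch of that interface style, assuming an ASCII-only 
conversion purely for illustration (ascii_to_upper and the two callers 
are hypothetical names, not part of the library under review):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical ASCII-only upper-casing, used only to show the interface.
// The function itself never allocates: it writes through whatever output
// iterator the caller supplies.
template <typename InputIt, typename OutputIt>
OutputIt ascii_to_upper(InputIt first, InputIt last, OutputIt out)
{
    for (; first != last; ++first) {
        char c = *first;
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>(c - 'a' + 'A');
        *out++ = c;  // the caller's iterator decides where this goes
    }
    return out;
}

// Caller 1: fixed buffer owned by the caller, no heap allocation anywhere.
inline std::size_t upper_into_array(const char* s, std::size_t n,
                                    char* buf, std::size_t cap)
{
    assert(n <= cap);  // the caller guarantees there is room
    return static_cast<std::size_t>(ascii_to_upper(s, s + n, buf) - buf);
}

// Caller 2: a growing container. Allocation happens, but inside the
// caller's own string, under the caller's control.
inline std::string upper_into_string(const std::string& s)
{
    std::string result;
    ascii_to_upper(s.begin(), s.end(), std::back_inserter(result));
    return result;
}
```

The same function template serves both callers; only the second one 
ever touches the heap, and only because it chose to.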
...
And BTW most of them
are called for locale's facets generation,
basically once locale initialized....
Yes, those are required because of your design decision to use the C++ 
locale facet system; but that may not be the case for all usages.

Note those are just general remarks, this is not a major problem. But it 
would definitely be best if you could reduce places where allocations 
happen to a minimum.
...
Yes? So how would you return a string? I don't see
any unexpected allocations there.
I shall repeat the fix then. Return an iterator_range (similar to a pair 
of pointers) instead of a basic_string.
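
A minimal sketch of the idea, assuming the returned text already lives 
in storage owned by the library. The function under discussion is elided 
in the quote, so language_name and its table are hypothetical, and a 
std::pair of pointers stands in for boost::iterator_range<const char*> 
to keep the example free of dependencies.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>

// Stand-in for boost::iterator_range<const char*>: a [begin, end) pair.
typedef std::pair<const char*, const char*> char_range;

// Hypothetical lookup that returns a view of static storage instead of a
// freshly allocated basic_string.
inline char_range language_name(int id)
{
    static const char* const names[] = { "English", "French", "Hebrew" };
    assert(id >= 0 && id < 3);  // precondition: a valid table index
    const char* s = names[id];
    return char_range(s, s + std::strlen(s));
}
```

A caller that really wants a string can still write 
std::string(r.first, r.second), but the allocation is then the caller's 
decision rather than the library's.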
...
Yes, it is simple to write
template<typename Input,typename Output>
Output bad_to_upper(Input begin,Input end,Output out,std::locale const &l)
{
    typedef std::ctype<typename Input::value_type> facet_type;
    while(begin!=end)
        *out++ = std::use_facet<facet_type>(l).toupper(*begin++);
    return out;
}
But it does not work this way, because
to_upper needs the entire chunk and not an arbitrary
character at every point.
You need to call some virtual function on some
range, and it does not even know what Iterator is...
That's a backend limitation, that you may be able to overcome or not.

If the backend really needs a contiguous memory buffer, then just copy 
the range to a contiguous memory buffer. I don't see where the problem is.

I'm just trying to make your library easier to use. I may not have a 
pointer to my data handy, and producing one could cost a lot of lines of 
code, which would make your library a bit annoying to use.
...
So you are trying to apply techniques that
do not belong here.
Why? Because you need either to:
template<typename Input,typename Output>
Output a_to_upper(Input begin,Input end,Output out,std::locale const &l)
{
    typedef typename Input::value_type char_type;
    typedef boost::locale::convert<char_type> facet_type;
    std::vector<char_type> input_buf;
    std::copy(begin,end,std::back_inserter(input_buf));
    std::basic_string<char_type> output_buf
        = std::use_facet<facet_type>(l).to_upper(&input_buf[0],input_buf.size());
    return std::copy(output_buf.begin(),output_buf.end(),out);
}
But it does two allocations!$@R$%#!
Not good.
That could be the general case, yes (though it would be best if to_upper 
could directly take a buffer to write to rather than return a basic_string).
Then you would specialize the cases where either input or output are 
contiguous iterators to avoid those copies.
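
A minimal sketch of that specialization, assuming a backend that only 
accepts contiguous memory. Here backend_to_upper is a hypothetical 
ASCII-only stand-in for the facet call, and only the input side is 
specialized; a pointer overload for the output would follow the same 
pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical backend that only understands contiguous memory.
inline std::string backend_to_upper(const char* begin, const char* end)
{
    assert(begin <= end);  // precondition: a valid contiguous range
    std::string result(begin, end);
    for (std::size_t i = 0; i < result.size(); ++i)
        result[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(result[i])));
    return result;
}

// General case: any input iterator, so copy into a contiguous buffer first.
template <typename InputIt, typename OutputIt>
OutputIt to_upper(InputIt first, InputIt last, OutputIt out)
{
    std::vector<char> buf(first, last);
    const char* p = buf.empty() ? 0 : &buf[0];
    std::string up = backend_to_upper(p, p + buf.size());
    return std::copy(up.begin(), up.end(), out);
}

// Contiguous input: pointers go straight to the backend, no input copy.
// Partial ordering prefers this overload when both arguments are const char*.
template <typename OutputIt>
OutputIt to_upper(const char* first, const char* last, OutputIt out)
{
    std::string up = backend_to_upper(first, last);
    return std::copy(up.begin(), up.end(), out);
}
```

The user calls to_upper the same way in both cases; which path runs is 
decided at compile time from the iterator type.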

I suppose you could also do something like this:

char_type output_buffer[buffer_size];
while(input_begin != input_end)
{
    char_type* output = output_buffer;
    status = use_facet<whatever>(l).to_upper(input_begin, input_end, input_begin,
                 output_buffer, output_buffer + buffer_size, output);

    if(status == ...) // do something smart in case of error or incomplete

    std::copy(output_buffer, output, out);
}

With that kind of design, the library itself never needs to allocate 
memory.
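
A compilable sketch of that loop, under stated assumptions: 
to_upper_chunk is a hypothetical ASCII-only function with a 
codecvt-style signature standing in for the facet call, the input is a 
pointer range, and the buffer is deliberately tiny so that several 
rounds are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

enum chunk_status { chunk_ok, chunk_partial };

// Hypothetical codecvt-style backend call: converts as much of
// [from, from_end) as fits in [to, to_end) and reports where it stopped.
inline chunk_status to_upper_chunk(const char* from, const char* from_end,
                                   const char*& from_next,
                                   char* to, char* to_end, char*& to_next)
{
    while (from != from_end && to != to_end) {
        char c = *from++;
        *to++ = (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
    }
    from_next = from;
    to_next = to;
    return from == from_end ? chunk_ok : chunk_partial;
}

template <typename OutputIt>
OutputIt chunked_to_upper(const char* input_begin, const char* input_end,
                          OutputIt out)
{
    const std::size_t buffer_size = 4;  // tiny on purpose
    char output_buffer[buffer_size];    // lives on the stack
    while (input_begin != input_end) {
        char* output = output_buffer;
        // A partial result simply means another round of the loop is needed.
        to_upper_chunk(input_begin, input_end, input_begin,
                       output_buffer, output_buffer + buffer_size, output);
        assert(output != output_buffer);  // each round must make progress
        out = std::copy(output_buffer, output, out);
    }
    return out;
}
```

The only memory used by chunked_to_upper itself is the fixed stack 
buffer; anything else is owned by whatever out writes into.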
...
So let's create some virtual iterator:
If you want that kind of thing, use type erasure. This is the same 
technique used by boost::any or boost::function.
It allows a single type to contain any other type and dispatch each of 
its member functions dynamically as long as it models a particular concept.

For iterators, google for any_iterator.
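
To make the technique concrete, here is a minimal sketch of a 
type-erased iterator over char. It is far less complete than a real 
any_iterator, and any_char_iterator and count_spaces are hypothetical 
names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

class any_char_iterator
{
    // The concept: the operations every wrapped iterator must provide.
    struct concept_t {
        virtual ~concept_t() {}
        virtual char deref() const = 0;
        virtual void increment() = 0;
        virtual bool equal(const concept_t& other) const = 0;
        virtual concept_t* clone() const = 0;
    };
    // The model: adapts one concrete iterator type to the concept.
    template <typename It>
    struct model_t : concept_t {
        It it;
        explicit model_t(It i) : it(i) {}
        char deref() const { return *it; }
        void increment() { ++it; }
        bool equal(const concept_t& other) const {
            // Assumes both sides wrap the same iterator type.
            return it == static_cast<const model_t&>(other).it;
        }
        concept_t* clone() const { return new model_t(it); }
    };
    concept_t* impl_;
public:
    template <typename It>
    any_char_iterator(It it) : impl_(new model_t<It>(it)) {}
    any_char_iterator(const any_char_iterator& o) : impl_(o.impl_->clone()) {}
    any_char_iterator& operator=(const any_char_iterator& o) {
        if (this != &o) {
            concept_t* p = o.impl_->clone();
            delete impl_;
            impl_ = p;
        }
        return *this;
    }
    ~any_char_iterator() { delete impl_; }
    char operator*() const { assert(impl_ != 0); return impl_->deref(); }
    any_char_iterator& operator++() { impl_->increment(); return *this; }
    friend bool operator!=(const any_char_iterator& a,
                           const any_char_iterator& b) {
        return !a.impl_->equal(*b.impl_);
    }
};

// Not a template: compiled once, whatever the underlying iterator type is.
// Every *, ++ and != goes through a virtual call.
inline std::size_t count_spaces(any_char_iterator first, any_char_iterator last)
{
    std::size_t n = 0;
    for (; first != last; ++first)
        if (*first == ' ')
            ++n;
    return n;
}
```

Note that this simple version allocates on every construction and copy, 
on top of the per-character virtual calls.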
...
But, hey!#%#$%#4
For each character I call virtual function WOW
the cost is too big!
Indeed, this is probably why any_iterator never became popular.

But note that codecvt facets can have exactly the same problem. At least 
the Dinkumware STL calls codecvt facet functions character by character.

Mathias Gaunard