Re: [boost] [locale] Review

19 Apr 2011

      ...
From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>
...
I can accept that some operations may be
better to work on arbitrary streams but
most of them just don't need it.
For example collation... When exactly
do you need to compare arbitrary
text streams?
Because the data does not exist in memory, may be computed on the fly,
or whatever really.
A possible application is simply to chain operations in a pipeline, i.e.
without having to apply one operation completely, then the other on that
result, etc (and do the intermediate temporary buffer allocations).
Pipeline and collation?

Either I don't get you or we have too different
points of view.

Not every programming concept is about stream 
processing, especially collation where you sort
two Units of data, where each unit is a whole 
part.

But let's leave it behind, because I don't
see that we would get anywhere.
...
...
I thought of providing a stream API
for charset conversion but put it
on hold as it is not really a
central part, especially when
codecvt is there.
I believe it *is* the very central part of any text processing system.
Of text processing, not of localization;
besides, there already is a stream charset
conversion...
...
...
Take a deeper look at the section.
It is different from backend selection.
If I want to add a backend, I only want to add a new repository with the
implementation for that backend. I do not want to have to hack all
shared files by adding some additional ifdefs.
A localization backend is a different thing from a utility
that converts one encoding to another.

But I see your point.
...
...
Because there is no need to duplicate
complex code via template metaprogramming
if a simple function call can be made.
This sentence doesn't make any sense to me.
Template meta-programming is not a means to duplicate code.
Nor is normal template usage, which is what I suggested instead of
virtual functions, not template meta-programming.
I mean binary code.

When you have

template<typename Type>
class foo {
public:
   void bar() { /* something type independent */ }
};

And then use:

  foo<char>

and

  foo<wchar_t>

bar would be eventually duplicated in binary
code as

  void foo<char>::bar();
  void foo<wchar_t>::bar();

regardless of the fact that it does the same job.
And finally you get huge executables that
basically copy the same things around.
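
For illustration, here is a minimal sketch of the alternative, under the
assumption that the type-independent part can be moved out of the template.
The names foo_base, bar_impl and calls are made up for this example and are
not Boost.Locale code. The shared work is compiled once in a non-template
base, and each foo<Type> only forwards to it:

```cpp
#include <cstddef>

// Hypothetical illustration: the type-independent work lives in a
// non-template base class, so only one copy of bar_impl() is generated
// no matter how many foo<Type> specializations are instantiated.
class foo_base {
public:
    // Counts calls so that the sharing is observable from outside.
    static std::size_t &calls()
    {
        static std::size_t n = 0;
        return n;
    }
protected:
    void bar_impl() { ++calls(); }   // "something type independent"
};

template<typename Type>
class foo : private foo_base {
public:
    // Thin forwarding wrapper; the real body is shared by all Type's.
    void bar() { bar_impl(); }
};
```

Calling bar() on a foo<char> and on a foo<wchar_t> goes through the same
foo_base::bar_impl(), so foo_base::calls() ends up at 2 while the body
exists once in the binary.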
...
...
...
...
...
A lot of new and vectors too, I'd prefer if ideally
the library never allocated anything.
I'm sorry but it is just something
that can't and would never happen.
This request has no reasonable basis, especially
for such a complex topic.
Usage of templates instead of inclusion polymorphism
would allow to avoid newing the object and using a smart
pointer, for example.
I'm not sure what exact location bothers you but
everywhere (unless I miss something)
there is minimal data copying, and I rely heavily
on RVO.
I didn't say copying, I said allocation and usage of new.
grep -r -F "new " * should give you the exact locations.
This will not happen. This is not a fancy
header-only library that does some small
operations character by character.

This library uses a dozen various APIs...
Do you really think it is possible to
do it without a single new?

And BTW most of them
are called while generating the locale's facets,
basically once, when the locale is initialized....

If you had really run this grep and looked at
each of these use cases, you wouldn't even
have written this "grep" sentence.
...
...
If you see some not-required copying tell me.
...
Plus the instances of allocation in the
boundary stuff (when you build an index
vector and when you copy into a new basic_string)
appear to be unnecessary.
More specific location? I don't remember
such a thing, I just need better pointers
to answer.
I've been very precise. You unnecessarily allocate a new string and copy
the contents in the operator* of token_iterator.
Yes? So how would you return a string? I don't see
any unexpected allocations there.

------------------------------------------------------

I want to say a few words to summarize,
because I don't see this going anywhere.

Boost.Locale is not Boost.Unicode, it behaves
differently, it thinks differently and does
many things in a way normal localization
APIs all over the world do it.

Yes, ranges are a nice and important
concept for template metaprogramming,
but this is not a template library and
never will be.

You can't expect the library to provide
techniques suited to a template system.

Yes, it is simple to write

  template<typename Input,typename Output>
  Output bad_to_upper(Input begin,Input end,Output out,std::locale const &l)
  {
    typedef std::ctype<typename Input::value_type> facet_type;
    facet_type const &facet = std::use_facet<facet_type>(l);
    // converts one character at a time, independently of its neighbours
    while(begin!=end)
      *out++ = facet.toupper(*begin++);
    return out;
  }

But it does not work this way, because
to_upper needs the entire chunk of text and not
one arbitrary character at a time.
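
A small sketch of why a character-to-character mapping is not enough. The
function name full_upper and its one-entry special case are assumptions made
up for this example; a real backend relies on complete Unicode data. The
German sharp s (U+00DF) upper-cases to two characters, "SS", so the output
can be longer than the input:

```cpp
#include <string>

// Hypothetical toy converter: handles ASCII letters plus U+00DF only.
// It has to produce a whole string because one input character may
// expand to several output characters.
inline std::u32string full_upper(std::u32string const &in)
{
    std::u32string out;
    for(std::u32string::size_type i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        if(c == U'\u00DF')                 // sharp s: one character in, two out
            out += U"SS";
        else if(c >= U'a' && c <= U'z')    // plain ASCII lower-case letter
            out += static_cast<char32_t>(c - U'a' + U'A');
        else
            out += c;                      // everything else is left as is
    }
    return out;
}
```

With this toy, the six-character input "stra\u00DFe" becomes the
seven-character "STRASSE", which a function of the form
char_type toupper(char_type) can never return.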

You need to call some virtual function on some
range, and that function does not even know what
the Iterator type is...

So you are trying to apply techniques that
do not belong here.

Why? Because you need to do either this:

  template<typename Input,typename Output>
  Output a_to_upper(Input begin,Input end,Output out,std::locale const &l)
  {
    typedef typename Input::value_type char_type;
    typedef boost::locale::convert<char_type> facet_type;
    // first allocation: copy the range into a contiguous buffer
    std::vector<char_type> input_buf;
    std::copy(begin,end,std::back_inserter(input_buf));
    // second allocation: the facet returns a new string
    std::basic_string<char_type> output_buf
      = std::use_facet<facet_type>(l).to_upper(&input_buf[0],input_buf.size());
    return std::copy(output_buf.begin(),output_buf.end(),out);
  }

But it does two allocations!$@R$%#!
Not good.

So let's create some virtual iterator:

  template<typename CharType>
  class base_iterator {
  public:
    virtual ~base_iterator() {}
    virtual CharType value() { return value_; }
    // advance to the next character; false when the range is exhausted
    virtual bool next() = 0;
  protected:
    CharType value_;
  };

  template<typename IteratorType>
  class input_wrapper : public base_iterator<typename IteratorType::value_type> {
  public:
    input_wrapper(IteratorType begin,IteratorType end): begin_(begin),end_(end) {}
    virtual bool next() {
      if(begin_==end_)
        return false;
      this->value_ = *begin_++;
      return true;
    }
  private:
    IteratorType begin_,end_;
  };

Same for

  template<typename CharType>
  class base_output_iterator { ... };

  template<typename IteratorType>
  class output_wrapper : public base_output_iterator<typename IteratorType::value_type> { ... };

And now we rewrite our function as:

  template<typename Input,typename Output>
  Output b_to_upper(Input begin,Input end,Output out,std::locale const &l)
  {
    typedef typename Input::value_type char_type;
    typedef boost::locale::convert<char_type> facet_type;
    input_wrapper<Input> input(begin,end);
    output_wrapper<Output> output(out);
    std::use_facet<facet_type>(l).to_upper(input,output);
    return output.value();
  }

But, hey!#%#$%#4

For each character I call a virtual function. WOW,
the cost is too big!

$^$%^%@#$^@#%@#$%@#$%

Attempt number three: make the virtual functions
more efficient

  template<typename IteratorType>
  class input_wrapper : public std::basic_istream<typename IteratorType::value_type> {
    ...
  };
  template<typename IteratorType>
  class output_wrapper : public std::basic_ostream<typename IteratorType::value_type> {
    ...
  };

Now they are buffered, there is no virtual function call per character,
and under the hood it may even work on a single memory chunk...

  template<typename Input,typename Output>
  Output c_to_upper(Input begin,Input end,Output out,std::locale const &l)
  {
    typedef typename Input::value_type char_type;
    typedef boost::locale::convert<char_type> facet_type;
    input_wrapper<Input> input(begin,end);
    output_wrapper<Output> output(out);
    std::use_facet<facet_type>(l).to_upper(input,output);
    return output.value();
  }

But hey... We created two iostream objects because
the user wanted to convert a string to upper case...

Something really-really-really wrong here.

------------------------------------------

Template metaprogramming techniques
just don't fit here.

You may want to enforce them as much 
as you can but they are and will be ugly.

Don't try to make things fancier than
they should be, especially when
it comes to text, and every string
I've ever seen has something
like c_str()....
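
To make the contrast concrete, here is a minimal sketch of the plain
"pointer range in, string out" style. The name simple_to_upper is made up
for this example, and std::ctype<char> is used only as a stand-in: it does
simple one-to-one mapping and is not the Boost.Locale conversion facet.
The point is the shape of the call: one contiguous chunk, one virtual call
for the whole chunk, one result string.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: takes a contiguous chunk, such as the one returned
// by c_str(), and converts all of it with a single call into the facet.
inline std::string simple_to_upper(char const *begin, char const *end,
                                   std::locale const &l = std::locale::classic())
{
    std::string result(begin, end);      // the only allocation
    if(!result.empty()) {
        std::ctype<char> const &facet = std::use_facet<std::ctype<char> >(l);
        // range overload of toupper: converts [first,last) in place
        facet.toupper(&result[0], &result[0] + result.size());
    }
    return result;
}
```

A caller that holds a std::string s would write
simple_to_upper(s.c_str(), s.c_str() + s.size()).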

Artyom