
From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>
Can you please state explicitly which things you consider conditions for the library to go to a mini-review?
I don't know. I've just read some more of the other reviews, and no one seems to share my concerns, so I don't know whether my demands are justified.
Boost.Locale is built from a very practical point of view: usability first. I don't try to solve all possible corner cases, especially when they are not a central part of the localization task.
I'd rather spend more time adding specific missing features that are useful for localization than providing things like an iterator-based interface for operations that are usually executed on small chunks of text.
I think that for a Boost library, the API can be more important than the features themselves.
As you know, APIs are subject to change in Boost, and I would say that they are changed far too freely and far too often. In the case of Boost.Locale I think more changes will come, but they will extend the functionality. Everything really depends on needs.
Indeed, the Boost.Locale rationale clearly says that it only takes std::basic_string and const char* as input because that's what the backend libraries expect anyway.
Not only that; there are more reasons for this:
http://cppcms.sourceforge.net/boost_locale/html/appendix.html#why_linear_chu...
I can't find any more reasons than what I said. Am I missing something?
Yes: most of the supported operations on text, like collation and case handling, usually work on small chunks of text. I can accept that some operations may be better off working on arbitrary streams, but most of them just don't need it. Take collation, for example: when exactly do you need to compare arbitrary text streams?
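As a rough illustration (using the standard std::collate facet rather than Boost.Locale's own API, and assuming the named locale is installed on the system), collation naturally takes two complete strings:

    #include <locale>
    #include <string>
    #include <iostream>

    int main()
    {
        // Collation compares two complete strings through the locale's
        // collate facet; there is no natural notion of comparing two
        // arbitrary, incrementally produced text streams.
        std::locale loc("en_US.UTF-8"); // assumes this locale is installed
        std::string a = "apple", b = "Banana";
        std::collate<char> const& coll = std::use_facet< std::collate<char> >(loc);
        int r = coll.compare(a.data(), a.data() + a.size(),
                             b.data(), b.data() + b.size());
        std::cout << (r < 0 ? "a < b" : r > 0 ? "a > b" : "a == b") << "\n";
    }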
To be honest, I don't think that iterators, especially over non-UTF charsets, are a good idea.
I'd rather adopt the concept of a stream that you read from and write to:
1. It is buffered.
2. It is fast and can work with big chunks.
3. It can be passed transparently, and in an ABI-stable way, to any backend.
So I would rather see generic algorithms implemented over I/O streams than over iterators.
Ok. I had the feeling that iterators were more popular than streams, but streams are fine too; they're very close concepts.
I'll reveal the plan: I'm going to provide a stream-like interface for the search algorithm. Unfortunately it is really non-trivial, not because it is impossible, but because there is not enough time to implement it. Basically what I need is to implement icu::CharacterIterator over a seekable std::istream. The biggest problem is that CharacterIterator requires UTF-16 indexing, and that is a not-so-simple problem. So I left it out of the scope of the current version of Boost.Locale and I'll add it in the future. I think it is doable, but it requires me to look deep into the ICU sources, because how to do it is not really documented.

Once that is done, it will be possible to apply the same strategy to boundary analysis and probably some other operations as well (normalization, I think). So basically the user would provide I/O streams for this purpose (of course with some nice wrappers for common cases). But it will not happen now, as it requires quite a lot of work and testing.

This is basically what I said in the documentation: when a new API that is likely to work on large chunks of text is introduced into Boost.Locale in the future, it will provide an interface for non-linear text handling. However, the only API that really needs this is the search API; for some others, like segmentation, it is nice to have, and for others, like collation, it is not needed at all. So this is a quick glance at the future that is currently only at the planning stage and is not part of the review. I thought about providing a stream API for charset conversion but put it on hold, as it is not really a central part, especially when codecvt is there.
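As a rough sketch of why the UTF-16 indexing part is not so simple (this is neither ICU nor Boost.Locale code, just an illustration of the bookkeeping involved):

    #include <istream>
    #include <string>   // std::char_traits

    // Count how many UTF-16 code units the UTF-8 content of a stream
    // occupies. Code points above U+FFFF take a surrogate pair (2 units),
    // everything else takes 1, so byte positions and UTF-16 indexes do not
    // map onto each other linearly. This is the kind of bookkeeping a
    // CharacterIterator over a seekable std::istream would have to do.
    // (Malformed UTF-8 is not handled here.)
    unsigned long utf16_units(std::istream& in)
    {
        unsigned long units = 0;
        int c;
        while ((c = in.get()) != std::char_traits<char>::eof()) {
            unsigned byte = static_cast<unsigned char>(c);
            if ((byte & 0xC0) == 0x80)       // skip UTF-8 continuation bytes
                continue;
            units += (byte >= 0xF0) ? 2 : 1; // 4-byte lead => surrogate pair
        }
        return units;
    }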
More specific points?
I saw stuff like
if(is_linear_iterator<SomeType>::is_linear) ....
Additionally, this particular trait is not canonical (it should be ::value, not ::is_linear), is not rightly named (it's contiguous, not linear), and could have other uses outside of Boost.Locale, so it should be in its own file.
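For illustration, something like this is what I would call the canonical shape (the names here are invented, not taken from Boost.Locale):

    #include <boost/type_traits/integral_constant.hpp>

    // Result exposed as ::value, named after "contiguous", and kept in its
    // own reusable header.
    template <typename Iterator>
    struct is_contiguous_iterator : boost::false_type {};

    template <typename T>
    struct is_contiguous_iterator<T*> : boost::true_type {}; // raw pointers qualify

    // usage: if (is_contiguous_iterator<SomeType>::value) ...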
Such things should appear in a code review. I really welcome learning how to do things better. The question is whether such changes are critical or not.
While most of the backends are separated, it is not the case for encoding/codepage.cpp, which is shared by all backends and contains ifdefs. This is not a very nice extension point.
Actually, it is not only codepage:
- Formatting (locale::format)
- Locale information
- Message formatting (gettext)
- codecvt facets (not details)
And more...
So there are lots of shared parts.
I meant shared parts which are not really shared, i.e. a file that is shared but that contains completely different code chosen using #ifdefs.
Not exactly; it consists of fallbacks: if, for example, iconv does not support windows-xyx, uconv would be used. It allows several libraries to be merged in such a way that they back each other up. Take a deeper look at that section; it is different from backend selection.
I'm also a bit concerned about the generalized use of virtual functions everywhere; it seems unnecessary in quite a few places. Please prefer templates when dynamic binding is not required.
More specific points?
Couldn't boost::locale::util::base_converter be a concept rather than an abstract base class?
Because there is no need to duplicate complex code via template metaprogramming when a simple function call can be made. Don't overrate template metaprogramming and don't underestimate virtual functions.
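To make the trade-off concrete, here is a rough, hypothetical sketch of the two styles; it is not Boost.Locale's actual converter interface:

    #include <string>

    // (a) Abstract base class: one entry point behind an ABI-stable boundary;
    //     the object is typically created with new and held by smart pointer,
    //     and every call goes through virtual dispatch.
    struct abstract_converter {
        virtual ~abstract_converter() {}
        virtual std::string convert(std::string const& in) = 0;
    };
    std::string use_dynamic(abstract_converter& c, std::string const& s)
    {
        return c.convert(s);
    }

    // (b) Template "concept": any type with a suitable convert() works, with
    //     no allocation and no virtual calls, but the implementation must be
    //     visible at compile time and cannot sit behind a binary backend.
    template <typename Converter>
    std::string use_static(Converter& c, std::string const& s)
    {
        return c.convert(s);
    }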
A lot of new and vectors too; ideally, I'd prefer the library never allocated anything.
I'm sorry but it is just something that can't and would never happen.
This request has no reasonable basis, especially for such a complex topic.
Using templates instead of inclusion polymorphism would make it possible to avoid newing the object and using a smart pointer, for example.
I'm not sure which exact location bothers you, but everywhere (unless I'm missing something) there is minimal data copying, and I rely heavily on RVO. If you see some unnecessary copying, tell me.
Plus, the instances of allocation in the boundary stuff (when you build an index vector and when you copy into a new basic_string) appear to be unnecessary.
A more specific location? I don't remember such a thing; I just need better pointers to answer.
I hope my review will be well received and that you will consider all of my "feature requests". Your library is practical and can get things done for sure, but maybe it's not quite as good as it could be in terms of interface yet.
The interface is part of the design, and it was designed to be simple and convenient rather than fully generic.
Passing an arbitrary range to the functions is convenient, forcing the user to copy it to a vector and pass the begin and end pointers before it can call the functions isn't.
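For illustration, a hypothetical interface like this shows the difference (none of these functions are Boost.Locale's actual API):

    #include <string>
    #include <vector>

    std::string to_upper_copy(std::string const& s);          // string-only interface
    template <typename Iterator>
    std::string to_upper_copy(Iterator begin, Iterator end);  // range-style overload

    void example(std::vector<char> const& buf)
    {
        // With only the string interface the caller has to copy first:
        //   std::string tmp(buf.begin(), buf.end());
        //   to_upper_copy(tmp);
        // With an iterator overload no copy is needed:
        //   to_upper_copy(buf.begin(), buf.end());
    }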
Where exactly? As above, I need better context.

Artyom