
From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>
Can you please state explicitly which things you consider conditions for the library to go to a mini-review?
I don't know. I've just read some more of the other reviews, and no one seems to share my concerns, so I don't know whether my demands are justified.
Boost.Locale is built from a very practical point of view: usability first. I don't try to solve all possible corner cases, especially when they are not a central part of the localization task.
I'd rather spend more time adding specific missing features that are useful for localization than providing things like an iterator-based interface for operations that are usually executed on small chunks of text.
I think that for a Boost library, the API can be more important than the features themselves.
As you know, APIs are subject to change in Boost, and I would say that they are changed far too freely and far too often. In the case of Boost.Locale I think more changes will come, but they will extend the functionality. Everything really depends on needs.
Indeed, the Boost.Locale rationale clearly says that it only takes std::basic_string and const char* as input because that's what the backend libraries expect anyway.
Not only that; there are more reasons for this:
http://cppcms.sourceforge.net/boost_locale/html/appendix.html#why_linear_chu...
I can't find any more reasons than what I said. Am I missing something?
Yes: most of the supported operations on text, like collation and case handling, usually work on small chunks of text. I can accept that some operations may be better off working on arbitrary streams, but most of them just don't need it. Take collation, for example: when exactly do you need to compare arbitrary text streams?
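As a rough illustration (using the standard std::collate facet rather than Boost.Locale's own API, and assuming the named locale is installed on the system), collation naturally takes two complete strings:

    #include <locale>
    #include <string>
    #include <iostream>

    int main()
    {
        // Collation compares two complete strings through the locale's
        // collate facet; there is no natural notion of comparing two
        // arbitrary, incrementally produced text streams.
        std::locale loc("en_US.UTF-8"); // assumes this locale is installed
        std::string a = "apple", b = "Banana";
        std::collate<char> const& coll = std::use_facet< std::collate<char> >(loc);
        int r = coll.compare(a.data(), a.data() + a.size(),
                             b.data(), b.data() + b.size());
        std::cout << (r < 0 ? "a < b" : r > 0 ? "a > b" : "a == b") << "\n";
    }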
To be honest, I don't think that iterators, especially over non-UTF charsets, are a good idea.
I'd rather adopt the concept of a stream that you read from and write to:
1. It is buffered.
2. It is fast and can work with big chunks.
3. It can be passed transparently, and in an ABI-stable way, to any backend.
So I would rather see generic algorithms implemented over I/O streams than over iterators.
Ok. I had the feeling that iterators were more popular than streams, but streams are fine too; they're very close concepts.
I'll reveal the plan: I'm going to provide a stream-like interface for the search algorithm. Unfortunately it is really non-trivial, not because it is impossible, but because there is not enough time to implement it. Basically what I need is to implement icu::CharacterIterator over a seekable std::istream. The biggest problem is that CharacterIterator requires UTF-16 indexing, and that is a not-so-simple problem. So I left it out of the scope of the current version of Boost.Locale and I'll add it in the future. I think it is doable, but it requires me to look deep into the ICU sources, because how to do it is not really documented.

Once that is done, it will be possible to apply the same strategy to boundary analysis and probably some other operations as well (normalization, I think). So basically the user would provide I/O streams for this purpose (of course with some nice wrappers for common cases). But it will not happen now, as it requires quite a lot of work and testing.

This is basically what I said in the documentation: when a new API that is likely to work on large chunks of text is introduced into Boost.Locale in the future, it will provide an interface for non-linear text handling. However, the only API that really needs this is the search API; for some others, like segmentation, it is nice to have, and for others, like collation, it is not needed at all. So this is a quick glance at the future that is currently only at the planning stage and is not part of the review. I thought about providing a stream API for charset conversion but put it on hold, as it is not really a central part, especially when codecvt is there.
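As a rough sketch of why the UTF-16 indexing part is not so simple (this is neither ICU nor Boost.Locale code, just an illustration of the bookkeeping involved):

    #include <istream>
    #include <string>   // std::char_traits

    // Count how many UTF-16 code units the UTF-8 content of a stream
    // occupies. Code points above U+FFFF take a surrogate pair (2 units),
    // everything else takes 1, so byte positions and UTF-16 indexes do not
    // map onto each other linearly. This is the kind of bookkeeping a
    // CharacterIterator over a seekable std::istream would have to do.
    // (Malformed UTF-8 is not handled here.)
    unsigned long utf16_units(std::istream& in)
    {
        unsigned long units = 0;
        int c;
        while ((c = in.get()) != std::char_traits<char>::eof()) {
            unsigned byte = static_cast<unsigned char>(c);
            if ((byte & 0xC0) == 0x80)       // skip UTF-8 continuation bytes
                continue;
            units += (byte >= 0xF0) ? 2 : 1; // 4-byte lead => surrogate pair
        }
        return units;
    }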
More specific points?
I saw stuff like
if(is_linear_iterator<SomeType>::is_linear) ....
Additionally, this particular trait is not canonical (it should be ::value, not ::is_linear), is not rightly named (it's contiguous, not linear), and could have other uses outside of Boost.Locale, so it should be in its own file.
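For illustration, something like this is what I would call the canonical shape (the names here are invented, not taken from Boost.Locale):

    #include <boost/type_traits/integral_constant.hpp>

    // Result exposed as ::value, named after "contiguous", and kept in its
    // own reusable header.
    template <typename Iterator>
    struct is_contiguous_iterator : boost::false_type {};

    template <typename T>
    struct is_contiguous_iterator<T*> : boost::true_type {}; // raw pointers qualify

    // usage: if (is_contiguous_iterator<SomeType>::value) ...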
Such things should appear in a code review. I really welcome learning how to do things better. The question is whether such changes are critical or not.
While most of the backends are separated, it is not the case for encoding/codepage.cpp, which is shared by all backends and contains ifdefs. This is not a very nice extension point.
Actually, it is not only codepage:
- Formatting (locale::format)
- Locale information
- Message formatting (gettext)
- codecvt facets (not details)
And more...
So there are lots of shared parts.
I meant shared parts which are not really shared, i.e. a file that is shared but that contains completely different code chosen using #ifdefs.
Not exactly; it consists of fallbacks: if, for example, iconv does not support windows-xyx, uconv would be used. It allows several libraries to be merged in such a way that they back each other up. Take a deeper look at that section; it is different from backend selection.
I'm also a bit concerned about the generalized use of virtual functions everywhere; it seems unnecessary in quite a few places. Please prefer templates when dynamic binding is not required.
More specific points?
Couldn't boost::locale::util::base_converter be a concept rather than an abstract base class?
Because there is no need to duplicate complex code via template metaprogramming when a simple function call can be made. Don't overrate template metaprogramming and don't underestimate virtual functions.
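To make the trade-off concrete, here is a rough, hypothetical sketch of the two styles; it is not Boost.Locale's actual converter interface:

    #include <string>

    // (a) Abstract base class: one entry point behind an ABI-stable boundary;
    //     the object is typically created with new and held by smart pointer,
    //     and every call goes through virtual dispatch.
    struct abstract_converter {
        virtual ~abstract_converter() {}
        virtual std::string convert(std::string const& in) = 0;
    };
    std::string use_dynamic(abstract_converter& c, std::string const& s)
    {
        return c.convert(s);
    }

    // (b) Template "concept": any type with a suitable convert() works, with
    //     no allocation and no virtual calls, but the implementation must be
    //     visible at compile time and cannot sit behind a binary backend.
    template <typename Converter>
    std::string use_static(Converter& c, std::string const& s)
    {
        return c.convert(s);
    }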
A lot of new and vectors too; ideally, I'd prefer the library never allocated anything.
I'm sorry but it is just something that can't and would never happen.
This request has no reasonable basis, especially for such a complex topic.
Using templates instead of inclusion polymorphism would make it possible to avoid newing the object and using a smart pointer, for example.
I'm not sure which exact location bothers you, but everywhere (unless I'm missing something) there is minimal data copying, and I rely heavily on RVO. If you see some unnecessary copying, tell me.
Plus, the instances of allocation in the boundary stuff (when you build an index vector and when you copy into a new basic_string) appear to be unnecessary.
A more specific location? I don't remember such a thing; I just need better pointers to answer.
I hope my review will be well received and that you will consider all of my "feature requests". Your library is practical and can get things done for sure, but maybe it's not quite as good as it could be in terms of interface yet.
The interface is part of the design, and it was designed to be simple and convenient rather than fully generic.
Passing an arbitrary range to the functions is convenient, forcing the user to copy it to a vector and pass the begin and end pointers before it can call the functions isn't.
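For illustration, a hypothetical interface like this shows the difference (none of these functions are Boost.Locale's actual API):

    #include <string>
    #include <vector>

    std::string to_upper_copy(std::string const& s);          // string-only interface
    template <typename Iterator>
    std::string to_upper_copy(Iterator begin, Iterator end);  // range-style overload

    void example(std::vector<char> const& buf)
    {
        // With only the string interface the caller has to copy first:
        //   std::string tmp(buf.begin(), buf.end());
        //   to_upper_copy(tmp);
        // With an iterator overload no copy is needed:
        //   to_upper_copy(buf.begin(), buf.end());
    }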
Where exactly? As above, I need better context.

Artyom