
From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>
I can accept that some operations may be better to work on arbitrary streams but most of them just don't need it.
For example collation... When exactly do you need to compare arbitrary text streams?
Because the data does not exist in memory, may be computed on the fly, or whatever really. A possible application is simply to chain operations in a pipeline, i.e. without having to apply one operation completely, then the other on that result, etc (and do the intermediate temporary buffer allocations).
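To make the pipeline idea concrete, here is a minimal sketch of chaining two per-character operations lazily, so that no intermediate string is built between them. The names mapped_iterator, make_mapped, fold_case, dash_to_space and normalize are hypothetical and chosen only for illustration; they are not part of Boost.Locale or of any proposed API.

```cpp
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical lazy adaptor: applies f to each element when dereferenced.
template<typename Iter, typename Func>
class mapped_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef typename std::iterator_traits<Iter>::value_type value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const value_type* pointer;
    typedef value_type reference;

    mapped_iterator(Iter it, Func f) : it_(it), f_(f) {}
    value_type operator*() const { return f_(*it_); }
    mapped_iterator& operator++() { ++it_; return *this; }
    mapped_iterator operator++(int) { mapped_iterator tmp(*this); ++it_; return tmp; }
    bool operator==(const mapped_iterator& o) const { return it_ == o.it_; }
    bool operator!=(const mapped_iterator& o) const { return it_ != o.it_; }
private:
    Iter it_;
    Func f_;
};

template<typename Iter, typename Func>
mapped_iterator<Iter, Func> make_mapped(Iter it, Func f) {
    return mapped_iterator<Iter, Func>(it, f);
}

// Two toy per-character stages (illustrative only).
inline char fold_case(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
inline char dash_to_space(char c) { return c == '-' ? ' ' : c; }

// Chains both stages; characters flow through one at a time and only the
// final result string is materialized.
inline std::string normalize(const std::string& in) {
    typedef char (*stage)(char);
    stage first = fold_case;
    stage second = dash_to_space;
    return std::string(
        make_mapped(make_mapped(in.begin(), first), second),
        make_mapped(make_mapped(in.end(), first), second));
}
```

This only works because both stages are strictly character-by-character. Operations that need context across characters do not decompose this way.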
Pipeline and collation? Either I don't get you or our points of view are too different. Not every programming concept is about stream processing, especially collation, where you sort two units of data and each unit is a whole. But let's leave it behind, because I don't see that we would get anywhere.
I thought about providing a stream API for charset conversion but put it on hold, as it is not really a central part, especially when codecvt is there.
I believe it *is* the very central part of any text processing system.
Of text processing, not of localization. Apart from that, there is a stream charset conversion...
Take a deeper look at the section.
It is different from backend selection.
If I want to add a backend, I only want to add a new repository with the implementation for that backend. I do not want to have to hack all shared files by adding some additional ifdefs.
A localization backend is different from a utility that converts one encoding to another. But I see your point.
Because there is no need to duplicate complex code via template metaprogramming if a simple function call can be made.
This sentence doesn't make any sense to me.
Template meta-programming is not a means to duplicate code. Nor is normal template usage, which is what I suggested instead of virtual functions, not template meta-programming.
I mean binary code. When you have

    template<typename Type>
    class foo {
        void bar() { /* something type independent */ }
    };

and then use foo<char> and foo<wchar_t>, bar would eventually be duplicated in the binary code as

    void foo<char>::bar();
    void foo<wchar_t>::bar();

regardless of the fact that it does the same job. And finally you get huge executables that basically copy the same things around.
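A common way to limit this kind of duplication while still using templates is to hoist the type-independent code into a non-template base class. The sketch below reuses the foo/bar names from the example above; foo_base, calls, append and text are hypothetical additions for illustration only.

```cpp
#include <cstddef>
#include <string>

// Type-independent part: compiled once and shared by every instantiation.
class foo_base {
public:
    foo_base() : calls_(0) {}
    void bar() { ++calls_; }   // stands in for "something type independent"
    std::size_t calls() const { return calls_; }
private:
    std::size_t calls_;
};

// Type-dependent part: only this thin layer is instantiated per CharType.
template<typename CharType>
class foo : public foo_base {
public:
    void append(CharType c) { text_.push_back(c); }
    const std::basic_string<CharType>& text() const { return text_; }
private:
    std::basic_string<CharType> text_;
};
```

With this layout foo<char> and foo<wchar_t> both call the single foo_base::bar(). Some toolchains can also merge identical instantiations at link time, but that is an optimization and not something to rely on.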
A lot of new and vectors too, I'd prefer if ideally the library never allocated anything.
I'm sorry but it is just something that can't and would never happen.
This request has no reasonable basis, especially for such a complex topic.
Using templates instead of inclusion polymorphism would let you avoid newing the object and using a smart pointer, for example.
I'm not sure what exact location bothers you, but everywhere (unless I miss something) there is minimal data copying, and I rely heavily on RVO.
I didn't say copying, I said allocation and usage of new. grep -r -F "new " * should give you the exact locations.
This would not happen. This is not a fancy header-only library that does some small functions character by character. This library uses a dozen different APIs... Do you really think it is possible to do that without a single new? And BTW, most of them are called during generation of the locale's facets, basically once, when the locale is initialized... If you had really run this grep and looked at each use case, you wouldn't even have written this "grep" sentence.
If you see some unnecessary copying, tell me.
Plus, the instances of allocation in the boundary stuff (when you build an index vector and when you copy into a new basic_string) appear to be unnecessary.
More specific location? I don't remember such a thing; I just need better pointers to answer.
I've been very precise. You unnecessarily allocate a new string and copy the contents in the operator* of token_iterator.
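To illustrate what avoiding that copy could look like, here is a minimal sketch in which dereferencing yields a non-owning pair of iterators and the copy into a string becomes an explicit, opt-in step. token_view and next_token are hypothetical names invented for this example; this is not how Boost.Locale's token_iterator is implemented, and the single-character delimiter is a stand-in for real boundary analysis.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical non-owning token: two iterators into the original text.
template<typename Iter>
struct token_view {
    Iter first, last;

    // Explicit copy for callers that really need an owning string.
    std::basic_string<typename std::iterator_traits<Iter>::value_type> str() const {
        return std::basic_string<
            typename std::iterator_traits<Iter>::value_type>(first, last);
    }
};

// Returns the token starting at pos and advances pos past the delimiter.
// No allocation happens here; the view just points into the source range.
template<typename Iter>
token_view<Iter> next_token(Iter& pos, Iter end,
                            typename std::iterator_traits<Iter>::value_type delim)
{
    Iter stop = std::find(pos, end, delim);
    token_view<Iter> tok = { pos, stop };
    pos = stop;
    if (pos != end)
        ++pos;                  // skip the delimiter itself
    return tok;
}
```

The trade-off is lifetime: the view is only valid while the underlying text is alive and unmodified, which is exactly the guarantee a returned string does not need.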
Yes? So how would you return a string? I don't see any unexpected allocations there.

------------------------------------------------------

I want to say a few words to summarize, because I don't see this going anywhere.

Boost.Locale is not Boost.Unicode. It behaves differently, it thinks differently, and it does many things the way normal localization APIs all over the world do them.

Yes, ranges are a nice and important concept for template metaprogramming, but this is not a template library and never will be. You can't expect the library to provide techniques suited to a template system.

Yes, it is simple to write

    template<typename Input,typename Output>
    Output bad_to_upper(Input begin,Input end,Output out,std::locale const &l)
    {
        typedef std::ctype<typename Input::value_type> facet_type;
        while(begin!=end)
            *out++ = std::use_facet<facet_type>(l).toupper(*begin++);
        return out;
    }

But it does not work this way, because to_upper needs an entire chunk and not an arbitrary character at every point. You need to call some virtual function on some range, and it does not even know what Iterator is... So you are trying to apply techniques that do not belong here. Why? Because you need to do either this:

    template<typename Input,typename Output>
    Output a_to_upper(Input begin,Input end,Output out,std::locale const &l)
    {
        typedef typename Input::value_type char_type;
        typedef boost::locale::convert<char_type> facet_type;
        std::vector<char_type> input_buf;
        std::copy(begin,end,std::back_inserter(input_buf));
        std::basic_string<char_type> output_buf =
            std::use_facet<facet_type>(l).to_upper(&input_buf[0],input_buf.size());
        return std::copy(output_buf.begin(),output_buf.end(),out);
    }

But it does two allocations!$@R$%#! Not good.
So let's create a virtual iterator:

    template<typename CharType>
    class base_iterator {
    public:
        virtual CharType value() { return value_; }
        virtual bool next() = 0;
    protected:
        CharType value_;
    };

    template<typename IteratorType>
    class input_wrapper : public base_iterator<typename IteratorType::value_type> {
    public:
        input_wrapper(IteratorType begin,IteratorType end): begin_(begin),end_(end) {}
        virtual bool next()
        {
            if(begin_==end_)
                return false;
            this->value_ = *begin_++;
            return true;
        }
    private:
        IteratorType begin_,end_;
    };

The same for

    template<typename CharType>
    class base_output_iterator { ... };

    template<typename IteratorType>
    class output_wrapper : ...

And now we rewrite our function as:

    template<typename Input,typename Output>
    Output b_to_upper(Input begin,Input end,Output out,std::locale const &l)
    {
        typedef typename Input::value_type char_type;
        typedef boost::locale::convert<char_type> facet_type;
        input_wrapper<Input> input(begin,end);
        output_wrapper<Output> output(out);
        std::use_facet<facet_type>(l).to_upper(input,output);
        return output.value();
    }

But, hey!#%#$%#4 For each character I call a virtual function. WOW, the cost is too big! $^$%^%@#$^@#%@#$%@#$%

Attempt number three: make the virtual functions more efficient.

    template<typename IteratorType>
    class input_wrapper : public std::basic_istream<typename IteratorType::value_type> { ... };

    template<typename IteratorType>
    class output_wrapper : public std::basic_ostream<typename IteratorType::value_type> { ... };

Now they are buffered, there is no virtual function call per character, and under the hood it may even work on a single memory chunk...

    template<typename Input,typename Output>
    Output c_to_upper(Input begin,Input end,Output out,std::locale const &l)
    {
        typedef typename Input::value_type char_type;
        typedef boost::locale::convert<char_type> facet_type;
        input_wrapper<Input> input(begin,end);
        output_wrapper<Output> output(out);
        std::use_facet<facet_type>(l).to_upper(input,output);
        return output.value();
    }

But hey... We created two iostream objects because the user wanted to convert a string to upper case... Something is really-really-really wrong here.
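For comparison, the per-character cost discussed above can be sketched away without iostreams by staging the input through a fixed-size stack buffer and making one facet call per chunk. The sketch uses the standard std::ctype range overload of toupper, not Boost.Locale's conversion facet; chunked_to_upper and the buffer size of 256 are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: upper-cases [begin, end) into out through a fixed
// stack buffer, so there is no heap allocation inside the helper and only
// one virtual facet call per chunk instead of one per character.
template<typename Input, typename Output>
Output chunked_to_upper(Input begin, Input end, Output out, const std::locale& l)
{
    typedef typename std::iterator_traits<Input>::value_type char_type;
    const std::ctype<char_type>& ct = std::use_facet<std::ctype<char_type> >(l);

    const std::size_t chunk = 256;          // arbitrary illustrative size
    char_type buf[chunk];
    while (begin != end) {
        std::size_t n = 0;
        while (n < chunk && begin != end)
            buf[n++] = *begin++;            // gather one chunk
        ct.toupper(buf, buf + n);           // range overload, converts in place
        out = std::copy(buf, buf + n, out); // emit the converted chunk
    }
    return out;
}
```

This is only safe because std::ctype maps characters one to one, so a chunk boundary cannot split anything meaningful. Full Unicode case mapping can depend on context and change the length of the text, which is the reason given above for handing the facet an entire chunk chosen by the caller.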
------------------------------------------

Template metaprogramming techniques just don't fit there. You may want to enforce them as much as you can, but they are and will be ugly. Don't try to make things fancier than they should be, especially when it comes to text, where every string I've ever seen has something like c_str()...

Artyom