
On 19/04/2011 22:40, Artyom wrote:
Text processing, not localization, apart there is a stream charset conversion...
What is localization if not text processing?
I mean binary code.
When you have
template<typename Type> class foo { void bar() { something type independent } }
And then use:
foo<char>
and
foo<wchar_t>
bar would be eventually duplicated in binary code as
void foo<char>::bar(); void foo<wchar_t>::bar();
Except that foo<T>::bar() would mostly just call T::baz() and T::whatamacallit(), so it will be completely different for T or U. Generating very small functions that keep jumping and doing indirections between each other is even worse than duplicating a bit of binary code. Inlining gives very good performance benefits. It appears that you are reluctant to template usage, because you're not comfortable with templates techniques and you still believe age-old myths. Some people here have deployed template code in very constrained environments, such as microcontrollers with only a few KBs of memory, and it works just fine. I don't think that kind of template fear is positive for a Boost library. But then why not, maybe Boost libraries really overuse templates as some claim.
This would not happen. It is not fancy header only library that does some small functions character by character.
This library uses a dozen of various APIs... Do you really think it is possible to do it without a single new?
Yes, no problem at all. As I said, just take an output iterator or a container that you append to. This way the allocation is handled by whatever the user gives for the output.
And BTW most of them are called for locale's facets generation, basically once locale initialized....
Yes, those are required due to your design decisions of using the C++ locale facet system; but that may not be the case for all usages. Note those are just general remarks, this is not a major problem. But it would definitely be best if you could reduce places where allocations happen to a minimum.
Yes? So how would you return a string? I don't see there any unexpected allocations.
I shall repeat the fix then. Return an iterator_range (similar to a pair of pointers) instead of a basic_string.
Yes, it is simple to write
template<typename Input,typename Output> Output bad_to_upper(Input begin,Input end,Output out,std::locale const&l) { typedef std::ctype<typename Input::value_type> facet_type; while(begin!=end) *out++ = std::use_facet<facet_type>(l).to_upper(*begin)++; }
But it does not work this way because to_upper needs entire chunk and not arbitrary character at every point.
You need to call some virtual function on some range it does not even know what Iterator is...
That's a backend limitation, that you may be able to overcome or not. If the backend really needs a contiguous memory buffer, then just copy the range to a contiguous memory buffer. I don't see where the problem is. I'm just trying to make your library easier to use. I may not have a pointer to it handy, and generating one could cost a lot of lines of code, which would make your library a bit annoying to use.
So you are tring to apply techniques that does not belog here.
Why because you need either to:
template<typename Input,typename Output> Output a_to_upper(Input begin,Input end,Output out,std::locale const&l) { typedef typename Input::value_type char_type; typedef boost::locale::convert<char_type> facet_type; std::vector<char_type> input_buf; std::copy(begin,end,back_insterer(temporary_buf)); std::basic_string<char_type> output_buf = std::use_facet<facet_type>(l).to_upper(&input_buf[0],input_buf.size()); std::copy(output_buf.begin(),output_buf.end(),out); }
But it does two allocations!$@R$%#! Not good.
That could be the general case, yes (though it would be best if to_upper could directly take a buffer to write to rather than return a basic_string). Then you would specialize the cases where either input or output are contiguous iterators to avoid those copies. I suppose you could also do something like. char_type output_buffer[buffer_size]; while(input_begin != input_end) { char_type* output = output_buffer; status = use_facet<whatever>(l).to_upper(input_begin, input_end, input_begin, output_buffer, output_buffer + buffer_size, output); if(status == ...) // do something smart in case of error or incomplete std::copy(output_buffer, output, out); } Zero memory allocation needs to happen from the library itself with that kind of design.
So lets create a some virtual iterator:
If you want that kind of thing, use type erasure. This is the same technique used by any or function. It allows a single type to contain any other type and dispatch each of it member functions dynamically as long as it models a particular concept. For iterators, google for any_iterator.
But, hey!#%#$%#4
For each character I call virtual function WOW the cost is too big!
Indeed, this is probably why any_iterator never became popular. But note codecvt facets can have exactly that same problem. At least Dikumware STL calls codecvt facets functions character per character.