
AMDG On 04/10/2011 02:12 AM, Artyom wrote:
encoding/iconv_codepage.hpp
line 168: What are the possible errno values besides EILSEQ, EINVAL, and E2BIG? If you get an error you don't expect, is silently stopping the correct behavior?
According to opengroup's specifications, Linux man pages and others these are the error states that can occur.
Maybe I should throw conversion_error if the method is stop().
That seems reasonable. This situation shouldn't ever happen, but it's probably a good idea to detect it.
iconv_to_utf: The comments from iconv_from_utf apply. Also, the implementation is almost the same as iconv_from_utf. I think you could define a single function template to implement both.
Not exactly. I had found that it is much safer and readable to have two implementations then to create quite complex meta programming to make sure it works.
That's sometimes true. I know around here we tend to err on the side of too much metaprogramming, but in this case I don't think you even need much.
It is much easier to assume that at least one side is "char"
Really? In this case the code looks almost identical.
lines 86,101: The STL is your friend: size_t remove_substitutions(std::vector<wchar_t> &v)
{ v.erase(std::remove(v.begin(), v.end(), wchar_t(0xFFFD)), v.end()); return v.size(); }
After reading it carefully I can see how it works but generally I see how it works but it isn't so readable.
In any case interesting and it may save some allocations.
Well, remove/erase is idiomatic, so it should be clearer to those familiar the STL.
icu/numeric.cpp:
In num_format: You don't handle bool?
ICU does not handle it... So I can't do something better for this - I pass it to std::locale::classic formatting.
Is there any way that you could use the message translation mechanism for "true" and "false" if std::boolalpha is set? Otherwise it should be formatted the same as an integer.
line 290: Alas, the standard doesn't guarantee that you can put_back an arbitrary number of characters. This isn't going to work if you're reading from a non-seekable stream and the line is sufficiently long.
The problem is that it is the best I can do from-inside std::locale's num_fmt facet.
It has many major limitations. For exaple if you would try to parse 12345YX as
std::cin>> number>> character
then character would be X.
This is specially critical when I parse spellout format to be able to putback correct part.
Maybe I should try to see what is the size of the buffer and/or document that the stream should be able to handle putback well.
It is actually fairly simple as streambuf has a buffer so it can actually receive putback characters without problem.
For a limited number of characters. If you're reading from a file or from a stringstream, you'll be okay because pbackfail can seek to an earlier position in the stream. If you're reading from a terminal, it should be okay because it's usually line buffered, and the lines aren't usually very long. I could see running into a problem if you're reading from a pipe, however.
Unfortunately it is the best solution I could reach so far with the limitation of std::num_get
Yeah. I can't think of anything better either. This is really more of a limitation of std::steambuf, though.
posix/converter.cpp:
I noticed that you convert UTF8 to wchar_t. For the posix backend, are there other multibyte encodings that would need this? Or would converting everything to UTF32 cause other problems?
- For most encodings (single byte) toupper/tolower does better job (faster simpler smaller)
<snip>
I can change it for MB encodings but not sure if it would have added value.
Ok. I basically know nothing about the domain. Feel free to take any comments like this with a grain of salt.
line 85: If this fails, don't you still need to call freelocale? I think this would be simplified if you created a simple RAII wrapper that holds a locale_t by value.
Ok... I see missed this.
I prefer to have it shared as every facet has its own copy (2 characters * 4-5 facets)
I'll fix this.
Oh, I wasn't suggesting that you go away from shared_ptr: struct posix_locale : boost::noncopyable { posix_locale(); ~posix_locale(); locale_t impl_; }; shared_ptr<posix_locale> p(new posix_locale()); It's harder to get this wrong. In Christ, Steven Watanabe