[boost] [locale] review part 2.1: source

10 Apr 2011

      AMDG

Okay, I managed to get through the encoding, icu, and posix
subdirectories today.

encoding/codepage.cpp:

line 28: In C++ you should use <cstring> (and std::strlen)

encoding/conv.hpp:

encoding/iconv_codepage.hpp

I think the convention for headers like
this which are really source files in disguise
is to give them the extension .ipp.

line 125: You don't actually care how the vector's
   contents are initialized, do you?  You can just
   specify the size.

line 150: I think sresult.append(out_start, out_ptr);
   would be slightly clearer.

line 168: What are the possible errno values
   besides EILSEQ, EINVAL, and E2BIG?  If
   you get an error you don't expect, is
   silently stopping the correct behavior?

iconv_between: Inheritance from iconv_from_utf is
   a little odd.

iconv_to_utf: The comments from iconv_from_utf apply.
   Also, the implementation is almost the same as
   iconv_from_utf.  I think you could define a single
   function template to implement both.

encoding/uconv_codepage.hpp:

line 52: Do you really intend to translate any
   std::exception?  Including std::bad_alloc?
   That isn't consistant with what iconv_codepage does.

encoding/wconv_codepage.hpp:

Missing #include <cstring>, <string>

lines 86,101: The STL is your friend:
     size_t remove_substitutions(std::vector<wchar_t> &v)

     {

         v.erase(std::remove(v.begin(), v.end(), wchar_t(0xFFFD)), v.end());
         return v.size();
     }

icu/all_generator.hpp:

Why <vector>?  You're using locale and string, but not vector.

icu/boundary.cpp:

icu/cdata.hpp:

Missing #include <string>

icu/uconv.hpp:

Needs #include <string> and #include <memory>.
line 8: This #include guard is excessively generic.

icu/codecvt.cpp:

lines 44, 48: If this throws, cvt_ needs to be closed.

icu/codecvt.hpp:

Missing #include <memory>
The #include guard is wrong.

icu/collator.cpp:

line 115: Does do_basic_transform guarantee that
   the string contains no embedded null bytes?
   It seems that this is required for hash to
   work correctly.

icu/conversion.cpp:

icu/date_time.cpp:

In calendar_impl: I can't figure out why you're locking
   in some functions and not others.  For the most part,
   you seem to lock in non-mutating functions and not
   in mutating functions, so I was guessing that you
   are trying to guarantee that a calender can be read
   concurrently, but writes must be exclusive and that
   ICU does not guarantee that concurrent reads are
   okay.  However, get_timezone and is_same don't lock.
   Does ICU handle these methods differently from the
   others W.R.T. thread-safety?

line 236: Do you need a lock here?  You're only using
   self which is a different object.

icu/formatter.cpp:

line 69: This seems like a really bad idea. It means
   that integers larger than 0x7FFFFFFFFFFFFFFF will
   be formatted incorrectly.  If this is an unavoidable
   limitation of ICU, I think you shoud probably throw
   an exception and document the behavior.  And don't
   think that people won't try to print unsigned numbers
   this big.  I know I have.

line 263: This comparison is subtly wrong for int64_t.
   Consider what happens when date is exactly 2^63.
   limits_type::max() will be 0x7FFFFFFFFFFFFFFF.
   According to [expr] this will be converted to
   double.  For most rounding modes, the result
   will be 2^63, which is equal to date.  Then,
   when we get to line 268 and cast back to ValueType,
   the result will be 0x8000000000000000, and the
   behavior is undefined according to [conv.fpint].
   (Note: It's obviously not easy to trigger this,
   but I'm guessing that a specially crafted test
   case could)

icu/formatter.hpp:

line 44: What about float and long double.  long double
   may have more precision than double.  Also, even for
   float, the double rounding caused by working with double
   can produce slightly different results from when using
   float directly.  Does ICU only allow you to use double
   and [u]int[32|64]_t?

line 93: I find this comment a bit misleading.  It wasn't
   clear until I looked at the implementation that what
   was being cached was the ICU formatter.  The
   boost::locale::impl_icu::formatter object is not
   cached.

icu/icu_backend.cpp:

line 69: Ideally you should not mark the object
   as valid until you know that this function will
   not fail.

line 91: I'd kind of like install to be thread-safe...

icu/icu_backend.hpp:

icu/icu_util.hpp:

icu/numeric.cpp:

In num_format: You don't handle bool?

line 290: Alas, the standard doesn't guarantee that
   you can put_back an arbitrary number of characters.
   This isn't going to work if you're reading from a
   non-seekable stream and the line is sufficiently long.

icu/predefined_formatters.hpp:

icu/time_zone.hpp:

icu/time_zone.cpp:

icu/uconv.hpp:

posix/all_generater.hpp:

posix/codecvt.cpp:

mb2_iconv_converter seems to be doing a lot
of the same work as encoding/iconv_codepage.
Is there any way to avoid the duplication?

posix/codecvt.hpp:

Missing #include <memory> and <string>.

posix/collate.cpp:

posix/converter.cpp:

I noticed that you convert UTF8 to wchar_t.
For the posix backend, are there other multibyte
encodings that would need this?  Or would
converting everything to UTF32 cause other
problems?

posix/numeric.cpp:

line 54: sizeof(buf) is 64, but you pass 256?

line 57: is 1024 absolutely guaranteed to be
   enough?  Otherwise you should use a loop
   with exponential reallocation.  Line 97 too.

lines 68, 75: std::copy?

posix/posix_backend.cpp:

install: The same comment W.R.T. thread-safety applies
   as for icu.

line 85: If this fails, don't you still need to call freelocale?
   I think this would be simplified if you created a simple RAII
   wrapper that holds a locale_t by value.

posix/posix_backend.hpp:

(To be continued)

In Christ,
Steven Watanabe