
I don't think neither ICU nor Boost supports non-32 bit platform.
I understand that strictly speaking it is non-standard but de-facto there are no problems with this and this assumption is quite safe.
Since this assumption arbitrarily limits the library to particular platforms and it is trivial to avoid relying it, I see no point in doing so.
As I told you I can fix it, but practically it does not give any advantage. Give me a name of modern compiler that supports C++03 and has sizeof(unsigned) < 4?
I can change this to uint32_t
line 131: Please be consistent about using unsigned vs. uint32_t.
Good point
line 140: shouldn't offset be std::size_t? Why should the size of the string be limited to 4GB? std::size_t is guaranteed to be large enough to handle anything that can be stored in memory.
You are right, I'll fix this.
However, two points:
1. I already know compilers were size_t != sizeof(void *) so actually it must be uintptr_t... but it is not in C++03 but rather in C99. (it is in C++0x)
1) You work with contiguous memory, so size_t is the correct one, not uintptr_t
size_t is used in boost as it is closest way to use something in C++03 standard that is similar to uintptr_t. That is way C99 and C++03 created uintptr_t. In any case as I told I'll change this for better consistency/
2) uintptr_t should ideally be available from boost/cstdint.hpp (it doesn't seem to be ATM, that's a bug)
3) I don't know of compilers where those are different, mind saying which ones exhibit this?
As I said before I prefer to relay on native types rather then boost's types especially if it is feasible. As I told above I'll change this were necessary but it is just for "code-beauty".
2. There is no reason on planet earth you need to create boundary mapping for text>= 4G!
Even entire War& Peace book takes about 3MB.
So if you do this you are probably doing something very-very wrong.
But this is different story.
Another unnecessary arbitrary limit.
Why would you want to prevent people from treating very large data with your library?
On the Internet, there is text much bigger than War & Peace, even if not necessarily in the form of books.
As I told above I'll change it to size_t but... You don't want to run boundary analysis on such chunks of text in any case - and if you do it, you are doing something wrong.
Why do you have explicit specializations of boundary_indexing? They're all the same. The compiler is perfectly capable of instantiating them all from a single definition.
Ok this is something that you'll see all over the Boost.Locale code when it comes to deal with facets generations.
It is in order to support (brain damaged) DLL platform.
Both MSVC and GCC support extern template, which is also a C++0x feature.
Believe me, it was only feasible solution I've reached after spending many man hours fighting DLL's, std::locale facets and both compilers. (AFAIR MSVC uses specialization as well for its own facets) So after fighting both compilers and spending days on solving and tuning this problem I can say this is what should be done. It requires some good experience with std::locale facets, DLLs exporting, and having specific tests. So what I did is unfortunately required and most reliable solution.
Yes... However all supported platforms are at least 32bits.
There is one small points. As far as I remember I always used native types over strict types in the interfaces (function calls) because if it is changed between small releases (like stdint.h found) it would not break binary compatibility so native types (as long as they good enough) are better alternative.
(We are talking about C++ compilers that does not handle C99 stdint well)
Boost has a portable stdint equivalent.
I know and use it, see the description above.
In any case, you shouldn't even use uint32_t, but uint_least32_t, or better yet, char32_t if it is available.
No, see: http://cppcms.sourceforge.net/boost_locale/html/appendix.html#why_no_special... In any case I use uint32_t or uint16_t when I do relate to handling of UTF-16 or UTF-32
boost/cuchar.cpp in my Unicode library shows the right way to define the types Unicode needs to work with.
Same link as above, typedef of uint16_t or uint32_t is unsuitable for Boost.Locale character representation. See your Unicode library is orthogonal to Boost.Locale, it handles things very different and has different purpose. Artyom