Re: [boost] [general] What will string handling in C++ look like in the future

8 Feb 2011

On 02/08/2011 04:17 PM, Chad Nelson wrote:
...
> A good rule of thumb, but keep in mind that ASCII (or more formally
> "US-ASCII") is the colloquial name for the seven-bit ISO 646 encoding,
Everybody knows that. But which one? Not everyone agrees, and there is 
significant variation.

You can't point to a single standard that even a majority of people 
agree on as being the official ASCII. It was revised many times over the 
years.

So it's a bad way to refer to a spec. That's probably why GCC prints 
compiler warning messages using the backtick (grave accent) and apostrophe 
as if they were a matched pair of single quotes. It's broken.
...
> and "ANSI" was used for Windows code-page 1252 because Microsoft based
> it on an early ISO-8859-1 draft.[1] (The name is still in use in the
> Windows API, but they say it's a "historical reference, but is nowadays
> a misnomer that continues to persist in the Windows community.")
MS also used it to contrast with the "OEM" code page, which was their 
way of saying "ANSI" was for system stuff that didn't change (e.g. DLL 
names) and "OEM" was for UI and interoperable stuff that was deeply 
customized for foreign markets.
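In practice, code page 1252 and the final ISO-8859-1 differ only in bytes 0x80 to 0x9F: ISO-8859-1 puts C1 control codes there, while Windows-1252 puts printable characters such as the euro sign and curly quotes there. A minimal C++ sketch of a byte-to-code-point decoder for each, assuming hypothetical function names of my own; returning U+FFFD for the five bytes the table leaves unassigned is also an assumption of this sketch and may not match what the Windows conversion routines actually do.

```cpp
// Windows-1252 assignments for bytes 0x80..0x9F. In ISO-8859-1 these
// positions are the C1 control characters U+0080..U+009F. Zero marks the
// five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) left unassigned in this table.
constexpr char32_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,  // 0x80-0x87
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,       // 0x88-0x8F
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,  // 0x90-0x97
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178,  // 0x98-0x9F
};

// U+FFFD REPLACEMENT CHARACTER, returned for unassigned bytes
// (a choice made for this sketch, not necessarily Windows' behavior).
constexpr char32_t kReplacement = 0xFFFD;

// ISO-8859-1: every byte maps to the code point with the same value.
constexpr char32_t latin1_to_code_point(unsigned char b) {
    return static_cast<char32_t>(b);
}

// Windows-1252: identical to ISO-8859-1 except in the range 0x80..0x9F.
constexpr char32_t cp1252_to_code_point(unsigned char b) {
    return (b < 0x80 || b > 0x9F)
               ? static_cast<char32_t>(b)
               : (kCp1252High[b - 0x80] != 0 ? kCp1252High[b - 0x80]
                                             : kReplacement);
}
```

For example, byte 0x93 decodes to U+201C (left double quotation mark) as Windows-1252 but to the invisible control U+0093 as ISO-8859-1, which is one way curly quotes get mangled when a file is read with the wrong "ANSI" assumption.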
...
> The blame for "Unicode encoding" can probably be laid on Microsoft
> too.[2]
Unicode was originally sold as a 16-bit fixed-width encoding, with 
perhaps just the minor variation for endianness. "64K characters ought to 
be enough for anybody," they said. But they just couldn't stop themselves 
from inflicting yet another endless variety of multibyte encodings on 
the world.
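The 16-bit fixed-width assumption broke once Unicode grew past 65,536 code points: UTF-16 represents anything above U+FFFF as a surrogate pair of two 16-bit units, and UTF-8 uses one to four bytes per code point. A minimal C++ sketch of the standard arithmetic; the function names are hypothetical, made up for illustration.

```cpp
// Code points above U+FFFF do not fit in one 16-bit unit. UTF-16 splits
// (cp - 0x10000) into two 10-bit halves: a high surrogate (U+D800..U+DBFF)
// followed by a low surrogate (U+DC00..U+DFFF).

// True for Unicode scalar values: 0..0x10FFFF, excluding the surrogates.
constexpr bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Number of 16-bit code units UTF-16 needs for cp (1 or 2).
constexpr int utf16_length(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// Surrogate halves; only meaningful for cp in 0x10000..0x10FFFF.
constexpr char16_t high_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}
constexpr char16_t low_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

// Number of bytes UTF-8 needs for cp (1 to 4).
constexpr int utf8_length(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}
```

For example, U+1F600 (well outside the original 64K) becomes the pair 0xD83D 0xDE00 in UTF-16 and takes four bytes in UTF-8, so neither encoding is fixed-width.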
...
> (Sorry to get pedantic on you, just taking a break before the
No, surely it was I who was trolling for pedantry. Sorry!
...
> hopefully-final coding session on my UTF string library, which includes
> converter classes for many common code-pages, including ascii (typedef
> of us_ascii) and windows_ansi (typedef of windows1252)... I've been
> swimming in this stuff for the last several weeks. ;-) )
Oh, I know. I worked on that stuff for many years while working on 
document printing and display software. The variations are endless. I 
used to keep a book on my desk, over an inch thick, of nothing but code 
pages. Half of them were "ASCII" code pages; the other half were "EBCDIC".

Perhaps you've seen this:
http://en.wikipedia.org/wiki/ISO/IEC_646#National_variants

I still can't figure out if "ISO 646 US" and "ANSI X3.4-1968" are the 
same as Unicode U+0000 - U+007F (for those 128 code points). I think there 
are some differences.
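One practical consequence of the national variants linked above: ISO 646 lets each variant reassign 12 of the printable 7-bit positions, so only the remaining "invariant" characters mean the same thing everywhere. A minimal C++ sketch that checks for them; the function names are hypothetical, and treating space (0x20) as invariant while excluding control codes is a choice made for this sketch.

```cpp
// ISO 646 national variants may reassign these 12 positions:
// 0x23 '#' and 0x24 '$' (alternative currency signs), plus
// 0x40 '@', 0x5B-0x5E ( [ \ ] ^ ), 0x60 '`', 0x7B-0x7E ( { | } ~ ).
constexpr bool is_iso646_variant_position(unsigned char c) {
    return c == 0x23 || c == 0x24 || c == 0x40 ||
           (c >= 0x5B && c <= 0x5E) || c == 0x60 ||
           (c >= 0x7B && c <= 0x7E);
}

// True if byte c is printable (space through 0x7E) and its meaning does
// not depend on which national variant is in effect.
constexpr bool is_iso646_invariant(unsigned char c) {
    return c >= 0x20 && c <= 0x7E && !is_iso646_variant_position(c);
}

// True only if every byte in the NUL-terminated string s is invariant.
constexpr bool all_invariant(const char* s) {
    return *s == '\0' ||
           (is_iso646_invariant(static_cast<unsigned char>(*s)) &&
            all_invariant(s + 1));
}
```

Note that the grave accent (0x60) that GCC uses as an opening quote is itself one of the variant positions, as are the brackets and braces C and C++ depend on.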

You can maybe get away with "US ASCII" in the US (other than Spanish 
speakers), Canada (other than Quebec), Australia and New Zealand. Maybe 
a few other places. But make sure you reference a modern, relevant 
standard for it. It'd probably be better if you just referenced the 
specific standards directly and avoided the imprecise term "ASCII".

- Marsh

Marsh Ray