
On 02/08/2011 04:17 PM, Chad Nelson wrote:
> A good rule of thumb, but keep in mind that ASCII (or more formally "US-ASCII") is the colloquial name for the seven-bit ISO 646 encoding,
Everybody knows that. But which one? Not everyone agrees, and there is significant variation. You can't point to a single standard that even a majority of people agree on as being the official ASCII. It was revised many times over the years, so it's a bad way to refer to a spec. It's probably why GCC prints compiler warning messages using the backtick/grave and apostrophe as if they were paired single quotes. It's broken.
> and "ANSI" was used for Windows code-page 1252 because Microsoft based it on an early ISO-8859-1 draft.[1] (The name is still in use in the Windows API, but they say it's a "historical reference, but is nowadays a misnomer that continues to persist in the Windows community.")
MS also used it to contrast with the "OEM" code page, which was their way of saying "ANSI" was for system stuff that didn't change (e.g. DLL names) and "OEM" was for UI and interoperable stuff that was deeply customized for foreign markets.
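As a rough illustration of how close code-page 1252 stays to ISO-8859-1, the sketch below decodes every byte value both ways and collects the ones that disagree. It assumes Python's bundled "cp1252" and "latin-1" codecs are acceptable stand-ins for the Windows code page and the ISO standard; the function name differing_bytes is made up for this example.

```python
# Compare how the two code pages decode each byte value.
# Assumption: Python's "cp1252" and "latin-1" codecs stand in for
# Windows code-page 1252 and ISO-8859-1.

def differing_bytes():
    """Return {byte_value: (latin1_char, cp1252_char)} where the two disagree."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")
        # A few cp1252 slots are undefined in Python's table;
        # errors="replace" turns those into U+FFFD instead of raising.
        cp1252 = raw.decode("cp1252", errors="replace")
        if latin1 != cp1252:
            diffs[value] = (latin1, cp1252)
    return diffs


if __name__ == "__main__":
    diffs = differing_bytes()
    print(f"{len(diffs)} byte values differ")
    print(f"range: 0x{min(diffs):02X}-0x{max(diffs):02X}")
    print("0x80 ->", [f"U+{ord(c):04X}" for c in diffs[0x80]])
```

With these codecs the disagreements are confined to 0x80-0x9F, where ISO-8859-1 has C1 control codes and code-page 1252 puts printable characters such as the euro sign.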
> The blame for "Unicode encoding" can probably be laid on Microsoft too.[2]
Unicode was originally sold as a 16-bit fixed-width encoding, with perhaps just the minor variation for endianness. "64K characters ought to be enough for anybody," they said. But they just couldn't stop themselves from inflicting yet another endless variety of multibyte encodings on the world.
> (Sorry to get pedantic on you, just taking a break before the
No, surely it was I who was trolling for pedantry. Sorry!
> hopefully-final coding session on my UTF string library, which includes converter classes for many common code-pages, including ascii (typedef of us_ascii) and windows_ansi (typedef of windows1252)... I've been swimming in this stuff for the last several weeks. ;-) )
Oh I know. I worked on that stuff for many years while working on document printing and display software. The variations are endless. I used to keep a book on my desk that was over an inch thick of just code pages. Half of them were "ASCII" code pages. The other half were "EBCDIC".

Perhaps you've seen this: http://en.wikipedia.org/wiki/ISO/IEC_646#National_variants

I still can't figure out if "ISO 646 US" and "ANSI X3.4-1968" are the same as Unicode U+0000 - U+007F (for those 128 points). I think there are some differences.

You can maybe get away with "US ASCII" in the US (other than Spanish speakers), Canada (other than Quebec), Australia and New Zealand. Maybe a few other places. But make sure you reference a modern relevant standard for it. It'd probably be better if you just referenced the specific standards directly and avoided the imprecise term "ASCII".

- Marsh