[inspect] A few issues in non-ASCII checker

Hi, I had a summary look at the changes made to Boost.Inspect (since I last saw it) and noticed a few issues: * the "*A*" marker is used either for broken links, invalid urls etc. *and* non-ASCII chars * "non-ASCII" itself looks like a misleading name: see the string gPunct, defined in ascii_check.cpp * non portable (though widely portable :-)) code: if ( c >= 'a' && c <= 'z' ) return false; if ( c >= 'A' && c <= 'Z' ) return false; if ( c >= '0' && c <= '9' ) return false; If there aren't problems with standard library support I'd suggest replacing the above with if ( std::isalnum( static_cast< int >( c ), std::locale::classic() ) ) Similarly for the subsequent isspace-like tests * again in ascii_check.cpp there's some scaffolding to get a line number from a file position; this is needed in other inspector classes as well, and is done much differently. I think it is worth unifying this part and make it available in one place -- Genny

Gennaro Prota wrote:
Hi,
I had a summary look at the changes made to Boost.Inspect (since I last saw it) and noticed a few issues:
* the "*A*" marker is used either for broken links, invalid urls etc. *and* non-ASCII chars
That was fixed earlier this month, IIRC.
* "non-ASCII" itself looks like a misleading name: see the string gPunct, defined in ascii_check.cpp
* non portable (though widely portable :-)) code:
if ( c >= 'a' && c <= 'z' ) return false; if ( c >= 'A' && c <= 'Z' ) return false; if ( c >= '0' && c <= '9' ) return false;
That was mentioned in "[inspect] Hall of Shame plus non-ASCII characters" recently.
If there aren't problems with standard library support I'd suggest replacing the above with
if ( std::isalnum( static_cast< int >( c ), std::locale::classic() ) )
Similarly for the subsequent isspace-like tests
You need to read the discussion. isalnum isn't what we want for source files, and whatever we settle on isn't going to be locale dependent.
* again in ascii_check.cpp there's some scaffolding to get a line number from a file position; this is needed in other inspector classes as well, and is done much differently. I think it is worth unifying this part and make it available in one place
Yes. John Maddock has been asking that we pinpoint errors more precisely. --Beman

At 10:25 PM -0400 7/9/08, Beman Dawes wrote:
Gennaro Prota wrote:
Hi,
I had a summary look at the changes made to Boost.Inspect (since I last saw it) and noticed a few issues:
* the "*A*" marker is used either for broken links, invalid urls etc. *and* non-ASCII chars
That was fixed earlier this month, IIRC.
I don't think so :-( It's gotten better, since it now prints "non-ASCII" as well, but it's not really fixed.
* "non-ASCII" itself looks like a misleading name: see the string gPunct, defined in ascii_check.cpp
* non portable (though widely portable :-)) code:
if ( c >= 'a' && c <= 'z' ) return false; if ( c >= 'A' && c <= 'Z' ) return false; if ( c >= '0' && c <= '9' ) return false;
That was mentioned in "[inspect] Hall of Shame plus non-ASCII characters" recently.
If there aren't problems with standard library support I'd suggest replacing the above with
if ( std::isalnum( static_cast< int >( c ), std::locale::classic() ) )
Similarly for the subsequent isspace-like tests
You need to read the discussion. isalnum isn't what we want for source files, and whatever we settle on isn't going to be locale dependent.
* again in ascii_check.cpp there's some scaffolding to get a line number from a file position; this is needed in other inspector classes as well, and is done much differently. I think it is worth unifying this part and make it available in one place
Yes. John Maddock has been asking that we pinpoint errors more precisely.
I looked into this; hoisting the routine would be easy, but I didn't want to disturb the tool this close to release. P.S. New inspection reports are up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html> -- -- Marshall Marshall Clow Idio Software <mailto:marshall@idio.com> It is by caffeine alone I set my mind in motion. It is by the beans of Java that thoughts acquire speed, the hands acquire shaking, the shaking becomes a warning. It is by caffeine alone I set my mind in motion.

Beman Dawes wrote:
* "non-ASCII" itself looks like a misleading name: see the string gPunct, defined in ascii_check.cpp
* non portable (though widely portable :-)) code:
if ( c >= 'a' && c <= 'z' ) return false; if ( c >= 'A' && c <= 'Z' ) return false; if ( c >= '0' && c <= '9' ) return false;
That was mentioned in "[inspect] Hall of Shame plus non-ASCII characters" recently.
If there aren't problems with standard library support I'd suggest replacing the above with
if ( std::isalnum( static_cast< int >( c ), std::locale::classic() ) )
Similarly for the subsequent isspace-like tests
You need to read the discussion.
Done now. But to be honest I don't understand why you pointed me to it. Anyway,
isalnum isn't what we want for source files, and whatever we settle on isn't going to be locale dependent.
I didn't mean isalnum *alone*. It was meant to replace just the tests I quoted. And I specified std::locale::classic() (aka "C" locale). This just to set the records straights, as the whole affair is no big deal and you are very busy with more important things. Note that I don't know how "ASCII" got in here, but if the issue is VC++ warning C4819 then it refers to the "ANSI" codepage (Windows-1252). In the end, source files should IMHO stick to the 96 characters of the basic source set. Plus carriage return :-) -- Genny
participants (3)
-
Beman Dawes
-
Gennaro Prota
-
Marshall Clow