Inspection report and non-ASCII characters

Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-( Thanks, John.

On Sun, Jun 29, 2008 at 5:05 PM, John Maddock <john@johnmaddock.co.uk> wrote:
Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-(
How about whitespace? End-of-line or end-of-file issues perhaps? -- Dean Michael C. Berris Software Engineer, Friendster, Inc.

John Maddock wrote:
Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-(
Hum... Take a look at trunk\tools\inspect\ascii_check.cpp
It seems misnamed; it is apparently really checking for characters the C++ standard says are OK in source programs, regardless of encoding.
Also, I notice the code:
if ( c >= 'a' && c <= 'z' ) return false;
if ( c >= 'A' && c <= 'Z' ) return false;
That isn't right for EBCDIC. See http://en.wikipedia.org/wiki/EBCDIC. Although that is being pedantic - there is little chance the code will ever run on a non-ASCII system.
But before changing anything, we really need to figure out what our Boost standard is. How about anything 0x20-0x7E plus 0x09, 0x0A, 0x0D? (0x09 is a tab, but we already have a more specific check for that.)
--Beman
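
The byte range proposed above (0x20-0x7E plus 0x09, 0x0A, 0x0D) could be sketched roughly as follows. This is only an illustration, not the code in ascii_check.cpp; the function names is_disallowed_byte and first_disallowed are made up for this example. Comparing against numeric byte values, rather than character literals such as 'a' and 'z', tests the bytes stored in the file and does not depend on the letters being contiguous in the host character set.

```cpp
#include <string>

// Hypothetical helper (not from the inspect tool): true if the byte lies
// outside 0x20-0x7E and is not tab (0x09), LF (0x0A) or CR (0x0D).
bool is_disallowed_byte(unsigned char c)
{
    if (c >= 0x20 && c <= 0x7E) return false;  // printable ASCII
    return !(c == 0x09 || c == 0x0A || c == 0x0D);
}

// Hypothetical helper: offset of the first disallowed byte in text,
// or std::string::npos if every byte is acceptable.
std::string::size_type first_disallowed(const std::string& text)
{
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        if (is_disallowed_byte(static_cast<unsigned char>(text[i])))
            return i;
    }
    return std::string::npos;
}
```

The cast to unsigned char matters: on platforms where plain char is signed, bytes above 0x7F would otherwise compare as negative values.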

At 5:59 PM -0400 6/30/08, Beman Dawes wrote:
John Maddock wrote:
Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-(
Hum... Take a look at trunk\tools\inspect\ascii_check.cpp
It seems misnamed; it is apparently really checking for characters the c++ standard says are OK in source programs, regardless of encoding.
Also, I notice the code:
if ( c >= 'a' && c <= 'z' ) return false;
if ( c >= 'A' && c <= 'Z' ) return false;
That isn't right for EBCDIC. See http://en.wikipedia.org/wiki/EBCDIC. Although that is being pedantic - there is little chance the code will ever run on a non-ASCII system.
But before changing anything, we really need to figure out what our Boost standard is. How about anything 0x20-0x7E plus 0x09, 0x0A, 0x0D?
I'll be happy to make that change, and to print the offending line - just let me know what people want ;-)
-- Marshall
Marshall Clow Idio Software <mailto:marshall@idio.com>
It is by caffeine alone I set my mind in motion. It is by the beans of Java that thoughts acquire speed, the hands acquire shaking, the shaking becomes a warning. It is by caffeine alone I set my mind in motion.

Beman Dawes wrote:
John Maddock wrote:
Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-(
Hum... Take a look at trunk\tools\inspect\ascii_check.cpp
It seems misnamed; it is apparently really checking for characters the c++ standard says are OK in source programs, regardless of encoding.
Also, I notice the code:
if ( c >= 'a' && c <= 'z' ) return false;
if ( c >= 'A' && c <= 'Z' ) return false;
That isn't right for EBCDIC. See http://en.wikipedia.org/wiki/EBCDIC. Although that is being pedantic - there is little chance the code will ever run on a non-ASCII system.
But before changing anything, we really need to figure out what our Boost standard is. How about anything 0x20-0x7E plus 0x09, 0x0A, 0x0D?
(0x09 is a tab, but we already have a more specific check for that.)
I'm all for strict checking, but at present it's too hard to find out what's wrong :-( Does the character set in section 2.2 apply to the contents of strings as well as code BTW? If so then the strict checks may be justified...? John.

At 9:58 AM +0100 7/1/08, John Maddock wrote:
Does the character set in section 2.2 apply to the contents of strings as well as code BTW? If so then the strict checks may be justified...?
My understanding is that they do (and in comments, which is where many of these problems are occurring).
-- Marshall

2008/6/29 John Maddock <john@johnmaddock.co.uk>:
Would it be possible for the inspection report to print out the line containing the non-ASCII characters?
That's a good idea, but will take a little effort. Can you create a ticket? A quicker fix might be just to list the problematic characters.
There are a few files that are being flagged up, where I just can't find anything wrong with them :-(
You forgot the 'A' in ASCII: http://svn.boost.org/trac/boost/changeset/46943/ If you can get your text editor to help you, this is a lot easier to fix. If it doesn't have explicit support or a plugin, you could just try opening it with the wrong encoding and seeing what looks weird. I suppose running Visual C++ with full warnings would also help. Daniel

Daniel James wrote:
You forgot the 'A' in ASCII:
Ah, thanks for spotting those.
If you can get your text editor to help you, this is a lot easier to fix. If it doesn't have explicit support or a plugin, you could just try opening it with the wrong encoding and seeing what looks weird. I suppose running Visual C++ with full warnings would also help.
Nope, no level 4 warnings from VC++. I found some more by searching for
[^a-zA-Z0-9_{}\[\]#()<>%\:;.?*+-/^&|~!\\"'= ]
but they were within string literals that were regular expressions: for example, use of the "$" sign in a regex seems to trigger the new checking code? If so there's no way to "fix" that :-(
John.

2008/7/1 John Maddock <john@johnmaddock.co.uk>:
Daniel James wrote:
If you can get your text editor to help you, this is a lot easier to fix. If it doesn't have explicit support or a plugin, you could just try opening it with the wrong encoding and seeing what looks weird. I suppose running Visual C++ with full warnings would also help.
Nope no level 4 warnings from VC++.
Well, whatever the issue is with Visual C++, it's not the only tool that has problems. For example, our doxygen setup chokes on any non-UTF-8 characters.
I found some more by searching for [^a-zA-Z0-9_{}\[\]#()<>%\:;.?*+-/^&|~!\\"'= ] but they were within string literals that were regular expressions: for example use of the "$" sign in a regex seems to trigger the new checking code? If so there's no way to "fix" that :-(
$ is allowed; you've missed some of the characters:
static const string gPunct ( "$_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'@^`" );
If there are any other characters that are reasonable in strings, they should be added, or a separate check for strings implemented. But, sadly, currency symbols (apart from the dollar, obviously) and accented characters really do cause problems.
Daniel
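
As a rough sketch of how a check built on that punctuation set might also report the offending line, consider the following. Only gPunct is taken from the message above; gAlnum, gSpace, is_allowed, offence and find_first_offence are names invented for this example, and the real ascii_check.cpp may be organised quite differently. Looking characters up in explicit strings, rather than comparing against ranges like 'a'..'z', also avoids the EBCDIC concern raised earlier in the thread.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Punctuation set as quoted in the thread.
static const std::string gPunct("$_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'@^`");
// Assumed for this sketch: letters, digits and whitespace are also allowed.
static const std::string gAlnum(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
static const std::string gSpace(" \t\r\n");

// Hypothetical helper: true if c appears in one of the allowed sets.
bool is_allowed(char c)
{
    return gAlnum.find(c) != std::string::npos
        || gPunct.find(c) != std::string::npos
        || gSpace.find(c) != std::string::npos;
}

// Hypothetical result type for this sketch.
struct offence
{
    std::size_t line_number;  // 1-based; 0 means nothing was found
    std::string line;         // text of the offending line
};

// Scan contents line by line and return the first line that contains
// a character outside the allowed sets.
offence find_first_offence(const std::string& contents)
{
    std::istringstream in(contents);
    std::string line;
    offence result;
    result.line_number = 0;
    std::size_t n = 0;
    while (std::getline(in, line))
    {
        ++n;
        for (std::string::size_type i = 0; i < line.size(); ++i)
        {
            if (!is_allowed(line[i]))
            {
                result.line_number = n;
                result.line = line;
                return result;
            }
        }
    }
    return result;
}
```

A report generator could then print result.line_number and result.line next to the file name, which is the kind of output the original request asked for.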

At 11:08 AM +0100 7/1/08, Daniel James wrote:
2008/6/29 John Maddock <john@johnmaddock.co.uk>:
Would it be possible for the inspection report to print out the line containing the non-ASCII characters?
Done. Checked into the trunk as revision 46981.
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html>
[ Please ignore the complaints about files named ".DS_Store"; that's just the Mac OS Finder droppings. ]
-- Marshall

Marshall Clow wrote:
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html>
Ok, I have a small problem:
spirit
libs/spirit/example/qi/mini_xml_samples/1.xml: *C*, *L*
libs/spirit/example/qi/mini_xml_samples/2.xml: *C*, *L*
libs/spirit/example/qi/mini_xml_samples/3.xml: *C*, *L*
These are sample files for the very-minimal tiny-xml sample. The parser is very simple and does not parse comments. Adding rules for comments, just to satisfy the inspection report, does not seem to be justified as it will add noise to the example. Here's an example:
<foo>
<bar>bar 1</bar>
<bar>bar 2</bar>
<bar>bar 3</bar>
</foo>
What can I do about this? Do we have some way to inhibit checking for such /special-case/ files?
Regards,
-- Joel de Guzman http://www.boostpro.com http://spirit.sf.net

Joel,
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html>
Ok, I have a small problem:
spirit
libs/spirit/example/qi/mini_xml_samples/1.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/2.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/3.xml: *C*, *L*
These are sample files for the very-minimal tiny-xml sample. The parser is very simple and does not parse comments. Adding rules for comments, just to satisfy the inspection report, does not seem to be justified as it will add noise to the example. Here's an example:
<foo> <bar>bar 1</bar> <bar>bar 2</bar> <bar>bar 3</bar> </foo>
What can I do about this? Do we have some way to inhibit checking for such /special-case/ files?
We could remove the files completely and put the xml snippets as strings into the example. I know this somehow defeats the intent to show how to parse arbitrary simple xml, but hey it's a sample... Regards Hartmut

Hartmut Kaiser wrote:
Joel,
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html> Ok, I have a small problem:
spirit
libs/spirit/example/qi/mini_xml_samples/1.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/2.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/3.xml: *C*, *L*
These are sample files for the very-minimal tiny-xml sample. The parser is very simple and does not parse comments. Adding rules for comments, just to satisfy the inspection report, does not seem to be justified as it will add noise to the example. Here's an example:
<foo> <bar>bar 1</bar> <bar>bar 2</bar> <bar>bar 3</bar> </foo>
What can I do about this? Do we have some way to inhibit checking for such /special-case/ files?
We could remove the files completely and put the xml snippets as strings into the example. I know this somehow defeats the intent to show how to parse arbitrary simple xml, but hey it's a sample...
Or you could open the file and skip over the first N lines before parsing the XML. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams wrote:
Hartmut Kaiser wrote:
Joel,
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html> Ok, I have a small problem:
spirit
libs/spirit/example/qi/mini_xml_samples/1.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/2.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/3.xml: *C*, *L*
These are sample files for the very-minimal tiny-xml sample. The parser is very simple and does not parse comments. Adding rules for comments, just to satisfy the inspection report, does not seem to be justified as it will add noise to the example. Here's an example:
<foo> <bar>bar 1</bar> <bar>bar 2</bar> <bar>bar 3</bar> </foo>
What can I do about this? Do we have some way to inhibit checking for such /special-case/ files? We could remove the files completely and put the xml snippets as strings into the example. I know this somehow defeats the intent to show how to parse arbitrary simple xml, but hey it's a sample...
Or you could open the file and skip over the first N lines before parsing the XML.
I think I'll just rename it *.toyxml. It's not real xml anyway. Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net

Joel de Guzman wrote:
David Abrahams wrote:
Or you could open the file and skip over the first N lines before parsing the XML.
I think I'll just rename it *.toyxml. It's not real xml anyway.
How will that address the inspection report issues? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams wrote:
Joel de Guzman wrote:
David Abrahams wrote:
Or you could open the file and skip over the first N lines before parsing the XML. I think I'll just rename it *.toyxml. It's not real xml anyway.
How will that address the inspection report issues?
The inspections don't look at all files. Just a specified list for each type of inspection. *.toyxml isn't in any of the lists. Also, many of the inspectors can be disabled for specific files by embedding comments in the file with some magic words. We need to document that better.
--Beman
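
The two mechanisms described above (a per-inspection list of file types, and an opt-out comment embedded in the file) might look something like this in outline. Everything here is an assumption made for illustration: the extension list, the marker text "example-inspect:skip-ascii", and the function names are all hypothetical, and the actual magic words used by the inspect tool are not given in this thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical list of extensions that this inspection looks at.
static const char* const kCheckedExtensions[] = { ".cpp", ".hpp", ".xml" };

// Hypothetical opt-out marker; NOT the real inspect tool's magic words.
static const std::string kOptOutMarker("example-inspect:skip-ascii");

// Return the extension of the final path component (including the dot),
// or an empty string if that component has no dot.
std::string extension_of(const std::string& path)
{
    const std::string::size_type slash = path.find_last_of("/\\");
    const std::string::size_type dot = path.rfind('.');
    if (dot == std::string::npos) return std::string();
    if (slash != std::string::npos && dot < slash) return std::string();
    return path.substr(dot);
}

// A file is inspected only if its extension is listed and its contents
// do not contain the opt-out marker.
bool should_inspect(const std::string& path, const std::string& contents)
{
    const std::string ext = extension_of(path);
    const std::size_t count =
        sizeof(kCheckedExtensions) / sizeof(kCheckedExtensions[0]);
    bool listed = false;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (ext == kCheckedExtensions[i]) listed = true;
    }
    if (!listed) return false;
    return contents.find(kOptOutMarker) == std::string::npos;
}
```

Under these assumptions, renaming 1.xml to 1.toyxml takes the file out of the listed extensions, so it is never opened by the check at all.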

Marshall Clow wrote:
Done. Checked into the trunk as revision 46981.
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html>
Yeh! Thanks Marshall! John.

At 9:48 PM -0700 7/1/08, Marshall Clow wrote:
At 11:08 AM +0100 7/1/08, Daniel James wrote:
2008/6/29 John Maddock <john@johnmaddock.co.uk>:
Would it be possible for the inspection report to print out the line containing the non-ASCII characters?
Done. Checked into the trunk as revision 46981.
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html>
[ Please ignore the complaints about files named ".DS_Store"; that's just the Mac OS Finder droppings. ]
I have been updating this report every day or two. Just FYI...
-- Marshall
participants (8)
- Beman Dawes
- Daniel James
- David Abrahams
- Dean Michael Berris
- Hartmut Kaiser
- Joel de Guzman
- John Maddock
- Marshall Clow