Inspection report and non-ASCII characters

John Maddock

29 Jun 2008

9:05 a.m.

Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-( Thanks, John.

Dean Michael Berris

30 Jun

2:57 a.m.

On Sun, Jun 29, 2008 at 5:05 PM, John Maddock <john@johnmaddock.co.uk> wrote:

...

Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-(

How about whitespace? End-of-line or end-of-file issues perhaps?

--
Dean Michael C. Berris
Software Engineer, Friendster, Inc.

Beman Dawes

9:59 p.m.

John Maddock wrote:

...

Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-(

Hum... Take a look at trunk\tools\inspect\ascii_check.cpp

It seems misnamed; it is apparently really checking for characters the C++ standard says are OK in source programs, regardless of encoding.

Also, I notice the code:

if ( c >= 'a' && c <= 'z' ) return false;
if ( c >= 'A' && c <= 'Z' ) return false;

That isn't right for EBCDIC. See http://en.wikipedia.org/wiki/EBCDIC. Although that is being pedantic - there is little chance the code will ever run on a non-ASCII system.

But before changing anything, we really need to figure out what our Boost standard is. How about anything 0x20-0x7E plus 0x09, 0x0A, 0x0D?

(0x09 is a tab, but we already have a more specific check for that.)

--Beman
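The range proposed here (0x20-0x7E plus 0x09, 0x0A, 0x0D) could be sketched roughly as follows. This is only an illustration: the function names is_disallowed_byte and first_disallowed are hypothetical and are not taken from ascii_check.cpp. Comparing numeric byte values instead of character literals such as 'a' also sidesteps the EBCDIC concern, since the test no longer depends on the compiler's execution character set.

```cpp
#include <string>

// Hypothetical helper, not the code in ascii_check.cpp: returns true when the
// byte falls outside the proposed range (0x20-0x7E plus tab, LF and CR).
inline bool is_disallowed_byte(unsigned char c)
{
    if (c >= 0x20 && c <= 0x7E) return false;               // printable ASCII
    if (c == 0x09 || c == 0x0A || c == 0x0D) return false;  // tab, LF, CR
    return true;
}

// Returns the index of the first disallowed byte in text,
// or std::string::npos when every byte is acceptable.
inline std::string::size_type first_disallowed(const std::string& text)
{
    for (std::string::size_type i = 0; i < text.size(); ++i)
        if (is_disallowed_byte(static_cast<unsigned char>(text[i])))
            return i;
    return std::string::npos;
}
```

A file would then be flagged whenever first_disallowed returns something other than std::string::npos for its contents.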

Marshall Clow

10:02 p.m.

At 5:59 PM -0400 6/30/08, Beman Dawes wrote:

...

John Maddock wrote:

...
Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-(

Hum... Take a look at trunk\tools\inspect\ascii_check.cpp

It seems misnamed; it is apparently really checking for characters the C++ standard says are OK in source programs, regardless of encoding.

Also, I notice the code:

if ( c >= 'a' && c <= 'z' ) return false;
if ( c >= 'A' && c <= 'Z' ) return false;

That isn't right for EBCDIC. See http://en.wikipedia.org/wiki/EBCDIC. Although that is being pedantic - there is little chance the code will ever run on a non-ASCII system.

But before changing anything, we really need to figure out what our Boost standard is. How about anything 0x20-0x7E plus 0x09, 0x0A, 0x0D?

I'll be happy to make that change, and to print the offending line - just let me know what people want ;-)

--
-- Marshall

Marshall Clow
Idio Software <mailto:marshall@idio.com>

It is by caffeine alone I set my mind in motion.
It is by the beans of Java that thoughts acquire speed,
the hands acquire shaking, the shaking becomes a warning.
It is by caffeine alone I set my mind in motion.

John Maddock

1 Jul

8:58 a.m.

Beman Dawes wrote:

...

...
John Maddock wrote:

...
Would it be possible for the inspection report to print out the line containing the non-ASCII characters? There are a few files that are being flagged up, where I just can't find anything wrong with them :-(

Hum... Take a look at trunk\tools\inspect\ascii_check.cpp

It seems misnamed; it is apparently really checking for characters the C++ standard says are OK in source programs, regardless of encoding.

Also, I notice the code:

if ( c >= 'a' && c <= 'z' ) return false;
if ( c >= 'A' && c <= 'Z' ) return false;

That isn't right for EBCDIC. See http://en.wikipedia.org/wiki/EBCDIC. Although that is being pedantic - there is little chance the code will ever run on a non-ASCII system.

But before changing anything, we really need to figure out what our Boost standard is. How about anything 0x20-0x7E plus 0x09, 0x0A, 0x0D?

(0x09 is a tab, but we already have a more specific check for that.)

I'm all for strict checking, but at present it's too hard to find out what's wrong :-(

Does the character set in section 2.2 apply to the contents of strings as well as code BTW? If so then the strict checks may be justified...?

John.

Marshall Clow

1:31 p.m.

At 9:58 AM +0100 7/1/08, John Maddock wrote:

...

Does the character set in section 2.2 apply to the contents of strings as well as code BTW? If so then the strict checks may be justified...?

My understanding is that it does (and to comments, which is where many of these problems are occurring).

--
-- Marshall

Marshall Clow
Idio Software <mailto:marshall@idio.com>

It is by caffeine alone I set my mind in motion.
It is by the beans of Java that thoughts acquire speed,
the hands acquire shaking, the shaking becomes a warning.
It is by caffeine alone I set my mind in motion.

Daniel James

10:08 a.m.

2008/6/29 John Maddock <john@johnmaddock.co.uk>:

...

Would it be possible for the inspection report to print out the line containing the non-ASCII characters?

That's a good idea, but will take a little effort. Can you create a ticket? A quicker fix might be just to list the problematic characters.

...

There are a few files that are being flagged up, where I just can't find anything wrong with them :-(

You forgot the 'A' in ASCII:

http://svn.boost.org/trac/boost/changeset/46943/

If you can get your text editor to help you, this is a lot easier to fix. If it doesn't have explicit support or a plugin, you could just try opening it with the wrong encoding and seeing what looks weird. I suppose running Visual C++ with full warnings would also help.

Daniel
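As a rough illustration of what printing the offending line might involve, here is a small sketch. It is not the inspect tool's code: the Finding struct and find_non_ascii_lines are hypothetical names, and treating any byte above 0x7E, or any control byte other than tab and carriage return, as offending is an assumption made only for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one flagged line (1-based line number plus its text).
struct Finding
{
    std::size_t line_number;
    std::string line;
};

// Scans text line by line and records every line that contains at least one
// byte outside the assumed acceptable range (see the note above the code).
inline std::vector<Finding> find_non_ascii_lines(const std::string& text)
{
    std::vector<Finding> findings;
    std::istringstream in(text);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;
        for (std::string::size_type i = 0; i < line.size(); ++i) {
            const unsigned char c = static_cast<unsigned char>(line[i]);
            if (c > 0x7E || (c < 0x20 && c != '\t' && c != '\r')) {
                Finding f = { n, line };
                findings.push_back(f);
                break;  // one report per line is enough
            }
        }
    }
    return findings;
}
```

Each Finding could then be printed next to the file name, which gives the reader both the line number and the text to look at.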

John Maddock

11:45 a.m.

Daniel James wrote:

...

...
You forgot the 'A' in ASCII:

http://svn.boost.org/trac/boost/changeset/46943/

Ah, thanks for spotting those.

...

...
If you can get your text editor to help you, this is a lot easier to fix. If it doesn't have explicit support or a plugin, you could just try opening it with the wrong encoding and seeing what looks weird. I suppose running Visual C++ with full warnings would also help.

Nope, no level 4 warnings from VC++.

I found some more by searching for

[^a-zA-Z0-9_{}\[\]#()<>%\:;.?*+-/^&|~!\\"'= ]

but they were within string literals that were regular expressions: for example use of the "$" sign in a regex seems to trigger the new checking code? If so there's no way to "fix" that :-(

John.

Daniel James

12:15 p.m.

2008/7/1 John Maddock <john@johnmaddock.co.uk>:

...

Daniel James wrote:

...

...
...
If you can get your text editor to help you, this is a lot easier to fix. If it doesn't have explicit support or a plugin, you could just try opening it with the wrong encoding and seeing what looks weird. I suppose running Visual C++ with full warnings would also help.

Nope no level 4 warnings from VC++.

Well, whatever the issue is with Visual C++, it's not just Visual C++ that has problems. For example, our doxygen setup chokes on any non-UTF-8 characters.

...

I found some more by searching for [^a-zA-Z0-9_{}\[\]#()<>%\:;.?*+-/^&|~!\\"'= ] but they were within string literals that were regular expressions: for example use of the "$" sign in a regex seems to trigger the new checking code? If so there's no way to "fix" that :-(

$ is allowed; you've missed some of the characters:

static const string gPunct ( "$_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'@^`" );

If there are any other characters that are reasonable in strings, they should be added, or a separate check for strings implemented. But, sadly, currency symbols (apart from the dollar, obviously) and accented characters really do cause problems.

Daniel
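To make the effect of that punctuation list concrete, here is a minimal sketch built around it. Only the gPunct string comes from the message above; is_accepted and rejected_bytes are hypothetical helpers, and accepting space, tab and line endings is an assumption about what the real check allows.

```cpp
#include <string>

// The punctuation list quoted above; everything else here is illustrative.
static const std::string gPunct("$_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'@^`");

// Hypothetical predicate: true for letters, digits, the listed punctuation,
// and (by assumption) space, tab, LF and CR.
inline bool is_accepted(unsigned char c)
{
    if (c >= 'a' && c <= 'z') return true;
    if (c >= 'A' && c <= 'Z') return true;
    if (c >= '0' && c <= '9') return true;
    if (c == ' ' || c == '\t' || c == '\n' || c == '\r') return true;
    return gPunct.find(static_cast<char>(c)) != std::string::npos;
}

// Collects each distinct rejected byte once, in order of first appearance,
// which is one way to "just list the problematic characters".
inline std::string rejected_bytes(const std::string& text)
{
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char ch = text[i];
        if (!is_accepted(static_cast<unsigned char>(ch)) &&
            out.find(ch) == std::string::npos)
            out += ch;
    }
    return out;
}
```

Under these assumptions a regular expression such as "^[a-z]+\\$" passes untouched, while a Latin-1 pound sign or an accented letter is reported.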

Marshall Clow

2 Jul

4:48 a.m.

At 11:08 AM +0100 7/1/08, Daniel James wrote:

...

2008/6/29 John Maddock <john@johnmaddock.co.uk>:

...
Would it be possible for the inspection report to print out the line containing the non-ASCII characters?

Done. Checked into the trunk as revision 46981.

New reports up at:
<http://www.idio.com/misc/inspect-release.html>
<http://www.idio.com/misc/inspect-trunk.html>

[ Please ignore the complaints about files named ".DS_Store"; that's just the Mac OS Finder droppings. ]

--
-- Marshall

Marshall Clow
Idio Software <mailto:marshall@idio.com>

It is by caffeine alone I set my mind in motion.
It is by the beans of Java that thoughts acquire speed,
the hands acquire shaking, the shaking becomes a warning.
It is by caffeine alone I set my mind in motion.

Joel de Guzman

5:42 a.m.

New subject: Inspection report (*C*, *L*)

Marshall Clow wrote:

...

New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html>

Ok, I have a small problem:

spirit

libs/spirit/example/qi/mini_xml_samples/1.xml: *C*, *L*
libs/spirit/example/qi/mini_xml_samples/2.xml: *C*, *L*
libs/spirit/example/qi/mini_xml_samples/3.xml: *C*, *L*

These are sample files for the very-minimal tiny-xml sample. The parser is very simple and does not parse comments. Adding rules for comments, just to satisfy the inspection report, does not seem to be justified as it will add noise to the example. Here's an example:

<foo>
    <bar>bar 1</bar>
    <bar>bar 2</bar>
    <bar>bar 3</bar>
</foo>

What can I do about this? Do we have some way to inhibit checking for such /special-case/ files?

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

Hartmut Kaiser

12:52 p.m.

New subject: Inspection report (*C*, *L*)

Joel,

...

...
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html>

Ok, I have a small problem:

spirit

libs/spirit/example/qi/mini_xml_samples/1.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/2.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/3.xml: *C*, *L*

These are sample files for the very-minimal tiny-xml sample. The parser is very simple and does not parse comments. Adding rules for comments, just to satisfy the inspection report, does not seem to be justified as it will add noise to the example. Here's an example:

<foo> <bar>bar 1</bar> <bar>bar 2</bar> <bar>bar 3</bar> </foo>

What can I do about this? Do we have some way to inhibit checking for such /special-case/ files?

We could remove the files completely and put the xml snippets as strings into the example. I know this somehow defeats the intent to show how to parse arbitrary simple xml, but hey it's a sample...

Regards
Hartmut

David Abrahams

12:54 p.m.

New subject: Inspection report (*C*, *L*)

Hartmut Kaiser wrote:

...

Joel,

...
...
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html> Ok, I have a small problem:

spirit

libs/spirit/example/qi/mini_xml_samples/1.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/2.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/3.xml: *C*, *L*

These are sample files for the very-minimal tiny-xml sample. The parser is very simple and does not parse comments. Adding rules for comments, just to satisfy the inspection report, does not seem to be justified as it will add noise to the example. Here's an example:

<foo> <bar>bar 1</bar> <bar>bar 2</bar> <bar>bar 3</bar> </foo>

What can I do about this? Do we have some way to inhibit checking for such /special-case/ files?

We could remove the files completely and put the xml snippets as strings into the example. I know this somehow defeats the intent to show how to parse arbitrary simple xml, but hey it's a sample...

Or you could open the file and skip over the first N lines before parsing the XML.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Joel de Guzman

1:42 p.m.

New subject: Inspection report (*C*, *L*)

David Abrahams wrote:

...

Hartmut Kaiser wrote:

...
Joel,

...
...
New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html> Ok, I have a small problem:

spirit

libs/spirit/example/qi/mini_xml_samples/1.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/2.xml: *C*, *L* libs/spirit/example/qi/mini_xml_samples/3.xml: *C*, *L*

These are sample files for the very-minimal tiny-xml sample. The parser is very simple and does not parse comments. Adding rules for comments, just to satisfy the inspection report, does not seem to be justified as it will add noise to the example. Here's an example:

<foo> <bar>bar 1</bar> <bar>bar 2</bar> <bar>bar 3</bar> </foo>

What can I do about this? Do we have some way to inhibit checking for such /special-case/ files? We could remove the files completely and put the xml snippets as strings into the example. I know this somehow defeats the intent to show how to parse arbitrary simple xml, but hey it's a sample...

Or you could open the file and skip over the first N lines before parsing the XML.

I think I'll just rename it *.toyxml. It's not real xml anyway.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

David Abrahams

3:35 p.m.

New subject: Inspection report (*C*, *L*)

Joel de Guzman wrote:

...

David Abrahams wrote:

...
Or you could open the file and skip over the first N lines before parsing the XML.

I think I'll just rename it *.toyxml. It's not real xml anyway.

How will that address the inspection report issues?

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Beman Dawes

5:30 p.m.

New subject: Inspection report (*C*, *L*)

David Abrahams wrote:

...

Joel de Guzman wrote:

...
David Abrahams wrote:

...
Or you could open the file and skip over the first N lines before parsing the XML. I think I'll just rename it *.toyxml. It's not real xml anyway.

How will that address the inspection report issues?

The inspections don't look at all files, just a specified list of file types for each kind of inspection. *.toyxml isn't in any of the lists.

Also, many of the inspectors can be disabled for specific files by embedding comments in the file with some magic words. We need to document that better.

--Beman
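The per-inspection file lists described here might be pictured along these lines. This is a sketch under stated assumptions, not the inspect tool's actual data structures: InspectorTable, extension_of and is_inspected are hypothetical names, and the inspector names and extensions used with them are placeholders rather than the tool's real lists.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical table: inspector name -> file extensions it examines.
typedef std::map<std::string, std::set<std::string> > InspectorTable;

// Returns the extension of the final path component including the dot,
// or an empty string when that component has no dot.
inline std::string extension_of(const std::string& path)
{
    const std::string::size_type slash = path.find_last_of("/\\");
    const std::string::size_type dot = path.rfind('.');
    if (dot == std::string::npos) return "";
    if (slash != std::string::npos && dot < slash) return "";
    return path.substr(dot);
}

// A file is examined by an inspector only when its extension appears in that
// inspector's list; anything else is silently skipped.
inline bool is_inspected(const InspectorTable& table,
                         const std::string& inspector,
                         const std::string& path)
{
    const InspectorTable::const_iterator it = table.find(inspector);
    return it != table.end() && it->second.count(extension_of(path)) > 0;
}
```

With a table that lists ".xml" but not ".toyxml", renaming 1.xml to 1.toyxml takes the file out of that inspector's scope, which is the effect described above.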

John Maddock

8:35 a.m.

Marshall Clow wrote:

...

...
Done. Checked into the trunk as revision 46981.

New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html>

Yeh! Thanks Marshall! John.

Marshall Clow

5 Jul

7:55 p.m.

At 9:48 PM -0700 7/1/08, Marshall Clow wrote:

...

At 11:08 AM +0100 7/1/08, Daniel James wrote:

...
2008/6/29 John Maddock <john@johnmaddock.co.uk>:

...
Would it be possible for the inspection report to print out the line containing the non-ASCII characters?

Done. Checked into the trunk as revision 46981.

New reports up at: <http://www.idio.com/misc/inspect-release.html> <http://www.idio.com/misc/inspect-trunk.html>

[ Please ignore the complaints about files named ".DS_Store"; that's just the Mac OS Finder droppings. ]

I have been updating this report every day or two. Just FYI...

--
-- Marshall

Marshall Clow
Idio Software <mailto:marshall@idio.com>

It is by caffeine alone I set my mind in motion.
It is by the beans of Java that thoughts acquire speed,
the hands acquire shaking, the shaking becomes a warning.
It is by caffeine alone I set my mind in motion.

participants (8)

Beman Dawes
Daniel James
David Abrahams
Dean Michael Berris
Hartmut Kaiser
Joel de Guzman
John Maddock
Marshall Clow