[inspect] Hall of Shame plus non-ASCII characters


Beman Dawes

26 Jun 2008

2:22 a.m.

There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

A couple of things are different:

* A "Hall of Shame" has been added to highlight what libraries are the worst offenders. I'm open to suggestions as to at what point we should cut off reporting. Maybe limit it to the worst 10 libraries?

* A check for non-ASCII characters in source files has been added by Marshall Clow. It is picking up non-ASCII characters in people's names in copyright messages; that's why Boost multi-index looks so bad in the report. We need to come up with a preferred approach for those with non-ASCII characters in their names.

--Beman
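For readers who want to see what such a check amounts to, here is a minimal Python sketch. It is not the inspect tool's code (which this thread does not show); the function name `find_non_ascii` and the sample file contents are made up for illustration, and the rule used (flag any byte above 0x7F) is an assumption.

```python
from typing import List, Tuple


def find_non_ascii(data: bytes) -> List[Tuple[int, int, int]]:
    """Return (line, column, byte value) for every byte outside 7-bit ASCII."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col_no, byte in enumerate(line, start=1):
            if byte > 0x7F:  # high-order bit set, so not ASCII
                hits.append((line_no, col_no, byte))
    return hits


# Hypothetical source file saved as Latin-1: the a-umlaut is the single byte 0xE4.
sample = "// Copyright Jaakko J\u00e4rvi\nint main() {}\n".encode("latin-1")

for line_no, col_no, byte in find_non_ascii(sample):
    print(f"line {line_no}, column {col_no}: byte 0x{byte:02X}")
```

Working on raw bytes rather than decoded text keeps the sketch independent of whatever encoding the file was saved in.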


Joel de Guzman

26 Jun

3:07 a.m.

Beman Dawes wrote:

...

There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

Question:

How is this tested for c++ files?:

*A* invalid bookmarks, invalid urls, broken links, unlinked files

Fusion got lots of *A*s, but I don't understand why. Take:

libs/fusion/test/sequence/tuple_comparison.cpp

for example. How did it become an unlinked file? What am I missing?

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

vicente.botet

6:14 a.m.

----- Original Message -----
From: "Joel de Guzman" <joel@boost-consulting.com>
To: <boost@lists.boost.org>
Sent: Thursday, June 26, 2008 5:07 AM
Subject: Re: [boost] [inspect] Hall of Shame plus non-ASCII characters

...

Beman Dawes wrote:

...
There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

Question:

How is this tested for c++ files?:

*A* invalid bookmarks, invalid urls, broken links, unlinked files

Fusion got lots of *A*s, but I don't understand why. Take:

libs/fusion/test/sequence/tuple_comparison.cpp

for example. How did it become an unlinked file? What am I missing?

Hi,

There is a problem with the *A* tests. The *A* marker is used twice:

*L* missing Boost license info, or wrong reference text
*C* missing copyright notice
*R* invalid (cr only) line-ending
====*A* invalid bookmarks, invalid urls, broken links, unlinked files
*N* file/directory names issues
*T* tabs in file
====*A* non-ASCII chars in file
*M* uses of min or max that have not been protected from the min/max macros, or unallowed #undef-s
*U* unnamed namespace in header

Best,
Vicente

gchen

7:52 a.m.

Joel de Guzman wrote:

...

Beman Dawes wrote:

...
There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

Question:

How is this tested for c++ files?:

*A* invalid bookmarks, invalid urls, broken links, unlinked files

Fusion got lots of *A*s, but I don't understand why. Take:

libs/fusion/test/sequence/tuple_comparison.cpp

I think the *A* is caused by the ä in Jaakko's surname. It could be the same reason for lots of *A* in other libraries.

Joel de Guzman

8:29 a.m.

gchen wrote:

...

Joel de Guzman wrote:

...
Beman Dawes wrote:

...
There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

Question:

How is this tested for c++ files?:

*A* invalid bookmarks, invalid urls, broken links, unlinked files

Fusion got lots of *A*s, but I don't understand why. Take:

libs/fusion/test/sequence/tuple_comparison.cpp

I think the *A* is caused by the ä in Jaakko's surname. It could be the same reason for lots of *A* in other libraries.

Ouch. How do you spell Jaakko Järvi in plain ASCII?

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

Sebastian Redl

8:33 a.m.

Joel de Guzman wrote:

...

Ouch. How do you spell Jaakko Järvi in plain ASCII?

The German transliteration of the ä is ae. Not sure if that applies to Scandinavian names, but I think so.

Sebastian

joaquin＠tid.es

8:44 a.m.

Sebastian Redl wrote:

...

Joel de Guzman wrote:

...
Ouch. How do you spell Jaakko Järvi in plain ASCII?

The German transliteration of the ä is ae. Not sure if that applies to Scandinavian names, but I think so.

Jaakko Järvi is Finnish, and it seems that ä --> ae is not acceptable in that language, the custom being ä --> a instead:

http://tinyurl.com/6m2gkq

So, I think the transliteration should be Jaakko Jarvi, but surely some Finnish colleague can shed some light here.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo

gchen

8:44 a.m.

Joel de Guzman wrote:

...

gchen wrote:

...
Joel de Guzman wrote:

...
Beman Dawes wrote:

...
There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

Question:

How is this tested for c++ files?:

*A* invalid bookmarks, invalid urls, broken links, unlinked files

Fusion got lots of *A*s, but I don't understand why. Take:

libs/fusion/test/sequence/tuple_comparison.cpp

I think the *A* is caused by the ä in Jaakko's surname. It could be the same reason for lots of *A* in other libraries.

Ouch. How do you spell Jaakko Järvi in plain ASCII?

I know some languages can use 'ae' to stand for ä, but this may not be valid for all other languages, so this non-ASCII character issue seems a little puzzle to solve, to me.

Jens Seidel

10:56 a.m.

On Thu, Jun 26, 2008 at 04:44:11PM +0800, gchen wrote:

...

Joel de Guzman wrote:

...
Ouch. How do you spell Jaakko Järvi in plain ASCII?

I know some languages can use 'ae' to stand for ä, but this may not be valid for all other languages, so this non-ASCII character issue seems a little puzzle to solve, to me.

Let's ask iconv for a transliteration:

echo "Jaakko Järvi" | iconv -t ascii//TRANSLIT
Jaakko Jaervi

It seems to prefer "ae" even in fi_FI locale ...

Jens
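For comparison, the other convention raised in this thread (ä --> a, dropping the mark rather than expanding it) can be sketched in Python with the standard `unicodedata` module. This is only an illustration; the helper name `strip_diacritics` is made up, and this is not what iconv does internally.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose accented letters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Jaakko J\u00e4rvi"))
```

Letters that have no decomposition (for example ø or ß) pass through unchanged, so the result is not guaranteed to be pure ASCII for every name.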

Daniel James

11:50 a.m.

2008/6/26 Jens Seidel <jensseidel@users.sf.net>:

...

On Thu, Jun 26, 2008 at 04:44:11PM +0800, gchen wrote:

...
Joel de Guzman wrote:

...
Ouch. How do you spell Jaakko Järvi in plain ASCII?

I know some languages can use 'ae' to stand for ä, but this may not be valid for all other languages, so this non-ASCII character issue seems a little puzzle to solve, to me.

Let's ask iconv for a transliteration:

echo "Jaakko Järvi" | iconv -t ascii//TRANSLIT
Jaakko Jaervi

He uses Jarvi: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?People/Jaakko...

Daniel

Beman Dawes

4:07 p.m.

Daniel James wrote:

...

2008/6/26 Jens Seidel <jensseidel@users.sf.net>:

...
On Thu, Jun 26, 2008 at 04:44:11PM +0800, gchen wrote:

...
Joel de Guzman wrote:

...
Ouch. How do you spell Jaakko Järvi in plain ASCII?

I know some languages can use 'ae' to stand for ä, but this may not be valid for all other languages, so this non-ASCII character issue seems a little puzzle to solve, to me.

Let's ask iconv for a transliteration:

echo "Jaakko Järvi" | iconv -t ascii//TRANSLIT
Jaakko Jaervi

He uses Jarvi:

http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?People/Jaakko...

Developers get to choose how they spell their own names :-)

I've pinged Jaakko to be sure it is OK if we go ahead and make the change.

--Beman

jarvij＠gmail.com

6:09 p.m.

Jaakko Jarvi is fine. (My passport says Jaervi, but I just tend to drop the dots from ä.)

Jens Seidel <jensseidel@users.sf.net> writes:

...

On Thu, Jun 26, 2008 at 04:44:11PM +0800, gchen wrote:

...
Joel de Guzman wrote:

...
Ouch. How do you spell Jaakko Järvi in plain ASCII?

I know some languages can use 'ae' to stand for ä, but this may not be valid for all other languages, so this non-ASCII character issue seems a little puzzle to solve, to me.

Let's ask iconv for a transliteration:

echo "Jaakko Järvi" | iconv -t ascii//TRANSLIT
Jaakko Jaervi

It seems to prefer "ae" even in fi_FI locale ...

Jens

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

joaquin＠tid.es

8:14 a.m.

Beman Dawes wrote:

...

There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

A couple of things are different:

* A "Hall of Shame" has been added to highlight what libraries are the worst offenders. I'm open to suggestions as to at what point we should cut off reporting. Maybe limit it to the worst 10 libraries?

* A check for non-ASCII characters in source files has been added by Marshall Clow. It is picking up non-ASCII characters in people's names in copyright messages; that's why Boost multi-index looks so bad in the report. We need to come up with preferred approach for those with non-ASCII characters in their names.

I think the options are:

1. The inspect tool is modified so as to bypass author names (possibly taken from an author names file). In a sense, this defeats the whole purpose of the non-ASCII check, I guess.

2. Suppress all diacritical marks:

Joaquín M López Muñoz --> Joaquin M Lopez Munoz

3. Encode with HTML entities:

Joaquín M López Muñoz --> Joaqu&iacute;n M L&oacute;pez Mu&ntilde;oz

Whatever approach is agreed upon I'll happily apply asap to Boost.MultiIndex.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
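To make the HTML-entity option concrete, here is a small Python sketch using the standard `html.entities.codepoint2name` table. The helper name `to_html_entities` is hypothetical, and falling back to numeric references for characters without a named entity is an assumption of this sketch, not something proposed in the thread.

```python
import html
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                          # plain ASCII stays as-is
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named entity, e.g. &ntilde;
        else:
            out.append(f"&#{code};")                 # numeric fallback (assumption)
    return "".join(out)


encoded = to_html_entities("Joaqu\u00edn M L\u00f3pez Mu\u00f1oz")
print(encoded)
# The encoding is reversible with the standard library:
print(ascii(html.unescape(encoded)))
```

The result is pure ASCII and lossless, at the cost of readability in anything that does not render HTML.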

Beman Dawes

10:03 a.m.

joaquin@tid.es wrote:

...

Beman Dawes wrote:

...
There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

A couple of things are different:

* A "Hall of Shame" has been added to highlight what libraries are the worst offenders. I'm open to suggestions as to at what point we should cut off reporting. Maybe limit it to the worst 10 libraries?

* A check for non-ASCII characters in source files has been added by Marshall Clow. It is picking up non-ASCII characters in people's names in copyright messages; that's why Boost multi-index looks so bad in the report. We need to come up with preferred approach for those with non-ASCII characters in their names.

I think the options are:

1. The inspect tool is modified so as to bypass author names (possibly taken from an author names file). In a sense, this defeats the whole purpose of the non-ASCII check, I guess.

There just isn't a way in standard C++ to deal with non-ASCII characters that preserves their correct display on all systems and avoids errors and/or warnings on some Asian language systems.

...

2. Suppress all diacritical marks:

Joaquín M López Muñoz --> Joaquin M Lopez Munoz

I think that's really the only viable choice. Authors are free to use any transformation they desire, as long as it is entirely ASCII.

...

3. Encode with HTML entities:

Joaquín M López Muñoz --> Joaqu&iacute;n M L&oacute;pez Mu&ntilde;oz

That makes the name much less readable except when viewed with a web browser or other HTML aware renderer.

...

Whatever approach is agreed upon I'll happily apply asap to Boost.MultiIndex.

Unless someone else comes up with an unexpected solution, I think you are going to have to become Joaquín M López Muñoz --> Joaquin M Lopez Munoz:-) --Beman

hervebronnimann＠mac.com

1:24 p.m.

I'm sure that Joaquin, Jaakko, Ion and I (among others) long ago got used to the transliteration of our names. After being called Arve by credit cards and phone vendors (a good test for hanging up quickly!) I for one no longer care about it, although I write it correctly in latex, word, html, etc. and I did edit by hand and get notarized the birth certificates of my children and they are learning to write their names properly :-)

Seriously, given that all documentation is in English, why should we care that a copyright name is transliterated? It still has legal value, which is the only purpose of the notice. For the rest (documentation, etc.) we can use diacritical marks to our heart's content.

-- Herve' Bro"nnimann (if you can read that :)

Sent via BlackBerry from T-Mobile

-----Original Message-----
From: Beman Dawes <bdawes@acm.org>
Date: Thu, 26 Jun 2008 06:03:36
To: boost@lists.boost.org
Subject: Re: [boost] [inspect] Hall of Shame plus non-ASCII characters

joaquin@tid.es wrote:

...

Beman Dawes wrote:

...
There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

A couple of things are different:

* A "Hall of Shame" has been added to highlight what libraries are the worst offenders. I'm open to suggestions as to at what point we should cut off reporting. Maybe limit it to the worst 10 libraries?

* A check for non-ASCII characters in source files has been added by Marshall Clow. It is picking up non-ASCII characters in people's names in copyright messages; that's why Boost multi-index looks so bad in the report. We need to come up with preferred approach for those with non-ASCII characters in their names.

I think the options are:

1. The inspect tool is modified so as to bypass author names (possibly taken from an author names file). In a sense, this defeats the whole purpose of the non-ASCII check, I guess.

There just isn't a way in standard C++ to deal with non-ASCII characters that preserves their correct display on all systems and avoids errors and/or warnings on some Asian language systems.

...

2. Suppress all diacritical marks:

Joaquín M López Muñoz --> Joaquin M Lopez Munoz

I think that's really the only viable choice. Authors are free to use any transformation they desire, as long as it is entirely ASCII.

...

3. Encode with HTML entities:

Joaquín M López Muñoz --> Joaqu&iacute;n M L&oacute;pez Mu&ntilde;oz

That makes the name much less readable except when viewed with a web browser or other HTML aware renderer.

...

Whatever approach is agreed upon I'll happily apply asap to Boost.MultiIndex.

Unless someone else comes up with an unexpected solution, I think you are going to have to become Joaquín M López Muñoz --> Joaquin M Lopez Munoz:-)

--Beman

Mathias Gaunard

1:40 p.m.

hervebronnimann@mac.com wrote:

...

Seriously, given that all documentation is in English, why should we care that a copyright name is transliterated?

English does borrow some words from other languages, such as French, where accents are needed. Think of "fiancé(e)" for example. Documentation should be able to write these correctly.

Beman Dawes

4:15 p.m.

Mathias Gaunard wrote:

...

hervebronnimann@mac.com wrote:

...
Seriously, given that all documentation is in English, why should we care that a copyright name is transliterated?

English does borrow some words from other languages, such as French, where accents are needed. Think of "fiancé(e)" for example. Documentation should be able to write these correctly.

It isn't documentation that is the issue; it is C++ source files (.cpp, .hpp, .ipp, etc). Any Unicode character should be OK in documentation files. Inspect is only supposed to check C++ source files for non-ASCII characters. --Beman

Beman Dawes

4:12 p.m.

hervebronnimann@mac.com wrote:

...

I'm sure that Joaquin, Jaakko, Ion and I (among others) have long ago been used to the transliteration of our names. After being called Arve by credit cards and phone vendors (a good test for hanging up quickly!) I for one no longer care about it, although I write it correctly in latex, word, html, etc. and I did edit by hand and get notarized the birth certificates of my children and they are learning to write their names properly :-)

Seriously, given that all documentation is in English, why should we care that a copyright name is transliterated? It still has legal value, which is the only purpose of the notice. For the rest (documentation, etc.) we can use diacritical marks to our heart's content.

-- Herve' Bro"nnimann (if you can read that :)

Thanks, Hervé --Beman

dherring＠ll.mit.edu

3:41 p.m.

On Thu, 26 Jun 2008, joaquin@tid.es wrote:

...

Beman Dawes wrote:

...
* A check for non-ASCII characters in source files has been added by Marshall Clow. It is picking up non-ASCII characters in people's names in copyright messages; that's why Boost multi-index looks so bad in the report. We need to come up with preferred approach for those with non-ASCII characters in their names.

I think the options are: ... 2. Suppress all diacritical marks:

Joaquín M López Muñoz --> Joaquin M Lopez Munoz

3. Encode with HTML entities:

Joaquín M López Muñoz --> Joaqu&iacute;n M L&oacute;pez Mu&ntilde;oz

4. Use standard LaTeX encodings:

Joaquín M López Muñoz --> Joaqu\'\in M L\'opez Mu\~noz

or (as if the babel package were enabled)

Joaqu'in M L'opez Mu~noz

See http://tobi.oetiker.ch/lshort/lshort.pdf section 2.4.8, Accents and Special Characters

This seems both precise and mostly readable.

- Daniel
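A rough Python sketch of the LaTeX-style option follows. The accent table is a small illustrative subset, the helper name `to_latex` is made up, and the braced form it emits (`\'{o}` rather than `\'o`) is a choice of this sketch; both spellings are valid LaTeX.

```python
import unicodedata

# Combining mark -> LaTeX accent command (illustrative subset only).
LATEX_ACCENTS = {
    "\u0301": "\\'",  # acute
    "\u0300": "\\`",  # grave
    "\u0303": "\\~",  # tilde
    "\u0308": '\\"',  # diaeresis / umlaut
    "\u0302": "\\^",  # circumflex
}


def to_latex(text: str) -> str:
    """Rewrite accented letters as LaTeX accent commands."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and parts[1] in LATEX_ACCENTS:
            # LaTeX uses the dotless \i under an accent.
            base = "\\i" if parts[0] == "i" else parts[0]
            out.append(f"{LATEX_ACCENTS[parts[1]]}{{{base}}}")
        else:
            out.append(ch)  # ASCII, or a character this table does not cover
    return "".join(out)


print(to_latex("Joaqu\u00edn M L\u00f3pez Mu\u00f1oz"))
```

Characters outside the table are passed through untouched, so a real tool would need a fuller mapping before it could promise ASCII-only output.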

Beman Dawes

4:19 p.m.

dherring@ll.mit.edu wrote:

...

On Thu, 26 Jun 2008, joaquin@tid.es wrote:

...
Beman Dawes wrote:

...
* A check for non-ASCII characters in source files has been added by Marshall Clow. It is picking up non-ASCII characters in people's names in copyright messages; that's why Boost multi-index looks so bad in the report. We need to come up with preferred approach for those with non-ASCII characters in their names.

I think the options are: ... 2. Suppress all diacritical marks:

Joaquín M López Muñoz --> Joaquin M Lopez Munoz

3. Encode with HTML entities:

Joaquín M López Muñoz --> Joaqu&iacute;n M L&oacute;pez Mu&ntilde;oz

4. Use standard LaTeX encodings:

Joaquín M López Muñoz --> Joaqu\'\in M L\'opez Mu\~noz or (as if the babel package were enabled) Joaqu'in M L'opez Mu~noz

See http://tobi.oetiker.ch/lshort/lshort.pdf section 2.4.8, Accents and Special Characters

This seems both precise and mostly readable.

It is OK if a developer wants to do that, but we don't have any way to change how C++ compilers process text, so there is nothing that will have the desired effect.

--Beman

Mathias Gaunard

12:22 p.m.

Beman Dawes wrote:

...

* A check for non-ASCII characters in source files has been added by Marshall Clow. It is picking up non-ASCII characters in people's names in copyright messages;

Why not simply allow comments to be in utf-8?

Marshall Clow

1:59 p.m.

At 2:22 PM +0200 6/26/08, Mathias Gaunard wrote:

...

Beman Dawes wrote:

...
* A check for non-ASCII characters in source files has been added by Marshall Clow. It is picking up non-ASCII characters in people's names in copyright messages;

Why not simply allow comments to be in utf-8?

See ticket <http://svn.boost.org/trac/boost/ticket/1736>, titled "Headers containing non ASCII characters cause MS VC to issue warning 4819 in some locales"

IIRC, utf-8 sequences can contain characters that are not in the range 32 ... 127. These will trip the warnings referenced above.

[ Note: I have no incoming mail until about 6 PM PDT this evening. ]

--
-- Marshall

Marshall Clow Idio Software <mailto:marshall@idio.com>

It is by caffeine alone I set my mind in motion.
It is by the beans of Java that thoughts acquire speed,
the hands acquire shaking, the shaking becomes a warning.
It is by caffeine alone I set my mind in motion.

Mathias Gaunard

8:08 p.m.

Marshall Clow wrote:

...

See ticket <http://svn.boost.org/trac/boost/ticket/1736>, titled "Headers containing non ASCII characters cause MS VC to issue warning 4819 in some locales"

IIRC, utf-8 sequences can contain characters that are not in the range 32 ... 127. These will trip the warnings referenced above.

I see. The ideal solution would be UTF-7 or quoted-printable, but I doubt these are supported in any text editor.
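As an illustration of why those two encodings would satisfy an ASCII-only check, here is a short Python sketch using the standard `utf-7` codec and the `quopri` module. It only shows the encodings themselves; it says nothing about editor or compiler support.

```python
import quopri

name = "Jaakko J\u00e4rvi"

# Both encodings represent the a-umlaut using 7-bit bytes only.
utf7 = name.encode("utf-7")
qp = quopri.encodestring(name.encode("utf-8"))

for label, encoded in (("UTF-7", utf7), ("quoted-printable", qp)):
    print(label, encoded.decode("ascii"))

# Both are lossless: decoding gives back the original name.
restored_utf7 = utf7.decode("utf-7")
restored_qp = quopri.decodestring(qp).decode("utf-8")
```

Either form round-trips exactly, but neither is readable as a name without a decoding step.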

Beman Dawes

4:22 p.m.

Mathias Gaunard wrote:

...

Beman Dawes wrote:

...
* A check for non-ASCII characters in source files has been added by Marshall Clow. It is picking up non-ASCII characters in people's names in copyright messages;

Why not simply allow comments to be in utf-8?

UTF-8 causes exactly the same problem; the high-order bit is sometimes on, and that causes errors or warnings with some compiler / operating system combinations --Beman
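The high-order-bit point is easy to check. The Python sketch below lists the UTF-8 bytes of a string that have bit 0x80 set; the helper name `high_bit_bytes` is made up for illustration.

```python
from typing import List


def high_bit_bytes(text: str) -> List[int]:
    """Return the UTF-8 bytes of `text` whose high-order bit (0x80) is set."""
    return [b for b in text.encode("utf-8") if b & 0x80]


for name in ("Jaakko Jarvi", "Jaakko J\u00e4rvi"):
    flagged = ", ".join(f"0x{b:02X}" for b in high_bit_bytes(name))
    print(f"{name!a}: {flagged or 'pure ASCII'}")
```

In UTF-8 every byte of a multi-byte sequence has the high bit set, so any non-ASCII character in a comment produces at least two such bytes.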

David Abrahams

4:54 p.m.

Beman Dawes wrote:

...

There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

A couple of things are different:

* A "Hall of Shame" has been added to highlight what libraries are the worst offenders. I'm open to suggestions as to at what point we should cut off reporting. Maybe limit it to the worst 10 libraries?

Something, anything, that is much less than the whole list, please! Otherwise I promise many will ignore it -- I know I will; not through any evil intention either. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Beman Dawes

27 Jun

4:12 p.m.

David Abrahams wrote:

...

Beman Dawes wrote:

...
There is a fresh run of the trunk inspect report up at http://mysite.verizon.net/beman/inspect.html

A couple of things are different:

* A "Hall of Shame" has been added to highlight what libraries are the worst offenders. I'm open to suggestions as to at what point we should cut off reporting. Maybe limit it to the worst 10 libraries?

Something, anything, that is much less than the whole list, please! Otherwise I promise many will ignore it -- I know I will; not through any evil intention either.

Understood. I'm the same way.

Take a look at http://mysite.verizon.net/beman/inspect.html

It includes a whole bunch of changes to make the report more readable.

Comments appreciated!

--Beman

David Abrahams

5:11 p.m.

Beman Dawes wrote:

...

Take a look at http://mysite.verizon.net/beman/inspect.html

It includes a whole bunch of changes to make the report more readable.

Comments appreciated!

The list of worst offenders looks a lot better now that none of my work appears there ;-) -- Dave Abrahams BoostPro Computing http://www.boostpro.com
