Re: [boost] [Boost-bugs] [ boost-Bugs-1461533 ] Non-basic-source-character-set characters conflict with MSVC

Foster Brereton

30 Mar 2006

6:52 p.m.

I have attached a list to this email (There are 354 in the list as of this writing). I couldn't figure out how to attach the file to the bug report after-the-fact. Note that these are the most likely candidate source files to cause an error when building the Boost sources -- some test cases and example sources were omitted.

Blessings,
Foster

On 3/30/06, Marshall Clow <marshall@idio.com> wrote:

...

At 9:43 AM -0800 3/30/06, SourceForge.net wrote:

...
Bugs item #1461533, was opened at 2006-03-30 09:43
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=107586&aid=1461533&group_id=7586

[ snip ]

...
Initial Comment:
There exist high-ASCII characters in the Boost sources that cannot be converted to the local code page of a given computer, because the region (code page) in which each source was originally written is not known. These high-ASCII characters may therefore have no proper mapping, and MSVC 2003 and 2005 emit a warning. This should be considered a failure case because, although it is only a low-level warning, it is still an alert. Thus, without the correct region set on your system, the Boost sources fail to build.

To whoever entered this bug (Foster? Nigel?): please provide a list of offending files in the bug report.

-- Marshall

Marshall Clow Idio Software <mailto:marshall@idio.com>

It is by caffeine alone I set my mind in motion. It is by the beans of Java that thoughts acquire speed, the hands acquire shaking, the shaking becomes a warning. It is by caffeine alone I set my mind in motion.
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

--
Foster T. Brereton - Computer Scientist
Software Technology Lab, Adobe Systems Incorporated
fbrereto@adobe.com -- http://opensource.adobe.com

Attachments:

high_ascii_offenders.txt (text/plain, 14.0 KB)
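The thread does not say how the attached list was produced. Below is a minimal Python sketch of one way to build such a list. It assumes a plain byte scan in which any byte of 0x80 or above counts as "high-ASCII". The extension set `SOURCE_EXTENSIONS` and both function names are hypothetical.

```python
import os

# File extensions to scan. This set is an assumption for illustration;
# the thread does not say which files were checked.
SOURCE_EXTENSIONS = {".h", ".hpp", ".ipp", ".c", ".cpp"}


def high_byte_offset(data):
    """Return the offset of the first byte outside 7-bit ASCII, or None."""
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            return offset
    return None


def find_high_ascii_offenders(root):
    """Yield (path, offset) for each source file under root that contains
    at least one byte >= 0x80."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in SOURCE_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:  # raw bytes: no code page is assumed
                offset = high_byte_offset(f.read())
            if offset is not None:
                yield path, offset
```

A loop such as `for path, offset in find_high_ascii_offenders("boost"): print(path, offset)` would print one line per offending file. A byte scan cannot tell which code page a file was written in, and that is the root of the bug. It only finds the files that need a decision.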


Marshall Clow

30 Mar 2006

6:57 p.m.


"Foster Brereton" <fosterb.boost@gmail.com> wrote:

...

I have attached a list to this email (There are 354 in the list as of this writing). I couldn't figure out how to attach the file to the bug report after-the-fact. Note that these are the most likely candidate source files to cause an error when building the Boost sources -- some test cases and example sources were omitted.

Thanks! I have attached the file to the bug report. [You may need to be logged into SF to attach a file.]

-- Marshall

Peter Dimov

10:07 p.m.


Foster Brereton wrote:

...

I have attached a list to this email (There are 354 in the list as of this writing). I couldn't figure out how to attach the file to the bug report after-the-fact. Note that these are the most likely candidate source files to cause an error when building the Boost sources -- some test cases and example sources were omitted.

Many of these are caused by a name in the copyright clause containing non-ASCII characters. Replacing ö and ø with o (even oe) in people's names doesn't seem very polite to me, and such "rechristening" may have legal implications.

Foster Brereton

10:19 p.m.


I agree that mangling the proper spelling of an individual's name is inappropriate, and that a proper solution to this issue would circumvent that option. Perhaps each Boost library could have a copyright file associated with it (<library>.copyright.utf8, or some other naming convention), and the boilerplate within the sources of that Boost library could reference that copyright file (which in turn would reference the Boost License file at the root of the source tree). The high-ASCII text could then live in that external file, out of the compiler's reach, and nobody's name gets inappropriately altered. Thoughts? Alternatives?

Blessings,
Foster

On 3/30/06, Peter Dimov <pdimov@mmltd.net> wrote:

...

Foster Brereton wrote:

...
I have attached a list to this email (There are 354 in the list as of this writing). I couldn't figure out how to attach the file to the bug report after-the-fact. Note that these are the most likely candidate source files to cause an error when building the Boost sources -- some test cases and example sources were omitted.

Many of these are caused by a name in the copyright clause containing non-ASCII characters. Replacing ö and ø with o (even oe) in people's names doesn't seem very polite to me, and such "rechristening" may have legal implications.


Andras Erdei

11:31 p.m.


On 3/31/06, Foster Brereton <fosterb.boost@gmail.com> wrote:

...

I agree that mangling the proper spelling of an individual's name is inappropriate, and that a proper solution to this issue would circumvent that option.

it gets mangled anyway (what you see is code-page dependent, in my case a greek sum instead of the a: in ja:rvi's name, much worse than seeing "jarvi")

Thoughts? Alternatives?

...

not sure about the correct term, it's called something like "flying accents", so that e.g. an A with an accent is written as A'

it's pretty straightforward, supported by many tools (like TeX or vim), and resembles Unicode's way (Unicode U+00C1 written as Unicode U+0041 + U+0301)

unfortunately i don't know about any standards (or even official documents)

br,
andras
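The following is a minimal Python sketch of the "flying accents" idea. It assumes Unicode canonical decomposition (NFD) is used to split a letter into its base letter and accent, as in the U+00C1 to U+0041 + U+0301 example. The acute and diaeresis spellings (A', a:) come from the thread. The "o/" spelling for ø is taken from vim's default digraph table, which is based on RFC 1345 mnemonics. The table and function names are hypothetical.

```python
import unicodedata

# ASCII spellings for marks, modeled on the thread's examples:
# an acute accent becomes ' (A') and a diaeresis becomes : (ja:rvi).
# Hypothetical table; only a few marks are covered.
SPELLINGS = {
    "\u0301": "'",   # COMBINING ACUTE ACCENT
    "\u0308": ":",   # COMBINING DIAERESIS
    # Letters with no canonical decomposition need explicit entries;
    # "o/" for o-with-stroke follows vim's RFC 1345 digraphs.
    "\u00f8": "o/",  # LATIN SMALL LETTER O WITH STROKE
    "\u00d8": "O/",  # LATIN CAPITAL LETTER O WITH STROKE
}


def to_flying_accents(text):
    """Spell accented letters as a base letter followed by an ASCII accent."""
    # NFD splits precomposed letters, e.g. U+00C1 into U+0041 U+0301,
    # so the accent arrives as a separate combining character.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(SPELLINGS.get(ch, ch) for ch in decomposed)
```

With this table, `to_flying_accents("J\u00e4rvi")` gives `"Ja:rvi"`. Marks and letters without an entry pass through unchanged. The output is therefore pure ASCII only for the handful of letters covered here, and a real convention would need a complete table.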

Yuval Ronen

1 Apr 2006

11:05 a.m.


Andras Erdei wrote:

...

On 3/31/06, Foster Brereton <fosterb.boost@gmail.com> wrote:

...
I agree that mangling the proper spelling of an individual's name is inappropriate, and that a proper solution to this issue would circumvent that option.

it gets mangled anyway (what you see is code-page dependent, in my case a greek sum instead of the a: in ja:rvi's name, much worse than seeing "jarvi")

Thoughts? Alternatives?

not sure about the correct term, it's called something like "flying accents", so that e.g. an A with an accent is written as A'

it's pretty straightforward, supported by many tools (like TeX or vim), and resembles Unicode's way (Unicode U+00C1 written as Unicode U+0041 + U+0301)

unfortunately i don't know about any standards (or even official documents)

What if I ever become a Boost member (seems unlikely right now :-) ), would it be acceptable for me to write my name in Hebrew there? I think not... Should I feel offended because I have to sign my name at the bottom of this message as "Yuval" rather than the original "יובל" (which most of you can't even read because your OS/mail reader doesn't support Hebrew)? Again, I think not...

Either all text, including names, is written in English only, or it can be written in any language, but not in source files, or all source files are converted to Unicode. Going the middle way of allowing English, plus some selected European languages (because they happen to be somewhat close to English) sounds wrong to me (and can be considered unfair by some, but that's not my point).

Thanks,
Yuval
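This sketch addresses the "all source files are converted to Unicode" option. The text of MSVC warning C4819 itself advises saving the file in Unicode format, and Visual C++ recognizes a UTF-8 file by its byte-order mark. The sketch assumes the file's original encoding is known. It defaults to Windows-1252, which is only a guess; the bug report's whole point is that nothing records the original encoding. The function name is hypothetical. Whether every compiler Boost supported at the time accepted a UTF-8 signature would need checking separately.

```python
def reencode_as_utf8_with_bom(data, source_encoding="cp1252"):
    """Return the bytes data re-encoded as UTF-8 with a leading byte-order mark.

    source_encoding is an assumption supplied by the caller; decoding raises
    UnicodeDecodeError if the bytes are not valid in that encoding.
    """
    text = data.decode(source_encoding)
    if text.startswith("\ufeff"):
        text = text[1:]  # avoid doubling an existing BOM
    # The "utf-8-sig" codec prepends the EF BB BF signature when encoding.
    return text.encode("utf-8-sig")
```

To convert a file in place, read it with `pathlib.Path.read_bytes()` and write the result back with `Path.write_bytes()`. Picking the wrong `source_encoding` does not always raise an error. It can silently produce wrong characters, so the converted names should be checked by eye.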

Andy Little

7:13 p.m.


"Yuval Ronen" wrote

...

Either all text, including names, is written in English only, or it can be written in any language, but not in source files, or all source files are converted to Unicode. Going the middle way of allowing English, plus some selected European languages (because they happen to be somewhat close to English) sounds wrong to me (and can be considered unfair by some, but that's not my point).

The characters allowed in source files are actually laid down in the C++ standard AFAIK. That is limited to the characters allowed in the grammar, and I'm fairly sure that doesn't include e.g. the copyright symbol etc.

regards
Andy Little

AlisdairM

8 p.m.


Andy Little wrote:

...

The characters allowed in source files are actually laid down in the C++ standard AFAIK. That is limited to the characters allowed in the grammar, and I'm fairly sure that doesn't include e.g. the copyright symbol etc.

The standard is actually not very helpful on this score:

For the first phase of translation: "Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set"

...

From that point on, the standard works in terms of the basic source character set, but as the mapping to get there is implementation-defined, the source file can use any format the vendors want to support, including extended characters (which are replaced by universal-character-names, i.e. \uXXXX sequences)

-- AlisdairM
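In C++03, the same first phase goes on to say that any source file character not in the basic source character set is replaced by the universal-character-name that designates it. The sketch below imitates only that replacement, on text that has already been decoded. The decoding step (bytes to characters) is the implementation-defined part, and it is where MSVC's dependence on the active code page comes in. For simplicity the sketch leaves all of 7-bit ASCII alone. The function name is hypothetical.

```python
def to_universal_character_names(text):
    """Spell every character outside 7-bit ASCII as a C++ universal-character-name."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)                 # simplification: keep all ASCII as-is
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04X}")   # short form: \uXXXX
        else:
            out.append(f"\\U{code:08X}")   # long form: \UXXXXXXXX
    return "".join(out)
```

For example, `to_universal_character_names("J\u00e4rvi")` returns the ASCII text `J\u00E4rvi`. This is how a conforming compiler treats the name internally once it has decoded the file correctly.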

Andy Little

9:02 p.m.


"AlisdairM" wrote

...

Andy Little wrote:

...
The characters allowed in source files are actually laid down in the C++ standard AFAIK. That is limited to the characters allowed in the grammar, and I'm fairly sure that doesn't include e.g. the copyright symbol etc.

The standard is actually not very helpful on this score:

For the first phase of translation: "Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set"

...
From that point on, the standard works in terms of the basic source character set, but as the mapping to get there is implementation-defined, the source file can use any format the vendors want to support, including extended characters (which are replaced by universal-character-names, i.e. \uXXXX sequences)

OK, that's very helpful...

For boost source files that makes the answer quite simple, because boost works with multiple vendors, so the set of characters allowable in boost source files should consist of ( set of characters allowed by vendor A ) & ( set of characters allowed by vendor B ) & ... ( set of characters allowed by vendor X ) | ( set of characters which after mapping become characters allowed outside comments )

Which, assuming vendor X is unknown, resolves to: ( set of characters which after mapping become characters allowed outside comments )

regards
Andy Little

Yuval Ronen

2 Apr 2006

6:55 p.m.


Andy Little wrote:

...

"AlisdairM" wrote

...
Andy Little wrote:

...
The characters allowed in source files are actually laid down in the C++ standard AFAIK. That is limited to the characters allowed in the grammar, and I'm fairly sure that doesn't include e.g. the copyright symbol etc.

The standard is actually not very helpful on this score:

For the first phase of translation: "Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set"

...
From that point on, the standard works in terms of the basic source character set, but as the mapping to get there is implementation-defined, the source file can use any format the vendors want to support, including extended characters (which are replaced by universal-character-names, i.e. \uXXXX sequences)

OK, that's very helpful...

For boost source files that makes the answer quite simple, because boost works with multiple vendors so the set of characters allowable in boost source files should consist of ( set of characters allowed by vendor A ) & ( set of characters allowed by vendor B ) & ...(set of characters allowed by vendor X ) | (Set of characters which after mapping become characters allowed outside comments )

Which, assuming vendor X is unknown, resolves to: (Set of characters which after mapping become characters allowed outside comments)

Why "outside comments"? AFAIU, the problem is high-ASCII characters found in comments, which emit the warning the OP complained about. He actually didn't say that explicitly, but I don't think there are any such characters in the Boost code itself, only in comments. IOW, this discussion is about "inside comments", not outside...

Andy Little

6:06 p.m.


"Yuval Ronen" wrote

...

Why "outside comments"? AFAIU, the problem is high-ASCII characters found in comments, which emit the warning the OP complained about. He actually didn't say that explicitly, but I don't think there are any such characters in the Boost code itself, only in comments. IOW, this discussion is about "inside comments", not outside...

Inside comments, only allow the characters that are allowed outside comments.

regards
Andy Little
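Andy's rule can be checked mechanically. The minimal sketch below assumes that "allowed outside comments" means the C++03 basic source character set: 91 graphic characters plus space, horizontal tab, vertical tab, form feed and new-line. The names are hypothetical.

```python
import string

# C++03 basic source character set: 52 letters, 10 digits, 29 punctuators,
# plus space, horizontal tab, vertical tab, form feed and new-line (96 total).
BASIC_SOURCE_CHARSET = (
    set(string.ascii_letters)
    | set(string.digits)
    | set("_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'")
    | set(" \t\v\f\n")
)


def outside_basic_charset(text):
    """Return the set of characters in text that are not in the basic source
    character set. Carriage returns are skipped, treating CR LF as an
    end-of-line indicator rather than as source characters."""
    return {ch for ch in text if ch not in BASIC_SOURCE_CHARSET and ch != "\r"}
```

`$`, `@` and the backtick are printable ASCII but not members of the basic source character set. A literal reading of the rule would therefore also flag e-mail addresses in comments: `outside_basic_charset("fbrereto@adobe.com")` returns `{'@'}`.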

Foster Brereton

3 Apr 2006

4:26 a.m.


On 4/2/06, Andy Little <andy@servocomm.freeserve.co.uk> wrote:

...

Inside comments, only allow the characters that are allowed outside comments.

regards Andy Little

I am getting the reports third-hand, but it is my understanding that the problems we are having are with certain high-ASCII characters in comments only. I am not sure if the same characters in a non-comment context would generate the same warning. Here's what I have from one of the developers that originally posted the issue:

<quote>
The warning C4819 (an unsuitable character against current code page) caused the error C2220 when header files (.h and .hpp) included by the source file being compiled contain upper-ASCII characters. For source files that contain upper-ASCII characters, the C4819 warning occurred, but the C2220 error did not.

I tried correcting the upper-ASCII characters in the header files only on my local client, and the build passed. So this problem seems to be improved if we correct upper-ASCII characters in only the header files, although it's not the perfect way.

Thanks.
</quote>

Blessings,
Foster

Andy Little

1:21 p.m.


"Peter Dimov" wrote

...

Foster Brereton wrote:

...
I have attached a list to this email (There are 354 in the list as of this writing). I couldn't figure out how to attach the file to the bug report after-the-fact. Note that these are the most likely candidate source files to cause an error when building the Boost sources -- some test cases and example sources were omitted.

Many of these are caused by a name in the copyright clause containing non-ASCII characters. Replacing ö and ø with o (even oe) in people's names doesn't seem very polite to me, and such "rechristening" may have legal implications.

I only looked at the first offender, <boost/archive/detail/auto_link_warchive.hpp>. Its copyright notice contains a high-ASCII © symbol (I don't know if it'll survive the trip, but it's a little C in a circle). The same directory contains other headers, so I figured... why aren't these showing in the rogues' gallery? Lo and behold, these files contain (C), which of course is not high ASCII. IOW, in the first case in the list at least, it's a trivial fix without (presumably) great legal implications.

P.S. I followed this by plugging the offending copyright character into the Windows Explorer search facility in the boost directory. Funnily enough, I came up with a very similar-looking rogues' gallery to the list there... ;-)

regards
Andy Little
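Andy's trivial fix can be scripted. The minimal sketch below assumes the caller knows the file's encoding; the default, Windows-1252, is a guess. It swaps the copyright sign for the plain-ASCII "(C)" and leaves every other character untouched, including accented names. The function name is hypothetical.

```python
def replace_copyright_sign(data, encoding="cp1252"):
    """Replace U+00A9 COPYRIGHT SIGN with "(C)" in the encoded bytes data.

    Working on decoded text rather than raw bytes matters: in UTF-8 the byte
    0xA9 also occurs inside other characters (U+00E9 is encoded as C3 A9),
    so a blind byte replacement could corrupt names.
    """
    text = data.decode(encoding)
    return text.replace("\u00a9", "(C)").encode(encoding)
```

Decoding with the wrong encoding does not necessarily fail. A UTF-8 copyright sign (C2 A9) read as Windows-1252 decodes as "Â©", and the call then returns `b"\xc2(C)"`, leaving a stray 0xC2 byte behind.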


participants (7)

AlisdairM
Andras Erdei
Andy Little
Foster Brereton
Marshall Clow
Peter Dimov
Yuval Ronen