[HEAD] lexical_cast<int8_t>("127") bug?

Dean Michael Berris

7 May 2007

11:14 a.m.

Hi Everyone,

I've recently tried the following (inlined test), which isolates the problem I've encountered with boost::lexical_cast<int8_t>:

    #include <iostream>
    #include <stdint.h>  // int8_t
    #include <boost/lexical_cast.hpp>

    int main(int argc, char * argv[]) {
        int8_t value;
        try {
            // The conversion under test.
            value = boost::lexical_cast<int8_t>("127");
        } catch (std::exception & e) {
            std::cerr << e.what() << '\n';
        }
        return 0;
    }

This is compiled with gcc 4.1.2 on Linux. Any idea why 127 wouldn't fit into an 8-bit signed integer? The defined range for signed 8 bit integers should be -127..+127 right? Am I missing something?

Insights and pointers would be most appreciated.

--
Dean Michael C. Berris
http://cplusplus-soup.blogspot.com/
mikhailberis AT gmail DOT com
+63 928 7291459


Sebastian Redl

7 May

12:13 p.m.

Dean Michael Berris wrote:

...

Hi Everyone,

I've recently tried the following (inlined test) which isolates the problem I've encountered with boost::lexical_cast<int8_t>:

    #include <iostream>
    #include <stdint.h>  // int8_t
    #include <boost/lexical_cast.hpp>

    int main(int argc, char * argv[]) {
        int8_t value;
        try {
            // The conversion under test.
            value = boost::lexical_cast<int8_t>("127");
        } catch (std::exception & e) {
            std::cerr << e.what() << '\n';
        }
        return 0;
    }

This is compiled with gcc 4.1.2 on Linux. Any idea why 127 wouldn't fit into an 8-bit signed integer? The defined range for signed 8 bit integers should be -127..+127 right?

-128..+127 in 2's complement, actually. I can confirm this for 1.33.1 with GCC 4.1.1. Seems to be a bug.

Sebastian Redl
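The corrected range can be checked with the standard library alone. A minimal sketch; the helper names `int8_min`, `int8_max` and `check_int8_range` are made up for this illustration, and the limits are widened to int so that they are compared as numbers rather than characters:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Illustrative helpers (hypothetical names): the limits of int8_t, widened
// to int so that they compare and print as numbers rather than characters.
inline int int8_min() { return std::numeric_limits<std::int8_t>::min(); }
inline int int8_max() { return std::numeric_limits<std::int8_t>::max(); }

// int8_t, where it is provided, is an 8-bit two's complement type, so the
// range is asymmetric: there is one more negative value than positive.
inline void check_int8_range() {
    assert(int8_min() == -128);
    assert(int8_max() == 127);
}
```

So "127" is within range, and the failure has to come from somewhere other than the numeric limits.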

Paul A Bristow

2:18 p.m.

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Dean Michael Berris Sent: 07 May 2007 12:14 To: boost@lists.boost.org Subject: [boost] [HEAD] lexical_cast<int8_t>("127") bug?

Hi Everyone,

I've recently tried the following (inlined test) which isolates the problem I've encountered with boost::lexical_cast<int8_t>:

    #include <iostream>
    #include <stdint.h>  // int8_t
    #include <boost/lexical_cast.hpp>

    int main(int argc, char * argv[]) {
        int8_t value;
        try {
            // The conversion under test.
            value = boost::lexical_cast<int8_t>("127");
        } catch (std::exception & e) {
            std::cerr << e.what() << '\n';
        }
        return 0;
    }

This is compiled with gcc 4.1.2 on Linux. Any idea why 127 wouldn't fit into an 8-bit signed integer? The defined range for signed 8 bit integers should be -127..+127 right?

I also recollect that I never understood why

    zz = lexical_cast<int>("0xffff"); // Fails!

However I was more concerned about the floating-point problems, now sorted out, and I didn't waste any brain activity on it. (This was with MSVC 7 and 8).

Paul

---
Paul A Bristow
Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB
+44 1539561830 & SMS, Mobile +44 7714 330204 & SMS
pbristow@hetp.u-net.com

Sebastian Redl

2:31 p.m.

Paul A Bristow wrote:

...

I also recollect that I never understood why

zz = lexical_cast<int>("0xffff"); // Fails!

However I was more concerned about the floating-point problems, now sorted out, and I didn't waste any brain activity on it.

That's because all lexical cast does is create a string stream and use the extractor to get the value out. The int extractor doesn't parse code style integers. It just looks at the base flag (oct, dec or hex) and expects the number to be in this format. Sebastian Redl
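The effect of the base flag can be reproduced with a plain string stream. This is only a sketch of the behaviour described above, not Boost's code; `parse_with_base` and `demo_base_flag` are hypothetical names, and the "nothing left over" check is an assumption about when a cast of this kind would be rejected:

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: extract a long from `text` after applying a base
// manipulator (std::dec by default). Returns false if extraction fails or
// if unread characters remain in the stream.
inline bool parse_with_base(const std::string& text, long& out,
                            std::ios_base& (*base)(std::ios_base&) = std::dec) {
    std::istringstream in(text);
    in >> base;                      // select dec, hex or oct
    if (!(in >> out)) return false;  // no number could be read
    return in.peek() == std::istringstream::traits_type::eof();
}

inline void demo_base_flag() {
    long v = 0;
    // Decimal mode reads the leading "0" and stops at 'x', leaving "xffff".
    assert(!parse_with_base("0xffff", v));
    // Hex mode accepts the same text, with or without the 0x prefix.
    assert(parse_with_base("0xffff", v, std::hex) && v == 65535);
    assert(parse_with_base("ffff", v, std::hex) && v == 65535);
}
```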

Paul A Bristow

2:41 p.m.

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Sebastian Redl Sent: 07 May 2007 15:32 To: boost@lists.boost.org Subject: Re: [boost] [HEAD] lexical_cast<int8_t>("127") bug?

...
Paul A Bristow wrote:

I also recollect that I never understood why

zz = lexical_cast<int>("0xffff"); // Fails!

However I was more concerned about the floating-point problems, now sorted out, and I didn't waste any brain activity on it.

That's because all lexical cast does is create a string stream and use the extractor to get the value out. The int extractor doesn't parse code style integers. It just looks at the base flag (oct, dec or hex) and expects the number to be in this format.

Obvious, now you point it out - but I don't recollect this being documented. I'm sure I'll not be the last unthinking user to fall into this pit ;-) Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

Sebastian Redl

3:56 p.m.

Paul A Bristow wrote:

...

Obvious, now you point it out - but I don't recollect this being documented.

It is, though somewhat obscurely: "Returns the result of streaming |arg| into a standard library string-based stream and then out as a |Target| object." Sebastian Redl

Paul A Bristow

8:20 p.m.

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Sebastian Redl Sent: 07 May 2007 16:56 To: boost@lists.boost.org Subject: Re: [boost] [HEAD] lexical_cast<int8_t>("127") bug?

Paul A Bristow wrote:

...
Obvious, now you point it out - but I don't recollect this being documented.

It is, though somewhat obscurely:

...

"Returns the result of streaming |arg| into a standard library string-based stream and then out as a |Target| object."

If you understand that, you wouldn't need to ask? ;-) Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

Edward Diener

5:48 p.m.

Sebastian Redl wrote:

...

Paul A Bristow wrote:

...
I also recollect that I never understood why

zz = lexical_cast<int>("0xffff"); // Fails!

However I was more concerned about the floating-point problems, now sorted out, and I didn't waste any brain activity on it.

That's because all lexical cast does is create a string stream and use the extractor to get the value out. The int extractor doesn't parse code style integers. It just looks at the base flag (oct, dec or hex) and expects the number to be in this format.

My understanding is that if the istringstream sees a string starting with 0x or 0X it should automatically parse it as a hexadecimal value.
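For standard streams, prefix detection of this kind happens only when no base flag is set. A freshly constructed stream has the dec flag set, so the prefix is not interpreted by default. A small sketch with a hypothetical helper `parse_auto_base` that clears the basefield first:

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: parse with the basefield cleared, so that the
// prefix of the text (0x, 0, or none) selects hex, octal or decimal.
inline bool parse_auto_base(const std::string& text, long& out) {
    std::istringstream in(text);
    in.unsetf(std::ios_base::basefield);  // neither dec, hex nor oct is set
    if (!(in >> out)) return false;
    return in.peek() == std::istringstream::traits_type::eof();
}

inline void demo_auto_base() {
    long v = 0;
    assert(parse_auto_base("0xffff", v) && v == 65535);  // hex prefix
    assert(parse_auto_base("010", v) && v == 8);         // octal prefix
    assert(parse_auto_base("127", v) && v == 127);       // plain decimal
}
```

Note that clearing the basefield also makes a leading zero mean octal, which may not be what a caller expects from decimal-looking input.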

Phil Endecott

4:01 p.m.

Dean wrote:

...

I've recently tried the following (inlined test) which isolates the problem I've encountered with boost::lexical_cast<int8_t>:

    #include <iostream>
    #include <stdint.h>  // int8_t
    #include <boost/lexical_cast.hpp>

    int main(int argc, char * argv[]) {
        int8_t value;
        try {
            // The conversion under test.
            value = boost::lexical_cast<int8_t>("127");
        } catch (std::exception & e) {
            std::cerr << e.what() << '\n';
        }
        return 0;
    }

This is compiled with gcc 4.1.2 on Linux. Any idea why 127 wouldn't fit into an 8-bit signed integer? The defined range for signed 8 bit integers should be -127..+127 right?

Hi Dean,

You haven't said what you expected to happen and what happened instead. So here's a guess: you expect int8_t to be treated as an integer by lexical_cast. It isn't; despite the name, it's a character; you are asking it to convert a string to a character. (Try some other strings where you have "127". I think it will reject anything with more than one character in it.)

I have previously suggested that this behaviour should change, but the opinion of the list has been that using int8_t probably indicates "over-optimised code", and that the feature can be easily worked around with some additional casts.

In this case, I suggest that you lexical_cast to int, do a bounds check yourself (if necessary) and then assign to the int8_t.

Regards, Phil.
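The suggested workaround (convert to int, check bounds, then narrow) might look like the following. To keep the sketch free of a Boost dependency, a std::istringstream stands in for lexical_cast<int>; `to_int8` and `demo_to_int8` are hypothetical names, and the choice of exception types is an assumption made for the example:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost): parse the text as an int,
// range-check it against int8_t, and only then narrow the value.
inline std::int8_t to_int8(const std::string& text) {
    std::istringstream in(text);
    int wide = 0;
    // Require a number and nothing but a number.
    if (!(in >> wide) ||
        in.peek() != std::istringstream::traits_type::eof())
        throw std::invalid_argument("not an integer: " + text);
    if (wide < std::numeric_limits<std::int8_t>::min() ||
        wide > std::numeric_limits<std::int8_t>::max())
        throw std::out_of_range("does not fit in int8_t: " + text);
    return static_cast<std::int8_t>(wide);
}

inline void demo_to_int8() {
    assert(to_int8("127") == 127);
    assert(to_int8("-128") == -128);
    bool rejected = false;
    try { to_int8("128"); } catch (const std::out_of_range&) { rejected = true; }
    assert(rejected);
}
```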

Dean Michael Berris

8 May

5:39 a.m.

Hi Phil! On 5/7/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:

...

You haven't said what you expected to happen and what happened instead. So here's a guess: you expect int8_t to be treated as an integer by lexical_cast. It isn't; despite the name, it's a character; you are asking it to convert a string to a character. (Try some other strings where you have "127". I think it will reject anything with more than one character in it.)

Yup, your guess is right.

...

I have previously suggested that this behaviour should change, but the opinion of the list has been that using int8_t probably indicates "over-optimised code", and that the feature can be easily worked around with some additional casts.

In this case, I suggest that you lexical_cast to int, do a bounds check yourself (if necessary) and then assign to the int8_t.

Although I would agree with the observation that the code would be "over-optimised" where int8_t would appear, I still think the semantics of `lexical_cast<int8_t>("127")` should allow for the conversion to happen -- either convert to a 'short int' then assign to a char, if that makes any sense. I agree that it should be changed, either maybe with a compile-time assertion saying that "lexical_cast<char> is undefined", or making it "just work" as the semantic suggests. At the very least, we should document this to say that 'int8_t is not really an int, therefore lexical_cast won't work correctly with it'. Thanks Phil! -- Dean Michael C. Berris http://cplusplus-soup.blogspot.com/ mikhailberis AT gmail DOT com +63 928 7291459

Mathias Gaunard

7:10 p.m.

Dean Michael Berris wrote:

...

I still think the semantics of `lexical_cast<int8_t>("127")` should allow for the conversion to happen -- either convert to a 'short int' then assign to a char, if that makes any sense.

There is no way to distinguish int8_t and char. They are both aliases to the same type. Welcome to the wonderful world of typedefs.

Andrey Semashev

8:24 p.m.

Hello Mathias, Tuesday, May 8, 2007, 11:10:11 PM, you wrote:

...

Dean Michael Berris wrote:

...
I still think the semantics of `lexical_cast<int8_t>("127")` should allow for the conversion to happen -- either convert to a 'short int' then assign to a char, if that makes any sense.

...

There is no way to distinguish int8_t and char. They are both aliases to the same type. Welcome to the wonderful world of typedefs.

It could be a "strong typedef":

    class int8_t {
        char value;
    public:
        int8_t() {}
        int8_t(int8_t const& that) : value(that.value) {}
        int8_t(char that) : value(that) {}
        // etc. all applicable operators and traits
    };

But I guess this would eventually break something more critical than lexical_cast.

--
Best regards,
Andrey mailto:andysem@mail.ru

Mathias Gaunard

9 May

12:20 a.m.

Andrey Semashev wrote:

...

It could be a "strong typedef":

It could, but it isn't. Remember that int8_t etc. are from C. By the way, if they were, there could be a more or less serious performance hit depending on the implementation. Structures/classes are not handled the same way as primitive types in most ABIs for example I think. Which is quite a shame, by the way.

Jody Hagins

2:55 a.m.

On Wed, 09 May 2007 02:20:29 +0200 Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:

...

Andrey Semashev wrote:

...
It could be a "strong typedef":

It could, but it isn't. Remember that int8_t etc. are from C.

However, char, signed char, and unsigned char are different types. int8_t is supposed to be typedef'd as signed char, which is a different type than char. I do not see any reason lexical_cast<int8_t>() or lexical_cast<signed char>() should be treated the same way as lexical_cast<char> since the internal C++ type mechanisms treat "char" "signed char" and "unsigned char" as completely different types...
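That the three character types are distinct can be shown with type traits and overloading. A minimal sketch; the function name `kind` is made up for the illustration:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The language guarantees that these are three distinct types, even though
// char has the same representation as one of the other two.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");

// Because the types are distinct, overload resolution (and therefore
// template specialisation) can treat each one differently.
inline std::string kind(char)          { return "char"; }
inline std::string kind(signed char)   { return "signed char"; }
inline std::string kind(unsigned char) { return "unsigned char"; }

inline void demo_kinds() {
    assert(kind('a') == "char");
    assert(kind(static_cast<signed char>(127)) == "signed char");
    assert(kind(static_cast<unsigned char>(255)) == "unsigned char");
}
```

Which overload an int8_t argument selects depends on how the platform defines the typedef. Where int8_t is signed char, as described above, it is indistinguishable from signed char but distinguishable from plain char.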

Mathias Gaunard

10:13 a.m.

Jody Hagins wrote:

...

However, char, signed char, and unsigned char are different types. int8_t is supposed to be typedef'd as signed char, which is a different type than char. I do not see any reason lexical_cast<int8_t>() or lexical_cast<signed char>() should be treated the same way as lexical_cast<char> since the internal C++ type mechanisms treat "char" "signed char" and "unsigned char" as completely different types...

Quite an interesting idea. So 'char' would be a character, but 'signed char' and 'unsigned char' would be integers. This behaviour might break some code, but I don't think 'signed char' or 'unsigned char' are much used to store characters.

Felipe Magno de Almeida

5:52 p.m.

On 5/9/07, Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:

...

Jody Hagins wrote:

...
However, char, signed char, and unsigned char are different types. int8_t is supposed to be typedef'd as signed char, which is a different type than char. I do not see any reason lexical_cast<int8_t>() or lexical_cast<signed char>() should be treated the same way as lexical_cast<char> since the internal C++ type mechanisms treat "char" "signed char" and "unsigned char" as completely different types...

Quite an interesting idea. So 'char' would be a character, but 'signed char' and 'unsigned char' would be integers.

This behaviour might break some code, but I don't think 'signed char' or 'unsigned char' are much used to store characters.

unsigned char are very used for type punning. IIRC, the only standard way to do that. So, unsigned char is used at least to hold "bytes", not only numbers. -- Felipe Magno de Almeida

Jens Finkhäuser

6:20 p.m.

On Wed, May 09, 2007 at 02:52:27PM -0300, Felipe Magno de Almeida wrote:

...

On 5/9/07, Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:

...
Jody Hagins wrote:

...
However, char, signed char, and unsigned char are different types. int8_t is supposed to be typedef'd as signed char, which is a different type than char. I do not see any reason lexical_cast<int8_t>() or lexical_cast<signed char>() should be treated the same way as lexical_cast<char> since the internal C++ type mechanisms treat "char" "signed char" and "unsigned char" as completely different types...

Quite an interesting idea. So 'char' would be a character, but 'signed char' and 'unsigned char' would be integers.

This behaviour might break some code, but I don't think 'signed char' or 'unsigned char' are much used to store characters.

unsigned char are very used for type punning. IIRC, the only standard way to do that. So, unsigned char is used at least to hold "bytes", not only numbers.

IMHO what this discussion shows is that in this case there's no "obvious" meaning of 8 bit integer types (uses seem to be number, character or byte). As far as I am concerned, that implies that it's up to the user to supply that meaning, and not for the library to assign one.

Much as I like lexical_cast, in the light of this discussion, would it not make a little more sense to be more explicit about what exactly you want to happen? "Lexical" meanings aren't always obvious. How about having

    string_cast<std::string>(123);
    int_cast<uint64_t>("127");
    int_cast<char>("127");
    string_cast<char>("127");
    float_cast<double>("3.1416");
    ... etc

instead?

The nice thing about those would be that they're specializations of lexical_cast - you explicitly provide the information on how lexical_cast should interpret its type argument, in broad classes of (more or less) built-in types: integers, floating point numbers and character strings. For all types not falling into those classes, stick to lexical_cast and whatever that brings.

I have to admit that at this moment I can't recall much about the implementation of lexical cast except that it uses string streams in some cases, so this might not work without breaking more. But personally I'm more in favour of providing more explicit types of casts than deciding for the users of the library how to interpret what type of char.

Phil Endecott

7:40 p.m.

Jens wrote:

...

IMHO what this discussion shows is that in this case there's no "obvious" meaning of 8 bit integer types (uses seem to be number, character or byte). As far as I am concerned, that implies that it's up to the user to supply that meaning, and not for the library to assign one.

Yes. But there are two existing ways for the user to say what they mean:

(1) By using int8_t / uint8_t for number-bytes and char for character-bytes. For this to work we would need to either

    (a) Do some magic so we can distinguish [un]signed char from [u]int8_t. I believe that's impossible.

    (b) Change the behaviour of standard streams for [un]signed char. That's not going to happen, and even if it could happen it would probably break significant amounts of code that uses 'unsigned char' for characters.

    (c) Change lexical_cast to special case 'unsigned char' and 'signed char'. This is clearly unpopular with the list though I personally would be happy with it.

    (d) Allow the user to enable (c) if that is what they want. I get the impression that this could be done by supplying your own partial specialisation of lexical_cast; is this true? (Off topic, there is the question of whether common lexical casts can be made more efficient by providing specialisations that invoke C library functions like strtol(); I think someone told me that this wasn't possible, but I don't think I ever knew why.)

(2) By using format() instead of lexical_cast(). (For to-string conversions. In principle something symmetric could be provided for from-string conversions.) Using format, you supply a format specifier in the style of printf() that indicates whether you want to do a string or numeric conversion. However, *this format specifier is ignored* and if you ask for a '%d' int8_t you'll still get a character.

I would much prefer to have one of these existing methods 'do what I want', rather than invent something else.

I'm going to see if I can write a my_lexical_cast() that uses lexical_cast by default but changes the behaviour for [un]signed char.

Phil.

Michael Marcin

11:56 p.m.

In the same vein as this discussion I was very surprised that lexical_cast<bool>("true") threw bad_lexical_cast on me. Can't this be made to work by simply putting "stream.setf(std::ios::boolalpha);" inside of lexical_stream's constructor?

Thanks,
Michael Marcin

Alexander Nasonov

10 May

5:47 a.m.

Michael Marcin wrote:

...

In the same vein as this discussion I was very surprised that lexical_cast<bool>("true") threw bad_lexical_cast on me.

Can't this be made to work by simply putting "stream.setf(std::ios::boolalpha);" inside of lexical_stream's constructor?

This change would break lexical_cast<bool>("1"). -- Alexander Nasonov http://nasonov.blogspot.com
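The trade-off can be seen with a plain string stream: with boolalpha set, the extractor accepts "true" but no longer accepts "1", and without it the reverse holds. A sketch using a hypothetical helper `parse_bool` (the real lexical_stream is not reproduced here):

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: extract a bool from `text`, with or without the
// boolalpha flag. Returns false if extraction fails or input is left over.
inline bool parse_bool(const std::string& text, bool alpha, bool& out) {
    std::istringstream in(text);
    if (alpha) in.setf(std::ios_base::boolalpha);
    if (!(in >> out)) return false;
    return in.peek() == std::istringstream::traits_type::eof();
}

inline void demo_boolalpha() {
    bool b = false;
    assert(parse_bool("true", true, b) && b);  // textual form needs boolalpha
    assert(!parse_bool("1", true, b));         // ...which rejects "1"
    assert(parse_bool("1", false, b) && b);    // the default accepts "1"
    assert(!parse_bool("true", false, b));     // ...but rejects "true"
}
```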

Andrey Semashev

9 May

9:49 a.m.

Hello Mathias, Wednesday, May 9, 2007, 4:20:29 AM, you wrote:

...

Andrey Semashev wrote:

...

...
It could be a "strong typedef":

...

It could, but it isn't. Remember that int8_t etc. are from C.

...

By the way, if they were, there could be a more or less serious performance hit depending on the implementation. Structures/classes are not handled the same way as primitive types in most ABIs for example I think. Which is quite a shame, by the way.

Modern compilers produce almost the same binaries with such light classes as if they were fundamental types. But you're right about ABI. This would surely affect mangling, for example.

Another solution is to add a wrapper that would be recognized by lexical_cast:

    int8_t n = lexical_cast< as_int< int8_t > >("127");

At least lexical_cast could encapsulate range checking during conversion.

--
Best regards,
Andrey mailto:andysem@mail.ru
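One way such a wrapper could work is to give it its own stream extractor that reads a wider integer and range-checks the result, since a stream-based cast extracts its target through operator>>. The sketch below is an assumption about the idea, not an existing Boost facility; `as_int` here is a hypothetical struct and is exercised with a plain std::istringstream:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <sstream>
#include <type_traits>

// Hypothetical wrapper named after the as_int idea above.
template <typename T>
struct as_int {
    static_assert(std::is_integral<T>::value && sizeof(T) < sizeof(long long),
                  "sketch only covers integer types narrower than long long");
    T value;
};

// The extractor reads a long long and sets failbit when the number does
// not fit in T, so small integer types are parsed as numbers, not characters.
template <typename T>
std::istream& operator>>(std::istream& in, as_int<T>& target) {
    long long wide = 0;
    if (in >> wide) {
        if (wide < std::numeric_limits<T>::min() ||
            wide > std::numeric_limits<T>::max())
            in.setstate(std::ios_base::failbit);
        else
            target.value = static_cast<T>(wide);
    }
    return in;
}

inline void demo_as_int() {
    std::istringstream ok("127");
    as_int<std::int8_t> n = {0};
    ok >> n;
    assert(!ok.fail() && n.value == 127);

    std::istringstream too_big("128");
    as_int<std::int8_t> m = {0};
    too_big >> m;
    assert(too_big.fail() && m.value == 0);  // rejected, value untouched
}
```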

Dean Michael Berris

8 May

10:17 p.m.

On 5/8/07, Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:

...

Dean Michael Berris wrote:

...
I still think the semantics of `lexical_cast<int8_t>("127")` should allow for the conversion to happen -- either convert to a 'short int' then assign to a char, if that makes any sense.

There is no way to distinguish int8_t and char. They are both aliases to the same type. Welcome to the wonderful world of typedefs.

Perhaps if there was a specialization for lexical_cast<int8_t> and/or lexical_cast<char> which implemented the "correct" behavior as the semantics for lexical_cast suggest (not an expert with cross platform types, but I'm guessing 'short int' will always be 8 bits, unless I'm missing something), maybe this will "just work"?

I tried looking at the code which involves lexical_cast, but I'm seeing too much BOOST_PP_* stuff that I currently don't have the bandwidth for. Any pointers on how to make that happen? Or would leaving it alone and saying in the docs that this isn't going to happen correctly be enough?

--
Dean Michael C. Berris
http://cplusplus-soup.blogspot.com/
mikhailberis AT gmail DOT com
+63 928 7291459

Mathias Gaunard

9 May

12:14 a.m.

Dean Michael Berris wrote:

...

I'm guessing 'short int' will always be 8 bits, unless I'm missing something

That would be difficult, given that the standard mandates short to be at least 16 bits.

Dan Day

4:02 a.m.

On 5/8/07, Mathias Gaunard wrote:

...

Dean Michael Berris wrote:

...
I still think the semantics of `lexical_cast<int8_t>("127")` should allow for the conversion to happen -- either convert to a 'short int' then assign to a char, if that makes any sense.

...

There is no way to distinguish int8_t and char. They are both aliases to the same type. Welcome to the wonderful world of typedefs.

Maybe I'm just short-sighted, but it seems to me this really isn't a problem. A lexical_cast conversion from char * -> char doesn't make much practical sense in my mind except for grabbing the first character in the string, which can be done numerous other ways. AFAICS, lexical_cast could be specialized for char * -> char for such conversions as the OP desires. To me, that would be far more useful than the current behavior.

--
Dan Day

Gregory Dai

5:48 a.m.

On 5/8/07, Dan Day <coolmandan@gmail.com> wrote:

...

On 5/8/07, Mathias Gaunard wrote:

...
Dean Michael Berris wrote:

...
I still think the semantics of `lexical_cast<int8_t>("127")` should allow for the conversion to happen -- either convert to a 'short int' then assign to a char, if that makes any sense.

...
There is no way to distinguish int8_t and char. They are both aliases to the same type. Welcome to the wonderful world of typedefs.

Maybe I'm just short-sighted, but it seems to me this really isn't a problem. A lexical_cast conversion from char * -> char doesn't make much practical sense in my mind except for grabbing the first character in the string, which can be done numerous other ways.

Agreed. This is the same behavior as the std::iostreams do, as had been pointed out by a number of people already. Let's leave it at that.

AFAICS, lexical_cast could be specialized for char * -> char for such conversions as the OP desires. To me, that would be far more useful than the current behavior.

Nope, it's tempting to do "nice" thing, but again let's leave it at that (see above), and close the subject.

Greg

Dean Michael Berris

7:25 a.m.

On 5/8/07, Gregory Dai <gregory.dai@gmail.com> wrote:

...

On 5/8/07, Dan Day <coolmandan@gmail.com> wrote:

...
On 5/8/07, Mathias Gaunard wrote:

...
Dean Michael Berris wrote:

...
I still think the semantics of `lexical_cast<int8_t>("127")` should allow for the conversion to happen -- either convert to a 'short int' then assign to a char, if that makes any sense.

...
There is no way to distinguish int8_t and char. They are both aliases to the same type. Welcome to the wonderful world of typedefs.

Maybe I'm just short-sighted, but it seems to me this really isn't a problem. A lexical_cast conversion from char * -> char doesn't make much practical sense in my mind except for grabbing the first character in the string, which can be done numerous other ways.

Agreed. This is the same behavior as the std::iostreams do, as had been pointed out by a number of people already. Let's leave it at that.

I don't see why we should leave it at that though. Considering the semantics of using static_cast<char>(1) -- unless there's something wrong with that statement inherently -- the casting type shouldn't complain that "hey, you're trying to cast an int to a character... this shouldn't work!" but obviously the user definitely wants to do that otherwise the user would've done it another way.

AFAICS, lexical_cast could be specialized for char * -> char for such conversions as the OP desires. To me, that would be far more useful than the current behavior.

Nope, it's tempting to do "nice" thing, but again let's leave it at that (see above), and close the subject.

Just because the standard streams treat it that way doesn't mean that we should stick to that, when obviously a lexical_cast<T> implies that you want to convert a string / const char * to a numeric value of the type T.

I particularly don't see why the constraint that the standard streams imposes in this case should affect the behavior and/or semantics of lexical_cast<T> -- because if I really for instance wanted to use a standard stream (which has completely different semantics from lexical_cast<T>) then I would have used that. Being that lexical_cast<T> uses a standard stream underneath the hood should be an implementation detail, and should not concern the users of lexical_cast<T> who actually expect to have a string converted to a numeric value of type T -- granted that T is a numeric type.

--
Dean Michael C. Berris
http://cplusplus-soup.blogspot.com/
mikhailberis AT gmail DOT com
+63 928 7291459

Mathias Gaunard

10:17 a.m.

Dean Michael Berris wrote:

...

I particularly don't see why the constraint that the standard streams imposes in this case should affect the behavior and/or semantics of lexical_cast<T> -- because if I really for instance wanted to use a standard stream (which has completely different semantics from lexical_cast<T>)

How is it different semantically?

...

then I would have used that. Being that lexical_cast<T> uses a standard stream underneath the hood should be an implementation detail, and should not concern the users of lexical_cast<T>

It is not an implementation detail. That's how you can overload it, making lexical_cast generic.

Dean Michael Berris

11:02 a.m.

On 5/9/07, Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:

...

Dean Michael Berris wrote:

...
I particularly don't see why the constraint that the standard streams imposes in this case should affect the behavior and/or semantics of lexical_cast<T> -- because if I really for instance wanted to use a standard stream (which has completely different semantics from lexical_cast<T>)

How is it different semantically?

    std::istringstream iss(some_string);
    char t;
    iss >> t;

I expected a character, and got the first one. Compare this to:

    char c = lexical_cast<char>(some_string);

Where I explicitly say "I want to lexical cast some_string and store the value into a character". If for some reason some_string is beyond the bounds of the "char" type, then I should get a bad_lexical_cast saying so.
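The difference between the two snippets can be reproduced without Boost. The sketch below assumes the stream-based approach described earlier in the thread (extract one value, then require that nothing is left in the stream); `stream_cast` is a hypothetical stand-in, not Boost's implementation:

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a stream-based cast: extract one Target and
// treat any leftover input as an error.
template <typename Target>
Target stream_cast(const std::string& text) {
    std::istringstream in(text);
    Target result = Target();
    if (!(in >> result) ||
        in.peek() != std::istringstream::traits_type::eof())
        throw std::runtime_error("bad cast: " + text);
    return result;
}

inline void demo_stream_cast() {
    // As an int, "127" is consumed completely.
    assert(stream_cast<int>("127") == 127);
    // As a character type, only one character is read: '7' is the
    // character seven, not the number 7.
    assert(stream_cast<signed char>("7") == '7');
    // "127" leaves "27" unread after the first character, so it is rejected.
    bool rejected = false;
    try { stream_cast<signed char>("127"); }
    catch (const std::runtime_error&) { rejected = true; }
    assert(rejected);
}
```

Under that assumption the failure is not a range error at all: the cast to a character type stops after one character and then finds unread input.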

...

...
then I would have used that. Being that lexical_cast<T> uses a standard stream underneath the hood should be an implementation detail, and should not concern the users of lexical_cast<T>

It is not an implementation detail. That's how you can overload it, making lexical_cast generic.

What I meant was, that lexical_cast<T> may choose to do something else entirely under the hood, just as long as the expectation is that it casts a string to a numeric type given by T. If it so happens that through some typedef (or even explicitly), a lexical_cast<char>(...) is requested, the behavior of lexical_cast should be consistent and treat "char" as an integral type.

lexical_cast<T> can still be generic and fulfill the semantics of the intended usage, by specializing on cases where T is a given type which holds (or can hold) numeric values specifically in the case where T is a char/signed char/unsigned char or typedefs thereof.

Or if implementing that specialization is an abominable idea (which I don't see why it should be), then we should at least make it fail -- and say that at compile time, you're trying to use lexical_cast<T> in a manner that's not defined/supported. Or a warning which says "trying to lexical cast to a char..." would make sense, if that's at all possible.

If you also think documenting this issue will and should be enough, then that should be acceptable too. But we should at least do _something_ about it -- we at least owe it to the users who might run into the same trap. Or not. :-)

--
Dean Michael C. Berris
http://cplusplus-soup.blogspot.com/
mikhailberis AT gmail DOT com
+63 928 7291459

Jody Hagins

5:01 p.m.

On Tue, 8 May 2007 22:48:19 -0700 "Gregory Dai" <gregory.dai@gmail.com> wrote:

...

Nope, it's tempting to do "nice" thing, but again let's leave it at that (see above), and close the subject.

Just because IOstreams does it that way, does not mean it is correct for lexical_cast<>... Simply leaving it at that, and closing the subject does not seem like a good idea either...

Paul A Bristow

5:25 p.m.

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Jody Hagins Sent: 09 May 2007 18:01 To: boost@lists.boost.org Subject: Re: [boost] [HEAD] lexical_cast<int8_t>("127") bug?

On Tue, 8 May 2007 22:48:19 -0700 "Gregory Dai" <gregory.dai@gmail.com> wrote:

...
Nope, it's tempting to do "nice" thing, but again let's leave it at that (see above), and close the subject.

...

Just because IOstreams does it that way, does not mean it is correct for lexical_cast<>... Simply leaving it at that, and closing the subject does not seem like a good idea either...

For what little it is worth, I agree with this. Documenting the 'pit' is the absolute minimum, but I think other suggestions are MUCH better. Paul --- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

Alexander Nasonov

5:58 p.m.

Paul A Bristow wrote:

...

Documenting the 'pit' is the absolute minimum, but I think other suggestions are MUCH better.

Felipe Magno de Almeida just posted this "unsigned char are very used for type punning. IIRC, the only standard way to do that. So, unsigned char is used at least to hold "bytes", not only numbers." So, someone can assume that lexical_cast<int8_t>("1") would read one byte. -- Alexander Nasonov http://nasonov.blogspot.com It is dangerous to be sincere unless you are also stupid. -- George Bernard Shaw -- This quote is generated by: /usr/pkg/bin/curl -L http://tinyurl.com/veusy \ | sed -e 's/^document\.write(.//' -e 's/.);$/ --/' \ -e 's/<[^>]*>//g' -e 's/^More quotes from //' \ | fmt | tee ~/.signature-quote

Alexander Nasonov

5:53 p.m.

...

"Gregory Dai" <gregory.dai@gmail.com> wrote:

...
Nope, it's tempting to do "nice" thing, but again let's leave it at that (see above), and close the subject.

Jody Hagins wrote:

...

Just because IOstreams does it that way, does not mean it is correct for lexical_cast<>... Simply leaving it at that, and closing the subject does not seem like a good idea either...

I'd like to remind that boost doesn't have a sole control on lexical_cast anymore, see N1973 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1973.html

There is a good sentence in the document:

"We are not in a position to change I/O streams at this late stage, but something like lexical_cast is not required to repeat those little surprises."

Though, I completely agree with Gregory.

What do you folks think of this change in the documentation?

FAQ

Q: Why lexical_cast<int8_t>("127") throws bad_lexical_cast?
A: The type int8_t is a typedef to signed char which is read from a stream that holds "127". The bad_lexical_cast is thrown because the stream is not at EOF after reading. The standard defines same semantics for all char types. Possible workaround

    numeric_cast<int8_t>(lexical_cast<int>("127"))

or, more generic expression for any integer type T

    numeric_cast<T>(lexical_cast< promote<T>::type >("127"))

--
Alexander Nasonov
http://nasonov.blogspot.com

Reason is experimental intelligence, conceived after the pattern of science, and used in the creation of social arts; it has something to do. It liberates man from the bondage of the past, due to ignorance and accident hardened into custom. It projects a better future and assists man in its realization. -- John Dewey --

This quote is generated by: /usr/pkg/bin/curl -L http://tinyurl.com/veusy \ | sed -e 's/^document\.write(.//' -e 's/.);$/ --/' \ -e 's/<[^>]*>//g' -e 's/^More quotes from //' \ | fmt | tee ~/.signature-quote

Dean Michael Berris

6:56 p.m.

On 5/9/07, Alexander Nasonov <alnsn@yandex.ru> wrote:

...

I'd like to remind that boost doesn't have a sole control on lexical_cast anymore, see N1973 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1973.html

There is a good sentence in the document:

"We are not in a position to change I/O streams at this late stage, but something like lexical_cast is not required to repeat those little surprises."

So it makes sense for lexical_cast to do the "right thing" that the semantics of usage imply, even if only in the version that Boost implements. Watchathink?

...

Though, I completely agree with Gregory.

What do you folks think of this change in the documentation?

FAQ

Q: Why does lexical_cast<int8_t>("127") throw bad_lexical_cast?
A: The type int8_t is a typedef for signed char, which is read as a single character from a stream that holds "127". The bad_lexical_cast is thrown because the stream is not at EOF after reading. The standard defines the same semantics for all char types. A possible workaround is

numeric_cast<int8_t>(lexical_cast<int>("127"))

or, more generic expression for any integer type T

numeric_cast<T>(lexical_cast< promote<T>::type >("127"))

I have no objections. That is, provided we really don't want to make lexical_cast<int8_t>("127") "just work" as the semantics of usage actually imply. I'd be alright with just documenting the possible pitfall.

But I'd still rather be able to "code what I mean" when lexical_cast<int8_t>(some_string) is needed in my code. Guess I'd have to work on a patch that does break existing functionality then...

Thanks for the responses! :)

--
Dean Michael C. Berris
http://cplusplus-soup.blogspot.com/
mikhailberis AT gmail DOT com
+63 928 7291459

Dean Michael Berris

7 p.m.

On 5/9/07, Dean Michael Berris <mikhailberis@gmail.com> wrote:

...

Guess I'd have to work on a patch that does break existing functionality then...

Of course, I meant "doesn't". :-)

--
Dean Michael C. Berris
http://cplusplus-soup.blogspot.com/
mikhailberis AT gmail DOT com
+63 928 7291459

Yuval Ronen

11 May

9:32 a.m.

Dan Day wrote:

...

...
...
I still think the semantics of `lexical_cast<int8_t>("127")` should allow for the conversion to happen, for example by converting to a 'short int' and then assigning to a char, if that makes any sense.

...
There is no way to distinguish int8_t and char. They are both aliases to the same type. Welcome to the wonderful world of typedefs.

Maybe I'm just short-sighted, but it seems to me this really isn't a problem. A lexical_cast conversion from char * -> char doesn't make much practical sense in my mind except for grabbing the first character in the string, which can be done numerous other ways. AFAICS, lexical_cast could be specialized for char * -> char for such conversions as the OP desires. To me, that would be far more useful than the current behavior.

FWIW, I completely agree. I consider lexical_cast<char>("123") to be silly, and I don't much care what its semantics are. It might as well not compile at all. lexical_cast<[u]int8_t>("123") is very useful, and should provide numeric conversion. The fact that the standard I/O streams behave differently is, well, unfortunate, but doesn't mean we have to continue punishing ourselves in lexical_cast also (or in to_string/string_to, which I hope will replace lexical_cast, but that's completely OT).
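One way to get the numeric behaviour asked for here, sketched with the standard library only. parse_integer is a hypothetical helper and not part of Boost. It assumes C++11 for decltype, and it relies on integral promotion to play the role of promote<T>::type from the earlier workaround.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse text as a number of integer type T, even when
// T is a character-sized type such as int8_t or uint8_t.
template <typename T>
T parse_integer(const std::string& text) {
    // Unary plus applies integral promotion, so char-sized types become int
    // and the stream performs numeric instead of character extraction.
    typedef decltype(+T()) Wide;
    std::istringstream in(text);
    Wide wide = 0;
    if (!(in >> wide) || in.peek() != std::istringstream::traits_type::eof())
        throw std::invalid_argument("not a whole integer: " + text);
    if (wide < std::numeric_limits<T>::min() ||
        wide > std::numeric_limits<T>::max())
        throw std::out_of_range("out of range for target type: " + text);
    return static_cast<T>(wide);
}

// Shows the numeric behaviour for 8-bit types, including the range check.
void check_parse_integer() {
    assert(parse_integer<std::int8_t>("127") == 127);
    assert(parse_integer<std::int8_t>("-128") == -128);
    assert(parse_integer<std::uint8_t>("255") == 255);
    bool threw = false;
    try {
        parse_integer<std::int8_t>("128");  // one past the maximum
    } catch (const std::out_of_range&) {
        threw = true;
    }
    assert(threw);
}
```

Because the typedef problem is sidestepped by promotion and not by overloading on int8_t, this sketch gives up the character-reading behaviour for char as well, which is the trade-off being argued about in this thread.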

Alexander Nasonov

7 May

4:16 p.m.

Dean Michael Berris wrote:

...

Hi Everyone,

I've recently tried the following (inlined test) which isolates the problem I've encountered with boost::lexical_cast<int8_t>:

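Run this and enter 127:

    #include <iostream>
    #include <stdint.h>

    int main(int argc, char * argv[])
    {
        int8_t value;
        // int8_t is a char type here, so this reads a single character
        if(std::cin >> value)
        {
            std::cout << '\n' << value << "\ngood\n";
        }
    }

On FreeBSD 6.2, gcc 3.4, I get this: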

...

./a.out 127

1
good

It reads only '1', as if int8_t were char.

--
Alexander Nasonov
http://nasonov.blogspot.com

Being is desirable because it is identical with Beauty, and Beauty is loved because it is Being. We ourselves possess Beauty when we are true to our own being; ugliness is in going over to another order; knowing ourselves, we are beautiful; in self-ignorance, we are ugly. -- Ambrose Bierce
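The same experiment can be repeated without typing at a terminal. The sketch below swaps std::cin for a std::istringstream and also reports what the stream did not consume. The names Extraction and extract_int8 are illustrative only, and the sketch assumes int8_t is a typedef for signed char.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Result of one extraction: the value read and the unread remainder.
struct Extraction {
    std::int8_t value;
    std::string rest;
};

// Extract a single int8_t from text, then collect the rest of the line.
Extraction extract_int8(const std::string& text) {
    std::istringstream in(text);
    Extraction result;
    result.value = 0;
    in >> result.value;             // resolves to the signed char overload
    std::getline(in, result.rest);  // whatever was left behind
    return result;
}

// Confirms the behaviour reported above for the input "127".
void check_extraction() {
    Extraction r = extract_int8("127");
    assert(r.value == '1');   // the character '1', not the number 127
    assert(r.rest == "27");   // two characters were never read
}
```

The unread "27" is the reason a stream-based cast that insists on reaching EOF rejects this input.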

35 comments

15 participants

participants (15)

Alexander Nasonov
Andrey Semashev
Dan Day
Dean Michael Berris
Edward Diener
Felipe Magno de Almeida
Gregory Dai
Jens Finkhäuser
Jody Hagins
Mathias Gaunard
Michael Marcin
Paul A Bristow
Phil Endecott
Sebastian Redl
Yuval Ronen