Multiple files utf8_codecvt_facet.cpp

Hi, is there a reason why both program_options and serialization contain very similar files utf8_codecvt_facet.cpp? Matthias

Matthias Troyer wrote:
is there a reason why both program_options and serialization contain very similar files utf8_codecvt_facet.cpp?
I think it's because neither Robert nor I feel ourselves at liberty to create an io/utf8 library without review. I think it's probably possible to create a library without officially announcing it, but to be useful it has to be in the regression tests, and then users will start thinking such a library is officially available and asking where the docs are, and so on. - Volodya

Vladimir Prus <ghost@cs.msu.su> writes:
Matthias Troyer wrote:
is there a reason why both program_options and serialization contain very similar files utf8_codecvt_facet.cpp?
I think it's because neither Robert nor I feel ourselves at liberty to create an io/utf8 library without review.
I think it's probably possible to create a library without officially announcing it, but to be useful it has to be in the regression tests, and then users will start thinking such a library is officially available and asking where the docs are, and so on.
As I've posted elsewhere, that's what the detail namespace/directory is for. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
Vladimir Prus <ghost@cs.msu.su> writes:
Matthias Troyer wrote:
is there a reason why both program_options and serialization contain very similar files utf8_codecvt_facet.cpp?
I think it's because neither Robert nor I feel ourselves at liberty to create an io/utf8 library without review.
I think it's probably possible to create a library without officially announcing it, but to be useful it has to be in the regression tests, and then users will start thinking such a library is officially available and asking where the docs are, and so on.
As I've posted elsewhere, that's what the detail namespace/directory is for.
There's no "libs/detail" directory yet. Do you want me to create one, move utf code there, and possibly add "libs/detail/test" directory which will show up in regression tests? - Volodya

Vladimir Prus <ghost@cs.msu.su> writes:
David Abrahams wrote:
Vladimir Prus <ghost@cs.msu.su> writes:
Matthias Troyer wrote:
is there a reason why both program_options and serialization contain very similar files utf8_codecvt_facet.cpp?
I think it's because neither Robert nor I feel ourselves at liberty to create an io/utf8 library without review.
I think it's probably possible to create a library without officially announcing it, but to be useful it has to be in the regression tests, and then users will start thinking such a library is officially available and asking where the docs are, and so on.
As I've posted elsewhere, that's what the detail namespace/directory is for.
There's no "libs/detail" directory yet. Do you want me to create one, move utf code there, and possibly add "libs/detail/test" directory which will show up in regression tests?
If it seems necessary to create a standalone library, then yes. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
There's no "libs/detail" directory yet. Do you want me to create one, move utf code there, and possibly add "libs/detail/test" directory which will show up in regression tests?
If it seems necessary to create a standalone library, then yes.
It seems reasonable, because it's not templated code. Creating a library will lead to a set of other problems. If both serialization and program_options need to use the utf library, then should they link to that new library, or ask that users link to the new library themselves? Since in the case of static libraries it's not possible to link utf into program_options, it seems the user would have to link utf manually. Hmm... that's not good, but I don't see a better solution. - Volodya

Vladimir Prus <ghost@cs.msu.su> writes:
David Abrahams wrote:
There's no "libs/detail" directory yet. Do you want me to create one, move utf code there, and possibly add "libs/detail/test" directory which will show up in regression tests?
If it seems necessary to create a standalone library, then yes.
It seems reasonable, because it's not templated code. Creating a library will lead to a set of other problems. If both serialization and program_options need to use the utf library, then should they link to that new library, or ask that users link to the new library themselves?
Since in the case of static libraries it's not possible to link utf into program_options, it seems the user would have to link utf manually. Hmm... that's not good, but I don't see a better solution.
For now you could put it in an unnamed namespace and just compile it into both of the libraries via #include. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams wrote:
It seems reasonable, because it's not templated code. Creating a library will lead to a set of other problems. If both serialization and program_options need to use the utf library, then should they link to that new library, or ask that users link to the new library themselves?
Since in the case of static libraries it's not possible to link utf into program_options, it seems the user would have to link utf manually. Hmm... that's not good, but I don't see a better solution.
For now you could put it in an unnamed namespace and just compile it into both of the libraries via #include.
Yeah, I think that's possible. So I'm going to: 1. put the new header in boost/detail 2. put the new source in libs/detail/utf 3. #include the new source in program_options. Objections? As I understand it, Robert does not mind having this issue handed to me. Robert, would you want me to change serialization to include the file, too? - Volodya
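For readers following along, the "unnamed namespace plus #include" idea can be sketched roughly as below. This is only an illustration collapsed into a single file: the file name utf8_shared_impl.ipp, the helper utf8_encoded_length, and the namespace demo_library are all hypothetical and are not taken from program_options or serialization.

```cpp
// --- What a hypothetical shared source (say "utf8_shared_impl.ipp") might hold ---
#include <cstddef>

namespace {  // unnamed namespace: internal linkage, so every library that
             // includes this source gets its own private copy and the two
             // copies cannot clash at link time

// Hypothetical helper: number of bytes UTF-8 needs to encode a code point.
std::size_t utf8_encoded_length(unsigned long cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

} // unnamed namespace

// --- What one library's own .cpp might look like. In a real layout it would
//     contain  #include "utf8_shared_impl.ipp"  in place of the section above. ---
namespace demo_library {

std::size_t bytes_needed(unsigned long cp)
{
    return utf8_encoded_length(cp);  // uses this translation unit's private copy
}

} // namespace demo_library
```

The trade-off is that the code is compiled once per including library, and nothing declared in the unnamed namespace can appear in a public interface. In exchange, users do not have to link an extra utf library by hand.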

Hello world, I'm jumping in, because I am interested in Unicode conversion facets...
is there a reason why both program_options and serialization contain very similar files utf8_codecvt_facet.cpp?
I had a look at the serialization library's converter in utf8_codecvt_facet.cpp and noticed that utf8_codecvt_facet_wchar_t::do_in() doesn't check for non-shortest UTF-8 sequences. There might also be some issues on platforms with a 16-bit wchar_t (possible overflow). I suggest using (parts of) the UTF library in the Boost files area to solve those problems. This could also be another step towards an officially supported Unicode library... ;-) http://groups.yahoo.com/group/boost/files/utf/ Best regards from Aachen, Tilman
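To make the two concerns concrete, here is a minimal sketch of a decoder that rejects non-shortest (overlong) sequences, plus a check for whether the result fits into a 16-bit unit. It is not the code from utf8_codecvt_facet.cpp or from the UTF library in the files area. The names decode_utf8 and fits_in_16_bits are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Decodes one UTF-8 sequence from s[0..len). On success, stores the code
// point in cp and the number of bytes consumed in used, and returns true.
// Returns false for truncated, malformed, or non-shortest input.
bool decode_utf8(const unsigned char* s, std::size_t len,
                 std::uint32_t& cp, std::size_t& used)
{
    if (len == 0) return false;

    const unsigned char lead = s[0];
    std::size_t n;          // total sequence length implied by the lead byte
    std::uint32_t value;    // payload bits accumulated so far
    std::uint32_t minimum;  // smallest code point that needs n bytes

    if (lead < 0x80)                { n = 1; value = lead;        minimum = 0; }
    else if ((lead & 0xE0) == 0xC0) { n = 2; value = lead & 0x1F; minimum = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { n = 3; value = lead & 0x0F; minimum = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { n = 4; value = lead & 0x07; minimum = 0x10000; }
    else return false;              // stray continuation byte or invalid lead

    if (len < n) return false;      // truncated sequence

    for (std::size_t i = 1; i < n; ++i) {
        if ((s[i] & 0xC0) != 0x80) return false;  // not a continuation byte
        value = (value << 6) | (s[i] & 0x3Fu);
    }

    if (value < minimum) return false;                     // non-shortest form
    if (value > 0x10FFFF) return false;                    // beyond Unicode range
    if (value >= 0xD800 && value <= 0xDFFF) return false;  // surrogate range

    cp = value;
    used = n;
    return true;
}

// True if the code point can be stored in a single 16-bit unit. With a
// 16-bit wchar_t, anything larger would overflow unless it is rejected
// or encoded as a surrogate pair.
bool fits_in_16_bits(std::uint32_t cp)
{
    return cp <= 0xFFFF;
}
```

With this sketch, the two-byte sequence 0xC0 0x80 (an overlong encoding of U+0000) is rejected, while 0xE2 0x82 0xAC decodes to U+20AC.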

Hi Tilman,
I'm jumping in, because I am interested in Unicode conversion facets...
is there a reason why both program_options and serialization contain very similar files utf8_codecvt_facet.cpp?
I had a look at the serialization library's converter in utf8_codecvt_facet.cpp and noticed that utf8_codecvt_facet_wchar_t::do_in() doesn't check for non-shortest UTF-8 sequences.
Hmmm... I think it's just an omission, and it would be easy to add.
There might also be some issues on platforms with a 16-bit wchar_t (possible overflow).
I suggest using (parts of) the UTF library in the Boost files area to solve those problems. This could also be another step towards an officially supported Unicode library... ;-)
While I think that library is OK, and the last time its author, Alberto Barbati, posted on this he knew much more about Unicode than I do, I don't think it's good to take that library and add it to detail now. Simply put, it would take another week until the regression tests turn green again. I also don't think there's any particular difference between the different utf8 implementations... - Volodya
participants (4)
- David Abrahams
- Matthias Troyer
- Tilman Kuepper
- Vladimir Prus