
Hello, Boost,

This is an idea for a Google SoC 2006 project that I want to participate in. The library is called 'string_cvt' ("string conversions"). It solves the problem of converting a type to a string and a string to a type with minimal runtime and syntactic overhead. This is a simple "call for interest" mail.

The idea for this library was inspired by a recent discussion on the Boost developers mailing list. The question under discussion was: is lexical_cast<> good enough for TR2 or not?

Proponents of lexical_cast<> point out that its main advantage is simplicity and symmetry of usage (angle brackets are used in both directions):

    int i = lexical_cast<int>("1");
    string s = lexical_cast<string>(1);

Additionally, it looks like the built-in casts, which is considered a very cool thing.

On the other side, opponents of lexical_cast<> want more functionality that doesn't fit into simple cast-like usage, such as:

The requirements table.

1) controlling conversions via facets (locales)
2) the full power of iostreams in a simple interface. All functionality accessible with iostreams (through manipulators) should be accessible.
3) functor adapters for use with std algorithms
4) error handling and reporting (what kind of error occurred?)
   * optionally, reporting failure without raising exceptions
5) best performance, especially for built-in types and for use in loops

The "Lexical Conversion Library Proposal for TR2" by Kevlin Henney and Beman Dawes states that:

"The lexical_cast function template offers a convenient and consistent form for supporting common conversions to and from arbitrary types when they are represented as text. The simplification it offers is in expression-level convenience for such conversions. For more involved conversions, such as where precision or formatting need tighter control than is offered by the default behavior of lexical_cast, the conventional stringstream approach is recommended."
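For reference, the "conventional stringstream approach" that the quote recommends looks roughly like the sketch below for a hexadecimal round trip. Only standard library facilities are used; the helper names to_hex_with_stream and from_hex_with_stream are made up for this illustration and are not part of any proposal.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Format an int as hexadecimal text by driving a stringstream directly.
std::string to_hex_with_stream(int value) {
    std::ostringstream out;
    out << std::hex << value;
    return out.str();
}

// Parse hexadecimal text into an int; returns false if extraction fails.
bool from_hex_with_stream(const std::string& text, int& value) {
    std::istringstream in(text);
    in >> std::hex >> value;
    return !in.fail();
}

// Usage: every conversion needs a stream object, a manipulator and a state check.
void example() {
    std::string s = to_hex_with_stream(255);
    int i = 0;
    bool ok = from_hex_with_stream(s, i);
    assert(s == "ff");
    assert(ok && i == 255);
}
```

Each direction takes several statements, and the formatting choice and the error check are spread across them. That boilerplate is what a dedicated conversion interface would have to hide.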
It is clear that lexical_cast is not intended to address points (1)-(4) in the list above, or even (5): to optimize conversions in loops you need to resort to stringstreams again.

I believe that stringstreams are not the right tool for the daily job of string conversion. We need a dedicated, fully featured solution that addresses all the issues in the Requirements table above. My dream is that nobody needs to fall back to C-style solutions or to stringstreams anymore: just one consistent interface for all string conversion needs.

This proposal for a Google SoC project is an attempt to develop such a solution. The final, ambitious goal of this project is to make boost::lexical_cast<> obsolete and to replace it in TR2 with a new proposal.

Regardless of SoC, I'm going to develop such a library for Boost, but participation in the Google SoC is important because otherwise it would be hard to find enough time to finish this library before the TR2 deadline in October.

As a result of this project we would have not only a fully documented and tested library for string conversions, but also a full comparative performance analysis, made to ensure that there is no longer any need to fall back to some other solution.

Here are short examples of the intended usage of this library (for those who are too busy to read the full text of the proposal):

    // simple initialization usage:
    string s = string_from(1);
    int i = from_string("1");

    // embedded in expression usage:
    double d = 2 + (double)from_string("1");

    // usage with special locale:
    string s = string_from(1, std::locale("loc_name"));

    // usage with special format:
    string s = string_from(1, std::ios::hex);

    // usage with special format and locale:
    string s = string_from(1, std::ios::hex, std::locale("loc_name"));

    // usage with default value provided (exceptions are not thrown):
    int i = from_string("1", 1);

    // usage with cvtstate& argument (exceptions are not thrown;
    // if conversion fails, the reason is written to the cvtstate
    // parameter supplied):
    cvtstate state;
    int i = from_string("1", state);

Format and locale information can be supplied to the from_string function too.

To optimize conversions in a loop one can do:

    string_cvt cvt(std::ios::hex, std::locale("loc_name"));
    string s;
    for (int i = 0; i < 100; ++i) {
        string t;
        cvt(i, t);
        s += (t + " ");
    }

To convert one sequence to another one can do:

    vector<double> vec_doubles(10, 1.2);
    vector<string> vec_strings;
    string_ocvt_fun<string> ocvtf(cvt); // cvt is defined in the previous example
    transform(
        vec_doubles.begin(), vec_doubles.end(), // from
        back_inserter(vec_strings),             // to
        ocvtf
    );

    // and in the reverse direction:
    string_icvt_fun<double> icvtf(cvt);
    vector<double> vec_doubles1(10);
    transform(
        vec_strings.begin(), vec_strings.end(), // from
        vec_doubles1.begin(),                   // to
        icvtf
    );

Details of this proposal are below.

The proposal, part 1. from_string/(w)string_from functions.

From a syntactic point of view, an alternative to the lexical_cast<> approach has been proposed: the to_string/string_to<> pair of functions. The "Lexical Conversion Library Proposal for TR2" has a good argument against it:

"... Furthermore, the from/to idea cannot be expressed in a simple and consistent form. The illusion is that they are easier than lexical_cast because of the name. This is theory. The practice is that the two forms, although similarly and symmetrically named, are not at all similar in use: one requires explicit provision of a template parameter and the other not. This is a simple usability pitfall that is guaranteed to catch experienced and inexperienced users alike -- the only difference being that the experienced user will know what to do with the error message."
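To make the quoted objection concrete, here is a minimal sketch of such a to_string/string_to<> pair. The implementations are assumptions built on stringstream purely for illustration (the discussion did not fix one), and they live in a hypothetical namespace sketch to keep them apart from any standard names.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

namespace sketch {

// Type to string: T is deduced from the argument, so no angle brackets are needed.
template <typename T>
std::string to_string(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// String to type: T appears only in the return type, so it cannot be deduced
// and the caller has to spell it out explicitly.
template <typename T>
T string_to(const std::string& text) {
    std::istringstream in(text);
    T value;
    if (!(in >> value)) {
        throw std::invalid_argument("string_to: cannot convert '" + text + "'");
    }
    return value;
}

} // namespace sketch

// Usage: the names are symmetric, the call syntax is not.
void example() {
    std::string s = sketch::to_string(42);  // no template argument
    int i = sketch::string_to<int>("42");   // template argument required
    assert(s == "42");
    assert(i == 42);
}
```

Writing sketch::string_to("42") without <int> does not compile, which is exactly the asymmetry the quote describes.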
There is one more problem with this approach: the to_string() function comes from other languages like Java, where it is a member function of all types, so one can write:

    String s = object.toString();

It can be read as "Get a string from the object" or "Convert the object to a string". Both phrases are straightforward and reflect the way we think of it:

1) I want a string (String s = )
2) I have an object (object)
3) I'm performing a conversion of this object to a string (.toString())

But in C++ the to_string function would be a free function, resulting in code like:

    string s = to_string(1);

It can be read as "Get a string by converting the object '1' to a string". The problem here is that the mental sequence is the same as in the example above, but the language constructs don't reflect it:

1) I want a string (string s = )
2) I have an object (1)
3) I'm performing a conversion of this object to a string (to_string(1))

Note that items (2) and (3) are intermixed. It means that the programmer needs to do some additional mental work to jump from item (1) to item (3) and then back to item (2) again. The final mental workflow would be as follows:

1) I want a string (string s = )
2) I have an object (1, but don't code it yet, hold it in memory for a while)
3) I'm performing a conversion of this object to a string (to_string)
4) Yes! I can release my memory and finally code the object. ( (1); )

For a component as widely used as string conversions, this additional complexity is inappropriate.

Note: exactly the same critique applies to lexical_cast<> too, and it has the additional complexity of an explicitly specified template parameter.

For string to type conversions, things are even worse.
In Java it would be:

    try {
        int i = Integer.parseInt(s);
        // use i
    } catch (NumberFormatException e) {
        /* perform some error handling or ignore - the usual practice */
    }

With lexical_cast<> it would be:

    int i = lexical_cast<int>(s);
    // use i
    // exception handling is usually done at a higher level

With string_to<> it would be:

    int i = string_to<int>(s);

Only the name has changed here.

The resulting mental sequence for all three variants above is far from optimal. For lexical_cast<> it would be as follows:

1) I want an int (int i = )
2) I have a string (s, but don't code it yet, hold it in memory for a while)
3) I'm performing a conversion of this string to an int ( lexical_cast<int> )
4) Yes! I can release my memory and finally code the string. ( (s); )

The same mental complexity is here. The shortest possible mental sequence is as follows:

1) I want an int (int i = )
2) I have a string (s)
3) I'm performing a conversion of this string to an int ( toInt(); )

    int i = s.toInt();

This approach scales badly, of course, but it is optimal in the mental sense. Furthermore, one could argue that the best way would be:

    int i = s;

"Construct an int from a string": as simple as it could be. Surprisingly, it can be implemented (in terms of a templated type conversion operator):

    class string {
        template<typename T> operator T();
    };

But this solution has major drawbacks:

1) it cannot be made symmetrical with type to string conversion
2) it is hard to see such conversions in code
3) it requires changes in the standard string library

All three can be resolved with a free-function adapter like string_to, but with more appropriate naming:

    int i = from_string(s);

Its counterpart would become:

    string s = string_from(1);
    wstring s = wstring_from(1);

Note:

1) usage is symmetrical
2) there are no explicit template parameters

The from_string function has one minor drawback: it cannot be used in expressions without an explicit cast to the desired type:

    double d = 2.0 + from_string(s);         // doesn't work
    double d = 2.0 + (double)from_string(s); // does

But this can be seen as an advantage, because:

1) the intention is clear and enforced by the compiler (operator+ ambiguity, or a run-time exception if 2.0 becomes 2 and s looks like "1.1")
2) mentally, the expression "(double)from_string(s)" is close to optimal; it can be thought of as "Get a double from a string". It is hard to imagine a thinking path that is shorter and reflects the intention in a more straightforward way.

To conclude: the pair of [w]string_from/from_string functions is proposed to compete with the lexical_cast<> function template for the simple needs of converting some type to a string or a string to some type. Additionally, these functions are not restricted to pure cast-like syntax and could accept parameters like locale, std::ios::fmtflags and boost::cvtstate (which is a part of this proposal) to address issues (1), (2) and (4) respectively (see the Requirements table above).

The proposal, part 2. Converter objects and functor adapters.

This part is intended to address issues (3) and (5).
This can be achieved by providing templated "converter objects", along with typedefs for char and wchar_t:

    basic_string_icvt<char_type, traits_type, allocator_type>: string_icvt, wstring_icvt
    basic_string_ocvt<char_type, traits_type, allocator_type>: string_ocvt, wstring_ocvt
    basic_string_cvt<char_type, traits_type, allocator_type>:  string_cvt, wstring_cvt

Usage can be:

    string_cvt scvt(ios_base::hex, locale(""));
    string s;
    scvt(12, s);
    int i;
    scvt(s, i);

and functor adapters:

    basic_string_ocvt_fun<TCont>
    typedef basic_string_ocvt_fun<std::string> string_ocvt_fun;
    typedef basic_string_ocvt_fun<std::wstring> wstring_ocvt_fun;

    basic_string_icvt_fun<Target, TChar, Traits, TAlloc>;

    // template typedef
    template <
        typename Target,
        typename Traits = std::char_traits<char>,
        typename TAlloc = std::allocator<char>
    >
    class string_icvt_fun
        : public basic_string_icvt_fun<Target, char, Traits, TAlloc>

    // template typedef
    template <
        typename Target,
        typename Traits = std::char_traits<wchar_t>,
        typename TAlloc = std::allocator<wchar_t>
    >
    class wstring_icvt_fun
        : public basic_string_icvt_fun<Target, wchar_t, Traits, TAlloc>

These classes can be used as follows:

    vector<double> vec_doubles(10, 1.2);
    vector<string> vec_strings;
    string_ocvt_fun<string> ocvtf(scvt);
    transform(
        vec_doubles.begin(), vec_doubles.end(), // from
        back_inserter(vec_strings),             // to
        ocvtf
    );

    string_icvt_fun<double> icvtf(scvt);
    vector<double> vec_doubles1(10);
    transform(
        vec_strings.begin(), vec_strings.end(), // from
        vec_doubles1.begin(),                   // to
        icvtf
    );

    int sz = vec_doubles.size();
    for (int i = 0; i < sz; ++i) {
        assert(vec_doubles[i] == vec_doubles1[i]);
    }

And, finally, the full power of iostreams is available through these classes: std::ios_base::fmtflags can be passed to the constructors of all converter classes to specify special formatting. Additionally, the whole family of fmtflags-related functions from std::ios_base and std::basic_ios<> is provided. width() and fill() are provided as well. (If I forgot to mention some function, it was not intentional; all meaningful functions from the iostreams base classes would be included.)

To satisfy requirement (1), a std::locale object can be passed to the constructor or to the imbue() function. The getloc() function is provided too.

For requirement (4), the type cvtstate is provided. It is very close to the std::ios_base::iostate type, but cvtstate is not a typedef for int, so that functions can be overloaded on it. A 'cvtstate except' parameter can be passed to the constructors of the converter classes to specify the cases in which exceptions should be thrown. By default no exceptions are thrown. The state of a conversion (successful or not) can be inspected with the rdstate() function and all the good/bad/fail functions. Additionally, the exception handling behavior can be queried or changed with the exceptions() functions. Again, this is exactly as in the std::basic_ios class.

Performance for built-in types (requirement 5) would be achieved through specializations of the proposed components.
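As a rough illustration of what a specialization for a built-in type might look like, the sketch below converts a string to long with the C library's strtol and reports failure through a status argument instead of an exception. The names cvt_status and long_from_string are hypothetical stand-ins chosen for this example. They are not the proposed interface, and the real cvtstate is described above as being much closer to std::ios_base::iostate.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical status type standing in for the proposed cvtstate.
enum class cvt_status { good, fail, out_of_range };

// Sketch of a stream-free conversion path for a built-in type:
// std::string -> long via strtol, with no exceptions thrown.
long long_from_string(const std::string& text, cvt_status& status, int base = 10) {
    const char* begin = text.c_str();
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(begin, &end, base);

    // Nothing was consumed, or trailing characters remain: not a valid number.
    if (end == begin || *end != '\0') {
        status = cvt_status::fail;
        return 0;
    }
    // The text was a number, but it does not fit into a long.
    if (errno == ERANGE) {
        status = cvt_status::out_of_range;
        return value;
    }
    status = cvt_status::good;
    return value;
}

// Usage: the caller inspects the status instead of catching an exception.
void example() {
    cvt_status st;
    long v = long_from_string("ff", st, 16);
    assert(st == cvt_status::good && v == 255);

    long_from_string("12abc", st);
    assert(st == cvt_status::fail);
}
```

No stream object, locale or format state is constructed here, which is where the speed-up for built-in types is expected to come from. A real specialization would still have to honor the locale and format flags of the converter, which this sketch ignores.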
These specializations would use the technique proposed in the N1803 document, "Simple Numeric Access": the strtoXXX() C library functions to convert strings to numbers, and the sprintf() function to convert numbers to strings.

Support for non-standard strings can be added by specializing cvt_traits<TCont> for them.

So far I have a minimal working implementation of the basic concepts proposed.

Possible mentors for this project could be the authors of the "Lexical Conversion Library Proposal for TR2", Kevlin Henney and/or Beman Dawes.

Best,
PhD student,
Oleg Abrosimov.