
6 Jul 2006, 4:33 a.m.
> This is very close to what I have in mind. The main difference is that
> the functions/algorithms in my mind take ranges instead of iterators.
> Thus:
>
>     to_lower(src, dest)
>     to_upper(src, dest)

So long as you don't require ranges (or a pair of iterators makes a valid range and dest can still be an output iterator). That's fine - these should work on char* as well as container types. I don't know what kind of ranges you have for dest which allow dest to change size - seems a bit problematic.

I want iterators that can handle the encoding transform. I want to be able to write items like the following:

    std::string s = get_some_utf_8_xml_data();

    // Find the BOM character as a UTF-32 character
    utf_iterator_t i = std::find(utf_iterator_t(s.begin()), utf_iterator_t(s.end()), UL0x0000FEFF);

    assert(*i.base() == U0xEF); // base iterator points to start of UTF-8 character

Sean

> With these, I could make Fusion like wrappers that transform them into
> something like:
>
>     some_string s1 = to_lower(src);
>     some_string s2 = to_upper(src);
>
> where to_lower and to_upper return cheap views that are in and by
> themselves valid strings/ranges. They are cheap because the actual
> conversions/transformations are done on demand-- think lazy
> evaluation.
> So, like those done by expression template techniques, there are
> no expensive temporaries when you perform seemingly expensive tasks
> like:
>
>     some_string s = f1(f2(f3(f4(src))));
>
> And yes, because they are generic, those string algorithms can work
> on any string type that satisfy some basic requirements.
>
> Regards,
>
> --
> Joel de Guzman
> http://www.boost-consulting.com
> http://spirit.sf.net
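The decoding iterator Sean asks for can be sketched roughly as follows. This is only an illustration, not his `utf_iterator_t` and not a Boost type: the names `utf8_decode_iterator` and `find_bom_offset` are hypothetical, the input is assumed to be well-formed UTF-8 (nothing is validated), and the adaptor is a read-only input iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adaptor: presents a UTF-8 byte sequence as UTF-32 code points.
// Assumes well-formed UTF-8; malformed input is not detected.
template <typename BaseIt>
class utf8_decode_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char32_t*;
    using reference         = char32_t;

    utf8_decode_iterator() = default;
    explicit utf8_decode_iterator(BaseIt pos) : pos_(pos) {}

    // Underlying iterator, pointing at the first byte of the current character.
    BaseIt base() const { return pos_; }

    // Decode the code point that starts at the current position.
    char32_t operator*() const {
        BaseIt p = pos_;
        const unsigned char lead = static_cast<unsigned char>(*p);
        const std::size_t len = sequence_length(lead);
        // Keep only the payload bits of the lead byte (all 7 bits for ASCII).
        char32_t cp = static_cast<char32_t>(len == 1 ? lead : (lead & (0x7Fu >> len)));
        for (std::size_t i = 1; i < len; ++i) {
            ++p;
            // Each continuation byte contributes its low 6 bits.
            cp = static_cast<char32_t>((cp << 6) | (static_cast<unsigned char>(*p) & 0x3Fu));
        }
        return cp;
    }

    // Step over every byte of the current character.
    utf8_decode_iterator& operator++() {
        const std::size_t len = sequence_length(static_cast<unsigned char>(*pos_));
        std::advance(pos_, static_cast<std::ptrdiff_t>(len));
        return *this;
    }
    utf8_decode_iterator operator++(int) {
        utf8_decode_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_decode_iterator& a, const utf8_decode_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_decode_iterator& a, const utf8_decode_iterator& b) {
        return !(a == b);
    }

private:
    // Number of bytes in the sequence introduced by this lead byte.
    static std::size_t sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x06) return 2;  // 110xxxxx
        if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx
        if ((lead >> 3) == 0x1E) return 4;  // 11110xxx
        return 1;  // stray byte: treated as one unit (an assumption of this sketch)
    }

    BaseIt pos_{};
};

// Hypothetical helper: byte offset of the first U+FEFF, or s.size() if absent.
inline std::size_t find_bom_offset(const std::string& s) {
    using It = utf8_decode_iterator<std::string::const_iterator>;
    const It hit = std::find(It(s.begin()), It(s.end()), char32_t(0xFEFF));
    return static_cast<std::size_t>(hit.base() - s.begin());
}
```

The search runs over code points while `base()` recovers the byte position, which is the property Sean's `assert` relies on: the first byte of a UTF-8 encoded U+FEFF is 0xEF.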

6 Jul, 5:05 a.m.
New subject: Comment on string / unicode discussion
Sean Parent wrote:
>> This is very close to what I have in mind. The main difference is that
>> the functions/algorithms in my mind take ranges instead of iterators.
>> Thus:
>>
>>     to_lower(src, dest)
>>     to_upper(src, dest)
> So long as you don't require ranges (or a pair of iterators makes a
> valid range and dest can still be an output iterator). That's fine -
> these should work on char* as well as container types. I don't know
> what kind of ranges you have for dest which allow dest to change size
> - seems a bit problematic.

Yeah. A bit problematic. This is not a problem with the pure functional approach where you return a lazily evaluated view:

    to_lower(src) // returns a view

> I want iterators that can handle the encoding transform. I want to be
> able to write items like the following:
>
>     std::string s = get_some_utf_8_xml_data();
>
>     // Find the BOM character as a UTF-32 character
>
>     utf_iterator_t i = std::find(utf_iterator_t(s.begin()), utf_iterator_t
>     (s.end()), UL0x0000FEFF);
>
>     assert(*i.base() == U0xEF); // base iterator points to start of UTF-8
>     character

Contrast that with:

    std::string s = get_some_utf_8_xml_data();
    utf_range r = boost::find(utf_range(s), UL0x0000FEFF);
    assert(*r.begin().base() == U0xEF);

Regards,

--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
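A lazily evaluated `to_lower(src)` view of the kind Joel describes might look something like the sketch below. It is not taken from Fusion, Boost.Range, or any existing library: `lower_iterator` and `lower_view` are hypothetical names, the conversion uses `std::tolower` and so only handles single-byte characters in the current C locale, and the view does not own its source, so the source must outlive it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <string>

// Hypothetical iterator: lower-cases each character when it is dereferenced.
template <typename It>
class lower_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = char;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char*;
    using reference         = char;

    explicit lower_iterator(It pos) : pos_(pos) {}

    It base() const { return pos_; }

    // The conversion happens here, on demand, one character at a time.
    char operator*() const {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(*pos_)));
    }
    lower_iterator& operator++() { ++pos_; return *this; }
    lower_iterator operator++(int) { lower_iterator old = *this; ++pos_; return old; }

    friend bool operator==(const lower_iterator& a, const lower_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const lower_iterator& a, const lower_iterator& b) {
        return !(a == b);
    }

private:
    It pos_;
};

// Hypothetical view: stores two iterators into the source and nothing else,
// so creating it copies no characters.
template <typename It>
class lower_view {
public:
    lower_view(It first, It last) : first_(first), last_(last) {}

    lower_iterator<It> begin() const { return lower_iterator<It>(first_); }
    lower_iterator<It> end() const { return lower_iterator<It>(last_); }

    // Characters are converted only when a concrete string is requested.
    operator std::string() const { return std::string(begin(), end()); }

private:
    It first_;
    It last_;
};

// Works on container types...
inline lower_view<std::string::const_iterator> to_lower(const std::string& src) {
    return {src.begin(), src.end()};
}

// ...and on plain null-terminated char* strings.
inline lower_view<const char*> to_lower(const char* src) {
    return {src, src + std::strlen(src)};
}
```

With these definitions, `std::string s1 = to_lower(src);` compiles as in Joel's example, and the destination is sized by the string constructor rather than by the algorithm, which sidesteps the resizable `dest` question raised above.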

5:43 a.m.
Sean Parent wrote:
>> This is very close to what I have in mind. The main difference is that
>> the functions/algorithms in my mind take ranges instead of iterators.
>> Thus:
>>
>>     to_lower(src, dest)
>>     to_upper(src, dest)
> So long as you don't require ranges (or a pair of iterators makes a
> valid range and dest can still be an output iterator). That's fine -
> these should work on char* as well as container types. I don't know
> what kind of ranges you have for dest which allow dest to change size
> - seems a bit problematic.
>
> I want iterators that can handle the encoding transform. I want to be
> able to write items like the following:
>
>     std::string s = get_some_utf_8_xml_data();
>
>     // Find the BOM character as a UTF-32 character
>
>     utf_iterator_t i = std::find(utf_iterator_t(s.begin()), utf_iterator_t
>     (s.end()), UL0x0000FEFF);
>
>     assert(*i.base() == U0xEF); // base iterator points to start of UTF-8
>     character

Boost has (unofficially?) such iterators. Look into <boost/regex/pending/unicode_iterator.hpp>

--
Shunsuke Sogame
Participants (3): Joel de Guzman, Sean Parent, Shunsuke Sogame