Re: [boost] [rfc] I/O Library Design

23 Jun 2007


Jeremy Maitin-Shepard wrote:
...
Andrey Semashev <andysem@mail.ru> writes:
...
Jeremy Maitin-Shepard wrote:
...
Andrey Semashev <andysem@mail.ru> writes:
...
There may be different parsing techniques, depending on the text format.
Sometimes plain character iteration is sufficient, as in forward
sequential parsing. Nothing, though, prevents non-sequential parsing
(for instance, when there is a table of contents with offsets, or when
each field to be parsed is preceded by its length).
Such a format would then likely not really be text, since it would
contain embedded offsets (which are probably not text themselves).
Why not? See GCC symbol mangling, for example.
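
To make the mangling example concrete: in the scheme used by current GCC,
foo::bar() mangles to _ZN3foo3barEv, where each identifier is preceded by
its length in decimal, all of it plain text. Below is a minimal sketch of
reading such fields. The helper name split_length_prefixed is hypothetical,
and it handles only the bare run of length-prefixed identifiers, not the
full mangling grammar.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a run of <decimal length><identifier> fields,
// e.g. "3foo3bar" -> {"foo", "bar"}. Parsing stops at the first character
// that does not start a length, or when a declared length runs past the
// end of the input.
std::vector<std::string> split_length_prefixed(const std::string& text)
{
    std::vector<std::string> fields;
    std::size_t pos = 0;
    while (pos < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[pos]))) {
        // Read the decimal length prefix.
        std::size_t len = 0;
        while (pos < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[pos]))) {
            len = len * 10 + static_cast<std::size_t>(text[pos] - '0');
            ++pos;
        }
        if (len > text.size() - pos)
            break; // truncated field, give up
        fields.push_back(text.substr(pos, len));
        pos += len; // jump over the field without inspecting its characters
    }
    return fields;
}
```

Given the part of the mangled name after "_ZN", i.e. "3foo3barEv", the
sketch yields the two fields "foo" and "bar" and stops at 'E'. The jump by
len is the non-sequential step: the field contents are never iterated
character by character.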
...
...
If all standard algorithms and classes assume that the text being parsed
is in Unicode, they cannot apply optimizations that would handle it more
efficiently. The std::string, regex, and stream classes will always have
to treat the text as Unicode.
Well, since std::string and boost::regex already exist and do not assume
Unicode (or even necessarily support it very well; I've seen some
references to boost::regex providing Unicode support, but I haven't
looked into it), that is not likely to occur.
Actually, std::string (or basic_string) does not support Unicode, since
it operates on a per-value_type basis. IOW, it won't recognize multi-unit
code sequences. Same thing with streams. As for Boost.Regex, it has such
support, but it is optional (i.e. it still allows processing 1-octet
fixed-width strings). And I believe that is the way to go in the other
components we're discussing.
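
A small sketch of the per-value_type point, assuming the std::string holds
well-formed UTF-8: size() reports chars, not code points. The helper
count_code_points is hypothetical and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to hold
// well-formed UTF-8, by skipping continuation bytes (bit pattern 10xxxxxx).
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if ((c & 0xC0) != 0x80) // not a continuation byte: starts a code point
            ++n;
    }
    return n;
}
```

For the UTF-8 bytes "na\xC3\xAFve" (the word "naive" with a diaeresis on
the i), size() returns 6 while count_code_points returns 5: basic_string
sees six independent char values and knows nothing about the two-byte
sequence.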