
On 23 June 2010 11:51, Stewart, Robert <Robert.Stewart@sig.com> wrote:
To be honest, I don't see the value of this, as this is the kind of thing which is handled well in other ways (e.g. using a parser or lexer generator, or a standard data format such as XML, JSON etc.). There tend to be odd differences in quoting, encoding and escaping styles, which make a generic function awkward. It's not as specific as a filename extractor and not as generic as a parser, and it's not clear why there's a need for something in between.
Those other approaches are heavier than these algorithms
You often need to use some kind of parser just to get the quoted string in the first place.
which can serve simple cases quite well.
What are these simple cases? I could see the use for something which reads and decodes a 'token' following something like the shell grammar and sets the iterator to the end of the token. But that's quite a specific and more complicated grammar, rather than an attempt at a simple general one.
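To show what I mean by a token reader, here is a rough sketch in Python (just for brevity; the same shape would work with iterators). The name `read_token` and the exact rules are my own assumptions, loosely modelled on the shell: single quotes are literal, double quotes allow `\"` and `\\`, a backslash outside quotes protects the next character, and unquoted whitespace ends the token. It returns the decoded token together with the position just past it, which plays the role of the updated iterator. It is not a complete shell grammar.

```python
def read_token(text, pos=0):
    """Read one shell-like token from text, starting at index pos.

    Returns (decoded_token, end_pos), or (None, end_pos) when only
    whitespace remains. Hypothetical helper, not an existing API.
    """
    n = len(text)
    # Skip leading whitespace.
    while pos < n and text[pos].isspace():
        pos += 1
    if pos == n:
        return None, pos

    out = []
    while pos < n and not text[pos].isspace():
        ch = text[pos]
        if ch == "'":
            # Single quotes: everything up to the next ' is literal.
            end = text.find("'", pos + 1)
            if end == -1:
                raise ValueError("unterminated single quote")
            out.append(text[pos + 1:end])
            pos = end + 1
        elif ch == '"':
            # Double quotes: only \" and \\ are treated as escapes here.
            pos += 1
            while True:
                if pos >= n:
                    raise ValueError("unterminated double quote")
                ch = text[pos]
                if ch == '"':
                    pos += 1
                    break
                if ch == "\\" and pos + 1 < n and text[pos + 1] in '"\\':
                    out.append(text[pos + 1])
                    pos += 2
                else:
                    out.append(ch)
                    pos += 1
        elif ch == "\\":
            # Backslash outside quotes: take the next character literally.
            if pos + 1 >= n:
                raise ValueError("trailing backslash")
            out.append(text[pos + 1])
            pos += 2
        else:
            out.append(ch)
            pos += 1
    return "".join(out), pos
```

Calling it repeatedly, passing the returned position back in, walks a line token by token until it returns `None`. Even this cut-down version needs three quoting modes and a policy for unterminated input, which is why I call the grammar specific rather than simple.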
If you'd care to enumerate the special cases to which you allude, we can consider how best to address them, if support is warranted.
Some examples are: supporting multiple delimiter characters (e.g. supporting both 'x' and "x"); delimiters made up of multiple characters (e.g. """x"""); delimiter pairs (e.g. {x}); meaningful escapes (e.g. '\n' meaning newline); whether newlines are allowed between quotes or should end the quoted string; how multiple quoted strings are treated (e.g. in C, whitespace-separated quoted strings are concatenated, whereas in your algorithm the space between them is included); and whether the parsing should be strict or loose and, if it is loose, how it should recover from errors.

Daniel
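To make a few of the cases listed above concrete, here is a rough Python sketch of what a "generic" unquoting function ends up looking like once those choices become parameters. The function name `unquote` and its parameters `pairs`, `escapes` and `allow_newline` are all hypothetical, invented for illustration. It covers only multiple delimiters, multi-character delimiters, delimiter pairs, escapes and the newline policy; concatenation of adjacent strings and error recovery are left out.

```python
def unquote(text, pairs, escapes=None, allow_newline=True):
    """Decode one quoted string at the start of text.

    pairs maps an opening delimiter to its closing delimiter, e.g.
    {'"': '"', "'": "'", "{": "}", '"""': '"""'}.
    escapes maps the character after a backslash to its meaning,
    e.g. {"n": "\n"}; None disables escape processing entirely.
    Returns (decoded, end_index). Hypothetical signature.
    """
    # Try longer openers first so that '"""' wins over '"'.
    for opener in sorted(pairs, key=len, reverse=True):
        if text.startswith(opener):
            closer = pairs[opener]
            break
    else:
        raise ValueError("no opening delimiter")

    out = []
    pos = len(opener)
    while pos < len(text):
        if text.startswith(closer, pos):
            return "".join(out), pos + len(closer)
        ch = text[pos]
        if escapes is not None and ch == "\\" and pos + 1 < len(text):
            nxt = text[pos + 1]
            # Unknown escapes keep the escaped character as-is.
            out.append(escapes.get(nxt, nxt))
            pos += 2
            continue
        if ch == "\n" and not allow_newline:
            raise ValueError("newline inside quoted string")
        out.append(ch)
        pos += 1
    raise ValueError("missing closing delimiter")
```

With `pairs = {'"': '"', "'": "'", "{": "}", '"""': '"""'}`, the same call handles `'x'`, `{x}` and `"""a"b"""`, but every caller still has to decide on the delimiter table, the escape table and the newline policy, and the strict-versus-loose question is simply answered as "strict" by raising.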