
On 6/23/2010 2:51 PM, Daniel James wrote:
On 23 June 2010 11:51, Stewart, Robert <Robert.Stewart@sig.com> wrote:
Those other approaches are heavier than these algorithms
You often need to use some kind of parser just to get the quoted string in the first place.
which can serve simple cases quite well.
What are these simple cases?
CSV fields, pathnames, log messages.
If you'd care to enumerate the special cases to which you allude, we can consider how best to address them, if support is warranted.
Some examples are: supporting multiple delimiter characters (e.g. supporting both 'x' and "x"); delimiters made up of multiple characters (e.g. """x"""); delimiter pairs (e.g. {x}); meaningful escapes (e.g. '\n' meaning newline); whether newlines are allowed between quotes or should end the quoted string; how multiple quoted strings are treated (e.g. in C, whitespace-separated quoted strings are concatenated, whereas in your algorithm the space between them is included); and whether the parsing should be strict or loose and, if it is loose, how it should recover from errors.
Those are definitely cases that I didn't intend this algorithm to cover except, perhaps, multiple delimiter characters and paired delimiters, which I hadn't considered. Semantic meaning is definitely domain specific as is the treatment of multiple delimited substrings. In the latter case, while simply removing the internal delimiters is legitimate, so is just handling first and last characters and ignoring delimiters in the rest. When considered as the inverse of quote(), unquote() should simply strip leading and trailing delimiters and look for escaped delimiters and escaped escape characters within. To supply the extra semantics you've suggested, quote() must also be enhanced significantly.

___
Rob
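
For concreteness, here is a minimal sketch of the quote()/unquote() pairing described above. It is not the proposed interface: the signatures, the default delimiter and escape characters, and the choice to return unquoted input unchanged are all assumptions made for illustration, and it assumes a C++11 compiler.

```cpp
#include <cassert>
#include <string>

// Hypothetical signatures for illustration only; not the proposed library API.

// Wrap s in delim, escaping any embedded delim or escape character.
std::string quote(const std::string& s, char delim = '"', char escape = '\\')
{
    assert(delim != escape);
    std::string out(1, delim);
    for (char c : s) {
        if (c == delim || c == escape) {
            out += escape;
        }
        out += c;
    }
    out += delim;
    return out;
}

// Inverse of quote(): strip the leading and trailing delim, then collapse
// escaped delimiters and escaped escape characters. Nothing else is given
// meaning, so "\n" stays a backslash followed by 'n'.
std::string unquote(const std::string& s, char delim = '"', char escape = '\\')
{
    assert(delim != escape);
    // Assumption: input that is not delimited at both ends is returned as-is.
    if (s.size() < 2 || s.front() != delim || s.back() != delim) {
        return s;
    }
    std::string out;
    for (std::size_t i = 1; i + 1 < s.size(); ++i) {
        char c = s[i];
        // Only look ahead within the interior, never at the closing delimiter.
        if (c == escape && i + 2 < s.size()
            && (s[i + 1] == delim || s[i + 1] == escape)) {
            c = s[++i];
        }
        out += c;
    }
    return out;
}
```

With these definitions, unquote(quote(x)) yields x for any x, and unescaped delimiters in the interior are left alone, which is the "just handling first and last characters" behavior rather than C-style concatenation.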