[filesystem] basic_path::operator>> issue for paths with spaces

When a path that contains spaces is extracted from a text file using basic_path::operator>>, then the path is truncated at the first space. Maybe basic_path::operator<< should put quotes around the path and basic_path::operator>> strip the quotes (if any). --Johan Råde
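The truncation can be reproduced with a plain std::string, used here as a stand-in on the assumption that the path extractor behaves like ordinary formatted string extraction. The helper name extract_token is hypothetical and only illustrates the behaviour described above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns what a single formatted extraction yields.
// Formatted extraction into std::string skips leading whitespace and then
// stops at the next whitespace character.
std::string extract_token(const std::string& text) {
    std::istringstream in(text);
    std::string token;
    in >> token;
    return token;
}
```

Given the text C:\Program Files\Boost, this sketch returns only C:\Program, which is the truncation reported in this message.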

On Mon, Sep 22, 2008 at 14:14, Johan Råde <rade@maths.lth.se> wrote:
When a path that contains spaces is extracted from a text file using basic_path::operator>>, then the path is truncated at the first space.
Maybe basic_path::operator<< should put quotes around the path and basic_path::operator>> strip the quotes (if any).
That sounds good, but how would it handle paths with "s in them?
$ touch '"'
$ rm '"'
rm: remove regular empty file `"'? y
$ touch '\'
$ rm '\'
rm: remove regular empty file `\\'? y
~ Scott

Scott McMurray wrote:
On Mon, Sep 22, 2008 at 14:14, Johan Råde <rade@maths.lth.se> wrote:
That sounds good, but how would it handle paths with "s in them?
$ touch '"'
$ rm '"'
rm: remove regular empty file `"'? y
$ touch '\'
$ rm '\'
rm: remove regular empty file `\\'? y
~ Scott _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
I don't know. I only have experience with Windows, where spaces are common and "s are not allowed in path names. It is a common convention on Windows to surround paths that contain spaces with quotes. How is this problem handled on Unix? --Johan Råde

In the past, due to the standard's approach to treating all string extraction as ending upon encountering a space, I have tended to insert the length of the string, a space, and then the characters.
"My Name" serialized becomes "7 My Name"
in >> w; // read 7
in.ignore(); // ignore space
in.width( w ); // minimum extraction for string
in >> s; // extract string
I like this approach better than using some "sentinel" character as there's no need to perform character by character extraction once the length is known. Not to mention a path can now have a ' ', a '"', etc....
Any way you slice it, there's a problem as the serialization protocol may need to change. How will previous serialized paths be extracted?
--aj
On 9/22/08 11:48 AM, "Johan Råde" <rade@maths.lth.se> wrote:
Scott McMurray wrote:
On Mon, Sep 22, 2008 at 14:14, Johan Råde <rade@maths.lth.se> wrote:
That sounds good, but how would it handle paths with "s in them?
$ touch '"'
$ rm '"'
rm: remove regular empty file `"'? y
$ touch '\'
$ rm '\'
rm: remove regular empty file `\\'? y
~ Scott
I don't know. I only have experience with Windows, where spaces are common and "s are not allowed in path names. It is a common convention on Windows to surround paths that contain spaces with quotes. How is this problem handled on Unix? --Johan Råde
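A minimal sketch of the length-prefix idea from this message, with hypothetical function names write_counted and read_counted. One caveat about the snippet above: setting width() caps how many characters operator>> extracts into a std::string, but the extraction still stops at whitespace, so the sketch reads the payload with the unformatted read() instead.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes "<length> <characters>", so "My Name" becomes "7 My Name".
void write_counted(std::ostream& out, const std::string& s) {
    out << s.size() << ' ' << s;
}

// Reads a string written by write_counted. Returns false on malformed input.
// Note: the length is trusted as-is; real code would bound it before resizing.
bool read_counted(std::istream& in, std::string& s) {
    std::size_t n = 0;
    if (!(in >> n)) return false;        // read the length
    if (in.get() != ' ') return false;   // consume the single separator space
    s.assign(n, '\0');
    if (n == 0) return true;
    // Unformatted read: takes exactly n characters, whitespace included.
    in.read(&s[0], static_cast<std::streamsize>(n));
    return static_cast<std::size_t>(in.gcount()) == n;
}
```

Because exactly n characters are consumed, spaces, quotation marks and newlines inside the string need no escaping.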

Andrew James wrote:
In the past, due to the standard's approach to treating all string extraction as ending upon encountering a space, I have tended to insert the length of the string, a space, and then the characters.
"My Name" serialized becomes "7 My Name"
in >> w; // read 7
in.ignore(); // ignore space
in.width( w ); // minimum extraction for string
in >> s; // extract string
I like this approach better than using some "sentinel" character as there's no need to perform character by character extraction once the length is known. Not to mention a path can now have a ' ', a '"', etc....
Any way you slice it, there's a problem as the serialization protocol may need to change. How will previous serialized paths be extracted?
--aj
There is no problem as far as serialization is concerned. Serialization archives have no problem with strings with spaces. --Johan Råde

It seems that operators << and >> can not be defined in a useful way for filesystem::basic_path. If that is the case, maybe they should not be defined at all. --Johan Råde

On Mon, Sep 22, 2008 at 14:48, Johan Råde <rade@maths.lth.se> wrote:
I don't know. I only have experience with Windows, where spaces are common and "s are not allowed in path names. It is a common convention on Windows to surround paths that contain spaces with quotes. How is this problem handled on Unix?
Quotes are usually used, which are parsed out by the shell. For filenames with quotation marks, escaping is needed, or a different method, such as apostrophes. In all cases, though, the shell has stripped the excess before passing the appropriate filename to the program. (The shell is also responsible for expanding wildcards.)
Is there a really good solution? I don't know. It seems like I can even have newlines in filenames:
$ touch '1
2'
$ ls
1?2
$ rm 1?2
rm: remove regular empty file `1\n2'? y
It seems operator>> would need the full parsing capabilities of a shell, and operator<< would perform escaping, not unlike what rm seems to do (just with quotation marks, for round-tripping).
~ Scott

On Mon, Sep 22, 2008 at 9:24 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On Mon, Sep 22, 2008 at 14:48, Johan Råde <rade@maths.lth.se> wrote:
I don't know. I only have experience with Windows, where spaces are common and "s are not allowed in path names. It is a common convention on Windows to surround paths that contain spaces with quotes. How is this problem handled on Unix?
Quotes are usually used, which are parsed out by the shell. For filenames with quotation marks, escaping is needed, or a different method, such as apostrophes. In all cases, though, the shell has stripped the excess before passing the appropriate filename to the program. (The shell is also responsible for expanding wildcards.)
Is there a really good solution? I don't know. It seems like I can even have newlines in filenames:
I think that, according to posix (or at least traditional unix), the only character that is not allowed in a filename is '/'. -- gpd

That can be generalized: the only character not allowed is the path separator. But that separator itself doesn't really need to be stored. How about a serialization like
/My/Path/With Spaces/
3 2 My4 Path11With Spaces
This encodes 3 components, then the components follow.
--aj
On 9/22/08 1:01 PM, "Giovanni Piero Deretta" <gpderetta@gmail.com> wrote:
On Mon, Sep 22, 2008 at 9:24 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On Mon, Sep 22, 2008 at 14:48, Johan Råde <rade@maths.lth.se> wrote:
I don't know. I only have experience with Windows, where spaces are common and "s are not allowed in path names. It is a common convention on Windows to surround paths that contain spaces with quotes. How is this problem handled on Unix?
Quotes are usually used, which are parsed out by the shell. For filenames with quotation marks, escaping is needed, or a different method, such as apostrophes. In all cases, though, the shell has stripped the excess before passing the appropriate filename to the program. (The shell is also responsible for expanding wildcards.)
Is there a really good solution? I don't know. It seems like I can even have newlines in filenames:
I think that, according to posix (or at least traditional unix), the only character that is not allowed in a filename is '/'. -- gpd
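The component-count encoding suggested in this message could look roughly like the sketch below. The function names are hypothetical, '/' is assumed as the separator, and the sketch puts one space after every number (giving "3 2 My 4 Path 11 With Spaces"), which is an assumption rather than the exact spacing shown in the message. A leading '/' is not recorded, so absolute and relative paths are not distinguished here.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Splits on '/' and drops empty components,
// so "/My/Path/With Spaces/" yields 3 parts.
std::vector<std::string> split_components(const std::string& path) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : path) {
        if (c == '/') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}

// Writes "<count> <len> <chars> <len> <chars> ...".
void write_components(std::ostream& out, const std::vector<std::string>& parts) {
    out << parts.size();
    for (const std::string& p : parts) out << ' ' << p.size() << ' ' << p;
}

// Reads what write_components wrote. Returns false on malformed input.
bool read_components(std::istream& in, std::vector<std::string>& parts) {
    std::size_t count = 0;
    if (!(in >> count)) return false;
    parts.clear();
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t len = 0;
        if (!(in >> len)) return false;      // skips the space before the length
        if (in.get() != ' ') return false;   // single separator before the bytes
        std::string p(len, '\0');
        if (len > 0 && !in.read(&p[0], static_cast<std::streamsize>(len))) return false;
        parts.push_back(p);
    }
    return true;
}
```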

On Mon, Sep 22, 2008 at 16:17, Andrew James <aj@powerset.com> wrote:
That can be generalized: the only character not allowed is the path separator. But that separator itself doesn't really need to be stored. How about a serialization like
/My/Path/With Spaces/
3 2 My4 Path11With Spaces
This encodes 3 components, then the components follow.
This works, but I don't think that serialization is the problem. I see the <</>> operators as for user-facing I/O, and such a format is unacceptable for that usage. The string literal-style escaping seems like the only reasonable format to me. Especially if it's only output when there are spaces in the path (or the first character of the path is a quotation mark), and then operator>> reads like it does now if there's no quotation mark as the first character.
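One way to sketch the string-literal-style format described above is with std::quoted from <iomanip> (C++14), which escapes embedded quotation marks and backslashes on output and falls back to ordinary extraction when the input does not start with a quotation mark. The function names are hypothetical, paths are represented as std::string, and quoting an empty path is an added assumption so that it can be read back.

```cpp
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Quotes only when needed: the path contains whitespace, starts with a
// quotation mark, or is empty (the last case is an assumption of this sketch).
void write_path_text(std::ostream& out, const std::string& p) {
    const bool needs_quotes =
        p.empty() || p.front() == '"' ||
        p.find_first_of(" \t\n\r\f\v") != std::string::npos;
    if (needs_quotes) {
        out << std::quoted(p);   // adds quotes, escapes " and \ with a backslash
    } else {
        out << p;                // unchanged output for simple paths
    }
}

// If the next non-whitespace character is a quotation mark, reads up to the
// closing quote and removes escapes; otherwise behaves like plain operator>>.
bool read_path_text(std::istream& in, std::string& p) {
    return static_cast<bool>(in >> std::quoted(p));
}
```

With this sketch, /usr/lib is written unchanged, while C:\Program Files\Boost is written as "C:\\Program Files\\Boost" and read back to the original text.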
participants (4)
- Andrew James
- Giovanni Piero Deretta
- Johan Råde
- Scott McMurray