[filesystem] Function name changes

Beman Dawes

3 Jul 2008

4:17 p.m.

All the changes mentioned below leave the current names in place as synonyms, so will break no existing code.

* Change is_regular() to is_regular_file().

Dave Abrahams and several other people, including me, have found that the name of the is_regular() function is too obscure. It was chosen over is_regular_file() for brevity, but isn't holding up well with experience. The status name is already "regular_file" and the term "regular file" is used by the POSIX standard.

* Change leaf() to base_name().
* Change branch() to base_path().

In the long thread '[filesystem] "leaf"', several people found the current names obscure or invoking the wrong metaphor. Many alternative names were discussed, with an eye to making the names more obvious, conforming to names such as "basename" used in other programming languages, and/or using names already in use by the STL. Not breaking existing code was also a major concern, given that "basename" is already used in Boost.Filesystem.

The mental model for the new names is that for a path, p:

p == p.base_path() / p.base_name()

Thus if p == "foo/bar/boo.txt", p.base_path() == "foo/bar", and p.base_name() == "boo.txt".

* Change basename() to base_name_prefix().
* Change extension() to base_name_extension().

This change is to avoid confusion with base_name().

The mental model for the new name is that for a path, p:

p.base_name() == base_name_prefix(p) + base_name_extension(p)

Thus if p == "foo/bar/boo.txt", base_name_prefix(p) == "boo", and base_name_extension(p) == ".txt".

Comments?

--Beman


David Abrahams

3 Jul

5:36 p.m.

Beman Dawes wrote:

...

All the changes mentioned below leave the current names in place as synonyms, so will break no existing code.

* Change is_regular() to is_regular_file().

Dave Abrahams and several other people, including me, have found that the name of the is_regular() function is too obscure. It was chosen over is_regular_file() for brevity, but isn't holding up well with experience. The status name is already "regular_file" and the term "regular file" is used by the POSIX standard.

* Change leaf() to base_name(). * Change branch() to base_path().

What does the latter do, again? Oh, yeah, it returns the path to the directory containing the file indicated by the path object.

I'm sorry, but I don't like base_path() much. We know it is going to return a path, so the "path" part of the name doesn't tell me much about what this function does. The only thing left to go on is the meaning of "base."

Well I'm afraid we don't know what that is. The (only) good reason to use base_name() is that there is precedent; it's not because it makes particularly "good sense." You yourself find it counterintuitive, IIUC.

If "base" _did_ make any sense, I would guess that base_path() was the identity function.

How about parent_directory(), containing_directory(), container_path(), container(), or parent()?

...

In the long thread '[filesystem] "leaf"', several people found the current names obscure or invoking the wrong metaphor. Many alternative names were discussed, with an eye to making the names more obvious, conforming to names such as "basename" used in other programming languages, and/or using names already in use by the STL. Not breaking existing code was also a major concern, given that "basename" is already used in Boost.Filesystem.

The mental model for the new names is that for a path, p:

p == p.base_path() / p.base_name()

Thus if p == "foo/bar/boo.txt", p.base_path() == "foo/bar", and p.base_name() == "boo.txt".

* Change basename() to base_name_prefix().
* Change extension() to base_name_extension().

This change is to avoid confusion with base_name().

Works for me.

...

The mental model for the new name is that for a path, p:

p.base_name() == base_name_prefix(p) + base_name_extension(p)

Thus if p == "foo/bar/boo.txt", base_name_prefix(p) == "boo", and base_name_extension(p) == ".txt".

Comments?

I worry a bit about the chance of silent misbehavior due to basename/base_name spelling errors, and about what happens when both names make it into the same codebase.

-- Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Beman Dawes

6:29 p.m.

David Abrahams wrote:

...

Beman Dawes wrote:

...
All the changes mentioned below leave the current names in place as synonyms, so will break no existing code.

* Change is_regular() to is_regular_file().

Dave Abrahams and several other people, including me, have found that the name of the is_regular() function is too obscure. It was chosen over is_regular_file() for brevity, but isn't holding up well with experience. The status name is already "regular_file" and the term "regular file" is used by the POSIX standard.

* Change leaf() to base_name(). * Change branch() to base_path().

What does the latter do, again? Oh, yeah, it returns the path to the directory containing the file indicated by the path object.

I'm sorry, but I don't like base_path() much. We know it is going to return a path, so the "path" part of the name doesn't tell me much about what this function does. The only thing left to go on is the meaning of "base."

Well I'm afraid we don't know what that is. The (only) good reason to use base_name() is that there is precedent; it's not because it makes particularly "good sense." You yourself find it counterintuitive, IIUC.

That's correct. I'm only suggesting it because of the precedent.

...

If "base" _did_ make any sense, I would guess that base_path() was the identity function.

How about parent_directory(), containing_directory(), container_path(), container(), or parent()?

The objection raised in the discussion to the names with "parent" is that the resulting path is only the parent path in the absence of symbolic links. Same objection would apply to your other suggestions. Maybe that isn't all that strong an argument; parent_path() is quite clear to me and fits into the overall naming scheme. The impact of symlinks may just be a quirk we can live with. parent_directory() or parent() always look to me like they refer to one level up only.

...

...
In the long thread '[filesystem] "leaf"', several people found the current names obscure or invoking the wrong metaphor. Many alternative names were discussed, with an eye to making the names more obvious, conforming to names such as "basename" used in other programming languages, and/or using names already in use by the STL. Not breaking existing code was also a major concern, given that "basename" is already used in Boost.Filesystem.

The mental model for the new names is that for a path, p:

p == p.base_path() / p.base_name()

Thus if p == "foo/bar/boo.txt", p.base_path() == "foo/bar", and p.base_name() == "boo.txt".

* Change basename() to base_name_prefix().
* Change extension() to base_name_extension().

This change is to avoid confusion with base_name().

Works for me.

...
The mental model for the new name is that for a path, p:

p.base_name() == base_name_prefix(p) + base_name_extension(p)

Thus if p == "foo/bar/boo.txt", base_name_prefix(p) == "boo", and base_name_extension(p) == ".txt".

Comments?

I worry a bit about the chance of silent misbehavior due to basename/base_name spelling errors, and about what happens when both names make it into the same codebase.

I don't like that either, but (1) there doesn't seem to be much choice if "base*" is to be the new name, and (2) the problem doesn't last indefinitely, since the deprecated form of the names would go away in, say, one year.

--Beman

John Femiani

6:49 p.m.

Beman wrote:

...

...
Well I'm afraid we don't know what that is. The (only) good reason to use base_name() is that there is precedent; it's not because it makes particularly "good sense." You yourself find it counterintuitive, IIUC.

That's correct. I'm only suggesting it because of the precedent.

...
If "base" _did_ make any sense, I would guess that base_path() was the identity function.

How about parent_directory(), containing_directory(), container_path(), container(), or parent()?

There is also a precedent for dirname. I think basename/dirname are both POSIX functions (but I'm stuck in Windows so don't take my word). You could have:

dir_name = branch_path
base_name = leaf
extension = extension

Then base_name(f, extension(f)) = basename(f)

<snip>
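The mapping above can be sketched with plain strings. This is only an illustration: dir_name, base_name and extension are the hypothetical spellings from this message, not functions Boost.Filesystem is known to provide, the namespace is made up, and only '/' is treated as a separator. The optional suffix argument follows the POSIX basename utility, which accepts a suffix to strip.

```cpp
#include <string>

namespace sketch {  // hypothetical namespace, for illustration only

// Everything before the last '/', or "" when there is no separator.
inline std::string dir_name(const std::string& f) {
    const std::string::size_type slash = f.rfind('/');
    return slash == std::string::npos ? std::string() : f.substr(0, slash);
}

// Extension of the last element, including the dot; "" when there is none.
inline std::string extension(const std::string& f) {
    const std::string::size_type slash = f.rfind('/');
    const std::string::size_type start =
        slash == std::string::npos ? 0 : slash + 1;
    const std::string::size_type dot = f.rfind('.');
    if (dot == std::string::npos || dot < start) return std::string();
    return f.substr(dot);
}

// Last element of f, with `suffix` removed from its end when it matches
// and is shorter than the element itself.
inline std::string base_name(const std::string& f,
                             const std::string& suffix = std::string()) {
    const std::string::size_type slash = f.rfind('/');
    std::string name = slash == std::string::npos ? f : f.substr(slash + 1);
    if (!suffix.empty() && name.size() > suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
        name.erase(name.size() - suffix.size());
    }
    return name;
}

}  // namespace sketch
```

With these helpers, sketch::base_name(f, sketch::extension(f)) plays the role of the current basename(f), and sketch::base_name(f) the role of leaf().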

...

...
...
Comments?

I worry a bit about the chance of silent misbehavior due to basename/base_name spelling errors, and about what happens when both names make it into the same codebase.

I don't like that either, but (1) there doesn't seem much choice if "base*" is to be the new name, and (2) the problem doesn't last indefinitely since the deprecated form of the names would go away in say one year.

This can be mitigated by providing a macro to enable/disable deprecation warnings. People who don't have time to fix names can use something like BOOST_FS_NO_DEPRECATE (or BOOST_NO_DEPRECATE?) until they get a chance to fix their code. Then you can deprecate basename so accidental misspelling of base_name as basename will result in a warning. -- John
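A minimal sketch of that idea, assuming a C++14 compiler with the [[deprecated]] attribute. The macro BOOST_FS_NO_DEPRECATE is only the name floated above, DEMO_FS_DEPRECATED and the demo namespace are made up for this example, and the string-based functions stand in for the real path operations.

```cpp
#include <string>

// Hypothetical opt-out switch, spelled as suggested in the message above.
// Defining it before this point silences the deprecation warnings.
#if defined(BOOST_FS_NO_DEPRECATE)
#  define DEMO_FS_DEPRECATED(msg)
#else
#  define DEMO_FS_DEPRECATED(msg) [[deprecated(msg)]]
#endif

namespace demo {  // hypothetical namespace, for illustration only

// Proposed new name: the last element of the path (what leaf() returns).
inline std::string base_name(const std::string& p) {
    const std::string::size_type slash = p.rfind('/');
    return slash == std::string::npos ? p : p.substr(slash + 1);
}

// Old name with a different meaning: the last element minus its extension.
// Any call site gets a compiler warning unless the opt-out macro is defined.
DEMO_FS_DEPRECATED("basename() is deprecated; did you mean base_name()?")
inline std::string basename(const std::string& p) {
    const std::string name = base_name(p);
    const std::string::size_type dot = name.rfind('.');
    return dot == std::string::npos ? name : name.substr(0, dot);
}

}  // namespace demo
```

A caller who types demo::basename(p) by accident then sees a warning that points at base_name(), while code that has not been migrated yet can define the opt-out macro and build quietly.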

David Abrahams

8:16 p.m.

Beman Dawes wrote:

...

David Abrahams wrote:

...
Beman Dawes wrote:

...
* Change leaf() to base_name(). * Change branch() to base_path().

What does the latter do, again? Oh, yeah, it returns the path to the directory containing the file indicated by the path object.

I'm sorry, but I don't like base_path() much. We know it is going to return a path, so the "path" part of the name doesn't tell me much about what this function does. The only thing left to go on is the meaning of "base."

Well I'm afraid we don't know what that is. The (only) good reason to use base_name() is that there is precedent; it's not because it makes particularly "good sense." You yourself find it counterintuitive, IIUC.

That's correct. I'm only suggesting it because of the precedent.

...
If "base" _did_ make any sense, I would guess that base_path() was the identity function.

How about parent_directory(), containing_directory(), container_path(), container(), or parent()?

The objection raised in the discussion to the names with "parent" is that the resulting path is only the parent path in the absence of symbolic links. Same objection would apply to your other suggestions.

I find that objection specious.

...

Maybe that isn't all that strong an argument; parent_path() is quite clear to me and fits into the overall naming scheme. The impact of symlinks may just be a quirk we can live with.

parent_directory() or parent() always look to me like they refer to one level up only.

I think parent_path or parent can actually be viewed as completely being correct in the presence of symlinks. A path is a path, not a file or directory (or symlink). There may or may not be a file or directory there. So the parent of a path is also a path, regardless of the underlying filesystem structure.

...

...
...
Comments?

I worry a bit about the chance of silent misbehavior due to basename/base_name spelling errors, and about what happens when both names make it into the same codebase.

I don't like that either, but (1) there doesn't seem much choice if "base*" is to be the new name, and (2) the problem doesn't last indefinitely since the deprecated form of the names would go away in say one year.

FWIW, though I think it's probably a good idea to use base_name as you are suggesting, I was much less attached to the idea of using it to mean what is currently called leaf() than I was opposed to the idea of using it to mean something else, if you catch my drift :-) So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf(). -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Beman Dawes

9:58 p.m.

David Abrahams wrote:

...

FWIW, though I think it's probably a good idea to use base_name as you are suggesting, I was much less attached to the idea of using it to mean what is currently called leaf() than I was opposed to the idea of using it to mean something else, if you catch my drift :-)

Ah! Understood.

...

So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf().

Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path()
* Change leaf() to child()
* Change basename() to child_prefix()
* Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

Although historically basename() and extension() were free functions, my sense is the replacements should be basic_path member functions as they are closely related to the basic_path decomposition functions. Do you have an opinion on that?

--Beman

Bjørn Roald

10:36 p.m.

Beman Dawes wrote:

...

David Abrahams wrote:

...
So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf().

Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path() * Change leaf() to child() * Change basename() to child_prefix() * Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

I am not sure this is any good. Consider the path "../../a/b" and the meaning of parent and child. The only sensible parent is in the middle and we don't even know its name. Children are at both ends: the implicit "." or "b".

-- Bjørn

David Abrahams

11:52 p.m.

Bjørn Roald wrote:

...

Beman Dawes wrote:

...
David Abrahams wrote:

...
So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf().

Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path() * Change leaf() to child() * Change basename() to child_prefix() * Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

I am not sure this is any good. Consider the path "../../a/b" and the meaning of parent and child.

The only sensible parent is in the middle and we don't even know its name. Children are at both ends: the implicit "." or "b".

The parent of '../../a/b' is '../../a'
The parent of '../../a' is '../..'
The parent of '../..' is probably '../../..'

If Beman intended '..' to be the result of the final transformation above, then we should be using something like pop() to describe it.

-- Dave Abrahams
BoostPro Computing
http://www.boostpro.com
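For comparison, here is how the chain comes out with std::filesystem::path from C++17, the later standardized descendant of this library. This is a sketch under that assumption; whether Boost.Filesystem behaved identically at the time of this thread is not established here. The helper name parent_chain is made up for the example.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Repeatedly take parent_path() and record each result, stopping once the
// path is empty or no longer changes (as happens for a root such as "/").
inline std::vector<std::string> parent_chain(std::filesystem::path p) {
    std::vector<std::string> chain;
    while (!p.empty()) {
        const std::filesystem::path parent = p.parent_path();
        if (parent == p) break;  // a root is its own parent
        chain.push_back(parent.generic_string());
        p = parent;
    }
    return chain;
}
```

For "../../a/b" the recorded steps are "../../a", "../..", ".." and finally the empty path. In other words, std::filesystem's parent_path() simply drops the last element, which is the behavior described above as pop().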

Michael Marcin

4 Jul

12:52 a.m.

David Abrahams wrote:

...

Bjørn Roald wrote:

...
Beman Dawes wrote:

...
* Change branch() to parent_path() * Change leaf() to child() * Change basename() to child_prefix() * Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

I am not sure this is any good. Consider the path "../../a/b" and the meaning of parent and child.

The only sensible parent is in the middle and we don't even know its name. Children are at both ends: the implicit "." or "b".

The parent of '../../a/b' is '../../a' The parent of '../../a' is '../..' The parent of '../..' is probably '../../..'

If Beman intended '..' to be the result of the final transformation above, then we should be using something like pop() to describe it.

Doesn't leaf serve 2 functions? If the path points to a directory it returns most derived directory (to make a bad analogy to class hierarchy). If the path points to a file it returns the filename.

It seems to me that this duality makes it difficult to choose a meaningful name in the domain of file systems.

Personally I think I like

parent()
    if this is a path to a subdirectory or a file, returns a path one level above directory(); otherwise returns *this

directory()
    if this is a path to a file, returns a path to the directory that contains the file; otherwise returns *this

filename()
    if this is a path to a file, returns a string containing the filename; otherwise returns an empty string

basename()
    if this is a path to a file, returns a string containing the filename without its extension; otherwise returns an empty string

extension()
    if this is a path to a file, returns a string containing the filename without its basename; otherwise returns an empty string

Thanks,

Michael Marcin

David Abrahams

1:13 a.m.

Michael Marcin wrote:

...

David Abrahams wrote:

...
Bjørn Roald wrote:

...
Beman Dawes wrote:

...
* Change branch() to parent_path() * Change leaf() to child() * Change basename() to child_prefix() * Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

I am not sure this is any good. Consider the path "../../a/b" and the meaning of parent and child.

The only sensible parent is in the middle and we don't even know its name. Children are at both ends: the implicit "." or "b".

The parent of '../../a/b' is '../../a' The parent of '../../a' is '../..' The parent of '../..' is probably '../../..'

If Beman intended '..' to be the result of the final transformation above, then we should be using something like pop() to describe it.

Doesn't leaf serve 2 functions? If the path points to a directory it returns most derived directory

It does not return a directory. We don't have a type that can represent a directory. We only have paths and strings.

...

(to make a bad analogy to class hierarchy). If the path points to a file it returns the filename.

No. In either case it returns the name of the thing that the path points to.

...

It seems to me that this duality makes it difficult to choose a meaningful name in the domain of file systems.

Personally I think I like

parent() - if this is a path to a subdirectory or a file returns a path one level above directory() otherwise returns *this

IIUC, this function is not supposed to touch the filesystem. It's supposed to be a pure path manipulation.

...

directory() - if this is a path to a file returns a path to the directory that contains the file otherwise returns *this

ditto

...

filename() - if this is a path to a file returns a string containing the filename otherwise returns an empty string

ditto

...

basename() - if this is a path to a file returns a string containing the filename without its extension otherwise returns an empty string

ditto

...

extension() - if this is a path to a file returns a string contains the filename without its basename otherwise returns an empty string

:) you get the idea. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Michael Marcin

1:31 a.m.

David Abrahams wrote:

...

Michael Marcin wrote:

...
Doesn't leaf serve 2 functions? If the path points to a directory it returns most derived directory

It does not return a directory. We don't have a type that can represent a directory. We only have paths and strings.

...
(to make a bad analogy to class hierarchy). If the path points to a file it returns the filename.

No. In either case it returns the name of the thing that the path points to.

Sorry, I meant the name of a directory or the name of a file.

From the filesystem docs' index page:

"leaf() returns a string which is a copy of the last (closest to the leaf, farthest from the root) file or directory name in the path object."

I don't believe any of the functions need to touch the filesystem, with the possible exception of parent, which could be relaxed to just return path + "../", since at least on Windows if you ../ yourself above the root of your current drive you just end up at the root of the current drive.

Or do you mean that you can't determine if it is a path to a file because files can look just like directories if they have no extensions and directories can contain periods?

That is annoying, but it seems like a trailing slash at the end of a path would be enough to differentiate paths to files from paths to directories.

Thanks,

Michael Marcin

Steven Watanabe

1:47 a.m.

AMDG Michael Marcin wrote:

...

That is annoying but it seems like a trailing slash at the end of a path would be enough to differentiate paths to files from paths to directories.

Doesn't that force the path constructor to access the file system? In Christ, Steven Watanabe

Michael Marcin

2:14 a.m.

Steven Watanabe wrote:

...

AMDG

Michael Marcin wrote:

...
That is annoying but it seems like a trailing slash at the end of a path would be enough to differentiate paths to files from paths to directories.

Doesn't that force the path constructor to access the file system?

I think it just forces the user to supply a path that meets this criterion.

Steven Watanabe

2:44 a.m.

AMDG Michael Marcin wrote:

...

Steven Watanabe wrote:

...
AMDG

Michael Marcin wrote:

...
That is annoying but it seems like a trailing slash at the end of a path would be enough to differentiate paths to files from paths to directories.

Doesn't that force the path constructor to access the file system?

I think it just forces the user to supply a path that meets this criterion.

1) I for one would be very surprised by a library that requires this. No tool that I can think of works this way.

2) How do you deal with an input parameter that can be either a file or a directory? Do you have to determine which it is before creating a path? The boost::filesystem functions for determining this take paths...

In Christ,
Steven Watanabe

David Abrahams

2:27 a.m.

Michael Marcin wrote:

...

That is annoying but it seems like a trailing slash at the end of a path would be enough to differentiate paths to files from paths to directories.

I don't like the idea of adding "file or directory" semantics to paths. Paths are just paths. What about symlinks? What about devices? Should we find a way to represent that information in a path too? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Scott McMurray

2:49 a.m.

On Thu, Jul 3, 2008 at 22:27, David Abrahams <dave@boostpro.com> wrote:

...

Michael Marcin wrote:

...
That is annoying but it seems like a trailing slash at the end of a path would be enough to differentiate paths to files from paths to directories.

I don't like the idea of adding "file or directory" semantics to paths. Paths are just paths. What about symlinks? What about devices? Should we find a way to represent that information in a path too?

Well, maybe there are some path-finding-domain terms that work? The only thing I can come up with is visited_path, though, and the connotations don't really fit. I never recall hearing a term for the node on the closed list from which a node in the open list was expanded...

Scott Woods

8:44 a.m.

----- Original Message -----
From: "Scott McMurray" <me22.ca+boost@gmail.com>
To: <boost@lists.boost.org>
Sent: Friday, July 04, 2008 2:49 PM
Subject: Re: [boost] [filesystem] Function name changes

...

On Thu, Jul 3, 2008 at 22:27, David Abrahams <dave@boostpro.com> wrote:

...
Michael Marcin wrote:

...
That is annoying but it seems like a trailing slash at the end of a path would be enough to differentiate paths to files from paths to directories.

I don't like the idea of adding "file or directory" semantics to paths. Paths are just paths. What about symlinks? What about devices? Should we find a way to represent that information in a path too?

Well, maybe there are some path-finding-domain terms that work? The only thing I can come up with is visited_path, though, and the connotations don't really fit. I never recall hearing a term for the node on the closed list from which a node in the open list was expanded...

A strange idea that I had some time ago; what about some kind of compile-time attribute that could be used to distinguish a "device path" from a "directory path"? This strange idea was a merging of previous path+file and units discussions. Is there merit in having the compiler complain when a device path is presented to a method expecting a directory path? Cheers.
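One way such a compile-time attribute could look is a tag type carried as a template parameter. This is only a sketch of the idea: every name below (tagged_path, directory_tag, device_tag, describe_directory) is hypothetical and nothing here is part of Boost.Filesystem.

```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical namespace, for illustration only

struct directory_tag {};
struct device_tag {};

// A path string carrying a compile-time "kind". The tag is never stored;
// it only makes differently tagged paths distinct, unrelated types.
template <class Kind>
class tagged_path {
public:
    explicit tagged_path(std::string text) : text_(std::move(text)) {}
    const std::string& str() const { return text_; }
private:
    std::string text_;
};

using directory_path = tagged_path<directory_tag>;
using device_path = tagged_path<device_tag>;

// Accepts only directory paths; passing a device_path does not compile.
inline std::string describe_directory(const directory_path& d) {
    return "directory: " + d.str();
}

// The compiler itself confirms that the two kinds do not convert.
static_assert(!std::is_convertible<device_path, directory_path>::value,
              "a device path must not be usable as a directory path");

}  // namespace sketch
```

The cost is that the caller must decide the kind when constructing the path, which is the same burden raised against the trailing-slash convention earlier in the thread.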

Beman Dawes

12:34 p.m.

David Abrahams wrote:

...

Michael Marcin wrote:

...
That is annoying but it seems like a trailing slash at the end of a path would be enough to differentiate paths to files from paths to directories.

I don't like the idea of adding "file or directory" semantics to paths. Paths are just paths. What about symlinks? What about devices? Should we find a way to represent that information in a path too?

I agree. It is a firm principle of the design that "paths are just paths". None of the path member functions go to the file system. There might not even be a file system; paths are sometimes manipulated by programs for purposes other than immediate use.

A trailing slash at the end of a path is preserved in case a particular application distinguishes that as a special case. But no Boost.Filesystem functions do so.

--Beman
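Both points can be illustrated with std::filesystem::path from C++17, used here as a stand-in because it descends from this library; that the 2008 Boost.Filesystem behaves the same way is an assumption of the sketch. The helper names lexical_parent and round_trip are made up for the example.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Pure path manipulation: the named file or directory need not exist,
// because parent_path() only inspects the text of the path.
inline std::string lexical_parent(const std::string& text) {
    return fs::path(text).parent_path().generic_string();
}

// The text handed to the constructor comes back unchanged in generic
// format, including any trailing slash.
inline std::string round_trip(const std::string& text) {
    return fs::path(text).generic_string();
}
```

Calling lexical_parent("no/such/dir/file.txt") yields "no/such/dir" without any file system query, and round_trip("a/b/") still ends in the slash.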

David Abrahams

3 Jul

11:47 p.m.

Beman Dawes wrote:

...

David Abrahams wrote:

...
FWIW, though I think it's probably a good idea to use base_name as you are suggesting, I was much less attached to the idea of using it to mean what is currently called leaf() than I was opposed to the idea of using it to mean something else, if you catch my drift :-)

Ah! Understood.

...
So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf().

Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path() * Change leaf() to child() * Change basename() to child_prefix() * Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

Well, "parent" describes a relationship between the path being operated on and the result. "Child," on the other hand, does not. So that doesn't work for me. I would prefer "parent" and "filename." I would prefer "drop_extension" and "extension," although I rather liked Volodya's "stem" suggestion.

...

Although historically basename() and extension() were free functions, my sense is the replacements should be basic_path member functions as they are closely related to the basic_path decomposition functions. Do you have an opinion on that?

I dunno. The extension splitting things feel like they're string operations more than path operations. It's not clear what should happen when you ask for the child_prefix of a path with structure, although drop_extension makes that a little clearer. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Beman Dawes

4 Jul

1:25 p.m.

David Abrahams wrote:

...

Beman Dawes wrote:

...
David Abrahams wrote:

...
FWIW, though I think it's probably a good idea to use base_name as you are suggesting, I was much less attached to the idea of using it to mean what is currently called leaf() than I was opposed to the idea of using it to mean something else, if you catch my drift :-) Ah! Understood.

...
So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf(). Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path() * Change leaf() to child() * Change basename() to child_prefix() * Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

Well, "parent" describes a relationship between the path being operated on and the result. "Child," on the other hand, does not. So that doesn't work for me. I would prefer "parent" and "filename." I would prefer "drop_extension" and "extension," although I rather liked Volodya's "stem" suggestion.

I like "stem" too.

Trying to put that all together:

* Change branch() to parent_path()
* Change leaf() to file_name()
* Change basename() to stem()
* extension() remains extension()

The models are:

* A path is composed of a parent path and file name.
* A file name is composed of a stem and an extension.

I do intend to make stem() and extension() member functions; I want to preserve the design convention that lexical operations on paths are supplied as member functions while the filesystem operations on paths are free functions.

--Beman
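The two models can be written down as checks against std::filesystem::path from C++17, which provides parent_path(), filename(), stem() and extension() as member functions. Note that it spells the second one filename(), not file_name(). This is a sketch against the standard library rather than the Boost.Filesystem release under discussion, and the two function names are made up for the example.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Model 1: a path is its parent path joined with its file name.
inline bool parent_and_filename_recompose(const fs::path& p) {
    return p.parent_path() / p.filename() == p;
}

// Model 2: a file name is its stem followed by its extension.
inline bool stem_and_extension_recompose(const fs::path& p) {
    return p.stem().string() + p.extension().string() == p.filename().string();
}
```

For p == "foo/bar/boo.txt", the pieces are "foo/bar", "boo.txt", "boo" and ".txt", and both checks hold.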

David Abrahams

2:16 p.m.

on Fri Jul 04 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...

David Abrahams wrote:

...
Beman Dawes wrote:

...
David Abrahams wrote:

...
FWIW, though I think it's probably a good idea to use base_name as you are suggesting, I was much less attached to the idea of using it to mean what is currently called leaf() than I was opposed to the idea of using it to mean something else, if you catch my drift :-) Ah! Understood.

...
So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf(). Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path() * Change leaf() to child() * Change basename() to child_prefix() * Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

Well, "parent" describes a relationship between the path being operated on and the result. "Child," on the other hand, does not. So that doesn't work for me. I would prefer "parent" and "filename." I would prefer "drop_extension" and "extension," although I rather liked Volodya's "stem" suggestion.

I like "stem" too.

Trying to put that all together:

* Change branch() to parent_path()

Nit: "parent" is better. Sticking "path" in the name of a transformation that returns a path smacks of hungarian notation.

...

* Change leaf() to file_name() * Change basename() to stem() * extension() remains extension()

The models are:

* A path is composed of a parent path and file name. * A file name is composed of a stem and an extension.

I do intend to make stem() and extension() member functions; I want to preserve the design convention that lexical operations on paths are supplied as member functions while the filesystem operations on paths are free functions.

I can't envision any actual issues with it right now, but when you say "member function" I always wonder if it will limit genericity in some important way. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Beman Dawes

8:55 p.m.

David Abrahams wrote:

...

on Fri Jul 04 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...
David Abrahams wrote:

...
...
David Abrahams wrote:

...
FWIW, though I think it's probably a good idea to use base_name as you are suggesting, I was much less attached to the idea of using it to mean what is currently called leaf() than I was opposed to the idea of using it to mean something else, if you catch my drift :-) Ah! Understood.

...
So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf(). Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path() * Change leaf() to child() * Change basename() to child_prefix() * Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set? Well, "parent" describes a relationship between the path being operated on and the result. "Child," on the other hand, does not. So that doesn't work for me. I would prefer "parent" and "filename." I would

Beman Dawes wrote: prefer "drop_extension" and "extension," although I rather liked Volodya's "stem" suggestion. I like "stem" too.

Trying to put that all together:

* Change branch() to parent_path()

Nit: "parent" is better. Sticking "path" in the name of a transformation that returns a path smacks of hungarian notation.

The point (besides consistency with some other decomposition function names) of adding _path is to emphasize that what parent_path("a/b/c") returns is "a/b" rather than "a", "..", or going to the file system and finding out the parent path of "a". I'm no fan of hungarian notation!
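The distinction drawn above, a lexical result rather than a file system lookup, can be illustrated with a short sketch. It assumes C++17 std::filesystem, where parent_path() is a member of path; the wrapper name lexical_parent is hypothetical.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical wrapper: drops the last element of the path text.
// Nothing on disk is consulted, so the path does not need to exist.
fs::path lexical_parent(const fs::path& p) {
    return p.parent_path();
}
```

With this, lexical_parent("a/b/c") is "a/b", not "a" and not "..", and the same call works on a path that names nothing on disk.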

...

...
* Change leaf() to file_name()

Are there any arguments other than personal preference for filename() vs file_name()? I was trying some code and kept writing filename(). Google code search turns up 5,980,00 hits for "filename" versus 550,000 for "file_name". So I think "filename" would win unless someone comes up with an argument otherwise. That should make you (Dave) happy as you've always spelled it "filename" in these discussions.

I guess one argument is that "root_name" then looks inconsistent. I'm not worried about that enough to do anything about it

--Beman

David Abrahams

9:32 p.m.

on Fri Jul 04 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...

Are there any arguments other than personal preference for filename() vs file_name()? I was trying some code and kept writing filename(). Google code search turns up 5,980,00 hits for "filename" versus 550,000 for "file_name". So I think "filename" would win unless someone comes up with an argument otherwise. That should make you (Dave) happy as you've always spelled it "filename" in these discussions.

I guess one argument is that "root_name" then looks inconsistent. I'm not worried about that enough to do anything about it

Yeah, but filename is a word, whereas rootname is not :-)

http://www.google.com/search?q=define%3A+filename
http://www.google.com/search?q=define%3A+rootname

Cheers,
-- Dave Abrahams BoostPro Computing http://www.boostpro.com

Vladimir Prus

2:38 p.m.

Beman Dawes wrote:

...

David Abrahams wrote:

...
Beman Dawes wrote:

...
David Abrahams wrote:

...
FWIW, though I think it's probably a good idea to use base_name as you are suggesting, I was much less attached to the idea of using it to mean what is currently called leaf() than I was opposed to the idea of using it to mean something else, if you catch my drift :-)

Ah! Understood.

...
So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf(). Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path()
* Change leaf() to child()
* Change basename() to child_prefix()
* Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

Well, "parent" describes a relationship between the path being operated on and the result. "Child," on the other hand, does not. So that doesn't work for me. I would prefer "parent" and "filename." I would prefer "drop_extension" and "extension," although I rather liked Volodya's "stem" suggestion.

I like "stem" too.

Trying to put that all together:

* Change branch() to parent_path()
* Change leaf() to file_name()

Why is "file" there? Path of "a/b/c" can refer to either file, or directory. Does "file" bring undesired connotation that the path refers to file?

...

* Change basename() to stem()
* extension() remains extension()

'stem' is linguistic term, whereas 'extension' is not. To be consistent, it's better to use 'suffix', not 'extension'.

Furthermore, I do think we need to pay attention to Qt's suffix vs. completeSuffix distinction -- it seems useful one.

- Volodya

David Abrahams

4:21 p.m.

on Fri Jul 04 2008, Vladimir Prus <vladimir-AT-codesourcery.com> wrote:

...

Beman Dawes wrote:

...
David Abrahams wrote:

...
Beman Dawes wrote:

...
David Abrahams wrote:

...
FWIW, though I think it's probably a good idea to use base_name as you are suggesting, I was much less attached to the idea of using it to mean what is currently called leaf() than I was opposed to the idea of using it to mean something else, if you catch my drift :-)

Ah! Understood.

...
So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf(). Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path()
* Change leaf() to child()
* Change basename() to child_prefix()
* Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

Well, "parent" describes a relationship between the path being operated on and the result. "Child," on the other hand, does not. So that doesn't work for me. I would prefer "parent" and "filename." I would prefer "drop_extension" and "extension," although I rather liked Volodya's "stem" suggestion.

I like "stem" too.

Trying to put that all together:

* Change branch() to parent_path()
* Change leaf() to file_name()

Why is "file" there? Path of "a/b/c" can refer to either file, or directory. Does "file" bring undesired connotation that the path refers to file?

Agreed.

...

...
* Change basename() to stem()
* extension() remains extension()

'stem' is linguistic term, whereas 'extension' is not. To be consistent, it's better to use 'suffix', not 'extension'.

Yeah, but 'extension' is a file naming term. Where we can, we should use accepted terminology from the domain.

...

Furthermore, I do think we need to pay attention to Qt's suffix vs. completeSuffix distinction -- it seems useful one.

What is that? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Vladimir Prus

4:35 p.m.

David Abrahams wrote:

...

on Fri Jul 04 2008, Vladimir Prus <vladimir-AT-codesourcery.com> wrote:

...
Beman Dawes wrote:

...
David Abrahams wrote:

...
Beman Dawes wrote:

...
David Abrahams wrote:

...
FWIW, though I think it's probably a good idea to use base_name as you are suggesting, I was much less attached to the idea of using it to mean what is currently called leaf() than I was opposed to the idea of using it to mean something else, if you catch my drift :-)

Ah! Understood.

...
So one other option that avoids the above issues (not that I'm pushing this route) is to pick another name for what you currently call leaf(). Let's say branch_path() is changed to parent_path(). That suggests a full set of names based on the parent/child decomposition of a path:

* Change branch() to parent_path()
* Change leaf() to child()
* Change basename() to child_prefix()
* Change extension() to child_extension()

At first glance, those names seem reasonably clear and self-consistent.

What's your take on that set?

Well, "parent" describes a relationship between the path being operated on and the result. "Child," on the other hand, does not. So that doesn't work for me. I would prefer "parent" and "filename." I would prefer "drop_extension" and "extension," although I rather liked Volodya's "stem" suggestion.

I like "stem" too.

Trying to put that all together:

* Change branch() to parent_path()
* Change leaf() to file_name()

Why is "file" there? Path of "a/b/c" can refer to either file, or directory. Does "file" bring undesired connotation that the path refers to file?

Agreed.

...
...
* Change basename() to stem()
* extension() remains extension()

'stem' is linguistic term, whereas 'extension' is not. To be consistent, it's better to use 'suffix', not 'extension'.

Yeah, but 'extension' is a file naming term. Where we can, we should use accepted terminology from the domain.

As I've mentioned, "suffix" is also used in existing libraries.

...

...
Furthermore, I do think we need to pay attention to Qt's suffix vs. completeSuffix distinction -- it seems useful one.

What is that?

backup.tar.bz2

Here, you might want to look at either "top-level" file type -- ".bz2" here, or at everything after first dot -- ".tar.bz2", depending on what you want to do.

- Volodya
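The two views of backup.tar.bz2 described above can be put side by side in a small sketch. It assumes C++17 std::filesystem for the last-dot view (extension()). The helper complete_suffix is hypothetical, named after the Qt function mentioned in the thread, and it keeps the leading dot so that its result is comparable with extension().

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: everything from the first dot of the file name onward.
// The search starts at index 1 so that a leading dot (as in ".bashrc") is not
// treated as the start of a suffix.
std::string complete_suffix(const fs::path& p) {
    const std::string name = p.filename().string();
    const std::string::size_type dot = name.find('.', 1);
    return dot == std::string::npos ? std::string() : name.substr(dot);
}
```

For "backup.tar.bz2", extension() yields ".bz2" while complete_suffix yields ".tar.bz2".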

Stefan Seefeld

4:41 p.m.

Vladimir Prus wrote:

...

backup.tar.bz2

Here, you might want to look at either "top-level" file type -- ".bz2" here, or at everything after first dot -- ".tar.bz2", depending on what you want to do.

I think it's bad enough that we have inherited this bogus semantics of a file 'extension', so let's not add to that. Users can always parse filenames with their own application-specific methods, if to them a name carries a particular meaning. Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin...

Vladimir Prus

4:47 p.m.

Stefan Seefeld wrote:

...

Vladimir Prus wrote:

...
backup.tar.bz2

Here, you might want to look at either "top-level" file type -- ".bz2" here, or at everything after first dot -- ".tar.bz2", depending on what you want to do.

I think it's bad enough that we have inherited this bogus semantics of a file 'extension', so let's not add to that.

You suggest that each and every filesystem be modified to carry mime type with each file? That would be good, but unless that happens, lots of application will have to look at extension.

...

Users can always parse filenames with their own application-specific methods, if to them a name carries a particular meaning.

As it happens, application-specific methods is very often just looking at file extension. - Volodya

Stefan Seefeld

4:58 p.m.

Vladimir Prus wrote:

...

Stefan Seefeld wrote:

...
I think it's bad enough that we have inherited this bogus semantics of a file 'extension', so let's not add to that.

You suggest that each and every filesystem be modified to carry mime type with each file? That would be good, but unless that happens, lots of application will have to look at extension.

I'm not suggesting any such modification. (Besides, this is hardly something for the filesystem to care about, at least in my understanding of the term.) See for example the 'file' manpage on modern UNIX systems for how such file-related metadata can be represented non-intrusively.

...

...
Users can always parse filenames with their own application-specific methods, if to them a name carries a particular meaning.

As it happens, application-specific methods is very often just looking at file extension.

Then let them do it, but without perpetuating such semantics in generic libraries. Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

Vladimir Prus

5:09 p.m.

Stefan Seefeld wrote:

...

...
As it happens, application-specific methods is very often just looking at file extension.

Then let them do it, but without perpetuating such semantics in generic libraries.

I don't really think that Boost should be too clever in such an established field. Many existing libraries do have concept of extension, and rejecting to add such concept based on purity grounds will only make Boost be incompatible with people's expectations. - Volodya

David Abrahams

7:08 p.m.

on Fri Jul 04 2008, Vladimir Prus <vladimir-AT-codesourcery.com> wrote:

...

Stefan Seefeld wrote:

...
...
As it happens, application-specific methods is very often just looking at file extension.

Then let them do it, but without perpetuating such semantics in generic libraries.

I don't really think that Boost should be too clever in such an established field. Many existing libraries do have concept of extension, and rejecting to add such concept based on purity grounds will only make Boost be incompatible with people's expectations.

I agree. Strongly. With both of you :-( -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Scott McMurray

6:10 p.m.

On Fri, Jul 4, 2008 at 12:35, Vladimir Prus <vladimir@codesourcery.com> wrote:

...

...
...
Furthermore, I do think we need to pay attention to Qt's suffix vs. completeSuffix distinction -- it seems useful one.

What is that?

backup.tar.bz2

Here, you might want to look at either "top-level" file type -- ".bz2" here, or at everything after first dot -- ".tar.bz2", depending on what you want to do.

I very strongly dislike that concept. It can be done much better treating extension() and stem() like head() and tail(), and iterating.

And a quick peek at my filesystem finds such things as these:

/usr/portage/distfiles/boost-jam-3.1.4.tgz
/usr/portage/distfiles/automake-1.8.5.tar.bz2
/usr/portage/distfiles/binutils-2.16.1-patches-1.9.tar.bz2
/usr/portage/distfiles/gcc-3.4.6-piepatches-v8.7.10.tar.bz2
/usr/portage/distfiles/linux-2.6.17-m68k-headers.patch.bz2
/usr/portage/distfiles/mozilla-firefox-2.0.0.6-fr.xpi
/usr/portage/distfiles/nerolinux-3.0.1.3-x86_64.rpm
/usr/portage/distfiles/NVIDIA-Linux-x86_64-100.14.09-pkg2.run

Where "everything after the first dot" is *never* of use to anyone.
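The iteration idea above might look roughly like the following sketch. It assumes C++17 std::filesystem for stem() and extension(); the function peel_extensions and the caller-supplied set of known extensions are hypothetical, introduced only to give the loop a stopping rule.

```cpp
#include <cassert>
#include <filesystem>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: peels extensions off the file name, last one first,
// for as long as each peeled extension is in the caller's `known` set.
std::vector<std::string> peel_extensions(const fs::path& p,
                                         const std::set<std::string>& known) {
    std::vector<std::string> peeled;
    fs::path name = p.filename();
    while (name.has_extension() &&
           known.count(name.extension().string()) != 0) {
        peeled.push_back(name.extension().string());
        name = name.stem();  // continue with what is left of the name
    }
    return peeled;
}
```

With known = {".tar", ".bz2", ".gz", ".tgz"}, the automake-1.8.5.tar.bz2 entry yields ".bz2" then ".tar" and stops at ".5", so the version number is never mistaken for part of the file type.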

Beman Dawes

8:34 p.m.

Vladimir Prus wrote:

...

Beman Dawes wrote:

...

...
Trying to put that all together:

* Change branch() to parent_path()
* Change leaf() to file_name()

Why is "file" there? Path of "a/b/c" can refer to either file, or directory.

A directory is just one particular type of file.

Boost.Filesystem has always followed the POSIX definition of "File":

"An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation."
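The POSIX file types listed above map onto an enumeration in C++17 std::filesystem, which is assumed here in place of the Boost.Filesystem of the time. The helper type_name is hypothetical; the sketch only shows that "directory" is one file type among several.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: maps a file_type value to the POSIX wording quoted above.
std::string type_name(fs::file_type t) {
    switch (t) {
        case fs::file_type::regular:   return "regular file";
        case fs::file_type::directory: return "directory";
        case fs::file_type::symlink:   return "symbolic link";
        case fs::file_type::block:     return "block special file";
        case fs::file_type::character: return "character special file";
        case fs::file_type::fifo:      return "FIFO special file";
        case fs::file_type::socket:    return "socket";
        case fs::file_type::not_found: return "not found";
        default:                       return "other";
    }
}
```

Calling type_name(fs::status(".").type()) reports "directory" for the current directory, through the same status query that would report "regular file" for an ordinary file.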

...

Does "file" bring undesired connotation that the path refers to file?

That's what paths do. Refer to files.

...

...
* Change basename() to stem()
* extension() remains extension()

'stem' is linguistic term, whereas 'extension' is not. To be consistent, it's better to use 'suffix', not 'extension'. Furthermore, I do think we need to pay attention to Qt's suffix vs. completeSuffix distinction -- it seems useful one.

The term "filename extension" is well established. See http://en.wikipedia.org/wiki/Filename_extension or google for "file extension". --Beman

Vladimir Prus

5 Jul

6:38 a.m.

Beman Dawes wrote:

...

Vladimir Prus wrote:

...
Beman Dawes wrote:

...
...
Trying to put that all together:

* Change branch() to parent_path()
* Change leaf() to file_name()

Why is "file" there? Path of "a/b/c" can refer to either file, or directory.

A directory is just one particular type of file.

Boost.Filesystem has always followed the POSIX definition of "File":

"An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation."

Why do we follow specific definition of file in a portable library? Can we start with the most simple names -- 'parent' and 'name'? What's wrong with them, and what kind of confusion they will cause?

...

...
Does "file" bring undesired connotation that the path refers to file?

That's what paths do. Refer to files.

...
...
* Change basename() to stem()
* extension() remains extension()

'stem' is linguistic term, whereas 'extension' is not. To be consistent, it's better to use 'suffix', not 'extension'. Furthermore, I do think we need to pay attention to Qt's suffix vs. completeSuffix distinction -- it seems useful one.

The term "filename extension" is well established. See http://en.wikipedia.org/wiki/Filename_extension or google for "file extension".

Oh, using wikipedia and number of google hits to establish naming sounds interesting -- you can come up with random API. Note that "file extension" has 12'000'000 hits, whereas "file suffix" has 2'000'000 hits, which does not seem to be an overwhelming difference. Also, I don't know how to get google to produce hits only relevant to names of API function in some libraries. - Volodya

Vladimir Prus

7:04 a.m.

Vladimir Prus wrote:

...

...
The term "filename extension" is well established. See http://en.wikipedia.org/wiki/Filename_extension or google for "file extension".

Oh, using wikipedia and number of google hits to establish naming sounds interesting -- you can come up with random API. Note that "file extension" has 12'000'000 hits, whereas "file suffix" has 2'000'000 hits, which does not seem to be an overwhelming difference. Also, I don't know how to get google to produce hits only relevant to names of API function in some libraries.

In fact, here's what http://en.wikipedia.org/wiki/Filename_extension has to say: A filename extension is a *suffix* to the name of a computer file - Volodya

Beman Dawes

11:18 a.m.

Vladimir Prus wrote:

...

Beman Dawes wrote:

...
Vladimir Prus wrote:

...
Beman Dawes wrote:

...
Trying to put that all together:

* Change branch() to parent_path()
* Change leaf() to file_name()

Why is "file" there? Path of "a/b/c" can refer to either file, or directory.

A directory is just one particular type of file.

Boost.Filesystem has always followed the POSIX definition of "File":

"An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation."

Why do we follow specific definition of file in a portable library? Can we start with the most simple names -- 'parent' and 'name'? What's wrong with them, and what kind of confusion they will cause?

It is a question of more explicit versus less explicit.

...

...
The term "filename extension" is well established. See http://en.wikipedia.org/wiki/Filename_extension or google for "file extension".

Oh, using wikipedia and number of google hits to establish naming sounds interesting -- you can come up with random API. Note that "file extension" has 12'000'000 hits, whereas "file suffix" has 2'000'000 hits, which does not seem to be an overwhelming difference. Also, I don't know how to get google to produce hits only relevant to names of API function in some libraries.

www.google.com/codesearch and the other code search sites are useful in determining the relative popularity of names, but that's only one aspect of choosing good names, much less designing an API. --Beman

Vladimir Prus

11:54 a.m.

Beman Dawes wrote:

...

...
...
"An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation."

Why do we follow specific definition of file in a portable library? Can we start with the most simple names -- 'parent' and 'name'? What's wrong with them, and what kind of confusion they will cause?

It is a question of more explicit versus less explicit.

Ok, I'll vote for parent + name, then.

...

...
...
The term "filename extension" is well established. See http://en.wikipedia.org/wiki/Filename_extension or google for "file extension".

Oh, using wikipedia and number of google hits to establish naming sounds interesting -- you can come up with random API. Note that "file extension" has 12'000'000 hits, whereas "file suffix" has 2'000'000 hits, which does not seem to be an overwhelming difference. Also, I don't know how to get google to produce hits only relevant to names of API function in some libraries.

www.google.com/codesearch and the other code search sites are useful in determining the relative popularity of names, but that's only one aspect of choosing good names, much less designing an API.

Heh -- it was *you* who tried to use Google/Wikipedia as an argument in naming discussion. - Volodya

Scott McMurray

3 Jul

7:33 p.m.

On Thu, Jul 3, 2008 at 13:36, David Abrahams <dave@boostpro.com> wrote:

...

I'm sorry, but I don't like base_path() much. We know it is going to return a path, so the "path" part of the name doesn't tell me much about what this function does. The only thing left to go on is the meaning of "base."

Well I'm afraid we don't know what that is. The (only) good reason to use base_name() is that there is precedent; it's not because it makes particularly "good sense." You yourself find it counterintuitive, IIUC. If "base" _did_ make any sense, I would guess that base_path() was the identity function.

How about parent_directory(), containing_directory(), container_path(), container(), or parent()?

As yet another option, how about super_path? Similar to parent_path, but without the ./.. connotation that was the complaint. I have to say I still don't find base_name intuitive, precedent or not. Maybe this_name, thinking of super/this/sub?

...

...
* Change basename() to base_name_prefix().
* Change extension() to base_name_extension().

This change is to avoid confusion with base_name().

extension was nice and clear, so

p.this_name() == extensionless_name(p) + extension(p)

could be a different way of approaching this. But neither the original nor the proposed new names for these bother me.

...

I worry a bit about the chance of silent misbehavior due to basename/base_name spelling errors, and about what happens when both names make it into the same codebase.

Hopefully the fact that basename was namespace-scope while the proposed base_name is a member will help prevent that. Thanks for continuing this, ~ Scott

Vladimir Prus

5:37 p.m.

Beman Dawes wrote:

...

* Change leaf() to base_name().
* Change branch() to base_path().

I don't think that's very clear. In POSIX, you have dirname + basename, so the model is that file path is "directory + base". What is the meaning of "base" in "base_path"? If you have something named 'base', then you must have something that can be added to base to produce something bigger. But it appears that everything is now called 'base_xxx'.

Looking at Java, the File class has the getParentFile() method, which seems fairly obvious. It also has 'getName()' method that returns last component of the path.

Looking at Qt, the QFileInfo class provides path() and fileName() methods that add up to complete path. The baseName() method returns part of fileName until the first dot -- and there are functions to get suffix -- suffix() and completeSuffix(). The path() method returns QString, and there's dir() method that returns QDir.

Why don't we borrow those examples and use 'directory' + 'name'?

...

The mental model for the new name is that for a path, p:

p.base_name() == base_name_prefix(p) + base_name_extension(p)

Does this actually add up? My mental model of "prefix" is that you have prefix, followed by stem, followed by suffix. You have no stem, and have extension instead of suffix. It probably should be "stem" + "suffix", or "core" + "extension", or, well "basename" + "extension". - Volodya

Beman Dawes

4 Jul

1:38 p.m.

Vladimir Prus wrote:

...

Beman Dawes wrote:

...
* Change leaf() to base_name().
* Change branch() to base_path().

I don't think that's very clear. In POSIX, you have dirname + basename, so the model is that file path is "directory + base". What is the meaning of "base" in "base_path"? If you have something named 'base', then you must have something that can be added to base to produce something bigger. But it appears that everything is now called 'base_xxx'.

Looking at Java, the File class has the getParentFile() method, which seems fairly obvious. It also has 'getName()' method that returns last component of the path.

Looking at Qt, the QFileInfo class provides path() and fileName() methods that add up to complete path. The baseName() method returns part of fileName until the first dot -- and there are functions to get suffix -- suffix() and completeSuffix(). The path() method returns QString, and there's dir() method that returns QDir.

Why don't we borrow those examples and use 'directory' + 'name'?

To me, "directory" implies a single directory name rather than a path. That's always been a gripe of mine when looking at other systems. But I do think the very similar 'path' + 'name' approach is what we are aiming for.

...

...
The mental model for the new name is that for a path, p:

p.base_name() == base_name_prefix(p) + base_name_extension(p)

Does this actually add up? My mental model of "prefix" is that you have prefix, followed by stem, followed by suffix. You have no stem, and have extension instead of suffix. It probably should be "stem" + "suffix", or "core" + "extension", or, well "basename" + "extension".

"stem" is an interesting suggestion. For "word stem", Wikipedia says "In linguistics, a stem (sometimes also theme) is the part of a word that is common to all its inflected variants." That pretty much describes the basename concept. --Beman

John Femiani

11:16 p.m.

Beman wrote: <Snip>

...

"stem" is an interesting suggestion. For "word stem", Wikipedia says "In linguistics, a stem (sometimes also theme) is the part of a word that is common to all its inflected variants." That pretty much describes the basename concept.

--Beman

What if the 'replace_extension' (aka 'change_extension') function just took an empty string as a default second argument? Isn't that the same as basename? Then you don't need 'base_name' or 'stem' at all. Also extension/change_extension should not be a member function since they should be defined in a separate header so that people who don't believe in extensions don't have to include them. -- John Femiani
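For comparison, here is a sketch of the replace_extension idea. It assumes C++17 std::filesystem, where path::replace_extension() is a member whose replacement argument defaults to an empty path; the free-function form discussed in this thread is not modelled, and the wrapper without_extension is hypothetical.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical wrapper: drops the extension by replacing it with nothing.
// Takes p by value so the caller's path is left untouched.
fs::path without_extension(fs::path p) {
    return p.replace_extension();  // default replacement is an empty path
}
```

For a bare file name the result matches stem(): both give "boo" for "boo.txt". For "foo/bar/boo.txt" the result keeps the parent path ("foo/bar/boo"), whereas stem() gives only "boo", so under these assumptions the two coincide only when there is no directory part.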

Beman Dawes

5 Jul

11:05 a.m.

John Femiani wrote:

...

Beman wrote: <Snip>

...
"stem" is an interesting suggestion. For "word stem", Wikipedia says "In linguistics, a stem (sometimes also theme) is the part of a word that is common to all its inflected variants." That pretty much describes the basename concept.

--Beman

What if the 'replace_extension' (aka 'change_extension') function just took an empty string as a default second argument? Isn't that the same as basename? Then you don't need 'base_name' or 'stem' at all. Also extension/change_extension should not be a member function since they should be defined in a separate header so that people who don't believe in extensions don't have to include them.

While that may be correct in a technical sense, it is too clever by half. The point of renaming functions is to make the interface more obvious and intuitive, and using replace_extension to get the stem seems to me to go in the opposite direction. --Beman

41 comments

10 participants

participants (10)

Beman Dawes
Bjørn Roald
David Abrahams
John Femiani
Michael Marcin
Scott McMurray
Scott Woods
Stefan Seefeld
Steven Watanabe
Vladimir Prus