[filesystem] "leaf"

David Abrahams

19 May 2008

8:37 p.m.

I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name? Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
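For reference, a minimal sketch of the Python precedent mentioned above. It uses `posixpath` (the POSIX flavour of `os.path`) so the result does not depend on the host platform; the example paths are made up for illustration.

```python
import posixpath

# basename returns the final component of the path string,
# whether or not that component names a directory with children.
last_of_dir = posixpath.basename("/usr/lib")            # "lib"
last_of_file = posixpath.basename("/usr/lib/libc.so")   # "libc.so"

# Unlike the POSIX basename command, Python does not strip a
# trailing slash first, so the result here is the empty string.
trailing = posixpath.basename("/usr/lib/")              # ""

print(last_of_dir, last_of_file, repr(trailing))
```

The function only looks at the string; it never asks the filesystem whether the last component is a file or a directory.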

dizzy

20 May

8:52 a.m.

On Monday 19 May 2008 23:37:25 David Abrahams wrote:

...

I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children.

Correct.

...

An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name?

But it's not the directory itself (identified by some path object) that is a leaf, it is the path component returned by leaf() that is the leaf of a path. Because I assume you talk about basic_path::leaf(). For any given path, the path components form the path tree and the last component is the leaf of that tree respecting the semantics you described. It's not a leaf node in another context, but there is no such context implied since we are talking about basic_path::leaf(), ie a path leaf not something else's leaf.

...

Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

"basename" could work too but I never had any confusion in my head of what leaf() means for a given path object. -- Mihai RUSU Email: dizzy@roedu.net "Linux is obsolete" -- AST

David Abrahams

21 May

6:19 a.m.

on Tue May 20 2008, dizzy <dizzy-AT-roedu.net> wrote:

...

On Monday 19 May 2008 23:37:25 David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children.

Correct.

...
An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name?

But it's not the directory itself (identified by some path object) that is a leaf, it is the path component returned by leaf() that is the leaf of a path.

Paths don't have leaves. They have beginnings, endings, and middles. They are linear.

...

Because I assume you talk about basic_path::leaf(). For any given path, the path components form the path tree

Ok, technically a -> b -> c -> d is a tree, but it's a degenerate tree. That's not a very useful view, and just complicates everything conceptually.

...

and the last component is the leaf of that tree respecting the semantics you described. It's not a leaf node in another context,

It's not a leaf node in any context other than in the subtree of the actual directory tree that only consists of the listed path components.

...

but there is no such context implied since we are talking about basic_path::leaf(), ie a path leaf not something else's leaf.

That logic sounds a bit circular to me. Maybe I'm missing something. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Zach Laine

1:34 p.m.

On Wed, May 21, 2008 at 1:19 AM, David Abrahams <dave@boostpro.com> wrote:

...

[snip]

...

Paths don't have leaves. They have beginnings, endings, and middles. They are linear.

This sheds their context. The linear sequence does not exist in a vacuum. It is a sequence of nodes that defines a path through a filesystem tree. When I think of a filesystem, I think of it as (a) root node(s), interior nodes, and leaf nodes. The fact that I'm only looking at a subset of them when dealing with a given path does not change what kind of node each is conceptually. In short, I like "leaf()". Zach

Frank Mori Hess

1:44 p.m.

On Wednesday 21 May 2008 09:34 am, Zach Laine wrote:

...

This sheds their context. The linear sequence does not exist in a vacuum. It is a sequence of nodes that defines a path through a filesystem tree. When I think of a filesystem, I think of it as (a) root node(s), interior nodes, and leaf nodes. The fact that I'm only looking at a subset of them when dealing with a given path does not change what kind of node each is conceptually. In short, I like "leaf()".

My impression from earlier posts, and the path decomposition table in the docs, is that leaf is a bad name because it can return an interior node in the filesystem.

Zach Laine

1:48 p.m.

On Wed, May 21, 2008 at 8:44 AM, Frank Mori Hess <frank.hess@nist.gov> wrote:

...

On Wednesday 21 May 2008 09:34 am, Zach Laine wrote:

...
This sheds their context. The linear sequence does not exist in a vacuum. It is a sequence of nodes that defines a path through a filesystem tree. When I think of a filesystem, I think of it as (a) root node(s), interior nodes, and leaf nodes. The fact that I'm only looking at a subset of them when dealing with a given path does not change what kind of node each is conceptually. In short, I like "leaf()".

My impression from earlier posts, and the path decomposition table in the docs, is that leaf is a bad name because it can return an interior node in the filesystem.

Then the leaf referred to is still the leaf of a subtree of the filesystem. I'm as into proper naming as the next guy. The point I'm trying to get across is that the name "leaf()" seems very natural to me. In fact, it has never given me a moment's pause. Zach

Larry Evans

4:54 p.m.

On 05/21/08 08:48, Zach Laine wrote:

...

On Wed, May 21, 2008 at 8:44 AM, Frank Mori Hess wrote:

...

...
On Wednesday 21 May 2008 09:34 am, Zach Laine wrote:

...

...
...
This sheds their context. The linear sequence does not exist in a vacuum. It is a sequence of nodes that defines a path through a filesystem tree. When I think of a filesystem, I think of it as (a) root node(s), interior nodes, and leaf nodes. The fact that I'm only looking at a subset of them when dealing with a given path does not change what kind of node each is conceptually. In short, I like "leaf()".

...

...
My impression from earlier posts, and the path decomposition table in the docs, is that leaf is a bad name because it can return an interior node in the filesystem.

...

Then the leaf referred to is still the leaf of a subtree of the filesystem. I'm as into proper naming as the next guy. The point I'm trying to get across is that the name "leaf()" seems very natural to me. In fact, it has never given me a moment's pause.

Would the term "pruned_leaf" be an acceptable compromise? The "pruned_" prefix would imply the "context" you mentioned above as well as the "subset of them" [where "them" is nodes] mentioned above. In the case that the node was an actual leaf (e.g. an actual file), the "size of the pruning" would be just 0, where "size of the pruning" is the distance to an actual leaf (where "actual leaf" means the same as "leaf node" in this quote):

...

...
...
When I think of a filesystem, I think of it as (a) root node(s), interior nodes, and leaf nodes.

John Femiani

5:54 p.m.

...

Would the term "pruned_leaf" be an acceptable compromise? The "pruned_" prefix would imply the "context" you mentioned above as well as the "subset of them" [where "them" is nodes] mentioned above. In the case that the node was an actual leaf (e.g. an actual file), the "size of the pruning" would be just 0, where "size of the pruning" is the distance to an actual leaf (where "actual leaf" means the same as "leaf node" in this quote):

How about deprecating branch_path and using 'head', and deprecating 'leaf' in favor of 'tail'? This replaces tree terms with list terms, since a path is a list. --John

Felipe Magno de Almeida

11:09 p.m.

On Wed, May 21, 2008 at 2:54 PM, John Femiani <JOHN.FEMIANI@asu.edu> wrote:

...

[snip]

...

How about deprecating branch_path and using 'head', and deprecating 'leaf' in favor of 'tail'? This replaces tree terms with list terms, since a path is a list.

I liked this better. I believe string_algo has a similar problem: it uses left and right, which should be replaced with head and tail too.

...

--John

Regards, -- Felipe Magno de Almeida

Johan Råde

22 May

4:22 a.m.

Felipe Magno de Almeida wrote:

...

On Wed, May 21, 2008 at 2:54 PM, John Femiani <JOHN.FEMIANI@asu.edu> wrote:

[snip]

...
How about deprecating branch_path and using 'head', and deprecating 'leaf' in favor of 'tail'? This replaces tree terms with list terms, since a path is a list.

I liked this better. I believe string_algo has a similar problem: it uses left and right, which should be replaced with head and tail too.

I also like head and tail. --Johan Råde

Marcus Lindblom

8:21 a.m.

John Femiani wrote:

...

How about deprecating branch_path and using 'head', and deprecating 'leaf' in favor of 'tail'? This replaces tree terms with list terms, since a path is a list.

The way I'm used to name lists (from Haskell and others), is that a list of x has a head of type x and a tail of type 'list of x'. Your version would turn that around, which would confuse me. I have no problem in seeing leaf() as the last element in the path (which is a list of names) and it's short and sweet. If one can accept that the path is a different concept from the file entry, this works very well. I think there are valid arguments for and against, and I'd hate to have this switched at this moment, when it's been used for quite a while by many apps. There are too many things changing from one boost release to another already, so you'd better make a good argument and have a clear consensus. (We're getting dangerously close to the bike-shed... :) My 0.02€. Cheers, /Marcus
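The two list conventions being contrasted here can be sketched as follows. This is only an illustration with a made-up component list; it is not how any path library stores its elements.

```python
# A path viewed as a plain list of names (hypothetical example data).
components = ["usr", "local", "lib"]

# Functional-language convention: head is the first element,
# tail is the list of everything after it.
head, *tail = components

# Sequence convention (as with std::list::back): back is the last element.
back = components[-1]

print(head, tail, back)
```

Under the functional convention, the current leaf() corresponds to neither head nor tail, which is why a name such as back() avoids the clash.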

John Femiani

8:54 a.m.

Marcus wrote:

...

John Femiani wrote:

...
How about deprecating branch_path and using 'head', and deprecating 'leaf' in favor of 'tail'? This replaces tree terms with list terms, since a path is a list.

The way I'm used to name lists (from Haskell and others), is that a list of x has a head of type x and a tail of type 'list of x'.

Your version would turn that around, which would confuse me.

Oops, you're right. 'back' would be a lot better, given the C++ precedent (std::list).

...

(We're getting dangerously close to the bike-shed... :)

Sorry, I just wanted to try and step up to Beman's challenge:

...

If you have a better set of names, why don't you suggest them?

--Beman

-- John

Marcus Lindblom

1:03 p.m.

John Femiani wrote:

...

Marcus wrote:

...
John Femiani wrote:

...
How about deprecating branch_path and using 'head', and deprecating 'leaf' in favor of 'tail'? This replaces tree terms with list terms, since a path is a list. The way I'm used to name lists (from Haskell and others), is that a list of x has a head of type x and a tail of type 'list of x'.

Your version would turn that around, which would confuse me.

Oops, you're right. 'back' would be a lot better, given the C++ precedent (std::list).

Indeed. back() is excellent. Still, does it justify breaking backwards compatibility?

...

...
(We're getting dangerously close to the bike-shed... :)

Sorry, I just wanted to try and step up to Beman's challenge:

...
If you have a better set of names, why don't you suggest them?

Ah. Well, he should've known better. ;-P mvh /Marcus

John Femiani

8:42 p.m.

...

...
Oops, you're right. 'back' would be a lot better, given the C++ precedent (std::list).

Indeed. back() is excellent.

Still, does it justify breaking backwards compatibility?

I suggested deprecating, rather than removing, anything, in order to avoid breaking backwards compatibility. I just meant to suggest documenting it as deprecated. -- John

Johan Råde

8:58 a.m.

Marcus Lindblom wrote:

...

John Femiani wrote:

...
How about deprecating branch_path and using 'head', and deprecating 'leaf' in favor of 'tail'? This replaces tree terms with list terms, since a path is a list.

The way I'm used to name lists (from Haskell and others), is that a list of x has a head of type x and a tail of type 'list of x'.

Your version would turn that around, which would confuse me.

I have no problem in seeing leaf() as the last element in the path (which is a list of names) and it's short and sweet. If one can accept that the path is a different concept from the file entry, this works very well.

I think there are valid arguments for and against, and I'd hate to have this switched at this moment, when it's been used for quite a while by many apps. There are too many things changing from one boost release to another already, so you'd better make a good argument and have a clear consensus. (We're getting dangerously close to the bike-shed... :)

My 0.02€.

Cheers, /Marcus

I also assumed leaf -> head and branch_path -> tail :-) Maybe we should just drop the matter and live with branch_path and leaf. (Even if I really would prefer branch_path -> parent_directory_path) --Johan

David Abrahams

11:58 p.m.

on Wed May 21 2008, Larry Evans <cppljevans-AT-suddenlink.net> wrote:

...

On 05/21/08 08:48, Zach Laine wrote:

...
On Wed, May 21, 2008 at 8:44 AM, Frank Mori Hess wrote:

...
...
On Wednesday 21 May 2008 09:34 am, Zach Laine wrote:

...
...
...
This sheds their context. The linear sequence does not exist in a vacuum. It is a sequence of nodes that defines a path through a filesystem tree. When I think of a filesystem, I think of it as (a) root node(s), interior nodes, and leaf nodes. The fact that I'm only looking at a subset of them when dealing with a given path does not change what kind of node each is conceptually. In short, I like "leaf()".

...
...
My impression from earlier posts, and the path decomposition table in the docs, is that leaf is a bad name because it can return an interior node in the filesystem.

...
Then the leaf referred to is still the leaf of a subtree of the filesystem. I'm as into proper naming as the next guy. The point I'm trying to get across is that the name "leaf()" seems very natural to me. In fact, it has never given me a moment's pause.

Would the term "pruned_leaf" be an acceptable compromise?

Bleah; that's just more complicated and less evocative of what's actually going on. Just use the darned name that everyone else uses! Or, if you *really* can't stand "basename()", at least use a word that's conceptually appropriate like "tail()" or, heck, "back()" -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams

11:56 p.m.

on Wed May 21 2008, "Zach Laine" <whatwasthataddress-AT-gmail.com> wrote:

...

On Wed, May 21, 2008 at 1:19 AM, David Abrahams <dave@boostpro.com> wrote:

...
[snip]

...
Paths don't have leaves. They have beginnings, endings, and middles. They are linear.

This sheds their context. The linear sequence does not exist in a vacuum. It is a sequence of nodes that defines a path through a filesystem tree. When I think of a filesystem, I think of it as (a) root node(s), interior nodes, and leaf nodes. The fact that I'm only looking at a subset of them when dealing with a given path does not change what kind of node each is conceptually.

I think that proves my point. On a Unix system "/usr" is never a leaf in the filesystem. Boost.Filesystem can call it a leaf, though.
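A small sketch of the distinction being argued, in Python rather than Boost.Filesystem. The directory layout is hypothetical and is created inside a temporary directory so the snippet is safe to run anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/usr/lib
    usr = os.path.join(root, "usr")
    os.makedirs(os.path.join(usr, "lib"))

    # Last element of the path string.
    last_component = os.path.basename(usr)
    # What the filesystem itself reports about that node.
    children = os.listdir(usr)

print(last_component, children)
```

The last component of the path is "usr", yet the node it names has a child, so it is an interior node of the directory tree.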

...

In short, I like "leaf()".

I don't understand why. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Beman Dawes

20 May

1:33 p.m.

David Abrahams wrote:

...

I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf."

Right. And "leaf" never returns an interior node of a path.

...

I wonder if this is the best possible name?

The names used by the filesystem library were carefully chosen as a matched set. So you can't change a single name without making a corresponding change to the other names (like "branch") it is related to.

...

Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

The filesystem names were chosen to be an improvement over the naming used by other libraries and/or languages, which always seemed to me to be misleading. For example, my intuition is that basename("foo.bar") should yield "foo", not "foo.bar". --Beman

John Femiani

5:04 p.m.

Beman wrote:

...

...
in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf."

Right. And "leaf" never returns an interior node of a path.

...
I wonder if this is the best possible name?

The names used by the filesystem library were carefully chosen as a matched set. So you can't change a single name without making a corresponding change to the other names (like "branch") it is related to.

Is a tree a good way to describe a path? I mean, the path itself is basically a list of names, right? Many of us think of the filesystem itself as a tree (whether it is or not, at least it is a digraph). To me a path is a sequence (a list) of connected nodes in a graph (or tree), and this analogy should hold with filesystem. A leaf is a node in a tree with no out-edges, and the last node in a path may or may not be a leaf. I think that python (since somebody mentioned that already) uses a 'head' and a 'tail' to describe what filesystem currently calls the 'branch_path' and 'leaf' of a path. That makes sense to me since it captures the notion that a path is a sequence of names. -- John Femiani
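The Python naming referred to here comes from `os.path.split`, whose documentation describes its result as a (head, tail) pair. A minimal sketch, using `posixpath` so the result is platform-independent and a made-up example path:

```python
import posixpath

path = "/usr/local/lib"

# split returns (head, tail): tail is the last component,
# head is everything leading up to it.
head, tail = posixpath.split(path)

# tail matches basename, head matches dirname.
tail_is_basename = tail == posixpath.basename(path)
head_is_dirname = head == posixpath.dirname(path)

print(head, tail)
```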

Larry Evans

23 May

2:06 p.m.

On 05/20/08 12:04, John Femiani wrote: [snip]

...

Is a tree a good way to describe a path? I mean, the path itself is basically a list of names, right? Many of us think of the filesystem itself as a tree (whether it is or not, at least it is a digraph).

To me a path is a sequence (a list) of connected nodes in a graph (or tree), and this analogy should hold with filesystem. A leaf is a node in a tree with no out-edges, and the last node in a path may or may not be a leaf.

This makes the most sense to me. A path, to paraphrase John, is nothing more than directions to a location in a tree. Whether the directions lead to a leaf or not depends on the "context" i.e. on the tree that's being traversed. So, I think back (or maybe last as *maybe* John was implying above with 'last node in a path') would be the best name for accessing the last component in the path.

David Abrahams

21 May

6:27 a.m.

on Tue May 20 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...

David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf."

Right. And "leaf" never returns an interior node of a path.

What is an "interior node" of a path? Would you talk about "interior nodes" of a std::vector<string>?

...

...
I wonder if this is the best possible name?

The names used by the filesystem library were carefully chosen as a matched set. So you can't change a single name without making a corresponding change to the other names (like "branch") it is related to.

I understand that a change may upset the apple cart, but the fact that the names are interdependent doesn't mean we shouldn't consider different ones.

...

...
Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

The filesystem names were chosen to be an improvement over the naming used by other libraries and/or languages, which always seemed to me to be misleading. For example, my intuition is that basename("foo.bar") should yield "foo", not "foo.bar".

I don't know why -- basename seems like one of those names that would be semantically void except for the precedent provided by other languages trying to do the same thing. On the face of it, it doesn't suggest anything about the extension part of the name one way or the other. In any case, the 2-argument version of basename does something like what you want in many of those other languages. I'm certainly open to persuasion, but so far, a pathname doesn't seem to resemble a tree in any conceptually useful way, and there seems to be no compelling advantage to inventing our own terminology here. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
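To make the extension question concrete, here is a hedged sketch in Python. `basename_with_suffix` is a hypothetical helper written for this illustration; it imitates only the suffix handling of the two-argument POSIX `basename string suffix` form and otherwise follows Python's own `basename`.

```python
import posixpath

def basename_with_suffix(path, suffix=""):
    """Hypothetical helper imitating the suffix rule of POSIX `basename string suffix`."""
    name = posixpath.basename(path)
    # POSIX removes the suffix only when it is a suffix of the name
    # and is not identical to the whole name.
    if suffix and name != suffix and name.endswith(suffix):
        name = name[:-len(suffix)]
    return name

# One-argument form: the extension is kept.
plain = posixpath.basename("dir/foo.bar")
# Two-argument form: the named suffix is stripped.
stripped = basename_with_suffix("dir/foo.bar", ".bar")
# Python's separate tool for splitting off an extension.
stem, ext = posixpath.splitext("foo.bar")

print(plain, stripped, stem, ext)
```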

Beman Dawes

12:20 p.m.

David Abrahams wrote:

...

on Tue May 20 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...
David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." Right. And "leaf" never returns an interior node of a path.

What is an "interior node" of a path? Would you talk about "interior nodes" of a std::vector<string>?

...
...
I wonder if this is the best possible name? The names used by the filesystem library were carefully chosen as a matched set. So you can't change a single name without making a corresponding change to the other names (like "branch") it is related to.

I understand that a change may upset the apple cart, but the fact that the names are interdependent doesn't mean we shouldn't consider different ones.

Sure. But given that the current names were widely discussed at the time of adoption, have been in use for quite a few years, and "basename" is already used by the library for a function with different semantics, change would be difficult.

...

...
...
Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar. The filesystem names were chosen to be an improvement over the naming used by other libraries and/or languages, which always seemed to me to be misleading. For example, my intuition is that basename("foo.bar") should yield "foo", not "foo.bar".

I don't know why -- basename seems like one of those names that would be semantically void except for the precedent provided by other languages trying to do the same thing. On the face of it, it doesn't suggest anything about the extension part of the name one way or the other. In any case, the 2-argument version of basename does something like what you want in many of those other languages.

I'm certainly open to persuasion, but so far, a pathname doesn't seem to resemble a tree in any conceptually useful way, and there seems to be no compelling advantage to inventing our own terminology here.

If you have a better set of names, why don't you suggest them? --Beman

David Abrahams

25 May

5:50 a.m.

on Wed May 21 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...

David Abrahams wrote:

...
on Tue May 20 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...
David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." Right. And "leaf" never returns an interior node of a path.

What is an "interior node" of a path? Would you talk about "interior nodes" of a std::vector<string>?

...
...
I wonder if this is the best possible name? The names used by the filesystem library were carefully chosen as a matched set. So you can't change a single name without making a corresponding change to the other names (like "branch") it is related to.

I understand that a change may upset the apple cart, but the fact that the names are interdependent doesn't mean we shouldn't consider different ones.

Sure. But given that the current names were widely discussed at the time of adoption, have been in use for quite a few years, and "basename" is already used by the library for a function with different semantics, change would be difficult.

Well, I did bring this up almost five years ago: http://lists.boost.org/Archives/boost/2003/08/50910.php

...

...
...
...
Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

The filesystem names were chosen to be an improvement over the naming used by other libraries and/or languages, which always seemed to me to be misleading. For example, my intuition is that basename("foo.bar") should yield "foo", not "foo.bar".

I don't know why -- basename seems like one of those names that would be semantically void except for the precedent provided by other languages trying to do the same thing. On the face of it, it doesn't suggest anything about the extension part of the name one way or the other. In any case, the 2-argument version of basename does something like what you want in many of those other languages.

I'm certainly open to persuasion, but so far, a pathname doesn't seem to resemble a tree in any conceptually useful way, and there seems to be no compelling advantage to inventing our own terminology here.

If you have a better set of names, why don't you suggest them?

You asked for it. I'm going in the order given by http://www.ibm.com/developerworks/aix/library/au-boostfs/ because that's reasonably comprehensive and readable even though it looks like it may have some serious errors. Don't have time to pore through the full reference right now.

path members:

const std::string& string( ): OK

std::string root_directory( ): OK, but maybe should return boost::optional<path>. I wonder why we decay to std::string so eagerly.

std::string root_name( ): OK, but maybe should be called "root" and return boost::optional<path>

std::string leaf( ): should be basename(). back(), tail() and p.split()[1].string() are viable alternatives

std::string branch_path( ): should be dirname()

bool empty( ): OK, although this seems like a really uninteresting question to ask.

iterator: OK

operator/: I've liked that one ever since I came up with it ;-)

The rest of the names look OK to me except for is_regular, which should be is_file. That name seems overthought and "regular" has all kinds of other connotations.

-- Dave Abrahams BoostPro Computing http://www.boostpro.com

Beman Dawes

29 May

3:10 p.m.

David Abrahams wrote:

...

on Wed May 21 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...
David Abrahams wrote:

...
on Tue May 20 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...
David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." Right. And "leaf" never returns an interior node of a path. What is an "interior node" of a path? Would you talk about "interior nodes" of a std::vector<string>?

...
...
I wonder if this is the best possible name? The names used by the filesystem library were carefully chosen as a matched set. So you can't change a single name without making a corresponding change to the other names (like "branch") it is related to. I understand that a change may upset the apple cart, but the fact that the names are interdependent doesn't mean we shouldn't consider different ones. Sure. But given that the current names were widely discussed at the time of adoption, have been in use for quite a few years, and "basename" is already used by the library for a function with different semantics, change would be difficult.

Well, I did bring this up almost five years ago: http://lists.boost.org/Archives/boost/2003/08/50910.php

...
...
...
...
Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar. The filesystem names were chosen to be an improvement over the naming used by other libraries and/or languages, which always seemed to me to be misleading. For example, my intuition is that basename("foo.bar") should yield "foo", not "foo.bar". I don't know why -- basename seems like one of those names that would be semantically void except for the precedent provided by other languages trying to do the same thing. On the face of it, it doesn't suggest anything about the extension part of the name one way or the other. In any case, the 2-argument version of basename does something like what you want in many of those other languages.

I'm certainly open to persuasion, but so far, a pathname doesn't seem to resemble a tree in any conceptually useful way, and there seems to be no compelling advantage to inventing our own terminology here. If you have a better set of names, why don't you suggest them?

You asked for it. I'm going in the order given by http://www.ibm.com/developerworks/aix/library/au-boostfs/ because that's reasonably comprehensive and readable even though it looks like it may have some serious errors. Don't have time to pore through the full reference right now.

path members:

const std::string& string( ): OK

std::string root_directory( ): OK, but maybe should return boost::optional<path>. I wonder why we decay to std::string so eagerly.

std::string root_name( ): OK, but maybe should be called "root" and return boost::optional<path>

Logically, the root is made up of the root_name() and the root_directory(). If you change the name of root_directory() to root(), what do you call the combination of root_name() and root_directory()? As far as other aspects of the interface, like the return type, I want to revisit the whole design once C++0x stabilizes and there is a compiler available with more C++0x features to experiment with.
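For comparison only, Python's `pathlib` splits the root in a loosely similar way and gives the combination its own name. The mapping to root_name() and root_directory() in the comments is an assumption made for illustration, not something taken from the Boost documentation; the example path is made up.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("c:/boost/libs")

drive = p.drive     # "c:"    (roughly the role of root_name())
root = p.root       # "\\"    (roughly the role of root_directory())
anchor = p.anchor   # "c:\\"  (the two combined)

print(drive, root, anchor)
```

Three distinct names are needed there too, which is the difficulty with renaming one part simply "root".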

...

std::string leaf( ): should be basename(). back(), tail() and p.split()[1].string() are viable alternatives

One problem with basename is that it is already used by one of the convenience functions. Another is that I find it misleading. An alternative set I'd be comfortable with would be tail() for the current leaf() and head_path() for the current branch_path().

...

std::string branch_path( ): should be dirname()

That breaks the naming scheme; _path is uniformly used to signal that the return potentially contains a path rather than just a single name. It is also misleading in that the return is often a path rather than just a single directory name, and for "c:" on Windows isn't a directory name at all. One of my frustrations with similar libraries has always been their misleading function names. And that's really your point about leaf() and branch_path(); you find them misleading. That's a concern, and why I'm willing to consider renaming them. But not to a set of names that is misleading to a different set of people, me included.
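Both effects can be observed in Python's own `dirname`, shown here as a sketch with made-up paths. This illustrates Python's behaviour, not that of branch_path().

```python
import ntpath
import posixpath

# The result is a multi-component path, not a single directory name.
multi = posixpath.dirname("/usr/local/lib")   # "/usr/local"

# With Windows rules, a drive-relative path yields just the drive,
# which is not a directory name.
drive_only = ntpath.dirname("c:foo")          # "c:"

print(multi, drive_only)
```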

...

bool empty( ): OK, although this seems like a really uninteresting question to ask.

iterator: OK

operator/: I've liked that one ever since I came up with it ;-)

Yeah, I like it too, although some people find it too cute. I had a complaint recently from a new user that the append functionality was not supported; turned out he never read the docs for operator/ because he assumed operator/ couldn't possibly be what he was looking for.

...

The rest of the names look OK to me except for is_regular, which should be is_file. That name seems overthought and "regular" has all kinds of other connotations.

I'm not particularly fond of is_regular either. The problem with is_file is that some people argue it should be true for directories. I could get talked into is_file, if others support that and will help dealing with those who think of directories as files. Or maybe some other name could be found. is_regular_file()? Although longer, that seems clearer. --Beman

Frank Mori Hess

3:28 p.m.

On Thursday 29 May 2008 11:10 am, Beman Dawes wrote:

...

...
The rest of the names look OK to me except for is_regular, which should be is_file. That name seems overthought and "regular" has all kinds of other connotations.

I'm not particularly fond of is_regular either. The problem with is_file is that some people argue it should be true for directories. I could get talked into is_file, if others support that and will help dealing with those who think of directories as files. Or maybe some other name could be found. is_regular_file()? Although longer, that seems clearer.

I'd go with is_regular_file over is_file. In unix, everything is a file: regular files, directories, fifos, character and block devices, sockets, etc. (I assume is_regular returns false for most of those?)
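To make the "everything is a file" distinction concrete, here is a minimal sketch using C++17 std::filesystem, which does provide is_regular_file() alongside predicates for the other kinds. It is not the 2008 Boost API under discussion, and the helper name `kind_of` is hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: names the kind of filesystem object a path refers to.
std::string kind_of(const fs::path& p) {
    std::error_code ec;  // non-throwing overload: errors show up in the status
    const fs::file_status st = fs::status(p, ec);
    if (fs::is_regular_file(st)) return "regular file";
    if (fs::is_directory(st)) return "directory";
    if (fs::is_fifo(st)) return "fifo";
    if (fs::is_socket(st)) return "socket";
    if (fs::is_block_file(st) || fs::is_character_file(st)) return "device";
    if (st.type() == fs::file_type::not_found) return "not found";
    return "other";
}
```

With this split, a directory is never reported as a regular file, which is the ambiguity the name is_file would leave open.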

Beman Dawes

3:33 p.m.

Beman Dawes wrote:

...

...An alternative set I'd be comfortable with would be tail() for the current leaf() and head_path() for the current branch_path().

Hum... After sending that, I read Johan Råde's "parent" suggestion. I really like parent_path() for the current branch_path(). That leaves leaf(). tail() is arguably slightly better, but not by a lot. Other possibilities:

    leaf()            // cryptic
    tail()            // slightly better
    right()           // cryptic
    rightmost()       // better
    rightmost_name()  // very explicit, but longish

--Beman

PS: In any event, the library will retain the old names as synonyms, so no existing code will break. Semantics remain unchanged.

John Femiani

3:45 p.m.

...

That leaves leaf(). tail() is arguably slightly better, but not by a lot. Other possibilities:

    leaf()            // cryptic
    tail()            // slightly better
    right()           // cryptic
    rightmost()       // better
    rightmost_name()  // very explicit, but longish

--Beman

Also 'resource' 'resource_name' 'child' 'child_name' 'current_name' 'basename_and_extension' 'back' Right? -- John

Bjørn Roald

9:04 p.m.

John Femiani wrote:

...

Also

'resource' 'resource_name' 'child' 'child_name'

Hm, regarding the parent and child suggestions: considering that a path may be something like

    ../../a/b

which end is the child and which is the parent?

Rather than just complaining, I'll throw in what I find natural. I have not really considered how this would affect current users or other useful conventions, so please ignore whatever is useless. Otherwise I am glad if it helps.

For me, a path describes a route *from* somewhere in the graph/tree *to* somewhere, most often somewhere else in the same graph/tree. It describes a directional traversal between connected nodes in the graph. If the *from* node is a special, well-known place in the graph, we have special cases for the interpretation of the traversal, such as the file system root, a web server document root and so forth.

STL begin() and end() for iterators, and front() and back() for access to the elements at each end of the path, come to mind. However, which is front() and which is back()? When I backtrack a path I, in my head, move toward the front -- arghhh --- so maybe the simple and plain "from" and "to" are best after all. Let us try:

    path p1("a/b/c");
    path p2("a/b/c/d.tar.gz");

    p1.from_name() == "a"            // path is from directory called "a"
    p1.to_name() == "c"              // path goes to file or directory called "c"
    p2.to().name() == "d.tar.gz"
    p1.name() == "c"                 // name of last node
    p1.from() == path("a")           // path to first node in p
    p1.from().name() == p1.from_name()

    p1.to().name().ext() == ""
    p1.to_name().ext() == ""
    p1.name_ext() == ""
    p2.to().name().ext() == ".tar.gz"
    p2.to_name().ext() == ".tar.gz"
    p2.to().name_ext() == ".tar.gz"
    p2.name_ext() == ".tar.gz"

    p1.to().name().base() == "c"
    p2.to().name().base() == "d"
    p2.name_base() == "d"

    p1.to() == path("a/b/c")         // path to last node in p; p == p.to() ???
    p1.to().name() == p1.to_name()
    p1.file() == path("a/b/c")       // even though a may be a directory ???? in unix all is files :-)
    p1.file_name() == "c"            // only node in p that *may* be a file; a good (but taken) alternative here is basename()
    p1.file_name() == p1.file().name()
    p1.file_path() == "a/b/c"        // probably not very useful as p == p.file_path()
    p1.dir_name() == "b"             // first directory backtracking path from p.to()
    p1.dir() == path("a/b")          // shorten path by 1
    p1.dir().dir() == path("a")      // shorten path by 2

    *p1.begin() == p1.from()
    *--p1.end() == p1.to()
    *(p1.end()-2) == p1.dir()
    (p1.end()-2)->name() == "b"

This may only feel right if we think of the result of from() and to() as absolute locations, i.e. something more like absolute paths. And that may make sense here, I am not sure. However, if we think of it in the context of the path object it operates on, rather than the file system the path object may be associated with, it makes sense to me.

Also, files may only be referred to in the path::to() node; the path::from() node and all intermediate nodes in the path traversal must be directories. A symbolic link is just an edge to, or alias for, a directory if it is internal in a valid path. So path::file() may logically be the same as path::to() even if it is not connected to a filesystem with the absolute location of the p.from() node defined. Hence, checking the validity of a path in general, and whether the last node is a file, is not feasible. Operations on a filesystem using a path object may return an invalid value if there is a directory at the path::to() location in the filesystem, but that is after the path is put in a specific context.

-- Bjørn
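The iterator-based view in the listing above (first element via begin(), last element via the iterator before end()) can be tried out with C++17 std::filesystem. This is only a sketch under that assumption: from_name() and to_name() are Bjørn's proposed names and exist in no library, so the helpers below use the hypothetical names `first_element` and `last_element`.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical counterpart of the proposed from_name(): first path element.
// Note that for an absolute POSIX path the first element is the root "/".
std::string first_element(const fs::path& p) {
    return p.empty() ? std::string() : p.begin()->generic_string();
}

// Hypothetical counterpart of the proposed to_name(): last path element.
std::string last_element(const fs::path& p) {
    if (p.empty()) return std::string();
    auto it = p.end();
    --it;  // path iterators are bidirectional, so stepping back from end() is fine
    return it->generic_string();
}
```

Unlike the listing, std::filesystem iterators are not random access, so an expression such as p1.end()-2 has no direct equivalent there.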

Scott McMurray

4:22 p.m.

On Thu, May 29, 2008 at 11:33 AM, Beman Dawes <bdawes@acm.org> wrote:

...

Beman Dawes wrote:

...
...An alternative set I'd be comfortable with would be tail() for the current leaf() and head_path() for the current branch_path().

Hum... After sending that, I read Johan Råde's "parent" suggestion.

I really like parent_path() for the current branch_path().

I do too.

...

That leaves leaf(). tail() is arguably slightly better, but not by a lot. Other possibilities:

I really dislike tail() for this purpose, since as far as I'm concerned, tail is a list term that returns a list (or the empty list), so having a tail function that returns a single element is even less intuitive than leaf.

The name I usually see for that is last, which isn't all that good, with init for the parent directory part. Using snoc for operator/ is definitely not a good plan.

How about p == p.parent_path() / p.current_name()?

And while we're on the subject of decomposition, would it be worthwhile to add decomposition from the front in terms of an uncomplete() function that'd give a relative path from one full path to another?

    uncomplete(/foo/new, /foo/bar) => ../new

I think that complete is the only function without a corresponding inverse.

(The name could obviously be improved. incomplete sounds like a property, so I don't like it. It'd be nice to use relative, but that conflicts conceptually with .relative_path(). Perhaps relative_path could use a different name too? I usually think of a relative path as relative to some folder, not to the root, even on windows. local_path is ok, if not great. p == p.root_path() / p.local_path() is reasonable.)

Beman Dawes

30 May 30 May

12:53 p.m.

Scott McMurray wrote:

...

On Thu, May 29, 2008 at 11:33 AM, Beman Dawes <bdawes@acm.org> wrote:

...
Beman Dawes wrote:

...
...An alternative set I'd be comfortable with would be tail() for the current leaf() and head_path() for the current branch_path(). Hum... After sending that, I read Johan Råde's "parent" suggestion.

I really like parent_path() for the current branch_path().

I do too.

...
That leaves leaf(). tail() is arguably slightly better, but not by a lot. Other possibilities:

I really dislike tail() for this purpose, since as far as I'm concerned, tail is a list term that returns a list (or the empty list), so having a tail function that returns a single element is even less intuitive than leaf.

The name I usually see for that is last, which isn't all that good, with init for the parent directory part. Using snoc for operator/ is definitely not a good plan.

How about p == p.parent_path() / p.current_name()?

That's a nice way to describe the relationships. Thanks! "current_name" doesn't convey the right connotation to me. What about p == p.parent_path() / p.last_name() ? That seems pretty good to me.
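The identity above can be checked mechanically. A minimal sketch, assuming C++17 std::filesystem, where the two halves are spelled parent_path() and filename() (neither last_name() nor current_name() exists there); the helper name `recomposes` is hypothetical.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Checks the decomposition identity p == p.parent_path() / p.filename().
bool recomposes(const fs::path& p) {
    return p == p.parent_path() / p.filename();
}
```

A single-element path such as "c" also satisfies it, because appending to an empty parent path adds no separator.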

...

And while we're on the subject of decomposition, would it be worthwhile to add decomposition from the front in terms of an uncomplete() function that'd give a relative path from one full path to another? uncomplete(/foo/new, /foo/bar) => ../new

I think that complete is the only function without a corresponding inverse.

Do you have a compelling use case beyond complete not having an inverse? Thanks for the suggestions, --Beman

Scott McMurray

1:38 p.m.

On Fri, May 30, 2008 at 8:53 AM, Beman Dawes <bdawes@acm.org> wrote:

...

Scott McMurray wrote:

...
The name I usually see for that is last, which isn't all that good, with init for the parent directory part. Using snoc for operator/ is definitely not a good plan.

How about p == p.parent_path() / p.current_name()?

That's a nice way to describe the relationships. Thanks!

"current_name" doesn't convey the right connotation to me.

What about p == p.parent_path() / p.last_name() ?

That seems pretty good to me.

Current does seem stateful or time-dependent, so I agree that it's weak. I'm happy enough with last_name, though it does make me think of first_name and middle_initial ;) Though since containers use back(), not last(), perhaps it should be back_name()?

...

...
And while we're on the subject of decomposition, would it be worthwhile to add decomposition from the front in terms of an uncomplete() function that'd give a relative path from one full path to another? uncomplete(/foo/new, /foo/bar) => ../new

I think that complete is the only function without a corresponding inverse.

Do you have a compelling use case beyond complete not having an inverse?

Any time you get a full path (from, say, an open file dialog) and you'd rather have the relative one to save, so that it won't break when moved around. An IDE, for example, would prefer a path relative to the project file's directory, so that you could check it out of SVN to whatever folder. ~ Scott
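The requested behaviour can be sketched with C++17 std::filesystem, whose path::lexically_relative() computes a relative path purely from the names. That is an assumption for illustration: uncomplete() itself is only a proposal in this thread, and `uncomplete_sketch` is a hypothetical stand-in, not a Boost function.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for the proposed uncomplete(): a purely lexical
// relative path from `base` to `target`. The filesystem is never consulted.
std::string uncomplete_sketch(const fs::path& target, const fs::path& base) {
    return target.lexically_relative(base).generic_string();
}
```

In the IDE use case, `base` would be the project file's directory and `target` the full path returned by the file dialog.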

Beman Dawes

3:40 p.m.

Scott McMurray wrote:

...

On Fri, May 30, 2008 at 8:53 AM, Beman Dawes <bdawes@acm.org> wrote:

...
Scott McMurray wrote:

...

...
...
And while we're on the subject of decomposition, would it be worthwhile to add decomposition from the front in terms of an uncomplete() function that'd give a relative path from one full path to another? uncomplete(/foo/new, /foo/bar) => ../new

I think that complete is the only function without a corresponding inverse. Do you have a compelling use case beyond complete not having an inverse?

Any time you get a full path (from, say, an open file dialog) and you'd rather have the relative one to save, so that it won't break when moved around. An IDE, for example, would prefer a path relative to the project file's directory, so that you could check it out of SVN to whatever folder.

That seems compelling. Please submit a feature request to trac. Thanks, --Beman

Scott McMurray

6:08 p.m.

On Fri, May 30, 2008 at 11:40 AM, Beman Dawes <bdawes@acm.org> wrote:

...

Scott McMurray wrote:

...
On Fri, May 30, 2008 at 8:53 AM, Beman Dawes <bdawes@acm.org> wrote:

...
Scott McMurray wrote:

...
...
...
And while we're on the subject of decomposition, would it be worthwhile to add decomposition from the front in terms of an uncomplete() function that'd give a relative path from one full path to another? uncomplete(/foo/new, /foo/bar) => ../new

I think that complete is the only function without a corresponding inverse. Do you have a compelling use case beyond complete not having an inverse?

Any time you get a full path (from, say, an open file dialog) and you'd rather have the relative one to save, so that it won't break when moved around. An IDE, for example, would prefer a path relative to the project file's directory, so that you could check it out of SVN to whatever folder.

That seems compelling. Please submit a feature request to trac.

Submitted: http://svn.boost.org/trac/boost/ticket/1976

The obvious implementation complication is that if /foo/bar is a symlink, then trying to put the path back together with complete(../new, /foo/bar) will give /foo/bar/../new, which might not actually be /foo/new.
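The complication is easy to see on the name level. A minimal sketch, assuming C++17 std::filesystem rather than the Boost complete() discussed here; `rejoin` and `rejoin_normalized` are hypothetical helper names.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Re-joins a relative path onto a base without touching the filesystem.
std::string rejoin(const fs::path& base, const fs::path& relative) {
    return (base / relative).generic_string();
}

// Collapses ".." lexically. The result names the same location only if no
// element being cancelled out (here "bar") is a symbolic link.
std::string rejoin_normalized(const fs::path& base, const fs::path& relative) {
    return (base / relative).lexically_normal().generic_string();
}
```

Here rejoin("/foo/bar", "../new") keeps the ".." in place, while rejoin_normalized() removes it by name alone, which is exactly the step that a symlink at /foo/bar would invalidate.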

Martin Wille

29 May 29 May

5:06 p.m.

Beman Dawes wrote:

...

Beman Dawes wrote:

...
...An alternative set I'd be comfortable with would be tail() for the current leaf() and head_path() for the current branch_path().

Hum... After sending that, I read Johan Råde's "parent" suggestion.

I really like parent_path() for the current branch_path().

Sorry for the bikeshedding, but I really don't like parent_path. The problem I see is related to symbolic links. "parent" suggests a parent directory, even though parent_path() might return something that is not the parent directory of a link target. I think the naming should reflect the fact that we're operating on names only, not on an actual filesystem structure. So, IMHO, "basename" is a better choice than anything containing "parent".

A sketch might help to understand the situation; consider this structure:

    /x:
    drwxr-xr-x 2 root root 72 2008-05-29 18:52 left
    drwxr-xr-x 3 root root 72 2008-05-29 18:52 right1

    /x/left:
    -rw-r--r-- 1 root root 5 2008-05-29 18:52 foo

    /x/right1:
    drwxr-xr-x 2 root root 72 2008-05-29 18:55 right2

    /x/right1/right2:
    lrwxrwxrwx 1 root root 10 2008-05-29 18:55 bar -> ../../left

basename(/x/right1/right2/bar) will return /x/right1/right2

open(basename(/x/right1/right2/bar)) will open something different than open((/x/right1/right2/bar/..)), as can be demonstrated by using ls:

...

    ls -l /x/right1/right2
    lrwxrwxrwx 1 root root 10 2008-05-29 18:55 bar -> ../../left

...

    ls -l /x/right1/right2/bar/..
    drwxr-xr-x 2 root root 72 2008-05-29 18:52 left
    drwxr-xr-x 3 root root 72 2008-05-29 18:52 right1

So, "basename()" and "parent_of()" should be considered operations on different domains (path names vs filesystem). Regards, m
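The two domains can be compared directly by rebuilding the layout from the listing in a scratch directory. This is a sketch under several assumptions: it uses C++17 std::filesystem rather than the Boost library under discussion, it needs a platform where the process may create symbolic links (typically POSIX), and the function name and the scratch directory name are hypothetical.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Builds x/left and x/right1/right2/bar -> ../../left under the temp directory,
// then reports whether the lexical parent of "bar" and the directory reached
// through "bar/.." are the same place on disk.
bool lexical_parent_matches_resolved_parent() {
    const fs::path root = fs::temp_directory_path() / "leaf_thread_symlink_demo";
    fs::remove_all(root);  // start from a clean slate
    fs::create_directories(root / "left");
    fs::create_directories(root / "right1" / "right2");

    const fs::path bar = root / "right1" / "right2" / "bar";
    fs::create_directory_symlink("../../left", bar);

    // Name-level answer: .../x/right1/right2
    const fs::path lexical = fs::canonical(bar.parent_path());
    // Filesystem-level answer: follows the link to x/left, then "..", giving .../x
    const fs::path resolved = fs::canonical(bar / "..");

    const bool same = (lexical == resolved);
    fs::remove_all(root);  // clean up the scratch tree
    return same;
}
```

On such a platform the function is expected to return false, matching the two different ls listings above.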

Scott McMurray

5:36 p.m.

On Thu, May 29, 2008 at 1:06 PM, Martin Wille <mw8329@yahoo.com.au> wrote:

...

Sorry for the bikeshedding, but I really don't like parent_path. The problem I see is related to symbolic links. "parent" suggests a parent directory, even though parent_path() might return something that is not the parent directory of a link target. I think the naming should reflect the fact that we're operating on names only, not on an actual filesystem structure. So, IMHO, "basename" is a better choice than anything containing "parent".

[...]

So, "basename()" and "parent_of()" should be considered operations on different domains (path names vs filesystem).

I'd say the problem you mention needs only be considered in the filesystem case where you have the interpretation of a path as referring to a symlink. If we're just considering the path, then the parent of the path /x/right1/right2/bar is /x/right1/right2. If you want the filesystem interpretation, then you want path("/x/right1/right2/bar")/"..", which will act equivalently to /x in operation functions. I think, as a decomposition function, the name parent is sufficiently clear.

David Abrahams

30 May 30 May

11:31 p.m.

on Thu May 29 2008, Martin Wille <mw8329-AT-yahoo.com.au> wrote:

...

Beman Dawes wrote:

...
Beman Dawes wrote:

...
...An alternative set I'd be comfortable with would be tail() for the current leaf() and head_path() for the current branch_path().

Hum... After sending that, I read Johan Råde's "parent" suggestion.

I really like parent_path() for the current branch_path().

Sorry for the bikeshedding, but I really don't like parent_path. The problem I see is related to symbolic links. "parent" suggests a parent directory, even though parent_path() might return something that is not the parent directory of a link target. I think the naming should reflect the fact that we're operating on names only, not on an actual filesystem structure. So, IMHO, "basename" is a better choice than anything containing "parent".

Is this yet another alternate meaning for "basename?" Nobody has suggested using it to drop the last path element before, AFAIK. I would make the same argument with you as I have with Beman: "basename" has an accepted meaning in the domain of filesystem APIs. We can't use the same name to mean something different. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Johan Råde

29 May 29 May

7:08 p.m.

Beman Dawes wrote:

...

Beman Dawes wrote:

[snip]

...

That leaves leaf(). tail() is arguably slightly better, but not by a lot. Other possibilities:

    leaf()            // cryptic
    tail()            // slightly better
    right()           // cryptic
    rightmost()       // better
    rightmost_name()  // very explicit, but longish

In functional programming you often build a list by appending one element at a time. The word head is then used to refer to the last added element, and tail to the rest of the list. This suggests that the leaf is the head, not the tail. Since different people have different intuition about what is the head and what is the tail, I think the terms head and tail should not be used in this context. Here are two other possible alternatives to leaf: name() own_name() --Johan

dherring＠ll.mit.edu

9:14 p.m.

More fodder: (I didn't check that this is 100% compatible with the current API.)

    this/is/stuff/last.my.name = path
    this/is/stuff              = path.dirname()
    last.my.name               = path.basename()
    last.my                    = path.basename(".name")
    .name                      = path.suffix()

    this/is/stuff/             = path
    this/is                    = path.dirname()
    stuff/                     = path.basename()
    /                          = path.suffix()

From Perl's File::Basename: it is guaranteed that

    # Where $path_separator is / for Unix, \ for Windows, etc...
    dirname($path) . $path_separator . basename($path);

is equivalent to the original path for all systems but VMS.

- Daniel

On Thu, 29 May 2008, Johan Råde wrote:

...

Beman Dawes wrote:

...
Beman Dawes wrote:

[snip]

...
That leaves leaf(). tail() is arguably slightly better, but not by a lot. Other possibilities:

    leaf()            // cryptic
    tail()            // slightly better
    right()           // cryptic
    rightmost()       // better
    rightmost_name()  // very explicit, but longish

In functional programming you often build a list by appending one element at a time. The word head is then used to refer to the last added element, and tail to the rest of the list. This suggests that the leaf is the head, not the tail. Since different people have different intuition about what is the head and what is the tail, I think the terms head and tail should not be used in this context.

Here are two other possible alternatives to leaf:

name() own_name()

--Johan
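For comparison with Daniel's dirname/basename/suffix table at the top of this message, here is how the same two inputs decompose in C++17 std::filesystem. That library is an assumption for illustration (it is neither Perl's File::Basename nor the 2008 Boost API), and `Pieces` and `split` are hypothetical names.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper type holding the name-level pieces of a path.
struct Pieces {
    std::string dir;   // everything before the last element
    std::string name;  // the last element
    std::string stem;  // name without its final extension
    std::string ext;   // final extension, including the dot
};

Pieces split(const fs::path& p) {
    return {p.parent_path().generic_string(), p.filename().generic_string(),
            p.stem().generic_string(), p.extension().generic_string()};
}
```

The first input matches the table (stem "last.my", extension ".name"). The trailing-slash input does not: std::filesystem reports an empty last element and "this/is/stuff" as the parent, so the choice Daniel sketches is one of at least two defensible conventions.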


David Abrahams

30 May 30 May

10:49 p.m.

on Thu May 29 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...

Beman Dawes wrote:

...
...An alternative set I'd be comfortable with would be tail() for the current leaf() and head_path() for the current branch_path().

Hum... After sending that, I read Johan Råde's "parent" suggestion.

I really like parent_path() for the current branch_path().

That leaves leaf(). tail() is arguably slightly better, but not by a lot. Other possibilities:

    leaf()            // cryptic
    tail()            // slightly better
    right()           // cryptic
    rightmost()       // better
    rightmost_name()  // very explicit, but longish

Actually, "name()" would be just fine. Just ask yourself, "what is the name of this file/directory?" without thinking too hard about it and the answer is always what is currently called "leaf()." I don't believe names need to be chosen such that they shield the user from overthinking things :-) -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams

8:51 p.m.

on Thu May 29 2008, Beman Dawes <bdawes-AT-acm.org> wrote:

...

...
std::string leaf( ): should be basename(). back(), tail() and p.split()[1].string() are viable alternatives

One problem with basename is that it is already used by one of the convenience functions.

Yes, but seriously, you can't keep using basename to signify something different about files than it already does in python, perl, ruby, posix sh, and probably a bunch of other things. That's just wrong in principle, no matter how misleading you find the name. It's a term of art, now, like RAII.

...

Another is that I find it misleading.

I would ask why, but I really think that's beside the point. It's like saying you want to rename the hippocampus because it makes you think of universities for large, dangerous, waterborne mammals. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Johan Råde

21 May 21 May

8:30 a.m.

David Abrahams wrote:

...

I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name?

Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

David is absolutely right. The name leaf is the result of conceptual confusion, failure to distinguish between the tree structure of the file system and the linear structure of a path. If possible, at this late stage, the names should be changed. --Johan Råde

Beman Dawes

12:31 p.m.

Johan Råde wrote:

...

David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name?

Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

David is absolutely right.

The name leaf is the result of conceptual confusion, failure to distinguish between the tree structure of the file system and the linear structure of a path.

If possible, at this late stage, the names should be changed.

Why don't you and Dave come up with a proposed set of names? The current names are: root_path root_name root_directory relative_path leaf branch_path basename extension For examples, see http://www.boost.org/doc/libs/1_35_0/libs/filesystem/doc/reference.html#Path... --Beman

Johan Råde

1:11 p.m.

Beman Dawes wrote:

...

Johan Råde wrote:

...
David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name?

Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

David is absolutely right.

The name leaf is the result of conceptual confusion, failure to distinguish between the tree structure of the file system and the linear structure of a path.

If possible, at this late stage, the names should be changed.

Why don't you and Dave come up with a proposed set of names? The current names are:

root_path root_name root_directory relative_path leaf branch_path basename extension

I think all these names are fine except "leaf" and "branch_path". If anyone asked me "What does the function branch_path return?", I would answer "The parent directory." So why not call it parent_directory_path? --Johan

Beman Dawes

29 May 29 May

3:13 p.m.

Johan Råde wrote:

...

Beman Dawes wrote:

...
Johan Råde wrote:

...
David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name?

Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

David is absolutely right.

The name leaf is the result of conceptual confusion, failure to distinguish between the tree structure of the file system and the linear structure of a path.

If possible, at this late stage, the names should be changed. Why don't you and Dave come up with a proposed set of names? The current names are:

root_path root_name root_directory relative_path leaf branch_path basename extension

I think all these names are fine except "leaf" and "branch_path". If anyone asked me "What does the function branch_path return?", I would answer "The parent directory." So why not call it parent_directory_path?

Or just parent_path(). That's an interesting suggestion, thanks! --Beman

John Femiani

3:36 p.m.

...

...
...
...
If possible, at this late stage, the names should be changed. Why don't you and Dave come up with a proposed set of names? The current names are:

root_path root_name root_directory relative_path leaf branch_path basename extension

I think all these names are fine except "leaf" and "branch_path". If anyone asked me "What does the function branch_path return?", I would answer "The parent directory." So why not call it parent_directory_path?

Or just parent_path(). That's an interesting suggestion, thanks!

FYI: http://en.wikipedia.org/wiki/Filename I think it is interesting that the wikipedia community distinguishes windows filenames from unix filenames, so that on unix systems the 'extension' is part of the name but on windows systems it is not. -- John

David Abrahams

30 May 30 May

10:51 p.m.

on Thu May 29 2008, John Femiani <JOHN.FEMIANI-AT-asu.edu> wrote:

...

...
...
...
...
If possible, at this late stage, the names should be changed. Why don't you and Dave come up with a proposed set of names? The current names are:

root_path root_name root_directory relative_path leaf branch_path basename extension

I think all these names are fine except "leaf" and "branch_path". If anyone asked me "What does the function branch_path return?", I would answer "The parent directory." So why not call it parent_directory_path?

Or just parent_path(). That's an interesting suggestion, thanks!

FYI:

http://en.wikipedia.org/wiki/Filename

I think it is interesting that the wikipedia community distinguishes windows filenames from unix filenames, so that on unix systems the 'extension' is part of the name but on windows systems it is not.

The latter is "true" in some sense from the user's perspective, or at least Windows tries to present that illusion. But every programmer quickly learns that the extension *is* really part of the file's name on Windows. We're presenting names for programmers, not GUI users. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

John Femiani

11:43 p.m.

David Abrahams wrote:

...

...
FYI:

http://en.wikipedia.org/wiki/Filename

I think it is interesting that the wikipedia community distinguishes windows filenames from unix filenames, so that on unix systems the 'extension' is part of the name but on windows systems it is not.

The latter is "true" in some sense from the user's perspective, or at least Windows tries to present that illusion. But every programmer quickly learns that the extension *is* really part of the file's name on Windows. We're presenting names for programmers, not GUI users.

Really? It is definitely part of the path on windows but...

http://msdn.microsoft.com/en-us/library/e737s6tf(VS.80).aspx

Anyhow I liked your suggestion to just use 'name()', I vote for that one. -- John

Bjørn Roald

31 May 31 May

6:05 a.m.

David Abrahams wrote:

...

on Thu May 29 2008, John Femiani <JOHN.FEMIANI-AT-asu.edu> wrote:

...
...
...
...
...
If possible, at this late stage, the names should be changed.

Why don't you and Dave come up with a proposed set of names? The current names are:

root_path root_name root_directory relative_path leaf branch_path basename extension

I think all these names are fine except "leaf" and "branch_path". If anyone asked me "What does the function branch_path return?", I would answer "The parent directory." So why not call it parent_directory_path?

Or just parent_path(). That's an interesting suggestion, thanks!

FYI:

http://en.wikipedia.org/wiki/Filename

I think it is interesting that the wikipedia community distinguishes windows filenames from unix filenames, so that on unix systems the 'extension' is part of the name but on windows systems it is not.

The latter is "true" in some sense from the user's perspective, or at least Windows tries to present that illusion. But every programmer quickly learns that the extension *is* really part of the file's name on Windows. We're presenting names for programmers, not GUI users.

Agreed. The Windows system API tells the truth about what a file name is in a file system under Windows, not the GUI. One can also say Windows is not very consistent in maintaining this illusion in the GUI. You can set up Explorer to show the extension as part of .... the filename. Also, what about file open/close dialogs in most applications?

You can also look at it this way. If it is an extension, what does it extend? ... the filename, right! So if the filename is extended with an extension, what is the extension part of? ... the filename, right! -- Bjørn

Vladimir Prus

21 May 21 May

1:22 p.m.

Johan Råde wrote:

...

David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name?

Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

David is absolutely right.

The name leaf is the result of conceptual confusion, failure to distinguish between the tree structure of the file system and the linear structure of a path.

If possible, at this late stage, the names should be changed.

<another universe mode> Of course, since boost.filesystem is used by exactly zero real-world projects right now (because nobody was able to grok the meaning of 'leaf'), it's OK to change the names to more sane ones. </another universe mode>

<this universe mode> Given that boost.filesystem appears to be highly popular, and apparently users don't care about conceptual clarity of 'leaf', changing those names will basically cause everybody to change, or conditionally change, their code, without any practical benefit. </this universe mode>

- Volodya

Frank Mori Hess

1:35 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Wednesday 21 May 2008 09:22 am, Vladimir Prus wrote:

...

<another universe mode> Of course, since boost.filesystem is used by exactly zero real-world projects right now (because nobody was able to grok the meaning of 'leaf'), it's OK to change the names to more sane ones. </another universe mode>

<this universe mode> Given that boost.filesystem appears to be highly popular, and apparently users don't care about conceptual clarity of 'leaf', changing those names will basically cause everybody to change, or conditionally change, their code, without any practical benefit. </this universe mode>

Why couldn't the name leaf just be deprecated but kept in the library for old code to keep working? I'm only casually browsing this thread, but backwards compatibility in this case doesn't seem like that big a problem. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFINCU+5vihyNWuA4URAui4AKDJeDEECwaYikzkLuLF9SQnaZ8YvwCgoMoN PFPBPrmA/RRBhLZWdidItlw= =826F -----END PGP SIGNATURE-----

Vladimir Prus

1:51 p.m.

Frank Mori Hess wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On Wednesday 21 May 2008 09:22 am, Vladimir Prus wrote:

...
<another universe mode> Of course, since boost.filesystem is used by exactly zero real-world projects right now (because nobody was able to grok the meaning of 'leaf'), it's OK to change the names to more sane ones. </another universe mode>

<this universe mode> Given that boost.filesystem appears to be highly popular, and apparently users don't care about conceptual clarity of 'leaf', changing those names will basically cause everybody to change, or conditionally change, their code, without any practical benefit. </this universe mode>

Why couldn't the name leaf just be deprecated but kept in the library for old code to keep working? I'm only casually browsing this thread, but backwards compatibility in this case doesn't seem like that big a problem.

Unless you want 'leaf' to be named 'basename'. See, we already have 'basename', which unfortunately does something different. - Volodya
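A rough sketch, in Python for brevity, of why the two names cannot simply be swapped. It assumes that the existing Boost convenience function basename() returns the last component with everything from the last dot onward removed, whereas leaf() returns the last component whole. The names `leaf` and `boost_style_basename` here are illustrative stand-ins, not the library's code.

```python
import posixpath


def leaf(path: str) -> str:
    """Last component of the path, extension included."""
    return posixpath.basename(path)


def boost_style_basename(path: str) -> str:
    """Last component with the final '.ext' removed.

    Assumption: this mirrors the existing Boost convenience function
    basename(), which drops everything from the last dot onward.
    """
    name = leaf(path)
    dot = name.rfind(".")
    return name if dot == -1 else name[:dot]


if __name__ == "__main__":
    p = "/home/gumby/work/ruby.rb"
    print(leaf(p))                  # ruby.rb
    print(boost_style_basename(p))  # ruby
```

Under that assumption, code that already calls basename() expects the extension to be stripped, so giving the name to leaf()'s behaviour would silently change its results.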

Johan Råde

2:13 p.m.

Vladimir Prus wrote:

...

Unless you want 'leaf' to be named 'basename'. See, we already have 'basename', which unfortunately does something different.

- Volodya

How about leaf -> own_name and branch_path -> parent_directory_path ? --Johan

Johan Nilsson

22 May 22 May

7:01 a.m.

Vladimir Prus wrote:

...

Frank Mori Hess wrote:

...
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

[snip]

...

Unless you want 'leaf' to be named 'basename'. See, we already have 'basename', which unfortunately does something different.

[sorry for jumping in] I personally have no problems with leaf, but just for comparison, the following is from Ruby (note that extension can be conditionally removed):

...

ri File.basename
--------------------------------------------------------- File::basename
File.basename(file_name [, suffix] ) -> base_name

Returns the last component of the filename given in _file_name_, which must be formed using forward slashes (``+/+'') regardless of the separator used on the local file system. If _suffix_ is given and present at the end of _file_name_, it is removed.

File.basename("/home/gumby/work/ruby.rb") #=> "ruby.rb"
File.basename("/home/gumby/work/ruby.rb", ".rb") #=> "ruby"

...

ri Pathname.basename

------------------------------------------------------ Pathname#basename
basename(*args)
------------------------------------------------------------------------
See +File.basename+. Returns the last component of the path.

/ Johan

David Abrahams

25 May 25 May

5:11 a.m.

on Thu May 22 2008, "Johan Nilsson" <r.johan.nilsson-AT-gmail.com> wrote:

...

Vladimir Prus wrote:

...
Frank Mori Hess wrote:

...
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

[snip]

...
Unless you want 'leaf' to be named 'basename'. See, we already have 'basename', which unfortunately does something different.

[sorry for jumping in]

I personally have no problems with leaf, but just for comparison, the following is from Ruby (note that extension can be conditionally removed):

...
ri File.basename
--------------------------------------------------------- File::basename
File.basename(file_name [, suffix] ) -> base_name

Returns the last component of the filename given in _file_name_, which must be formed using forward slashes (``+/+'') regardless of the separator used on the local file system. If _suffix_ is given and present at the end of _file_name_, it is removed.

File.basename("/home/gumby/work/ruby.rb") #=> "ruby.rb"
File.basename("/home/gumby/work/ruby.rb", ".rb") #=> "ruby"

Which exactly mirrors the posix command-line function. On every POSIX system

$ basename foo/bar/baz.x
baz.x
$ basename foo/bar/baz.x .cpp
baz.x
$ basename foo/bar/baz.x .x
baz

-- Dave Abrahams BoostPro Computing http://www.boostpro.com
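The suffix rule in those three commands can be sketched in a few lines of Python. This is only an approximation for illustration: `posix_basename` is a made-up helper name, and it leans on `posixpath.basename`, which returns an empty string for a path ending in a slash, whereas the basename command strips trailing slashes first.

```python
import posixpath


def posix_basename(path: str, suffix: str = "") -> str:
    """Approximate `basename path [suffix]` for paths without a trailing slash."""
    name = posixpath.basename(path)
    # The suffix is removed only if it is present at the end of the name
    # and is not the whole name.
    if suffix and name != suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


if __name__ == "__main__":
    print(posix_basename("foo/bar/baz.x"))          # baz.x
    print(posix_basename("foo/bar/baz.x", ".cpp"))  # baz.x
    print(posix_basename("foo/bar/baz.x", ".x"))    # baz
```

The point of the sketch is that the extension is stripped only on request, which is the established meaning of "basename" that the thread is comparing against.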

David Abrahams

23 May 23 May

12:01 a.m.

on Wed May 21 2008, Vladimir Prus <vladimir-AT-codesourcery.com> wrote:

...

Johan Råde wrote:

...
David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name?

Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

David is absolutely right.

The name leaf is the result of conceptual confusion, failure to distinguish between the tree structure of the file system and the linear structure of a path.

If possible, at this late stage, the names should be changed.

<another universe mode> Of course, since boost.filesystem is used by exactly zero real-world projects right now (because nobody was able to grok the meaning of 'leaf'), it's OK to change the names to more sane ones. </another universe mode>

<this universe mode> Given that boost.filesystem appears to be highly popular, and apparently users don't care about conceptual clarity of 'leaf', changing those names will basically cause everybody to change, or conditionally change, their code, without any practical benefit. </this universe mode>

It's easy enough to leave them deprecated, or even officially removed, but available for backward-compatibility. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
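The "deprecated but still available" approach might look roughly like the following, sketched in Python rather than C++ to keep it short. Everything here is hypothetical: `Path` is a toy class, not Boost's basic_path, and `filename` is just a placeholder for whatever new name is chosen.

```python
import posixpath
import warnings


class Path:
    """Toy stand-in for a path class; not Boost's basic_path."""

    def __init__(self, text: str) -> None:
        self._text = text

    def filename(self) -> str:
        """Hypothetical replacement name for leaf()."""
        return posixpath.basename(self._text)

    def leaf(self) -> str:
        """Old name, kept so that existing code keeps working."""
        warnings.warn(
            "leaf() is deprecated; use filename()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.filename()


if __name__ == "__main__":
    p = Path("foo/bar/baz.x")
    print(p.filename())  # baz.x
    print(p.leaf())      # baz.x, plus a DeprecationWarning if warnings are enabled
```

This only works when the new name is not already taken; it does not help if the replacement is a name that already exists with a different meaning.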

Vladimir Prus

4:43 a.m.

David Abrahams wrote:

...

...
...
If possible, at this late stage, the names should be changed.

<another universe mode> Of course, since boost.filesystem is used by exactly zero real-world projects right now (because nobody was able to grok the meaning of 'leaf'), it's OK to change the names to more sane ones. </another universe mode>

<this universe mode> Given that boost.filesystem appears to be highly popular, and apparently users don't care about conceptual clarity of 'leaf', changing those names will basically cause everybody to change, or conditionally change, their code, without any practical benefit. </this universe mode>

It's easy enough to leave them deprecated, or even officially removed, but available for backward-compatibility.

Did you read all the messages in this thread? If you want to rename 'leaf' to 'basename', you cannot do that without breaking backward compatibility (no matter if you leave 'leaf' around as deprecated). - Volodya

John Femiani

7:52 a.m.

Volodya wrote:

...

David Abrahams wrote: [snip]

...
It's easy enough to leave them deprecated, or even officially removed, but available for backward-compatibility.

Did you read all the messages in this thread? If you want to rename 'leaf' to 'basename', you cannot do that without breaking backward compatibility (no matter if you leave 'leaf' around as deprecated).

Is there anything wrong with back()? Can we add back() and deprecate (not remove, just deprecate) leaf()? -- John

David Abrahams

25 May 25 May

5:55 a.m.

on Fri May 23 2008, Vladimir Prus <vladimir-AT-codesourcery.com> wrote:

...

David Abrahams wrote:

...
...
...
If possible, at this late stage, the names should be changed.

<another universe mode> Of course, since boost.filesystem is used by exactly zero real-world projects right now (because nobody was able to grok the meaning of 'leaf'), it's OK to change the names to more sane ones. </another universe mode>

<this universe mode> Given that boost.filesystem appears to be highly popular, and apparently users don't care about conceptual clarity of 'leaf', changing those names will basically cause everybody to change, or conditionally change, their code, without any practical benefit. </this universe mode>

It's easy enough to leave them deprecated, or even officially removed, but available for backward-compatibility.

Did you read all the messages in this thread? If you want to rename 'leaf' to 'basename', you cannot do that without breaking backward compatibility (no matter if you leave 'leaf' around as deprecated).

I guess I would have screamed a lot louder about "leaf" in 2003 if I had realized that the library was also inventing a new meaning for "basename" in a domain where it is already heavily used to mean something else. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams

5:53 a.m.

on Wed May 21 2008, Vladimir Prus <vladimir-AT-codesourcery.com> wrote:

...

Johan Råde wrote:

...
David Abrahams wrote:

...
I was just reviewing the filesystem docs and came across "leaf()". I'm sure this isn't the first time I've seen it, but this time I picked up a little semantic dissonance. Normally we think of "leaf" in the context of a tree as being a thing with no children. An interior node like a directory that has files or other directories in it is usually not called a "leaf." I wonder if this is the best possible name?

Is there a precedent we can draw on in some other language/library? In python, it's os.path.basename(p). Perl, php, and the posix basename command seem to do something similar.

David is absolutely right.

The name leaf is the result of conceptual confusion, failure to distinguish between the tree structure of the file system and the linear structure of a path.

If possible, at this late stage, the names should be changed.

<another universe mode> Of course, since boost.filesystem is used by exactly zero real-world projects right now (because nobody was able to grok the meaning of 'leaf'), it's OK to change the names to more sane ones. </another universe mode>

<this universe mode> Given that boost.filesystem appears to be highly popular, and apparently users don't care about conceptual clarity of 'leaf', changing those names will basically cause everybody to change, or conditionally change, their code, without any practical benefit. </this universe mode>

I think conceptual clarity and the use of accepted terminology will be a big deal in the future for those people coming from other languages and environments, and if this library is successful that group will be way bigger than the library's current userbase is. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
