[filesystem] size() on directory?

How should size( "foo" ) behave if "foo" is a directory?
1) Throw. The function should be renamed file_size().
2) Return 0. Directories don't strictly speaking have a size.
3) Return the number of entries in the directory. Directories are really just containers; so of course they have a size.
(1) and (3) seem most attractive to me; (2) seems useless for directories, and probably misleading.
The problem with (3) is that for many (most?) operating systems the implementation would have to iterate over the directory to get a count, and this difference in complexity compared to getting the size of a file (which is usually constant complexity) could be pretty surprising.
I'm inclined to go with (1), naming the function "file_size", and then add a function later to get the number of entries in a directory. But only if there is enough demand. I don't want to clutter up the library with a lot of seldom used functions that can be trivially written as user code.
Thoughts?
--Beman

Beman Dawes wrote:
How should size( "foo" ) behave if "foo" is a directory?
1) Throw. The function should be renamed file_size().
2) Return 0. Directories don't strictly speaking have a size.
3) Return the number of entries in the directory. Directories are really just containers; so of course they have a size.
(1) and (3) seem most attractive to me; (2) seems useless for directories, and probably misleading.
The problem with (3) is that for many (most?) operating systems the implementation would have to iterate over the directory to get a count, and this difference in complexity compared to getting the size of a file (which is usually constant complexity) could be pretty surprising.
I'm inclined to go with (1), naming the function "file_size", and then add a function later to get the number of entries in a directory. But only if there is enough demand. I don't want to clutter up the library with a lot of seldom used functions that can be trivially written as user code.
Thoughts?
I agree with your conclusion. Asking for the file_size of a directory is just wrong. One should throw exceptions when a member function is used on an object which is not applicable to the actual type or state of the object. That way it is clearest that the call is incorrect. Anything else masks the incorrectness of the call and one does not want to do that.
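A minimal sketch of what option (1) could look like in user code. It is written against C++17 std::filesystem rather than the library version under discussion in this thread, and the helper name file_size_or_throw and the choice of std::invalid_argument are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper illustrating option (1): the size query is only
// meaningful for regular files and throws for everything else.
std::uintmax_t file_size_or_throw(const fs::path& p) {
    // fs::status() follows symlinks, so a link to a regular file is accepted.
    // A nonexistent path yields file_type::not_found and is rejected too.
    if (!fs::is_regular_file(fs::status(p))) {
        throw std::invalid_argument("file_size: not a regular file: " + p.string());
    }
    return fs::file_size(p);
}
```

Called on a regular file this returns its size in bytes; called on a directory it throws instead of returning a value that could be mistaken for a size.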

On Mon, Feb 02, 2004 at 12:14:47PM -0500, Edward Diener wrote:
Beman Dawes wrote:
How should size( "foo" ) behave if "foo" is a directory?
1) Throw. The function should be renamed file_size().
[...]
I agree with your conclusion. Asking for the file_size of a directory is just wrong.
What's wrong about it? Example: If I do a "ls -lh" in my boost directory (Linux, Reiser FS), the first two lines I get are:
drwxr-xr-x 31 cludwig users 3544 2003-11-27 11:06 boost
-r-xr-xr-x 1 cludwig users 298 2003-11-27 11:06 boost-build.jam
The system tells me that the special file containing the directory entries for the subdir "boost" occupies 3.5 KByte.
I also checked the output of ls on a Solaris machine (with a ufs filesystem, I assume) and on a FAT32 filesystem. There the situation is similar; the only difference is that the reported size is always a multiple of the file system's block size.
I don't know about other filesystems, but in the examples above it makes perfect sense to ask for the size of a directory (rather than the sum of the sizes of its contents). Beman's list of options therefore has to be extended:
4) Return the size of the special file containing the directory information.
Provided it can be implemented on other systems, too, this would be the most natural behaviour in my opinion.
Regards
Christoph
--
http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/cludwig.html
LiDIA: http://www.informatik.tu-darmstadt.de/TI/LiDIA/Welcome.html
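Option (4) can be sketched on a POSIX system as follows. The helper name directory_object_size is hypothetical, and the value it returns is simply whatever stat() reports in st_size for the directory; that number is filesystem-dependent, so the sketch makes no promise beyond passing it through.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper illustrating option (4): report the size the
// operating system records for the directory object itself (POSIX only).
off_t directory_object_size(const std::string& path) {
    struct stat st{};
    if (::stat(path.c_str(), &st) != 0) {
        throw std::runtime_error("stat failed for " + path);
    }
    if (!S_ISDIR(st.st_mode)) {
        throw std::invalid_argument(path + " is not a directory");
    }
    // Filesystem-dependent: this is the number "ls -l" prints for a directory.
    return st.st_size;
}
```

The result should match the size column that "ls -ld" prints for the same directory.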

Beman Dawes <bdawes@acm.org> writes:
How should size( "foo" ) behave if "foo" is a directory? 1) Throw. The function should be renamed file_size().
[snip]
I'm inclined to go with (1), naming the function "file_size", and then add a function later to get the number of entries in a directory. But only if there is enough demand. I don't want to clutter up the library with a lot of seldom used functions that can be trivially written as user code.
I agree that (1) is the best option. Wanting to know the number of entries in a directory is less common, and can be written as:
distance(directory_iterator(ph), directory_iterator())
The user may also wish to know the recursive size of all entries in the directory, but that is probably better handled by a more generic path visitor system.
-- Jeremy Maitin-Shepard
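The one-liner above, expanded into a compilable sketch. It uses C++17 std::filesystem, which postdates this thread; the function names entry_count and recursive_size are made up for the example, and the recursive variant is only one possible reading of "recursive size" (it sums regular files and ignores everything else).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>

namespace fs = std::filesystem;

// Number of immediate entries in a directory. This walks the whole
// directory, so the cost grows with the number of entries.
std::size_t entry_count(const fs::path& dir) {
    return static_cast<std::size_t>(
        std::distance(fs::directory_iterator(dir), fs::directory_iterator()));
}

// Sum of the sizes of all regular files below dir (assumed meaning of
// "recursive size"). Symlinks to regular files are counted via their target.
std::uintmax_t recursive_size(const fs::path& dir) {
    std::uintmax_t total = 0;
    for (const fs::directory_entry& e : fs::recursive_directory_iterator(dir)) {
        if (e.is_regular_file()) {
            total += e.file_size();
        }
    }
    return total;
}
```

Both functions throw std::filesystem::filesystem_error if the directory cannot be opened.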

Walter Landry <wlandry@ucsd.edu> writes:
Beman Dawes <bdawes@acm.org> wrote:
How should size( "foo" ) behave if "foo" is a directory?
What about symlinks?
I think those should be resolved by default, but there should be some way to request that they not be resolved (not particularly useful with this function though). -- Jeremy Maitin-Shepard

At 05:20 PM 2/2/2004, Walter Landry wrote:
Beman Dawes <bdawes@acm.org> wrote:
How should size( "foo" ) behave if "foo" is a directory?
What about symlinks?
Good question. The POSIX implementation just committed uses stat(). My inclination is that this is desirable, but I'm not a POSIX user so I could easily be wrong. Opinions? --Beman

Beman Dawes wrote:
At 05:20 PM 2/2/2004, Walter Landry wrote:
Beman Dawes <bdawes@acm.org> wrote:
How should size( "foo" ) behave if "foo" is a directory?
What about symlinks?
Good question. The POSIX implementation just committed uses stat(). My inclination is that this is desirable, but I'm not a POSIX user so I could easily be wrong.
Opinions?
The implementation should return the size of the file that would be opened if the pathname of the symlink were used for opening, i.e. it should return the size of the file being referred to by the symlink. So, IMHO, the implementation is correct and using lstat would be wrong. Regards, m
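To make the stat() versus lstat() difference concrete, here is a small POSIX-only sketch. The two function names are invented for the example and are not part of the library being discussed.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Size of the file the path ultimately refers to: stat() follows symlinks.
off_t size_following_links(const std::string& path) {
    struct stat st{};
    if (::stat(path.c_str(), &st) != 0) {
        throw std::runtime_error("stat failed for " + path);
    }
    return st.st_size;
}

// Size recorded for the path itself: lstat() does not follow a final
// symlink, so for a link this is the length of the stored target path.
off_t size_of_link_itself(const std::string& path) {
    struct stat st{};
    if (::lstat(path.c_str(), &st) != 0) {
        throw std::runtime_error("lstat failed for " + path);
    }
    return st.st_size;
}
```

For a path that is not a symlink the two functions agree; they differ only when the final path component is a link.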

On 2/2/04 11:32 AM, "Beman Dawes" <bdawes@acm.org> wrote:
How should size( "foo" ) behave if "foo" is a directory?
1) Throw. The function should be renamed file_size().
2) Return 0. Directories don't strictly speaking have a size.
3) Return the number of entries in the directory. Directories are really just containers; so of course they have a size.
(1) and (3) seem most attractive to me; (2) seems useless for directories, and probably misleading.
The problem with (3) is that for many (most?) operating systems the implementation would have to iterate over the directory to get a count, and this difference in complexity compared to getting the size of a file (which is usually constant complexity) could be pretty surprising.
I'm inclined to go with (1), naming the function "file_size", and then add a function later to get the number of entries in a directory. But only if there is enough demand. I don't want to clutter up the library with a lot of seldom used functions that can be trivially written as user code.
I guess that the file-system library is based around the Unix/POSIX model of single-part files, purely-container directories, and a minimum of attributes (creation/modification dates, permissions, etc.). But there are file systems beyond that. Some of them support arbitrary attributes. Some of them support files with multiple parts, an extension of the "resource fork" idea from pre-X Macs. Most importantly, these features are also supported for directories. In this case, files and directories are identical, except that files do _not_ have the ability to "contain" other files.
With this sort of file system, a "size" function for directories does make sense. Such a function would return the sum of the sizes of the file object's forks. If you go along with this, we should add APIs to access properties per individual fork, and add a new corresponding I/O-stream.
IIUC, examples of these advanced file systems would be Apple's and Microsoft's latest offerings. I think Apple started it first, but has totally disabled it except what's needed to support pre-X files. Microsoft doesn't need it at all, but has it fully enabled! (I've read that this decision has resulted in some security bloopers. For example, someone could look at an empty file in his/her text editor and wonder why it takes up so much disk space, not knowing that the data was placed in a custom fork and an old text editor can't recognize that.)
Does any Unix(-like) system support this idea, besides Mac OS X? I've heard that Linux was experimenting with this. Obviously, any non-advanced file system could be supported with any new APIs we make; just assume that the file system supports only one fork (with an empty name?).
-- Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com

IIUC, examples of these advanced file systems would be Apple's and Microsoft's latest offerings. I think Apple started it first, but has totally disabled it except what's needed to support pre-X files. Microsoft doesn't need it at all, but has it fully enabled! (I've read that this decision has resulted in some security bloopers. For example, someone could look at an empty file in his/her text editor and wonder why it takes up so much disk space, not knowing that the data was placed in a custom fork and an old text editor can't recognize that.)
Does any Unix(-like) system support this idea, besides Mac OS X? I've heard that Linux was experimenting with this. Obviously, any non-advanced file system could be supported with any new APIs we make; just assume that the file system supports only one fork (with an empty name?).
Hi,
Apple and MS call this feature "multiple-stream files", so we could use that name too :-) Apple did it first, and MS copied it (actually, improved it a lot), and MS does use it: ever wonder where the author field of each file comes from? Anyway, you can create and open each stream as a separate file; if you want to read stream bar of file foo, just open foo:bar.
My 0.02 pesos
Lucas/

participants (8)
- Beman Dawes
- Christoph Ludwig
- Daryle Walker
- Edward Diener
- Jeremy Maitin-Shepard
- Lucas Galfaso
- Martin Wille
- Walter Landry