
Beman Dawes <bdawes@acm.org> writes:
[snip: path ordering]
The primary use case I know of for operator<() is default ordering for paths used as keys in associative containers. I can't see that either approach is superior for this use, so unless someone comes up with a compelling argument, (1) will be used.
I would suggest lexicographical ordering of the components, i.e. option (2). Ordering based on the ``portable'' path representation would, I think, be confusing on platforms whose native path format is not identical or very similar to the portable format. Furthermore, the primary if not exclusive purpose of the portable path format is to allow storing (relative) paths as string constants. Many users may not need that functionality, and thus will never use the portable path representation.
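To illustrate what I mean by ordering on the components, here is a minimal sketch. It is not the library's code: split_components, component_less and ordering_examples are hypothetical names, and paths are treated as plain '/'-separated strings (leading and repeated slashes are simply ignored).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split a '/'-separated path string into its components.
// Empty components (leading, trailing or doubled slashes) are dropped.
std::vector<std::string> split_components(const std::string& path) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : path) {
        if (c == '/') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}

// Option (2): compare component by component rather than character by character.
bool component_less(const std::string& lhs, const std::string& rhs) {
    const std::vector<std::string> a = split_components(lhs);
    const std::vector<std::string> b = split_components(rhs);
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
}

// Usage sketch: the two orderings can disagree.
inline void ordering_examples() {
    // Component-wise: {"a", "b"} sorts before {"a.b"} because "a" < "a.b".
    assert(component_less("a/b", "a.b"));
    // Character-wise: '.' sorts before '/', so the string order is reversed.
    assert(std::string("a.b") < std::string("a/b"));
}
```

Under the component ordering, a directory sorts immediately before its own contents regardless of which separator character the platform uses.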
Equivalence
-----------
Two paths will be considered equivalent if they resolve to the same physical directory or file.
Question 1: What is a use case that requires this function? Verifying that source and target files are different before some modifying operation is the only one I've come up with. I guess that is sufficient to justify adding the function.
Following directory trees is the common use case. Of course, without a reliable file identifier number, actually using this function would be highly inefficient. A link_count function would also be useful for supporting certain logic for dealing with making backup files, such as move if the file is linked only once, otherwise copy. In addition, useful functionality that could be implemented at a later point would be a unique file identifier object, which keeps an open file handle/descriptor, to ensure that the identifier remains valid. Then the object could be used as a key in associative containers, and allow for efficient implementation of directory recursion.
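A rough sketch of such an identifier object follows, for POSIX only. The class name file_identifier and its members are hypothetical, and the sketch assumes that the pair (device id, inode number) identifies a file for as long as a descriptor to it stays open.

```cpp
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical class, POSIX only: identifies a file by (device id, inode number)
// and keeps a descriptor open for the lifetime of the object.
class file_identifier {
public:
    explicit file_identifier(const std::string& path)
        : fd_(::open(path.c_str(), O_RDONLY)) {
        if (fd_ < 0)
            throw std::runtime_error("open failed for " + path + ": " +
                                     std::strerror(errno));
        if (::fstat(fd_, &st_) != 0) {
            const int saved = errno;
            ::close(fd_);
            throw std::runtime_error("fstat failed for " + path + ": " +
                                     std::strerror(saved));
        }
    }

    ~file_identifier() {
        assert(fd_ >= 0);  // the constructor either throws or leaves a valid descriptor
        ::close(fd_);
    }

    // Copying would need dup(2); it is left out of this sketch.
    file_identifier(const file_identifier&) = delete;
    file_identifier& operator=(const file_identifier&) = delete;

    // Number of hard links, as reported by fstat at construction time.
    nlink_t link_count() const { return st_.st_nlink; }

    // Ordering on (device id, inode number), as an associative container needs.
    friend bool operator<(const file_identifier& a, const file_identifier& b) {
        if (a.st_.st_dev != b.st_.st_dev) return a.st_.st_dev < b.st_.st_dev;
        return a.st_.st_ino < b.st_.st_ino;
    }

    friend bool operator==(const file_identifier& a, const file_identifier& b) {
        return a.st_.st_dev == b.st_.st_dev && a.st_.st_ino == b.st_.st_ino;
    }

private:
    int fd_;
    struct stat st_;
};
```

With something like this, the backup logic becomes a test of link_count() == 1 (move) versus anything larger (copy), and directory recursion can remember visited directories by identifier instead of comparing every pair of paths.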
Question 2: What if neither exist? Only one exists? My initial thought is that these are likely to be errors, so treat them as such. It could be argued that if either or both don't exist, they can't be equivalent, so return false.
I would suggest that the function throw an exception if either file does not exist. The exception would allow the user to determine exactly which paths exist or do not exist. Any other behavior, given that the function can return only true or false, would in some circumstances give the user less information than desired.
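As a sketch of the throwing behavior I have in mind (POSIX only, and not the proposed implementation): equivalence_error and its two flags are hypothetical names, and any stat(2) failure is treated as "missing" here for brevity. It also uses stat(2) rather than fstat(2) to keep the example short; see the note at the end of this message.

```cpp
#include <sys/types.h>
#include <sys/stat.h>

#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type: records which of the two paths could not be examined.
struct equivalence_error : std::runtime_error {
    equivalence_error(bool first, bool second)
        : std::runtime_error("equivalent: one or both paths cannot be examined"),
          first_missing(first), second_missing(second) {}
    bool first_missing;   // true if stat failed on the first path
    bool second_missing;  // true if stat failed on the second path
};

// Sketch: two paths are equivalent if they name the same (device id, inode number).
// Throws equivalence_error instead of returning false when a path is missing.
bool equivalent(const std::string& p1, const std::string& p2) {
    struct stat s1, s2;
    const bool ok1 = ::stat(p1.c_str(), &s1) == 0;
    const bool ok2 = ::stat(p2.c_str(), &s2) == 0;
    if (!ok1 || !ok2) throw equivalence_error(!ok1, !ok2);
    return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
}

// Usage sketch: "/" and "/." are two spellings of the same directory.
inline void equivalent_example() {
    assert(equivalent("/", "/."));
}
```

A caller who prefers the "return false" behavior can catch the exception and get it; the reverse is not possible, which is why I favor throwing.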
Question 3: The implementation on Windows (see below) leaves a small hole in that duplicated media (such as two CDs) mounted on devices with the same device id on two different networked machines would be reported as equivalent.
Does Windows actually assign networked devices device ids which are also used for local devices? If it does, then disregard comments below about use of device id exclusively.
POSIX requires that such networked devices have different device ids, avoiding the problem. Is the fact that Windows and POSIX implementations would perform slightly differently on this corner case a showstopper? I think not.
Windows logic for path equivalence: same device id AND same media volume serial number AND same physical location on disk AND same creation time. This works even in degenerate cases like camera-formatted FAT flash memory cards or floppy disks with volume serial numbers incorrectly initialized to 0.
Why not use exclusively the device id and ignore the media volume serial number? Shouldn't that solve the problems? I wouldn't be too worried about broken device ids, and I don't like the idea of using hacks like modification time. Before using modification time, it would be useful to determine if there are versions of Windows that sometimes give two devices the same device id (this really does seem highly unlikely).
POSIX logic: same device id AND same physical location on disk AND same modification time. The modification time is in theory redundant, but is an added protection in case the device id on networked devices failed to meet the POSIX specs.
As with Windows, do you know of any POSIX platforms that sometimes give two devices the same device id?
Note: the sample code I posted incorrectly used stat(2) instead of fstat(2). fstat should be used to ensure that the file identifier remains valid, and that the file is not removed, changed, etc.
--
Jeremy Maitin-Shepard