Name checking in filesystem

Greetings,

I have come across an annoying problem with the filesystem library. I have wrapped filesystem with Swig so that I can call it from python. I had to rewrite the filesystem::path implementation a little because Swig can't handle nested classes. In any case, once I got that sorted out, I wanted to do something like

p=path(".bar")

which doesn't work because filenames starting with a period are not allowed by the default name checking routine. Argh.

Unfortunately, I can't figure out how to change the default name checker from python. If I worked at it long enough, I could probably work it out. But I am sure that I don't want to have to run something like path.default_name_check(no_check) at the beginning of every 10-line script. It is a particular piece of functionality (name checking) that just gets in the way. I would wager that most people using boost::filesystem from scripts would not want to have to deal with it, since it is completely different from every other portable, cross-platform filesystem API.

So to make a long story short, I think the default for boost::filesystem should be not to check for "portable" paths. This is different from a position I took earlier, where I thought that it is ok if I only have to set it once at the beginning. That is acceptable for large programs, but not for small ones.

Regards,
Walter Landry
wlandry@ucsd.edu

At 11:25 PM 9/26/2004, Walter Landry wrote:
> Greetings,
> I have come across an annoying problem with the filesystem library. I have wrapped filesystem with Swig so that I can call it from python. I had to rewrite the filesystem::path implementation a little because Swig can't handle nested classes. In any case, once I got that sorted out, I wanted to do something like
> p=path(".bar")
> which doesn't work because filenames starting with a period are not allowed by the default name checking routine. Argh.
> Unfortunately, I can't figure out how to change the default name checker from python. If I worked at it long enough, I could probably work it out. But I am sure that I don't want to have to run something like path.default_name_check(no_check) at the beginning of every 10-line script. It is a particular piece of functionality (name checking) that just gets in the way. I would wager that most people using boost::filesystem from scripts would not want to have to deal with it, since it is completely different from every other portable, cross-platform filesystem API.
> So to make a long story short, I think the default for boost::filesystem should be not to check for "portable" paths. This is different from a position I took earlier, where I thought that it is ok if I only have to set it once at the beginning. That is acceptable for large programs, but not for small ones.
Peter Dimov and others have also argued that the default is wrong, and I'm sympathetic to their arguments. Some people may believe the default should be "native" rather than "no_check". Either of those would be breaking changes for some current programs which use the library, and we would have to figure out a way to deal with that.

--Beman

Beman Dawes <bdawes@acm.org> wrote:
> At 11:25 PM 9/26/2004, Walter Landry wrote:
>> Greetings,
>> I have come across an annoying problem with the filesystem library. I have wrapped filesystem with Swig so that I can call it from python. I had to rewrite the filesystem::path implementation a little because Swig can't handle nested classes. In any case, once I got that sorted out, I wanted to do something like
>> p=path(".bar")
>> which doesn't work because filenames starting with a period are not allowed by the default name checking routine. Argh.
>> Unfortunately, I can't figure out how to change the default name checker from python. If I worked at it long enough, I could probably work it out. But I am sure that I don't want to have to run something like path.default_name_check(no_check) at the beginning of every 10-line script. It is a particular piece of functionality (name checking) that just gets in the way. I would wager that most people using boost::filesystem from scripts would not want to have to deal with it, since it is completely different from every other portable, cross-platform filesystem API.
>> So to make a long story short, I think the default for boost::filesystem should be not to check for "portable" paths. This is different from a position I took earlier, where I thought that it is ok if I only have to set it once at the beginning. That is acceptable for large programs, but not for small ones.
> Peter Dimov and others have also argued that the default is wrong, and I'm sympathetic to their arguments. Some people may believe the default should be "native" rather than "no_check".
If I so desired, I could mount HFS+, BeFS, JFS, FFS, BFS, ADFS, FAT, VFAT, NTFS, ext2/3, XFS, UMSDOS, Reiserfs, ISO 9660, and UDF on my machine. Which one is "native"? If the intent is to make sure that all paths can actually be accessed on the machine, then you don't need to do any checks. The operating system does that for you. If you are not actually opening files, then perhaps you don't need this check anyway? Besides, doing any checking implies a (perhaps mild) performance hit, and I don't want to have to jump through hoops to get rid of something I don't need.
> Either of those would be breaking changes for some current programs which use the library, and we would have to figure out a way to deal with that.
A compile-time option? Users who want the old behavior can compile with BOOST_FILESYSTEM_PORTABLE_DEFAULT defined.

Regards,
Walter Landry
wlandry@ucsd.edu

Walter Landry wrote:
> If I so desired, I could mount HFS+, BeFS, JFS, FFS, BFS, ADFS, FAT, VFAT, NTFS, ext2/3, XFS, UMSDOS, Reiserfs, ISO 9660, and UDF on my machine. Which one is "native"?
Doesn't the OS you're running decide what is allowed for it, and therefore the OS name check is 'native'?

At 08:31 AM 10/3/2004, Russell Hind wrote:
> Walter Landry wrote:
>> If I so desired, I could mount HFS+, BeFS, JFS, FFS, BFS, ADFS, FAT, VFAT, NTFS, ext2/3, XFS, UMSDOS, Reiserfs, ISO 9660, and UDF on my machine. Which one is "native"?
> Doesn't the OS you're running decide what is allowed for it, and therefore the OS name check is 'native'?
That's correct. So the fact that a particular OS can mount many different file systems is not a problem as far as "native" is concerned. However, in trying to redesign Boost.Filesystem to handle wide-character external file names, mounts of different file systems are a serious problem if some support wide-character names (NTFS, for example) and some don't (FAT). AFAICS, the implementation of a wide-character name request will have to determine at runtime whether wide characters are supported by the actual file system, and act accordingly.

--Beman

Walter Landry <wlandry@ucsd.edu> writes:
>> Peter Dimov and others have also argued that the default is wrong, and I'm sympathetic to their arguments. Some people may believe the default should be "native" rather than "no_check".
> If I so desired, I could mount HFS+, BeFS, JFS, FFS, BFS, ADFS, FAT, VFAT, NTFS, ext2/3, XFS, UMSDOS, Reiserfs, ISO 9660, and UDF on my machine. Which one is "native"?
> If the intent is to make sure that all paths can actually be accessed on the machine, then you don't need to do any checks. The operating system does that for you. If you are not actually opening files, then perhaps you don't need this check anyway?
> Besides, doing any checking implies a (perhaps mild) performance hit, and I don't want to have to jump through hoops to get rid of something I don't need.
I've said it before, but I always found the checking to be much more of a hindrance than a help.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

On Sun, 03 Oct 2004 18:31:42 -0400, Beman Dawes wrote
> At 10:20 AM 10/3/2004, David Abrahams wrote:
>> I've said it before, but I always found the checking to be much more of a hindrance than a help.
> So presumably you would be in favor of changing the default to "no_check"?
I'll just chime in since we're voting ;-) As an early proponent, and still a user of the portable option, I think either native or no_check is fine as the default. I don't have enough code to really even care about the backward compatibility macro -- it's trivial enough to switch over...

Jeff

Beman Dawes <bdawes@acm.org> writes:
> At 10:20 AM 10/3/2004, David Abrahams wrote:
>> I've said it before, but I always found the checking to be much more of a hindrance than a help.
> So presumably you would be in favor of changing the default to "no_check"?
I think so.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

At 07:40 PM 10/3/2004, David Abrahams wrote:
> Beman Dawes <bdawes@acm.org> writes:
>> At 10:20 AM 10/3/2004, David Abrahams wrote:
>>> I've said it before, but I always found the checking to be much more of a hindrance than a help.
>> So presumably you would be in favor of changing the default to "no_check"?
> I think so.
Unless strong objections arise, I'll make the change to the main trunk after the 1.32 branch for release. That will give us plenty of time to work out any kinks before release to the general public via 1.33.

--Beman

Beman Dawes wrote:
> At 07:40 PM 10/3/2004, David Abrahams wrote:
>> Beman Dawes <bdawes@acm.org> writes:
>>> At 10:20 AM 10/3/2004, David Abrahams wrote:
>>>> I've said it before, but I always found the checking to be much more of a hindrance than a help.
>>> So presumably you would be in favor of changing the default to "no_check"?
>> I think so.
> Unless strong objections arise, I'll make the change to the main trunk after the 1.32 branch for release. That will give us plenty of time to work out any kinks before release to the general public via 1.33.
> --Beman
Greetings,

I am new to Boost Filesystem and my involvement is related to one of my courses. As a project, we have to analyse a request for change, and we can then submit our "contribution" by posting it on the list. The first part of my work was to analyse a request. The second part is this post. The analysis can be found at the following link (http://perso.efrei.fr/~galmes/boost/name_check.html). It explains the problem in detail using examples.

The request is the following: the automatic name checking functionality in the Boost library is turned on by default. That means boost::filesystem will check for "portable" paths. The question is then: should the "name check" functionality be turned on by default? Which of portable_name, native and no_check would be the best choice?

This mail is divided into three parts. The first one presents the conclusions drawn from the analysis. The second part is the argumentation for the results. The last part is composed of some suggestions after having spent some hours going through the library. I apologize for the length of this post and hope that it will help to improve the boost::filesystem library.

I - The results

Here is what I found out from that analysis (which should not be big news :-):
- by default, the "no_check" option seems to be the best.
- "native" would be the best once it supports checks for multiple file systems. It may then be a good idea if this kind of check is going to work in Boost 1.33!

Suggestions to improve the "usability" of the library:
- the name "native" is not very explicit and tends to confuse users.
- how the "native" option works could be explained in an "easy-to-understand" way in the documentation.

II - Argumentation

The automatic name checking functionality can behave in many different ways, and of those, only three could be used as default values:
* portable_name: check if the path is "portable" (default in Boost 1.32).
* native: check if the name is valid for the OS being used to execute the program.
* no_check: do not perform any check.

We will then try to show which of those is the best. For this, I try to make an "objective" analysis that quantifies the choices. This is one way to try to solve the problem, but it might not be the best one: it takes a lot of explanation and may just lose most of the readers ;).

Here is a summary of the pros and cons of each choice, taken from the previous posts.

a - portable_name (default in Boost 1.32)
* Pro 1: It is the check that requires the strictest names to ensure portability. Programs using it should not have any path portability problems on the most common operating systems.
* Con 1: Enabling name checking by default prevents users from writing programs with no name constraints. Such non-portable programs might constitute the majority of programs written using boost::filesystem (http://tinyurl.com/5mb5r).
* Con 2: Checking implies a performance hit (see http://tinyurl.com/6esmw).

b - no_check
* Pro 1: It does not put any constraints on users with no need for portability, who should represent the majority of users (see http://tinyurl.com/5mb5r).
* Pro 2: The name "no_check" is explicit (compared to "native" or "portable_name").

c - native
* Pro 1: It is less restrictive than "portable_name" but still performs some checks for the native operating system.
* Con 1: The name "native" is confusing. It is not explicit enough that the native check works against the operating system, not against the file system (http://tinyurl.com/6esmw).
* Con 2: It gives an illusion of "security"/"portability" on the same operating system. This is false, as the check is done for the operating system, not for the file system. Thus, on the same operating system, portability depends on the file system used. An example of this problem was given in a post by Beman (see http://tinyurl.com/635yn).
d - Different criteria and their importance

Now, here is a table summarizing which of those arguments are important for users who want to use the library to write portable programs, and which are important for "common users" with no need for portability. This is my point of view, and I would be interested in knowing yours. The terms used are explained below the table.

!                         ! portable users ! common users !
!-------------------------!----------------!--------------!
! explicit name           !       +        !      ++      !
! portability check       !      +++       !      -       !
! OS check                !       -        !      +       !
! fs check                !       -        !      ++      !
! performance hit         !       ++       !      ++      !
! security illusion       !       ++       !      +       !
! ease of use             !      +++       !     +++      !

- explicit name: the name of the option (native, no_check...) is self-explanatory about the way the check is done.
  common users (++): this is important so that they are able to use the library without having to read the documentation in detail.
  portable users (+): they will have to read the documentation in more detail anyway in order to know how to produce portable paths. Hence, an explicit name is a less important criterion for choosing a default value.
- portability check: checks that paths will be valid on the most popular platforms (POSIX, Windows). This is the check done by "portable_name".
  common users (-): for common users, any check related to portability is of no importance, as they won't port their programs.
  portable users (+++): this is one of the most important criteria.
- OS check: checks that paths are valid for the current OS.
  common users (+): if paths are not accepted, the program will not work. I only put (+) because users often know which characters are valid for their OS.
  portable users (-): when writing portable programs, you do not really care that a path is valid for your OS: it should be valid for all OSes. This is covered by the portability check.
- fs check: checks the validity of the path for the file system the path tries to access.
  I suppose that this check should be done at runtime, when trying to work with a particular file or directory.
  common users (++): if paths are not accepted, the program will not work. I put (++) because fewer users know which characters are valid for the different file systems they manipulate.
  portable users (-): when writing portable programs, you do not really care that a path is valid for your fs: it should be valid for all file systems. This is covered by the portability check.
- ease of use: is the library easy to use for the user's aims? Does the user have to write many lines of code in order to achieve what he wants?
  all users (+++): this is the most important feature. A library should not force users to reconfigure it all the time just so that the checks succeed.
- performance hit: does using a check imply a performance hit?
  all users (++): this is also an important feature. When coding a program, users always want it to run fast. This is an important criterion, but less important than ease of use, especially if the performance hit is mild.
- security illusion: does the check give the illusion that the program will work correctly on different platforms?
  common users (+): this is not a big deal, as portability is not their first concern when writing a program.
  portable users (++): if a program written to be portable only gives an illusion of portability, this is a real problem.

e - Different criteria and their availability

Here is a table showing, for each of the three options, its behavior for the criteria listed above. The characters '+' and '-' represent the "points" given to an option for a criterion. The ease-of-use criterion is split between the two kinds of users.

Table of "appearance":

!                         ! portable_name !  native  ! no_check !
!-------------------------!---------------!----------!----------!
! explicit name           !       +       !    -     !    ++    !
! portability check       !      ++       !    +     !    -     !
! OS check                !      ++       !    ++    !    -     !
! fs check                !       -       !    -     !    -     !
! no performance hit      !       -       !    -     !    ++    !
! no security illusion    !       -       !    -     !    ++    !
! ease of use (common)    !       -       !    ++    !    ++    !
! ease of use (portable)  !      ++       !    -     !    -     !

f - The best option

We can now try to satisfy the most users by comparing the criteria in a "mathematical" way. As a first thought, we can suppose that the majority of users would use boost::filesystem for non-portable programs. Let's say 80% will write programs without any need for portability. From this, we can deduce which option fits best.

For each option and each kind of user, we compute:

option = explicit importance * appearance
       + portability check importance * appearance
       + OS check importance * appearance
       + fs check importance * appearance
       + ease of use importance * appearance
       + no performance hit importance * appearance
       + no security illusion importance * appearance

Using the tables, with '+' = 1 point and '-' = 0 points, we find:

Common users:

!                         ! portable_name !   native ! no_check !
!-------------------------!---------------!----------!----------!
! explicit name           !         2 * 1 !    2 * 0 !    2 * 2 !
! portability check       !       + 0 * 2 !  + 0 * 1 !  + 0 * 0 !
! OS check                !       + 1 * 2 !  + 1 * 2 !  + 1 * 0 !
! fs check                !       + 2 * 0 !  + 2 * 0 !  + 2 * 0 !
! no performance hit      !       + 2 * 0 !  + 2 * 0 !  + 2 * 2 !
! no security illusion    !       + 1 * 0 !  + 1 * 0 !  + 1 * 2 !
! ease of use             !       + 3 * 0 !  + 3 * 2 !  + 3 * 2 !
!-------------------------!---------------!----------!----------!
! sum                     !             4 !        8 !       16 !

Portable users:

!                         ! portable_name !   native ! no_check !
!-------------------------!---------------!----------!----------!
! explicit name           !         1 * 1 !    1 * 0 !    1 * 2 !
! portability check       !       + 3 * 2 !  + 3 * 1 !  + 3 * 0 !
! OS check                !       + 0 * 2 !  + 0 * 2 !  + 0 * 0 !
! fs check                !       + 0 * 0 !  + 0 * 0 !  + 0 * 0 !
! no performance hit      !       + 2 * 0 !  + 2 * 0 !  + 2 * 2 !
! no security illusion    !       + 1 * 0 !  + 1 * 0 !  + 1 * 2 !
! ease of use             !       + 3 * 2 !  + 3 * 0 !  + 3 * 0 !
!-------------------------!---------------!----------!----------!
! sum                     !            13 !        3 !        8 !

We can now use the assumption that 80% of the users are common users to compute which option suits the most users:

portable_name = 0.8 * 4  + 0.2 * 13 = 3.2  + 2.6 = 5.8
native        = 0.8 * 8  + 0.2 * 3  = 6.4  + 0.6 = 7.0
no_check      = 0.8 * 16 + 0.2 * 8  = 12.8 + 1.6 = 14.4

We can then deduce that the no_check option best suits most common needs!

g - The native problem

In version 1.32 of Boost, the native option has the following problem: the checks are done so that the path will be accepted by the operating system used to compile, not by the file system the application will access. This is confusing, as by "native" I would expect the checks to work on the platform for which the program was compiled; I personally do not think about file systems. In my opinion, this is what confuses users and gives a security illusion.

I read that Beman was working to change the behavior of the native option so that it would also check the file system. If we compute the results for that case, we find that the native option would then fit best:

!                         ! portable_name !  native  ! no_check !
!-------------------------!---------------!----------!----------!
! explicit name           !       +       !    ++    !    ++    !
! portability check       !      ++       !    +     !    -     !
! OS check                !      ++       !    ++    !    -     !
! fs check                !       -       !    ++    !    -     !
! no performance hit      !       -       !    -     !    ++    !
! no security illusion    !       -       !    +     !    ++    !
! ease of use (common)    !       -       !    ++    !    ++    !
! ease of use (portable)  !      ++       !    -     !    -     !

Common users:

!                         !   native !
!-------------------------!----------!
! explicit name           !    2 * 2 !
! portability check       !  + 0 * 1 !
! OS check                !  + 1 * 2 !
! fs check                !  + 2 * 2 !
! no performance hit      !  + 2 * 0 !
! no security illusion    !  + 1 * 1 !
! ease of use             !  + 3 * 2 !
!-------------------------!----------!
! sum                     !       17 !

Portable users:

!                         !   native !
!-------------------------!----------!
! explicit name           !    1 * 2 !
! portability check       !  + 3 * 1 !
! OS check                !  + 0 * 2 !
! fs check                !  + 0 * 2 !
! no performance hit      !  + 2 * 0 !
! no security illusion    !  + 1 * 1 !
! ease of use             !  + 3 * 0 !
!-------------------------!----------!
! sum                     !        6 !

native = 0.8 * 17 + 0.2 * 6 = 13.6 + 1.2 = 14.8

This is higher than the 14.4 of no_check. If the problem described above, with multiple file systems mounted, is solved, native would be the best choice. For now, however, in my opinion it would lead to many mistakes because users misunderstand how native works.

h - Limitations

This approach is not really objective, as I was the one choosing the weights of the different criteria, and those are just arbitrary choices. This is especially true for the explicit name and security illusion criteria. On the other hand, the approach has the advantage of "measuring" and giving a solution.

III - Some suggestions about native

The native option is quite confusing. I had to read through the mailing list before being able to understand how it works: the documentation does not explain it in a really explicit way. And I don't think I am the only one, as Walter was also confused (see http://tinyurl.com/6esmw).

Ideas to solve that:
1 -> Give it a more explicit name (OS_native?).
2 -> Change the documentation so that this point is explained a bit more. Why not add an example? The example given by Beman Dawes was quite explicit about how it works!

I hope I didn't bore too many of you! If you are reading this sentence, it is nearly a miracle! :-) Thank you for having taken the time to read through this post!

Cheers,
Pierre-Andre Galmes

Pierre-Andre Galmes wrote:
> [...]
As the indentation of the tables was lost when the message was sent, you can find the previous post at the following link: http://perso.efrei.fr/~galmes/boost/final_post

Cheers,
Pierre-Andre

At 12:14 PM 12/4/2004, Pierre-Andre Galmes wrote:
> I am new to Boost Filesystem and my involvement is related to one of my courses. As a project, we have to analyse a request for change, and we can then submit our "contribution" by posting it on the list. The first part of my work was to analyse a request. The second part is this post. The analysis can be found at the following link (http://perso.efrei.fr/~galmes/boost/name_check.html). It explains the problem in detail using examples.
> The request is the following: the automatic name checking functionality in the Boost library is turned on by default. That means boost::filesystem will check for "portable" paths. The question is then: should the "name check" functionality be turned on by default? Which of portable_name, native and no_check would be the best choice? This mail is divided into three parts. The first one presents the conclusions drawn from the analysis. The second part is the argumentation for the results.
> The last part is composed of some suggestions after having spent some hours going through the library. I apologize for the length of this post and hope that it will help to improve the boost::filesystem library.
Because of the length of the post, I'll probably spread the reply over several messages.
> I - The results
> Here is what I found out from that analysis (which should not be big news :-):
> - by default, the "no_check" option seems to be the best.
I'll comment on this later, although I agree with you that no_check is best.
> - "native" would be the best once it supports checks for multiple file systems. It may then be a good idea if this kind of check is going to work in Boost 1.33!
> Suggestions to improve the "usability" of the library:
> - the name "native" is not very explicit and tends to confuse users.
> - how the "native" option works could be explained in an "easy-to-understand" way in the documentation.
The second path constructor argument "name_check checker" actually passes two pieces of information - the checker to use and the format to be allowed. By combining these into a single argument, the interface is hopefully a bit easier to use, although at the cost of being a bit harder to understand. So when "native" is specified, it is really saying (1) allow the native (rather than portable generic) path format, and (2) possibly saying that some general name checking that would apply to any native formatted path be performed. Note that the path being represented does not have to actually exist, and even if it does exist that could just be happenstance. So what kind of file system (FAT, ISO 9660, etc.) is involved isn't something that a path object knows about. (The operational functions are where we might add a function that queries the actual type of the file system.)
> II - Argumentation
> The automatic name checking functionality can behave in many different ways, and of those, only three could be used as default values:
> * portable_name: check if the path is "portable" (default in Boost 1.32).
> * native: check if the name is valid for the OS being used to execute the program.
> * no_check: do not perform any check.
> We will then try to show which of those is the best. For this, I try to make an "objective" analysis that quantifies the choices. This is one way to try to solve the problem, but it might not be the best one: it takes a lot of explanation and may just lose most of the readers ;).
My usual approach is to first look for "killer arguments". These are arguments which are so strong that they overwhelm other analysis. Peter Dimov has been helping a lot in the "killer argument" department as regards error checking. But if no killer argument is found, then how should a decision be reached?

Your approach of assigning weighted values to pros and cons is something that I used to try a lot, but have given up on in recent years. I think that weighting schemes just end up justifying what I thought anyhow, so they are just rationalizations for what I was going to do anyway. So what I do now is to let the choices bubble around in my mind (often overnight), and then pick the one that seems strongest. I also give weight to the views of experienced designers; thus Peter's views carried a lot of weight even though I disagree with some of his detailed reasoning.

I've run out of time tonight; I'll try to comment further in the next day or two. Good luck with your course!

--Beman

At 10:20 AM 10/2/2004, Walter Landry wrote:
> If I so desired, I could mount HFS+, BeFS, JFS, FFS, BFS, ADFS, FAT, VFAT, NTFS, ext2/3, XFS, UMSDOS, Reiserfs, ISO 9660, and UDF on my machine. Which one is "native"?
Any of those can be "native", since "native" refers to the grammar used by paths rather than which type of file system is mounted.
> If the intent is to make sure that all paths can actually be accessed on the machine, then you don't need to do any checks. The operating system does that for you. If you are not actually opening files, then perhaps you don't need this check anyway?
That's correct. However, some applications wish to check cases where files are not being opened, or are only opened under certain conditions.
> Besides, doing any checking implies a (perhaps mild) performance hit,
That's true, although the hit is so minor it is not nearly as important a consideration as the other points you bring up.
> and I don't want to have to jump through hoops to get rid of something I don't need.
>> Either of those would be breaking changes for some current programs which use the library, and we would have to figure out a way to deal with that.
> A compile-time option? Users who want the old behavior can compile with BOOST_FILESYSTEM_PORTABLE_DEFAULT defined.
That would help the transition. Also, programs which currently change the default to "no_check" would not absolutely have to be changed; the line of code changing the default becomes redundant but is functionally harmless.

Thanks,
--Beman
participants (6)
- Beman Dawes
- David Abrahams
- Jeff Garland
- Pierre-Andre Galmes
- Russell Hind
- Walter Landry