
Beman Dawes wrote:
At 07:40 PM 10/3/2004, David Abrahams wrote:
Beman Dawes <bdawes@acm.org> writes:
At 10:20 AM 10/3/2004, David Abrahams wrote:
I've said it before, but I always found the checking to be much more of a hindrance than a help.
So presumably you would be in favor of changing the default to "no_check"?
I think so.
Unless strong objections arise, I'll make the change to the main trunk after the 1.32 branch for release. That will give us plenty of time to work out any kinks before release to the general public via 1.33.
--Beman
Greetings,

I am new to Boost.Filesystem, and my involvement is related to one of my courses. As a project, we have to analyse a request for change, and we can then submit our "contribution" by posting it on the list. The first part of my work was to analyse a request. The second part is this post.

The analysis can be found at the following link (http://perso.efrei.fr/~galmes/boost/name_check.html). It explains the problem in detail, using examples.

The request is the following: the automatic name checking functionality in the Boost.Filesystem library is turned on by default. That means boost::filesystem will check for "portable" paths. The question is then: should the "name check" functionality be turned on by default? Which of portable_name, native and no_check would be the best choice?

This mail is divided into three parts. The first one presents the conclusions drawn from the analysis. The second part is the argumentation behind the results. The last part is composed of some suggestions, made after having spent some hours going through the library.

I apologize for the length of this post and hope that it will help to improve the boost::filesystem library.

I - The results

Here is what I found out from that analysis (which should not be big news :-):
- as the default, the "no_check" option seems to be the best.
- "native" would be the best once it supports checks against multiple file systems. It may then be a good choice, if this kind of check is going to work in Boost 1.33!

Suggestions to improve the "usability" of the library:
- the name "native" is not very explicit and tends to confuse users.
- how the "native" option works could be explained in an "easy-to-understand" way in the documentation.

II - Argumentation

The automatic name checking functionality can behave in many different ways, and of those, only three could be used as default values:
* portable_name: checks if the path is "portable" (default in Boost 1.32).
* native: checks if the name is valid for the OS being used to execute the program.
* no_check: does not perform any check.

We will then try to show which of those is the best. For this, I try to make an "objective" analysis by quantifying the choices. This is one way to approach the problem, but it might not be the best one: it takes a lot of explanation and may just lose most of the readers ;).

Here is a summary of the pros and cons of each choice, taken from the previous posts.

a - portable_name (default in Boost 1.32)
* Pro 1: It is the check that requires the strictest names to ensure portability. Programs using it should not have any path portability problems on the most common operating systems.
* Con 1: Enabling name checking by default prevents users from writing programs with no name constraints. Those non-portable programs might constitute the majority of programs written using boost::filesystem (http://tinyurl.com/5mb5r).
* Con 2: Checking implies a performance hit (see http://tinyurl.com/6esmw).

b - no_check
* Pro 1: Does not put any constraints on users with no need for portability, who should represent the majority of users (see http://tinyurl.com/5mb5r).
* Pro 2: The name of the "no_check" option is explicit (compared to "native" or "portable_name").

c - native
* Pro 1: Less restrictive than "portable_name", but still performs some checks for the native operating system.
* Con 1: The name "native" is confusing. It is not explicit enough that the native check is done against the operating system, not against the file system (http://tinyurl.com/6esmw).
* Con 2: Gives an illusion of "security"/"portability" on the same operating system. This is false, as the check is done against the operating system, not the file system. Thus, on the same operating system, portability depends on the file system used. An example of this problem was given in a post by Beman (see http://tinyurl.com/635yn).

d - Different matters and their importance

With that in mind, here is a table summarizing which of those arguments are important for users who want to use the library to write portable programs ("portable users"), and which are important for the "common users" with no need for portability. This is my point of view, and I would be interested in knowing yours. The different terms used are explained below.

!                     ! portable users ! common users !
!-----------------------------------------------------!
! explicit            ! +              ! ++           !
! portability check   ! +++            ! -            !
! OS check            ! -              ! +            !
! fs check            ! -              ! ++           !
! performance hit     ! ++             ! ++           !
! security illusion   ! ++             ! +            !
! ease of use         ! +++            ! +++          !

- explicit name: The name of the option (native, no_check...) is self-explanatory about the way the check is done.
    common users (++): this is important so that they are able to use the library without having to read the documentation in detail.
    portable users (+): they will have to read the documentation in more detail anyway in order to know how to produce portable paths. Hence, an explicit name is a less important criterion for choosing a default value.

- portability check: Checks that paths will be valid on the most popular platforms (POSIX, Windows). This is the check done by "portable_name".
    common users (-): any check related to portability is of no importance to them, as they won't port their programs.
    portable users (+++): this is one of the most important criteria.

- OS check: Checks that paths are valid for the current OS.
    common users (+): if paths are not accepted, the program will not work. I only put (+) because users often know which characters are valid for their OS.
    portable users (-): when writing portable programs, you do not really care that a path is valid for your OS: it should be valid for all OSes. This is covered by the portability check.

- fs check: Checks the validity of the path for the file system the path tries to access.
  I suppose that this check should be done at runtime, when trying to work with a particular file or directory.
    common users (++): if paths are not accepted, the program will not work. I put (++) because fewer users know which characters are valid for the different file systems they manipulate.
    portable users (-): when writing portable programs, you do not really care that a path is valid for your fs: it should be valid for all file systems. This is covered by the portability check.

- ease of use: Is the library easy to use given the user's aims? Does the user have to write many lines of code in order to achieve what he wants?
    all users (+++): this is the most important feature. A library that does not provide nice interfaces forces the user to reconfigure it all the time so that the checks succeed.

- performance hit: Does using a check imply a performance hit?
    all users (++): this is also an important feature. When coding a program, users always want it to run fast. This is an important criterion, but less important than ease of use, especially if the performance hit is mild.

- security illusion: Does the program give the illusion that it will work correctly on different platforms?
    common users (+): this is not a big deal, as portability is not their first concern when writing a program.
    portable users (++): if a program written to be portable only gives an illusion of such behavior, this would be a real problem.

e - Different criteria and their availability

Here is a table showing, for each of the three options, its behavior with respect to the criteria listed above. The characters '+', '=' and '-' represent the "points" given to a criterion if it is available for an option. The ease of use criterion is given separately for the two kinds of users.

Table of "appearance":

!                     ! portable_name ! native ! no_check !
!---------------------------------------------------------!
! explicit name       ! +             ! -      ! ++       !
! portability check   ! ++            ! +      ! -        !
! OS check            ! ++            ! ++     ! -        !
! fs check            ! -             ! -      ! -        !
! no performance hit  ! -             ! -      ! ++       !
! no security ill.    ! -             ! -      ! ++       !
common users:
! ease of use         ! -             ! ++     ! ++       !
portable users:
! ease of use         ! ++            ! -      ! -        !

f - The best option

We can now try to satisfy the most users by comparing the criteria in a "mathematical" way. As a first thought, we could suppose that the majority of users would use boost::filesystem for non-portable programs. Let us say 80% will write programs without any need for portability. From this we can then deduce which option fits best. For each option, and for each kind of user, we calculate a score in the following way:

option = explicit importance * appearance
       + portability check importance * appearance
       + OS check importance * appearance
       + fs check importance * appearance
       + ease of use importance * appearance
       + no performance hit importance * appearance
       + no security ill. importance * appearance

We use the tables above and count each '+' as 1 point (so '++' is 2 points and '+++' is 3 points) and '-' as 0 points. We then find:

common users:
!                     ! portable_name ! native  ! no_check !
!----------------------------------------------------------!
! explicit            !   2 * 1       !   2 * 0 !   2 * 2  !
! portability check   ! + 0 * 2       ! + 0 * 1 ! + 0 * 0  !
! OS check            ! + 1 * 2       ! + 1 * 2 ! + 1 * 0  !
! fs check            ! + 2 * 0       ! + 2 * 0 ! + 2 * 0  !
! performance hit     ! + 2 * 0       ! + 2 * 0 ! + 2 * 2  !
! security illusion   ! + 1 * 0       ! + 1 * 0 ! + 1 * 2  !
! ease of use         ! + 3 * 0       ! + 3 * 2 ! + 3 * 2  !
!----------------------------------------------------------!
! sum                 !   4           !   8     !   16     !

portable users:
!                     ! portable_name ! native  ! no_check !
!----------------------------------------------------------!
! explicit            !   1 * 1       !   1 * 0 !   1 * 2  !
! portability check   ! + 3 * 2       ! + 3 * 1 ! + 3 * 0  !
! OS check            ! + 0 * 2       ! + 0 * 2 ! + 0 * 0  !
! fs check            ! + 0 * 0       ! + 0 * 0 ! + 0 * 0  !
! performance hit     ! + 2 * 0       ! + 2 * 0 ! + 2 * 2  !
! security illusion   ! + 1 * 0       ! + 1 * 0 ! + 1 * 2  !
! ease of use         ! + 3 * 2       ! + 3 * 0 ! + 3 * 0  !
!----------------------------------------------------------!
! sum                 !   13          !   3     !   8      !

We can now use the assumption that 80% of the users are common users, and calculate which option would suit the most users:

portable_name = 0.8 * 4  + 0.2 * 13 = 3.2  + 2.6 = 5.8
native        = 0.8 * 8  + 0.2 * 3  = 6.4  + 0.6 = 7.0
no_check      = 0.8 * 16 + 0.2 * 8  = 12.8 + 1.6 = 14.4

We can then deduce that the no_check option best suits most of the common needs!

g - The native problem

In version 1.32 of Boost, the native option has the following problem: the checks are done so that the path used will be accepted by the operating system used to compile, not by the file system the application will access. This is confusing, as by native, I would expect checks to work on the platform for which you compiled the program. I personally do not think about file systems. In my opinion, this is what confuses the user and gives a security illusion.

I read that Beman was working on changing the behavior of the native option, so that it would also check against the file system. If we look at the results in that case, we find that the native option would then fit best.

!                     ! portable_name ! native ! no_check !
!---------------------------------------------------------!
! explicit name       ! +             ! ++     ! ++       !
! portability check   ! ++            ! +      ! -        !
! OS check            ! ++            ! ++     ! -        !
! fs check            ! -             ! ++     ! -        !
! no performance hit  ! -             ! -      ! ++       !
! no security ill.    ! -             ! +      ! ++       !
common users:
! ease of use         ! -             ! ++     ! ++       !
portable users:
! ease of use         ! ++            ! -      ! -        !

common users:
!                     ! native  !
!-------------------------------!
! explicit            !   2 * 2 !
! portability check   ! + 0 * 1 !
! OS check            ! + 1 * 2 !
! fs check            ! + 2 * 2 !
! performance hit     ! + 2 * 0 !
! security illusion   ! + 1 * 1 !
! ease of use         ! + 3 * 2 !
!-------------------------------!
! sum                 !   17    !

portable users:
!                     ! native  !
!-------------------------------!
! explicit            !   1 * 2 !
! portability check   ! + 3 * 1 !
! OS check            ! + 0 * 2 !
! fs check            ! + 0 * 2 !
! performance hit     ! + 2 * 0 !
! security illusion   ! + 1 * 1 !
! ease of use         ! + 3 * 2 !
!-------------------------------!
! sum                 !   12    !

native = 0.8 * 17 + 0.2 * 12 = 13.6 + 2.4 = 16.0

If the problem described, with multiple file systems mounted, is solved, native would be the best choice. For now, in my opinion, it would lead to many mistakes due to misunderstanding how native works.

h - Limitations

This approach is not really objective, as I was the one choosing the weights of the different criteria, and those are just arbitrary choices. This is especially true for the explicit name and security illusion criteria. On the other hand, it is an approach that has the advantage of "measuring" and giving a solution.

III - Some suggestions about native

The native option is quite confusing. I had to read through the mailing list before being able to understand how it works. The documentation does not explain how it works in a really explicit way. And I don't think I am the only one, as Walter was also confused (see http://tinyurl.com/6esmw).

Ideas to solve that:
1 -> Give it a more explicit name (OS_native?).
2 -> Change the documentation so that this point is explained a bit more. Why not add an example? The example from Beman Dawes was quite explicit about how that works!

I hope I didn't bore too many of you! If you are reading this sentence, that is close to a miracle! :-)

Thank you for having taken the time to read through this post!

Cheers,
Pierre-Andre Galmes
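
P.S. The weighted scoring from section f is simple enough to re-run with other weights. What follows is only a minimal Python sketch of that arithmetic and has nothing to do with the boost::filesystem API: the names IMPORTANCE, APPEARANCE, COMMON_SHARE, group_score and overall_score are made up for illustration, and the numbers are copied from the calculation tables of section f (with '+' counted as 1 point and '-' as 0).

```python
# Criteria, in the row order of the calculation tables in section f:
# explicit, portability check, OS check, fs check,
# performance hit, security illusion, ease of use.

# Importance weights per user group, as used in the calculation tables.
IMPORTANCE = {
    "common":   [2, 0, 1, 2, 2, 1, 3],
    "portable": [1, 3, 0, 0, 2, 1, 3],
}

# Appearance points per option. Only the last entry (ease of use)
# differs between the two user groups.
APPEARANCE = {
    "common": {
        "portable_name": [1, 2, 2, 0, 0, 0, 0],
        "native":        [0, 1, 2, 0, 0, 0, 2],
        "no_check":      [2, 0, 0, 0, 2, 2, 2],
    },
    "portable": {
        "portable_name": [1, 2, 2, 0, 0, 0, 2],
        "native":        [0, 1, 2, 0, 0, 0, 0],
        "no_check":      [2, 0, 0, 0, 2, 2, 0],
    },
}

# Assumed share of users with no need for portability (the 80% guess).
COMMON_SHARE = 0.8


def group_score(group, option):
    """Sum of importance * appearance over all criteria for one user group."""
    weights = IMPORTANCE[group]
    points = APPEARANCE[group][option]
    return sum(w * p for w, p in zip(weights, points))


def overall_score(option, common_share=COMMON_SHARE):
    """Blend the two group scores according to the assumed user split."""
    return (common_share * group_score("common", option)
            + (1 - common_share) * group_score("portable", option))


if __name__ == "__main__":
    for option in ("portable_name", "native", "no_check"):
        print(option,
              group_score("common", option),
              group_score("portable", option),
              round(overall_score(option), 1))
```

Changing COMMON_SHARE or any single weight and re-running it is a quick way to see how much the ranking depends on those arbitrary choices.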