[filesystem] Decomposition of filenames beginning with period?

Boost.Filesystem class path views filenames as being made up of a stem and an extension. My Windows Users directory includes files ".gitconfig" and ".netrc". The current class path implementation views these as having extensions of ".gitconfig" and ".netrc", respectively, and empty stems. To me, that's a bit counterintuitive. I guess I view filenames as always having stems, with extensions being optional. But I don't view that as a strong enough argument to change anything. Does anyone have a stronger argument than intuition for changing or retaining the current behavior? --Beman

On 04/07/2011 08:14 AM, Beman Dawes wrote:
Not an argument for or against, but what is the stem and extension in 'filename.with.many.of.periods'? I think the rule you apply to this should be consistent with .file

On Thu, Apr 7, 2011 at 08:31, Olaf van der Spek <olafvdspek@gmail.com> wrote:
Are you sure? "if p.filename() contains a dot but does not consist solely of one or to two dots, returns the substring of p.filename() starting at the rightmost dot and ending at the path's end. Otherwise, returns an empty path object." ~ <http://www.boost.org/doc/libs/1_46_1/libs/filesystem/v3/doc/reference.html#path-extension> Sounds like stem is "filename.with.many.of" and extension is ".periods". ~ Scott
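The rule quoted above from the reference documentation can be written out by hand. The following is a minimal sketch, not Boost code: `extension_by_rule` and `stem_by_rule` are hypothetical names, and the stem is assumed to be whatever precedes the extension.

```cpp
#include <string>

// Hypothetical helpers (not Boost API): a hand-written reading of the quoted rule.
// Extension: from the rightmost dot to the end, unless the name has no dot
// or consists solely of one or two dots.
std::string extension_by_rule(const std::string& filename) {
    if (filename == "." || filename == "..") return "";
    const std::string::size_type dot = filename.rfind('.');
    return dot == std::string::npos ? std::string() : filename.substr(dot);
}

// Assumption: the stem is whatever precedes that extension.
std::string stem_by_rule(const std::string& filename) {
    return filename.substr(0, filename.size() - extension_by_rule(filename).size());
}
```

Under this reading, `stem_by_rule(".gitconfig")` is empty and `extension_by_rule(".gitconfig")` is ".gitconfig", which matches the behavior Beman describes, while "filename.with.many.of.periods" splits into "filename.with.many.of" and ".periods".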

Vicente BOTET wrote:
That fits with the behavior, too. First, use gunzip on the .gz file and you get file.tar. Now untar the .tar file. -- Rob Stewart

On Thu, Apr 7, 2011 at 10:23, Olaf van der Spek <olafvdspek@gmail.com> wrote:
Hmm, I was thinking about stuff like file.tar.gz
That works for the easy case, but then you get files like this: http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.27.14.tar.bz2 Splitting that as "linux-2" and ".6.27.14.tar.bz2" is really not helpful. Since the only way to handle things like that is a loop, it's more convenient to take extensions off the end, getting ".bz2" then ".tar" by looping the extensions of the stems. Actually, looking at that loop is interesting for this thread, since doing it the other way around would require Beman's proposed change. If you wanted to get "linux-2" then ".6" then ".27" etc by looping the stems of the extensions, then it would have to work by "the extension starts from the first dot that's not the first character in the string"... ~ Scott
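The loop Scott describes, taking extensions off the end one at a time, might look roughly like this. It is a sketch under the rightmost-dot rule only; `last_dot` and `peel_extensions` are hypothetical names, not part of Boost.Filesystem.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: position of the rightmost dot, or npos if the name
// has no extension under the rightmost-dot rule ("." and ".." have none).
std::string::size_type last_dot(const std::string& name) {
    if (name == "." || name == "..") return std::string::npos;
    return name.rfind('.');
}

// Repeatedly strip the extension off the end, collecting ".bz2", ".tar", ...
// On return, `name` holds whatever stem is left.
std::vector<std::string> peel_extensions(std::string& name) {
    std::vector<std::string> peeled;
    for (auto dot = last_dot(name); dot != std::string::npos; dot = last_dot(name)) {
        peeled.push_back(name.substr(dot));
        name.erase(dot);
    }
    return peeled;
}
```

For "linux-2.6.27.14.tar.bz2" this yields ".bz2", ".tar", ".14", ".27", ".6" and leaves "linux-2". Because the loop peels blindly into the version number, a real caller would presumably stop at the first extension it does not recognize.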

On Thu, Apr 7, 2011 at 11:31 AM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
On Thu, Apr 7, 2011 at 5:29 PM, Brian Schrom <brian.schrom@pnl.gov> wrote:
That's not a good choice. It's typical in the Linux world to see files like boost-1.46.tgz where you want to end up with: stem: boost-1.46 extension: tgz -- Stirling Westrup

Hi, We can define stem and extension in different ways: 1 - The stem can be empty, and the extension is the part after the last dot. 2 - The stem cannot be empty, and the extension is either the part after the last dot that follows the first character (which ensures the stem is not empty) or the part after the first dot that follows the first character. 3 - Stem and extension are both non-empty and make sense only when the filename follows the pattern "stem"."extension". For all other filenames, stem and extension are undefined (an exception can be thrown in these cases). Note that there are other files (. and ..) that have no evident stem or extension. Have you considered the 3rd alternative? Best, Vicente
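As an illustration of the third alternative, here is a minimal sketch that splits only names of the form "stem"."extension" and throws otherwise. `split_strict` is a hypothetical name, and treating the last dot as the separator when there are several is an assumption; the message does not say which dot to use.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper sketching alternative 3: split only names of the form
// "stem.extension" with both parts non-empty; throw for everything else.
// Assumption: with several dots, the last one is the separator.
std::pair<std::string, std::string> split_strict(const std::string& filename) {
    const std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == filename.size()) {
        throw std::invalid_argument("no stem/extension decomposition: " + filename);
    }
    // The extension is "the part after the last dot", so the dot is dropped.
    return {filename.substr(0, dot), filename.substr(dot + 1)};
}
```

With this sketch, "boost-1.46.tgz" splits into "boost-1.46" and "tgz", while ".netrc", "foo.", "foo", "." and ".." all throw.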

Beman Dawes wrote:
Does anyone have a stronger argument than intuition for changing or retaining the current behavior?
Hi, I believe the current behavior is correct. In a scenario where you are looking for files with certain extensions (take .netrc for instance), you want the file .netrc to have the netrc extension. In Windows, if there is a dot in the file name, then there is an extension. Another way of seeing the problem: if you are using the extension function, then you expect a result consistent with the majority of OSes that support extensions. Hopefully there are not many different OSes interpreting the extension of .netrc differently :-) Note that Windows makes it hard to create files with empty names and an extension, but if you manage to create one then Explorer uses the extension. I agree that the behavior is not always intuitive; perhaps the documentation of both the stem and extension functions should mention it explicitly. It would be nice if you could add .foo and .foo.bar to the path decomposition table. Thank you for boost::filesystem :-) Best regards Jorge

I agree that the current behavior is consistent with how Windows does things. For example, create a file called ".exe" and look at it with Windows Explorer. It will be shown as type "Application". Doing a "dir *.exe" from a CMD prompt will include it. It is also what is indicated by the _splitpath function from the MSVC library:
Path extracted with _splitpath:
  Drive: C:
  Dir: \
  Filename:
  Ext: .exe

Beman Dawes wrote:
On *nix OS's, there are no extensions. There are only filenames. Periods are merely human conventions added in various places to separate filename components. Conventionally, filenames beginning with a period are hidden from normal file listings, but periods otherwise have no specific meaning. Thus, referring to stems and extensions is largely meaningless on such OS's. The conventions of OS's that support extensions would seem to win the day. There have been good arguments in favor of the status quo because of Windows. -- Rob Stewart

Beman Dawes wrote:
Python's os.path.splitext disregards leading periods, so these would have stems of .gitconfig and .netrc and empty extensions. .gitconfig.txt would have an extension of .txt. Windows Explorer does it differently though. Olaf van der Spek wrote:
stem: filename extension: .with.many.of.periods
The typical answer here is stem: filename.with.many.of extension: .periods
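For comparison, the leading-period rule described above for Python's os.path.splitext could be sketched in C++ as follows. `split_ignoring_leading_dots` is a hypothetical name; it is a hand-written approximation that works on a bare filename only, not a port of the Python function.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: returns {stem, extension} while ignoring leading periods.
// Operates on a bare filename, not on a full path.
std::pair<std::string, std::string> split_ignoring_leading_dots(const std::string& filename) {
    const std::string::size_type first_non_dot = filename.find_first_not_of('.');
    const std::string::size_type last_dot = filename.rfind('.');
    // No extension if the name is all dots (or empty), has no dot, or its
    // last dot belongs to the run of leading dots.
    if (first_non_dot == std::string::npos || last_dot == std::string::npos ||
        last_dot < first_non_dot) {
        return {filename, ""};
    }
    return {filename.substr(0, last_dot), filename.substr(last_dot)};
}
```

Here ".gitconfig" keeps its whole name as the stem with an empty extension, ".gitconfig.txt" splits into ".gitconfig" and ".txt", and "filename.with.many.of.periods" still splits at the last dot.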

On Thu, Apr 7, 2011 at 08:14, Beman Dawes <bdawes@acm.org> wrote:
Does anyone have a stronger argument than intuition for changing or retaining the current behavior?
I like the existing way, actually. It fits with two nice ways of looking at things: - A *nix hidden file is one with no name, after removing all extensions. - A file like .gitignore is a file of type "gitignore" that nobody bothered to name. Conceptually, it would make sense to have multiple *.gitignore files, and the one without a stem is just the default name. (And that would allow putting, say, all the auto-generated ones in one file, and a manually-maintained list in another.) ~ Scott

On Thu, 2011-04-07 at 10:27 -0700, Scott McMurray wrote:
I think this is a misleading view. It would be better to expose an "is_hidden" function on (probably?) file_status that applied platform specific rules: on Unix, does the file begin with ".", on Windows, is the Hidden attribute set.
- A file like .gitignore is a file of type "gitignore" that nobody bothered to name.
Arguable. Equally valid is the view that it is a file called ".gitignore" which has no extension. As has been mentioned elsewhere, Unix really doesn't care about extensions (it uses the much more sensible approach of magic numbers to determine file-type). People already have to code around files not having extensions (if they want vaguely portable code, anyway) - an awful lot of my Unix files don't have extensions. Having done a quick straw-poll of my own dot-files with no extensions, they are either:
* bespoke config files (mostly from older applications)
* some scripts
* text files (histories)
* and... *mostly* they are INI-style config files
Except that that isn't the case for pretty much any Unix dot-file that I've ever heard of. Do you have a non-conceptual real-world example? I think the library should code against "normal" coding practise - and for that reason, I'd argue that .gitignore has stem .gitignore, and an empty extension. However, I can happily cope with the status-quo, too. Phil -- Phil Richards, <news@derived-software.ltd.uk>
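The Unix half of the is_hidden idea raised in this message can be sketched as a simple name check. `is_hidden_by_name` is a hypothetical function, not an existing Boost.Filesystem facility; the Windows half (testing the Hidden attribute) needs a platform API call and is deliberately left out.

```cpp
#include <string>

// Hypothetical helper for the Unix convention only: a name is hidden when it
// starts with a period. Assumption: "." and ".." are treated as directory
// entries rather than hidden files.
bool is_hidden_by_name(const std::string& filename) {
    if (filename == "." || filename == "..") return false;
    return !filename.empty() && filename.front() == '.';
}
```

A check like this keeps "hidden" separate from stem/extension decomposition, so ".gitignore" can be reported as hidden whichever way its name is split.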

On Apr 9, 2011, at 12:56 AM, Phil Richards wrote:
I'd argue that .gitignore doesn't have an extension at all, empty or otherwise. Nor does "foo.", unless you can associate the empty extension with an application which handles such files when opened. In other words, I'd say any filename beginning or ending with a dot, or lacking one entirely, doesn't have an extension. Josh
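One way to model "no extension at all" as distinct from "empty extension" is an optional return value. The sketch below uses a hypothetical `extension_if_any`; it assumes C++17 for `std::optional` and assumes that, when an extension exists, it starts at the last dot and keeps that dot.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: std::nullopt means "no extension at all", which is
// distinct from an empty one. Assumption: an existing extension starts at
// the last dot and includes it.
std::optional<std::string> extension_if_any(const std::string& filename) {
    const std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos      // lacks a dot entirely (covers empty names)
        || filename.front() == '.'    // begins with a dot
        || filename.back() == '.') {  // ends with a dot
        return std::nullopt;
    }
    return filename.substr(dot);
}
```

With this sketch, ".gitignore", "foo." and "foo" all yield `std::nullopt`, while "foo.txt" yields ".txt".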

On Mon, 2011-04-18 at 00:05 -0700, Joshua Juran wrote:
I was referring to the usage of "extension" as used in the Boost.Filesystem library, where the extension() call returns the leading ".".
In other words, I'd say any filename beginning or ending with a dot, or lacking one entirely, doesn't have an extension.
IMO, this is wrong - under Unix, anyway. The two files "foo" and "foo." are different files. It would be odd if they both had the same stem *and* the same extension (as well as every other bit of the paths being identical). Phil -- Phil Richards, <news@derived-software.ltd.uk>

Hi, for me neither foo nor foo. has a stem and an extension. We cannot have a stem without an extension, and vice versa. The best solution in these cases is for the stem and extension functions to throw an exception. Best, Vicente

On Apr 18, 2011, at 1:34 PM, Vicente BOTET wrote:
Your scenario of filenames with matching extensions contradicts my stipulation that each filename doesn't have an extension in the first place. You're responding as if I'd said ".gitignore has an empty extension", rather than ".gitignore doesn't have an extension at all, empty or otherwise."
This position is a refinement of my own. I agree that extension( "foo" ) should throw, but I could see stem( "foo" ) returning "foo". I don't have an opinion on whether it should or not. Josh
participants (13): Beman Dawes, Brian Schrom, Jorge Lodos Vigil, Joshua Juran, Kenny Riddile, Olaf van der Spek, Paul Rose, Peter Dimov, Phil Richards, Scott McMurray, Stewart, Robert, Stirling Westrup, Vicente BOTET