[filesystem] temp_directory_path() behavior on Windows
temp_directory_path() on Windows is currently implemented by calling the Windows API GetTempPath() function.

Ticket https://svn.boost.org/trac/boost/ticket/5300 points out that GetTempPath() does not work as expected for environmental variables longer than roughly 130 characters. I've added a test to Boost.Filesystem that verifies that boost::filesystem::temp_directory_path() is affected.

The suggested fix is to use GetEnvironmentVariable to in effect implement GetTempPath() the way we would like it to work.

That makes sense to me, so I looked up the specs for GetTempPath():

The GetTempPath function checks for the existence of environment variables in the following order and uses the first path found:

1. The path specified by the TMP environment variable.
2. The path specified by the TEMP environment variable.
3. The path specified by the USERPROFILE environment variable.
4. The Windows directory.

(4) makes me queasy. I suspect it will fail anyhow because of permissions errors, but I would prefer to have temp_directory_path() report an error if 1, 2, and 3 are not present. temp_directory_path() already reports an error if a path is found but it is not a directory.

OTOH, excluding the Windows directory is a breaking change for anyone currently depending on that behavior, so I thought it best to ask for comments before charging ahead.

Thoughts?

--Beman
Beman Dawes wrote:
temp_directory_path() on Windows is currently implemented by calling the Windows API GetTempPath() function.
Ticket https://svn.boost.org/trac/boost/ticket/5300 points out that GetTempPath() does not work as expected for environmental variables longer than roughly 130 characters. I've added a test to Boost.Filesystem that verifies that boost::filesystem::temp_directory_path() is affected.
The suggested fix is to use GetEnvironmentVariable to in effect implement GetTempPath() the way we would like it to work.
Not that the documented behavior of GetTempPath makes any sense to me - the default temp directory is at %LOCALAPPDATA%\Temp since Win95 or so - but the suggested fix is actually: "A workaround is to first try to use GetEnvironmentVariable on "TMP" and "TEMP", then fall back on GetTempPath." So, if implemented, it makes
OTOH, excluding the Windows directory is a breaking change for anyone currently depending on that behavior, so I thought it best to ask for comments before charging ahead.
a non-issue, unless you want to avoid calling GetTempPath entirely, in which case I would return %LOCALAPPDATA%\Temp at step 3, obtained via SHGetFolderPath.
On January 31, 2015 11:29:38 AM EST, Peter Dimov wrote:
Beman Dawes wrote:
temp_directory_path() on Windows is currently implemented by calling the Windows API GetTempPath() function.
Ticket https://svn.boost.org/trac/boost/ticket/5300 points out that GetTempPath() does not work as expected for environmental variables longer than roughly 130 characters. I've added a test to Boost.Filesystem that verifies that boost::filesystem::temp_directory_path() is affected.
The suggested fix is to use GetEnvironmentVariable to in effect implement GetTempPath() the way we would like it to work.
Not that the documented behavior of GetTempPath makes any sense to me - the default temp directory is at %LOCALAPPDATA%\Temp since Win95 or so - but the suggested fix is actually:
"A workaround is to first try to use GetEnvironmentVariable on "TMP" and "TEMP", then fall back on GetTempPath."
You'll have to compare the strings before and after calling GetEnvironmentVariable(). It doesn't indicate whether any substitutions were made and the result could be the same length as the original.
So, if implemented, it makes
OTOH, excluding the Windows directory is a breaking change for anyone currently depending on that behavior, so I thought it best to ask for comments before charging ahead.
a non-issue, unless you want to avoid calling GetTempPath entirely, in which case I would return %LOCALAPPDATA%\Temp at step 3, obtained via SHGetFolderPath.
+1
___ Rob (Sent from my portable computation engine)
On Sun, Feb 1, 2015 at 7:53 PM, Rob Stewart wrote:
On January 31, 2015 11:29:38 AM EST, Peter Dimov wrote:
Beman Dawes wrote:
The suggested fix is to use GetEnvironmentVariable to in effect implement GetTempPath() the way we would like it to work.
Not that the documented behavior of GetTempPath makes any sense to me - the default temp directory is at %LOCALAPPDATA%\Temp since Win95 or so - but the suggested fix is actually:
"A workaround is to first try to use GetEnvironmentVariable on "TMP" and "TEMP", then fall back on GetTempPath."
You'll have to compare the strings before and after calling GetEnvironmentVariable(). It doesn't indicate whether any substitutions were made and the result could be the same length as the original.
I'm not actually going to fall back to GetTempPath. My tentative implementation using only GetEnvironmentVariable() is much cleaner. See below for the proposed list of env variables.
So, if implemented, it makes
OTOH, excluding the Windows directory is a breaking change for anyone currently depending on that behavior, so I thought it best to ask for comments before charging ahead.
a non-issue, unless you want to avoid calling GetTempPath entirely, in which case I would return %LOCALAPPDATA%\Temp at step 3, obtained via SHGetFolderPath.
+1
SHGetFolderPath is deprecated, and in any case it is very simple and clean just to use a list of directories with GetEnvironmentVariableW(). My current implementation uses this list (shown with the defaults for clean Win 7 and Win 10 installs):

TMP=C:\Users\Beman\AppData\Local\Temp
TEMP=C:\Users\Beman\AppData\Local\Temp
LOCALAPPDATA=C:\Users\Beman\AppData\Local
USERPROFILE=C:\Users\Beman

Per Peter's suggestion, inserting %LOCALAPPDATA%\Temp would make historic sense. I'm mildly in favor of that.

There are two other default environmental variables we could use:

SystemRoot=C:\Windows
windir=C:\Windows

and there are also API functions GetWindowsDirectory and GetSystemWindowsDirectory, but their use is "primarily for compatibility with legacy applications". But I'm against using the Windows directory, however obtained.

Thanks,
--Beman
Beman Dawes wrote:
SHGetFolderPath is deprecated, and in any case it is very simple and clean just to use a list of directories with GetEnvironmentVariableW().
If you don't have TEMP and TMP, it's quite likely that you aren't going to have LOCALAPPDATA in the environment either, which is why I suggested SHGetFolderPath.
On 31.1.2015. 16:07, Beman Dawes wrote:
4. The Windows directory.
(4) Makes me queasy.
Given that there is a C:\Windows\Temp directory (and it gets recreated by Windows if you delete it even if you divert the system TEMP and TMP env. variables to a different directory) the documentation 'probably' meant "The Windows\Temp directory" not the Windows directory itself... -- Domagoj Saric Software Architect www.LittleEndian.com
On Tue, Feb 3, 2015 at 3:51 AM, Domagoj Saric wrote:
On 31.1.2015. 16:07, Beman Dawes wrote:
4. The Windows directory.
(4) Makes me queasy.
Given that there is a C:\Windows\Temp directory (and it gets recreated by Windows if you delete it even if you divert the system TEMP and TMP env. variables to a different directory) the documentation 'probably' meant "The Windows\Temp directory" not the Windows directory itself...
Interesting. Makes sense. The Windows directory can be determined by GetSystemWindowsDirectoryW, so that means no dependency at all on the state of environmental variables. Thanks, --Beman
On 3 Feb 2015 at 6:32, Beman Dawes wrote:
Given that there is a C:\Windows\Temp directory (and it gets recreated by Windows if you delete it even if you divert the system TEMP and TMP env. variables to a different directory) the documentation 'probably' meant "The Windows\Temp directory" not the Windows directory itself...
Interesting. Makes sense.
The Windows directory can be determined by GetSystemWindowsDirectoryW, so that means no dependency at all on the state of environmental variables.
Corporate installs will often set USERPROFILE to a LAN samba share and TMP/TEMP to a local ramdisk or directory cleared on logout. C:\Windows will be completely read only. There is also the use case under Terminal Services where there are two Windows directories, GetWindowsDirectory and GetSystemWindowsDirectory.

You also can't assume that any environment variables are set. If you're running as a system service you may have a blank environment and no user profile/home directory. Also, any of the Windows shell functions SHxxx() will fail in this use case.

For reference, when AFIO gains temporary file dispatchers in v1.4, I was going to use the following schema as I cannot use GetTempPath (AFIO exclusively uses extended NT kernel paths and cannot use the 260 length win32 ones):

1. %TMP%
2. %TEMP%
3. %LOCALAPPDATA%\Temp
4. %USERPROFILE%\Temp
5. %HOME%\Temp
6. %ALLUSERSPROFILE%\Temp
7. %SystemRoot%\Temp
8. GetWindowsDirectory()\Temp
9. %SystemDrive%\Users\Public\Temp

It would have to be a very borked situation if one of those didn't work.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
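For illustration only, the environment-variable steps of the schema above could be written as a data table and walked in order. This is a rough sketch, not AFIO or Boost.Filesystem code: temp_candidate, env_lookup, schema() and first_temp_candidate() are hypothetical names, the lookup is injected so the logic can be exercised without touching the real environment, and treating "set and non-empty" as usable is an assumption. Step 8 (GetWindowsDirectory()) is left out because it is an API call rather than a variable.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: one candidate is an environment variable plus a suffix
// appended to its value (the suffix may be empty).
struct temp_candidate {
    std::string variable;
    std::string suffix;
};

// Hypothetical: returns the variable's value, or nullopt if it is not set.
using env_lookup =
    std::function<std::optional<std::string>(const std::string&)>;

// Environment-variable steps of the schema, in priority order.
inline const std::vector<temp_candidate>& schema() {
    static const std::vector<temp_candidate> list = {
        {"TMP", ""},
        {"TEMP", ""},
        {"LOCALAPPDATA", "\\Temp"},
        {"USERPROFILE", "\\Temp"},
        {"HOME", "\\Temp"},
        {"ALLUSERSPROFILE", "\\Temp"},
        {"SystemRoot", "\\Temp"},
        {"SystemDrive", "\\Users\\Public\\Temp"},
    };
    return list;
}

// Returns the first candidate whose variable is set and non-empty.
// No filesystem access is performed; the path is not checked for existence.
inline std::optional<std::string> first_temp_candidate(const env_lookup& lookup) {
    for (const auto& candidate : schema()) {
        const auto value = lookup(candidate.variable);
        if (value && !value->empty()) {
            return *value + candidate.suffix;
        }
    }
    return std::nullopt;
}
```

Keeping the order in a table means that adding, removing or reordering a candidate is a one-line change, which suits a list that is still under discussion.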
On February 3, 2015 7:12:59 AM EST, Niall Douglas
On 3 Feb 2015 at 18:54, Rob Stewart wrote:
Given all of the restrictions you've enumerated, it would seem that the right behavior is actually to test the existence of a possible result and the caller's permission to use it before returning.
I think it's faster to iterate all ten in that order actually. The big problem with caching results is that if your temp drive is on a network share, it can come and go over the lifetime of your process.

In the end, temp files are slow on Windows, as is opening file handles at all actually. That's because on NT you were never supposed to use temp files when you have a NT kernel namespace to use (i.e. named section objects). Unfortunately, those don't play well without a bit of work with iostreams, fopen et al.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On February 3, 2015 7:32:37 PM EST, Niall Douglas wrote:
On 3 Feb 2015 at 18:54, Rob Stewart wrote:
Given all of the restrictions you've enumerated, it would seem that the right behavior is actually to test the existence of a possible result and the caller's permission to use it before returning.
I think it's faster to iterate all ten in that order actually. The big problem with caching results is that if your temp drive is on a network share, it can come and go over the lifetime of your process.
In the end, temp files are slow on Windows, as is opening file handles at all actually. That's because on NT you were never supposed to use temp files when you have a NT kernel namespace to use (i.e. named section objects). Unfortunately, those don't play well without a bit of work with iostreams, fopen et al.
I don't understand your response given what I wrote. I meant that those options would be tried in order to see if they resolve to a valid directory the caller has permissions to use and, if not, try the next. I'll grant that a network resource may be transient, but that can't be helped. ___ Rob (Sent from my portable computation engine)
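A minimal sketch of the kind of check described here, assuming C++17 std::filesystem is available; first_existing_directory is a hypothetical name, not a Boost.Filesystem function. It only tests that each candidate is an existing directory. The permission test is not attempted, and every probe touches the filesystem.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: walks the candidates in order and returns the first
// one that currently names an existing directory. Errors while probing
// (for example an unreachable network share) are treated as "not usable".
inline std::optional<fs::path>
first_existing_directory(const std::vector<fs::path>& candidates) {
    for (const auto& candidate : candidates) {
        std::error_code ec;  // non-throwing overload; failure yields false
        if (!candidate.empty() && fs::is_directory(candidate, ec)) {
            return candidate;
        }
    }
    return std::nullopt;
}
```

The answer is only a snapshot: a directory that passes the test can still disappear before the caller uses it.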
On 4/02/2015 15:04, Rob Stewart wrote:
On February 3, 2015 7:32:37 PM EST, Niall Douglas wrote:
On 3 Feb 2015 at 18:54, Rob Stewart wrote:
Given all of the restrictions you've enumerated, it would seem that the right behavior is actually to test the existence of a possible result and the caller's permission to use it before returning.
I think it's faster to iterate all ten in that order actually. The big problem with caching results is that if your temp drive is on a network share, it can come and go over the lifetime of your process.
In the end, temp files are slow on Windows, as is opening file handles at all actually. That's because on NT you were never supposed to use temp files when you have a NT kernel namespace to use (i.e. named section objects). Unfortunately, those don't play well without a bit of work with iostreams, fopen et al.
I don't understand your response given what I wrote. I meant that those options would be tried in order to see if they resolve to a valid directory the caller has permissions to use and, if not, try the next.
Don't you have to touch the filesystem to do that? I can imagine many cases where user code might want to obtain the path in contexts where they don't want to hit the filesystem at all (which is why the WinAPI call does not perform that check). The context that wants to obtain the path is not necessarily the one that wants to make use of it.
On 4 Feb 2015 at 16:16, Gavin Lambert wrote:
I think it's faster to iterate all ten in that order actually. The big problem with caching results is that if your temp drive is on a network share, it can come and go over the lifetime of your process.
In the end, temp files are slow on Windows, as is opening file handles at all actually. That's because on NT you were never supposed to use temp files when you have a NT kernel namespace to use (i.e. named section objects). Unfortunately, those don't play well without a bit of work with iostreams, fopen et al.
I don't understand your response given what I wrote. I meant that those options would be tried in order to see if they resolve to a valid directory the caller has permissions to use and, if not, try the next.
Don't you have to touch the filesystem to do that? I can imagine many cases where user code might want to obtain the path in contexts where they don't want to hit the filesystem at all (which is why the WinAPI call does not perform that check).
The context that wants to obtain the path is not necessarily the one that wants to make use of it.
Yep, Gavin has nailed my very poor original explanation (it was like 1am when I wrote it, and I have been burning the candle at both ends for two weeks now as I try to push out a library release which seems to have a never ending tail of small release quality issues).

I had thought you were recommending an ACL lookup for path viability.

No, I meant that temp_directory_path() basically needs to iterate those options I listed. If the given environment variable does not expand to a valid path (note no FS operations needed here), we move to the next option. We don't check for feasibility.

For the function which creates temp files for you, it may be useful to know what I'll be doing there. AFIO won't provide any facility for knowing about paths of temp files until after you have created one. Instead you get a dispatcher implementation which creates either anonymous temp files or named temp files at one of many locations that stay unknown until after file creation (the set of locations to choose from is user suppliable, so I guess if you supply a single option you get some control). In addition you can request "fast moveability" to another given path, i.e. that the temp file location be chosen to be on the same volume as the path supplied where possible, thus ensuring that mv works quickly. I just added the support code for that via the new statfs() call in AFIO v1.3 which might ship this week or next week.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On February 4, 2015 6:35:33 AM EST, Niall Douglas wrote:
On 4 Feb 2015 at 16:16, Gavin Lambert wrote:
I think it's faster to iterate all ten in that order actually. The big problem with caching results is that if your temp drive is on a network share, it can come and go over the lifetime of your process.
In the end, temp files are slow on Windows, as is opening file handles at all actually. That's because on NT you were never supposed to use temp files when you have a NT kernel namespace to use (i.e. named section objects). Unfortunately, those don't play well without a bit of work with iostreams, fopen et al.
I don't understand your response given what I wrote. I meant that those options would be tried in order to see if they resolve to a valid directory the caller has permissions to use and, if not, try the next.
Don't you have to touch the filesystem to do that? I can imagine many cases where user code might want to obtain the path in contexts where they don't want to hit the filesystem at all (which is why the WinAPI call does not perform that check).
The context that wants to obtain the path is not necessarily the one that wants to make use of it.
Yes, I was suggesting that.
I had thought you were recommending an ACL lookup for path viability.
I was thinking that would be useful while we were at it, but I recognize that's easily overreaching.
No I meant that temp_directory_path() basically needs to iterate those options I listed. If the given environment variable does not expand to a valid path (note no FS operations needed here), we move to the next option. We don't check for feasibility.
How would you judge that the expansion is a valid path? A regex or something else?
For the function which creates temp files for you, it may be useful to know what I'll be doing there. AFIO won't provide any facility for knowing about paths of temp files until after you have created one. Instead you get a dispatcher implementation which creates either anonymous temp files or named temp files at some one of many unknown locations until after file creation (the unknown locations which can be chosen from is user suppliable, so I guess if you supply a single option you get some control), plus you can request "fast moveability" to another given path i.e. that the temp file location be chosen to be on the same volume as the path supplied where possible, thus ensuring that mv works quickly. I just added the support code for that via the new statfs() call in AFIO v1.3 which might ship this week or next week.
Sometimes the user needs full control over the location of temp files so they can be manually deleted when left by applications. For those that are unlinked after opening (there's a flag for that on Windows I don't recall), that's not an issue, of course. However, the code getting the temp path knows which will be done, so must be given control. ___ Rob (Sent from my portable computation engine)
On 4 Feb 2015 at 7:37, Rob Stewart wrote:
I had thought you were recommending an ACL lookup for path viability.
I was thinking that would be useful while we were at it, but I recognize that's easily overreaching.
Also, Windows can only open 30k file handles per second even across eight CPU cores. It's an enormous performance bottleneck. And you can't check ACLs without opening the file.
No I meant that temp_directory_path() basically needs to iterate those options I listed. If the given environment variable does not expand to a valid path (note no FS operations needed here), we move to the next option. We don't check for feasibility.
How would you judge that the expansion is a valid path? A regex or something else?
Perhaps even simpler. Does the environment variable exist, and does it have contents.
For the function which creates temp files for you, it may be useful to know what I'll be doing there. AFIO won't provide any facility for knowing about paths of temp files until after you have created one. Instead you get a dispatcher implementation which creates either anonymous temp files or named temp files at some one of many unknown locations until after file creation (the unknown locations which can be chosen from is user suppliable, so I guess if you supply a single option you get some control), plus you can request "fast moveability" to another given path i.e. that the temp file location be chosen to be on the same volume as the path supplied where possible, thus ensuring that mv works quickly. I just added the support code for that via the new statfs() call in AFIO v1.3 which might ship this week or next week.
Sometimes the user needs full control over the location of temp files so they can be manually deleted when left by applications. For those that are unlinked after opening (there's a flag for that on Windows I don't recall), that's not an issue, of course. However, the code getting the temp path knows which will be done, so must be given control.
Ah, you strike exactly at a major new feature hopefully in AFIO v1.4. Getting temp files to behave on a Samba or NFS share which is being accessed concurrently by Windows and POSIX is a nest of vipers. AFIO will hopefully abstract away all that complexity. You get temp files which "just work" and you can forget about clean up. The same code obviously enough also works on a local drive.

And, interestingly, to make all of the above work we need a non-braindead, functioning and portable byte range file advisory locks abstraction layer, so you'll have one of those too. They are used to portably detect when a temp file is no longer being used, and therefore can be cleaned up. The special delete on close flag on Windows you mentioned does not work correctly on a SMB share, including on Microsoft's own implementation, so we can't use that. Windows also does not let you delete files with open handles [1].

[1]: Actually it does if carefully asked to; AFIO uses that feature for its lock file implementation. Unfortunately the name of the deleted file continues to exist, though hidden from view, and any attempt to create a file of the same name will fail with a very unspecific win32 error code of ACCESS_DENIED. The workaround to match POSIX unlink semantics is to rename the file to a random hex name, _then_ to delete it.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
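The rename-then-delete workaround from footnote [1] might look roughly like the following with C++17 std::filesystem. This is a sketch of the idea only, not AFIO's implementation: random_hex_name and rename_then_remove are hypothetical names, and keeping the random name in the same directory (so the rename stays on one volume) is an assumption. It also does not ask Windows for the special sharing mode needed to delete a file that still has open handles.

```cpp
#include <cstddef>
#include <filesystem>
#include <random>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: builds a name made of `digits` random hex characters.
inline std::string random_hex_name(std::size_t digits = 32) {
    static const char hex[] = "0123456789abcdef";
    std::random_device rd;
    std::uniform_int_distribution<int> dist(0, 15);
    std::string name;
    name.reserve(digits);
    for (std::size_t i = 0; i < digits; ++i) {
        name.push_back(hex[dist(rd)]);
    }
    return name;
}

// Hypothetical helper: first moves the file to a random name in the same
// directory, freeing the original name for reuse, then removes it.
// Returns true on success; on failure `ec` holds the error.
inline bool rename_then_remove(const fs::path& file, std::error_code& ec) {
    const fs::path hidden = file.parent_path() / random_hex_name();
    fs::rename(file, hidden, ec);
    if (ec) {
        return false;
    }
    return fs::remove(hidden, ec) && !ec;
}
```

If the rename succeeds but the remove fails, the file is left behind under its random name, so a real implementation would need some cleanup policy for those leftovers.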
On Wed, Feb 4, 2015 at 7:56 AM, Niall Douglas wrote:
On 4 Feb 2015 at 7:37, Rob Stewart wrote:
I had thought you were recommending an ACL lookup for path viability.
I was thinking that would be useful while we were at it, but I recognize that's easily overreaching.
Also, Windows can only open 30k file handles per second even across eight CPU cores. It's an enormous performance bottleneck. And you can't check ACLs without opening the file.
No I meant that temp_directory_path() basically needs to iterate those options I listed. If the given environment variable does not expand to a valid path (note no FS operations needed here), we move to the next option. We don't check for feasibility.
How would you judge that the expansion is a valid path? A regex or something else?
Perhaps even simpler. Does the environment variable exist, and does it have contents.
That's the obvious rule, and what I coded up originally. I may well revert to that. Thanks, --Beman
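As a rough illustration of that rule ("does the variable exist, and does it have contents"), here is a portable sketch using std::getenv. The actual Windows code under discussion would use GetEnvironmentVariableW() instead, and usable_env is a hypothetical name.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: returns the variable's value only if it is set and
// non-empty. The value is not interpreted or checked against the filesystem.
inline std::optional<std::string> usable_env(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') {
        return std::nullopt;
    }
    return std::string(value);
}
```

A caller would try usable_env("TMP"), then usable_env("TEMP"), and so on down the agreed list, reporting an error if every lookup comes back empty.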
On 4.2.2015. 1:32, Niall Douglas wrote:
I think it's faster to iterate all ten in that order actually. The big problem with caching results is that if your temp drive is on a network share, it can come and go over the lifetime of your process.
Is there actually 'anyone out there' that does this? Isn't a temporary on a network/removable drive kind of oxymoronic? -- Domagoj Saric Software Architect www.LittleEndian.com
On 4 Feb 2015 at 14:44, Domagoj Saric wrote:
On 4.2.2015. 1:32, Niall Douglas wrote:
I think it's faster to iterate all ten in that order actually. The big problem with caching results is that if your temp drive is on a network share, it can come and go over the lifetime of your process.
Is there actually 'anyone out there' that does this? Isn't a temporary on a network/removable drive kind of oxymoronic?
Not at all. Think about thin clients which have virtually no local storage, and exclusively boot from the network. They're very common in everything from schools to government. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 4.2.2015. 15:00, Niall Douglas wrote:
On 4 Feb 2015 at 14:44, Domagoj Saric wrote:
On 4.2.2015. 1:32, Niall Douglas wrote:
I think it's faster to iterate all ten in that order actually. The big problem with caching results is that if your temp drive is on a network share, it can come and go over the lifetime of your process.
Is there actually 'anyone out there' that does this? Isn't a temporary on a network/removable drive kind of oxymoronic?
Not at all. Think about thin clients which have virtually no local storage, and exclusively boot from the network. They're very common in everything from schools to government.
Doesn't a loss of connection to the network/removable drive necessarily mean that the whole system goes down (so handling temps in that situation is kind of moot)..? -- Domagoj Saric Software Architect www.LittleEndian.com
On 4 Feb 2015 at 15:04, Domagoj Saric wrote:
Is there actually 'anyone out there' that does this? Isn't a temporary on a network/removable drive kind of oxymoronic?
Not at all. Think about thin clients which have virtually no local storage, and exclusively boot from the network. They're very common in everything from schools to government.
Doesn't a loss of connection to the network/removable drive necessarily mean that the whole system goes down (so handling temps in that situation is kind of moot)..?
That's not really a call for library code to make. We simply have to try our best to be resilient to whatever weirdnesses client code puts us in. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 4.2.2015. 16:10, Niall Douglas wrote:
On 4 Feb 2015 at 15:04, Domagoj Saric wrote:
Doesn't a loss of connection to the network/removable drive necessarily mean that the whole system goes down (so handling temps in that situation is kind of moot)..?
That's not really a call for library code to make. We simply have to try our best to be resilient to whatever weirdnesses client code puts us in.
By that rationale our code should be prepared to 'handle' someone trying to hot-swap a stick of RAM regardless of the fact that this will at best freeze the system or at worst fry the motherboard. Of course library developers are called to make judgment calls on what's 'sane' when modeling a given system... In the same vein, TEMP and TMP are system 'prescribed' variables which are supposed to exist. If they don't, the system is broken "and it's not really a call for Boost" to bloat everybody's binaries trying to work around this hypothetical situation. -- Domagoj Saric Software Architect www.LittleEndian.com
On 4.2.2015. 15:00, Niall Douglas wrote:
On 4 Feb 2015 at 14:44, Domagoj Saric wrote:
On 4.2.2015. 1:32, Niall Douglas wrote:
I think it's faster to iterate all ten in that order actually. The big problem with caching results is that if your temp drive is on a network share, it can come and go over the lifetime of your process.
Is there actually 'anyone out there' that does this? Isn't a temporary on a network/removable drive kind of oxymoronic?
Not at all. Think about thin clients which have virtually no local storage, and exclusively boot from the network. They're very common in everything from schools to government.
They run Windows on these thin clients w/o virtual memory? -- Domagoj Saric Software Architect www.LittleEndian.com
On 16 Feb 2015 at 10:44, Domagoj Saric wrote:
Not at all. Think about thin clients which have virtually no local storage, and exclusively boot from the network. They're very common in everything from schools to government.
They run Windows on these thin clients w/o virtual memory?
The NT kernel has been capable of running inside a fixed memory allocation since its inception. On Windows Phone for example, the kernel has exactly 160Mb, no more, no less. "Windows for Warships" from the 1990s had a fixed memory allocation. Cash ATMs running Windows similarly so. And so on. So the answer is yes. Remember the applications available tend to be very limited, basically Office, Internet Explorer and a few proprietary apps. You certainly can't install anything, and you usually can't run more than two apps at a time.
Of course library developers are called to make judgment calls on what's 'sane' when modeling a given system...
In the same vein, TEMP and TMP are system 'prescribed' variables which are supposed to exist. If they don't, the system is broken "and it's not really a call for Boost" to bloat everybody's binaries trying to work around this hypothetical situation.
As I mentioned, Win32 services are often run with no environment variables. And launching a child process on Windows using Boost.Process will more often than not unintentionally give the child an empty environment due to quirks. And what bloat is there here? A few opcodes at most. Come back to me when there are megabytes of bloat, then I might care.
2) Boost.FS is already more than bloated
I certainly wouldn't call Filesystem bloated! If anything, it verges on being so lightweight as to not be sufficiently useful for many common tasks
By bloated I didn't refer to its feature set (features and options coded so that they obey the "you don't pay for it if you don't use it" rule are "power" not "bloat" in my book) more about its implementation or "binary bloat" or being the next "iostreams" when it gets into the standard :/ http://lists.boost.org/Archives/boost/2011/08/184823.php Didn't personally look at it for years, but when I did, I found it doing things like constructing std::strings out of C strings just so that it can call find_first_of() on them "for crying out loud"...Its 'careless' use of STL containers and all the invisible EH baggage this creates, etc...
It seems to me all that goes away if Filesystem becomes header only.
With that approach C++ will never be widely accepted as the 'one true all purpose language' (e.g. would you really use iostreams, filesystem or even std::function in the kernel of 'the next great OS'?)...
There are many reasons why C++ shoots itself in the foot as a systems programming language - particularly in its poor ABI management. I also greatly dislike its haughty attitude to being more friendly to system programming consumers like language runtimes, where we as a community often seem to go out of our way to build ourselves C++ only ivory towers where it really wouldn't kill us to think slightly bigger, especially where the consequences for C++ are negligible. However, none of the items you mentioned is an issue in a next gen OS platform. My current client happens to be building a next gen OS platform, and they use all three of your items very extensively. Are these uses without problem? No, not at all ... but they are better than the alternatives. As warty as especially iostreams is, it is very well tested, behaves predictably on all platforms, and the average C++ engineer you interview doesn't need to be trained up in its use, which is a rare thing for most C++ technologies.
Note that this is especially problematic on Windows that does not have a standard prebuilt/included C++ library so most people resort to linking statically with the CRT (which with Boost.FS brings in a ton of code)...Kind of 'embarrassing' that a hello world Boost.FS console program will weigh in hundreds of kilobytes...
Thankfully VS2015 finally kills off the static MSVCRT. You'll get DLLs and DLLs only from VS2015 onwards, and it's long overdue IMHO. See http://blogs.msdn.com/b/vcblog/archive/2014/06/10/the-great-crt-refactoring.aspx.
...I would 'like' that the default implementation stays "as simple as possible but not simpler" and add a separate function or set of functions for the various special and edge cases. For example you could add a FindFile-like API that would give you a token with which it would allow you to iterate possible temp. paths until a working one is found [if the one returned in the last call isn't useable you'd call something like get_next_temp( current_temp_path_iteration_state );]...and/or you could provide the same with a container/begin-end interface...
Sounds like a recipe for racy temp file code with security holes.
Honestly, how's that?
Any iterator based filesystem parser is racy by definition. Only snapshot based filesystem parsing has some chance of not being racy. My single biggest issue with Filesystem is the lack of hard guarantees for raciness. I think for each and every API guarantees need to be stated just as you would give complexity guarantees and exception guarantees. Even if the guarantee is "this API is completely useless when the filing system is changing". For the record, I'm eating my own breakfast on this, and the v1.3 AFIO release will contain a long document providing race guarantees per supported platform. I'll admit that preparing this document is very tedious. On the other hand, I've discovered a wealth of race bugs in AFIO (and POSIX) as a result, in fact last night I lodged a bug with FreeBSD as they are particularly borked due to having no way of fetching the current path of an open file descriptor (and being the only major OS not to have that). Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
[Niall Douglas]
Thankfully VS2015 finally kills off the static MSVCRT. You'll get DLLs and DLLs only from VS2015 onwards
That's incorrect. /MT and /MTd will be fully supported in 2015, just like previous versions. Both the CRT and STL have static libs. James's VCBlog post, which you linked, says exactly this:
While I've named the release DLLs in the list, there are also equivalent debug DLLs and release and debug static CRT libraries for each of these.
Note that the CRT's DLLs, import libs, and static libs have been reorganized and renamed for 2015 RTM. Stephan T. Lavavej Senior Developer - Visual C++ Libraries
On 16 Feb 2015 at 18:30, Stephan T. Lavavej wrote:
[Niall Douglas]
Thankfully VS2015 finally kills off the static MSVCRT. You'll get DLLs and DLLs only from VS2015 onwards
That's incorrect. /MT and /MTd will be fully supported in 2015, just like previous versions. Both the CRT and STL have static libs.
Thanks for the catch Stephan. I missed the mention of the static libs. Out of curiosity, if the MSVCRT is now ABI stable and there is therefore now much gain from shipping bug fixed DLLs, what's the point of keeping static libs? Even on embedded systems like xbox and Windows Phone I don't see the cost benefit. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 17 Feb 2015 at 0:56, Stephan T. Lavavej wrote:
[Niall Douglas]
what's the point of keeping static libs?
Redist unnecessary, single executable, immunity to DLL replacement mischief (consider antivirus products).
Have the problems with mixing binaries compiled against the DLL with others compiled against the static lib been fixed in the new refactor? My big problem with the static libs was always that some selfish vendor would use the static libs and balls everything up for everyone else in the same process space. This is why I'd really like the static libs to permanently go away. OTOH, if those interop problems have been fixed, I might switch to the static libs myself. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On Tue, Feb 17, 2015 at 4:44 AM, Niall Douglas
On 17 Feb 2015 at 0:56, Stephan T. Lavavej wrote:
[Niall Douglas]
what's the point of keeping static libs?
Redist unnecessary, single executable, immunity to DLL replacement mischief (consider antivirus products).
Have the problems with mixing binaries compiled against the DLL with others compiled against the static lib been fixed in the new refactor?
AFAIK DLLs should always use the DLL CRT. -- Olaf
On 17 Feb 2015 at 9:38, Olaf van der Spek wrote:
Have the problems with mixing binaries compiled against the DLL with others compiled against the static lib been fixed in the new refactor?
AFAIK DLLs should always use the DLL CRT.
Agreed. But it's a tragedy of the commons - it takes just one vendor to not do that in their proprietary shipped DLLs. I can see it from their perspective - with static linkage you can ship a single DLL rather than one per MSVC release. And the breakage is better hidden than shipping a DLL which causes multiple MSVCRT DLLs to enter the process. But the key is the adjective "hidden". I also look forward to when WinClang is good enough that I can ship my MSVC compatible DLL which was linked against libstdc++, you can ship yours linked against libc++, and all that is somehow supposed to work without incident against DLLs linked with Dinkumware. Yay .... Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 18/02/2015 01:56, Niall Douglas wrote:
On 17 Feb 2015 at 9:38, Olaf van der Spek wrote:
Have the problems with mixing binaries compiled against the DLL with others compiled against the static lib been fixed in the new refactor?
AFAIK DLLs should always use the DLL CRT.
Agreed. But it's a tragedy of the commons - it takes just one vendor to not do that in their proprietary shipped DLLs.
I'm not sure why that would be a problem. If they ship a DLL linked statically then by definition they are not allowed to use anything from the CRT in their API. If *you* ship a DLL linked statically then *you're* not allowed to use anything from the CRT in your API. If you do want to use the CRT in your API then you are obligated to do one of the following: 1. ship the source to your library so that the application author can compile it against the CRT and compiler of their choice. 2. compile it in-house against *all* of the CRTs that you are expecting the application authors to be using, and then ship all of those binaries. As an application author, you need to make sure that if you are using compiler X with CRT Y that the component vendor either is going to give you the source or includes that exact combination (including version matches) as one of their build outputs. Otherwise you either persuade them to include it or you go look for another component. This is why most components either only use C API or COM API (because those have stable ABIs), or only support specific versions of Visual Studio (and occasionally a limited set of other popular compilers).
I also look forward to when WinClang is good enough that I can ship my MSVC compatible DLL which was linked against libstdc++, you can ship yours linked against libc++, and all that is somehow supposed to work without incident against DLLs linked with Dinkumware. Yay ....
I'm not convinced there will be a stable C++ ABI within my lifetime. (And yes, I know there are efforts going on to create one, but I'm not convinced that such a thing is even possible, let alone desirable.) C++'s type system is far too low-level for its own good in this regard.
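The "C API only" option described above can be sketched as follows. This is a minimal illustration rather than any real component's interface: tmpname_handle, tmpname_create, tmpname_copy and tmpname_destroy are all hypothetical names. The point it shows is that nothing owned by one CRT (std::string, heap blocks, exceptions) crosses the DLL boundary, so the DLL may link its runtime statically without affecting callers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical component API: every name below is invented for illustration.
extern "C" {
    struct tmpname_handle;  // opaque to callers; its layout never crosses the boundary

    tmpname_handle* tmpname_create(const char* prefix);
    std::size_t     tmpname_copy(const tmpname_handle* h, char* buffer, std::size_t capacity);
    void            tmpname_destroy(tmpname_handle* h);
}

// Private to the DLL: free to use std::string because callers never see it.
struct tmpname_handle {
    std::string value;
};

extern "C" tmpname_handle* tmpname_create(const char* prefix)
{
    try {
        return new tmpname_handle{std::string(prefix ? prefix : "") + "-0001"};
    } catch (...) {
        return nullptr;  // exceptions must not escape through a C interface
    }
}

// Always returns the capacity needed (including the terminator);
// copies only when the caller's buffer is large enough.
extern "C" std::size_t tmpname_copy(const tmpname_handle* h, char* buffer, std::size_t capacity)
{
    assert(buffer != nullptr || capacity == 0);
    if (h == nullptr) return 0;
    const std::size_t needed = h->value.size() + 1;
    if (buffer != nullptr && capacity >= needed)
        std::memcpy(buffer, h->value.c_str(), needed);
    return needed;
}

extern "C" void tmpname_destroy(tmpname_handle* h)
{
    delete h;  // released by the same runtime that allocated it
}
```

The caller owns the character buffer and the library owns the handle, so each side allocates and frees only with its own runtime.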
On 3.2.2015. 13:12, Niall Douglas wrote:
Corporate installs will often set USERPROFILE to a LAN samba share and TMP/TEMP to a local ramdisk or directory cleared on logout. C:\Windows will be completely read only.
There is also the use case under Terminal Services where there are two Windows directories, GetWindowsDirectory and GetSystemWindowsDirectory.
You also can't assume that any environment variables are set. If you're running as a system service you may have a blank environment and no user profile/home directory. Also, any of the Windows shell function SHxxx() will fail in this use case.
For reference, when AFIO gains temporary file dispatchers in v1.4, I was going to use the following schema as I cannot use GetTempPath (AFIO exclusively uses extended NT kernel paths and cannot use the 260 length win32 ones):
1. %TMP%
2. %TEMP%
3. %LOCALAPPDATA%\Temp
4. %USERPROFILE%\Temp
5. %HOME%\Temp
6. %ALLUSERSPROFILE%\Temp
7. %SystemRoot%\Temp
8. GetWindowsDirectory()\Temp
9. %SystemDrive%\Users\Public\Temp
It would have to be a very borked situation if one of those didn't work.
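The search order listed above can be sketched as a simple "first candidate that is an existing directory wins" loop. This is only a sketch under stated assumptions: it uses C++17 std::filesystem rather than Boost.Filesystem or AFIO, the names temp_candidate, env_lookup, process_environment and first_usable_temp_dir are hypothetical, and entry 8 (GetWindowsDirectory()) is left out because it is not an environment variable. On Windows std::getenv returns narrow strings, so a real implementation would presumably use a wide-character lookup instead.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <functional>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// One entry of the search order, e.g. {"LOCALAPPDATA", "Temp"}.
struct temp_candidate {
    std::string variable;  // environment variable holding the base directory
    std::string suffix;    // optional subdirectory appended to it, may be empty
};

// The environment is injected so that a blank environment (e.g. a service)
// can be simulated in tests.
using env_lookup = std::function<std::optional<std::string>(const std::string&)>;

// Default lookup backed by the real process environment.
std::optional<std::string> process_environment(const std::string& name)
{
    const char* value = std::getenv(name.c_str());
    if (value == nullptr) return std::nullopt;
    return std::string(value);
}

// Returns the first candidate that resolves to an existing directory.
// It does not check that the directory is writable.
std::optional<fs::path> first_usable_temp_dir(const std::vector<temp_candidate>& order,
                                              const env_lookup& lookup)
{
    assert(static_cast<bool>(lookup));
    for (const temp_candidate& c : order) {
        const std::optional<std::string> base = lookup(c.variable);
        if (!base || base->empty()) continue;   // variable unset or blank
        fs::path dir(*base);
        if (!c.suffix.empty()) dir /= c.suffix;
        std::error_code ec;                      // non-throwing overload
        if (fs::is_directory(dir, ec)) return dir;
    }
    return std::nullopt;                         // caller decides how to report failure
}
```

Called with an order such as {"TMP", ""}, {"TEMP", ""}, {"LOCALAPPDATA", "Temp"}, {"USERPROFILE", "Temp"} and process_environment as the lookup, it returns an empty optional when nothing matches, which leaves the "report an error or fall back to the Windows directory" decision to the caller.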
Considering that: 1) GetTempPath() is not deprecated (which I would take to mean that the 'manufacturer of the OS' holds this function to be sufficient for 'most users') 2) Boost.FS is already more than bloated ...I would 'like' that the default implementation stays "as simple as possible but not simpler" and add a separate function or set of functions for the various special and edge cases. For example you could add a FindFile-like API that would give you a token with which it would allow you to iterate possible temp. paths until a working one is found [if the one returned in the last call isn't useable you'd call something like get_next_temp( current_temp_path_iteration_state );]...and/or you could provide the same with a container/begin-end interface... -- Domagoj Saric Software Architect www.LittleEndian.com
On 4 Feb 2015 at 14:55, Domagoj Saric wrote:
It would have to be a very borked situation if one of those didn't work.
Considering that:
1) GetTempPath() is not deprecated (which I would take to mean that the 'manufacturer of the OS' holds this function to be sufficient for 'most users')
That's a reasonable point. Apart from its hardcoded 260 char limit.
2) Boost.FS is already more than bloated
I certainly wouldn't call Filesystem bloated! If anything, it verges on being so lightweight as to not be sufficiently useful for many common tasks (e.g. ACL support, xattr support, file locking support). It also currently doesn't make strong enough guarantees about raciness on the filesystem in my opinion. And Windows support is currently poor, permissions support is faked instead of making use of Windows's POSIX perms emulation, and I find the ambivalent position on the use of extended paths unfortunate (they should be permanently switched on by default, though I'll accept an ability to toggle them off on a case by case basis).
...I would 'like' that the default implementation stays "as simple as possible but not simpler" and add a separate function or set of functions for the various special and edge cases. For example you could add a FindFile-like API that would give you a token with which it would allow you to iterate possible temp. paths until a working one is found [if the one returned in the last call isn't useable you'd call something like get_next_temp( current_temp_path_iteration_state );]...and/or you could provide the same with a container/begin-end interface...
Sounds like a recipe for racy temp file code with security holes. No, it's better that temp file creation is written into the library and is done correctly without race nor security bugs. Indeed, recent Linuxes add special anonymous temp file semantics to the kernel, saves having to bother generating std::random_device temp file naming plus guarantees deletion on close. Very handy, and something Windows has had for decades. Now the only outliers are BSD and OS X. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
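The anonymous temp file semantics mentioned above can be sketched for POSIX systems as follows. This assumes Linux 3.11 or later for O_TMPFILE and falls back to mkstemp() followed by unlink() elsewhere; open_anonymous_temp is a hypothetical helper name, not part of Boost.Filesystem or AFIO. With O_TMPFILE the file never has a name, so there is no name to guess or race on, and the storage is released when the last descriptor is closed.

```cpp
#include <cassert>
#include <string>

#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns a read/write descriptor for an unnamed temporary file inside `dir`,
// or -1 on failure (errno is left as set by the failing call).
int open_anonymous_temp(const std::string& dir)
{
    assert(!dir.empty());
#ifdef O_TMPFILE
    // Linux: the file is created without any directory entry.
    const int fd = ::open(dir.c_str(), O_TMPFILE | O_RDWR | O_CLOEXEC, S_IRUSR | S_IWUSR);
    if (fd >= 0) return fd;
    // Otherwise fall through: the kernel or filesystem may not support O_TMPFILE.
#endif
    // Portable fallback: mkstemp creates the file exclusively with mode 0600.
    // The name is visible briefly, until the unlink below.
    std::string pattern = dir + "/tmp-XXXXXX";
    const int fallback = ::mkstemp(&pattern[0]);
    if (fallback >= 0) ::unlink(pattern.c_str());
    return fallback;
}
```

The fallback path still leaves a short window in which the name exists, which is the kind of difference a per-platform race guarantee would have to document.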
On 5/02/2015 04:08, Niall Douglas wrote:
On 4 Feb 2015 at 14:55, Domagoj Saric wrote:
1) GetTempPath() is not deprecated (which I would take to mean that the 'manufacturer of the OS' holds this function to be sufficient for 'most users')
That's a reasonable point. Apart from its hardcoded 260 char limit.
Windows in general does not react well to paths exceeding this limit anywhere. While yes, there are ways to bypass the limit, there are many restrictions on this functionality and most applications don't use those alternatives and so will choke if they happen to come across such a path. It's usually limited to very specialised scenarios (eg. backup software). Perhaps this is a chicken-and-egg problem (if it were easier for apps to deal with longer paths, maybe they'd do it, and then eventually everybody would start doing it -- suggesting that a useful goal of a library would be to make this easy) though.
On Wed, Feb 4, 2015 at 4:57 PM, Gavin Lambert
On 5/02/2015 04:08, Niall Douglas wrote:
On 4 Feb 2015 at 14:55, Domagoj Saric wrote:
1) GetTempPath() is not deprecated (which I would take to mean that the 'manufacturer of the OS' holds this function to be sufficient for 'most users')
That's a reasonable point. Apart from its hardcoded 260 char limit.
Windows in general does not react well to paths exceeding this limit anywhere.
The problem reported by the original ticket was that for GetTempPathW the limit is 130 characters, and that is way too small for some applications. And, yes, I did add test code that verified GetTempPathW fails at roughly 130 characters. --Beman
On 4 Feb 2015 at 17:46, Beman Dawes wrote:
That's a reasonable point. Apart from its hardcoded 260 char limit.
Windows in general does not react well to paths exceeding this limit anywhere.
The problem reported by the original ticket was that for GetTempPathW the limit is 130 characters, and that is way too small for some applications. And, yes, I did add test code that verified GetTempPathW fails at roughly 130 characters.
Wow, so that's a 260 *byte* limit then. That's dismal :( Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 5 Feb 2015 at 10:57, Gavin Lambert wrote:
That's a reasonable point. Apart from its hardcoded 260 char limit.
Windows in general does not react well to paths exceeding this limit anywhere. While yes, there are ways to bypass the limit, there are many restrictions on this functionality and most applications don't use those alternatives and so will choke if they happen to come across such a path. It's usually limited to very specialised scenarios (eg. backup software).
You'd be surprised at how wrong you are for this with most command line apps, all of mingw and in fact the Windows Explorer as of Windows 8.1. I have a Jenkins install which goes _way_ over the 260 char limit on Windows, and do you know what the *only* build tool to always puke on that is? Answer: Boost.Build. Everything else, from git to MSVC to Python and Java, works just fine [1]. I can also confirm that Notepad++ is happy too. There are some interesting exceptions though. The biggest is that rmdir /s fails and also refuses to accept the \\?\ prefix, as does attempting to delete the whole tree from Explorer - though deleting individual files deep into the hierarchy is fine. You can work around the rmdir /s limitation using robocopy oddly enough which is happy with long paths, but it is frustrating that the tooling falls short on rm -rf directory trees as it's common enough in CI usage. Another unfortunate issue is that all .NET code is 260 limited. A real missed opportunity there. [1]: When called with the current working directory set inside the deep path. Python's support is incomplete - it auto-prefixes paths when they are too long, but does not implement cwd or relative path substitution with extended paths. The debate on whether to finish this support or push the problem onto the user continues, but if your python program only works with absolute paths and uses os.path for path manipulations, extended paths work.
Perhaps this is a chicken-and-egg problem (if it were easier for apps to deal with longer paths, maybe they'd do it, and then eventually everybody would start doing it -- suggesting that a useful goal of a library would be to make this easy) though.
There is an interesting story on this :) Some time ago I raised with Microsoft the urgent need to bump MAX_PATH in Microsoft libc to something sane like 1024, or god forbid, the actual 32767 that the NT kernel supports. The answer was a definite no as it would be a nightmare of compliance testing with never ending corner case bugs, and this I could believe when you examine the MSVCRT source code. Perhaps their brand new rewritten MSVCRT in VS2015 has significantly improved here. So yes, it is a chicken and egg problem. If the standard library shipped with C++ always uses extended paths, that is an enormous leap forward creating pressure on Microsoft to fix this problem. A lot of other languages like Python and Java already use extended paths, even with reduced functionality as soon as the path exceeds 259. So should C++. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
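The "auto-prefix once the path gets too long" behaviour described above can be sketched as plain string handling. This is an illustration only, not what Python, Java or Boost.Filesystem actually do internally; to_extended_path and its helpers are hypothetical names. It assumes the input is already absolute and uses backslashes only, since the \\?\ prefix switches off the usual Win32 path normalisation (forward slashes are not translated). Relative paths are returned unchanged because they cannot be prefixed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t win32_max_path = 260;  // MAX_PATH, counting the terminating NUL

bool starts_with(const std::wstring& s, const std::wstring& prefix)
{
    return s.compare(0, prefix.size(), prefix) == 0;
}

// True for paths of the form X:\... with an ASCII drive letter.
bool is_drive_absolute(const std::wstring& p)
{
    if (p.size() < 3) return false;
    const wchar_t c = p[0];
    const bool letter = (c >= L'A' && c <= L'Z') || (c >= L'a' && c <= L'z');
    return letter && p[1] == L':' && p[2] == L'\\';
}

// Adds the extended-length prefix only when the path no longer fits in MAX_PATH.
std::wstring to_extended_path(const std::wstring& p)
{
    if (p.size() < win32_max_path) return p;                 // up to 259 chars: leave alone
    if (starts_with(p, L"\\\\?\\") || starts_with(p, L"\\\\.\\"))
        return p;                                            // already extended, or a device path
    assert(p.find(L'/') == std::wstring::npos);              // caller must normalise first
    if (starts_with(p, L"\\\\"))
        return L"\\\\?\\UNC\\" + p.substr(2);                // \\server\share -> \\?\UNC\server\share
    if (is_drive_absolute(p))
        return L"\\\\?\\" + p;                               // C:\dir -> \\?\C:\dir
    return p;                                                // relative: cannot be prefixed
}
```

Prefixing only past the limit keeps short paths readable by tools that do not accept extended paths, at the cost of behaviour that changes with path length.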
On 5/02/2015 14:36, Niall Douglas wrote:
On 5 Feb 2015 at 10:57, Gavin Lambert wrote:
Windows in general does not react well to paths exceeding this limit anywhere. While yes, there are ways to bypass the limit, there are many restrictions on this functionality and most applications don't use those alternatives and so will choke if they happen to come across such a path. It's usually limited to very specialised scenarios (eg. backup software).
You'd be surprised at how wrong you are for this with most command line apps, all of mingw and in fact the Windows Explorer as of Windows 8.1. I have a Jenkins install which goes _way_ over the 260 char limit on Windows, and do you know what the *only* build tool to always puke on that is?
Anything that uses relative paths is relatively immune (pun somewhat intended), provided they're only accessing children or at least near neighbours of the deep path. This actually allows a lot of software to work by coincidence, where the intent of the author was to access a file in the application directory but due to lazy coding the app used a relative path under the assumption that the working dir was the application dir. (An assumption that *usually* holds for GUI apps, but fails often enough to get you into trouble -- and *rarely* holds for command line apps, unless they're not in the PATH.)
Some time ago I raised with Microsoft the urgent need to bump MAX_PATH in Microsoft libc to something sane like 1024, or god forbid, the actual 32767 that the NT kernel supports. The answer was a definite no as it would be a nightmare of compliance testing with never ending corner case bugs, and this I could believe when you examine the MSVCRT source code. Perhaps their brand new rewritten MSVCRT in VS2015 has significantly improved here.
That is a chicken-and-egg problem, but it's a different one than what I was thinking of. (Note that MAX_PATH is defined in the Win32 API, not the CRT.) The problem there is that there are too many existing APIs that write into a buffer without being given a buffer size explicitly (caller is expected to make it at least MAX_PATH chars long, function guarantees to not write more than MAX_PATH chars). This is obviously a problem if the two sides do not agree on the value of MAX_PATH, which would be the case for old application vs. new Windows version if they changed it. Similarly there is a lot of existing code that assumes that even in the APIs that do accept a buffer size, when using MAX_PATH as the buffer size the API "cannot" fail with a buffer size error. (Whether this is good code or not is out of scope -- there's a lot of it in the wild.) These would also do peculiar things if this assumption were violated. Making the C/C++ runtime support extended paths would definitely be a step forward, but it's still very common to completely sidestep the CRT and go straight to the WinAPI, because it has more features.
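For the APIs that do take a buffer size, code can avoid the "MAX_PATH is always enough" assumption by retrying with the size the call asks for. The sketch below is portable and uses an injected callback in place of the real API; fill_fn and read_with_growing_buffer are hypothetical names. The callback models the return convention documented for functions such as GetEnvironmentVariableW: 0 on failure, the number of characters written (excluding the terminator) on success, or the required capacity (including the terminator) when the buffer is too small.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Callback with the Win32-style "tell me how much you need" convention.
using fill_fn = std::function<std::size_t(wchar_t* buffer, std::size_t capacity)>;

// Calls `fill` until the buffer is large enough; returns nothing on failure.
std::optional<std::wstring> read_with_growing_buffer(const fill_fn& fill,
                                                     std::size_t initial_capacity = 128)
{
    assert(static_cast<bool>(fill) && initial_capacity > 0);
    std::wstring buffer(initial_capacity, L'\0');
    for (;;) {
        const std::size_t n = fill(buffer.data(), buffer.size());
        if (n == 0) return std::nullopt;   // the underlying call failed
        if (n < buffer.size()) {           // success: n characters were written
            buffer.resize(n);
            return buffer;
        }
        // Too small: n is the required capacity. Loop rather than retry once,
        // because the value may change between the two calls.
        buffer.assign(n + 1, L'\0');
    }
}
```

On Windows the callback would presumably be a small lambda wrapping the real call; no fixed-size array appears anywhere, so the result is not capped at MAX_PATH by this code.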
On 5 Feb 2015 at 15:04, Gavin Lambert wrote:
Some time ago I raised with Microsoft the urgent need to bump MAX_PATH in Microsoft libc to something sane like 1024, or god forbid, the actual 32767 that the NT kernel supports. The answer was a definite no as it would be a nightmare of compliance testing with never ending corner case bugs, and this I could believe when you examine the MSVCRT source code. Perhaps their brand new rewritten MSVCRT in VS2015 has significantly improved here.
That is a chicken-and-egg problem, but it's a different one than what I was thinking of. (Note that MAX_PATH is defined in the Win32 API, not the CRT.)
The problem there is that there are too many existing APIs that write into a buffer without being given a buffer size explicitly (caller is expected to make it at least MAX_PATH chars long, function guarantees to not write more than MAX_PATH chars). This is obviously a problem if the two sides do not agree on the value of MAX_PATH, which would be the case for old application vs. new Windows version if they changed it.
Similarly there is a lot of existing code that assumes that even in the APIs that do accept a buffer size, when using MAX_PATH as the buffer size the API "cannot" fail with a buffer size error. (Whether this is good code or not is out of scope -- there's a lot of it in the wild.) These would also do peculiar things if this assumption were violated.
Making the C/C++ runtime support extended paths would definitely be a step forward, but it's still very common to completely sidestep the CRT and go straight to the WinAPI, because it has more features.
Sure. I was referring to the MSVCRT only. Basically iostreams and fopen() etc should work with extended paths. I wouldn't try modifying win32. Except actually, I would. I'd add a new type of unicode string for file paths (only allocatable using an AllocateFilePath() function) and a new extension for win32 APIs e.g. CreateFileLW. I'd have all LW ending functions check the string being fed to them, and if not allocated by AllocateFilePath() then abort. That should trap applications being converted over from legacy wchar_t who do C casting to TCHAR. New applications can then use the LW extended APIs, and we get long path support in Windows at long last. Simples! :) Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On 4.2.2015. 16:08, Niall Douglas wrote:
On 4 Feb 2015 at 14:55, Domagoj Saric wrote:
It would have to be a very borked situation if one of those didn't work.
Considering that:
1) GetTempPath() is not deprecated (which I would take to mean that the 'manufacturer of the OS' holds this function to be sufficient for 'most users')
That's a reasonable point. Apart from its hardcoded 260 char limit.
I can concede that the 260 >byte< limit is problematic (and probably a bug on MS side) and that Boost.FS can offer a workaround (for wchar_t based code) by reimplementing the officially described logic of GetTempPath()...
2) Boost.FS is already more than bloated
I certainly wouldn't call Filesystem bloated! If anything, it verges on being so lightweight as to not be sufficiently useful for many common tasks
By bloated I didn't refer to its feature set (features and options coded so that they obey the "you don't pay for it if you don't use it" rule are "power" not "bloat" in my book) but more to its implementation or "binary bloat" or being the next "iostreams" when it gets into the standard :/ http://lists.boost.org/Archives/boost/2011/08/184823.php Didn't personally look at it for years, but when I did, I found it doing things like constructing std::strings out of C strings just so that it can call find_first_of() on them "for crying out loud"...Its 'careless' use of STL containers and all the invisible EH baggage this creates, etc... With that approach C++ will never be widely accepted as the 'one true all purpose language' (e.g. would you really use iostreams, filesystem or even std::function in the kernel of 'the next great OS'?)... Note that this is especially problematic on Windows, which does not have a standard prebuilt/included C++ library so most people resort to linking statically with the CRT (which with Boost.FS brings in a ton of code)...Kind of 'embarrassing' that a hello world Boost.FS console program will weigh in hundreds of kilobytes...
...I would 'like' that the default implementation stays "as simple as possible but not simpler" and add a separate function or set of functions for the various special and edge cases. For example you could add a FindFile-like API that would give you a token with which it would allow you to iterate possible temp. paths until a working one is found [if the one returned in the last call isn't useable you'd call something like get_next_temp( current_temp_path_iteration_state );]...and/or you could provide the same with a container/begin-end interface...
Sounds like a recipe for racy temp file code with security holes.
Honestly, how's that? -- Domagoj Saric Software Architect www.LittleEndian.com
participants (8)
- Beman Dawes
- Domagoj Saric
- Gavin Lambert
- Niall Douglas
- Olaf van der Spek
- Peter Dimov
- Rob Stewart
- Stephan T. Lavavej