[cmake] Problem with dependencies for configure checks

Hi, I've been having this problem with CMake in Boost.Filesystem, where I need to have the Boost::library targets available before Boost::filesystem target is defined. This is needed in order to perform configuration checks that will affect how Boost::filesystem will be defined. For example, see the check_cxx_source_compiles calls: https://github.com/boostorg/filesystem/blob/af6ac28b5785820b93a2af7dd6e9b801... For the check_cxx_source_compiles calls, I would have liked to add targets like Boost::config and Boost::winapi to CMAKE_REQUIRED_LIBRARIES so that the checks are compiled with include directories of those dependencies added in the command line. This doesn't work because these targets may not be defined at the point when the configuration checks are performed. My current workaround, as you can see, is to add the Boost root to CMAKE_REQUIRED_INCLUDES, where the boost directory with the common headers tree is located. However, this will not work if we're going to make that directory optional or remove it. Also, you can see that I'm reusing checks implemented in Boost.Config, for which I also had to hard code its relative location from Boost.Filesystem. If we're going to allow different libraries located in separate directory trees, this will also break. My question is, can this be fixed and how? I was wondering if it is possible to make the target generation two-stage, where the first stage identifies library locations, including directories with their headers, and the second stage would define build targets. This way all library directories would be available to the second stage. Although I can already see the problem with second order dependencies with this approach...

Andrey Semashev wrote:
Hi,
I've been having this problem with CMake in Boost.Filesystem, where I need to have the Boost::library targets available before Boost::filesystem target is defined. This is needed in order to perform configuration checks that will affect how Boost::filesystem will be defined. For example, see the check_cxx_source_compiles calls:
https://github.com/boostorg/filesystem/blob/af6ac28b5785820b93a2af7dd6 e9b801232fec19/CMakeLists.txt#L23-L50
For the check_cxx_source_compiles calls, I would have liked to add targets like Boost::config and Boost::winapi to CMAKE_REQUIRED_LIBRARIES so that the checks are compiled with include directories of those dependencies added in the command line. This doesn't work because these targets may not be defined at the point when the configuration checks are performed.
My current workaround, as you can see, is to add the Boost root to CMAKE_REQUIRED_INCLUDES, where the boost directory with the common headers tree is located. However, this will not work if we're going to make that directory optional or remove it. Also, you can see that I'm reusing checks implemented in Boost.Config, for which I also had to hard code its relative location from Boost.Filesystem. If we're going to allow different libraries located in separate directory trees, this will also break.
My question is, can this be fixed and how?
I add ../config/include to CMAKE_REQUIRED_INCLUDES. https://github.com/boostorg/stacktrace/blob/79aff77771a8f3a3d62852de6619a955... I don't think we should mandate add_subdirectory calls to come in a specific order. Even if we can hack something in the superproject to do it, users who use add_subdirectory manually will need to get the order right. Incidentally, #include <${CMAKE_CURRENT_SOURCE_DIR}/config/has_stat_st_blksize.cpp> doesn't work on Cygwin with the Windows CMake; this ends up generating #include <C:/boost-dir/libs/filesystem/config/ has_stat_st_blksize.cpp> which the Cygwin compiler doesn't like and the test always fails. So I've had to use https://github.com/boostorg/stacktrace/blob/79aff77771a8f3a3d62852de6619a955... instead.

On 6/3/21 6:43 PM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
Hi,
I've been having this problem with CMake in Boost.Filesystem, where I need to have the Boost::library targets available before Boost::filesystem target is defined. This is needed in order to perform configuration checks that will affect how Boost::filesystem will be defined. For example, see the check_cxx_source_compiles calls:
https://github.com/boostorg/filesystem/blob/af6ac28b5785820b93a2af7dd6 e9b801232fec19/CMakeLists.txt#L23-L50
For the check_cxx_source_compiles calls, I would have liked to add targets like Boost::config and Boost::winapi to CMAKE_REQUIRED_LIBRARIES so that the checks are compiled with include directories of those dependencies added in the command line. This doesn't work because these targets may not be defined at the point when the configuration checks are performed.
My current workaround, as you can see, is to add the Boost root to CMAKE_REQUIRED_INCLUDES, where the boost directory with the common headers tree is located. However, this will not work if we're going to make that directory optional or remove it. Also, you can see that I'm reusing checks implemented in Boost.Config, for which I also had to hard code its relative location from Boost.Filesystem. If we're going to allow different libraries located in separate directory trees, this will also break.
My question is, can this be fixed and how?
I add ../config/include to CMAKE_REQUIRED_INCLUDES.
https://github.com/boostorg/stacktrace/blob/79aff77771a8f3a3d62852de6619a955...
But that would mean one has to specify second order dependencies as well for configure checks. This might be fine for Boost.Config, but not for other libraries that have dependencies. Boost.WinAPI is an example; I wouldn't want to replicate its dependencies in all downstream libraries that have checks involving it.
I don't think we should mandate add_subdirectory calls to come in a specific order. Even if we can hack something in the superproject to do it, users who use add_subdirectory manually will need to get the order right.
Incidentally,
#include <${CMAKE_CURRENT_SOURCE_DIR}/config/has_stat_st_blksize.cpp>
doesn't work on Cygwin with the Windows CMake; this ends up generating
#include <C:/boost-dir/libs/filesystem/config/ has_stat_st_blksize.cpp>
which the Cygwin compiler doesn't like and the test always fails. So I've had to use
https://github.com/boostorg/stacktrace/blob/79aff77771a8f3a3d62852de6619a955...
instead.
I'm not sure I understand. Is it that Cygwin doesn't accept angle brackets with Windows paths?

Andrey Semashev wrote:
But that would mean one has to specify second order dependencies as well for configure checks.
¯\_(ツ)_/¯
I'm not sure I understand. Is it that Cygwin doesn't accept angle brackets with Windows paths?
The Cygwin compiler just doesn't accept Windows paths, as far as I can see. Performing C++ SOURCE FILE Test BOOST_FILESYSTEM_HAS_STAT_ST_BLKSIZE failed with the following output: Change Dir: C:/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp Run Build Command(s):C:/cygwin64/bin/make.exe -f Makefile cmTC_8f9df/fast && /usr/bin/make -f CMakeFiles/cmTC_8f9df.dir/build.make CMakeFiles/cmTC_8f9df.dir/build make[1]: Entering directory '/cygdrive/c/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp' Building CXX object CMakeFiles/cmTC_8f9df.dir/src.cxx.obj C:/cygwin64/bin/c++.exe -DBOOST_FILESYSTEM_HAS_STAT_ST_BLKSIZE @CMakeFiles/cmTC_8f9df.dir/includes_CXX.rsp -o CMakeFiles/cmTC_8f9df.dir/src.cxx.obj -c C:/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp/src.cxx C:/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp/src.cxx:1:10: fatal error: C:/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp: No such file or directory 1 | #include <C:/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. make[1]: *** [CMakeFiles/cmTC_8f9df.dir/build.make:79: CMakeFiles/cmTC_8f9df.dir/src.cxx.obj] Error 1 make[1]: Leaving directory '/cygdrive/c/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp' make: *** [Makefile:127: cmTC_8f9df/fast] Error 2 Source file was: #include <C:/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp>

On 6/3/21 9:30 PM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
But that would mean one has to specify second order dependencies as well for configure checks.
¯\_(ツ)_/¯
Ok, so it seems I'll have to add a check to enforce that the unified headers directory exists. This means that we are still dependent on `b2 headers`, at least until there is a CMake equivalent.
I'm not sure I understand. Is it that Cygwin doesn't accept angle brackets with Windows paths?
The Cygwin compiler just doesn't accept Windows paths, as far as I can see.
Performing C++ SOURCE FILE Test BOOST_FILESYSTEM_HAS_STAT_ST_BLKSIZE failed with the following output: Change Dir: C:/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp
Run Build Command(s):C:/cygwin64/bin/make.exe -f Makefile cmTC_8f9df/fast && /usr/bin/make -f CMakeFiles/cmTC_8f9df.dir/build.make CMakeFiles/cmTC_8f9df.dir/build make[1]: Entering directory '/cygdrive/c/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp' Building CXX object CMakeFiles/cmTC_8f9df.dir/src.cxx.obj
C:/cygwin64/bin/c++.exe -DBOOST_FILESYSTEM_HAS_STAT_ST_BLKSIZE @CMakeFiles/cmTC_8f9df.dir/includes_CXX.rsp -o CMakeFiles/cmTC_8f9df.dir/src.cxx.obj -c C:/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp/src.cxx C:/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp/src.cxx:1:10: fatal error: C:/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp: No such file or directory 1 | #include <C:/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. make[1]: *** [CMakeFiles/cmTC_8f9df.dir/build.make:79: CMakeFiles/cmTC_8f9df.dir/src.cxx.obj] Error 1 make[1]: Leaving directory '/cygdrive/c/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp' make: *** [Makefile:127: cmTC_8f9df/fast] Error 2
Source file was: #include <C:/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp>
Ok, I think I understand. You're using native Windows CMake that generates Windows paths and a Cygwin compiler that doesn't understand them. That looks like an incorrect config to me; you should be using Cygwin CMake, which should generate Cygwin paths both in source files and command lines.

Andrey Semashev wrote:
On 6/3/21 9:30 PM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
But that would mean one has to specify second order dependencies as well for configure checks.
¯\_(ツ)_/¯
Ok, so it seems I'll have to add a check to enforce that the unified headers directory exists. This means that we are still dependent on `b2 headers`, at least until there is a CMake equivalent.
"We" are not dependent on any such thing, and there will never be such a CMake equivalent. This directory will not be maintained in a CMake build.

On 6/3/21 9:59 PM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
On 6/3/21 9:30 PM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
But that would mean one has to specify second order dependencies as well for configure checks.
¯\_(ツ)_/¯
Ok, so it seems I'll have to add a check to enforce that the unified headers directory exists. This means that we are still dependent on `b2 headers`, at least until there is a CMake equivalent.
"We" are not dependent on any such thing, and there will never be such a CMake equivalent. This directory will not be maintained in a CMake build.
Well, the only other alternative I see is globbing through all libraries to collect their include paths. And that would have to be done in every library, unless a common solution exists. I just looked at Boost.Atomic and it also relies on the unified headers dir, and I'm pretty sure I'm going to need it in Boost.Log. So the scanning will happen at least three times per build, which is already not good.

Andrey Semashev wrote:
Ok, so it seems I'll have to add a check to enforce that the unified headers directory exists. This means that we are still dependent on `b2 headers`, at least until there is a CMake equivalent.
"We" are not dependent on any such thing, and there will never be such a CMake equivalent. This directory will not be maintained in a CMake build.
Well, the only other alternative I see is globbing through all libraries to collect their include paths. And that would have to be done in every library, unless a common solution exists.
Why would every library need to do it? Why would it need to glob? I've done most of them by now and not one of them needs to do so.

On 6/4/21 1:05 AM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
Ok, so it seems I'll have to add a check to enforce that the unified headers directory exists. This means that we are still dependent on `b2 headers`, at least until there is a CMake equivalent.
"We" are not dependent on any such thing, and there will never be such a CMake equivalent. This directory will not be maintained in a CMake build.
Well, the only other alternative I see is globbing through all libraries to collect their include paths. And that would have to be done in every library, unless a common solution exists.
Why would every library need to do it? Why would it need to glob? I've done most of them by now and not one of them needs to do so.
Probably because most of them don't do configure checks or don't use Boost while doing so. In the cases I mentioned, I do use Boost for configure time checks, and globbing is a way to avoid having to hardcode the dependency tree for those checks.

Andrey Semashev wrote:
Why would every library need to do it? Why would it need to glob? I've done most of them by now and not one of them needs to do so.
Probably because most of them don't do configure checks or don't use Boost while doing so. In the cases I mentioned, I do use Boost for configure time checks, and globbing is a way to avoid having to hardcode the dependency tree for those checks.
I don't think globbing would be a particularly good practice in this case, but whatever floats your boat, I suppose. I will continue hardcoding the required include directories. That's not that great a practice either (*), but I don't see anything better. Why is that even necessary? Configure checks should be standalone. There's no problem using <windows.h> there instead of insisting on winapi, is there? The only other such place is Stacktrace, which wants to #error in its check when BOOST_NO_CXX11_THREAD_LOCAL is defined, even though it already uses thread_local itself, which would presumably fail. I understand why it's done - Config presumably encodes knowledge of platforms where thread_local compiles but then doesn't work - but the right thing would still have been to make the check standalone. I was just too lazy to do it. (*) It will fail when dependencies are pulled at configure time by FetchContent or package manager integration. E.g. pacman_add_subdir(boost_this boost/this) pacman_add_subdir(boost_that boost/that) pacman_add_subdir(boost_other boost/other) If boost_this wants boost_other, it won't find it, because it hasn't been downloaded yet. It will find it on the second configure run, but the checks would have been cached then, and will not run.

On 6/4/21 3:29 AM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
Why would every library need to do it? Why would it need to glob? I've done most of them by now and not one of them needs to do so.
Probably because most of them don't do configure checks or don't use Boost while doing so. In the cases I mentioned, I do use Boost for configure time checks, and globbing is a way to avoid having to hardcode the dependency tree for those checks.
I don't think globbing would be a particularly good practice in this case, but whatever floats your boat, I suppose.
Sure, I don't like it either. That's why I asked if something better could be done in Boost.CMake, or however it is called.
I will continue hardcoding the required include directories. That's not that great a practice either (*), but I don't see anything better.
Why is that even necessary? Configure checks should be standalone. There's no problem using <windows.h> there instead of insisting on winapi, is there?
In the particular case of Boost.WinAPI I want it (Boost.WinAPI) to define all the version macros it defines. This way I ensure that the config check is compiling in the same environment as the library will. I actually include a special header from Boost.Filesystem (platform_config.hpp) to define all platform-specific macros, which indirectly includes Boost.Config and Boost.WinAPI. In Boost.Log I have a config check that uses Boost.Atomic to test if it supports the required lock-free operations on a given target platform.
(*) It will fail when dependencies are pulled at configure time by FetchContent or package manager integration. E.g.
pacman_add_subdir(boost_this boost/this) pacman_add_subdir(boost_that boost/that) pacman_add_subdir(boost_other boost/other)
If boost_this wants boost_other, it won't find it, because it hasn't been downloaded yet. It will find it on the second configure run, but the checks would have been cached then, and will not run.
I'm inclined to simply not support incomplete source trees.

Andrey Semashev wrote:
In the particular case of Boost.WinAPI I want it (Boost.WinAPI) to define all the version macros it defines. This way I ensure that the config check is compiling in the same environment as the library will. I actually include a special header from Boost.Filesystem (platform_config.hpp) to define all platform-specific macros, which indirectly includes Boost.Config and Boost.WinAPI.
While on the subject of Filesystem configure checks, I don't know if it's been reported already, but the statx kernel call is causing problems; Filesystem operations failing with ENOTSUPP. You shouldn't be assuming that the machine where the library is built will be the same as the one the program will run on. (I think this has come up in the context of Docker, where Filesystem is probably built in the host environment but the program that uses it runs in Docker and apparently the supported kernel calls aren't the same. Or perhaps it's built when the Docker image is created, not sure.)

On 6/4/21 4:07 AM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
In the particular case of Boost.WinAPI I want it (Boost.WinAPI) to define all the version macros it defines. This way I ensure that the config check is compiling in the same environment as the library will. I actually include a special header from Boost.Filesystem (platform_config.hpp) to define all platform-specific macros, which indirectly includes Boost.Config and Boost.WinAPI.
While on the subject of Filesystem configure checks, I don't know if it's been reported already, but the statx kernel call is causing problems; Filesystem operations failing with ENOTSUPP.
ENOSYS, perhaps? Yes, there were reports. https://github.com/boostorg/filesystem/issues/172
You shouldn't be assuming that the machine where the library is built will be the same as the one the program will run on. (I think this has come up in the context of Docker, where Filesystem is probably built in the host environment but the program that uses it runs in Docker and apparently the supported kernel calls aren't the same. Or perhaps it's built when the Docker image is created, not sure.)
My stance on this is that Boost.Filesystem must be built on a system with kernel headers matching or older than the systems that will run the binary. This matches the same requirements for Windows. I don't support configs with headers newer than runtime kernel.

On 04/06/2021 10:12, Andrey Semashev via Boost wrote:
My stance on this is that Boost.Filesystem must be built on a system with kernel headers matching or older than the systems that will run the binary. This matches the same requirements for Windows. I don't support configs with headers newer than runtime kernel.
It's trivially easy to do a runtime check for statx kernel support and fall back on older syscalls if needed. This is LLFIO doing so, and it works on Linux kernel 2.6: https://github.com/ned14/llfio/blob/develop/include/llfio/v2.0/detail/impl/p... Niall

On 6/4/21 2:03 PM, Niall Douglas via Boost wrote:
On 04/06/2021 10:12, Andrey Semashev via Boost wrote:
My stance on this is that Boost.Filesystem must be built on a system with kernel headers matching or older than the systems that will run the binary. This matches the same requirements for Windows. I don't support configs with headers newer than runtime kernel.
It's trivially easy to do a runtime check for statx kernel support and fall back on older syscalls if needed.
This is LLFIO doing so, and it works on Linux kernel 2.6:
https://github.com/ned14/llfio/blob/develop/include/llfio/v2.0/detail/impl/p...
The code you pointed to is not what I would call trivial. There are quite a few places where statx is used in Boost.Filesystem, in some places exclusively (i.e. there is no fallback). I could add a runtime check, but I definitely don't want that check to happen on every call. This means the result must be stored globally in a thread-safe manner. Then there are other syscalls, on Linux and other systems, should I also add runtime checks for those? All that to support what I consider improperly configured systems. It just doesn't look like a worthwhile effort while I have other things to do, including in Boost.Filesystem too.

On 04/06/2021 13:04, Andrey Semashev via Boost wrote:
On 6/4/21 2:03 PM, Niall Douglas via Boost wrote:
On 04/06/2021 10:12, Andrey Semashev via Boost wrote:
My stance on this is that Boost.Filesystem must be built on a system with kernel headers matching or older than the systems that will run the binary. This matches the same requirements for Windows. I don't support configs with headers newer than runtime kernel.
It's trivially easy to do a runtime check for statx kernel support and fall back on older syscalls if needed.
This is LLFIO doing so, and it works on Linux kernel 2.6:
https://github.com/ned14/llfio/blob/develop/include/llfio/v2.0/detail/impl/p...
The code you pointed to is not what I would call trivial. There are quite a few places where statx is used in Boost.Filesystem, in some places exclusively (i.e. there is no fallback).
There are only a **very** few things which statx provides which older syscalls cannot. statx's big gain is it is a snapshot not subject to races, but that hardly affects Filesystem.
I could add a runtime check, but I definitely don't want that check to happen on every call. This means the result must be stored globally in a thread-safe manner.
At work we've got llfio::stat_t::fill() being called in some places millions of times per second. It is implemented by attempting statx first, and if it is not supported, then stat or other syscalls to emulate the same. You cannot cache statx support/non-support because only some filing systems support statx, and others do not. So depending on the path used, it may work, but at other times, not. Right now at work we're on kernels which don't support statx, so we are always getting back ENOSYS and then falling back to emulation. We have not found any performance issues.
Then there are other syscalls, on Linux and other systems, should I also add runtime checks for those? All that to support what I consider improperly configured systems. It just doesn't look like a worthwhile effort while I have other things to do, including in Boost.Filesystem too.
A very great deal of Boost users are either on older Linux kernels, or are running Boost inside Docker containers where various newer syscalls are disabled e.g. where I work. In my opinion, if you like your users, you need to implement syscall fallback for all recent Linux syscalls, same as you'd do on Windows. I agree that you can reasonably cut off all Linux kernels past EOL, so you only need to support syscall fallback for kernels since EOL. Niall

Niall Douglas wrote:
On 04/06/2021 13:04, Andrey Semashev via Boost wrote:
I could add a runtime check, but I definitely don't want that check to happen on every call. This means the result must be stored globally in a thread-safe manner.
At work we've got llfio::stat_t::fill() being called in some places millions of times per second.
It is implemented by attempting statx first, and if it is not supported, then stat or other syscalls to emulate the same.
You cannot cache statx support/non-support because only some filing systems support statx, and others do not. So depending on the path used, it may work, but at other times, not.
Right now at work we're on kernels which don't support statx, so we are always getting back ENOSYS and then falling back to emulation. We have not found any performance issues.
A kernel call is infinitely more expensive than checking a number against ENOSYS. It's absolutely impossible for the test to ever be a performance issue.

On 04/06/2021 15:54, Peter Dimov via Boost wrote:
Niall Douglas wrote:
On 04/06/2021 13:04, Andrey Semashev via Boost wrote:
I could add a runtime check, but I definitely don't want that check to happen on every call. This means the result must be stored globally in a thread-safe manner.
At work we've got llfio::stat_t::fill() being called in some places millions of times per second.
It is implemented by attempting statx first, and if it is not supported, then stat or other syscalls to emulate the same.
You cannot cache statx support/non-support because only some filing systems support statx, and others do not. So depending on the path used, it may work, but at other times, not.
Right now at work we're on kernels which don't support statx, so we are always getting back ENOSYS and then falling back to emulation. We have not found any performance issues.
A kernel call is infinitely more expensive than checking a number against ENOSYS. It's absolutely impossible for the test to ever be a performance issue.
Well, that depends. If you're on a kernel that has no statx syscall, then the userspace syscall dispatcher will spot the unknown syscall number, and you'll get back ENOSYS very quickly, as you point out. But if you're on a kernel with statx syscall, and instead it is the filing system which doesn't support statx, then that's a full fat syscall just to get back ENOSYS. I don't personally think anyone who cares about metadata retrieval performance would ever use Filesystem or std::filesystem because both tend to do dynamic memory allocations, which easily swamp a statx call. So, in that sense, even many syscalls can be done and it'll be unmeasurable compared to say a filesystem path construction. If this were Windows, because Filesystem calls many Win32 functions to build a stat, that is a very different situation. But not on POSIX. Niall

Niall Douglas wrote:
On 04/06/2021 15:54, Peter Dimov via Boost wrote:
A kernel call is infinitely more expensive than checking a number against ENOSYS. It's absolutely impossible for the test to ever be a performance issue.
Well, that depends.
If you're on a kernel that has no statx syscall, then the userspace syscall dispatcher will spot the unknown syscall number, and you'll get back ENOSYS very quickly, as you point out.
But if you're on a kernel with statx syscall, and instead it is the filing system which doesn't support statx, then that's a full fat syscall just to get back ENOSYS.
Sure, but the alternative is to do that same fat syscall just to fail, as Filesystem currently does. The performance concerns in our case are about the scenario where the syscall doesn't fail, because that's the case that works. The performance of something that doesn't work is irrelevant.

On 6/4/21 6:17 PM, Niall Douglas via Boost wrote:
On 04/06/2021 15:54, Peter Dimov via Boost wrote:
Niall Douglas wrote:
On 04/06/2021 13:04, Andrey Semashev via Boost wrote:
I could add a runtime check, but I definitely don't want that check to happen on every call. This means the result must be stored globally in a thread-safe manner.
At work we've got llfio::stat_t::fill() being called in some places millions of times per second.
It is implemented by attempting statx first, and if it is not supported, then stat or other syscalls to emulate the same.
You cannot cache statx support/non-support because only some filing systems support statx, and others do not. So depending on the path used, it may work, but at other times, not.
Right now at work we're on kernels which don't support statx, so we are always getting back ENOSYS and then falling back to emulation. We have not found any performance issues.
A kernel call is infinitely more expensive than checking a number against ENOSYS. It's absolutely impossible for the test to ever be a performance issue.
Well, that depends.
If you're on a kernel that has no statx syscall, then the userspace syscall dispatcher will spot the unknown syscall number, and you'll get back ENOSYS very quickly, as you point out.
But if you're on a kernel with statx syscall, and instead it is the filing system which doesn't support statx, then that's a full fat syscall just to get back ENOSYS.
Kernel should not return ENOSYS for a syscall that is implemented. This error code is reserved for this specific purpose. What you should get is probably ENOTSUP or EINVAL. Are you sure you're getting specifically ENOSYS?
I don't personally think anyone who cares about metadata retrieval performance would ever use Filesystem or std::filesystem because both tend to do dynamic memory allocations, which easily swamp a statx call. So, in that sense, even many syscalls can be done and it'll be unmeasurable compared to say a filesystem path construction.
If this were Windows, because Filesystem calls many Win32 functions to build a stat, that is a very different situation. But not on POSIX.
Boost.Filesystem doesn't implement stat, it uses it as a means to implement other, more fine grained operations. On Windows, these operations tend to be implemented with a single WinAPI call, in some cases with a pair open/close file handle calls.

On 6/4/21 5:44 PM, Niall Douglas via Boost wrote:
On 04/06/2021 13:04, Andrey Semashev via Boost wrote:
On 6/4/21 2:03 PM, Niall Douglas via Boost wrote:
On 04/06/2021 10:12, Andrey Semashev via Boost wrote:
My stance on this is that Boost.Filesystem must be built on a system with kernel headers matching or older than the systems that will run the binary. This matches the same requirements for Windows. I don't support configs with headers newer than runtime kernel.
It's trivially easy to do a runtime check for statx kernel support and fall back on older syscalls if needed.
This is LLFIO doing so, and it works on Linux kernel 2.6:
https://github.com/ned14/llfio/blob/develop/include/llfio/v2.0/detail/impl/p...
The code you pointed to is not what I would call trivial. There are quite a few places where statx is used in Boost.Filesystem, in some places exclusively (i.e. there is no fallback).
There are only a **very** few things which statx provides which older syscalls cannot. statx's big gain is it is a snapshot not subject to races, but that hardly affects Filesystem.
I could add a runtime check, but I definitely don't want that check to happen on every call. This means the result must be stored globally in a thread-safe manner.
At work we've got llfio::stat_t::fill() being called in some places millions of times per second.
It is implemented by attempting statx first, and if it is not supported, then stat or other syscalls to emulate the same.
You cannot cache statx support/non-support because only some filing systems support statx, and others do not. So depending on the path used, it may work, but at other times, not.
Isn't statx supposed to complete successfully through the stat route in the kernel, if the filesystem does not support statx natively?
Right now at work we're on kernels which don't support statx, so we are always getting back ENOSYS and then falling back to emulation. We have not found any performance issues.
Then there are other syscalls, on Linux and other systems, should I also add runtime checks for those? All that to support what I consider improperly configured systems. It just doesn't look like a worthwhile effort while I have other things to do, including in Boost.Filesystem too.
A very great deal of Boost users are either on older Linux kernels, or are running Boost inside Docker containers where various newer syscalls are disabled e.g. where I work. In my opinion, if you like your users, you need to implement syscall fallback for all recent Linux syscalls, same as you'd do on Windows. I agree that you can reasonably cut off all Linux kernels past EOL, so you only need to support syscall fallback for kernels since EOL.
My point is that if you're running a kernel that doesn't support statx then you should compile Boost.Filesystem with headers from that kernel. It will not have statx syscall and Boost.Filesystem will compile to use the old stat syscall. <rant> Docker people seem to think of Docker images as of VMs while they are not. Hence they consider it normal to build the image on one system and then run it on another, with possibly older kernel. Well, this is wrong. It doesn't work that way, and Boost.Filesystem is not the only piece of software that will break because of that. </rant>

Andrey Semashev wrote:
<rant> Docker people seem to think of Docker images as of VMs while they are not. Hence they consider it normal to build the image on one system and then run it on another, with possibly older kernel. Well, this is wrong. It doesn't work that way, and Boost.Filesystem is not the only piece of software that will break because of that. </rant>
As Niall mentioned, it might be that Docker just doesn't support the syscalls, even when running on the same kernel.

On 6/4/21 6:34 PM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
<rant> Docker people seem to think of Docker images as of VMs while they are not. Hence they consider it normal to build the image on one system and then run it on another, with possibly older kernel. Well, this is wrong. It doesn't work that way, and Boost.Filesystem is not the only piece of software that will break because of that. </rant>
As Niall mentioned, it might be that Docker just doesn't support the syscalls, even when running on the same kernel.
This is a different case, and for that I added compile-time options to disable some syscalls in Boost.Filesystem.

Andrey Semashev wrote:
On 6/4/21 6:34 PM, Peter Dimov via Boost wrote:
As Niall mentioned, it might be that Docker just doesn't support the syscalls, even when running on the same kernel.
This is a different case, and for that I added compile-time options to disable some syscalls in Boost.Filesystem.
That's not going to be of help in general. Many users have no control over how the library is built. They get it from the system, or from a package manager. If this check is unreliable, it should be off by default and opt-in. It does more harm than good.

On 6/4/21 7:20 PM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
On 6/4/21 6:34 PM, Peter Dimov via Boost wrote:
As Niall mentioned, it might be that Docker just doesn't support the syscalls, even when running on the same kernel.
This is a different case, and for that I added compile-time options to disable some syscalls in Boost.Filesystem.
That's not going to be of help in general. Many users have no control over how the library is built. They get it from the system, or from a package manager.
That's a fair point. Damn, it looks like I'll have to make a runtime check after all...
If this check is unreliable, it should be off by default and opt-in. It does more harm than good.
Stuff like this never gets enabled. And besides, I do need statx unconditionally for some of the operations.

Andrey Semashev wrote:
My stance on this is that Boost.Filesystem must be built on a system with kernel headers matching or older than the systems that will run the binary. This matches the same requirements for Windows. I don't support configs with headers newer than runtime kernel.
I don't think this is a realistic requirement to put on users. Is it possible to handle the runtime failure and fall back on the no-statx code path?

I don't think globbing would be a particularly good practice in this case, but whatever floats your boat, I suppose. I will continue hardcoding the required include directories.
To put this in perspective, since winapi only depends on config and predef, and those depend on nothing else, you only need three include directories: "../config/include" "../predef/include" "../winapi/include" Globbing, in contrast, would add 141 include directories. I fail to see how the latter can in any way be considered a superior solution.

On 6/4/21 5:26 PM, Peter Dimov via Boost wrote:
I don't think globbing would be a particularly good practice in this case, but whatever floats your boat, I suppose. I will continue hardcoding the required include directories.
To put this in perspective, since winapi only depends on config and predef, and those depend on nothing else, you only need three include directories:
"../config/include" "../predef/include" "../winapi/include"
Globbing, in contrast, would add 141 include directories.
The number of include directories doesn't really matter, as long as they don't cause conflicts (they shouldn't). The performance hit is unfortunate, and as I said, I would have liked to avoid it. This is ugly, I get it, and I fully agree, but I don't see a better way.
I fail to see how the latter can in any way be considered a superior solution.
It is superior as it does not require active maintenance. Especially given that a failed configure check because of a missing dependency might not be noticeable in CI. Of all resources we have in Boost, I consider maintainer's time and effort the most valuable one. This is evident by the number of libraries without active maintainance we have. Any solution that has zero maintenance cost in the long run, without drastic downsides in other areas, is a worthy solution.

Andrey Semashev wrote:
I don't think globbing would be a particularly good practice in this case, but whatever floats your boat, I suppose. I will continue hardcoding the required include directories.
To put this in perspective, since winapi only depends on config and predef, and those depend on nothing else, you only need three include
On 6/4/21 5:26 PM, Peter Dimov via Boost wrote: directories:
"../config/include" "../predef/include" "../winapi/include"
Globbing, in contrast, would add 141 include directories.
The number of include directories doesn't really matter, as long as they don't cause conflicts (they shouldn't). The performance hit is unfortunate, and as I said, I would have liked to avoid it. This is ugly, I get it, and I fully agree, but I don't see a better way.
I fail to see how the latter can in any way be considered a superior solution.
It is superior as it does not require active maintenance. Especially given that a failed configure check because of a missing dependency might not be noticeable in CI.
We're talking about one of your libraries depending on another one of your libraries. The dependencies of winapi aren't going to suddenly change. And no, config and predef aren't going to acquire secondary dependencies either. Checking that winapi can't be included isn't hard; add another check that only includes it and checks nothing and message(AUTHOR_WARNING when it fails.

On 6/4/21 6:27 PM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
I don't think globbing would be a particularly good practice in this case, but whatever floats your boat, I suppose. I will continue hardcoding the required include directories.
To put this in perspective, since winapi only depends on config and predef, and those depend on nothing else, you only need three include
On 6/4/21 5:26 PM, Peter Dimov via Boost wrote: directories:
"../config/include" "../predef/include" "../winapi/include"
Globbing, in contrast, would add 141 include directories.
The number of include directories doesn't really matter, as long as they don't cause conflicts (they shouldn't). The performance hit is unfortunate, and as I said, I would have liked to avoid it. This is ugly, I get it, and I fully agree, but I don't see a better way.
I fail to see how the latter can in any way be considered a superior solution.
It is superior as it does not require active maintenance. Especially given that a failed configure check because of a missing dependency might not be noticeable in CI.
We're talking about one of your libraries depending on another one of your libraries.
I can't be held to remember which of my libraries depend on what. My memory isn't that good, sorry. And I'm not immortal, you know. One day there will be another maintainer (hopefully), and the less work he has to do the better.
The dependencies of winapi aren't going to suddenly change.
Well, I wouldn't bet on that. I'm not going to add dependencies just because, but I want to be able to should I need it without silently breaking someone's configure checks.
And no, config and predef aren't going to acquire secondary dependencies either.
Checking that winapi can't be included isn't hard; add another check that only includes it and checks nothing and message(AUTHOR_WARNING when it fails.
Eh, that's still brittle and requires intervention...

Andrey Semashev wrote:
Checking that winapi can't be included isn't hard; add another check that only includes it and checks nothing and message(AUTHOR_WARNING when it fails.
Eh, that's still brittle and requires intervention...
It also has the advantage of telling you that your configure checks don't work for some other reason.

On 6/3/2021 2:30 PM, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
But that would mean one has to specify second order dependencies as well for configure checks.
¯\_(ツ)_/¯
I'm not sure I understand. Is it that Cygwin doesn't accept angle brackets with Windows paths?
The Cygwin compiler just doesn't accept Windows paths, as far as I can see.
Performing C++ SOURCE FILE Test BOOST_FILESYSTEM_HAS_STAT_ST_BLKSIZE failed with the following output: Change Dir: C:/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp
Run Build Command(s):C:/cygwin64/bin/make.exe -f Makefile cmTC_8f9df/fast && /usr/bin/make -f CMakeFiles/cmTC_8f9df.dir/build.make CMakeFiles/cmTC_8f9df.dir/build make[1]: Entering directory '/cygdrive/c/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp' Building CXX object CMakeFiles/cmTC_8f9df.dir/src.cxx.obj
C:/cygwin64/bin/c++.exe -DBOOST_FILESYSTEM_HAS_STAT_ST_BLKSIZE @CMakeFiles/cmTC_8f9df.dir/includes_CXX.rsp -o CMakeFiles/cmTC_8f9df.dir/src.cxx.obj -c C:/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp/src.cxx C:/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp/src.cxx:1:10: fatal error: C:/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp: No such file or directory 1 | #include <C:/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. make[1]: *** [CMakeFiles/cmTC_8f9df.dir/build.make:79: CMakeFiles/cmTC_8f9df.dir/src.cxx.obj] Error 1 make[1]: Leaving directory '/cygdrive/c/boost-git/develop/__build_gcc/CMakeFiles/CMakeTmp' make: *** [Makefile:127: cmTC_8f9df/fast] Error 2
Source file was: #include <C:/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp>
Maybe #include <cygdrive/c/boost-git/develop/libs/filesystem/config/has_stat_st_blksize.cpp> works ?

I don't think we should mandate add_subdirectory calls to come in a specific order. Even if we can hack something in the superproject to do it, users who use add_subdirectory manually will need to get the order right. I know this is a kind of trade-off. At least for the super-project it is kinda easy: You already have a function which includes dependencies recursively. All you need to do is do the depth-first.
For users who add_subdirectory the boost libs I admit this is an issue. But to me this only means that Boost authors should avoid relying on configure-checks which need other libs and where required use the Boost::* aliases so the error in case it was not added is quite clear. Users then need to make sure to e.g. add Boost::predef and Boost::config first. I don't think this is too bad.

On 03/06/2021 14:54, Andrey Semashev via Boost wrote:
For the check_cxx_source_compiles calls, I would have liked to add targets like Boost::config and Boost::winapi to CMAKE_REQUIRED_LIBRARIES so that the checks are compiled with include directories of those dependencies added in the command line. This doesn't work because these targets may not be defined at the point when the configuration checks are performed.
You might find `if(TARGET <tgt>)` useful here. Niall

On 6/3/21 8:42 PM, Niall Douglas via Boost wrote:
On 03/06/2021 14:54, Andrey Semashev via Boost wrote:
For the check_cxx_source_compiles calls, I would have liked to add targets like Boost::config and Boost::winapi to CMAKE_REQUIRED_LIBRARIES so that the checks are compiled with include directories of those dependencies added in the command line. This doesn't work because these targets may not be defined at the point when the configuration checks are performed.
You might find `if(TARGET <tgt>)` useful here.
I don't think that helps in this case. The point is not to skip configuration checks if the target is not available. The checks must execute, I just want to have a way to specify include directories even when the targets are not yet defined.
participants (5)
-
Alexander Grund
-
Andrey Semashev
-
Edward Diener
-
Niall Douglas
-
Peter Dimov