[Andrey Semashev]
Why does changing compiler options affect the C++ AST?
Increasing /std:c++20 to /std:c++latest (or /std:c++23, /std:c++26, etc. in the future) causes lots of stuff to appear, some stuff to disappear, some stuff to be marked as deprecated, more stuff to be marked as constexpr, and some stuff simply changes form (return types have been changed from void to non-void in the past, classes have gained typedefs, etc.).

Changing between static linking and dynamic linking (/MT versus /MD) affects whether things are declared as __declspec(dllimport). Changing between release and debug (/MT or /MD versus /MTd or /MDd) massively affects the representations of classes, and the code that they execute.

The calling convention options (/Gd /Gr /Gv /Gz) affect whether functions are treated as __cdecl, __stdcall, __fastcall, __vectorcall, etc. The /Zp option (affecting packing) affects the layout of classes. The STL defends itself against this one, but most code doesn't bother.

There are many escape hatches for Standard behavior that affect semantics:

- The accursed /Zc:wchar_t- affects whether wchar_t is a real type or a fake unsigned short.
- /Zc:noexceptTypes- affects whether noexcept participates in the type system, which the STL has to occasionally react to by omitting noexcept from function pointer typedefs.
- /Zc:char8_t- removes char8_t from the type system, and the STL has to react accordingly.

And on, and on, and on. I haven't even mentioned the macro modes we support (like controlling deprecations, restoring removed machinery, etc.). Shipping all possible combinations of these settings is impossible.
Maybe, but one first needs to build it in its entirety. And there are cases when you *always* build from scratch - for example, in CI. This seems like a deal breaker to me.
Building the Standard Library Modules takes something like 6 seconds and emits less than 40 MB of output (it's about 10x smaller than a PCH). The cost is nonzero, but not massive. Boost's headers are more massive than the Standard Library's, but I still expect building all of Boost as a module to be pretty fast - certainly nothing like building Boost's separately compiled components, which is extremely expensive.
Again, why is this needed? As far as I'm concerned, the standard library is bundled with the compiler, and its module should ship with it, just like headers and the compiled library currently do.
The headers are compiled with the user's choice of compiler options and macros - which, as I explained above, can vary dramatically. Modules are an alternative to classic inclusion, so they need to respect those options. The separately compiled sources are a huge headache precisely because they can ship in only a small, finite number of configurations - which is why we've tried to shrink the separately compiled sources over the years, and flatten their surface area to plain old extern "C" functions. We tried shipping MSVC's early experimental, non-Standard modules for the standard library as prebuilt components that were usable with specific compiler options. This wasn't suitable for the Standard, production-level import std; that's why we will never ship prebuilt versions, only the std.ixx and std.compat.ixx sources.
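For reference, Microsoft's documentation shows users building those shipped sources themselves with whatever options their project uses - e.g. something along these lines from a Visual Studio developer prompt (the specific flags besides /std:c++latest and /c are the user's choice):

```
cl /std:c++latest /EHsc /nologo /W4 /c "%VCToolsInstallDir%\modules\std.ixx"
```

The resulting std.ifc and std.obj then match that exact configuration, which is the whole point: the one build artifact that couldn't be shipped prebuilt for every combination of flags is instead produced locally under the flags actually in use.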
If you want separately compiled source files to be usable with classic headers or named modules equally, this is possible. In MSVC we've achieved this for the Standard Library by using extern "C++".
Could you give an example? Does this involve some compiler-specific magic (i.e. non-portable), beyond marking symbols exported from the compiled library with __declspec(dllexport)/__attribute__((visibility("default")))?
I can only speak for the MSVC environment (I don't know what a visibility attribute is). The separately compiled sources are built normally (no modules). The headers declaring separately compiled machinery (whether functions or classes) need to wrap them in either extern "C" (if you want that, with the usual effects) or extern "C++".

extern "C++" is interesting because it's valid going back to C++98, but historically had essentially no effect. Now it means "this stuff is attached to the global module", which allows module code to link with classic code. (That is, in MSVC, where modules have strong ownership, we still want any separately compiled machinery to not be owned by the module.) Because classic code isn't affected by extern "C++", non-module scenarios aren't impacted.

(In the STL, we went further and wrapped everything in extern "C++" that wasn't already extern "C". That gave up strong ownership as a workaround for making the include-before-import scenario work. This was an acceptable sacrifice because std is special and already relies on _Ugly names to avoid conflicts with implementation details, so we don't need strong ownership to coexist with user code.)
Is the order of includes and imports a fundamental limitation or is this a limitation of current implementations that will be lifted in the future?
It is a current-implementation limitation of MSVC (I can't speak for the other toolsets) that will be resolved, somehow, in the future. We know it's a huge obstacle to widespread use of modules in practice. As I've mentioned, the Standard requires arbitrary mixing to work (and in fact I wrote that Standardese).

STL