
On 20/05/2025 at 18:00, Ivan Matek wrote:
On Tue, May 20, 2025 at 5:10 PM Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:
That's a matter of opinion, I guess, but I'd rather have people not wanting the fallback write the compile-time check instead of the other way around. Sometimes you're not writing a final application but a library (say, on top of candidate Boost.Bloom), and you don't control compilation flags or target architecture.
I guess my concern is that people reading the documentation will assume that if fast_ compiles, it uses SIMD. But I see your point. To be clear about what I mean here: "but uses faster SIMD-based algorithms when SSE2, AVX2 or Neon are available". A user might think: my CPU supports AVX2, so surely it will use SIMD algorithms. But "available" here refers to compiler options (and obviously CPU support when the binary is started), not just to CPU support. I know I am not telling you anything you do not know; I just think a large percentage of users might misunderstand what "available" means.
Yes, you're right, I can rewrite "are available" as "are enabled at compile time".
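To make "enabled at compile time" concrete, here is a minimal sketch of the kind of check a user could write on their side. The names simd_level, enabled_simd, name_of and report_enabled_simd are hypothetical and not part of candidate Boost.Bloom; the sketch assumes the usual predefined macros (__AVX2__ and __SSE2__ on GCC/Clang, _M_X64 and _M_IX86_FP on MSVC, __ARM_NEON on ARM toolchains) and is not meant to be exhaustive:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper, NOT part of candidate Boost.Bloom: reports which SIMD
// instruction set the *compiler* was told to target, whatever the CPU supports.
enum class simd_level { none, sse2, avx2, neon };

constexpr simd_level enabled_simd() noexcept
{
#if defined(__AVX2__)
  return simd_level::avx2;   // e.g. -mavx2 or -march=native, /arch:AVX2 on MSVC
#elif defined(__SSE2__) || defined(_M_X64) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  return simd_level::sse2;   // baseline on x86-64
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  return simd_level::neon;
#else
  return simd_level::none;   // a scalar fallback would be used
#endif
}

inline const char* name_of(simd_level l) noexcept
{
  switch (l) {
    case simd_level::avx2: return "avx2";
    case simd_level::sse2: return "sse2";
    case simd_level::neon: return "neon";
    case simd_level::none: return "none";
  }
  return "unknown";
}

inline void report_enabled_simd()
{
  const char* name = name_of(enabled_simd());
  assert(name != nullptr);
  std::printf("SIMD enabled at compile time: %s\n", name);
}

// Someone who does not want the scalar fallback can turn this into a hard error:
// static_assert(enabled_simd() != simd_level::none,
//               "SIMD not enabled: check your compiler flags");
```

A CPU with AVX2 support still yields sse2 here on x86-64 unless the build passes the corresponding flag, which is exactly the misunderstanding discussed above.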
I fail to see any run-time table initialization in your original snippet at https://godbolt.org/z/sYfc7rffa .
I am not a SIMD expert, but is this not creating those variables on the stack? gcc asm:

    vbroadcastsd ymm1, qword ptr [rip + .LCPI0_1]
    vmovaps xmm3, xmm1
    vmovaps ymmword ptr [rsp + 64], ymm3
    vpmovsxbq ymm4, dword ptr [rip + .LCPI0_4]
    vmovaps ymmword ptr [rsp + 128], ymm4
    vmovaps ymmword ptr [rsp + 192], ymm1
    vmovaps ymmword ptr [rsp + 256], ymm1
    vmovaps ymmword ptr [rsp + 320], ymm1
    vmovaps ymmword ptr [rsp + 384], ymm1
    vmovaps ymmword ptr [rsp + 448], ymm1
Umm, yes, maybe. Anyway, scratch what I said about compilers not really caring about const vs. static const: adding static to your snippet severely pessimizes the codegen, with static initialization guards and all. So there goes your explanation of why static was not used :-)

For the record, during development I examined the generated code for all fast_multiblockXX classes with the three major compilers, for Intel and ARM, to check that nothing looked bad.

Joaquin M Lopez Munoz
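For readers following the const vs. static const point, a portable sketch of the two variants. This is not the code from the godbolt snippet nor from candidate Boost.Bloom: mask4 and make_mask are hypothetical stand-ins for a SIMD register type and a non-constexpr "set" intrinsic. The general rule it illustrates is that a function-local static with a dynamic initializer must be initialized once, in a thread-safe way, on first use, which compilers typically implement with a guard variable; whether a given compiler manages to fold the initializer away is an implementation detail to be checked in the generated code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a SIMD register type such as __m256i.
struct mask4 { std::uint64_t v[4]; };

// Hypothetical stand-in for a "set" intrinsic; deliberately not constexpr,
// so the initializers below are not constant expressions.
inline mask4 make_mask(std::uint64_t x)
{
  return mask4{{x, x << 1, x << 2, x << 3}};
}

// Plain const local: conceptually rebuilt on every call, but the optimizer is
// free to keep it in registers, spill it to the stack, or fold it entirely.
inline std::uint64_t pick_const(unsigned i)
{
  const mask4 m = make_mask(1);
  return m.v[i & 3];
}

// static const local: initialized once, on the first call. With a dynamic
// initializer this requires guarded, thread-safe initialization, and every
// later call has to check that the object is already initialized.
inline std::uint64_t pick_static(unsigned i)
{
  static const mask4 m = make_mask(1);
  return m.v[i & 3];
}

// Both variants compute the same values; only the codegen differs.
inline void check_equivalent()
{
  for (unsigned i = 0; i < 8; ++i) {
    assert(pick_const(i) == pick_static(i));
  }
}
```

Comparing the assembly of pick_const and pick_static in a compiler explorer is the quickest way to see the difference for a particular compiler and flag set.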