
Johan Torp wrote:
That way you get non-portable code adapted for some particular architecture(s). Ideally you'd like code to execute well on future, non-existing processors too.
Scheduling doesn't have to be static. You can define an algorithm that scales to any architecture and that favors processor utilization, minimal migration between cores, keeping tasks that interact a lot close together, or whatever works best for your application. Such a scheduling algorithm could simply be a way to map a tree of tasks onto a tree of processors, with some code triggered on certain events (a processor becomes idle, new tasks are created, etc.). The task hierarchy defines task affinity, and the topology hierarchy defines nested NUMA nodes, SMP, multi-core, and SMT units (Hyper-Threading). Such platforms already exist, at least in research. This assumes a hierarchical topology and hierarchical tasks, which is not always accurate since topologies often form graphs, but the abstraction is practical enough.
Mathias Gaunard-2 wrote:
As for heterogeneous architectures, this is of course an open problem that C++ cannot solve because you cannot predict which piece of code will be faster on which processor. But I don't think C++-like languages are unable to do parallelization well on heterogeneous architectures, even those with NUMA effects.
Ok, there is a problem here: I meant "I don't think C++-like languages are unable to do parallelization well on *homogeneous* architectures, even those with NUMA effects". Sorry for the confusion.
But even if you aim at non-portable code, C++ is poor at expressing parallel code for many other reasons: - Threads are heavy-weight
Only if you make them so. Kernel threads are fairly heavyweight, especially since you don't have much control over them, but user-level threads can be very lightweight. There is nothing inherently heavyweight about a "task". Well, except its stack, but that's more of a memory usage problem.
and difficult to program with (deadlocks, race conditions, nondeterminism, composability problems)
There are many problems with using shared-memory concurrency, yes. There are other patterns, though. Anyway, what we were discussing here was marking every task that can be parallelized as potentially parallelizable, and having a library decide whether or not to parallelize each one, scheduling them well enough that parallelization never hurts more than it helps. Concurrency is kind of a different problem, I'd say.