
Mathias Gaunard-2 wrote:
For a lot of parallelizable problems, we can never hope for any particularly good programming models in a language such as C++ or with the desktop operating systems we have today. Needless to say, that shouldn't stop us from extracting as much parallel performance as we can from the current software stack. See the first part of my thesis at www.johantorp.com for an in-depth discussion.
How to schedule optimally depends on the application, yes. Just let the user customize the scheduler behavior then.
That way you get non-portable code adapted to some particular architecture(s). Ideally you'd like code to execute well on future, not-yet-existing processors too.

Mathias Gaunard-2 wrote:
As for heterogeneous architectures, this is of course an open problem that C++ cannot solve because you cannot predict which piece of code will be faster on which processor. But I don't think C++-like languages are unable to do parallelization well on heterogeneous architectures, even those with NUMA effects.
It can be possible with knowledge of the hardware. But simply scheduling work onto N cores will not work well when you move to manycore processors; you'll need more complex hardware models. Knowledge about the hardware should live in the operating system and drivers, not in user-level code, both to simplify coding and to keep code portable. But even if you aim at non-portable code, C++ is poor at expressing parallel code for many other reasons:

- Threads are heavy-weight, very explicit, and difficult to program with (deadlocks, race conditions, indeterminism, composability problems). You can build some other flow-control mechanisms on top of them (lighter ones such as fibers/coroutines, more implicit ones such as thread pools, etc.), but others (transactional memory, nested data-level parallelism, etc.) can't be implemented efficiently.

- C++ assumes a UMA shared-memory model and provides reference semantics without any notion of purity. You might also need implicit or explicit control over where code is stored in memory; C++ does not provide this.

- There is no way to proactively load a cache or express that you intend to access certain memory locations.

- You cannot set up efficient data streams. As we move towards manycore and transistors become ubiquitous, it seems probable that you'd often like to schedule memory-interconnect access rather than tasks.

- You want to do scheduling based on all processes running, not per C++ application. If every application has its own thread pool, we might get better performance if none of them did. .NET can handle this.

Now don't get me wrong. No single language is going to be good at expressing all kinds of parallelism; we will need a lot of parallel programming models. C++ can be used for some basic parallel code but is poorly suited for a lot of others, especially as we go to manycore.
Best Regards,
Johan Torp
www.johantorp.com

--
View this message in context: http://www.nabble.com/Pondering-Futures-tp21359362p21398146.html
Sent from the Boost - Dev mailing list archive at Nabble.com.