
On Sun, Jan 18, 2009 at 9:50 PM, Joel Falcou <joel.falcou@u-psud.fr> wrote:
Dean Michael Berris wrote:
Please don't misunderstand me as disagreeing with you here, because I do agree there is a need to address the parallelism problem when implementing demanding solutions at a level where you don't really have to worry about the architecture the code runs on. However, given the reality of the situation -- a myriad of available platforms on which to compile/run C++ -- the pressure from both sides of the equation (library writers and tool developers on one side, hardware vendors on the other) to come up with a solution is immense, especially since the industry has to adapt sooner rather than later. ;-)
We agree then.
... I think there is a market for precisely this kind of thing/work now -- helping domain experts recognize and utilize the inherent parallelism in their solutions and the tools they are using. :-)
The best approach is to have them benefit from parallelism without their really knowing about it.
I'm a little wary about hiding these important issues from the people who understand the higher scheme of things, which is why I personally don't think leaving the (non-C++-programming) domain experts in the dark about the inherent parallelism in their solution is a good idea. The people solving the real-world problem should be aware that the computing facilities they have are actually capable of parallel computing, and that the way they write their solutions (in any programming language) will have a direct impact on the performance and scalability of those solutions. It doesn't matter whether what they write targets a GPU or a CPU+SIMD; what matters is that the way they express the solution lends itself to running in parallel. Once they are aware of the available parallelism, they can adapt the way they think and the way they come up with solutions. As far as hiding the parallelism from them goes, the compiler is the perfect place to do that, especially if your aim is just to leverage the platform-specific parallelism features of the machine. Even these domain experts, once they know about the compiler's capabilities, may be able to write their code in such a way that the compiler will happily auto-vectorize it -- and that, I think, is where it counts most.
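To make that concrete, here's a minimal sketch (my own example, not from any real codebase) of the kind of loop shape an auto-vectorizer can usually handle: plain contiguous arrays, unit stride, no aliasing, no data-dependent branches.

    #include <cstddef>

    // Hypothetical saxpy-style kernel. __restrict is a common compiler
    // extension (GCC/MSVC) promising the arrays don't alias, which helps
    // the vectorizer; e.g. GCC at -O3 (or with -ftree-vectorize) and ICC
    // can typically turn this loop into SSE code on their own.
    void scale_and_add(float* __restrict out, const float* __restrict in,
                       float a, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = a * in[i] + out[i];
    }

Write it with pointer chasing or a data-dependent branch inside, and the same compilers will usually give up.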
Libraries, OTOH, are components to me rather than tools. Maybe I'm being too picky with terms, but if you meant libraries to be tools, that simply doesn't work with how my brain is wired. It doesn't make you wrong, it just doesn't feel right to me. ;-)
Beware that Embedded DSLs are no more than DSLs disguised inside a library, hence my conflating tools and libraries.
I would tend to agree with Dave that libraries tend to come disguised as DSELs (think Spirit) that perform a certain function -- thus I think of them as components that work as part of a bigger whole.
I think I understand what you mean, but I don't think it's a failure of the libraries that they're not known/used by the people doing the programming. Much like you can't blame the nail gun when a carpenter, not knowing it exists, keeps using a traditional hammer and nails.
My point was: it is not that easy to say to people "use X".
Actually, it's easy to say it -- it's getting acceptance that's the problem. Now, a library that forced users to change their code just to leverage something the compiler should be able to handle for them (like writing assembly code, for instance) sounds to me like too much to ask. After all, the reason we have higher-level programming languages is to hide from ourselves the details of the assembly/machine language of whatever platform we're going to run programs on. ;-)
True, but libraries also require that users write code that actually uses the library. If users already have code that doesn't use your library, what's the advantage if they can get the auto-vectorization from a future version of a compiler anyway, without having to butcher their code to use your library? And what if they find a bug in the code using the library, or (god forbid) a bug in the library itself?
The same can be said of any library out there. What if tomorrow a new C++ compiler can extract code from the source and build top-notch threads from it? Should we prevent people from using Boost.Threads from now on?
No, what I'm pointing at here is that libraries for very low-level parallelism will have to be maintained independently of the code that's actually using them -- another layer in which failures can hide and inefficiencies can creep in. The point of using Boost.Thread instead of a platform-specific threading library is that you can rely on a coherent interface for threading and synchronization. If a new C++ compiler were able to do that parallelism for us effectively, without our having to use Boost.Threads, then I think usage of Boost.Threads would slowly decline on its own; in the interim, the problem Boost.Threads solves is compelling enough for it to remain a viable solution. The point I'm trying to make is that if the target is simply SIMD at the processor level, a library just for that is too specific to be considered generic. I might be missing the point here, but if the compiler can already do it now (and will only get better in the future), and if I can write platform-specific code from C++ through compiler-vendor-provided libraries when I need to be explicit about what I want, what would be the value of a very narrow/specific library like a SIMD-specific thingamajig?
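For illustration, this is what I mean by using the vendor-provided bits directly (a rough sketch of mine; assumes SSE, 16-byte-aligned pointers, and n being a multiple of 4):

    #include <xmmintrin.h>  // SSE intrinsics shipped by the compiler vendor

    // Add two float arrays four lanes at a time using the SSE registers
    // directly -- no wrapper library in between.
    void add4(float* out, const float* a, const float* b, unsigned n)
    {
        for (unsigned i = 0; i < n; i += 4) {
            __m128 va = _mm_load_ps(a + i);
            __m128 vb = _mm_load_ps(b + i);
            _mm_store_ps(out + i, _mm_add_ps(va, vb));
        }
    }

It's ugly and non-portable, sure, but it's already there on every compiler that targets the platform.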
Actually, DSELs require that you write code in the domain language -- and here is where the problem lies.
Well, if the parallelism is outsourced behind the scenes, it's not a problem.
But then you (the DSEL writer) for that specific domain would have to deal with parallelism the old-fashioned way without a DSEL for that (yet) helping you to do it -- and that doesn't scale. That doesn't help the domain expert especially if he doesn't know that he can actually come up with solutions that do leverage the parallelism available in his platform.
If this were the case then maybe just having this DSEL may be good to give to parallelism-savvy C++ programmers, but not necessarily still the domain experts who will be doing the writing of the domain-specific logic. Although you can argue that parallel programming is a domain in itself, in which case you're still not bridging the gap between those that know about parallel programming and the other domain experts.
Parallel programming is a domain in itself, but not a domain for users -- it's one for tool writers. A user domain is things like math, finance, physics, anything like that. We agree.
Yes, not all platforms are Intel platforms, but I don't know if you've noticed that Intel's compilers even generate code that will run on AMD processors -- yes, even SSE[0..3] -- as per their product documentation. If your target is CotS machines, I think Intel/GCC is your best bet (at least on x86_64). I haven't dealt with platforms other than Intel/AMD, but since everybody's moving in the direction of leveraging and exploiting parallelism in hardware, it's not unreasonable to think that the compiler vendors will have to compete (and eventually get better) in this regard.
Well, we can't leave AltiVec and its offspring on the side of the road. The Cell processor uses it, and I consider the Cell a quasi-CotS part, as a PS3 costs something like only half a kidney. I don't target CotS or non-CotS; my goal is to cover the basics, and the SIMD basics involve the old AltiVec-enabled Motorola PPCs and Intel machines. So the strict minimum is the AltiVec + SSE flavors. I hope that one day (AVX v2), both will converge, though.
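Today the same 4-float add has to be spelled twice -- a rough sketch (not actual library code) of exactly the divergence a wrapper would paper over:

    #if defined(__ALTIVEC__)
      #include <altivec.h>
      typedef __vector float pack_t;             // AltiVec 128-bit register
      inline pack_t add(pack_t a, pack_t b) { return vec_add(a, b); }
    #elif defined(__SSE__)
      #include <xmmintrin.h>
      typedef __m128 pack_t;                     // SSE 128-bit register
      inline pack_t add(pack_t a, pack_t b) { return _mm_add_ps(a, b); }
    #endif

Multiply that by every operation and every SSE revision, and you see why I'd rather write the #ifdef soup once, inside a library.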
And precisely because of that, I think better compilers that leverage these platform-specific features would be the correct and more far-reaching solution than a library just for SIMD. If your goal were a library/DSEL for expressing parallelism in C++ in general, hiding the details of threads and whatnot, of which a SIMD-specific extension would be just one part, then the goal wouldn't feel too narrow to me.
Why do I get the feeling that you're saying:
compiler writing != software engineering
? :-P
No, I mean that *I* feel more comfortable writing stuff on this side of the compiler than on the other. ;)
Okay. :-)
Anyway, I think if you're looking to contribute to a compiler-building community, GCC may be a little too big (I don't want to use the term advanced, because I haven't bothered looking at the code of the GCC project), but I know the Clang folks over at LLVM are looking for help finishing the C++ implementation of the compiler front-end. From what I'm reading about Clang and LLVM, it should be feasible to write language-agnostic optimization algorithms/implementations dealing just with the LLVM IR.
Well, as I work like half a mile from Albert Cohen's office, I'll certainly have a discussion about Clang someday. ;) A C++-to-C++ tool is on my to-do list, but not for now, as I think DSELs in C++ still have untapped resources.
I agree, but if you're going to tackle the concurrency problem through a DSEL, I'd think a DSEL at a higher level than SIMD extensions would be more fruitful. For example, something like:

    vector<huge_numbers> numbers;
    // populate numbers
    async_result_stream results =
        apply(numbers, [... insert funky parallelisable lambda construction ...]);
    while (results) {
        huge_number a;
        results >> a;
        cout << a << endl;
    }

would be able to spawn thread pools, launch tasks, and provide an interface to getting the results using futures underneath. Domain experts who already know C++ would be able to express their funky parallelisable lambda construction and just know that when they use the facility, it will do the necessary decomposition and parallelism as much as it can at the library level. This, I think, is feasible (although a little hard) to achieve -- and since the compiler can even vectorize an inner loop in the decomposed lambda construction, that detail doesn't necessarily have to be dealt with by the library at all.
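If it helps, here is a rough sketch of the plumbing such an apply() could hide -- written against C++11's std::async for brevity, and with the names (parallel_apply, the chunk count) made up for this discussion:

    #include <algorithm>
    #include <cstddef>
    #include <future>
    #include <vector>

    // Split the input into chunks, run the user's function over each chunk
    // in its own asynchronous task, then drain the futures in order.
    template <typename T, typename F>
    std::vector<T> parallel_apply(const std::vector<T>& in, F f,
                                  std::size_t chunks = 4)
    {
        std::vector<std::future<std::vector<T> > > tasks;
        std::size_t const step = (in.size() + chunks - 1) / chunks;
        for (std::size_t b = 0; b < in.size(); b += step) {
            std::size_t const e = std::min(b + step, in.size());
            tasks.push_back(std::async(std::launch::async, [&in, f, b, e] {
                std::vector<T> part;
                for (std::size_t i = b; i < e; ++i)
                    part.push_back(f(in[i]));   // the "funky lambda" runs here
                return part;
            }));
        }
        std::vector<T> out;
        for (auto& t : tasks) {
            std::vector<T> part = t.get();      // futures give the results back
            out.insert(out.end(), part.begin(), part.end());
        }
        return out;
    }

A real facility would stream results out as futures become ready instead of concatenating at the end, but the shape is the same.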
In that case, I think that kind of library (DSEL) would be nice to have -- especially to abstract the details of expressing parallelism in general at the source-code level.
Except it is, like... friggin' hard?
Uh, yes. ;-)
My stance is to have application-domain-specific libraries that hide all the parallelism by themselves relying on small-scale parallel libraries like Boost.Thread or my proposal.
In which case I think that a DSEL for parallelism would be much more acceptable than even the simplest SIMD DSEL, mainly because if you really wanted to leverage SIMD by hand, you'd just use the vector registers and the vector functions directly from your code instead. At least that's how it is in my case, as both a user and a library writer.
Nice! I would agree that something like *that* is appropriate as a domain-specific language which leverages parallelism in the details of the implementation.
However, I also think there are some details that would be nice to tackle at the appropriate layer -- SIMD code construction is, well, meant to be in the domain of the compiler (as far as SSE or similar things go). OpenCL is meant to be an interface that the hardware and software vendors will be supporting for a long time to come (at least from what I'm reading in the press releases), so I'm not too worried about the combinatorial explosion of architectures and parallelism runtimes.
Except some people (like one of the posters in the previous thread) deal daily with code that needs this level of abstraction and no more. Hence the rationale behind "Boost.SIMD".
In which case I think a DSEL is clever, but a SIMD-only library would be too small in scope for my taste. But that's just me I think. ;-)
I agree completely, but I'm afraid that if the DSEL is for expressing parallelism in C++, the goal of "giving domain experts tools that know about the parallelism" won't be met readily. For C++ developers who want to leverage parallelism in general, sure, but I don't think I'd be particularly compelled to use a SIMD-only DSEL.
I think we can't just wake up and say "OK, today I'll just solve the parallelism problem in C++ using DSELs". I think that, on the contrary, a concrete, reasonable roadmap would be "tiling" the parallel-problem world with small-scale software solutions that can inter-operate and interact freely. Then, when the basic blocks of such tools have been done, we can start cementing them into higher-level ones.
Of course, maybe not in a day. But it can feasibly be achieved with some effort from brilliant library writers. I like thinking at a higher level first and then solving the problems at the lower level with a more specific focus, but within the bigger context. Only once you can recognize the patterns in the solution from a higher level can you really try solving problems at a lower level with better insight. Missing context is always hard to deal with. They came up with the STL anyway, right? Whoever thought there'd be a string class that makes sense in C++. ;-)

--
Dean Michael C. Berris
Software Engineer, Friendster, Inc.