[GSoC] SIMD proposal


Mathieu -

23 Mar 2011

8:45 p.m.

Hello everyone,

I'm a French student preparing a Master's in Computer Science, and I'm highly interested in participating in the 2011 GSoC with Boost. I'm particularly keen on working on the boost.simd subject. I think having a SIMD abstraction in Boost could be really beneficial (the first example that comes to mind is boost.math).

However, I have some questions about the subject that I need answered to write my proposal. Some of them have already been discussed with Joel Falcou and Mathias Gaunard on IRC, but here they are:

One of my main concerns is that nt2 relies heavily on cmake to detect various things, like SSE instruction set support. From what I know (I ported nt2 to OpenBSD a while ago), different methods are used depending on the architecture in order to be portable, like reading from sysctl on OS X or launching a little executable that collects various information using cpuid. So the question is: is bjam able to do everything we need, or will we need to do the detection some other way? I use bjam and know it somewhat, but I'm not an expert; I believe Vladimir Prus will be able to help here.

Another thing that is a bit blurry for me at the moment is the scope of boost.simd. From what I see, the simd part is quite deep-rooted in nt2 and depends on several of its modules. So will boost.simd be a subset of the nt2 simd module, or will we need to rewrite part of it to avoid dragging in huge dependencies (and, well, end up doing boost.nt2)?

Regards,
Mathieu Masson


Joel Falcou

23 Mar

9:16 p.m.

On 23/03/11 21:45, Mathieu - wrote:

...

However, I have some questions about the subject in order to do my proposal. Some of them have already been discussed with Joel Falcou and Mathias Gaunard on IRC, but here they are: One of the main concerns I have is that nt2 relies heavily on cmake to detect various things like SSE instruction set support etc. From what I know (I ported nt2 to OpenBSD a while ago), depending on the architecture different methods are used, in order to be portable, like reading from sysctl on OS X or launching a little executable which collects various information using cpuid. So the question is: is bjam able to do everything we need to do, or will we need to do the detection in any other way?

The SIMD capability detection is only needed by the compilation step for the unit tests and user code, so it can be delayed until then. Maybe bjam can ask the user for that kind of information on the command line?

...

Another thing that is a bit blurry for me at the moment is the scope of boost.simd. I mean, from what I see the simd part is quite deep-rooted in nt2 and has dependencies on several of its modules, so will boost.simd be a subset of the nt2 simd module, or will we need to rewrite part of it to avoid dragging in huge dependencies (and, well, end up doing boost.nt2)?

The part of nt2's simd module to be leveraged should be the core pack abstraction, the SIMD range adaptors and, to begin with, the basic operators plus roughly an equivalent of libm. We use some other things that can live in boost::details.

Mathias Gaunard

11:44 p.m.

On 23/03/2011 21:45, Mathieu - wrote:

...

Hello everyone,

I'm a French student preparing a Master's in Computer Science. I'm highly interested in participating in the 2011 GSoC with Boost. I'm particularly keen on working on the boost.simd subject. I think having a SIMD abstraction in Boost could be really beneficial (the first example which comes to mind is boost.math). However, I have some questions about the subject in order to do my proposal. Some of them have already been discussed with Joel Falcou and Mathias Gaunard on IRC, but here they are: One of the main concerns I have is that nt2 relies heavily on cmake to detect various things like SSE instruction set support etc.

The library must know what instruction sets are available on the target machine. This is done using a configuration header file containing macros that are defined if the instruction sets in question are available. That configuration header can either be written manually or be generated by the build system.

On x86, we use a program that calls the 'cpuid' instruction (using inline assembler), which directly gives the processor capabilities. Then we check that the compiler allows their usage (MSVC Express Edition does not allow SSE3+).

Finally, the right options to enable those instruction sets must also be passed to the compiler whenever compiling code that uses the library. I think this is quite unique, as no other Boost library requires specific compiler flags to be used, AFAIK. Therefore it *could* make sense to also ship, as part of the Boost installation, the bjam or cmake modules that detect the best set of options available; but I understand that's quite a disruptive idea.

In NT2 we also put other things in the configuration header, but all of this is being reworked and won't be in Boost.SIMD.
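To make the detection step described above concrete, here is a minimal sketch of a cpuid-based probe that renders a configuration header. It is not NT2's actual detection program: the `cpu_features` struct, the function names and the `SKETCH_HAS_*` macro names are all hypothetical, and it uses the `<cpuid.h>` helper shipped with GCC and Clang rather than inline assembler. AVX is left out because it additionally requires checking operating-system support.

```cpp
#include <sstream>
#include <string>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>  // GCC/Clang wrapper around the cpuid instruction
#define SKETCH_HAVE_CPUID 1
#endif

// Feature bits reported by CPUID leaf 1 (hypothetical struct, for illustration only).
struct cpu_features {
    bool sse = false, sse2 = false, sse3 = false, ssse3 = false;
    bool sse4_1 = false, sse4_2 = false;
};

// Query the machine this program runs on. On non-x86 targets, or with
// compilers that lack <cpuid.h>, every field stays false.
inline cpu_features detect_cpu_features() {
    cpu_features f;
#ifdef SKETCH_HAVE_CPUID
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse    = ((edx >> 25) & 1u) != 0;
        f.sse2   = ((edx >> 26) & 1u) != 0;
        f.sse3   = ((ecx >> 0)  & 1u) != 0;
        f.ssse3  = ((ecx >> 9)  & 1u) != 0;
        f.sse4_1 = ((ecx >> 19) & 1u) != 0;
        f.sse4_2 = ((ecx >> 20) & 1u) != 0;
    }
#endif
    return f;
}

// Render a configuration header: one macro per available instruction set.
// The SKETCH_HAS_* names are made up for this example.
inline std::string make_config_header(const cpu_features& f) {
    std::ostringstream out;
    auto emit = [&out](bool on, const char* name) {
        if (on) out << "#define " << name << "\n";
    };
    emit(f.sse,    "SKETCH_HAS_SSE");
    emit(f.sse2,   "SKETCH_HAS_SSE2");
    emit(f.sse3,   "SKETCH_HAS_SSE3");
    emit(f.ssse3,  "SKETCH_HAS_SSSE3");
    emit(f.sse4_1, "SKETCH_HAS_SSE4_1");
    emit(f.sse4_2, "SKETCH_HAS_SSE4_2");
    return out.str();
}
```

A build system could compile and run a small program that prints `make_config_header(detect_cpu_features())` and save the output as the header. Note that this describes the machine running the probe, which is not necessarily the machine the final binary will run on.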

...

is bjam able to do everything we need to do, or will we need to do the detection in any other way?

If bjam can 1) compile temporary programs, run them and retrieve their output and 2) test whether the compiler supports some flags -- then we're fine.

...

I use, and I kind of know bjam but I'm not an expert either, I believe Vladimir Prus will be able to help here.

Anyway, there are lots of interesting things to do technically, and which ones you want to do is up to your project to define. There are even possibilities I didn't outline in the project idea on the wiki: NEON support (ARM) is possible as well, for example, but that's somewhat harder than AltiVec. I wouldn't put too much focus on things like the build system.

...

Another thing that is a bit blurry for me at the moment is the scope of boost.simd. I mean, from what I see the simd part is quite deep-rooted in nt2 and has dependencies on several of its modules, so will boost.simd be a subset of the nt2 simd module, or will we need to rewrite part of it to avoid dragging in huge dependencies (and, well, end up doing boost.nt2)?

NT2 is being split into independent modules. At the moment, I'd say the SIMD support spans the sdk, arithmetic, bitwise, predicates, reduction, swar and ieee modules (quite a big chunk indeed). Those need to be split further, as not everything in those modules is necessary. We'll be working on this and will try to come to a solution shortly.

Gruenke, Matt

24 Mar

12:16 a.m.

cpuid?

It seems to me that the *only* thing that should determine how the program gets compiled is the compiler flags, since the target platform might have either less or more capability than the build machine. I've experienced both cases, firsthand.

GCC will define the following preprocessor macros, depending on which code generation flags you specify:

__MMX__ __SSE__ __SSE2__ __SSE3__ __SSSE3__ __SSE4A__ __SSE4_1__ __SSE4_2__ __AES__ __PCLMUL__ __AVX__

Matt

________________________________
From: boost-bounces@lists.boost.org on behalf of Mathias Gaunard
Sent: Wed 3/23/2011 7:44 PM
To: boost@lists.boost.org
Subject: Re: [boost] [GSoC] SIMD proposal

On 23/03/2011 21:45, Mathieu - wrote:

...

One of the main concerns I have is that nt2 relies heavily on cmake to detect various things like SSE instruction set support etc.
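As an illustration of letting the compiler flags alone decide, the following sketch selects an SSE path only when GCC (or a compatible compiler) defines `__SSE__`, and otherwise falls back to scalar code. The function names `add_arrays` and `add_arrays_backend` are hypothetical and not part of nt2 or any Boost library.

```cpp
#include <cstddef>

#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_add_ps, ...)
#endif

// Element-wise sum of two float arrays. The SSE loop is compiled in only
// when the compiler flags enabled SSE; the scalar loop handles the tail
// (or the whole array when SSE is not enabled).
inline void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned loads: no alignment assumed
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

// Reports which path was selected at compile time.
inline const char* add_arrays_backend() {
#if defined(__SSE__)
    return "sse";
#else
    return "scalar";
#endif
}
```

Built with `-msse` on 32-bit x86 (or on x86-64, where GCC enables SSE by default), the vector loop is used; on other targets the same source still compiles and gives the same results through the scalar loop.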

Cory Nelson

1:50 a.m.

On Wed, Mar 23, 2011 at 5:16 PM, Gruenke, Matt <mgruenke@tycoint.com> wrote:

...

cpuid? It seems to me that the *only* thing that should determine how the program gets compiled is the compiler flags, since the target platform might have either less or more capability than the build machine. I've experienced both cases, firsthand.

Agreed -- I'm currently writing a lot of code for AVX and SSE4, and my CPU supports neither. I'd hope to be able to transition to a Boost SIMD library when it became available. We should be able to use any instruction set available on the target platform.

--
Cory Nelson
http://int64.org

Joel Falcou

8:47 a.m.

On 24/03/11 01:16, Gruenke, Matt wrote:

...

cpuid? It seems to me that the *only* thing that should determine how the program gets compiled is the compiler flags, since the target platform might have either less or more capability than the build machine. I've experienced both cases, firsthand.

The automatic detection of supported flags was done in nt2 to automate the build of the unit tests and benchmarks. In the case of Boost.SIMD, I think we just need a way to pass such flags to bjam. We are already moving in that direction in NT2.

Mathias Gaunard

9:32 a.m.

On 24/03/2011 01:16, Gruenke, Matt wrote:

...

cpuid? It seems to me that the *only* thing that should determine how the program gets compiled is the compiler flags, since the target platform might have either less or more capability than the build machine. I've experienced both cases, firsthand.

GCC will define the following preprocessor macros, depending on which code generation flags you specify:

__MMX__ __SSE__ __SSE2__ __SSE3__ __SSSE3__ __SSE4A__ __SSE4_1__ __SSE4_2__ __AES__ __PCLMUL__ __AVX__

But how do you know what flags to pass to GCC in the first place?

Mathias Gaunard

12:07 p.m.

On 24/03/2011 01:16, Gruenke, Matt wrote:

...

cpuid? It seems to me that the *only* thing that should determine how the program gets compiled is the compiler flags, since the target platform might have either less or more capability than the build machine. I've experienced both cases, firsthand.

GCC will define the following preprocessor macros, depending on which code generation flags you specify:

__MMX__ __SSE__ __SSE2__ __SSE3__ __SSSE3__ __SSE4A__ __SSE4_1__ __SSE4_2__ __AES__ __PCLMUL__ __AVX__

Alright, I just looked into how other compilers do it. On GCC, SSEx built-ins are only available if the suitable -mssex option is set, which you can detect with __SSEx__. On MSVC, SSE built-ins are always available but may result in a runtime error. There are only the /arch:SSE and /arch:AVX options, which tell the compiler that it can generate SSE or AVX instructions automatically; there are no /arch:SSEx options. Therefore there is no way to do the kind of thing you suggest with MSVC.

Mathias Gaunard

12:11 p.m.

On 24/03/2011 13:07, Mathias Gaunard wrote:

...

On MSVC, SSE built-ins are always available but may result in a runtime error. There are only the /arch:SSE and /arch:AVX options, which tell the compiler that it can generate SSE or AVX instructions automatically; there are no /arch:SSEx options.

Errata: there is a /arch:SSE and a /arch:SSE2, but no /arch:SSE3, /arch:SSSE3, /arch:SSE4a, /arch:SSE4.1, /arch:SSE4.2

Mathieu -

12:54 p.m.

On 24 March 2011 13:07, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:

...

On 24/03/2011 01:16, Gruenke, Matt wrote:

...
cpuid? It seems to me that the *only* thing that should determine how the program gets compiled is the compiler flags, since the target platform might have either less or more capability than the build machine. I've experienced both cases, firsthand.

GCC will define the following preprocessor macros, depending on which code generation flags you specify:

__MMX__ __SSE__ __SSE2__ __SSE3__ __SSSE3__ __SSE4A__ __SSE4_1__ __SSE4_2__ __AES__ __PCLMUL__ __AVX__

Alright, I just looked into how other compilers do it.

On GCC, SSEx built-ins are only available if the suitable -mssex option is set, which you can detect with __SSEx__. On MSVC, SSE built-ins are always available but may result in a runtime error. There are only the /arch:SSE and /arch:AVX options, which tell the compiler that it can generate SSE or AVX instructions automatically; there are no /arch:SSEx options.

Therefore there is no way to do the kind of thing you suggest with MSVC.

That's why I think the most sensible way to do it is to let the end user specify what they want to use, and just do everything conditionally with #ifdefs.
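A minimal sketch of that approach follows. The `SKETCH_USE_*` macros and the numeric level encoding are hypothetical; the user would define the highest instruction set they want (for example `-DSKETCH_USE_SSE4_1` on the command line), and the library would derive everything else from it.

```cpp
// User-facing selection macros (hypothetical names). The highest one
// defined wins; nothing defined means plain scalar code.
#if defined(SKETCH_USE_SSE4_2)
#  define SKETCH_SIMD_LEVEL 42
#elif defined(SKETCH_USE_SSE4_1)
#  define SKETCH_SIMD_LEVEL 41
#elif defined(SKETCH_USE_SSSE3)
#  define SKETCH_SIMD_LEVEL 31
#elif defined(SKETCH_USE_SSE3)
#  define SKETCH_SIMD_LEVEL 30
#elif defined(SKETCH_USE_SSE2)
#  define SKETCH_SIMD_LEVEL 20
#else
#  define SKETCH_SIMD_LEVEL 0
#endif

// The level chosen by the user, as a compile-time constant.
constexpr int selected_simd_level() { return SKETCH_SIMD_LEVEL; }

// True if the selected level includes the requested one; on x86 each SSE
// generation is a superset of the previous ones, so a simple comparison
// is enough for this sketch.
constexpr bool has_at_least(int level) { return SKETCH_SIMD_LEVEL >= level; }
```

Library code can then write `#if SKETCH_SIMD_LEVEL >= 30` around SSE3-specific paths. The burden of passing matching compiler options (such as `-msse3` on GCC) stays with the user in this scheme.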

Gruenke, Matt

6:21 p.m.

MSVC defines _M_IX86_FP, to indicate whether /arch was specified, and whether for SSE or SSE2. Unfortunately, it does not (currently) go beyond SSE2.

Source: http://msdn.microsoft.com/en-us/library/b0084kay.aspx

One possibility would be to require the user to manually define a macro, when using such compilers.

Matt

________________________________
From: boost-bounces@lists.boost.org on behalf of Mathias Gaunard
Sent: Thu 3/24/2011 8:07 AM
To: boost@lists.boost.org
Subject: Re: [boost] [GSoC] SIMD proposal

[snip]

...

On MSVC, SSE built-ins are always available but may result in a runtime error. There are only the /arch:SSE and /arch:AVX options, which tell the compiler that it can generate SSE or AVX instructions automatically; there are no /arch:SSEx options.

Therefore there is no way to do the kind of thing you suggest with MSVC.
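What Matt describes could look roughly like the sketch below: read `_M_IX86_FP` where MSVC provides it, fall back to the GCC-style macros elsewhere, and let a manually defined macro override both. `SKETCH_FORCE_SSE_LEVEL` and `assumed_sse_level` are hypothetical names. The result only distinguishes no SSE, SSE and SSE2, which is exactly the limitation noted above.

```cpp
// Returns the SSE level the compiler was told it may assume:
// 0 = none, 1 = SSE, 2 = SSE2 (or higher).
constexpr int assumed_sse_level() {
#if defined(SKETCH_FORCE_SSE_LEVEL)
    // Manual override supplied by the user (hypothetical macro).
    return SKETCH_FORCE_SSE_LEVEL;
#elif defined(_M_IX86_FP)
    // MSVC targeting 32-bit x86: 0, 1 (/arch:SSE) or 2 (/arch:SSE2 and above).
    return _M_IX86_FP;
#elif defined(_M_X64) || defined(__x86_64__) || defined(__SSE2__)
    // SSE2 is part of the x86-64 baseline, or was enabled explicitly.
    return 2;
#elif defined(__SSE__)
    return 1;
#else
    return 0;
#endif
}
```

Anything beyond SSE2 (SSE3, SSSE3, SSE4.x) cannot be inferred this way on MSVC, so for those a user-defined macro would still be needed.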

Mathias Gaunard

8:41 p.m.

On 24/03/2011 19:21, Gruenke, Matt wrote:

...

MSVC defines _M_IX86_FP, to indicate whether /arch was specified, and whether for SSE or SSE2. Unfortunately, it does not (currently) go beyond SSE2.

Source: http://msdn.microsoft.com/en-us/library/b0084kay.aspx

One possibility would be to require the user to manually define a macro, when using such compilers.

Matt

Which I already said. It is also completely orthogonal; /arch is more like an equivalent of the -mfpmath GCC option. It does not control availability of instructions, but rather what instructions are generated for the floating point operations (as the name _M_IX86_FP clearly reflects)

Stefan Seefeld

12:39 a.m.

On 2011-03-23 19:44, Mathias Gaunard wrote:

...

Finally, the right options to enable those instruction sets must also be passed to the compiler whenever compiling code that uses the library. I think this is quite unique, as no other Boost library requires specific compiler flags to be used, AFAIK. Therefore it *could* make sense to also ship, as part of the Boost installation, the bjam or cmake modules that detect the best set of options available; but I understand that's quite a disruptive idea.

I'd like to suggest using a build-system-agnostic tool to report such build options, which can be picked up by any build system. I'm specifically thinking of pkg-config, which can report build flags based on .pc files that can be provided with binary packages. That system is widely used with most (if not all) Linux distributions, and it supports many more platforms.

...

In NT2 we also put other things in the configuration header but all of this is being reworked, and won't be in Boost.SIMD.

...
is bjam able to do everything we need to do, or will we need to do the detection in any other way?

If bjam can 1) compile temporary programs, run them and retrieve their output and 2) test whether the compiler supports some flags -- then we're fine.

While that sounds fine, please clearly separate the benchmark process that you use to obtain machine characteristics from the actual build process, such that the characteristics can be saved, distributed, and reused, to have a deterministic build process. (ATLAS calls these characteristics "architectural defaults": http://math-atlas.sourceforge.net/faq.html#ArchDef)

Regards,
Stefan

--
...ich hab' noch einen Koffer in Berlin...

Mathias Gaunard

9:49 a.m.

On 24/03/2011 01:39, Stefan Seefeld wrote:

...

I'd like to suggest using a build-system-agnostic tool to report such build options, which can be picked up by any build system. I'm specifically thinking of pkg-config, which can report build flags based on .pc files that can be provided with binary packages. That system is widely used with most (if not all) Linux distributions, and it supports many more platforms.

That's just one extra module to find out those options. Not everyone uses pkg-config, even on Linux. So we agree that Boost should also ship extra files on top of headers and library binaries?

...

...
If bjam can 1) compile temporary programs, run them and retrieve their output and 2) test whether the compiler supports some flags -- then we're fine.

While that sounds fine, please clearly separate the benchmark process that you use to obtain machine characteristics from the actual build process, such that the characteristics can be saved, distributed, and reused, to have a deterministic build process. (ATLAS calls these characteristics "architectural defaults". http://math-atlas.sourceforge.net/faq.html#ArchDef)

This is done at configuration time (e.g. the equivalent of ./configure with the autotools chain), not build time (e.g. make). But bjam does not distinguish the two AFAIK.
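The separation Stefan asks for can be sketched as follows: the detection step writes a plain list of feature names once, and the build step only reads that saved list and turns it into compiler options. The file format (whitespace-separated names such as `sse2 ssse3 sse4_1`) and both function names are assumptions made for this example; the resulting options are the usual GCC `-m` flags.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a saved feature list such as "sse2 ssse3 sse4_1" into names.
// The whitespace-separated format is a hypothetical choice for this sketch.
inline std::vector<std::string> parse_saved_features(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> names;
    for (std::string tok; in >> tok;) names.push_back(tok);
    return names;
}

// Map feature names to the corresponding GCC code-generation options.
// Unknown names are silently ignored here; a real tool would likely report them.
inline std::vector<std::string> gcc_flags_for(const std::vector<std::string>& features) {
    std::vector<std::string> flags;
    for (const std::string& f : features) {
        if (f == "sse" || f == "sse2" || f == "sse3" || f == "ssse3" || f == "avx")
            flags.push_back("-m" + f);
        else if (f == "sse4_1")
            flags.push_back("-msse4.1");
        else if (f == "sse4_2")
            flags.push_back("-msse4.2");
    }
    return flags;
}
```

Because the build step depends only on the saved text, the same list can be produced on one machine, edited by hand, or shipped with a binary package, and the build remains deterministic.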

Stefan Seefeld

11 a.m.

On 2011-03-24 05:49, Mathias Gaunard wrote:

...

On 24/03/2011 01:39, Stefan Seefeld wrote:

...
I'd like to suggest using a build-system-agnostic tool to report such build options, which can be picked up by any build system. I'm specifically thinking of pkg-config, which can report build flags based on .pc files that can be provided with binary packages. That system is widely used with most (if not all) Linux distributions, and it supports many more platforms.

That's just one extra module to find out those options. Not everyone uses pkg-config, even on Linux.

So we agree that boost should also ship extra files on top of headers and library binaries?

I think that's sensible, yes.

Stefan

--
...ich hab' noch einen Koffer in Berlin...


participants (6)

Cory Nelson
Gruenke, Matt
Joel Falcou
Mathias Gaunard
Mathieu -
Stefan Seefeld