
Hello- I don't think I will have time to write a full review of units (and I probably don't know enough to seriously critique it anyway), but FWIW, I think it should be accepted, and it looks quite useful and well designed to me. I did try some quick performance tests out of curiosity on a few different systems. Here is the output of the unit_example_14 test program for:

1) gcc 3.4.4 on WinXP / MinGW (Intel P4 2.8 GHz)
------------------------------------
f(x,y,z) took 6.593 seconds to run 1e+009 iterations with double = 6.06704e+008 flops
f(x,y,z) took 10.782 seconds to run 1e+009 iterations with quantity<double> = 3.70989e+008 flops
g(x,y,z) took 6.625 seconds to run 1e+009 iterations with double = 6.03774e+008 flops
g(x,y,z) took 9.906 seconds to run 1e+009 iterations with quantity<double> = 4.03796e+008 flops
------------------------------------

2) gcc 3.4.4 on WinXP / cygwin (Intel P4 2.8 GHz)
------------------------------------
f(x,y,z) took 6.5 seconds to run 1e+09 iterations with double = 6.15385e+08 flops
f(x,y,z) took 10.375 seconds to run 1e+09 iterations with quantity<double> = 3.85542e+08 flops
g(x,y,z) took 6.719 seconds to run 1e+09 iterations with double = 5.95327e+08 flops
g(x,y,z) took 10.531 seconds to run 1e+09 iterations with quantity<double> = 3.79831e+08 flops
------------------------------------

3) gcc 3.4.6 on Linux (Intel Xeon 3.2 GHz)
------------------------------------
f(x,y,z) took 13.76 seconds to run 1e+09 iterations with double = 2.90698e+08 flops
f(x,y,z) took 17.34 seconds to run 1e+09 iterations with quantity<double> = 2.30681e+08 flops
g(x,y,z) took 13.82 seconds to run 1e+09 iterations with double = 2.89436e+08 flops
g(x,y,z) took 14.76 seconds to run 1e+09 iterations with quantity<double> = 2.71003e+08 flops
------------------------------------

4) gcc 3.4.6 on Linux (AMD Athlon 1.6GHz)
------------------------------------
f(x,y,z) took 27.83 seconds to run 1e+09 iterations with double = 1.4373e+08 flops
f(x,y,z) took 33.22 seconds to run 1e+09 iterations with quantity<double> = 1.20409e+08 flops
g(x,y,z) took 27.54 seconds to run 1e+09 iterations with double = 1.45243e+08 flops
g(x,y,z) took 30.68 seconds to run 1e+09 iterations with quantity<double> = 1.30378e+08 flops
------------------------------------

So gcc, at least, seems to be able to tell the difference when the units are being used, at the level of a few tens of percent. This was compiled with

g++ -s -O3 -DNDEBUG

I also tried compiling with gcc 3.3.3 but got a large number of errors... I attached the gzipped output in case that is interesting, but maybe this compiler is not supposed to be supported anyway?

-Lewis

I don't think I will have time to write a full review of units (and I probably don't know enough to seriously critique it anyway), but FWIW, I think it should be accepted, and it looks quite useful and well designed to me.
Thanks for the support. Please do try to find some time to submit a formal review - it doesn't need to be lengthy, and the more input we get, the better the final outcome (and, in the case of positive votes, the better the chance of acceptance :^).
I did try some quick performance tests out of curiosity on a few different systems. Here is the output of the unit_example_14 test program for:
1) gcc 3.4.4 on WinXP / MinGW (Intel P4 2.8 GHz)
------------------------------------
f(x,y,z) took 6.593 seconds to run 1e+009 iterations with double = 6.06704e+008 flops
f(x,y,z) took 10.782 seconds to run 1e+009 iterations with quantity<double> = 3.70989e+008 flops
g(x,y,z) took 6.625 seconds to run 1e+009 iterations with double = 6.03774e+008 flops
g(x,y,z) took 9.906 seconds to run 1e+009 iterations with quantity<double> = 4.03796e+008 flops
Just for reference, using gcc 4.0.1 on Mac OSX (G5 2GHz) I get no overhead :

f(x,y,z) took 4.29 seconds to run 1e+09 iterations with double = 9.32401e+08 flops
f(x,y,z) took 4.28 seconds to run 1e+09 iterations with quantity<double> = 9.34579e+08 flops
g(x,y,z) took 4.28 seconds to run 1e+09 iterations with double = 9.34579e+08 flops
g(x,y,z) took 4.28 seconds to run 1e+09 iterations with quantity<double> = 9.34579e+08 flops

Naturally, this library is demanding of the optimizer. I believe someone else has verified that it is possible to achieve zero overhead under Visual C++ 8.0 as well...
I also tried compiling with gcc 3.3.3 but got a large number of errors... I attached the gzipped output in case that is interesting, but maybe this compiler is not supposed to be supported anyway?
We have not attempted to support gcc versions before 3.4.4, and I don't know if it is possible to do it. I'll take a look and see if there's anything obvious. Matthias

AMDG Matthias Schabel <boost <at> schabel-family.org> writes:
Naturally, this library is demanding of the optimizer. I believe someone else has verified that it is possible to achieve zero overhead under Visual C++ 8.0 as well...
unit_example_14.cpp passes double by value and quantity by reference. When I looked at the assembler output from msvc 8.0, it looked like this had a noticeable effect (the quantity code was quite a bit more complex). In Christ, Steven Watanabe

Naturally, this library is demanding of the optimizer. I believe someone else has verified that it is possible to achieve zero overhead under Visual C++ 8.0 as well...
unit_example_14.cpp passes double by value and quantity by reference. When I looked at the assembler output from msvc 8.0, it looked like this had a noticeable effect (the quantity code was quite a bit more complex).
Interesting. I checked both permutations : double by const reference and quantity by value and get

const reference :
f(x,y,z) took 4.28 seconds to run 1e+09 iterations with double = 9.34579e+08 flops
f(x,y,z) took 4.29 seconds to run 1e+09 iterations with quantity<double> = 9.32401e+08 flops
g(x,y,z) took 4.27 seconds to run 1e+09 iterations with double = 9.36768e+08 flops
g(x,y,z) took 4.3 seconds to run 1e+09 iterations with quantity<double> = 9.30233e+08 flops

value :
f(x,y,z) took 4.29 seconds to run 1e+09 iterations with double = 9.32401e+08 flops
f(x,y,z) took 4.28 seconds to run 1e+09 iterations with quantity<double> = 9.34579e+08 flops
g(x,y,z) took 4.28 seconds to run 1e+09 iterations with double = 9.34579e+08 flops
g(x,y,z) took 4.28 seconds to run 1e+09 iterations with quantity<double> = 9.34579e+08 flops

These differences look negligible to me... obviously, this depends on the compiler/optimizer since there is no reason why there should be any runtime cost...

Matthias
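PS For reference, the comparison is essentially between these two shapes of signature (a sketch only; T stands for either double or whatever quantity specialization unit_example_14 actually uses, so the exact spelling there may differ):

// pass by value (what the example currently does for double)
T f(T x, T y, T z);

// pass by const reference (what it currently does for quantity)
T f(const T& x, const T& y, const T& z);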

Matthias Schabel wrote:
Just for reference, using gcc 4.0.1 on Mac OSX (G5 2GHz) I get no overhead :
I just tried gcc 4.0.2 on AMD Opteron 2.2 GHz, and I get no overhead as well:

f(x,y,z) took 3.21 seconds to run 1e+09 iterations with double = 1.24611e+09 flops
f(x,y,z) took 3.18 seconds to run 1e+09 iterations with quantity<double> = 1.25786e+09 flops
g(x,y,z) took 3.2 seconds to run 1e+09 iterations with double = 1.25e+09 flops
g(x,y,z) took 3.19 seconds to run 1e+09 iterations with quantity<double> = 1.25392e+09 flops

I guess gcc 3.4.4 just doesn't cut it these days!

Steven Watanabe wrote:
unit_example_14.cpp passes double by value and quantity by reference. When I looked at the assembler output from msvc 8.0, it looked like this had a noticeable effect (the quantity code was quite a bit more complex).
I just tried it, and changing the pass by value to pass by reference did not affect the timings on gcc 3.4.4. (I did not confirm but I believe gcc is inlining the function call anyway.) -Lewis

I guess gcc 3.4.4 just doesn't cut it these days!
Try commenting out quantity.hpp line 57
Good point. Clearly, for built-in types, this check is not necessary, but it could matter quite a bit in, for example, cases like this:

quantity<length,boost::numeric::ublas::vector> q;
q = q;

If commenting the if (this == &source) line out helps, we should probably deal with it, since built-in types are likely to represent the vast bulk of actual uses of the library...

this_type& operator=(const this_type& source)
{
    if (this == &source) return *this;
    val_ = source.val_;
    return *this;
}

Matthias

AMDG Matthias Schabel <boost <at> schabel-family.org> writes:
quantity<length,boost::numeric::ublas::vector> q;
q = q;
If commenting the if (this == &source) line out helps, we should probably deal with it since builtin types are likely to represent the vast bulk of actual uses of the library...
The value_type can implement this optimization itself, right? I see no reason for us to use it at all. In Christ, Steven Watanabe
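PS To be concrete, I mean something like this (a sketch based on the operator= quoted above, not a patch):

this_type& operator=(const this_type& source)
{
    // just forward to the value_type; a value_type such as ublas::vector
    // can perform its own self-assignment check internally if it needs one
    val_ = source.val_;
    return *this;
}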

Steven Watanabe wrote:
Try commenting out quantity.hpp line 57
on the gcc 3.4.4 cygwin:

f(x,y,z) took 6.547 seconds to run 1e+09 iterations with double = 6.10967e+08 flops
f(x,y,z) took 9.718 seconds to run 1e+09 iterations with quantity<double> = 4.11607e+08 flops
g(x,y,z) took 6.579 seconds to run 1e+09 iterations with double = 6.07995e+08 flops
g(x,y,z) took 9.875 seconds to run 1e+09 iterations with quantity<double> = 4.05063e+08 flops

Doesn't look like it changed much.

-Lewis

Anybody out there? Please tell me that I am not the only person on this mailing list to spot that those flop rates posted beforehand are much too low compared to what a 2.8 GHz P4 should be able to deliver?

So, for a closer look, I get out the compiler, VC8 in my case. Example 14, ok. Run. First line appears, then the program hangs for minutes. Oh no, yeah, that was the debug build.... ^C. Ok, once again in release mode. Works better, but the figures are still disappointing:

f(x,y,z) took 4.218 seconds to run 1e+009 iterations with double = 9.48317e+008 flops
f(x,y,z) took 5.047 seconds to run 1e+009 iterations with quantity<double> = 7.9255e+008 flops
g(x,y,z) took 4.219 seconds to run 1e+009 iterations with double = 9.48092e+008 flops
g(x,y,z) took 4.625 seconds to run 1e+009 iterations with quantity<double> = 8.64865e+008 flops

950 MFlops, already better than the numbers posted beforehand. A brief look at the code. No memory referenced. Just local variables. That one should perform better. The zero overhead is not so zero, after all. Once again - no avail. Another look at the code. Oh, what is this "if" in that loop? So measure that "if" alone:

inline double f2(double x,double y,double z)
{
    double V = 0, C = 0;
    for (int i = 0; i < TEST_LIMIT; ++i)
    {
        if (i % 100000 == 0)
            C = double(std::rand())/RAND_MAX;
        //V = V + ((x + y) * z * C);
    }
    return V;
}

That gives:

f2(x,y,z) took 3.187 seconds to run 1e+009 iterations with double = 1.2551e+009 (would be) flops
f2(x,y,z) took 3.141 seconds to run 1e+009 iterations with quantity<double> = 1.27348e+009 (would be) flops

Ooops, the loop alone, even without any floating point, takes more than 3 seconds? More overhead than payload? Another one:

inline double f3(double x,double y,double z)
{
    double V = 0, C = 0;
    for (int i = 0; i < TEST_LIMIT; )
    {
        C = double(std::rand())/RAND_MAX;
        const int next_limit = std::min(TEST_LIMIT, i+100000);
        for (; i < next_limit; ++i)
        {
            V = V + ((x + y) * z * C);
        }
    }
    return V;
}

That gives:

f3(x,y,z) took 1.515 seconds to run 1e+009 iterations with double = 2.64026e+009 flops
f3(x,y,z) took 4.656 seconds to run 1e+009 iterations with quantity<double> = 8.59107e+008 flops

2.6 GFlops? That is ok for a single thread. But the zero overhead appears to be a factor of 3 now!

What do you say? Gcc would be better? So switch to the Linux box. g++ -O3. Looks better right from the start, even though the P4 is supposed to be slower than the Core 2.

f(x,y,z) took 3.22 seconds to run 1e+09 iterations with double = 1.24224e+09 flops
f(x,y,z) took 3.21 seconds to run 1e+09 iterations with quantity<double> = 1.24611e+09 flops
f2(x,y,z) took 3.22 seconds to run 1e+09 iterations with double = 1.24224e+09 (would be) flops
f2(x,y,z) took 3.22 seconds to run 1e+09 iterations with quantity<double> = 1.24224e+09 (would be) flops
f3(x,y,z) took 0.51 seconds to run 1e+09 iterations with double = 7.84314e+09 flops
f3(x,y,z) took 0.65 seconds to run 1e+09 iterations with quantity<double> = 6.15385e+09 flops

Oh, but what is that 7.84 GFlops over there? That one goes beyond the peak performance of the processor! GCC must be cheating here! Hmm. What does the Intel compiler give?

f(x,y,z) took 4.2 seconds to run 1e+09 iterations with double = 9.52381e+08 flops
f(x,y,z) took 5.29 seconds to run 1e+09 iterations with quantity<double> = 7.56144e+08 flops
f2(x,y,z) took 4.19 seconds to run 1e+09 iterations with double = 9.54654e+08 (would be) flops
f2(x,y,z) took 4.18 seconds to run 1e+09 iterations with quantity<double> = 9.56938e+08 (would be) flops
f3(x,y,z) took 0.47 seconds to run 1e+09 iterations with double = 8.51064e+09 flops
f3(x,y,z) took 6.95 seconds to run 1e+09 iterations with quantity<double> = 5.7554e+08 flops

Hmm. Even more cheating on plain doubles, but it does not seem to like the templates. For this one, the overhead increases to nearly a factor of 15!

So let's play a bit further... What are those funny "inline" for? Let's try to #define them away... g++ -O3 again.

f(x,y,z) took 3.25 seconds to run 1e+09 iterations with double = 1.23077e+09 flops
f(x,y,z) took 9.96 seconds to run 1e+09 iterations with quantity<double> = 4.01606e+08 flops
f2(x,y,z) took 3.23 seconds to run 1e+09 iterations with double = 1.23839e+09 (would be) flops
f2(x,y,z) took 3.19 seconds to run 1e+09 iterations with quantity<double> = 1.25392e+09 (would be) flops
f3(x,y,z) took 0.52 seconds to run 1e+09 iterations with double = 7.69231e+09 flops
f3(x,y,z) took 10.2 seconds to run 1e+09 iterations with quantity<double> = 3.92157e+08 flops

Ouch, f3 on quantity<double> had been 0.65 seconds beforehand, now it is 10 seconds. Somehow gcc forgot to cheat (eh, "optimize") here. And even the original example f gets about 3 times slower now. There are only about 1000 (even non-virtual) function calls involved. These cannot possibly sum up to 6 or even 10 seconds. Something different is going on here....

Well, it is getting late, I will stop here. So, for the long and the short of it, I don't believe the zero overhead. Not with the compilers I currently have at hand. Furthermore, the example is simply not meaningful; it allows the compilers to play so many tricks that the resulting numbers are little more than noise.

Matthias, I therefore have 3 further points for the "application domain and restrictions" page:

- By the use of the library, performance of the debug build of your software may or may not degrade by several orders of magnitude, depending on the actual code.
- The library is very demanding on the compiler. Apart from the compatibility requirements, the performance penalty induced by the use of the library mostly varies between zero and three; even higher numbers have been observed in very special cases. [btw, did anybody do a comparison of compile times on reasonably sized projects?]
- The use of this library may impose additional obstacles when doing in-depth performance tuning for numerical computations, as the compilers may or may not recognize certain optimization possibilities anymore.

I would have liked to give you more positive feedback,

Martin.

Dear Martin, Please look at the first line in the Boost Design and Programming Guidelines : http://www.boost.org/more/lib_guide.htm#Guidelines It may be that some compilers optimize the current code better than others and that some do very poorly. Also in accordance with the aforementioned guidelines we have not focused primarily on performance or performance testing. There is no fundamental reason why the code should not be able to be optimized away, but achieving that for many compilers may need to be left for later. It certainly will never happen without a putative starting point. Finally, if a nominally compile-time unit system incurs as much overhead as you seem to believe, imagine the cost of a runtime system... Cheers, Matthias

Please look at the first line in the Boost Design and Programming Guidelines :
http://www.boost.org/more/lib_guide.htm#Guidelines
It may be that some compilers optimize the current code better than others and that some do very poorly. Also in accordance with the aforementioned guidelines we have not focused primarily on performance or performance testing.
That's fully okay. I never said the performance is too slow to be usable. I never said that it should not be included because of the observed drop in some cases. I did not even mention any performance aspect in my initial review. It is just the uncontradicted repetition of the "zero-overhead" keyword, over and over again, that irritates me. This indicates that performance is not so unimportant after all. The factor of 15 or so is clearly measurement noise, as I already noted; in fact the whole example is of little meaning. It can't prove that the library is fast, nor that it is slow. We can observe that it can be zero-overhead in some cases and may involve overhead in other cases that are quite nearby. It is this "discontinuous behaviour" that makes things difficult sometimes. After all, a factor between zero and three is no big deal if it applies to a moderate number of arithmetic operations only. You should know that some unsuspicious conditional statements in a loop, a cache miss or other kinds of data dependencies may already hide this. In essence, I would qualify the library as "zero- to low-overhead". And, I must add, I cannot hope to make it any better.
There is no fundamental reason why the code should not be able to be optimized away,
Yes, there is no fundamental reason. It is just that the "compilers I currently have at hand", as I wrote, do produce mixed results. It appears that I cannot just use the library and rely on my compiler producing zero overhead. Such a claim would be too ambitious.
Finally, if a nominally compile-time unit system incurs as much overhead as you seem to believe, imagine the cost of a runtime system...
Again, you are completely right. Did I suggest that a runtime system would be faster? I have to apologize then. I tried to convince you (and anybody else) that it would be better to have a dynamic system that is closely integrated with the static one (and vice versa). A dynamic system enables additional benefits, going far beyond what a static unit system can do. I simply cannot follow the one-size-fits-all attitude. Consequently, I opt against rational exponents, as I consider them to be an obstacle for such an integration. Furthermore, I tried to point out that inside computation kernels (that's where performance really matters), you probably go without any unit library. Partly because numerical libraries of all sorts do not support units anyway, partly because I will try to avoid any uncertainties or compiler idiosyncrasies that might get in my way. Yours, Martin.

Just a reminder to interested parties - two more days left in the proposed Boost.Units review. We'd love to get some more reviews...
It is just the uncontradicted repetition of the "zero-overhead" keyword, over and over again, that irritates me. This indicates that performance is not so unimportant after all.
OK - the library itself doesn't introduce any overhead, so in that context the zero-overhead description is correct. As we've already established, achieving zero-overhead with current compilers can be non-trivial. I will add documentation clarifying performance issues and warning library users that any use of this library in performance-critical code should be tested, etc...
a cache miss or other kinds of data dependencies may already hide this. In essence, I would qualify that library as "zero- to low-overhead".
OK - we will temper statements on performance and point out that, as the library is still in development, compiler-specific optimizations have not been undertaken for the most part.
system would be faster? I have to apologize then. I tried to convince you (and anybody else) that it would be better to have a dynamic system that is closely integrated with the static one (and vice versa). A dynamic system enables additional benefits, going far
Assuming the existing library ultimately finds its way into Boost, we would be happy to work with you to provide support in integrating a run-time system that you envisage with the existing compile-time system. I have never denied the potential usefulness of runtime unit support, and would be 100% behind an effort to write a runtime library that complements this one.
Consequently, I opt against rational exponents, as I consider them to be an obstacle for such an integration.
See noise-density for an example of a unit in use that has a fractional exponent : http://swiss.csail.mit.edu/~jaffer/MIXF
Furthermore, I tried to point out that inside computation kernels (that's where performance really matters), you probably go without any unit library.
We are investigating ways of guaranteeing that the layout of, e.g., std::vector<quantity<length,double> > is identical to std::vector<double>, so that safe casting is possible when using computational libraries that do not support units. For most compilers/operating systems this should be the case, as the quantity class doesn't have any virtual member functions or anything else that would alter the layout in memory. Matthias
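PS As a sketch of what that would allow (some_blas_routine is made up, the exact quantity spelling is only illustrative, and the cast is only safe where the layout guarantee actually holds):

#include <vector>
#include <boost/static_assert.hpp>

// compile-time sanity check on a given platform:
BOOST_STATIC_ASSERT(sizeof(quantity<si::length,double>) == sizeof(double));

std::vector<quantity<si::length,double> > lengths(1000);

// hand the raw storage to a units-unaware numerical routine:
some_blas_routine(reinterpret_cast<double*>(&lengths[0]), lengths.size());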

On 4/2/07, Matthias Schabel <boost@schabel-family.org> wrote:
Just a reminder to interested parties - two more days left in the proposed Boost.Units review. We'd love to get some more reviews...
That reminds me: John Phillips, please note that I'm officially changing my no vote to a yes vote. The only no-vote-worthy problem I had with the library was the non-const value() quantity member, and associated issues, which I feel have now been resolved. Zach Laine

Zach Laine said: (by the date of Mon, 2 Apr 2007 15:08:50 -0500)
On 4/2/07, Matthias Schabel <boost@schabel-family.org> wrote:
Just a reminder to interested parties - two more days left in the proposed Boost.Units review. We'd love to get some more reviews...
That reminds me: John Phillips, please note that I'm officially changing my no vote to a yes vote. The only no-vote-worthy problem I had with the library was the non-const value() quantity member, and associated issues, which I feel have now been resolved.
Is the review manager aware of that? :> -- Janek Kozicki

On 4/3/07, Janek Kozicki <janek_listy@wp.pl> wrote:
Zach Laine said: (by the date of Mon, 2 Apr 2007 15:08:50 -0500)
On 4/2/07, Matthias Schabel <boost@schabel-family.org> wrote:
Just a reminder to interested parties - two more days left in the proposed Boost.Units review. We'd love to get some more reviews...
That reminds me: John Phillips, please note that I'm officially changing my no vote to a yes vote. The only no-vote-worthy problem I had with the library was the non-const value() quantity member, and associated issues, which I feel have now been resolved.
Is the review manager aware of that? :>
Isn't John Phillips the review manager? Zach Laine

Zach Laine wrote:
On 4/3/07, Janek Kozicki <janek_listy@wp.pl> wrote:
Zach Laine said: (by the date of Mon, 2 Apr 2007 15:08:50 -0500)
On 4/2/07, Matthias Schabel <boost@schabel-family.org> wrote:
Just a reminder to interested parties - two more days left in the proposed Boost.Units review. We'd love to get some more reviews...
That reminds me: John Phillips, please note that I'm officially changing my no vote to a yes vote. The only no-vote-worthy problem I had with the library was the non-const value() quantity member, and associated issues, which I feel have now been resolved.
Is the review manager aware of that? :>
Isn't John Phillips the review manager?
Zach Laine
Zach, Yes, I am, and I have noted your change. John

There is a problem with the filter_iterator decrement operation if the iterator is currently referencing the first object that satisfies the predicate. In that case the base iterator is decremented until the predicate is satisfied, but since none of the locations back to the beginning satisfy the predicate, eventually the base decrement operation is applied to the begin() of the container, resulting in undefined behavior. I see these possible solutions:

1. Don't allow operator-- on filter_iterators. Require that you use the filter_iterator on a reverse_iterator so that the end condition can be properly checked.

2. One could require the constructors to specify not only the end of the base sequence, but also the beginning. Then the filter decrement operator could check for the beginning and not attempt to back up beyond it. This seems somewhat of a pain for the user. The decrement operator could set the result to end() if this situation occurred (or perhaps leave it at begin()).

3. There is a nicer solution if the base iterator class is circular. A circular iterator class is one where:
   - ++(end()) is legal and yields begin(), i.e. you can increment end() to get back to begin()
   - --(begin()) is legal and yields end(), i.e. you can decrement begin() to get back to end()

In case #3, the filter_iterator could be changed so that when decrementing it also checked for end() when skipping over elements not satisfying the predicate. The result of applying --x when x is the first element in the container satisfying the predicate is then end(). Perhaps a new kind of concept, circular_iterator_concept, should be defined, and a new circular_filter_iterator defined which would have this behavior on a base circular_iterator. I think circular lists are not an uncommon implementation of containers, so this might be applicable more than one would expect.

In case you haven't guessed, I have a structure where I have defined my increment and decrement to satisfy this circular requirement. I've made the small changes needed in a copy of filter_iterator, which I've renamed to circular_filter_iterator. I've tested this out and it works. I'm attaching a copy for those who might be interested.

Rich
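PS A minimal sketch of the failure mode, in case it helps (the container contents and names here are just for illustration):

#include <vector>
#include <boost/iterator/filter_iterator.hpp>

struct is_even
{
    bool operator()(int i) const { return i % 2 == 0; }
};

int main()
{
    std::vector<int> v;
    v.push_back(1); v.push_back(3); v.push_back(2); v.push_back(5);

    typedef boost::filter_iterator<is_even, std::vector<int>::iterator> filter;
    filter it(is_even(), v.begin(), v.end()); // refers to 2, the first element satisfying the predicate

    --it; // the base iterator is decremented past v.begin() looking for an
          // even element that isn't there: undefined behavior
    return 0;
}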

on Fri Apr 06 2007, "Ford, Rich" <Rich.Ford-AT-amd.com> wrote:
There is a problem with the filter_iterator decrement operation if the iterator is currently referencing the first object that satisfies the predicate.
That's the beginning of the sequence traversed by the iterator, so you can't decrement it legally, just like any other "begin" iterator. -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

Thanks. I hadn't thought of that, but of course one could find the begin of the filtered sequence by constructing it from the begin of the base sequence. Then an algorithm doing decrementing could check to see if begin had been reached and avoid decrementing beyond that.
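Something like this, I suppose (a sketch, reusing the is_even/filter names from my first mail):

// construct the "begin" of the filtered sequence from the begin of the base sequence
filter first(is_even(), v.begin(), v.end());
filter last(is_even(), v.end(), v.end());

for (filter it = last; it != first; )
{
    --it; // safe: we never decrement past the first element of the filtered sequence
    // ... use *it ...
}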

on Sat Apr 07 2007, "Ford, Rich" <Rich.Ford-AT-amd.com> wrote:
Thanks. I hadn't thought of that, but of course one could find the begin of the filtered sequence by constructing it from the begin of the base sequence. Then an algorithm doing decrementing could check to see if begin had been reached and avoid decrementing beyond that.
But what would the point be? An iterator, after decrementation, must point at a valid element. If you're decrementing past the beginning of the filtered sequence, there's no element (of the filtered sequence) to stop on. Sure, I guess we could make decrementation a no-op at the beginning of the sequence, but again, I don't see the point. -- Dave Abrahams Boost Consulting www.boost-consulting.com Don't Miss BoostCon 2007! ==> http://www.boostcon.com

See noise-density for an example of a unit in use that has a fractional exponent : http://swiss.csail.mit.edu/~jaffer/MIXF
Thank you for the pointer, I will have to think about that... Yours, Martin.

See noise-density for an example of a unit in use that has a fractional exponent : http://swiss.csail.mit.edu/~jaffer/MIXF
Thank you for the pointer, I will have to think about that...
Another unit that is expressed in fractional powers is the esu (electrostatic unit); it is in fact defined as

esu := cm^(3/2) g^(1/2) s^(-1)

See here for a nice explanation : http://www.tf.uni-kiel.de/matwis/amat/mw1_ge/kap_2/basics/b2_1_14.html

Now, as has been noted before, it's almost always possible to rewrite your equations to eliminate fractional powers, but from the end-user standpoint, reformulating an equation is more difficult and potentially error-prone than simply implementing an existing one... Matthias
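PS In the library that would look roughly like this (a sketch; the header and function names below are my recollection of the current interface and may not match it exactly):

#include <boost/units/pow.hpp>
#include <boost/units/static_rational.hpp>

quantity<si::length> l = 2.0 * si::meters;

// raising to the rational power 3/2, as in the cm^(3/2) factor of the esu;
// the result is a quantity whose dimension is length^(3/2)
pow< static_rational<3,2> >(l);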

AMDG Martin Schulz <Martin.Schulz <at> synopsys.com> writes:
It is just the uncontradicted repetition of the "zero-overhead" keyword,
over and over again, that irritates me. This indicates that performance is not so unimportant after all.
I suppose that the most correct statement is that quantity imposes the same overhead as any wrapper around a plain double. From looking through the assembler output of unit_example_14 it appears that msvc optimized register usage better for double.
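That is, overhead-wise quantity<double> is in essentially the same position as a hand-rolled wrapper like this (sketch):

struct wrapped_double
{
    explicit wrapped_double(double v) : val_(v) { }
    double val_;
};

inline wrapped_double operator+(const wrapped_double& a, const wrapped_double& b)
{
    return wrapped_double(a.val_ + b.val_);
}

// a compiler that fails to keep wrapped_double in a register here will show
// the same kind of penalty with quantity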
in fact the whole example is of little meaning. It can't prove that the library is fast, nor that it is slow.
I agree. The main thing that the example tests is the compiler's ability to play games. I just tried multiplying 1000 * 1000 matrices together using msvc 8.0 with /Ox /Ot /Ob2 /Oy. The results are effectively the same for double and quantity.

ublas:
double = 16.282 seconds
quantity = 16.547 seconds

tiled:
double = 1.875 seconds
quantity = 1.875 seconds
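(Roughly the kind of kernel I mean, as a sketch - a plain triple loop rather than the tiled version, and the SI unit spellings below are assumptions rather than code lifted from my test:)

#include <cstddef>
#include <vector>
#include <boost/units/quantity.hpp>
#include <boost/units/systems/si/length.hpp>
#include <boost/units/systems/si/area.hpp>

using namespace boost::units;

// C = A * B for N x N matrices of lengths; each element of C is then an area
void multiply(const std::vector<quantity<si::length> >& A,
              const std::vector<quantity<si::length> >& B,
              std::vector<quantity<si::area> >& C,
              std::size_t N)
{
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
        {
            quantity<si::area> sum = 0.0 * si::square_meters;
            for (std::size_t k = 0; k < N; ++k)
                sum += A[i*N + k] * B[k*N + j];
            C[i*N + j] = sum;
        }
}

// the timing comparison is against the identical loop written with plain double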
Consequently, I opt against rational exponents, as I consider them to be
an obstacle for such an integration.
How so? At runtime you can use boost::rational In Christ, Steven Watanabe

in fact the whole example is of little meaning. It can't prove that the library is fast, nor that it is slow.
I agree. The main thing that the example tests is the compiler's ability to play games. I just tried multiplying 1000 * 1000 matrices together using msvc 8.0 with /Ox /Ot /Ob2 /Oy The results are effectively the same for double and quantity.
ublas: double = 16.282 seconds quantity = 16.547 seconds
tiled: double = 1.875 seconds quantity = 1.875 seconds
Steven, If you send me the test code you used, I'm happy to replace the existing (problematic) ad hoc "performance test" example with your example...This is certainly more representative of the kinds of performance-critical applications where quantities might be found, anyway. Matthias

Furthermore, if the compiler being used is incapable of providing adequate performance, it would be basically trivial to add a release option that replaces all quantity operations by null operations and bypasses all dimension checking...

Matthias

PS Donald Knuth said, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Code Complete, Page 594)
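PPS A sketch of the flavor of escape hatch I have in mind (the macro and typedef names are made up; nothing like this exists in the library today):

#if defined(MY_PROJECT_UNCHECKED_BUILD)
// release build: quantities collapse to the raw value type
typedef double length_type;
typedef double velocity_type;
#else
// checked build: full compile-time dimension checking
typedef quantity<si::length> length_type;
typedef quantity<si::velocity> velocity_type;
#endif

// code written against these typedefs (and against small unit-aware helper
// functions for construction and conversion) then compiles either way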

AMDG Martin Schulz <Martin.Schulz <at> synopsys.com> writes:
<snip msvc 8 timings>
msvc 8 doesn't want to store a quantity in a register. BTW, when I removed the if part msvc 8.0 sp1 /Ox precomputed ((x + y) * z * C) and unrolled the loop 10x making the printed time 0. In Christ, Steven Watanabe

Martin Schulz <Martin.Schulz <at> synopsys.com> writes:
That gives:

f3(x,y,z) took 1.515 seconds to run 1e+009 iterations with double = 2.64026e+009 flops
f3(x,y,z) took 4.656 seconds to run 1e+009 iterations with quantity<double> = 8.59107e+008 flops
2.6 GFlops? That is ok for a single thread. But the zero-overhead appears to be a factor of 3 now!
Really? I get

f(x,y,z) took 1.953 seconds to run 1e+009 iterations with double = 2.04813e+009 flops
f(x,y,z) took 1.906 seconds to run 1e+009 iterations with quantity<double> = 2.09864e+009 flops

I am compiling with /Ox. The innermost loop is identical for the double and quantity versions:

sub eax, 1
fadd ST(0), ST(1)
fadd ST(0), ST(1)
fadd ST(0), ST(1)
fadd ST(0), ST(1)
fadd ST(0), ST(1)
fadd ST(0), ST(1)
fadd ST(0), ST(1)
fadd ST(0), ST(1)
jne SHORT $LN17@f

Obviously the compiler is cheating.

In Christ,
Steven Watanabe
participants (9)
- David Abrahams
- Ford, Rich
- Janek Kozicki
- John Phillips
- Lewis Hyatt
- Martin Schulz
- Matthias Schabel
- Steven Watanabe
- Zach Laine