data:image/s3,"s3://crabby-images/63515/635154bb02cb5e24a2eda06b9480c22bdc6093aa" alt=""
On Fri, 2010-04-09 at 00:56 -0400, Brett Gmoser wrote:
> The documentation clearly says that v(x, y) is slower than iteration.
Interestingly, I've just been benchmarking the various GIL image
traversal methods. I find the coordinate access method very
convenient, but I wanted some idea of how much improvement I
should realistically expect from converting code to use iterators.
Results were obtained on a standard Debian Lenny system with an Intel
Core i7, compiled with -march=core2 -mfpmath=sse -msse4.1 -O3 -DNDEBUG;
the main bits of code are appended to this email.
GIL coord access            1120.9697 Megapixels/s
GIL row iterator access     1293.0218 Megapixels/s
GIL image iterator access     77.9477 Megapixels/s
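These figures are throughputs: the amount of work passed to the timer divided by the elapsed wall-clock time. The scoped_timer helper used in the appended code isn't shown, so the class below is only a hypothetical sketch. It assumes the constructor takes (label, amount of work, unit name), as the calls suggest, and that the result is printed when the timer goes out of scope. The optional std::ostream& parameter is an added assumption that makes the output checkable.

```cpp
#include <cassert>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical RAII throughput timer modelled on the calls in the benchmark:
//   scoped_timer t(label, work_done, unit_name);
// When it goes out of scope it prints "<label> <work_done/seconds> <unit>/s".
class scoped_timer {
public:
    scoped_timer(std::string label, double work_done, std::string unit_name,
                 std::ostream& out = std::cout)  // stream parameter is an assumption
        : label_(std::move(label)),
          work_done_(work_done),
          unit_name_(std::move(unit_name)),
          out_(out),
          start_(std::chrono::steady_clock::now()) {
        assert(work_done_ >= 0.0);
    }

    scoped_timer(const scoped_timer&) = delete;
    scoped_timer& operator=(const scoped_timer&) = delete;

    ~scoped_timer() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        // Avoid dividing by zero if the timed region was too short to measure.
        const double seconds = elapsed.count() > 0.0 ? elapsed.count() : 1e-9;
        // Format into a local buffer so the caller's stream flags are untouched.
        std::ostringstream line;
        line << label_ << ' ' << std::fixed << std::setprecision(4)
             << work_done_ / seconds << ' ' << unit_name_ << "/s\n";
        out_ << line.str();
    }

private:
    std::string label_;
    double work_done_;
    std::string unit_name_;
    std::ostream& out_;
    std::chrono::steady_clock::time_point start_;
};
```

Used as in the benchmark, e.g. `{ scoped_timer t("GIL coord access", n, "Megapixels"); /* traversal */ }`, it prints one line in the same shape as the results above. Tying the measurement to a scope means the timer stops at exactly the point where the traversal loop ends.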
I was pretty surprised by just how efficient the v(x,y) accessor is,
and even more surprised by how inefficient the whole-image iterator is!
Inspecting the assembler output, the inner loop of v(x,y) looks like:
.L768:
xorb (%rcx,%rdx), %sil
movq %rax, %rdx
incq %rax
cmpq %r8, %rax
jne .L768
which is very lean, though one instruction longer than the inner loop
of the row iterator:
.L734:
xorb (%rdx,%rax), %cl
incq %rax
cmpq %rax, %r8
jg .L734
However, the inner loop of the all-image iterator is:
.L696:
movzbl (%rcx), %eax
incq %rdx
leaq 1(%rcx,%rbp), %rcx
xorl %eax, %r10d
.L708:
testq %rdx, %rdx
jne .L696
cmpq %rcx, %rbx
jne .L696
which is a bit more complicated, but it still seems remarkable that it
runs roughly 15 times slower than the other methods.
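The "roughly 15 times" can be checked directly against the measured throughputs. The whole-image iterator comes out about 14 to 17 times slower, and the row iterator is about 15% faster than coordinate access:

```latex
\[
\frac{1120.9697}{77.9477} \approx 14.4, \qquad
\frac{1293.0218}{77.9477} \approx 16.6, \qquad
\frac{1293.0218}{1120.9697} \approx 1.15
\]
```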
What I took away from this:
- Avoid the all-image iterator like the plague (although I don't really
understand how it manages to be quite so spectacularly slow).
- You need to be pretty desperate for performance to convert working,
and basically fast enough, coordinate-access code to iterators.
- Compilers can do a pretty nice job with GIL classes. I've used other
image classes which leave far more to run time (e.g. virtual function
calls), and you basically have to "unload" the class information into
raw pointers and ints and do it all yourself to get performant inner loops.
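The post leaves the iterator's slowness unexplained. The following is only a simplified model, not Boost.GIL's implementation, of the three traversal styles over a padded 8-bit grey buffer. The names gray8_image_model, whole_image_iterator, hash_coord, hash_rows and hash_whole are all hypothetical. Each function XORs every pixel, as the benchmark does. The whole-image iterator is modelled on a 2-D iterator (GIL's iterator_from_2d is, as far as I understand it, organised this way). It has to test on every increment whether the current row has ended, and it compares two fields for equality. That is consistent with the extra test/compare pair in the third assembly listing, although it doesn't by itself account for a 15x gap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for an 8-bit grey image view (NOT Boost.GIL code).
// Rows may be padded, so row y starts at data[y * stride].
struct gray8_image_model {
    std::size_t width, height, stride;  // stride >= width, in bytes
    std::vector<std::uint8_t> data;

    gray8_image_model(std::size_t w, std::size_t h, std::size_t s)
        : width(w), height(h), stride(s), data(s * h, 0) {
        assert(s >= w);
    }
    // Coordinate access, analogous to v(x, y): computes y * stride + x on
    // each call (an optimiser can hoist the row part out of the inner loop).
    std::uint8_t operator()(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        return data[y * stride + x];
    }
    // First pixel of row y, analogous to v.row_begin(y).
    const std::uint8_t* row_begin(std::size_t y) const {
        return data.data() + y * stride;
    }
};

// Whole-image iterator modelled on a 2-D iterator: every increment must also
// check whether the current row has ended, then skip the row padding.
class whole_image_iterator {
public:
    whole_image_iterator(const gray8_image_model& img, std::size_t row)
        : p_(img.row_begin(row)), x_(0), width_(img.width),
          skip_(img.stride - img.width) {}
    std::uint8_t operator*() const { return *p_; }
    whole_image_iterator& operator++() {
        ++p_;
        if (++x_ == width_) {  // extra test-and-branch on every pixel
            x_ = 0;
            p_ += skip_;
        }
        return *this;
    }
    // Two fields take part in the comparison, so two compares per step.
    bool operator!=(const whole_image_iterator& o) const {
        return p_ != o.p_ || x_ != o.x_;
    }
private:
    const std::uint8_t* p_;
    std::size_t x_, width_, skip_;
};

std::uint8_t hash_coord(const gray8_image_model& img) {
    std::uint8_t h = 0;
    for (std::size_t y = 0; y < img.height; ++y)
        for (std::size_t x = 0; x < img.width; ++x)
            h ^= img(x, y);
    return h;
}

// Row-by-row traversal using raw pointers, the "unload to pointers" style.
std::uint8_t hash_rows(const gray8_image_model& img) {
    std::uint8_t h = 0;
    for (std::size_t y = 0; y < img.height; ++y) {
        const std::uint8_t* p = img.row_begin(y);
        const std::uint8_t* const end = p + img.width;
        for (; p != end; ++p) h ^= *p;
    }
    return h;
}

std::uint8_t hash_whole(const gray8_image_model& img) {
    std::uint8_t h = 0;
    whole_image_iterator it(img, 0), end(img, img.height);
    for (; it != end; ++it) h ^= *it;
    return h;
}
```

All three functions visit the same pixels and skip the padding, so they return the same hash. The only difference is how much bookkeeping each step of the innermost loop has to do.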
-----
BOOST_AUTO_TEST_CASE(coord_access_benchmark)
{
  unsigned char hash=0;
  scoped_timer t("GIL coord access",images().size(),"Megapixels");
  for (images_t::const_iterator it=images().begin();it!=images().end();++it)
  {
    const boost::gil::gray8c_view_t v=boost::gil::const_view(**it);
    for (int y=0;y<v.height();++y)
      for (int x=0;x<v.width();++x)
        hash^=v(x,y)[0];  // single grey channel
  }
}