
Lubomir Bourdev wrote:
I don't see the big convenience of having copy_pixels do implicit conversion.
I agree that there is no advantage at all in a direct call of copy_pixels. But I'm thinking about conversions happening in nested function calls, where the intermediate types are _deduced_ (by means of traits and little template metaprograms). Consequently, the appropriate conversions must also be deduced, and a default conversion is just the simplest form of deduction. Type deduction is central to VIGRA. For example,

    gaussianSmoothing(byte_image_src, byte_image_dest);

is actually executed as a separable convolution

    gaussianSmoothingX(byte_image_src, temp_image);
    gaussianSmoothingY(temp_image, byte_image_dest);

where the type of temp_image is automatically determined, and both calls involve an automatic conversion. (I admit that the customization options of this behavior could be improved.) With deeper nesting, customization of this behavior can become somewhat complicated, and defaults will be useful (or even required).
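A rough sketch of how such a nested call might look (TemporaryImageTraits is a hypothetical name for the deducing metafunction, not an actual VIGRA interface; the passes themselves are also only hinted at):

    // Hypothetical sketch of the situation described above: the intermediate
    // image type is deduced by a traits metaprogram (e.g. a float image for
    // byte input), and each pass performs the default conversion into the
    // destination's channel type.
    template <class SrcImage, class DestImage>
    void gaussianSmoothing(SrcImage const & src, DestImage & dest)
    {
        typedef typename TemporaryImageTraits<SrcImage>::type TempImage;
        TempImage temp(src.width(), src.height());

        gaussianSmoothingX(src, temp);    // e.g. byte -> deduced float, with conversion
        gaussianSmoothingY(temp, dest);   // deduced float -> byte, with conversion
    }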
Unfortunately, the CPU itself violates rule 3
(That seems quite a serious problem though! Can you point me at a document describing this? Which CPUs are affected?)
We learned it the hard way. AFAIK, it affects Intel CPUs and compatibles. Registers have more than 64 bits for higher accuracy, but this is not appropriately handled in the comparisons. One can switch off the extra bits, but this throws out the baby with the bath water. I'm sure someone at Adobe knows everything about this problem and its optimal solution. Please keep me informed.
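A minimal sketch of the kind of surprise this can cause (purely illustrative; whether the mismatch actually shows up depends on the compiler, its flags such as -ffloat-store, and whether x87 or SSE math is used):

    #include <iostream>

    double tenth(double x) { return x / 10.0; }    // 0.1 is not exactly representable

    int main()
    {
        volatile double stored = tenth(1.0);       // forced through a 64-bit memory slot
        // On x87 hardware the right-hand side may still live in an 80-bit
        // register, so the comparison can unexpectedly report "not equal".
        std::cout << (stored == tenth(1.0) ? "equal" : "not equal") << std::endl;
        return 0;
    }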
So in this case the range is -infinity to infinity. It is still defined. But I would argue that most of the time the range is finite.
Yes, but often it is not necessary to specify the range explicitly.
Floating point operations have higher latency and lower throughput because of fewer functional units available to process them.
When you time 32-bit integers against 32-bit floats (so that memory throughput is the same) on a modern desktop machine, the difference is small (if it exists at all). Small computers (e.g. PDAs and cell phones) are a different story, but I don't have much experience with them.
Another issue is their size and ability to fit in the cache, since they are typically four to eight times larger than a char.
Well, to do image processing with any kind of accuracy, you will need at least 16-bit integers. Then the difference from 32-bit float shouldn't be that big.
A third issue is the performance of floating point to integer conversion on many common architectures.
Indeed, these are real performance killers. That's why we tend to work in floating point throughout, when we don't need the last bit of speed. After all, the 25% speed-up of your face detector is not that impressive, given that it was probably a lot of work. We had a similar experience replacing floating point with fixed point in some application -- it was faster, but hardly enough to justify the effort and the loss in genericity.
This is why providing generic algorithms that can work natively on integral types (unsigned char, short, int) is very important for GIL. This necessitates providing a suite of atomic channel-level operations (like channel_invert, channel_multiply, channel_convert) that have performance specializations for various channel types.
What I often do is to specialize the functors. For example, a LinearRangeMappingFunctor computes a linear transformation at each pixel by default, but for uint8, it computes a look-up table in its constructor. The specialized functor can be created automatically.
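Sketched roughly (illustrative only, with a simplified interface, not the actual VIGRA code):

    // Generic case: compute the linear mapping for every pixel value.
    template <class T>
    struct LinearRangeMappingFunctor
    {
        double scale, offset;
        LinearRangeMappingFunctor(double s, double o) : scale(s), offset(o) {}
        double operator()(T v) const { return scale * v + offset; }
    };

    // Specialization for unsigned char: precompute a 256-entry look-up
    // table in the constructor, so each pixel costs only one table access.
    template <>
    struct LinearRangeMappingFunctor<unsigned char>
    {
        double lut[256];
        LinearRangeMappingFunctor(double scale, double offset)
        {
            for (int i = 0; i < 256; ++i)
                lut[i] = scale * i + offset;
        }
        double operator()(unsigned char v) const { return lut[v]; }
    };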
I am not arguing that there are contexts in which knowing the range is not important - of course there are! All I am saying is that the ranges matter at least for _some_ operations.
No doubt about that. Perhaps the notion of a range is just too general? It might be better to study the semantics of various uses of ranges and provide the appropriate specializations on this basis. For example, one specialization I was thinking about is a 'fraction' which maps an arbitrary range onto the semantic interval 0...1. For example, Fraction<unsigned char> is the type of the standard 8-bit color channel, but Fraction<unsigned char, 0, 200> or Fraction<unsigned short, 1000, 16000> would be possible as well, where the lower and upper bounds represent 0 and 1 respectively. The default bounds would be numeric_limits::min and numeric_limits::max. Fraction<float, 0, 1> would be a float restricted to the interval 0...1 (which could be mapped to a native float, depending on the out-of-bounds policy). A traits class can specify how out-of-bounds values are handled (e.g. by clamping, or by simply allowing them) and how mixed-type expressions are to be coerced. I suppose you have benchmarked the abstraction penalty of ideas similar to this -- can you send me some of the data? What other semantic interpretations of ranges are required?
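Sketched as code, just to make the idea concrete (the exact interface, the out-of-bounds policy, and the coercion traits would need more thought; the defaults rely on C++11's constexpr numeric_limits):

    #include <limits>

    // Illustrative only: the lower and upper bounds of the underlying type
    // are mapped onto the semantic interval 0...1.
    template <class T,
              long Lower = std::numeric_limits<T>::min(),
              long Upper = std::numeric_limits<T>::max()>
    struct Fraction
    {
        T value;                      // raw channel value in [Lower, Upper]

        double semantic() const       // interpretation as a number in [0, 1]
        {
            return double(value - Lower) / double(Upper - Lower);
        }
    };

    // Fraction<unsigned char>                 -- the standard 8-bit color channel
    // Fraction<unsigned char, 0, 200>         -- only 0...200 is mapped onto 0...1
    // Fraction<unsigned short, 1000, 16000>   -- e.g. the active range of a sensor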
It is not against GIL principles to have intermediate values outside the range when it makes sense, as long as you know what you are doing.
OK, that makes sense.
1. Provide a metafunction to construct a channel type from a (built-in) type and range. For example, here is how we could wrap a float into a class and associate the range [0..1] with it:
typedef channel_type<float,0,1>::type bits32f;
That's very similar to my Fraction proposal above. You would then just write channel_type<Fraction<float,0,1> >::type which also assigns a meaning to the range. And if out-of-bounds handling was 'ALLOW_OUT_OF_BOUNDS', that type could be a native float.
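One way this mapping could look (again only a sketch combining the two proposals; making the out-of-bounds policy a parameter of channel_type is my assumption, not existing GIL or VIGRA code):

    template <class T, long Lower, long Upper> struct Fraction;   // as sketched above

    enum OutOfBoundsPolicy { CLAMP, ALLOW_OUT_OF_BOUNDS };

    template <class ChannelSpec, OutOfBoundsPolicy Policy = CLAMP>
    struct channel_type
    {
        typedef ChannelSpec type;          // general case: keep the wrapper class
    };

    // When out-of-bounds values are allowed, Fraction<float, 0, 1> needs no
    // wrapper at all -- the channel can simply be a native float.
    template <>
    struct channel_type<Fraction<float, 0, 1>, ALLOW_OUT_OF_BOUNDS>
    {
        typedef float type;
    };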
C. Like A, but associate ranges with certain built-in types (like 0..1 with float)
This is essentially what GIL does currently. The advantage is that in the vast majority of cases you can use built-in types as channels (no abstraction penalty) and they will do what you want.
Well, I prefer clamping over modulo arithmetic as a default, and clamping is not quite built-in for the integral types.
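For instance, a saturating channel conversion (a trivial sketch, just to show what clamping as a default means, as opposed to the wrap-around that raw unsigned arithmetic gives):

    // Illustrative: converting a wider intermediate result back to an
    // 8-bit channel by clamping instead of letting it wrap around.
    inline unsigned char clamp_to_uint8(int v)
    {
        if (v < 0)   return 0;
        if (v > 255) return 255;
        return static_cast<unsigned char>(v);
    }

    // raw unsigned arithmetic: (unsigned char)(200 + 100) == 44
    // clamping:                clamp_to_uint8(200 + 100) == 255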
In my opinion tiled images are a different story, they cannot be just abstracted out and hidden under the rug the way planar/interleaved images can.
I'm not so pessimistic. I have some ideas about how algorithms could be easily prepared for handling tiled storage formats.
We would be very interested in hearing more about this. But I must be misunderstanding you, because I can't imagine how this could possibly be. How could you have a scheme for taking an inherently global algorithm (like flood-fill) and making it tile-friendly?
This is certainly a difficult one, but I guess there exists some parallel version written in the Golden Age of Parallel Image Processing (which ended because the serial computers improved faster than people were able to write parallel algorithms). But for a general solution, I was thinking mainly about the simpler functions, like pixel transformations, filters, morphology, local edge detectors, perhaps geometric transformations and warping.

Ulli

--
Ullrich Koethe                 Universitaet Hamburg / University of Hamburg
                               FB Informatik / Dept. of Informatics
                               AB Kognitive Systeme / Cognitive Systems Group

Phone: +49 (0)40 42883-2573    Vogt-Koelln-Str. 30
Fax:   +49 (0)40 42883-2572    D - 22527 Hamburg
Email: u.koethe@computer.org   Germany
       koethe@informatik.uni-hamburg.de
WWW:   http://kogs-www.informatik.uni-hamburg.de/~koethe/