
Lubomir Bourdev wrote:
I don't see the big convenience of having copy_pixels do implicit conversion.
I agree that there is no advantage at all in a direct call of copy_pixels. But I'm thinking about conversions happening in nested function calls, where the intermediate types are _deduced_ (by means of traits and little template metaprograms). Consequently, the appropriate conversions must also be deduced, and a default conversion is just the simplest form of deduction. Type deduction is central to VIGRA. For example,

    gaussianSmoothing(byte_image_src, byte_image_dest);

is actually executed as a separable convolution

    gaussianSmoothingX(byte_image_src, temp_image);
    gaussianSmoothingY(temp_image, byte_image_dest);

where the type of temp_image is automatically determined, and both calls involve an automatic conversion. (I admit that the customization options of this behavior could be improved.) With deeper nesting, customization of this behavior can become somewhat complicated, and defaults will be useful (or even required).
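A rough sketch of how such a nested call might look (TemporaryImageTraits is a hypothetical name for the deducing metafunction, not an actual VIGRA interface; the passes themselves are also only hinted at):

    // Hypothetical sketch of the situation described above: the intermediate
    // image type is deduced by a traits metaprogram (e.g. a float image for
    // byte input), and each pass performs the default conversion into the
    // destination's channel type.
    template <class SrcImage, class DestImage>
    void gaussianSmoothing(SrcImage const & src, DestImage & dest)
    {
        typedef typename TemporaryImageTraits<SrcImage>::type TempImage;
        TempImage temp(src.width(), src.height());

        gaussianSmoothingX(src, temp);    // e.g. byte -> deduced float, with conversion
        gaussianSmoothingY(temp, dest);   // deduced float -> byte, with conversion
    }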
Unfortunately, the CPU itself violates rule 3
(That seems quite a serious problem though! Can you point me at a document describing this? Which CPUs are affected?)
We learned it the hard way. AFAIK, it affects Intel CPUs and compatibles. Registers have more than 64 bits for higher accuracy, but this is not appropriately handled in the comparisons. One can switch off the extra bits, but this throws out the baby with the bath water. I'm sure someone at Adobe knows everything about this problem and its optimal solution. Please keep me informed.
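A minimal sketch of the kind of surprise this can cause (purely illustrative; whether the mismatch actually shows up depends on the compiler, its flags such as -ffloat-store, and whether x87 or SSE math is used):

    #include <iostream>

    double tenth(double x) { return x / 10.0; }    // 0.1 is not exactly representable

    int main()
    {
        volatile double stored = tenth(1.0);       // forced through a 64-bit memory slot
        // On x87 hardware the right-hand side may still live in an 80-bit
        // register, so the comparison can unexpectedly report "not equal".
        std::cout << (stored == tenth(1.0) ? "equal" : "not equal") << std::endl;
        return 0;
    }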
So in this case the range is -infinity to infinity. It is still defined. But I would argue that most of the time the range is finite.
Yes, but often it is not necessary to specify the range explicitly.
Floating point operations have higher latency and lower throughput because of fewer functional units available to process them.
When you time 32-bit integers against 32-bit floats (so that memory throughput is the same) on a modern desktop machine, the difference is small (if it exists at all). Small computers (e.g. PDAs and cell phones) are a different story, but I don't have much experience with them.
Another issue is their size and ability to fit in the cache, since they are typically four to eight times larger than a char.
Well, to do image processing with any kind of accuracy, you will need at least 16-bit integers. Then the difference from 32-bit float shouldn't be that big.
A third issue is the performance of floating point to integer conversion on many common architectures.
Indeed, these are real performance killers. That's why we tend to work in floating point throughout, when we don't need the last bit of speed. After all, the 25% speed-up of your face detector is not that impressive, given that it was probably a lot of work. We had a similar experience replacing floating point with fixed point in some application -- it was faster, but hardly enough to justify the effort and the loss in genericity.
This is why providing generic algorithms that can work natively on integral types (unsigned char, short, int) is very important for GIL. This necessitates providing a suite of atomic channel-level operations (like channel_invert, channel_multiply, channel_convert) that have performance specializations for various channel types.
What I often do is to specialize the functors. For example, a LinearRangeMappingFunctor computes a linear transformation at each pixel by default, but for uint8, it computes a look-up table in its constructor. The specialized functor can be created automatically.
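Sketched roughly (illustrative only, with a simplified interface, not the actual VIGRA code):

    // Generic case: compute the linear mapping for every pixel value.
    template <class T>
    struct LinearRangeMappingFunctor
    {
        double scale, offset;
        LinearRangeMappingFunctor(double s, double o) : scale(s), offset(o) {}
        double operator()(T v) const { return scale * v + offset; }
    };

    // Specialization for unsigned char: precompute a 256-entry look-up
    // table in the constructor, so each pixel costs only one table access.
    template <>
    struct LinearRangeMappingFunctor<unsigned char>
    {
        double lut[256];
        LinearRangeMappingFunctor(double scale, double offset)
        {
            for (int i = 0; i < 256; ++i)
                lut[i] = scale * i + offset;
        }
        double operator()(unsigned char v) const { return lut[v]; }
    };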
I am not arguing that there are contexts in which knowing the range is not important - of course there are! All I am saying is that the ranges matter at least for _some_ operations.
No doubt about that. Perhaps the notion of a range is just too general? It might be better to study the semantics of various uses of ranges and provide the appropriate specializations on this basis. For example, one specialization I was thinking about is a 'fraction' which maps an arbitrary range onto the semantic interval 0...1. For example, Fraction<unsigned char> is the type of the standard 8-bit color channel, but Fraction<unsigned char, 0, 200> or Fraction<unsigned short, 1000, 16000> would be possible as well, where the lower and upper bounds represent 0 and 1 respectively. The default bounds would be numeric_limits::min and numeric_limits::max. Fraction<float, 0, 1> would be a float restricted to the interval 0...1 (which could be mapped to a native float, depending on the out-of-bounds policy). A traits class can specify how out-of-bounds values are handled (e.g. by clamping, or by simply allowing them) and how mixed-type expressions are to be coerced. I suppose you have benchmarked the abstraction penalty of ideas similar to this -- can you send me some of the data? What other semantic interpretations of ranges are required?
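Sketched as code, just to make the idea concrete (the exact interface, the out-of-bounds policy, and the coercion traits would need more thought; the defaults rely on C++11's constexpr numeric_limits):

    #include <limits>

    // Illustrative only: the lower and upper bounds of the underlying type
    // are mapped onto the semantic interval 0...1.
    template <class T,
              long Lower = std::numeric_limits<T>::min(),
              long Upper = std::numeric_limits<T>::max()>
    struct Fraction
    {
        T value;                      // raw channel value in [Lower, Upper]

        double semantic() const       // interpretation as a number in [0, 1]
        {
            return double(value - Lower) / double(Upper - Lower);
        }
    };

    // Fraction<unsigned char>                 -- the standard 8-bit color channel
    // Fraction<unsigned char, 0, 200>         -- only 0...200 is mapped onto 0...1
    // Fraction<unsigned short, 1000, 16000>   -- e.g. the active range of a sensor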
It is not against GIL principles to have intermediate values outside the range when it makes sense, as long as you know what you are doing.
OK, that makes sense.
1. Provide a metafunction to construct a channel type from a (built-in) type and range. For example, here is how we could wrap a float into a class and associate the range [0..1] with it:
typedef channel_type<float,0,1>::type bits32f;
That's very similar to my Fraction proposal above. You would then just write channel_type<Fraction<float,0,1> >::type which also assigns a meaning to the range. And if out-of-bounds handling was 'ALLOW_OUT_OF_BOUNDS', that type could be a native float.
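One way this mapping could look (again only a sketch combining the two proposals; making the out-of-bounds policy a parameter of channel_type is my assumption, not existing GIL or VIGRA code):

    template <class T, long Lower, long Upper> struct Fraction;   // as sketched above

    enum OutOfBoundsPolicy { CLAMP, ALLOW_OUT_OF_BOUNDS };

    template <class ChannelSpec, OutOfBoundsPolicy Policy = CLAMP>
    struct channel_type
    {
        typedef ChannelSpec type;          // general case: keep the wrapper class
    };

    // When out-of-bounds values are allowed, Fraction<float, 0, 1> needs no
    // wrapper at all -- the channel can simply be a native float.
    template <>
    struct channel_type<Fraction<float, 0, 1>, ALLOW_OUT_OF_BOUNDS>
    {
        typedef float type;
    };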
C. Like A, but associate ranges with certain built-in types (like 0..1 with float)
This is essentially what GIL does currently. The advantage is that in the vast majority of cases you can use built-in types as channels (no abstraction penalty) and they will do what you want.
Well, I prefer clamping over modulo arithmetic as a default, and clamping is not quite built-in for the integral types.
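For instance, a saturating channel conversion (a trivial sketch, just to show what clamping as a default means, as opposed to the wrap-around that raw unsigned arithmetic gives):

    // Illustrative: converting a wider intermediate result back to an
    // 8-bit channel by clamping instead of letting it wrap around.
    inline unsigned char clamp_to_uint8(int v)
    {
        if (v < 0)   return 0;
        if (v > 255) return 255;
        return static_cast<unsigned char>(v);
    }

    // raw unsigned arithmetic: (unsigned char)(200 + 100) == 44
    // clamping:                clamp_to_uint8(200 + 100) == 255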
In my opinion tiled images are a different story, they cannot be just abstracted out and hidden under the rug the way planar/interleaved images can.
I'm not so pessimistic. I have some ideas about how algorithms could be easily prepared for handling tiled storage formats.
We would be very interested in hearing more about this. But I must be misunderstanding you, because I can't imagine how this could possibly be. How could you have a scheme for taking an inherently global algorithm (like flood-fill) and making it tile-friendly?
This is certainly a difficult one, but I guess there exists some parallel version written in the Golden Age of Parallel Image Processing (which ended because the serial computers improved faster than people were able to write parallel algorithms). But for a general solution, I was thinking mainly about the simpler functions, like pixel transformations, filters, morphology, local edge detectors, perhaps geometric transformations and warping.

Ulli

--
Ullrich Koethe                 Universitaet Hamburg / University of Hamburg
                               FB Informatik / Dept. of Informatics
                               AB Kognitive Systeme / Cognitive Systems Group

Phone: +49 (0)40 42883-2573    Vogt-Koelln-Str. 30
Fax:   +49 (0)40 42883-2572    D - 22527 Hamburg
Email: u.koethe@computer.org   Germany
       koethe@informatik.uni-hamburg.de
WWW:   http://kogs-www.informatik.uni-hamburg.de/~koethe/