
Greg Reese, Ph.D.
Senior Research Computing Specialist
Research Computing Support Group
Gaskill Hall, Room 352
Miami University

Dr. Reese wrote the following review of the GIL library (I've pasted it without modification below):

* * *

Review of Generic Image Library (GIL)

Version
Generic Image Library Design Guide, Version 1.01, October 2, 2006
Generic Image Library Tutorial, Version 1.01, October 2, 2006
Code downloaded October 11, 2006

SHORT REVIEW

What is your evaluation of the design?
Apart from my major concern (see below), the design seems to meet its goals of generality without loss of speed.

What is your evaluation of the implementation?
Good implementation, especially having everything done in headers, with no library necessary. Too much reliance on proxies may lessen use of STL, though.

What is your evaluation of the documentation?
Excellent documentation. When released, I suggest adding a lot of examples and checking spelling in the docs.

What is your evaluation of the potential usefulness of the library?
Unfortunately, I think it's going to be tough to get people to use it. One obvious drawback is the lack of algorithms. This will change with time, though. I think a bigger drawback is going to be the highly complex appearance of the code. Applications programmers are going to look at it and faint, even when told that although the internal construction of the code is complicated, its use is not. The authors have put in typedefs for some common image and pixel types, and I encourage them to add even more typedefs and hide as much of the complexity as possible. I think the library will be most useful in projects that use images of many different types of pixels. In that case the tradeoff between the generality of the library and its complexity is beneficial. In other cases there may have to be a real marketing push. The authors have made a good start in their video tutorial by answering the top ten excuses for not using GIL.

Did you try to use the library? With what compiler? Did you have any problems?
Tried to use it with Borland C++ Builder 2006 but couldn't because of compilation problems with Boost. Used it with no problems in Microsoft Visual C++ 2005 Express Edition.

How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
A good amount of effort. I read the tutorial and design guides pretty carefully, watched the online video guide, and ran the two sample programs.

Are you knowledgeable about the problem domain?
Yes.

LONG REVIEW

First of all, I'd like to thank Messrs. Bourdev and Jin for writing GIL. They have obviously put a lot of effort and thought into the library. My thanks to them and Adobe also for making it open source. I hope the authors find the comments below constructive. Because of the complexity of the software and the limited time to review, some of the problems I point out may in fact be problems in my understanding and not in the library itself. If that's the case, great! It makes those non-existent problems much easier to solve.

Major Concern

My major concern is the lack of requirements/concepts on arithmetic between Pixels, i.e., pixel classes/structures. For example, the Guide contains numerous code snippets for the computation of a gradient. These take the form of c = (a - b) / 2, where a and b are uint8s and c is an int8. Unless the subtraction operator is defined in some special way not mentioned in the Guide, all of these examples are wrong. The difference of a and b can easily be a negative number, but it will be interpreted as a uint8, giving a weird result. Since the most common pixels are uint8 grays and RGB colors made of triplets of uint8s, this is a serious concern.

When performing arithmetic with uint8s in image processing, the results need to be saturated, i.e., values above 255 need to be set to 255 and values below 0 need to be set to 0. One way to do this is to define the binary arithmetic operators so that they convert to int16, do the arithmetic between the two operands, saturate the values if necessary, and convert back to uint8. If this technique is used, the requirement for this behavior should be placed on Pixel. Note too that the operator actions depend on the data type. In many applications, it's okay to assume that there won't be any over/underflow when doing arithmetic on int16s, because the number of terms and the pixels' values are usually small. The same holds true for not converting floats to doubles. However, it wouldn't be unusual for a program to want to convert int16s to int32s, so that needs to be an option.

Unfortunately, simply defining binary arithmetic between Pixels is not sufficient. For example, suppose that we've defined binary addition between uint8s as above, so that saturation is handled correctly. Now consider finding the average of two uint8s a and b, each equal to 255. The answer is obviously 255. However, if we use the formula average = (a + b) / 2, a and b will sum to 255 because of saturation, and 255/2 yields a value for the average of 127.
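To make these failure modes concrete, here is a minimal, self-contained sketch of the behavior described above. It does not use GIL at all, and the helper names (saturate_add, saturate_sub) and the choice of int as the wide type are mine, purely for illustration:

    #include <algorithm>
    #include <cstdio>

    typedef unsigned char uint8;

    // Widen to int, clamp to [0, 255], narrow back -- one possible way to
    // define saturating arithmetic on uint8 channels, as suggested above.
    uint8 saturate_add(uint8 a, uint8 b)
    {
        int r = int(a) + int(b);
        return uint8(std::min(std::max(r, 0), 255));
    }

    uint8 saturate_sub(uint8 a, uint8 b)
    {
        int r = int(a) - int(b);
        return uint8(std::min(std::max(r, 0), 255));
    }

    int main()
    {
        uint8 a = 10, b = 200;

        // Naive gradient step (a - b) / 2: the true value is -95, but when
        // the result is stored back into an unsigned 8-bit channel it wraps
        // around and becomes a meaningless 161.
        uint8 naive = uint8((a - b) / 2);
        std::printf("naive (a - b) / 2       = %d\n", naive);

        // Saturating subtraction at least keeps the result in range (0 here)...
        std::printf("saturate_sub(a, b) / 2  = %d\n", saturate_sub(a, b) / 2);

        // ...but saturation alone is not sufficient.  Averaging two pixels
        // that are both 255 should give 255:
        uint8 x = 255, y = 255;
        std::printf("saturated average       = %d\n", saturate_add(x, y) / 2); // 127
        std::printf("widened average         = %d\n", (int(x) + int(y)) / 2);  // 255
        return 0;
    }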
This problem with pixel arithmetic doesn't affect GIL per se, because GIL only represents images, not actions on them. However, having a completely generic image representation that doesn't accommodate pixel arithmetic makes the representation useless for image processing. Moreover, now that the authors have finished the bulk of the work on the image software and are starting on the algorithms to process images, the majority of those algorithms will involve pixel arithmetic, so a general concept of arithmetic between Pixels should now be a concern.

Minor concerns

* The I/O library really should have facilities to read image headers.

Suggestions/questions

* The channels provide their min and max values, but it would be nice if they could also provide the min and max values that the data they hold can attain, i.e., how many bits of data there are, not of data storage. For example, MRIs usually have a 12-bit data range but are stored in 16-bit pixels.

* Can GIL accommodate images with a "virtual range", i.e., in which the x-y range the user accesses is different from the x-y range of the actual data? For example, the Fourier transform of an n x n image is an image of n x n (complex) numbers. However, half the transformed numbers are redundant and are not stored. Thus it would be useful to let the user access the transformed image as if it were n x n but let the image representation retrieve the appropriate value if the user requests one that is not in the stored range. Perhaps this can be done through some sort of pixel iterator adaptor (a sketch of the idea appears after this list). How much work would be entailed for all of the library to work on images like these with a virtual range?

* How ubiquitous are proxies in GIL? They can't be used in STL algorithms, which is a big drawback.

* The use of piped views for processing makes me nervous. For example, the Tutorial states that one way to compute the y-gradient is to rotate an image 90°, compute the x-gradient, and rotate it back. First of all, this only works for kernels that have that rotational symmetry, e.g., square or circular kernels. It doesn't work with rectangular kernels. Secondly, even if the kernels have the correct symmetry, the process won't yield usable results if the spatial resolution in the x- and y-directions is different, as happens often with scanners. No computations done by the library should be performed in this indirect manner, or at a minimum, the documentation should clearly warn about these potential problems.
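As a rough sketch of what I have in mind for the Fourier-transform case: assume a layout in which only n rows by (n/2 + 1) columns of the transform of a real n x n image are physically stored (the layout used by FFTW-style real-to-complex transforms). An adaptor exposing the full n x n virtual range might look something like the following. It is written against plain std::vector storage rather than GIL's own view machinery, and every name in it is illustrative:

    #include <complex>
    #include <cstddef>
    #include <vector>

    // Stored: the non-redundant half of the 2-D DFT of a real n x n image,
    // laid out as n rows by (n/2 + 1) columns.  Exposed: a full n x n
    // "virtual" range, with the missing half synthesized from conjugate
    // symmetry:  F(u, v) = conj(F((n - u) % n, (n - v) % n)).
    class half_spectrum_view
    {
    public:
        half_spectrum_view(std::size_t n, std::vector<std::complex<float> > data)
            : n_(n), cols_(n / 2 + 1), data_(data) {}

        // Read-only access over the full n x n virtual range.
        std::complex<float> operator()(std::size_t u, std::size_t v) const
        {
            if (v < cols_)                       // value is physically stored
                return data_[u * cols_ + v];
            // otherwise return the conjugate of the mirrored, stored element
            std::size_t mu = (n_ - u) % n_;
            std::size_t mv = (n_ - v) % n_;
            return std::conj(data_[mu * cols_ + mv]);
        }

        std::size_t width()  const { return n_; }
        std::size_t height() const { return n_; }

    private:
        std::size_t n_, cols_;
        std::vector<std::complex<float> > data_;
    };

The question is essentially whether something with this behavior could be wrapped up as a GIL view or iterator adaptor, so that the rest of the library could operate on it transparently.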
Finally, one possible extension to the design that may be helpful is to incorporate the idea of an image region. This is a generalization of the subimage in GIL but is not limited to a rectangle. Image regions are useful because they help carry out a typical chain of image processing, which goes something like this:

1. Image enhancement - after the image is acquired, it is processed to bring out information of interest, e.g., made brighter, made sharper, converted from gray to color or vice versa. This kind of processing is done by the algorithms that the GIL people are starting to write.

2. Image segmentation - the image is divided into a moderately small number of regions, with all pixels in a region representing one physical quantity, e.g., diseased plants, a particular metal, cancerous cells, etc.

3. Image representation and description - possible connection of regions into objects, e.g., all cancerous cells in these three segments are part of the same cancer; descriptions of groups of pixels by other measurements, e.g., boundary length, circularity, etc.

4. Image interpretation - deciding what physical objects the region representations correspond to.

For example, consider an industrial CT (computed tomography) scan of an engine. It's easy to distinguish the air from the engine metal, so the image may be enhanced by using fewer pixel values to represent air than metal. The picture could then be segmented into regions representing air and different kinds of metals or other engine materials. These segments could then possibly be connected into objects, perhaps depending on the distance separating the segments, their constituent materials, and the intervening material (air or metal). Their properties can also be computed, e.g., size, shape, boundary brightness. Finally, some sort of artificial intelligence software could decide whether a region is a carburetor, valve, etc.

In all cases, it's often necessary to refer back to the pixels in the original image that are denoted by a particular segment or region. In the above example, it would be reasonable for the CT image to have 8 or 10 bits per pixel (bpp), a segmented image to have 1 bpp (a crude segmentation into air and metal) or 4 bpp (air and 15 kinds of metal or engine material), and a representation image to have 3 or 4 bpp.

Anyway, some characteristics of a GIL region could be:

o The user should be able to define any region (connected or not, any shape) in the original image.

o The pixel type can be different from that of the original image.

o The image regions should be "stackable", i.e., more than one at a time can refer to the original image, like channels in a Pixel.

o The region should have iterators that travel only over that region but that can access both the pixel values of the region and the pixel values of the original image that correspond to the region.

There are various ways to accomplish this region idea. One way may be by appropriately defining the current subimage structure (a view, I believe) and iteration over it, which is why I brought this topic up.
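As a very rough sketch of the kind of thing I mean (independent of GIL's view machinery, and with every name chosen purely for illustration), a region could be little more than a list of locations in the original image plus a value of the region's own pixel type for each location:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // One conceivable shape for an image "region": a set of (x, y) locations
    // referring back into an original image, carrying its own per-location
    // values of a possibly different pixel type.
    template <typename RegionPixel, typename ImagePixel>
    class image_region
    {
    public:
        image_region(std::vector<ImagePixel>* image, std::size_t image_width)
            : image_(image), width_(image_width) {}

        // Add a location of the original image to the region.
        void add(std::size_t x, std::size_t y, RegionPixel value)
        {
            locations_.push_back(std::make_pair(x, y));
            values_.push_back(value);
        }

        std::size_t size() const { return locations_.size(); }

        // The region's own value at the i-th location (e.g., a 1- or 4-bpp label).
        RegionPixel& label(std::size_t i) { return values_[i]; }

        // The corresponding pixel of the original image -- this is the
        // "refer back to the original pixels" operation described above.
        ImagePixel& original(std::size_t i)
        {
            return (*image_)[locations_[i].second * width_ + locations_[i].first];
        }

    private:
        std::vector<ImagePixel>* image_;   // non-owning: the original image
        std::size_t width_;
        std::vector<std::pair<std::size_t, std::size_t> > locations_;
        std::vector<RegionPixel> values_;  // region's own pixel type
    };

Several such regions can point at the same original image, which gives the "stackable" behavior, and walking over the stored locations gives access both to the region's own values and to the corresponding original pixels.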
Once again, thanks to the authors for their hard work. If there are questions, I can be contacted at: reesegj at muohio.edu

Greg Reese