
Greg Reese, Ph.D.
Senior Research Computing Specialist
Research Computing Support Group
Gaskill Hall, Room 352
Miami University

Dr. Reese wrote the following review of the GIL library (I've pasted it without modification below):

* * *

Review of Generic Image Library (GIL)

Version
Generic Image Library Design Guide, Version 1.01, October 2, 2006
Generic Image Library Tutorial, Version 1.01, October 2, 2006
Code downloaded October 11, 2006

SHORT REVIEW

What is your evaluation of the design?
Apart from my major concern (see below), the design seems to meet its goals of generality without loss of speed.

What is your evaluation of the implementation?
Good implementation, especially having everything done in headers, with no library necessary. Too much reliance on proxies may lessen use of STL, though.

What is your evaluation of the documentation?
Excellent documentation. When released, I suggest adding a lot of examples and checking spelling in the docs.

What is your evaluation of the potential usefulness of the library?
Unfortunately, I think it's going to be tough to get people to use it. One obvious drawback is the lack of algorithms. This will change with time, though. I think a bigger drawback is going to be the highly complex appearance of the code. Applications programmers are going to look at it and faint, even when told that although the internal construction of the code is complicated, its use is not. The authors have put in typedefs for some common image and pixel types, and I encourage them to add even more typedefs and hide as much of the complexity as possible. I think the library will be most useful in projects that use images of many different types of pixels. In that case the tradeoff between the generality of the library and its complexity is beneficial. In other cases there may have to be a real marketing push. The authors have made a good start in their video tutorial by answering the top ten excuses for not using GIL.

Did you try to use the library? With what compiler? Did you have any problems?
Tried to use it with Borland C++ Builder 2006 but couldn't because of compilation problems with Boost. Used it with no problems in Microsoft Visual C++ 2005 Express Edition.

How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
A good amount of effort. I read the tutorial and design guides pretty carefully, watched the online video guide, and ran the two sample programs.

Are you knowledgeable about the problem domain?
Yes.

LONG REVIEW

First of all, I'd like to thank Messrs. Bourdev and Jin for writing GIL. They have obviously put a lot of effort and thought into the library. My thanks to them and Adobe also for making it open source. I hope the authors find the comments below constructive. Because of the complexity of the software and the limited time to review, some of the problems I point out may in fact be problems in my understanding and not in the library itself. If that's the case, great! It makes those non-existent problems much easier to solve.

Major Concern

My major concern is the lack of requirements/concepts on arithmetic between Pixels, i.e., pixel classes/structures. For example, the Guide contains numerous code snippets for the computation of a gradient. These take the form of c = (a - b) / 2, where a and b are uint8s and c is an int8. Unless the subtraction operator is defined in some special way not mentioned in the Guide, all of these examples are wrong. The difference of a and b can easily be a negative number, but it will be interpreted as a uint8, giving a weird result. Since the most common pixels are uint8 grays and RGB colors made of triplets of uint8s, this is a serious concern.

When performing arithmetic with uint8s in image processing, the results need to be saturated, i.e., values above 255 need to be set to 255 and values below 0 need to be set to 0. One way to do this is to define the binary arithmetic operators so that they convert to int16, do the arithmetic between the two operands, saturate the values if necessary, and convert back to uint8. If this technique is used, the requirement for this behavior should be placed on Pixel. Note too that the operator actions depend on the data type. In many applications, it's okay to assume that there won't be any over/underflow when doing arithmetic on int16s, because the number of terms and the pixels' values are usually small. The same holds true for not converting floats to doubles. However, it wouldn't be unusual for a program to want to convert int16s to int32s, so that needs to be an option.

Unfortunately, simply defining binary arithmetic between Pixels is not sufficient. For example, suppose that we've defined binary addition between uint8s as above, so that saturation is handled correctly. Now consider finding the average of two uint8s a and b, each equal to 255. The answer is obviously 255. However, if we use the formula average = (a + b) / 2, a and b will sum to 255 because of saturation, and 255/2 yields a value for the average of 127.
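To make these failure modes concrete, here is a minimal, self-contained sketch of the behavior described above. It does not use GIL at all, and the helper names (saturate_add, saturate_sub) and the choice of int as the wide type are mine, purely for illustration:

    #include <algorithm>
    #include <cstdio>

    typedef unsigned char uint8;

    // Widen to int, clamp to [0, 255], narrow back -- one possible way to
    // define saturating arithmetic on uint8 channels, as suggested above.
    uint8 saturate_add(uint8 a, uint8 b)
    {
        int r = int(a) + int(b);
        return uint8(std::min(std::max(r, 0), 255));
    }

    uint8 saturate_sub(uint8 a, uint8 b)
    {
        int r = int(a) - int(b);
        return uint8(std::min(std::max(r, 0), 255));
    }

    int main()
    {
        uint8 a = 10, b = 200;

        // Naive gradient step (a - b) / 2: the true value is -95, but when
        // the result is stored back into an unsigned 8-bit channel it wraps
        // around and becomes a meaningless 161.
        uint8 naive = uint8((a - b) / 2);
        std::printf("naive (a - b) / 2       = %d\n", naive);

        // Saturating subtraction at least keeps the result in range (0 here)...
        std::printf("saturate_sub(a, b) / 2  = %d\n", saturate_sub(a, b) / 2);

        // ...but saturation alone is not sufficient.  Averaging two pixels
        // that are both 255 should give 255:
        uint8 x = 255, y = 255;
        std::printf("saturated average       = %d\n", saturate_add(x, y) / 2); // 127
        std::printf("widened average         = %d\n", (int(x) + int(y)) / 2);  // 255
        return 0;
    }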
This problem with pixel arithmetic doesn't affect GIL per se, because GIL only represents images, not actions on them. However, having a completely generic image representation that doesn't accommodate pixel arithmetic makes the representation useless for image processing. Moreover, now that the authors have finished the bulk of the work on the image software and are starting on the algorithms to process images, the majority of those algorithms will involve pixel arithmetic, so a general concept of arithmetic between Pixels should now be a concern.

Minor concerns

* The I/O library really should have facilities to read image headers.

Suggestions/questions

* The channels provide their min and max values, but it would be nice if they could also provide the min and max values that the data they hold can attain, i.e., how many bits of data there are, not of data storage. For example, MRIs usually have a 12-bit data range but are stored in 16-bit pixels.

* Can GIL accommodate images with a "virtual range", i.e., in which the x-y range the user accesses is different from the x-y range of the actual data? For example, the Fourier transform of an n x n image is an image of n x n (complex) numbers. However, half the transformed numbers are redundant and are not stored. Thus it would be useful to let the user access the transformed image as if it were n x n but let the image representation retrieve the appropriate value if the user requests one that is not in the stored range. Perhaps this can be done through some sort of pixel iterator adaptor (a sketch of the idea appears after this list). How much work would be entailed for all of the library to work on images like these with a virtual range?

* How ubiquitous are proxies in GIL? They can't be used in STL algorithms, which is a big drawback.

* The use of piped views for processing makes me nervous. For example, the Tutorial states that one way to compute the y-gradient is to rotate an image 90°, compute the x-gradient, and rotate it back. First of all, this only works for kernels that have that rotational symmetry, e.g., square or circular kernels. It doesn't work with rectangular kernels. Secondly, even if the kernels have the correct symmetry, the process won't yield usable results if the spatial resolution in the x- and y-directions is different, as happens often with scanners. No computations done by the library should be performed in this indirect manner, or at a minimum, the documentation should clearly warn about these potential problems.
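As a rough sketch of what I have in mind for the Fourier-transform case: assume a layout in which only n rows by (n/2 + 1) columns of the transform of a real n x n image are physically stored (the layout used by FFTW-style real-to-complex transforms). An adaptor exposing the full n x n virtual range might look something like the following. It is written against plain std::vector storage rather than GIL's own view machinery, and every name in it is illustrative:

    #include <complex>
    #include <cstddef>
    #include <vector>

    // Stored: the non-redundant half of the 2-D DFT of a real n x n image,
    // laid out as n rows by (n/2 + 1) columns.  Exposed: a full n x n
    // "virtual" range, with the missing half synthesized from conjugate
    // symmetry:  F(u, v) = conj(F((n - u) % n, (n - v) % n)).
    class half_spectrum_view
    {
    public:
        half_spectrum_view(std::size_t n, std::vector<std::complex<float> > data)
            : n_(n), cols_(n / 2 + 1), data_(data) {}

        // Read-only access over the full n x n virtual range.
        std::complex<float> operator()(std::size_t u, std::size_t v) const
        {
            if (v < cols_)                       // value is physically stored
                return data_[u * cols_ + v];
            // otherwise return the conjugate of the mirrored, stored element
            std::size_t mu = (n_ - u) % n_;
            std::size_t mv = (n_ - v) % n_;
            return std::conj(data_[mu * cols_ + mv]);
        }

        std::size_t width()  const { return n_; }
        std::size_t height() const { return n_; }

    private:
        std::size_t n_, cols_;
        std::vector<std::complex<float> > data_;
    };

The question is essentially whether something with this behavior could be wrapped up as a GIL view or iterator adaptor, so that the rest of the library could operate on it transparently.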
Finally, one possible extension to the design that may be helpful is to incorporate the idea of an image region. This is a generalization of the subimage in GIL but is not limited to a rectangle. Image regions are useful because they help carry out a typical chain of image processing, which goes something like this:

1. Image enhancement - after the image is acquired, it is processed to bring out information of interest, e.g., made brighter, made sharper, converted from gray to color or vice versa. This kind of processing is done by the algorithms that the GIL people are starting to write.

2. Image segmentation - the image is divided into a moderately small number of regions, with all pixels in a region representing one physical quantity, e.g., diseased plants, a particular metal, cancerous cells, etc.

3. Image representation and description - possible connection of regions into objects, e.g., all cancerous cells in these three segments are part of the same cancer; descriptions of groups of pixels by other measurements, e.g., boundary length, circularity, etc.

4. Image interpretation - deciding what physical objects the region representations correspond to.

For example, consider an industrial CT (computed tomography) scan of an engine. It's easy to distinguish the air from the engine metal, so the image may be enhanced by using fewer pixel values to represent air than metal. The picture could then be segmented into regions representing air and different kinds of metals or other engine materials. These segments could then possibly be connected into objects, perhaps depending on the distance separating the segments, their constituent materials, and the intervening material (air or metal). Their properties can also be computed, e.g., size, shape, boundary brightness. Finally, some sort of artificial intelligence software could decide whether a region is a carburetor, valve, etc.

In all cases, it's often necessary to refer back to the pixels in the original image that are denoted by a particular segment or region. In the above example, it would be reasonable for the CT image to have 8 or 10 bits per pixel (bpp), a segmented image to have 1 bpp (a crude segmentation into air and metal) or 4 bpp (air and 15 kinds of metal or engine material), and a representation image to have 3 or 4 bpp.

Anyway, some characteristics of a GIL region could be:

o The user should be able to define any region (connected or not, any shape) in the original image.

o The pixel type can be different from that of the original image.

o The image regions should be "stackable", i.e., more than one at a time can refer to the original image, like channels in a Pixel.

o The region should have iterators that travel only over that region but that can access both the pixel values of the region and the pixel values of the original image that correspond to the region.

There are various ways to accomplish this region idea. One way may be by appropriately defining the current subimage structure (a view, I believe) and iteration over it, which is why I brought this topic up.
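As a very rough sketch of the kind of thing I mean (independent of GIL's view machinery, and with every name chosen purely for illustration), a region could be little more than a list of locations in the original image plus a value of the region's own pixel type for each location:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // One conceivable shape for an image "region": a set of (x, y) locations
    // referring back into an original image, carrying its own per-location
    // values of a possibly different pixel type.
    template <typename RegionPixel, typename ImagePixel>
    class image_region
    {
    public:
        image_region(std::vector<ImagePixel>* image, std::size_t image_width)
            : image_(image), width_(image_width) {}

        // Add a location of the original image to the region.
        void add(std::size_t x, std::size_t y, RegionPixel value)
        {
            locations_.push_back(std::make_pair(x, y));
            values_.push_back(value);
        }

        std::size_t size() const { return locations_.size(); }

        // The region's own value at the i-th location (e.g., a 1- or 4-bpp label).
        RegionPixel& label(std::size_t i) { return values_[i]; }

        // The corresponding pixel of the original image -- this is the
        // "refer back to the original pixels" operation described above.
        ImagePixel& original(std::size_t i)
        {
            return (*image_)[locations_[i].second * width_ + locations_[i].first];
        }

    private:
        std::vector<ImagePixel>* image_;   // non-owning: the original image
        std::size_t width_;
        std::vector<std::pair<std::size_t, std::size_t> > locations_;
        std::vector<RegionPixel> values_;  // region's own pixel type
    };

Several such regions can point at the same original image, which gives the "stackable" behavior, and walking over the stored locations gives access both to the region's own values and to the corresponding original pixels.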
Once again, thanks to the authors for their hard work. If there are questions, I can be contacted at: reesegj at muohio.edu

Greg Reese