
Apologies for the late review. Aside from the voting, I hope it will still be of some use. I appreciate the efforts of the authors, as well as the opportunity to provide feedback.

What is your evaluation of the design?
--------------------------------------

Elegant, but it takes a slightly simplistic view of images, in some regards.

I'm concerned that GIL's color spaces combine too many distinct concepts: memory layout, channel ordering, channel type & range, and a more pure definition of a color space (e.g. basis vectors, primaries, transfer function, etc.). In practice, this may require some users to define a large number of GIL color spaces. For example, a program or library that handles encoding/decoding of MPEG-4 video (non-studio profiles) has to deal with as many as 6 variants of YUV, 4 transfer functions, and two different scales of sample values (without getting into things like n-bit profile). In addition to that, professional video production systems will also have to deal with a variety of linear, non-linear, and log-scale RGB formats. Add RGBA, and you also have to deal with whether Alpha is premultiplied. Combined with a few different channel orderings and data layouts, I fear the result is such a multiplicity of combinations that the core purpose of GIL's color space construct would be defeated.

Perhaps this is simply at odds with GIL's goal of uncompromising performance. Still, I think the library shouldn't simply exclude such cases. There should be ways to trade various amounts of efficiency for various amounts of runtime flexibility.

While I generally agree with Lubomir about the numerical aspects of promotion & conversion traits (i.e. the safe default will be overkill, in most cases), I do think they could have both significant runtime and structural advantages. The runtime advantages would be mostly avoidance of extra conversion steps, and may require a per-algorithm override, so that the policy can be tweaked on a case-by-case basis. The structural advantages would come from being able to establish a semantic relationship between the values of different channel types. To get this right, I think channel types would rarely be POD types, such as int and float. Instead, an example of what you might use is a unique type that represents its value as a float, but which can have distinct traits template specializations like channel_traits<T>::zero_value(), channel_traits<T>::unity_gain_value(), and channel_traits<T>::saturated_value() (i.e. the "white" value - often different from the largest value representable by the type!). I should concede that I haven't developed this idea very far, though it may provide a foundation for addressing some of the concerns raised by Dr. Reese.

Is there any way to create views of an image that include a subset of the channels (for subsets larger than 1), besides color_converted_view? Or is there some way color_converted_view might come for free, in this case (I didn't get a chance to look into that)? IMO, this is pretty important for processing RGBA images since, as Ullrich Koethe points out, it's often necessary to treat Alpha differently than RGB (or YUV). The same goes for Z-buffers, and a variety of other synthetic and captured channels one might imagine.

Finally, GIL seems to lack any explicit notion of non-uniform sample structures. In video, 4:2:2, 4:1:1, and 4:2:0 are ubiquitous. An image representation that can efficiently support uniform presentation of channels with different sample frequencies and phases is important for high-performance video processing applications. I accept that this is beyond GIL's intended scope, though I'm citing it as a problem I had hoped GIL would solve. While I believe a synthetic view could be used to provide 4:4:4 access to images with other sample structures, such a mechanism would likely constitute a severe performance/quality tradeoff - at least for a serious video processing application.

What is your evaluation of the implementation?
----------------------------------------------

Elegant. Good focus on performance & generality. I had a few stylistic issues that probably don't bear much discussion here - mostly concerning line length (207 columns!?!) and type names (pixel<> vs. color<> - I agree with Fernando), as well as more minor issues.

What is your evaluation of the documentation?
---------------------------------------------

I found the design guide a bit terse, in places. I'm not sure it's a good stand-alone resource for providing a high-level understanding of the library. I may not be a good judge, given my familiarity with the problem domain, and the fact that I first watched the Breeze presentation. Not to contradict others' criticisms, but I did find the Doxygen docs a helpful aid when trying to navigate the header files. The Breeze presentation provided a great starting point.

Overall, I feel the written documentation isn't quite up to the standard of other Boost library docs. I expect this may be an issue for some first-time users of the library, and for relatively new users trying to find answers to specific questions.

What is your evaluation of the potential usefulness of the library?
-------------------------------------------------------------------

It's probably useful enough to warrant acceptance, as is. Again, its facilities for color conversion are hampered by the heavy overloading of the color space concept.

Did you try to use the library? With what compiler? Did you have any problems?
--------------------------------------------------------------------------------

No.

How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
--------------------------------------------------------------------------------------------

Watched the Breeze presentation. Read the design guide. Read all of the review-related discussions on the mailing list. Looked at many of the header files & Doxygen docs.

Are you knowledgeable about the problem domain?
-----------------------------------------------

Yes. Seven of my ten years of professional experience have been focused on development of high-performance software for photorealistic 3D rendering, video compression, film & video post-production, and computer vision. Prior to that, computer graphics was one of my primary interests.

Overall strengths
-----------------

* Heavy use of templates & focus on matching the performance of non-generic code.
* Use of STL idioms and compatibility with STL and Boost.
* Good generalization of most concepts.
* I like the focus on image containers, access, and conversions. These are applicable to nearly all graphics & imaging libraries and applications, whereas such a universal subset of image algorithms is virtually non-existent.

Weaknesses
----------

* Conflation of too many distinct concepts in color spaces.
* In order to better address the problem of providing a unified interface over different image representations, more attention should be given to the semantics of certain pixel & channel values. This may help make type conversion & promotion more tractable, and may result in more intuitive algorithm behavior.

Do you think the library should be accepted as a Boost library?
---------------------------------------------------------------

Yes. I feel the shortcomings mentioned above and by others neither prevent GIL from being usable nor useful. However, its usefulness (and perhaps usability) could be greatly enhanced if these issues could be addressed more comprehensively. In particular, I hope the concepts of color space, data layout, and channel type can be better separated, at some point.

Matt

Hi Matt, Thanks for your review! Matt Gruenke wrote:
I'm concerned that GIL's color spaces combine too many distinct concepts: memory layout, channel ordering, channel type & range, and a more pure definition of a color space (e.g. basis vectors, primaries, transfer function, etc.). In practice, this may require some users to define a large number of GIL color spaces.
I guess our documentation was not very clear... The only aspects that GIL color spaces combine are the ordering and the names of the channels. Maybe using Fernando's suggestion will make things clearer - instead of color spaces, think of them as "pixel formats". Memory layout (planar vs. interleaved, packed vs. non-packed) is addressed elsewhere. The channel type and range are properties of the channels. All the other aspects you list, such as transfer functions and color profiles, make sense as properties of the color conversion object. That means you don't need to create a new pixel format to represent each combination of the above properties.

Let's pick a concrete example: handling premultiplied alpha in RGBA. There are two ways you could do that:

1. You can certainly create a new pixel format. You need just one line of code:

    struct premultiplied_rgba_t : public rgba_t {};

You can then provide color conversion to and from your new format. Since this is a new color format, you don't even need to create a custom color conversion object; you can just define the default color conversion for it. A pixel format is just a tag, and it is trivial to make new ones if you need to. You can also store constants and types inside it if that is convenient for you.

2. Alternatively, instead of creating a custom pixel format, you could define your own color conversion object, as described in Section 16 of the design guide. In particular, it would treat RGBA as premultiplied. Use it only when you know your image contains premultiplied alpha. Custom color conversion may be a better choice if you want to handle more advanced color conversion, such as using color profiles and transfer functions. The color conversion object is stored and is allowed to have a state.

These two design alternatives go to a fundamental question that you need to resolve every time you want to add new functionality: does the new functionality make more sense as a property of the type, or as a property of the algorithm? There are tradeoffs to either approach. GIL does not force one choice on you - it lets you do either, or both.
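For illustration only, here is a minimal sketch of option 2, written against a plain struct rather than GIL's pixel types (the functor interface follows the color conversion objects of Section 16; all names here are hypothetical):

    #include <algorithm>  // for std::min

    struct rgba8 { unsigned char r, g, b, a; };  // stand-in for GIL's rgba8_pixel_t

    // A stateless color conversion object that treats the source as
    // premultiplied RGBA and produces straight (non-premultiplied) RGBA.
    struct premultiplied_to_straight {
        void operator()(const rgba8& src, rgba8& dst) const {
            if (src.a == 0) { dst = rgba8(); return; }  // fully transparent
            dst.r = (unsigned char)std::min(255, src.r * 255 / src.a);
            dst.g = (unsigned char)std::min(255, src.g * 255 / src.a);
            dst.b = (unsigned char)std::min(255, src.b * 255 / src.a);
            dst.a = src.a;
        }
    };

Such an object would then be passed wherever GIL accepts a color converter, e.g. to copy_and_convert_pixels or color_converted_view.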
For example, a program or library that handles encoding/decoding of MPEG-4 video (non-studio profiles) has to deal with as many as 6 variants of YUV, 4 transfer functions, and two different scales of sample values (without getting into things like n-bit profile). In addition to that, professional video production systems will also have to deal with a variety of linear, non-linear, and log-scale RGB formats. Add RGBA, and you also have to deal with whether Alpha is premultiplied. Combined with a few different channel orderings and data layouts, I fear the result is such a multiplicity of combinations that the core purpose of GIL's color space construct would be defeated.
Hopefully my description above addresses all these examples. You don't need to create a custom color space for every combination of possibilities. These variations, which are mostly orthogonal to each other, are best addressed in different GIL abstractions, which are also orthogonal, such as custom channels and channel algorithms, custom pixels, pixel references and iterators, custom color conversion objects, views, etc.
Perhaps this is simply at odds with GIL's goal of uncompromising performance. Still, I think the library shouldn't simply exclude such cases. There should be ways to trade various amounts of efficiency for various amounts of runtime flexibility.
I believe GIL provides you with a variety of design alternatives of extending functionality, each of which comes with a tradeoff between efficiency, ease of use, ease of implementation and runtime flexibility.
While I generally agree with Lubomir about the numerical aspects of promotion & conversion traits (i.e. the safe default will be overkill, in most cases), I do think they could have both significant runtime and structural advantages.
I am not arguing against traits. Using traits certainly makes coding more convenient. All I am arguing is that the types of intermediate results should not be hard-coded into the algorithms.
The runtime advantages would be mostly avoidance of extra conversion steps and may require a per-algorithm override, so that the policy can be tweaked on a case-by-case basis.
I am not sure what the presence of traits has to do with avoiding extra color conversion steps. Perhaps by "promotion and conversion traits" we mean different concepts. The way I see traits is as a set of metafunctions that give you the type of an intermediate result based on the input types and perhaps other criteria, such as the specific algorithms that are used. You don't need to have traits - you could in theory pass the intermediate types yourself. It is more convenient to use traits to provide suitable defaults in cases where you don't care.
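To make the distinction concrete, here is a sketch of what such metafunctions could look like, together with the per-call override Matt mentions (hypothetical names, not GIL code):

    // Maps a channel type to a default type for intermediate results.
    template <typename Channel> struct intermediate_result;
    template <> struct intermediate_result<unsigned char>  { typedef int    type; };
    template <> struct intermediate_result<unsigned short> { typedef int    type; };
    template <> struct intermediate_result<float>          { typedef double type; };

    // Caller specifies the intermediate type explicitly...
    template <typename Acc, typename Channel>
    Acc sum_channels(const Channel* first, const Channel* last) {
        Acc acc = Acc();
        for (; first != last; ++first)
            acc += *first;
        return acc;
    }

    // ...or relies on the trait to provide a suitable default.
    template <typename Channel>
    typename intermediate_result<Channel>::type
    sum_channels(const Channel* first, const Channel* last) {
        return sum_channels<typename intermediate_result<Channel>::type>(first, last);
    }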
The structural advantages would come from being able to establish a semantic relationship between the values of different channel types. In order to get this right, I think channel types would rarely be POD types, such as int and float. Instead, an example of what you might use is a unique type that represents its value as a float, but which can have distinct traits template specializations like channel_traits<T>::zero_value(), channel_traits<T>::unity_gain_value(), and channel_traits<T>::saturated_value() (i.e. "white" value - often different from the largest value representable by the type!). I should concede that I haven't developed this idea very far, though it may provide a foundation for addressing some of the concerns raised by Dr. Reese.
Yes, I agree we need channel and color properties like the zero value, the white point, etc. One way of introducing them is as you outlined - provide custom channel types and associate traits with them. An alternative is to use built-in types and pass the above information as a context to the algorithms. These are the exact same design decisions as I outlined above - should it be part of the type or part of the algorithm? GIL does not prevent you from making either choice.
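For illustration, a rough sketch of the first alternative, using the trait names from Matt's review (the video_luma type and its constants are hypothetical; the values correspond to 8-bit studio-range Y', where "white" is 235 rather than 255):

    // A distinct channel type: represented as a float, but semantically
    // different from a plain float.
    struct video_luma { float v; };

    template <typename T> struct channel_traits;  // primary template

    template <> struct channel_traits<video_luma> {
        static float zero_value()       { return  16.0f / 255.0f; }  // black level
        static float saturated_value()  { return 235.0f / 255.0f; }  // "white" -
            // deliberately smaller than the largest value the type can represent
        static float unity_gain_value() { return 1.0f; }
    };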
Is there any way to create views of an image that include a subset of the channels (for subsets larger than 1), besides color_converted_view? Or is there some way color_converted_view might come for free, in this case (I didn't get a chance to look into that)? IMO, this is pretty important for processing RGBA images since, as Ullrich Koethe points out, it's often necessary to treat Alpha differently than RGB (or YUV). Same goes for Z-buffers, and a variety of other synthetic and captured channels one might imagine.
This should be easy to define, and we can reuse planar_ref for it. planar_ref is a pixel model whose channels can be at different locations in memory. It is currently used to represent a reference to a planar pixel, but we could also use it to represent a reference to a subset of the channels of a pixel, for example. And it works as both an l-value and an r-value reference.
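For illustration, a schematic of that use (planar_ref's parameters follow its use elsewhere in this thread; the channel indexing and the helper function are assumptions):

    // Expose only the color channels of an interleaved RGBA pixel as an
    // RGB pixel reference; alpha (p[3]) is simply not included.
    planar_ref<bits8&, rgb_t> rgb_part(rgba8_pixel_t& p) {
        return planar_ref<bits8&, rgb_t>(p[0], p[1], p[2]);
    }

A custom view whose iterators dereference this way would let algorithms see the image as plain RGB, leaving alpha untouched.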
Finally, GIL seems to lack any explicit notion of non-uniform sample structures. In video, 4:2:2, 4:1:1, and 4:2:0 are ubiquitous. An image representation that can efficiently support uniform presentation of channels with different sample frequencies and phases is important for high-performance video processing applications. I accept that this is beyond GIL's intended scope, though I'm citing it as a problem I had hoped GIL would solve. While I believe a synthetic view could be used to provide 4:4:4 access to images with other sample structures, such a mechanism would likely constitute a severe performance/quality tradeoff - at least for a serious video processing application.
Again, there are a variety of ways of implementing the video formats, trading off between simplicity and performance. One easy way is to use the virtual view abstraction: keep a handle on the image. You will be called with given X and Y coordinates, which allows you to compute the locations of the Y, Cb, and Cr channels and return them. You could use planar_ref to return them as an l-value, which will allow you to have a mutable view. (Of course, changing one component may result in changing components of other pixels, because they are shared.) An alternative design is to provide a custom pixel iterator that keeps track of its position in the image. Upon dereference it will return a planar_ref with the corresponding channels. This abstracts the problem at a low level, which allows us to use the regular locator and image view classes and is potentially more efficient. I have no comments regarding the rest of your review. Thanks again for spending the time to review GIL! Lubomir
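For illustration, a rough sketch of the virtual-view alternative for planar 4:2:0 data (ycbcr_t, the functor, and the buffer layout are all hypothetical; planar_ref is used as elsewhere in this thread):

    // Dereference functor for a virtual 4:2:0 view: maps (x,y) to one
    // luma sample plus the chroma pair shared by its 2x2 block.
    struct ycbcr420_deref {
        unsigned char* y_plane;
        unsigned char* cb_plane;
        unsigned char* cr_plane;
        int width;  // luma width; the chroma planes are width/2 wide

        planar_ref<unsigned char&, ycbcr_t> operator()(int x, int y) const {
            int c = (y / 2) * (width / 2) + (x / 2);
            return planar_ref<unsigned char&, ycbcr_t>(
                y_plane[y * width + x], cb_plane[c], cr_plane[c]);
        }
    };

Writing through such a reference changes the chroma of the whole 2x2 block, as noted above.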

Lubomir Bourdev wrote:
I guess our documentation was not very clear... The only aspects that GIL color spaces combine are the ordering and the names of the channels. Maybe using Fernando's suggestion will make things clearer - instead of color spaces, think of them as "pixel formats".
Ouch. But "color space" is a known terminology. Search wikipedia and you'll get: http://en.wikipedia.org/wiki/Color_space. Now try "pixel format" and you'll get nothing. Do we really want to reinvent terminology? Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
Lubomir Bourdev wrote:
I guess our documentation was not very clear... The only aspects that GIL color spaces combine are the ordering and the names of the channels. Maybe using Fernando's suggestion will make things clearer - instead of color spaces, think of them as "pixel formats".
Ouch. But "color space" is a known terminology. Search wikipedia and you'll get: http://en.wikipedia.org/wiki/Color_space. Now try "pixel format" and you'll get nothing. Do we really want to reinvent terminology?
I think inventing new terminology is better than overloading or hijacking existing terminology. Furthermore, I believe good names accurately describe the concepts to which they refer. Finally, as a user, seeing an unfamiliar term will either prompt me to investigate it - or at least to treat it as an unknown, and therefore with appropriate caution. In contrast, misleading terminology gives the false sense of understanding and leads to misuse and unpleasant surprises. Regarding your supporting point, if I'm using a Boost library, the place I'd look for usage information is the library's docs - not Wikipedia. Of course, if questions about the problem domain (or common solution practices) arise, when reading the library docs, I would obviously turn to other resources. So long as the library makes precise use of standard terminology, and carefully documents non-standard terminology when standard terminology is non-existent or cannot be used precisely, I see no problem. Matt

Matt Gruenke wrote:
Joel de Guzman wrote:
Lubomir Bourdev wrote:
I guess our documentation was not very clear... The only aspects that GIL color spaces combine are the ordering and the names of the channels. Maybe using Fernando's suggestion will make things clearer - instead of color spaces, think of them as "pixel formats".
Ouch. But "color space" is a known terminology. Search wikipedia and you'll get: http://en.wikipedia.org/wiki/Color_space. Now try "pixel format" and you'll get nothing. Do we really want to reinvent terminology?
I think inventing new terminology is better than overloading or hijacking existing terminology. Furthermore, I believe good names accurately describe the concepts to which they refer. Finally, as a user, seeing an unfamiliar term will either prompt me to investigate it - or at least to treat it as an unknown, and therefore with appropriate caution.
I see no overloading or hijacking.
In contrast, misleading terminology gives the false sense of understanding and leads to misuse and unpleasant surprises.
Regarding your supporting point, if I'm using a Boost library, the place I'd look for usage information is the library's docs - not Wikipedia. Of course, if questions about the problem domain (or common solution practices) arise, when reading the library docs, I would obviously turn to other resources. So long as the library makes precise use of standard terminology, and carefully documents non-standard terminology when standard terminology is non-existent or cannot be used precisely, I see no problem.
That's not my point. Take a look at all the existing libraries that use concepts. Pick one; Vigra seems popular. Note: http://tinyurl.com/y3mgfn I see no "pixel format" there. Pick another one: http://tinyurl.com/18r. See the term "color space" again? "Color Space" *IS* known terminology. That's my point. Changing it would be foolish. Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
Matt Gruenke wrote:
Regarding your supporting point, if I'm using a Boost library, the place I'd look for usage information is the library's docs - not Wikipedia. Of course, if questions about the problem domain (or common solution practices) arise, when reading the library docs, I would obviously turn to other resources. So long as the library makes precise use of standard terminology, and carefully documents non-standard terminology when standard terminology is non-existent or cannot be used precisely, I see no problem.
That's not my point. Take a look at all the existing libraries that use concepts. Pick one; Vigra seems popular. Note: http://tinyurl.com/y3mgfn I see no "pixel format" there. Pick another one: http://tinyurl.com/18r. See the term "color space" again?
Oops. I wonder why tinyurl didn't pick that one. Try http://tinyurl.com/yye9tq. And see the term "color space". Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
Matt Gruenke wrote:
Joel de Guzman wrote:
I think inventing new terminology is better than overloading or hijacking existing terminology. Furthermore, I believe good names accurately describe the concepts to which they refer. Finally, as a user, seeing an unfamiliar term will either prompt me to investigate it - or at least to treat it as an unknown, and therefore with appropriate caution.
I see no overloading or hijacking.
The classical definition of color space is not concerned with data layout (i.e. channel ordering). In many cases, the channel names in common use are heavily overloaded, and are therefore quite inadequate for specifying a color space. Data format and pixel semantics are orthogonal concepts. A primary goal of GIL is to separate these concepts. IMO, the names used in GIL should reflect this.
That's not my point. Take a look at all the existing libraries that use concepts. Pick one; Vigra seems popular.
As far as I can tell, GIL does not seek to be only as good as other libraries, in what it does - nor should it. In fact, its main objective seems to be to take a principled approach to the problem of image representation, access, and conversion and, in so doing, to provide a better foundation for imaging libraries and applications than existing solutions. Otherwise, why bother? Years of programming experience have taught me that semantics matter. Clouded semantics limit extensibility and cause confusion, bugs, and redundancy. (Ambiguity should be deliberate, and carefully bounded.)
"Color Space" *IS* known terminology. That's my point.
"known" and "accurate" are distinct criteria. I think I've made my points. In the end, it's up to Lubomir and Hailin. Matt

Matt Gruenke wrote:
Joel de Guzman wrote:
Matt Gruenke wrote:
Joel de Guzman wrote:
I think inventing new terminology is better than overloading or hijacking existing terminology. Furthermore, I believe good names accurately describe the concepts to which they refer. Finally, as a user, seeing an unfamiliar term will either prompt me to investigate it - or at least to treat it as an unknown, and therefore with appropriate caution.
I see no overloading or hijacking.
The classical definition of color space is not concerned with data layout (i.e. channel ordering). In many cases, the channel names in common use are heavily overloaded, and are therefore quite inadequate for specifying a color space.
I disagree. A color space is a tuple of numbers. It is concerned with data and layout as far as I can see. That is the very definition of "tuple".
Data format and pixel semantics are orthogonal concepts. A primary goal of GIL is to separate these concepts. IMO, the names used in GIL should reflect this.
I agree. That is why I support the idea of color spaces as tuples, where the algorithms are separated from the containers. In other words, color spaces are separate from color-space algorithms.
That's not my point. Take a look at all the existing libraries that use concepts. Pick one; Vigra seems popular.
As far as I can tell, GIL does not seek to be only as good as other libraries, in what it does - nor should it. In fact, its main objective seems to be to take a principled approach to the problem of image representation, access, and conversion and, in so doing, to provide a better foundation for imaging libraries and applications than existing solutions. Otherwise, why bother?
Years of programming experience have taught me that semantics matter. Clouded semantics limit extensibility and cause confusion, bugs, and redundancy. (Ambiguity should be deliberate, and carefully bounded.)
Years of experience have taught me that gratuitous invention of new terms is not a good thing.
"Color Space" *IS* known terminology. That's my point.
"known" and "accurate" are distinct criteria.
I agree. And to me "pixel-format" is unknown, confusing and inaccurate. The worst of the worst. It's like calling a std::vector a data-container just because the std::vector does not quite follow the "classical" term of vector. Let's not be too pedantic. ( Clearly, this is getting to be a bicycle-shed, so I'll just let it be and let the authors do what they think is best. of all things I do not like are naming games. ) Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
Matt Gruenke wrote:
The classical definition of color space is not concerned with data layout (i.e. channel ordering). In many cases, the channel names in common use are heavily overloaded, and are therefore quite inadequate for specifying a color space.
I disagree. A color space is a tuple of numbers. It is concerned with data and layout as far as I can see. That is the very definition of "tuple".
That's a color (or pixel value). A color space provides the semantics of these numbers, possibly including how they are converted to other color spaces. This may include its basis, CIE primaries, transfer function, gamut, etc. (For more information, here's a good starting point: http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html ) A pixel format establishes the mapping between channels of an image (i.e. color, pixel value, etc.) and components of a color space.
Years of experience have taught me that gratuitous invention of new terms is not a good thing.
Yes, and I think the distinction between color space and pixel format not only isn't gratuitous - it's quite well justified.
"Color Space" *IS* known terminology. That's my point.
"known" and "accurate" are distinct criteria.
I agree. And to me "pixel-format" is unknown, confusing and inaccurate. The worst of the worst. It's like calling a std::vector a data-container just because the std::vector does not quite follow the "classical" term of vector. Let's not be too pedantic.
I'm not sure that's a good analogy. In what way are you suggesting that vector is an inaccurate description of that container? Matt

Matt Gruenke wrote:
Joel de Guzman wrote:
I disagree. A color space is a tuple of numbers. It is concerned with data and layout as far as I can see. That is the very definition of "tuple".
That's a color (or pixel value).
A color space provides the semantics of these numbers, possibly including how they are converted to other color spaces. This may include its basis, CIE primaries, transfer function, gamut, etc. (For more information, here's a good starting point: http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html )
A pixel format establishes the mapping between channels of an image (i.e. color, pixel value, etc.) and components of a color space.
Now I see the confusion. There are overlaps in the definition of colors and color-spaces of course. Did you think they were orthogonal? See: http://tinyurl.com/18r for the definition of color-spaces. Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
Matt Gruenke wrote:
Joel de Guzman wrote:
I disagree. A color space is a tuple of numbers. It is concerned with data and layout as far as I can see. That is the very definition of "tuple".
That's a color (or pixel value).
A color space provides the semantics of these numbers, possibly including how they are converted to other color spaces. This may include its basis, CIE primaries, transfer function, gamut, etc. (For more information, here's a good starting point: http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html )
A pixel format establishes the mapping between channels of an image (i.e. color, pixel value, etc.) and components of a color space.
Now I see the confusion. There are overlaps in the definition of colors and color-spaces of course. Did you think they were orthogonal? See: http://tinyurl.com/18r for the definition of color-spaces.
tinyurl mistake again! One more try: http://tinyurl.com/veebq Regards, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
Joel de Guzman wrote:
Matt Gruenke wrote:
Joel de Guzman wrote:
I disagree. A color space is a tuple of numbers. It is concerned with data and layout as far as I can see. That is the very definition of "tuple".
That's a color (or pixel value).
A color space provides the semantics of these numbers, possibly including how they are converted to other color spaces. This may include its basis, CIE primaries, transfer function, gamut, etc. (For more information, here's a good starting point: http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html )
A pixel format establishes the mapping between channels of an image (i.e. color, pixel value, etc.) and components of a color space.
Now I see the confusion. There are overlaps in the definition of colors and color-spaces of course. Did you think they were orthogonal? See: http://tinyurl.com/18r for the definition of color-spaces.
tinyurl mistake again! One more try:
Regards,
That very same Wikipedia article may be used to explain why Matt and I are against using the term "color space" for the GIL element under discussion. If you read it carefully, you can see that the "tuple of numbers" is a "color model", not a "color space" (a mapping function binds the model to the space). Reading further down, you see that it refers to "RGB", "CMYK", etc. as color models, not color spaces. So at most I would say that the correct term is color model. It certainly isn't color space, and that article shows why quite well. But it's not even that. You are probably missing the fact that the "GIL element under discussion", which I suggested should not be documented as a color space, isn't a tuple of numbers at all. It's a tag. That is, the tuple of numbers which holds the component values is the pixel<> class. Incidentally, if one were to follow that Wikipedia article, one would conclude that pixel<> (being a tuple of color component values) would have to be renamed color_model<>, or just color<>, since "model" is implied in computer programs. And that's what I suggested (but I can see that the class is used in certain contexts where calling it color<> is wrong). The thing that is documented as being a color space is a tag class that is used inside pixel<>. That tag is just there to bind each tuple element to the corresponding component in a given color model, so that the resulting thing, the "tagged pixel", can be considered an instance of a color model. So IMO that tag class is representing a format. Best Fernando Cacciola
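In code terms, the distinction being drawn is (schematically):

    pixel<bits8, rgb_t>   // the tuple of channel values - the "color"
    rgb_t                 // the tag: binds tuple element i to a component

(pixel<ChannelValue, ColorSpace> is the form that GIL typedefs such as rgb8_pixel_t expand to; the labels in the comments reflect the reading argued for above, not GIL's own documentation.)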

Fernando Cacciola wrote:
Joel de Guzman wrote:
Joel de Guzman wrote:
Matt Gruenke wrote:
Joel de Guzman wrote:
I disagree. A color space is a tuple of numbers. It is concerned with data and layout as far as I can see. That is the very definition of "tuple".
That's a color (or pixel value).
A color space provides the semantics of these numbers, possibly including how they are converted to other color spaces. This may include its basis, CIE primaries, transfer function, gamut, etc. (For more information, here's a good starting point: http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html )
A pixel format establishes the mapping between channels of an image (i.e. color, pixel value, etc.) and components of a color space. Now I see the confusion. There are overlaps in the definition of
colors and color-spaces of course. Did you think they were orthogonal? See: http://tinyurl.com/18r for the definition of color-spaces.
tinyurl mistake again! One more try:
Regards,
That very same Wikipedia article may be used to explain why Matt and I are against using the term "color space" for the GIL element under discussion. If you read it carefully, you can see that the "tuple of numbers" is a "color model", not a "color space" (a mapping function binds the model to the space). Reading further down, you see that it refers to "RGB", "CMYK", etc. as color models, not color spaces.
So at most I would say that the correct term is color model. It certainly isn't color space and that article shows why quite well.
No! Read carefully. Color space is gamut (footprint) + color model. Cheers, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
Fernando Cacciola wrote:
Joel de Guzman wrote:
Joel de Guzman wrote:
Matt Gruenke wrote:
Joel de Guzman wrote:
I disagree. A color space is a tuple of numbers. It is concerned with data and layout as far as I can see. That is the very definition of "tuple".
That's a color (or pixel value).
A color space provides the semantics of these numbers, possibly including how they are converted to other color spaces. This may include its basis, CIE primaries, transfer function, gamut, etc. (For more information, here's a good starting point: http://www.poynton.com/notes/colour_and_gamma/ColorFAQ.html )
A pixel format establishes the mapping between channels of an image (i.e. color, pixel value, etc.) and components of a color space.
Now I see the confusion. There are overlaps in the definition of colors and color-spaces of course. Did you think they were orthogonal? See: http://tinyurl.com/18r for the definition of color-spaces.
tinyurl mistake again! One more try:
Regards,
That very same Wikipedia article may be used to explain why Matt and I are against using the term "color space" for the GIL element under discussion. If you read it carefully, you can see that the "tuple of numbers" is a "color model", not a "color space" (a mapping function binds the model to the space). Reading further down, you see that it refers to "RGB", "CMYK", etc. as color models, not color spaces.
So at most I would say that the correct term is color model. It certainly isn't color space and that article shows why quite well.
No!
No to what, exactly?
Color space is gamut (footprint) + color model.
Right, and what exactly does that contradict in what I wrote? Anyway, as you know by now, my problem was with the tag. I wouldn't mind speaking of color spaces in the context of the pixel<> class (which plays the role of a color<> class), although in that case I would add a side note in the docs explaining that instances of the class represent "points" in the color space, rather than the color space itself as a whole (something which could be represented by a class, but isn't in the case of GIL). Fernando Cacciola

Now for my take on color vs. pixel :) I don't understand why pixel is a bad name. For someone in the graphics community, pixel really is the right name. The term pixel stands for picture element, and it seems to me that GIL's pixel is exactly that: an element in a raster image. The notion of pixel format is also well known (have a look at graphics APIs like OpenGL, for example). I do think that GIL has its concepts right, except that it's not true to say that a pixel is always the color at a given location in an image. It could be a depth value, for instance, or a 3D position (in so-called geometry images). As for color space and all, according to my readings in FvDFH*, it really has nothing to do with how you pack channels in memory. According to a professor in my department who is an expert in the field, it's actually more abstract than what is found in the book. But I do think that the way they handle the notion of color space in GIL is satisfactory. * J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes. Computer Graphics - Principles and Practice. Second Edition. -- François Duranleau LIGUM, Université de Montréal

Hi all, I'm back in the discussion. François Duranleau wrote:
Now it's my take about color vs pixel :)
I don't understand why pixel is a bad name. For someone in the graphics community, pixel really is the right name. The term pixel stands for picture element, and it seems to me that GIL's pixel is exactly that: an element in a raster image.
I disagree. An image is a collection of sampling points with associated sample values. Each sampling point defines a pixel, which is the Voronoi region around the sampling point (i.e. the set of points in the Euclidean plane that are nearest to that sampling point). In a square raster, the sampling points are conveniently located at integer coordinates, so that the pixels become squares. Sampling points and pixels are therefore geometric entities (points and regions in the plane), whereas pixel values can essentially live in an arbitrary domain. For convenience, the term pixel is often used as a synonym for sampling point, but it cannot be the pixel value at the same time.

Consequently, the right term is pixel_value, and GIL is halfway right, because the associated typedef is already called pixel_value_type. Calling the concept itself pixel_value would just be consistent, IMHO. Perhaps just using 'value' and 'value_type' would be even simpler (because 'pixel' is implied by the image data structure).

Regards
Ulli

--
PS. I'm working on a detailed review. I hope it will still be useful after the review deadline.

Ullrich Koethe                Universitaet Hamburg / University of Hamburg
                              FB Informatik / Dept. of Informatics
                              AB Kognitive Systeme / Cognitive Systems Group
Phone: +49 (0)40 42883-2573   Vogt-Koelln-Str. 30
Fax:   +49 (0)40 42883-2572   D - 22527 Hamburg
Email: u.koethe@computer.org  Germany
       koethe@informatik.uni-hamburg.de
WWW:   http://kogs-www.informatik.uni-hamburg.de/~koethe/

Hi Ullrich,
Consequently, the right term is pixel_value, and GIL is halfway right, because the associated typedef is already called pixel_value_type.
FWIW, it is precisely that which prompted me to suggest renaming pixel<> to color<>. That class does not necessarily represent the "element of the picture" that we call a pixel. In fact, for a planar image, the class planar_ref comes a lot closer to that purpose, and in that context the class pixel<> is used to capture a deep copy of that pixel's value.
Calling the concept itself pixel_value would just be consistent,
Since a few people pointed out that the value of a pixel is not necessarily a color, I like this suggestion. Now, off the pixel/color topic. From the Quicktime format discussion: is it possible to supersample (rather than subsample) an image? That is, to take two or more consecutive pixels in a row and pretend they are just one pixel? I ask because the lack of uniformity in even/odd pixels of the v210 format could be elegantly solved this way (that is, if I understand the problem correctly). Best Fernando Cacciola

Fernando Cacciola wrote:
Calling the concept itself pixel_value would just be consistent,
Since a few people pointed out that the value of a pixel is not necessarily a color, I like this suggestion.
Yes, using the term 'color' for a depth or gradient value might be very confusing.
Now, off the pixel/color topic. From the Quicktime format discussion: is it possible to supersample (rather than subsample) an image? That is, to take two or more consecutive pixels in a row and pretend they are just one pixel?
Regarding terminology: 'supersampling' is commonly understood as creating more sampling points than were originally available, whereas 'subsampling' reduces the number of sampling points (the opposite of your use of the term). I think it is possible to do it by means of an appropriate pixel value adapter, but it may be too slow to be of practical value. I'd carefully benchmark it against a solution where the representation is converted into something more convenient only once (and converted back right before storage or display).

Regards
Ulli

--
Ullrich Koethe                Universitaet Hamburg / University of Hamburg
                              FB Informatik / Dept. of Informatics
                              AB Kognitive Systeme / Cognitive Systems Group
Phone: +49 (0)40 42883-2573   Vogt-Koelln-Str. 30
Fax:   +49 (0)40 42883-2572   D - 22527 Hamburg
Email: u.koethe@computer.org  Germany
       koethe@informatik.uni-hamburg.de
WWW:   http://kogs-www.informatik.uni-hamburg.de/~koethe/

Hi Ullrich,
Now, off the pixel/color topic. From the Quicktime format discussion: is it possible to supersample (rather than subsample) an image? That is, to take two or more consecutive pixels in a row and pretend they are just one pixel?
Regarding terminology: 'supersampling' is commonly understood as creating more sampling points than were originally available,
Right, that's my understanding of supersampling.
whereas 'subsampling' reduces the number of sampling points (opposite to your use of the term).
GIL's subsample_view skips pixels (takes fewer sampling points), and I was thinking of doing sort of the opposite: take 2 or more pixels from the source image as if they were multiple samples of the same "point", that is, mapped onto just one pixel in the target. That's why I referred to it as supersampling. Best Fernando Cacciola SciSoft http://fcacciola.50webs.com/

Fernando Cacciola wrote:
Now, off the pixel/color topic. From the Quicktime format discussion: is it possible to supersample (rather than subsample) an image? That is, to take two or more consecutive pixels in a row and pretend they are just one pixel?
You're talking about a subsampling (or decimating) iterator, instead of an interpolating iterator? I believe the code structures necessary for both of these tasks are largely identical - once you've worked out how to do one, the other should be quite straightforward. However, for most purposes where this is desirable, it is probably more efficient to write a routine that resamples entire rows and columns at a time. On the other hand, if you care more about conserving memory than performance, interpolating iterators may be a good solution for image scaling/zooming, and they are not limited to images with non-uniform sample structures.
I ask because the lack of uniformity in even/odd pixels in the v210 format could be elegantly solved this way (that is if IIUC the problem).
You'd be discarding information, though. The reason chroma has half the sampling frequency, in this case, is that it has been band-limited to half the bandwidth of luma (a perceptual optimization). If you decimate luma, you're going to lose information (given no assumptions about the input data). So it fails as a simplification, because it's not equivalent. Matt

Matt Gruenke wrote:
Fernando Cacciola wrote:
Now, off the pixel/color topic. From the Quicktime format discussion: is it possible to supersample (rather than subsample) an image? That is, to take two or more consecutive pixels in a row and pretend they are just one pixel?
You're talking about a subsampling (or decimating) iterator,
No, but I can see how the way I expressed it looked like that.
instead of an interpolating iterator?
Exactly. The idea was not to skip even or odd pixels (which subsampling does) but to combine them. The actual N-to-1 mapping would be defined by the iterator: it could interpolate, add and clobber, concatenate, or whatever makes sense.
I believe the code structures necessary for both of these tasks are largely identical - once you've worked out how to do one, the other should be quite straight forward.
However, for most purposes where this is desirable, it is probably more efficient to write a routine that resamples entire rows and columns at a time.
Which is what Ullrich said. It would be interesting to benchmark it, though.
You'd be discarding information, though. The reason chroma has half the sampling frequency, in this case, is that it has been band-limited to half the bandwidth of luma (a perceptual optimization). If you decimate luma, you're going to lose information (given no assumptions about the input data). So it fails as a simplification, because it's not equivalent.
The above corresponds to a subsampling/decimating iterator, right? I was referring to a "combining" iterator. Would that work? Best -- Fernando Cacciola SciSoft http://fcacciola.50webs.com/

Fernando Cacciola wrote:
Matt Gruenke wrote:
Fernando Cacciola wrote:
Now, off the pixel/color topic. From the Quicktime format discussion: is it possible to supersample (rather than subsample) an image? That is, to take two or more consecutive pixels in a row and pretend they are just one pixel?
You're talking about a subsampling (or decimating) iterator,
No, but I can see how the way I expressed it looked like that.
instead of an interpolating iterator?
Exactly. The idea was not to skip even or odd pixels (which subsampling does) but to combine them.
You mean like a 4:2:2 iterator, which contains two consecutive luma samples and a chroma sample (pair)? Perhaps you could generalize this into a multisample iterator, and provide a means of getting the image dimensions in terms of multisample steps. However, I'm not sure how useful any of that would be.
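For illustration, the value such a multisample step might yield for 4:2:2 (a schematic with hypothetical names):

    // One step of a 4:2:2 multisample iterator: two consecutive luma
    // samples plus the chroma pair they share.
    struct ycbcr422_group {
        unsigned short y0, y1;  // luma for the even and odd pixel
        unsigned short cb, cr;  // chroma shared by both
    };

An image of width W would then report W/2 such steps per row, which is the dimension remapping mentioned above.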
However, for most purposes where this is desirable, it is probably more efficient to write a routine that resamples entire rows and columns at a time.
Which is what Ulrich said. It would be interesting the benchmark it though.
It would probably be slower, because it would involve redundant channel unpacking (certainly, if you're using a large reconstruction kernel) and probably redundant loads of reconstruction kernel coefficients. Matt

Fernando Cacciola wrote:
But it's not even that. You are probably missing the fact that the "GIL element under discussion", which I suggested should not be documented as a color space, isn't a tuple of numbers at all. It's a tag. That is, the tuple of numbers which holds the component values is the pixel<> class. Incidentally, if one were to follow that Wikipedia article, one would conclude that pixel<> (being a tuple of color component values) would have to be renamed color_model<>, or just color<>, since "model" is implied in computer programs. And that's what I suggested (but I can see that the class is used in certain contexts where calling it color<> is wrong).
The thing that is documented as being a color space is a tag class that is used inside pixel<>. That tag is just there to bind each tuple element to the corresponding component in a given color model, so that the resulting thing, the "tagged pixel", can be considered an instance of a color model. So IMO that tag class is representing a format.
Ok, those are good points (except for the bit where you are disconnecting color-space from color-model). I must admit, I missed that _tag_ thing. Cheers, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Matt Gruenke wrote:
Joel de Guzman wrote:
I agree. And to me "pixel-format" is unknown, confusing and inaccurate. The worst of the worst. It's like calling a std::vector a data-container just because the std::vector does not quite follow the "classical" term of vector. Let's not be too pedantic.
I'm not sure that's a good analogy. In what way are you suggesting that vector is an inaccurate description of that container?
I do not want to digress but look up the definition of "vector" in a dictionary and you'll see _overloaded_ meanings, and those meanings were invented before computer science. Ok, I've had enough bike-shed debates. As I said, I'll leave it up to the authors to do the right thing. Cheers, -- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net

Joel de Guzman wrote:
I agree. And to me "pixel-format" is unknown, confusing and inaccurate. The worst of the worst. It's like calling a std::vector a data-container just because the std::vector does not quite follow the "classical" term of vector. Let's not be too pedantic.
FWIW, pixel format has at least some precedent. For instance: http://windowssdk.msdn.microsoft.com/en-us/library/ms537569.aspx But I haven't followed the discussion too closely, so this may be off point. Thanks, Michael Marcin

Lubomir Bourdev wrote:
Matt Gruenke wrote:
For example, a program or library that handles encoding/decoding of MPEG-4 video (non-studio profiles) has to deal with as many as 6 variants of YUV, 4 transfer functions, and two different scales of sample values (without getting into things like n-bit profile). In addition to that, professional video production systems will also have to deal with a variety of linear, non-linear, and log-scale RGB formats. Add RGBA, and you also have to deal with whether Alpha is premultiplied. Combined with a few different channel orderings and data layouts, I fear the result is such a multiplicity of combinations that the core purpose of GIL's color space construct would be defeated.
Hopefully my description above addresses all these examples. You don't need to create a custom color space for every combination of possibilities. These variations, which are mostly orthogonal to each other, are best addressed in different GIL abstractions, which are also orthogonal, such as custom channels and channel algorithms, custom pixels, pixel references and iterators, custom color conversion objects, views, etc.
Maybe it's just me but I find extending GIL to support something like the v210 Quicktime format quite challenging (I don't want to imply that this is GIL's fault). This is a 10-bit YUV 4:2:2 format which stores 6 pixels in 16 bytes. It appears to me as if trying to support it would touch on a lot of concepts and corners of GIL, as it would require a new pixel storage format, color space, component subsampling, and maybe more. I believe it would help understanding if you could try to give at least a road map of what needs doing to support this properly (a fully coded example would probably require quite some effort). Cheers Stefan

Stefan Heinzmann wrote:
Maybe it's just me but I find extending GIL to support something like the v210 Quicktime format quite challenging (I don't want to imply that this is GIL's fault). This is a 10-bit YUV 4:2:2 format which stores 6 pixels in 16 bytes. It appears to me as if trying to support it would touch on a lot of concepts and corners of GIL, as it would require a new pixel storage format, color space, component subsampling, and maybe more.
I believe it would help understanding if you could try to give at least a road map of what needs doing to support this properly (a fully coded example would probably require quite some effort).
Stefan,

This is an excellent example of a very complicated image format. Here is a link that I found that describes it:

http://developer.apple.com/quicktime/icefloe/dispatch019.html#v210

Basically, each 16 bytes contain 6 packed Y'CbCr pixels, each channel of which is 10 bits long. Some of the channels are shared between different pixels. Here is a rough plan of how I would approach modeling this in GIL:

1. Provide a ycrcb color space.
2. Provide a model of a channel reference whose bit offset can be specified at run time.
3. Create a custom pixel iterator to handle the v210 format.

__________________________

Detail:

1. Provide the ycrcb color space (see the design guide for details):

    struct ycrcb_t {
        typedef ycrcb_t base;
        BOOST_STATIC_CONSTANT(int, num_channels=3);
    };

This defines the common typedefs for pixels, iterators, locators, images, etc.:

    GIL_DEFINE_ALL_TYPEDEFS(8,  ycrcb)
    GIL_DEFINE_ALL_TYPEDEFS(8s, ycrcb)
    GIL_DEFINE_ALL_TYPEDEFS(16, ycrcb)
    GIL_DEFINE_ALL_TYPEDEFS(16s,ycrcb)
    GIL_DEFINE_ALL_TYPEDEFS(32f,ycrcb)
    GIL_DEFINE_ALL_TYPEDEFS(32s,ycrcb)

2. Create a model of a channel reference whose bit offset is a dynamic parameter. This is almost identical to the class packed_channel_reference from the packed_pixel example:

    template <typename DataValue, typename ChannelValue,
              int FirstBit, int NumBits, bool Mutable>
    class packed_channel_reference;

except that FirstBit is passed at run time and stored inside it:

    template <typename DataValue, typename ChannelValue,
              int NumBits, bool Mutable>
    class packed_runtime_channel_reference {
        ...
        const int _first_bit;
    };

We now have a model of the 10-bit channel:

    typedef packed_runtime_channel_reference<uint32_t, uint16_t, 10, true> v210_channel_ref_t;

We can use it to define a model of a pixel reference. We can reuse planar_ref, a class that models PixelConcept and whose channels may be at disjoint places in memory:

    typedef planar_ref<v210_channel_ref_t, ycrcb_t> v210_pixel_ref_t;

3. Create a custom pixel iterator containing a pointer to the first byte of a 16-byte block and the index of the current pixel within the block:

    // Models PixelIteratorConcept
    struct v210_pixel_ptr : public boost::iterator_facade<...> {
        uint32_t* p;  // pointer to the first byte of a 16-byte chunk
        int index;    // which pixel is it currently on? (0..5)

        typedef v210_pixel_ref_t reference;
        typedef ycrcb16_pixel_t  value_type;

        void increment();
        reference dereference() const;
    };

Its increment will bump up the index of the pixel and, when it reaches 6, move the pointer to the next 16 bytes:

    void v210_pixel_ptr::increment() {
        if (++index==6) {
            index=0;
            p+=4;
        }
    }

Its dereference will return a reference to the appropriate channels. For example, the fourth pixel uses:

    For Y': bits [22..31] of the 3rd word
    For Cb: bits [2 ..11] of the 2nd word
    For Cr: bits [12..21] of the 3rd word

    v210_pixel_ptr::reference v210_pixel_ptr::dereference() const {
        switch (index) {
            ...
            case 4: return reference(
                v210_channel_ref_t(*(p+3), 22),
                v210_channel_ref_t(*(p+2), 2),
                v210_channel_ref_t(*(p+3), 12));
            ...
        }
    }

You can now construct a view from the iterator:

    typedef type_from_x_iterator<v210_pixel_ptr>::view_t v210_view_t;

And you should be able to construct it with common GIL functions:

    v210_view_t v210_view = interleaved_view(width, height, ptr, row_bytes);

You should be able to use this view in algorithms:

    copy_pixels(v210_view1, v210_view2);

Note that it is only compatible with other v210 views, so you cannot copy to/from a regular view, even one of a Y'CbCr type. To do that you will have to write channel conversion and color conversion.
Use the packed_pixel.hpp example to see how to do that. Once you do, you should be able to write:

    copy_and_convert_pixels(v210_view, rgb8_view);
    copy_and_convert_pixels(rgb8_view, v210_view);

or:

    jpeg_write_view("out.jpg",
        color_converted_view<rgb8_pixel_t>(v210_view, v210_color_converter));

You should be able to run carefully designed generic algorithms directly on native v210 data.

Lubomir
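For illustration, the simplest possible form of the missing channel conversion (a sketch; a real v210 converter would likely also need to account for the video/studio range of the samples):

    // Map a 10-bit channel value [0..1023] to 8 bits by dropping the
    // two least significant bits.
    inline unsigned char channel_convert_10_to_8(unsigned short c10) {
        return static_cast<unsigned char>(c10 >> 2);
    }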

Lubomir Bourdev wrote:
Stefan Heinzmann wrote:
Maybe it's just me but I find extending GIL to support something like the v210 Quicktime format quite challenging (I don't want to imply that this is GIL's fault). This is a 10-bit YUV 4:2:2 format which stores 6 pixels in 16 bytes.
Stefan,
This is an excellent example for a very complicated image format.
It's an example of the sort of thing I mentioned in parts of my review. Though you seemed to focus mostly on the complex packing, I do appreciate your illustration of this case.
Here is a rough plan of how I would approach modeling this in GIL:
Sorry to nitpick, but in case you're adding any of this to GIL...
1. Provide yCrCb color space
I believe there exist component formats with the channel order Y'CrCb. However, when referring to the family of digital component video formats, I think the order Y'CbCr is preferable, as it matches the common (though less precise) Y'UV nomenclature. In the past, the digital video community largely discouraged use of the term Y'UV, since it was only defined for the analog domain. However, having been used pervasively in standards, such as ISO/IEC 14496 (aka MPEG-4), as well as in many implementations, the term Y'UV (or YUV) is here to stay, and should be treated as an imprecise substitute for Y'CbCr (and perhaps a couple other color difference coding formats based on B' - Y' and R' - Y'). So, unless Y'CrCb serves to define a pixel format with that specific ordering, I'd stick with Y'CbCr. The Quicktime docs you cited even define v210 as the "Component Y'CbCr 10-bit 4:2:2" codec.
Its dereference will return a reference to the appropriate channels. For example, the fourth pixel uses:

    For Y': bits [22..31] of the 3rd word
    For Cb: bits [2 ..11] of the 2nd word
    For Cr: bits [12..21] of the 3rd word
Almost. If you just wanted a rough approximation, that would be better than nothing. However, the chroma phase for Y3 is actually halfway between C1 and C2. Of course, using standard video interpolation (windowed sinc), you would need to access many chroma samples on either side. This could be used to demonstrate an interpolating iterator. But, if you wanted the easy way out, your example should have used Y0, Y2, or Y4! ;)
You should be able to run carefully designed generic algorithms directly on native v210 data.
Mostly what I'm concerned with, when using non-4:4:4 sampled data, is a descriptive container format. My needs don't extend much beyond applying the proper (and most minimal) conversions, when necessary. Most processing of component video formats (especially those with subsampled chroma) is applied independently to each channel (e.g. resampling, DCT, IDCT, etc.). Because the components are nonlinear quantities, you can't do much else. Even seemingly simple operations - such as adjustments of brightness, contrast, hue, & saturation - are only approximations in Y'CbCr.

Matt

I believe there exist component formats with the channel order Y'CrCb. However, when referring to the family of digital component video formats, I think the order Y'CbCr is preferable, as it matches the common (though less precise) Y'UV nomenclature.
OK. I am obviously not that familiar with video formats.
Its dereference will return a reference to the appropriate channels. For example, the fourth pixel uses:

    For Y': bits [22..31] of the 3rd word
    For Cb: bits [2 ..11] of the 2nd word
    For Cr: bits [12..21] of the 3rd word
Almost. If you just wanted a rough approximation, that would be better than nothing. However, the chroma phase for Y3 is actually halfway between C1 and C2. Of course, using standard video interpolation (windowed sinc), you would need to access many chroma samples on either side.
This could be used to demonstrate an interpolating iterator. But, if you wanted the easy way out, your example should have used Y0, Y2, or Y4! ;)
No, I want the hardest imaginable example. Bring it on!! :)
From your description, I can conclude that in v210 every pixel with an odd x-coordinate has its Cr and Cb values defined as the average of those of its logical left and right neighbors.
In this case, here is how I would change my plan:

1. Drop the requirement that our model of v210 be writable. It is not possible to accurately write into v210 pixel by pixel: writing requires simultaneously providing the values of at least a pair of pixels that share channels. So writing requires providing a GIL algorithm, and cannot be done by data abstraction alone.

2. Now that we have writing out of the way, we can create a model of an r-value channel reference (ChannelConcept, but not MutableChannelConcept) that takes two channels and returns a value halfway in between (or, if generalization is important, an arbitrary interpolation). Let's call it InterpolatingChannel; a sketch follows below.

We obviously want to use a model of InterpolatingChannel, instantiated with two 10-bit channel references, to represent the Cr and Cb channels of the odd pixels. We don't need it for the even pixels, and ideally we would use a "simple" 10-bit channel reference there. The problem is that even and odd pixels would then have different types, and STL-style iterators don't support changing their value type dynamically. So we must use InterpolatingChannel for the Cr and Cb channels of the even pixels as well. We could simply use the same channel twice when interpolating.

The second design question is what to do with the type of the Y channel. There are two options: we could represent it with InterpolatingChannel, or with a "simple" 10-bit channel reference. The second option results in a faster implementation, but requires us to write a heterogeneous equivalent of planar_ref (the model of a pixel whose channels are not together in memory).

The pixel iterator model I outlined need not change. In fact, we could make it more generic and use it in other contexts: it is a model of a pixel iterator for cases where each pixel in a sequence may have a different memory layout, but the sequence itself repeats. In the case of v210 the sequence is 6 pixels long and the increment to the next sequence is 16 bytes. We could reuse this class to represent a bitmap image, where the sequence is 8 pixels long and the increment to the next sequence is 1 byte. A bitmap image has a grayscale color space; for its channel reference we could use the same class I outlined for the v210 channel reference, except instantiated with a bit depth of 1 instead of 10.

Obviously, all of this abstraction is only necessary so we can use these image formats in generic algorithms. It comes at a price in performance - we can do a much better job by processing an entire sequence at a time instead of pixel by pixel. This could be done by providing performance overloads for specific algorithms. Alternatively, we could represent the v210 format (and bitmap) as special single-channel images whose "pixel" corresponds to an entire sequence (6 logical pixels in the case of v210, 8 in the case of bitmap). That allows us to write some operations more simply and efficiently, but then such images cannot be used in traditional GIL algorithms that pair pixels 1-to-1 between images, such as copy_and_convert_pixels, transform_pixels, etc. Of course, GIL allows you to use both alternatives - a single-channel view of your v210 "6-pixel-grayscale" data where that makes sense, and a more traditional view when you need a 1-to-1 mapping with regular pixels.

Lubomir
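To make the InterpolatingChannel idea concrete, here is a minimal read-only sketch. Names are illustrative, it assumes the underlying channel reference exposes a get() accessor (as in the earlier sketch), and a real model would need to satisfy the rest of ChannelConcept.

    // Read-only channel presenting the average of two underlying channel
    // references, so that for an odd pixel C[x] = (C[x-1] + C[x+1]) / 2.
    // For even pixels, pass the same reference twice, as suggested above.
    template <typename ChannelRef, typename ChannelValue>
    class interpolating_channel {
        ChannelRef _left, _right;   // the two shared chroma samples
    public:
        interpolating_channel(ChannelRef left, ChannelRef right)
            : _left(left), _right(right) {}

        // Rounds to nearest; a generalized version could take weights.
        ChannelValue get() const {
            return ChannelValue(
                (unsigned(_left.get()) + unsigned(_right.get()) + 1) / 2);
        }

        // No set(): a single logical value cannot determine two shared
        // samples, which is why the v210 model is dropped to read-only.
    };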

Lubomir Bourdev wrote:
Stefan Heinzmann wrote:
Maybe it's just me, but I find extending GIL to support something like the v210 Quicktime format quite challenging (I don't want to imply that this is GIL's fault). This is a 10-bit YUV 4:2:2 format which stores 6 pixels in 16 bytes. This is an excellent example of a very complicated image format.
I thought I'd point out that another common sample structure with similar issues is the Bayer format. It's a popular optical filtering technique for capturing color images with a single CCD in digital cameras. It's structured as a checkerboard of green samples, with the remaining positions alternating between red and blue by row. It's typically exposed to software via RAW files (vendor-specific files available from most digital still cameras), and is also a format supported by the IIDC 1394-based Digital Camera Specification (at both 8 and 16 bits per sample).

Matt
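For readers unfamiliar with it, a sketch of the sampling pattern, assuming the common RGGB arrangement (vendors vary):

    // Which color the Bayer mosaic sample at (x, y) carries, for an
    // RGGB mosaic: even rows run R G R G ..., odd rows run G B G B ...,
    // so half of all samples are green.
    enum bayer_color { bayer_red, bayer_green, bayer_blue };

    inline bayer_color bayer_sample_color(int x, int y) {
        if (y % 2 == 0)
            return (x % 2 == 0) ? bayer_red : bayer_green;
        else
            return (x % 2 == 0) ? bayer_green : bayer_blue;
    }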
participants (8)
----------------
- Fernando Cacciola
- François Duranleau
- Joel de Guzman
- Lubomir Bourdev
- Matt Gruenke
- Michael Marcin
- Stefan Heinzmann
- Ullrich Koethe