[gil] Ideas to make my extension more generic?

Hi there, in the course of getting more familiar with gil 2.0 I started writing a new extension for downsampling (don't know if that's the right term) a source image to its unsigned 8-bit counterpart. You would use such an extension when trying to save an image that uses more than 8 bits per channel, e.g. rgb32_image_t. Some file formats only take unsigned 8 bits per channel. Attached is the code I have so far, but there are still some areas where my extension isn't generic enough, or even where the algorithm fails due to my own shortcomings. Here are some questions I have:

1. How do I figure out if the src channel type is signed? If it is I would like to add half the value range to all channels and pixels.

2. When dealing with heterogeneous pixel types how can I initialize them generically?

3. How do I compute the unsigned 8bit image type of the src image type? I'm looking for xxx8_image_t.

Thanks in advance. Any ideas or suggestions are welcome.

Christian

Christian,

First off, I think the semantics of color_convert already handle this. That function maps the min and max value of the source channel type to the min and max value of the target channel type.

1. I think you can use type traits. boost::is_integral<T>::value tells you if it is an integral type, and boost::is_signed tells you whether or not it is -- well, signed. Just #include <boost/type_traits.hpp>.

2. I think you can write a multifunction and pass it into boost::gil::static_for_each. Create a class and make a templatized member function operator()(T1 src, T2& dst) that does the special conversion you want. Use enable_if and disable_if from <boost/utility.hpp> to specialize the member function.

3. the xxx8_image_t is not a real type -- it is a typedef from image<pixel<uint8, xxx_layout_t>, false>.

John Femiani
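P.S. For (2), roughly something like this (untested; the floating-point math and the fixed 0..255 target are only meant to illustrate the idea, and it assumes plain built-in channel types):

#include <boost/gil/gil_all.hpp>
#include <boost/type_traits.hpp>
#include <boost/utility/enable_if.hpp>

// per-channel converter: one overload for signed sources, one for unsigned,
// selected at compile time via enable_if / disable_if
struct to_unsigned8
{
    template <typename T1, typename T2>
    typename boost::enable_if<boost::is_signed<T1> >::type
    operator()( T1 src, T2& dst ) const
    {
        // shift the signed range up by half before scaling down to 0..255
        double min = boost::gil::channel_traits<T1>::min_value();
        double max = boost::gil::channel_traits<T1>::max_value();
        dst = static_cast<T2>(( src - min ) / ( max - min ) * 255.0 );
    }

    template <typename T1, typename T2>
    typename boost::disable_if<boost::is_signed<T1> >::type
    operator()( T1 src, T2& dst ) const
    {
        double max = boost::gil::channel_traits<T1>::max_value();
        dst = static_cast<T2>( src / max * 255.0 );
    }
};

// usage: boost::gil::static_for_each( src_pixel, dst_pixel, to_unsigned8() );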

Thanks John, my thoughts are below. On 4/23/07, John Femiani <JOHN.FEMIANI@asu.edu> wrote:
Christian,
First off, I think the semantics of color_convert already handle this. That function maps the min and max value of the source channel type to the min and max value of the target channel type.
When converting an image there is no guarantee, for example, that the red channel contains values that actually reach the type's min or max. Maybe the min is 92 or the max is 21000 for an rgb16_pixel_t. That's why I explicitly look for the min and max values.
1. I think you can use type traits. boost::is_integral<T>::value tells you if it is an integral type, and boost::is_signed tells you whether or not it is -- well, signed. Just #include <boost/type_traits.hpp>.
Awesome, that works.

typedef channel_type<rgb8_pixel_t>::type  unsigned_t;
typedef channel_type<rgb8s_pixel_t>::type signed_t;

// prints 0
cout << is_signed<unsigned_t>::value << endl;

// prints 1
cout << is_signed<signed_t>::value << endl;
2. I think you can write a multifunction and pass it into boost::gil::static_for_each. Create a class and make a templatized member function operator()(T1 src, T2& dst) that does the special conversion you want. Use enable_if and disable_if from <boost/utility.hpp> to specialize the member function.
What do you mean by multifunction? Do you have an example somewhere? I don't understand.
3. the xxx8_image_t is not a real type -- it is a typedef from image<pixel<uint8, xxx_layout_t>, false>.
How do you get the layout type of a view? Again, thanks a lot for the answers, Christian

Christian,
2. I think you can write a multifunction and pass it into boost::gil::static_for_each. Create a class and make a templatized member function operator()(T1 src, T2& dst) that does the special conversion you want. Use enable_if and disable_if from <boost/utility.hpp> to specialize the member function.
What do you mean by multifunction? Do you have an example somewhere? I don't understand.
I may have misused the word; I just meant a class with a template operator(). John.

Christian Henning wrote:
Hi there, in the course of getting more familiar with gil 2.0 I started writing a new extension for downsampling (don't know if that's the right term) a source image to its unsigned 8-bit counterpart.
Hi Christian, It is great to see someone take a stab at providing more GIL algorithms! You are right that downsample is not the best name for this operation. This term is used for a different operation. You are scaling up the histogram to the full range, so maybe scale_histogram or spread_histogram is better.
You would use such an extension when trying to save an image that uses more than 8 bits per channel, e.g. rgb32_image_t. Some file formats only take unsigned 8 bits per channel.
While this algorithm is useful for increasing the contrast of the image, I wouldn't use it as the default when saving higher bit depth images to 8-bit formats. If you want to preserve as much as possible the original look of the image I suggest using color_converted_view to a form supported by the I/O format.
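For example, something along these lines (just a sketch; the bmp call stands in for whatever write function your I/O code provides):

#include <boost/gil/gil_all.hpp>
// plus the header of your bmp I/O extension that declares bmp_write_view

using namespace boost::gil;

int main()
{
    rgb16_image_t img( 640, 480 );
    // ... fill img ...

    // save as 8-bit without rescaling the histogram;
    // the original brightness and contrast are preserved
    bmp_write_view( "out.bmp", color_converted_view<rgb8_pixel_t>( const_view( img )));
}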
1. How do I figure out if the src channel type is signed? If it is I would like to add half the value range to all channels and pixels.
As suggested, you may want to use channel_convert. It linearly maps the range of the source into the destination. It will add half the range when going from signed to the corresponding unsigned integral channel.
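For example (a quick sketch; the exact rounded values may be off by one):

#include <boost/gil/gil_all.hpp>
#include <iostream>
using namespace boost::gil;

int main()
{
    bits16s s_min = -32768, s_zero = 0, s_max = 32767;

    // the full signed 16-bit range is mapped linearly onto 0..255
    std::cout << (int)channel_convert<bits8>( s_min  ) << std::endl;  // 0
    std::cout << (int)channel_convert<bits8>( s_zero ) << std::endl;  // about 128 (the half-range offset)
    std::cout << (int)channel_convert<bits8>( s_max  ) << std::endl;  // 255
}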
2. When dealing with heterogeneous pixel types how can I initialize them generically?
It is hard to write code that supports heterogeneous pixels. Since each channel may be of a different type you cannot write an explicit loop. The only way to loop is via static recursion (see static_for_each as an example). You may use GIL's static_xxx algorithms or write your own compile-time recursion.
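For example, a channel-level function object like this sets every channel of a (possibly heterogeneous) pixel to its own minimum value (sketch):

#include <boost/gil/gil_all.hpp>

// the channel type is deduced separately for every channel,
// so this works even when the channels have different types
struct set_to_min
{
    template <typename Channel>
    void operator()( Channel& c ) const
    {
        c = boost::gil::channel_traits<Channel>::min_value();
    }
};

// usage (SomePixel stands for any GIL pixel type):
//   SomePixel p;
//   boost::gil::static_for_each( p, set_to_min() );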
3. How do I compute the unsigned 8bit image type of the src image type? I'm looking for xxx8_image_t.
If image_in_t is your input image type:

typedef typename color_space_type<image_in_t>::type     cs_t;
typedef typename channel_mapping_type<image_in_t>::type cnm_t;
typedef pixel<bits8, layout<cs_t,cnm_t> >               pixel_out_t;
typedef image<pixel_out_t, is_planar<image_in_t>::type::value> image_out_t;

But in your case you probably want to operate on views, not images. Also, you may want to leave the output channel type as a template parameter instead of hard-coding it to 8-bit unsigned.

Other suggestions regarding your code:

1. A better histogram spread computes the image histogram and uses the top/bottom N% value as the min/max value, instead of the absolute minimum and maximum. This is more robust to noise.

2. Consider taking out the "min pixel value" and "max pixel value" as separate stand-alone algorithms. They could be used in other contexts.

3. You might be able to use std::min_element, std::max_element, boost::minmax_element coupled with nth_channel_view to find the min/max value for each channel in the image. It is simpler though it might be slower for interleaved images because we will be revisiting the pixels of the image K times (K == number of channels).

4. Never take a non-constant reference to a view. Image views are always constant. Mutability is part of their type. (line 65)

5. Don't assume that the reference type is value_type& (lines 91,120,121). In fact, this doesn't hold true for planar images. Use View::reference or View::const_reference as the return type of dereferencing the view iterators.

6. I would suggest making the transformation dst_max * (src - min) / (max - min) as a per-channel function object and using static_for_each to apply it to each channel. This is not only faster but will work for heterogeneous channels. Another advantage is that performance specializations can be provided for pairs of source/destination channels. You can then wrap it into a function object per pixel and use gil::for_each_pixel.

Lubomir
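P.S. A rough, untested sketch of what I mean in (3); the comparator and the helper name are just placeholders:

#include <algorithm>
#include <boost/gil/gil_all.hpp>
using namespace boost::gil;

// compare two single-channel pixels by their only channel
struct first_channel_less
{
    template <typename P1, typename P2>
    bool operator()( const P1& a, const P2& b ) const { return at_c<0>(a) < at_c<0>(b); }
};

// minimum value of channel k of a homogeneous view (interleaved or planar)
template <typename View>
typename channel_type<View>::type min_channel_value( const View& v, int k )
{
    typedef typename nth_channel_view_type<View>::type channel_view_t;
    channel_view_t cv = nth_channel_view( v, k );
    return at_c<0>( *std::min_element( cv.begin(), cv.end(), first_channel_less() ));
}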

Hi Lubomir, thanks a lot for the tips. It took me a couple of days to implement them. Please read my comments below.
This term is used for a different operation. You are scaling up the histogram to the full range, so maybe scale_histogram or spread_histogram is better.
I cannot quite follow you here. Could you elaborate on what you mean by scaling up the histogram to the full range?
While this algorithm is useful for increasing the contrast of the image, I wouldn't use it as the default when saving higher bit depth images to 8-bit formats. If you want to preserve as much as possible the original look of the image I suggest using color_converted_view to a form supported by the I/O format.
Again, useful for increasing the contrast? It seems my limited knowledge is catching up with me here. Could you also please describe what you mean?
Other suggestions regarding your code:
1. A better histogram spread computes the image histogram and uses the top/bottom N% value as the min/max value, instead of the absolute minimum and maximum. This is more robust to noise.
How would you implement it? I started my own implementation that would create a vector<channel_t> for each channel. But I was running into problems all over. I tried creating an mpl::vector that contains a std::vector<channel_t> for each channel using the mpl::transform1 algorithm. That works flawlessly; for an rgb8_view_t it would create something like mpl::vector< std::vector<uint8>, std::vector<uint8>, std::vector<uint8> >.

template <typename T> struct add_vector { typedef std::vector<T> type; };

template< class PIXEL, class VIEW >
inline PIXEL min_channel_values( const VIEW& view, const size_t percent = 1 )
{
   typedef typename mpl::transform1< typename color_space_type<VIEW>::type
                                   , add_vector<mpl::_1> >::type channels_t;

   return PIXEL();
}

Right now, I don't understand how to create such an object where all the vectors are resized to the correct sizes. Do you have any ideas on how to do that? With this structure in place it's easy to accumulate the channel values for the whole view and to find the N% values. Can you follow my design?
2. Consider taking out the "min pixel value" and "max pixel value" as separate stand-alone algorithms. They could be used in other contexts.
Done. I have one for the min, one for the max, and one for both values. The same will be done for the N% ones.
3. You might be able to use std::min_element, std::max_element, boost::minmax_element coupled with nth_channel_view to find the min/max value for each channel in the image. It is simpler though it might be slower for interleaved images because we will be revisiting the pixels of the image K times (K == number of channels).
I like my current implementation, and it works for planar and interleaved images. For example, for the min channel value I have:

struct set_to_max
{
   template <typename CHANNEL >
   void operator()( CHANNEL& c ) const
   {
      c = channel_traits< CHANNEL >::max_value();
   }
};

template< class PIXEL, class VIEW >
inline PIXEL min_channel_values( const VIEW& view )
{
   PIXEL min;

   // initialize the min pixel with the max values
   static_for_each( min, set_to_max() );

   for( int y = 0; y < view.height(); ++y )
   {
      typename VIEW::x_iterator x_it = view.row_begin( y );

      for( int x = 0; x < view.width(); ++x )
      {
         typename PIXEL::const_reference p = x_it[x];

         for( int i = 0; i < num_channels<PIXEL>::type::value; ++i )
         {
            if( dynamic_at_c( p, i ) < dynamic_at_c( min, i ))
            {
               dynamic_at_c( min, i ) = dynamic_at_c( p, i );
            }
         }
      }
   }

   return min;
}
4. Never take a non-constant reference to a view. Image views are always constant. Mutability is part of their type.(line 65)
done
5. Don't assume that the reference type is value_type& (lines 91,120,121). In fact, this doesn't hold true for planar images. Use View::reference or View::const_reference as the return type of dereferencing the view iterators.
done
6. I would suggest making the transformation dst_max * (src - min) / (max - min) as a per-channel function object and using static_for_each to apply it to each channel. This is not only faster but will work for heterogeneous channels. Another advantage is that performance specializations can be provided for pairs of source/destination channels.
You can then wrap it into a function object per pixel and use gil::for_each_pixel
I tried it but static_for_each only supports up to three pixels as parameters. But I think I need four. One for the current src pixel, current dst, src_min, and src_diff. Here is what I have:

// channel_wise calculation.
// @todo need better name
template< typename DST_MAX >
struct foo
{
   foo( const DST_MAX& dst_max )
   : _dst_max( dst_max ) {}

   template< typename DST_CHANNEL
           , typename SRC_CHANNEL >
   void operator()( DST_CHANNEL& dst
                  , SRC_CHANNEL& src
                  , SRC_CHANNEL& min
                  , SRC_CHANNEL& diff ) const
   {
      if( diff == 0 )
      {
         dst = 0;
         return;
      }

      float d = static_cast<float>( _dst_max )
              * (( static_cast<float>( src ) - static_cast<float>( min ))
              /    static_cast<float>( diff ));

      dst = static_cast<DST_CHANNEL>( d );
   }

   DST_MAX _dst_max;
};

// pixel_wise calculation.
// @todo need better name
template< typename SRC_VIEW
        , typename SRC_PIXEL
        , typename DST_VIEW
        , typename DST_MAX >
struct do_it
{
   do_it( const DST_VIEW&  dst_view
        , const DST_MAX&   dst_max
        , const SRC_PIXEL& min
        , const SRC_PIXEL& diff )
   : _dst_view( dst_view )
   , _min( min )
   , _diff( diff )
   , _op( dst_max )
   {
      _dst_it = dst_view.begin();
   }

   void operator()( SRC_PIXEL& src )
   {
      static_for_each( *_dst_it, src, _min, _diff, _op );
      ++_dst_it;
   }

   typename DST_VIEW::iterator _dst_it;
   foo<DST_MAX>                _op;
   DST_VIEW                    _dst_view;
   SRC_PIXEL                   _min;
   SRC_PIXEL                   _diff;
};

int main( int argc, char* argv[] )
{
   bits8 max = 255;
   rgb16_pixel_t min;
   rgb16_pixel_t diff;

   rgb16_image_t src( 640, 480 );
   rgb8_image_t  dst( 640, 480 );

   do_it< rgb16_view_t
        , rgb16_pixel_t
        , rgb8_view_t
        , bits8 > d( view( dst ), max, min, diff );

   for_each_pixel( view( src ), d );

   bmp_write_view( ".\\red.bmp", view( dst ));
}

Thank you so much for your time.

Christian

On 4/26/07 12:33 PM, "Christian Henning" <chhenning@gmail.com> wrote:
I cannot quite follow you here. Could you elaborate on what you mean by scaling up the histogram to the full range?
Suppose you have a 16-bit grayscale image that is dark. Most of the colors will be close to black. Let's say they vary from 0/65535 to 300/65535. Your algorithm will scale their values so that 300/65535 will become 65535/65535, i.e. fully bright. Instead of using just 300 possible values, the image will use the full range. I suspect you scale the histogram so that you can decrease the loss of precision when converting the values to 8-bit range (because they now map to the full 0..255 range, instead of just the first few values).
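To put rough numbers on it: with min = 0 and max = 300, the mapping dst = dst_max * (src - min) / (max - min) sends 300 to 65535 and 150 to roughly 32767 (mid-gray), which then survives an 8-bit conversion as roughly 127. Without the spread, a straight channel conversion collapses the entire 0..300 range onto just the 8-bit values 0 and 1.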
While this algorithm is useful for increasing the contrast of the image, I wouldn't use it as the default when saving higher bit depth images to 8-bit formats. If you want to preserve as much as possible the original look of the image I suggest using color_converted_view to a form supported by the I/O format.
Again, useful for increasing the contrast? It seems my limited knowledge is catching up with me here. Could you also please describe what you mean?
Because the values cover the full range 0..65535, the distance between two neighboring intensities is larger and thus the contrast of the image increases. Unfortunately, one side effect is that the brightness of the image changes. So your very dark image may look a lot brighter after the transformation. This is why I suggested that by default we use color_converted_view, which will preserve the original brightness and contrast of the image.
Other suggestions regarding your code:
1. A better histogram spread computes the image histogram and uses the top/bottom N% value as the min/max value, instead of the absolute minimum and maximum. This is more robust to noise.
How would you implement it? I started my own implementation that would create a vector<channel_t> for each channel. But I was running into problems all over.
I haven't thought much about this, but this is the k-th order statistic (in an array of numbers, find the k-th smallest one). There is an algorithm to do this in O(N). This would be rather useful; perhaps someone should put it in Boost?
I tried creating an mpl::vector that contains a std::vector<channel_t> for each channel using the mpl::transform1 algorithm. That works flawlessly; for an rgb8_view_t it would create something like mpl::vector< std::vector<uint8>, std::vector<uint8>, std::vector<uint8> >.
template <typename T> struct add_vector { typedef std::vector<T> type; };
template< class PIXEL, class VIEW >
inline PIXEL min_channel_values( const VIEW& view, const size_t percent = 1 )
{
   typedef typename mpl::transform1< typename color_space_type<VIEW>::type
                                   , add_vector<mpl::_1> >::type channels_t;
return PIXEL(); }
Right now, I don't understand how to create such an object where all the vectors are resized to the correct sizes. Do you have any ideas on how to do that?
My suggestion is to ignore the fact that there are multiple channels in a pixel and just write the algorithm to operate on a grayscale image. It is then trivial to extend it by using nth_channel_view.
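For example (untested sketch; spread_histogram stands for whatever your single-channel algorithm ends up being called):

#include <boost/gil/gil_all.hpp>
using namespace boost::gil;

// run the grayscale algorithm once per channel
template <typename SrcView, typename DstView, typename GrayAlgorithm>
void apply_per_channel( const SrcView& src, const DstView& dst, GrayAlgorithm alg )
{
    for( int c = 0; c < num_channels<SrcView>::value; ++c )
        alg( nth_channel_view( src, c ), nth_channel_view( dst, c ));
}

// usage: apply_per_channel( const_view( src ), view( dst ), spread_histogram() );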
You can then wrap it into a function object per pixel and use gil::for_each_pixel
I tried it but static_for_each only supports up to three pixels as parameters. But I think I need four. One for the current src pixel, current dst, src_min, and src_diff.
You don't need four. min and diff don't change per channel, so you can keep them in the state of your function object.

Your algorithm makes most sense for grayscale images, and will do a reasonable job for RGB as well. However, in general, for many algorithms you don't want to treat the channels of a pixel as a simple vector of channels. In particular, for color spaces whose basis vectors are not orthogonal (like CMYK) such linear interpolation per channel will produce undesired effects. I suggest that you make your algorithm operate on grayscale images for now. Ideally it makes sense to do the algorithm on the luminosity and leave the hue alone.

Lubomir
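P.S. Here is roughly what I have in mind, as an untested sketch for the grayscale case (all the names are placeholders):

#include <boost/gil/gil_all.hpp>
using namespace boost::gil;

// per-channel transform; min, diff and dst_max live in the functor state,
// so static_for_each only needs the destination and source pixels
struct spread_channel
{
    float _min, _diff, _dst_max;

    spread_channel( float min, float diff, float dst_max )
    : _min( min ), _diff( diff ), _dst_max( dst_max ) {}

    template <typename DstChannel, typename SrcChannel>
    void operator()( DstChannel& dst, const SrcChannel& src ) const
    {
        dst = _diff == 0.f ? DstChannel( 0 )
            : static_cast<DstChannel>( _dst_max * ( static_cast<float>( src ) - _min ) / _diff );
    }
};

// per-pixel wrapper for transform_pixels
template <typename DstPixel>
struct spread_pixel
{
    spread_channel _op;
    explicit spread_pixel( const spread_channel& op ) : _op( op ) {}

    template <typename SrcPixel>
    DstPixel operator()( const SrcPixel& src ) const
    {
        DstPixel dst;
        static_for_each( dst, src, _op );
        return dst;
    }
};

// usage:
//   transform_pixels( const_view( src16 ), view( dst8 ),
//                     spread_pixel<gray8_pixel_t>( spread_channel( min, diff, 255.0f )));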

Lubomir, Regarding your comment:
I haven't thought much about this, but this is the k-th order statistic (in an array of numbers, find the k-th smallest one). There is an algorithm to do this in O(N). This would be rather useful; perhaps someone should put it in Boost?
It is part of the stl already: std::nth_element in the <algorithm> header.
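Something along these lines (untested) would give the bottom and top N% cut-off values from one channel's samples:

#include <algorithm>
#include <utility>
#include <vector>

// values is taken by copy because nth_element reorders it;
// assumes a non-empty vector and percent < 50
template <typename Channel>
std::pair<Channel, Channel>
percentile_min_max( std::vector<Channel> values, std::size_t percent )
{
    std::size_t k = values.size() * percent / 100;   // samples to ignore at each end

    std::nth_element( values.begin(), values.begin() + k, values.end() );
    Channel lo = values[k];

    std::nth_element( values.begin(), values.begin() + ( values.size() - 1 - k ), values.end() );
    Channel hi = values[ values.size() - 1 - k ];

    return std::make_pair( lo, hi );
}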
My suggestion is to ignore the fact that there are multiple channels in a pixel and just write the algorithm to operate on a grayscale image. It is then trivial to extend it by using nth_channel_view.
But nth_channel_view does not work for heterogeneous pixels; runtime specification of the channel requires that all channels be of the same type. As you said in an earlier post, 'n' means runtime, 'k' means compile time. This is exactly the type of issue I have been having when I try to write code that is generic enough to apply to heterogeneous pixels. Unfortunately I have started writing code that simply precludes heterogeneous pixels (using nth_channel_view or operator[] to loop through channels) because I was spending too much time trying to write functors for every operation I needed to do on an image.

I posted a message on the sourceforge discussion and I will summarize it here: I saw the recent posting on fixed point (or scaled value) types in the works by Phil Endecott, and apparently another version by Maurizio Vitale, and I was wondering whether GIL folks thought this could be useful for describing individual samples in gil. Instead of using bits8, one could define a pixel in terms of scaled<uint8_t, 255>, or scaled<uint16_t, 255>, etc. Operations on the pixels can deduce the return type for each channel using the methods applied by the fixed point authors. This way each channel, regardless of its type, should produce expected results for an expression without all the tweaking required when dealing with integer types directly. Any operation on a pixel could be applied to each of its channels, like in valarray, and code that uses operations on pixels is automatically generic wrt heterogeneous layouts as long as all channels have similar semantic meanings.

John

John, On 4/29/07 12:41 AM, "John Femiani" <JOHN.FEMIANI@asu.edu> wrote:
Lubomir, Regarding your comment:
I haven't thought much about this, but this is the k-th order statistic (in an array of numbers, find the k-th smallest one). There is an algorithm to do this in O(N). This would be rather useful; perhaps someone should put it in Boost?
It is part of the stl already: std::nth_element in the <algorithm> header.
My understanding is that std::nth_element partially sorts the range, suggesting O(NlogN) running time. There are algorithms to return the n-th smallest element in an unordered list in O(N).
My suggestion is to ignore the fact that there are multiple channels in a pixel and just write the algorithm to operate on a grayscale image. It is then trivial to extend it by using nth_channel_view.
But nth_channel_view does not work for heterogeneous pixels; runtime specification of the channel requires that all channels be of the same type. As you said in an earlier post, 'n' means runtime, 'k' means compile time.
True. But it should be easy to create kth_channel_view, in which the channel index is specified at compile time. And this should work for heterogeneous pixels.
This is exactly the type of issue I have been having when I try to write code that is generic enough to apply to heterogeneous pixels. Unfortunately I have started writing code that simply precludes heterogeneous pixels (using nth_channel_view or operator[] to loop through channels) because I was spending too much time trying to write functors for every operation I needed to do on an image.
I agree it is hard.
I posted a message on the sourceforge discussion and I will summarize it here:
I saw the recent posting on fixed point (or scaled value) types in the works by Phil Endecott, and apparently another version by Maurizio Vitale, and I was wondering whether GIL folks thought this could be useful for describing individual samples in gil. Instead of using bits8, one could define a pixel in terms of scaled<uint8_t, 255>, or scaled<uint16_t, 255>, etc. Operations on the pixels can deduce the return type for each channel using the methods applied by the fixed point authors. This way each channel, regardless of its type, should produce expected results for an expression without all the tweaking required when dealing with integer types directly. Any operation on a pixel could be applied to each of its channels, like in valarray, and code that uses operations on pixels is automatically generic wrt heterogeneous layouts as long as all channels have similar semantic meanings.
I just posted my reply to the sourceforge thread: http://sourceforge.net/forum/message.php?msg_id=4286563

To summarize it, nothing stops you from using Phil's fixed point numbers as GIL channels. However, I don't think there is a generic way to perform arithmetic operations on arbitrary types with the same performance as hand-writing the operation for a given type. In my opinion, this is best handled by providing a set of atomic operations that have overloads or specializations that are tuned to the specific types involved.

Lubomir

Lubomir Bourdev wrote:
John,
On 4/29/07 12:41 AM, "John Femiani" <JOHN.FEMIANI@asu.edu> wrote:
Lubomir, Regarding your comment:
I haven't thought much about this, but this is the k-th order statistic (in an array of numbers, find the k-th smallest one). There is an algorithm to do this in O(N). This would be rather useful; perhaps someone should put it in Boost?
It is part of the stl already: std::nth_element in the <algorithm> header.
My understanding is that std::nth_element partially sorts the range, suggesting O(NlogN) running time. There are algorithms to return the n-th smallest element in an unordered list in O(N).
std::nth_element has (average) linear complexity.
participants (4)
- Christian Henning
- John Femiani
- Lubomir Bourdev
- Peter Dimov