
Mathias Gaunard wrote:
>> a framework that allowed buffering of "sensible size" chunks and potentially distributed the work between threads could be a good solution.
>
> If you perform n transformations, adaptors will give you loop fusion for free.
Maybe, subject to the fusion of all the levels' termination tests; I think this is what Dave has been talking about but I'm not knowledgeable about the area.
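(To make sure we mean the same thing by "loop fusion for free", here is a minimal sketch of what I understand by it, using boost::transform_iterator.  stage1 and stage2 are made-up stand-ins for real transformations; the point is that both are applied to each element in a single pass, with no intermediate container between the stages.)

  #include <vector>
  #include <boost/iterator/transform_iterator.hpp>

  int stage1(int x) { return x * 2; }   // stand-ins for real per-element work
  int stage2(int x) { return x + 1; }

  std::vector<int> run_fused(const std::vector<int>& in)
  {
      // Stacking two transform_iterators: one traversal of 'in', both
      // stages applied to each element, no vector between the stages.
      return std::vector<int>(
          boost::make_transform_iterator(
              boost::make_transform_iterator(in.begin(), &stage1), &stage2),
          boost::make_transform_iterator(
              boost::make_transform_iterator(in.end(), &stage1), &stage2));
  }

Of course, the stages in my example below are not per-element functions, which is where I start to struggle.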
> That kind of optimization seems more interesting to me than work distribution and buffering.
You're lucky if you get to work on "interesting" things, rather than "important" things :-)

Here's a practical example:

  cat email_with_attached_picture | decode_base64 | decode_jpeg | resize_image > /dev/framebuffer

How can I convert that shell pipeline into C++?  Naive approach:

  vector<byte> a = read_file("/path/to/email");
  vector<byte> b = decode_base64(a);
  vector<byte> c = decode_jpeg(b);
  vector<byte> d = resize_image(c);
  write_file("/dev/framebuffer",d);

The problem with that is that I don't start to decode anything until I've read in the whole of the input.  The system would be perceptibly faster if the decoding could start as soon as the first data were available.  So I can use some sort of iterator adaptor stack or dataflow graph to process the data piece at a time.

But it's important that I process it in pieces of the right size.  Base64 decoding converts each 4 input bytes into 3 output bytes, but it would be a bad idea to read the data from the file 4 bytes at a time; we should probably ask for BUFSZ bytes.  libjpeg works in terms of lines, and you can ask it (at runtime, after it has read the file header) how many lines it suggests processing at a time (it's probably the height of the DCT blocks in the image).  Obviously that corresponds to a variable number of bytes in the input.

I would love to see how readers would approach this problem using the various existing and proposed libraries.  (A very rough sketch of the sort of interface I'm imagining follows below my sig.)

Regards, Phil.
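P.S.  To make the question a bit more concrete, here is a very rough sketch of the shape of interface I have in mind.  Everything in it is made up for illustration (the stage/file_source/rechunk names, the pull-style read() that takes a size hint); it is not a proposal, and the rechunk stage only regroups bytes rather than actually decoding anything.  The interesting part is the size hint: for a jpeg stage it would be derived at run time from the number of scanlines libjpeg recommends per call, not a compile-time constant.

  #include <cstddef>
  #include <cstdio>
  #include <vector>

  typedef std::vector<unsigned char> bytes;

  // Pull-style stage: fill 'out' with roughly 'hint' bytes and return
  // false once the stream is exhausted.  Each stage may round the hint
  // to its own natural granularity.
  struct stage
  {
      virtual ~stage() {}
      virtual bool read(bytes& out, std::size_t hint) = 0;
  };

  // Source stage: reads raw chunks from a FILE*.
  struct file_source : stage
  {
      explicit file_source(std::FILE* f_) : f(f_) {}
      bool read(bytes& out, std::size_t hint)
      {
          out.resize(hint ? hint : 1);
          out.resize(std::fread(&out[0], 1, out.size(), f));
          return !out.empty();
      }
      std::FILE* f;
  };

  // Re-chunking stage: wraps an upstream stage and emits multiples of
  // 'grain' bytes (the way a base64 decoder insists on whole 4-byte
  // groups), buffering any remainder between calls.  A real stage would
  // also transform the bytes; this one only regroups them.
  struct rechunk : stage
  {
      rechunk(stage& upstream, std::size_t g) : up(upstream), grain(g) {}
      bool read(bytes& out, std::size_t hint)
      {
          bool eof = false;
          while (!eof && (pending.size() < grain || pending.size() < hint))
          {
              bytes more;
              if (up.read(more, hint)) pending.insert(pending.end(), more.begin(), more.end());
              else eof = true;
          }
          std::size_t n = (pending.size() / grain) * grain;
          if (n == 0 && eof) n = pending.size();   // trailing partial group at EOF
          if (n == 0) return false;                // nothing left
          out.assign(pending.begin(), pending.begin() + n);
          pending.erase(pending.begin(), pending.begin() + n);
          return true;
      }
      stage& up;
      std::size_t grain;
      bytes pending;
  };

  int main()
  {
      file_source src(stdin);
      rechunk b64ish(src, 4);   // base64-like 4-byte granularity
      bytes buf;
      // 4096 stands in for whatever the downstream consumer wants; for a
      // jpeg stage it would come from libjpeg's recommended scanline count.
      while (b64ish.read(buf, 4096))
          std::fwrite(&buf[0], 1, buf.size(), stdout);
      return 0;
  }

The bit I don't have a good answer for is how a jpeg stage, which only learns its preferred chunk size after it has parsed the header, should communicate that back up the chain, or whether a push/dataflow formulation handles that more naturally.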