
I decided not to use iterators because I concluded that it would miss the opportunity for many important optimizations that can be made when one is presented with a contiguous buffer full of characters instead of one character at a time. For example, I don't think zlib is very efficient if you pass it one character at a time.
Still ...
I wish I had looked at the Dataflow Iterators in your library when I first read your message! I put it on my mental 'to do' list then promptly forgot. It looks like Dataflow Iterators and input/output filters are intended to solve exactly the same sort of problems in many cases. So there are these obvious questions:
I didn't want to make a big issue of this because it amounts to comparing what currently the sketch of an idea (my dataflow iterators) to a fully developed and implemented idea (your iostreams). It's an unfair (to you) and often misleading type of comparison. So I see your library getting accepted more or less as it is because it works, well documented and addresses a real and pressing need. Dataflow iterators really became practical with Dave's new iterator façade/adaptor library - with its very formal definition. But, in spite of the fact that they have been used successfully in the serialization package for doing some of the things your library does, Dataflow iterators are a little too fragile for general use. Someday you or someone might want to re-implement some of your library with dataflow iterators - or not. At that point it would become an implementation issue. I'm very pleased that someone has seen the appeal of this idea. I sort of feel that I'm out there by myself with my ideas. I think its because I'm getting old. Anyway.
1. Where the functionality overlaps, which approach is faster?
One of the prime motivations for the dataflow iterators is gain absolutely maximum speed. This should occur by collapsing of composed inline functions by C++ optimizing compilers. My limited investigation into this convinces me that this occurs as I would hope.
2. What can your approach do that mine can't? 3. What can my approach do that yours can't?
Dataflow iterators is basically a compile-time composition concept while your method is basically a run-time composition concept. There in lies the difference Compile time composition a) compilers can collapse inline code for maximum speed b) each composition of filters is recompiled - can imply replicated almost the same code. Code bloat possible c) new chains can't be made at runtime under program control - its too late to recompile. Runtime composition a) filter chain can be built as needed under program control. b) each filter is compiled only once and reused. No code bloat c) no opportunity for "inter-filter" optimization - the code is compiled before the filters are composed.
4. If your approach is better all around, or in a broad class of cases, can I adapt my library to use it? (I think so -- the pipe notation, which you don't like, can give the compiler a clue which filters should be fused at compile time.)
My main complaint with the pipe notation is that, is doesn't permit the following idea to be expressed without destroying its original elegance: Encrypt( Cat( Stream 1, Stream 2 ) ) That is, that you have to break it (the pipe idea) in order to save it. Also I believe that implementing this at compile time by overloading operators would be a lot of work for mere syntactic sugar. In fact this hides the operation of your chaining scheme rather than keeping it transparent. This is the same reason I like to see the streambuf/stream clarified rather than hidden. Hiding is often a good idea - unix command line pipes - but I don't think its a great idea for programmer libraries. I want sharp tools whose application and usage are obvious so that I can compose them into what I need. I don't want a wrapped up solution. So I like a smaller library (e.g. streambuf) with examples (for cut and paste) on how to use it. I also think it's easier to document. I don't think you mess with your library in a major way. Separating out streams/buffers is useful conceptually but that's really mostly a matter of documentation. Basically you should finish what you started, let people start using it then at your leisure if you have the inclination look in this idea. 5. If my approach is better, can your library use it? (BTW, How do you use the Dataflow Iterators in your library?) My use case for this functionality is the following I want a zlib compression filter and an encryption filter. I want to be able to use them in a stream either as compress | encrypt or encrypt | compress. Your library can do that. So the problem is solved in the short term. Its interesting how my thinking has evolved on this. First I thought what I need was a stream adaptors to adapt a stream using filters. After I became more familiear with streams and streambufs I realized that this functionality didn't belong in streams but in streambufs - exactly where you put it. I was much encouraged by your efforts and approach - and Jeff Garland's previous accomplishment of adding zlib compression to a streambuf. Alas, I've now come to understand where codecvt facet fits in here. My current thinking - its just speculation at this point - is composition of codecvt facets using dataflow iterators. This would result in: a) a codecvt_facade - template for making a codecvt class given a dataflow iterator chain at compile time. b) dataflow iterators for encrypt, compress, base64, wchar_t to mbchar, mbchar to wchar_t, wchar_t to utf-8 etc. c) a method - codecvt_adaptor ? - for composition of arbitray codecvt facets. This would permit us to use other existing pre-compiled facets like translating to Chinese mult-byte characterset. So given the above, we could make a custom codecvt facet which would take the output, compress it, encrypt it, convert to base64, convert to Chinese character set with one compile time statement and use that facet with any stream. Except for the Chinese part which used a pre-compiled element, the compiler would optimize away all redundant copying. Naturally this codecvt facet on steroids could be used with any streambuf without alteration. (or even recompilation). So the "final division of effort would be" Streams - rendering types like int, float, etc as char or wchar strings. Streambuf - buffering Codecvt_facet - filtering/transformation Bear in mind that this glosses over details like the fact that there's a lot of annoying little differences in standard libraries in codecvt_facet interface, function prototypes, etc. GCC used 4 by wchar_t, window uses 2 byte wchar_t, endien issues, on and on and on. I've had problems with several libraries that I don't believe handle codecvt facets properly. Just using the utf-8 facet written by Ron Garcia - (nice job - no errors) has been a trial due to these issues.
I'm going to try to re-implement some of your iterators as Filters, and see what happens.
Feel free to experiment at your leisure. But don't get distracted. Starting something is easy - finishing something is very hard. You're in a position to get something of substance actually finished and in the hands of real users. Don't lose focus on that. In the serialization library - composition of dataflow iterators is used to implement the following: Escaping/unescaping html text Binary <-> base64 translation Wstring <-> string translation Maybe others - I don't remember The dataflow concept was once and object of research interest as a fundamental programming paradigm. It's somewhat to related to functional programming in this regard. My dataflow iterator implementation is somewhat similar to the FP package reviewed earlier this year in that it implements a different programming paradigm using the facilities of C++ templates. However, its much less ambitious and much more focused on immediate C++ problems rather producing a complete implementation of another programming paradigm. One thing is clear to me. We're going to see lots of new things in the future of programming. Much more than people think. Robert Ramey

"Robert Ramey" <ramey@rrsd.com> wrote in message
I wish I had looked at the Dataflow Iterators in your library when I first read your message! I put it on my mental 'to do' list then promptly forgot. It looks like Dataflow Iterators and input/output filters are intended to solve exactly the same sort of problems in many cases. So there are these obvious questions:
I didn't want to make a big issue of this because it amounts to comparing what currently the sketch of an idea (my dataflow iterators) to a fully developed and implemented idea (your iostreams). It's an unfair (to you) and often misleading type of comparison. So I see your library getting accepted more or less as it is because it works, well documented and addresses a real and pressing need. Dataflow iterators really became practical with Dave's new iterator façade/adaptor library - with its very formal definition. But, in spite of the fact that they have been used successfully in the serialization package for doing some of the things your library does, Dataflow iterators are a little too fragile for general use. Someday you or someone might want to re-implement some of your library with dataflow iterators - or not. At that point it would become an implementation issue.
I didn't mean to suggested that I was considering a redesign of my library to use iterators. There are several reasons I chose not to do this, some of which I mentioned already: 1. Many operations are more efficient when they can act on an array of characters at a time 2. I didn't want user defined components to have to manage the lifetime the immediately downstream component, since there are some tricky issues in gerenal. I prefer to pass the downstream component as an argument each time it is needed: filter.write(resource, s, n) Obviously this doesn't work for iterators. 3. When the filter and resource types are composed at compile time, it's difficult to decide what the constructor arguments should be. I see that with some of your iterators, you solve this problem using additional template parameters; this won't work for me, since constructors sometimes have to take strings, random number generators, etc. ((3) is solved easily using the pipe syntax: filtering_stream out( newline_inserter(7) | gzip_compressor(gzip::best_speed) | file("output") ); ) The reason I'm excited about your Dataflow Iterators is that they give me a chance to test my intuitions about the performance implications of composing filters at compile time. Until now, I conjectured that compile-time composition wouldn't make much difference. If I implement some of your filtering algorithms in my framework, I can run a head-to-head test. BTW, you say the comparison is unfair to me, but it might be the other way around since I've already spent a considerable amount of time on optimizations. If your filters are nonetheless faster, that shows me I need to consider compile-time composition.
I'm very pleased that someone has seen the appeal of this idea. I sort of feel that I'm out there by myself with my ideas. I think its because I'm getting old. Anyway.
1. Where the functionality overlaps, which approach is faster?
One of the prime motivations for the dataflow iterators is gain absolutely maximum speed. This should occur by collapsing of composed inline functions by C++ optimizing compilers. My limited investigation into this convinces me that this occurs as I would hope.
What other approaches did you try?
2. What can your approach do that mine can't? 3. What can my approach do that yours can't?
Dataflow iterators is basically a compile-time composition concept while your method is basically a run-time composition concept. There in lies the difference
<snip runtime/compile-time comparsion> I was thinking iterators might allow additional flexibility since the value types don't have to match exactly.
4. If your approach is better all around, or in a broad class of cases, can I adapt my library to use it? (I think so -- the pipe notation, which you don't like, can give the compiler a clue which filters should be fused at compile time.)
My main complaint with the pipe notation is that, is doesn't permit the following idea to be expressed without destroying its original elegance:
Encrypt( Cat( Stream 1, Stream 2 ) )
This would be: filtering_ostream out(encryptor() | concatenate(out1, out2)); Looks fine to me. (Currently, in my framework, concatenation only works for input.)
That is, that you have to break it (the pipe idea) in order to save it. Also I believe that implementing this at compile time by overloading operators would be a lot of work for mere syntactic sugar.
I've discovered that it's not just sugar, though. Rob Stewart pointed out that specifying a sequence of filters and resources all at once allows some compile-time errors which would otherwise show up only at runtime. Also, I may implement the idea of an in-place filter, which uses the buffer of the next filter in the chain rather than its own buffer. The two filters can be fused at complie time if they are pushed in one statement, using the pipe notation, but not if they are pushed separately.
clarified rather than hidden. Hiding is often a good idea - unix command line pipes - but I don't think it's a great idea for programmer libraries. I want sharp tools whose application and usage are obvious so that I can compose them into what I need.
Here's one way the 'hiding' manifests itself, which just occured to me: filtering_ostream out(tee(log_stream) | encryptor() | concatenate(out1, out2)); std::cout << "chain length = " << out.size() << "\n"; This could print 1, 2 or 3 depending on which filters the compiler decides to fuse.
I don't think you mess with your library in a major way. Separating out streams/buffers is useful conceptually but that's really mostly a matter of documentation. Basically you should finish what you started, let people start using it then at your leisure if you have the inclination look in this idea.
I'm not planning any major time investement in the library, if it's accepted, over and above improvements decided during the review. However, adding the pipe functionality took about half an hour (and it actually worked the first time I tried to use it). I'm interested to see if I can add some compile-time fusion of adjacent filters without spending more than a few hours or days.
Alas, I've now come to understand where codecvt facet fits in here. My current thinking - its just speculation at this point - is composition of codecvt facets using dataflow iterators. This would result in:
I don't think this is a good idea: codevts use virtual functions heavily, and they are very difficult to write properly. (more below)
a) a codecvt_facade - template for making a codecvt class given a dataflow iterator chain at compile time.
I've had the idea of a codecvt_facade twice (the second time I forgot that I had tried it before). Both times I ran into the following problem: codevct's basically have to store all their state in the state type that is passed as an argument to each member function, and in many cases users are allowed to assume that the state type is mbstate_t. This makes it hard to do something like this: struct Filter { // typedefs result_type in( const char* from, const char* from_end, const char*& from_next, char* to, char* to_limit, char*& to_next ); // other members }; typedef codevt_facade<Filter> my_codevt;
b) dataflow iterators for encrypt, compress, base64, wchar_t to mbchar, mbchar to wchar_t, wchar_t to utf-8 etc.
c) a method - codecvt_adaptor ? - for composition of arbitray codecvt facets. This would permit us to use other existing pre-compiled facets like translating to Chinese mult-byte characterset.
My library offers something very similar. It's called a SymmetricFilter (http://tinyurl.com/3lz82.) Essentially, it represents a generic version of codecvt. SymmetricFilters are used to implement wrappers for zlib and bzip2. Right now, they involve more copying than necessary, but with a planned optimization they will become one of the most efficient types of filters. The problem is that they are very difficult to write correctly. The simple test class toupper_symmetric_filter ( http://tinyurl.com/6bu23), took me two hours to get working properly. (The buffering is not necessary -- I added it to produce a better test case).
So given the above, we could make a custom codecvt facet which would take the output, compress it, encrypt it, convert to base64, convert to Chinese character set with one compile time statement and use that facet with any stream. Except for the Chinese part which used a pre-compiled element, the compiler would optimize away all redundant copying.
Naturally this codecvt facet on steroids could be used with any streambuf without alteration. (or even recompilation).
So the "final division of effort would be"
Streams - rendering types like int, float, etc as char or wchar strings. Streambuf - buffering Codecvt_facet - filtering/transformation
I disagree, as explained above. Ultimately, I'd like the division to be: stream_facade - formatting template< typename FilterOrResource, <---filtering typename Buffering = ..., <----buffering > struct streambuf_facade;
I'm going to try to re-implement some of your iterators as Filters, and see what happens.
Feel free to experiment at your leisure. But don't get distracted. Starting something is easy - finishing something is very hard. You're in a position to get something of substance actually finished and in the hands of real users. Don't lose focus on that.
I've been planning to write many of these filters for quite a while, but have forced myself to wait until I know whether the library is accepted. However, if it looks like I can steal a good part of the implementation from serialization, I may no longer be able to resist the temptation. ;-)
In the serialization library - composition of dataflow iterators is used to implement the following:
Escaping/unescaping html text Binary <-> base64 translation Wstring <-> string translation Maybe others - I don't remember
The dataflow concept was once and object of research interest as a fundamental programming paradigm.
One thing is clear to me. We're going to see lots of new things in the future of programming. Much more than people think.
I'm sure. Jonathan
participants (2)
-
Jonathan Turkanis
-
Robert Ramey