Re: [boost] IOStreams Formal review -- Guide for Reviewers

First, let me apologize for not being able to review the actual code (yet). The interface and correctness/performance of implementation are all I really care about for now :)
There are some special cases where the copying is wasteful, though. For instance:
1. If filter1 is just a passive observer, simply counting the number of occurrences of '\n', or copying the data to a logging stream, then the end user should really be writing directly to buf2. Filter1 could process the data using the same buffer as filter2.
2. Same as above, but filter1 also modifies the data in-place, making character-by-character modifications. (E.g., a toupper filter). This can be handled the same way.
3. If 'resource' is a stream or stream buffer, it could be assumed to do its own buffering. In that case, filter2 should be writing directly to resource, instead of to streambuf3:
    filter2.write(resource, buf2, buf2 + n2)
These three cases can be handled easily by modifying the existing framework. I didn't add special treatment because it occurred to me rather late in development.
I'd certainly feel a bit more proud of the library if it handled these cases (1 and 3 seem most important). It seems well worth a few days' delay.
There is another class of cases in which the current setup is wasteful, but I think it is rather domain-specific:
4. Most filters in the chain modify only small parts of character sequences, leaving big chunks unchanged.
Well, basic_newline_filter would do this - at least when replacing CRLF with a single '\n' character.
To optimize case 4 would require a major library extension. My feeling is that it is not necessary at this point, but I'd like to know what others think.
It should certainly wait if you don't have a design already in mind. The library is good enough without this.
-----------------------------------------------------
Part III: Interface questions:
1. How to handle read and write requests which return fewer characters than requested, though there has been no error, and EOF has not been reached. I think some answer to this question is necessary to allow the library to be extended later to handle models other than ordinary blocking i/o. I mention three possibilities here, http://tinyurl.com/6r8p2, but only two are realistic. I'm interested to know how important people think this issue is, and what is the best way to resolve it.
I think #2 would be most in line with what people are used to under Posix (-1/EAGAIN). Blocking (option 1) actually doesn't make any sense at all except at the ends (source/sink), unless you put each filter in a chain into its own thread and use something like semaphores. I suppose the idea is: if you're in the middle of a filter chain, and somebody gives you some input which would overflow your buffer, you attempt to empty out your buffer to the next guy; but if he can't take enough of it to allow you to accept the whole input (or even any of it), you have to tell the guy who sent it to you that you can't take it, and he has to hold onto it in his buffer. That seems reasonable. I might in fact prefer that my source or sink resource act like #1 (block until at least one character or EOF), and I assume this would be the default behavior if I open it in the default, blocking mode, but it isn't possible to have filters act that way; they need to pass the "can't take your data" feedback all the way back through the stack to the end user, who then needs to hold onto it and select/spin/whatever on the underlying sink resource ...
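(To make that backpressure scheme concrete, here is a minimal sketch; the names toy_sink, buffering_filter and write_some are invented for illustration and are not the library's interface. A write returns how many characters were accepted, and the caller keeps the rest:)

    // Sketch of the "can't take your data" feedback idea; names invented.
    #include <algorithm>
    #include <cstddef>
    #include <deque>
    #include <vector>

    // Toy non-blocking sink: accepts at most 'room' characters per call.
    struct toy_sink {
        std::size_t room;
        std::deque<char> data;
        explicit toy_sink(std::size_t r) : room(r) {}
        std::size_t write_some(const char* s, std::size_t n)
        {
            std::size_t k = std::min(n, room);
            data.insert(data.end(), s, s + k);
            return k;                  // a return of 0 plays the role of EAGAIN
        }
    };

    // A filter holding a backlog of output it produced earlier (e.g. by
    // expansion) that its downstream refused. It reports to its own caller
    // how much of the new input it could take; the caller keeps the rest.
    struct buffering_filter {
        std::vector<char> pending;     // output the downstream couldn't take yet

        std::size_t write_some(toy_sink& snk, const char* s, std::size_t n)
        {
            if (!pending.empty()) {    // drain the backlog first
                std::size_t k = snk.write_some(&pending[0], pending.size());
                pending.erase(pending.begin(), pending.begin() + k);
                if (!pending.empty())
                    return 0;          // still blocked: "can't take your data"
            }
            return snk.write_some(s, n); // may be < n; caller retries from s + result
        }
    };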
2. The stack interface. Is the interface to the underlying filter chains rich enough? Originally it was similar to std::list, so that you could disconnect chains at arbitrary points, store them, and reattach them later. I decided there wasn't much use for this, so I simplified the interface.
I'm sure someone, somewhere, sometime will want to perform splices/appends on filter chains, but I can't imagine why, either. At least, keep the simple interface and put the more complicated, flexible one in the appendix.
3. Exceptions. James Kanze has argued repeatedly that protected stream buffer functions should not throw exceptions (http://tinyurl.com/5o34x). I try to make the case for exceptions here: http://tinyurl.com/6r8p2. What do people think?
I sympathize with both arguments; either way seems fine to me. There is no real performance penalty for an exception that is thrown at most once per stream (EOF), but he's right that the existing interface (which end users never see) seems to specify a return value in that case. But, as you say, if you want to support async IO, it's moot - you do have to return the number of characters successfully read/written, so you need the std::streamsize return type, so ... you may as well return EOF instead of throwing it. That is, I see no reason to throw an EOF value if you support async IO (which I think would be lovely).
So the question is: Should an open() function be added to the closable interface, to eliminate the need for first-time switches? Alternatively, should there be a separate Openable concept?
Without a doubt, an Openable concept. If you add open() to Closeable you'd really want to change the concept name ;) For example, if I implement a first-time flag (no real hardship), I'll have to remember to add the first-time test not only when data is processed, but also when the stream is closed (need to handle empty input properly). I suspect people will forget this at least once. There's also a minor performance gain: the interface would be called when the stream is initialized, I assume, and not require a first-time flag check with each use. Admittedly, inconsequential. -Jonathan Graehl

"Jonathan Graehl" <jonathan@graehl.org> wrote in message news:413E1101.40404@graehl.org...
First, let me apologize for not being able to review the actual code (yet). The interface and correctness/performance of implementation are all I really care about for now :)
No apology necessary.
There are some special cases where the copying is wasteful, though. For instance:
1. If filter1 is just a passive observer, simply counting the number of occurrences of '\n', or copying the data to a logging stream, then the end user should really be writing directly to buf2. Filter1 could process the data using the same buffer as filter2.
2. Same as above, but filter1 also modifies the data in-place, making character-by-character modifications. (E.g., a toupper filter). This can be handled the same way.
3. If 'resource' is a stream or stream buffer, it could be assumed to do its own buffering. In that case, filter2 should be writing directly to resource, instead of to streambuf3:
I'd certainly feel a bit more proud of the library if it handled these cases (1 and 3 seem most important). It seems well worth a few days' delay.
I plan to add this functionality, if the library is accepted. (3) is easy. I'm going to change the name of the 'Buffered' concept to 'MultiCharacter', and use 'Buffered' to indicate that a component has its own buffer. Streams and stream buffers will be models of Buffered by default. I believe this will involve just a few lines of code. Even if (1) is more important than (2), I think (2) subsumes (1) and involves about the same amount of work. Let's call such filters 'in-place' filters. If in-place filters are added to a chain one at a time, their static type is lost, so making them work will require a certain (small) amount of runtime indirection. Using the van Winkel/van Krieken pipe notation mentioned by Dietmar Kuehl:

    filtering_ostream out(tee(cout) | line_counter() | to_upper() | file("log.txt"));

the in-place filters tee, line_counter and to_upper can be fused together at compile-time. (Another proof that this notation is not just syntactic sugar.)
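(For illustration only, an 'in-place' filter as discussed above might expose something like the following; the names to_upper_inplace, line_counter_inplace and transform are invented, not the library's:)

    // Sketch: filters that work within a single shared buffer, so a chain of
    // them needs no copy into a second buffer.
    #include <cctype>

    struct to_upper_inplace {
        // Modify [first, last) in place, character by character (case 2).
        void transform(char* first, char* last)
        {
            for (; first != last; ++first)
                *first = (char) std::toupper((unsigned char) *first);
        }
    };

    struct line_counter_inplace {
        long lines;
        line_counter_inplace() : lines(0) {}
        // Purely observing (case 1): reads the buffer, changes nothing.
        void transform(char* first, char* last)
        {
            for (; first != last; ++first)
                if (*first == '\n') ++lines;
        }
    };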
There is another class of cases in which the current setup is wasteful, but I think it is rather domain-specific:
4. Most filters in the chain modify only small parts of character sequences, leaving big chunks unchanged.
Well, basic_newline_filter would do this - at least when replacing CRLF with a single '\n' character.
I think the real optimizations are possible only when a filter can tell that it doesn't need to modify a block just by reading the header information. A newline_filter has to scan the whole text to determine if changes need to be made.
To optimize case 4 would require a major library extension. My feeling is that it is not necessary at this point, but I'd like to know what others think.
It should certainly wait if you don't have a design already in mind. The library is good enough without this.
Thanks.
Part III: Interface questions:
1. How to handle read and write requests which return fewer characters than requested, though there has been no error, and EOF has not been reached. I think some answer to this question is necessary to allow the library to be extended later to handle models other than ordinary blocking i/o. I mention three possibilities here, http://tinyurl.com/6r8p2, but only two are realistic. I'm interested to know how important people think this issue is, and what is the best way to resolve it.
I think #2 would be most in line with what people are used to under Posix (-1/EAGAIN). Blocking (option 1) actually doesn't make any sense at all except at the ends (source/sink), unless you put each filter in a chain into its own thread and use something like semaphores. I suppose the idea is: if you're in the middle of a filter chain, and somebody gives you some input which would overflow your buffer, you attempt to empty out your buffer to the next guy; but if he can't take enough of it to allow you to accept the whole input (or even any of it), you have to tell the guy who sent it to you that you can't take it, and he has to hold onto it in his buffer. That seems reasonable.
Okay. BTW, I noticed an error in proposal #2: having both an implicit conversion to char and a safe-bool conversion to test for eof and unavail is unworkable. To test whether a character is valid, I'll probably have to add an ordinary member function. Perhaps:

    template<typename Ch>
    struct basic_character {
        ...
        operator Ch() const;
        bool good() const;
        bool eof() const;
        bool fail() const;
    };

(Here I'm using 'fail' instead of 'unavail' or 'EAGAIN', but the main point is the addition of the member 'good()'.) Now, looking at the alphabetic_input_filter from the tutorial, instead of

    struct alphabetic_input_filter : public input_filter {
        template<typename Source>
        int get(Source& src)
        {
            int c;
            while ((c = boost::io::get(src)) != EOF && !isalpha(c))
                ;
            return c;
        }
    };

you'd write:

    struct alphabetic_input_filter : public input_filter {
        template<typename Source>
        character get(Source& src)
        {
            character c;
            while ((c = boost::io::get(src)).good() && !isalpha(c))
                ;
            return c;
        }
    };

Here, eof and fail values are passed on to the caller unchanged. If you want to send an eof or fail notification explicitly, you'd write return eof() or return fail(). Now the big question: is the above formulation too convoluted to teach to an average user who is interested only in plain, blocking i/o?
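(An illustrative sketch, not from the proposal: a filter that forwards the two 'not good' states untouched and transforms good characters. It assumes character is constructible from a char, which the proposal above doesn't spell out:)

    #include <cctype>

    struct toupper_input_filter : public input_filter {
        template<typename Source>
        character get(Source& src)
        {
            character c = boost::io::get(src);
            if (!c.good())
                return c;   // forward eof or fail upstream unchanged
            char up = (char) std::toupper((unsigned char)(char) c);
            return character(up);   // assumed converting constructor
        }
    };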
I might in fact prefer that my source or sink resource act like #1 (block until at least one character or EOF), and I assume this would be the default behavior if I open it in the default, blocking mode, but it isn't possible to have filters act that way; they need to pass the "can't take your data" feedback all the way back through the stack to the end user, who then needs to hold onto it and select/spin/whatever on the underlying sink resource ...
My idea was that the initial filter concepts need to be designed so that they can be used unchanged when the library is extended to handle other i/o models. For resources, it's easy enough simply to introduce new concepts: Non-Blocking Sink, Asynchronous Sink, etc.
Right. For now, I'm not worrying about what the proper abstraction will be to represent a chain of filters with, say, an Asynchronous Source at the end. It might be an 'async_istream', an ordinary filtering_istream which hides the asynchronous nature of the source, or some entirely new abstraction not related to the current standard i/o library. All I want to ensure is that filters written today will be usable in the future.
2. The stack interface. Is the interface to the underlying filter chains rich enough? Originally it was similar to std::list, so that you could disconnect chains at arbitrary points, store them, and reattach them later. I decided there wasn't much use for this, so I simplified the interface.
I'm sure someone, somewhere, sometime will want to perform splices/appends on filter chains, but I can't imagine why, either. At least, keep the simple interface and put the more complicated, flexible one in the appendix.
In that case, it's better to add it when someone actually needs it. It won't be hard since filter chains are still implemented as std::lists.
3. Exceptions. James Kanze has argued repeatedly that protected stream buffer functions should not throw exceptions (http://tinyurl.com/5o34x). I try to make the case for exceptions here: http://tinyurl.com/6r8p2. What do people think?
I sympathize with both arguments; either way seems fine to me. There is no real performance penalty for an exception that is thrown at most once per stream (EOF), but he's right that the existing interface (which end users never see) seems to specify a return value in that case.
Just to clarify, I agree with JK that exceptions should not be used to signal EOF. You need a return value to tell you how many characters were successfully read. (See http://tinyurl.com/3waf8, 'Exceptions'.)
But, as you say, if you want to support async IO, it's moot - you do have to return the number of characters successfully read/written, so you need the std::streamsize return type, so ... you may as well return EOF instead of throwing it. That is, I see no reason to throw an EOF value if you support async IO (which I think would be lovely).
So the question is: Should an open() function be added to the closable interface, to eliminate the need for first-time switches? Alternatively, should there be a separate Openable concept?
Without a doubt, an Openable concept. If you add open() to Closeable you'd really want to change the concept name ;)
Of course ;-) But I can't think of a good one.
For example, if I implement a first-time flag (no real hardship), I'll have to remember to add the first-time test not only when data is processed, but also when the stream is closed (need to handle empty input properly). I suspect people will forget this at least once.
Good point.
There's also a minor performance gain: the interface would be called when the stream is initialized, I assume, and not require a first-time flag check with each use. Admittedly, inconsequential.
Very minor. The filter/resource members are typically called by streambuf virtual functions.
-Jonathan Graehl
Thanks. Best Regards, Jonathan

    struct alphabetic_input_filter : public input_filter {
        template<typename Source>
        character get(Source& src)
        {
            character c;
            while ((c = boost::io::get(src)).good() && !isalpha(c))
                ;
            return c;
        }
    };
(boost::io::)character is just a type that wraps an int with a good() test?
Here, eof and fail values are passed on to the caller unchanged. If you want to send an eof or fail notification explicitly, you'd write return eof() or return fail().
Now the big question: is the above formulation too convoluted to teach to an average user who is interested only in plain, blocking i/o?
That seems fine. For such a user, this is just boilerplate code pasted from documentation examples.

If you really care to make filters written by naive users via a simple blocking interface applicable to more advanced nonblocking scenarios, you can design a generic adapter that turns a blocking filter into a nonblocking one (sketched below). That is, it would wrap both the upstream and downstream - I imagine with a dynamically growing buffer that accepts any single output the naive user wants to produce, flags when its downstream consumer doesn't accept the entire amount, and returns the EAGAIN equivalent to its upstream instead of calling the user's naive blocking method. This strategy would require a close notification, since the buffered unaccepted stuff would be left alone until the next write attempt (unless you want to spawn a thread to spin attempting to empty the buffer).

About the close() or open() methods for a filter that wants to write some prelude or coda (e.g. gzip): aren't these only necessary because you can't guarantee that the constructors and destructors for the filter stack are called in the proper order? It would be nice if source -> A -> B -> sink could guarantee that B is (finally) constructed after sink, and then A is constructed after having been linked to B. That is, aren't the filters passed by const reference and only copy constructed once into the filter stack? I guess the part about ensuring that A is linked to B before the user constructor code runs might be accomplished by inheritance (superclass constructors always complete before the subclass constructor executes, and the reverse for destructors?). I'm not sure if this is too clever, or can't be made to work portably, though.

I don't think a second, simpler interface would be that much of a win; the complexity of having two interfaces or types of filters would add as much confusion as it simplifies the blocking case.
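(Here is a rough sketch of that adapter idea on the output side; blocking_adapter, write_some and drained are invented names, not a real interface. The adapter never refuses the naive filter's output; it buffers whatever the real, non-blocking downstream won't take:)

    #include <cstddef>
    #include <vector>

    template<typename NonBlockingSink>  // must provide write_some(s, n) -> count
    class blocking_adapter {
        NonBlockingSink* down_;
        std::vector<char> backlog_;     // grows to hold any single output
    public:
        explicit blocking_adapter(NonBlockingSink& down) : down_(&down) {}

        // Called by the naive blocking filter; always accepts everything.
        void write(const char* s, std::size_t n)
        {
            backlog_.insert(backlog_.end(), s, s + n);
            drain();
        }

        // The close notification: true once everything has actually
        // reached the downstream sink.
        bool drained()
        {
            drain();
            return backlog_.empty();
        }

    private:
        void drain()   // opportunistically push the backlog downstream
        {
            if (backlog_.empty()) return;
            std::size_t k = down_->write_some(&backlog_[0], backlog_.size());
            backlog_.erase(backlog_.begin(), backlog_.begin() + k);
        }
    };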

"Jonathan Graehl" <jonathan@graehl.org> wrote in message news:413F7C5A.1030701@graehl.org...
    struct alphabetic_input_filter : public input_filter {
        template<typename Source>
        character get(Source& src)
        {
            character c;
            while ((c = boost::io::get(src)).good() && !isalpha(c))
                ;
            return c;
        }
    };
(boost::io::)character is just a type that wraps an int with a good() test?
int_type or optional<char> would be good enough for that. The important point is that there are two separate 'not good' states -- one means end-of-sequence and the other means no data is currently available.
Here, eof and fail values are passed on to the caller unchanged. If you want to send an eof or fail notification explicitly, you'd write return eof() or return fail().
Now the big question: is the above formulation too convoluted to teach to an average user who is interested only in plain, blocking i/o?
That seems fine. For such a user, this is just boilerplate code pasted from documentation examples.
What worries me is that users already know about char, int and EOF. To use character properly probably requires more than just copying from examples.
If you really care to make filters written by naive users via a simple blocking interface applicable to more advanced nonblocking scenarios, you can design a generic adapter that turns a blocking filter into a nonblocking one. That is, it would wrap both the upstream and downstream, I imagine with a dynamically growing buffer that will accept any single output the naive user wants to produce, but then flagging when its downstream consumer doesn't accept the entire amount, and returning the EAGAIN equivalent to its upstream instead of calling the user's naive blocking method. This strategy would require a close notification since the buffered unaccepted stuff would be left alone until the next write attempt (unless you want to spawn a thread to spin attempting to empty the buffer).
I hadn't thought of that. I'd much rather teach people how to write correct filters, so they can be used with maximum efficiency. There's also no good way to turn a blocking input filter into a non-blocking one -- unless you consider that every blocking input filter is sort of a degenerate non-blocking filter.
About the close() or open() methods for a filter that wants to write some prelude or coda (e.g. gzip): aren't these only necessary because you can't guarantee that the constructors and destructors for the filter stack are called in the proper order?
No -- filters should be reusable. Here's an example from a reply I wrote to Rob Stewart (it turned out not to be relevant to that discussion, but maybe it'll be relevant here ;-). "Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:chalsi$cmb$1@sea.gmane.org...
    struct zlib_ostream : filtering_ostream {
        zlib_ostream() { push(zlib_compressor()); }

        template<typename Source>
        zlib_ostream(const Source& src)
        {
            push(zlib_compressor());
            open(src);
        }

        template<typename Source>
        void open(const Source& src)
        {
            BOOST_STATIC_ASSERT(is_resource<Source>::value);
            push(src);
        }

        bool is_open() const { return is_complete(); }

        void close()
        {
            assert(is_open());
            pop();
        }
    };

    int main()
    {
        using namespace boost::io;
        zlib_ostream out;
        out.open(file_sink("hello_world"));
        out << "hello world!";
        out.close();
        out.open(file_sink("goodbye_world"));
        out << "goodbye world!";
    }

Only one zlib_compressor is constructed, but it is used several times.
It would be nice if source -> A -> B -> sink could guarantee that B is (finally) constructed after sink, and then A is constructed after having been linked to B. That is, aren't the filters passed by const reference and only copy constructed once into the filter stack? I guess the part about ensuring that A is linked to B before the user constructor code might be accomplished by inheritance (superclass constructors always complete before subclass constructor executes, and the reverse for destructors?) I'm not sure if this is too clever, or can't be made to work portably, though.
Actually, I can run the destructors in any order I want, since I'm using boost::optional<Filter> to avoid requiring that filters and resources be default constructible. So I can just do filter_ = none;
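(A minimal sketch of that boost::optional technique; chain_node and my_filter are invented for illustration:)

    #include <boost/none.hpp>
    #include <boost/optional.hpp>

    struct my_filter {
        explicit my_filter(int arg) {}   // deliberately not default constructible
    };

    struct chain_node {
        boost::optional<my_filter> filter_;              // empty until pushed
        void install(const my_filter& f) { filter_ = f; }
        void destroy_now() { filter_ = boost::none; }    // run the destructor
                                                         // at a chosen moment
    };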
I don't think a second, simpler interface would be that much of a win; the complexity of having two interfaces or types of filters would add as much confusion as it simplifies the blocking case.
I've lost you here. Which is the 'second, simpler interface' which you don't think is a good idea? Best Regards, Jonathan

I hadn't thought of that. I'd much rather teach people how to write correct filters, so they can be used with maximum efficiency. There's also no good way to turn a blocking input filter into a non-blocking one
The mechanism I was alluding to would work in general, requiring a dynamic buffer to handle the single largest output the dumb blocking filter wants to produce all at once. I agree that it would be best to make such a wrapper unnecessary.
About the close() or open() methods for a filter that wants to write some prelude or coda (e.g. gzip): aren't these only necessary because you can't guarantee that the constructors and destructors for the filter stack are called in the proper order?
No -- filters should be reusable. Here's an example from a reply I wrote to Rob Stewart (it turned out not to be relevant to that discussion, but maybe it'll be relevant here ;-).
    zlib_ostream out;
    out.open(file_sink("hello_world"));
    out << "hello world!";
    out.close();
    out.open(file_sink("goodbye_world"));
    out << "goodbye world!";
Only one zlib_compressor is constructed, but it is used several times.
OK. I understand your rationale - you think that constructing these filtered streams might be expensive, and that one might want to cache and reuse them for many files/network connections/etc. I guess you can repeatedly open() and close() fstreams that way, although I've never wanted to.
Actually, I can run the destructors in any order I want, since I'm using boost::optional<Filter> to avoid requiring that filters and resources be default constructible. So I can just do filter_ = none;
So what I suggested (having filters be use-once and able to emit prelude/postlude in their constructor/destructor) is technically possible, but you prefer to make them reusable and thus need Openable/Closeable concepts (or require all filters to implement some open() and close(), even an empty one).
I don't think a second, simpler interface would be that much of a win;
I've lost you here. Which is the 'second, simpler interface' which you don't think is a good idea?
I meant the simpler (current) "blocking-only" filter interface that needs a wrapper to handle sinks that consume less than they're given without actually failing/EOFing (only nonblocking sinks, really).

"Jonathan Graehl" <jonathan@graehl.org> wrote in message news:413FE305.3060003@graehl.org...
I hadn't thought of that. I'd much rather teach people how to write correct filters, so they can be used with maximum efficiency. There's also no good way to turn a blocking input filter into a non-blocking one
The mechanism I was alluding to would work in general, requiring a dynamic buffer to handle the single largest output the dumb blocking filter wants to produce all at once. I agree that it would be best to make such a wrapper unnecessary.
Suppose I have an input filter f which expects that each time it invokes get() on the component immediately downstream it will block until the next character is available or the end of the stream is reached. Suppose that g is immediately upstream from f and wants to call f.get(). Supposing that no input is going to be available for a long time, how can you turn that into a non-blocking call?
About the close() or open() methods for a filter that wants to write some prelude or coda (e.g. gzip): aren't these only necessary because you can't guarantee that the constructors and destructors for the filter stack are called in the proper order?
No -- filters should be reusable. Here's an example from a reply I wrote to Rob Stewart (it turned out not to be relevant to that discussion, but maybe it'll be relevant here ;-).
OK. I understand your rationale - you think that constructing these filtered streams might be expensive,
Yes
and that one might want to cache and reuse them for many files/network connections/etc. I guess you can repeatedly open() and close() fstreams that way, although I've never wanted to.
I want to provide the same functionality that the standard library provides. People would complain if you couldn't close and reopen an fstream, don't you think?
I don't think a second, simpler interface would be that much of a win;
I've lost you here. Which is the 'second, simpler interface' which you don't think is a good idea?
I meant the simpler (current) "blocking-only" filter interface that needs a wrapper to handle sinks that consume less than they're given without actually failing/EOFing (only nonblocking sinks, really).
At the moment, I don't like the idea of wrapping blocking filters to make them non-blocking, partly because it sacrifices efficiency and partly because I can't see how to do it for filtered input. So if there were a simpler blocking interface and a more complex non-blocking interface, my preference would be to say that filters which only support the simpler interface simply can't be used in non-blocking chains. I'm hoping that the non-blocking interface can be made sufficiently simple. I'm glad to hear that you think so.

(One cruel way to force programmers to write filters which can be used in non-blocking chains is to abolish the simple 'non-buffered' filter concepts, and make everyone use the interface

    struct my_filter : input_filter {
        template<typename Source>
        std::streamsize read(Source& src, char* s, std::streamsize n);
    };

Here the interpretation of the return value is clear: return the number of characters read, or -1 for EOF. The trouble is, there are some simple filters that are very hard to express with the above interface.)

Best Regards, Jonathan
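(For illustration, a complete filter against that buffered interface might look as follows. The body is invented; boost::io::read is assumed by analogy with the boost::io::get used in the tutorial examples, with 0 meaning "none available yet" and -1 meaning EOF:)

    #include <cctype>
    #include <ios>      // std::streamsize

    struct toupper_filter : input_filter {
        template<typename Source>
        std::streamsize read(Source& src, char* s, std::streamsize n)
        {
            std::streamsize amt = boost::io::read(src, s, n);
            if (amt <= 0)
                return amt;               // 0 = try again later, -1 = EOF
            for (std::streamsize i = 0; i < amt; ++i)
                s[i] = (char) std::toupper((unsigned char) s[i]);
            return amt;                   // may be fewer than requested
        }
    };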

I think we can agree to the following:
- Openable and Closeable interfaces. (Nasty spelling - like useable/usable, both closeable and closable are widely used; the same probably applies to localizeable/localizable, and probably localisable too in England.)
- Either all filters are nonblocking filters, or some filters are blocking and can't be attached to nonblocking sources/sinks. (Is there an easy way to make the boilerplate for the non-bufferable type simple enough that copy/paste isn't needed? Obviously the preprocessor can always be used, but it isn't self-documenting.)
Personally, I favor eliminating blocking filters; they aren't that much simpler IMO. I guess we're more or less talking about cooperative vs. preemptive multitasking (assuming you're forced to use a thread for each blocking source/sink if you want to use simple blocking filters, rather than nonblocking/multiplexed I/O with nonblocking filters).
Suppose I have an input filter f which expects that each time it invokes get() on the component immediately downstream it will block until the next character is available or the end of the stream is reached. Suppose that g is immediately upstream from f and wants to call f.get(). Supposing that no input is going to be available for a long time, how can you turn that into a non-blocking call?
This is tricky. I believe I already described the method for a blocking output filter, and the same strategy *almost* works for a blocking input filter as well. Consider source -> A -> B, where A is a blocking filter and source is a nonblocking source (it can return EAGAIN when you get/read). B is either the user's code, or another nonblocking filter. Now build source -> inWrapper -> A -> outWrapper -> B: inWrapper communicates secretly with outWrapper, bypassing A entirely and just returning EAGAIN to B whenever inWrapper wouldn't have anything to give A.

But we can't predict ahead of time how much input A may consume in order to generate a single output character in response to get(), so the method of outWrapper bypassing A when inWrapper doesn't have enough for A actually can't work. We could still create a thread somewhere and patch things up, but let's just say that it's too complicated for now - besides, you can just use blocking stuff and put the whole process that reads from source into its own thread. Threads+blocking is easier than asynchronous/cooperative anyway, so such a simulation would be of dubious value.

This discussion has caused me to seriously question the need for a simple InputFilter interface, or even a "pull" InputFilter at all. The Symmetric Filter idea seems most appropriate to me (and sufficiently general, if you allow arbitrary iterator types instead of just char*). Since we now allow boost::read to return fewer characters than requested (even 0) without meaning EOF/error, if you really only want to output one character and then return, you can conform to the MultiCharacter (Buffered) interface but ignore n, always returning at most one character. Additionally, except for the simplest filters, you still have to deal with read requests that are smaller than what you would prefer to output, and due to the lack of continuations/coroutines in the language, you'll have to reify your current state and resume the next time somebody tries to read you.

I suppose that an InputFilter really does allow you to write your parsing/recognition code in buffer-oblivious form (that is, you keep reading until you get a whole block of compressed data, or until you completely match your pattern), and you can reify your output state by simply buffering a whole chunk of output internally. But what's the problem with buffering input if you need to? You already have to potentially buffer output.

There is a pretty well known stream/message pipeline mechanism: linked lists of buffer chunks allocated by the producer, passed by reference/pointer to the consumer, and freed when totally consumed (you could allow a transfer-ownership operation, which would be nice for in-place transforms, but you don't really need one). Think message passing/queuing, but ignoring message boundaries and interpreting the data as a stream. Fixed-size buffer chunks (that may be partially full) are usually called "mbufs", and a list of them is an mbuf chain. The only complexity involved in processing chains is that you have to either write your filter so that it operates character by character (a convenience iterator could step through the chain), or reassemble the mbuf fragments into one contiguous buffer for reading. You can adopt more complex logic that works directly on a sequence of mbufs (for instance, Posix supports writing/reading files/sockets to/from a sequence of noncontiguous buffers - called "scatter/gather I/O"). You can imagine variations where you also allow circular buffering rather than one-pass use of mbufs.
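(A bare-bones sketch of such an mbuf chain, invented for illustration rather than taken from any library: fixed-size chunks, linked, filled by the producer and freed once fully consumed:)

    #include <cstddef>

    struct mbuf {
        enum { capacity = 2048 };
        char        data[capacity];
        std::size_t begin;   // offset of the first unconsumed byte
        std::size_t end;     // one past the last valid byte (may be < capacity)
        mbuf*       next;
        mbuf() : begin(0), end(0), next(0) {}
    };

    struct mbuf_chain {
        mbuf* head;
        mbuf* tail;
        mbuf_chain() : head(0), tail(0) {}

        // Consumer side: pop fully consumed chunks off the front and free them.
        void release_consumed()
        {
            while (head && head->begin == head->end) {
                mbuf* dead = head;
                head = head->next;
                if (!head) tail = 0;
                delete dead;
            }
        }
    };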
This reminds me: please consider the possibility of (some day in the future) automagically running pipelined filters in their own threads (properly synchronized, of course), and possibly some policy allowing the granularity of work done to be increased (generally, larger buffers would allow this) to minimize context switching overhead (which exists even for nonthreaded pipelines, because of cache coherency, state/continuation reification, and least important, function call overhead). mbuf chains might be easier to efficiently synchronize in a multithreaded environment than semaphores/circular buffers (wild speculation).
I guess you can repeatedly open() and close() fstreams that way, although I've never wanted to.
I want to provide the same functionality that the standard library provides. People would complain if you couldn't close and reopen an fstream, don't you think?
Suffice it to say that I was heretofore unaware such capability ever existed, and you're obviously right to want it for your library :)
    struct my_filter : input_filter {
        template<typename Source>
        std::streamsize read(Source& src, char* s, std::streamsize n);
    };
Here the interpretation of the return value is clear: return the number of characters read, or -1 for EOF. The trouble is, there are some simple filters that it's very hard to express with the above interface.
As I mentioned earlier, this wouldn't add any difficulties at all. You can simply return at most 1 character, no matter how many are requested, if that really simplifies your implementation. Naturally, there's some runtime overhead that wouldn't be necessary with your separate one-character interface, but doing things that way is generally less efficient anyway - you've already opted for programmer time over machine time.

Sorry this has gotten to be so involved - I am perfectly happy to just use whatever interfaces you have now; it's better than what I have (i.e. nothing). The binary I/O you propose (in "future directions") sounds cool also - especially if the library had both native and "network byte order" variants.

Obviously, anything involving passing mbuf chains instead of internal buffering would be a huge change to interface and implementation; I don't think it needs to happen, or if it does, there would be a place for it as a second library that also handles message-passing pipelines (where message boundaries are preserved and not just implementation details of streams). Yrs, Jonathan
participants (2)
- Jonathan Graehl
- Jonathan Turkanis