
I think we can agree to the following:

- Openable and Closeable interfaces. (Nasty spelling - like useable/usable, both closeable and closable are widely used; the same probably applies to localizeable/localizable, and probably localisable too in England.)

- Either all filters are nonblocking, or some filters are blocking and can't be attached to nonblocking sources/sinks. (Is there an easy way to make the boilerplate for the non-bufferable type simple enough that copy/paste isn't needed? Obviously the preprocessor can always be used, but it isn't self-documenting.)

Personally, I favor eliminating blocking filters; they aren't that much simpler, IMO. I guess we're more or less talking about cooperative vs. preemptive multitasking: if you want to use simple blocking filters, you're forced to use a thread for each blocking source/sink, rather than nonblocking/multiplexed I/O with nonblocking filters.
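For concreteness, here is one shape the all-nonblocking option could take - a sketch only, and every name in it (nb_status, nonblocking_source) is hypothetical rather than anything proposed so far:

    #include <cstddef>

    // Tri-state result: data available, end of stream, or "try again
    // later" (the EAGAIN case), so no component ever has to block.
    enum nb_status { nb_ok, nb_eof, nb_would_block };

    struct nb_result {
        nb_status   status;
        std::size_t count;   // characters transferred when status == nb_ok
    };

    struct nonblocking_source {
        // Never blocks: returns nb_would_block instead of waiting, so any
        // filter layered on top must be prepared to give up and resume later.
        virtual nb_result read(char* s, std::size_t n) = 0;
        virtual ~nonblocking_source() {}
    };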
Suppose I have an input filter f which expects that each time it invokes get() on the component immediately downstream it will block until the next character is available or the end of the stream is reached. Suppose that g is immediately upstream from f and wants to call f.get(). Supposing that no input is going to be available for a long time, how can you turn that into a non-blocking call?
This is tricky. I believe I already described the method for a blocking output filter, and the same strategy *almost* works for a blocking input filter as well. Consider

    source . A . B

where A is a blocking filter and source is a nonblocking source (it can return EAGAIN when you get/read from it). B is either the user's code or another nonblocking filter. Now insert wrappers:

    source . inWrapper . A . outWrapper . B

inWrapper communicates secretly with outWrapper, bypassing A entirely and just returning EAGAIN to B whenever inWrapper wouldn't have anything to give A. But we can't predict ahead of time how much input A may consume in order to generate a single output character in response to get(), so the method of having outWrapper bypass A whenever inWrapper has nothing for A actually can't work. We could still create a thread somewhere and patch things up, but let's just say that it's too complicated for now - besides, you can just use blocking components and put the whole process that reads from source into its own thread. Threads + blocking is easier than asynchronous/cooperative anyway, so such a simulation would be of dubious value.

This discussion has caused me to seriously question the need for a simple InputFilter interface, or even a "pull" InputFilter at all. The Symmetric Filter idea seems most appropriate to me (and sufficiently general, if you allow arbitrary iterator types instead of just char *). Since we now allow boost::read to return fewer characters than requested (even 0) without meaning EOF/error, if you really only want to output one character and then return, you can conform to the MultiCharacter (Buffered) interface but ignore n, always returning at most one character. Additionally, except for the simplest filters, you still have to deal with read requests that are smaller than what you would prefer to output, and due to the lack of continuations/coroutines in the language, you'll have to reify your current state and resume the next time somebody tries to read you.

I suppose that an InputFilter really does allow you to write your parsing/recognition code in buffer-oblivious form (that is, you keep reading until you get a whole block of compressed data, or until you completely match your pattern), and you can reify your output state by simply buffering a whole chunk of output internally. But what's the problem with buffering input if you need to? You already have to potentially buffer output.

There is a pretty well known stream/message pipeline mechanism: linked lists of buffer chunks allocated by the producer, passed by reference/pointer to the consumer, and freed when totally consumed. (You could allow a transfer-ownership operation, which would be nice for in-place transforms, but you don't really need one.) Think message passing/queuing, but ignoring message boundaries and interpreting the data as a stream. Fixed-size buffer chunks (that may be partially full) are usually called "mbufs", and a list of them is an mbuf chain. The only complexity involved in processing chains is that you have to either write your filter so that it operates character by character (a convenience iterator could step through the chain), or reassemble the mbuf fragments into one contiguous buffer for reading. You can also adopt more complex logic that works directly on a sequence of mbufs (for instance, POSIX supports writing/reading files/sockets to/from a sequence of noncontiguous buffers - so-called "scatter/gather I/O"). You can imagine variations where you also allow circular buffering rather than one-pass use of mbufs.
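To make the mbuf idea concrete, here is a rough sketch - every name in it is mine, not from any existing library - of fixed-size chunks linked into a chain, plus a convenience reader that steps through the chain character by character and frees each chunk once it's fully consumed:

    #include <cstddef>

    // One fixed-size chunk; the producer allocates these with new, fills
    // [begin, end), and links them into a chain. A chunk may be partially full.
    struct mbuf {
        enum { capacity = 2048 };
        char        data[capacity];
        std::size_t begin;  // first unconsumed byte
        std::size_t end;    // one past the last valid byte
        mbuf*       next;   // next chunk in the chain, or 0
    };

    // Convenience "iterator": yields one character at a time across chunk
    // boundaries, deleting each chunk as soon as it is completely consumed.
    struct mbuf_reader {
        mbuf* cur;
        explicit mbuf_reader(mbuf* chain) : cur(chain) {}
        bool get(char& c) {              // false == chain exhausted
            while (cur && cur->begin == cur->end) {
                mbuf* spent = cur;
                cur = cur->next;
                delete spent;            // freed when totally consumed
            }
            if (!cur) return false;
            c = cur->data[cur->begin++];
            return true;
        }
    };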
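Going back to the point about conforming to the MultiCharacter interface while only ever producing one character per call: a minimal sketch, assuming the read() signature from the my_filter example quoted further down, and assuming (my invention, purely for illustration) that Source exposes get() returning the next character or -1 at EOF:

    #include <cctype>
    #include <ios>  // std::streamsize

    // Conforms to the multi-character read() convention (count read, -1
    // for EOF) but deliberately ignores n, returning at most one character
    // per call. The uppercasing is just a placeholder transformation.
    struct one_char_at_a_time_filter {
        template<typename Source>
        std::streamsize read(Source& src, char* s, std::streamsize n) {
            if (n < 1)
                return 0;                // 0 means "nothing yet", not EOF
            int c = src.get();           // assumed Source operation
            if (c == -1)
                return -1;               // EOF
            s[0] = static_cast<char>(std::toupper(c));
            return 1;                    // at most one character, ignoring n
        }
    };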
This reminds me: please consider the possibility of (someday in the future) automagically running pipelined filters in their own threads (properly synchronized, of course), and possibly some policy allowing the granularity of work to be increased (generally, larger buffers would allow this) to minimize context-switching overhead - which exists even for nonthreaded pipelines, because of cache coherency, state/continuation reification, and, least important, function call overhead. mbuf chains might be easier to synchronize efficiently in a multithreaded environment than semaphores/circular buffers (wild speculation).
I guess you can repeatedly open() and close() fstreams that way, although I've never wanted to.
I want to provide the same functionality that the standard library provides. People would complain if you couldn't close and reopen an fstream, don't you think?
Suffice it to say that I was heretofore unaware that such a capability existed, and you're obviously right to want it for your library :)
struct my_filter : input_filter {
    template<typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n);
};
Here the interpretation of the return value is clear: return the number of characters read, or -1 for EOF. The trouble is, there are some simple filters that are very hard to express with the above interface.
As I mentioned earlier, this wouldn't add any difficulties at all. You can simply return at most one character, no matter how many are requested, if that really simplifies your implementation. Naturally, there's some runtime overhead that wouldn't be necessary with your separate one-character interface, but doing things that way is generally less efficient anyway - you've already opted for programmer time over machine time.

Sorry this has gotten so involved - I am perfectly happy to just use whatever interfaces you have now; it's better than what I have (i.e., nothing). The binary I/O you propose (in "future directions") sounds cool too - especially if the library had both native and "network byte order" variants.

Obviously, anything involving passing mbuf chains instead of internal buffering would be a huge change to the interface and implementation; I don't think it needs to happen, and if it does, there would be a place for it as a second library that also handles message-passing pipelines (where message boundaries are preserved rather than being mere implementation details of streams).

Yrs,
Jonathan