RE: [boost] Re: IOStreams formal review start


Reece Dunn

31 Aug 2004

11:34 a.m.

Jonathan Turkanis wrote:

...

"Daryle Walker" <darylew@hotmail.com> wrote:

...
On 8/28/04 8:09 PM, "Jeff Garland" <jeff@crystalclearsoftware.com> wrote:

2. This library does what a lot of other text-I/O libraries do: try to fit in "kewl" compression schemes. The problem is that the types of compression here are binary oriented; they convert between sets of byte streams. However, characters are not bytes (although characters, like other types, are stored as bytes).

Are you saying there are problems with the implementation of the compression filters, e.g., that they make unwarranted assumptions about 'char'? If so, please let me know. I'm sure it can be fixed.

If you want to read a bzip2 file (for example), the bzip2 file will be a binary stream and the resulting stream will be either binary or text depending on what the compressed file stores. You can tell std::ifstream to read the stream as binary, so what is the problem? Also, you might want to convert between text and binary modes:

file > bzip2 (binary) > text > (process) > bzip2 (binary) > file

...

...
1. Actual code using this library is very slick and easy to set up. This ease of use/set-up also applies to the plug-in filters and/or resources.

I also like the ability to chain filters. I have a program that uses a similar mechanism on a character tape stream that allows text to be split to a certain character length and/or merged back into one block. It would be interesting to compare the performance of my program versus an implementation using this library.

...

Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point please correct me.

There are two main reasons to write Sources and Sinks instead of stream buffers:

1. Sources and Sinks express just the core functionality of a component. Usually you have to implement just one or two functions with very natural interfaces. You don't have to worry about buffering or about putting back characters.

This is similar to what iterator adaptors and flex_string do for iterators and std::strings respectively. This makes it easy to write standard-conforming components.

For example, if you wanted to add support for a COM-based IStream, it would be easy to implement a source and/or sink.

I will post a more detailed review once I have had a better look through the documentation and library, but at the moment it looks good.

Regards,
Reece
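A minimal sketch of the "one or two functions" point about Sources, using only the standard library. The names string_source, read and read_all are hypothetical and chosen for illustration; they are not taken from the library under review.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical source: all it knows is how to hand out characters.
// No buffering and no putback logic lives here.
struct string_source {
    explicit string_source(const std::string& s) : data(s), pos(0) {}

    // Copies up to n characters into s; returns the number copied (0 at end).
    std::size_t read(char* s, std::size_t n) {
        assert(s != nullptr);
        const std::size_t count = data.copy(s, n, pos);
        pos += count;
        return count;
    }

    std::string data;
    std::size_t pos;
};

// Generic consumer: works with any type offering the same read() member.
template <typename Source>
std::string read_all(Source& src) {
    std::string out;
    char buf[4];  // deliberately tiny, to exercise repeated reads
    for (std::size_t n = src.read(buf, sizeof buf); n != 0;
         n = src.read(buf, sizeof buf)) {
        out.append(buf, n);
    }
    return out;
}
```

Calling read_all on a string_source built from "hello, world" returns the same text; a wrapper around a COM IStream would only need to supply its own read().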


Jonathan Turkanis

31 Aug

3:55 p.m.

New subject: IOStreams formal review start

"Reece Dunn" <msclrhd@hotmail.com> wrote in message news:BAY24-F33zTGBGrhUaC00059ac2@hotmail.com...

...

If you want to read a bzip2 file (for example), the bzip2 file will be a binary stream and the resulting stream will be either binary or text depending on what the compressed file stores. You can tell std::ifstream to read the stream as binary, so what is the problem? Also, you might want to convert between text and binary modes:

file > bzip2 (binary) > text > (process) > bzip2 (binary) > file

Line-ending conversions can be done by sticking a newline filter in between the binary and text filters. When converting_stream is up and running, code conversion will be inserted at the appropriate place in a filter chain consisting of mixed narrow- and wide-character components.
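As a rough illustration of what a newline-conversion step between the binary and text stages has to do, here is a standalone sketch. The function names lf_to_crlf and crlf_to_lf are assumptions made for this example; they say nothing about how the library's own newline filter is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts a bare '\n' to "\r\n"; existing "\r\n" pairs are left unchanged.
std::string lf_to_crlf(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    char prev = '\0';
    for (char c : in) {
        if (c == '\n' && prev != '\r') out += '\r';
        out += c;
        prev = c;
    }
    return out;
}

// Converts "\r\n" to '\n'; a lone '\r' is kept as-is.
std::string crlf_to_lf(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n') continue;
        out += in[i];
    }
    return out;
}
```

Unlike most character mappings, this conversion changes the length of the sequence, which is why it cannot be done as a simple per-character substitution.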

...

...
...
1. Actual code using this library is very slick and easy to set up. This ease of use/set-up also applies to the plug-in filters and/or resources.

I also like the ability to chain filters. I have a program that uses a similar mechanism on a character tape stream that allows text to be split to a certain character length and/or merged back into one block. It would be interesting to compare the performance of my program versus an implementation using this library.

I'd be interested too. There are a number of optimizations I've held in reserve because I'm not sure they're necessary. The big question is: what should I compare my library to?

...

...
There are two main reasons to write Sources and Sinks instead of stream buffers:

1. Sources and Sinks express just the core functionality of a component. Usually you have to implement just one or two functions with

...

This is similar to what iterator adaptors and flex_string do for iterators and std::strings respectively. This makes it easy to write standard conforming components.

Exactly.

...

For example, if you wanted to add support for a COM-based IStream, it would be easy to implement a source and/or sink.

This was one of the original applications, actually :-)

...

I will post a more detailed review once I have had a better look through the documentation and library, but at the moment it looks good.

Thanks, I'll look forward to it. Jonathan

Carlo Wood

6:10 p.m.

New subject: IOStreams formal review start

On Tue, Aug 31, 2004 at 09:55:19AM -0600, Jonathan Turkanis wrote:

...

...
file > bzip2 (binary) > text > (process) > bzip2 (binary) > file

Line-ending conversions can be done by sticking a newline filter in between the binary and text filters. When converting_stream is up and running, code conversion will be inserted at the appropriate place in a filter chain consisting of mixed narrow- and wide-character components.

Hiya, me again :)

I suppose I should really just download the code and start playing with it in order to be able to review it - but I have no time for that. So, please don't take my questions as formal criticism; I am just interested as a possible (future) user (and possibly even contributor).

My concern, when I see those filter chains, remains performance as a result of unnecessary copying of data.

Can you tell me: would the average filter copy the data (from one place in memory to another)? I'd suppose so, because it is unlikely that an arbitrary filter can be trusted to use the streambuf as a 'work space'.

And if so, does this mean that if I put ten filters in a chain, the data is copied ten times? Please enlighten me :).

Thanks for your time in advance!

--
Carlo Wood <carlo@alinoe.com>

Jonathan Turkanis

6:45 p.m.

New subject: IOStreams formal review start

"Carlo Wood" <carlo@alinoe.com> wrote in message news:20040831181013.GA16216@alinoe.com...

...

On Tue, Aug 31, 2004 at 09:55:19AM -0600, Jonathan Turkanis wrote:

...
file > bzip2 (binary) > text > (process) > bzip2 (binary) > file

Line-ending conversions can be done by sticking a newline filter in between the binary and text filters. When converting_stream is up and running, code conversion will be inserted at the appropriate place in a filter chain consisting of mixed narrow- and wide-character components.

Hiya, me again :)

Howdy!

...

I suppose I should really just download the code and start playing with it in order to be able to review it - but I have no time for that. So, please don't take my questions as formal criticism; I am just interested as a possible (future) user (and possibly even contributor).

My concern, when I see those filter chains, remains performance as a result of unnecessary copying of data.

Yes, that's a major concern.

...

Can you tell me; would the average filter copy the data (from one place in memory to another)? I'd suppose so, because it is unlikely that an arbitrary filter can be trusted to use the streambuf as a 'work space'.

I was planning to write a section on efficiency in the documentation, but I ran out of time.

Here's a typical scenario, for output with a chain of several filters. The user (possibly an ostream) of the initial streambuf in the chain writes characters using sputc and sputn. These functions simply copy characters to the output buffer of the initial streambuf. When the buffer becomes full, or when pubsync is called, the contents of the buffer will be sent to the first filter in the chain, like so (assuming it's a 'buffered filter'):

filter_.write(buf, buf + n);

The filter examines the sequence of characters it has been provided with, and writes a possibly modified sequence to the second streambuf in the chain, using (indirectly) sputc and sputn. This process continues until a sink is reached.

So, yes, characters are copied at each juncture. In general, it is not possible to eliminate this copying, as far as I know. In special cases, it can be eliminated, as I mentioned in a previous message, which I repeat here for convenience:

...

There are several optimizations which I have held in reserve which would also minimize copying:

a) Allowing resources to advertise that they are streambuf-based, so that i/o is performed directly to the underlying streambuf, with no additional layer of buffering.

b) Giving special treatment to symmetric filters (used to implement compression/decompression) to allow them to have direct access to the buffers of adjacent streambufs in a chain.

c) Allowing for a category of 'transparent filters' which simply observe character sequences, forwarding them unchanged. This would allow many useful filters (such as the offsetbuf suggested by David Abrahams) to have essentially zero overhead.

The category of transparent filters could be generalized to any filter which can be represented as a function char f(char); so that the length of a sequence of characters is never changed by the filters. Requiring that all filters be implemented as symmetric filters would also reduce copying, I believe, but symmetric filters are surprisingly difficult to write correctly.
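To make the 'transparent' and char f(char) categories concrete, here is a small standard-library sketch. The names filter_in_place, to_upper_char and line_counter are hypothetical; the point is only that a length-preserving mapping can run directly on the caller's buffer, and an observer need not write anything at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>

// Applies a length-preserving mapping char f(char) directly to the caller's
// buffer. No second buffer is allocated and nothing is copied elsewhere.
template <typename F>
void filter_in_place(char* first, char* last, F f) {
    std::transform(first, last, first, f);
}

// Example mapping: upper-case a single character.
inline char to_upper_char(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// 'Transparent' observer: inspects the characters but leaves them unchanged.
struct line_counter {
    std::size_t lines = 0;
    void observe(const char* first, const char* last) {
        lines += static_cast<std::size_t>(std::count(first, last, '\n'));
    }
};
```

A filter such as newline conversion does not fit this category, because it can change the length of the sequence.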

...

And if so, does this mean that if I put ten filters in a chain, the data is copied ten times? Please enlighten me :).

Yes, for the time being. If your ideas can eliminate copying further, I'd be glad to try to incorporate them. (But I haven't looked at your library yet.)
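The copy-per-stage behaviour can be modelled with a toy chain. This is only a sketch under simplifying assumptions: the stage class below is hypothetical, keeps an unbounded buffer, and stands in for a streambuf/filter pair; it does not reproduce the library's implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One stage in a hypothetical output chain. Each stage owns a buffer;
// flushing hands the buffered characters to the next stage, which copies them.
class stage {
public:
    explicit stage(stage* next = nullptr) : next_(next), copied_(0) {}
    virtual ~stage() {}

    // Analogous to sputn: copy characters into this stage's own buffer.
    void write(const char* first, const char* last) {
        for (; first != last; ++first) {
            buf_.push_back(transform(*first));
            ++copied_;
        }
    }

    // Analogous to pubsync: push the buffer downstream, then flush downstream.
    void flush() {
        if (!next_) return;  // the last stage acts as the sink
        next_->write(buf_.data(), buf_.data() + buf_.size());
        buf_.clear();
        next_->flush();
    }

    std::size_t copied() const { return copied_; }
    std::string contents() const { return std::string(buf_.begin(), buf_.end()); }

protected:
    virtual char transform(char c) { return c; }

private:
    stage* next_;
    std::vector<char> buf_;
    std::size_t copied_;
};

// A stage that modifies characters on the way through.
struct upper_stage : stage {
    explicit upper_stage(stage* next) : stage(next) {}
    char transform(char c) override {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
};
```

With three stages, writing n characters results in 3n character copies in total, which is the linear growth in copying that the question is about.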

...

Thanks for your time in advance!

No trouble. Jonathan

Carlo Wood

9:55 p.m.

New subject: IOStreams formal review start

On Tue, Aug 31, 2004 at 12:45:30PM -0600, Jonathan Turkanis wrote:

...

pubsync is called, the contents of the buffer will be sent to the first filter in the chain, like so (assuming it's a 'buffered filter'):

filter_.write(buf, buf + n);

[...]

Yes, for the time being. If your ideas can eliminate copying further, I'd be glad to try to incorporate them. (But I haven't looked at your library yet.)

My idea then would involve the introduction of a 'message' object, something that abstracts a contiguous piece of data with a finite size that can be processed as a unit. For example, one line of text in the case of text filters - or one packet of data when processing a UDP stream - or one binary packet that starts with an envelope/header followed by a payload etc.

Then, instead of passing (buf, buf + n), this more abstract 'message' object should be used. The message object would contain the 'buf' pointer and the size 'n' - not the complete data of course. Purely for exposition:

struct Message { char* buf; size_t n; };

A filter should then be allowed to do the following things with this object:

1) Tell it that the data can be freed. If the data is still in the original streambuf then the message object would take care of telling the streambuf that the part it was holding is now free again.

2) Process it inline - it would not write outside the buffer but only examine it and change things perhaps such that the result still fits in the same buffer.

3) Copy the data to a newly allocated memory block (which now can be larger than the original), filtering it while copying it if needed. This means that the 'message object' tells the streambuf that the data is now freed. Subsequent 'freeing' of the message would now delete the allocated memory block and not that of the stream buf.

To the user of the 'message' only this interface would be visible (for example):

Message::start() const : Get the start of the message.
Message::size() const : Get the size of the message.
Message::reserved() const : Size of the allocated buffer.
Message::reserve(size) : Increase buffer size (possibly causing a copy).
Message::set_size(size) : Set a new message size.
Message::~Message : Free the underlying data and destruct the message object.
Message::Message(size) : Create a new Message object with an uninitialized buffer of size 'size'.

The call to a filter would then become:

filter_.write(message); // Passing a Message

The reason that this is not a trivial change is mostly because the streambuf must be aware of the existence of these Message objects. If you would seriously consider going for this approach then I am willing to donate my dbstreambuf code.

Filters that can be implemented without the need to increase the message size can then always work 'in place', without the need for unnecessary copying. Filters that need to enlarge a buffer also do not always have to copy the data; when the message buffer is already large enough then no copying is needed.

For example, to transform a compressed UNIX text file to a compressed Windows text file:

file >> expand_msg(2000) >> decompress >> add_cariage_return >> compress >> file

Only the first filter would copy the data (it would call new char [2000] and copy the size of the real message, which can be much smaller - leaving the rest of the buffer uninitialized). decompress then would not have to allocate new space - and neither would 'add_cariage_return' etc.

[ However, this still isn't satisfactory because a decompress filter will ALWAYS have to copy the data. Better would be to be able to pass a size to the decompress filter:

file >> decompress(2000) >> add_cariage_return >> compress >> file

or, just tell the decompress filter that it should try to make the resulting message have a buffer that is at least 1 character larger than the size of the resulting message:

file >> decompress(1) >> add_cariage_return >> compress >> file

Then really only a single copy is needed. On the other hand, the first is also already advantageous in that only a single allocation is needed: malloc is slow too *).]

--
Carlo Wood <carlo@alinoe.com>

*) Which seems to indicate that the Message object should have an Allocator template parameter (ie, to implement memory pools).
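The proposed interface can be sketched roughly as follows. This is a heap-backed version only, written for illustration: the streambuf-backed case (where freeing the message releases part of a stream buffer) and the Allocator parameter are not modelled, and the function add_carriage_returns is a hypothetical stand-in for the 'add_cariage_return' filter named above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Heap-backed sketch of the listed interface.
class Message {
public:
    explicit Message(std::size_t size)
        : buf_(new char[size]), size_(size), reserved_(size) {}
    ~Message() { delete[] buf_; }

    Message(const Message&) = delete;
    Message& operator=(const Message&) = delete;

    char* start() const { return buf_; }
    std::size_t size() const { return size_; }
    std::size_t reserved() const { return reserved_; }

    // Grows the buffer if needed; this is the only operation that copies.
    void reserve(std::size_t n) {
        if (n <= reserved_) return;
        char* bigger = new char[n];
        std::memcpy(bigger, buf_, size_);
        delete[] buf_;
        buf_ = bigger;
        reserved_ = n;
    }

    void set_size(std::size_t n) {
        assert(n <= reserved_);
        size_ = n;
    }

private:
    char* buf_;
    std::size_t size_;
    std::size_t reserved_;
};

// Expands '\n' to "\r\n" inside the message's own buffer, working backwards
// so that no unread character is overwritten.
void add_carriage_returns(Message& m) {
    std::size_t extra = 0;
    for (std::size_t i = 0; i < m.size(); ++i)
        if (m.start()[i] == '\n') ++extra;
    m.reserve(m.size() + extra);  // no copy if the buffer is already big enough
    char* p = m.start();
    std::size_t src = m.size();
    std::size_t dst = m.size() + extra;
    m.set_size(dst);
    while (src > 0) {
        const char c = p[--src];
        p[--dst] = c;
        if (c == '\n') p[--dst] = '\r';
    }
}
```

When the message was created with spare capacity, as in the expand_msg(2000) example, the filter runs without reallocating; otherwise reserve() performs the single copy.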

Jonathan Turkanis

10:32 p.m.

New subject: IOStreams formal review start

"Carlo Wood" <carlo@alinoe.com> wrote in message news:20040831215546.GA15507@alinoe.com...

...

On Tue, Aug 31, 2004 at 12:45:30PM -0600, Jonathan Turkanis wrote:

...
pubsync is called, the contents of the buffer will be sent to the first filter in the chain, like so (assuming it's a 'buffered filter'):

filter_.write(buf, buf + n);

[...]

Yes, for the time being. If your ideas can eliminate copying further, I'd be glad to try to incorporate them. (But I haven't looked at your library yet.)

My idea then would involve the introduction of a 'message' object,

I was hoping to avoid that. At most, I'd want to introduce a category of 'message-aware' filters. In cases where copying is a performance bottleneck, you would make sure that all your filters are message-aware. I definitely want to keep the simpler interface as an option -- preferably the default option.

...

something that abstracts a contiguous piece of data with a finite size that can be processed as a unit. For example, one line of text in the case of text filters - or one packet of data when processing a UDP stream - or one binary packet that starts with an envelope/header followed by a payload etc.

<snip details>

I'm planning to look at your code tonight. Could you take a quick glance at some of these links about the Apache filtering API and let me know if it's similar to what you're talking about?

1. Filtering I/O in Apache 2.0, by Ryan Bloom, http://tinyurl.com/6p5r5
2. Other stuff on Apache filtering by Ryan Bloom, http://tinyurl.com/646os
3. Slide show about Apache Filtering, http://tinyurl.com/5a6b4

Best Regards,
Jonathan

Carlo Wood

11:32 p.m.

New subject: IOStreams formal review start

On Tue, Aug 31, 2004 at 04:32:39PM -0600, Jonathan Turkanis wrote:

...

of these links about the Apache filtering API and let me know if it's similar to what you're talking about?

1. Filtering I/O in Apache 2.0, by Ryan Bloom, http://tinyurl.com/6p5r5
2. Other stuff on Apache filtering by Ryan Bloom, http://tinyurl.com/646os
3. Slide show about Apache Filtering, http://tinyurl.com/5a6b4

Very interesting articles. This design is much broader than my proposal, but that doesn't hurt imho. What is similar is that they are also using chunks of data (messages) and allow filters to change them in place, if the bucket type allows it.

I think it would be great if your library would at least be able to add this type of support later on. It might mean however that some crucial change in the design is needed now already in order not to break code that would be using your current API. I have no overview of that; you are in a much better position to judge that ;).

Clearly, their bucket and filters approach is very concerned with efficiency; as they state it "it is not a problem when a filter chooses not to be efficient, but the framework should not make it impossible to be as efficient as possible".

While I only glanced at the text - it seems that in our case we'd need an IOSTREAM_STREAMBUF_BUCKET type ;). My proposal then only included that type and their HEAP_BUCKET type. So, still - a change to your streambuf seems to be needed in order to build in support for this type of 'bucket handling' filters (in the future?).

--
Carlo Wood <carlo@alinoe.com>

Jonathan Turkanis

1 Sep

5:32 a.m.

New subject: IOStreams formal review start

"Carlo Wood" <carlo@alinoe.com> wrote in message:

...

I think it would be great if your library would at least be able to add this type of support later on. It might mean however that some crucial change in the design is needed now already in order not to break code that would be using your current API. I have no overview of that; you are in a much better position to judge that ;).

I'll try to address this in more detail after I've had more time to think about it. However, I'm fairly optimistic that message- or bucket-based filtering can be added later, without breaking user code. The reason is that user-defined filters and resources interact with resources generically, using boost::io::read, boost::io::write, etc. They know nothing about the underlying implementation in terms of linked lists of stream buffers. Therefore the whole framework can be rewritten without requiring changes to existing user-defined filters and resources. (Re-compilation will be necessary, of course.) The existing components won't be able to take advantage of the extra efficiency provided by the new implementation, but in most cases this may not matter much.

Best Regards,
Jonathan
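The insulation argument can be illustrated with a small standalone sketch. The namespace sketch and its put overloads are hypothetical stand-ins for the generic functions mentioned above; the real boost::io functions are not reproduced here.

```cpp
#include <cassert>
#include <cctype>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

// Framework side: one generic entry point per operation. Overloads can be
// added or rewritten without touching filters that call sketch::put.
inline void put(std::ostream& os, char c) { os.put(c); }
inline void put(std::string& s, char c) { s.push_back(c); }

// User side: the filter names only sketch::put, never the sink's real type.
struct toupper_filter {
    template <typename Sink>
    void put(Sink& snk, char c) const {
        sketch::put(snk, static_cast<char>(
                             std::toupper(static_cast<unsigned char>(c))));
    }
};

// Drives a filter over a whole string.
template <typename Filter, typename Sink>
void write_all(const Filter& f, Sink& snk, const std::string& text) {
    for (char c : text) f.put(snk, c);
}

}  // namespace sketch
```

The same toupper_filter works unchanged whether the sink is a std::ostringstream or a std::string, which is the property that would let the framework underneath be replaced.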

Larry Evans

31 Aug

11:41 p.m.

New subject: IOStreams formal review start

On 08/31/2004 04:55 PM, Carlo Wood wrote: [snip]

...

My idea then would involve the introduction of a 'message' object, something that abstracts a contiguous piece of data with a finite size that can be processed as a unit. For example, one line of text in the case of text filters - or one packet of data when processing a UDP stream - or one binary packet that starts with an envelope/header followed by a payload etc.

Then, instead of passing (buf, buf + n), this more abstract 'message' object should be used then. The message object would contain the 'buf' pointer and the size 'n' - not the complete data of course. Purely for exposition:

struct Message { char* buf; size_t n; };

A filter should then be allowed to do the following things with this object:

1) Tell it that the data can be freed. If the data is still in the original streambuf then the message object would take care of telling the streambuf that the part it was holding is now free again.

2) Process it inline - it would not write outside the buffer but only examine it and change things perhaps such that the result still fits in the same buffer.

3) Copy the data to a newly allocated memory block (which now can be larger than the original), filtering it while copying it if needed. This means that the 'message object' tells the streambuf that the data is now freed. Subsequent 'freeing' of the message would now delete the allocated memory block and not that of the stream buf.

To the user of the 'message' only this interface would be visible (for example):

Message::start() const : Get the start of the message.
Message::size() const : Get the size of the message.
Message::reserved() const : Size of the allocated buffer.
Message::reserve(size) : Increase buffer size (possibly causing a copy).
Message::set_size(size) : Set a new message size.
Message::~Message : Free the underlying data and destruct the message object.
Message::Message(size) : Create a new Message object with an uninitialized buffer of size 'size'.

[snip]

I haven't thought deeply about this, but could this message idea and stack of streambuf's be used to implement a part of the upper layers of the OSI model:

http://www.geocities.com/SiliconValley/Monitor/3131/ne/osimodel.html

? For example, each layer would correspond to an element in the stack of streambufs, and each streambuf could have a state associated with it. For example, the state could be a complex fsm, as with TCP:

http://www.tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateM...

In another application, indenting output to reflect code structure, the state could be a simple flag, bol (beginning-of-line), to indicate the next character output will be the first on the line, and a length, to indicate the width of the current margin. An example of a streambuf with such a simple state is:

http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/io/f...

Jonathan, I'll be trying to rewrite the above code with IOStreams. It looks like it'll be very easy. I'll just have to add the same member variables in the above code to that in your example:

toupper_output_filter;

and rename it, of course, and change the put to check the bol flag and print the margin before outputting the character. Then add a method to adjust the margin length.

BTW, Jonathan, the documentation looks very good. I'll use it as a model if I ever try to get anything submitted.

Regards,
Larry.

Carlo Wood

1 Sep

1:43 a.m.

New subject: IOStreams formal review start

On Tue, Aug 31, 2004 at 06:41:36PM -0500, Larry Evans wrote:

...

I haven't thought deeply about this, but could this message idea and stack of streambuf's be used to implement a part of the upper layers of the OSI model:

http://www.geocities.com/SiliconValley/Monitor/3131/ne/osimodel.html

Well, the idea comes from my work on libcw ... one description of libcw is "An Object Oriented C++ library for networking applications", so yes - the whole idea has always been to be a general though efficient method to deal with networking protocols (the top layer of OSI).

...

? For example, each layer would correspond to an element in the stack of streambufs, and each streambuf could have a state associated with it. For example, the state could be a complex fsm, as with TCP:

You lost me here. Stack elements of streambufs are not related to OSI layers (which even include the hardware?! A horse != cow). Also, I talked about 'messages' (chunks of data) and not about stacked streambufs; multiple streambufs mean copying of data and I am trying to talk Jonathan out of that :p. Finally, a TCP statemachine is an OSI layer lower - and a statemachine that would decode a specific protocol would be a layer higher than my 'Message' objects.

...

http://www.tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateM...

In another application, indenting output to reflect code structure, the state could be a simple flag, bol (beginning-of-line), to indicate the next character output will be the first on the line, and a length, to indicate the width of the current margin. An example of a streambuf with such a simple state is:

http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/io/f...

There is no streambuf on that page. -- Carlo Wood <carlo@alinoe.com>

Larry Evans

2:18 a.m.

New subject: IOStreams formal review start

On 08/31/2004 08:43 PM, Carlo Wood wrote:

...

On Tue, Aug 31, 2004 at 06:41:36PM -0500, Larry Evans wrote:

...
I haven't thought deeply about this, but could this message idea and stack of streambuf's be used to implement a part of the upper layers of the OSI model:

http://www.geocities.com/SiliconValley/Monitor/3131/ne/osimodel.html

Well, the idea comes from my work on libcw ... one description of libcw is "An Object Oriented C++ library for networking applications", so yes - the whole idea has always been to be a general though efficient method to deal with networking protocols (the top layer of OSI).

...
? For example, each layer would correspond to an element in the stack of streambufs, and each streambuf could have a state associated with it. For example, the state could be a complex fsm, as with TCP:

You lost me here. stack elements of streambufs are not related to OSI layers (which even include the hardware?! A horse != cow).

Well, I figured it could be used in the lower layers (the hardware) for that reason, but I figured it might be usable in the higher layers.

...

Also, I talked about 'messages' (chunks of data) and not about stacked streambufs; multiple streambufs mean copying of data and I am trying

Well, streambufs deal with characters, either wide or regular, depending on a template parameter, I guess. Why couldn't the template parameter indicate a message (a chunk of data) instead of a single character? If you look at: http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/io/f... again, you'll see the template parameter is called "Fluid" :) . So I was thinking that maybe, in the case of characters, the fluid would be char or wchar_t, but in your case, the fluid would be messages.

...

to talk Jonathan out of that :p. Finally, a TCP statemachine is an OSI layer lower - and a statemachine that would decode a specific protocol would be a layer higher than my 'Message' objects.

OK. Maybe I'm thinking more along the lines of a parser which gathers lower level data, chars, into higher level structures (AST nodes), and comparing that to network layers which gather lower level data (TCP/IP) and transform (filter) it into higher level data (whatever the application layer uses), and also do the reverse. In other words, somewhat like Jonathan's dual-use streambufs. I realize this is way beyond what Jonathan had in mind, but I just couldn't help speculating.

...

...
indicate the width of the current margin. An example of a streambuf with such a simple state is:

http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/io/f...

There is no streambuf on that page.

Sorry. See: http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/io/f... for how that class in combination with opipeline_from_streambuf is used to create a filtered stream. I can't remember exactly where the streambuf is created, but I believe it doesn't use any buffers other than that already in the sink.

Jonathan Turkanis

4:36 a.m.

New subject: IOStreams formal review start

"Larry Evans" <cppljevans@cox-internet.com> wrote in:

...

I haven't thought deeply about this, but could this message idea and stack of streambuf's be used to implement a part of the upper layers of the OSI model:

http://www.geocities.com/SiliconValley/Monitor/3131/ne/osimodel.html

? For example, each layer would correspond to an element in the stack of streambufs, and each streambuf could have a state associated with it. For example, the state could be a complex fsm, as with TCP:

http://www.tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateM...

This is getting pretty far afield, isn't it, Larry? ;-)

...

In another application, indenting output to reflect code structure, the state could be a simple flag, bol (beginning-of-line), to indicate the next character output will be the first on the line, and a length, to indicate the width of the current margin. An example of a streambuf with such a simple state is:

http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/io/f...

...

Jonathan, I'll be trying to rewrite the above code with IOStreams. It looks like it'll be very easy. I'll just have to add the same member variables in the above code to that in your example:

toupper_output_filter;

and rename it, of course, and change the put to check the bol flag and print the margin before outputting the character. Then add a method to adjust the margin length.

I look forward to this. Maybe I can make it one of the examples, or part of the library in the 'text-processing' category.

...

BTW, Jonathan, the documentation looks very good. I'll use it as a model if I ever try to get anything submitted.

Thanks! BTW, I hope you can find time to submit a review. Best Regards, Jonathan

Larry Evans

2 Sep

2:31 p.m.

New subject: IOStreams formal review start [documentation]

On 08/31/2004 11:36 PM, Jonathan Turkanis wrote:

...

"Larry Evans" <cppljevans@cox-internet.com> wrote in:

[snip]

...

http://www.tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateM...

...

This is getting pretty far afield, isn't it, Larry? ;-)

Yep. Sorry. No more. [snip]

...

http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/io/f...

...

...
Jonathan, I'll be trying to rewrite the above code with IOStreams.

...

I look forward to this. Maybe I can make it one of the examples, or part of the library in the 'text-processing' category.

[snip]

As you know, I've sent you a copy of the rewrite with IOStreams; however, I had some problems using the documentation during the rewrite. Figuring out the filter was the easy part:

struct margin_output_filter {
    template<typename Sink>
    void put(Sink& snk, char c)
    {
        if(at_bol_)
        {
            for(unsigned i=marg_len_; 0<i; --i) { boost::io::put(snk,' '); }
        }
        boost::io::put(snk, c);
        at_bol_ = c == '\n';
    }
    bool at_bol_; //indicates at Beginning-Of-Line
    unsigned marg_len_; //margin length
};

However, understanding how to connect it to the stream was more difficult. At first I thought of just using the example code from:

libs/io/doc/tutorial.html#tutorial_output_filter

or, more specifically:

filtered_streambuf<output> out;
out.push(toupper_output_filter());
out.push(cout);

but then I had to figure out if the filter was copied or not. From the above, since the filter was a temporary, it had to be. This seemed a needless copy; hence, I kept looking for other examples. I found the file:

libs/io/example/tab_expanding_example.cpp

much more helpful since it allowed:

out<<" a string";

where:

filtering_ostream out;

which is almost what I needed. I also needed methods allowing the expressions:

++out;
--out;

to increase and decrease the margin width. I also wanted to know that filtering_ostream could be used everywhere that ostream could be used; hence, I looked further at the docs:

doc/tutorial.html#tutorial_sink - made no mention of ostream
doc/filtering_streams.html - made no mention of ostream

So, I perused the source code in filtering_stream.hpp. Well, with all the macros, that got pretty difficult, but it did have the comment:

// Description: Defines a template derived from std::basic_streambuf which uses
// a chain to perform i/o. The template has the following parameters:

So, I concluded that it wasn't derived from ostream, but then why did it work with operator<< in tab_expanding_example.cpp?

Looking a little further down, it appears publicly derived from the template parameter, Stream, whose default value is:

BOOST_DEDUCED_TYPENAME \
boost::io::detail::filter_stream_traits< \
    Mode, Ch, Tr \
>::type

I looked above at the definition of filter_stream_traits (which, BTW, is underneath:

//--------------Definition of filtered_istream--------------------------------//

which is misleading since filtered_ostream is also defined there) and saw std::basic_ostream<Ch, Tr>, so I was pretty sure the Stream default value, when Mode=output, was std::basic_ostream<Ch, Tr>.

I also remember reading somewhere (I forget where) that the filter was actually stored by reference instead of by value as suggested by:

out.push(toupper_output_filter());

as mentioned previously. Thus the outline of marg_ostream would be:

class marg_ostream: public filtered_ostream {
public:
    ...
    marg_ostream(std::ostream& a_ostrm)
    {
        push(marg_filt_);
        push(a_ostrm);
    }
    void adjust_margin(int delta) { marg_filt_.marg_len_+=delta; }
    ...
private:
    margin_output_filter marg_filt_ ;
};

Also, doc/filtering_streams.html contains:

filtering_stream contains a chain of instances of streambuf_facade, accessed with an interface similar to that of std::stack.

and from that, together with my initially wrong conclusion about storing the value of filters instead of just a reference, I thought I'd have to access the stack of filters by some member function of filtered_stream. That led me to look at chain.hpp before I gave up.

Obviously, I was hoping it would be a little easier. Maybe more examples, and explicitly showing the superclasses of each xxx_<m>stream where m = 'o' or 'i' or whatever, would have helped, as well as emphasizing that each filter was stored as a reference.

I heartily recommend inclusion of the library in boost and will begin using it instead of my marg_ostream as soon as it is.

Cheers,
Larry

Jonathan Turkanis

3:49 p.m.

New subject: IOStreams formal review start [documentation]

"Larry Evans" <cppljevans@cox-internet.com> wrote in:

...

On 08/31/2004 11:36 PM, Jonathan Turkanis wrote:

...
"Larry Evans" <cppljevans@cox-internet.com> wrote in:

...

...
I look forward to this. Maybe I can make it one of the examples, or part of the library in the 'text-processing' category.

As you know, I've sent you a copy of the rewrite with IOStreams; however, I had some problem with using the documentation during the rewrite.

Thanks for posting this material. As I mentioned in an email, the problem lies with my documentation. Let me add the clarifications here, and then comment on your message as needed. Some of these points are mentioned in the documentation, but they need to be featured more prominently.

I. Filtering Streams. The template filtering_stream<Mode> derives from:

   A. std::basic_istream if Mode refines input but not output
   B. std::basic_ostream if Mode refines output but not input
   C. std::basic_iostream otherwise.

This is alluded to here, http://tinyurl.com/5t6l7, but clearly needs to be spelled out more explicitly and perhaps in an earlier section of the documentation.

II. Lifetime Management of Filters and Resources. This material should either go under 'Concepts' in the User's Guide, or have its own section.

   A. By default, filters and resources are stored internally by value and must be copy constructible. The reason for pass-by-value (really by const ref) is exception safety. This was accidentally omitted from the most recent rewrite of the rationale.
   B. It is unspecified whether filters and resources which are copy constructible have deep copy semantics.
   C. Standard streams and stream buffers are models of Resource; they are always stored by reference.
   D. The library may make an arbitrary number of copies (usually just one) of a filter or resource, but only one is stored, and no copies are made after i/o begins.
   E. Filters and resources can be stored by reference using the function boost::ref (see http://tinyurl.com/4padg). This is useful in two types of cases:
      1. The filter or resource type is not copy-constructible
      2. Cases like the one you present below, in which you keep an external instance of a filter, and want changes to this external instance to be reflected directly in the filtered i/o.
   F. Filters and resources must free all associated resources (in the usual sense) either:
      1. When the stored copy is destroyed, or
      2. If the filter or resource type models Closable (http://tinyurl.com/3pg5j) and i/o has commenced, when the function boost::io::close() is called.

I believe these principles are fairly intuitive and easy to work with, but they need to be spelled out in detail somewhere, and probably addressed in the examples. I'd like to include your margin_output_filter and marg_ostream either for use in the tutorial or as part of the text-processing section.
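The difference between storing by value (II.A) and storing by reference (II.E) can be tried out without the library. What follows is only a minimal stand-alone sketch: `indent_filter` and `tiny_chain` are hypothetical names invented for illustration, and `std::function` with `std::ref` stands in for the library's chain and `boost::ref`. It is not how the library is implemented.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical filter with mutable state: prepends `width` spaces to a line.
struct indent_filter {
    unsigned width = 0;
    std::string operator()(const std::string& line) const {
        return std::string(width, ' ') + line;
    }
};

// Hypothetical chain. std::function erases the filter's type, so a pushed
// filter can be invoked but no longer inspected or modified through the chain.
class tiny_chain {
public:
    // Stores a copy of f, unless f is a std::reference_wrapper, in which
    // case only the reference is stored.
    template <typename Filter>
    void push(Filter f) { filters_.emplace_back(std::move(f)); }

    std::string write(std::string line) const {
        for (const auto& f : filters_) line = f(line);
        return line;
    }

private:
    std::vector<std::function<std::string(const std::string&)>> filters_;
};

std::string by_value_demo() {
    indent_filter f;
    tiny_chain chain;
    chain.push(f);            // the chain keeps its own copy
    f.width = 2;              // not seen by the stored copy
    return chain.write("x");
}

std::string by_reference_demo() {
    indent_filter f;
    tiny_chain chain;
    chain.push(std::ref(f));  // the chain refers to f itself
    f.width = 2;              // seen through the chain
    return chain.write("x");
}
```

In this sketch `by_value_demo()` returns the line unchanged, while `by_reference_demo()` returns it indented by two spaces, which is the behaviour case II.E.2 relies on.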

...

Figuring out the filter was the easy part:

    struct margin_output_filter
    {
        template<typename Sink>
        void put(Sink& snk, char c)
        {
            if(at_bol_)
            {
                for(unsigned i=marg_len_; 0<i; --i)
                {
                    boost::io::put(snk,' ');
                }
            }
            boost::io::put(snk, c);
            at_bol_ = c == '\n';
        }

        bool at_bol_;       //indicates at Beginning-Of-Line
        unsigned marg_len_; //margin length
    };

You've left out the member types here. Also, if you want the filter to be usable with several streams of data in succession, you should implement close:

    struct margin_output_filter
        : output_filter // provides typedefs and a no-op
                        // implementation of close
    {
        margin_output_filter() : at_bol_(false), marg_len_(0) { }

        template<typename Sink>
        void put(Sink& snk, char c)
        {
            if(at_bol_)
            {
                for(unsigned i=marg_len_; 0<i; --i)
                {
                    boost::io::put(snk,' ');
                }
            }
            boost::io::put(snk, c);
            at_bol_ = c == '\n';
        }

        // Allows filter to be reused when a resource is pop'd from
        // the filtering_ostream and a new one is push'd.
        template<typename Sink>
        void close(Sink&)
        {
            at_bol_ = false;
            marg_len_ = 0;
        }

        bool at_bol_;       //indicates at Beginning-Of-Line
        unsigned marg_len_; //margin length
    };
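To see what the put/close logic above produces, it can be exercised against a stand-in sink. This is a sketch under stated assumptions: `sketch::string_sink` and `sketch::put` are hypothetical replacements for a real Sink and for `boost::io::put`, and the filter is copied here without the `output_filter` base so that it compiles on its own.

```cpp
#include <string>

namespace sketch {

// Hypothetical sink that collects characters in a std::string.
struct string_sink {
    std::string data;
};

// Hypothetical stand-in for boost::io::put: writes one character to a sink.
inline void put(string_sink& snk, char c) { snk.data.push_back(c); }

// Same logic as margin_output_filter above, minus the library base class.
struct margin_filter {
    bool at_bol_ = false;    // true when the next character starts a line
    unsigned marg_len_ = 0;  // margin width in spaces

    template <typename Sink>
    void put(Sink& snk, char c) {
        if (at_bol_) {
            for (unsigned i = marg_len_; 0 < i; --i) sketch::put(snk, ' ');
        }
        sketch::put(snk, c);
        at_bol_ = (c == '\n');
    }

    // Resets the state so the filter can be reused on another stream.
    template <typename Sink>
    void close(Sink&) {
        at_bol_ = false;
        marg_len_ = 0;
    }
};

}  // namespace sketch

// Runs `text` through the filter with the given margin and returns the result.
std::string indent(const std::string& text, unsigned margin) {
    sketch::string_sink snk;
    sketch::margin_filter filt;
    filt.marg_len_ = margin;
    for (char c : text) filt.put(snk, c);
    filt.close(snk);
    return snk.data;
}
```

Because at_bol_ starts out false, as in the constructor above, the first line is not indented: `indent("a\nb\n", 2)` yields `"a\n  b\n"`.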

...

However, understanding how to connect it to the stream was more difficult. At first I thought of just using the example code from:

libs/io/doc/tutorial.html#tutorial_output_filter

or, more specifically:

    filtered_streambuf<output> out;
    out.push(toupper_output_filter());
    out.push(cout);

but then I had to figure out if the filter was copied or not. From the above, since the filter was a temporary, it had to be. This seemed a needless copy; hence, I kept looking for other examples. I found the file:

For your margin_output_filter above, the expense of copying is trivial. In many cases, to make a filter or resource type copy constructible you should use shared_ptr.

...

to increase and decrease the margin width. I also wanted to know that filtering_ostream could be used everywhere that ostream could be used; hence, I looked further at the docs:

doc/tutorial.html#tutorial_sink - made no mention of ostream

doc/filtering_streams.html - made no mention of ostream

It's mentioned in the reference documentation for filtering_stream. I think I'll also put it

- In the overview on the library homepage
- In the section "Filtering Streams and Stream Buffers" in the user's guide

...

So, I perused the source code in filtering_stream.hpp. Well, with all the macros, that got pretty difficult,

Yeah, you shouldn't have to look here at all. Sorry.

...

but it did have the comment:

    // Description: Defines a template derived from std::basic_streambuf which uses
    // a chain to perform i/o. The template has the following parameters:

This is a copy-and-paste error. Incorrect comments are worse than no comments at all. Sorry again.

...

I looked above at the definition of filter_stream_traits ( which BTW, is underneath:

//--------------Definition of filtered_istream--------------------------------//

which is misleading since filtered_ostream is also defined there )

Another incorrect comment. Thanks for pointing this out.

...

and saw std::basic_ostream<Ch, Tr>, so I was pretty sure the Stream default value, when Mode=output, was std::basic_ostream<Ch, Tr>. I also remember reading somewhere (I forget where) that the filter was actually stored by reference instead of by value as suggested by:

Only if you use boost::ref().

...

as mentioned previously. Thus the outline of marg_ostream would be:

    class marg_ostream: public filtered_ostream
    {
    public:
        ...
        marg_ostream(std::ostream& a_ostrm)
        {
            push(marg_filt_);

This should be:

    push(ref(marg_filt_));

so changes to member variable marg_filt_ will show up in the filtered output.

...

            push(a_ostrm);
        }

        void adjust_margin(int delta)
        {
            marg_filt_.marg_len_+=delta;
        }
        ...
    private:
        margin_output_filter marg_filt_ ;
    };
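The shape of such a class (an ostream that owns its margin state and can be passed wherever a std::ostream& is expected) can also be sketched with the standard library alone. This is an illustrative stand-in, not the IOStreams-based marg_ostream outlined above: `margin_streambuf` is a hypothetical helper, and the `++`/`--` operators are an assumption based on the expressions mentioned earlier in the thread.

```cpp
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stream buffer: writes a margin at the start of each line
// after a newline, then forwards every character to a buffer it does not own.
class margin_streambuf : public std::streambuf {
public:
    explicit margin_streambuf(std::streambuf* dest) : dest_(dest) {}

    void adjust_margin(int delta) {
        const int next = static_cast<int>(marg_len_) + delta;
        marg_len_ = next < 0 ? 0u : static_cast<unsigned>(next);
    }

protected:
    // No put area is set, so every character written arrives here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        if (at_bol_) {
            for (unsigned i = 0; i < marg_len_; ++i) {
                if (traits_type::eq_int_type(dest_->sputc(' '), traits_type::eof()))
                    return traits_type::eof();
            }
        }
        const char c = traits_type::to_char_type(ch);
        at_bol_ = (c == '\n');
        return dest_->sputc(c);
    }

    int sync() override { return dest_->pubsync(); }

private:
    std::streambuf* dest_;
    bool at_bol_ = false;    // true when the next character starts a line
    unsigned marg_len_ = 0;  // margin width in spaces
};

class marg_ostream : public std::ostream {
public:
    explicit marg_ostream(std::ostream& dest)
        : std::ostream(nullptr), buf_(dest.rdbuf()) {
        rdbuf(&buf_);  // attach the buffer once it has been constructed
    }
    marg_ostream& operator++() { buf_.adjust_margin(+1); return *this; }
    marg_ostream& operator--() { buf_.adjust_margin(-1); return *this; }

private:
    margin_streambuf buf_;
};

// Any function written against std::ostream accepts a marg_ostream.
void greet(std::ostream& os) { os << "hello\nworld\n"; }

std::string demo() {
    std::ostringstream target;
    marg_ostream out(target);
    ++out;
    ++out;
    out << "a\n";
    greet(out);
    --out;
    out << "b\n";
    return target.str();
}
```

Here the margin state lives inside the stream object itself, so no by-reference push is needed; with the library, the equivalent effect comes from keeping the filter as a member and pushing it with ref(), as noted above.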

Also, doc/filtering_streams.html contains:

filtering_stream contains a chain of instances of streambuf_facade, accessed with an interface similar to that of std::stack.

and from that and the initially wrong conclusion about storing the value of filters instead of just a reference, I thought I'd have to access the stack of filters by some member function of filtered_stream. That led me to look at chain.hpp before I gave up.

You use the stack interface just to push and pop filters. You can't do either of these things:

- access the stream buffers in the chain; this could circumvent internal buffering and result in garbled i/o
- access the filters or resources in the chain; information about their types is lost when they are added to the chain, so they cannot be accessed in a typesafe manner

If you need to access a particular filter or resource, store it externally and add it to the chain using boost::ref().

Note that this situation contrasts with that of streambuf_facade and stream_facade, where the type of the underlying resource is *not* lost. You can access the resource instance directly using operators * and ->.

...

Obviously, I was hoping it would be a little easier.

Naturally.

...

Maybe more examples, and explicitly showing the superclasses of each xxx_<m>stream where m = 'o' or 'i' or whatever, would have helped as well as emphasizing that each filter was stored as a reference.

I heartily recommend inclusion of the library in boost and will begin using it instead of my marg_ostream as soon as it is.

Thanks!

...

Cheers, Larry

Best Regards, Jonathan

Larry Evans

5:09 p.m.

New subject: IOStreams formal review start [documentation]

On 09/02/2004 10:49 AM, Jonathan Turkanis wrote: [snip]

...

Let me add the clarifications here, and then comment on your message as needed. Some of these points are mentioned in the documentation, but they need to be featured more prominently.

I. Filtering Streams. The template filtering_stream<Mode> derives from:

   A. std::basic_istream if Mode refines input but not output
   B. std::basic_ostream if Mode refines output but not input
   C. std::basic_iostream otherwise.

This is alluded to here, http://tinyurl.com/5t6l7, but clearly needs to be spelled out more explicitly and perhaps in an earlier section of the documentation.
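The base-class rule in point I can be pictured as a small traits class that maps a mode to a standard stream type. The sketch below is only an illustration of that idea: the tag types, `stream_base`, and `sketch_filtering_stream` are hypothetical names, the library's real modes form a refinement hierarchy rather than three flat tags, and nothing here is taken from filter_stream_traits itself.

```cpp
#include <istream>
#include <ostream>
#include <string>
#include <type_traits>

// Hypothetical mode tags (not the library's own definitions).
struct input_mode {};
struct output_mode {};
struct bidirectional_mode {};

// Maps a mode to the standard stream class to derive from.
// Anything that is not purely input or purely output gets basic_iostream.
template <typename Mode, typename Ch, typename Tr>
struct stream_base {
    using type = std::basic_iostream<Ch, Tr>;
};
template <typename Ch, typename Tr>
struct stream_base<input_mode, Ch, Tr> {
    using type = std::basic_istream<Ch, Tr>;
};
template <typename Ch, typename Tr>
struct stream_base<output_mode, Ch, Tr> {
    using type = std::basic_ostream<Ch, Tr>;
};

// Hypothetical stream template that derives publicly from the selected base.
template <typename Mode, typename Ch = char,
          typename Tr = std::char_traits<Ch>>
class sketch_filtering_stream : public stream_base<Mode, Ch, Tr>::type {
    using base = typename stream_base<Mode, Ch, Tr>::type;

public:
    sketch_filtering_stream() : base(nullptr) {}  // no buffer attached yet
};

// Output mode: an ostream, but not an istream.
static_assert(std::is_base_of<std::ostream,
                              sketch_filtering_stream<output_mode>>::value,
              "output mode derives from std::ostream");
static_assert(!std::is_base_of<std::istream,
                               sketch_filtering_stream<output_mode>>::value,
              "output mode does not derive from std::istream");

// Input mode: an istream.
static_assert(std::is_base_of<std::istream,
                              sketch_filtering_stream<input_mode>>::value,
              "input mode derives from std::istream");

// Otherwise: an iostream, hence both.
static_assert(std::is_base_of<std::iostream,
                              sketch_filtering_stream<bidirectional_mode>>::value,
              "other modes derive from std::iostream");
```

Deriving from the standard stream class is what lets operator<< and any function taking std::ostream& work unchanged with an output-mode stream.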

Obviously I was wrong when I said:

    doc/filtering_streams.html - made no mention of ostream

I thought I did a search for "ostream", but I guess not :(

Oh, wait, filtering_streams.html doesn't have "ostream"; however, filtering_stream.html does. I should have thought to look under reference and then filtering_stream. Sorry. My mistake.
