IOStreams formal review start

All -

Today (August 28th, 2004) is the start of the formal review of the Iostreams library by Jonathan Turkanis. I will be serving as review manager. Note that this is a somewhat unusual situation in that we have several libraries that overlap in the same area, so comments related to the MoreIo overlap are needed.

As usual, please state in review comments how you reviewed the library and whether you think the library should be accepted into Boost. Further guidelines for writing reviews can be found on the website at:

http://www.boost.org/more/formal_review_process.htm#Comments

**********************************************
Library Synopsis
**********************************************

The Iostreams Library serves two main purposes:

* To allow the easy creation of standard C++ stream and stream buffer classes for new data sources and sinks.
* To provide a convenient interface for defining i/o filters and attaching them to standard streams and stream buffers.

The library focuses on freeing users from writing boilerplate code, allowing them instead to create highly reusable components. In addition to providing an abstract framework, the library provides a number of concrete filters, sources and sinks which serve as example applications of the library but are also useful in their own right. These include components for accessing memory-mapped files, for file access via operating system file descriptors, for code conversion, for text filtering with regular expressions, for line-ending conversion, and for compression and decompression in the zlib, gzip and bzip2 formats.

The latest package can be found at the following location:

http://home.comcast.net/~jturkanis/iostreams/

Thanks,

Jeff

On 8/28/04 8:09 PM, "Jeff Garland" <jeff@crystalclearsoftware.com> wrote:
Today (August 28th, 2004) is the start of the formal review of the Iostreams library by Jonathan Turkanis. I will be serving as review manager. Note that this is a somewhat unusual situation in that we have several libraries that overlap in the same area, so comments related to the MoreIo overlap are needed.
As usual, please state in review comments how you reviewed the library and whether you think the library should be accepted into Boost. Further guidelines for writing reviews can be found on the website at:
http://www.boost.org/more/formal_review_process.htm#Comments
********************************************** Library Synopsis ********************************************** The Iostreams Library serves two main purposes:
* To allow the easy creation of standard C++ stream and stream buffer classes for new data sources and sinks. * To provide a convenient interface for defining i/o filters and attaching them to standard streams and stream buffers.
The library focuses on freeing users from writing boilerplate code and allowing them instead to create highly reusable components.
In addition to providing an abstract framework the library provides a number of concrete filters, sources and sinks which serve as example applications of the library but are also useful in their own right. These include components for accessing memory-mapped files, for file access via operating system file descriptors, for code conversion, for text filtering with regular expressions, for line-ending conversion and for compression and decompression in the zlib, gzip and bzip2 formats.
The latest package can be found at the following locations:
Some concerns over what qualifies:

1. Aren't memory-mapped files and file descriptors highly platform specific? Code that works with them would have to be non-portable, so I don't think they're appropriate for Boost.

2. This library does what a lot of other text-I/O libraries do: try to fit in "kewl" compression schemes. The problem is that the types of compression here are binary oriented; they convert between sets of byte streams. However, characters are not bytes (although characters, like other types, are stored as bytes). There are issues of character-to-byte-sequence conversion. These issues, and binary I/O itself, should _not_ be snuck in through a text-I/O component. Worse, we have an upcoming library (Serialization) that also has to deal with binary I/O, so we should have a binary I/O strategy. That means that the compression stuff should be skipped for now. (If someone says that this library is supposed to be text and/or binary I/O, that's even worse! The two types should be distinct and not just glommed together.)

Now to the review itself:

1. Actual code using this library is very slick and easy to set up. This ease of use/set-up also applies to the plug-in filters and/or resources. The end-user experience is so awesome that we could just say "ship it" and go home. However, the experience is the only good point. That ease comes with a nasty little price tag, or more accurately, a nasty BIG price tag.

The core question on keeping this library is: besides filtering/chaining, is there any part of the interface that isn't already covered by the standard stream (buffer) system? The whole framework seems like "I/O done 'right'", a "better" implementation of the ideas/concepts shown in the standard I/O framework. The price is a code size many times larger than the conventional system, and a large chunk of it is a "poor man's" reflection system. Yes, filtering is very important.
Maybe Jonathan should have just added a filtering stream-buffer base class (that chains together). That, combined with the existing I/O framework, using custom stream-buffer classes for new initial sources/sinks, should give an equivalent to Iostreams' framework.

1a. An issue that appeared during the previous review (my More-I/O library) was the necessity of the stream base classes that could wrap custom stream-buffer classes. The complaint was that the wrappers couldn't be a 100% solution because they can't forward constructors, so a final derived class has to be made that manually carries over the constructors. Well, that is true, mainly because C++ generally doesn't provide any member forwarding (besides in a limited way with inheritance). The sample stream-buffers in More-I/O generally had value-added member functions attached that perform inspection or (limited) reconfiguration. Those member functions also have to be manually carried over to the final derived stream class. The point (after all this exposition) is that the More-I/O framework at least acknowledges support for value-added member functions. The Iostreams framework seems to totally ignore the issue! (So we could have a 90% solution for a quarter of the work, but triple the #included code length!)

2. Another issue that appeared during the More-I/O review was that plug-ins for Iostreams could be used for other ultimate sources/sinks unrelated to standard streams. What would those be? If a system in question supports Standard C++, it probably uses the Standard I/O framework for (text) I/O. If the system doesn't/can't use Standard I/O, it has to use a custom I/O framework. To use it with Iostreams framework plug-ins, an adapter would have to be made for the custom I/O routines. In that case, an adapter could be made for the Standard I/O framework instead. (As I said in the prelude, I'm skipping over binary-I/O concerns.)

To sum it up: I would REJECT the library (at least for now).
Maybe chain-filtering stream-buffer and stream base classes could be made instead?

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com

"Daryle Walker" <darylew@hotmail.com> wrote:

On 8/28/04 8:09 PM, "Jeff Garland" <jeff@crystalclearsoftware.com> wrote:

Today (August 28th, 2004) is the start of the formal review of the Iostreams library by Jonathan Turkanis. I will be serving as review manager. Note that this is a somewhat unusual situation in that we have several libraries that overlap in the same area, so comments related to the MoreIo overlap are needed.
Thanks for the review!
Some concerns over what qualifies:
1. Aren't memory-mapped files and file descriptors highly platform specific?
Yes, just like threads, sockets and directory iteration.
Code that works with them would have to be non-portable, so I don't think they're appropriate for Boost.
It achieves portability the same way boost.thread and boost.filesystem do: by having separate implementations for different systems. See http://www.boost.org/more/imp_vars.htm ("Implementation variations").
2. This library does what a lot of other text-I/O libraries do, try to fit in "kewl" compression schemes. The problem is that the types of compression here are binary oriented; they convert between sets of byte streams. However, characters are not bytes (although characters, like other types, are stored as bytes).
Are you saying there are problems with the implementation of the compression filters, e.g., that they make unwarranted assumptions about 'char'? If so, please let me know. I'm sure it can be fixed.
There are issues of character-to-byte-sequence conversion. These issues, and binary I/O itself, should _not_ be snuck in through a text-I/O component. Worse, we have an upcoming library (Serialization) that also has to deal with binary I/O stuff, so we should have a binary I/O strategy. That means that the compression stuff should be skipped for now.
(If someone says that this library is supposed to be text and/or binary I/O, that's even worse! The two types should be distinct and not just glommed together.)
I don't see the iostream framework as relating to text streams only: streams can handle text and binary. In some cases, you want text and binary to work together. E.g., suppose you have a compressed text file ("essay.z") and you want to read a 'family-friendly' version of it. You can do so as follows:

    filtering_istream in;
    in.push(regex_filter(regex("damn"), "darn"));
    in.push(zlib_decompressor());
    in.push(file_source("essay.z"));
    // read from in.

Isn't this perfectly natural and convenient? What's wrong with using the decompressor and the regex filter in the same chain?
Now to the review itself:
I thought it had already started ;-)
1. Actual code using this library is very slick and easy to set up. This ease of use/set-up also applies to the plug-in filters and/or resources.
Thanks.
The end-user experience is so awesome that we could just say "ship it" and go home. However, the experience is the only good point. That ease comes with a nasty little price tag, or more accurately, a nasty BIG price tag.
The core question on keeping this library is:
Besides filtering/chaining, is there any part of the interface that isn't already covered by the standard stream (buffer) system?
Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point, please correct me.

There are two main reasons to write Sources and Sinks instead of stream buffers:

1. Sources and Sinks express just the core functionality of a component. Usually you have to implement just one or two functions with very natural interfaces. You don't have to worry about buffering or about putting back characters. I would have thought it would be obvious that it's easier to write:

    template<typename Ch>
    struct null_buf {
        typedef Ch char_type;
        typedef sink_tag category;
        void write(const Ch*, std::streamsize) { }
    };

than to write your null_buf, which is 79 lines long.

2. Sources and sinks can be reused in cases where standard streams and stream buffers are either unnecessary or are not the appropriate abstraction. For example, suppose you want to write the concatenation of three files to a string. You can do so like this:

    string s;
    boost::io::copy(
        concatenate(
            file_source("file1"),
            file_source("file2"),
            file_source("file3")
        ),
        back_insert_resource(s)
    );

This is IMO clearer and possibly much more efficient than:

    string s;
    ifstream file1("file1");
    ifstream file2("file2");
    ifstream file3("file3");
    ostringstream out;
    out << file1.rdbuf();
    out << file2.rdbuf();
    out << file3.rdbuf();
    s = out.str();

(Note: concatenate is unimplemented, pending resolution of some open issues such as the return type of 'write'.)

Another example, for the future, might be resources for asynchronous and multiplexed i/o. (See 'Future Directions', http://tinyurl.com/6r8p2.)
The whole framework seems like "I/O done 'right'", a "better" implementation of the ideas/concepts shown in the standard I/O framework.
I'd say thanks here if 'right' and 'better' weren't in quotes ;-)
The price is a code size many times larger than the conventional system,
Are you talking about the size of the library or the size of the generated code? Most of the filter and resource infrastructure, as found in headers such as <boost/io/io_traits.hpp>, <boost/io/operations.hpp> and <boost/io/detail/resource_adapter.hpp>, should compile down to nothing with optimizations enabled. A typical instantiation of streambuf_facade should be only slightly larger, if at all, than a typical hand-written stream buffer. The main difference is that more protected virtual functions may be implemented than are actually needed. Even this can be fixed, if necessary; at this point it seems like a premature optimization, unless you have some data.
and a large chunk of it is a "poor man's" reflection system.
Do you mean the i/o categories? This follows the example of the standard library and the boost iterator library. It's better than reflection, since you can't get accidental conformance.
1a. An issue that appeared during the previous review (my More-I/O library) was the necessity of the stream base classes that could wrap custom stream-buffer classes. The complaint was that the wrappers couldn't be a 100% solution because they can't forward constructors, so a final derived class has to be made the manually carries over the constructors. Well, that is true, mainly because C++ generally doesn't provide any member forwarding (besides in a limited way with inheritance). The sample stream-buffer in More-I/O generally had added-value member functions attached, that perform inspection or (limited) reconfiguration. Those member functions also have to be manually carried over to the final derived stream class. The point (after all this exposition) is that the More-I/O framework at least acknowledges support for value-added member functions. The Iostreams framework seems to totally ignore the issue! (So we could have a 90% solution for a quarter of the work, but triple the #included code length!)
With a streambuf_facade or stream_facade you can access the underlying resource directly using operators * and ->. E.g.,

    stream_facade<tcp_resource> tcp("www.microsoft.com", 80);
    ...
    if (tcp->input_closed()) { ... }

Maybe I should stress this more in the documentation. (I imagine some people won't like the use of operators * and -> here, but these can be replaced by a member function such as resource().)
2. Another issue that appeared during the More-I/O review was that plug-ins for Iostreams could be used for other ultimate sources/sinks unrelated to standard streams. What would those be? If a system in question supports Standard C++, it probably uses the Standard I/O framework for (text) I/O. If the system doesn't/can't use Standard I/O, it has to use a custom I/O framework. To use it with Iostreams framework plug-ins, an adapter would have to be made for the custom I/O routines. In that case, an adapter could be made for the Standard I/O framework instead. (As I said in the prelude, I'm skipping over binary-I/O concerns.)
I don't follow this.
To sum it up:
I would REJECT the library (at least for now). Maybe chain-filtering stream-buffer and stream base classes could be made instead?
Sorry to hear that -- I hope I can change your mind. Jonathan

On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
"Daryle Walker" <darylew@hotmail.com> wrote:
On 8/28/04 8:09 PM, "Jeff Garland" <jeff@crystalclearsoftware.com> wrote:
Today (August 28th, 2004) is the start of the formal review of the Iostreams library by Jonathan Turkanis. I will be serving as review manager. Note that this is a somewhat unusual situation in that we have several libraries that overlap in the same area, so comments related to the MoreIo overlap are needed.
Thanks for the review!
Some concerns over what qualifies:
1. Aren't memory-mapped files and file descriptors highly platform specific?
Yes, just like threads, sockets and directory iteration.
Code that works with them would have to be non-portable, so I don't think they're appropriate for Boost.
It achieves portability the same way boost.thread and boost.filesystem do: by having separate implementations for different systems. See http://www.boost.org/more/imp_vars.htm ("Implementation variations").
But for the thread and file-system libraries, we can define default behavior. Thread-less environments act as if no spare threads can be allocated. All file-systems can simulate a tree/container calculus, so a portable interface can be defined. But memory-mapped files and file descriptors are totally meaningless on some environments; what would the code map to in those cases?
2. This library does what a lot of other text-I/O libraries do, try to fit in "kewl" compression schemes. The problem is that the types of compression here are binary oriented; they convert between sets of byte streams. However, characters are not bytes (although characters, like other types, are stored as bytes).
Are you saying there are problems with the implementation of the compression filters, e.g., that they make unwarranted assumptions about 'char'? If so, please let me know. I'm sure it can be fixed.
I'm complaining that binary I/O should _not_ be treated as a variant of text I/O (which your library assumes). Binary I/O only concerns itself with bytes, which is too low-level for text I/O. There can and should be bridging code, but the concepts of text sources/sinks should be distinct from binary sources/sinks.
There are issues of character-to-byte-sequence conversion. These issues, and binary I/O itself, should _not_ be snuck in through a text-I/O component. Worse, we have an upcoming library (Serialization) that also has to deal with binary I/O stuff, so we should have a binary I/O strategy. That means that the compression stuff should be skipped for now.
(If someone says that this library is supposed to be text and/or binary I/O, that's even worse! The two types should be distinct and not just glommed together.)
I don't see the iostream framework as relating to text streams only: streams can handle text and binary. In some cases, you want text and binary to work together. E.g., suppose you have a compressed text file ("essay.z") and you want to read a 'family-friendly' version of it. You can do so as follows:
    filtering_istream in;
    in.push(regex_filter(regex("damn"), "darn"));
    in.push(zlib_decompressor());
    in.push(file_source("essay.z"));
    // read from in.
Isn't this perfectly natural and convenient? What's wrong with using the decompressor and the regex filter in the same chain?
By itself, nothing. But these compression schemes only work with bytes, so you have hidden at least one text <-> binary converter in your code. The issue of text vs. binary is too big to be trivially dismissed! If someone introduces a binary-I/O scheme to Boost, we would have to go through all this code again (since Zlib, Bzip, and the like would be better redone as binary I/O filters).
Now to the review itself:
I thought it had already started ;-)
1. Actual code using this library is very slick and easy to set up. This ease of use/set-up also applies to the plug-in filters and/or resources.
Thanks.
The end-user experience is so awesome that we could just say "ship it" and go home. However, the experience is the only good point. That ease comes with a nasty little price tag, or more accurately, a nasty BIG price tag.
The core question on keeping this library is:
Besides filtering/chaining, is there any part of the interface that isn't already covered by the standard stream (buffer) system?
Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point, please correct me.
Yes, that's my point. I looked through your code, and thought "this is just a rearrangement of what's already in streams and stream-buffers". I got really convinced of this once I saw that you added member functions for locale control. I've recently noticed that even your documentation for the Resource and Filter concepts admit that they're just like certain C++ or C I/O functions.
There are two main reasons to write Sources and Sinks instead of stream buffers:
1. Sources and Sinks express just the core functionality of a component. Usually you have to implement just one or two functions with very natural interfaces. You don't have to worry about buffering or about putting back characters. I would have thought it would be obvious that it's easier to write:
    template<typename Ch>
    struct null_buf {
        typedef Ch char_type;
        typedef sink_tag category;
        void write(const Ch*, std::streamsize) { }
    };
than to write your null_buf, which is 79 lines long.
That's really misleading. The null-sink I have does a lot more. I keep track of how many characters passed through (i.e. a value-added function), and I optimize for single vs. multiple character output. Also, I'm verbose in my writing style. If I wanted to be compact I could just do:

    //========================================================================
    template < typename Ch, class Tr = std::char_traits<Ch> >
    class basic_nullbuf
        : public std::basic_streambuf<Ch, Tr>
    {
    protected:
        // Overridden virtual functions
        virtual int_type overflow( int_type c = traits_type::eof() )
            { return traits_type::not_eof( c ); }
    };
    //========================================================================

That's a lot less than 79 lines. And for those of you who think that "traits_type" is scary: get over it! Using the obvious substitutes of "==", "<", "(int)", etc. is just sloppy and WRONG. The whole point of the traits class is that a character type isn't forced to define those operators. Worse, those operators could exist but be inappropriate. For example, Josuttis' STL book has a string type that implements case-insensitive comparisons with a custom traits type. Using operator== directly would have missed that. Ignoring the policies of the traits type's creator could betray his/her vision of usage.
2. Sources and sinks can be reused in cases where standard streams and stream buffers are either unnecessary or are not the appropriate abstraction. For example, suppose you want to write the concatenation of three files to a string. You can do so like this:
    string s;
    boost::io::copy(
        concatenate(
            file_source("file1"),
            file_source("file2"),
            file_source("file3")
        ),
        back_insert_resource(s)
    );
This is IMO clearer and possibly much more efficient than:
    string s;
    ifstream file1("file1");
    ifstream file2("file2");
    ifstream file3("file3");
    ostringstream out;
    out << file1.rdbuf();
    out << file2.rdbuf();
    out << file3.rdbuf();
    s = out.str();
(Note: concatenate is unimplemented, pending resolution of some open issues such as the return type of 'write').
A straw-man? Wouldn't an iterator-based solution have been better? (There are stream(-buffer) iterators, and (string) insert iterators. If the Boost iterator library provides a chaining iterator type, then the standard copying procedure could be used.)
Another example, for the future, might be resources for asynchronous and multiplexed i/o. (See 'Future Directions', http://tinyurl.com/6r8p2.)
We can deal with that later. For now, what's the advantage of the library over synchronous single-plexed I/O?
The whole framework seems like "I/O done 'right'", a "better" implementation of the ideas/concepts shown in the standard I/O framework.
I'd say thanks here if 'right' and 'better' weren't in quotes ;-)
It looked like you changed the interface just to change the interface, not out of any actual need. What about the following (untested) code:

    //========================================================================
    template < typename Ch = char >
    class source_streambuf
        : public virtual std::basic_streambuf<Ch>
    {
    protected:
        // Lifetime management
        source_streambuf()
            : use_buf_( false ), any_more_( true )
            {}

        // Overridden virtual functions
        virtual std::streamsize xsgetn( char_type *s, std::streamsize n )
        {
            if ( this->any_more_ )
            {
                std::streamsize const nn = this->jon_read( s, n );
                this->any_more_ = ( nn >= n );
                this->use_buf_ = ( nn >= 1 );
                if ( this->use_buf_ )
                {
                    traits_type::assign( this->buf_, s[nn - 1] );
                }
                return nn;
            }
            else
            {
                return 0;
            }
        }

        virtual int_type underflow()
        {
            return this->use_buf_
             ? traits_type::to_int_type( this->buf_ )
             : this->uflow();
        }

        virtual int_type uflow()
        {
            char_type c;
            return ( 1 == this->xsgetn(&c, 1) )
             ? traits_type::to_int_type( c )
             : traits_type::eof();
        }

    private:
        // Override this instead of creating a "read" in Jon's library
        virtual std::streamsize jon_read( Ch *s, std::streamsize n ) = 0;

        // Member data
        char_type buf_;
        bool use_buf_, any_more_;
    };

    template < typename Ch = char >
    class sink_streambuf
        : public virtual std::basic_streambuf<Ch>
    {
    protected:
        // Overridden virtual functions
        virtual std::streamsize xsputn( char_type const *s, std::streamsize n )
        {
            this->jon_write(s, n);
            return n;
        }

        virtual int_type overflow( int_type c = traits_type::eof() )
        {
            if ( !traits_type::eq_int_type(c, traits_type::eof()) )
            {
                char_type const cc = traits_type::to_char_type( c );
                this->xsputn( &cc, 1 );
            }
            return traits_type::not_eof( c );
        }

    private:
        // Override this instead of creating a "write" in Jon's library
        virtual void jon_write( Ch const *s, std::streamsize n ) = 0;
    };

    template < typename Ch = char >
    class inoutresource_streambuf
        : public source_streambuf<Ch>
        , public sink_streambuf<Ch>
    {
        typedef source_streambuf<Ch>  i_base_type;
        typedef sink_streambuf<Ch>    o_base_type;

    public:
        // Redefine the standard typedefs here due to dual inheritance
        typedef Ch                              char_type;
        typedef std::char_traits<Ch>            traits_type;
        typedef typename traits_type::int_type  int_type;
        typedef typename traits_type::pos_type  pos_type;
        typedef typename traits_type::off_type  off_type;

    protected:
        // Overridden virtual functions
        using i_base_type::xsgetn;
        using i_base_type::underflow;
        using i_base_type::uflow;
        using o_base_type::xsputn;
        using o_base_type::overflow;
    };

    template < typename Ch = char >
    class seekableresource_streambuf
        : public inoutresource_streambuf<Ch>
    {
    protected:
        // Overridden virtual functions
        virtual pos_type seekoff( off_type off, std::ios_base::seekdir way,
         std::ios_base::openmode which = std::ios_base::in | std::ios_base::out )
        {
            off_type  result( -1 );

            if ( (std::ios_base::in | std::ios_base::out) == which )
            {
                result = static_cast<off_type>( this->jon_seek(
                 static_cast<std::streamoff>(off), way) );
            }
            return static_cast<pos_type>( result );
        }

        virtual pos_type seekpos( pos_type sp,
         std::ios_base::openmode which = std::ios_base::in | std::ios_base::out )
        {
            return this->seekoff( off_type(sp), std::ios_base::beg, which );
        }

    private:
        // Override this instead of creating a "seek" in Jon's library
        virtual std::streamoff jon_seek( std::streamoff off,
         std::ios_base::seekdir way ) = 0;
    };
    //========================================================================

Notes:

* The source/sink variants that use a pointer pair could just use something like the Pointer-based streams I provided.
* People who feel that the standard interface is "too hard" would simply override the "jon_*" member functions and leave the other re-implementations alone.
* You can even ignore the traits type in your implementations of the "jon_*" methods (although I think that could be a bad idea).

What disadvantage would adapter classes like these have over Jon's re-imagining of the stream interface?
Jon's higher level code could have been built over something like this (if he really didn't want to optimize an implementation with direct use of stream-buffer methods).
The price is a code size many times larger than the conventional system,
Are you talking about the size of the library or the size of the generated code?
The size of the library.
Most of the filter and resource infrastructure, as found in headers such as <boost/io/io_traits.hpp>, <boost/io/operations.hpp> and <boost/io/detail/resource_adapter.hpp>, should compile down to nothing with optimizations enabled. A typical instantiation of streambuf_facade should be only slightly larger, if at all, than a typical hand-written stream buffer. The main difference is that more protected virtual functions may be implemented than are actually needed. Even this can be fixed, if necessary; at this point it seems like a premature optimization, unless you have some data.
and a large chunk of it is a "poor man's" reflection system.
Do you mean the i/o categories? This follows the example of the standard library and the boost iterator library. It's better than reflection, since you can't get accidental conformance.
No, I'm talking about the code you used to get the existing standard I/O framework to inter-operate with your framework.
1a. An issue that appeared during the previous review (my More-I/O library) was the necessity of the stream base classes that could wrap custom stream-buffer classes. The complaint was that the wrappers couldn't be a 100% solution because they can't forward constructors, so a final derived class has to be made that manually carries over the constructors. Well, that is true, mainly because C++ generally doesn't provide any member forwarding (besides in a limited way with inheritance). The sample stream-buffers in More-I/O generally had value-added member functions attached that perform inspection or (limited) reconfiguration. Those member functions also have to be manually carried over to the final derived stream class. The point (after all this exposition) is that the More-I/O framework at least acknowledges support for value-added member functions. The Iostreams framework seems to totally ignore the issue! (So we could have a 90% solution for a quarter of the work, but triple the #included code length!)
With a streambuf_facade or stream_facade you can access the underlying resource directly using operators * and ->. E.g.,
    stream_facade<tcp_resource> tcp("www.microsoft.com", 80);
    ...
    if (tcp->input_closed()) { ... }
Maybe I should stress this more in the documentation. (I imagine some people won't like the use of operators * and -> here, but these can be replaced by a member function such as resource().)
I didn't like the iterator "look" those operations have. Also, is a stream-façade an actual stream? Does it inherit from std::basic_istream and/or std::basic_ostream? If not, I think it should change.
2. Another issue that appeared during the More-I/O review was that plug-ins for Iostreams could be used for other ultimate sources/sinks unrelated to standard streams. What would those be? If a system in question supports Standard C++, it probably uses the Standard I/O framework for (text) I/O. If the system doesn't/can't use Standard I/O, it has to use a custom I/O framework. To use it with Iostreams framework plug-ins, an adapter would have to be made for the custom I/O routines. In that case, an adapter could be made for the Standard I/O framework instead. (As I said in the prelude, I'm skipping over binary-I/O concerns.)
I don't follow this.
1. Are there really any important sources/sinks that can't be put through the existing Standard I/O framework?

2. An existing source/sink, if it wants to work with Standard C++, would work with the standard framework already.

3. If a source/sink doesn't work with either framework, so there would be a choice between making an adapter for the standard framework or making an adapter for your framework, why shouldn't the adapter be made to the standard framework?
To sum it up:
I would REJECT the library (at least for now). Maybe chain-filtering stream-buffer and stream base classes could be made instead?
Sorry to hear that -- I hope I can change your mind.
You have a potential problem: standard C++ I/O is "too hard". But you got the wrong solution: throw away the standard I/O's legacy and start over from scratch (but include transition code). This is independent of the decisions on memory-mapped files, file descriptors, binary I/O, and filters. Couldn't all of those have been implemented around the standard framework? (Possibly starting from something like the simplified abstract base buffers I provided.) I'm burned out from writing this much, so I'll talk about making a version of filters that use the standard framework later. -- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com

Daryle, I think this discussion is getting overheated. (See, e.g., the long code excerpts containing 'jon_xxx'). If I was a bit harsh in my first comments on your library, I'm sorry. I did vote to include a large percentage of it. On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
"Daryle Walker" <darylew@hotmail.com> wrote:
1. Aren't memory-mapped files and file descriptors highly platform specific?
Yes, just like threads, sockets and directory iteration.
Code that works with them would have to be non-portable, so I don't think they're appropriate for Boost.
It achieves portability the same way boost.thread and boost.filesystem do: by having separate implementations for different systems. See http://www.boost.org/more/imp_vars.htm ("Implementation variations").
But for the thread and file-system libraries, we can define default behavior.
We can do this for memory mapped files as well. Either including the appropriate header could cause a static assertion, or construction of mapped file resources could fail at runtime. Right now I've followed the example of Boost.Filesystem and assumed that every system is either Windows or Posix. This can easily be changed to produce more informative errors. Good point.
Thread-less environments act as if no spare threads can be allocated.
That's not the approach of Boost.Thread, IIRC. If thread support is unavailable, you get a preprocessor error (at least on windows.)
All file-systems can simulate a tree/container calculus, so a portable interface can be defined.
Again, Boost.Filesystem doesn't do this.
But memory-mapped files and file descriptors are totally meaningless on some environments; what would the code map to in those cases?
See above.
2. This library does what a lot of other text-I/O libraries do: try to fit in "kewl" compression schemes. The problem is that the types of compression here are binary oriented; they convert between sets of byte streams. However, characters are not bytes (although characters, like other types, are stored as bytes).
Are you saying there are problems with the implementation of the compression filters, e.g., that they make unwarranted assumptions about 'char'? If so, please let me know. I'm sure it can be fixed.
I'm complaining that binary I/O should _not_ be treated as a variant of text I/O (which your library assumes).
All I/O is treated as streams of characters. When these streams of characters require special 'textual' interpretation, you can use a newline_filter, for line-ending conversion, or a converter, for code conversion.
Binary I/O only concerns itself with bytes, which is too low-level for text I/O. There can and should be bridging code, but the concepts of text sources/sinks should be distinct from binary sources/sinks.
This just doubles the number of concepts, for little gain.
I don't see the iostream framework as relating to text streams only: streams can handle text and binary. In some cases, you want text and binary to work together. E.g., suppose you have a compressed text file ("essay.z") and you want to read a 'family-friendly' version of it. You can do so as follows:
filtering_istream in;
in.push(regex_filter(regex("damn"), "darn"));
in.push(zlib_decompressor());
in.push(file_source("essay.z"));
// read from in.
Isn't this perfectly natural and convenient? What's wrong with using the decompressor and the regex filter in the same chain?
By itself, nothing. But these compression schemes only work with bytes, so you have hidden at least one text <-> binary converter in your code.
(BTW, the file_source above should have been opened in binary mode.) All that's assumed in this example is that the characters in the essay file can be mapped directly to chars. If they can't, one would have to add a layer of code conversion (using converter) after the decompression, and use a wide-character filtering stream and wide-character regex_filter. If the above example were disallowed, then in the common case that output is stored in a form which can be directly mapped to the internal character set without code conversion, the user would be forced to insert a do-nothing adapter. The current library trusts users to know when they are dealing with data which must be converted to a wide character type before it can be processed by text-oriented filters.
Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point, please correct me.
Yes, that's my point. I looked through your code, and thought "this is just a rearrangement of what's already in streams and stream-buffers". I got really convinced of this once I saw that you added member functions for locale control.
I found I had to add this, rather late in development, to implement converting streams and stream buffers (which still aren't finished). What's wrong with locales? You say it like it's a dirty word.
I've recently noticed that even your documentation for the Resource and Filter concepts admit that they're just like certain C++ or C I/O functions.
You mean when I say, for example, "Filters are class types which define one or more member functions get, put, read, write and seek having interfaces resembling the functions fgetc, fputc, fread, fwrite and fseek from <stdio.h>" ? The functions boost::io::read, boost::io::write, etc., are indeed generic versions of these familiar functions. I mention the familiar functions as a way to introduce readers to the generic versions. The benefits of generic programming are well known, I hope.
There are two main reasons to write Sources and Sinks instead of stream buffers:
1. Sources and sinks express just the core functionality of a component. Usually you have to implement just one or two functions with very natural interfaces. You don't have to worry about buffering or about putting back characters. I would have thought it would be obvious that it's easier to write:
template<typename Ch>
struct null_buf {
    typedef Ch char_type;
    typedef sink_tag category;
    void write(const Ch*, std::streamsize) { }
};
than to write your null_buf, which is 79 lines long.
That's really misleading. The null-sink I have does a lot more. I keep track of how many characters passed through (i.e., a value-added function), and I optimize for single vs. multiple character output.
Okay,

template<typename Ch>
class null_buf {
public:
    typedef Ch char_type;
    typedef sink_tag category;
    null_buf() : count_(0) { }
    void write(const Ch*, std::streamsize n) { count_ += n; }
    int count() const { return count_; }
private:
    int count_;
};

This will lead to a stream buffer which keeps track of how many characters pass through, is optimized for single vs. multiple character output, *and* is buffered by default.
Also, I'm verbose in my writing style. If I wanted to be compact I could just do:
//========================================================================
template< typename Ch, class Tr = std::char_traits<Ch> >
class basic_nullbuf
    : public std::basic_streambuf<Ch, Tr>
{
    typedef Tr traits_type;
    typedef typename Tr::int_type int_type;
protected:
    // Overridden virtual functions
    virtual int_type overflow( int_type c = traits_type::eof() )
        { return traits_type::not_eof( c ); }
};
But that doesn't do what my version, listed above, does.
And for those of you who think that "traits_type" is scary: get over it! Using the obvious substitutes of "==", "<", "(int)", etc. is just sloppy and WRONG. The whole point of the traits class is so that a character type isn't forced to define those operators. Worse, those operators could exist but be inappropriate. For example, Josuttis' STL book has a string type that implements case-insensitive comparisons with a custom traits type. Using operator== directly would have missed that. Ignoring the policies of the traits type's creator could betray his/her vision of usage.
In early versions of my library, filters and resources had traits types as well as character types. Prompted by remarks of Gennadiy Rozental, I made a careful study and found that traits could be eliminated from the public interface of the filter/resource module of the library without sacrificing generality or correctness, except in the case of the return type of get, which is still std::char_traits<char_type>::int_type. Even this could be eliminated by having get return optional<char>. For a more ambitious proposal along these lines, see http://tinyurl.com/6r8p2. Of course, filter and resource authors may need to use char_traits to implement member functions read, write, etc. .... But I'm not sure I see where this discussion is going.
2. Sources and sinks can be reused in cases where standard streams and stream buffers are either unnecessary or are not the appropriate abstraction. For example, suppose you want to write the concatenation of three files to a string. You can do so like this:
string s;
boost::io::copy(
    concatenate(
        file_source("file1"),
        file_source("file2"),
        file_source("file3")
    ),
    back_insert_resource(s)
);
A straw-man? Wouldn't an iterator-based solution have been better? (There are stream(-buffer) iterators, and (string) insert iterators. If the Boost iterator library provides a chaining iterator type, then the standard copying procedure could be used.)
It's tempting to try to do everything using iterators. In fact, Robert Ramey's original suggestion to expand the library to handle filtering suggested that it be based on iterator adapters. (http://lists.boost.org/MailArchives/boost/msg48300.php) The problem with this approach is that it misses the opportunity for many important optimizations that can be made when one is presented with a contiguous buffer full of characters, instead of one character at a time.
The whole framework seems like "I/O done 'right'", a "better" implementation of the ideas/concepts shown in the standard I/O framework.
I'd say thanks here if 'right' and 'better' weren't in quotes ;-)
It looked like you changed the interface just to change the interface, not out of any actual need. What about the following (untested) code:
I'm going to ignore the code, which seems sarcastic. (Don't name stuff after me until I'm dead.) Instead, let me quote part of my response to Dietmar Kuehl: Jonathan Wrote:
... The protected virtual interface of basic_streambuf is, IMO, quite strange. The functions have weird names: underflow, uflow, pbackfail, overflow, showmanyc, xsputn, xsgetn, seekoff, etc. -- the functions read, write, and seek are much more intuitive. The specifications of the standard functions are tricky, too. For example, overflow (one of the better-named functions) is specified roughly like this:
virtual int_type overflow(int_type c = traits_type::eof());
"If c is not eof, attempts to insert into the output sequence the result of converting c to a character. If this can't be done, returns eof or throws an exception. Otherwise returns any value other than eof."
Contrast this with
void write(const char_type* s, std::streamsize n);
"Writes the sequence of n characters starting at s to the output sequence, throwing an exception in case of error."
What I've tried to do with the library is to factor out the essential functionality necessary to define a stream buffer. I've found that in most cases writing a stream buffer can be reduced to implementing one or two functions with simple names and specifications. It seems like an obvious win to me. <snip lots of code>
The price is a code size many times larger than the conventional system,
Are you talking about the size of the library or the size of the generated code?
The size of the library.
1. The library is big partly because it contains a lot of special purpose components, such as compression filters. You don't pay for them if you don't use them. 2. The support for the generic read and write operations is quite lightweight. 3. If you use the library just to define new stream buffer types, then in addition to (2) the main code comes from <boost/io/detail/streambufs/indirect_streambuf.hpp>, which is the generic streambuf implementation, and from <boost/io/detail/adapters/resource_adapter.hpp> and <boost/io/detail/adapters/filter_adapter.hpp>, which are lightweight wrappers that allow indirect_streambuf to interact with filters and resources using a single interface. 4. If you want to chain filters, then in addition to (2) and (3), the main code comes from <boost/io/detail/chain.hpp>, which at 16k is a small price to pay for a flexible filtering framework.
and a large chunk of it is a "poor man's" reflection system.
Do you mean the i/o categories? This follows the example of the standard library and the boost iterator library. It's better than reflection, since you can't get accidental conformance.
No, I'm talking about the code you used to get the existing standard I/O framework to inter-operate with your framework.
Specifically?
... The sample stream-buffer in More-I/O generally had added-value member functions attached, that perform inspection or (limited) reconfiguration. Those member functions also have to be manually carried over to the final derived stream class. ... The Iostreams framework seems to totally ignore the issue! ...
With a streambuf_facade or stream_facade you can access the underlying resource directly using operators * and ->. E.g.,
stream_facade<tcp_resource> tcp("www.microsoft.com", 80);
...
if (tcp->input_closed()) { ... }
Maybe I should stress this more in the documentation. (I imagine some people won't like the use of operators * and -> here, but these could be replaced by a member function such as resource().)
I didn't like the iterator "look" those operations have.
Noted.
Also, is a stream-façade an actual stream?
Yes.
1. Are there really any important sources/sinks that can't be put through the existing Standard I/O framework?
The standard library handles non-blocking, asynchronous and multiplexed i/o awkwardly at best. In contrast, for a generic i/o framework, adding such support should be fairly straightforward. We just need to introduce the right concepts.
2. An existing source/sink, if it wants to work with Standard C++, would work with the standard framework already.
To summarize: an existing source/sink, if it wants to work with the standard framework, already works with the standard framework?
You have a potential problem: standard C++ I/O is "too hard" But you got the wrong solution: throw away the standard I/O's legacy and start over from scratch (but include transition code)
I hope it's possible to improve some of the standard library I/O framework in the future. Perhaps experience with the current library will help form the basis for a proposal. But that's not the point of the current library. The point is to make easy what is currently not-so-easy, and to reduce the difficulty of what is currently very difficult.
This is independent of the decisions on memory-mapped files, file descriptors, binary I/O, and filters. Couldn't all of those been implemented around the standard framework?
Of course -- with massive code duplication. Jonathan

On 9/5/04 8:58 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
"Daryle Walker" <darylew@hotmail.com> wrote:
1. Aren't memory-mapped files and file descriptors highly platform specific?
Yes, just like threads, sockets and directory iteration.
Code that works with them would have to be non-portable, so I don't think they're appropriate for Boost.
It achieves portability the same way boost.thread and boost.filesystem do: by having separate implementations for different systems. See http://www.boost.org/more/imp_vars.htm ("Implementation variations").
But for the thread and file-system libraries, we can define default behavior.
We can do this for memory mapped files as well. Either including the appropriate header could cause a static assertion, or construction of mapped file resources could fail at runtime. Right now I've followed the example of Boost.Filesystem and assumed that every system is either Windows or Posix. This can easily be changed to produce more informative errors. Good point.
An object that can never be configured to work (for those deficient platforms) isn't very useful. I know that thread (and rarely file-system) classes have the same potential drawback, but I feel that threads and file systems are more general "computer science concepts" than memory mapped files, and so allowances could be made for the latter class ideas.
Thread-less environments act as if no spare threads can be allocated.
That's not the approach of Boost.Thread, IIRC. If thread support is unavailable, you get a preprocessor error (at least on windows.)
Maybe that should be considered a bug.
All file-systems can simulate a tree/container calculus, so a portable interface can be defined.
Again, Boost.Filesystem doesn't do this.
Considering the discussions of issues that Boost.File-system brings up, maybe it should do what I suggested. (Will give more information if a Boost.File-system person asks.)
But memory-mapped files and file descriptors are totally meaningless on some environments; what would the code map to in those cases?
See above.
2. This library does what a lot of other text-I/O libraries do: try to fit in "kewl" compression schemes. The problem is that the types of compression here are binary oriented; they convert between sets of byte streams. However, characters are not bytes (although characters, like other types, are stored as bytes).
Are you saying there are problems with the implementation of the compression filters, e.g., that they make unwarranted assumptions about 'char'? If so, please let me know. I'm sure it can be fixed.
I'm complaining that binary I/O should _not_ be treated as a variant of text I/O (which your library assumes).
All I/O is treated as streams of characters. When these streams of characters require special 'textual' interpretation, you can use a newline_filter, for line-ending conversion, or a converter, for code conversion.
Binary I/O only concerns itself with bytes, which is too low-level for text I/O. There can and should be bridging code, but the concepts of text sources/sinks should be distinct from binary sources/sinks.
This just doubles the number of concepts, for little gain.
Not separating concepts that have notable distinctions is a disservice. (That's why I separated regular pointer-based streams from the ones for pointers-to-const in my library. The "savings" of making only one set of class code wasn't worth mixing the semantics of the two stream types.)
I don't see the iostream framework as relating to text streams only: streams can handle text and binary. In some cases, you want text and binary to work together.
This is why I'm concerned about the text vs. binary issues:

In (old) C, the "char" type was used to represent character data. It was also used to represent individual bytes. The problem is that C meshed the two concepts together, which I disagree with. Due to this equivalence, some of the text I/O functions were given a "binary mode" that suppresses any text/binary translation. (To muddy the waters further, that translation was a no-op on C's first environment, UNIX.) Later on, C got more power in the character-processing department with "wchar_t" and a locale system, but it never ungrouped binary I/O as a "subset" of text I/O.

C++ encapsulated I/O in a class, but followed a path similar to C. It was "char" only, then developed "wchar_t" and locale support. Further, the character type was generalized with templates, which also added support for changing the operation policies with a traits class. C and C++ added more inherent features for I/O that were text-based. Binary I/O stayed as a switch away from text I/O because it was "good enough," even though binary I/O doesn't need extended character types, traits types, or locales. (Translating objects to/from byte sequences would take place in a higher layer.)

If you're going to start over from scratch with I/O, why not go all the way and finally split off binary I/O? Stop it from being treated as "text I/O with funny settings".
E.g., suppose you have a compressed text file ("essay.z") and you want to read a 'family-friendly' version of it. You can do so as follows:
filtering_istream in;
in.push(regex_filter(regex("damn"), "darn"));
in.push(zlib_decompressor());
in.push(file_source("essay.z"));
// read from in.
Isn't this perfectly natural and convenient? What's wrong with using the decompressor and the regex filter in the same chain?
By itself, nothing. But these compression schemes only work with bytes, so you have hidden at least one text <-> binary converter in your code.
(BTW, the file_source above should have been opened in binary mode.)
OK.
All that's assumed in this example is that the characters in the essay file can be mapped directly to chars. If they can't, one would have to add a layer of code conversion (using converter) after the decompression, and use a wide-character filtering stream and wide-character regex_filter.
That's a major implicit assumption.
If the above example were disallowed, then in the common case that output is stored in a form which can be directly mapped to the internal character set without code conversion, the user would be forced to insert a do-nothing adapter.
So you're trying to optimize code that takes advantage of the "char" vs. byte "equivalence".
The current library trusts users to know when they are dealing with data which must be converted to a wide character type before it can be processed by text-oriented filters.
Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point, please correct me.
Yes, that's my point. I looked through your code, and thought "this is just a rearrangement of what's already in streams and stream-buffers". I got really convinced of this once I saw that you added member functions for locale control.
I found I had to add this, rather late in development, to implement converting streams and stream buffers (which still aren't finished). What's wrong with locales? You say it like it's a dirty word.
I have no problems with locales. I was noting that the more features you added to the base classes, the more they looked like the rearrangements of the standard I/O base classes.
I've recently noticed that even your documentation for the Resource and Filter concepts admit that they're just like certain C++ or C I/O functions.
You mean when I say, for example,
"Filters are class types which define one or more member functions get, put, read, write and seek having interfaces resembling the functions fgetc, fputc, fread, fwrite and fseek from <stdio.h>"
?
Yes. But I was thinking more of the equivalent paragraph you gave in the documentation about Resources.
The functions boost::io::read, boost::io::write, etc., are indeed generic versions of these familiar functions. I mention the familiar functions as a way to introduce readers to the generic versions. The benefits of generic programming are well known, I hope.
There are two main reasons to write Sources and Sinks instead of stream buffers:
1. Sources and sinks express just the core functionality of a component. Usually you have to implement just one or two functions with very natural interfaces. You don't have to worry about buffering or about putting back characters. I would have thought it would be obvious that it's easier to write:
template<typename Ch>
struct null_buf {
    typedef Ch char_type;
    typedef sink_tag category;
    void write(const Ch*, std::streamsize) { }
};
than to write your null_buf, which is 79 lines long.
That's really misleading. The null-sink I have does a lot more. I keep track of how many characters passed through (i.e., a value-added function), and I optimize for single vs. multiple character output.
Okay,
template<typename Ch>
class null_buf {
public:
    typedef Ch char_type;
    typedef sink_tag category;
    null_buf() : count_(0) { }
    void write(const Ch*, std::streamsize n) { count_ += n; }
    int count() const { return count_; }
private:
    int count_;
};
This will lead to a stream buffer which keeps track of how many characters pass through, is optimized for single vs. multiple character output, *and* is buffered by default.
I don't see any buffering. (I guess it'll be in whatever class you hook this up to, like "streambuf_façade".)
Also, I'm verbose in my writing style. If I wanted to be compact I could just do:
//========================================================================
template< typename Ch, class Tr = std::char_traits<Ch> >
class basic_nullbuf
    : public std::basic_streambuf<Ch, Tr>
{
    typedef Tr traits_type;
    typedef typename Tr::int_type int_type;
protected:
    // Overridden virtual functions
    virtual int_type overflow( int_type c = traits_type::eof() )
        { return traits_type::not_eof( c ); }
};
But that doesn't do what my version, listed above, does.
Which version, the first or second? (Hopefully the first, since I wrote my code above after the first version, and you wrote the second as a response.) If it's the first, then what is my version missing? (If it's the second, then look at the version of the code under my review before comparing.)
And for those of you who think that "traits_type" is scary: get over it! Using the obvious substitutes of "==", "<", "(int)", etc. is just sloppy and WRONG. The whole point of the traits class is so that a character type isn't forced to define those operators. Worse, those operators could exist but be inappropriate. For example, Josuttis' STL book has a string type that implements case-insensitive comparisons with a custom traits type. Using operator== directly would have missed that. Ignoring the policies of the traits type's creator could betray his/her vision of usage.
In early versions of my library, filters and resources had traits types as well as character types. Prompted by remarks of Gennadiy Rozental, I made a careful study and found that traits could be eliminated from the public interface of the filter/resource module of the library without sacrificing generality or correctness, except in the case of the return type of get, which is still
std::char_traits<char_type>::int_type.
Even this could be eliminated by having get return optional<char>. For a more ambitious proposal along these lines, see http://tinyurl.com/6r8p2.
Of course, filter and resource authors may need to use char_traits to implement member functions read, write, etc. .... But I'm not sure I see where this discussion is going.
The traits type carries the policies for comparing and copying (and EOF issues). Does the user have the option for overriding policies so they're not based on "std::char_traits<Ch>"?
2. Sources and sinks can be reused in cases where standard streams and stream buffers are either unnecessary or are not the appropriate abstraction. For example, suppose you want to write the concatenation of three files to a string. You can do so like this:
string s;
boost::io::copy(
    concatenate(
        file_source("file1"),
        file_source("file2"),
        file_source("file3")
    ),
    back_insert_resource(s)
);
A straw-man? Wouldn't an iterator-based solution have been better? (There are stream(-buffer) iterators, and (string) insert iterators. If the Boost iterator library provides a chaining iterator type, then the standard copying procedure could be used.)
It's tempting to try to do everything using iterators. In fact, Robert Ramey's original suggestion to expand the library to handle filtering suggested that it be based on iterator adapters. (http://lists.boost.org/MailArchives/boost/msg48300.php)
Interesting.
The problem with this approach is that it misses the opportunity for many important optimizations that can be made when one is presented with a contiguous buffer full of characters, instead of one character at a time.
The whole framework seems like "I/O done 'right'", a "better" implementation of the ideas/concepts shown in the standard I/O framework.
I'd say thanks here if 'right' and 'better' weren't in quotes ;-)
It looked like you changed the interface just to change the interface, not out of any actual need. What about the following (untested) code: [SNIPped class templates derived from std::basic_streambuf<> that contain pure virtual member functions from Jon's idea of the simplified interface. The current stream-buffer member functions that handle the same issue just forward to the new member function.]
OK.
Instead, let me quote part of my response to Dietmar Kuehl:
Jonathan Wrote:
... The protected virtual interface of basic_streambuf is, IMO, quite strange. The functions have weird names: underflow, uflow, pbackfail, overflow, showmanyc, xsputn, xsgetn, seekoff, etc. -- the functions read, write, and seek are much more intuitive. The specifications of the standard functions are tricky, too. For example, overflow (one of the better-named functions) is specified roughly like this:
virtual int_type overflow(int_type c = traits_type::eof());
"If c is not eof, attempts to insert into the output sequence the result of converting c to a character. If this can't be done, returns eof or throws an exception. Otherwise returns any value other than eof."
(BTW, notice that the public members of "basic_streambuf" that may call "overflow" can't call it with EOF. I'm guessing that using EOF means that "overflow" should do the output-specific flushing. That code should not be directly written in the "sync" member function [as is usually done 99% of the time]; "sync" should instead call "overflow(EOF)" and also do any input-specific flushing.)
Contrast this with
void write(const char_type* s, std::streamsize n);
"Writes the sequence of n characters starting at s to the output sequence, throwing an exception in case of error."
What I've tried to do with the library is to factor out the essential functionality necessary to define a stream buffer. I've found that in most cases writing a stream buffer can be reduced to implementing one or two functions with simple names and specifications. It seems like an obvious win to me.
But is it always worth the extra layer of indirection you introduce (when you need to interface with standard-looking I/O)? [SNIP concerns about total code size (in terms of header text length)]
and a large chunk of it is a "poor man's" reflection system.
Do you mean the i/o categories? This follows the example of the standard library and the boost iterator library. It's better than reflection, since you can't get accidental conformance.
No, I'm talking about the code you used to get the existing standard I/O framework to inter-operate with your framework.
Specifically?
Just the large amount of "detail"-level headers. [SNIP about forwarding to the base-stream's value-added functions and on the nature of the stream facades.]
1. Are there really any important sources/sinks that can't be put through the existing Standard I/O framework?
The standard library handles non-blocking, asynchronous and multiplexed i/o awkwardly at best. In contrast, for a generic i/o framework, adding such support should be fairly straightforward. We just need to introduce the right concepts.
Whoa. I just had my "a-ha" moment. I thought you re-did the interface for streaming concepts just to be arbitrary. But you actually did it because you have issues about the architectural philosophy used by the standard I/O framework, right?! You want to fix the problems with current streaming by re-imagining the architecture (i.e. starting from scratch), and you decided to re-do the interface to match. I guess one issue is that you're extending functionality through templates, while the standard framework uses virtual member functions.
2. An existing source/sink, if it wants to work with Standard C++, would work with the standard framework already.
To summarize: an existing source/sink, if it wants to work with the standard framework, already works with the standard framework?
I meant that existing libraries would have already chosen to base their I/O around the standard framework, if they had no need to customize the I/O experience.
You have a potential problem: standard C++ I/O is "too hard" But you got the wrong solution: throw away the standard I/O's legacy and start over from scratch (but include transition code)
I hope it's possible to improve some of the standard library I/O framework in the future. Perhaps experience with the current library will help form the basis for a proposal. But that's not the point of the current library. The point is to make easy what is currently not-so-easy, and to reduce the difficulty of what is currently very difficult.
I gave an example (the code you snipped) of how the simplified core interface could be integrated with the standard framework. What are the other difficulties?
This is independent of the decisions on memory-mapped files, file descriptors, binary I/O, and filters. Couldn't all of those been implemented around the standard framework?
Of course -- with massive code duplication.
Duplication where? (My question above assumed that your new architecture never existed and you built your other stuff around the standard framework.)

****************
About the Overlap Between Our Contributions
****************

A bunch of people during my I/O review wanted to defer decisions to see your I/O review. I'm not sure that there's a need to pick one-or-the-other due to how they work.

I had no intention of redoing the concepts of I/O, so all my sources and sinks extend the standard framework. You built a whole new framework, hopefully to address problems with the standard framework. You built your sources and sinks to work with your framework. And you added adaptors so the new-I/O classes can work with std-I/O classes.

There are no problems with efficiency if new-I/O is used throughout the user's code, since you use a lot of template goodness. However, if the user needs to interface with std-I/O, at the user end or the final destination end, they will have to take a performance hit since std-I/O will call virtual functions which you can't remove. (The guy who writes the "xpressive" library seems to have techniques around the problem, but I'm not sure they can be applied here. [I don't know what the techniques are.] The std-I/O virtual call dispatch takes place in the standard stream classes, so the "xpressive" technique can't work if code changes are needed.)

In these mixed cases, using the new framework can be a win if the applied task spends more time in the new framework than in the adaptor code. If the task at hand has a std-I/O interface, doesn't touch the issues that new-I/O was meant to solve, and can be succinctly expressed with std-I/O, then there is no advantage to making and/or using a new-I/O version, since the layer of indirection given by the adaptor class is the bigger bottleneck. (The pointer-based streams are an example of this.)

The point is that one set of classes doesn't preclude the usage of the other. Each one has situations where it's the better solution.

--
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com

"Daryle Walker" <darylew@hotmail.com> wrote:
On 9/5/04 8:58 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
On 8/30/04 12:01 PM, "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
"Daryle Walker" <darylew@hotmail.com> wrote:
1. Aren't memory-mapped files and file descriptors highly platform specific?
But for the thread and file-system libraries, we can define default behavior.
We can do this for memory mapped files as well. Either including the appropriate header could cause a static assertion, or construction of mapped file resources could fail at runtime. Right now I've followed the example of Boost.Filesystem and assumed that every system is either Windows or Posix. This can easily be changed to produce more informative errors. Good point.
An object that can never be configured to work (for those deficient platforms) isn't very useful.
On those platforms, yes. On supported platforms, it can be very useful.
I know that thread (and rarely file-system) classes have the same potential drawback, but I feel that threads and file systems are more general "computer science concepts" than memory mapped files, and so allowances could be made for the latter class ideas.
Threads and filesystem support are good additions to boost (and would be to the standard) because they are useful, not because they are general "computer science concepts".
Thread-less environments act as if no spare threads can be allocated.
That's not the approach of Boost.Thread, IIRC. If thread support is unavailable, you get a preprocessor error (at least on windows.)
Maybe that should be considered a bug.
It's useful in contexts where thread support can be turned on or off with a command-line switch. It's probably a bad approach on systems which don't support threads at all.
Binary I/O only concerns itself with bytes, which is too low-level for text I/O. There can and should be bridging code, but the concepts of text sources/sinks should be distinct from binary sources/sinks.
This just doubles the number of concepts, for little gain.
Not separating concepts that have notable distinctions is not a service. (That's why I separated the regular pointer-based streams from the ones for pointers-to-const in my library. The "savings" in making only one set of class code wasn't worth mixing the semantics of the two stream types.)
What's wrong with this analogy: Saying that a sequence of characters represents 'text' is like saying that a sequence of characters represents a 'picture' (i.e., that it conforms to some image file format specification, such as jpeg, png, etc.) In order to interpret the data properly, the user must know something about its internal structure, and must in general apply an additional layer of software for the content to be usable.

In the case of a sequence of characters representing Chinese text, the user must apply code conversion to produce a wide character representation. In the case of a sequence of characters representing a jpeg image, the user must apply a jpeg interpreter to produce an object representing the image size, pixel data, etc.

In the first case, it would be naive to expect that sending the raw character sequence to std::cout will print Chinese characters to the console. In the second case, it would be naive to expect that sending the raw character sequence to std::cout will display a jpeg image on the console.

So, do we need another family of resource concepts for 'pictures'?

<snip history of C and C++ text/binary distinction>
If you're going to start over from scratch with I/O, why not go all the way and finally split-off binary I/O? Stop it from being treated as "text I/O with funny settings".
I'm not starting from scratch. I'm trying to make it easier to use the existing framework. (In the future, the library may be extended beyond the existing framework.)
filtering_istream in;
in.push(regex_filter(regex("damn"), "darn"));
in.push(zlib_decompressor());
in.push(file_source("essay.z"));
// read from in.
All that's assumed in this example is that the characters in the essay file can be mapped directly to chars. If they can't, one would have to add a layer of code conversion (using converter) after the decompression, and use a wide-character filtering stream and wide-character regex_filter.
That's a major implicit assumption.
It's not fundamentally different from the assumption that a sequence of characters contains a gif image.

filtering_istream in;
in.push(gif_to_jpeg());
in.push(file_source("pony.gif"));
// read jpeg data from in.

Trust the programmer.
Can I rephrase this as follows: InputFilters and OutputFilters are a useful addition to the standard library, but Sources and Sinks just duplicate functionality already present? If this is not your point, please correct me.
Yes, that's my point. I looked through your code, and thought "this is just a rearrangement of what's already in streams and stream-buffers". I got really convinced of this once I saw that you added member functions for locale control.
I found I had to add this, rather late in development, to implement converting streams and stream buffers (which still aren't finished). What's wrong with locales? You say it like it's a dirty word.
I have no problems with locales. I was noting that the more features you added to the base classes, the more they looked like the rearrangements of the standard I/O base classes.
Localizability is an optional behavior. Most filters and resources won't implement it. Filters and resources *do not* have to derive from the convenience base classes source, sink, input_filter, etc. Since localizability was so easy to add as a no-op, I gave these base classes no-op implementations of imbue and i/o categories refining localizable_tag. Programmers will rarely use this feature, but it imposes no runtime overhead and very little compile-time overhead, so I don't see any problem.
I've recently noticed that even your documentation for the Resource and Filter concepts admit that they're just like certain C++ or C I/O functions.
You mean when I say, for example,
"Filters are class types which define one or more member functions get, put, read, write and seek having interfaces resembling the functions fgetc, fputc, fread, fwrite and fseek from <stdio.h>"
?
Yes. But I was thinking more of the equivalent paragraph you gave in the documentation about Resources.
I think I need to change this part of the documentation. Unlike fread, etc, the basic_streambuf member functions can't be assumed to be familiar to most programmers. I should probably use istream::read, istream::write, etc. The reason I didn't is that these functions don't have the right return types, which is not a good reason since neither does streambuf::sputn.
template<typename Ch>
class null_buf {
public:
    typedef Ch char_type;
    typedef sink_tag category;
    null_buf() : count_(0) { }
    void write(const Ch*, std::streamsize n) { count_ += n; }
    int count() const { return count_; }
private:
    int count_;
};
This will lead to a stream buffer which keeps track of how many characters pass through, is optimized for single vs. multiple character output, *and* is buffered by default.
I don't see any buffering. (I guess it'll be in whatever class you hook this up to, like "streambuf_facade".)
Right.
Which version, the first or second?
The second.
(Hopefully the first, since I wrote my code above after the first version, and you wrote the second as a response.) If it's the first, then what is my version missing? (If it's the second, then look at the version of the code under my review before comparing.)
I did. That's how I knew it was 79 lines long. It doesn't provide buffering, as far as I can tell.
The traits type carries the policies for comparing and copying (and EOF issues). Does the user have the option for overriding policies so they're not based on "std::char_traits<Ch>"?
As I said, the only place character traits are used in the public interface of filters and resources is in the return type of get. For this purpose, std::char_traits<Ch>::int_type should always be sufficient. At any rate, I'm considering changing it either to optional<char> or to a class type that can store a char, an eof indicator, or a 'no input available -- try back later' indicator. Then there would be absolutely no use of character traits. If you want to define a stream_facade with a custom char_traits type, you can do so using the second template parameter:

template< typename T,
          typename Tr = ...,
          typename Alloc = ...,
          typename Mode = ... >
class streambuf_facade;
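For what it's worth, the 'class type' alternative floated above might look something like the following sketch. The name get_result and its members are hypothetical, not part of the reviewed library:

```cpp
#include <cassert>

// A small result type for get(): holds a character, an EOF indicator,
// or a "no input available -- try back later" indicator, with no
// dependence on std::char_traits.
class get_result {
public:
    static get_result eof()      { return get_result(state_eof, 0); }
    static get_result again()    { return get_result(state_again, 0); }
    static get_result ok(char c) { return get_result(state_ok, c); }

    bool good()     const { return state_ == state_ok; }
    bool is_eof()   const { return state_ == state_eof; }
    bool is_again() const { return state_ == state_again; }
    char value()    const { assert(good()); return c_; }
private:
    enum state { state_ok, state_eof, state_again };
    get_result(state s, char c) : state_(s), c_(c) { }
    state state_;
    char  c_;
};
```

A filter's get() would return get_result::ok(c), get_result::eof(), or get_result::again(), making the non-blocking case explicit instead of overloading an int_type.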
What I've tried to do with the library is to factor out the essential functionality necessary to define a stream buffer. I've found that in most cases writing a stream buffer can be reduced to implementing one or two functions with simple names and specifications. It seems like an obvious win to me.
But is it always worth the extra layer of indirection you introduce (when you need to interface with standard-looking I/O)?
The indirection, mostly contained in <boost/io/operations.hpp>, is fairly lightweight. Users never need to look at it. I'm not sure why you're so concerned about it.
[SNIP concerns about total code size (in terms of header text length)]
and a large chunk of it is a "poor man's" reflection system.
Do you mean the i/o categories? This follows the example of the standard library and the boost iterator library. It's better than reflection, since
you can't get accidental conformance.
No, I'm talking about the code you used to get the existing standard I/O framework to inter-operate with your framework.
Specifically?
Just the large amount of "detail"-level headers.
Fairly typical for boost, I'm afraid.
[SNIP about forwarding to the base-stream's value-added functions and on the nature of the stream facades.]
1. Are there really any important sources/sinks that can't be put through the existing Standard I/O framework?
The standard library handles non-blocking, asynchronous and multiplexed i/o awkwardly at best. In contrast, for a generic i/o framework, adding such support should be fairly straightforward. We just need to introduce the right concepts.
Whoa.
I just had my "a-ha" moment.
I thought you re-did the interface for streaming concepts just to be arbitrary. But you actually did it because you have issues about the architectural philosophy used by the standard I/O framework, right?! You want to fix the problems with current streaming by re-imagining the architecture (i.e. starting from scratch), and you decided to re-do the interface to match.
As I said above, I don't think I'm redoing it from scratch -- I'm just generalizing a little. Later, I might generalize even more.
I guess one issue is that you're extending functionality through templates, while the standard framework uses virtual member functions.
I don't think virtual functions are an issue. Virtual function calls are only slightly more expensive than ordinary (non-inlined) function calls, and one can't expect all function calls to be inlined when you have a chain of non-trivial filters. One must rely on buffering to mitigate the function call overhead. Since the static types of the filtering streams and stream buffers do not depend on the static types of the filters and resources in the underlying chain, some type of runtime indirection, such as virtual functions, is required. I'm actually taking advantage of the streambuf virtual functions as a feature -- not a liability. If I didn't have basic_streambuf to serve as the 'glue' for filter chains, I'd have to write my own version, probably using virtual functions.
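The buffering point can be made concrete: once a put area is installed, sputc() is a plain inline write into the buffer, and the virtual overflow() fires only about once per buffer's worth of characters. A minimal sketch (illustrative names, not library code):

```cpp
#include <streambuf>

// Counts virtual overflow() calls to show buffering amortizing the
// virtual dispatch: with a 256-byte put area, sputc() is non-virtual
// except when the buffer fills.
class tally_buf : public std::streambuf {
public:
    tally_buf() : overflows_(0) { setp(buf_, buf_ + sizeof buf_); }
    int overflows() const { return overflows_; }
protected:
    int_type overflow(int_type c) {
        ++overflows_;                     // one virtual call per ~256 chars
        setp(buf_, buf_ + sizeof buf_);   // recycle the buffer (data discarded)
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(c);
            pbump(1);                     // keep the overflowing character
        }
        return traits_type::not_eof(c);
    }
private:
    char buf_[256];
    int overflows_;
};
```

Writing 1024 characters through sputc() triggers only a handful of virtual calls; an unbuffered stream buffer would make 1024 of them.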
2. An existing source/sink, if it wants to work with Standard C++, would work with the standard framework already.
To summarize: an existing source/sink, if it wants to work with the standard framework, already works with the standard framework?
I meant that existing libraries would have already chosen to base their I/O around the standard framework, if they had no need to customize the I/O experience.
If the library is accepted -- and becomes widely used -- I expect that developers will want to write sources and sinks instead of stream buffers. Existing stream buffers can be rewritten as sources or sinks fairly easily in many cases.
You have a potential problem: standard C++ I/O is "too hard" But you got the wrong solution: throw away the standard I/O's legacy and start over from scratch (but include transition code)
I hope it's possible to improve some of the standard library I/O framework in the future. Perhaps experience with the current library will help form the basis for a proposal. But that's not the point of the current library. The point is to make easy what is currently not-so-easy, and to reduce the difficulty of what is currently very difficult.
I gave an example (the code you snipped) of how the simplified core interface could be integrated with the standard framework. What are the other difficulties?
I don't understand what's wrong with the way I've done it.
This is independent of the decisions on memory-mapped files, file descriptors, binary I/O, and filters. Couldn't all of those been implemented around the standard framework?
Of course -- with massive code duplication.
Duplication where? (My question above assumed that your new architecture never existed and you built your other stuff around the standard framework.)
Right. A lot of typical stream buffer implementation is boilerplate, esp. if buffering is used.
About the Overlap Between Our Contributions
A bunch of people during my I/O review wanted to defer decisions to see your I/O review. I'm not sure that there's a need to pick one-or-the-other due to how they work.
The review managers will sort this out.
I had no intention of redoing the concepts of I/O, so all my sources and sinks extend the standard framework.
You built a whole new framework, hopefully to address problems with the standard framework.
Again, I just wanted to make the standard framework easier to use.
You build the your sources and sinks to work with your framework. And you added adaptors so the new-I/O classes can work with std-I/O classes.
It's really the other way around. And the adapters are so thin you could crush them just by leaning against them ;-)
There are no problems with efficiency if new-I/O is used throughout the user's code, since you use a lot of template goodness. However, if the user needs to interface with std-I/O, at the user end or the final destination end, they will have to take a performance hit since std-I/O will call virtual functions which you can't remove. (The guy who writes the "xpressive" library seems to have techniques around the problem, but I'm not sure they can be applied here. [I don't know what the techniques are.] The std-I/O virtual call dispatch takes place in the standard stream classes, so the "xpressive" technique can't work if code changes are needed.)

In these mixed cases, using the new framework can be a win if the applied task spends more time in the new framework than in the adaptor code. If the task at hand has a std-I/O interface, doesn't touch the issues that new-I/O was meant to solve, and can be succinctly expressed with std-I/O, then there is no advantage to making and/or using a new-I/O version, since the layer of indirection given by the adaptor class is the bigger bottleneck. (The pointer-based streams are an example of this.)
I think there's a basic misunderstanding here. The adapters generally have no virtual functions and function calls through the adapters are optimized away entirely. (I've confirmed this on several compilers. It should be true for any decent optimizing compiler.) There is currently an inefficiency when you add a standard stream or stream buffer to the end of a filtering stream, as I describe in the message "IOStreams Formal review -- Guide for Reviewers". This will be eliminated entirely if the library is accepted.
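The 'thin adapter' claim is easy to see in miniature: a class template that forwards to a statically known Sink introduces no virtual dispatch, so a decent optimizer can collapse the forwarding call entirely. A toy sketch under assumed names (string_sink and sink_adapter are illustrative, not the library's):

```cpp
#include <cstddef>
#include <string>

// A concrete Sink-like type: appends written characters to a string.
struct string_sink {
    std::string data;
    void write(const char* s, std::size_t n) { data.append(s, n); }
};

// A template adapter: the Sink's static type is known at instantiation,
// so write() is an ordinary (inlinable) call, not a virtual one.
template<typename Sink>
class sink_adapter {
public:
    explicit sink_adapter(Sink& s) : sink_(s) { }
    void write(const char* s, std::size_t n) { sink_.write(s, n); }
private:
    Sink& sink_;
};
```

With optimization on, sink_adapter<string_sink>::write compiles down to the underlying append; the runtime indirection only appears where a chain genuinely needs type erasure.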
The point is that one set of class doesn't preclude the usage of the other. Each one has situations where it's the better solution.
As far as I can tell, the two valid points you have made, w.r.t. our two contributions, are:

1. Using my library to define a null_buff, pointerbuf or value_buf causes more code to be included. This is a legitimate criticism, but I don't think you've made the case that the amount of code included is so enormous that there should be two versions of the same components in boost.

2. The object code will be slightly larger when using a streambuf_facade (actually, I'm not sure you made that point, but I think it's correct.) This can be mitigated somewhat if it turns out to be a problem, but I don't think you have shown yet that it is.

Best Regards,
Jonathan

Apart from some broken links and typos in the documentation/comments, I am very impressed by the amount of work and professional setup of the documentation and the overall design. This looks like it is a graduation project for a university or something (and perhaps it is).

The only thing that really bothers me is that it seems not possible to replace the std::streambuf being used by the library (ie streambuf_facade) with a custom streambuf implementation (also derived from std::streambuf of course). I'd have use for that because very likely I'd want to use my own, optimized streambuf (one that adds an interface to be able to read/write to it in a way that there is hardly ever a need to actually _copy_ data, see http://libcw.sourceforge.net/io/dbstreambuf.html).

Another thing that is bothering me is that the whole presence of anything 'stream-like' (ostream/istream) seems not in the right place here. This is not only because the std::ostream/std::istream classes are merely 'hooks' to hook into the operator<< and operator>> functions, which are primarily intended for text (human readable representations) while this library is about binary data, but more importantly because everything this library does is related to, and at the level of, streambuf's (which DO have a binary interface). This fact is most apparent by considering the fact that this code should work:

filtered_ostream fout;
fout.push(filter);
fout.push(cout);
std::ostream& out(fout);  // Only have/use the std::ostream base class.
out << "Hello World";     // This must use the filter.

A much more logical API would therefore be:

filtered_streambuf fbuf;
fbuf.push(filter);
std::streambuf& buf(fbuf);

And then using 'buf' as the streambuf for some ostream when operator<< inserters are desired.

To summarize:

- I think that the stream interface should be ripped out and replaced by one that is an equivalent streambuf. Providing a stream interface should be merely a 'convenience' interface and not the main API.
- This streambuf interface should use a 'Streambuf' template parameter for its base class that only defaults to std::streambuf (and may demand that it is derived from std::streambuf if that is really necessary) but allows the base class to be replaced with a custom implementation.

--
Carlo Wood <carlo@alinoe.com>

"Carlo Wood" <carlo@alinoe.com> wrote in message news:20040830123305.GA23389@alinoe.com...
Apart from some broken links and typos in the documentation/comments,
Would you please point them out?
I am very impressed by the amount of work and profesional setup of the documentation and the over all design.
Thanks.
This looks like it is a graduation project for a university or something (and perhaps it is).
I wish my 'graduation project' (i.e., dissertation) were going so well. :-)
The only thing that really bothers me is that it seems not possible to replace std::streambuf being used by the library (ie streambuf_facade) with a custom streambuf implementation (also derived from std::streambuf of course). I'd have use for that because very likely I'd want to use my own, optimized streambuf (one that adds an interface to be able to read/write to it in a way that there is hardly ever need to actually _copy_ data, see http://libcw.sourceforge.net/io/dbstreambuf.html)
This looks interesting. I'd like to look at it carefully before I comment on it (which I'll try to do soon). For now, let me just make these points:

1. There is already a mechanism to avoid copying data in certain cases: by implementing resources which model the concept Direct.

2. There are several optimizations which I have held in reserve which would also minimize copying:

   a) Allowing resources to advertise that they are streambuf-based, so that i/o is performed directly to the underlying streambuf, with no additional layer of buffering.

   b) Giving special treatment to symmetric filters (used to implement compression/decompression) to allow them to have direct access to the buffers of adjacent streambufs in a chain.

   c) Allowing for a category of 'transparent filters' which simply observe character sequences, forwarding them unchanged. This would allow many useful filters (such as the offsetbuf suggested by David Abrahams) to have essentially zero overhead.

3. Instead of the current one-size-fits-all buffering policy, hardcoded in <boost/io/detail/streambufs/indirect_streambuf.hpp>, it's possible to factor a buffering policy out of the current implementation of streambuf_facade, so that streambuf_facade would look like this:

   template< typename Resource,
             typename Tr = ...,
             typename Buffering = basic_buffering<Resource>,
             ... >
   class streambuf_facade;

This would allow essentially any buffering policy to be employed. The main application I have in mind is cases where the underlying resource should be accessed in multiples of a certain block size. In fact, I have already (mostly) implemented such an approach, but I have not incorporated it into the library for several reasons:

- The buffering policy has a rather bulky interface which I think I may be able to simplify.
- I'm not convinced yet that it's a performance win -- only tests will tell. If it makes only a small difference in a few cases, it may not be worth complicating the library.
To summarize, I'd like to make streambuf_facade flexible enough so that you don't have to substitute your own home-brewed version. This is *not* a criticism of your library: if you have good ideas about how to make streambufs more efficient, I'd like to incorporate them directly into streambuf_facade -- possibly as buffering policies -- with your permission.
Another thing that is bothering me is that the whole presence of anyting 'stream-like' (ostream/istream) seems not in the right place here. This is not only because the std::ostream/std::istream class are merely 'hooks' to hook into the operator<< and operator>> functions which are primarily intended for text (human readable representations) while this library is about binary data - but more importantly because everything this library does is related to and at the level of streambuf's (which DO have a binary interface) This fact is most apparent by considering the fact that this code should work:
filtered_ostream fout;
fout.push(filter);
fout.push(cout);
It works (with 'filtering_ostream'). What's wrong with it?
std::ostream& out(fout);  // Only have/use the std::ostream base class.
out << "Hello World";     // This must use the filter.
A much more logical API would therefore be:
filtered_streambuf fbuf;
fbuf.push(filter);
std::streambuf& buf(fbuf);
And then using 'buf' as the streambuf for some ostream when operator<< inserters are desired.
Could you rephrase this whole argument? I don't think I follow it. If you mean that the best way to perform formatted i/o with custom stream buffers is via plain istreams, ostreams and iostreams, then I'm familiar with this argument. My opinion is that it's a matter of taste; personally, I like having matching streams for my stream buffers. stream_facade and filtering_stream are just thin wrappers, so they don't contribute much to the size of the library. If you want to use plain standard library streams, you can easily do so:

filtering_istreambuf buf;
buf.push(regex_filter(regex("damn"), "darn"));
buf.push(zlib_decompressor());
buf.push(file_source("essay.z"));
istream in(&buf);
// read from in.
To summarize:
- I think that the stream interface should be ripped out and replaced by one that is an equivalent streambuf. Providing a stream interface should be merely a 'convenience' interface and not the main API.
The stream interface already *is* just a convenience, as explained above. Perhaps the misunderstanding stems from the fact that in the examples I tend to use streams, since they are more familiar to most users than stream buffers. 'Ripping out' the stream interface would simply mean omitting the two files stream_facade.hpp and filtering_stream.hpp, for a combined total of about 11k ;-)
- This streambuf interface should use a 'Streambuf' template parameter for its base class that only defaults to std::streambuf (and may demand that it is derived from std::streambuf if that is really necessary) but allows the base class to be replaced with a custom implementation.
I think a buffering policy is the way to go. Thanks for your review! Jonathan

On Mon, Aug 30, 2004 at 11:32:28AM -0600, Jonathan Turkanis wrote:
Apart from some broken links and typos in the documentation/comments,
Would you please point them out?
On http://home.comcast.net/~jturkanis/iostreams/libs/io/doc/classes/alphabetica... converting_stream and converting_streambuf do not have hyperlinks. stream_facade and streambuf_facade link to non-existing pages.

http://home.comcast.net/~jturkanis/iostreams/boost/io/filtering_stream.hpp contains a comment:

// Macro: BOOST_IO_DEFINE_FILTER_STERAM(name_, mode_)

while the macro signature is actually:

#define BOOST_IO_DEFINE_FILTER_STREAM(name_, chain_type_, default_char_)

[..snip..]
For now, let me just make these points:
1. There is already a mechanism to avoid copying data in certain cases: by implementing resources which model the concept Direct.
I understand from the documentation that this (Array Resources) is a pre-allocated buffer of fixed size. It has to be fixed of course, otherwise you need to move it when a larger size is needed. However, what if the buffer runs full?

My dbstreambuf implementation consists of a list of dynamically allocated memory blocks: a new block of memory is allocated when the buffer needs to grow. As a result it is possible to have a piece of data (which I call a 'message') that should be contiguous for easy processing, but is not (when it spans two or more internal blocks); but by allocating blocks of a size that is considerably larger than the average message size, and by automatically starting at the beginning of a block if the block becomes entirely empty, in practice very little copying (to make a message contiguous) is needed. I don't think this is possible with the Direct concept currently provided. The basic idea that you merely return a pointer to the message inside the stream buf and then process it seems covered, but there is more to it.

Please inform me if the following is also possible with the Direct concept: What libcw is aiming for is that data (from file/socket descriptors) is read into a buffer in memory, and then no more copying is needed at all. This means that if you 'read' a 'message' (where what a 'message' is is determined by a custom virtual function 'decode' in a derived class) then you only return a pointer, and advance an internal pointer so that the next 'message' will get subsequent data. However, that message is *still* in the buffer and may not be overwritten until it is truly done with. Therefore, messages are passed as objects with a reference counter that informs the underlying (now seemingly unrelated) streambuf when the data may be overwritten and/or freed. The application would process the 'message' and destruct the message object once it is totally done with it.
You will understand that this is also the reason that it is rather important that the buffer can 'grow': even if on average you process as many messages as are being received, there will normally always be unprocessed messages in the buffer, preventing it from starting to write again at the beginning of the buffer. Therefore every new message needs to be appended at the end... until you reach the end of the buffer. At that point a 'buffer full' is unacceptable, because the buffer is NOT really entirely full: you are merely only using the bytes at the end of it. [...]
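As a rough sketch of the scheme Carlo describes (fixed-size blocks allocated on demand, with reference-counted message handles that pin their backing block until the consumer is done), something like the following could work. It is greatly simplified; dbstreambuf itself is far more elaborate, the names are invented, and std::shared_ptr is used purely for illustration of the reference counting:

```cpp
#include <algorithm>
#include <cstring>
#include <deque>
#include <memory>
#include <string>

class block_buffer {
public:
    static const std::size_t block_size = 64;

    // A message is a view into buffer storage. The aliasing shared_ptr
    // keeps the backing block (or copy) alive until the consumer drops
    // it, so the buffer never overwrites data still in use.
    struct message {
        std::shared_ptr<const char> hold;
        const char*                 data;
        std::size_t                 size;
    };

    block_buffer() : rpos_(0), wpos_(block_size) { }

    void append(const char* s, std::size_t n) {
        while (n > 0) {
            if (wpos_ == block_size) {            // grow: allocate a block
                blocks_.push_back(std::make_shared<block>());
                wpos_ = 0;
            }
            std::size_t k = std::min(n, block_size - wpos_);
            std::memcpy(blocks_.back()->data + wpos_, s, k);
            wpos_ += k; s += k; n -= k;
        }
    }

    // Extract an n-byte message (caller must know n bytes are buffered).
    // Zero-copy when the message lies within one block; copied only when
    // it spans a block boundary, which is the rare case described above.
    message read(std::size_t n) {
        message m;
        if (n <= front_avail()) {                 // common, zero-copy case
            std::shared_ptr<block> b = blocks_.front();
            m.hold = std::shared_ptr<const char>(b, b->data + rpos_);
            m.data = m.hold.get();
            m.size = n;
            consume(n);
        } else {                                  // spans blocks: copy once
            std::shared_ptr<std::string> s = std::make_shared<std::string>();
            while (n > 0) {
                std::size_t k = std::min(n, front_avail());
                s->append(blocks_.front()->data + rpos_, k);
                consume(k);
                n -= k;
            }
            m.hold = std::shared_ptr<const char>(s, s->data());
            m.data = s->data();
            m.size = s->size();
        }
        return m;
    }

private:
    struct block { char data[block_size]; };

    std::size_t front_avail() const {
        if (blocks_.empty()) return 0;
        return (blocks_.size() == 1 ? wpos_ : block_size) - rpos_;
    }
    void consume(std::size_t k) {
        rpos_ += k;
        if (rpos_ == block_size) {                // front block exhausted
            blocks_.pop_front();
            rpos_ = 0;
            if (blocks_.empty()) wpos_ = block_size;
        }
    }

    std::deque<std::shared_ptr<block> > blocks_;
    std::size_t rpos_;  // read offset in front block
    std::size_t wpos_;  // write offset in back block
};
```

A real implementation would also recycle fully drained blocks instead of freeing them, and would let 'decode' determine message boundaries; the sketch only shows how reference counting lets consumers hold messages while the buffer keeps growing at the end.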
streambuf_facade would look like this:
template< typename Resource,
          typename Tr = ...,
          typename Buffering = basic_buffering<Resource>,
          ... >
class streambuf_facade;
This would allow essentially any buffering policy to be employed.
Including the one I described above? Having a linked list of allocated memory blocks and reference counting 'message' objects that reserve parts of it and communicate with the buffer about those parts really being free for reuse?
The main application I have in mind is cases where the underlying resource should be accessed in multiples of a certain block size.
That block size (my message size, I think) does not have to be fixed. There are many protocols out there that have variable-sized messages! ;)
In fact, I have already (mostly) implemented such an approach, but I have not incorporated it into the library for several reasons:
- The buffering policy has a rather bulky interface, which I think I may be able to simplify
- I'm not convinced yet that it's a performance win -- only tests will tell. If it makes only a small difference in a few cases, it may not be worth complicating the library.
Well, this only makes sense for large servers with thousands of connections that all burst data in huge quantities... exactly the kind of applications I like to write ;). There are two major CPU hogs in that case: 1) finding which file descriptor is ready, and 2) moving data in memory. The first can be solved by not using ancient interfaces like select() or poll() but the more modern ones like kqueue.
To summarize, I'd like to make streambuf_facade flexible enough that you don't have to substitute your own home-brewed version. This is *not* a criticism of your library: if you have good ideas about how to make streambufs more efficient, I'd like to incorporate them directly into streambuf_facade -- possibly as buffering policies -- with your permission.
My ideas are free :p. You won't be able to use libcw('s code) anyway, because it wasn't written with a friendly interface in mind - I designed it with two goals: 1) speed, and 2) the ability to adapt to yet-unknown demands - in other words, 'flexibility' at the user level (or 'one size fits all', but that really sounds too bad :p). As a result, the interface is so complex that anyone who doesn't understand it (and that is everyone besides me) will call it bloated ;)
Another thing that is bothering me is that the whole presence of anything 'stream-like' (ostream/istream) seems out of place here. This is not only because the std::ostream/std::istream classes are merely 'hooks' into the operator<< and operator>> functions, which are primarily intended for text (human-readable representations) while this library is about binary data - but, more importantly, because everything this library does is related to, and at the level of, streambufs (which DO have a binary interface). This fact is most apparent from the fact that this code should work:
filtered_ostream fout;
fout.push(filter);
fout.push(cout);
std::ostream& out(fout); // Only have/use the std::ostream base class.
out << "Hello World";    // This must use the filter.

It works (with 'filtering_ostream'). What's wrong with it?
A much more logical API would therefore be:
filtered_streambuf fbuf;
fbuf.push(filter);
std::streambuf& buf(fbuf);

And then 'buf' can be used as the streambuf for whatever ostream or operator<< inserters are desired.
Could you rephrase this whole argument? I don't think I follow it.
I am afraid I cannot explain it ... it's experience :/. By providing an interface foobar_stream while you really only need to provide foobar_streambuf you do something that makes my alarm bells go off. The word "inflexible" comes to mind. This will lead to problems of the kind that a user wants to do something but can't. You are limiting yourself too much this way. Another thing, and I can explain that better, is that users only write serializers for std::ostream (and please don't ask them to do that again for filtering_ostream!). Therefore, if there has to be a filtering_ostream then it MUST be derived from std::ostream AND still work (the same) if all you have - at any moment after construction and initialization - is a pointer to the std::ostream base class. From that it follows that if you don't NEED initialization for it, then you don't need the whole ostream class. However, what convinced me more is the notion of what an ostream really is: a hook to the operator<< classes. If you only need a hook, and users will only write operator<<(std::ostream& ... functions, then why would you ever need anything other than an ostream? It is just illogical. When I look at the interface of this library and see filtering_ostream, that strikes me as "impossible": you just CANNOT need it. So, why is it there? You already answered that yourself later, by the way: to make it easier for the user. You provide filtering_ostream as a wrapper around std::ostream so that the initialization functions for the *streambuf* look nicer (and surely, yes, this cleans up code that uses ostreams). Therefore, my only objection against filtering_ostream was that it HIDES the real interface, filtering_streambuf... which it doesn't hide, as you told me now ;). So, you can consider this objection void. But I still think you should make it a bit more clear that filtering_ostream is just candy, convenience - and not hide the real thing (filtering_streambuf) behind it in all your examples and documentation. I completely missed it! [...]
To summarize:
- I think that the stream interface should be ripped out and replaced by one that is an equivalent streambuf. Providing a stream interface should be merely a 'convenience' interface and not the main API.
The stream interface already *is* just a convenience, as explained above. Perhaps the misunderstanding stems from the fact that in the examples I tend to use streams, since they are more familiar to most users than stream buffers. 'Ripping out' the stream interface would simply mean omitting the two files stream_facade.hpp and filtering_stream.hpp, for a combined total of about 11k ;-)
Ok
- This streambuf interface should use a 'Streambuf' template parameter for its base class that only defaults to std::streambuf (and may demand that it is derived from std::streambuf if that is really necessary) but allows the base class to be replaced with a custom implementation.
I think a buffering policy is the way to go.
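To make the buffering-policy idea concrete, here is a hypothetical sketch (all names are invented for illustration; this is not the library's actual interface): the facade delegates every storage decision to a Buffering policy, so a fixed array, a growable vector, or a linked list of blocks could be swapped in without touching the facade itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The facade knows nothing about storage; the Buffering policy decides
// when the buffer is full and how to make room.
template<typename Resource, typename Buffering>
class streambuf_facade_sketch {
    Resource  resource_;
    Buffering buffer_;
public:
    explicit streambuf_facade_sketch(Resource r) : resource_(r) {}
    void sputc(char c) {
        if (buffer_.full()) {
            resource_.write(buffer_.data(), buffer_.size());
            buffer_.clear();
        }
        buffer_.push(c);
    }
    void sync() {
        resource_.write(buffer_.data(), buffer_.size());
        buffer_.clear();
    }
};

// One possible policy: a growable vector flushed at a threshold. A list
// of fixed-size blocks, as discussed above, would plug in the same way.
class vector_buffering {
    std::vector<char> v_;
    std::size_t limit_;
public:
    explicit vector_buffering(std::size_t limit = 4) : limit_(limit) {}
    bool full() const { return v_.size() >= limit_; }
    void push(char c) { v_.push_back(c); }
    const char* data() const { return v_.data(); }
    std::size_t size() const { return v_.size(); }
    void clear() { v_.clear(); }
};

// A trivial Resource modeling a Sink: appends everything to a string.
struct string_sink {
    std::string* out;
    void write(const char* s, std::size_t n) { out->append(s, n); }
};
```

The point of the sketch is only the shape of the collaboration: the facade's character loop is fixed, while full()/push()/clear() belong entirely to the policy.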
If you can add the functionality of libcw's dbstream into this library - then it will become my favourite boost lib ;) Err... probably not. There is another thing I'd be missing. But I can't ask you to add that too; it's too... ugly (I wanted to say complex). When I use iostream classes I need TWO streambufs (one buffer for the input and another for the output). This is not supported by std::iostream because it only has a single (virtual) std::ios base class and thus only a single streambuf pointer. You can read on the libcw url that I gave in the previous post how I solved that, but believe me, it makes the interface very hard to understand unless you drag in the whole API that I designed around it - and I don't think that will merge nicely with your IOStreams anymore (?). -- Carlo Wood <carlo@alinoe.com>

On Mon, Aug 30, 2004 at 09:06:32PM +0200, Carlo Wood wrote:
However, what convinced me more is the notition of what an ostream
really is: a hook to the operator<< classes.
                                    ^^^^^^^
                                    functions.
I apologize for the heaps and heaps of typos, missing words and grammar errors in my text - but I leave it to the intelligence of the reader to interpret it correctly nevertheless ;). -- Carlo Wood <carlo@alinoe.com>

"Carlo Wood" <carlo@alinoe.com> wrote in message news:20040830190632.GB1116@alinoe.com...
On Mon, Aug 30, 2004 at 11:32:28AM -0600, Jonathan Turkanis wrote:
Apart from some broken links and typos in the documentation/comments,
Would you please point them out?
On http://home.comcast.net/~jturkanis/iostreams/libs/io/doc/classes/alphabetica...
converting_stream and converting_streambuf do not have hyperlinks.
They're not implemented. This is explained here -- http://tinyurl.com/4hkut -- but I probably shouldn't have included them in the index. Thanks.
stream_facade and streambuf_facade link to non-existing pages.
Which page contains the bad links?
http://home.comcast.net/~jturkanis/iostreams/boost/io/filtering_stream.hpp
contains a comment:
// Macro: BOOST_IO_DEFINE_FILTER_STERAM(name_, mode_)
while the macro signature is actually:
#define BOOST_IO_DEFINE_FILTER_STREAM(name_, chain_type_, default_char_)
Thanks -- I'm pleased to see someone is actually looking at the source ;-)
[..snip..]
For now, let me just make these points:
1. There is already a mechanism to avoid copying data in certain cases: by implementing resources which model the concept Direct.
<snip good discussion> Thank you for the detailed explanation. I don't think I understand all of it yet -- I'll have to look at your implementation. Perhaps you could elaborate on the concept of a 'message'. Are you thinking of something which is specific to network programming or is more general? BTW, this sounds vaguely like the descriptions I have read of the Apache 'bucket brigade' filtering framework. Any connections?

At any rate, you are right about the limitations of the Direct concept. I introduced it originally to handle memory-mapped files. Later I realized that it is really a degenerate case of a more general concept. In general, for output, when you run out of room in the array provided by the Direct resource you would be able to request a new array to write to. Similarly, for input you could request a new array when you finish reading to the end of the current array. For random access, you might request an array of a certain length containing a given offset -- you might not get exactly what you requested, but you'd always get an explanation of how the returned array relates to the requested array. (All this would be handled internally by streambuf_facade, of course.)

I didn't implement this for four reasons:
- I had limited time
- I wasn't sure there was a real need for it
- It would make the library harder to document and learn, and
- There are cases where resources have to be handled directly rather than indirectly through streambuf_facades (see, e.g., <boost/io/copy.hpp>); generalizing the Direct concept would lead to nightmares in those situations.
[...]
streambuf_facade would look like this:
template< typename Resource,
          typename Tr = ...,
          typename Buffering = basic_buffering<Resource>,
          ... >
class streambuf_facade;
This would allow essentially any buffering policy to be employed.
Including the one I described above? Having a linked list of allocated memory blocks and reference counting 'message' objects that reserve parts of it and communicate with the buffer about those parts really being free for reuse?
I hope so. If not, I can change it so it does ;-)
The main application I have in mind is cases where the underlying resource should be accessed in multiples of a certain block size.
That block size (my message size, I think) does not have to be fixed. There are many protocols out there that have variable-sized messages! ;)
In fact, I have already (mostly) implemented such an approach, but I have not incorporated it into the library for several reasons:
- The buffering policy has a rather bulky interface, which I think I may be able to simplify
- I'm not convinced yet that it's a performance win -- only tests will tell. If it makes only a small difference in a few cases, it may not be worth complicating the library.
Well, this only makes sense for large servers with thousands of connections that all burst data in huge quantities... exactly the kind of applications I like to write ;).

Fixed block size is just one possible buffering policy.
Just to clarify, what do you mean by 'this' here? Your lists of dynamically allocated memory blocks, or my idea of a buffering policy?
To summarize, I'd like to make streambuf_facade flexible enough that you don't have to substitute your own home-brewed version. This is *not* a criticism of your library: if you have good ideas about how to make streambufs more efficient, I'd like to incorporate them directly into streambuf_facade -- possibly as buffering policies -- with your permission.
My ideas are free :p. You won't be able to use libcw('s code) anyway, because it wasn't written with a friendly interface in mind - I designed it with two goals: 1) speed, and 2) the ability to adapt to yet-unknown demands - in other words, 'flexibility' at the user level (or 'one size fits all', but that really sounds too bad :p). As a result, the interface is so complex that anyone who doesn't understand it (and that is everyone besides me) will call it bloated ;)
I'll use it just for inspiration, then :-)
Another thing that is bothering me is that the whole presence of anything 'stream-like' (ostream/istream) seems out of place here. This is not only because
<snip argument I couldn't follow>
Could you rephrase this whole argument? I don't think I follow it.
I am afraid I cannot explain it ... it's experience :/. By providing an interface foobar_stream while you really only need to provide foobar_streambuf you do something that makes my alarm bells go off. The word "inflexible" comes to mind. This will lead to problems of the kind that a user wants to do something but can't. You are limiting yourself too much this way.
How can you be limited by additional functionality? All the streambufs are there right along side the streams. You can use them raw, with standard streams, or with the thin wrappers provided by the library.
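The 'raw' usage can be demonstrated with the standard library alone. The following sketch (upper_streambuf is an invented name, not a component of the reviewed library) installs a filter at the streambuf level and drives it through a plain std::ostream; any inserter written against std::ostream& reaches the filter, because formatted output always funnels through rdbuf().

```cpp
#include <cassert>
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>

// A filter implemented purely at the streambuf level: every character
// written is uppercased and forwarded to a downstream streambuf.
class upper_streambuf : public std::streambuf {
    std::streambuf* dest_;
protected:
    int_type overflow(int_type c) override {
        if (c == traits_type::eof())
            return traits_type::not_eof(c);
        char up = static_cast<char>(
            std::toupper(static_cast<unsigned char>(c)));
        return dest_->sputc(up); // forward the filtered character
    }
public:
    explicit upper_streambuf(std::streambuf* dest) : dest_(dest) {}
};
```

Usage: construct the filter over any destination streambuf, wrap it in a std::ostream, and hand out only the std::ostream& -- the filtering is invisible to callers.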
Another thing, and I can explain that better, is that users only write serializers for std::ostream (and please don't ask them to do that again for filtering_ostream!). Therefore, if there has to be a filtering_ostream then it MUST be derived from std::ostream
It is.
AND still work (the same) if all you have - at any moment after construction and initialization - is a pointer to the std::ostream base class.
It should.
From that it follows that if you don't NEED initialization for it, then you don't need the whole ostream class. However, what convinced me more is the notion of what an ostream really is: a hook to the operator<< classes. If you only need a hook, and users will only write operator<<(std::ostream& ... functions, then why would you ever need anything other than an ostream? It is just illogical. When I look at the interface of this library and see filtering_ostream, that strikes me as "impossible": you just CANNOT need it. So, why is it there? You already answered that yourself later, by the way: to make it easier for the user. You provide filtering_ostream as a wrapper around std::ostream so that the initialization functions for the *streambuf* look nicer (and surely, yes, this cleans up code that uses ostreams).
It's a convenience for people who like it. A lot of people wouldn't like a network library that provided a socketbuf but no socket streams.
Therefore, my only objection against filtering_ostream was that it HIDES the real interface, filtering_streambuf... which it doesn't hide, as you told me now ;).
So, you can consider this objection void. But I still think you should make it a bit more clear that filtering_ostream is just candy, convenience - and not hide the real thing (filtering_streambuf) behind it in all your examples and documentation. I completely missed it!
I think that's the real point. I need to stress in more places that streambuf_facade is the main component. (But see http://tinyurl.com/6k86s: "The fundamental component provided by the Iostreams Library is the class template streambuf_facade.")
I think a buffering policy is the way to go.
If you can add the functionality of libcw's dbstream into this library - then it will become my favourite boost lib ;)
Great!
Err... probably not.
Darn, should have read ahead further :)
There is another thing I'd be missing. But I can't ask you to add that too; it's too... ugly (I wanted to say complex). When I use iostream classes I need TWO streambufs (one buffer for the input and another for the output).
Why can't you use a streambuf_facade with mode inout, which buffers input and output separately? (http://tinyurl.com/6bjbl) Thanks for your detailed comments! Jonathan
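As an aside, the point about separate input and output state can be illustrated with the standard library alone: a single streambuf (and hence a single std::iostream) may serve input from one store while collecting output into another (inout_buf is an invented name for this sketch, not a library component).

```cpp
#include <cassert>
#include <iostream>
#include <streambuf>
#include <string>

// One streambuf, two independent data stores: input is served from one
// string, output is collected into another.
class inout_buf : public std::streambuf {
    std::string in_, out_;
protected:
    int_type underflow() override {
        return traits_type::eof(); // get area was preset; nothing refills it
    }
    int_type overflow(int_type c) override {
        if (c != traits_type::eof())
            out_ += traits_type::to_char_type(c);
        return traits_type::not_eof(c);
    }
public:
    explicit inout_buf(std::string input) : in_(std::move(input)) {
        char* b = &in_[0];
        setg(b, b, b + in_.size()); // get area spans the input string
    }
    const std::string& written() const { return out_; }
};
```

The get area and put path never touch each other, which is the essence of buffering input and output separately behind one std::iostream.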

On Mon, Aug 30, 2004 at 05:00:46PM -0600, Jonathan Turkanis wrote:
On http://home.comcast.net/~jturkanis/iostreams/libs/io/doc/classes/alphabetica...
converting_stream and converting_streambuf do not have hyperlinks.
They're not implemented. This is explained here -- http://tinyurl.com/4hkut -- but I probably shouldn't have included them in the index. Thanks.
stream_facade and streambuf_facade link to non-existing pages.
Which page contains the bad links?
Still http://home.comcast.net/~jturkanis/iostreams/libs/io/doc/classes/alphabetica...
Thank you for the detailed explanation. I don't think I understand all of it yet -- I'll have to look at your implementation. Perhaps you could elaborate on the concept of a 'message'. Are you thinking of something which is specific to network programming or is more general?
I am foremost an abstract thinker. Abstract comprehension and analysis are my strongest points (according to official tests). So... more general ;). My (abstract) way of looking at this was that a 'stream' is just that: a line-up of bytes that are received in a stream. You don't have random access to the whole thing at once - only to the head of the stream that you currently have in a buffer. This alone already indicates that chunks of data that are to be processed (where 'processing' means that they are allowed to be removed from that buffer after 'processing') need to be more or less contiguous and more or less close to each other (fit in the buffer). I decided that it was general enough to demand that such 'decodable chunks' HAD to be contiguous (on the stream) and decodable independently of what comes after them on the stream (where 'decodable' means that the application must be able to process the chunk (and allow its removal from the stream buffer)). This means that such a stream is to be cut into non-overlapping, contiguous chunks, and that a single chunk should then be decodable somehow - depending at most on the internal state of the state machine that decoded the previous chunks. The size of these chunks of data is totally dependent on the protocol of the stream; arbitrary. My own design is such that you can 'plug in' a protocol class (by using it as a template parameter) into a 'device class' and voila, it decodes the stream that comes in on that device. As you will have guessed, the above 'decodable chunks', which are made contiguous on request (by the decoder, iff needed), are the 'messages' that I introduced earlier. In libcw's code I actually call them 'msg_blocks'.
BTW, this sounds vaguely like the descriptions I have read of the Apache 'bucket brigade' filtering framework. Any connections?
I never heard of it before.
At any rate, you are right about the limitations of the Direct concept. I introduced it originally to handle memory mapped files. Later I realized that it is really a degenerate case of a more general concept. In general, for output, when you run out of room in the array provided by the Direct resource you would be able to request a new array to write to. Similarly for input you could request a new array when you finish reading to the end of the current array.
I don't understand. Don't you only need to enlarge a buffer when you write to it? For 'input' that means that a new array is needed when more data is received from the device than fits in the current array.
For random access, you might request an array of a certain length containing a given offset -- you might not get exactly what you requested, but you'd always get an explanation of how the returned array relates to the requested array. (All this would be handled internally by streambuf_facade, of course.)
I don't think that my approach (== using a buffer that actually consists of a list of allocated memory blocks) is compatible with random access. The problem of a message overlapping the edge of two blocks, and thus not being contiguous, makes it impossible to treat the "stream" any differently than just that: a stream of messages. Well, I suppose you could allow random access to messages. But you can do that with my implementation too, by simply storing all the read msg_blocks in a vector (and never destructing them). The streambuf would grow indefinitely then, but you'd have instant, random access to each message received so far. :) What I mean is that it is not compatible with mmap-ed access.
I didn't implement this for four reasons:
- I had limited time
duh
- I wasn't sure there was a real need for it
there is always a need for everything(tm)
- It would make the library harder to document and learn, and
duh
- There are cases where resources have to be handled directly rather than indirectly through streambuf_facades (see, e.g., <boost/io/copy.hpp>); generalizing the Direct concept would lead to nightmares in those situations.
Yeah, 'nightmare' describes the libcw interface :). But at least it is powerful *grin*. [...snip rip...]
Well, this only makes sense for large servers with thousands of connections that all burst data in huge quantities... exactly the kind of applications I like to write ;).
Just to clarify, what do you mean by 'this' here? Your lists of dynamically allocated memory blocks, or my idea of a buffering policy?
The first. [...deleted finished discussion...]
Why can't you use a streambuf_facade with mode inout, which buffers input and output separately? (http://tinyurl.com/6bjbl)
Oh, great! It is clear I didn't really study all of the documentation. It gets better and better ;). Well, then we have two points left now that I wonder about. 1) Is it possible to use this library and somehow implement a growable streambuf that never moves data (== consists of a list of fixed-size allocated blocks)? 2) A new point :p. Does your library also have what I call in libcw a 'link_buffer' (actually comparable to your 'mode')? This isn't immediately clear from the docs and, well, I am lazy :/. The 'link' mode makes a streambuf the input and the output for two different devices at the same time, allowing you to link two devices together without the need to copy the data (as would be the case when each device always has its _own_ buffer). For example, I can create a file device and a pipe_end device (one end of a UNIX pipe) and tie them together (construct one from the other) and they will share the same buffer - writing the file to the pipe or vice versa (depending on the input/output 'mode' template parameters that are part of the device classes). In your url I see 'seekable', but that doesn't seem to be the same as 'pass-through'. Of course you can just create a std::istream and a std::ostream and give them the same buffer... but well, is that an input or output mode buffer then? I seem to fail to see where you define a mode for that case. Perhaps the dual-seekable? *confused* -- Carlo Wood <carlo@alinoe.com>

I've tried to replace a couple of existing derived stream/streambuf classes that are used with boost::serialization i/o archives to support Drag/Drop/Copy/Paste in an MFC application. The Drop/Paste side works great and significantly simplifies and reduces the amount of code req'd:

class CSharedFileIStream
    : public boost::io::stream_facade< boost::io::array_source >
{
    typedef boost::io::array_source           tSource;
    typedef boost::io::stream_facade<tSource> tBase;

    HGLOBAL mMemHdl;
public:
    ~CSharedFileIStream() { ::GlobalUnlock(mMemHdl); }
    CSharedFileIStream( HGLOBAL aMemHdl )
        : tBase( (LPTSTR)::GlobalLock(aMemHdl), ::GlobalSize(aMemHdl) )
        , mMemHdl( aMemHdl )
    {}
};

Used like so:

CSharedFileIStream lIn( lMemHdl );
boost::archive::binary_iarchive ia( lIn );
ia >> aVal;

Unfortunately, the Drag/Copy side produces an access violation in the std::locale copy constructor, when boost::io::detail::indirect_streambuf::imbue calls std::basic_streambuf::pbuimbue. I've tried the following with both binary/text archive types:

struct CSharedFileSink : public boost::io::sink
{
    CSharedFile& mSharedFile;

    CSharedFileSink( CSharedFile& aSharedFile )
        : boost::io::sink()
        , mSharedFile(aSharedFile)
    {}

    void write( const char* s, std::streamsize n )
    { mSharedFile.Write( s, n ); }
};

class CSharedFileOStream : public boost::io::stream_facade<CSharedFileSink>
{
    typedef boost::io::stream_facade<CSharedFileSink> tBase;
public:
    CSharedFileOStream( CSharedFile& aSharedFile )
        : tBase( CSharedFileSink( aSharedFile ) )
    {}
};

Used like so:

CSharedFile lOutFile;
CSharedFileOStream lOut( lOutFile );
boost::archive::text_oarchive oa( lOut );

Am I doing anything obviously wrong? I love the source/sink approach and would love to take advantage of its conceptual simplification in dealing with streams. Thanks, Jeff
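For readers without the Windows headers at hand, the role stream_facade<array_source> plays above can be approximated with the standard library alone (memory_istreambuf is an invented name for this sketch): an istream reading straight out of a caller-supplied memory region, which is what the locked HGLOBAL provides.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// An istream source over raw memory: the get area is simply pointed at
// the caller's region, so reading involves no copy at all.
class memory_istreambuf : public std::streambuf {
public:
    memory_istreambuf(char* data, std::size_t n) {
        setg(data, data, data + n); // get area spans the region
    }
};
```

An archive (or any extractor) can then read from a std::istream constructed over this buffer, exactly as CSharedFileIStream does through the library.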

"Jeff Flinn" <TriumphSprint2000@hotmail.com> wrote in message news:ch22j5$qr8$1@sea.gmane.org...
I've tried to replace a couple of existing derived stream/streambuf classes that are used with boost::serialization i/o archives to support Drag/Drop/Copy/Paste in an MFC application.
The Drop/Paste side works great and significantly simplifies and reduces the amount of code req'd:
I'm glad to hear it.
Unfortunately, the Drag/Copy side produces an access violation in the std::locale copy constructor, when boost::io::detail::indirect_streambuf::imbue calls std::basic_streambuf::pbuimbue. I've tried the following with both binary/text archive types:
At some point I accidentally erased the initialization of the pointer next_ in the indirect_streambuf member initialization list. It should be patched as follows:

----
diff -u -r1.5 indirect_streambuf.hpp
--- indirect_streambuf.hpp 31 Aug 2004 18:51:32 -0000 1.5
+++ indirect_streambuf.hpp 31 Aug 2004 18:51:41 -0000
@@ -142,7 +142,7 @@
 template<typename T, typename Tr, typename Alloc, typename Mode>
 indirect_streambuf<T, Tr, Alloc, Mode>::indirect_streambuf()
-    : pback_size_(0), flags_(0) { }
+    : next_(0), pback_size_(0), flags_(0) { }
----

Let me know if this works. I think this is a small enough change that I can fix it without confusing reviewers. I'll try to do it later today.
Am I doing anything obviously wrong?
No, I was ;-)
I love the source/sink approach and would love to take advantage of its conceptual simplification in dealing with streams.
Thanks! Jonathan

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:ch2hcb$fdo$1@sea.gmane.org...
"Jeff Flinn" <TriumphSprint2000@hotmail.com> wrote in message news:ch22j5$qr8$1@sea.gmane.org...
I've tried to replace a couple of existing derived stream/streambuf classes that are used with boost::serialization i/o archives to support Drag/Drop/Copy/Paste in an MFC application. The Drop/Paste side works great and significantly simplifies and reduces the amount of code req'd:
I'm glad to hear it.
Unfortunately, the Drag/Copy side produces an access violation in the std::locale copy constructor, when boost::io::detail::indirect_streambuf::imbue calls std::basic_streambuf::pbuimbue. I've tried the following with both binary/text archive types:
At some point I accidentally erased the initialization of the pointer next_ in the indirect_streambuf member initialization list. It should be patched as follows:
----
diff -u -r1.5 indirect_streambuf.hpp
--- indirect_streambuf.hpp 31 Aug 2004 18:51:32 -0000 1.5
+++ indirect_streambuf.hpp 31 Aug 2004 18:51:41 -0000
@@ -142,7 +142,7 @@
 template<typename T, typename Tr, typename Alloc, typename Mode>
 indirect_streambuf<T, Tr, Alloc, Mode>::indirect_streambuf()
-    : pback_size_(0), flags_(0) { }
+    : next_(0), pback_size_(0), flags_(0) { }
----
Let me know if this works. I think this is a small enough change that I can fix it without confusing reviewers. I'll try to do it later today.
That did it! Thanks, Jeff

Jonathan Turkanis wrote:
"Jeff Flinn" <TriumphSprint2000@hotmail.com> wrote in message news:ch22j5$qr8$1@sea.gmane.org...
I've tried to replace a couple of existing derived stream/streambuf classes that are used with boost::serialization i/o archives to support Drag/Drop/Copy/Paste in an MFC application.
The Drop/Paste side works great and significantly simplifies and reduces the amount of code req'd:
I'm glad to hear it.
Unfortunately, the Drag/Copy side produces an access violation in the std::locale copy constructor, when boost::io::detail::indirect_streambuf::imbue calls std::basic_streambuf::pbuimbue. I've tried the following with both binary/text archive types:
At some point I accidentally erased the initialization of the pointer next_ in the indirect_streambuf member initialization list. It should be patched as follows:
----
diff -u -r1.5 indirect_streambuf.hpp
--- indirect_streambuf.hpp 31 Aug 2004 18:51:32 -0000 1.5
+++ indirect_streambuf.hpp 31 Aug 2004 18:51:41 -0000
@@ -142,7 +142,7 @@
 template<typename T, typename Tr, typename Alloc, typename Mode>
 indirect_streambuf<T, Tr, Alloc, Mode>::indirect_streambuf()
-    : pback_size_(0), flags_(0) { }
+    : next_(0), pback_size_(0), flags_(0) { }
I had been getting a segfault during destruction, coming from imbue; this probably explains it.

"Neal D. Becker" <ndbecker2@verizon.net> wrote in message news:chl384$cvr$1@sea.gmane.org...
Jonathan Turkanis wrote:
"Jeff Flinn" <TriumphSprint2000@hotmail.com> wrote in message news:ch22j5$qr8$1@sea.gmane.org...
I've tried to replace a couple of existing derived stream/streambuf classes that are used with boost::serialization i/o archives to support Drag/Drop/Copy/Paste in an MFC application.
The Drop/Paste side works great and significantly simplifies and reduces the amount of code req'd:
I'm glad to hear it.
Unfortunately, the Drag/Copy side produces an access violation in the std::locale copy constructor, when boost::io::detail::indirect_streambuf::imbue calls std::basic_streambuf::pbuimbue. I've tried the following with both binary/text archive types:
At some point I accidentally erased the initialization of the pointer next_ in the indirect_streambuf member initialization list. It should be patched as follows:
----
diff -u -r1.5 indirect_streambuf.hpp
--- indirect_streambuf.hpp 31 Aug 2004 18:51:32 -0000 1.5
+++ indirect_streambuf.hpp 31 Aug 2004 18:51:41 -0000
@@ -142,7 +142,7 @@
 template<typename T, typename Tr, typename Alloc, typename Mode>
 indirect_streambuf<T, Tr, Alloc, Mode>::indirect_streambuf()
-    : pback_size_(0), flags_(0) { }
+    : next_(0), pback_size_(0), flags_(0) { }
I had been getting a segfault during destruction, coming from imbue; this probably explains it.
Yep, sorry. Jonathan

Hi Jonathan, here are some general comments.

1. Your documentation is clear, plentiful and very well organized. It must have been a huge amount of work!

2. How is bound-checking applied in this example:

struct vector_sink : public sink {
    vector_sink(vector<char>& vec) : vec(vec) { }
    void write(const char* s, std::streamsize n)
    { vec.insert(vec.end(), s, s + n); }
};

I mean, couldn't n be too big?

3. The example

filtering_streambuf<input> in;
in.push(alphabetic_input_filter());
in.push(cin);

mentions something about the source coming last. Maybe it would be more explicit to say

in.push_filter(alphabetic_input_filter());
in.push_source(cin);

assuming there is a stack of both types. If there is only one source, then maybe

in.attach_source( cin );

4. In the example

int c;
char* first = s;
char* last  = s + n;
while ( first != last &&
        (c = boost::io::get(src)) != EOF &&
        isalpha(c) )
{
    *first++ = c;
}

how can get(src) == EOF, but first != last? Is this "double" checking necessary? Could it be avoided with another design?

5. In the example

void write(Sink& snk, const char* s, streamsize n)
{
    while (n-- > 0)
        boost::io::put(snk, toupper(*s++));
}

maybe a comment should say that streamsize is signed.

6. tab_expanding_output_filter.hpp: maybe we should ask JK about using the new Boost license.

7. In presidential_filter_example.cpp:

(std::streamsize) sub.size() );

should this not be

boost::numeric_cast<std::streamsize>( sub.size() );

? (there is probably more)

8. In usenet_input_filter.hpp

dictionary_["AFAIK"] = "as far as I know";
dictionary_["HAND"]  = "have a nice day";

maybe boost.assign could make the example even cooler?

9. The old discussion of

typedef typename char_type<T>::type char_type;

vs

typedef typename io_char<T>::type char_type;

pops up. If there ever is a vote, I would vote for the last version. Also, io_category<T>::type seems better, to avoid confusion with other category traits.

10. In examples like

filtering_istream in(adapt(s.begin(), s.end()));
filtering_istream in(s.begin(), s.end());

it seems that you could remove the iterator-pair version and provide a ForwardReadableRange version.

11. What is the difference between the two examples in "III. Appending to an STL Sequence", besides one being much more clumsy?

Here are some review comments:
======================

What is your evaluation of the design?
----------------------------
It seems very well-thought-out and crisp.

What is your evaluation of the implementation?
-----------------------------
Haven't looked.

What is your evaluation of the documentation?
-----------------------------
Puts most other docs to shame.

What is your evaluation of the potential usefulness of the library?
------------------------------
High. I never fully had the time to get into the iostreams framework; it simply seemed too complicated for a weekend study. I'm very pleased to see yet another library that seems easy to use, yet very powerful. This library shows why we should love C++!

Did you try to use the library? With what compiler? Did you have any problems?
-------------------------------
No.

How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
--------------------------------
Two hours walking through the docs.

Are you knowledgeable about the problem domain?
---------------------------------
No.

And finally, every review should answer this question: Do you think the library should be accepted as a Boost library?
===========================================
Yes.

Here are some directions I would like to see (but don't require):
1. Use of boost.range instead of iterator pairs.
2. Possibly weird questions like mine above, and others that creep up during the review, might form a basis for a FAQ.
3. I hope differences and commonalities with Daryle's MoreIo stuff can be solved; perhaps by the two authors in combination.

Great work! Thorsten

"Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message news:ch86rj$rqb$1@sea.gmane.org...
Hi Jonathan,
Thanks for the review!
here are some general comments.
1. Your documentation is clear, plentiful and very well organized. It must have been a huge amount of work!
It was exhausting!
2. How is bounds checking applied in this example:
struct vector_sink : public sink {
    vector_sink(vector<char>& vec) : vec(vec) { }
    void write(const char* s, std::streamsize n)
    {
        vec.insert(vec.end(), s, s + n);
    }
I mean, couldn't n be too big?
This is supposed to extend the vector. Are you worried that n + vec.size() will exceed std::vector::max_size()?
3. The example
filtering_streambuf<input> in;
in.push(alphabetic_input_filter());
in.push(cin);
mentions something about the source coming last. Maybe it would be more explicit to say
in.push_filter(alphabetic_input_filter());
in.push_source(cin);
assuming there is a stack of both types. If there is only one source, then maybe in.attach_source( cin );
There is just one stack. It consists of zero or more filters with an optional resource at the end. See the diagrams here: http://tinyurl.com/5v2ak. Resources represent the ultimate data source/sink, while filters modify data as it passes through. So it wouldn't make sense to have a resource in the middle or a filter at the end. One of the original motivations for introducing the i/o categories (they turned out to be useful for a lot of other things) was to avoid having separate functions push_filter and push_resource. I consider it a major simplification of the interface.
4. in the example
int c;
char* first = s;
char* last = s + n;
while ( first != last && (c = boost::io::get(src)) != EOF && isalpha(c) )
    *first++ = c;
how can get(src) == EOF, but first != last? Is this "double" checking necessary? Could it be avoided with another design?
This function, from the tutorial, reads characters from an arbitrary Source src into a buffer s of size n. It's possible that the end of the character sequence represented by src could be reached before the buffer is full.
5. In the example

void write(Sink& snk, const char* s, streamsize n)
{ while (n-- > 0) boost::io::put(snk, toupper(*s++)); }
maybe a comment should say that streamsize is signed.
Good point. In fact, I shouldn't assume people know what streamsize and streamoff are. I should have an introduction to the basic types from <ios> somewhere.
6. tab_expanding_output_filter.hpp maybe we should ask JK about using the new boost license
Good idea. (Maybe I can talk him into reviewing the library.)
7. In presidential_filter_example.cpp: (std::streamsize) sub.size() );
should this not be boost::numeric_cast<std::streamsize>( sub.size() ); ? (there is probably more)
I was lazy here ;-)
8. in usenet_input_filter.hpp
dictionary_["AFAIK"] = "as far as I know"; dictionary_["HAND"] = "have a nice day";
maybe Boost.Assign could make the example even cooler?
Of course it would be cooler! But I think it's good to introduce just one new thing at a time.
9. the old discussion of
typedef typename char_type<T>::type char_type;
I admit this is ugly. That's why I tend to use BOOST_IO_CHAR_TYPE(T) :-)
vs
typedef typename io_char<T>::type char_type;
pops up. If there ever is a vote, I would vote for the last version.
Okay.
Also, io_category<T>::type seems better to avoid confusion with other category traits.
I agree that the nested type should be called io_category, in case something is both an iterator and a resource. But isn't the namespace io good enough to disambiguate the metafunction? Wait ... I was the one arguing for calling the metafunction by the exact same name as the nested type. ... I'll have to think about it some more.
10. In examples like
filtering_istream in(adapt(s.begin(), s.end()));
filtering_istream in(s.begin(), s.end());
it seems that you could remove the iterator pair version and provide a ForwardReadableRange version
The problem is there's no way to distinguish at compile time between an arbitrary user-defined filter or resource and an arbitrary ForwardReadableRange (unless you're suggesting I do a lot of member-detection). As I explain in Rationale-->Planned Changes-->Item 3 (see http://tinyurl.com/6rtkz), I'm planning to treat boost::iterator_range as (almost) a model of Resource, eliminating the need for adapt. (I'm having second thoughts about one part of that paragraph, though. Identifying output iterators using is_incrementable will misclassify Larry Evans's indentation filters.)
11. What is the difference between the two examples in "III. Appending to an STL Sequence" besides one is much more clumsy?
I assume you mean the first is more clumsy. Unfortunately, it's also more efficient. It uses a plain streambuf_facade, while the second example uses a filtering_ostream, which delegates to a (length one) chain of streambuf_facades. The filtering infrastructure has some overhead, so unless I'm actually going to add filters to the chain I'd prefer the first example.
Here are some review comments:
======================
What is your evaluation of the design? ---------------------------- It seems very well-thought-out and crisp.
Thanks.
What is your evaluation of the implementation? ----------------------------- haven't looked.
What is your evaluation of the documentation? ----------------------------- puts most other docs to shame.
What is your evaluation of the potential usefulness of the library? ------------------------------ High. I never fully had the time to get into the iostreams framework; it simply seemed too complicated for a weekend study. I'm very pleased to see yet another library that seems easy to use, yet very powerful. This library shows why we should love C++!
Thanks! ;-)
Do you think the library should be accepted as a Boost library? ===========================================
yes.
Great.
Here are some directions I would like to see (but don't require):
1. Use of Boost.Range instead of iterator pairs.
2. Possibly weird questions like mine above, and others that creep up during the review, might form the basis for a FAQ.
This is planned. The review has brought to light a number of insufficiencies in the docs. A lot of stuff that seemed obvious to me just isn't. So I'll definitely add a FAQ, but I also need to write a more complete introduction.
3. I hope differences and commonalities with Daryle's MoreIo stuff can be resolved, perhaps by the two authors in combination.
I've made some overtures to Daryle w.r.t combining the libraries. I've learned from his response to my review of More IO that he believes the stream buffers in his library should be written from scratch to minimize included code. I hope we can work something out.
Great work!
Thanks again.
Thorsten
Jonathan

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:ch8b4q$4cd$1@sea.gmane.org... | | "Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message | > 2. How is boundcheking applied in this example: | > | > struct vector_sink : public sink { | > vector_sink(vector<char>& vec) : vec(vec) { } | > void write(const char* s, std::streamsize n) | > { | > vec.insert(vec.end(), s, s + n); | > } | > | > I mean, could't n be too big? | | This is supposed to extend the vector. Are you worried that n + vec.size() will | exceed std::vector::max_size()? no, I was thinking, is the buffer at 's' big enough to always hold n more elements...ie...is this guaranteed by the framework design. | > 10. In example like | > | > filtering_istream in(adapt(s.begin(), s.end())); | > filtering_istream in(s.begin(), s.end()); | > | > it seems that you could remove the iterator pair version and provide a | ForwardReadableRange version | | The problem is there's no way to distinguish at compile time between an | arbitrary use-defined filter or resource and an arbitrary ForwardReadableRange | (unless you're suggesting I do a lot of member-detection). ok, yeah, that's probably very non-portable. | > 11. What is the difference between the two examples in "III. Appending to an | STL Sequence" besides one is | > much more clumsy? | | I assume you mean the first is more clumsy. Unfortunately, it's also more | efficient. It uses a plain streambuf_facade, while the second example uses a | filtering_ostream, which delegates to a (length one) chain of streambuf_facades. | The filtering infrastructure has some overhead, so unless I'm actually going to | add filters to the chain I'd prefer the first example. ok, did the docs talk about this efficiency difference? br Thorsten

"Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message news:ch96uf$ie9$1@sea.gmane.org...
"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:ch8b4q$4cd$1@sea.gmane.org... | | "Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message
| > 2. How is bounds checking applied in this example:
| >
| > struct vector_sink : public sink {
| >     vector_sink(vector<char>& vec) : vec(vec) { }
| >     void write(const char* s, std::streamsize n)
| >     {
| >         vec.insert(vec.end(), s, s + n);
| >     }
| >
| > I mean, couldn't n be too big?
|
| This is supposed to extend the vector. Are you worried that n + vec.size() will
| exceed std::vector::max_size()?
no, I was thinking: is the buffer at 's' big enough to always hold n more elements? i.e., is this guaranteed by the framework design?
Whenever this member is called internally by the library, the buffer is guaranteed to be big enough. Users can also call it directly; in this case it's up to them to make sure s is big enough -- just as when a user invokes std::ostream::write.
| > 11. What is the difference between the two examples in "III. Appending to an
| > STL Sequence" besides one is much more clumsy?
|
| I assume you mean the first is more clumsy. Unfortunately, it's also more
| efficient. It uses a plain streambuf_facade, while the second example uses a
| filtering_ostream, which delegates to a (length one) chain of streambuf_facades.
| The filtering infrastructure has some overhead, so unless I'm actually going to
| add filters to the chain I'd prefer the first example.
ok, did the docs talk about this efficiency difference?
No, but I guess they should.
br
Thorsten
Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message news:ch86rj$rqb$1@sea.gmane.org...
3. The example
filtering_streambuf<input> in;
in.push(alphabetic_input_filter());
in.push(cin);
mentions something about the source coming last. Maybe it would be more explicit to say
in.push_filter(alphabetic_input_filter());
in.push_source(cin);
assuming there is a stack of both types. If there is only one source, then maybe in.attach_source( cin );
There is just one stack. It consists of zero or more filters with an optional resource at the end. See the diagrams here: http://tinyurl.com/5v2ak.
Resources represent the ultimate data source/sink, while filters modify data as it passes through. So it wouldn't make sense to have a resource in the middle or a filter at the end.
One of the original motivations for introducing the i/o categories (they turned out to be useful for a lot of other things) was to avoid having separate functions push_filter and push_resource. I consider it a major simplification of the interface.
I agree with Thorsten that some means of ensuring that parts aren't assembled in the wrong order would be helpful. Whether that means separate functions, or detection of the type of object being pushed, it seems like preventing misuse should be a bigger priority than "a major simplification of the interface."

There are plenty of places where one can misuse existing libraries, including the Standard Library, so perhaps requiring that protection from this library is misguided. So, here's another approach: perhaps you could create a set of overloaded make_* functions that take a varying number of filter arguments followed by an optional (via overloading) resource argument. Then, those functions can ensure that if there is a resource, it is push()'d last.

--
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;

"Rob Stewart" <stewart@sig.com> wrote in message news:200409031905.i83J5JU21799@lawrencewelk.systems.susq.com...
From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message
One of the original motivations for introducing the i/o categories (they turned out to be useful for a lot of other things) was to avoid having separate functions push_filter and push_resource. I consider it a major simplification of the interface.
I agree with Thorsten that some means of ensuring that parts aren't assembled in the wrong order would be helpful. Whether that means separate functions, or detection of the type of object being pushed, it seems like preventing misuse should be a bigger priority than "a major simplification of the interface."
I'm not sure I follow. You already get a runtime error if you try to add a filter or resource to a chain that is already complete. This is mentioned in the specification for push (see http://tinyurl.com/49j6u). E.g.,

filtering_ostream out;
out.push(zlib_compressor());
out.push(file_sink("hello.z"));
out.push(base64_encoder());                // error !!
out.push(tcp_sink(www.microsoft.com, 80)); // error !!

Isn't this enough? (Maybe it should be an assertion failure instead of an exception.)

Perhaps you would like a compile-time error instead. Note that having separate functions for pushing filters and resources would not help in that case. To generate a compile-time error would require that the types of all the filters and resources be encoded into the type of the filtering stream. This was suggested last year by Larry Evans and recently by Robert Ramey (if I understood him correctly). The problems are:

- much more complex interface
- less flexible at runtime
- negligible gain in efficiency, since most filtering operations aren't inlineable

Finally, it's already the programmer's responsibility to ensure that the filters are added in the right order -- no amount of template magic will guarantee this -- so making sure to add the resource at the end is not much of an extra burden.
There are plenty of places where one can misuse existing libraries, including the Standard Library, so perhaps requiring that protection from this library is misguided. So, here's another approach: perhaps you could create a set of overloaded make_* functions that take a varying number of filter arguments followed by an optional (via overloading) resource argument. Then, those functions can ensure that if there is a resource, it is push()'d last.
Yes, that would work. There are two versions I can think of:

1. Originally I had a function link(...) which created an inline chain of filters and resources, but I eliminated it to make the library smaller. It didn't occur to me to add a compile-time check that the last element was a resource; in fact, I thought it would also be useful for chaining filters alone. However, this might be a good reason to restore the function link, with the added check.

2. Dietmar Kuehl mentioned a piping syntax originally proposed by JC van Winkel. E.g.,

filtering_stream out( base64_encoder() | zlib_compressor() | file_sink("file") );

This syntax, too, could be modified to do a compile-time check that the last item in the chain is a resource.

I like both of these ideas as a syntactic convenience but not as a way to enforce at compile time that resources are added last. For this enforcement to have teeth, it would be necessary to remove the stack interface, which would be considered 'unsafe'. But the stack interface is natural and convenient, and essential for some purposes. For example, a natural way to implement a compression stream would be to derive from a filtering stream and push a compression filter onto the stack in the stream's constructor. Templated open and close functions would then be implemented by pushing and popping resources:

struct zlib_ostream : filtering_ostream {
    zlib_ostream() { push(zlib_compressor()); }

    template<typename Source>
    zlib_ostream(const Source& src) { push(zlib_compressor()); open(src); }

    template<typename Source>
    void open(const Source& src)
    {
        BOOST_STATIC_ASSERT(is_resource<Source>::value);
        push(src);
    }

    bool is_open() const { return is_complete(); }

    void close() { assert(is_open()); pop(); }
};

I sort of feel like I'm beating a dead horse :-) Is a runtime error (or assertion failure) sufficient, or do you feel strongly that there needs to be a compile-time check?

Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Rob Stewart" <stewart@sig.com> wrote in message news:200409031905.i83J5JU21799@lawrencewelk.systems.susq.com...
From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message
One of the original motivations for introducing the i/o categories (they turned out to be useful for a lot of other things) was to avoid having separate functions push_filter and push_resource. I consider it a major simplification of the interface.
I agree with Thorsten that some means of ensuring that parts aren't assembled in the wrong order would be helpful. Whether that means separate functions, or detection of the type of object being pushed, it seems like preventing misuse should be a bigger priority than "a major simplification of the interface."
I'm not sure I follow. You already get a runtime error if you try to add a filter or resource to a chain that is already complete. This is mentioned in the specification for push (see http://tinyurl.com/49j6u). E.g.,
filtering_ostream out;
out.push(zlib_compressor());
out.push(file_sink("hello.z"));
out.push(base64_encoder());                // error !!
out.push(tcp_sink(www.microsoft.com, 80)); // error !!
Isn't this enough? (Maybe it should be an assertion failure instead of an exception.)
Oh, right. That's good, and is flexible for runtime assembly of the filters and resource.
Perhaps you would like a compile-time error instead. Note that having separate
Where possible, that's certainly preferable.
functions for pushing filters and resources would not help in that case. To generate a compile-time error would require that the types of all the filters and resources be encoded into the type of the filtering stream. This was suggested last year by Larry Evans and recently by Robert Ramey (if I understood him correctly).
The problems are:
- much more complex interface
This is a subjective judgment I can't evaluate due to lack of information.
- less flexible at runtime
I can see that is a problem and I can imagine determining the filters to assemble at runtime, so this would be an issue.
- negligible gain in efficiency, since most filtering operations aren't inlineable
The request wasn't about efficiency but safety.
Finally, it's already the programmer's responsibility to ensure that the filters are added in the right order -- no amount of template magic will guarantee this -- so making sure to add the resource at the end is not much of an extra burden.
You're right, and I already acknowledged (quoted below) that there are many places in which a programmer can hang himself. The flexibility of what you have in place increases runtime possibilities, and can be wrapped to make it safer as discussed below.
There are plenty of places where one can misuse existing libraries, including the Standard Library, so perhaps requiring that protection from this library is misguided. So, here's another approach: perhaps you could create a set of overloaded make_* functions that take a varying number of filter arguments followed by an optional (via overloading) resource argument. Then, those functions can ensure that if there is a resource, it is push()'d last.
Yes, that would work. There are two versions I can think of:
1. Originally I had a function link(...) which created an inline chain of filters and resources, but I eliminated it to make the library smaller. It didn't occur to me to add a compile-time check that the last element was a resource; in fact, I thought it would also be useful for chaining filters alone. However, this might be a good reason to restore the function link, with the added check.
This is rather like using smart pointers. You can always call new and delete, and work with raw pointers, but it's up to you to guard against errors, including in the face of exceptions. When using smart pointers, which are wrappers around raw pointers, you gain safety. This should in no way be considered a prerequisite to acceptance of the library.
2. Dietmar Kuehl mentioned a piping syntax originally proposed by JC van Winkel. E.g.,
filtering_stream out( base64_encoder() | zlib_compressor() | file_sink("file") );
This syntax, too, could be modified to do a compile-time check that the last item in the chain is a resource.
Does it have to be a resource? Is the check actually to ensure that no filter follows a resource?
I like both of these ideas as a syntactic convenience but not as a way to enforce at compile time that resources are added last. For this enforcement to have teeth, it would be necessary to remove the stack interface, which would be considered 'unsafe'. But the stack interface is natural and convenient, and essential for some purposes.
Don't do that; keep things as they are.
I sort of feel like I'm beating a dead horse :-) Is a runtime error (or assertion failure) sufficient, or do you feel strongly that there needs to be a compile-time check?
I think that what you have is fine, but that a safer means built atop what exists is a better approach and can be added later as time permits.

"Rob Stewart" <stewart@sig.com> wrote in message:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Rob Stewart" <stewart@sig.com> wrote in message
I agree with Thorsten that some means of ensuring that parts aren't assembled in the wrong order would be helpful. Whether that means separate functions, or detection of the type of object being pushed, it seems like preventing misuse should be a bigger priority than "a major simplification of the interface."
I'm sorry if I sounded argumentative in my last post.

I'm not sure I follow. You already get a runtime error if you try to add a filter or resource to a chain that is already complete. This is mentioned in the specification for push (see http://tinyurl.com/49j6u). E.g.,
filtering_ostream out;
out.push(zlib_compressor());
out.push(file_sink("hello.z"));
out.push(base64_encoder());                // error !!
out.push(tcp_sink(www.microsoft.com, 80)); // error !!
Isn't this enough? (Maybe it should be an assertion failure instead of an exception.)
Oh, right. That's good, and is flexible for runtime assembly of the filters and resource.
Good.
Perhaps you would like a compile-time error instead. Note that having separate
Where possible, that's certainly preferable.
Agreed.
functions for pushing filters and resources would not help in that case. To generate a compile-time error would require that the types of all the filters and resources be encoded into the type of the filtering stream. This was suggested last year by Larry Evans and recently by Robert Ramey (if I understood him correctly).
The problems are:
Here I summarized a bunch of stuff that wasn't directly relevant to your point. Sorry.
- much more complex interface
This is a subjective judgment I can't evaluate due to lack of information.
- less flexible at runtime
I can see that is a problem and I can imagine determining the filters to assemble at runtime, so this would be an issue.
- negligible gain in efficiency, since most filtering operations aren't inlineable
Finally, it's already the programmer's responsibility to ensure that the filters are added in the right order -- no amount of template magic will guarantee this -- so making sure to add the resource at the end is not much of an extra burden.
You're right, and I already acknowledged (quoted below) that there are many places in which a programmer can hang himself. The flexibility of what you have in place increases runtime possibilities, and can be wrapped to make it safer as discussed below.
Okay.
There are plenty of places where one can misuse existing libraries, including the Standard Library, so perhaps requiring that protection from this library is misguided. So, here's another approach: perhaps you could create a set of overloaded make_* functions that take a varying number of filter arguments followed by an optional (via overloading) resource argument. Then, those functions can ensure that if there is a resource, it is push()'d last.
Did I misunderstand you the first time around -- you're not saying the arguments would be automatically reordered if a resource is in the middle, are you? It would just produce an error, right?
Yes, that would work. There are two versions I can think of:
1. Originally I had a function link(...) which created an inline chain of filters and resources, but I eliminated it to make the library smaller. It didn't occur to me to add a compile-time check that the last element was a resource; in fact, I thought it would also be useful for chaining filters alone. However, this might be a good reason to restore the function link, with the added check.
This is rather like using smart pointers. You can always call new and delete, and work with raw pointers, but it's up to you to guard against errors, including in the face of exceptions. When using smart pointers, which are wrappers around raw pointers, you gain safety.
So link would just be a programmer-imposed check.
This should in no way be considered a prerequisite to acceptance of the library.
Good.
2. Dietmar Kuehl mentioned a piping syntax originally proposed by JC van Winkel. E.g.,
filtering_stream out( base64_encoder() | zlib_compressor() | file_sink("file") );
This syntax, too, could be modified to do a compile-time check that the last item in the chain is a resource.
Does it have to be a resource? Is the check actually to ensure that no filter follows a resource?
I guess the check should be that nothing follows a resource.
I like both of these ideas as a syntactic convenience but not as a way to enforce at compile time that resources are added last. For this enforcement to have teeth, it would be necessary to remove the stack interface, which would be considered 'unsafe'. But the stack interface is natural and convenient, and essential for some purposes.
Don't do that; keep things as they are.
Good.
I sort of feel like I'm beating a dead horse :-) Is a runtime error (or assertion failure) sufficient, or do you feel strongly that there needs to be a compile-time check?
I think that what you have is fine, but that a safer means built atop what exists is a better approach and can be added later as time permits.
Got it. Best Regards, Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Rob Stewart" <stewart@sig.com> wrote in message:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Rob Stewart" <stewart@sig.com> wrote in message
I'm sorry if I sounded argumentative in my last post.
I didn't notice.
There are plenty of places where one can misuse existing libraries, including the Standard Library, so perhaps requiring that protection from this library is misguided. So, here's another approach: perhaps you could create a set of overloaded make_* functions that take a varying number of filter arguments followed by an optional (via overloading) resource argument. Then, those functions can ensure that if there is a resource, it is push()'d last.
Did I misunderstand you the first time around -- you're not saying the arguments would be automatically reordered if a resource is in the middle, are you? It would just produce an error, right?
Yes. A given overload, with arity N, would require that only argument N may be a resource.
Yes, that would work. There are two versions I can think of:
1. Originally I had a function link(...) which created an inline chain of filters and resources, but I eliminated it to make the library smaller. It didn't occur to me to add a compile-time check that the last element was a resource; in fact, I thought it would also be useful for chaining filters alone. However, this might be a good reason to restore the function link, with the added check.
This is rather like using smart pointers. You can always call new and delete, and work with raw pointers, but it's up to you to guard against errors, including in the face of exceptions. When using smart pointers, which are wrappers around raw pointers, you gain safety.
So link would just be a programmer-imposed check.
Yes.
I think that what you have is fine, but that a safer means built atop what exists is a better approach and can be added later as time permits.
I didn't mean to say it would be a better approach. I meant it would be safer.

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:chalsi$cmb$1@sea.gmane.org...
"Rob Stewart" <stewart@sig.com> wrote in message news:200409031905.i83J5JU21799@lawrencewelk.systems.susq.com...
From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Thorsten Ottosen" <nesotto@cs.auc.dk> wrote in message
One of the original motivations for introducing the i/o categories (they turned out to be useful for a lot of other things) was to avoid having separate functions push_filter and push_resource. I consider it a major simplification of the interface.
I agree with Thorsten that some means of ensuring that parts aren't assembled in the wrong order would be helpful. Whether that means separate functions, or detection of the type of object being pushed, it seems like preventing misuse should be a bigger priority than "a major simplification of the interface."
I'm not sure I follow. You already get a runtime error if you try to add a filter or resource to a chain that is already complete. This is mentioned in the specification for push (see http://tinyurl.com/49j6u). E.g.,
filtering_ostream out;
out.push(zlib_compressor());
out.push(file_sink("hello.z"));
out.push(base64_encoder());                // error !!
out.push(tcp_sink(www.microsoft.com, 80)); // error !!
Isn't this enough? (Maybe it should be an assertion failure instead of an exception.)
Perhaps you would like a compile-time error instead. Note that having separate functions for pushing filters and resources would not help in that case. To generate a compile-time error would require that the types of all the filters and resources be encoded into the type of the filtering stream. This was suggested last year by Larry Evans and recently by Robert Ramey (if I understood him correctly).
The problems are:
- a much more complex interface
- less flexibility at runtime
- negligible gain in efficiency, since most filtering operations aren't inlineable
Finally, it's already the programmer's responsibility to ensure that the filters are added in the right order -- no amount of template magic will guarantee this -- so making sure to add the resource at the end is not much of an extra burden.
There are plenty of places where one can misuse existing libraries, including the Standard Library, so perhaps requiring that protection from this library is misguided. So, here's another approach: perhaps you could create a set of overloaded make_* functions that take a varying number of filter arguments followed by an optional (via overloading) resource argument. Then, those functions can ensure that if there is a resource, it is push()'d last.
Yes, that would work. There are two versions I can think of:
1. Originally I had a function link(...) which created an inline chain of filters and resources, but I eliminated it to make the library smaller. It didn't occur to me to add a compile-time check that the last element was a resource; in fact, I thought it would also be useful for chaining filters alone. However, this might be a good reason to restore the function link, with the added check.
2. Dietmar Kuehl mentioned a piping syntax originally proposed by JC van Winkel. E.g.,
filtering_stream out( base64_encoder() | zlib_compressor() | file_sink("file") );
This syntax, too, could be modified to do a compile-time check that the last item in the chain is a resource.
I like both of these ideas as a syntactic convenience but not as a way to enforce at compile time that resources are added last. For this enforcement to have teeth, it would be necessary to remove the stack interface, which would be considered 'unsafe'. But the stack interface is natural and convenient, and essential for some purposes.
For example, a natural way to implement a compression stream would be to derive from a filtering stream and push a compression filter onto the stack in the stream's constructor. Templated open and close functions would then be implemented by pushing and popping resources:
struct zlib_ostream : filtering_ostream {
    zlib_ostream() { push(zlib_compressor()); }

    template<typename Source>
    zlib_ostream(const Source& src) { push(zlib_compressor()); open(src); }

    template<typename Source>
    void open(const Source& src)
    {
        BOOST_STATIC_ASSERT(is_resource<Source>::value);
        push(src);
    }

    bool is_open() const { return is_complete(); }

    void close() { assert(is_open()); pop(); }
};
I sort of feel like I'm beating a dead horse :-) Is a runtime error (or assertion failure) sufficient, or do you feel strongly that there needs to be a compile-time check?
Jonathan
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

start of the formal review of the Iostreams library by Jonathan Turkanis.
I vote to include Iostreams into Boost with both hands. It is a prime example of professional work.

Looking back on the MoreIo library I recommend:
- to prefer Iostreams for all overlapping functionality
- to move the applicable rest from MoreIo into Iostreams with:
  - naming conventions taken from Iostreams
  - documentation look and feel from Iostreams

For the user it is always better to have one library with consistent structure/naming/documentation than to have many of them. This is hard-earned experience.

In case both MoreIo and Iostreams are added into Boost, then they *BOTH* should have a disambiguation section at the top of the documentation, with info on:
- what functionality overlaps
- how they can cooperate together

As Iostreams is large, I haven't yet read the reference docs and test code. I plan to, and will post the rest of the review later. Current review notes are below.

To deal with its complexity I recommend adding many more overcommented code snippets to the documentation and adding hyperlinks to relevant examples in every fitting place. Pictures, flowcharts and diagrams would help too. Some information would look better in tabular form. (Details below.)

/Pavel

__________________________________________________________________
1. overview docs:

The first occurrence of the terms "data source" and "sink" should be hyperlinked (it is, but only later in the text). Ditto "filters". It may also be useful to put a diagram there. In addition, a link to a short code example could be added at the first occurrence of each term.

The enumeration of components such as 'access to memory-mapped files' may look better in a <ul>...</ul> list.

Sentence: "Sources, Sinks and their refinements are called resources." The term is overused in programming. Maybe "data resources" or "data-module" or so would be better.

The relation between sinks/sources/filters and streams should be explained immediately, possibly using a picture.

Link to "adapters.html" is broken.
The links to iterator_facade and the Iterators library should be local.

__________________________________________________________________
2. tutorial docs:

The first code example:

    s[z] = (char) (rand() * 256 / RAND_MAX);

I think this causes overflows way too often.

    s[z] = (char) (rand() % 256);

is more likely intended, even if it adds a small bias.

In the example in.open(634875); a stream example should also be added, just to show how it is written.

streambuf_facade and stream_facade should be hyperlinked. (I am a fan of hyperlinking everything everywhere.)

In the code example line:

    streambuf_facade<vector_sink> out(v); // error!!

it should be said whether this is a compile-time error, an assertion, or something the user needs to check manually. I wonder if it is possible to work around this limitation internally in the library by const_casts/whatever nasty.

Sentence: "... and modifies it before returning it to the user." should be "... and modifies or eliminates it before returning it to the user."

In the example:

    filtering_streambuf<input> in;
    in.push(alphabetic_input_filter());
    in.push(cin);

it should be explained what "input" is and where it comes from. Shouldn't it have a more expressive name, e.g. input_filter_base? Maybe the source/sink too.

The chaining-of-filters example feels like a good place for a picture. It is a bit confusing to see push() of both an input_filter descendant and a source descendant. This should be noted, maybe with a class diagram.

All examples use the variable names 'in' and 'out'. This may be confusing for someone just skimming through the code snippets. Maybe different names (e.g. vector_in) could be used. It would also be nice to distinguish variables and types systematically, e.g. variables having a _ suffix.

filters.html link is broken.

Generally, the tutorial could be sprinkled with a few more overcommented code snippets. They may be hide-able. The other docs won't suffer from having them too.

__________________________________________________________________
2. examples docs:

Link to regex should be local.
Link "Uncommenting Input Filter" is broken.

__________________________________________________________________
3. presidential_output_filter.hpp: a nitpick, in the code:

    if (word == "subliminable") return "subliminal";
    else if (word == "nucular") ....

the 'else' parts are not needed.

__________________________________________________________________
4. docs: all <img> elements should have their width/height set in the HTML.

__________________________________________________________________
5. concepts docs: all the concepts here could also be listed in a table, sorted and with one-line info. The individual concepts should be added as a subtree in the left panel. Ditto the compression filters, etc. Every page should be accessible from the left panel.

__________________________________________________________________
6. Installation docs: it should be described in more detail what BOOST_IO_NO_LIB is for and who needs it for what. Maybe there should be a list of the *.cpp files one needs for zlib/bzip2/..., for those who do not want to use autolink.

__________________________________________________________________
7. "Classes by category" docs: each link could have a very short description on the left:

    array_resource (xyz...)
    array_sink (abc...)

Similarly for classes sorted alphabetically, functions, etc. It would help when one is trying to quickly locate some info.

__________________________________________________________________
8. All applicable headers should have on top:

    #if (defined _MSC_VER) && (_MSC_VER >= 1200)
    # pragma once
    #endif

to keep compilation times for VC and Intel C++ down.

__________________________________________________________________
9. Syntax like this was suggested:

    filter_stream out(tee(std::cout) | encode | gzip | file("some file"));

Maybe support for this could be built on the existing framework.

__________________________________________________________________
10. To the question:
do you prefer the names filter_stream and filter_streambuf to filtering_stream and filtering_streambuf?
The first. Steve McConnell in his Code Complete 2 quotes a study recommending identifier lengths of 10-16 chars.

__________________________________________________________________
11. A collection of useful stream(buffer)s could be added to the library, so they are handy for users:
- a /dev/null-like stream (like the one from MoreIo)
- tee, tee3, ... etc. streams
- random (the one from the code snippet)
- a repeater (some sequence of characters over and over). This one could ideally accept syntax like that used in Boost.Assign. It would allow one to easily create complex but predictable test data structures.
- a 'switcher': something that sends data to another stream but has a method switch_to_other_stream(s). E.g. for circular logs.

__________________________________________________________________
12. One could dream about the ability to use lambda expressions directly for filters.

__________________________________________________________________
13. Possible feature of the library (I do not know whether it is implementable, or implementable efficiently/elegantly):
- let's have a chain of streams (e.g. buffer/compress/write file)
- someone feeding the head of the chain may want to execute an action somewhere down in the chain (e.g. switching files)
- this would currently require 'someone' to know all the details of the chain and to have access points to its parts (=> high coupling)

It would be nice to be able to 'send a command' down the stream and have some part of the chain pick it up and execute it. This way the sender would not need to know anything about the current chain structure. If the command weren't recognizable by any part of the stream chain, it would become a nop. Similarly, one can imagine events going upstream. If anyone handles them: OK; if not, they will be ignored. (They differ from exceptions in that ignoring them is the default and harmless.)
Examples of downstream commands:
- switch file (for logging)
- flush compressor dictionaries

Examples of upstream events:
- predetermined file size reached
- current compression ratio info

__________________________________________________________________
14. Might it be possible to add a "synchronize panels" button to the left panel? If clicked, it would expand and highlight the item in the left panel for the page currently opened in the right panel.

__________________________________________________________________
15. closable.html: it should be explained (+ an example) what 'notification' means here.

__________________________________________________________________
16. The examples section should contain all available examples, e.g. the compressors. There may be subsections for them. Maybe something like:

    + Examples
      + Simple Examples ....
      + Compressor Examples ....
      + ....

This is because many people will not go any further than this section and would miss the rest.

__________________________________________________________________
EOF

From: "Pavel Vozenilek" <pavel_vozenilek@hotmail.com>
11. A collection of useful stream(buffer)s could be added into library, so they are handy to users:
- tee, tee3, ... etc stream
To generalize it, you really want to avoid numbering them. Instead, you want an n-way multiplexer type. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:chade8$n8k$1@sea.gmane.org...
start of the formal review of the Iostreams library by Jonathan Turkanis.
I vote to include Iostreams into Boost with both hands. It is a prime example of professional work.
Thanks!
Looking back on the MoreIo library I recommend:
- to prefer Iostreams for all overlapping functionality
- to move the applicable rest from MoreIo into Iostreams with:
  - naming conventions taken from Iostreams
  - documentation look and feel from Iostreams
I assume you view null_buf, pointerbuf, etc. as overlapping functionality. What about 'streambuf wrapping'?
For the user it is always better to have one library with consistent structure/naming/documentation than to have many of them. This is hard-earned experience.
Yes.
As Iostreams is large, I haven't yet read the reference docs and test code. I plan to, and will post the rest of the review later.
I look forward to it. Note that the end date for this review is not yet fixed.
Current review notes are below.
To deal with its complexity I recommend adding many more overcommented code snippets to the documentation and adding hyperlinks to relevant examples in every fitting place. Pictures, flowcharts and diagrams would help too. Some information would look better in tabular form. (Details below.)
Yes; the review so far has made this very clear to me.
__________________________________________________________________ 1. overview docs:
The first occurrence of the terms "data source" and "sink" should be hyperlinked (it is, but only later in the text). Ditto "filters".
My reason for not hyperlinking the first occurrence was my feeling that in the first two paragraphs I was simply using them as 'common sense' terms, but that when I used the terms in the section marked 'Concepts' I was formally introducing them as technical terms defined by the library. Obviously this was not very clear.
It may also be useful to put a diagram there. In addition, a link to a short code example could be added at the first occurrence of each term.
Good idea. I've become convinced of the need to add explanatory material right at the beginning.
The enumeration of components such as 'access to memory-mapped files' may look better in a <ul>...</ul> list.
Okay.
Sentence: "Sources, Sinks and their refinements are called resources."
The term is overused in programming. Maybe "data resources" or "data-module" or so would be better.
I know that 'resource' is a loaded term ;-) but I couldn't think of anything better. 'Data-module' doesn't seem very descriptive. Unfortunately, the standard term seems to be "Source/Sink" -- which I find ugly.
The relation between sinks/sources/filters and streams should be explained immediately, possibly using a picture.
Okay. What do you think of the pictures here: http://tinyurl.com/5v2ak? Too much detail for the introduction?
Link to "adapters.html" is broken.
Which link?
The links to iterator_facade and Iterators library should be local.
I'm planning to make all the boost links local eventually. I wanted them to work properly for people viewing the docs from the zipped package.
__________________________________________________________________ 2. tutorial docs:
The first code example:
    s[z] = (char) (rand() * 256 / RAND_MAX);
I think this causes overflows way too often.
    s[z] = (char) (rand() % 256);
is more likely intended, even if it adds a small bias.
Thanks.
In the example in.open(634875); a stream example should also be added, just to show how it is written.
I don't understand.
streambuf_facade and stream_facade should be hyperlinked. (I am a fan of hyperlinking everything everywhere.)
I tried to be liberal with my hyperlinks. Sometimes I find it annoying when the same word is hyperlinked several times in a sentence. But you're right, in the tutorial streambuf_facade and stream_facade definitely need to link to the user's guide or the reference.
In the code example line:
    streambuf_facade<vector_sink> out(v); // error!!
it should be said whether this is a compile-time error, an assertion, or something the user needs to check manually. I wonder if it is possible to work around this limitation internally in the library by const_casts/whatever nasty.
These are the standard forwarding-problem trade-offs. The one thing I definitely want to avoid is allowing const references to bind to temporary stream objects:
    filtering_istream in(stringstream("hello"));
This would immediately lead to disaster.
Sentence: "... and modifies it before returning it to the user." should be "... and modifies or eliminates it before returning it to the user."
Isn't elimination a type of modification?
In example:
filtering_streambuf<input> in;
in.push(alphabetic_input_filter());
in.push(cin);
It should be explained what "input" is and where it comes from.
You're right. To clarify, input is a tag struct representing the i/o mode 'input'. So filtering_streambuf<input> is a read-only filtering stream, synonymous with filtering_istream, and filtering_streambuf<output> is a write-only filtering stream, synonymous with filtering_ostream.
Shouldn't it have a more expressive name, e.g. input_filter_base? Maybe the source/sink too.
I don't follow this.
It is a bit confusing to see push() of both an input_filter descendant and a source descendant. This should be noted, maybe with a class diagram.
Others have said this too. Originally I had something like push_filter and push_resource. The i/o categories made this unnecessary. I consider the std::stack-like push/pop/size/empty interface to be much more elegant. After all, there's just one stack. Internally, also, adding a filter is essentially the same as adding a resource.
All examples use variable names 'in' and 'out'. It may be confusing for one just skimming through code snippets. Maybe different names (e.g. vector_in) could be used.
Okay.
It would be also nice to distinguish variables and types systematically. E.g. variables having _ suffix.
That's the convention I use for member variables.
filters.html link is broken.
Which link?
Generally, the tutorial could be sprinkled with a few more overcommented code snippets. They may be hide-able. The other docs won't suffer from having them too.
Okay. What do you mean by hideable? I'm hoping not to have to use any javascript except for the menu.
__________________________________________________________________ 2. examples docs:
Link to regex should be local.
Link "Uncommenting Input Filter" is broken.
If you mean the link under "Regex OutputFilter", I think it appears broken only because there's nowhere left to scroll.
__________________________________________________________________ 3. presidential_output_filter.hpp: a nitpick, in code:
if (word == "subliminable") return "subliminal";
else if (word == "nucular") ....
the 'else' parts are not needed.
True.
__________________________________________________________________ 4. docs: all <img> elements should have their width/height set in HTML
For slow connections, I guess?
__________________________________________________________________ 5. concepts docs: all the concepts here could also be listed in a table, sorted and with one-line info.
The individual concepts should be added as a subtree in the left panel. Ditto the compression filters, etc. Every page should be accessible from the left panel.
I'll try to do this, but I might find that the menu is so big it takes too long to load.
__________________________________________________________________ 6. Installation docs: it should be described in more detail what BOOST_IO_NO_LIB is for and who needs it for what.
Maybe there should be a list of the *.cpp files one needs for zlib/bzip2/..., for those who do not want to use autolink.
All the .cpp files are enumerated here: http://tinyurl.com/6klot.
__________________________________________________________________ 7. "Classes by category" docs: each link could have a very short description on the left:
    array_resource (xyz...)
    array_sink (abc...)
Similarly for classes sorted alphabetically, functions, etc. It would help when one is trying to quickly locate some info.
Good idea.
__________________________________________________________________ 8. All applicable headers should have on top:
#if (defined _MSC_VER) && (_MSC_VER >= 1200)
# pragma once
#endif
to keep compilation times for VC and Intel C++ down.
I thought I already made this change. ________________________________________________________________
9. Syntax like this was suggested:
filter_stream out(tee(std::cout) | encode | gzip | file("some file"));
Maybe support for this could be built on the existing framework.
I think it's really cool. I've already contacted the (co-)author JC van Winkel, who gave me permission to use it.
__________________________________________________________________ 10. To the question:
do you prefer the names filter_stream and filter_streambuf to filtering_stream and filtering_streambuf?
The first. Steve McConnell in his Code Complete 2 quotes a study recommending identifier lengths of 10-16 chars.
Me too.
__________________________________________________________________ 11. A collection of useful stream(buffer)s could be added to the library, so they are handy for users:
- a /dev/null-like stream (like the one from MoreIo)
- tee, tee3, ... etc stream
- random (the one from code snippet)
These are all good. I'm thinking as the collection of provided components grows, I may need to be more systematic about library organization. So one might have directories boost/io/filters/compression, etc.
- a repeater (some sequence of characters over and over). This one could ideally accept syntax like that used in Boost.Assign. It would allow one to easily create complex but predictable test data structures.
Could you elaborate?
- a 'switcher': something that sends data to another stream but has a method switch_to_other_stream(s). E.g. for circular logs.
Doable, but presents some problems with buffer synchronization. There is currently no generic sync() function or Synchronizable concept.
__________________________________________________________________ 12. One could dream about the ability to use lambda expressions directly for filters.
Lambda expressions which define function objects that transform characters one at a time would be easy. But I suppose you mean lambda expressions representing complex filtering operations. Let's see.... How about this: instead of the place-holders _1, _2, etc. we could have _get, _peek, _put... The following could be an inline version of Kanze's uncommenting input filter (http://tinyurl.com/6qn3a):

    if_(_get == '#')
    [
        while_(_peek != EOF && _peek != '\n') [ _get ]
    ],
    _peek

:-)
__________________________________________________________________ 13. Possible feature of the library (I do not know whether it is implementable, or implementable efficiently/elegantly):
- let's have a chain of streams (e.g. buffer/compress/write file)
- someone feeding the head of the chain may want to execute an action somewhere down in the chain (e.g. switching files)
- this would currently require 'someone' to know all the details of the chain and to have access points to its parts (=> high coupling)
Right. That's the status quo.
It would be nice to be able to 'send a command' down the stream and have some part of the chain pick it up and execute it. This way the sender would not need to know anything about the current chain structure.
If the command weren't recognizable by any part of the stream chain, it would become a nop.
Similarly, one can imagine events going upstream. If anyone handles them: OK; if not, they will be ignored.
(They differ from exceptions in that ignoring them is the default and harmless.)
Sounds interesting. I believe this would have to rely on runtime rather than compile-time polymorphism. I'll think about it.
Examples of downstream commands:
- switch file (for logging)
- flush compressor dictionaries
Examples of upstream events:
- predetermined file size reached
- current compression ratio info
__________________________________________________________________ 14. Might it be possible to add a "synchronize panels" button to the left panel? If clicked, it would expand and highlight the item in the left panel for the page currently opened in the right panel.
The [link to this page] button already expands the tree to the current page, if the tree contains a link to the current page. I guess highlighting would be easy enough to do. I'm afraid some would find this annoying.
__________________________________________________________________ 15. closable.html: it should be explained (+ an example) what 'notification' means here.
How's this: "A Closable filter or resource receives notifications -- via the function <A HREF='...'>boost::io::close</A> -- immediately before a containing stream or stream buffer is closed." I definitely need an example here.
__________________________________________________________________ 16. The examples section should contain all available examples, e.g. the compressors. There may be subsections for them.
Maybe something like:

    + Examples
      + Simple Examples ....
      + Compressor Examples ....
      + ....

This is because many people will not go any further than this section and would miss the rest.
Good idea. Thanks for the detailed examination! Jonathan

On Fri, Sep 03, 2004 at 08:37:20PM -0600, Jonathan Turkanis wrote:
2. tutorial docs:
The first code example:
    s[z] = (char) (rand() * 256 / RAND_MAX);
I think this causes overflows way too often.
    s[z] = (char) (rand() % 256);
is more likely intended, even if it adds a small bias.
Please no, this is in every FAQ on the use of rand(). The lower bits of rand() are not very random, and using (rand() % 256) would be a classical case of misuse of that function. The use of (rand() * 256 / RAND_MAX) is fine because RAND_MAX can be divided by 256. In general one is supposed to divide the result of rand() into buckets. I.e., if you want a random integer in the range [0, n) then you divide the results of rand() into n equally sized buckets and ignore results that fall outside of them in order to get a good distribution, namely:

    while ((res = (rand() / (RAND_MAX / n))) == n);

-- Carlo Wood <carlo@alinoe.com>

"Carlo Wood" <carlo@alinoe.com> wrote in message news:20040904030317.GA23364@alinoe.com...
| On Fri, Sep 03, 2004 at 08:37:20PM -0600, Jonathan Turkanis wrote:
| > > 2. tutorial docs:
| > >
| > > The first code example:
| > > s[z] = (char) (rand() * 256 / RAND_MAX);
| > > I think this causes overflows way too often.
| > > s[z] = (char) (rand() % 256);
| > > is more likely intended even if it
| > > adds small bias.
|
| Please no, this is in every FAQ on the use of rand().
| The lower bits of rand() are not very random and
| using (rand() % 256) would be a classical case of
| misuse of that function.

btw, there is no requirement in the standard that rand() must be a stupid algorithm with this behavior.

br
Thorsten

From: "Jonathan Turkanis" <technews@kangaroologic.com>
Sentence: "Sources, Sinks and their refinements are called resources."
The term is overused in programming. Maybe "data resources" or "data-module" or so would be better.
I know that 'resource' is a loaded term ;-) but I couldn't think of anything better. 'Data-module' doesn't seem very descriptive. Unfortunately, the standard term seems to be "Source/Sink" -- which I find ugly.
FWIW, I don't think "resource" captures the idea very well and I do agree that "source" and "sink" are rather technical, so they aren't so good, general purpose words. A term used for physical sources and sinks and, colloquially, for their drivers, is "device." Other choices: gadget, terminus, endpoint. Among these, I like "endpoint" best. It even captures the notion that they must be at one end or the other of a chain.
Sentence: "... and modifies it before returning it to the user." should be "... and modifies or eliminates it before returning it to the user."
Isn't elimination a type of modification?
Well, yes, but I think his change clarifies that the filter can skip input, not just change it from one thing to another. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;

"Rob Stewart" <stewart@sig.com> wrote in message news:200409040430.i844U9410532@lawrencewelk.systems.susq.com...
From: "Jonathan Turkanis" <technews@kangaroologic.com>
Sentence: "Sources, Sinks and their refinements are called resources."
The term is overused in programming. Maybe "data resources" or "data-module" or so would be better.
I know that 'resource' is a loaded term ;-) but I couldn't think of anything better. 'Data-module' doesn't seem very descriptive. Unfortunately, the standard term seems to be "Source/Sink" -- which I find ugly.
FWIW, I don't think "resource" captures the idea very well and I do agree that "source" and "sink" are rather technical, so they aren't so good, general purpose words. A term used for physical sources and sinks and, colloquially, for their drivers, is "device." Other choices: gadget, terminus, endpoint. Among these, I like "endpoint" best. It even captures the notion that they must be at one end or the other of a chain.
It seems there's not much support for the term 'resource', although it's come to sound quite natural to me. Of the various possibilities you mention, I think I like 'device' best. I'd still want to keep 'source' and 'sink' as the names for input and output devices, though, since they're the standard terms used by filtering libraries. Jonathan

"Jonathan Turkanis" wrote:
Looking back on MoreIo library I recommend: - to prefere Iostreams for all overlapping functionality - to move the applicable rest from MoreIo into Iostreams with: - naming conventions taken from Iostreams - documentation look and feel from Iostreams
I assume you view null_buf, pointerbuf, etc. as overlapping functionality. What about 'streambuf wrapping'?
It doesn't feel like a big help, but I could easily be wrong. I have no strong opinion.

__________________________________________________________________
1. overview docs:
Okay. What do you think of the pictures here: http://tinyurl.com/5v2ak? Too much detail for the introduction?
Maybe some general, high-level description of the structure could come first. If it has up to three rectangles, people could keep it in short-term memory while reading further. Maybe the use of colors could be reduced.
Link to "adapters.html" is broken.
Which link?
Oops. It's an external link, and I was not online.

__________________________________________________________________
2. tutorial docs:
In the example in.open(634875); a stream example should also be added, just to show how it is written.
I don't understand.
streambuf_facade<random_source> in(5);
in.close();
in.open(634875);

stream_facade<random_source> in2(random_source(5));
in2.close();
in2.open(123);
Sentence: "... and modifies it before returning it to the user." should be "... and modifies or eliminates it before returning it to the user."
Isn't elimination a type of modification?
I think this is one of the disputed questions of philosophy. Non-English readers may appreciate having it explicit.
Shouldn't it have a more expressive name, e.g. input_filter_base? Maybe the source/sink too.
I don't follow this.
The name 'input', without namespace qualification, may:
- suggest a full-featured, ready-to-use class, not something that needs to be extended
- clash when one uses "using namespace xyz" a lot

I personally like all base classes suffixed with "_base".
filters.html link is broken.
Which link?
Tutorial, section "Filtering Output", link with text "output_filter". It points to file "filters.html", with superfluous 's' at the end of filename.
Generally, the tutorial could be sprinkled with a few more heavily commented code snippets. They could be hideable. The other docs wouldn't suffer from having them too.
Okay. What do you mean by hideable? I'm hoping not to have to use any javascript except for the menu.
Exactly that. Javascript powered "Show Example"/"Hide Example" code snippet. __________________________________________________________________
4. docs: all <img> elements should have their width/height set in HTML
For slow connections, I guess?
For people who switched off images by default. __________________________________________________________________
5. concepts docs: all concepts here could also be listed in a table, sorted and with one line of info each.
The individual concepts should be added as a subtree in the left panel. Ditto for the compression filters, etc. Every page should be accessible from the left panel.
I'll try to do this, but I might find that the menu is so big it takes too long to load.
I got confused seeing pages in the right panel for which there was no left-panel item. It gave me an uneasy feeling that I would surely miss something. __________________________________________________________________
11. A collection of useful stream(buffer)s could be added to the library, so they are handy for users:
- repeater (some sequence of characters over and over). This one could ideally accept the syntax used in Boost.Assign. It would make it easy to create complex but predictable test data structures.
Could you elaborate?
Thinko with Boost.Assign. I meant something like: repeater_stream(repeat(2, 'a', 'b', 'c', 'd', repeat('e', 4)), 'f', 'g'); which would produce the sequence: a, b, c, d, e, e, e, e, a, b, c, d, e, e, e, e, f, g and again. The syntax is secondary; the main point is that the generated data could be typed easily.
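Pavel's repeater could be sketched independently of the library under review as a source that cycles over a fixed pattern forever. The `read()` signature below only loosely imitates a Source-style interface, and the name `repeating_source` is illustrative, not part of Iostreams:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical repeating source: cycles over a fixed pattern forever.
class repeating_source {
public:
    explicit repeating_source(std::string pattern)
        : pattern_(pattern), pos_(0) { }

    // Fill buf with n characters, wrapping around the pattern.
    std::size_t read(char* buf, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            buf[i] = pattern_[pos_];
            pos_ = (pos_ + 1) % pattern_.size();
        }
        return n;  // an endless source never reports EOF
    }

private:
    std::string pattern_;
    std::size_t pos_;
};
```

Reading ten characters from a source built over "abcd" would yield "abcdabcdab"; nested `repeat(...)` groups as in Pavel's example would just flatten into a longer pattern before construction.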
- 'a switcher'. Something that sends data into another stream but has a method switch_to_other_stream(s). E.g. for circular logs.
Doable, but presents some problems with buffer synchronization. There is currently no generic sync() function or Synchronizable concept.
Maybe this could be added (with a Flush name)?
__________________________________________________________________
13. Possible feature of the library
It would be nice to be able to 'send a command' down the stream and have some part of the chain pick it up and execute it. This way the sender would not need to know anything about the current chain structure.
Similarly, one can imagine events going upstream. If anyone handles them: OK; if not, they will be ignored.
Sounds interesting. I believe this would have to rely on runtime rather than compile-time polymorphism. I'll think about it.
Yes, maybe something typeless:
x_stream s;
unsigned invoked_count = s.send_command("cmd name", boost::any& parameter);
s.register_event_handler("event name", boost::function<void, stream_base*> parameter);
s.unregister_event_handlers("event name");
x_stream { bool on_command(const char* cmd, boost::any& parameter) { .... invoke_event("event_name"); send_command("other command downstream", parameter); .... return true; // continue with this command downstream } };
__________________________________________________________________
14. Would it be possible to add a "synchronize panels" button to the left panel? If clicked, it would expand and highlight the item in the left panel for the page currently open in the right panel.
The [link to this page] button already expands the tree to the current page, if the tree contains a link to the current page. I guess highlighting would be easy enough to do. I'm afraid some would find this annoying.
I noticed this button only now. When I am on a page not listed in the left panel, it just refreshes the left panel. Maybe it could at least pick a 'parent' or 'related' page in the left panel.
__________________________________________________________________
15. closable.html: it should be explained (+ example) what 'notification' means here.
How's this: A Closable filter or resource receives notifications -- via the function <A HREF='...'>boost::io::close</A> -- immediately before a containing stream or stream buffer is closed.
I have an uneasy feeling about the word 'notification'. Maybe "When you close a stream, filters implementing the Closable concept will have their member function close() called" or so. /Pavel

"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:chc5vj$rfd$1@sea.gmane.org...
"Jonathan Turkanis" wrote: __________________________________________________________________ 2. tutorial docs:
In the example with in.open(634875); a stream example should also be added, just to show how it is written.
I don't understand.
streambuf_facade<random_source> in(5); in.close(); in.open(634875);
stream_facade<random_source> in2(random_source(5)); in2.close(); in2.open(123);
Okay.
Shouldn't it have a more expressive name, e.g. input_filter_base? Maybe the source/sink too.
I don't follow this.
The name 'input', without namespace qualification, may: - suggest a full-featured, ready-to-use class, not something that needs to be extended - clash when one uses "using namespace xyz" a lot
I personally like all base classes suffixed with "_base".
But input is not a base class. If anything, the convention would be to call it input_tag. But since it's used a lot, I thought plain 'input' would be better.
filters.html link is broken.
Which link?
Tutorial, section "Filtering Output", link with text "output_filter".
It points to file "filters.html", with superfluous 's' at the end of filename.
Thanks.
Generally, the tutorial could be sprinkled with a few more heavily commented code snippets. They could be hideable. The other docs wouldn't suffer from having them too.
Okay. What do you mean by hideable? I'm hoping not to have to use any javascript except for the menu.
Exactly that. Javascript powered "Show Example"/"Hide Example" code snippet.
I already went out of my way to make a case for the tree component, and I think most people accepted it. The major selling point was that it's a module which can be tested by itself on many browsers. I'm not sure I want to be an advocate for more javascript, until the general prohibition is lifted. I do like the idea.
__________________________________________________________________
4. docs: all <img> elements should have their width/height set in HTML
For slow connections, I guess?
For people who switched off images by default.
I see.
__________________________________________________________________
5. concepts docs: all concepts here could also be listed in a table, sorted and with one line of info each.
The individual concepts should be added as a subtree in the left panel. Ditto for the compression filters, etc. Every page should be accessible from the left panel.
I'll try to do this, but I might find that the menu is so big it takes too long to load.
I got confused seeing pages in the right panel for which there was no left-panel item. It gave me an uneasy feeling that I would surely miss something.
Okay.
__________________________________________________________________
11. A collection of useful stream(buffer)s could be added to the library, so they are handy for users:
- repeater (some sequence of characters over and over). This one could ideally accept the syntax used in Boost.Assign. It would make it easy to create complex but predictable test data structures.
Could you elaborate?
Thinko with Boost.Assign. I meant something like:
repeater_stream(repeat(2, 'a', 'b', 'c', 'd', repeat('e', 4)), 'f', 'g');
which would produce sequence:
a, b, c, d, e, e, e, e, a, b, c, d, e, e, e, e, f, g and again
The syntax is secondary; the main point is that the generated data could be typed easily.
Could you give an example application?
- 'a switcher'. Something that sends data into another stream but has a method switch_to_other_stream(s). E.g. for circular logs.
Doable, but presents some problems with buffer synchronization. There is currently no generic sync() function or Synchronizable concept.
Maybe this could be added (with Flush name)?
I had this before, but ripped it out when I added close -- probably prematurely.
13. Possible feature of library
It would be nice to be able to 'send a command' down the stream and have some part of the chain pick it up and execute it. This way the sender would not need to know anything about the current chain structure.
Similarly, one can imagine events going upstream. If anyone handles them: OK; if not, they will be ignored.
Sounds interesting. I believe this would have to rely on runtime rather than compile-time polymorphism. I'll think about it.
Yes, maybe something typeless:
x_stream s;
unsigned invoked_count = s.send_command("cmd name", boost::any& parameter);
s.register_event_handler("event name", boost::function<void, stream_base*> parameter); s.unregister_event_handlers("event name");
x_stream { bool on_command(const char* cmd, boost::any& parameter) { .... invoke_event("event_name"); send_command("other command downstream", parameter); .... return true; // continue with this command downstream } };
I'll have to give this more thought. Maybe I'll email you.
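The 'typeless' command dispatch Pavel sketches could look something like the following, outside any particular stream library. The names `link` and `send_command`, and the bool-returning handlers, are all hypothetical, not part of Iostreams:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical chain link: each link may register handlers for named commands.
struct link {
    std::map<std::string, std::function<bool()>> handlers;
};

// Walk the chain downstream; each handler that consumes the command is
// counted, and a handler returning false stops further propagation.
unsigned send_command(std::vector<link>& chain, const std::string& name) {
    unsigned invoked = 0;
    for (auto& l : chain) {
        auto it = l.handlers.find(name);
        if (it != l.handlers.end()) {
            ++invoked;
            if (!it->second())  // handler asked to stop propagating
                break;
        }
    }
    return invoked;
}
```

Components with no handler registered for a name are simply skipped, which matches the "if not, they will be ignored" behavior Pavel describes for events.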
__________________________________________________________________ 14. Would it be possible to add a "synchronize panels" button to the left panel? If clicked, it would expand and highlight the item in the left panel for the page currently open in the right panel.
I noticed this button only now. When I am on a page not listed in the left panel, it just refreshes the left panel. Maybe it could at least pick a 'parent' or 'related' page in the left panel.
This is on my 'to do' list.
__________________________________________________________________
15. closable.html: it should be explained (+ example) what 'notification' means here.
How's this: A Closable filter or resource receives notifications -- via the function <A HREF='...'>boost::io::close</A> -- immediately before a containing stream or stream buffer is closed.
I have an uneasy feeling about the word 'notification'. Maybe "When you close a stream, filters implementing the Closable concept will have their member function close() called" or so.
Okay. Jonathan

"Jonathan Turkanis" wrote:
I meant something like:
repeater_stream(repeat(2, 'a', 'b', 'c', 'd', repeat('e', 4)), 'f', 'g');
which would produce sequence:
a, b, c, d, e, e, e, e, a, b, c, d, e, e, e, e, f, g and again
The syntax is secondary; the main point is that the generated data could be typed easily.
Could you give an example application?
Generator of tests for a compression engine. But it could be worked around: vector<char> v; v += 'a', 'b', repeat(.....), ....... stream_from_vector s(v); so it's not really needed. Just being able to repeat an input collection N times or forever should be enough.
Doable, but presents some problems with buffer synchronization. There is currently no generic sync() function or Synchronizable concept.
Maybe this could be added (with a Flush name)?
I had this before, but ripped it out when I added close -- probably prematurely.
Would it make sense to have a Discard concept which would discard whatever cached data the streams may hold? /Pavel

"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:chde69$sre$1@sea.gmane.org...
"Jonathan Turkanis" wrote:
I meant something like:
repeater_stream(repeat(2, 'a', 'b', 'c', 'd', repeat('e', 4)), 'f', 'g');
which would produce sequence:
a, b, c, d, e, e, e, e, a, b, c, d, e, e, e, e, f, g and again
The syntax is secondary; the main point is that the generated data could be typed easily.
Could you give an example application?
Generator of tests for a compression engine.
But it could be worked around: vector<char> v; v += 'a', 'b', repeat(.....), ....... stream_from_vector s(v);
so it's not really needed. Just being able to repeat an input collection N times or forever should be enough.
Okay.
Doable, but presents some problems with buffer synchronization. There is currently no generic sync() function or Synchronizable concept.
Maybe this could be added (with Flush name)?
I had this before, but ripped it out when I added close -- probably prematurely.
Would it make sense to have Discard concept which would discard whatever cached data the streams may hold?
I'm starting to think it would be very useful to have a Flushable concept, which would allow components to advertise that they can send all buffered data downstream at any time. I haven't given it much thought yet, but I think it would allow OutputFilters to be inserted and removed in the middle of processing a character sequence, as Dietmar suggested. Input and output are usually very similar, but in this case there doesn't seem to be a suitable concept corresponding to Flushable for input sequences. True, one could discard data that had been read in advance for efficiency, but is that what users would really want? Ideally, one would want each component in the chain to put back the buffered data. This sounds like it might be possible, until you realize that some of the data would have to be 'unfiltered' before it could be put back. Any suggestions? Jonathan

"Jonathan Turkanis" wrote:
Would it make sense to have Discard concept which would discard whatever cached data the streams may hold?
I'm starting to think it would be very useful to have a Flushable concept, which would allow components to advertise that they can send all buffered data downstream at any time. I haven't given it much thought yet, but I think it would allow OutputFilters to be inserted and removed in the middle of processing a character sequence, as Dietmar suggested.
I can think of the following concepts (interfaces) for streams:
- close (does flush for output streams)
- flush any caches (for output streams)
- discard any cached data. E.g. streams are used to send data to a radio link. If a problem in the airspace is detected, the current transmission is stopped, unsent data is thrown away, a jamming sequence is transmitted, and then normal transmission continues later. For InputStreams one may seek N bytes ahead in a file or skip the next N bytes from a socket. I don't know if this can be propagated in a chain.
- halt temporarily/restart (for output streams only?). This action may be propagated downstream, since some filters may have busy sub-threads of their own
- reset stream settings, either for an individual stream or for all of them at once. I have no clue what a generic solution may look like
- generic command/event interface for stream-specific actions, e.g. using named commands/events.
I could invent more exotic functionality, e.g.
- bypass a stream in a chain temporarily (I have no use case for it)
- replace stream X (anywhere/first occurrence within the chain) with stream Y. This would require unique identification of X
- generate a string description of what is in a stream chain (like stream::dump(), for debugging)
- serialize/deserialize a whole stream chain (deserialization would open files/memory mappings, do seeks, restore caches, whatever is feasible)
The dreaming got me an idea: streams used for testing may generate data as a function of time, e.g. simulating a slow sinusoid wave. Maybe an example of an InputStream doing this could be added as a starting point for experimenting.
There's one more stream that could be added to the library: a thread-bridging stream. It would simply transfer data across a thread boundary.
There could also be an example of how to handle 'buffer overflow' errors, e.g. when writing into a full socket or when a time series is generated too fast to be read. Something a user can look at, learn from, and use without needing to dig into the Standard.
Input and output are usually very similar, but in this case there doesn't seem to be a suitable concept corresponding to Flushable for input sequences. True, one could discard data that had been read in advance for efficiency, but is that what users would really want? Ideally, one would want each component in the chain to put back the buffered data. This sounds like it might be possible, until you realize that some of the data would have to be 'unfiltered' before it could be put back.
The 'unfilter' sounds impossible. I feel InputStreams should stay simpler. OutputStreams often need to be rather sophisticated. /Pavel

"Jonathan Turkanis" wrote:
Would it make sense to have Discard concept which would discard whatever cached data the streams may hold?
I'm starting to think it would be very useful to have a Flushable concept, which would allow components to advertise that they can send all buffered data downstream at any time. I haven't given it much though yet, but I think it would allow OuputFilters to be inserted and removed in the middle of processing a character sequence, as Dietmar suggested.
Input and output are usually very similar, but in this case there doesn't seem to be a suitable concept corresponding to Flushable for input sequences. True, one could discard data that had been read in advance for efficiency, but is that what users would really want? Ideally, one would want each component in
chain to put back the buffered data. This sounds like it might be
I can think about following concepts (interfaces) for streams: - close (does flush for output streams) - flush any caches (for ouput streams) - discard any cached data E.g. streams are used to send data to radio link. If problem in airspace is detected, current transmission is stopped, unsent data thrown aways, jamming sequence tramsitted and then normal transmission will continue later. For InputStreams one may seek N bytes ahead in file or skip next N bytes from socket. I don't know if this can be propagated in chain. - halt temporarily/restart (for output streams only?). This action may be propagated downstream since some filters may have busy sub-threads of their own - reset stream settings, either for individual stream or all of them at once. I have no clue how generic solution may look like - generic command/event interface for stream specific actions. E.g. using named commands/events. I could invent more exotic functionality, e.g. - bypass a stream in chain temporarily (I have no use case for it) - replace stream X (anywhere/first occurence within chain) for stream Y. This would require unique identification of X - generate string description of what is in stream chain (like stream::dump() for debugging) - serialize/deserialize whole stream chain (deserialization would open files/memory mappings, do seek, restore caches, whatever is feasible) The dreaming got me an idea: streams used for testing may generated data as function of time, e.g. simulate slow sinusoid wave. Maybe an example of InputStream doing this could be added as starting point for experimenting. There's one more stream that could be added into library: thread-bridging stream. It would simply transfer data across thread boundary. There could be also example of how to handle 'buffer overflow' errors e.g. when writing into full socket or time series generated too fast to be read. Something a user can look at, learn and use without need to dig into Standard. the possible,
until you realize that some of the data would have to be 'unfiltered' before it could be put back.
The 'unfilter' sounds as impossible. I feel InputStreams should stay simpler. OutputStreams often need to be rather sophisticated. /Pavel

"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:chi276$cg7$1@sea.gmane.org...
"Jonathan Turkanis" wrote:
Would it make sense to have Discard concept which would discard whatever cached data the streams may hold?
I'm starting to think it would be very useful to have a Flushable concept, which would allow components to advertise that they can send all buffered data downstream at any time. I haven't given it much thought yet, but I think it would allow OutputFilters to be inserted and removed in the middle of processing a character sequence, as Dietmar suggested.
I can think of the following concepts (interfaces) for streams:
- close (does flush for output streams)
This is already implemented, as you know.
- flush any caches (for output streams)
This was implemented, then torn out. I'm thinking I should put it back (but I have to decide on the correct semantics). Reordering:
- reset stream settings, either for an individual stream or for all of them at once. I have no clue what a generic solution may look like
You can reset the state of a filter by calling close. Filters which are not Closable can be assumed to be stateless. The problem comes with the end of the chain. Calling close on a resource may cause it to send buffered data downstream, which may not be what is desired. Furthermore, though I left it out of the docs, Resources are implicitly treated as single-use, so they don't typically implement close(). Cleanup occurs at destruction. So it looks to me like reset would have to be an addition to the Resource interface.
- discard any cached data
E.g. streams are used to send data to a radio link. If a problem in the airspace is detected, the current transmission is stopped, unsent data is thrown away, a jamming sequence is transmitted, and then normal transmission continues later.
This would be almost easy, using flushable, since you could flush all the filters and connect the last one temporarily to a null_sink. Unfortunately, you'd need a special function to tell the resource to discard its data.
For InputStreams one may seek N bytes ahead in a file or skip the next N bytes from a socket. I don't know if this can be propagated in a chain.
Skipping a certain number of characters in the ultimate filtered sequence is easy. Skipping a certain number of characters in the sequence controlled by the resource or by an intermediate filter sounds like it *might* be possible without changing the interface. I'll have to think about it.
- halt temporarily/restart (for output streams only?). This action may be propagated downstream since some filters may have busy sub-threads of their own
I'm not sure I understand.
- reset stream settings, either for an individual stream or for all of them at once. I have no clue what a generic solution may look like
This sounds like discard + open.
- generic command/event interface for stream specific actions. E.g. using named commands/events.
This might be a better approach than making the Concept interfaces fatter and fatter.
I could invent more exotic functionality, e.g. - bypass a stream in chain temporarily (I have no use case for it)
This would be useful, but raises some synchronization issues like the ones discussed above.
- replace stream X (anywhere/first occurrence within the chain) with stream Y. This would require unique identification of X
Since the type information is lost -- as far as templates are concerned -- when a filter or resource is added to a chain, some sort of RTTI has to be used. (I've seriously considered this, but haven't decided how useful it would be.)
- generate string description of what is in stream chain (like stream::dump() for debugging)
The character sequences, or the filter/resource sequence?
- serialize/deserialize whole stream chain (deserialization would open files/memory mappings, do seek, restore caches, whatever is feasible)
Great! If you are volunteering to implement it ;-)
The dreaming got me an idea: streams used for testing may generate data as a function of time, e.g. simulating a slow sinusoid wave. Maybe an example of an InputStream doing this could be added as a starting point for experimenting.
Interesting.
There's one more stream that could be added into library: thread-bridging stream. It would simply transfer data across thread boundary.
This is very useful. I'd definitely like to implement it, though I have a feeling it must wait for a resolution of the blocking/non-blocking issue. I also want to write a Windows/POSIX version of the 'unix filters' proposed by JC van Winkel and John van Krieken, which send data to the standard input of another process and forward the standard output of that process to the next downstream filter. Really, it should be implemented as an InoutResource, which filters can use internally if they want.
There could also be an example of how to handle 'buffer overflow' errors, e.g. when writing into a full socket or when a time series is generated too fast to be read. Something a user can look at, learn from, and use without needing to dig into the Standard.
Could you elaborate?
chain to put back the buffered data. This sounds like it might be possible, until you realize that some of the data would have to be 'unfiltered' before it could be put back.
The 'unfilter' sounds impossible.
Right. Think of toupper.
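The toupper example makes the point concretely: an uppercasing filter discards case information, so no inverse filter can reconstruct the original data. The helper names below are mine, not the library's:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Illustrative uppercasing/lowercasing transforms.
std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::toupper(c); });
    return s;
}

std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}
```

Applying the best candidate inverse (tolower) to toupper("Hello") yields "hello", not "Hello", so data that passed through the filter cannot be put back upstream unchanged.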
I feel InputStreams should stay simpler. OutputStreams often need to be rather sophisticated.
This sounds reasonable. Thanks again for your ideas. Jonathan

"Jonathan Turkanis" wrote:
- reset stream settings, either for an individual stream or for all of them at once. I have no clue what a generic solution may look like
You can reset the state of a filter by calling close.
Maybe. But one may want just to modify some minor property of the pipeline w/o affecting its main functionality.
- discard any cached data
E.g. streams are used to send data to a radio link. If a problem in the airspace is detected, the current transmission is stopped, unsent data is thrown away, a jamming sequence is transmitted, and then normal transmission continues later.
This would be almost easy, using flushable, since you could flush all the filters and connect the last one temporarily to a null_sink. Unfortunately, you'd need a special function to tell the resource to discard its data.
I see two problems: - the flushing may take a lot of time if some complex filters are involved; discarding may be required as a quick handle-the-error technique - it is not really certain that flush would flush: for example, a filter removing a certain word may wait until the word is finished. What are their flush() semantics?
- halt temporarily/restart (for output streams only?). This action may be propagated downstream since some filters may have busy sub-threads of their own
I'm not sure I understand.
Equivalent to putting a null sink at the top of the pipeline (and maybe after each part of the pipeline, if the part acts asynchronously). Not sure now if it is worth the trouble.
- reset stream settings, either for an individual stream or for all of them at once. I have no clue what a generic solution may look like
This sounds like discard + open.
Not really, rather like sending SIGHUP to a Unix process. No data loss, no open/close.
- generic command/event interface for stream specific actions. E.g. using named commands/events.
This might be a better approach than making the Concept interfaces fatter and fatter.
Functions like flush() etc. have an exact interface, well-defined names and semantics. The event/command would likely be typeless. I think generic functionality should be explicitly exposed and the rest could be handled by commands.
- generate string description of what is in stream chain (like stream::dump() for debugging)
The character sequences, or the filter/resource sequence?
The latter. E.g. to print debug info on the console.
- serialize/deserialize whole stream chain (deserialization would open files/memory mappings, do seek, restore caches, whatever is feasible)
Great! If you are volunteering to implement it ;-)
Well...it was just an idea.
There could also be an example of how to handle 'buffer overflow' errors, e.g. when writing into a full socket or when a time series is generated too fast to be read.
Could you elaborate?
An example which throws (and the exception gets caught and handled), and an example which sets failbit or whatever, showing how it could be handled. /Pavel

"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:chl9nu$sh9$1@sea.gmane.org...
"Jonathan Turkanis" wrote:
Pavel, I clearly misunderstood some of your suggestions (sorry). Before getting back into the details, let me make some general comments: I'm hoping to stick, as far as possible, to concepts which can be given correct default implementations for filters and resources which don't model them. Otherwise, a concept will be useful only when all the components in a chain of filters and resources model it, so either - it won't be very useful, since lots of common components won't model it, OR - users will in effect be required to make their components model the additional concepts, which interferes with my goal of making it extremely easy to write simple filters and resources. Therefore, to the extent possible, I'd like to discover the bare minimum requirements on a filter or resource which would allow reasonable default behaviors to be defined for the proposed concepts. Here's a suggestion for a concept which might be reusable in this way (view with fixed-width font):

Concept: Resettable

Valid Expressions | Return Type | Semantics
--------------------------------------------------------------------
t.reset(discard)  | bool        | resets t to the state it had
                  |             | upon construction, discarding
                  |             | cached data if discard is true.
                  |             | Returns true for success.

Perhaps reset(false) could be a reasonable default implementation for open and/or close for many simple filters such as the newline filter, uncommenting filter, etc. Here's a problem, though: implementing open and/or close will still be common enough that I don't want library users to have to specify the category tags openable_tag and closable_tag explicitly. So when I define the convenience base classes, I want to make the i/o category refine openable, closable and resettable, and provide default no-op implementations.
Let's say the convenience base class input_filter looks like this:

struct input_filter {
    typedef char char_type;
    struct category
        : input_filtering_tag, openable_tag,
          closable_tag, localizable_tag
        { };
    template<typename Source> void open(Source&) { }
    template<typename Source> void close(Source&) { }
    void reset(bool discard) { }
    ...
};

Now, if the user only implements reset, I'd like boost::io::open and boost::io::close to call reset. But if the user implements open, boost::io::open should call open. Maybe this could be done by comparing the member pointers &my_input_filter::close<Dummy> and &input_filter::close<Dummy>, but 1. I don't know if this is portable 2. It creates wasteful instantiations. The other solution would be to use CRTP:

template<typename Derived>
struct input_filter {
    typedef char char_type;
    struct category
        : input_filtering_tag, openable_tag,
          closable_tag, localizable_tag
        { };
    template<typename Source> void open(Source&)
    { static_cast<Derived&>(*this).reset(false); }
    template<typename Source> void close(Source&)
    { static_cast<Derived&>(*this).reset(false); }
    void reset(bool discard) { }
};

But this makes using the convenience base classes more complex:

struct my_input_filter : input_filter<my_input_filter> { ... };
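For comparison, here is the CRTP forwarding idea in a self-contained, compilable form, stripped of the Iostreams-specific tags. The names `filter_base` and `counting_filter` are illustrative only; the point is that the base's open()/close() defaults delegate to whatever reset() the derived class provides:

```cpp
#include <cassert>

// CRTP base: open() and close() forward to the derived class's reset(),
// so an author who implements reset() alone gets sensible defaults.
template<typename Derived>
struct filter_base {
    void open()  { static_cast<Derived&>(*this).reset(false); }
    void close() { static_cast<Derived&>(*this).reset(false); }
    void reset(bool) { }  // no-op default if the derived class has none
};

// A filter that only implements reset(); open()/close() come from the base.
struct counting_filter : filter_base<counting_filter> {
    int resets = 0;
    void reset(bool) { ++resets; }
};
```

Because the base calls through the derived type, name lookup finds counting_filter::reset rather than the no-op default, which is exactly the dispatch the member-pointer comparison trick was trying to achieve without the extra template parameter.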
- reset stream settings, either for an individual stream or for all of them at once. I have no clue what a generic solution may look like
You can reset the state of a filter by calling close.
Maybe. But one may want just to modify some minor property of the pipeline w/o affecting its main functionality.
The Resettable concept, above, is more along these lines.
- discard any cached data
E.g. streams are used to send data to a radio link. If a problem in the airspace is detected, the current transmission is stopped, unsent data is thrown away, a jamming sequence is transmitted, and then normal transmission continues later.
We could use reset(true) for this case.
This would be almost easy, using flushable, since you could flush all the filters and connect the last one temporarily to a null_sink. Unfortunately, you'd need a special function to tell the resource to discard its data.
I see two problems:
- the flushing may take a lot of time if some complex filters are involved; discarding may be required as a quick handle-the-error technique
Right.
- it is not really certain that flush would flush: for example, a filter removing a certain word may wait until the word is finished. What are their flush() semantics?
This is an important point. When I originally implemented Flushable, flush() was just a 'suggestion' to flush buffers. So users could never be sure there wasn't any cached data remaining. I did it this way because for many filters, as you mentioned, you can't force a complete flush without violating data integrity. But this wimpy version of flush turned out not to be very useful, which is partly why I eliminated it. If I restore the concept, I will probably have flush return true only if all cached data could be safely sent downstream. This means that sometimes flushing a filtering_ostream will fail, and in those cases you won't be able to insert or remove filters while i/o is in progress. Many text-oriented filters will probably be line-preserving, in the sense that a single line of text is translated to one or more complete lines. Such filters will often be flushable as long as the end of a line has been reached, so the end user will know that it's okay to flush the stream after writing a line and then to insert or remove filters.
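One way the stricter semantics could look for a line-preserving filter, sketched with illustrative names only (this is not the library's interface): flush() reports success only when no partial line is cached.

```cpp
#include <string>

// Sketch: a filter that forwards only complete lines downstream and
// truthfully reports whether everything it holds has been sent.
class line_filter {
public:
    // Buffer incoming data; forward each complete line to the sink.
    void write(const std::string& data, std::string& sink) {
        buf_ += data;
        std::string::size_type nl;
        while ((nl = buf_.find('\n')) != std::string::npos) {
            sink += buf_.substr(0, nl + 1);  // forward a complete line
            buf_.erase(0, nl + 1);
        }
    }

    // True only if all cached data could be safely sent downstream,
    // i.e. no partial line is being held back.
    bool flush() { return buf_.empty(); }

private:
    std::string buf_;  // the (possibly partial) line not yet forwarded
};
```

With this convention, an end user who writes whole lines can flush safely between them, and the chain can refuse filter insertion or removal whenever some component's flush() returns false.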
- halt temporarily/restart (for output streams only?). This action may be propagated downstream since some filters may have busy sub-threads of their own
I'm not sure I understand.
Equivalent to putting a null sink on top of the pipeline (and maybe after each part of the pipeline, if the part acts asynchronously).
Not sure now if it is worth the trouble.
Neither am I ;-). Maybe if you can think of an important use case ...
- reset stream settings, either for individual stream or all of them at once. I have no clue how generic solution may look like
This sounds like discard + open.
Not really, rather like sending SIGHUP to a Unix process. No data loss, no open/close.
Hmmm. I guess I quoted the same part of your message twice. Anyway, this would be reset(false).
- generic command/event interface for stream specific actions. E.g. using named commands/events.
This might be a better approach than making the Concept interfaces fatter and fatter.
Functions like flush() etc. have an exact interface, well-defined names and semantics. The event/command would likely be typeless. I think generic functionality should be explicitly exposed and the rest could be handled by commands.
I agree.
- generate string description of what is in stream chain (like stream::dump() for debugging)
The character sequences, or the filter/resource sequence?
The latter. E.g. to print debug info on the console.
This is a good idea for a debug-mode feature, but I think it only really makes sense if I implement flushable and allow filters to be swapped in and out during i/o. Otherwise, it should be pretty obvious which filters are in a chain just by looking at the sequence of pushes.
There could also be an example of how to handle 'buffer overflow' errors, e.g. when writing into a full socket or when a time series is generated too fast to be read.
Could you elaborate?
An example which throws (and the exception gets caught and handled), an example which sets failbit or whatever, and how it could be handled.
I see. I think I should put some examples in the user's guide under 'Exceptions'.
/Pavel
Jonathan

"Jonathan Turkanis" wrote: ____________________________________________________
Here's a suggestion for a concept which might be reusable in this way (view with fixed-width font):
Concept: Resettable
Valid Expressions | Return Type | Semantics
--------------------------------------------------------------------
t.reset(discard)  | bool        | resets t to the state it had
                  |             | upon construction, discarding
                  |             | cached data if discard is true.
                  |             | Returns true for success.
Counterexample: the radio link pipeline has a member counting the # of passed bytes, and a member that keeps the output rate in sync with current air conditions. If you want to discard data in the pipeline, you would reset all these settings as a side-effect. I have a feeling there should be two concepts:
- reset(bool discard)
- discard()
____________________________________________________
So when I define the convenience base classes, I want to make the i/o category refine openable, closable and resettable, and provide default no-op implementations.
Now, if the user only implements reset, I'd like boost::io::open and boost::io::close to call reset.
... variant with dummy instance + member function compare ... variant with CRTP
- it is not really certain that flush would flush: for example, a filter removing certain words may wait until a word gets finished. What is the flush() semantics of such filters?
This is an important point. When I originally implemented Flushable, flush() was just a 'suggestion' to flush buffers. So users could never be sure
I do not have a clear picture: when io::open() is called, is the real type of the filter not known? ____________________________________________________ there
wasn't any cached data remaining. I did it this way because for many filters, as you mentioned, you can't force a complete flush without violating data integrity. But this wimpy version of flush turned out not to be very useful, which is partly why I eliminated it.
If I restore the concept, I will probably have flush return true only if all cached data could be safely sent downstream. This means that sometimes flushing a filtering_ostream will fail, and in those cases you won't be able to insert or remove filters while i/o is in progress.
Maybe bool flush(bool force = false). E.g. close() should probably call flush(true) to avoid data loss. If a filter decides individually that its current data just cannot be flushed, it could ignore the 'force' flag. flush() returns true if the flush was complete. The user then may discard the rest of the data sitting in caches. ____________________________________________________
- halt temporarily/restart (for output streams only?). This action may be propagated downstream since some filters may have busy sub-threads of their own
I'm not sure I understand.
Equivalent to putting a null sink on top of the pipeline (and maybe after each part of the pipeline, if the part acts asynchronously).
Not sure now if it is worth the trouble.
Neither am I ;-). Maybe if you can think of an important use case ...
Only contrived: someone has a reference to one and only one member of the pipeline, not to the initial data source or end data sink. This one may want to halt the flow temporarily but doesn't want any dependencies on the rest of the application. E.g. a module that keeps a TCP/IP throughput limit and manages multiple opened sockets. The user registers socket_streams, and they could all be halted/restarted by the module. (Here the halt means stop reading/sending, not the null-sink equivalent. The make-it-null-sink looks like yet another thing.) ____________________________________________________
- generate string description of what is in stream chain (like stream::dump() for debugging)
The character sequences, or the filter/resource sequence?
The latter. E.g. to print debug info on the console.
This is a good idea for a debug-mode feature, but I think it only really makes sense if I implement flushable and allow filters to be swapped in and out during i/o. Otherwise, it should be pretty obvious which filters are in a chain just by looking at the sequence of pushes.
And a good idea for the maintenance programmer.
____________________________________________________

Few notes from reading the sources:

1. scope_guard.hpp: maybe this file could be moved into boost/detail and used by multi_index + iostreams until something gets boostified.

2. utility/select_by_size.hpp: would it be possible to use PP local iteration here to make it a bit simpler?

3. io/zlib.hpp:

#ifdef BOOST_MSVC
# pragma warning(push)
# pragma warning(disable:4251 4231 4660) // Dependencies not exported.
#endif

could rather be

#if BOOST_WORKAROUND(BOOST_MSVC, <= ....)
# pragma warning(push)
# pragma warning(disable:4251 4231 4660) // Dependencies not exported.
#endif

4. the

#if (defined _MSC_VER) && (_MSC_VER >= 1200)
# pragma once
#endif

should be added everywhere. I do not see it in the sources I have here; maybe it's an old version.

5. io/io_traits.hpp:

#define BOOST_SELECT_BY_SIZE_MAX_CASE 9

==>

#ifndef BOOST_SELECT_BY_SIZE_MAX_CASE
# define BOOST_SELECT_BY_SIZE_MAX_CASE 9
#endif

6. docs "Function Template close": the link to the header is broken. In this page: something like UML's sequence or collaboration diagram could be added. (I am a visual type, so I always ask for pictures and code examples.)

7. windows_posix_config.hpp: I find the macros BOOST_WINDOWS/BOOST_POSIX too general for iostreams. Either they should be in Boost.Config or something like BOOST_IO_WINDOWS should be used. It may also fail on exotics such as AS400.

8. Maybe the library files could be more structured. Right now its directories contain 32, 24, 11 and 9 files, and it may be hard to orient oneself in it. There could be subdirectories such as utils/, filters/, filters/compression/ etc.

9. Maybe assert() could be replaced with BOOST_ASSERT().

10. disable_warnings.hpp: maybe this and other similar files could be merged into iostream_config.hpp. (Others: enable_stream.hpp.)

11. details/assert_convertible.hpp: the macro here doesn't feel very useful. A straight BOOST_STATIC_ASSERT(is_convertible....) would take the same space and would convey more information, and immediately.

12. detail/access_control.hpp: this feels as if it could be in boost/utility/ or in boost/detail/. An example (or examples) could be provided in the header.

13. detail/buffer.hpp: should have #include <boost/noncopyable.hpp>. Commenting nitpick: maybe instead of

// Template name: buffer
// Description: Character buffer.
// Template parameters:
//     Ch - The character type.
//     Alloc - The Allocator type.
//
template< typename Ch, typename Alloc = std::allocator<Ch> >
class basic_buffer : private noncopyable {

it could be

// what it is used for....
template< typename Ch, typename Alloc = std::allocator<Ch> >
class basic_buffer : private noncopyable {

Too many comments here are obvious, and this makes it easy for the reader to skip them all. OTOH it should be explained why there's basic_buffer and buffer and why their functionality isn't merged into one coherent class. This is far from obvious. E.g. the design allows swap(basic_buffer, buffer). It should be explained why std::vector or boost::array isn't enough. Maybe these class(es) could be factored out into boost/details/ or a standalone mini-library.

14. detail/config.hpp: the trick with

#ifndef BOOST_IO_NO_SCOPE_GUARD

I have a feeling it should be removed. Compilers that cannot handle even a scope guard are not worth supporting. Or the scope_guard for these could be a dummy. The number of macros in the iostreams library would better be reduced. The BOOST_IO_NO_FULL_SMART_ADAPTER_SUPPORT macro: it should be explained in a code comment what it means. Maybe something like it could be moved into Boost.Config, or maybe something like this is already in Boost.Config. BOOST_IO_DECL: this should be in Boost.Config as BOOST_DECLSPEC. Other libraries (regex) have their own macros, and it is all one big mess.

15. converting_chain.hpp: how brackets are positioned would better be unified. Here I see the

if (xxxxxxxxx) {
    ....
}

and elsewhere Kernighan notation, and it makes me wonder whether it has some hidden meaning. Comments in the code would be welcomed here.

16. double_object.hpp: typo in source "simalr". Wouldn't it make sense to have this in the compressed_pair library?

17. forwarding.hpp: the macro BOOST_IO_DEFINE_FORWARDING_FUNCTIONS is used in exactly one place. Maybe it could be defined/used/undefined here to make the overall structure simpler.

18. io/detail/streambufs/ : there are temporary files indirect_streambuf.hpp.bak.hpp and indirect_streambuf.~hpp.

19. details/chain.hpp:

#if defined(BOOST_MSVC) && _MSC_VER == 1300
    virtual ~chain_base() { } // If omitted, some tests fail on VC7.0. Why?
#endif

doesn't this change the semantics of the class? An ASCII class diagram could be here.

20. detail/iterator_traits.hpp: looking at the specializations: would it make sense to have unsigned char/signed char versions as well? (There are more places with explicit instantiations that would need a possible update.)

21. test lzo.cpp refers to the non-existing boost/io/lzo.hpp.

22. io/file.hpp: what exactly is the reason to have a pimpl in the basic_file_resource class? I do not see headers that don't need to be included, I do not see dynamic switching of pimpls, I do not see eager or lazy optimizations. I see only overhead and complexity.

23. io/memmap_file.hpp: why is there a pimpl in the mapped_file_resource class?

24. io/regex_filter.hpp, function do_filter() contains:

void do_filter(const vector_type& src, vector_type& dest)
{
    ......
    iterator first(&src[0], &src[0] + src.size(), re_, flags_);

Is the &src[0] safe if the vector is empty? I don't know if the standard allows it; it just caught my eye.

/Pavel


"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:chq3qp$pc4$2@sea.gmane.org...
"Jonathan Turkanis" wrote:
____________________________________________________
Here's a suggestion for a concept which might be reusable in this way (view with fixed-width font):
Concept: Resettable
Valid Expressions | Return Type | Semantics
--------------------------------------------------------------------
t.reset(discard)  | bool        | resets t to the state it had
                  |             | upon construction, discarding
                  |             | cached data if discard is true.
                  |             | Returns true for success.
Counterexample: the radio link pipeline has a member counting the # of passed bytes, and a member that keeps the output rate in sync with current air conditions.
If you want to discard data in the pipeline, you would reset all these settings as a side-effect.
I see.
I have a feeling there should be two concepts:
- reset(bool discard)
- discard()
____________________________________________________
So when I define the convenience base classes, I want to make the i/o category refine openable, closable and resettable, and provide default no-op implementations.
Now, if the user only implements reset, I'd like boost::io::open and boost::io::close to call reset.
... variant with dummy instance + member function compare ... variant with CRTP
I do not have a clear picture: when io::open() is called, is the real type of the filter not known?
Right now, there's no function open. And in principle, there never needs to be, since a component can set an 'open' flag upon each i/o operation and clear it when close() is called. I'm thinking Openable might be a good addition, as a convenience. If so, it would be called as soon as a chain becomes complete.
- it is not really certain that flush would flush: for example, a filter removing certain words may wait until a word gets finished. What is the flush() semantics of such filters?
This is an important point. When I originally implemented Flushable, flush() was just a 'suggestion' to flush buffers. So users could never be sure there wasn't any cached data remaining. I did it this way because for many filters, as you mentioned, you can't force a complete flush without violating data integrity. But this wimpy version of flush turned out not to be very useful, which is partly why I eliminated it.
If I restore the concept, I will probably have flush return true only if all cached data could be safely sent downstream. This means that sometimes flushing a filtering_ostream will fail, and in those cases you won't be able to insert or remove filters while i/o is in progress.
Maybe bool flush(bool force = false).
E.g. close() should probably call flush(true) to avoid data loss.
If a filter decides individually that its current data just cannot be flushed, it could ignore the 'force' flag.
flush() returns true if the flush was complete. The user then may discard the rest of the data sitting in caches.
Sounds good. _______________________________________________
- halt temporarily/restart (for output streams only?). This action may be propagated downstream since some filters may have busy sub-threads of their own
I'm not sure I understand.
Equivalent to putting a null sink on top of the pipeline (and maybe after each part of the pipeline, if the part acts asynchronously).
Not sure now if it is worth the trouble.
Neither am I ;-). Maybe if you can think of an important use case ...
Only contrived: someone has a reference to one and only one member of the pipeline, not to the initial data source or end data sink. This one may want to halt the flow temporarily but doesn't want any dependencies on the rest of the application.
E.g. a module that keeps a TCP/IP throughput limit and manages multiple opened sockets. The user registers socket_streams, and they could all be halted/restarted by the module.
(Here the halt means stop reading/sending, not the null-sink equivalent. The make-it-null-sink looks like yet another thing.)
I think this could be done currently just by keeping a reference to the filter. I'm fairly confident we can work out reasonable semantics for these operations. To me, the most important question is what should happen if not all the components in a chain model the concept. I'm very nervous about assuming that a no-op is a reasonable default behavior, so I'm inclined to say the operations must fail in that case. But doesn't that, in effect, force people who want to write reusable components to provide implementations for a long list of operations? ________________________________________________
- generate string description of what is in stream chain (like stream::dump() for debugging)
The character sequences, or the filter/resource sequence?
The latter. E.g. to print debug info on the console.
This is a good idea for a debug-mode feature, but I think it only really makes sense if I implement flushable and allow filters to be swapped in and out during i/o. Otherwise, it should be pretty obvious which filters are in a chain just by looking at the sequence of pushes.
And a good idea for the maintenance programmer.
True.
____________________________________________________ Few notes from reading sources:
1. scope_guard.hpp: maybe this file could be moved into boost/detail and used by multi_index + iostreams until something gets boostified.
I'm for this. Unfortunately, my simplified scope_guard isn't working on CW8.3 (though it passes the regression tests I've written for it.) So I'll probably use Joaquín's.
2. utility/select_by_size.hpp: would it be possible to use PP local iteration here to make it bit simpler?
Probably. I didn't know about local iteration when I wrote it ;-)
3. io/zlib.hpp:
#ifdef BOOST_MSVC
# pragma warning(push)
# pragma warning(disable:4251 4231 4660) // Dependencies not exported.
#endif
could be rather
#if BOOST_WORKAROUND(BOOST_MSVC, <= ....)
# pragma warning(push)
# pragma warning(disable:4251 4231 4660) // Dependencies not exported.
#endif
Good. I guess I should use TESTED_AT here.
4. the
#if (defined _MSC_VER) && (_MSC_VER >= 1200)
# pragma once
#endif
should be added everywhere. I do not see it in the sources I have here; maybe it's an old version.
It was a relatively recent addition. It seems to be on the web; maybe I didn't update the zips. Anyway, the version you have should be almost identical. (I actually used

#if defined(_MSC_VER) && (_MSC_VER >= 1020)
# pragma once
#endif

which I copied from somewhere; this is a cruel joke because I doubt the library will ever work for _MSC_VER < 1300.)
5. io/io_traits.hpp:
#define BOOST_SELECT_BY_SIZE_MAX_CASE 9
==>
#ifndef BOOST_SELECT_BY_SIZE_MAX_CASE
# define BOOST_SELECT_BY_SIZE_MAX_CASE 9
#endif
The docs for select-by-size are here: http://tinyurl.com/3vjwf. Maybe I should add it to the current lib docs. The usage is supposed to be

#define BOOST_SELECT_BY_SIZE_MAX_CASE xxx
#include <boost/utility/select_by_size.hpp>

Including the header undef's BOOST_SELECT_BY_SIZE_MAX_CASE.
6. docs "Function Template close": the link to the header is broken.
Thanks. When I wrote that page operations was still in boost::detail.
In this page: something like UML's sequence or collaboration diagram could be added.
Good idea. Closable is actually the hardest concept to document.
(I am visual type so I always ask for pictures and code examples.)
7. windows_posix_config.hpp: I find the macros BOOST_WINDOWS/BOOST_POSIX too general for iostreams. Either they should be in Boost.Config or something like BOOST_IO_WINDOWS should be used.
That's one of the other changes I made late. I had been borrowing BOOST_WINDOWS/BOOST_POSIX from Boost.Filesystem, but didn't realize until recently that cygwin users are supposed to be able to pick either configuration. That definitely doesn't work for my library.
It may also fail on exotics such as AS400.
Do you mean neither configuration will work in this case? I have to figure out graceful ways for mmap and file descriptors to fail on unsupported systems.
8. Maybe the library files could be more structured. Right now its directories contain 32, 24, 11 and 9 files, and it may be hard to orient oneself in it.
There could be subdirectories such as utils/, filters/, filters/compression/ etc.
I'm leaning in that direction. If the library is accepted, I may ask your advice on organization after I decide what other filters to include.
9. Maybe assert() could be replaced with BOOST_ASSERT().
Okay. I always forget about BOOST_ASSERT. There doesn't even seem to be a link to it from the libraries page or from Boost.Utility.
10. disable_warnings.hpp: maybe this and other similar files could be merged into iostream_config.hpp.
I like to disable warnings at the beginning of a file, and enable them again at the end. I only use this in a few places.
(Others: enable_stream.hpp.)
Okay.
11. details/assert_convertible.hpp:
The macro here doesn't feel very useful. A straight BOOST_STATIC_ASSERT(is_convertible....) would take the same space and would convey more information, and immediately.
I'll probably use BOOST_MPL_ASSERT(is_convertible<.. >), now that it's available.
12. detail/access_control.hpp: this feels as if it could be in boost/utility/ or in boost/detail/.
An example(s) could be provided in the header.
I'm glad to hear you think it may be useful.
13. detail/buffer.hpp: should have #include <boost/noncopyable.hpp>
Commenting nitpick: maybe instead of
// Template name: buffer
// Description: Character buffer.
// Template parameters:
//     Ch - The character type.
//     Alloc - The Allocator type.
//
template< typename Ch, typename Alloc = std::allocator<Ch> >
class basic_buffer : private noncopyable {
could be
// what it is used for....
template< typename Ch, typename Alloc = std::allocator<Ch> >
class basic_buffer : private noncopyable {
Too many comments here are obvious, and this makes it easy for the reader to skip them all.
Agreed. I have a semi-standard way of documenting templates, which sometimes results in the sort of uninformative comments you quote above. But I don't follow this pattern consistently, so there's not much point.
OTOH it should be explained why there's basic_buffer and buffer and why their functionality isn't merged into one coherent class. This is far from obvious.
Okay. For the record: - The extra pointers aren't needed most of the time, so basic_buffer is used for (pretty trivial) space-saving and to emphasize that only the limited interface is used.
E.g. the design allows swap(basic_buffer, buffer).
I guess I could disable that.
It should be explained why std::vector or boost::array isn't enough.
- vector<Ch> initializes each character
- boost::array is statically sized
Maybe these class(es) could be factored out into boost/details/ or standalone mini-library.
14. detail/config.hpp: the trick with
#ifndef BOOST_IO_NO_SCOPE_GUARD
I have a feeling it should be removed. Compilers that cannot handle even a scope guard are not worth supporting. Or the scope_guard for these could be a dummy.
Unfortunately, my scope guard fails to work on one pretty good compiler. I assume I'll either fix it or use Joaquín's version. Either way, these conditional sections will be removed. But I need to be able to run the regression tests in the meantime.
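For readers unfamiliar with the idiom under discussion, a minimal scope-guard sketch (names are mine, not the library's) looks like this: a cleanup action runs when the scope exits, unless it is dismissed first.

```cpp
#include <cassert>

// Minimal scope-guard sketch: runs a cleanup action on scope exit unless
// dismiss() was called. Names are illustrative only.
template<typename F>
class scope_guard {
public:
    explicit scope_guard(F f) : f_(f), active_(true) { }
    ~scope_guard() { if (active_) f_(); }
    void dismiss() { active_ = false; }
private:
    scope_guard(const scope_guard&);            // noncopyable
    scope_guard& operator=(const scope_guard&);
    F f_;
    bool active_;
};

struct increment {                              // sample cleanup action
    int* p;
    explicit increment(int* q) : p(q) { }
    void operator()() { ++*p; }
};
```

A "dummy" scope_guard for broken compilers, as suggested above, would simply make the destructor do nothing.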
The number of macros in the iostreams library should be reduced.
The BOOST_IO_NO_FULL_SMART_ADAPTER_SUPPORT macro: it should be explained in a code comment what it means. Maybe it could be moved into Boost.Config, or maybe something like it is already there.
This is for Borland, which seems to go into an infinite loop of template instantiations without this workaround. I'm not sure what the underlying problem is.
BOOST_IO_DECL: this should be in Boost.Config as BOOST_DECLSPEC. Other libraries (regex) have their own macros and it is all one big mess.
But it uses BOOST_IO_DYN_LINK. Are you saying users shouldn't be able to link dynamically to selected boost libraries? Maybe it could be BOOST_DECLSPEC(IO).
15. converting_chain.hpp: the bracket positioning would be better unified. Here I see

    if (xxxxxxxxx)
    {
        ....
    }
I don't see that in converting_chain.hpp. I typically write if's that way only if the condition takes multiple lines.
and elsewhere Kernighan notation, and it makes me wonder whether it has some hidden meaning.
Are you perhaps talking about the distinction between

    void f()
    {
    }

and

    void f() { }

? I prefer the latter when defining functions within a class body, because it seems more readable. But I admit I am somewhat inconsistent.
Comments in the code would be welcomed here.
Agreed. converting_chain, converting_streambuf and converting_stream are not yet up and running.
16. double_object.hpp:
typo in source "simalr"
Thanks.
Wouldn't it make sense to have this in compressed_pair library?
I find it very useful, but I wasn't sure if others would. Adding it to compressed_pair makes sense. Maybe I'll start by putting it in detail.
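For context, the space saving that compressed_pair (and, similarly, double_object) aims for comes from the empty-base optimization; here is a minimal sketch with my own names, not the Boost implementation:

```cpp
#include <cassert>

// Sketch of the empty-base optimization behind compressed_pair: storing an
// empty class (e.g. a stateless allocator) as a member costs at least one
// byte plus padding, while deriving from it privately can cost nothing.
struct empty_alloc { };                  // stand-in for a stateless allocator

template<typename T1, typename T2>
struct plain_pair {                      // member storage: T2 takes space
    T1 first;
    T2 second;
};

template<typename T1, typename T2>
struct ebo_pair : private T2 {           // base storage: T2 can vanish
    T1 first;
    T2&       second()       { return *this; }
    const T2& second() const { return *this; }
};
```

On mainstream compilers, sizeof(ebo_pair<int, empty_alloc>) is smaller than sizeof(plain_pair<int, empty_alloc>), which is the whole point of the compressed representation.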
17. forwarding.hpp: the macro BOOST_IO_DEFINE_FORWARDING_FUNCTIONS is used in exactly one place. Maybe it could be defined/used/undefined here to make the overall structure simpler.
That's probably a good idea. Originally it was part of detail/push, which could be useful to end-users (when I document it). I guess when I factored out the forwarding part I didn't realize that it had limited utility.
18. io/detail/streambufs/ : there are temporary files indirect_streambuf.hpp.bak.hpp and indirect_streambuf.~hpp.
These are gone now.
19. details/chain.hpp:
    #if defined(BOOST_MSVC) && _MSC_VER == 1300
        virtual ~chain_base() { } // If omitted, some tests fail on VC7.0. Why?
    #endif
doesn't this change semantics of the class?
Theoretically. But it's never supposed to be used for run-time polymorphism. And it's an implementation detail.
An ASCII class diagram could be here.
I guess that's a reasonable request, since chain is the heart of the filtering implementation.
20. detail/iterator_traits.hpp: looking at the specializations: would it make sense to have unsigned char/signed char versions as well?
I guess it wouldn't hurt. But is char_traits typically specialized for these types?
(There are more places with explicit instantiations that would possibly need updating.)
I don't see any. Unless you mean std::char_traits ;-)
21. test lzo.cpp refers to non-existing boost/io/lzo.hpp.
Right. I got rid of it because of copyright issues.
22. io/file.hpp: what exactly is the reason to have a pimpl in the basic_file_resource class?
I do not see headers that don't need to be included, I do not see dynamic switching of pimpls, I do not see eager or lazy optimizations. I see only overhead and complexity.
Exception safety. I left this out of the latest rationale, but it's an important part I'm going to put back in. Several iterations ago, the usage would have been:

    filtering_istream in;
    in.push(new gzip_decompressor());
    in.push(new file_source("hello.gz"));

Generalizing this convention of passing by pointer and transferring ownership led to exception safety problems in other parts of the library. So the current convention is that filters and resources are passed by value, and so must be copy constructible. (Streams and stream buffers are stored by reference by default, and the same effect can be achieved for an arbitrary component using boost::ref().) So, to answer your question: basic_file_resource wraps a basic_filebuf, which is generally non-copyable, so I used a shared_ptr.
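The exception-safety point can be illustrated independently of the iostreams API. The names below are hypothetical stand-ins, not library components; the sketch shows why push-by-owning-pointer leaks when a later push throws, while push-by-value does not:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct component {
    static int live;                 // count live instances to observe leaks
    component()  { ++live; }
    component(const component&) { ++live; }
    ~component() { --live; }
};
int component::live = 0;

// Pointer convention: the chain takes ownership of heap-allocated objects.
struct ptr_chain {
    std::vector<component*> parts;
    void push(component* p) {
        if (parts.size() == 1)       // simulate a failure on the second push
            throw std::runtime_error("push failed");
        parts.push_back(p);
    }
    ~ptr_chain() {
        for (std::size_t i = 0; i < parts.size(); ++i)
            delete parts[i];
    }
};

// Value convention: components are copied in, so the caller never holds a
// raw owning pointer that can leak.
struct value_chain {
    std::vector<component> parts;
    void push(const component& c) {
        if (parts.size() == 1)       // same simulated failure
            throw std::runtime_error("push failed");
        parts.push_back(c);
    }
};
```

With the pointer convention, the second `new component` is constructed but never owned by the chain when push throws, so it leaks; with the value convention every object is either a stack object or owned by the vector, so unwinding cleans everything up.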
23. io/memmap_file.hpp: why is there a pimpl in the mapped_file_resource class?
To avoid having to include operating system headers from a header file.
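A minimal sketch of the pimpl idiom being described; the class and member names are illustrative, and the "header" and "source" halves are shown in one translation unit for brevity:

```cpp
// --- mapped_file.hpp (sketch): no operating-system headers needed here ---
#include <cassert>
#include <string>

class mapped_file {
public:
    explicit mapped_file(const std::string& path);
    ~mapped_file();                       // defined where impl is complete
    const std::string& path() const;
private:
    mapped_file(const mapped_file&);      // noncopyable
    mapped_file& operator=(const mapped_file&);
    struct impl;                          // only declared in the header
    impl* pimpl_;
};

// --- mapped_file.cpp (sketch): OS headers (e.g. <windows.h>) would be
// included here, next to the platform-specific members of impl.
struct mapped_file::impl {
    std::string path;                     // stand-in for OS handles/pointers
    explicit impl(const std::string& p) : path(p) { }
};

mapped_file::mapped_file(const std::string& p) : pimpl_(new impl(p)) { }
mapped_file::~mapped_file() { delete pimpl_; }
const std::string& mapped_file::path() const { return pimpl_->path; }
```

Clients including only the header never see the operating-system headers, which is exactly the benefit claimed above.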
24. io/regex_filter.hpp, function do_filter() contains:
    void do_filter(const vector_type& src, vector_type& dest)
    {
        ......
        iterator first(&src[0], &src[0] + src.size(), re_, flags_);
Is &src[0] safe if src is empty? I don't know whether the standard allows it; it just caught my eye.
Good point. I don't know if it's safe. But it's easy enough to handle this case separately. Thanks for the very detailed criticism!
/Pavel
Jonathan
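For reference, one way to handle the empty case separately, as suggested in the exchange above (the helper name is mine): `&v[0]` on an empty vector is undefined behavior, so only form the pointer range when the vector is non-empty.

```cpp
#include <cassert>
#include <vector>

// Guard against &v[0] on an empty vector, which is undefined behavior.
// Helper name is illustrative, not from the library.
template<typename T>
const T* safe_begin(const std::vector<T>& v) {
    return v.empty() ? 0 : &v[0];
}
```

In do_filter above, the fix would amount to returning early (or constructing an empty range) when src.empty().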

Jonathan,

Are there zlib and bzip samples? I see the files in the lib/io/src directory but they do not appear to be a complete application.

Regards,
George.

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:chr374$l9$1@sea.gmane.org...
Jonathan,
Are there zlib and bzip samples? I see the files in the lib/io/src directory but they do not appear to be a complete application.
Currently there are only these lame examples, from the docs: http://tinyurl.com/6ndbv, http://tinyurl.com/5ymvp and http://tinyurl.com/4rd8x. But they should give you an idea how these filters are used.
Regards,
George.
Jonathan

Jonathan,

memmap.cpp makes assumptions about character width that will not be true in the specific case where UNICODE is defined. For example, the declaration assumes that strings will be narrow, but CreateFile is called. CreateFile is a macro that translates into either CreateFileA or CreateFileW depending on the _UNICODE/UNICODE macro. If _UNICODE is defined the code will not compile. You should explicitly call Ansi versions of Windows APIs where narrow character streams are intended. A better solution for cases such as memmap.cpp would be to parameterize the character type and provide both Ansi and Unicode implementations. Try calling readonly_mapped_file::open with a simplified Chinese path, for example.

Regards,
George.

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:chr4oq$38p$1@sea.gmane.org...
Jonathan,
memmap.cpp makes assumptions about character width that will not be true in the specific case where UNICODE is defined. For example, the declaration assumes that strings will be narrow, but CreateFile is called. CreateFile is a macro that translates into either CreateFileA or CreateFileW depending on the _UNICODE/UNICODE macro. If _UNICODE is defined the code will not compile. You should explicitly call Ansi versions of Windows APIs where narrow character streams are intended.
Apparently I noticed and fixed this error after posting the final version for review. The two changes I made were

    CreateFile        --> CreateFileA
    CreateFileMapping --> CreateFileMappingA

Is that sufficient?
A better solution for cases such as memmap.cpp would be to parameterize the character type and provide both Ansi and Unicode implementations.
I think I'll wait to see how Boost.Filesystem eventually deals with these issues (which are actively being discussed), and attempt to adopt the same solution. I imagine that eventually the mapped_file constructors will accept fs::path and fs::wpath arguments.
Try calling readonly_mapped_file::open with a simplified Chinese path, for example.
Thanks for checking all this stuff!
Regards,
George.
Jonathan

Jonathan,

How do I do locale-specific Unicode-to-multibyte conversion? As I understand the current design, I need to do something like this:

    filtering_wostream out;
    out.push(unicode_to_multibyte_output_filter<__wchar_t>(locale));
    out.push(cout);
    out << L"This gets converted to multibyte characters according to the current locale." << endl;

But the second-to-last line will generate an error with the present design because I am attaching a narrow character stream to a wide character filtering ostream. Do I need to use the boost::io::copy() paradigm? :(-

Regards,
George.

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:chr8dl$8rr$1@sea.gmane.org...
Jonathan,
How do I do locale-specific Unicode-to-multibyte conversion? As I understand the current design, I need to do something like this:
    filtering_wostream out;
    out.push(unicode_to_multibyte_output_filter<__wchar_t>(locale));
    out.push(cout);
    out << L"This gets converted to multibyte characters according to the current locale." << endl;
The problem is really that I haven't finished writing the components that make this easy. For this, I apologize. (See User's Guide-->Code Conversion.) The *easiest* way to do what you want should be this:

    converting_ostream out;
    out.push(std::cout);
    out.imbue( some_locale );
    out << L"This gets converted to multibyte characters";

This will also allow any number of wide- and narrow-character filters before cout, as long as all the wide ones come first. But as I said, this component is not ready yet. The current way to do this is extremely verbose:

    typedef converter< reference_wrapper<std::ostream> > my_converter;
    stream_facade<my_converter> out;
    out.open(my_converter(ref(std::cout), whatever_locale));
    out << L"Hello Wide World!\n";

Reference wrapper is required here because std::ostream is non-copyable. The above code really should be

    typedef converter<std::ostream> my_converter;
    stream_facade<my_converter> out;
    out.open(my_converter(std::cout, whatever_locale));
    out << L"Hello Wide World!\n";

but I forgot to make the template converter do the reference-wrapping automatically. (I'll fix this.) There's one last problem. The file boost/io/detail/streambufs/indirect_streambufs contains this code:

    enum {
        f_open            = 1,
        f_input_closed    = f_open << 1,
        f_output_closed   = f_input_closed << 1,
        f_output_buffered = f_output_closed
    };

This should be

    enum {
        f_open            = 1,
        f_input_closed    = f_open << 1,
        f_output_closed   = f_input_closed << 1,
        f_output_buffered = f_output_closed << 1
    };

I'm not sure if it will make a difference in the above case. To summarize: what you want to do is perfectly reasonable and should be easy, but I haven't finished developing all the components. The current method is unnecessarily verbose, but should work. Thanks again!

Jonathan
But the second to last line will generate an error with the present design because I am attaching a narrow character stream to a wide character filtering ostream. Do I need to use the boost::io::copy() paradigm? :(-
Regards,
George.

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:chso5v$me5$1@sea.gmane.org...
Jonathan,
The problem is really that I haven't finished writing the components that make this easy.
Do you have an ETA on these components?
If the library is accepted, they'll definitely be in the first release. The problem is that I thought of them rather late. Originally a converter had its own internal narrow character chain. So usage was

    converter cvt(mylocale);
    cvt.push(cout);
    wfiltering_ostream out(cvt);
    out << L"Hello Wide World!\n";

Discussion on comp.std.c++ about the Dinkumware template wbuffer from the CoreX library convinced me that converter should be a lightweight resource adapter. Finally, it occurred to me that this adapter could be inserted automatically in a wide-character chain as soon as a narrow character component is added. I quickly wrote converting_stream and converting_streambuf, but ran out of time to test them. (I'm sure the current versions don't work yet.)
Regards,
George.
Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
There's one last problem. The file boost/io/detail/streambufs/indirect_streambufs contains this code:
    enum {
        f_open            = 1,
        f_input_closed    = f_open << 1,
        f_output_closed   = f_input_closed << 1,
        f_output_buffered = f_output_closed
    };
This should be
    enum {
        f_open            = 1,
        f_input_closed    = f_open << 1,
        f_output_closed   = f_input_closed << 1,
        f_output_buffered = f_output_closed << 1
    };
This is a pain to read and maintain. You should write them like this:

    enum {
        f_open            = 1<<0,
        f_input_closed    = 1<<1,
        f_output_closed   = 1<<2,
        f_output_buffered = 1<<3
    };

You can easily spot that the values are in ascending order and you don't have to replicate the name of the preceding enumerator (which would make reordering a pain, should that be needed).

--
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;

"Rob Stewart" <stewart@sig.com> wrote in message news:200409101906.i8AJ6M921012@lawrencewelk.systems.susq.com...
From: "Jonathan Turkanis" <technews@kangaroologic.com>
    enum {
        f_open            = 1,
        f_input_closed    = f_open << 1,
        f_output_closed   = f_input_closed << 1,
        f_output_buffered = f_output_closed << 1
    };
This is a pain to read and maintain. You should write them like this:
    enum {
        f_open            = 1<<0,
        f_input_closed    = 1<<1,
        f_output_closed   = 1<<2,
        f_output_buffered = 1<<3
    };
You can easily spot that the values are in ascending order and you don't have to replicate the name of the preceding enumerator (which would make reordering a pain, should that be needed).
I stole this idiom from John Maddock: http://tinyurl.com/4no5s. It's supposed to make insertion in the middle easier. I think it's the vector vs. list tradeoff. Jonathan.

From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Rob Stewart" <stewart@sig.com> wrote in message news:200409101906.i8AJ6M921012@lawrencewelk.systems.susq.com...
From: "Jonathan Turkanis" <technews@kangaroologic.com>
    enum {
        f_open            = 1,
        f_input_closed    = f_open << 1,
        f_output_closed   = f_input_closed << 1,
        f_output_buffered = f_output_closed << 1
    };
This is a pain to read and maintain. You should write them like this:
    enum {
        f_open            = 1<<0,
        f_input_closed    = 1<<1,
        f_output_closed   = 1<<2,
        f_output_buffered = 1<<3
    };
I stole this idiom from John Maddock: http://tinyurl.com/4no5s. It's supposed to make insertion in the middle easier. I think it's the vector vs. list tradeoff.
It fails to make insertion in the middle easy, at least comparatively. Start with:

    Yours/John's                  Mine
    enum {
        name_a = 1,               1<<0,
        name_b = name_a << 1,     1<<1,
        name_c = name_b << 1      1<<2
    };

Now add name_x after name_b:

    enum {
        name_a = 1,               1<<0,
        name_b = name_a << 1,     1<<1,
        name_x = name_b << 1,     1<<2,
        name_c = name_x << 1      1<<3
    };

Your version requires a more extensive change due to the use of the enumerator name and makes it harder to determine if the new order is, in fact, sequential.

"Rob Stewart" <stewart@sig.com> wrote in message:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Rob Stewart" <stewart@sig.com> wrote in message:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
I stole this idiom from John Maddock: http://tinyurl.com/4no5s. It's supposed to make insertion in the middle easier. I think it's the vector vs. list tradeoff.
It fails to make insertion in the middle easy, at least comparatively. Start with:
    Yours/John's                  Mine
    enum {
        name_a = 1,               1<<0,
        name_b = name_a << 1,     1<<1,
        name_c = name_b << 1      1<<2
    };

Now add name_x after name_b:

    enum {
        name_a = 1,               1<<0,
        name_b = name_a << 1,     1<<1,
        name_x = name_b << 1,     1<<2,
        name_c = name_x << 1      1<<3
    };
Your version requires a more extensive change
True -- if there are only three enumerators!
due to the use of the enumerator name and makes it harder to determine if the new order is, in fact, sequential.
Also true. Best Regards, Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Rob Stewart" <stewart@sig.com> wrote in message:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
I stole this idiom from John Maddock: http://tinyurl.com/4no5s. It's supposed to make insertion in the middle easier. I think it's the vector vs. list tradeoff.
It fails to make insertion in the middle easy, at least comparatively. Start with:
    Yours/John's                  Mine
    enum {
        name_a = 1,               1<<0,
        name_b = name_a << 1,     1<<1,
        name_c = name_b << 1      1<<2
    };

Now add name_x after name_b:

    enum {
        name_a = 1,               1<<0,
        name_b = name_a << 1,     1<<1,
        name_x = name_b << 1,     1<<2,
        name_c = name_x << 1      1<<3
    };
Your version requires a more extensive change
True -- if there are only three enumerators!
How does the number of enumerators matter?

"Rob Stewart" <stewart@sig.com> wrote in message
From: "Jonathan Turkanis" <technews@kangaroologic.com>
"Rob Stewart" <stewart@sig.com> wrote in message:
From: "Jonathan Turkanis" <technews@kangaroologic.com>
    Yours/John's                  Mine
    enum {
        name_a = 1,               1<<0,
        name_b = name_a << 1,     1<<1,
        name_c = name_b << 1      1<<2
    };

Now add name_x after name_b:

    enum {
        name_a = 1,               1<<0,
        name_b = name_a << 1,     1<<1,
        name_x = name_b << 1,     1<<2,
        name_c = name_x << 1      1<<3
    };
Your version requires a more extensive change
True -- if there are only three enumerators!
How does the number of enumerators matter?
With my method, when you stick an enumerator into the middle, you only have to adjust the definitions of the adjacent enumerators. With your method, it looks to me like you have to renumber all the enumerators which follow the insertion point. Best Regards, Jonathan

From: "Jonathan Turkanis" <technews@kangaroologic.com> "Rob Stewart" <stewart@sig.com> wrote in message
How does the number of enumerators matter?
With my method, when you stick an enumerator into the middle, you only have to adjust the definitions of the adjacent enumerators. With your method, it looks to me like you have to renumber all the enumerators which follow the insertion point.
I see your point. However, it is a simple matter to scan down the sequence of values in my version to ensure that things are sequential, whereas one must check left-right, left-right, down your list to ensure each is tied to the correct one to produce sequential values. Each has its virtues, so one's choice must depend upon which set of virtues is deemed most valuable.
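For what it's worth, the two idioms debated above are interchangeable in the values they produce; a quick check:

```cpp
#include <cassert>

// Both enum spellings yield identical power-of-two flags; only the
// maintenance tradeoffs discussed above differ.
enum chained_style {                 // each enumerator shifts the previous one
    c_open            = 1,
    c_input_closed    = c_open << 1,
    c_output_closed   = c_input_closed << 1,
    c_output_buffered = c_output_closed << 1
};

enum explicit_style {                // explicit shift counts
    e_open            = 1 << 0,
    e_input_closed    = 1 << 1,
    e_output_closed   = 1 << 2,
    e_output_buffered = 1 << 3
};
```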

Jonathan,

I got zlib, gzip and bzip2 compression working with the IOStreams library using zlib-1.2.1 and bzip2-1.0.1. There were just a few minor glitches, as follows:

1. The zip file format output by zlib_compressor does not appear to work with either Winzip 9.0 SR-1 or Windows XP native zip support. zlib_decompressor is able to decompress files previously compressed using zlib_compressor.

2. Compiling IOStreams bzip2 support, there is a conflict with the macro "small" that is defined somewhere in the header files that you include. I had to insert the following at the beginning of bzip2.hpp to get it to compile:

      #include <boost/config/abi_prefix.hpp> // Must be the last header.
    ++#ifdef small
    ++#undef small
    ++#endif //small
      namespace boost { namespace io {

3. bzip2_decompressor enters into an infinite loop inside of symmetric_filter_adapter_impl<>::read() unless the following modification to symmetric_filter_adapter.hpp is made:

      bool eof = (state_ & f_eof) != 0;
      if (buf_.ptr() != buf_.end() || eof) {
          bool done = !filter_->filter(
              const_cast<const char_type*&>(buf_.ptr()),
              buf_.end(), next_s, end_s, eof
    --    ) && eof;
    ++    ) || eof;

4. There doesn't appear to be any convenient way to copy decompressed data to a wide character stream. Compressed data is a binary data stream, which by definition is narrow. However, the case occurs where the decompressed data is wide. So you would like to be able to do the following:

    ifstream ifile("hello.zip", ios_base::in | ios_base::binary);
    filtering_streambuf<input> in;
    in.push(zlib_decompressor());
    in.push(ifile);
    boost::io::copy(in, wcout);

But this doesn't compile. This would appear to be another application for your converting_ostream discussed earlier.

5. Do you really want to include the external compression headers (e.g. zlib.h, bzip2.h) in your library? This will create a problem keeping your library in sync with the external compression libraries. Your zlib header is already out of date, for example.
Wouldn't it be better to require the user to obtain the external headers and include macros to contingently compile dependent sections?

My compression code follows at the end of this message. Now for some real fun. I am going to try and attach an overlapped filebuf to one of your streams. :-)

Regards,
George.

    // compression_example.cpp
    #include <windows.h>
    #include <tchar.h>
    #include <crtdbg.h>
    #include <fstream>
    #include <iostream>
    #include <boost/io/filtering_streambuf.hpp>
    #include <boost/io/filtering_stream.hpp>
    #include <boost/io/copy.hpp>
    #include <boost/io/zlib.hpp>
    #include <boost/io/gzip.hpp>
    #include <boost/io/bzip2.hpp>
    #include <libs/io/src/zlib.cpp>
    #include <libs/io/src/bzip2.cpp>

    #ifdef _UNICODE
        #define _tcout wcout
        #define _tcerr wcerr
    #else
        #define _tcout cout
        #define _tcerr cerr
    #endif //_UNICODE

    bool zip_compression()
    {
        using namespace std;
        using namespace boost::io;
        try {
            {
                ofstream ofile("hello.zip", ios_base::out | ios_base::binary);
                filtering_ostream out;
                out.push(zlib_compressor());
                out.push(ofile);
                // write() is used as opposed to << so as to permit writing Unicode
                // characters to binary compression streams. Is there a better way?
                out.write((char*)__T("This gets compressed using one of the compression formats.\n"),
                          _tcslen(__T("This gets compressed using one of the compression formats.\n")) * sizeof(_TCHAR));
            }
            ifstream ifile("hello.zip", ios_base::in | ios_base::binary);
            filtering_streambuf<input> in;
            in.push(zlib_decompressor());
            in.push(ifile);
            _tcout.clear();
            boost::io::copy(in, _tcout);
            return true;
        } catch (zlib_error& e) {
            _tcerr << __T("zlib error: ") << dec << e.error() << endl;
        }
        return false;
    }

    bool gzip_compression()
    {
        using namespace std;
        using namespace boost::io;
        try {
            {
                ofstream ofile("hello.gz", ios_base::out | ios_base::binary);
                filtering_ostream out;
                out.push(gzip_compressor());
                out.push(ofile);
                _RPT1(_CRT_WARN, "The length of output string is %ld\n",
                      _tcslen(__T("This gets compressed using one of the compression formats.\n")));
                out.write((char*)__T("This gets compressed using one of the compression formats.\n"),
                          _tcslen(__T("This gets compressed using one of the compression formats.\n")) * sizeof(_TCHAR));
            }
            ifstream ifile("hello.gz", ios_base::in | ios_base::binary);
            filtering_streambuf<input> in;
            in.push(gzip_decompressor());
            in.push(ifile);
            _tcout.clear();
            size_t nRead = boost::io::copy(in, _tcout);
            _RPT1(_CRT_WARN, "The length of input string is %ld\n", nRead);
            return true;
        } catch (gzip_error& e) {
            _tcerr << __T("gzip error: ") << dec << e.error()
                   << __T(" zlib error: 0x") << e.zlib_error() << endl;
        }
        return false;
    }

    bool bzip2_compression()
    {
        using namespace std;
        using namespace boost::io;
        try {
            {
                ofstream ofile("hello.bz2", ios_base::out | ios_base::binary);
                filtering_ostream out;
                out.push(bzip2_compressor());
                out.push(ofile);
                _RPT1(_CRT_WARN, "The length of output string is %ld\n",
                      _tcslen(__T("This gets compressed using one of the compression formats.\n")));
                out.write((char*)__T("This gets compressed using one of the compression formats.\n"),
                          _tcslen(__T("This gets compressed using one of the compression formats.\n")) * sizeof(_TCHAR));
                //_RPT1(_CRT_WARN, "The length of output string is %ld\n", nWritten);
            }
            ifstream ifile("hello.bz2", ios_base::in | ios_base::binary);
            filtering_streambuf<input> in;
            in.push(bzip2_decompressor());
            in.push(ifile);
            _tcout.clear();
            size_t nRead = boost::io::copy(in, _tcout);
            _RPT1(_CRT_WARN, "The length of input string is %ld\n", nRead);
            return true;
        } catch (bzip2_error& e) {
            _tcerr << __T("bzlib error: ") << dec << e.error() << endl;
        }
        return false;
    }

    int _tmain()
    {
        zip_compression();
        gzip_compression();
        bzip2_compression();
        return 0;
    }

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:chv36b$gg$1@sea.gmane.org...
Jonathan,
I got zlib, gzip and bzip2 compression working with the IOStreams library using zlib-1.2.1 and bzip2-1.0.1.
Good.
There were just a few minor glitches as follows:
1. The zip file format output by zlib_compressor does not appear to work with either Winzip 9.0 SR-1 or Windows XP native zip support. zlib_decompressor is able to decompress files previously compressed using zlib_compressor.
Dumb question - is Winzip supposed to be able to read the zlib format? I usually use it to decompress .zip or tar.gz archives.
2. Compiling IOStreams bzip2 support, there is a conflict with the macro "small" that is defined somewhere in the header files that you include. I had to insert the following at the beginning of bzip2.hpp to get it to compile:
I hate these stupid lowercase macros! I've already had problems with the macro 'unix' in gzip.hpp and newline_filter.hpp. I think I'll just change 'small' to 'slow' everywhere -- unless there's a macro 'slow'.
      #include <boost/config/abi_prefix.hpp> // Must be the last header.
    ++#ifdef small
    ++#undef small
    ++#endif //small
      namespace boost { namespace io {
3. bzip2_decompressor enters into an infinite loop inside of symmetric_filter_adapter_impl<>::read() unless the following modification to symmetric_filter_adapter.hpp is made:
      bool eof = (state_ & f_eof) != 0;
      if (buf_.ptr() != buf_.end() || eof) {
          bool done = !filter_->filter(
              const_cast<const char_type*&>(buf_.ptr()),
              buf_.end(), next_s, end_s, eof
    --    ) && eof;
    ++    ) || eof;
Thanks. It took me a long time to get this to work with my sample data, but since I tried only a few samples, I guess I'm not surprised it's still wrong. Somehow, though, I seem to remember "|| eof" failing, too.
4. There doesn't appear to be any convenient way to copy decompressed data to a wide character stream. Compressed data is a binary data stream which by definition is narrow. However the case occurs where the decompressed data is wide. So you would like to be able to do the following:
ifstream ifile("hello.zip", ios_base::in | ios_base::binary); filtering_streambuf<input> in; in.push(zlib_decompressor()); in.push(ifile); boost::io::copy(in, wcout);
But this doesn't compile. This would appear to be another application for your converting_ostream discussed earlier.
Right. The usage would be the same as above, but it would actually work.
5. Do you really want to include the external compression headers (e.g. zlib.h, bzip2.h) in your library? This will create a problem keeping your library in sync with the external compression libraries. Your zlib header is already out of date, for example. Wouldn't it be better to require the user to obtain the external headers and include macros to contingently compile dependent sections?
I guess the way to do this would be to have the user define variables which tell bjam where to look for the headers. I think we should make packages containing win32 binaries and the relevant headers available somewhere. I think Jeff Garland once suggested making them available through the wiki. They could be updated more frequently than the boost distribution.
My compression code follows at the end of this message. Now for some real fun. I am going to try and attach an overlapped filebuf to one of your streams. :-)
Good luck! BTW, I just now realized that you're the one I was discussing these issues with over at microsoft.public.vc.stl ;-)
Regards,
George.
Jonathan

Jonathan,
Dumb question - is Winzip supposed to be able to read the zlib format? I usually use it to decompress .zip or tar.gz archives.
Well I made that assumption because Winzip and XP both maintain associations for the .z extension. This is from the Winzip help file:
Since almost all new archives are created in Zip format, WinZip does not provide facilities to add to or create files in these formats (however, all other WinZip functions are supported).
In the code that I included in my previous post I see that I am using a zip extension. But I tried it both ways to no avail.
I hate these stupid lowercase macros!
Yes. It is very bad coding style. I grepped for it in your library, boost and bzip2 and couldn't find who was defining it. :-(
      bool eof = (state_ & f_eof) != 0;
      if (buf_.ptr() != buf_.end() || eof) {
          bool done = !filter_->filter(
              const_cast<const char_type*&>(buf_.ptr()),
              buf_.end(), next_s, end_s, eof
    --    ) && eof;
    ++    ) || eof;
Thanks. It took me a long time to get this to work with my sample data, but since I tried only a few samples, I guess I'm not surprised it's still wrong. Somehow, though, I seem to remember "|| eof" failing, too.
You're welcome to the sample project if you like. The problem is that, with the bzip2 sample, eof is reached and next_s != end_s. That may be a bug in itself; however, I don't think that you can rely on this condition always being true at eof. You would also enter the infinite loop by trying to read a malformed zlib or gzip archive, for example. When eof is reached you have to be "done."
BTW, I just now realized that you're the one I was discussing these issues with over at microsoft.public.vc.stl ;-)
Busted. Regards, George.

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:chvo31$nlc$1@sea.gmane.org...
Jonathan,
Dumb question - is Winzip supposed to be able to read the zlib format? I usually use it to decompress .zip or tar.gz archives.
Well I made that assumption because Winzip and XP both maintain associations for the .z extension. This is from the Winzip help file:
Since almost all new archives are created in Zip format, WinZip does not provide facilities to add to or create files in these formats (however, all other WinZip functions are supported).
I tried compressing some data using the zlib utility function compress, sidestepping the iostreams library entirely. To my relief, WinZip couldn't read these files either. So whatever the problem is, it doesn't seem to be in the iostreams layer. Also, gzip is just a thin wrapper around zlib, and I have been able to use WinZip to unzip files compressed with the gzip_compressor.
Thanks. It took me a long time to get this to work with my sample data, but since I tried only a few samples, I guess I'm not surprised it's still wrong. Somehow, though, I seem to remember "|| eof" failing, too.
You're welcome to the sample project if you like. The problem is that, with the bzip2 sample, eof is reached and next_s != end_s. That may be a bug in itself; however, I don't think that you can rely on this condition always being true at eof. You would also enter the infinite loop by trying to read a malformed zlib or gzip archive, for example. When eof is reached you have to be "done."
This sounds right. Anyway, the regression tests pass with your fix, so all seems to be well. Thanks! Best Regards, Jonathan

Jonathan,
Now for some real fun. I am going to try and attach an overlapped filebuf to one of your
streams. :-)
Good luck!
That worked without a hitch. But that is only because my overlapped_filebuf uses blocking semantics. My original goal was to be able to transparently interchange regular files and sockets as "sinks", and this will allow me to do that. But the std stream and filebuf interfaces are really a straitjacket when it comes to more advanced applications, such as truly asynchronous io; and the sooner we realize that the better. Imagine if, for example, instead of doing write(char_type*, size_t) you could do write(ref_string<char_type>&) or write(ref_string_plus_overlapped<char_type>&). An observer could transparently pass the buffer to the next layer without modification, while an asynchronous sink could take ownership of the buffer by calling a swap() member function and then pass the buffer to a waiting thread. The buffer could grow or shrink as it passes from one filter to another. The buffer could even pass in round robin fashion back to its origin if every observer or filter called swap() in turn. But that is just a dream.

Regards,
George.
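The buffer-passing idea sketched above can be shown in miniature. The names below are illustrative only (using std::string as the swappable buffer rather than the hypothetical ref_string): an observer inspects the buffer and leaves it intact, while an owning sink steals it with swap() instead of copying.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// An observer only looks at the buffer and passes it on unchanged.
struct observing_filter {
    std::size_t seen;
    observing_filter() : seen(0) { }
    void write(std::string& buf) { seen += buf.size(); }   // inspect only
};

// An asynchronous sink "takes ownership" by swapping the contents out,
// leaving the caller's buffer empty, with no copy made.
struct owning_sink {
    std::string owned;
    void write(std::string& buf) { owned.swap(buf); }      // steal the buffer
};
```

The swap is O(1) regardless of buffer size, which is what makes the round-robin buffer recycling described above attractive.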

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:ci01a3$a6n$1@sea.gmane.org...
Jonathan,
Now for some real fun. I am going to try to attach an overlapped filebuf to one of your streams. :-)
Good luck!
That worked without a hitch.
Great!
But that is only because my overlapped_filebuf uses blocking semantics.
Yeah, I know.
My original goal was to be able to transparently interchange regular files and sockets as "sinks" and this will allow me to do that. But the std stream and filebuf interfaces are really a straitjacket when it comes to more advanced applications, such as truly asynchronous io; and the sooner we realize that the better.
I agree. If you haven't already read it see 'Future Directions' (http://tinyurl.com/6r8p2). Forgive me if I repeat myself a bit: My view is that to handle asynchronous i/o properly one will need to define an AsynchronousResource concept. Using AsynchronousResources and filters one should be able to define a number of different i/o abstractions, some of which may be standard streams and stream buffers, but some of which may be entirely different. I think the crucial design question now is to make sure that current filters will work with future AsynchronousResources, and I have tentatively concluded that it suffices to give filters a way to indicate that fewer than the requested number of characters have been read or written, even though EOF has not been reached and no error has occurred. The only hard part is the return type of the member function get() for an input filter. I suggested that it could return a class type convertible to the character type which can be explicitly tested for eof or temporary unavailability of data. It would look like this:
Imagine if, for example, instead of doing write(char_type*, size_t) you could do write(ref_string<char_type>&) or write(ref_string_plus_overlapped<char_type>&). An observer could transparently pass the buffer to the next layer without modification while an asynchronous sink could take ownership of the buffer by calling a swap() member function and then pass the buffer to a waiting thread. The buffer could grow or shrink as it passes from one filter to another. The buffer could even pass in round-robin fashion back to its origin if every observer or filter called swap() in turn. But that is just a dream.
Regards,
George.

"Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:ci05t1$7ov$1@sea.gmane.org...
The only hard part is the return type of the member function get() for an input filter. I suggested that it could return a class type convertible to the character type which can be explicitly tested for eof or temporary unavailability of data.
It would look like this:
I hit 'send' prematurely. The full message will follow. Jonathan

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:ci01a3$a6n$1@sea.gmane.org...
Jonathan,
Now for some real fun. I am going to try to attach an overlapped filebuf to one of your streams. :-)
Good luck!
That worked without a hitch.
Great!
But that is only because my overlapped_filebuf uses blocking semantics.
Yeah, I know.
My original goal was to be able to transparently interchange regular files and sockets as "sinks" and this will allow me to do that. But the std stream and filebuf interfaces are really a straitjacket when it comes to more advanced applications, such as truly asynchronous io; and the sooner we realize that the better.
I agree. If you haven't already read it see 'Future Directions' (http://tinyurl.com/6r8p2). Forgive me if I repeat myself a bit: My view is that to handle asynchronous i/o properly one will need to define an AsynchronousResource concept. Using AsynchronousResources and filters one should be able to define a number of different i/o abstractions, some of which may be standard streams and stream buffers, but some of which may be entirely different. I think the crucial design question now is to make sure that current filters will work with future AsynchronousResources, and I have tentatively concluded that it suffices to give filters a way to indicate that fewer than the requested number of characters have been read or written, even though EOF has not been reached and no error has occurred. The only hard part is the return type of the member function get() for an input filter. I suggested that it could return a class type convertible to the character type which can be explicitly tested for eof or temporary unavailability of data. It would work like this: "Jonathan Turkanis" <technews@kangaroologic.com> wrote:
... looking at the alphabet_input filter from the tutorial, instead of
struct alphabetic_input_filter : public input_filter {
    template<typename Source>
    int get(Source& src)
    {
        int c;
        while ((c = boost::io::get(src)) != EOF && !isalpha(c))
            ;
        return c;
    }
};
you'd write:
struct alphabetic_input_filter : public input_filter {
    template<typename Source>
    character get(Source& src)
    {
        character c;
        while ((c = boost::io::get(src)).good() && !isalpha(c))
            ;
        return c;
    }
};

Here, eof and fail values are passed on to the caller unchanged. If you want to send an eof or fail notification explicitly, you'd write return eof() or return fail().
I'd like to get your input on this idea.
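A minimal standalone sketch of what such a character class might look like; the name, members, and the eof()/fail() helpers are assumptions for illustration, not something the library defines:

```cpp
#include <cassert>

// Hypothetical return type for input_filter::get(): carries either an
// ordinary character or an out-of-band condition (eof, or data
// temporarily unavailable).
class character {
public:
    enum state { good_, eof_, fail_ };
    character(int c) : value_(c), state_(good_) { }     // an ordinary character
    explicit character(state s) : value_(-1), state_(s) { }
    bool good() const { return state_ == good_; }
    bool eof()  const { return state_ == eof_; }
    bool fail() const { return state_ == fail_; }       // "try again later"
    operator int() const { return value_; }             // convertible to char type
private:
    int   value_;
    state state_;
};

// Explicit notifications a filter can return to its caller.
inline character eof()  { return character(character::eof_); }
inline character fail() { return character(character::fail_); }
```

A caller can then test `c.good()` in a loop exactly as in the filter above, and a fail() result signals temporary unavailability rather than a hard error.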
Imagine if, for example, instead of doing write(char_type*, size_t) you could do write(ref_string<char_type>&) or write(ref_string_plus_overlapped<char_type>&). An observer could transparently pass the buffer to the next layer without modification while an asynchronous sink could take ownership of the buffer by calling a swap() member function and then pass the buffer to a waiting thread. The buffer could grow or shrink as it passes from one filter to another. The buffer could even pass in round-robin fashion back to its origin if every observer or filter called swap() in turn. But that is just a dream.
My hope is that this will all be possible. It sounds like you think we might need AsynchronousFilters as well. Is that right? I'm hoping to avoid this, since I wouldn't want to force people to write several versions of the same filter, one for blocking i/o, one for async i/o, etc. In particular, I'm interested in interoperability with Hugo Duncan's library (see http://tinyurl.com/w0f7 and http://tinyurl.com/6w2l ), but I haven't contacted him yet. By the way, if I'm not mistaken, you haven't yet submitted a formal review. Would you care to do so? Best Regards, Jonathan

Jonathan,
character c; while ((c = boost::io::get(src)).good() && !isalpha(c)) ;
Basically, I do not see any scenario where you would want to write a buffer to a modern operating system one character at a time, much less do so asynchronously. The above quoted code needs to be buffered to be useful and there is no reason to write to a user-mode buffer asynchronously. So my recommendation is that this particular type of filter should remain oblivious to asynchronous IO. It would be helpful to think of asynchronous IO in terms of two basic paradigms:

1. User-mode synchronous / kernel-mode asynchronous IO.
2. User-mode asynchronous / kernel-mode asynchronous IO.

The first paradigm does not require any particular modification of your library. As I already mentioned, I have overlapped IO working with your library in its current form. The IOStreams library has no reason to know or care that I am blocking in user rather than kernel mode.

The second paradigm (for example, using IO completion ports) is very difficult to fit within the Standard Library stream and filebuf concepts. The logic is quite different. To begin with, read and write will need at least one additional parameter that encapsulates OS-specific thread-local stream state, including stream position and asynchronous io result. Second, you will need some way to retrieve extended error information on a stream that differentiates between failure that indicates the io operation is pending and failure that indicates no further processing is possible. Sockets, DCOM and CORBA all have conceptions of extended error info on procedures. Windows has GetLastError(). Exceptions might be used but they suck up too many cpu cycles for an error condition that is expected on virtually every asynchronous io operation. Perhaps you might just introduce an io_pending_bit into your stream state. But I would leave the return value alone. Finally, you need some way to relinquish the read or write buffer, and its lifetime, to the asynchronous completion port.
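To make the "pending versus hard failure" distinction concrete, here is a small sketch of the kind of extended result a write operation might report; the names (write_result, should_retry) are hypothetical and not part of the library:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status for an asynchronous write: either some characters
// were consumed, the operation is pending, or it failed for good.
struct write_result {
    enum status { ok, pending, error };
    status      state;
    std::size_t written;   // valid only when state == ok
};

// Helper to build results (C++03-friendly, no aggregate temporaries).
inline write_result make_result(write_result::status s, std::size_t n)
{
    write_result r;
    r.state = s;
    r.written = n;
    return r;
}

// A caller retries on 'pending' without treating it as an error,
// which avoids throwing on a condition expected on nearly every
// asynchronous operation.
inline bool should_retry(const write_result& r)
{
    return r.state == write_result::pending;
}
```

This is the same idea as an io_pending_bit in the stream state, expressed as a return value instead.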
My current problem is that I have just discovered that your line_wrapping_output_filter is writing to my overlapped_filebuf one character at a time. I purposely did not implement buffering in it because this is not thread-safe and must occur at a higher level. I need a buffering filter or wrapper. Hope this helps. Regards, George.

Why give an explicit integer label to any of the values (except the starting value)? It seems to me that the point of an enum declaration is to guarantee unique values for each tag, but relying on having any specific value correspond to a given tag is dangerous. I would write it like this:

enum { f_open = 0, f_input_closed, f_output_closed, f_output_buffered };

This seems to solve the insertion problem transparently, whether at the end or in the middle... Matthias
This is a pain to read and maintain. You should write them like this:
enum {
    f_open            = 1 << 0,
    f_input_closed    = 1 << 1,
    f_output_closed   = 1 << 2,
    f_output_buffered = 1 << 3
};
I stole this idiom from John Maddock: http://tinyurl.com/4no5s. It's supposed to make insertion in the middle easier. I think it's the vector vs. list tradeoff.
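For illustration, the 1 << n constants compose with the bitwise operators in the usual way (a sketch using the flag names above; has_flag is a made-up helper, not library code):

```cpp
#include <cassert>

// Bit-flag constants, each occupying its own bit.
enum {
    f_open            = 1 << 0,
    f_input_closed    = 1 << 1,
    f_output_closed   = 1 << 2,
    f_output_buffered = 1 << 3
};

// Combine flags with |, test with &, clear with & ~flag.
inline bool has_flag(int flags, int f) { return (flags & f) != 0; }
```

For example, a stream open for buffered output carries f_open | f_output_buffered, and testing f_input_closed against it yields false.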
------------------------------------------------------------------------ --------------------------- Matthias Schabel, Ph.D. Utah Center for Advanced Imaging Research 729 Arapeen Drive Salt Lake City, UT 84108 801-587-9413 (work) 801-585-3592 (fax) 801-706-5760 (cell) 801-484-0811 (home) mschabel at ucair med utah edu

"Matthias Schabel" <boost@schabel-family.org> wrote: Hi Matthias, Until last year I lived near research park, just above Hogle Zoo. So you're right in my old stomping grounds.
Why give an explicit integer label to any of the values (except the starting value)? It seems to me that the point of an enum declaration is to guarantee unique values for each tag, but relying on having any specific value correspond to a given tag is dangerous. I would write it like this :
enum { f_open = 0, f_input_closed, f_output_closed, f_output_buffered };
If all you want is unique values, this is fine. But in the cases I was describing (from iostreams and regex) the goal is to define a set of constants which can serve as bit flags. The easiest way to do this is for each constant to have twice the value of the previous constant -- starting at 1 instead of 0, of course ;-) Thus the proliferation of <<'s
Matthias
Jonathan

Jonathan Turkanis
"Matthias Schabel" <boost@schabel-family.org> wrote:
Why give an explicit integer label to any of the values (except the starting value)? It seems to me that the point of an enum declaration is to guarantee unique values for each tag, but relying on having any specific value correspond to a given tag is dangerous. I would write it like this :
enum { f_open = 0, f_input_closed, f_output_closed, f_output_buffered };
If all you want is unique values, this is fine. But in the cases I was describing (from iostreams and regex) the goal is to define a set of constants which can serve as bit flags. The easiest way to do this is for each constant to have twice the value of the previous constant -- starting at 1 instead of 0, of course ;-)
Thus the proliferation of <<'s
I hate having the shift value represent the flag - it's too easy to believe that you're just supposed to pass the value as is. Then you have to go look up whether the value is a shift or the flag value itself. It's a serious pain. You could have the best of both worlds if you wrap the parameter up in a class. Here's a brief sketch off the top of my head (so don't try to compile and use it!):

template<typename E>
class BitFlag {
    int flag;
public:
    BitFlag(E val) { flag = 1 << val; }
};

It makes the enum much easier to deal with since you can simply put the flags in a list and not worry about explicitly initializing each flag. You can also add additional functionality. You could overload operators to handle passing or masking multiple flags. You could also, through the use of a traits class, validate flag combinations. -- Noah

From: "Noah" <noah@acm.org>
I hate having the shift value represent the flag - it's too easy to believe that you're just supposed to pass the value as is. Then you have to go lookup whether the value is a shift or the flag value itself. It's a serious pain.
I don't understand what you mean. I've never been confused when using left shift to generate the enumerator-definition.
You could have the best of both worlds if you wrap the parameter up in a class. Here's a brief sketch off the top of my head (so don't try to compile and use it!):
template<typename E>
class BitFlag {
    int flag;
public:
    BitFlag(E val) { flag = 1 << val; }
};
It makes the enum much easier to deal with since you can just simply put the flags in a list and not worry about explicitly initializing each flag. You can also add additional functionality. You could overload operators to handle passing or masking multiple flags. You could also, through the use of a traits class, validate flag combinations.
However wonderful that could be, the fact is that an enumerator-definition must be a constant-expression. -- Rob Stewart stewart@sig.com Software Engineer http://www.sig.com Susquehanna International Group, LLP using std::disclaimer;
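One way to get the readability Noah wants while satisfying the constant-expression requirement Rob points out is to enumerate the bit positions in one enum and derive the masks in a second; this is a sketch of the idiom, not code from the library:

```cpp
#include <cassert>

// Bit positions number themselves, so inserting a new flag in the
// middle needs no manual renumbering...
enum bit_pos { open_bit, input_closed_bit, output_closed_bit, output_buffered_bit };

// ...while the masks remain constant-expressions, legal in an
// enumerator-definition (unlike a runtime BitFlag wrapper).
enum flag {
    f_open            = 1 << open_bit,
    f_input_closed    = 1 << input_closed_bit,
    f_output_closed   = 1 << output_closed_bit,
    f_output_buffered = 1 << output_buffered_bit
};
```

Callers only ever see the mask values, so there is no ambiguity about whether a constant is a shift count or a flag.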

Jonathan,
The problem is really that I haven't finished writing the components that make this easy. For this, I apologize. (See User's Guide-->Code Conversion.)
Looking at the documentation for Code Conversion I do not see any way to convert from Unicode to multibyte. The converter class only works one way, from multibyte to Unicode. I suppose this falls within the category of "not-yet-implemented." I would like to see converter_impl accept codecvt_type as a template argument, perhaps with a default of std::codecvt, so that I can replace it. A const codecvt_type& cvt = codecvt_type() parameter would also be added to the converter constructor. Regex works this way. The other common code conversion task that is fairly Windows-specific but a common task nonetheless is to convert \r\n to and from \n in text streams. Regards, George.

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:ci05c3$4d3$1@sea.gmane.org...
Jonathan,
The problem is really that I haven't finished writing the components that make this easy. For this, I apologize. (See User's Guide-->Code Conversion.)
Looking at the documentation for Code Conversion I do not see any way to convert from Unicode to multibyte. The converter class only works one way from multibyte to Unicode. I suppose this falls within the category of "not-yet-implemented."
You can convert from unicode to multibyte when performing output, and from multibyte to unicode when performing input. I thought about implementing the other directions, but I wasn't sure there was a need for them. It certainly can be done.
I would like to see converter_impl accept codecvt_type as a template argument perhaps with a default of std::codecvt but so that I can replace it. const codecvt_type& cvt = codecvt_type() also would be added to the converter constructor.
I had it this way originally. I can't remember why I changed it. Maybe concerns over code bloat. But it would certainly make the interface simpler, and Dinkumware CoreX does it this way, so maybe I'll switch back.
Regex works this way.
I wasn't aware of this, and can't find a reference in the docs. Could you post a link?
The other common code conversion task that is fairly Windows specific but a common task nonetheless is to convert \r\n to and from \n in text streams.
See basic_newline_filter (http://tinyurl.com/6x89t) :-)
Regards,
George.
Jonathan
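Independent of the library's basic_newline_filter, the \r\n-to-\n direction George asks about can be sketched as a standalone helper (the function name and lone-'\r' handling are my choices for illustration):

```cpp
#include <cassert>
#include <string>

// Collapse CRLF pairs to LF. A lone '\r' is also treated as a line
// break, which covers classic-Mac text as well as Windows text.
inline std::string crlf_to_lf(const std::string& s)
{
    std::string out;
    out.reserve(s.size());
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '\r') {
            out += '\n';
            if (i + 1 < s.size() && s[i + 1] == '\n')
                ++i;            // swallow the '\n' of a CRLF pair
        } else {
            out += s[i];
        }
    }
    return out;
}
```

A filter version would do the same transformation character by character as data flows through the chain; the string form just makes the mapping easy to see.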

Jonathan,
Regex works this way.
I wasn't aware of this, and can't find a reference in the docs. Could you post a link?
http://www.boost.org/libs/regex/doc/localisation.html ("Win32 localization model. This is the default model when the library is compiled under Win32, and is encapsulated by the traits class w32_regex_traits. When this model is in effect there is a single global locale as defined by the user's control panel settings, and returned by GetUserDefaultLCID. All the settings used by boost.regex are acquired directly from the operating system bypassing the C run time library."). Regards, George.

Jonathan,
typedef converter< reference_wrapper<std::ostream> > my_converter;
stream_facade<my_converter> out;
out.open(my_converter(ref(std::cout), whatever_locale));
out << L"Hello Wide World!\n";
This doesn't compile. I get the following error: h:\source code\Turkanis\iostream\boost\io\detail\adapters\resource_wrapper.hpp(39) : error C2100: illegal indirection h:\source code\Turkanis\iostream\boost\io\detail\adapters\resource_wrapper.hpp(39) : while compiling class-template member function 'boost::reference_wrapper<T> &boost::io::detail::resource_wrapper<Resource>::operator *(void)' with [ T=std::ostream, Resource=boost::reference_wrapper<std::ostream> ] h:\source code\boost\boost_1_31_0\boost\optional.hpp(95) : see reference to class template instantiation 'boost::io::detail::resource_wrapper<Resource>' being compiled with [ Resource=boost::reference_wrapper<std::ostream> ] h:\source code\boost\boost_1_31_0\boost\optional.hpp(98) : see reference to class template instantiation 'boost::optional_detail::aligned_storage<T>::dummy_u' being compiled with [ T=boost::optional_detail::optional_base<boost::io::detail::resource_wrapper<boost::reference_wrapper<std::ostream>>>::internal_type ] h:\source code\boost\boost_1_31_0\boost\optional.hpp(357) : see reference to class template instantiation 'boost::optional_detail::aligned_storage<T>' being compiled with [ T=boost::optional_detail::optional_base<boost::io::detail::resource_wrapper<boost::reference_wrapper<std::ostream>>>::internal_type ] h:\source code\boost\boost_1_31_0\boost\optional.hpp(364) : see reference to class template instantiation 'boost::optional_detail::optional_base<T>' being compiled with [ T=boost::io::detail::resource_wrapper<boost::reference_wrapper<std::ostream>> ] h:\source code\Turkanis\iostream\boost\io\converter.hpp(138) : see reference to class template instantiation 'boost::optional<T>' being compiled with [ T=boost::io::detail::resource_wrapper<boost::reference_wrapper<std::ostream>> ] h:\source code\Turkanis\iostream\boost\io\converter.hpp(162) : see reference to class template instantiation 'boost::io::detail::converter_impl<Resource,Int,Ext,State,Alloc>' being compiled with [ 
Resource=boost::reference_wrapper<std::ostream>, Int=wchar_t, Ext=boost::io::converter<boost::reference_wrapper<std::ostream>>::extern_type, State=boost::io::converter<boost::reference_wrapper<std::ostream>>::state_type, Alloc=std::allocator<char> ] h:\source code\Turkanis\iostream\boost\io\io_traits.hpp(38) : see reference to class template instantiation 'boost::io::converter<Resource>' being compiled with [ Resource=boost::reference_wrapper<std::ostream> ] h:\source code\Turkanis\iostream\boost\io\io_traits.hpp(47) : see reference to class template instantiation 'boost::io::detail::member_char_type<T>' being compiled with [ T=my_converter ] h:\source code\Turkanis\iostream\boost\io\stream_facade.hpp(81) : see reference to class template instantiation 'boost::io::char_type<T>' being compiled with [ T=my_converter ] Regards, George. "Jonathan Turkanis" <technews@kangaroologic.com> wrote in message news:chsl49$e6v$1@sea.gmane.org...
"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:chr8dl$8rr$1@sea.gmane.org...
Jonathan,
How do I do locale specific Unicode-to-Multibyte conversion. As I understand the current design I need to do something like this:
filtering_wostream out;
out.push(unicode_to_multibyte_output_filter<__wchar_t>(locale));
out.push(cout);
out << L"This gets converted to multibyte characters according to the current locale." << endl;
The problem is really that I haven't finished writing the components that make this easy. For this, I apologize. (See User's Guide-->Code Conversion.)
The *easiest* way to do what you want should be this:
converting_ostream out;
out.push(std::cout);
out.imbue(some_locale);
out << L"This gets converted to multibyte characters";
This will also allow any number of wide- and narrow- character filters before cout, as long as all the wide ones come first. But as I said, this component is not ready yet.
The current way to do this is extremely verbose:
typedef converter< reference_wrapper<std::ostream> > my_converter;
stream_facade<my_converter> out;
out.open(my_converter(ref(std::cout), whatever_locale));
out << L"Hello Wide World!\n";
Reference wrapper is required here because std::ostream is non-copyable. The above code really should be
typedef converter<std::ostream> my_converter;
stream_facade<my_converter> out;
out.open(my_converter(std::cout, whatever_locale));
out << L"Hello Wide World!\n";
but I forgot to make the template converter do the reference-wrapping automatically. (I'll fix this.)
There's one last problem. The file boost/io/detail/streambufs/indirect_streambufs contains this code:
enum {
    f_open = 1,
    f_input_closed = f_open << 1,
    f_output_closed = f_input_closed << 1,
    f_output_buffered = f_output_closed
};
This should be
enum {
    f_open = 1,
    f_input_closed = f_open << 1,
    f_output_closed = f_input_closed << 1,
    f_output_buffered = f_output_closed << 1
};
I'm not sure if it will make a difference in the above case.
To summarize: what you want to do is perfectly reasonable and should be easy, but I haven't finished developing all the components. The current method is unnecessarily verbose, but should work.
Thanks again!
Jonathan
But the second to last line will generate an error with the present design because I am attaching a narrow character stream to a wide character filtering ostream. Do I need to use the boost::io::copy() paradigm? :(-
Regards,
George.

"George M. Garner Jr." <gmgarner@erols.com> wrote in message news:ci548u$69a$1@sea.gmane.org...
Jonathan,
typedef converter< reference_wrapper<std::ostream> > my_converter;
stream_facade<my_converter> out;
out.open(my_converter(ref(std::cout), whatever_locale));
out << L"Hello Wide World!\n";
This doesn't compile. I get the following error:
<snip> Thanks for pointing out this error. I don't think it makes sense for me to track down its source, since I've changed the resource_wrapping infrastructure a bit and the code compiles fine for me on VC7.1. Best Regards, Jonathan

"Jonathan Turkanis" wrote: _______________________________________________
5. io/io_traits.hpp:
#define BOOST_SELECT_BY_SIZE_MAX_CASE 9
==>
#ifndef BOOST_SELECT_BY_SIZE_MAX_CASE
# define BOOST_SELECT_BY_SIZE_MAX_CASE 9
#endif
The docs for select-by-size are here: http://tinyurl.com/3vjwf. Maybe I should add it to the current lib docs.
The usage is supposed to be
#define BOOST_SELECT_BY_SIZE_MAX_CASE xxx
#include <boost/utility/select_by_size.hpp>
Including the header undef's BOOST_SELECT_BY_SIZE_MAX_CASE.
I mean someone may define the value BOOST_SELECT_BY_SIZE_MAX_CASE on the command line and it would be ignored. _______________________________________________
7. windows_posix_config.hpp: I find the macros BOOST_WINDOWS/BOOST_POSIX too general for iostreams. Either they should be in Boost.Config or something as BOOST_IO_WINDOWS should be used.
That's one of the other changes I made late. I had been borrowing BOOST_WINDOWS/BOOST_POSIX from Boost.Filesystem, but didn't realize until recently that cygwin users are supposed to be able to pick either configuration. That definitely doesn't work for my library.
It may also fail on exotics as AS400.
Do you mean neither configuration will work in this case? I have to figure out graceful ways for mmap and file descriptors to fail on unsupported systems.
BOOST_IO_DECL: this should be in Boost.Config as BOOST_DECLSPEC. Other libraries (regex) have their own macros and it is all one big mess.
But it uses BOOST_IO_DYN_LINK. Are you saying users shouldn't be able to link dynamically to selected boost libraries? Maybe it could be BOOST_DECLSPEC(IO).
I don't know. I just have some strange feeling something could go wrong. Maybe older Macs could have problems here. This should be solved in Boost.Config.
_______________________________________________
My opinion is BOOST_DECL should exist in Boost.Config. _______________________________________________
21. test lzo.cpp refers to non-existing boost/io/lzo.hpp.
Right. I got rid of it because of copyright issues.
That's unfortunate. Maybe it could be there as example.
22. io/file.hpp: what is exactly reason to have pimpl in basic_file_resource class?
Exception safety....
So, to answer your question: basic_file_resource wraps a basic_filebuf, which is generally non-copyable, so I used a shared_ptr.
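The shared_ptr technique Jonathan describes can be sketched generically; the class names here are made up for illustration (basic_file_resource itself wraps a basic_filebuf, and the library uses boost::shared_ptr where std::shared_ptr appears below):

```cpp
#include <cassert>
#include <memory>

// Stand-in for basic_filebuf: non-copyable, so it cannot live directly
// inside a resource class that must be copyable.
class noncopyable_buf {
public:
    noncopyable_buf() : value(0) { }
    int value;
private:
    noncopyable_buf(const noncopyable_buf&);            // not implemented
    noncopyable_buf& operator=(const noncopyable_buf&); // not implemented
};

// Copyable handle: copies share one impl via shared_ptr, the same
// trick basic_file_resource uses with its wrapped filebuf.
class resource_handle {
public:
    resource_handle() : pimpl_(new noncopyable_buf) { }
    int& value() { return pimpl_->value; }
private:
    std::shared_ptr<noncopyable_buf> pimpl_;
};

// Demonstration: a copy of the handle sees writes through the original.
inline int copied_value()
{
    resource_handle a;
    resource_handle b = a;   // copy shares the impl
    a.value() = 42;
    return b.value();
}
```

This is also why an intrusive_ptr could substitute here, as Pavel suggests: only the shared-ownership behavior matters, not the particular smart pointer.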
Hmmm. Would it be possible to use intrusive_ptr here? _______________________________________________
23. io/memmap_file.hpp: why is pimpl in mapped_file_resource class?
To avoid having to include operating system headers from a header file.
But io/detail/memmap.hpp is included and must be included anyway. Or do I miss something? /Pavel

"Pavel Vozenilek" <pavel_vozenilek@hotmail.com> wrote in message news:chru8b$ik8$1@sea.gmane.org...
"Jonathan Turkanis" wrote:
#define BOOST_SELECT_BY_SIZE_MAX_CASE 9
==>
#ifndef BOOST_SELECT_BY_SIZE_MAX_CASE
# define BOOST_SELECT_BY_SIZE_MAX_CASE 9
#endif
The usage is supposed to be
#define BOOST_SELECT_BY_SIZE_MAX_CASE xxx
#include <boost/utility/select_by_size.hpp>
Including the header undef's BOOST_SELECT_BY_SIZE_MAX_CASE.
I mean someone may define the value BOOST_SELECT_BY_SIZE_MAX_CASE in commandline and it would be ignored.
I see; the docs say BOOST_SELECT_BY_SIZE_MAX_CASE should be defined right before use. The command line seems a bit too far removed to make sure that enough cases are covered, and

#ifndef BOOST_SELECT_BY_SIZE_MAX_CASE
# define BOOST_SELECT_BY_SIZE_MAX_CASE 9
#endif

seems like unnecessary work for users.
_______________________________________________
7. windows_posix_config.hpp: I find the macros BOOST_WINDOWS/BOOST_POSIX too general for iostreams. Either they should be in Boost.Config or something as BOOST_IO_WINDOWS should be used.
That's one of the other changes I made late. I had been borrowing BOOST_WINDOWS/BOOST_POSIX from Boost.Filesystem, but didn't realize until recently that cygwin users are supposed to be able to pick either configuration. That definitely doesn't work for my library.
It may also fail on exotics as AS400.
Do you mean neither configuration will work in this case? I have to figure out graceful ways for mmap and file descriptors to fail on unsupported systems.
I don't know. I just have some strange feeling something could go wrong. Maybe older Macs could have problems here.
In theory, I'd like to provide implementations for all boost-supported platforms that support file-mapping and have runtime "can't map file" errors for other platforms. Currently, following Boost.Filesystem, BOOST_IO_POSIX is defined by a process of elimination. On unsupported systems users will get compiler errors such as "can't open file 'sys/mman.h'; no such file."
This should be solved in Boost.Config.
BOOST_IO_DECL: this should be in Boost.Config as BOOST_DECLSPEC. Other libraries (regex) have their own macros and it is all one big mess.
But it uses BOOST_IO_DYN_LINK. Are you saying users shouldn't be able to link dynamically to selected boost libraries? Maybe it could be BOOST_DECLSPEC(IO).
_______________________________________________
My opinion is BOOST_DECL should exist in Boost.Config.
I guess this should be taken up with John Maddock.
_______________________________________________
21. test lzo.cpp refers to non-existing boost/io/lzo.hpp.
Right. I got rid of it because of copyright issues.
That's unfortunate. Maybe it could be there as example.
Doesn't the boost license policy cover examples, too?
22. io/file.hpp: what is exactly reason to have pimpl in basic_file_resource class?
Exception safety....
So, to answer your question: basic_file_resource wraps a basic_filebuf, which is generally non-copyable, so I used a shared_ptr.
Hmmm. Would it be possible to use intrusive_ptr here?
I think so. Why do you think it's important? ___________________________________________
23. io/memmap_file.hpp: why is pimpl in mapped_file_resource class?
To avoid having to include operating system headers from a header file.
But io/detail/memmap.hpp is included and must be included anyway. Or do I miss something?
I misread your question. detail/memmap.hpp uses the pimpl idiom to avoid including operating system headers. for boost/io/mapped_file, the reason for shared_ptr is the same as for bost/io/file.hpp.
/Pavel
Jonathan

"Jonathan Turkanis" wrote: _______________________________________________
21. test lzo.cpp refers to non-existing boost/io/lzo.hpp.
Right. I got rid of it because of copyright issues.
That's unfortunate. Maybe it could be there as example.
Doesn't the boost license policy cover examples, too?
I have absolutely no clue. I think LZO's quality is so high that it should be supported. _______________________________________________
22. io/file.hpp: what is exactly reason to have pimpl in basic_file_resource class?
Exception safety....
So, to answer your question: basic_file_resource wraps a basic_filebuf, which is generally non-copyable, so I used a shared_ptr.
Hmmm. Would it be possible to use intrusive_ptr here?
I think so. Why do you think it's important?
No thread locking, one dynamic allocation less, less space eaten, more cache coherence. Could be handy if one uses a lot of filters of fine granularity.

----------------------

One more possible optimization for Borland: an empty class in BCB by default has sizeof == 8 bytes. While this can be changed by pragmas one may also write:

class empty_one {
#if BOOST_WORKAROUND(__BORLANDC__, BOOST_TESTED_AT(0x564))
    char dummy_; // BCB would by default create empty class with size == 8
#endif
};

It is not possible to do better with Borland than this. If there are empty classes in Iostreams this should be applied. I'll try to search the sources for where this could be done. /Pavel

Jonathan Turkanis <technews <at> kangaroologic.com> writes: [...]
1. scope_guard.hpp: maybe this file could be moved into boost/detail and used by multi_index + iostreams until something gets boostified.
I'm for this. Unfortunately, my simplified scope_guard isn't working on CW8.3 (though it passes the regression tests I've written for it.) So I'll probably use Joaquín's.
Jonathan, does your CW 8.3 problem have to do with premature execution of the guard code? If so, then this is due not to your code, but to a compiler bug concerning the lifetime of temporary objects bound to references. See the following for a description: http://lists.boost.org/MailArchives/boost/msg65882.php This problem happens only when the "ISO C++ Template Parser" mode is on, so it can be avoided by explicitly setting the mode off with pragmas. Go to boost/multi_index_container.hpp and search for the two occurrences of parse_mfunc_templ in lines 77 and 531. Hope this helps, Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

[...]
1. scope_guard.hpp: maybe this file could be moved into boost/detail and used by multi_index + iostreams until something gets boostified.
Jonathan Turkanis <technews <at> kangaroologic.com> writes:
I'm for this. Unfortunately, my simplified scope_guard isn't working on CW8.3 (though it passes the regression tests I've written for it). So I'll probably use Joaquín's.
Jonathan, does your CW 8.3 problem have to do with premature execution of the guard code?
Yes, exactly!
if so, then this is due not to your code, but to a compiler bug about the lifetime of temporary objects bound to references. See the following for a description:
Thanks.
This problem happens only when the "ISO C++ Template Parser" mode is on, so it can be avoided by explicitly setting the mode off with pragmas. Go to boost/multi_index_container.hpp and search for the two occurrences of parse_mfunc_templ in lines 77 and 531.
Hope this helps,
Very much.
Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
Jonathan
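For readers following this exchange, here is a minimal sketch of the scope-guard idiom under discussion (illustrative only, written in modern C++, and not Jonathan's or Joaquín's actual implementation): an RAII object that runs a rollback action unless dismissed. The CW 8.3 bug Joaquín describes bites the classic formulation in which the guard is a temporary bound to a const reference and the compiler destroys it (firing the rollback) too early; holding a named guard object, as below, sidesteps that class of problem.

```cpp
#include <vector>

// Minimal scope guard sketch: runs the rollback action in its
// destructor unless dismiss() was called first.
template <typename F>
class scope_guard {
public:
    explicit scope_guard(F rollback) : rollback_(rollback), active_(true) {}
    ~scope_guard() { if (active_) rollback_(); }
    void dismiss() { active_ = false; }
    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;
private:
    F rollback_;
    bool active_;
};

// Usage: roll back a push_back unless the operation succeeds. The
// guard is a named local, so its destructor runs at end of scope,
// not at the end of the full-expression.
void append_transactionally(std::vector<int>& v, int value, bool succeed) {
    v.push_back(value);
    auto undo = [&v] { v.pop_back(); };
    scope_guard<decltype(undo)> guard(undo);
    if (succeed)
        guard.dismiss(); // commit: keep the element
}   // otherwise ~scope_guard pops the element again
```

The dismiss flag is what distinguishes this from a plain destructor: the cleanup is conditional on the surrounding code reaching the commit point.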

| -----Original Message-----
| From: boost-bounces@lists.boost.org
| [mailto:boost-bounces@lists.boost.org] On Behalf Of Jeff Garland
| Sent: 29 August 2004 01:10
| To: boost
| Subject: [boost] IOStreams formal review start
|
| http://home.comcast.net/~jturkanis/iostreams/

I have returned from holiday and finally caught up with all the extensive (gruelling!) review postings. I now feel I need another holiday ;-) In view of the erudite comments from experts, I will be brief (brevity being the soul of wit).

What is your evaluation of the design? Sound.

What is your evaluation of the implementation? Sound.

What is your evaluation of the documentation? Much better than average.

What is your evaluation of the potential usefulness of the library? Very useful.

Did you try to use the library? With what compiler? Did you have any problems? Have worked on/with some parts previously.

How much effort did you put into your evaluation? A re-reading.

Are you knowledgeable about the problem domain? Slightly.

Do you think the library should be accepted as a Boost library? Yes, definitely.

After considering the overlap with more_io4, I am less worried that there is a serious problem. It is not clear to me that there are any real clashes - except perhaps between the authors ;-) If there is duplication, and the authors can't agree, then perhaps we can simply leave users to decide which they prefer.

PS It was disappointing to see the discussion on newl/newline going over the same ground again. As a group, we have failed to come to a conclusion and document our conclusions, rationale and decision on this minor but ubiquitous issue. My recollection is that the case was not overwhelming, but that it was the 'Right Thing To Do', and that we should have concluded the matter there and then with a mini-review.

"Paul A Bristow" <pbristow@hetp.u-net.com> wrote in message news:E1C69EL-00060r-Ci@he203war.uk.vianw.net...
| -----Original Message-----
| From: boost-bounces@lists.boost.org
| [mailto:boost-bounces@lists.boost.org] On Behalf Of Jeff Garland
| Sent: 29 August 2004 01:10
| To: boost
| Subject: [boost] IOStreams formal review start
|
| http://home.comcast.net/~jturkanis/iostreams/
|
Thanks for the review.
I have returned from holiday and finally caught up with all the extensive (gruelling!)
Tell me about it ;-)
review postings.
I now feel I need another holiday ;-)
<snip brief answers>
Do you think the library should be accepted as a Boost library? Yes definitely.
Great!
After considering the overlap with more_io4, I am less worried that there is a serious problem.
It is not clear to me that there are any real clashes - except perhaps between the authors ;-)
For my part, I'm sorry if I was unprofessional in my comments comparing the libraries.
If there is duplication, and the authors can't agree, then perhaps we can simply leave users to decide which they prefer.
I hope we can come to some agreement, otherwise users will be confused. Thanks again. Jonathan
participants (14)
- Carlo Wood
- Daryle Walker
- George M. Garner Jr.
- Jeff Flinn
- Jeff Garland
- Joaquin M Lopez Munoz
- Jonathan Turkanis
- Matthias Schabel
- Neal D. Becker
- Noah
- Paul A Bristow
- Pavel Vozenilek
- Rob Stewart
- Thorsten Ottosen