Hello. I have recently started reworking a network server project from a thread-per-connection model (with blocking/synchronous network I/O) to asio and an asynchronous operation model (where the number of threads should depend on hardware resources and be decoupled from software design constraints). The problem I have with this is that it forces you to write a lot of boilerplate code for all those "completion handlers". Take this very simple example of code that uses the one-thread-per-connection synchronous I/O model:

// synchronously resolves, connects, throws on error
connection con("1.2.3.4");
{
    // application "packet", usually known as "record"
    // uses "con" and starts an application packet with a given type
    packet pack(con, PacketTypeRequest);

    // serializes the given data portably and synchronously sends it
    // throws on error
    pack << uint32(10) << std::string("/home/user") << uint8(3);
}
// calls packet::~packet(), which signals end of packet in the "con" stream

Now, to transform even this very simple example into an asynchronous version (and I repeat, this is a very simple example; much more complex examples write "packets" of dozens of fields, of which the last one may have "infinite" length, meaning you cannot know its size in advance to put it in the stream, so you just have to send as much of it as you can and then signal the end of the "packet"), one would need:

connection con; // breaks the invariant of "con" always being connected

// should asynchronously resolve and connect
con.async_connect("1.2.3.4", handle_connect);
// break code flow here

// handle_connect()
// breaks the invariant of allowing "serialization" only after the packet
// type has been sent over the network
packet pack(con);
pack.async_serialize(uint32(10), handle_serialize1);
// return to caller

// handle_serialize1
pack.async_serialize(std::string("/home/user"), handle_serialize2);
// return to caller

// handle_serialize2
pack.async_serialize(uint8(3), handle_serialize3);
// return to caller
// handle_serialize3
// breaks the RAII-ness of the original packet, which automatically
// signaled "end" from its destructor
pack.async_end(handle_endpacket);
// return to caller

And imagine that the original code was just one small function in a big class; each such small function now transforms into dozens of smaller functions. The code explosion is huge. I am curious about the coding practices some of you employ to solve these issues (to still have code that is checked at compile time as much as possible via strong invariants and RAII idioms, and to not have to write millions of small functions). Some of the things I have thought of that seem to address them:

- Instead of packets being serialized ad hoc, have structures encapsulating the network packets, move the serialization code into them, and let them deal with all the small functions (they could use a buffer to cache the serialization of the fixed fields and async_write that buffer's contents in a single operation). However, this means an additional copy of the data compared to the original code, and it mostly just moves the problem: instead of having many small functions in the high-level code, you have them in the lower-level packet structures' serialization code (though the output buffer can eliminate some of them).

- Use expression templates (or some similar technique) to do a kind of "lazy evaluation". Basically, keep syntax similar to the synchronous solution:

pack << uint32(10) << std::string("/home/user") << uint8(3);

but instead of doing network I/O, this code would enqueue the needed serialization actions, keep all those completion handlers internally, and completely hide those details from the user. The user just calls pack.async_write(user_handler), and "user_handler" is called after all the serializations have been asynchronously written.

- If this were C instead of C++, we could use setjmp/longjmp to transparently (to the user) simulate synchronous operation while asynchronous operation happens behind the scenes. The user writes
code such as:

pack << uint32(10) << std::string("/home/user") << uint8(3);

but what it does is: on each op<< (each asynchronous serialization), the code issues an async_write() with an internal handler, saves the current context (setjmp), then longjmp()s back to the previously saved context in the main event loop; when the async_write completes, the handler longjmp()s back to restore the original user context and continue with the next op<<, and so on. However, this does not work in C++, because the code path jumped over by longjmp may have exceptions being thrown/caught, which the standard says is UB. Not to mention that, from what I could gather on the Internet, some C++ compilers call the destructors of automatic objects when you longjmp "back", thus not leaving the original context untouched (which is exactly what I need preserved).

-- 
Mihai RUSU
  "Linux is obsolete" -- AST