Re: [boost] Re: (Another) socket streams library

5 May 2005

On Wed, May 04, 2005 at 06:20:06PM +0100, Iain K. Hanson wrote:
> ...
> Nathan Myers wrote:
> > ...
> > Another goal is a zero-copy streambuf whose buffer is an mmap
> > page that can be read into or written from without actually
> > copying any bytes from kernel to user space, or back.
> You will still at a minimum have another kernel to device copy as
> previously stated. Another problem is that mmap files *I think* need
> to be seek()able.
When speaking of zero-copy I/O, it is conventional not to count the 
act of moving bits between the wire and memory.  In principle, it's 
true, one could conceive of operating on the bytes in real time 
without ever storing them.  However, most people start and stop 
counting copies at the kernel buffer: the point where incoming data 
has just been DMAed in from a device, or where outgoing data is 
ready to be DMAed out to one.

To mmap a file, it must be seekable, but that's not what I was 
describing.  On NetBSD as on Linux, a page of memory obtained via
"anonymous" mmap does not map any file at all; it is just a page of
physical memory handed over to the caller to write in, which may be
returned to the system at any time, independently of any other page.
(On some systems, e.g. Solaris, you pretend to map /dev/zero, but
that's just for tidiness.)

Under UVM (NetBSD's virtual memory system), if you have a page or 
run of pages mapped and pass a pointer to the beginning of it to a 
system call, the kernel can claim those physical pages and map them 
into kernel space as regular buffers.  Or it can take kernel buffer 
pages and map them into that range of your address space, in place 
of whatever was there, all without copying bytes.  What you see 
there is what the kernel wants you to see: it looks as if it copied 
from its buffer to yours, but you are really looking at its actual 
buffer.  This is what is normally described as zero-copy I/O.

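It's quite an elegant way to rescue the apparently archaic read()
and write() model of I/O from ignominy.  The only problem is that 
fooling around with page maps can itself be quite expensive on a 
multiprocessor system, since changing a mapping means invalidating 
stale TLB entries on the other CPUs that may hold cached copies of it.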

Nathan Myers
ncm@cantrip.org