
On Wed, May 04, 2005 at 06:20:06PM +0100, Iain K. Hanson wrote:
Nathan Myers wrote:
Another goal is a zero-copy streambuf whose buffer is an mmap page that can be read into or written from without actually copying any bytes from kernel to user space, or back.
You will still at a minimum have another kernel to device copy as previously stated. Another problem is that mmap files *I think* need to be seek()able.
When speaking of zero-copy I/O, it is conventional not to count the act of moving bits between the wire and memory. In principle, it's true, one could conceive of operating on the bytes in real time without ever storing them. However, most people start and stop counting copies at the point where the data has landed in a kernel buffer, ready to DMA to or from a device.

To mmap a file, it must be seekable, but that's not what I was describing. On NetBSD as on Linux, if a page of memory has been obtained via "anonymous" mmap, it is not actually mapping a file; it's just a page of physical memory handed over to the caller to write in, which may be returned to the system at any time, independently of any other page. (On some systems, e.g. Solaris, you pretend to map /dev/zero, but that's just for tidiness.)

Under UVM, if you have a page or run of pages mapped, and pass a pointer to the beginning of it to a system call, the kernel can claim those physical memory pages and map them into kernel space as regular buffers. Or, it can pick kernel buffer pages and expose them to that range in your address space, in place of whatever was there, all without copying bytes. What you see there is what the kernel wants you to see. It looks as if it copied from its buffer to yours, but you are really seeing its actual buffer.

This is what is normally described as zero-copy I/O. It's quite an elegant way to rescue the apparently archaic read() and write() model of I/O from ignominy. The only problem is that fooling around with page maps can itself be quite expensive on a multiprocessor system.

Nathan Myers
ncm@cantrip.org
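For readers who want to see the user-side half of this, here is a minimal sketch in C of obtaining one page through anonymous mmap and handing it to read() as a buffer. The helper names (page_buffer_acquire, page_buffer_fill, page_buffer_release) are hypothetical, made up for illustration. Note the limits of the sketch: it only guarantees that the buffer is page-aligned and page-sized. Whether the kernel then remaps pages instead of copying is entirely the kernel's decision, and nothing in this code requests or verifies it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Some BSD-derived systems spell the flag MAP_ANON. */
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical helper: get one page of anonymous memory (no file behind it).
 * On success returns the page and stores its size in *len_out;
 * on failure returns NULL. */
static void *page_buffer_acquire(size_t *len_out)
{
    assert(len_out != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)page;
    return p;
}

/* Hypothetical helper: an ordinary read() into the page. Because the buffer
 * is exactly one aligned page, a kernel that supports page remapping is free
 * to use it here; a kernel that does not will simply copy. */
static ssize_t page_buffer_fill(int fd, void *p, size_t len)
{
    return read(fd, p, len);
}

/* Hypothetical helper: return the page to the system, independently of any
 * other page the process holds. */
static int page_buffer_release(void *p, size_t len)
{
    return munmap(p, len);
}
```

A streambuf built along these lines would call page_buffer_acquire once per buffer, use page_buffer_fill in its underflow path, and call page_buffer_release when the buffer is retired.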