[IOStreams] windowing in mapped_file

I have, on occasion, had a need to read large files. Sometimes boost::iostreams::mapped_file works just fine, as long as I have enough memory space available to map the file in its entirety. Sometimes I don't.

I have been thinking of making io::mapped_file support the notion of memory-mapped files with sliding windows. I have tried a little bit of code, but without much success, because I hadn't thought through the design very well (and part of that I blame on my current position on the learning curve of developing boost libraries).

In particular, I tried adding a couple of parameters to the mapped_file_params struct, and then coding support for them in the mapped_file_source class. This doesn't get me very far toward what I want, for several reasons, mostly because it doesn't address how the upper layers of iostreams will use the device. The data(), begin(), and end() member functions return a simple char* (the iterator type is just typedef char *).

I'd like to take another approach, which I think would be better than my initial attempt. I'd like to do something approximating the following:

    template<class W = default_window_manager>
    class mapped_file_source {
    public:
        typedef typename W::iterator iterator_type;
        // ...
        iterator_type begin() const;
        iterator_type end() const;
        iterator_type data() const;
        // ...
    private:
        W _manager;
    };

Where 'class W' would have to model something like the following:

    class window_manager {
    public:
        struct iterator {
            operator++() {
                if(current_ptr+1 >= current_map_end)
                    // remap file to the next block, re-adjust pointers
            }
            operator--() {
                if(current_ptr-1 <= current_map_begin)
                    // remap file to the previous block, re-adjust pointers
            }
            // operators for +,- might also need to do something non-trivial
        };
    private:
        // member functions to perform re-mapping
        current_ptr;
        current_map_begin;
        current_map_end;
    };

Of course, the above does not deal with random access at all.

'default_window_manager' could just provide a windowed memory map with a single window that is exactly the size of the file, to make mapped_file behave as it currently does.

It seems to me that this could be generally useful, but perhaps the reality is that I need to be saved from myself. Comments? I apologize in advance if this has been hashed over before. My searches of the archives turned up nothing, but the internet is too big a place to prove a negative.

-- Benjamin A. Collins <ben.collins@acm.org> http://bloggoergosum.us
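To make the sliding-window idea concrete, here is a minimal sketch of a linear, read-only reader that remaps the file one window at a time. It is not part of Boost.Iostreams: the class name sliding_window_reader, its next() member, and the window_pages parameter are all hypothetical, and the code assumes a POSIX system (open, fstat, mmap, munmap). It only covers forward reads, not the bidirectional iterator described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not a Boost class): reads a file byte by byte through
// a read-only mapping that is moved forward one window at a time.
class sliding_window_reader {
public:
    sliding_window_reader(const std::string& path, std::size_t window_pages) {
        assert(window_pages > 0);
        fd_ = ::open(path.c_str(), O_RDONLY);
        if (fd_ < 0) throw std::runtime_error("open failed");
        struct stat st;
        if (::fstat(fd_, &st) != 0) {
            ::close(fd_);
            throw std::runtime_error("fstat failed");
        }
        file_size_ = static_cast<std::size_t>(st.st_size);
        // mmap offsets must be page-aligned, so a window is a whole number of pages.
        window_ = window_pages * static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    }
    ~sliding_window_reader() { unmap(); ::close(fd_); }
    sliding_window_reader(const sliding_window_reader&) = delete;
    sliding_window_reader& operator=(const sliding_window_reader&) = delete;

    // Stores the next byte in 'out'; returns false at end of file.
    bool next(char& out) {
        if (pos_ >= file_size_) return false;
        if (pos_ >= map_offset_ + map_len_) remap(pos_);  // walked off the window
        out = base_[pos_ - map_offset_];
        ++pos_;
        return true;
    }

private:
    // Drops the current window and maps the one that contains 'offset'.
    void remap(std::size_t offset) {
        unmap();
        map_offset_ = offset - offset % window_;
        map_len_ = std::min(window_, file_size_ - map_offset_);
        void* p = ::mmap(nullptr, map_len_, PROT_READ, MAP_PRIVATE, fd_,
                         static_cast<off_t>(map_offset_));
        if (p == MAP_FAILED) {
            map_len_ = 0;
            throw std::runtime_error("mmap failed");
        }
        base_ = static_cast<const char*>(p);
    }
    void unmap() {
        if (base_) {
            ::munmap(const_cast<char*>(base_), map_len_);
            base_ = nullptr;
        }
    }

    int fd_ = -1;
    std::size_t file_size_ = 0;
    std::size_t window_ = 0;
    std::size_t pos_ = 0;         // absolute position of the next byte
    std::size_t map_offset_ = 0;  // file offset where the current window starts
    std::size_t map_len_ = 0;     // length of the current window (0 = none yet)
    const char* base_ = nullptr;
};
```

A caller would loop with `char c; while (reader.next(c)) { ... }`. Only one window is mapped at any time, so the address space in use is bounded by the window size rather than the file size.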

Hi! Benjamin Collins schrieb:
    class window_manager {
    public:
        struct iterator {
            operator++() {
                if(current_ptr+1 >= current_map_end)
                    // remap file to the next block, re-adjust pointers
            }
            operator--() {
                if(current_ptr-1 <= current_map_begin)
                    // remap file to the previous block, re-adjust pointers
            }
How is that different from using "buffers"? The regular std::fstream shows just this behavior: when reading it fills a buffer; when the end of the buffer is reached it loads the next part of the file into memory, and so on. The only difference is that writing to a memory location does not implicitly change the file content. But do you need this kind of random access for writing? Frank
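The buffer-refill behavior described here can be sketched with a small custom stream buffer. This is only an illustration of the mechanism, not how any particular std::fstream implementation is written: the class chunked_buf and its refills() counter are hypothetical names, and the "file" is stood in for by another std::streambuf.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical illustration: a read-only stream buffer that pulls fixed-size
// chunks from an underlying source whenever its own buffer runs out.
class chunked_buf : public std::streambuf {
public:
    chunked_buf(std::streambuf* source, std::size_t chunk_size)
        : source_(source), buffer_(chunk_size) {
        assert(chunk_size > 0);
    }

    // Number of times the buffer has been refilled from the source.
    std::size_t refills() const { return refills_; }

protected:
    // Called by the stream when the get area is exhausted.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        std::streamsize n = source_->sgetn(
            buffer_.data(), static_cast<std::streamsize>(buffer_.size()));
        if (n <= 0) return traits_type::eof();  // source is exhausted
        ++refills_;
        // Point the get area at the freshly loaded chunk.
        setg(buffer_.data(), buffer_.data(), buffer_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::streambuf* source_;
    std::vector<char> buffer_;
    std::size_t refills_ = 0;
};
```

Wrapping it as `std::istream in(&buf);` gives ordinary stream reads. The reader never sees the chunk boundaries, which is the same transparency the proposed window iterator is after.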

On 9/3/07, Frank Birbacher <bloodymir.crap@gmx.net> wrote:
Hi!
How is that different from using "buffers"? The regular std::fstream shows just this behavior: when reading it fills a buffer; when the end of the buffer is reached it loads the next part of the file into memory, and so on. The only difference is that writing to a memory location does not implicitly change the file content. But do you need this kind of random access for writing?
I'm not concerned with random access; I'm concerned with doing really fast reads and writes of large files, which can just be linear reads and writes as far as I'm concerned.

The difference between std::fstream and what I'm proposing is performance (hopefully). std::fstream, as I understand it, uses read()/write(). mmap() provides better performance than read(), and increasingly so as your file gets larger. See here for a Solaris-oriented analysis (http://developers.sun.com/solaris/articles/read_mmap.html).

I can't find any recent benchmarks for Linux, but I think I would be surprised if it was very much different than Solaris (which wasn't the case for the old benchmarks I did find). Of course, finding out if I'm wrong is part of why I posted this message in the first place, but I don't *think* I'm wrong.

-- Benjamin A. Collins <ben.collins@acm.org> http://bloggoergosum.us
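For anyone who wants to check the read() versus mmap() claim on their own system, the two access paths can be put side by side as below. This is a sketch assuming a POSIX platform; the function names sum_with_read and sum_with_mmap and the default buffer size are hypothetical, and no timing result is implied. Both functions do the same work (summing every byte), so wrapping each call in a timer of your choice gives a like-for-like comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Sums all bytes of a file using read() into a fixed-size buffer.
std::uint64_t sum_with_read(const std::string& path,
                            std::size_t buffer_size = 1 << 16) {
    assert(buffer_size > 0);
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");
    std::vector<unsigned char> buf(buffer_size);
    std::uint64_t sum = 0;
    ssize_t n;
    while ((n = ::read(fd, buf.data(), buf.size())) > 0) {
        for (std::size_t i = 0; i < static_cast<std::size_t>(n); ++i) sum += buf[i];
    }
    ::close(fd);
    if (n < 0) throw std::runtime_error("read failed");
    return sum;
}

// Sums all bytes of a file by mapping the whole file read-only.
std::uint64_t sum_with_mmap(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");
    struct stat st;
    if (::fstat(fd, &st) != 0) {
        ::close(fd);
        throw std::runtime_error("fstat failed");
    }
    std::size_t size = static_cast<std::size_t>(st.st_size);
    std::uint64_t sum = 0;
    if (size > 0) {  // mmap rejects zero-length mappings
        void* p = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            ::close(fd);
            throw std::runtime_error("mmap failed");
        }
        const unsigned char* bytes = static_cast<const unsigned char*>(p);
        for (std::size_t i = 0; i < size; ++i) sum += bytes[i];
        ::munmap(p, size);
    }
    ::close(fd);
    return sum;
}
```

Note that sum_with_mmap maps the whole file at once, which is exactly the case that stops working when the file is larger than the available address space.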

Benjamin Collins wrote:
On 9/3/07, Frank Birbacher <bloodymir.crap@gmx.net> wrote:
Hi!
How is that different from using "buffers"? The regular std::fstream shows just this behavior: when reading it fills a buffer; when the end of the buffer is reached it loads the next part of the file into memory, and so on. The only difference is that writing to a memory location does not implicitly change the file content. But do you need this kind of random access for writing?
I'm not concerned with random access; I'm concerned with doing really fast reads and writes of large files, which can just be linear reads and writes as far as I'm concerned.
The difference between std::fstream and what I'm proposing is performance (hopefully). std::fstream, as I understand it, uses read()/write(). mmap() provides better performance than read(), and increasingly so as your file gets larger. See here for a Solaris-oriented analysis (http://developers.sun.com/solaris/articles/read_mmap.html).
I can't find any recent benchmarks for Linux, but I think I would be surprised if it was very much different than Solaris (which wasn't the case for the old benchmarks I did find).
IIRC, at least under Windows, file i/o uses memory mapped files for all file access. I think there were some threads discussing this in the spirit mailing list. Have you looked to see if the memory_mapping facilities in the interprocess library could meet your needs? Jeff Flinn

On 9/4/07, Jeff Flinn <TriumphSprint2000@hotmail.com> wrote:
Have you looked to see if the memory_mapping facilities in the interprocess library could meet your needs?
Jeff Flinn
The redirection in the interprocess documentation html is broken (the target redirect results in a 404 error). Is there a set of working interprocess docs mirrored anywhere? I've tried off and on today to generate the latest, but bjam fails on unknown doxygen features. -- Benjamin A. Collins <ben.collins@acm.org> http://bloggoergosum.us

On 9/4/07, Benjamin Collins <ben.collins@acm.org> wrote:
On 9/4/07, Jeff Flinn <TriumphSprint2000@hotmail.com> wrote:
Have you looked to see if the memory_mapping facilities in the interprocess library could meet your needs?
Jeff Flinn
The redirection in the interprocess documentation html is broken (the target redirect results in a 404 error). Is there a set of working interprocess docs mirrored anywhere? I've tried off and on today to generate the latest, but bjam fails on unknown doxygen features.
I should have looked a little harder before posting: http://lists.boost.org/Archives/boost/2007/07/125017.php An online copy of the interprocess docs is referenced in the above post. -- Benjamin A. Collins <ben.collins@acm.org> http://bloggoergosum.us

On 9/4/07, Jeff Flinn <TriumphSprint2000@hotmail.com> wrote:
Have you looked to see if the memory_mapping facilities in the interprocess library could meet your needs?
I have now, and it doesn't seem to me that interprocess really gets me anything that iostreams::mapped_file doesn't already provide. interprocess mapped files allow you to map a portion of a file rather than the whole thing, but not to slide the window transparently. Additionally, by using interprocess, I'd lose the ability to set up chains and filters provided by iostreams. For example, one thing I'd like to do is read very large zlib-compressed files. iostreams filtering_stream is a really nice way to do that. -- Benjamin A. Collins <ben.collins@acm.org> http://bloggoergosum.us
participants (3)
- Benjamin Collins
- Frank Birbacher
- Jeff Flinn