Random Access to Files (RAF) - any interest ? - Boost

Random Access to Files (RAF) - any interest ?

older
Re: [boost] Is there interest in...

Slava, Alex

18 May 2006 18 May '06

5:50 p.m.

Dear Boost Community, Is there any interest in submission of a Random Access to Files (RAF) Library to Boost? Please see http://www.trukhanov.kiev.ua/RAF/ for documentation and an initial implementation. Thank you, -- Svyatoslav Trukhanov, Oleksii Ursulenko

Show replies by date

Thorsten Ottosen

18 May 18 May

7:58 p.m.

Slava, Alex wrote:

...

Dear Boost Community,

Is there any interest in submission of a Random Access to Files (RAF) Library to Boost?

I'm sure there would be interest. best regards -Thorsten

Olaf van der Spek

8:24 p.m.

On 5/18/06, Slava, Alex <raf.devel@gmail.com> wrote:

...

Dear Boost Community,

Is there any interest in submission of a Random Access to Files (RAF) Library to Boost? Please see http://www.trukhanov.kiev.ua/RAF/ for documentation and an initial implementation.

Is it possible to read a range of bytes with a single function call? Or would that require one function call per byte? And how are IO errors handled if memory mapping is used?

Slava, Alex

19 May 19 May

4:33 a.m.

...

Is it possible to read a range of bytes with a single function call? Or would that require one function call per byte?

At the moment, yes. We will implement insert() for range copy using iterators, and also low-level methods read() and write() for block copy. However, the whole sense of random access files is that once the data has been accessed, it is very likely to be in memory, and you rarely have to copy it into separate buffer. Operations can be performed with data in place with no performance degradation (operator [] is inline and incurs virtually no overhead).

...

And how are IO errors handled if memory mapping is used?

Interesting question. All errors that are received from system routines can, and actually are handled. But when it comes to errors raised by virtual memory manager, i.e I/O errors during paging occurs, or when the storage where the mapped file resides is removed, there seems no way for library code to handle it, since the error has to be handled on the level of memory management. However, we will investigate this case further. -- Svyatoslav Trukhanov, Oleksii Ursulenko

Beman Dawes

18 May 18 May

8:45 p.m.

"Slava, Alex" <raf.devel@gmail.com> wrote in message news:766dea960605181050h690b4e1eg6d367a77742e209c@mail.gmail.com...

...

Dear Boost Community,

Is there any interest in submission of a Random Access to Files (RAF) Library to Boost? Please see http://www.trukhanov.kiev.ua/RAF/ for documentation and an initial implementation.

Yes, considerable, at least from me. A few comments. * Thanks for providing timings. That is much more believable than unsupported assertions that one technique is faster than another. * A design needs to support both memory-mapped files and regular files, with the same interface, so either implementation can be used as the need dictates. A programmer should be able to write portable code that will work with either implementation. - Not all operating systems support memory-mapped files. - Files may be larger than the available address space. - Some algorithms (Lehman & Yao's B-link tree concurrent locking algorithm, IIRC) don't work well with memory-mapped files, yet memory-mapped files are ideal when concurrent locking isn't an issue. * It isn't possible for a regular file implementation to efficiently support non-constant iterators, because there is no way to tell when a buffer has been written into, and thus must be rewritten to disk. An update() function is needed. It can be a no-op on memory-mapped implementations. * Note that a regular file implementation of an STL iterator based interface may be more efficient than fread/fwrite, because unnecessary copying of data is eliminated. I'm really curious to see timings on this. * Your design uses a class template for the record type, so is limited to files containing a single type. Real files often contain a mix of types. By use of member templates, it is easy to support files containing a mixture of types. If that isn't clear, I can post an example of an experimental interface that does so. That should be enough to start discussion! --Beman

Slava, Alex

20 May 20 May

4:12 a.m.

On 5/18/06, Beman Dawes <bdawes@acm.org> wrote:

...

* Thanks for providing timings. That is much more believable than unsupported assertions that one technique is faster than another.

Thanks. Guided by lots of requests, we are working on providing much more elaborate set of benchmarks. Sorry for late answer; most of your questions hit us as a sniper rifle, and we appreciate it greatly. We divide answers into several emails, because some discussion topics could terminate faster than others. Notation: "memory-mapped file(s)" = "MMF"

...

* A design needs to support both memory-mapped files and regular files, with the same interface, so either implementation can be used as the need dictates. A programmer should be able to write portable code that will work with either implementation.

Good point. However, this question could converge to another: wherther it makes sense to use RAF because of when its efficiency cannot be exploited. Access by index or iterator does seem so convenient, but maybe such approach could cause additional overhead, unlikely though. Maybe other issues will show up. Need to implement and benchmark. Seems worth a try.

...

- Not all operating systems support memory-mapped files.

We do regard portability issues among the most important. We will surely implement RAF for the systems that do not support MMF. This is just an initial implementation to point out our view of the problem, and where we intend to go. Currently, our point of view is that the RAF library should provide functionality that can be implemented efficiently on systems that support MMF. It must be possible to implement on systems that do not support this feature, but with no efficiency guarantees (reasonable efficiency). Primarily this is because we beleave that MMF technology is the main reason for the RAF library to appear on C++ wish list, and why people would like to use it. Notably, it is difficult to find an OS not supporting them: All Unix systems conforming to SVr4, POSIX.1b (formerly POSIX.4), 4.4BSD, SUSv2 have it; in particular this includes Mac OS X and FreeBSD. MS Windows Systems have it since Win 95, and even Win CE has it. OS/2 does not support it explicitly, but it can be remedied by installation of page fault exception handler (things could be even better in Warp and Aurora). Mac OS 9.x and earlier Macintosh systems do not support file mapping, but to the best of our knowledge Apple has ceased to support those systems officially 2 or 3 years ago. -- Svyatoslav Trukhanov, Oleksii Ursulenko

Slava, Alex

4:17 a.m.

On 5/18/06, Beman Dawes <bdawes@acm.org> wrote:

...

- Files may be larger than the available address space.

This problem (Large file support) was mentioned in our "Technical Discussion" section. In this case a "mapping windows" may be implemented that would map parts of a file into memory. There are a lot of implementatin issues with this right now, but I guess this would be a much better way than implementing it without MMF. However, only after implementing both MMF emulation and "mapping windows" we will be able to say wich is better. this takes time though...

...

--Beman

-- Svyatoslav Trukhanov, Oleksii Ursulenko

Slava, Alex

4:18 a.m.

On 5/18/06, Beman Dawes <bdawes@acm.org> wrote:

...

- Some algorithms (Lehman & Yao's B-link tree concurrent locking algorithm, IIRC) don't work well with memory-mapped files, yet memory-mapped files are ideal when concurrent locking isn't an issue.

Locking doesn not work with MMF, that is true. In this particular case there may be workarounds involving MMF, such as a separate file, or shared memory segment, with flags corresponding to each record in the data file, indicating state of the record. Access to this structure should be protected with syncronization primitives. Of course, this workaround is not always appropriate. Again, the question is wherther it makes sense to use RAF when its efficiency cannot be exploited.

...

--Beman

-- Svyatoslav Trukhanov, Oleksii Ursulenko

Slava, Alex

4:28 a.m.

On 5/18/06, Beman Dawes <bdawes@acm.org> wrote:

...

* It isn't possible for a regular file implementation to efficiently support non-constant iterators, because there is no way to tell when a buffer has been written into, and thus must be rewritten to disk. An update() function is needed. It can be a no-op on memory-mapped implementations.

We probably should, something like fflush() in C library. And it may not necessarily be no-op for MMF, because there are syncronization routines for this facility, so it might make sense to use them.

...

* Note that a regular file implementation of an STL iterator based interface may be more efficient than fread/fwrite, because unnecessary copying of data is eliminated. I'm really curious to see timings on this.

Highly unlikely, but we will perform the test and post the results.

...

* Your design uses a class template for the record type, so is limited to files containing a single type. Real files often contain a mix of types. By use of member templates, it is easy to support files containing a mixture of types. If that isn't clear, I can post an example of an experimental interface that does so.

This seems to be a burning question, and could significantly affect implementation.. Could you please post an example ASAP ?

...

That should be enough to start discussion!

--Beman

-- Svyatoslav Trukhanov, Oleksii Ursulenko

Xi Wang

19 May 19 May

8:11 a.m.

Nice:-) I'd like to see more methods such as push_back, though resize could make it but not so convenient. And are there any plans for supporting insert/erase a record or a range of records? Here are some more timings, FYI. All tests were performed on WinXP SP2 with P4 3GHz HT, 1G memory, using test.cpp compiled by VC8 (cl /O2 /EHsc). Sequential tests read/write on a file sized 40M, while random tests were also limited to 1,000,000 iterations. Test type stub cstdio fstream randfile ---------------------------------------------- Sequential read 0.03 6.87 23.76 0.05 Sequential write 1.70 10.09 11.61 1.97 Random read 0.03 4.31 9.58 0.05 Random write 0.09 7.42 13.56 0.09 Random read/write 0.14 6.95 13.34 0.14 ps: cstdio and fstream implementations are those shipped with VC8. On 5/19/06, Slava, Alex <raf.devel@gmail.com> wrote:

...

Dear Boost Community,

Is there any interest in submission of a Random Access to Files (RAF) Library to Boost? Please see http://www.trukhanov.kiev.ua/RAF/ for documentation and an initial implementation.

Thank you,

-- Svyatoslav Trukhanov, Oleksii Ursulenko _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Slava, Alex

20 May 20 May

7:57 a.m.

On 5/19/06, Xi Wang <xi.wang@gmail.com> wrote:

...

Nice:-) I'd like to see more methods such as push_back, though resize could make it but not so convenient. And are there any plans for supporting insert/erase a record or a range of records?

resize routine is very expensive, thus it should be invoked as rare as posible. push_back() makes resize each time called, so it is not effective. insert/erase question: what to do with the tail of the file in this cases?

...

Here are some more timings, FYI. All tests were performed on WinXP SP2 with P4 3GHz HT, 1G memory, using test.cpp compiled by VC8 (cl /O2 /EHsc). Sequential tests read/write on a file sized 40M, while random tests were also limited to 1,000,000 iterations.

Test type stub cstdio fstream randfile ---------------------------------------------- Sequential read 0.03 6.87 23.76 0.05 Sequential write 1.70 10.09 11.61 1.97 Random read 0.03 4.31 9.58 0.05 Random write 0.09 7.42 13.56 0.09 Random read/write 0.14 6.95 13.34 0.14

ps: cstdio and fstream implementations are those shipped with VC8.

Thank you very much for performance testing. -- Svyatoslav Trukhanov, Oleksii Ursulenko

Olaf van der Spek

19 May 19 May

10:33 a.m.

On 5/18/06, Slava, Alex <raf.devel@gmail.com> wrote: Could you include the low-level (and probably non-portable) _open, _read, etc calls in your benchmark? And I noticed you consider 0 byte files invalid. I think this exception should not be made and they should be considered valid.

Slava, Alex

20 May 20 May

7:53 a.m.

On 5/19/06, Olaf van der Spek <olafvdspek@gmail.com> wrote:

...

On 5/18/06, Slava, Alex <raf.devel@gmail.com> wrote: Could you include the low-level (and probably non-portable) _open, _read, etc calls in your benchmark? Ok. Such test has just been made, results are not very different from stdio implementation. System call overhead and data copying are killing performance. And I noticed you consider 0 byte files invalid. I think this exception should not be made and they should be considered valid. But what is possible to do with this file? Only resize() is valid operation.

-- Svyatoslav Trukhanov, Oleksii Ursulenko

Olaf van der Spek

9:39 a.m.

On 5/20/06, Slava, Alex <raf.devel@gmail.com> wrote:

...

On 5/19/06, Olaf van der Spek <olafvdspek@gmail.com> wrote:

...
On 5/18/06, Slava, Alex <raf.devel@gmail.com> wrote: Could you include the low-level (and probably non-portable) _open, _read, etc calls in your benchmark? Ok. Such test has just been made, results are not very different from stdio implementation. System call overhead and data copying are killing performance. And I noticed you consider 0 byte files invalid. I think this exception should not be made and they should be considered valid. But what is possible to do with this file? Only resize() is valid operation.

Nothing else, but I don't think that's a problem.

Jose

22 May 22 May

11:55 a.m.

I am very interested in using your library and I hope is included in Boost soon. Some comments I have related to what has been discussed in this thread: * It should be clearly stated in the docs that the performance gain is by using MMF (not RAF) which was not clear when I looked at the docs * Concurrent access - I think this is a key feature. Supporting regular files will be good for this as you can support easily concurrent access via locking at the expense of performance * Large file support - A practical approach to this would be to limit the support to 64-bit systems, which are starting to be very common and in which you don't have mmap issues. Would this be a good idea ? * Do you also plan to support what's described in Yao's paper "efficient locking for concurrent operations on b-trees" regards jose On 5/18/06, Slava, Alex <raf.devel@gmail.com> wrote:

...

Dear Boost Community,

Is there any interest in submission of a Random Access to Files (RAF) Library to Boost? Please see http://www.trukhanov.kiev.ua/RAF/ for documentation and an initial implementation.

Thank you,

-- Svyatoslav Trukhanov, Oleksii Ursulenko _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Slava, Alex

8:24 p.m.

On 5/22/06, Jose <jmalv04@gmail.com> wrote:

...

I am very interested in using your library and I hope is included in Boost soon.

Very pleased to hear that !

...

Some comments I have related to what has been discussed in this thread:

* It should be clearly stated in the docs that the performance gain is by using MMF (not RAF) which was not clear when I looked at the docs

Sure.

...

* Concurrent access - I think this is a key feature. Supporting regular files will be good for this as you can support easily concurrent access via locking at the expense of performance

Yes, seems a lot of people would like this. Regular files will be supported.

...

* Large file support - A practical approach to this would be to limit the support to 64-bit systems, which are starting to be very common and in which you don't have mmap issues. Would this be a good idea ?

This would be an easy way; but 32-bit systems are still going to be around for years.

...

* Do you also plan to support what's described in Yao's paper "efficient locking for concurrent operations on b-trees"

will be decided soon -- Svyatoslav Trukhanov, Oleksii Ursulenko

Pavel Vozenilek

23 May 23 May

1:13 a.m.

"Slava, Alex" wrote:

...

...
* Large file support - A practical approach to this would be to limit the support to 64-bit systems, which are starting to be very common and in which you don't have mmap issues. Would this be a good idea ?

This would be an easy way; but 32-bit systems are still going to be around for years.

4+ GB files are processed part by part on 32-bits systems. /Pavel

Tomas Puverle

1:08 a.m.

...

* Concurrent access - I think this is a key feature. Supporting regular files will be good for this as you can support easily concurrent access via locking at the expense of performance

What kind of locking do you have in mind?

...

* Large file support - A practical approach to this would be to limit the support to 64-bit systems, which are starting to be very common and in which you don't have mmap issues. Would this be a good idea ?

There are filesystems out there that support 128 bits for file sizes. So I think there needs to be a more general solution.

Slava, Alex

6:15 a.m.

On 5/22/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote: ...

...

...
* Large file support - A practical approach to this would be to limit the support to 64-bit systems, which are starting to be very common and in which you don't have mmap issues. Would this be a good idea ?

There are filesystems out there that support 128 bits for file sizes. So I think there needs to be a more general solution.

We think that large file implementation is required and it will be provided. "Large" will be different for different OSes. Now for Windows systems "Large" will start after 2GB, Linux by default allows just 1GB for a process user space, so the bound in less then 1GB, SunOS in 64-bit mode allows (according to manual, not tested) to mmap file of size up to 2^64 bytes. If OS provides necessary API to 128-bit file system the corresonding implementation of RAF may be developed. -- Svyatoslav Trukhanov, Oleksii Ursulenko

Tomas Puverle

1:37 p.m.

Slava, Alex <raf.devel <at> gmail.com> writes:

...

On 5/22/06, Tomas Puverle <Tomas.Puverle <at> morganstanley.com> wrote: ...

...
...
* Large file support - A practical approach to this would be to limit the support to 64-bit systems, which are starting to be very common and in

which

...

...
...
you don't have mmap issues. Would this be a good idea ?

There are filesystems out there that support 128 bits for file sizes. So I think there needs to be a more general solution.

We think that large file implementation is required and it will be provided. "Large" will be different for different OSes. Now for Windows systems "Large" will start after 2GB,

Perhaps it would be worthwhile to also allow for /3G configured systems? Since Windows dropped its support for MIPS, it's been possible to extend the user space to 3GB under x86 (provided the system is booted with the flag)

...

Linux by default allows just 1GB for a process user space, so the bound in less then 1GB,

Again, Linux can now support 3GB and even 4GB of userspace in so called hugemem kernels.

...

SunOS in 64-bit mode allows (according to manual, not tested) to mmap file of size up to 2^64 bytes.

Yes. I've tested some very large files on some Solaris boxes with 200GB of RAM. Not quite 2^64 but it seems that so far, so good.

...

If OS provides necessary API to 128-bit file system the corresonding implementation of RAF may be developed.

Well, you can't map more of the file than the allowable address space of the machine at once :) Hence my earlier reference to the fact that this will be the same problem as trying to map 64-bit files under 32-bit OSs. Also, AFAIK these 128-bit filesystems preserve posix semantics, so they support read(), write() etc transparently. Since you can't put more than 64- bits of data into memory on a 64-bit system, the interface of most syscalls doesn't need to be changed. Tom

Slava, Alex

9:31 p.m.

...

On 5/22/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote: ...

...
There are filesystems out there that support 128 bits for file sizes. So I think there needs to be a more general solution.

Totally agree about more general solution. That does not change anything, but just out of curiosity some calculations were made: the number of atoms in the graphite of a pencil is < 2^72, http://www.madsci.org/posts/archives/oct98/905633072.As.r.html so the storage providing capacity to hold files with 128-bit offset would have to contain at least as many atoms as 8*2^56 = 2^59 pencil leads ! Hundreds of tons of material that should be. And this is with just 1 atom per bit of information. Seems a bit too much for any conceivable technology... What FS would need to provide support for such capacities ? -- Svyatoslav Trukhanov, Oleksii Ursulenko

Tomas Puverle

10:09 p.m.

...

Totally agree about more general solution.

That does not change anything, but just out of curiosity some calculations were made: the number of atoms in the graphite of a pencil is < 2^72,

http://www.madsci.org/posts/archives/oct98/905633072.As.r.html

so the storage providing capacity to hold files with 128-bit offset would have to contain at least as many atoms as 8*2^56 = 2^59 pencil leads ! Hundreds of tons of material that should be. And this is with just 1 atom per bit of information. Seems a bit too much for any conceivable technology... What FS would need to provide support for such capacities ?

Agreed, it would be a lot of data. However, I was mistaken about the maximum file size. It seems that even on the very large filesystems such as ZFS and the Veritas VxFS, the maximum size of an individual file is only 2^64. Look here: http://en.wikipedia.org/wiki/Comparison_of_file_systems under the section "Limits". With regards to your calculation, Jeff Bonwick, the designer of ZFS filesystem made a similar calculation to yours in one of his initial blog posts: "Although we'd all like Moore's Law to continue forever, quantum mechanics imposes some fundamental limits on the computation rate and information capacity of any physical device. In particular, it has been shown that 1 kilogram of matter confined to 1 liter of space can perform at most 1051 operations per second on at most 1031 bits of information [see Seth Lloyd, "Ultimate physical limits to computation." Nature 406, 1047-1054 (2000)]. A fully-populated 128-bit storage pool would contain 2128 blocks = 2137 bytes = 2140 bits; therefore the minimum mass required to hold the bits would be (2140 bits) / (1031 bits/kg) = 136 billion kg. To operate at the 1031 bits/kg limit, however, the entire mass of the computer must be in the form of pure energy. By E=mc2, the rest energy of 136 billion kg is 1.2x1028 J. The mass of the oceans is about 1.4x1021 kg. It takes about 4,000 J to raise the temperature of 1 kg of water by 1 degree Celcius, and thus about 400,000 J to heat 1 kg of water from freezing to boiling. The latent heat of vaporization adds another 2 million J/kg. Thus the energy required to boil the oceans is about 2.4x106 J/kg * 1.4x1021 kg = 3.4x1027 J. Thus, fully populating a 128-bit storage pool would, literally, require more energy than boiling the oceans."

Pavel Vozenilek

24 May 24 May

11 a.m.

"Tomas Puverle" wrote: [ snip estimates how huge 128 bits storage would be ] OT: hypothetical future flesystem may use these 128 bits to provide hierarchical structuring of the data, ala IPv6. ---------- More realistic feature request would be support for sparse files. The empty areas could be used for appending new data efficiently, normal operations on RAF should ignore then. Support for sparse files is available in Win32 via DeviceIoControl(FSCTL_QUERY_ALLOCATED_RANGES). I have vague feeling Linux has something like that too. /Pavel

Tomas Puverle

3:02 p.m.

...

OT: hypothetical future flesystem may use these 128 bits to provide hierarchical structuring of the data, ala IPv6.

That is an interesting idea.

...

More realistic feature request would be support for sparse files.

The empty areas could be used for appending new data efficiently, normal operations on RAF should ignore then.

Support for sparse files is available in Win32 via DeviceIoControl(FSCTL_QUERY_ALLOCATED_RANGES).

I have vague feeling Linux has something like that too.

ZFS on Solaris 10 has an extension to the the Posix lseek() flags, SEEK_DATA and SEEK_HOLE. It makes it easy to find the holes in your files. I may be wrong here but I don't think Linux supports this.

Slava, Alex

25 May 25 May

8:35 a.m.

...

...
More realistic feature request would be support for sparse files.

The empty areas could be used for appending new data efficiently, normal operations on RAF should ignore then.

Support for sparse files is available in Win32 via DeviceIoControl(FSCTL_QUERY_ALLOCATED_RANGES).

I have vague feeling Linux has something like that too.

ZFS on Solaris 10 has an extension to the the Posix lseek() flags, SEEK_DATA and SEEK_HOLE. It makes it easy to find the holes in your files. I may be wrong here but I don't think Linux supports this.

Sparce file handling was described in Kernighan book more then 20 years ago. I think that most UNIX-systems produces "holes" by default (Linux do this on ext2/ext3 and XFS for sure). It is not a problem to RAF. It just ask a system to allocate space and it is not important whether the data allocated or "hole" created. when writing in the "hole" system will allocate blocks for a file. There may be a problem that there is no more space on device, exactly such case is descried in MSDN (using memory-maped file on sparce or comprressed file). We will just check for an error in such cases. -- Svyatoslav Trukhanov, Oleksii Ursulenko

Olaf van der Spek

9:28 a.m.

On 5/25/06, Slava, Alex <raf.devel@gmail.com> wrote:

...

...
ZFS on Solaris 10 has an extension to the the Posix lseek() flags, SEEK_DATA and SEEK_HOLE. It makes it easy to find the holes in your files. I may be wrong here but I don't think Linux supports this.

Sparce file handling was described in Kernighan book more then 20 years ago. I think that most UNIX-systems produces "holes" by default (Linux do this on ext2/ext3 and XFS for sure). It is not a problem to RAF. It just ask a system to allocate space and it is not important whether the data allocated or "hole" created. when writing in the "hole" system will allocate blocks for a file. There may be a problem that there is no more space on device, exactly such case is descried in MSDN (using memory-maped file on sparce or comprressed file). We will just check for an error in such cases.

What about (avoiding) fragmentation? I think a way to allocate the entire file at one time should be available.

Slava, Alex

26 May 26 May

4:25 p.m.

...

What about (avoiding) fragmentation? I think a way to allocate the entire file at one time should be available.

It is up to OS. We are not able to force it to allocate file blocks consequently. Also we don't sure that allocating entire file at one time guaranties non-fragmentated chunk of disk space. -- Svyatoslav Trukhanov, Oleksii Ursulenko

Jose

13 Jul 13 Jul

9:15 a.m.

Is this project still active ? The library link seems broken right now On 5/26/06, Slava, Alex <raf.devel@gmail.com> wrote:

...

...
What about (avoiding) fragmentation? I think a way to allocate the entire file at one time should be available.

It is up to OS. We are not able to force it to allocate file blocks consequently. Also we don't sure that allocating entire file at one time guaranties non-fragmentated chunk of disk space.

-- Svyatoslav Trukhanov, Oleksii Ursulenko _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Tomas Puverle

25 May 25 May

7:53 p.m.

...

...
...
More realistic feature request would be support for sparse files.

The empty areas could be used for appending new data efficiently, normal operations on RAF should ignore then.

Support for sparse files is available in Win32 via DeviceIoControl(FSCTL_QUERY_ALLOCATED_RANGES).

I have vague feeling Linux has something like that too.

ZFS on Solaris 10 has an extension to the the Posix lseek() flags, SEEK_DATA and SEEK_HOLE. It makes it easy to find the holes in your files. I may be wrong here but I don't think Linux supports this.

Sparce file handling was described in Kernighan book more then 20 years ago. I think that most UNIX-systems produces "holes" by default (Linux do this on ext2/ext3 and XFS for sure). It is not a problem to RAF. It just ask a system to allocate space and it is not important whether the data allocated or "hole" created. when writing in the "hole" system will allocate blocks for a file. There may be a problem that there is no more space on device, exactly such case is descried in MSDN (using memory-maped file on sparce or comprressed file). We will just check for an error in such cases.

Oh, I think you misunderstood what I was trying to say. The current Posix lseek() interface supports 3 flags: SEEK_BEG, SEEK_CUR and SEEK_END. However, ZFS adds 2 more flags: SEEK_HOLE and SEEK_DATA. They allow you to discover the holes in your files, which is not possible under Unix. Under normal Unix semantics, just like you said, a read of a "hole" will just return zeros, and even the on-disk size will display the "right" size, even though the date isn't actually present. However, with ZFS, an application can, for example, ignore holes. That could be advantageous for many programs, e.g. compression, tar, etc...

Tomas Puverle

23 May 23 May

1:35 a.m.

...

Dear Boost Community,

Is there any interest in submission of a Random Access to Files (RAF) Library to Boost? Please see http://www.trukhanov.kiev.ua/RAF/ for documentation and an initial implementation.

I'm affraid I've not had the time to look at your implementation but I had a quick scan of the docs. An approach I've taken when implementing a memory mapped file class in the pass was to try to improve on the compile-time guarantees one can provide. When you open a file as read only, you can still read it but write access causes a SEGV on Unix. This is undesirable, especially if at compile time you know that you will only access the MMF for reads. What I am suggesting is that you take advantage of the type system and distinguish between readonly (const) MMFs (which will consequently return const_iterators). Also, aside from the RW criteria, it would be useful to be able to provide a range for the mapping. This brings up the possibility of what happens should the file be smaller than the range that was initially requested, and also an interesting question as to what would I do if I am trying to read a file that is growing as I read it (similar to tail -f). Finally, it would be nice to be able to specify some of the flags that memory mapped files can take when they are being opened, such as MAP_PRIVATE, MAP_ANON, MAP_FIXED etc. Also as a word of caution: please don't assume any particular page size in your library. (You may need to know the page size if you decide to provide subrange mappings). For very large files we use page sizes larger than the usual 4/8KB because using anything smaller destroys our TLB caches. But I am sure you're not doing anything like that anyway. Thanks! Tom

Slava, Alex

6:07 a.m.

On 5/22/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote:

...

...
Dear Boost Community,

Is there any interest in submission of a Random Access to Files (RAF) Library to Boost? Please see http://www.trukhanov.kiev.ua/RAF/ for documentation and an initial implementation.

I'm affraid I've not had the time to look at your implementation but I had a quick scan of the docs.

An approach I've taken when implementing a memory mapped file class in the pass was to try to improve on the compile-time guarantees one can provide. When you open a file as read only, you can still read it but write access causes a SEGV on Unix. This is undesirable, especially if at compile time you know that you will only access the MMF for reads. What I am suggesting is that you take advantage of the type system and distinguish between readonly (const) MMFs (which will consequently return const_iterators).

It is already done. We have 2 classes one for read-only access and another for read-write. Read-only class provides only const version of operator[], at() and iterators so the above error will be detected at compile time.

...

Also, aside from the RW criteria, it would be useful to be able to provide a range for the mapping. This brings up the possibility of what happens should the file be smaller than the range that was initially requested, and also an interesting question as to what would I do if I am trying to read a file that is growing as I read it (similar to tail -f).

We are thinking about range mapping and it will be implemented soon. Changing the file "from outside" is a big problem. What if it will be shortened? Also it seems varying in diffrent OSes.

...

Finally, it would be nice to be able to specify some of the flags that memory mapped files can take when they are being opened, such as MAP_PRIVATE, MAP_ANON, MAP_FIXED etc.

1. Windows doesn't have such flags 2. RAF may be implemented without MM for systems that don't provide it. 3. Working with large files will map part of the file and will perform mapping/unmapping not only at the begining, so MAP_FIXED is not good here.

...

Also as a word of caution: please don't assume any particular page size in your library. (You may need to know the page size if you decide to provide subrange mappings). For very large files we use page sizes larger than the usual 4/8KB because using anything smaller destroys our TLB caches. But I am sure you're not doing anything like that anyway.

Thank you for caution. It seems that small page size is a reason of performance degrading when working with large files. Also page size related issues is OS dependant. For now I think we will not play with this at all. In the future maybe this will be considered as a better performance implementation.

...

Thanks!

Tom

-- Svyatoslav Trukhanov, Oleksii Ursulenko

Tomas Puverle

1:58 p.m.

...

It is already done. We have 2 classes one for read-only access and another for read-write.

Great!

...

Changing the file "from outside" is a big problem. What if it will be shortened? Also it seems varying in diffrent OSes.

I understand the problem but there isn't much that you can do about a file being truncated under you. That is a higher level synchronisation problem that your library won't solve. Now, assuming you can ignore the file getting shorter, can you provide something for the file getting longer?

...

...
Finally, it would be nice to be able to specify some of the flags that memory mapped files can take when they are being opened, such as MAP_PRIVATE, MAP_ANON, MAP_FIXED etc.

1. Windows doesn't have such flags

But it has an equivalent set of flags/functionality. You can specify a base address to MapViewOfFileEx. This is similar to MAP_FIXED. MAP_PRIVATE can be set up by calling MapViewOfFileEx with FILE_MAP_COPY (And the underlying mapping object has to be create with PAGE_WRITECOPY protection). MAP_ANON can be achieved by setting the file handle to map to INVALID_HANDLE_VALUE when creating the file mapping.

...

2. RAF may be implemented without MM for systems that don't provide it.

I understand. These flags could be emulated or it could be a caveat of the implementation. But like you said yourself, there are a lot of systems that provide mmapping these days.

...

3. Working with large files will map part of the file and will perform mapping/unmapping not only at the begining, so MAP_FIXED is not good here.

But shouldn't that be up to the user to decide?

...

...
Also as a word of caution: please don't assume any particular page size in your library. (You may need to know the page size if you decide to provide subrange mappings). For very large files we use page sizes larger than the usual 4/8KB because using anything smaller destroys our TLB caches. But I am sure you're not doing anything like that anyway.

Thank you for caution. It seems that small page size is a reason of performance degrading when working with large files. Also page size related issues is OS dependant. For now I think we will not play with this at all. In the future maybe this will be considered as a better performance implementation.

The reason why I mention it is that I think you will need to play with it. For example, if I request a range of file starting at byte 104 to byte 30035, you will need to request a mapping for an integral number of pages and then set your begin() and end() pointers inside the mapped region. To perform this mapping, you will need to know the page size. I expect the same case will happen if you try to map a large file into a small address space, since you will need to perform a map on a subrange of the whole file. Tom

Slava, Alex

6:04 p.m.

On 5/23/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote:

...

...
It is already done. We have 2 classes one for read-only access and another for read-write.

Great!

...
Changing the file "from outside" is a big problem. What if it will be shortened? Also it seems varying in diffrent OSes.

I understand the problem but there isn't much that you can do about a file being truncated under you. That is a higher level synchronisation problem that your library won't solve. Now, assuming you can ignore the file getting shorter, can you provide something for the file getting longer?

Conceptually it should be the same thing. If nobody notifies the library about change in file size, it just doesn't know. Theoretically, to handle such a situation checking can be done whether a hit into the area that was originally beyond the file bounds is actually an error. If not, depending on the implementation, a remapping can be done, or "mapping window" will be moved appropriately, otherwise an exception is raised, or whatever. Such behaviour would make all access operations costly on all implementations, and render RAF useless for the purpose it was originally designed. On the other hand, if the hit outside the file bounds was made, it is either an application error, or an evidence that the application *knows* of the changes in file size. Than the appropriate measures can be taken explicitly by the application itself. At least this is how we see it at the moment.

...

...
...
Finally, it would be nice to be able to specify some of the flags that memory mapped files can take when they are being opened, such as MAP_PRIVATE, MAP_ANON, MAP_FIXED etc.

1. Windows doesn't have such flags

But it has an equivalent set of flags/functionality. You can specify a base address to MapViewOfFileEx. This is similar to MAP_FIXED. MAP_PRIVATE can be set up by calling MapViewOfFileEx with FILE_MAP_COPY (And the underlying mapping object has to be create with PAGE_WRITECOPY protection). MAP_ANON can be achieved by setting the file handle to map to INVALID_HANDLE_VALUE when creating the file mapping.

Right. Thanks. Overlooked in the MSDN.

...

...
2. RAF may be implemented without MM for systems that don't provide it.

I understand. These flags could be emulated or it could be a caveat of the implementation. But like you said yourself, there are a lot of systems that provide mmapping these days.

There is one more thing. There is a thouhgt to provide at least both MMF and non-MMF implementations on all systems because of concurrent locking problems with memory mapping. Then these flags either have to be emulated, or they would add confusion, because of being of use for one of implementations. On the other hand this functionality seems important. This requires thorough analysis...

...

...
3. Working with large files will map part of the file and will perform mapping/unmapping not only at the begining, so MAP_FIXED is not good here.

But shouldn't that be up to the user to decide?

If we provide other flags, then this one perhaps also should be provided...

...

...
...
Also as a word of caution: please don't assume any particular page size in your library. (You may need to know the page size if you decide to provide [snip] The reason why I mention it is that I think you will need to play with it. For example, if I request a range of file starting at byte 104 to byte 30035, you will need to request a mapping for an integral number of pages and then set your begin() and end() pointers inside the mapped region. To perform this mapping, you will need to know the page size. I expect the same case will happen if you try to map a large file into a small address space, since you will need to perform a map on a subrange of the whole file.

Thanks. Very true.

...

Tom

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

-- Svyatoslav Trukhanov, Oleksii Ursulenko

rasmus ekman

5:13 p.m.

Slava, Alex wrote:

...

On 5/22/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote:

...
Finally, it would be nice to be able to specify some of the flags that memory mapped files can take when they are being opened, such as MAP_PRIVATE, MAP_ANON, MAP_FIXED etc.

1. Windows doesn't have such flags

While the semantics may not be identical, there are seductively similar flags. MAP_PRIVATE - CreateFileMapping with PAGE_WRITECOPY MAP_ANONYMOUS - somewhat similar to CreateFileMapping with INVALID_HANDLE_VALUE (-1) (uses system page file) MAP_FIXED - MapViewOfFileEx with placement address. Cheers, re

Slava, Alex

6:04 p.m.

On 5/23/06, rasmus ekman <m11048@abc.se> wrote:

...

Slava, Alex wrote:

...
On 5/22/06, Tomas Puverle <Tomas.Puverle@morganstanley.com> wrote:

...
Finally, it would be nice to be able to specify some of the flags that memory mapped files can take when they are being opened, such as MAP_PRIVATE, MAP_ANON, MAP_FIXED etc.

1. Windows doesn't have such flags

While the semantics may not be identical, there are seductively similar flags. MAP_PRIVATE - CreateFileMapping with PAGE_WRITECOPY MAP_ANONYMOUS - somewhat similar to CreateFileMapping with INVALID_HANDLE_VALUE (-1) (uses system page file) MAP_FIXED - MapViewOfFileEx with placement address.

Right. Thanks. Overlooked in the MSDN.

...

Cheers,

re

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

-- Svyatoslav Trukhanov, Oleksii Ursulenko

6907

Age (days ago)

6963

Last active (days ago)

List overview

Download

34 comments

9 participants

participants (9)

Beman Dawes
Jose
Olaf van der Spek
Pavel Vozenilek
rasmus ekman
Slava, Alex
Thorsten Ottosen
Tomas Puverle
Xi Wang