
On 10/11/13 7:14 PM, Brian Budge wrote:
If you're on a 64-bit system, you can simply mmap the entire file. There is no need to break the file into regions just because it's huge :) The OS will page the data in as required. On 32-bit, you do need to manage regions, because otherwise you would exceed your address space. This might be kinda crappy: ideally you'd want to split your regions at EOL boundaries, but you need to parse the file before you know where these are. In practice, you'd be stuck handling lines that straddle a region boundary, but hey, that's the price you pay if you want to run 32-bit code.
But on 32-bit systems I need to say "hey this program will go fubar as you load big files, use it at your own peril!" :)
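For reference, mapping the whole file in one go might look roughly like the sketch below. It uses plain POSIX mmap rather than whatever mapping library the original code uses, and `MappedFile` is a made-up helper name, not an existing class. It also assumes C++17 for std::string_view; with Boost you would hand the same pointer and length to string_ref instead.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <string_view>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: owns a read-only mapping of one whole file.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        // mmap rejects a zero length, so an empty file is left unmapped.
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = len;
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    // False if the file could not be opened or mapped, or if it is empty.
    bool ok() const { return data_ != nullptr; }

    // Non-owning view of the mapped bytes; valid only while this object lives.
    std::string_view view() const {
        return data_ ? std::string_view(data_, size_) : std::string_view();
    }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Nothing is read from disk until the pages are touched, so constructing a `MappedFile` for a huge file is cheap on a 64-bit system. On 32-bit the mmap call will simply fail once the file no longer fits in the address space, and `ok()` reports that.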
Another side-question, if you don't mind. I'm not sure that what I'm doing is efficient, especially the need to copy from the region to a string. If you have suggestions, I'm more than happy to hear them.
I would use Boost's new string_ref instead of string. The obvious solution would be to use Boost.Tokenizer to break up the giant string into string_ref lines; however, I'm not sure it supports string_ref yet. An EOL tokenizer should only be a few lines of code though, and you could fairly trivially tokenize your string into string_refs.
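To make the "few lines of code" concrete, here is a minimal sketch of such an EOL tokenizer. It is written against C++17's std::string_view as a stand-in, on the assumption that boost::string_ref offers a very similar interface (find, substr, remove_suffix), so adapting it should be mostly a type swap. The function name `split_lines` is made up for this example.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Split `text` into lines without copying any characters.
// Every returned view points into the memory behind `text`, so that memory
// (for example the mapped region) must outlive the result.
std::vector<std::string_view> split_lines(std::string_view text) {
    std::vector<std::string_view> lines;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t eol = text.find('\n', start);
        if (eol == std::string_view::npos) {
            eol = text.size();  // last line has no trailing newline
        }
        std::string_view line = text.substr(start, eol - start);
        if (!line.empty() && line.back() == '\r') {
            line.remove_suffix(1);  // tolerate CRLF line endings
        }
        lines.push_back(line);
        start = eol + 1;
    }
    return lines;
}
```

Called on a view of the whole mapped region, this yields one view per line and never copies into a std::string. Empty lines are kept as empty views, and a trailing newline at the end of the input does not produce an extra empty line.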
Awesome classes, I will try them! Thanks!