iostream large file - performance
Hello. I've been testing the fstream library to read a huge text file (45 megabytes, around 2 million lines). When I do (using fstream):

Code:
    int line = 0;
    string str;
    ifstream is("file.txt");
    while (!is.eof()) { getline(is, str); line++; }

It takes about 9 seconds to read the whole file. If I open the file with any 'good' text editor (EditPlus or Notepad++, for instance) it takes only a few seconds to read/load the whole file. So I'm wondering: what am I doing wrong? What should I do to achieve that speed and better performance? Can Boost.Iostreams do any better?

Another thing: I tried writing the data to a deque (deque.push_back(str)) and was shocked at what happened. It took ages to process the file, with the CPU constantly at 99%, so I killed the program after around 3 minutes. What's the 'proper' way to read files into memory? How do text editors do this? They load the same file in 4 seconds! I need the fastest way to read about 10,000 lines from a text file at once. Any help highly appreciated!

Best regards, A
-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Aljaz
Sent: Thursday, March 08, 2007 7:31 AM
To: boost-users@lists.boost.org
Subject: [Boost-users] iostream large file - performance

> It takes about 9 seconds to read the whole file. If I open the file with any 'good' text editor (EditPlus, Notepad++ for instance) it takes a few seconds to read/load the whole file.

[Nat] A good text editor might *not* read the entire file. It probably reads enough to display the first page, plus another couple of bufferloads to anticipate your scrolling forward. From then on, it probably reads more on demand.
And a good text editor probably reads the file as a binary file (using the read() method or something like that). -- Hermann O. Rodrigues
So it would be better to do read() instead of getline? Would read() plus boost::tokenizer to split the lines be faster? Is there an even faster way? Thanks for the help.
I'm not sure what your need is, but for the fastest performance, read the file into a large byte array and only split the lines where you need them. For example, if you want to display the data, you will only display the first 100 lines, so set up a separate array of pointers into the large array where the lines begin, and determine the line-start pointers for the first 100 lines only. Then display that. Whenever your display window changes you can determine the line starts from the known lines, forwards or backwards. You can also have a background thread determine the line-start pointers for the remaining lines.

The end result is that you have two arrays: one contains the entire file as a character array, and the other contains pointers to the beginning of each line. A line would be defined as starting wherever the pointer points and ending at a newline.

Hope this helps, Eric T.
Eric Teutsch wrote:
> I'm not sure what your need is, but for fastest performance, read the file into a large byte array, and only split the lines where you need them.
Wouldn't the fastest way be to not read it into an array at all, but to open the file as a memory-mapped file and address it directly? I've not used the Boost.Interprocess library, but IIRC it provides this facility.

Jeff F
Aljaz wrote:
> What's the 'proper' way to read files into memory? How do text editors do this? They load the same file in 4 seconds! I need the fastest way to read about 10,000 lines from a text file at once.
Not exactly relevant, but a co-worker of mine wrote some blog entries about the performance of writing files, and a bit about reading. For reading, the main point was that you can read an unfragmented file much faster than a fragmented one. http://justin-michel.spaces.live.com/?_c11_BlogPart_blogpart=blogview&_c=BlogPart&partqs=cat%3dCode%2bPerformance

KevinH -- Kevin Heifner heifner @ ociweb.com http://heifner.blogspot.com Object Computing, Inc. (OCI) www.ociweb.com
Hi Aljaz,
On 3/8/07, Aljaz wrote:
> I've been testing the fstream library to read a huge text file (45 megabytes), which has around 2 million lines. When I do (using fstream):
>
>     int line = 0;
>     string str;
>     ifstream is("file.txt");
>     while (!is.eof()) { getline(is, str); line++; }
>
> It takes about 9 seconds to read the whole file. If I open the file with any 'good' text editor (EditPlus, Notepad++ for instance) it takes a few seconds to read/load the whole file. So I'm wondering: what am I doing wrong? What should I do to achieve that speed and better performance?
As others have said earlier, text editors read your files in a piecemeal fashion: a little at a time, as needed when you move to the corresponding lines/pages.

I am a newbie to Boost myself, but I feel your problem has more to do with the relationship between the file allocation unit size of your OS, the size of your buffer/array, and how large a buffer you allocate to read the data. Also, it looks like you want to "count" the lines rather than actually read them.

Anyway, I think looking at the appropriate OS system services, which work efficiently with the corresponding OS file allocation unit sizes, and then looking at the source code of the library (Boost or whatever) you are using, would be helpful.

-- Best, Asif
participants (7)
- Aljaz
- Asif Lodhi
- Eric Teutsch
- Hermann Rodrigues
- Jeff F
- Kevin Heifner
- Nat Goodspeed