Hello
I have the following problem:
I need to filter some records from one file and save them to another
in my C++ application.
For example, from this file:
/////////////////// in.txt ////////////////////////////////
http://google.com
http://yahoo.com
...
http://google.com/analytics
////////////////////////////////////////////////////////////
I want to extract only the lines that match the regex:
^(?:http://google.com).*
to get:
//////////////// out.txt ///////////////////////////////
http://google.com
...
http://google.com/analytics
/////////////////////////////////////////////////////////
So I wrote something like this:
class Writer
{
public:
Writer()
:matchesCount_(0){}
virtual std::string operator() (const boost::match_results<const char*>& result)
{
matchesCount_ = result.size();
return aux_;
}
int getMatchesCount() const
{
return matchesCount_;
}
virtual ~Writer(){}
private:
std::string aux_; // this is completely useless, but I must return something in operator()
int matchesCount_;
};
///////////////////////////////////////////////////////////////////////////////
class FileWriter : public Writer
{
public:
FileWriter(std::ostream* of)
:of_(of)
{}
std::string operator() (const boost::match_results<const char*>& result)
{
*of_ << *result.begin() << endl;
return Writer::operator()(result);
}
private:
std::ostream* of_;
};
///////////////////////////////////////////////////////////////////////////////
int main(int argc, char *argv[])
{
boost::regex match_lower("^(?:http://google.com).*");
std::ofstream out("out.txt");
string str;
filtering_istream
first(boost::iostreams::regex_filter(match_lower, FileWriter(&out)));
first.push(file_source("in.txt", ios_base::in));
// my output is a side effect of filtering, so I don't have to process this stream
first.ignore();
return 0;
}
/////////////////////////////////////////////////////////////////////////////
It works fine for short files (IMO for files whose size is smaller
than the size of the stream buffer). But I work with very large files
(~4.7 GB), and for those this is not a good solution. Do you have any
idea how to solve it?
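
For comparison, here is a minimal sketch of a line-by-line alternative
that keeps only one line in memory at a time. It assumes every record
sits on its own line, and it uses std::regex (C++11) instead of
Boost.Iostreams, which is a substitution on my side. The function name
filter_lines is hypothetical and not part of any library.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <regex>
#include <string>

// Hypothetical helper: copies every line of `in` that matches `re` to `out`.
// Only the current line is held in memory, so the input size is not limited
// by any buffer. Returns the number of lines written.
std::size_t filter_lines(std::istream& in, std::ostream& out, const std::regex& re)
{
    std::size_t written = 0;
    std::string line;
    while (std::getline(in, line)) {
        // regex_search with a leading '^' anchors the match at the line start.
        if (std::regex_search(line, re)) {
            out << line << '\n';
            ++written;
        }
    }
    return written;
}
```

It could be called with std::ifstream in("in.txt"), std::ofstream
out("out.txt") and std::regex re("^(?:http://google.com).*").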
--
Regards
Michał Nowotka