[spirit][qi] Fastest way to parse file

Hi,

The Boost.Spirit documentation advises using multi_pass iterators for parsing files (or reading the data into an STL container and then passing the container's begin and end to Spirit.Qi).

A much better solution would be to use a memory-mapped file:

boost::interprocess::file_mapping fm(filename.c_str(), boost::interprocess::read_only);
boost::interprocess::mapped_region region(fm, boost::interprocess::read_only, 0, 0);
const char* begin = reinterpret_cast<const char*>(region.get_address());
const char* const end = begin + region.get_size();

Compared to the multi_pass iterator, the mmap approach increased parsing speed more than 5 times and reduced memory usage and CPU load. The mmap approach is also a little faster than reading the data into an STL container, and it sometimes requires less memory (depending on the OS).

I think such a solution should at least be mentioned in the Spirit documentation.

Maybe mmap should be wrapped in some class for tighter integration with Spirit and simpler usage:

int main() {
    namespace spirit = boost::spirit;
    using spirit::ascii::space;
    using spirit::ascii::char_;
    using spirit::qi::double_;
    using spirit::qi::eol;

    spirit::file_parser first("multi_pass.txt"); // class that does the mmap'ing
    std::vector<double> v;
    bool result = spirit::qi::phrase_parse(first
        , spirit::make_default_multi_pass(base_iterator_type())
        , double_ >> *(',' >> double_)            // recognize list of doubles
        , space | '#' >> *(char_ - eol) >> eol    // comment skipper
        , v);                                     // data read from file

    if (!result) {
        std::cout << "Failed parsing input file!" << std::endl;
        return -2;
    }
    std::cout << "Successfully parsed input file!" << std::endl;
    return 0;
}

Best regards,
Antony Polukhin

On 2/23/12 4:15 AM, Antony Polukhin wrote:
Hi,
The Boost.Spirit documentation advises using multi_pass iterators for parsing files (or reading the data into an STL container and then passing the container's begin and end to Spirit.Qi).
A much better solution would be to use a memory-mapped file:
[snip]
Compared to the multi_pass iterator, the mmap approach increased parsing speed more than 5 times and reduced memory usage and CPU load.
The mmap approach is also a little faster than reading the data into an STL container, and it sometimes requires less memory (depending on the OS).
I think such a solution should at least be mentioned in the Spirit documentation.
Agreed!
Maybe mmap should be wrapped in some class for tighter integration with Spirit and simpler usage:
[snip]
Thanks for posting. If you don't mind tweaking your text for publication, I'd gladly post this as an article on Spirit's site. This should be a good addition to the doc addendum page.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com

2012/2/23 Joel de Guzman <joel@boost-consulting.com>:
Thanks for posting. If you don't mind tweaking your text for publication, I'd gladly post this as an article on Spirit's site. This should be a good addition to the doc addendum page.
I don't mind, I would enjoy it.

Best regards,
Antony Polukhin

Antony Polukhin wrote:
Hi,
The Boost.Spirit documentation advises using multi_pass iterators for parsing files (or reading the data into an STL container and then passing the container's begin and end to Spirit.Qi).
A much better solution would be to use a memory-mapped file:
boost::interprocess::file_mapping fm(filename.c_str(), boost::interprocess::read_only);
boost::interprocess::mapped_region region(fm, boost::interprocess::read_only, 0, 0);
const char* begin = reinterpret_cast<const char*>(region.get_address());
const char* const end = begin + region.get_size();
I switched from interprocess::file_mapping to iostreams::mapped_file_source because of the former's lack of support for wchar_t/Unicode file names. It's also directly usable with boost::filesystem::path. Here's an encapsulation of an iterator/range view of the mapped file:

#include <boost/filesystem/path.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/range/iterator_range.hpp>

class const_mapped_file
{
public:
    typedef boost::filesystem::path         Path;
    typedef const char*                     Iterator;
    typedef boost::iterator_range<Iterator> IteratorRange;

    explicit const_mapped_file(const Path& p) : m_src(p) {}

    Iterator begin() const { return m_src.data(); }
    Iterator end()   const { return m_src.data() + m_src.size(); }

    IteratorRange range() const { return IteratorRange(begin(), end()); }

private:
    boost::iostreams::mapped_file_source m_src;
};

Jeff

On 2/24/12 12:29 AM, Jeff Flinn wrote:
Antony Polukhin wrote:
Hi,
The Boost.Spirit documentation advises using multi_pass iterators for parsing files (or reading the data into an STL container and then passing the container's begin and end to Spirit.Qi).
A much better solution would be to use a memory-mapped file:
[snip]
I switched from interprocess::file_mapping to iostreams::mapped_file_source because of the former's lack of support for wchar_t/Unicode file names. It's also directly usable with boost::filesystem::path. Here's an encapsulation of an iterator/range view of the mapped file:
[snip]
That would be a cool addition too. Thanks, Jeff.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com

participants (3)
- Antony Polukhin
- Jeff Flinn
- Joel de Guzman