
Eric Niebler wrote:
Aries Tao wrote:
Hi everybody, I use Boost.Xpressive to search for email addresses in a binary file that is 10*1024*1024 bytes in size. Every byte in that file is 0x6f. Boost.Xpressive is inefficient on this input. Can anyone help me? Thanks! The code is below: <snip>
I've done some investigation, and I've discovered a couple of things...
Correct file attached now...

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

///////////////////////////////////////////////////////////////////////////////
// main.hpp
//
//  Copyright 2007 Eric Niebler. Distributed under the Boost
//  Software License, Version 1.0. (See accompanying file
//  LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

#include <cstring>
#include <iostream>
#include <boost/regex.hpp>

int main()
{
    std::size_t const Mb = 1048576; // 1Mb
    char *begin = new char[Mb];
    char *end = begin + Mb;
    std::memset(begin, 0x6f, 1048576);

    char const *pattern = "([a-z#~_\\.!\\#$%\\^&\\*\\(\\)\\-]+@[a-z#_\\-]+\\.[a-z#_\\-\\.]+)";

    try
    {
        using namespace boost;
        regex token(pattern);

        // fast, doesn't throw:
        cregex_iterator cur(begin, end, token);

        // slow, throws on memory exhaustion:
        regex_search(begin, end, token);
    }
    catch(std::exception const &e)
    {
        std::cout << "boost.regex error: " << e.what() << std::endl;
    }

    return 0;
}
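
For readers following the thread, here is a rough sketch of why this input can be so costly. It is an editorial illustration under stated assumptions, not the result of the investigation mentioned above and not how Boost.Regex or Boost.Xpressive is implemented. Every byte is 0x6f ('o'), which the leading `[a-z...]+` accepts, and the buffer contains no '@'. A matcher that naively retries the pattern at each start offset would therefore scan to the end of the buffer from every offset, about n*(n+1)/2 character tests in total. The helper names `naive_local_part_steps` and `may_contain_address` are hypothetical, and the character class is simplified to `[a-z]`.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts the character tests a naive matcher would make
// when it tries a simplified "[a-z]+" prefix at every start offset.
// Assumption: no memoization or start-position optimization is applied.
std::size_t naive_local_part_steps(char const *begin, char const *end)
{
    std::size_t steps = 0;
    for (char const *start = begin; start != end; ++start)
    {
        char const *p = start;
        // Greedily consume lowercase letters, counting each accepted character.
        while (p != end && *p >= 'a' && *p <= 'z')
        {
            ++p;
            ++steps;
        }
        // A real matcher would now look for '@' and backtrack on failure.
    }
    return steps;
}

// Hypothetical pre-check: the pattern requires a literal '@', so a buffer
// without one cannot contain a match and the regex search can be skipped.
bool may_contain_address(char const *begin, char const *end)
{
    return std::memchr(begin, '@', static_cast<std::size_t>(end - begin)) != 0;
}
```

Under these assumptions, calling `may_contain_address(begin, end)` before `regex_search` turns the all-0x6f case into a single linear pass. It does not help with inputs that contain '@' after a long run of letters.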