[regex] Mitigating mischief and malice
Say you wanted to give web users a boost::regex interface to a set of data, knowing that some will try to use it for mischief and malice. I'm vaguely aware that one can write a regex to consume lots of CPU (denial-of-service attack), but also lots of stack and/or memory.

What are the risks and how would you address them?
Would you filter out certain classes of regular expressions?
Tune it via BOOST_REGEX_NON_RECURSIVE and/or other parameters?
Would you forbid it altogether?

Thanks in advance,
-Jim
On 2/28/2011 6:28 AM, Jim Bell wrote:
Say you wanted to give web users a boost::regex interface to a set of data, knowing that some will try to use it for mischief and malice. I'm vaguely aware that one can write a regex to consume lots of CPU (denial-of-service attack), but also lots of stack and/or memory.
What are the risks and how would you address them?
Would you filter out certain classes of regular expressions?
Tune it via BOOST_REGEX_NON_RECURSIVE and/or other parameters?
Would you forbid it altogether?
John can correct me if I'm wrong, but I believe Boost.Regex throws an exception if too many states are visited during pattern matching. That keeps it from spinning off into infinity. I don't know if this is tunable.

Xpressive has no such feature. It has a recursive implementation and, on MSVC, fixes up the stack on overflow and throws an exception. On other platforms, yeah, DoS. :-(

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
Say you wanted to give web users a boost::regex interface to a set of data, knowing that some will try to use it for mischief and malice. I'm vaguely aware that one can write a regex to consume lots of CPU (denial-of-service attack), but also lots of stack and/or memory.
Boost.Regex has two protections against that:

* When BOOST_REGEX_NON_RECURSIVE is defined (the default for all current compilers), memory usage is strictly limited. This can be configured in boost/regex/user.hpp: the maximum amount of memory used is BOOST_REGEX_MAX_BLOCKS*BOOST_REGEX_BLOCKSIZE, which defaults to 4 MB in total.
* The total number of machine states visited (and hence CPU time consumed) is controlled by perl_matcher::estimate_max_state_count; the macro BOOST_REGEX_MAX_STATE_COUNT sets an upper limit on the number of states visited.

HTH, John.
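To make the memory bound concrete, here is a minimal sketch of the arithmetic John describes. The two per-macro values are assumptions picked to reproduce the 4 MB total he quotes; check boost/regex/user.hpp in your Boost release for the real defaults. The helper regex_memory_cap_bytes() is a hypothetical name and not part of Boost.

```cpp
// Sketch only: shows how the memory cap follows from the two macros.
// In a real build these would be set in boost/regex/user.hpp (or on the
// compiler command line) before any Boost.Regex header is included.
#include <cassert>
#include <cstddef>

#ifndef BOOST_REGEX_BLOCKSIZE
#define BOOST_REGEX_BLOCKSIZE 4096   // bytes per block (assumed value)
#endif
#ifndef BOOST_REGEX_MAX_BLOCKS
#define BOOST_REGEX_MAX_BLOCKS 1024  // maximum number of blocks (assumed value)
#endif

// Hypothetical helper: upper bound on the memory the non-recursive
// matcher may use, i.e. BOOST_REGEX_MAX_BLOCKS * BOOST_REGEX_BLOCKSIZE.
constexpr std::size_t regex_memory_cap_bytes() {
    return static_cast<std::size_t>(BOOST_REGEX_MAX_BLOCKS) *
           static_cast<std::size_t>(BOOST_REGEX_BLOCKSIZE);
}

// With the assumed values the cap is 4 MB, matching the figure in the thread.
static_assert(regex_memory_cap_bytes() == 4u * 1024u * 1024u,
              "cap differs from the 4 MB mentioned in the thread");
```

Lowering either macro shrinks the cap proportionally. If Boost.Regex is used as a separately compiled library, it probably needs rebuilding with the same settings so the library and the application agree.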
On 1:59 PM, John Maddock wrote:
Say you wanted to give web users a boost::regex interface to a set of data, knowing that some will try to use it for mischief and malice. I'm vaguely aware that one can write a regex to consume lots of CPU (denial-of-service attack), but also lots of stack and/or memory.
Boost.Regex has two protections against that:
* When BOOST_REGEX_NON_RECURSIVE is defined (the default for all current compilers) then memory usage is strictly limited. This can be configured in boost/regex/user.hpp since the maximum amount of memory used is BOOST_REGEX_MAX_BLOCKS*BOOST_REGEX_BLOCKSIZE, which defaults to 4Mb in total.
* The total number of machine states visited (and hence CPU time consumed) is controlled by perl_matcher::estimate_max_state_count, the macro BOOST_REGEX_MAX_STATE_COUNT sets an upper limit on the number of states visited.
Thanks, John and Eric.

So if one deliberately sets the values BOOST_REGEX_MAX_BLOCKS, BOOST_REGEX_BLOCKSIZE, and BOOST_REGEX_MAX_STATE_COUNT, and catches the exceptions thrown, it ought to be ok?

And, by the way, the exceptions thrown would be std::bad_alloc, std::runtime_error, or boost::regex_error (from the regex FAQ). Does that cover them? (I know a catch (...) wouldn't hurt...)

I don't want to be the guy who brings CERT around.
So if one deliberately sets the values BOOST_REGEX_MAX_BLOCKS, BOOST_REGEX_BLOCKSIZE, and BOOST_REGEX_MAX_STATE_COUNT, and catches the exceptions thrown, it ought to be ok?
Yep, but note that those macros have sensible defaults already.
And, by the way, the exceptions thrown would be std::bad_alloc, std::runtime_error, or boost::regex_error (from the regex FAQ). Does that cover them? (I know a catch (...) wouldn't hurt...)
I believe so, yes. Ultimately, anything that's ever thrown will inherit from std::exception anyway.

John.
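The exception handling discussed above could be sketched roughly as follows. This is an illustration, not Boost code: MatchOutcome and run_guarded are hypothetical names, and the callable stands in for whatever regex work is done on behalf of the user. It relies on boost::regex_error deriving from std::runtime_error, so a single handler covers both the pattern errors and the state-count limit.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical classification of what happened while running a user regex.
enum class MatchOutcome { Ok, OutOfMemory, RegexFailure, OtherFailure };

// Runs `attempt` (any callable that compiles and/or matches a regex) and
// maps the exception types named in the thread onto an outcome.
// If `detail` is non-null it receives the exception message, when there is one.
template <typename Attempt>
MatchOutcome run_guarded(Attempt&& attempt, std::string* detail = nullptr) {
    try {
        attempt();
        return MatchOutcome::Ok;
    } catch (const std::bad_alloc&) {
        // Memory exhausted.
        return MatchOutcome::OutOfMemory;
    } catch (const std::runtime_error& e) {
        // Covers std::runtime_error (e.g. too many states visited) and,
        // because it derives from it, boost::regex_error (bad pattern).
        if (detail) *detail = e.what();
        return MatchOutcome::RegexFailure;
    } catch (const std::exception& e) {
        // Anything else in the standard hierarchy.
        if (detail) *detail = e.what();
        return MatchOutcome::OtherFailure;
    } catch (...) {
        // The "wouldn't hurt" catch-all for non-standard exception types.
        return MatchOutcome::OtherFailure;
    }
}
```

With Boost available, the callable would typically be a lambda that constructs a boost::regex from the user's pattern and calls boost::regex_search on the data. Constructing the regex inside the guarded call matters, since a malformed pattern throws at construction time rather than at match time.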
participants (3)
- Eric Niebler
- Jim Bell
- John Maddock