[regex_spirit_xpressive] Are timings of search algo's available vs "by-hand"?

12 Apr 2006

      <alert comment="boost newbie">

I've been impressed by the functionality provided by the regex-related 
libraries in boost that I've looked at so far. However, before 
trekking to far-distant "grok-land" off in the mists, I wanted to get 
some idea if there were negligible or minor or major performance 
tradeoffs.

I've seen comparisons between regex, spirit, and xpressive, but they 
were done by the library developers several years ago and are probably 
obsolete. I'm wondering how these libraries would compare to a 
"hand-tuned" state-machine routine, and to a state machine generated 
automatically by an FSM utility (these tend to be bloated but can be 
fast).

My interest is specialized to finding which pattern in a "group" of 
patterns was detected, and the offset within the testStr. To 
illustrate, the regex would be something like detecting the full or 
abbreviated Day-Of-Week:

((Sunday|Sun)|(Monday|Mon)|(Tuesday|Tue)
         |(Wednesday|Wed)|(Thursday|Thu)
         |(Friday|Fri)|(Saturday|Sat))

The testStr is something like:
std::string testStr =
  "Alternate days of the week are Tue and Thursday and Sat and Monday. "
  "And then Monday and Wed and Friday and Sun. "
  "Near misses are WeD TuE ThU SuN SaT MoN FrI ";
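
To make the "which pattern, and at what offset" requirement concrete, 
here is a minimal sketch. It uses std::regex from the C++ standard 
library as a stand-in for the Boost libraries discussed here, so it 
says nothing about their speed. DowHit and find_days are hypothetical 
names made up for this illustration, and giving each day its own 
capture group is just one way to recover which alternative matched.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One hit: which day matched (0 = Sunday ... 6 = Saturday), the offset
// of the match in the input, and the matched text.
struct DowHit {
    int day;
    std::size_t offset;
    std::string text;
};

// Hypothetical helper, not part of any library.
std::vector<DowHit> find_days(const std::string& s) {
    // One capture group per day. The long form comes first inside each
    // group so that "Sunday" is preferred over "Sun" at the same spot.
    static const std::regex re(
        "(Sunday|Sun)|(Monday|Mon)|(Tuesday|Tue)|(Wednesday|Wed)"
        "|(Thursday|Thu)|(Friday|Fri)|(Saturday|Sat)");
    std::vector<DowHit> hits;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        for (int g = 1; g <= 7; ++g) {
            if (m[g].matched) {  // group g took part in this match
                hits.push_back({g - 1,
                                static_cast<std::size_t>(m.position(0)),
                                m.str(0)});
                break;
            }
        }
    }
    return hits;
}
```

Called on the testStr above, the first hit is day 2 ("Tue") at offset 
31, and the mixed-case near misses are not reported because the match 
is case-sensitive. Like the pattern above, the sketch has no word 
boundaries, so it would also report the "Sun" in "Sunny".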

The real application reads a batch of 2 MB files and generates 
SGML-like output with embedded tags (e.g. Tue becomes 
<dow=2>Tue</dow> and Thursday becomes <dow=4>Thursday</dow>).

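The tagging step could be sketched as follows, again with std::regex 
standing in for the Boost libraries. tag_days is a hypothetical name, 
and the numbering (Sunday = 0, so Tue = 2 and Thursday = 4) is 
inferred from the two examples above rather than stated anywhere.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps every day name in <dow=N>...</dow>,
// assuming Sunday = 0 ... Saturday = 6.
std::string tag_days(const std::string& s) {
    static const std::regex re(
        "(Sunday|Sun)|(Monday|Mon)|(Tuesday|Tue)|(Wednesday|Wed)"
        "|(Thursday|Thu)|(Friday|Fri)|(Saturday|Sat)");
    std::string out;
    auto copied_to = s.begin();  // end of the input already copied
    for (std::sregex_iterator it(s.begin(), s.end(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        int day = 0;
        for (int g = 1; g <= 7; ++g) {
            if (m[g].matched) { day = g - 1; break; }
        }
        out.append(copied_to, m[0].first);  // text before the match
        out += "<dow=" + std::to_string(day) + ">" + m.str(0) + "</dow>";
        copied_to = m[0].second;            // resume after the match
    }
    out.append(copied_to, s.end());         // trailing text
    return out;
}
```

For a 2 MB file, one std::string holding the whole input and one 
holding the output is probably acceptable. Whether repeated appends 
matter next to the matching cost is something only a measurement can 
answer.
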
The above seems like the kind of task for which regex libraries would 
be appropriate, would be beyond strstr, but wouldn't be excessively 
difficult to accomplish "by hand". The unknown is whether there is a 
performance trade-off in using a regex library, and whether it is 
positive, negative, minor, or major.
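
For comparison, a "by hand" version might look roughly like the 
sketch below. It is not a tuned state machine, just a first-character 
filter followed by direct string comparisons. Hit, match_day_at, and 
find_next_day are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical result type: day == -1 means "nothing found".
struct Hit {
    int day;             // 0 = Sunday ... 6 = Saturday
    std::size_t offset;  // position of the match in the input
    std::size_t length;  // 3 for an abbreviation, else the full length
};

// Returns the day whose full or 3-letter name starts at s[pos], or -1.
// On success, len receives the matched length. Requires pos <= s.size().
int match_day_at(const std::string& s, std::size_t pos, std::size_t& len) {
    static const char* const names[7] = {
        "Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"};
    for (int d = 0; d < 7; ++d) {
        const std::size_t n = std::strlen(names[d]);
        // Try the full name first so it wins over the abbreviation.
        if (s.compare(pos, n, names[d]) == 0) { len = n; return d; }
        if (s.compare(pos, 3, names[d], 3) == 0) { len = 3; return d; }
    }
    return -1;
}

// Finds the first day name at or after `from`.
Hit find_next_day(const std::string& s, std::size_t from = 0) {
    // Every day name starts with one of S, M, T, W, F, so skip ahead to
    // those characters before doing any comparison.
    for (std::size_t pos = s.find_first_of("SMTWF", from);
         pos != std::string::npos;
         pos = s.find_first_of("SMTWF", pos + 1)) {
        std::size_t len = 0;
        const int d = match_day_at(s, pos, len);
        if (d >= 0) return {d, pos, len};
    }
    return {-1, std::string::npos, 0};
}
```

To walk a whole buffer, call find_next_day again with from set to 
offset + length of the previous hit until day comes back as -1.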

I've started some preliminary timings with a VC7.1 release build 
(/O2), using a HiResTimer based on QueryPerformanceCounter, with the 
process running in ABOVE_NORMAL_PRIORITY_CLASS.
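
A portable timing loop along these lines could look like the sketch 
below. It is not the HiResTimer mentioned above: it assumes a C++11 
compiler and uses std::chrono::steady_clock in place of 
QueryPerformanceCounter. average_microseconds is a hypothetical name, 
and fn is assumed to return a count (for example, the number of 
matches found).

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical harness: calls fn(input) `reps` times and returns the
// average wall-clock time per call in microseconds.
template <typename Fn>
double average_microseconds(Fn fn, const std::string& input, int reps) {
    using clock = std::chrono::steady_clock;
    std::size_t sink = 0;  // sum the results so the calls stay live
    const auto start = clock::now();
    for (int i = 0; i < reps; ++i) {
        sink += fn(input);
    }
    const auto stop = clock::now();
    volatile std::size_t keep = sink;  // discourage dead-code removal
    (void)keep;
    return std::chrono::duration<double, std::micro>(stop - start).count()
           / reps;
}
```

Running each candidate many times over the same 2 MB buffer and 
comparing the averages should smooth out most timer and scheduler 
noise, though cache effects will still favor whatever runs on warm 
data.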

Before proceeding much further, this newbie thought it would be good 
to check if people with real boost experience have done this kind of 
benchmarking. A search for "benchmark" and "timings" in boost-user and 
boost-devel didn't turn up much.

Preview: so far the preliminary results look VERY GOOD, but "consider 
the source".

</alert>

Lynn Allan
