Regular Expression Performance Comparison The following tables provide comparisons between the following regular expression libraries: [1]GRETA. [2]The Boost regex = library. [3]Henry Spencer's regular = expression library - this is provided for comparison as a typical non-backtracking = implementation. Philip Hazel's [4]PCRE = library. Details Machine: Intel Pentium 4 2.8GHz PC. Compiler: Microsoft Visual C++ version 7.1. C++ Standard Library: Dinkumware standard library version = 313. OS: Win32. Boost version: 1.33.0. PCRE version: 4.5. As ever care should be taken in interpreting the results, only sensible regular expressions (rather than pathological cases) are given, most = are taken from the Boost regex examples, or from the [5]Library of Regular Expressions. In addition, some variation in the = relative performance of these libraries can be expected on other = machines - as memory access and processor caching effects can be quite large for = most finite state machine algorithms. Averages The following are the average relative scores for all the = tests: the perfect regular expression library would score 1, in practice anything less than 2 is pretty good. Boost + C++ locale PCRE Dynamic Xpressive 1.71231 1.75638 4.97007 Comparison 1: Long Search For each of the following regular expressions the time taken to = find all occurrences of the expression within a long English language text was measured ([6]mtent12.txt<= /a> from [7]Project Gutenberg, = 19Mb). Expression Boost + C++ locale PCRE Dynamic Xpressive Twain 2.13 (0.12s) 1 (0.0563s) 8.54 (0.481s) Huck[[:alpha:]]+ 2.35 (0.119s) 1 (0.0507s) 10.5 (0.53s) [[:alpha:]]+ing 1 (1.06s) 5.71 (6.07s) 9.28 (9.85s) ^[^ ]*?Twain 1 (0.305s) 2.59 (0.791s) 9.62 (2.93s) Tom|Sawyer|Huckleberry|Finn 1 (0.138s) 1.44 (0.198s) 18.2 (2.5s) = (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawy er|Huckleberry|Finn) 1 (0.285s) 2.32 (0.66s) 25 (7.12s) Comparison 2: Medium Sized Search For each of the following regular expressions the time taken to = find all occurrences of the expression within a medium sized English language text was measured (the first 50K from mtent12.txt). Expression Boost + C++ locale PCRE Dynamic Xpressive Twain 1.21 (0.000372s) 1 (0.000308s) 4.07 (0.00125s) Huck[[:alpha:]]+ 1.58 (0.000317s) 1 (0.0002s) 6.93 (0.00139s) [[:alpha:]]+ing 1 (0.00227s) 6.76 (0.0153s) 11.6 (0.0263s) ^[^ ]*?Twain 1 (0.000851s) 2.44 (0.00207s) 8.83 (0.00751s) Tom|Sawyer|Huckleberry|Finn 1 (0.000704s) 1.89 (0.00133s) 6.78 (0.00477s) = (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawy er|Huckleberry|Finn) 1 (0.00149s) 1.71 (0.00254s) 12.4 (0.0185s) Comparison 3: C++ Code Search For each of the following regular expressions the time taken to = find all occurrences of the expression within the C++ source file [8]boost/crc.hpp was measured. Expression Boost + C++ locale PCRE Dynamic Xpressive = ^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struc t)[[:space:]]*(\<\w+\>([ ]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>[[: space:]]*)?(\{|:[^;\{()]*\{) 1 (0.000235s) 2.79 (0.000655s) 2.5 (0.000587s) (^[ ]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[ ^\n ]*|/\*.*?\*/)|\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[: digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\> |('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\<(__asm|__cdecl|__declspec|__e xport|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall| _as m|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import| _pa scal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char |cl ass|const|const_cast|continue|default|delete|do|double|dynamic_cast|el se| enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mu tab le|namespace|new|operator|pascal|private|protected|public|register|rei nte rpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch |te mplate|this|throw|true|try|typedef|typeid|typename|union|unsigned|usin g|v= irtual|void|volatile|wchar_t|while)\> 1 (0.0097s) 2.32 (0.0225s) NA ^[ ]*#[ ]*include[ = ]+("[^"]+"|<[^>]+>) 1 (0.000367s) 1.55 (0.000567s) 2.45 (0.000899s) ^[ ]*#[ ]*include[ = ]+("boost/[^"]+"|]+>) 1 (0.000367s) 1.55 (0.000567s) 2.45 (0.000899s) Comparison 4: HTML Document Search For each of the following regular expressions the time taken to = find all occurrences of the expression within the html file [9]libs/libraries.htm was measured. Expression Boost + C++ locale PCRE Dynamic Xpressive beman|john|dave 1 (0.000519s) 1.3 (0.000675s) 2.87 (0.00149s)

.*?

2.13 (0.000406s) 1.87 (0.000357s) 1 (0.000191s) = ]+href=("[^"]*"|[^[:space:]]+)[^>]*><= /td> 2.12 (0.00141s) 1 (0.000665s) 1.71 (0.00113s) = ]*>.*? 2.37 (0.000435s) 1.76 (0.000323s) 1 (0.000183s) = ]+src=("[^"]*"|[^[:space:]]+)[^>]*>= 4.67 (0.000411s) 3.33 (0.000293s) 1 (8.8e-005s) = ]+face=("[^"]*"|[^[:space:]]+)[^>]*>.*?&l= t;/font> 4.19 (0.000425s) 2.89 (0.000293s) 1 (0.000101s) Comparison 3: Simple Matches For each of the following regular expressions the time taken to match against the text indicated was measured. Expression Text Boost + C++ locale PCRE Dynamic Xpressive abc abc 2.29 (4.2e-007s) 1 (1.84e-007s) 1.32 (2.43e-007s) ^([0-9]+)(\-| |$)(.*)$ 100- this is a line of ftp response which contains a = message string 2.04 (9.54e-007s) 1 (4.67e-007s) 1.43 (6.69e-007s) ([[:digit:]]{4}[- = ]){3}[[:digit:]]{3,4} 1234-5678-1234-456 2.5 (1.34e-006s) 1 (5.34e-007s) 5.08 (2.71e-006s) = ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)| (([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ john@johnmaddock.co.uk 1.59 (4.12e-006s) 1 (2.6e-006s) 1.76 (4.58e-006s) = ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)| (([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ foo12@foo.edu 1.65 (3.59e-006s) 1 (2.18e-006s) 1.72 (3.74e-006s) = ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)| (([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ bob.smith@foo.tv 1.68 (3.59e-006s) 1 (2.14e-006s) 1.79 (3.81e-006s) ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} = {0,1}[0-9][A-Za-z]{2}$ EH10 2QQ 1.78 (1.36e-006s) 1 (7.64e-007s) 1.52 (1.16e-006s) ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} = {0,1}[0-9][A-Za-z]{2}$ G1 1AA 1.77 (1.34e-006s) 1 (7.54e-007s) 1.54 (1.17e-006s) ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} = {0,1}[0-9][A-Za-z]{2}$ SW1 1ZZ 1.73 (1.32e-006s) 1 (7.64e-007s) 1.52 (1.16e-006s) = ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ 4/1/2001 1.85 (1.2e-006s) 1 (6.49e-007s) 2 (1.3e-006s) = ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ 12/12/2001 1.88 (1.2e-006s) 1 (6.4e-007s) 2.03 (1.3e-006s) ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ 123 1.76 (1.26e-006s) 1 (7.16e-007s) 1.25 (8.97e-007s) ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ +3.14159 1.68 (1.3e-006s) 1 (7.74e-007s) 1.68 (1.3e-006s) ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ -3.14159 1.71 (1.34e-006s) 1 (7.83e-007s) 1.63 (1.28e-006s) _________________________________________________________________ � Copyright John Maddock 2005 Use, modification and distribution are subject to the Boost = Software License, Version 1.0. (See accompanying file [10]LICENSE_1_0.txt or copy at [11]http://www.boost.org/LICENS= E_1_0.txt) References 1. 3D"http://research.microsoft.com/projects/greta" 2. 3D"http://www.boost.org/" 3. 3D"http://arglist.com/regex/" 4. 3D"http://www.pcre.org"/ 5. 3D"http://www.regxlib.com/" 6. 3D"http://www.gutenberg.org/files/3200/old/mtent12.zip" 7. 3D"http://promo.net/pg/" 8. file://localhost/boost/crc.hpp" 9. file://localhost/tmp/libraries.htm" 10. file://localhost/LICENSE_1_0.txt" 11. 3D"http://www.boost.org/LICENSE_1_0.txt"