
Hey all,

I've been working on something to reverse boost::format, mainly using regex. Say you're working with IRC, and to create the SERVER line you use a boost::format of:

    SERVER %1% %2% :%3%

The expanded string might be:

    SERVER test.neuromancy.net 1 :My test IRC server

To undo it, you would need a regex of:

    SERVER ([-.[:alnum:]]+) ([[:digit:]]+) :(.*)

My unformatter (right now, part of my Mantra project) lets you specify the format string and the expanded version of that string, and populates a map with the value of each formatting token. This assumes your formatting tokens are numbered; it doesn't work if they aren't. It does handle formatting arguments (e.g. %1$3.2f% or %1|-20s|%), but it does not validate them; it only accounts for them in the regex.

It also lets you specify your own regexes for reading particular arguments, so for example you could do:

    mantra::unformat fmt;
    fmt.ElementRegex(1, "[-.[:alnum:]]+");
    fmt.ElementRegex(2, "[[:digit:]]+");

A default unformatting regex can also be specified, either as a constructor argument or with DefaultRegex; if you don't set one, it is ".*".

I've also made a convenient way to do all of the above more or less inline, inspired by boost::format:

    (mantra::unformat(".*") % "[-.[:alnum:]]+" % "[[:digit:]]+")

My reason for posting this is that I'd like someone else to tell me whether there is a better way to do all of this than the way I'm using, or a better regex than the one I'm using to pull out the formatting strings.
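
As a minimal sketch of the manual approach for the SERVER example, here is the same regex applied with standard C++11 std::regex rather than Boost.Regex (an assumption; std::regex's default ECMAScript grammar accepts POSIX classes such as [:alnum:] inside brackets). The function name unformat_server_line is hypothetical and not part of Mantra. Capture group N holds %N% here only because each token appears once and in numeric order.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: match an expanded "SERVER %1% %2% :%3%" line against
// the hand-written regex and collect each capture group by its format index.
// Returns an empty map if the line does not match.
std::map<int, std::string> unformat_server_line(const std::string& line)
{
    // Same pattern as in the post; built once and reused.
    static const std::regex server_re(
        "SERVER ([-.[:alnum:]]+) ([[:digit:]]+) :(.*)");

    std::map<int, std::string> out;
    std::smatch m;
    if (std::regex_match(line, m, server_re)) {
        // Group i corresponds to %i% because each token appears once, in order.
        for (std::size_t i = 1; i < m.size(); ++i)
            out[static_cast<int>(i)] = m.str(i);
    }
    return out;
}
```

Called on "SERVER test.neuromancy.net 1 :My test IRC server", this should give 1 -> "test.neuromancy.net", 2 -> "1" and 3 -> "My test IRC server"; a line that doesn't match gives an empty map.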

You can see the source at (search for basic_unformat):
http://www.neuromancy.net/viewcvs/Mantra-I/include/mantra/core/algorithms.h?root=mantra&rev=1.12&view=auto

You'll notice I use convert_string<C> a lot. It just converts between character types: basic_unformat, like basic_format or basic_string, can be instantiated with an arbitrary character type (though it usually gets char or wchar_t), so I need to convert my const char * regex strings into that character type for it to still work with wchar_t.

Right now it basically works in 5 steps:

1) It searches the format string for all the 'extended' %N% tags and remembers the order they appear in.
2) It replaces the 'extended' %N% tags with --<<N>>-- tags (the "--<<" and ">>--" parts can be defined by the user if they conflict with something in the input).
3) It escapes all regex-special characters in the string so that they will not be evaluated when the string is later used as a regex.
4) It replaces all the --<<N>>-- tags with either the user-defined element regex or the default regex.
5) It evaluates the new format string (now a regex) against the expanded string and pulls out each element, using the order it remembered in step 1. This step also checks that every instance of the same element has the same value (i.e. %1% should expand to the same thing every time it is used).

This is a little involved, but because it's done with regex, it's still quite quick.

As always, comments, corrections, suggestions, etc. are appreciated, and in this case, solicited :) If you want to yoink the code for this for your own purposes, go ahead.

PreZ :)
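
The five steps described above might be sketched as follows. This is not Mantra's basic_unformat: the Unformat class, its element() method and its operator% are hypothetical stand-ins for basic_unformat, ElementRegex and the % chaining, written against std::regex for char strings only. It recognises plain %N% tokens (no %1$3.2f%-style arguments), uses fixed --<< and >>-- delimiters, and assumes element regexes contain no capturing groups of their own, since those would shift the group numbers used in step 5.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified unformatter following the five steps.
// Char only, plain %N% tokens only; element regexes must not contain
// capturing groups (use (?:...) instead).
class Unformat {
public:
    explicit Unformat(std::string default_re = ".*")
        : default_re_(std::move(default_re)) {}

    // Per-element regex, in the spirit of ElementRegex(n, re).
    Unformat& element(int n, std::string re)
    {
        element_re_[n] = std::move(re);
        return *this;
    }

    // boost::format-style chaining: each % sets the next element, from 1.
    Unformat& operator%(std::string re) { return element(next_++, std::move(re)); }

    // Returns true and fills 'out' (element number -> text) if 'expanded'
    // matches 'fmt'; otherwise returns false and leaves 'out' untouched.
    // Throws std::regex_error if an element regex is invalid.
    bool operator()(const std::string& fmt, const std::string& expanded,
                    std::map<int, std::string>& out) const
    {
        // Steps 1 and 2: remember the %N% order, replace each with --<<N>>--.
        static const std::regex token("%([0-9]+)%");
        std::vector<int> order;
        std::string tagged;
        std::size_t last = 0;
        for (std::sregex_iterator it(fmt.cbegin(), fmt.cend(), token), end; it != end; ++it) {
            std::size_t pos = static_cast<std::size_t>(it->position());
            order.push_back(std::stoi(it->str(1)));
            tagged += fmt.substr(last, pos - last) + "--<<" + it->str(1) + ">>--";
            last = pos + static_cast<std::size_t>(it->length());
        }
        tagged += fmt.substr(last);

        // Step 3: escape ECMAScript metacharacters; the tags contain none.
        static const std::regex meta(R"([\\^$.|?*+()\[\]{}])");
        std::string escaped = std::regex_replace(tagged, meta, "\\$&");

        // Step 4: turn each tag into a capture group using the element's
        // own regex, or the default regex if none was set.
        static const std::regex tag("--<<([0-9]+)>>--");
        std::string pattern;
        last = 0;
        for (std::sregex_iterator it(escaped.cbegin(), escaped.cend(), tag), end; it != end; ++it) {
            std::size_t pos = static_cast<std::size_t>(it->position());
            auto found = element_re_.find(std::stoi(it->str(1)));
            const std::string& re = found != element_re_.end() ? found->second : default_re_;
            pattern += escaped.substr(last, pos - last) + "(" + re + ")";
            last = pos + static_cast<std::size_t>(it->length());
        }
        pattern += escaped.substr(last);

        // Step 5: match, map groups back to element numbers in token order,
        // and require repeated elements to have captured identical text.
        std::smatch m;
        if (!std::regex_match(expanded, m, std::regex(pattern)))
            return false;
        std::map<int, std::string> result;
        for (std::size_t i = 0; i < order.size(); ++i) {
            auto ins = result.emplace(order[i], m.str(i + 1));
            if (!ins.second && ins.first->second != m.str(i + 1))
                return false;
        }
        out = std::move(result);
        return true;
    }

private:
    std::string default_re_;
    std::map<int, std::string> element_re_;
    int next_ = 1;
};
```

For the SERVER line, chaining u % "[-.[:alnum:]]+" % "[[:digit:]]+" on a default-constructed Unformat and calling u("SERVER %1% %2% :%3%", line, out) should fill out[1] to out[3] as in the hand-written regex. One possible alternative for the step 5 consistency check is to emit a backreference (\1 and so on) for repeated elements so the regex engine enforces it while matching; the sketch checks afterwards instead, which keeps the group-to-element mapping a simple walk over the remembered order.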
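
The convert_string<C> conversion described above can be approximated with the standard ctype facet. In this sketch widen_pattern<C> and digits_wregex are hypothetical names, not Mantra's API, and the input patterns are assumed to be plain ASCII, for which std::ctype<C>::widen gives the obvious mapping.

```cpp
#include <locale>
#include <regex>
#include <string>

// Hypothetical stand-in for convert_string<C>: widen a narrow, ASCII-only
// regex literal to character type C using the locale's ctype<C> facet.
template <class C>
std::basic_string<C> widen_pattern(const char* s, const std::locale& loc = std::locale())
{
    const std::ctype<C>& ct = std::use_facet<std::ctype<C>>(loc);
    std::basic_string<C> out;
    for (; *s != '\0'; ++s)
        out.push_back(ct.widen(*s));
    return out;
}

// Example: the same narrow literal used for char also builds a wide regex.
inline std::wregex digits_wregex()
{
    return std::wregex(widen_pattern<wchar_t>("[[:digit:]]+"));
}
```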
Preston A. Elder