regular expression to extract numbers
Hi,

I am new to using Boost and am trying to learn the basics so I can use its regex features in my code. In my test program, I want a function that simply extracts some numbers that follow a certain string pattern and then reads those numbers into a vector. This is an outline of what I would like to do:

    // function definition
    #ifndef findnumbers.h
    #define findnumbers.h
    #include <string>
    #include <fstream>
    #include <vector>
    #include <boost/regex.hpp>
    #endif

    void findnumbers (ifstream& afile) {
        // assume file is opened in main - i have passed a reference
        boost::regex expression ( ....having trouble with this);
        // search for matches - there will be a fixed number of numbers e.g 10
        std::string const_iterator start, end;
        start = afile.begin( );
        end = afile.end( );
        std::vector<int> results;
        boost::match_results<std::vector<int>::const_iterator> what;
        for(int count = 0; count !=10; count++) {
            boost::regex_search(start,end,what,expression) {
                //if a number is found place in the vector
                results.push_back(count);
            }

So the first problem I have is defining a correct expression. The pattern has the format "sometext>NUMBERS<sometext" followed by a newline. NUMBERS may be one or two digits, and the pattern is repeated 10 times. I have searched the archives and tried to build something based on the examples I have read, but cannot seem to find a working solution.

Can anyone please help in defining an expression? I do not need a solution to my function overall, as I should be able to write it correctly once I know how to define the pattern.

Kind regards
So the first problem I have is defining a correct expression. The pattern has the format "sometext>NUMBERS<sometext" newline. The NUMBERS may be one or two digits combined and the pattern is repeated 10 times. I have searched the archives and tried to build something based on the examples I have read, but cannot seem to find a working solution. Can anyone help please in defining an expression?
What's wrong with just:

    "sometext\\>(\\d{1,2})\\<sometext"

then use a regex_iterator to iterate over all occurrences, extracting $1 from each match. Even better, use regex_token_iterator to spit out each number directly. Take a look at the last example at the bottom of this page: http://www.boost.org/doc/libs/1_52_0/libs/regex/doc/html/boost_regex/ref/reg...

HTH, John.
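A minimal sketch of the regex_token_iterator approach suggested above. It is written against std::regex (C++11) so that it builds without Boost; boost::regex offers an sregex_token_iterator taking the same arguments, so switching namespaces should be close to a drop-in change, though that is worth checking against your Boost version. The function name extract_numbers is hypothetical. The angle brackets are left unescaped here because, in Boost's default Perl syntax, \< and \> are word-boundary assertions rather than literal characters.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every one- or two-digit number that appears as ">NN<" in `text`.
// Hypothetical helper, not part of Boost or the standard library.
std::vector<int> extract_numbers(const std::string& text) {
    // Raw string literal: the backslash reaches the regex engine unchanged.
    static const std::regex pattern(R"(>(\d{1,2})<)");

    std::vector<int> results;
    // The trailing 1 asks the iterator to yield capture group 1 ($1) of each match.
    std::sregex_token_iterator it(text.begin(), text.end(), pattern, 1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        results.push_back(std::stoi(it->str()));
    }
    return results;
}
```

Called on the whole file contents read into one string, this would return the numbers in the order they appear, one per matching line.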
I am writing a very simple program that extracts numbers from a string. The numbers are actually lottery numbers. So far, my program connects to a certain URL and downloads a file that contains the latest lottery results. I have managed to reach the point where the barest amount of relevant data is contained in a std::string.

The data is in the following format - though of course the date and numbers vary:

    26-Jan-2013,2,6,21,29,34,47,11,X,X

Note I am not interested - at this stage - in the last two numbers represented by X,X. I am only interested in the first seven numbers following the date.

So I figured that it should be easy to write a regular expression to match this pattern:

    boost::regex pattern("\d\d\d\d,\\>(\\d{1,2})\\<,");

I have written a function called getnumbers that is declared as

    void getnumbers(std::string data);

This function should eventually be passed the string (in the format above) and extract the first seven numbers after the integer year in my pattern. It is not completed yet as I am stuck now. I have asked for help before and it was suggested I use a sregex_token_iterator, so the code below is just an effort to experiment with this, based on what I found in the documentation.

    --------- function definition ---------
    #include <string>
    #include <iostream>
    #include <boost/regex.hpp>
    using namespace std;

    void getnumbers(string data) {
        boost::regex pattern("\d\d\d\d,\\>(\\d{1,2})\\<,");
        boost::sregex_token_iterator i(data.begin(), data.end(), pattern, -1); // not sure what -1 does?
        boost::sregex_token_iterator j;
        unsigned count = 0;
        while(i != j) {
            cout << *i++ << endl;
            count++;
        }
        cout << "There were " << count << " tokens found." << endl;
        return;
    }

If passed the data string in the format above, count should be 9 - representing the 9 integers following the date.

How can I even get this to compile? I get 4 warnings about the pattern, "unknown escape sequence '\d'", and finally an error that says the linker failed with exit code 1.
Also, what are the correct header guards to include in this function source file? Regards
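One way the function above might be made to build, offered as a sketch rather than a definitive fix. The '\d' warnings arise because "\d" inside an ordinary C++ string literal is not a recognised escape; writing "\\d" or using a raw string literal avoids that. The last argument of sregex_token_iterator selects what each step yields: -1 yields the text between matches (so the pattern acts as a delimiter), while 1 yields the first capture group. The sketch uses std::regex so it builds without Boost; boost::regex has iterators of the same names, and a linker failure of the kind described often means the compiled Boost.Regex library was not added to the link line (for example -lboost_regex with GCC, assuming a typical install). The names split_fields and get_numbers are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Split `data` on commas. Passing -1 makes the iterator yield the text
// between matches of the pattern instead of the matches themselves.
std::vector<std::string> split_fields(const std::string& data) {
    static const std::regex comma(",");
    std::vector<std::string> fields;
    std::sregex_token_iterator it(data.begin(), data.end(), comma, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        fields.push_back(it->str());
    }
    return fields;
}

// Return up to `wanted` one- or two-digit numbers that directly follow a comma.
std::vector<int> get_numbers(const std::string& data, std::size_t wanted = 7) {
    // A comma, then a captured 1-2 digit number that ends at a word boundary.
    static const std::regex number(R"(,(\d{1,2})\b)");
    std::vector<int> numbers;
    // Passing 1 makes the iterator yield capture group 1 of each match.
    std::sregex_token_iterator it(data.begin(), data.end(), number, 1);
    const std::sregex_token_iterator end;
    for (; it != end && numbers.size() < wanted; ++it) {
        numbers.push_back(std::stoi(it->str()));
    }
    return numbers;
}
```

On the header guard question: include guards normally go in the header that declares the function rather than in the source file, and the guard macro has to be a plain identifier such as GETNUMBERS_H (a hypothetical name), since a name containing a dot is not a valid macro name.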
On Tue, Jan 29, 2013 at 2:24 PM, Neil Sutton <neilmsutton@gmail.com> wrote:
I am writing a very simple program that extracts numbers from a string. The numbers are actually lottery numbers. So far, my program connects to a certain url and downloads a file that contains the latest lottery results. I have managed to reach the point where the barest amount of relevant data is contained in a std::string.
The data is in the following format - though of course the date and numbers vary:
26-Jan-2013,2,6,21,29,34,47,11,X,X
Note I am not interested - at this stage - in the last two numbers represented by X,X. I am only interested in the first seven numbers following the date.
So I figured that it should be easy to write a regular expression to match this pattern:
boost::regex pattern("\d\d\d\d,\\>(\\d{1,2})\\<,");
I do not know regex well enough to say whether it can provide the basis for the fastest implementation. (In my own experiments I have seen an order of magnitude difference in performance between the fastest and slowest algorithms for the same task, subject to the caveat that they all satisfy the functional requirements correctly.) But if the only consideration right now is to get it working, why not examine Boost more thoroughly?

It already has a tokenizer ( http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/) that, once you know how to use it, may eliminate the need for you to roll your own. It also has a split function in the string algorithms library ( http://www.boost.org/doc/libs/1_52_0/doc/html/string_algo.html).

In both cases, you'd just split your example string on the comma. The first element so extracted would be your date, and the rest would be your numbers.

HTH

Ted
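The split-on-comma idea can be sketched as follows. To keep the example free of a Boost dependency, it uses std::getline on a std::istringstream as a stand-in for boost::tokenizer or boost::split. The struct Draw and the function parse_draw are hypothetical names, and the rule for ignoring the trailing X,X fields (stop at the first non-numeric field, or after seven numbers) is an assumption about the data.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One parsed results line: the date field plus the numbers that follow it.
struct Draw {
    std::string date;
    std::vector<int> numbers;
};

// Split `line` on commas; the first field is the date, the rest are numbers.
Draw parse_draw(const std::string& line, std::size_t wanted = 7) {
    Draw draw;
    std::istringstream fields(line);
    std::string field;

    // First comma-separated field is taken as the date, unvalidated.
    if (!std::getline(fields, draw.date, ',')) {
        return draw;
    }

    // Remaining fields: keep numeric ones until `wanted` have been read.
    while (draw.numbers.size() < wanted && std::getline(fields, field, ',')) {
        const bool all_digits = !field.empty() &&
            std::all_of(field.begin(), field.end(),
                        [](unsigned char c) { return std::isdigit(c) != 0; });
        if (!all_digits) {
            break;  // assumed: a non-numeric field ends the number list
        }
        draw.numbers.push_back(std::stoi(field));
    }
    return draw;
}
```

For a fixed, comma-delimited format like this one, splitting avoids having to get a regular expression right at all, which is the trade-off the reply above is pointing at.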
On Tue, Jan 29, 2013 at 2:24 PM, Neil Sutton <neilmsutton@gmail.com> wrote:
I am writing a very simple program that extracts numbers from a string. The numbers are actually lottery numbers. So far, my program connects to a certain url and downloads a file that contains the latest lottery results. I have managed to reach the point where the barest amount of relevant data is contained in a std::string.
I almost forgot: over a decade ago, I did write my own string splitter, similar to:

    #include <string>
    #include <vector>
    using std::string;
    using std::vector;

    void split(const string& str, const string& delimiters, vector<string>& tokens)
    {
        // Skip delimiters at beginning.
        string::size_type lastPos = str.find_first_not_of(delimiters, 0);
        // Find the first delimiter after that, i.e. the end of the first token.
        string::size_type pos = str.find_first_of(delimiters, lastPos);
        while (string::npos != pos || string::npos != lastPos) {
            // Found a token, add it to the vector.
            tokens.push_back(str.substr(lastPos, pos - lastPos));
            // Skip delimiters. Note the "not_of".
            lastPos = str.find_first_not_of(delimiters, pos);
            // Find the next delimiter, i.e. the end of the next token.
            pos = str.find_first_of(delimiters, lastPos);
        }
    }

If you search, you will find this simple string splitter algorithm has been posted by several different people. I do not know whether those who posted it copied material from others or developed it themselves; the algorithm is so simple and obvious that it would not surprise me if many who considered the problem developed it independently. Back when I served as an educator, it is something I would have assigned a second-year programming class to implement as one of the course's exercises, as a help in understanding the resources of the STL and how to apply them to a common problem.

Cheers

Ted
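A brief usage sketch for a splitter of this shape. The function is repeated so that the block compiles on its own; the wrapper split_line and the sample inputs are hypothetical. One property worth noticing is that runs of adjacent delimiters produce no empty tokens, so "a,,b" yields two tokens, not three.

```cpp
#include <string>
#include <vector>

// Same algorithm as above: append each delimiter-separated token to `tokens`.
void split(const std::string& str, const std::string& delimiters,
           std::vector<std::string>& tokens) {
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);
    while (pos != std::string::npos || lastPos != std::string::npos) {
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }
}

// Hypothetical convenience wrapper: split one results line on commas.
std::vector<std::string> split_line(const std::string& line) {
    std::vector<std::string> tokens;
    split(line, ",", tokens);
    return tokens;
}
```

For the lottery line, the first token would be the date and the following tokens the number fields, which can then be converted with std::stoi.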
participants (3)
- John Maddock
- Neil Sutton
- Ted Byers