regular expression to extract numbers
Hi,
I am new to using boost and am trying to learn basics so I can use regex
features in my code.
In my test program, I want a function to simply extract some numbers that
follow after a certain string pattern and then to read those numbers into a
vector.
This is an outline of what I would like to do:
// function definition
#ifndef FINDNUMBERS_H
#define FINDNUMBERS_H
#include <string>
#include <fstream>
#include <vector>
#include
I am writing a very simple program that extracts numbers from a string. The
numbers are actually lottery numbers.
So far, my program connects to a certain url and downloads a file that
contains the latest lottery results. I have managed to reach the point
where the barest amount of relevant data is contained in a std::string.
The data is in the following format - though of course the date and numbers
vary:
26-Jan-2013,2,6,21,29,34,47,11,X,X
Note I am not interested - at this stage - in the last two numbers
represented by X,X. I am only interested in the first seven numbers
following the date.
So I figured that it should be easy to write a regular expression to match
this pattern:
boost::regex pattern("\d\d\d\d,\\>(\\d{1,2})\\<,");
I have written a function called getnumbers, declared as
void getnumbers(std::string data);
This function should eventually be passed the string (in the format above)
and extract the first seven numbers after the four-digit year in my pattern.
It is not complete yet, as I am stuck. I have asked for help before and it
was suggested that I use a sregex_token_iterator, so the code below is just
an experiment with it, based on what I found in the documentation.
--------- function definition ---------
#include <string>
#include
On Tue, Jan 29, 2013 at 2:24 PM, Neil Sutton wrote:
I am writing a very simple program that extracts numbers from a string. The numbers are actually lottery numbers. So far, my program connects to a certain url and downloads a file that contains the latest lottery results. I have managed to reach the point where the barest amount of relevant data is contained in a std::string.
The data is in the following format - though of course the date and numbers vary:
26-Jan-2013,2,6,21,29,34,47,11,X,X
Note I am not interested - at this stage - in the last two numbers represented by X,X. I am only interested in the first seven numbers following the date.
So I figured that it should be easy to write a regular expression to match this pattern:
boost::regex pattern("\d\d\d\d,\\>(\\d{1,2})\\<,");
I do not know regex well enough to say whether it can provide the basis for the fastest implementation. In some of my experiments I have seen an order of magnitude difference in performance between the fastest and slowest algorithms that do the same thing (subject to the caveat that they all satisfy the functional requirements correctly). But if the only consideration right now is to get it working, why not examine Boost more thoroughly?
Boost already has a tokenizer ( http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/) that, once you know how to use it, may eliminate the need for you to roll your own. It also has a split function in the string algorithms library ( http://www.boost.org/doc/libs/1_52_0/doc/html/string_algo.html). In both cases, you would just split your example string on the comma. The first element extracted would be your date, and the rest would be your numbers.
HTH
Ted
On Tue, Jan 29, 2013 at 2:24 PM, Neil Sutton wrote:
I am writing a very simple program that extracts numbers from a string. The numbers are actually lottery numbers. So far, my program connects to a certain url and downloads a file that contains the latest lottery results. I have managed to reach the point where the barest amount of relevant data is contained in a std::string.
I almost forgot: over a decade ago, I did write my own string splitter, similar to:
#include <string>
#include <vector>
void split(const std::string& str, const std::string& delimiters,
           std::vector<std::string>& tokens)
{
    // Skip delimiters at the beginning; lastPos is the start of the first token.
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the delimiter that ends the first token.
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters to reach the start of the next token. Note the "not_of".
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the delimiter that ends the next token.
        pos = str.find_first_of(delimiters, lastPos);
    }
}
If you search, you will find that this simple string splitter has been posted by several different people. I do not know whether they copied material others had posted or developed it themselves (the algorithm is so simple and obvious that it would not surprise me if many who considered the problem developed it independently). Back when I served as an educator, it is something I would have assigned to a second-year programming class as one of the course's exercises, to help them understand the resources of the STL and how to apply them to a common problem.
Cheers
Ted
participants (3)
- John Maddock
- Neil Sutton
- Ted Byers