[date_time] extract date from string using a list of patterns
I'm looking to take a string and convert it to a date. The only problem is that the string can be in one of many patterns, e.g. "%Y%m%d", "%Y-%m-%d", "%d/%m/%Y", etc. It is also possible that the given string will fail all pattern matches, in which case the function should return false. Can anyone give me an example of how to do this? I know you need to use the date_input_facet, but I am VERY new to Boost and can't get it to work. Thanks! -Jason
Jason Dolan wrote:
I'm looking to take a string and convert it to a date. The only problem is the string can be one of many patterns. i.e. ("%Y%m%d", "%Y-%m-%d", "%d/%m/%Y", etc...). It is also possible that the given string will fail all pattern matches, and thus return false.
And there's nothing to indicate which pattern it might be?
Can anyone give me an example of how to do this?
The date input facet can only parse one pattern at a time, so you need some way to distinguish between the various formats before you pick the facet format. Here's a pseudo-code sketch for the examples above:

    if (date_string.find('-') != std::string::npos) {
        date_input_facet di1(...);
        ...
    }
    else if (date_string.find('/') != std::string::npos) {
        date_input_facet di2(...);
        ...
    }

The problem you are going to have is with formats that can't be distinguished in this fashion. For example:

    %Y%m%d
    %d%m%Y

There's no parser that can handle this case.
I know you need to use the date_input_facet, but I am VERY new to boost and can't get it to work.
Did you look at the docs? It basically comes down to:

    using namespace boost::gregorian;
    date_input_facet* input_facet = new date_input_facet("%Y-%m-%d");
    std::istringstream iss("2006-06-01");
    iss.imbue(std::locale(std::locale::classic(), input_facet));
    date t;
    iss >> t;

Note, this won't work at all on old compilers... so best to tell me your compiler and version of Boost if you are having compile issues.

Jeff
Jason Dolan wrote:
I'm looking to take a string and convert it to a date. The only problem is the string can be one of many patterns. i.e. ("%Y%m%d", "%Y-%m-%d", "%d/%m/%Y", etc...). It is also possible that the given string will fail all pattern matches, and thus return false.
Jeff Garland wrote:
And there's nothing to indicate which pattern it might be?

Nope. I'm basically allowing the user to input a date and time *almost* any way they want. What I want to do is test that string against my list of patterns (i.e. known ways to write a date and time) to try and parse the date. What I'm doing right now is:

    bool SetDate(string &strDate)
    {
        m_vecFormats[0] = "%Y%m%d";
        m_vecFormats[1] = "%Y-%m-%d";
        m_vecFormats[2] = "%d/%m/%Y";
        m_vecFormats[3] = "%d/%m/%Y %H:%M";
        ...

        bool bValidDate = false;
        // Try each known format in turn and stop at the first one that parses.
        for (size_t iter = 0; iter < m_vecFormats.size() && !bValidDate; iter++) {
            cerr << "Trying format: " << m_vecFormats[iter];
            if (SetDate(strDate, m_vecFormats[iter])) {
                cerr << "\tWORKED!!" << endl;
                bValidDate = true;
            }
            else
                cerr << "\tFAILED!!" << endl;
        }
        return bValidDate;
    }

    bool SetDate(string &strDate, string &strFormat)
    {
        bool bValidDate = false;
        // The locale takes ownership of the facet, so it is not deleted here.
        time_input_facet *f = new time_input_facet();
        f->format(strFormat.c_str());
        ptime d(not_a_date_time);
        stringstream ss;
        ss.imbue(locale(ss.getloc(), f));
        ss << strDate;
        ss >> d;
        // Treat the parse as successful only if d now holds a real date-time.
        if (!d.is_not_a_date_time()) {
            bValidDate = true;
        }
        return bValidDate;
    }

But I'm not sure if this is the right way to go about it. Further, what happens if they just put in a time (it would make sense to assume it is the current date)? Can this handle a two-digit year? I wouldn't think so...

Besides that, each time the second SetDate function is called (once per format in the worst case), I have to create a time_input_facet object, a stringstream object and a ptime object. It would be nice for a function like this to use fewer resources, since it's called so much.
Jason Dolan wrote:
Nope. I'm basically allowing the user to input a date and time *almost* any way they want. What I want to do is test that string against my list of patterns (i.e. known ways to write a date and time) to try and parse the date.
Well, again some of them are going to be ambiguous. And I'll just say, this is a big (read: not doable) undertaking. Are all these valid? They certainly are to someone somewhere:

    june 5 2006
    june 5, 2006
    june 5, 06
    jun 5             //just assume the current year
    junio 5, 2006     //spanish
    5-jun-2006
    5-june-2006
    05-June-2006
    05 JUNE 2006
    7/5/6
    7/6/5
    5/6/7
    07/05/06
    05-07-06
    05.07.06
    050706

Now unless there is some other context you can bring to bear, like user validation, format preferences, or "should be around the current date", it's going to be impossible to get right.
What I'm doing right now is:
... snip detail...
But I'm not sure if this is the right way to go about it. Further, what happens if they just put in a time (it would make sense to assume it is the current date), Can this handle a two digit year? I wouldn't think so...
%y is a 2-digit year. But again, you will run into serious problems with ambiguity. What's this date?

    05-06-07

As for the time, that just adds another level of complication. It seems like you just want to ignore times, not actually parse them.
Besides that, each time the second SetDate function is called (which will be once for each format for the worst case), I have to create a time_input_facet object, a stringstream object and a pdate object. It would be nice to have a function like this use less resources since it's called so much.
You could refactor your code so you don't reallocate the stringstream and facet (just reset the strings and formats). But if you really need a high level of efficiency, then you're going to just have to bite the bullet and write a custom parser. The iostreams solution will always be inherently less efficient than custom solutions. Jeff
Jeff Garland wrote:
Well, again some of them are going to be ambiguous. And I'll just say, this is a big (read not doable) undertaking. Are all these valid? They certainly are to someone somewhere:
    june 5 2006
    june 5, 2006
    june 5, 06
    jun 5             //just assume the current year
    junio 5, 2006     //spanish
    5-jun-2006
    5-june-2006
    05-June-2006
    05 JUNE 2006
    7/5/6
    7/6/5
    5/6/7
    07/05/06
    05-07-06
    05.07.06
    050706

Yikes! Maybe I'll be a little more restrictive... like forcing 4-digit years, etc.
...snip...
You could refactor your code so you don't reallocate the stringstream and facet (just reset the strings and formats). But if you really need a high level of efficiency, then you're going to just have to bite the bullet and write a custom parser. The iostreams solution will always be inherently less efficient than custom solutions.
That's why I was looking into it this way; I was hoping not to have to create my own parser. I'm actually kind of surprised that there isn't an open source natural language date string parser out there already.
On 6/26/06, Jason Dolan wrote:
I'm looking to take a string and convert it to a date. The only problem is the string can be one of many patterns. i.e. ("%Y%m%d", "%Y-%m-%d", "%d/%m/%Y", etc...). It is also possible that the given string will fail all pattern matches, and thus return false.
What I did was to manually parse the date, and accept the input only if exactly one of the possible parsings produced a valid date. There's probably an easier way out there, but my way works for me. As Jeff points out, the "%Y%m%d" vs "%d/%m/%Y" detection will be ... interesting, to say the least. Dale
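Dale's message does not include code, so the following is only a guess at what "accept the input only if exactly one of the possible parsings produced a valid date" might look like. It is a minimal sketch in standard C++17 without Boost. The names Ymd, is_valid_date and parse_unambiguous are hypothetical, and the choices made here are assumptions: exactly three numeric fields, a single separator from "-", "/" or ".", the three orderings year-month-day, day-month-year and month-day-year, and no expansion of two-digit years.

```cpp
#include <cctype>
#include <optional>
#include <set>
#include <string>
#include <tuple>

// A plain (year, month, day) triple; std::tuple supplies ordering and equality.
using Ymd = std::tuple<int, int, int>;

// True if (y, m, d) names a real Gregorian calendar date.
bool is_valid_date(int y, int m, int d) {
    if (y < 1 || m < 1 || m > 12 || d < 1) return false;
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0);
    return d <= days[m - 1] + ((m == 2 && leap) ? 1 : 0);
}

// Hypothetical helper: splits `text` into three numeric fields and returns a
// date only if exactly one distinct valid date results from the orderings tried.
std::optional<Ymd> parse_unambiguous(const std::string& text) {
    int field[3] = {0, 0, 0};
    int digits[3] = {0, 0, 0};
    int index = 0;
    char separator = '\0';
    for (char ch : text) {
        if (std::isdigit(static_cast<unsigned char>(ch))) {
            if (++digits[index] > 4) return std::nullopt;  // field too long
            field[index] = field[index] * 10 + (ch - '0');
        } else if (ch == '-' || ch == '/' || ch == '.') {
            if (separator != '\0' && ch != separator) return std::nullopt;
            if (digits[index] == 0 || index == 2) return std::nullopt;
            separator = ch;
            ++index;
        } else {
            return std::nullopt;  // unexpected character
        }
    }
    if (index != 2 || digits[2] == 0) return std::nullopt;

    const Ymd candidates[] = {
        Ymd(field[0], field[1], field[2]),  // year, month, day
        Ymd(field[2], field[1], field[0]),  // day, month, year
        Ymd(field[2], field[0], field[1]),  // month, day, year
    };
    std::set<Ymd> valid;  // a set, so identical results count only once
    for (const Ymd& c : candidates) {
        if (is_valid_date(std::get<0>(c), std::get<1>(c), std::get<2>(c)))
            valid.insert(c);
    }
    if (valid.size() != 1) return std::nullopt;
    return *valid.begin();
}
```

Under these rules "2006-06-01" and "25/12/2006" are accepted, because only one ordering gives a real date, while "01/06/2006" and Jeff's "05-06-07" are rejected as ambiguous. Unseparated input such as "20060601" is rejected outright, since this sketch has no way to tell "%Y%m%d" from "%d%m%Y".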
participants (3)
- Dale McCoy
- Jason Dolan
- Jeff Garland