[Date_Time]: efficient use, fuzzy formats and a valgrind error
Today I made my first hands-on contact with the Boost.Date_Time library. It really is quite intuitive to use, and I'd like to see if it fits our needs.

I want to replace some rather heavily used C functions with something based on Boost.Date_Time, as the old code is rather ugly and unintelligible, whereas the Boost lib is tested, well documented and so on.

i) As I have mentioned before, one important aspect is efficient parsing of dates. Our old function is configurable through format strings. Is the following code (esp. the function parse_date) efficient?

    #define ID_INP_NORMDAT "%d.%m.%Y"

    static MDATUM date_to_mdatum(const date &d);

    static MDATUM parse_date(const char * const sDate, const char * const sFormat)
    {
        string s(sDate);  // stringstream needs a std::string?
        stringstream ss(s);
        date_input_facet* facet(new date_input_facet(sFormat));  // expensive?
        ss.imbue(locale(ss.getloc(), facet));
        date d;
        ss >> d;  // yes, I need to do error checking here
        // I need to produce our internal date format again
        return date_to_mdatum(d);
    }

What I wonder about:

* Especially (as I am new to facets, imbuing etc. as well): is the construction of facets etc. expensive?
* I need to support some twenty input/output formats. Should I construct (and cache) facets in advance, or is it ok to construct these every time they are needed?
* valgrind tells me I do not have a memory leak. Who owns facets?
* Do I have to unimbue a stream (assuming I had imbued cout)?
* Can I avoid constructing the std::string?

It would be great if you could point out flaws and inefficiencies (or just best practices in this case) in the above code to me. Thanks in advance for looking into this.

ii) 'Fuzzy formats'

I need to allow the user some degree of freedom (laziness) when entering dates. It is required that the following German dates (and more) all produce the same date (Oct 1st 2007):

    01.10.2007
    01102007
    011007

Is this possible with Boost.Date_Time's format strings? (I don't think so.)
Is there a(n efficient) way to achieve that (ideally not trying the different possible formats one by one ;-)

iii) When I ran the attached program through valgrind (with options --tool=memcheck --leak-check=full -v) I got the following error report:

    ==7762== Conditional jump or move depends on uninitialised value(s)
    ==7762==    at 0xC9FAC99: strftime_l (in /lib64/libc-2.6.1.so)
    ==7762==    by 0xC2C63DB: std::__timepunct<char>::_M_put(char*, unsigned long, char const*, tm const*) const (in /usr/lib64/libstdc++.so.6.0.9)
    ==7762==    by 0xC2874DE: std::time_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, tm const*, char, char) const (in /usr/lib64/libstdc++.so.6.0.9)
    ==7762==    by 0xC285A7F: std::time_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, tm const*, char const*, char const*) const (in /usr/lib64/libstdc++.so.6.0.9)
    ==7762==    by 0x40CFBD: std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > boost::date_time::gather_month_strings<char>(std::locale const&, bool) (strings_from_facet.hpp:57)
    ==7762==    by 0x40E991: boost::date_time::format_date_parser<boost::gregorian::date, char>::format_date_parser(std::string const&, std::locale const&) (format_date_parser.hpp:175)
    ==7762==    by 0x40F6DC: boost::date_time::date_input_facet<boost::gregorian::date, char, std::istreambuf_iterator<char, std::char_traits<char> > >::date_input_facet(std::string const&, unsigned long) (date_facet.hpp:474)
    ==7762==    by 0x409539: parse_date(char const*, char const*) (datum_test.cpp:29)
    ==7762==    by 0x40976C: main (datum_test.cpp:40)
    --7762-- REDIR: 0xC9E1C70 (index) redirected to 0x4C22A20 (index)

Just in case I am doing something wrong here. Am I?

Thank you for producing a great library and best regards

Christoph
Christoph wrote:
Today I made my first hands-on contact with the Boost.Date_Time library. It really is quite intuitive to use, and I'd like to see if it fits our needs.
I want to replace some rather heavily used C functions with something based on Boost.Date_Time, as the old code is rather ugly and unintelligible, whereas the Boost lib is tested, well documented and so on.
i) As I have mentioned before, one important aspect is efficient parsing of dates. Our old function is configurable through format strings.
If you need extreme efficiency, then you will probably want to write your own parsing functions. The C++ i/o streaming system can't compete with hand-written code...and the efficiency/speed of the Boost date-time implementation hasn't really been studied. That said, I only know of a few people who really needed to write their own parsing in the end b/c of speed...of course I'm sure there are some who haven't told me about it.
Is the following code (esp. the function parse_date) efficient?
    #define ID_INP_NORMDAT "%d.%m.%Y"

    static MDATUM date_to_mdatum(const date &d);

    static MDATUM parse_date(const char * const sDate, const char * const sFormat)
    {
        string s(sDate);  // stringstream needs a std::string?
        stringstream ss(s);
        date_input_facet* facet(new date_input_facet(sFormat));  // expensive?
        ss.imbue(locale(ss.getloc(), facet));
        date d;
        ss >> d;  // yes, I need to do error checking here
        // I need to produce our internal date format again
        return date_to_mdatum(d);
    }
What I wonder about:
* Especially (as I am new to facets, imbuing etc. as well): is the construction of facets etc. expensive?
Somewhat, but you shouldn't need to do this more than once. Once it is constructed and imbued it's sticky unless you destroy the stream.
* I need to support some twenty input/output formats. Should I construct (and cache) facets in advance, or is it ok to construct these every time they are needed?
You don't need to reconstruct it; you can reset the format of the facet if you keep the pointer to it.
* valgrind tells me I do not have a memory leak. Who owns facets?
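The locale you construct takes ownership of the facet (with the default refs argument of 0, the facet is deleted when the last locale referring to it goes away), and the stream holds a copy of that locale once you imbue it. So you never delete the facet yourself.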
* Do I have to unimbue a stream (assuming I had imbued cout)?
Not sure you can....
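For what it's worth, imbue() returns the locale that was installed before the call, so one way to get the effect of "unimbuing" is to keep that return value and imbue it again afterwards. The following is only a minimal sketch using standard library pieces: comma_decimal and write_with_comma are made-up names for illustration, and a date facet would be handled the same way.

```cpp
#include <cassert>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical facet, used only for illustration: prints ',' as decimal point.
struct comma_decimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Writes `value` with the custom facet, then restores the previous locale.
void write_with_comma(std::ostream& os, double value) {
    // imbue() returns the locale that was in effect before the call.
    const std::locale previous =
        os.imbue(std::locale(os.getloc(), new comma_decimal));
    os << value;
    os.imbue(previous);  // "unimbue": put the old locale back
    assert(os.getloc() == previous);
}
```

Anything written to the stream after write_with_comma returns is formatted with the original locale again. The facet itself is released by the locale machinery once no locale refers to it any more.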
* Can I avoid constructing the std::string?
AFAIK you can't b/c stringstream only has std::string constructors.
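If the copy into a std::string turns out to matter, one possible workaround (not stringstream, and not something Boost.Date_Time requires) is a small read-only stream buffer over the existing character array, wrapped in a plain std::istream. This is a sketch under assumptions: char_array_buf and read_int are hypothetical names, the example reads an int rather than a date to stay self-contained, and whether it is actually faster would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <streambuf>

// Hypothetical helper: a read-only stream buffer over an existing char array.
class char_array_buf : public std::streambuf {
public:
    char_array_buf(const char* data, std::size_t size) {
        assert(data != nullptr);
        // setg() takes non-const pointers; the get area is only ever read.
        char* p = const_cast<char*>(data);
        setg(p, p, p + size);
    }
};

// Reads an int directly from a C string without building a std::string.
bool read_int(const char* text, int& out) {
    char_array_buf buf(text, std::strlen(text));
    std::istream in(&buf);
    return static_cast<bool>(in >> out);
}
```

The std::istream created this way can be imbued like any other stream, so in principle the same approach should work with a locale carrying a date_input_facet.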
It would be great if you could point out flaws and inefficiencies (or just best practices in this case) in the above code to me. Thanks in advance for looking into this.
Main thing is I'd avoid reconstructing the stringstream and facet on each call to the function -- make it static, class member, whatever. IME stringstream construction is pretty expensive and there's no real reason to do it every time you call this function.
ii) 'Fuzzy formats' I need to allow the user some degree of freedom (laziness) when entering dates. It is required that the German dates 01.10.2007, 01102007 and 011007 (and more) all produce the same date (Oct 1st 2007). Is this possible with Boost.Date_Time's format strings? (I don't think so.)
No, you'd have to try one at a time.
Is there a(n efficient) way to achieve that (ideally not trying the different possible formats one by one ;-)
You'd have to write your own parser I'm afraid.
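To give an idea of the size of such a parser, here is a minimal hand-written sketch for just the three spellings from the question (DD.MM.YYYY, DDMMYYYY, DDMMYY). It does not use Boost at all. The names ymd and parse_german_date are made up, and mapping two-digit years to 20YY is purely an assumption; the real pivot rule is a policy decision for the application.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

struct ymd { int year; int month; int day; };

static bool is_leap(int y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

static int days_in_month(int y, int m) {
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (m == 2 && is_leap(y)) ? 29 : days[m - 1];
}

// Hypothetical parser for DD.MM.YYYY, DDMMYYYY and DDMMYY.
// Assumption: a two-digit year YY means 20YY.
// Dots are skipped wherever they occur; a stricter version would also
// check their positions.
bool parse_german_date(const char* text, ymd& out) {
    assert(text != nullptr);
    int digits[8];
    std::size_t n = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        if (*p == '.') continue;  // separators are optional
        if (!std::isdigit(static_cast<unsigned char>(*p)) || n == 8) return false;
        digits[n++] = *p - '0';
    }
    if (n != 6 && n != 8) return false;  // only DDMMYY or DDMMYYYY digit counts

    const int day = digits[0] * 10 + digits[1];
    const int month = digits[2] * 10 + digits[3];
    int year = 0;
    for (std::size_t i = 4; i < n; ++i) year = year * 10 + digits[i];
    if (n == 6) year += 2000;  // assumed century for two-digit years

    if (month < 1 || month > 12) return false;
    if (day < 1 || day > days_in_month(year, month)) return false;

    out.year = year;
    out.month = month;
    out.day = day;
    return true;
}
```

With this sketch all three inputs from the question yield year 2007, month 10, day 1, while something like 31.02.2007 or 1.10.2007 is rejected. The resulting fields could then be handed to boost::gregorian::date or directly to the internal date type.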
iii) When I ran the attached program through valgrind (with options --tool=memcheck --leak-check=full -v) I got the following error report:
    ==7762== Conditional jump or move depends on uninitialised value(s)
    ==7762==    at 0xC9FAC99: strftime_l (in /lib64/libc-2.6.1.so)
    ==7762==    by 0xC2C63DB: std::__timepunct<char>::_M_put(char*, unsigned long, char const*, tm const*) const (in /usr/lib64/libstdc++.so.6.0.9)
    ==7762==    by 0xC2874DE: std::time_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, tm const*, char, char) const (in /usr/lib64/libstdc++.so.6.0.9)
    ==7762==    by 0xC285A7F: std::time_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, tm const*, char const*, char const*) const (in /usr/lib64/libstdc++.so.6.0.9)
    ==7762==    by 0x40CFBD: std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > boost::date_time::gather_month_strings<char>(std::locale const&, bool) (strings_from_facet.hpp:57)
    ==7762==    by 0x40E991: boost::date_time::format_date_parser<boost::gregorian::date, char>::format_date_parser(std::string const&, std::locale const&) (format_date_parser.hpp:175)
    ==7762==    by 0x40F6DC: boost::date_time::date_input_facet<boost::gregorian::date, char, std::istreambuf_iterator<char, std::char_traits<char> > >::date_input_facet(std::string const&, unsigned long) (date_facet.hpp:474)
    ==7762==    by 0x409539: parse_date(char const*, char const*) (datum_test.cpp:29)
    ==7762==    by 0x40976C: main (datum_test.cpp:40)
    --7762-- REDIR: 0xC9E1C70 (index) redirected to 0x4C22A20 (index)
Just in case I am doing something wrong here. Am I?
Looks like it's reporting an issue in the C library...I suspect it's ok although you might want to report it to the gcc library folks.
Thank you for producing a great library and best regards
My pleasure, thanks! Jeff
Participants (2): Christoph, Jeff Garland