
"consider trying to store the lifespans of people in a genealogy program: recording a few thousand people living over a few hundred years would approach the worse-case space complexity and need megabytes of storage" This was part of my motivation to do project 1 above, because these few
Hi List, Hartmut, Joachim, This is my review of the Interval Template Library (ITL). YES, I vote +1, this library should be accepted into Boost. It is very very useful. During my review my enthousiasm was only growing. Even if there was a thing confusing on the operators (for me). Yes, I can use this library in real life and it will solve problems for me. 1) WHAT I DID For this review I've done three projects of which two are related to my day-to-day or hobby programming efforts. a) The first project is related to genealogy. I took up the idea of Phil and generated an interval map of livetimes of all persons in my genealogical database. This database contains ~3000 persons. Many of the persons are still alive. ITL contains a right open interval but that denotes the right-hand-side excluding, while I need an interval that is unlimited at one side. Of course I could workaround using 9999 as year of death. An additional problem is that for many people death date is unknown. I don't expect ITL to handle fuzzy intervals so I ignore this. Everything went smoothly. There was not any delay noticable (see below). I started with a provided example but changed some things, such as using a const_iterator and const references, not yet using the documentation, and everything is standard and works as expected. The function I drafted is below. b) The second project is related to planning and "leave requests". I've implemented this some years ago and remember that it got quite complex, with weekends, national holiday days and people working sometimes 40 hours a week, but sometimes not on Friday afternoon etc. So I tried if Boost.Interval would have simplified my live there. And yes, it would have done for sure! Starting with the manpower example I made adaptions, actually only simplifying it, plus subtracting days-of-per-week, until I had what I wanted. The whole routine is in one screen now and definitely logic and simple. The drafted functions are below. c) I did some small other studies as well, especially on the map operators, and to draft an alternative "intro for dummies"... Of this, pieces are here in this review. Let me repeat that I found this library VERY USEFUL and that I'm surprised that there are not many reviews yet. 2) DOCUMENTATION The documentation is extensive and good. I've some small remarks about the introduction. When I started I was confused by the definitions of interval_set and interval_map, and when to use them. It might be explained a little better for people not having worked with intervals before, probably with some sample code and images. The documentation contains, at the start: "It means that we can use an interval_set or interval_map like a set or map of elements. It exposes the same functions." and then gives the example " mySet.contains(42)". But contains is NOT an method of std::set, but YES from interval_set. This was confusing. I then expected a sample for the map below but not there... Furthermore, that threelined set-sample means that the *interval* 42-42 is inserted, which was also not clear to me at first sight... I would change that section. I would suggest something like this: void intro_sets_for_dummies() { namespace itl = boost::itl; // STANDARD std::set works with values std::set<int> std_set; std_set.insert(3); std_set.insert(7); BOOST_FOREACH(int const& value, std_set) { std::cout << " " << value; } //std::cout << " " << (std_set.contains(3) ? "YES" : "NO"); //not implemented in std::set std::cout << std::endl; // BOOST.ITL itl::interval_set works with intervals instead of values itl::interval_set<int> itl_set; // Interval's of integer, e.g. seconds or years itl_set.insert(itl::interval<int>(3,5)); itl_set.insert(itl::interval<int>(4,8)); // Overlap with previous is detected automatically itl_set.insert(1); // Interval of (1,1) BOOST_FOREACH(itl::interval<int> const& interval, itl_set) { std::cout << " " << interval.first() << ".." << interval.last(); } std::cout << " " << (itl_set.contains(3) ? "YES" : "NO") << std::endl; // 1..1 3..8 YES } The same for map. The itl::interval_map is REALLY very useful. My suggestion: void intro_maps_for_dummies() { namespace itl = boost::itl; // STANDARD std::map std::map<int, int> std_map; std_map[3] = 4; std_map[7] = 8; for(std::map<int,int>::const_iterator it = std_map.begin(); it != std_map.end(); ++it) { std::cout << " " << it->first << ": " << it->second; } std::cout << std::endl; // BOOST.ITL itl::interval_map itl::interval_map<int, int> itl_map; itl_map += std::make_pair(itl::interval<int>(3,5), 4); // add 4 to [3,5] itl_map += std::make_pair(itl::interval<int>(4,8), 8); // add 8 to [4,8] -> will generate 3 intervals itl_map -= std::make_pair(itl::interval<int>(5,5), 2); // subtract 2 -> will split another interval for(itl::interval_map<int, int>::const_iterator it = itl_map.begin(); it != itl_map.end(); ++it) { itl::interval<int> const& interval = it->first; std::cout << " " << interval.first() << ".." << interval.last() << ": " << it->second; } std::cout << std::endl; // OUTPUT: 3..3: 4 4..4: 12 5..5: 10 6..8: 8 } This would give me a clearer picture how it works. At the same time I now realize that during our discussion on +=, -=, |= , I didn't realize enough that += really does mathematics behind the screens. So it adds values where to the map on the specified interval, it subtracts values from the map on the specified interval. This goes beyond "set theoretic operations", "set union" and especially "set difference". See below for more on this. I realize that a similar sample is there in the documentation, and that this principle, "aggregate on overlap" is well explained there. However, a sample with the numeric values is (for me) somewhat more clear than the given sample with names which are added to a set; the sample of interval_map with itl::set probably does too much at a time... But let me repeat that the overall documentation is impressive and complete! 3) PERFORMANCE Phil claimed that: thousand people over a few hundred years is exactly what I have. Boost.Itl calculates and reports the livespans in 0.031 seconds. So, measured in time, no problem at all. I compared it with another implementation which run in 0.032 seconds, so the same. I will not say that it cannot be enhanced somehow or whatever, I didn't study it in depth, I would only say: the library is well performing, also if the input set grows larger. It should not be rejected for the assumption that it could have been built otherwise. Besides that, Boost.Itl does more than normal intervals: the interval_map associates values to intervals. 4) SOME NOTES - the samples do not use it, but sets and maps are working with Boost.Range - sets are also working with BOOST_FOREACH - in the code samples, "using" is used in many places (or everywhere). While many Boost libraries do this, I would say that this is unclear for users visiting for the first time. What is that "date"? Is it a gregorian date, or an ITL date? So I would say something like namespace gr = boost::gregorian; namespace itl = boost::itl; and use this like "gr::date" and "itl::interval<...>" everywhere, such that the origin of all those terms is immediately clear. As said, this is not criticism on Boost.ITL, it is a hint for enhancing Boost docs for first time users. I'm often struggling with the origin of terms especially if two libraries are used together. I realize we've done this ourselves in Geometry as well and I think changing it would make it more clear. 5) OPERATORS and FUNCTIONS During the review period I had a discussion on the list with Joachim about the operators and the naming of functions. However, I thought that the += and |= were both doing a pure union, but it appears that they are doing a union PLUS a (sometimes mathematic) addition. So my current opinion is: For sets (o.a. interval_set), the following operators are appropriate, as also used in Boost.dynamic_bitset: &= for intersection |= for union ^= for symmetric difference -= for difference So this is how it is implemented but I would remove += as a synonym for |= For interval_map, it is currently much less clear to me now. The operators do two things at the same time, which is confusing me. Let's see the following code pieces: 1| itl::interval_map<int, int> map1, map2; 2| map1 += std::make_pair(itl::interval<int>(3,5), 4); 3| map2 += std::make_pair(itl::interval<int>(4,6), 5); 4| map1 |= map2; This gives: 3..3: 4 4..5: 9 6..6: 5. So it is a set-theoretic union, PLUS mathematical addition. Aggregation on overlap. It IS very useful, but it does two things at the same time, and should be implemented and described carefully. Changing the last line to intersection: 4| map1 &= map2 gives 4..5: 9, so it is a set-theoretic intersection on intervals, PLUS mathematical addition. I still agree. But now: 4| map1 -= map2 gives 3..3: 4 4..5: -1 This is actually unexpected. It is not set-difference, because that would leave only 3..3 in the set and remove 4..5 completely. So what it actually does is subtracting 5 from all there was before in map1, ONLY mathematical operation (minus), no set-theory here... The documentation describes it like this: "Subtraction on Maps implements a map difference function similar to set difference", and it forwards to a link with more info about this ("While addition and subtraction on Sets are implemented as set union and set difference,"), I still don't agree, it is not set difference here. To be complete: 4| map1 ^= map2 gives 3..3: 4 6..6: 5. So yes, this IS the set-symmetric difference and (by nature of that) does or does not mathematical addtion, that is not important here. Therefore I find the interval_map operators still confusing, even more confusing than before. Are they doing mathematics or set-theory? So I would suggest something like this: 4| map1 |= something(map2, std::plus); to make it more explicit. This is the set-theoretic union where on all overlaps the two sets are added, and 4| map1 |= something(map2, std::minus); where the two sets are unioned, and the values of map2 are subtracted from map1. This is slightly different from the current -= implementation. 4| map1 |= something(map2, std::multiplies); item, but values are multiplied And then the same for &= (intersection), -= (set theoretic difference). Then ^= probably need no policy because it always handled the intervals which are not common, an operator is not applicable here. There are combine functors described (I saw later), but they are used different (I think) than this suggestion. This would make it less confusing AND probably would give it more functionality, because the *= is currently not implemented (and might be reserved for scaling). This was map + map. The same applies for the operator-= on map, interval, so: 1| itl::interval_map<int, int> map; 2| map += std::make_pair(itl::interval<int>(3,5), 4); 3| map -= std::make_pair(itl::interval<int>(4,6), 3); result in 3..3: 4 4..5: 1 This could actually be stated clearer by using the std::map approach, on intervals, I tried it intuitively but it does not compile: map[itl::interval<int>(3,5)]+= 4; map[itl::interval<int>(4,6)]-= 3; I would not have any problems with this, and I would expect still doing aggregate on overlap behind the scenes, and not set-theory, resulting in: 3..3: 4 4..5: 1 6..6: -3 6) STANDARD QUESTIONS
- What is your evaluation of the design?
The library is well designed. The comments above were only on one detail.
- What is your evaluation of the implementation?
I had only a glance to the implementation. It is clear, nice short functions everywhere, and well documented. It is build on top of the std::set and std::map, which is reported to be problematic, but I don't think it is. My test runs very fast.
- What is your evaluation of the documentation?
Impressive and quite complete. The introduction might add a section or tutorial for real-starters
- What is your evaluation of the potential usefulness of the library?
Very useful. I mentioned two projects where the library would have helped me if it had been into Boost earlier, and I'm considering rewriting them such that the problems make use of the library, and simplify them in that way.
- Did you try to use the library? With what compiler? Did you have any problems?
I used it using Visual C++ 2008 Express Edition and I didn't have any problems.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
About 10 hours spread over three days.
- Are you knowledgeable about the problem domain?
Finally, thanks to Joachim for developing and submitting this excellent library, and to Hartmut for managing this review. Regards, Barend Gehrels, Amsterdam Ad a: void CreateTimeline(std::ofstream& stream) { namespace itl = boost::itl; typedef itl::interval_map<int, int> map; map livetimes; for (TGenealogyVertexIterator vi = vertices(m_graph).first; vi != vertices(m_graph).second; ++vi) { CGenealogyVertexProperty p = GetProperty(*vi); if (p.indi.type == TYPE_PERSON && p.indi.birth.date.hasdate) { int death_date = p.indi.death.date.hasdate ? p.indi.death.date.year : 9999; livetimes += std::make_pair(itl::interval<int>(p.indi.birth.date.year, death_date), 1); } } stream << "LIVETIMES: " << std::endl; for (map::const_iterator it = livetimes.begin(); it != livetimes.end(); ++it) { boost::itl::interval<int> const& when = it->first; stream << when.first() << " - " << when.upper() << " : " << it->second << std::endl; } } Ad b: void calculate_leave_request() { namespace gr = boost::gregorian; namespace itl = boost::itl; itl::interval<gr::date> scope ( gr::from_simple_string("2010-04-01"), gr::from_simple_string("2010-05-31") ); itl::interval_set<gr::date> weekends_plus_festival_days(scope); weekends_plus_festival_days -= weekends(scope); weekends_plus_festival_days -= gr::from_simple_string("2010-04-05"); // Eastern weekends_plus_festival_days -= gr::from_simple_string("2010-04-30"); // Queensday weekends_plus_festival_days -= gr::from_simple_string("2010-05-13"); // Ascension day weekends_plus_festival_days -= gr::from_simple_string("2010-05-24"); // Whitsun itl::interval_map<gr::date, double> request_in_hours; // Add 8 hour per day request_in_hours += std::make_pair(scope, 8.0); request_in_hours &= weekends_plus_festival_days; // Subtract 4 hour for all Fridays { gr::date date = gr::next_weekday(scope.first(), gr::greg_weekday(gr::Friday)); gr::week_iterator it(date); while(it <= scope.last()) { itl::interval<gr::date> scope(*it, *it); request_in_hours -= std::make_pair(scope, 4.0); ++it; } } report("Request", scope, request_in_hours); } // where report is: template <typename Scope, typename Map> double report(std::string const& caption, Scope const& scope, Map const& map) { namespace itl = boost::itl; namespace gr = boost::gregorian; std::cout << caption << ": " << scope.first() << " - " << scope.last() << std::endl; double hours = 0; for(typename boost::range_iterator<Map const>::type it = boost::begin(map); it != boost::end(map); ++it) { itl::interval<gr::date> const& interval = it->first; // Add 1, because Boost.Date counts [2010-05-01 ... 2010-05-02 as one day] int days = 1 + gr::date_period(interval.first(), interval.last()).length().days(); std::cout << interval.first() << " - " << interval.last() << " -> " << days << " * " << it->second << std::endl; hours += days * it->second; } std::cout << "Total hours: " << hours << std::endl; return hours; }