[boost-users] [date_time] length of generic periods

Hi,

the concept of the length of generic periods puzzles me. The "normal" case p.length() = p.end() - p.begin() makes sense to me, but why is a period with last() == begin() treated as a special case, having length 0?

Having this special case makes the length concept behave strangely. For example, [2005-May-1/2005-May-1] has length 0, but [2005-May-1/2005-May-2] has length 2! Actually there is no period of length 1 at all.

As I understand from the documentation, the philosophical reason behind this seems to be that the end points of periods are considered "points", and that a point, and therefore a period that consists of a single point, has zero extension.

This sounds plausible if you think of the periods like intervals on the real number line. But our situation is different, I think, since we always deal with discrete quantities that have a finite length (or "duration").

I would interpret the period [2005-May-1/2005-May-1] as the time period that lasts during the whole day 2005-May-1, from 00:00 up to 23:59:59.999... Therefore, I would give it the length (duration) 1. [2005-May-1/2005-Apr-30] would accordingly be an empty period of length 0, and [2005-May-1/2005-Apr-29] would be an invalid period of length -1.

The length of a date_period p would then be simply the number of days in p, multiplied by the unit of the duration_type. The template parameter duration_type can be used to define different measurements of the length. In the standard case (duration of a day is 1) you measure the period length in days. But you could also measure it in seconds, by using a duration_type with unit() = 86400.

What do you think about this?

Sincerely, Friedrich
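
A minimal sketch of the length rule proposed in this message, assuming a simplified model rather than the real Boost.Date_Time types: `closed_period`, `proposed_length` and the day-of-year constants are hypothetical names introduced only for illustration.

```cpp
#include <cassert>

// Hypothetical model, not Boost.Date_Time: a day is a plain day-of-year
// number, and a period is given in the closed [begin/last] notation.
struct closed_period {
    long begin;  // first day of the period
    long last;   // last day of the period
};

// Proposed length: number of days in the period times the duration unit.
// unit == 1 measures in days, unit == 86400 measures in seconds.
constexpr long proposed_length(closed_period p, long unit = 1) {
    return (p.last - p.begin + 1) * unit;
}

// Day-of-year numbers for 2005 (not a leap year).
constexpr long apr29 = 119, apr30 = 120, may1 = 121, may2 = 122;

// Checks the examples given in the message.
inline void proposed_length_examples() {
    assert(proposed_length({may1, may1}) == 1);    // one whole day
    assert(proposed_length({may1, may2}) == 2);    // two days
    assert(proposed_length({may1, apr30}) == 0);   // empty period
    assert(proposed_length({may1, apr29}) == -1);  // invalid period
    assert(proposed_length({may1, may1}, 86400) == 86400);  // in seconds
}
```

Under this rule every integer length occurs, including length 1, which is the gap the message points out in the observed behavior.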

On Thu, 21 Jul 2005 22:28:08 +0200, Friedrich Wilckens wrote
Hi,
the concept of the length of generic periods puzzles me. The "normal" case p.length() = p.end() - p.begin() makes sense to me, but why is a period with last() == begin() treated as a special case, having length 0?
Long ago in a mailing list somewhere the issue of null intervals appeared and that's where this logic came from...
Having this special case makes the length concept behave strangely. For example, [2005-May-1/2005-May-1] has length 0,
Well, that seems incorrect: [2005-May-1/2005-May-1) has length zero (note the non-inclusive end). But, then again, what is the duration of [1,1]? Zero, I believe.
but [2005-May-1/2005-May-2] has length 2!
Hmm.
Actually there is no period of length 1 at all.
Oh my...that doesn't seem right at all....
As I understand from the documentation, the philosophical reason behind this seems to be that the end points of periods are considered "points" and that a point and therefore a period that consists of a single point has zero extension.
This sounds plausible if you think of the periods like intervals on the real number line. But our situation is different, I think, since we always deal with discrete quantities that have a finite length (or "duration").
Actually, I don't think it is different from an 'integer numberline'.
I would interpret the period [2005-May-1/2005-May-1] as the time period that lasts during the whole day 2005-May-1, from 00:00 up to 23:59:59.999... Therefore, I would give it the length (duration) 1.
Yes, that seems correct except that a date-period cannot represent anything less than a single day. Just as integers in a range [1,1] cannot represent 1.99999.
[2005-May-1/2005-Apr-30] would accordingly be an empty period of length 0, and [2005-May-1/2005-Apr-29] would be an invalid period of length -1.
The length of a date_period p would then be simply the number of days in p, multiplied by the unit of the duration_type. The template parameter duration_type can be used to define different measurements of the length. In the standard case (duration of a day is 1) you measure the period length in days. But you could also measure it in seconds, by using a duration_type with unit() = 86400.
What do you think about this?
I'm worried that there is a problem here, but I'm going to need some time to look at it. I won't be able to for a couple days... Jeff

On Thu, 2005-07-21 at 21:01 -0700, Jeff Garland wrote: <skip>
For example, [2005-May-1/2005-May-1] has length 0,
Well, that seems incorrect: [2005-May-1/2005-May-1) has length zero (note the non-inclusive end). But, then again, what is the duration of [1,1]? Zero, I believe.
As I understand from the documentation, the philosophical reason behind this seems to be that the end points of periods are considered "points" and that a point and therefore a period that consists of a single point has zero extension.
This sounds plausible if you think of the periods like intervals on the real number line. But our situation is different, I think, since we always deal with discrete quantities that have a finite length (or "duration").
Actually, I don't think it is different from an 'integer numberline'.
I would interpret the period [2005-May-1/2005-May-1] as the time period that lasts during the whole day 2005-May-1, from 00:00 up to 23:59:59.999... Therefore, I would give it the length (duration) 1.
Yes, that seems correct except that a date-period cannot represent anything less than a single day. Just as integers in a range [1,1] cannot represent 1.99999.
Jeff
I'm somewhat confused and I fear we don't quite understand each other. Let me try to clarify.

If you interpret the periods as subsets of the integer number line Z, so that date_periods are sets of days, you get into trouble, for it is unclear what the "openness" of the intervals should mean. For example, [1, 2) = [1, 1] = {1}, [1, 3) = [1, 2] = {1, 2}, and so on.

Z with the usual distance metric carries the discrete topology, i.e., *every* subset is open and closed. You would get a consistent system if you define the length of [a, b + 1) = [a, b] as b - a (i.e., last() - begin()). Then, a single day period [a, a + 1) would indeed have length 0, and [1, 3) would have length 1 (not 2).

What I had in mind is a different interpretation, and I believe it better captures what we have in mind when we use the periods. I consider periods as intervals on the real number line R. Now, a day is not a "point" on R, it is an interval itself. I would (as I said above) treat it as the right-open interval lasting from 00:00 to 23:59:59.999.... Similarly, a second like 2005-May-1 10:13:30 is a right-open interval, lasting from 10:13:30.0 until 10:13:30.9999... The points of R are time instants with zero duration, whereas a second has a duration of, well, a second.

What *is* a point on R is the moment at which a day starts. We have a little confusion here since in notations like [2005-May-1, 2005-May-2), "2005-May-1" does not refer to the whole day, but to its start, so it should be considered as a shorthand for 2005-May-1 00:00:00.0; likewise for 2005-May-2.

In this interpretation, a date_period is not a set of days, but a union of days. It lasts from the beginning of its starting day to the end of its last day. It is a truly half-open interval, so [2005-May-1, 2005-May-2) is different from the closed interval [2005-May-1, 2005-May-2] (though they have the same length); it is also different from the closed interval [2005-May-1, 2005-May-1], which is a single point of length 0.

The interpretation as half-open intervals is built into the semantics of date_period, so closed intervals cannot be expressed at all. [2005-May-1, 2005-May-1) can be expressed and is the empty set (of length 0).

begin() returns the first day (not its starting point) contained in the date_period, and last() returns its last day (again, not its starting point).

The length of a period can generally be defined as end() - begin() (not as last() - begin() as in the Z-interpretation above). It turns out that [2005-May-1, 2005-May-2) just describes the whole day 2005-May-1 and has length 1. [2005-May-1, 2005-May-1) is the empty set and has length 0.

In this interpretation, time_periods (based on ptime) are likewise subsets of R. The difference is only that time_periods allow a much finer resolution. Every date_period could be expressed as a time_period. For time periods, we can ask if a certain microsecond (again, this is not a point, but a half-open interval of finite length) is contained in it; for date_periods, days are the smallest objects we consider.

Sorry, this email became somewhat lengthy, but I do not know how to express what I mean with fewer words.

Sincerely, Friedrich
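
The two readings contrasted in this message can be put side by side in a small sketch. It assumes a simplified model with integer day numbers; `half_open_period`, `length_r` and `length_z` are hypothetical names and not part of Boost.Date_Time.

```cpp
#include <cassert>

// Hypothetical model, not Boost.Date_Time: days are plain integers.
struct half_open_period {
    long begin;  // first day contained in the period
    long end;    // one past the last day contained in the period

    // Last day contained in the period.
    long last() const { return end - 1; }
    // R-interpretation: every day is an interval of length 1.
    long length_r() const { return end - begin; }
    // Z-interpretation: distance from the first day to the last day.
    long length_z() const { return last() - begin; }
};

// Checks the worked examples from the message.
inline void half_open_examples() {
    const half_open_period one_day{1, 2};   // [1, 2) = {1}
    const half_open_period two_days{1, 3};  // [1, 3) = {1, 2}
    const half_open_period empty{1, 1};     // [1, 1) = {}

    assert(one_day.last() == 1);
    assert(one_day.length_r() == 1 && one_day.length_z() == 0);
    assert(two_days.length_r() == 2 && two_days.length_z() == 1);
    assert(empty.length_r() == 0);
}
```

Both readings are internally consistent; they differ by exactly one unit for every non-empty period, which is why a single-day period has length 1 in one reading and 0 in the other.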

On Fri, 22 Jul 2005 22:22:56 +0200, Friedrich Wilckens wrote
I'm somewhat confused and I fear we don't quite understand each other. Let me try to clarify.
If you interpret the periods as subsets of the integer number line Z, so that date_periods are sets of days, you get into trouble, for it is unclear what the "openness" of the intervals should mean. For example, [1, 2) = [1, 1] = {1}, [1, 3) = [1, 2] = {1, 2}, and so on.
Yes.
Z with the usual distance metric carries the discrete topology, i.e., *every* subset is open and closed. You would get a consistent system if you define the length of [a, b + 1) = [a, b] as b - a (i.e., last() - begin()). Then, a single day period [a, a + 1) would indeed have length 0, and [1, 3) would have length 1 (not 2).
Agree. And the current implementation doesn't do this :-(
What I had in mind is a different interpretation, and I believe it better captures what we have in mind when we use the periods. I consider periods as intervals on the real number line R. Now, a day is not a "point" on R, it is an interval itself. I would (as I said above) treat it as the right-open interval lasting from 00:00 to 23:59:59.999.... Similarly, a second like 2005-May-1 10:13:30 is a right-open interval, lasting from 10:13:30.0 until 10:13:30.9999... The points of R are time instants with zero duration, whereas a second has a duration of, well, a second.
What *is* a point on R is the moment at which a day starts. We have a little confusion here since in notations like [2005-May-1, 2005-May-2), "2005-May-1" does not refer to the whole day, but to its start, so it should be considered as a shorthand for 2005-May-1 00:00:00.0; likewise for 2005-May-2.
In this interpretation, a date_period is not a set of days, but a union of days. It lasts from the beginning of its starting day to the end of its last day. It is a truly half-open interval, so [2005-May-1, 2005-May-2) is different from the closed interval [2005-May-1, 2005-May-2] (though they have the same length); it is also
I'm having trouble with the idea that open and closed ranges have the same length. It seems like that will lead to problems. More below...
different from the closed interval [2005-May-1, 2005-May-1] which is a single point of length 0.
This makes sense.
The interpretation as half-open intervals is built into the semantics of date_period, so closed intervals cannot be expressed at all. [2005-May-1, 2005-May-1) can be expressed and is the empty set (of length 0).
I agree with this.
begin() returns the first day (not its starting point) contained in the date_period, and last() returns its last day (again, not its starting point).
Ah, interesting...
The length of a period can generally be defined as end() - begin() (not as last() - begin() as in the Z-interpretation above). It turns out that [2005-May-1, 2005-May-2) just describes the whole day 2005-May-1 and has length 1. [2005-May-1, 2005-May-1) is the empty set and has length 0.
Yes.
In this interpretation, time_periods (based on ptime) are likewise subsets of R. The difference is only that time_periods allow a much finer resolution. Every date_period could be expressed as a time_period. For time periods, we can ask if a certain microsecond (again, this is not a point, but a half-open interval of finite length) is contained in it; for date_periods, days are the smallest objects we consider.
Agree.
Sorry, this email became somewhat lengthy, but I do not know how to express what I mean with fewer words.
No problem. This seemingly simple subject is actually quite tricky. So after looking at this for a while, here's what I'm thinking.

Periods have 2 types of constructor. One which accepts 2 points in a 'half-open' form. One with a point and a duration. So as I understand your proposal we would have the following:

  date d(2005,Jan,1); //0 date
  date_period dp1(d, date(2004,Dec,31)); //len=-1 last=Jan1 is_null=true
  date_period dp2(d, days(-1)); //dp1 == dp2 (and so on for the rest of these examples)
  date_period dp3(d, d); //len=0 last=Jan1 is_null=true
  date_period dp4(d, days(0));
  date_period dp5(d, date(2005,Jan,2)); //len=1 last=Jan1 is_null=false
  date_period dp6(d, days(1));
  date_period dp7(d, date(2005,Jan,3)); //len=2 last=Jan2 is_null=false
  date_period dp8(d, days(2));

As it turns out, this is pretty close to the originally intended behavior with respect to lengths and nulls. Unfortunately, as you've seen, there are some bugs that prevent the library from behaving this way currently. The main difference between your proposal and the current design is that the current design will report last == Dec 31 and not Jan 1 for the zero length durations. So if you print the above periods you will get:

  [2005-Jan-01/2004-Dec-30] //dp1 negative one length
  [2005-Jan-01/2004-Dec-31] //dp3 zero length
  [2005-Jan-01/2005-Jan-01] //dp5 one length
  [2005-Jan-01/2005-Jan-02] //dp7 two length

Under your proposal we would get:

  [2005-Jan-01/2005-Jan-01] //dp1 negative one length
  [2005-Jan-01/2005-Jan-01] //dp3 zero length
  [2005-Jan-01/2005-Jan-01] //dp5 one length
  [2005-Jan-01/2005-Jan-02] //dp7 two length

I can see arguments for both, but the main thing that makes me think the current approach is better is that it correctly distinguishes the length, even though it is hard to see the zero and negative length cases. But with last being the same for 3 cases there is no way to distinguish between the zero, one and negative one length durations. So serialization of periods in this form becomes a problem.
So I recommend that we fix the length bug and leave last alone. Make sense? Jeff
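
The serialization concern can be illustrated with a rough sketch. It assumes a simplified model in which days are integer offsets from 2005-Jan-01; `period_model`, `last_current`, `last_proposed` and `show` are hypothetical names, and the rule in `last_proposed` is only a reading of the "last=Jan1" comments in the examples above, not library code.

```cpp
#include <cassert>
#include <string>

// Hypothetical model, not Boost.Date_Time: days are integer offsets from
// 2005-Jan-01, so Dec 31 is -1, Jan 2 is 1, and so on.
struct period_model {
    long begin;  // first day
    long end;    // one past the last day (half-open form)

    // The two constructor styles mentioned above.
    static period_model from_points(long b, long e) { return {b, e}; }
    static period_model from_length(long b, long len) { return {b, b + len}; }

    long length() const { return end - begin; }
    bool is_null() const { return end <= begin; }

    // Intended current design: last is always end - 1.
    long last_current() const { return end - 1; }
    // Proposal as tabulated above: null periods report last == begin.
    long last_proposed() const { return is_null() ? begin : end - 1; }
};

// Renders a period in the printed [begin/last] form, using day offsets.
inline std::string show(long begin, long last) {
    return "[" + std::to_string(begin) + "/" + std::to_string(last) + "]";
}
```

With `dp1 = from_points(0, -1)`, `dp3 = from_points(0, 0)` and `dp5 = from_points(0, 1)`, the `last_current()` values are -2, -1 and 0, so the printed forms differ, while `last_proposed()` is 0 for all three and the printed forms collide.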
Participants (2): Friedrich Wilckens, Jeff Garland