[containers] flat_set/flat_map could add insert_sorted(begin, end)

Hi Ion, I think the above member could be useful (taking an already sorted sequence), especially if the container needs to reallocate, in which case we may merge the two sequences [begin,end) [lower_bound(*begin),upper_bound(*(end-1))] and copy the unaltered prefix and suffix. Furthermore, even when we don't have to reallocate, we can bulk copy the suffix and simulate merging for the infix range. Any comments? -Thorsten

On 20/04/2010 17:46, Thorsten Ottosen wrote:
Hi Ion,
I think the above member could be useful (taking an already sorted sequence), especially if the container needs to reallocate, in which case we may merge the two sequences
[begin,end) [lower_bound(*begin),upper_bound(*(end-1))]
and copy the unaltered prefix and suffix.
But we don't know the final size of the the container without comparing each value to be inesrted, what's your suggestion? And should the input sequence contain unique values for set/map?
Furthermore, even when we don't have to reallocate, we can bulk copy the suffix and simulate merging for the infix range.
The question is how we know we don't need to reallocate. Best, Ion

On 21/04/2010 0:10, igaztanaga@gmail.com wrote:
On 20/04/2010 17:46, Thorsten Ottosen wrote:
Hi Ion, The question is how we know we don't need to reallocate.
Well, for multimap/set this is trivial. The problem is with set/map, if we want to avoid a new allocation if it's not necessary: input_size = distance(begin, end); merge_region_size = distance(lower_bound(begin), upper_bound(end)); non_merge_region_size = size()-merge_region_size; final size is between (input_size + non_merge_region_size) and ((input_size + merge_region_size) + non_merge_region_size) Depending on input_size, (e.g.: if it's less than free_mem = capacity()-size()) we could first avoid reallocation. Or we could take some input elements (begin(), begin() + free_mem), merge and see what happens (if free_space is nearly exhausted or not). If there is no space left or is much less than the remaining imput, we grow vector capacity and try again. Best, Ion

Ion Gaztañaga skrev:
On 21/04/2010 0:10, igaztanaga@gmail.com wrote:
On 20/04/2010 17:46, Thorsten Ottosen wrote:
Hi Ion, The question is how we know we don't need to reallocate.
Well, for multimap/set this is trivial. The problem is with set/map, if we want to avoid a new allocation if it's not necessary:
input_size = distance(begin, end); merge_region_size = distance(lower_bound(begin), upper_bound(end)); non_merge_region_size = size()-merge_region_size;
final size is between (input_size + non_merge_region_size) and ((input_size + merge_region_size) + non_merge_region_size)
Depending on input_size, (e.g.: if it's less than free_mem = capacity()-size()) we could first avoid reallocation. Or we could take some input elements (begin(), begin() + free_mem), merge and see what happens (if free_space is nearly exhausted or not). If there is no space left or is much less than the remaining imput, we grow vector capacity and try again.
That's one appproach. We could also strengthen the precondition and require that the input sequence is sorted and unique. The function with a weaker precondition could arguably do the same thing, but I think leaving that responsibility to the caller gives the best flexibility; (1) sometimes the caller knows the sequence is already unique, or (2) sometimes the caller can call unique() first and reuse memory, or (3) sometimes the caller is forced to use unique_copy(). In case (1) and (2) we will get better performance, and possibel also in case (3) where the caller might have additional information that allows him to use stack-buffers etc. So for flat_set/flat_map I think this strong precondition is the best we can do, and for flat_multiset/flat_multimap, only is_sorted should be the precondition. I'm wondering if the same interface would make sense in normal set/map, but I'm thinking the extra info is little worth compared to heap-allocations and balancing. -Thorsten

On 21/04/2010 12:57, Thorsten Ottosen wrote:
That's one appproach. We could also strengthen the precondition and require that the input sequence is sorted and unique.
You still need to remove duplicates between the unique sorted input range and stored values. Unless you require that the input sequence should not contain any value already stored in the container (this could be a more optimized function, it has a worst case with just a single reallocation since the final size is size() + input_range_size). Best, Ion

Ion Gaztañaga skrev:
On 21/04/2010 12:57, Thorsten Ottosen wrote:
That's one appproach. We could also strengthen the precondition and require that the input sequence is sorted and unique.
You still need to remove duplicates between the unique sorted input range and stored values. Unless you require that the input sequence should not contain any value already stored in the container (this could be a more optimized function, it has a worst case with just a single reallocation since the final size is size() + input_range_size).
I missed that, thanks. My use case actually have this property too. Even without this condition we should be able to have only two reallocations: one initial that might allocate too much space and a second one which is basically shrink_to_fit() which can be chosen only to be performed if it reduces the space more than, say 10%. Call this strategy A (the percentage might be a default argument of the function). Another strategy (B) would be to assume something about the average number of distinct elements. This is difficult to do in general, and so the user would be asked for a number in the interface, specifying his belief in the number of elements to be inserted. It might be difficult even for the user to do this. The good thing about strategy A is that it gives a single allocation in the worst case for the case where all the new elements must be stored because the size estimation is correct. Therefore this sounds like a good approach. -Thorsten
participants (3)
-
igaztanaga@gmail.com
-
Ion Gaztañaga
-
Thorsten Ottosen