
On Mon, Mar 13, 2006 at 05:22:35PM +0100, Joaqu?n M? L?pez Mu?oz wrote:
I'd check the following for anomalies:
Joaquin, I'm busy tonight with other things so I'm only able to give you some approximate answers, for which I apologise. I hope to be able to give more definitive answers Wednesday.
1. Could you (just for tests), replace index #0, which now is hashed, with an ordered_unique index and rerun? A quadratic behavior could be the result of a very bad hashing scheme: this is extremely unlikely for your current code, but who knows.
The original container had ordered_unique as its first index and showed a similar (large) increase in pruning time when pruning more cache entries. I did not look for a power law or anything.
2. Is the size of CacheEntry::d_records roughly the same between prunes? Could it be that the average number of records in CacheEntry elements is disproportionately larger on those prunes that take longer?
The cache remains almost identical in size, around 50 million entries. The cache represents DNS records expiring, each run expires all DNS entries that have a 'time to die' ttd that is larger than the current time. This holds, I think, a vital clue to the problem at hand. Records are inserted with ever increasing ttds, generally. There is a number of common time-to-live numbers for DNS records, ie, 5, 60, 300, 3600, 7200, 86400 seconds. This means that we have a number of 'striding' ttd's being inserted: now + 5, now + 60, now +300 .. now +86400. Might this trigger a badly balanced tree? On a related note, I wonder if chopping off the lower half of any sort of ordered tree might trigger worst case behaviour. Note that CacheEntry::getTTD() which forms one index might be slow to call.
3. What percentage of elements of d_cache are visited on each prune? I.e. what is the typical value of looked/d_cache.size()? The nonvisited elements shoudn't have any impact on phase /*2*/, but they have a (minimal) impact on those operations of /*1*/ performing as O(n log n), such as replace(). Anyway, no amount on nonvisited elements can have such a huge impact on figures.
Typically 500 elements are visited of a 50 million element structure, so 1e-5. Things get crazy once > 1000 elements are visited. Thanks Joaquin, I hope to provide more measurements soon. Regards, bert -- http://www.PowerDNS.com Open source, database driven DNS Software http://netherlabs.nl Open and Closed source services