
Given the problems with the datasets, I would change the initial query to one that clusters the cities as you do and shows only the city, its (lat, long) and the number of neighbours in the query results. Some cities will show with the right string and others will not, given that the data doesn't map all coordinates to the same city name.

The RTL result is also 991 neighbours. The problem is that the Amsterdam area has the 991 neighbours (and most likely all AS with the same identical coordinates), so it is better to group the results by city, i.e.:

  Amsterdam (lata, lonb) 991
  city B (latc, lond) xyz
  city C (late, lonf) abc

With these results we can compare both queries, and although the string names of the cities may differ, the numerical values should not.

On 10/18/05, Calum Grant <calum@visula.org> wrote:
Is it also possible to see the result?
Attached, Calum
I don't believe this is the required result, is it?
The problem with this data is that it contains a lot of duplicates. If I cluster the cities into 5103 clusters, the query takes 47ms. If I don't cluster them, it takes 4.1s. The expensive part is building the index on distances. The results are rather odd - the 500 locations I get all have 991 neighbours.
Regards, Calum
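
A minimal sketch of the grouping proposed at the top of this message, in case it helps to compare the two result sets. It is not code from RTL or from Calum's library. The `Row` and `Cluster` structs and the functions `group_by_coordinates` and `same_numbers` are hypothetical names made up for illustration, and it assumes that duplicate entries carry bit-identical coordinates, so exact comparison of the doubles is enough.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input row: one line of query output before grouping.
struct Row {
    std::string city;
    double lat;
    double lon;
    std::size_t neighbours;
};

// Hypothetical output row: one line per distinct (lat, lon) pair.
struct Cluster {
    std::string city;        // name of the first row seen at these coordinates
    double lat;
    double lon;
    std::size_t neighbours;  // neighbour count of that first row
    std::size_t rows;        // how many input rows collapsed into this line
};

// Collapse rows that share identical coordinates into a single line.
// Assumption: coordinates are never NaN, so std::pair ordering is valid.
// The result is ordered by (lat, lon) because std::map keeps keys sorted.
std::vector<Cluster> group_by_coordinates(const std::vector<Row>& rows) {
    std::map<std::pair<double, double>, Cluster> clusters;
    for (const Row& r : rows) {
        const auto key = std::make_pair(r.lat, r.lon);
        auto it = clusters.find(key);
        if (it == clusters.end()) {
            clusters.emplace(
                key, Cluster{r.city, r.lat, r.lon, r.neighbours, std::size_t{1}});
        } else {
            ++it->second.rows;  // duplicate coordinates: only count the row
        }
    }
    std::vector<Cluster> out;
    out.reserve(clusters.size());
    for (const auto& kv : clusters) {
        out.push_back(kv.second);
    }
    return out;
}

// Compare two grouped result sets on coordinates and neighbour counts only.
// City names are deliberately ignored, since the strings may differ.
bool same_numbers(const std::vector<Cluster>& a, const std::vector<Cluster>& b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].lat != b[i].lat || a[i].lon != b[i].lon ||
            a[i].neighbours != b[i].neighbours) {
            return false;
        }
    }
    return true;
}
```

Feeding the output of each library through `group_by_coordinates` and then calling `same_numbers` on the two vectors would check the numerical values while leaving the city strings out of the comparison.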