Boost MultiIndexContainer, how to do delayed indexing for fast insertion?

I am using the following MultiIndexContainer:

    typedef multi_index_container<
        PositionSummary*,
        indexed_by<
            ordered_unique<
                composite_key<
                    PositionSummary,
                    const_mem_fun<PositionSummary, int, &PositionSummary::positiondate>,
                    const_mem_fun<PositionSummary, const std::string&, &PositionSummary::accountid>,
                    const_mem_fun<PositionSummary, const std::string&, &PositionSummary::instid>
                >
            >,
            ordered_unique<
                composite_key<
                    PositionSummary,
                    const_mem_fun<PositionSummary, int, &PositionSummary::positiondate>,
                    const_mem_fun<PositionSummary, const std::string&, &PositionSummary::instid>,
                    const_mem_fun<PositionSummary, const std::string&, &PositionSummary::accountid>
                >
            >
        >
    > PositionSummaryContainer;

I inserted 10000 insts * 36 accounts * 100 days = 36 million records:

    // Begin testing of the MultiIndexContainer
    std::cout << "Begin inserting data from array into the MultiIndexContainer" << std::endl;
    timer.reset();
    timer.begin();
    for (int i = 0; i < numOfDays_; i++) {
        for (int j = 0; j < accountSize_; j++) {
            for (int k = 0; k < instSize_; k++) {
                PositionSummary* ps = psArray_[(i * accountSize_ + j) * instSize_ + k];
                uniqueIndex.insert(ps);
            }
        }
    }
    printMemoryUsage();
    timer.end();
    std::cout << "Time taken is " << timer.getInterval() << std::endl;

I found that insertion is rather slow, about 20K+ records per second. Is there any way to improve the insertion speed? My data comes from Oracle, where it is properly indexed, so there should be no danger of a corrupted data structure. I know that in Oracle you can load the data first and build the index afterwards to save time. Can I do the same with MultiIndexContainer, if there is a way?

By the way, the parallel query speed is quite satisfactory: querying all 36 million records on a 4-CPU (8-core) machine takes only 2.8 seconds. The code is as follows:

    #pragma omp parallel for collapse(2)
    for (int i = 0; i < numOfDays_; i++) {
        for (int j = 0; j < accountSize_; j++) {
            const int& date = dates_[i];
            const std::string& accountID = accountIDs_[j];
            for (int k = 0; k < instSize_; k++) {
                // Index the instrument list with the innermost loop variable k
                const std::string& instID = instIDs_[k];
                PositionSummaryContainer::iterator it =
                    uniqueIndex.find(boost::make_tuple(date, accountID, instID));
                if (it != uniqueIndex.end()) {
                    #pragma omp atomic
                    sum2 += (*it)->marketvalue();
                }
            }
            //std::cout << "accountID: " << accountID << std::endl;
        }
    }


HuMichael <hujinying <at> hotmail.com> writes:
> I used below MultiIndexContainer
>
>     typedef multi_index_container<
>         PositionSummary*,
>         indexed_by<
>             ordered_unique<composite_key<...> >,
>             ordered_unique<composite_key<...> >
>         >
>     > PositionSummaryContainer;
>
> And I inserted 10000 insts*36 accounts*100 days=36 million records
> [...]
>     for (int i = 0; i < numOfDays_; i++) {
>         for (int j = 0; j < accountSize_; j++) {
>             for (int k = 0; k < instSize_; k++) {
>                 PositionSummary* ps = psArray_[(i * accountSize_ + j) * instSize_ + k];
>                 uniqueIndex.insert(ps);
>             }
>         }
>     }
> [...]
> And I found the speed of insertion is a little bit slow, about 20K+
> records per second... Is there any way to enhance this insertion speed?
> My data was in Oracle, properly indexed, so there should be no danger
> of corrupted data structure.

Well, several ideas you can test:

* If the data is being populated in an order consistent with either index, hinted insertion might get you some speedup:

      uniqueIndex.insert(uniqueIndex.end(), ps);

  You might need to play with the i,j,k loop order to hit this right.
* If your queries on the composite keys are never partial (i.e. you always specify their three parameters), switching to hashed_unique won't break your code and will likely result in faster execution both at insertion and lookup times.
* Can you store PositionSummary's rather than PositionSummary *'s? This eliminates one pointer indirection, modestly improves locality etc.
* Can you use lighter types than std::strings for accountid and instid (like, ints)?

HTH

Joaquín M López Muñoz
Telefónica