Backend of multi_index container
Hi, I was going through the documentation of boost::multi_index container. I would like to know what is in its back end. Something like B-Tree,B+tree,etc.? Actually I want to create a database(in the sense it contains millions of records), but need not be reusable at a later period of time. i.e, I am looking only for run-time persistent data. Once data is inserted to this, it will not be modified. And there will be a unique id for each record. I would also like to know whether multi_index is the best suited one for the implementation of my so called database.. Thankyou
On 22 June 2016 at 16:13, Anaswara Nair
I would also like to know whether multi_index is the best suited one for the implementation of my so called database..
Dunno, but Symas Lightning Memory-mapped Database https://symas.com/products/lightning-memory-mapped-database/ is pretty good. There are C++-wrappers, and older one and a c++11 one. degski
I have gone through LMDB. However AFAIK LMDB stores the
data as a key-value pair. Though duplicate keys are allowed, I couldn't find
any relation between these entries. I couldn't understand its use as a
database as well. Can you provide some help so that I can go deeper into
this.
On Thu, Jun 23, 2016 at 10:23 AM, degski
On 22 June 2016 at 16:13, Anaswara Nair
wrote: I would also like to know whether multi_index is the best suited one for the implementation of my so called database..
Dunno, but Symas Lightning Memory-mapped Database https://symas.com/products/lightning-memory-mapped-database/ is pretty good. There are C++-wrappers, and older one and a c++11 one.
degski
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
On 24 June 2016 at 15:51, Anaswara Nair
I have gone through LMDB. However AFAIK LMDB stores the data as a key-value pair.
Yes, it's a key-value store.
Though duplicate keys are allowed, I couldn't find any relation between these entries. I couldn't understand its use as a database as well.
It's like std::(multi)map, maybe the wikipedia entry https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database helps.
Can you provide some help so that I can go deeper into this.
In your original question, you refer to having one unique ID and some related data per record. Maybe you could expand a bit on what you would like to achieve. degski
I have 7 fields in each record,two strings and five 64 bit integers. Of
these,one of the integers is unique. I need to store all such records into
the DB. In multi-index, I have a structure and this structure is being
indexed by that integer value. If I have to similarly create a structure
and store it as the value in LMDB, then what is its difference from
multimap? I can do the same in multimap also. Then how does LMDB become a
database.
On Fri, Jun 24, 2016 at 7:43 PM, degski
On 24 June 2016 at 15:51, Anaswara Nair
wrote: I have gone through LMDB. However AFAIK LMDB stores the data as a key-value pair.
Yes, it's a key-value store.
Though duplicate keys are allowed, I couldn't find any relation between these entries. I couldn't understand its use as a database as well.
It's like std::(multi)map, maybe the wikipedia entry https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database helps.
Can you provide some help so that I can go deeper into this.
In your original question, you refer to having one unique ID and some related data per record. Maybe you could expand a bit on what you would like to achieve.
degski
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- WITH REGARDS, ANASWARA
On 27 June 2016 at 07:27, Anaswara Nair
I have 7 fields in each record,two strings and five 64 bit integers. Of these,one of the integers is unique. I need to store all such records into the DB. In multi-index, I have a structure and this structure is being indexed by that integer value.
From the docs: "The Boost Multi-index Containers Library provides a class template named multi_index_container which enables the construction of containers maintaining one or more indices with different sorting and access semantics." You don't seem to need that functionality (and incurr
the cost of it).
If I have to similarly create a structure and store it as the value in LMDB, then what is its difference from multimap? I can do the same in multimap also.
There are plenty of differences (spelled out on the wiki page), but one of them is concurrency (even at process level). Then how does LMDB become a database.
What's a database? Depending how your indices are generated (randomly, sequentially or otherwise, is size known in advance?), even a std::vector could be your "database". You don't provide enough information. I would suggest to first get your application to work in the simplest possible way (i.e. std::multimap) and see whether that works for you. Don't forget to also test against a std::unordered_multimap, as it could be faster. If you need concurrency, have a look at the TBB library containers. d.
IIRC it uses red-black tree I had horrific experience with insert times, it just took too much time. Actually I had exactly your case, insert once and then just run on multiple indexes. Sounds like, if you don’t mind the load time, go for it. From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Anaswara Nair Sent: Wednesday, June 22, 2016 4:13 PM To: boost-users@lists.boost.org Subject: [Boost-users] Backend of multi_index container Hi, I was going through the documentation of boost::multi_index container. I would like to know what is in its back end. Something like B-Tree,B+tree,etc.? Actually I want to create a database(in the sense it contains millions of records), but need not be reusable at a later period of time. i.e, I am looking only for run-time persistent data. Once data is inserted to this, it will not be modified. And there will be a unique id for each record. I would also like to know whether multi_index is the best suited one for the implementation of my so called database.. Thankyou
I had a file containing about 0.1 million records, each record containing seven fields(two strings and five 64 bit integers). I indexed(ordered_unique) one of the integer fields and tried inserting the records to a multiindex container and it took only about 650 milliseconds. Inserting the same to a SQLite DB took almost 1 second. So, will inserting a really really huge amount of records will degrade the performance? On Thu, Jun 23, 2016 at 10:32 AM, Ernest Zaslavsky < ernest.zaslavsky@sizmek.com> wrote:
IIRC it uses red-black tree
I had horrific experience with insert times, it just took too much time. Actually I had exactly your case, insert once and then just run on multiple indexes. Sounds like, if you don’t mind the load time, go for it.
*From:* Boost-users [mailto:boost-users-bounces@lists.boost.org] *On Behalf Of *Anaswara Nair *Sent:* Wednesday, June 22, 2016 4:13 PM *To:* boost-users@lists.boost.org *Subject:* [Boost-users] Backend of multi_index container
Hi, I was going through the documentation of boost::multi_index container. I would like to know what is in its back end. Something like B-Tree,B+tree,etc.? Actually I want to create a database(in the sense it contains millions of records), but need not be reusable at a later period of time. i.e, I am looking only for run-time persistent data. Once data is inserted to this, it will not be modified. And there will be a unique id for each record. I would also like to know whether multi_index is the best suited one for the implementation of my so called database.. Thankyou
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Hm… 650 ms is way too much for 100k container. AFAIR I was inserting 500k in one second, 5M in 3 seconds…
Are you sure you are NOT running in debug configuration.
Could you elaborate what is your use case? Do you have to insert once and then just use its index?
From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Anaswara Nair
Sent: Friday, June 24, 2016 3:53 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Backend of multi_index container
I had a file containing about 0.1 million records, each record containing seven fields(two strings and five 64 bit integers). I indexed(ordered_unique) one of the integer fields and tried inserting the records to a multiindex container and it took only about 650 milliseconds. Inserting the same to a SQLite DB took almost 1 second. So, will inserting a really really huge amount of records will degrade the performance?
On Thu, Jun 23, 2016 at 10:32 AM, Ernest Zaslavsky
I compiled in Release mode. And yes.. I have to insert once and should be able to use its index to retrieve the data. On Sun, Jun 26, 2016 at 5:12 PM, Ernest Zaslavsky < ernest.zaslavsky@sizmek.com> wrote:
Hm… 650 ms is way too much for 100k container. AFAIR I was inserting 500k in one second, 5M in 3 seconds…
Are you sure you are NOT running in debug configuration.
Could you elaborate what is your use case? Do you have to insert once and then just use its index?
*From:* Boost-users [mailto:boost-users-bounces@lists.boost.org] *On Behalf Of *Anaswara Nair *Sent:* Friday, June 24, 2016 3:53 PM *To:* boost-users@lists.boost.org *Subject:* Re: [Boost-users] Backend of multi_index container
I had a file containing about 0.1 million records, each record containing seven fields(two strings and five 64 bit integers). I indexed(ordered_unique) one of the integer fields and tried inserting the records to a multiindex container and it took only about 650 milliseconds. Inserting the same to a SQLite DB took almost 1 second. So, will inserting a really really huge amount of records will degrade the performance?
On Thu, Jun 23, 2016 at 10:32 AM, Ernest Zaslavsky < ernest.zaslavsky@sizmek.com> wrote:
IIRC it uses red-black tree
I had horrific experience with insert times, it just took too much time. Actually I had exactly your case, insert once and then just run on multiple indexes. Sounds like, if you don’t mind the load time, go for it.
*From:* Boost-users [mailto:boost-users-bounces@lists.boost.org] *On Behalf Of *Anaswara Nair *Sent:* Wednesday, June 22, 2016 4:13 PM *To:* boost-users@lists.boost.org *Subject:* [Boost-users] Backend of multi_index container
Hi, I was going through the documentation of boost::multi_index container. I would like to know what is in its back end. Something like B-Tree,B+tree,etc.? Actually I want to create a database(in the sense it contains millions of records), but need not be reusable at a later period of time. i.e, I am looking only for run-time persistent data. Once data is inserted to this, it will not be modified. And there will be a unique id for each record. I would also like to know whether multi_index is the best suited one for the implementation of my so called database.. Thankyou
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- WITH REGARDS, ANASWARA
With –O3? It is very slow for release…
In any case, if the initial insert time is crucial for you, you can model similar to MIC interface using preallocated vectors, holding indices as tuples in sorted vectors, if O(logn) search time is suitable for you. This is what I did in a while ago…
From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Anaswara Nair
Sent: Sunday, June 26, 2016 3:03 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Backend of multi_index container
I compiled in Release mode. And yes.. I have to insert once and should be able to use its index to retrieve the data.
On Sun, Jun 26, 2016 at 5:12 PM, Ernest Zaslavsky
OP might be reading the data from file, the file reading time might have been counted in the time for multi-index performance. On Sun, Jun 26, 2016 at 5:50 PM, Ernest Zaslavsky < ernest.zaslavsky@sizmek.com> wrote:
With –O3? It is very slow for release…
In any case, if the initial insert time is crucial for you, you can model similar to MIC interface using preallocated vectors, holding indices as tuples in sorted vectors, if O(logn) search time is suitable for you. This is what I did in a while ago…
*From:* Boost-users [mailto:boost-users-bounces@lists.boost.org] *On Behalf Of *Anaswara Nair *Sent:* Sunday, June 26, 2016 3:03 PM
*To:* boost-users@lists.boost.org *Subject:* Re: [Boost-users] Backend of multi_index container
I compiled in Release mode. And yes.. I have to insert once and should be able to use its index to retrieve the data.
On Sun, Jun 26, 2016 at 5:12 PM, Ernest Zaslavsky < ernest.zaslavsky@sizmek.com> wrote:
Hm… 650 ms is way too much for 100k container. AFAIR I was inserting 500k in one second, 5M in 3 seconds…
Are you sure you are NOT running in debug configuration.
Could you elaborate what is your use case? Do you have to insert once and then just use its index?
*From:* Boost-users [mailto:boost-users-bounces@lists.boost.org] *On Behalf Of *Anaswara Nair *Sent:* Friday, June 24, 2016 3:53 PM *To:* boost-users@lists.boost.org *Subject:* Re: [Boost-users] Backend of multi_index container
I had a file containing about 0.1 million records, each record containing seven fields(two strings and five 64 bit integers). I indexed(ordered_unique) one of the integer fields and tried inserting the records to a multiindex container and it took only about 650 milliseconds. Inserting the same to a SQLite DB took almost 1 second. So, will inserting a really really huge amount of records will degrade the performance?
On Thu, Jun 23, 2016 at 10:32 AM, Ernest Zaslavsky < ernest.zaslavsky@sizmek.com> wrote:
IIRC it uses red-black tree
I had horrific experience with insert times, it just took too much time. Actually I had exactly your case, insert once and then just run on multiple indexes. Sounds like, if you don’t mind the load time, go for it.
*From:* Boost-users [mailto:boost-users-bounces@lists.boost.org] *On Behalf Of *Anaswara Nair *Sent:* Wednesday, June 22, 2016 4:13 PM *To:* boost-users@lists.boost.org *Subject:* [Boost-users] Backend of multi_index container
Hi, I was going through the documentation of boost::multi_index container. I would like to know what is in its back end. Something like B-Tree,B+tree,etc.? Actually I want to create a database(in the sense it contains millions of records), but need not be reusable at a later period of time. i.e, I am looking only for run-time persistent data. Once data is inserted to this, it will not be modified. And there will be a unique id for each record. I would also like to know whether multi_index is the best suited one for the implementation of my so called database.. Thankyou
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
--
WITH REGARDS, ANASWARA
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
The execution time includes the time to read the data from the file and insert it to the container. Also I am using std::chrono to calculate the time. Is there some other way by which I can pause the calculation when data is read from the file and resume it again only when data is inserted to the container? How is the performance of the container measured? On Sun, Jun 26, 2016 at 5:50 PM, Ernest Zaslavsky < ernest.zaslavsky@sizmek.com> wrote:
With –O3? It is very slow for release…
In any case, if the initial insert time is crucial for you, you can model similar to MIC interface using preallocated vectors, holding indices as tuples in sorted vectors, if O(logn) search time is suitable for you. This is what I did in a while ago…
*From:* Boost-users [mailto:boost-users-bounces@lists.boost.org] *On Behalf Of *Anaswara Nair *Sent:* Sunday, June 26, 2016 3:03 PM
*To:* boost-users@lists.boost.org *Subject:* Re: [Boost-users] Backend of multi_index container
I compiled in Release mode. And yes.. I have to insert once and should be able to use its index to retrieve the data.
On Sun, Jun 26, 2016 at 5:12 PM, Ernest Zaslavsky < ernest.zaslavsky@sizmek.com> wrote:
Hm… 650 ms is way too much for 100k container. AFAIR I was inserting 500k in one second, 5M in 3 seconds…
Are you sure you are NOT running in debug configuration.
Could you elaborate what is your use case? Do you have to insert once and then just use its index?
*From:* Boost-users [mailto:boost-users-bounces@lists.boost.org] *On Behalf Of *Anaswara Nair *Sent:* Friday, June 24, 2016 3:53 PM *To:* boost-users@lists.boost.org *Subject:* Re: [Boost-users] Backend of multi_index container
I had a file containing about 0.1 million records, each record containing seven fields(two strings and five 64 bit integers). I indexed(ordered_unique) one of the integer fields and tried inserting the records to a multiindex container and it took only about 650 milliseconds. Inserting the same to a SQLite DB took almost 1 second. So, will inserting a really really huge amount of records will degrade the performance?
On Thu, Jun 23, 2016 at 10:32 AM, Ernest Zaslavsky < ernest.zaslavsky@sizmek.com> wrote:
IIRC it uses red-black tree
I had horrific experience with insert times, it just took too much time. Actually I had exactly your case, insert once and then just run on multiple indexes. Sounds like, if you don’t mind the load time, go for it.
*From:* Boost-users [mailto:boost-users-bounces@lists.boost.org] *On Behalf Of *Anaswara Nair *Sent:* Wednesday, June 22, 2016 4:13 PM *To:* boost-users@lists.boost.org *Subject:* [Boost-users] Backend of multi_index container
Hi, I was going through the documentation of boost::multi_index container. I would like to know what is in its back end. Something like B-Tree,B+tree,etc.? Actually I want to create a database(in the sense it contains millions of records), but need not be reusable at a later period of time. i.e, I am looking only for run-time persistent data. Once data is inserted to this, it will not be modified. And there will be a unique id for each record. I would also like to know whether multi_index is the best suited one for the implementation of my so called database.. Thankyou
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
--
WITH REGARDS, ANASWARA
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- WITH REGARDS, ANASWARA
You can do two step load, once into temporary container and then inserting into MIC and measure only the second part, it is inefficient but I guess you can afford it for test purposes.
I guess std::chrono is ok in case std::chrono::high_resolution_clock is precise enough on your system ( it is implementation specific). The other approach is to take std::chrono::high_resolution_clock::now before and after you call to MIC ‘insert’ and then calculate and sum each delta, again it will work in case your clock resolution is sufficient
From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Anaswara Nair
Sent: Monday, June 27, 2016 8:51 AM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Backend of multi_index container
The execution time includes the time to read the data from the file and insert it to the container. Also I am using std::chrono to calculate the time. Is there some other way by which I can pause the calculation when data is read from the file and resume it again only when data is inserted to the container? How is the performance of the container measured?
On Sun, Jun 26, 2016 at 5:50 PM, Ernest Zaslavsky
On 27 June 2016 at 08:50, Anaswara Nair
The execution time includes the time to read the data from the file and insert it to the container.
So start timing after you loaded the file. The way you load the file might affect the load-time. std::iostreams might not give the best performance, and is obviously highly affected by hardware, STL-implementation and underlying OS. Also I am using std::chrono to calculate the time.
Use std::chrono::high_resolution_clock.
Is there some other way by which I can pause the calculation when data is read from the file and resume it again only when data is inserted to the container?
You could just keep a running total and pause and resume as appropriate for your measurement, adding the individual measurements to the total time. It all smells a bit of premature optimization to me. d.
Anaswara Nair
Hi, I was going through the documentation of boost::multi_index container. I would like to know what is in its back end. Something like B-Tree,B+tree,etc.?
Each ordered index is associated with a corresponding red-black tree (https://en.m.wikipedia.org/wiki/Red%E2%80%93black_tree ).
Actually I want to create a database(in the sense it contains millions of records), but need not be reusable at a later period of time. i.e, I am looking only for run-time persistent data. Once data is inserted to this, it will not be modified. And there will be a unique id for each record. I would also like to know whether multi_index is the best suited one for the implementation of my so called database.
Others have been known to use Boost.MultiIndex for similar purposes, but it really depends on the functionality you need, your performance requirements and the results of your performance tests. Joaquín M López Muñoz
El miércoles, 22 de junio de 2016, Anaswara Nair
participants (6)
-
Anaswara Nair
-
degski
-
Ernest Zaslavsky
-
Joaquin M López Muñoz
-
Joaqun M Lopez Munoz
-
Lloyd