[persistent] Persistent library preview


Stefan Strasser

3 Jan 2010

4:05 a.m.

Here's the current documentation of (Boost?).Persistent: https://svn.boost.org/svn/boost/sandbox/persistent/libs/persistent/doc/html/...

In short, it is a library that provides access to objects persistent on disk through an interface as close as possible to accessing regular objects in memory, and it introduces transactions to, e.g., ensure consistency after a crash. See "Introduction" for more information.

This is the version I plan to submit for a Boost review, so if you are interested in that kind of library, please have a look at it now rather than waiting for a formal review. The Boost review queue has grown substantially, so it might be a long time until a formal review, and that time could be used for improvements based on your comments.

Code will follow as soon as some issues are sorted out, especially porting the (few) platform-dependent parts to Windows.

Regards,


Peder Holt

4 Jan

10:13 p.m.

2010/1/3 Stefan Strasser <strasser@uni-bremen.de>

...

Here's the current documentation of (Boost?).Persistent:

https://svn.boost.org/svn/boost/sandbox/persistent/libs/persistent/doc/html/...

In short, it is a library that provides access to objects persistent on disk with an interface as close as possible to accessing regular objects in memory, and it introduces transactions to e.g. ensure consistency after a crash. See "Introduction" for more information.

This is the version I plan to submit for a boost review, so if you are interested in that kind of library, please have a look at it now and don't wait for a formal review. The boost review queue has grown substantially, so it might be a long time until a formal review, which could be used for improvements based on your comments.

...

This looks very interesting. I read through the documentation, and the interface looks very clean.

From the documentation it seems that the library supports some form of undo/redo. Is this the case? And if so, can you give a short code example of how to use this?

Also, can you describe the caching mechanism? How much control does the user have over how much memory he can afford to use for caching objects in memory? Is it possible to control cache size per type?

A final question: I have an application that saves analysis results for a given analysis as an HDF5 file (http://www.hdfgroup.org/HDF5/). This HDF5 file initially contains only the input to the analysis. The HDF5 file is then fed to another program, which modifies it and stores back the results.

1. Is it possible to make boost.persistent write its data to an HDF5 file?
2. If this is/can be made possible, is it possible to ask a loc<T> about its index in the file?
3. Is it possible to create several intermingled sessions/transactions that save to different databases? Example:

    class Analysis {
        // Contains a bunch of result entries using boost.persistent
    };

    Analysis analysis1;
    Analysis analysis2;
    analysis1.setupAnalysis();
    analysis2.setupAnalysis();
    analysis1.execute();
    analysis2.execute();

I want analysis1 and analysis2 to have their results in different file archives.

I am looking forward to following the development of this library.

Regards,
Peder Holt

...

Code will follow as soon as some issues are sorted out, especially porting the (few) platform dependent parts to windows.

Regards, _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Stefan Strasser

5 Jan

8:56 a.m.

Hi,

On Monday 04 January 2010 23:13:35, Peder Holt wrote:

...

...
From the documentation it seems that the library supports some form of undo/redo. Is this the case? And if so, can you give a short code example of how to use this?

There is "undo" by using transactions. However, it is not there to support undo/redo like you see in GUIs, if that's what you meant. Once a transaction is committed, it is permanent. There is an example of that in Tutorial 3, but maybe I should add a simpler one before going into the details of concurrent transactions:

void f(loc<pers_type> l){
    l->name="John";
    {
        transaction tx;
        l->name="Mike";
        //tx.commit(); //this call is omitted
    }
    assert(l->name == "John");
}

Usually, the call to "commit" is omitted because of an exception, so this helps you ensure data consistency and "strong" exception-safety guarantees.
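To make the rollback semantics concrete without the library, here is a minimal standalone sketch. It does not use Boost.Persistent: the `transaction` class, its `write` member, `pers_type`, and `rollback_example` are hypothetical stand-ins that only mimic the behavior described above, namely that writes made inside a transaction scope are reverted when the scope is left without `commit()`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a persistent object (not the library's type).
struct pers_type {
    std::string name;
};

// Hypothetical RAII transaction: records an undo action per write and
// replays them in reverse order on destruction unless commit() was called.
class transaction {
public:
    transaction() = default;
    transaction(const transaction&) = delete;
    transaction& operator=(const transaction&) = delete;

    ~transaction() {
        if (!committed_) {
            for (auto it = undo_.rbegin(); it != undo_.rend(); ++it) (*it)();
        }
    }

    // Assign a new value to a field, remembering the old one for rollback.
    template <class T>
    void write(T& field, T value) {
        T old = field;
        undo_.push_back([&field, old] { field = old; });
        field = std::move(value);
    }

    void commit() {
        committed_ = true;
        undo_.clear();
    }

private:
    std::vector<std::function<void()>> undo_;
    bool committed_ = false;
};

// Mirrors f() above: the write inside the scope is reverted on exit.
inline std::string rollback_example() {
    pers_type obj;
    obj.name = "John";
    {
        transaction tx;
        tx.write(obj.name, std::string("Mike"));
        // tx.commit();  // omitted, e.g. because an exception was thrown
    }
    return obj.name;  // "John" again
}
```

The real library intercepts writes through `loc<T>` rather than an explicit `write` call; the explicit call here only keeps the sketch self-contained.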

...

Also, can you describe the caching mechanism? How much control does the user have over how much memory he can afford to use for caching objects in memory? Is it possible to control cache size per type?

A cache sweep is (almost) linear in the number of objects removed from the cache in the sweep, so caches can be large. Currently, the size of the cache is only controlled by the number of objects that can be in it (see Configuring Boost.Persistent).

I thought about using the actual object size to determine cache overflow. The problem, however, is that this would require the user to implement a function for each persistent type that returns the object size. The library cannot determine the size of an object (think, e.g., of an object that contains a std::vector). At best it could make guesses based on the size of the serialized stream once the object has reached disk, but that also only works if the object is serialized (see https://svn.boost.org/svn/boost/sandbox/persistent/libs/persistent/doc/html/... ).

I don't want the user to be required to implement an additional function for that, so the best option I see is an optional user function, like the ones described in "Optional members":

class pers_type{
    friend float object_size(pers_type const &){ return 2.5; }
};

Objects of this type would count 2.5 times as much in the cache as an object without that function. If you don't implement it, it defaults to 1.0. If you need fine-grained control, you can still implement it for every type and return the actual object size. Would that be sufficient for your use case?
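The weighting idea can be illustrated with a rough standalone sketch; it is not Boost.Persistent code. The names `cache_detail`, `weight_of`, `weighted_cache`, `small_obj`, and `big_obj` are hypothetical, and eviction is plain least-recently-used for simplicity, whereas the thread describes a sweep. A type without its own `object_size` falls back to a weight of 1.0 through overload resolution.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>

namespace cache_detail {
// Fallback used when a type provides no object_size() of its own.
template <class T>
float object_size(T const&) { return 1.0f; }

// Unqualified call: ADL also finds a type's hidden friend object_size(),
// and that non-template overload wins over the template fallback.
template <class T>
float weight_of(T const& obj) { return object_size(obj); }
}  // namespace cache_detail

struct small_obj { int v = 0; };  // no object_size(): weight 1.0

struct big_obj {
    int v = 0;
    // Opt-in weight, as proposed in the message above.
    friend float object_size(big_obj const&) { return 2.5f; }
};

// Hypothetical cache bounded by summed weight instead of object count.
class weighted_cache {
public:
    explicit weighted_cache(float max_weight) : max_weight_(max_weight) {}

    // Insert an object; evict least-recently-used entries while over budget
    // (the newest entry is always kept).
    template <class T>
    void insert(std::shared_ptr<T> obj) {
        float w = cache_detail::weight_of(*obj);
        lru_.push_front({std::move(obj), w});
        total_ += w;
        while (total_ > max_weight_ && lru_.size() > 1) {
            total_ -= lru_.back().weight;
            lru_.pop_back();
        }
    }

    float total_weight() const { return total_; }
    std::size_t size() const { return lru_.size(); }

private:
    struct entry {
        std::shared_ptr<void> obj;
        float weight;
    };
    std::list<entry> lru_;  // front = most recently inserted
    float total_ = 0.0f;
    float max_weight_;
};
```

With a budget of 4.0, one `big_obj` and one `small_obj` fit (3.5); inserting another `small_obj` evicts the oldest entry.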

...

A final question: I have an application that saves analysis results for a given analysis as an HDF5 file. http://www.hdfgroup.org/HDF5/ This HDF5 file initially contains only input to the analysis. The HDF5 file is then fed to another program which modifies it and stores back results. 1. Is it possible to make boost.persistent write its data to an HDF5 file?

Although theoretically possible (see "Extending Boost.Persistent"), I don't think I would advise doing so. That would be like deciding to use MySQL to process some data, but instead of exporting the data when you're done, writing a MySQL storage backend that can write to your file format.

...

2. If this is/can be made possible, is it possible to ask a loc<T> about its index in the file?

Locators intentionally hide the details about the storage of the object. The object ID identifying an object, like your index, can be of different types, depending on which resource the object is saved in. (Multiple resource managers can be used at the same time, although this is not currently done.) Some kind of visitor pattern (see Boost.Variant) could be supported to query the object ID, but I don't see the benefit of that right now.
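For illustration only, here is what a visitor-style query over an object ID could look like using `std::variant` and `std::visit`, the C++17 standard-library counterparts of Boost.Variant and its visitation. The ID types `file_offset_id` and `sql_row_id` and the function `describe_id` are hypothetical; as the message says, the library does not currently expose such a query.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical ID types used by two different resource managers.
struct file_offset_id { std::uint64_t offset; };
struct sql_row_id { std::int64_t row; };

// An object ID is one of the resource-specific ID types.
using object_id = std::variant<file_offset_id, sql_row_id>;

// A visitor that turns whichever ID the resource uses into text.
inline std::string describe_id(object_id const& id) {
    return std::visit([](auto const& v) -> std::string {
        using V = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<V, file_offset_id>)
            return "file offset " + std::to_string(v.offset);
        else
            return "row " + std::to_string(v.row);
    }, id);
}
```

Callers never need to know which resource stored the object; only the visitor branches on the ID type.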

...

3. Is it possible to create several intermingled sessions/transactions that save to different databases?

Again, possible, but not advisable for your case, I think. You can register more than one resource with the transaction manager, and it performs a distributed commit if objects of different resources were involved in a transaction. See https://svn.boost.org/svn/boost/sandbox/persistent/libs/persistent/doc/html/...
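A distributed commit across several resources is classically done with a two-phase commit. The following minimal sketch shows that protocol with a hypothetical `resource` class (`prepare`, `commit`, `rollback`) and a `two_phase_commit` function; these names are assumptions and not Boost.Persistent's resource concept. Every resource touched by the transaction must vote yes in the prepare phase before any of them commits; a single no vote rolls all of them back.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical resource (e.g. one database file) taking part in a commit.
class resource {
public:
    explicit resource(bool will_vote_yes) : will_vote_yes_(will_vote_yes) {}

    // Phase 1: a real resource would durably write its pending changes here
    // before voting yes; this stub just returns its preconfigured vote.
    bool prepare() { return will_vote_yes_; }

    // Phase 2: make the prepared changes visible, or discard them.
    void commit() { state_ = "committed"; }
    void rollback() { state_ = "rolled back"; }

    std::string const& state() const { return state_; }

private:
    bool will_vote_yes_;
    std::string state_ = "active";
};

// Two-phase commit over all resources touched by one transaction.
inline bool two_phase_commit(std::vector<resource*> const& touched) {
    for (resource* r : touched) {
        if (!r->prepare()) {
            for (resource* x : touched) x->rollback();
            return false;
        }
    }
    for (resource* r : touched) r->commit();
    return true;
}
```

In Peder's example, analysis1 and analysis2 would each use their own resource; the protocol only matters when a single transaction touches both.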

...

I am looking forward to following the development of this library

Thanks for your questions and comments.

Peder Holt

7 Jan

10:14 a.m.

2010/1/5 Stefan Strasser <strasser@uni-bremen.de>

...

Hi,

On Monday 04 January 2010 23:13:35, Peder Holt wrote:

...
...
From the documentation it seems that the library supports some form of undo/redo. Is this the case? And if so, can you give a short code example of how to use this?

There is "undo" by using transactions. However, it is not there to support undo/redo like you see in GUIs, if that's what you meant. Once a transaction is committed, it is permanent.

There is an example of that in Tutorial 3, but maybe I should add a simpler one before going into the details of concurrent transactions:

void f(loc<pers_type> l){
    l->name="John";
    {
        transaction tx;
        l->name="Mike";
        //tx.commit(); //this call is omitted
    }
    assert(l->name == "John");
}

Usually, the call to "commit" is omitted because of an exception, so this helps you ensure data consistency and "strong" exception-safety guarantees.

Would you consider adding this to the library? The difference would be that the persistent storage could store multiple versions of each object. Looking at it naively, one transaction could be considered a single history fragment. Each history fragment contains:
- a reference to the next/previous history fragment,
- a reference to all objects that were modified in this transaction,
- a reference to all objects created/deleted in this fragment.

All objects that are loaded into memory in a transaction are candidates for being marked as modified. If an object is only loaded for read-only access, it should not be marked as modified. Do you think this method is feasible to implement in your library?

In our CAE application, we have implemented unlimited undo/redo as a combination of the above technique and an implementation of http://www.codeproject.com/KB/cpp/transactions.aspx plus some other techniques. Currently all our objects are always loaded in memory, so we need a library that limits the number of objects that are loaded while preserving the undo/redo capabilities.
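The history-fragment proposal can be sketched roughly as follows. This is a hypothetical, in-memory illustration, not an existing Boost.Persistent feature: each `history_fragment` stores before/after images of the objects one committed transaction changed, with an absent image (`std::nullopt`) marking a created or deleted object, and `undo_history` uses a vector plus a cursor in place of next/previous references. Using `int` IDs, `std::string` values, and a `std::map` as the "database" are simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using object_id = int;                            // hypothetical
using value = std::optional<std::string>;        // nullopt = object absent
using store = std::map<object_id, std::string>;  // stands in for the database

// One committed transaction: before/after images of every object it changed.
// before == nullopt: created in this fragment; after == nullopt: deleted.
struct history_fragment {
    std::map<object_id, std::pair<value, value>> changes;
};

class undo_history {
public:
    // Record a committed transaction; discards any redo tail.
    void push(history_fragment f) {
        fragments_.resize(cursor_);
        fragments_.push_back(std::move(f));
        ++cursor_;
    }

    bool undo(store& db) {
        if (cursor_ == 0) return false;
        apply(db, fragments_[--cursor_], /*forward=*/false);
        return true;
    }

    bool redo(store& db) {
        if (cursor_ == fragments_.size()) return false;
        apply(db, fragments_[cursor_++], /*forward=*/true);
        return true;
    }

private:
    // Write the after-images (redo) or before-images (undo) into the store.
    static void apply(store& db, history_fragment const& f, bool forward) {
        for (auto const& [id, images] : f.changes) {
            value const& v = forward ? images.second : images.first;
            if (v) db[id] = *v; else db.erase(id);
        }
    }

    std::vector<history_fragment> fragments_;
    std::size_t cursor_ = 0;  // number of fragments currently applied
};
```

Only objects actually written need entries, which matches the point that read-only loads should not be marked as modified.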

...

...
Also, can you describe the caching mechanism? How much control does the user have over how much memory he can afford to use for caching objects in memory? Is it possible to control cache size per type?

a cache sweep is (almost) linear to the number of objects removed from the cache in the sweep, so caches can be large.

currently, the size of the cache is only controlled by the number of objects that can be in it. (see Configuring Boost.Persistent )

This covers my needs. Is it feasible to implement support for setting the max_cache_size per type?

...

...
A final question: I have an application that saves analysis results for a given analysis as an HDF5 file. http://www.hdfgroup.org/HDF5/ This HDF5 file initially contains only input to the analysis. The HDF5 file is then fed to another program which modifies it and stores back results. 1. Is it possible to make boost.persistent write its data to an HDF5 file?

although theoretically possible (see "Extending Boost.Persistent") I don't think I would advise to do so. that'd be like you decided you wanted to use MySQL to process some data, but instead of exporting the data when you're done you decide to write a MySQL storage backend that can write to your file format.

OK. I won't go down this path then :)

Regards,
Peder

Stefan Strasser

10:53 a.m.

On Thursday 07 January 2010 11:14:46, Peder Holt wrote:

...

...
...
undo/redo. Is this the case? And if so, can you give a short code example of how to use this?

there is "undo" by using transactions. however, it is not there to support undo/redo like you see in GUIs, if that's what you meant. once a transaction is committed, it is permanent.

...

Would you consider adding this to the library?

I'm not sure a storage library is the right place to implement an undo/redo queue itself. Saving multiple versions of an object isn't a problem; a similar technique is already used to isolate concurrent transactions. But you assume that there is a sequential history of transactions and that "undo" means going back a step in that history. Take, e.g., a user who changes some data, then changes some preferences of the application, which are also stored in the database, and then requests an undo of his changes to the data. The preferences obviously should not be undone. And in other cases, not only unrelated data like preferences is changed, but data that potentially conflicts with the "undo" operation. For example:

1. Transaction 1, caused by the GUI user: obj1->value = 5;
2. Transaction 2, caused by the application without a user request: obj2->value = obj1->value + 1;
3. The user requests an undo of transaction 1.

The library would need to detect this conflict and deny the undo. To accomplish that, a lot more information would have to be logged than currently is, information that would not be used by any other part of the library, e.g. information about the read accesses of a transaction (in the example, tx2 reading obj1 causes the conflict).

So I think this should be handled by user code, but the library can support it. For example, changesets/patches could be introduced: after a transaction has committed, you could request a changeset from the transaction that contains all the information needed to undo the changes made by the transaction and to check for conflicts. This would also open the door to use cases other than sequential undo/redo.

Limitations: the changesets need to be handled by user code (e.g. storing them in an undo queue), and saving the undo queue to disk, for example, would not be part of the atomic operation (commit) that created the changeset. So you could lose your undo changeset in a crash while the transaction is still in place.
If that is acceptable for your use case and other use cases you can think of, I'll add it to the "future development" page. I don't think it's an essential part of such a library, nor necessary for its initial version. But you seem to be experienced in implementing undo/redo, so you're welcome to draft a public interface for this. The implementation would be pretty easy, I think; the information that describes the changes made by a transaction is already available while a transaction is running.
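The conflict scenario above can be reproduced in a few lines. This is a hypothetical sketch, not library code: a `changeset` records the before-image of every value a transaction wrote, `tx_write` performs a write while recording it, and `tx_undo` restores the before-images. Undoing transaction 1 resets `obj1` but leaves `obj2`, which was derived from the undone value, stale; without read-access logging, nothing in the changeset can detect this.

```cpp
#include <cassert>
#include <map>
#include <string>

using db_t = std::map<std::string, int>;  // object name -> value (hypothetical)

// Before-images of everything one committed transaction wrote.
struct changeset {
    std::map<std::string, int> before;
};

// Perform a write and remember the old value (only the first one per key).
inline void tx_write(db_t& db, changeset& cs, std::string const& key, int v) {
    cs.before.emplace(key, db[key]);
    db[key] = v;
}

// Restore the before-images; knows nothing about later readers of the data.
inline void tx_undo(db_t& db, changeset const& cs) {
    for (auto const& [key, old] : cs.before) db[key] = old;
}
```

Replaying steps 1 to 3 with `obj1` and `obj2` starting at 0 leaves `obj1 == 0` but `obj2 == 6`, which is exactly the inconsistency the user code would have to rule out.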

...

...
currently, the size of the cache is only controlled by the number of objects that can be in it. (see Configuring Boost.Persistent )

This covers my needs. Is it feasible to implement support for setting the max_cache_size per type?

No, you can't, but I agree that this needs improvement. I'll probably add what I described in my last email under the name "cache_factor", and support limiting the number of objects either per type or per group. By group I mean limiting the overall number of objects in the cache that are of type A, B or C. With each type having its own group by default, this feature would be a superset of limiting per type.
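A rough sketch of the grouping scheme just described, under stated assumptions: `group_limits`, `assign`, `set_limit`, and `admit` are hypothetical names, types are identified by `std::type_index`, and each type is its own group (named by `type_info::name()`) until it is explicitly assigned to a shared one. It only counts objects; the eviction itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

class group_limits {
public:
    // Put type T into a named group (by default each type is its own group).
    template <class T>
    void assign(std::string const& group) { group_of_[typeid(T)] = group; }

    void set_limit(std::string const& group, std::size_t max) { limit_[group] = max; }

    // Returns true if one more object of type T may enter the cache;
    // false means something in T's group would have to be evicted first.
    template <class T>
    bool admit() {
        std::string g = group_for(typeid(T));
        auto lim = limit_.find(g);
        if (lim != limit_.end() && count_[g] >= lim->second) return false;
        ++count_[g];
        return true;
    }

private:
    std::string group_for(std::type_index t) const {
        auto it = group_of_.find(t);
        return it != group_of_.end() ? it->second : std::string(t.name());
    }

    std::map<std::type_index, std::string> group_of_;
    std::map<std::string, std::size_t> limit_;  // groups without entry: unlimited
    std::map<std::string, std::size_t> count_;
};
```

Because an unassigned type forms a group of its own, a per-type limit is just a group limit on a one-type group, which is why grouping would be a superset of per-type limits.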

Peder Holt

8 Jan

1:07 p.m.

<snip> if that is acceptable to your use case and other use cases you can think of

...

I'll add it to the "future development" page. I don't think it's an essential part of such a library, nor necessary for its initial version.

This is acceptable to my use case.

...

but you seem to be experienced in implementing undo/redo so you're welcome to draft a public interface for this. the implementation would be pretty easy I think. the information that describes the changes made by a transaction is already available while a transaction is running.

I'll look into it.

Thanks for your time,
Peder

Stefan Strasser

3:32 p.m.

On Friday 08 January 2010 14:07:46, Peder Holt wrote:

...

<snip>

if that is acceptable to your use case and other use cases you can think of

...
I'll add it to the "future development" page. I don't think it's an essential part of such a library, nor necessary for its initial version.

This is acceptable to my use case.

What I wrote in my previous emails, that changesets could be checked for conflicts, is incorrect, though. The user of such an interface would have to make sure that the data an undo changeset reverses has not been used for anything that is not also undone.

...

...
but you seem to be experienced in implementing undo/redo so you're welcome to draft a public interface for this. the implementation would be pretty easy I think. the information that describes the changes made by a transaction is already available while a transaction is running.

I'll look into it.

Thanks for your time

This is an interesting feature because other things can be implemented on top of it, like database replication (by sending the changesets to a remote system and applying them there) or incremental backup. So thanks for pushing me towards that idea.
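To illustrate the replication idea, here is a hypothetical sketch in which a changeset holding after-images is serialized to a simple line-based text format, shipped (here just passed as a string), and applied to a replica. The text format and the names `after_images`, `serialize`, and `apply_serialized` are assumptions made for the example; nothing here is part of the library.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

using db_t = std::map<std::string, int>;

// After-images of the objects one committed transaction wrote (hypothetical).
struct after_images {
    std::map<std::string, int> values;
};

// One "key value" pair per line; keys are assumed to contain no whitespace.
inline std::string serialize(after_images const& cs) {
    std::ostringstream out;
    for (auto const& [key, v] : cs.values) out << key << ' ' << v << '\n';
    return out.str();
}

// Apply a received changeset to the replica's copy of the data.
inline void apply_serialized(db_t& replica, std::string const& wire) {
    std::istringstream in(wire);
    std::string key;
    int v;
    while (in >> key >> v) replica[key] = v;
}
```

Incremental backup follows the same pattern: instead of sending the serialized changesets over the network, append them to a log and replay them onto an older full backup.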

Matthias Troyer

10 Jan

4:03 p.m.

On 7 Jan 2010, at 02:14, Peder Holt wrote:

...

2010/1/5 Stefan Strasser <strasser@uni-bremen.de>

...
...
A final question: I have an application that saves analysis results for a given analysis as an HDF5 file. http://www.hdfgroup.org/HDF5/ This HDF5 file initially contains only input to the analysis. The HDF5 file is then fed to another program which modifies it and stores back results. 1. Is it possible to make boost.persistent write its data to an HDF5 file?

although theoretically possible (see "Extending Boost.Persistent") I don't think I would advise to do so. that'd be like you decided you wanted to use MySQL to process some data, but instead of exporting the data when you're done you decide to write a MySQL storage backend that can write to your file format.

Ok. I won't go down this path then :)

I actually would have exactly the same need, so maybe we can discuss whether this might still be feasible once the library is finished.

Matthias

Stefan Strasser

4:30 p.m.

On Sunday 10 January 2010 17:03:56, Matthias Troyer wrote:

...

...
...
...
is then fed to another program which modifies it and stores back results. 1. Is it possible to make boost.persistent write its data to an HDF5 file?

although theoretically possible (see "Extending Boost.Persistent") I don't think I would advise to do so. that'd be like you decided you wanted to use MySQL to process some data, but instead of exporting the data when you're done you decide to write a MySQL storage backend that can write to your file format.

Ok. I won't go down this path then :)

I actually would have exactly the same need, so maybe we can discuss whether this might still be feasible once the library is finished.

The library already supports that; I just thought it didn't make sense to do it in this case. The file format used by the backend must be able to store some internal information, though, e.g. the transaction ID of the transaction that made the last change to an object. Have a look at the concepts here: https://svn.boost.org/svn/boost/sandbox/persistent/libs/persistent/doc/html/... especially AtomicStorageEngine or StorageEngine, depending on what kind of transaction guarantees your backend can give, and let me know if these concepts are sufficient to support your storage backend. They are obviously the result of implementing my own storage backend, so even though I tried to make them as generic as possible, there might be some shortcomings.
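As a rough illustration of the per-object bookkeeping mentioned above, here is a hypothetical in-memory backend that stores, next to each object's serialized bytes, the ID of the transaction that last changed it. The class `memory_backend` and its members `read`, `write`, and `last_writer` are invented for this example and are not the library's StorageEngine or AtomicStorageEngine concepts; the linked concept documentation defines the actual requirements.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

using object_id = std::uint64_t;
using transaction_id = std::uint64_t;

// Hypothetical backend: each record keeps the payload plus the ID of the
// transaction that wrote it. A foreign file format (for instance HDF5,
// perhaps via an attribute) would have to be able to hold both.
class memory_backend {
public:
    void write(transaction_id tx, object_id id, std::string bytes) {
        records_[id] = record{std::move(bytes), tx};
    }

    std::optional<std::string> read(object_id id) const {
        auto it = records_.find(id);
        if (it == records_.end()) return std::nullopt;
        return it->second.bytes;
    }

    std::optional<transaction_id> last_writer(object_id id) const {
        auto it = records_.find(id);
        if (it == records_.end()) return std::nullopt;
        return it->second.last_tx;
    }

private:
    struct record {
        std::string bytes;
        transaction_id last_tx;
    };
    std::map<object_id, record> records_;
};
```

The question for an HDF5 backend would be whether such extra metadata can be stored alongside the analysis data without disturbing the other program that reads the file.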

Stefan Strasser

11 Jan

8:54 p.m.

On Thursday 07 January 2010 11:14:46, Peder Holt wrote:

...

Is it feasible to implement support for setting the max_cache_size per type?

Could you please explain your use case and why the "cache_factor" feature is not sufficient for it? This intuitively seemed useful to me, but now I can't think of a reason why I'd want to limit the cache size per type if I can already assign a "weight" to each object in the cache. On second thought, it also isn't that feasible to implement. The objects in the cache are stored in no particular order; the sweep algorithm only groups them logarithmically by time of last access. So there is no easy way to remove an object of a particular type; separate lists would have to be maintained.
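The sweep can be pictured with the following hypothetical sketch. It assumes that "grouped logarithmically by time of last access" means bucket k holds objects whose age (now minus last access) lies in [2^k, 2^(k+1)), with age 0 also in bucket 0; the library's actual algorithm may differ. For brevity this version rebuilds the buckets on every sweep, so unlike the library's it is linear in the total cache size. It shows the point made in the message: buckets are keyed by age only, so evicting objects of one particular type would require separate per-type lists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bucket index for an object of the given age: floor(log2(age)), 0 for age 0.
inline std::size_t age_bucket(std::uint64_t age) {
    std::size_t k = 0;
    while (age > 1) {
        age >>= 1;
        ++k;
    }
    return k;
}

struct cached {
    int id;
    std::uint64_t last_access;  // assumed <= now
};

// Drop whole buckets, oldest first, until at most `capacity` objects remain.
// Objects within a bucket are unordered, so a bucket is removed as a unit.
inline std::vector<cached> sweep(std::vector<cached> const& objs,
                                 std::uint64_t now, std::size_t capacity) {
    std::vector<std::vector<cached>> buckets(65);  // enough for 64-bit ages
    for (auto const& o : objs) buckets[age_bucket(now - o.last_access)].push_back(o);

    std::size_t kept = objs.size();
    std::size_t top = buckets.size();
    while (kept > capacity && top > 0) {
        --top;
        kept -= buckets[top].size();
        buckets[top].clear();
    }

    std::vector<cached> out;
    for (auto const& b : buckets) out.insert(out.end(), b.begin(), b.end());
    return out;
}
```

Dropping a whole bucket can leave fewer objects than `capacity`; that coarseness is the price of not keeping the cache sorted.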

Peder Holt

12 Jan

8:51 a.m.

2010/1/11 Stefan Strasser <strasser@uni-bremen.de>

...

On Thursday 07 January 2010 11:14:46, Peder Holt wrote:

...
Is it feasible to implement support for setting the max_cache_size per type?

could you please explain your use case and why the "cache_factor" feature is not sufficient for it?

Hmm. Good question. I think it is an example of premature optimization on my side. My main concern is to reduce memory consumption. My secondary concern is to do this without losing too much performance. The reason for introducing a max_cache_size per type is that this was the way I originally thought to attack the problem, before reading about your library. My guess now is that a global max_cache_size would probably be sufficient, and until performance profiling on a real-world problem has shown otherwise, you can ignore my request for a max_cache_size per type. Also, as you say, you already have a way of assigning a weight to each object, which I will try first if performance suffers.

Regards,
Peder

...

this intuitively seemed useful to me but now I can't think of a reason why I'd want to limit the cache size per type, if I can already assign a "weight" to each object in the cache. on second thought it also isn't so feasible to implement. the objects in the cache are stored in no particular order, the sweep algorithm only groups them logarithmically by time of last access. so there is no easy way to remove an object of a particular type. separate lists would have to be maintained.


vicente.botet

5 Jan

9:44 p.m.

Hi Stefan,

----- Original Message -----
From: "Stefan Strasser" <strasser@uni-bremen.de>
To: <boost@lists.boost.org>
Sent: Sunday, January 03, 2010 5:05 AM
Subject: [boost] [persistent] Persistent library preview

...

Here's the current documentation of (Boost?).Persistent:

https://svn.boost.org/svn/boost/sandbox/persistent/libs/persistent/doc/html/...

In short, it is a library that provides access to objects persistent on disk with an interface as close as possible to accessing regular objects in memory, and it introduces transactions to e.g. ensure consistency after a crash. See "Introduction" for more information.

<snip>

...

Code will follow as soon as some issues are sorted out, especially porting the (few) platform dependent parts to windows.

Needless to say, I'm very interested in your library, as you already know. I have added your library to the Boost Library Under Construction page http://svn.boost.org/trac/boost/wiki/LibrariesUnderConstruction#Boost.Persis.... Let me know if you want me to change something in it. I am looking forward to seeing the code corresponding to the current documentation.

Best,
Vicente

participants (4)

Matthias Troyer
Peder Holt
Stefan Strasser
vicente.botet