[flyweight] Review period extended to February 3

Hi to all, We've received 4 positive reviews and several comments, but we still want more ;-) Some people expressed interested in the pre-review period and I want to give some time (weekend is the right time to review a Boost library...) to those who are interested in the library and have not found time to review it. The last review day is February 3. Common, the library is very lightweight ;-) to review, it's pretty well documented, it's certainly small and very general purpose. Here is the information. Last call to passengers! *Description:* Flyweights are small-sized handle classes granting constant access to shared common data, thus allowing for the management of large amounts of entities within reasonable memory limits. Boost.Flyweight makes it easy to use this common programming idiom by providing the class template flyweight<T>, which acts as a drop-in replacement for const T. *Online docs:* http://tinyurl.com/2sstyr http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/index.html *Download:* http://tinyurl.com/hrdm6 http://www.boost-consulting.com/vault/index.php?&direction=0&order=&directory=Patterns *Notes:* 1) We've seen some suggestions in the mailing list for Flyweight. Joaquín has nicely explained a couple of issues that we'd like to address/discuss in the review: http://tinyurl.com/33ghtf http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/doc/review_n... 2) Flyweight needs Boost 1.35 elements because the library depends on libraries like Interprocess for some features/tests. Since SVN snapshot tarballs seem to be missing these days, those who want to try flyweight can download a working SVN-HEAD snapshot here: http://igaztanaga.drivehq.com/boost_trunk.tar.bz2 3) Serialization tests won't work. This feature is expected to work when some new features (discussed in the mailing list between Joaquín and Robert Ramey) are added in Boost.Serialization. Those are expected for Boost 1.36. What to include in Review Comments ================================== Your comments may be brief or lengthy, but basically the Review Manager needs your evaluation of the library. If you identify problems along the way, please note if they are minor, serious, or showstoppers. Here are some questions you might want to answer in your review: * What is your evaluation of the design? * What is your evaluation of the implementation? * What is your evaluation of the documentation? * What is your evaluation of the potential usefulness of the library? * Did you try to use the library? With what compiler? Did you have any problems? * How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? * Are you knowledgeable about the problem domain? And finally, every review should answer this question: * Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion. Ion Gaztañaga - Review Manager -

Hi Joaquin, What is the magic number 16384 in the intermodule_holder intantiator struct intermodule_holder_class contains the following initialization instantiator(): mutex(interprocess::open_or_create,compute_mutex_name()), seg(interprocess::open_or_create,compute_segment_name(),16384), Should this be configurable? Are the compute_mutex_name and compute_segment_name friendly for debuging? I think that we will need here some introspection in order obtain these values, or better yet to be able to configure them. Regards --------------------------- Vicente Juan Botet Escriba

"vicente.botet" ha escrito:
Hi Joaquin,
What is the magic number 16384 in the intermodule_holder intantiator struct intermodule_holder_class contains the following initialization instantiator(): mutex(interprocess::open_or_create,compute_mutex_name()), seg(interprocess::open_or_create,compute_segment_name(),16384),
intermodule_holder<T> creates a shared memory segment which is (intendedly) unique to the combination (process,T), and it is used to share information on the static data initialization process among the different dynamic modules in the program. 16 KB is just a conventional size requested for the memory segment. Actually, this segment only contains a pointer, but it's necessary that the segment be larger than merely sizeof(void*) to accommodate object names and other Boost.Interprocess internal information. Maybe 16KB is a little too much and we could do with 2 or 4 KB, but I didn't think it was worth being avaricious here.
Should this be configurable?
I don't think so, since the memory segment, as explained above, only holds data of fixed size.
Are the compute_mutex_name and compute_segment_name friendly for debuging? I think that we will need here some introspection in order obtain these values,or better yet to be able to configure them.
The mutex_name (segment_name is similar) has the form boost_flyweight_intermodule_holder_mutex_A_C0_C1_C2_C3 where A is the current process ID and C0,...,C3 are numbers obtained from hashing typeid(T).name(), T being the type in intermodule_holder<T>. The intention is that this name is unique to the combination (process,T). I don't think it should be configurable, since it's an internal detail and also because the particular name chosen is crucial to ensuring that the segment is unique to (process,T) (we're using a process-wide segment for intraprocess purposes.) Best, Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

"Joaquín Mª López Muñoz" wrotes
"vicente.botet" ha escrito:
Hi Joaquin,
What is the magic number 16384 in the intermodule_holder intantiator struct intermodule_holder_class contains the following initialization instantiator(): mutex(interprocess::open_or_create,compute_mutex_name()), seg(interprocess::open_or_create,compute_segment_name(),16384),
intermodule_holder<T> creates a shared memory segment which is (intendedly) unique to the combination (process,T), and it is used to share information on the static data initialization process among the different dynamic modules in the program.
16 KB is just a conventional size requested for the memory segment. Actually, this segment only contains a pointer, but it's necessary that the segment be larger than merely sizeof(void*) to accommodate object names and other Boost.Interprocess internal information. Maybe 16KB is a little too much and we could do with 2 or 4 KB, but I didn't think it was worth being avaricious here.
OK, I understand. [snip]
Are the compute_mutex_name and compute_segment_name friendly for debuging? I think that we will need here some introspection in order obtain these values,or better yet to be able to configure them.
The mutex_name (segment_name is similar) has the form
boost_flyweight_intermodule_holder_mutex_A_C0_C1_C2_C3
where A is the current process ID and C0,...,C3 are numbers obtained from hashing typeid(T).name(), T being the type in intermodule_holder<T>. The intention is that this name is unique to the combination (process,T). I don't think it should be configurable, since it's an internal detail and also because the particular name chosen is crucial to ensuring that the segment is unique to (process,T) (we're using a process-wide segment for intraprocess purposes.)
Sorry, I was yet thinking that the factory was allocated in this space. We don't need never to access to these objects by name. Forget please. --------------------------- Vicente Juan Botet Escriba

Hi, I was wondering if as the intermodule_holder is a kind of workaround for platforms where dynamic loaded libraries can duplicate static libraries, we can have a better performance on the others for free. On platforms on which static variables are unique inter dynamic loaded modules the intermodule_holder specifier can be a static_holder. This could be controlled using conditional compilation. struct intermodule_holder:holder_marker { template<typename C> struct apply { #ifdef BOOST_NO_DLL_UNIQUE_STATIC_VARIABLE typedef intermodule_holder_class<C> type; #else typedef static_holder_class<C> type; #endif }; }; Have this a sens or I'm missing something? Could the documentation add a reference to the duplication problem? In addition, I'd replace static_holder by intramodule_holder. intramodule_holder reflects much more the accessibility intent for the users, static_holder talks more about the implementation. Regards Vicente --------------------------- Vicente Juan Botet Escriba ----- Original Message ----- From: "Ion Gaztañaga" <igaztanaga@gmail.com> To: <boost@lists.boost.org>; "Boost User List" <boost-users@lists.boost.org>; <boost-announce@lists.boost.org> Sent: Tuesday, January 29, 2008 9:58 PM Subject: [boost] [flyweight] Review period extended to February 3 Hi to all, We've received 4 positive reviews and several comments, but we still want more ;-) Some people expressed interested in the pre-review period and I want to give some time (weekend is the right time to review a Boost library...) to those who are interested in the library and have not found time to review it. The last review day is February 3. Common, the library is very lightweight ;-) to review, it's pretty well documented, it's certainly small and very general purpose. Here is the information. Last call to passengers! *Description:* Flyweights are small-sized handle classes granting constant access to shared common data, thus allowing for the management of large amounts of entities within reasonable memory limits. Boost.Flyweight makes it easy to use this common programming idiom by providing the class template flyweight<T>, which acts as a drop-in replacement for const T. *Online docs:* http://tinyurl.com/2sstyr http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/index.html *Download:* http://tinyurl.com/hrdm6 http://www.boost-consulting.com/vault/index.php?&direction=0&order=&directory=Patterns *Notes:* 1) We've seen some suggestions in the mailing list for Flyweight. Joaquín has nicely explained a couple of issues that we'd like to address/discuss in the review: http://tinyurl.com/33ghtf http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/doc/review_n... 2) Flyweight needs Boost 1.35 elements because the library depends on libraries like Interprocess for some features/tests. Since SVN snapshot tarballs seem to be missing these days, those who want to try flyweight can download a working SVN-HEAD snapshot here: http://igaztanaga.drivehq.com/boost_trunk.tar.bz2 3) Serialization tests won't work. This feature is expected to work when some new features (discussed in the mailing list between Joaquín and Robert Ramey) are added in Boost.Serialization. Those are expected for Boost 1.36. What to include in Review Comments ================================== Your comments may be brief or lengthy, but basically the Review Manager needs your evaluation of the library. If you identify problems along the way, please note if they are minor, serious, or showstoppers. Here are some questions you might want to answer in your review: * What is your evaluation of the design? * What is your evaluation of the implementation? * What is your evaluation of the documentation? * What is your evaluation of the potential usefulness of the library? * Did you try to use the library? With what compiler? Did you have any problems? * How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? * Are you knowledgeable about the problem domain? And finally, every review should answer this question: * Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion. Ion Gaztañaga - Review Manager - _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

"vicente.botet" ha escrito:
Hi,
I was wondering if as the intermodule_holder is a kind of workaround for platforms where dynamic loaded libraries can duplicate static libraries, we can have a better performance on the others for free.
On platforms on which static variables are unique inter dynamic loaded modules the intermodule_holder specifier can be a static_holder. This could be controlled using conditional compilation.
struct intermodule_holder:holder_marker { template<typename C> struct apply { #ifdef BOOST_NO_DLL_UNIQUE_STATIC_VARIABLE typedef intermodule_holder_class<C> type; #else typedef static_holder_class<C> type; #endif }; };
Have this a sens or I'm missing something? Could the documentation add a reference to the duplication problem?
This makes sense, but I lack expertise on this area, and don't really know which platforms or compilers avoid symbol duplication. If someone can help here I'll be happy to add this optimization.
In addition, I'd replace static_holder by intramodule_holder. intramodule_holder reflects much more the accessibility intent for the users, static_holder talks more about the implementation.
Do you mean that intermodule_holder should be the default rather than static_holder? There are reasons not to do that, as I explained to John Reid at: http://lists.boost.org/Archives/boost/2008/01/132814.php Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

"Joaquín Mª López Muñoz" wrotes "vicente.botet" ha escrito: [snip]
In addition, I'd replace static_holder by intramodule_holder. intramodule_holder reflects much more the accessibility intent for the users, static_holder talks more about the implementation.
Do you mean that intermodule_holder should be the default rather than static_holder? There are reasons not to do that, as I explained to John Reid at:
Non, this was only a renaming sugestion. NOT INTERmodule_holder BUT INTRAmodule_holder. I'd RENAME static_holder by intramodule_holder. intramodule_holder reflects much more the accessibility intent for the users, static_holder talks more about the implementation. So at the end there will be two holders: . intramodule_holder (renaming of static_holder) and . intermodule_holder Regards --------------------------- Vicente Juan Botet Escriba

Hello Steven,
"Steven Watanabe" wrotes
vicente.botet wrote:
So at the end there will be two holders: . intramodule_holder (renaming of static_holder) and . intermodule_holder
Those names are much too similar.
You are right. Maybe module_holder and process_holder? Regards Vicente Botet _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

----- Mensaje original ----- De: "vicente.botet" <vicente.botet@wanadoo.fr> Fecha: Sábado, Febrero 2, 2008 0:24 am Asunto: Re: [boost] [flyweight] Review period extended to February 3 Para: boost@lists.boost.org
Hello Steven,
"Steven Watanabe" wrotes
vicente.botet wrote:
So at the end there will be two holders: . intramodule_holder (renaming of static_holder) and . intermodule_holder
Those names are much too similar.
You are right. Maybe module_holder and process_holder?
I don't see any problem with static_holder, but if we don't want to stress the internal mechanism on which it is based maybe we can name it simple_holder, so we'd have simple_holder and intermodule_holder? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Ion Gaztañaga ha scritto:
* What is your evaluation of the design?
The overall design looks very good. However, IMHO it doesn't fully grasp the essence of the flyweight pattern because of a major flaw. The flaw is the fact that an object needs to be created in order to check whether an equivalent flyweight exists or not. Consider this code: typedef boost::flyweight<std::string> fw; fw s1("foo"); fw s2("foo"); the creation of s1 requires one call to the string constructor, two calls to the string copy constructor and two calls to the string destructor. If we could use rvalue references and perfect forwarding, we could avoid one or maybe even two of the three temporaries. However insertion of a new flyweight should occur rarely in typical use cases, so it doesn't matter much. The problem is when constructing s2, a case which is going to occur much more often. To construct s2 the current implementation calls the string constructor once, the string copy constructor once and the string destructor twice. Even in this case, we may in the future avoid one temporary, but not the other. We are actually creating a full-blown object only to throw it away! That is a waste which can be get worse if instead of one std::string you had an UDT which is expensive to construct. Notice that the latter is precisely the scenario where the flyweight pattern is most useful! The GoF book describes a flyweight pattern where each flyweight can be identified through some key object that is supposed to be cheaper to create and manage than the final object. In other words, the GoF pattern acts like an std::map, whereas this proposal assumes that the final object acts as the key, much like an std::set. I agree that the proposed approach may be simpler to explain and to work with and also provide an almost drop-in replacement of the original type, but IMHO the map approach is far more useful, especially for "heavy" types and for polymorphic types (whose construction usually requires a potentially expensive dynamic allocation).
* What is your evaluation of the implementation?
The implementation seems well done. The use of the "named parameters" idiom is very interesting. However, I strongly disagree with the choice of the term "factory" for a component that actually only acts as a storage. In GoF the term factory is properly used for a component which is devoted to construct the final object given its identifying key. But in this proposal it's the user code that actually constructs the object, so the term is used incorrectly and is misleading.
* What is your evaluation of the documentation?
The documentation looks very well written, with lot of examples. The only sections that I found a bit lacking are the ones about the policies. The policies are indeed properly described, but the bare description is not very helpful in showing how to write a new policy. In particular I couldn't understand how to write a Tracking policy until I actually saw the code of flyweights::refcounted.
* What is your evaluation of the potential usefulness of the library?
The library as it is, is not very useful, although it might be potentially very useful if the flaw I described before could be addressed.
* Did you try to use the library? With what compiler?
Yes, I used it with gcc 3.4.5 (mingw).
Did you have any problems?
No, once I realized that you needed Boost 1.35 even if you don't use any feature related with Boost.Interprocess.
* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
A couple of hours.
* Are you knowledgeable about the problem domain?
Yes.
And finally, every review should answer this question:
* Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.
Reluctantly, the answer is no. I believe the library should not be accepted in Boost as it is. However, it's apparent that great and good work has been put into it and it has a great potential. Therefore I encourage the maintainer to consider addressing the issues I raised here, in which case I am ready to change my opinion. Ganesh

Hi Alberto, thank you for your review! ----- Mensaje original ----- De: Alberto Ganesh Barbati <AlbertoBarbati@libero.it> Fecha: Sábado, Febrero 2, 2008 0:38 am Asunto: Re: [boost] [flyweight] Review period extended to February 3 Para: boost@lists.boost.org
Ion Gaztañaga ha scritto:
* What is your evaluation of the design?
The overall design looks very good. However, IMHO it doesn't fully grasp the essence of the flyweight pattern because of a major flaw. The flaw is the fact that an object needs to be created in order to check whether an equivalent flyweight exists or not. Consider this code:
typedef boost::flyweight<std::string> fw;
fw s1("foo"); fw s2("foo");
the creation of s1 requires one call to the string constructor, two calls to the string copy constructor and two calls to the string destructor. If we could use rvalue references and perfect forwarding, we could avoid one or maybe even two of the three temporaries. However insertion of a new flyweight should occur rarely in typical use cases, so it doesn't matter much. The problem is when constructing s2, a case which is going to occur much more often. To construct s2 the current implementation calls the string constructor once, the string copy constructor once and the string destructor twice. Even in this case, we may in the future avoid one temporary, but not the other. We are actually creating a full-blown object only to throw it away! That is a waste which can be get worse if instead of one std::string you had an UDT which is expensive to construct. Notice that the latter is precisely the scenario where the flyweight pattern is most useful!
The GoF book describes a flyweight pattern where each flyweight can be identified through some key object that is supposed to be cheaper to create and manage than the final object. In other words, the GoF pattern acts like an std::map, whereas this proposal assumes that the final object acts as the key, much like an std::set. I agree that the proposed approach may be simpler to explain and to work with and also provide an almost drop-in replacement of the original type, but IMHO the map approach is far more useful, especially for "heavy" types and for polymorphic types (whose construction usually requires a potentially expensive dynamic allocation).
I'm glad you've raised an important issue here. As it happens, during the design of the library I devoted some thinking to the map-vs-set problem (if you allow me to call it that way) and I opted for the current design for the following reasons: 1. You're right that instantiating a flyweight<T> will always create a temporary T (and hopefully just one when we get rvalue refs and variadic templates, but also see point 3 below). But the purpose of the library is *not* to avoid constructing expensive objects (although that would be a nice secondary goal), it is to reduce memory consumption through object sharing. Collateral benefits include increased performance in object passing around. Consider the following "heavyweight" equivalent of your snippet above: typedef std::string str; str s1("foo"); str s2("foo"); Introducing the flyweight library here does not avoid constructing s2, but it results in a final situation of lower memory usage. Construction of a flyweight<T> object is marginally more expensive than constructing a T because of the extra factory lookup, although this is expected to be compensated by the fact that flywewight<T>s are subsequently cheaper to pass around than Ts. This effect can be observed running the performance example at http://svn.boost.org/svn/boost/sandbox/flyweight/ libs/flyweight/doc/examples.html#example5 (or see http://lists.boost.org/boost-users/2008/01/33630.php) So, my conclusion is that flyweightizing a type results in actual benefits, although these are not related to the avoidance of heavy object constructions (copy construction aside). 2. Indeed GoF introduces a key type K into the pattern that is used to retrieve the actual values of T. So, we have a one-to-one relation K-->T, i.e. there exists a stateless function f of the form T f(const K&); that can be used to construct a T from a given K. And, additionally, K is cheaper to construct than T. This is the map approach, right? My question now is: is this a realistic scenario? If K is actually cheaper to construct than T and we can univocally get the associated T from any K, why work with Ts and not just use Ks in the first place? The only plaussible justifications I can think of is that f() is computationally expensive or that T is more convenient to work with than K, but these seem (to me) not so likely concerns --more on the second concern on point 3 below, though. Note that I explicitly observed that f must be *stateless*, i.e. a K object contains exactly the same information as its associated T value. This is not the case in most usages of std::map, which is a reason why std::maps are useful :) I am not plainly denying the existence of sensible K-->T scenarios, but I thought long and hard and couldn't find any. If you can come up with one I'll be happy to know. So, my analysis led me to conclude that the right approach is to assume that K==T, that is, the set approach, or at most than K and T are just different representations of the same information. 3. That said, there is a natural evolution for the library that would eliminate T construction under some circumstances: C++0x containers will provide the emplace() member function allowing for in-place construction of an inserted elements given the ctor arguments, so eliminating the need to construct a temporary to pass to insert() and related memfuns. When this is available, the flyweight lib will take advantage of it, so that fw s2("foo"); will create no temporary std::strings. In the terminology of point 2, we have K==const char*, T==std::string. You can argue that this seems to contradict the thesis of point 2 (there are no sensible usa cases where K!=T), but note that this situation (constructing a std::string from a const char*) is only marginally useful: in a real program most std::strings do not get constructed from compile time literals, but rather they are computed dynamically, so the optimizations we can introduce here will not dominate the overall performance of the program.
* What is your evaluation of the implementation?
The implementation seems well done. The use of the "named parameters" idiom is very interesting.
However, I strongly disagree with the choice of the term "factory" for a component that actually only acts as a storage. In GoF the term factory is properly used for a component which is devoted to construct the final object given its identifying key. But in this proposal it's the user code that actually constructs the object, so the term is used incorrectly and is misleading.
I decided to keep "factory" because this component is obviously related to the element of the same name in GoF and oher descriptions of the pattern. Incidentally, when point 3 above gets implemented the factory will act more like a factory in the sense you describe. That said, I'll be happy to use whatever other terminology reviewers agree upon.
* What is your evaluation of the documentation?
The documentation looks very well written, with lot of examples. The only sections that I found a bit lacking are the ones about the policies. The policies are indeed properly described, but the bare description is not very helpful in showing how to write a new policy. In particular I couldn't understand how to write a Tracking policy until I actually saw the code of flyweights::refcounted.
Note taken, I'll try to refine that bit. It's no surprise that you get the most difficulties in understanding this aspect, because it is arguably the most convoluted.
* What is your evaluation of the potential usefulness of the library? The library as it is, is not very useful, although it might be potentially very useful if the flaw I described before could be addressed.
I hope you might concede that the lib is useful in the scenarios described at point 1 above, even if these are not the modes of usage you envisioned in the frist place. Looking forward to your discussion of the arguments I presented here. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Hi Joaquin On Feb 2, 2008 12:57 PM, "JOAQUIN LOPEZ MU?Z" <joaquin@tid.es> wrote:
2. Indeed GoF introduces a key type K into the pattern that is used to retrieve the actual values of T. So, we have a one-to-one relation K-->T, i.e. there exists a stateless function f of the form
T f(const K&);
that can be used to construct a T from a given K. And, additionally, K is cheaper to construct than T. This is the map approach, right? My question now is: is this a realistic scenario? If K is actually cheaper to construct than T and we can univocally get the associated T from any K, why work with Ts and not just use Ks in the first place? The only plaussible justifications I can think of is that f() is computationally expensive or that T is more convenient to work with than K, but these seem (to me) not so likely concerns --more on the second concern on point 3 below, though. Note that I explicitly observed that f must be *stateless*, i.e. a K object contains exactly the same information as its associated T value. This is not the case in most usages of std::map, which is a reason why std::maps are useful :) I am not plainly denying the existence of sensible K-->T scenarios, but I thought long and hard and couldn't find any. If you can come up with one I'll be happy to know. So, my analysis led me to conclude that the right approach is to assume that K==T, that is, the set approach, or at most than K and T are just different representations of the same information.
One example for all: K = const char * T = std::string I have seen many of those examples in which an object is actually built from something simpler (in all the cases f will be a one-argument, non-explicit constructor T(K) ). What about defining a trait, construction_arg<T>::type that can give K in such cases, and T if not specified? It could be generally useful also in other cases, expecially in the transition phase, in which we still lack r-value references. Corrado -- __________________________________________________________________________ dott. Corrado Zoccolo mailto:zoccolo@di.unipi.it PhD - Department of Computer Science - University of Pisa, Italy -------------------------------------------------------------------------- The self-confidence of a warrior is not the self-confidence of the average man. The average man seeks certainty in the eyes of the onlooker and calls that self-confidence. The warrior seeks impeccability in his own eyes and calls that humbleness. Tales of Power - C. Castaneda

Sorry, I hit the send button before finishing. On Feb 2, 2008 1:51 PM, Corrado Zoccolo <czoccolo@gmail.com> wrote:
Hi Joaquin
On Feb 2, 2008 12:57 PM, "JOAQUIN LOPEZ MU?Z" <joaquin@tid.es> wrote:
2. Indeed GoF introduces a key type K into the pattern that is used to retrieve the actual values of T. So, we have a one-to-one relation K-->T, i.e. there exists a stateless function f of the form
T f(const K&);
that can be used to construct a T from a given K. And, additionally, K is cheaper to construct than T. This is the map approach, right? My question now is: is this a realistic scenario? If K is actually cheaper to construct than T and we can univocally get the associated T from any K, why work with Ts and not just use Ks in the first place? The only plaussible justifications I can think of is that f() is computationally expensive or that T is more convenient to work with than K, but these seem (to me) not so likely concerns --more on the second concern on point 3 below, though. Note that I explicitly observed that f must be *stateless*, i.e. a K object contains exactly the same information as its associated T value. This is not the case in most usages of std::map, which is a reason why std::maps are useful :) I am not plainly denying the existence of sensible K-->T scenarios, but I thought long and hard and couldn't find any. If you can come up with one I'll be happy to know. So, my analysis led me to conclude that the right approach is to assume that K==T, that is, the set approach, or at most than K and T are just different representations of the same information.
One example for all: K = const char * T = std::string
I have seen many of those examples in which an object is actually built from something simpler (in all the cases f will be a one-argument, non-explicit constructor T(K) ).
What about defining a trait, construction_arg<T>::type that can give K in such cases, and T if not specified? It could be generally useful also in other cases, expecially in the transition phase, in which we still lack r-value references.
Clearly, you don't want to store the Ks in the container, exactly for the same reason you want to use T in your code, so you should still have something like a set<T> as a container, and have some way to look for a K inside it without creating a T. For example, for std::string we have overloaded operator< that also accept (const char *). Something equivalent could be obtained maybe with a custom comparator? Corrado
-- __________________________________________________________________________
dott. Corrado Zoccolo mailto:zoccolo@di.unipi.it PhD - Department of Computer Science - University of Pisa, Italy -------------------------------------------------------------------------- The self-confidence of a warrior is not the self-confidence of the average man. The average man seeks certainty in the eyes of the onlooker and calls that self-confidence. The warrior seeks impeccability in his own eyes and calls that humbleness. Tales of Power - C. Castaneda
-- __________________________________________________________________________ dott. Corrado Zoccolo mailto:zoccolo@di.unipi.it PhD - Department of Computer Science - University of Pisa, Italy -------------------------------------------------------------------------- The self-confidence of a warrior is not the self-confidence of the average man. The average man seeks certainty in the eyes of the onlooker and calls that self-confidence. The warrior seeks impeccability in his own eyes and calls that humbleness. Tales of Power - C. Castaneda

----- Mensaje original ----- De: Corrado Zoccolo <czoccolo@gmail.com> Fecha: Sábado, Febrero 2, 2008 2:05 pm Asunto: Re: [boost] [flyweight] Review period extended to February 3 Para: boost@lists.boost.org
One example for all: K = const char * T = std::string
I have seen many of those examples in which an object is actually built from something simpler (in all the cases f will be a one-argument,non-explicit constructor T(K) ).
What about defining a trait, construction_arg<T>::type that can give K in such cases, and T if not specified? It could be generally useful also in other cases, expecially in the transition phase, in which we still lack r-value references.
Clearly, you don't want to store the Ks in the container, exactly for the same reason you want to use T in your code, so you should still havesomething like a set<T> as a container, and have some way to look for a K inside it without creating a T.
For example, for std::string we have overloaded operator< that also accept (const char *). Something equivalent could be obtained maybe with a custom comparator?
I'd need more time to articulate this into a coherent framewoek, but at first blush the idea seems to get along with a potential extension of the Factory concept, the LookupFactory, described at http://lists.boost.org/Archives/boost/2008/01/132798.php LookupFactories are factories that, additionally to the insert memfun, provide a find memfun. Looks like this find could be overloaded so as to accept arguments of type K. I'm putting this into the list of things to consider. In any case, seems like the evolution of the library clearly points towards supporting different factory concepts providing extensions on the basic concept. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

JOAQUIN LOPEZ MU?Z ha scritto:
I'm glad you've raised an important issue here. As it happens, during the design of the library I devoted some thinking to the map-vs-set problem (if you allow me to call it that way) and I opted for the current design for the following reasons:
<snip>
2. Indeed GoF introduces a key type K into the pattern that is used to retrieve the actual values of T. So, we have a one-to-one relation K-->T, i.e. there exists a stateless function f of the form
T f(const K&);
that can be used to construct a T from a given K. And, additionally, K is cheaper to construct than T. This is the map approach, right? My question now is: is this a realistic scenario? If K is actually cheaper to construct than T and we can univocally get the associated T from any K, why work with Ts and not just use Ks in the first place? The only plaussible justifications I can think of is that f() is computationally expensive or that T is more convenient to work with than K, but these seem (to me) not so likely concerns --more on the second concern on point 3 below, though. Note that I explicitly observed that f must be *stateless*, i.e. a K object contains exactly the same information as its associated T value. This is not the case in most usages of std::map, which is a reason why std::maps are useful :) I am not plainly denying the existence of sensible K-->T scenarios, but I thought long and hard and couldn't find any. If you can come up with one I'll be happy to know. So, my analysis led me to conclude that the right approach is to assume that K==T, that is, the set approach, or at most than K and T are just different representations of the same information.
One such scenario is right there in the GoF book, the word-processing application that uses one flyweight object for each glyph in the document. I have another case in an project I am working on: in a 3D application, I have heavyweight 3D mesh objects that might be shared among several actors. The key of a 3D mesh is just its filename. I don't want to load an entire mesh into memory just to find out that I have it already. Yes, I could delay the actual loading of the mesh until the first time it is actually used, but that would be impractical for at least two reasons: 1) any error encountered while loading the mesh would occur in the wrong place and couldn't be handled properly, 2) the place where the mesh is used is inside a tight rendering loop with strict real-time requirements which can't be blocked by costly I/O operations.
3. That said, there is a natural evolution for the library that would eliminate T construction under some circumstances: C++0x containers will provide the emplace() member function allowing for in-place construction of an inserted elements given the ctor arguments, so eliminating the need to construct a temporary to pass to insert() and related memfuns. When this is available, the flyweight lib will take advantage of it, so that
fw s2("foo");
will create no temporary std::strings.
Actually it will always create one temporary. If the string is not already present in the factory, the temporary is then moved into the factory and so it is not created in vain. However, in the most common case, the string is already present in the factor and the temporary needs to be discarded. The main problem is that in order to avoid the creation of the temporary you would need to be able to compare keys with objects directly, while the current design requires constructing an object in order to compare it with the existing objects. In this sense, I believe the find/insert approach might help addressing the issue.
will create no temporary std::strings. In the terminology of point 2, we have K==const char*, T==std::string. You can argue that this seems to contradict the thesis of point 2 (there are no sensible usa cases where K!=T), but note that this situation (constructing a std::string from a const char*) is only marginally useful: in a real program most std::strings do not get constructed from compile time literals, but rather they are computed dynamically, so the optimizations we can introduce here will not dominate the overall performance of the program.
I may agree with that, but you should understand that strings are not the most general case. In the GoF example, glyphs are created from simple characters. The fact that they come from literals or are computed dynamically does not make a big difference, they're just characters. In such a scenario your framework couldn't avoid creating an unnecessary temporary glyph.
* What is your evaluation of the implementation? The implementation seems well done. The use of the "named parameters" idiom is very interesting.
However, I strongly disagree with the choice of the term "factory" for a component that actually only acts as a storage. In GoF the term factory is properly used for a component which is devoted to construct the final object given its identifying key. But in this proposal it's the user code that actually constructs the object, so the term is used incorrectly and is misleading.
I decided to keep "factory" because this component is obviously related to the element of the same name in GoF and oher descriptions of the pattern. Incidentally, when point 3 above gets implemented the factory will act more like a factory in the sense you describe. That said, I'll be happy to use whatever other terminology reviewers agree upon.
I would keep the "factory" term if you are considering the find/insert approach as an alternative to the current insert-only approach. With that approach case, find() would be given a key, rather than an object. If the object is not found, insert() would have to create an object out of the information stored in the key and that would actually be what factories are expected to do.
* What is your evaluation of the potential usefulness of the library? The library as it is, is not very useful, although it might be potentially very useful if the flaw I described before could be addressed.
I hope you might concede that the lib is useful in the scenarios described at point 1 above, even if these are not the modes of usage you envisioned in the frist place. Looking forward to your discussion of the arguments I presented here.
Maybe I was a little to harsh. I concede that. Having reduced memory usage and manipulating heavy-weight objects through lighter-weight handles is certainly an advantage over a naive approach. This advantage can be felt considerably once the flyweight "pool" is established and all that is left is to manipulate the objects. However, there are cases, for example the two cases I presented in this post, where the construction of the actual flyweight objects has a non negligible cost: in the GoF word-processing case just because of the very large number of glyphs, in the 3D mesh case because of the very high cost of I/O operations. In those cases, the current design makes the task of establishing the flyweight "pool" less efficient than hand-coded solutions based on maps. As in my projects I am very concerned with start-up times I view the proposed library design as mere syntactic sugar and I would definitely prefer the hand-coded solution. Just my opinion, Ganesh

----- Mensaje original ----- De: Alberto Ganesh Barbati <AlbertoBarbati@libero.it> Fecha: Domingo, Febrero 3, 2008 12:39 pm Asunto: Re: [boost] [flyweight] Review period extended to February 3 Para: boost@lists.boost.org
JOAQUIN LOPEZ MU?Z ha scritto: [...]
I am not plainly denying the existence of sensible K-->T scenarios, but I thought long and hard and couldn't find any. If you can come up with one I'll be happy to know. So, my analysis led me to conclude that the right approach is to assume that K==T, that is, the set approach, or at most than K and T are just different representations of the same information.
One such scenario is right there in the GoF book, the word- processing application that uses one flyweight object for each glyph in the document. I have another case in an project I am working on: in a 3D application, I have heavyweight 3D mesh objects that might be shared among several actors. The key of a 3D mesh is just its filename. I don't want to load an entire mesh into memory just to find out that I have it already. Yes, I could delay the actual loading of the mesh until the first time it is actually used, but that would be impractical for at least two reasons: 1) any error encountered while loading the mesh would occur in the wrong place and couldn't be handled properly, 2) the place where the mesh is used is inside a tight rendering loop with strict real-time requirements which can't be blocked by costly I/O operations.
OK, I'd classify these two examples as scenarios where K and T contain esentially the same info but the translation function f() is computationally expensive. This is a valid context, just not the one I deem the most common. Maybe the particular case K!=T could be solved by a different class keyed_flyweight used like this (the example is similar to that of GoF you referred to): struct character { character(char c):c(c){} char c; }; int main() { // fw_t objects are constructed from chars but are convertible // to character. typedef keyed_flyweight<char,character> fw_t; // creates an internal character('a') fw_t x1('a'); // same shared value as x1, zero temporary character's fw_t x2('a'); // creates an internal character('b') fw_t x3('b'); } Is this approach more convincing to you? It turns out it can be implemented with exactly the same machinery as flyweight (i.e. holders, factories, locking and tracking policies), so it could be provided as a companion to classic flyweight<> if there is interest in it. The attached code shows the idea is viable, the prototype keyed_flyweight implemented there has all their aspects (holder, factory, etc.) fixed for simplicity, and lacks some boilerplate features, but it proves its point. The idea is that the factory keeps elements of type similar to entry = pair<K,T*> So, only when a new entry is created need we equip it with a new T, hence completely avoding T temporaries. This incurs a cost in the form of additional storage and a slightly more expensive creation process (when no duplicates are found), so we shouldn't take flyweight<T> as shorthand for keyed_flyweight<T,T>, both classes serve different scenarios and ought to be provided separately. Well, if you think this line of research is interesting I can pursue it.
3. That said, there is a natural evolution for the library that would eliminate T construction under some circumstances: C++0x containers will provide the emplace() member function allowing for in-place construction of an inserted elements given the ctor arguments, so eliminating the need to construct a temporary to pass to insert() and related memfuns. When this is available, the flyweight lib will take advantage of it, so that
fw s2("foo");
will create no temporary std::strings.
Actually it will always create one temporary. If the string is not already present in the factory, the temporary is then moved into the factory and so it is not created in vain. However, in the most common case, the string is already present in the factor and the temporary needs to be discarded.
Yes, you're right, one temporary is unavoidable. Sorry for my mistake. [...]
I decided to keep "factory" because this component is obviously related to the element of the same name in GoF and oher descriptions of the pattern. Incidentally, when point 3 above gets implemented the factory will act more like a factory in the sense you describe. That said, I'll be happy to use whatever other terminology reviewers agree upon.
I would keep the "factory" term if you are considering the find/insert approach as an alternative to the current insert-only approach. With that approach case, find() would be given a key, rather than an object. If the object is not found, insert() would have to create an object out of the information stored in the key and that would actually be what factories are expected to do.
Maybe keyed_flyweight is a better approch than find/insert, despite the increase in storage consumption: as they're designed, STL containers make it difficult to look up an entry with a type other than the entry itself (although Boost.MultiIndex has extensions to that effect.) I have to think this over more thoroughly, but seems like find/insert should be best exploited for read/write mutexes, while the K!=T issue is best handled by keyed_flyweight. Well, thank you for giving me plenty of food for thought :) Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

On Feb 3, 2008 6:20 PM, "JOAQUIN LOPEZ MU?Z" <joaquin@tid.es> wrote:
----- Mensaje original ----- De: Alberto Ganesh Barbati <AlbertoBarbati@libero.it> Fecha: Domingo, Febrero 3, 2008 12:39 pm Asunto: Re: [boost] [flyweight] Review period extended to February 3 Para: boost@lists.boost.org
JOAQUIN LOPEZ MU?Z ha scritto: [...]
I am not plainly denying the existence of sensible K-->T scenarios, but I thought long and hard and couldn't find any. If you can come up with one I'll be happy to know. So, my analysis led me to conclude that the right approach is to assume that K==T, that is, the set approach, or at most than K and T are just different representations of the same information.
One such scenario is right there in the GoF book, the word- processing application that uses one flyweight object for each glyph in the document. I have another case in an project I am working on: in a 3D application, I have heavyweight 3D mesh objects that might be shared among several actors. The key of a 3D mesh is just its filename. I don't want to load an entire mesh into memory just to find out that I have it already. Yes, I could delay the actual loading of the mesh until the first time it is actually used, but that would be impractical for at least two reasons: 1) any error encountered while loading the mesh would occur in the wrong place and couldn't be handled properly, 2) the place where the mesh is used is inside a tight rendering loop with strict real-time requirements which can't be blocked by costly I/O operations.
OK, I'd classify these two examples as scenarios where K and T contain esentially the same info but the translation function f() is computationally expensive. This is a valid context, just not the one I deem the most common.
Just to chime in here, that is also my use case, and I add textures (images) to that mix. I usually have a flyweight factory that holds large 3D meshes, and another one that holds textures. Lookup is done based on name, and construction requires loading a file from disk. FWIW, all of my flyweight implementations have been key based (the map/set problem you spoke of), but I often wished for a set-based implementation too. I think offering both would be the correct way forward. --Michael Fawcett

Michael Fawcett ha escrito:
On Feb 3, 2008 6:20 PM, "JOAQUIN LOPEZ MU?Z" <joaquin@tid.es> wrote:
----- Mensaje original ----- De: Alberto Ganesh Barbati <AlbertoBarbati@libero.it> Fecha: Domingo, Febrero 3, 2008 12:39 pm Asunto: Re: [boost] [flyweight] Review period extended to February 3 Para: boost@lists.boost.org
One such scenario is right there in the GoF book, the word- processing application that uses one flyweight object for each glyph in the document. I have another case in an project I am working on: in a 3D application, I have heavyweight 3D mesh objects that might be shared among several actors. The key of a 3D mesh is just its filename. I don't want to load an entire mesh into memory just to find out that I have it already. Yes, I could delay the actual loading of the mesh until the first time it is actually used, but that would be impractical for at least two reasons: 1) any error encountered while loading the mesh would occur in the wrong place and couldn't be handled properly, 2) the place where the mesh is used is inside a tight rendering loop with strict real-time requirements which can't be blocked by costly I/O operations.
OK, I'd classify these two examples as scenarios where K and T contain esentially the same info but the translation function f() is computationally expensive. This is a valid context, just not the one I deem the most common.
Just to chime in here, that is also my use case, and I add textures (images) to that mix. I usually have a flyweight factory that holds large 3D meshes, and another one that holds textures. Lookup is done based on name, and construction requires loading a file from disk.
FWIW, all of my flyweight implementations have been key based (the map/set problem you spoke of), but I often wished for a set-based implementation too. I think offering both would be the correct way forward.
Good to know there's a potential for the keyed variant. Would you support then the provision of regular flyweight and also keyed_flyweight as exposed in my prior post? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

On Feb 4, 2008 12:16 PM, Joaquín Mª López Muñoz <joaquin@tid.es> wrote:
Good to know there's a potential for the keyed variant. Would you support then the provision of regular flyweight and also keyed_flyweight as exposed in my prior post?
I support the notion of being able to lookup existing instances by key. However you implement it, I hope it will have an elegant syntax, and an extremely efficient implementation. Remember that this pattern is used often in games and 3D graphics, where extra copies are not an option, regardless of how pretty it makes the code look. Alberto put it nicely when he wrote: "I would keep the "factory" term if you are considering the find/insert approach as an alternative to the current insert-only approach. With that approach case, find() would be given a key, rather than an object. If the object is not found, insert() would have to create an object out of the information stored in the key and that would actually be what factories are expected to do." I intended on doing a review but regrettably don't have time to review this library. Good luck! Your libraries are high quality and I enjoy using them, so I hope Flyweight gets accepted. --Michael Fawcett

Michael Fawcett <michael.fawcett <at> gmail.com> writes:
On Feb 4, 2008 12:16 PM, Joaquín Mª López Muñoz <joaquin <at> tid.es> wrote:
Good to know there's a potential for the keyed variant. Would you support then the provision of regular flyweight and also keyed_flyweight as exposed in my prior post?
I support the notion of being able to lookup existing instances by key. However you implement it, I hope it will have an elegant syntax, and an extremely efficient implementation. Remember that this pattern is used often in games and 3D graphics, where extra copies are not an option, regardless of how pretty it makes the code look.
Besides regular flyweight and keyed_flyweight, there's a third potential variant, namely tha in which the flyweight is keyed but the key is not stored separately and can be obtained from T. I'd like to ask you: in your particular use cases, do your mesh classes store (apart from their data) the name of the file the data was brought from, i.e. can you obtain K from T? I wonder whether this third variant is worth implementing after all, given ther difficulties STL containers have at comparing elements with external keys.
I intended on doing a review but regrettably don't have time to review this library. Good luck! Your libraries are high quality and I enjoy using them, so I hope Flyweight gets accepted.
Thanks for your compliments. The review has brought forward a good number of issues. Regardless of what the review outcome is, I'll have to rethink much of what I've written. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

On Feb 4, 2008 3:44 PM, Joaquín M LópezMuñoz <joaquin@tid.es> wrote:
Michael Fawcett <michael.fawcett <at> gmail.com> writes:
On Feb 4, 2008 12:16 PM, Joaquín Mª López Muñoz <joaquin <at> tid.es> wrote:
Good to know there's a potential for the keyed variant. Would you support then the provision of regular flyweight and also keyed_flyweight as exposed in my prior post?
I support the notion of being able to lookup existing instances by key. However you implement it, I hope it will have an elegant syntax, and an extremely efficient implementation. Remember that this pattern is used often in games and 3D graphics, where extra copies are not an option, regardless of how pretty it makes the code look.
Besides regular flyweight and keyed_flyweight, there's a third potential variant, namely tha in which the flyweight is keyed but the key is not stored separately and can be obtained from T. I'd like to ask you: in your particular use cases, do your mesh classes store (apart from their data) the name of the file the data was brought from, i.e. can you obtain K from T?
Almost never, and neither do the textures usually. The textures in particular are very lightweight - they only store a GLuint (OpenGL typedef for unsigned int). They also have a custom deleter that deletes the texture from video memory by calling glDeleteTextures(). --Michael Fawcett

Joaquín M LópezMuñoz wrote:
Besides regular flyweight and keyed_flyweight, there's a third potential variant, namely tha in which the flyweight is keyed but the key is not stored separately and can be obtained from T. I'd like to ask you: in your particular use cases, do your mesh classes store (apart from their data) the name of the file the data was brought from, i.e. can you obtain K from T? I wonder whether this third variant is worth implementing after all, given ther difficulties STL containers have at comparing elements with external keys.
This approach has been implemented in Boost.Intrusive. The user gives a key and a comparison functor compatible with the container Predicate and compares that key with stored values types. Basically this approach was provided to be able to efficiently implemented map-like containers over intrusive sets. This approach can be also used to search for prefix keys in ordered containers so that one can efficiently get the first element that begins with a certain key prefix. The approach is certainly efficient and I like it, but it can be a bit dangerous for the general case. In intrusive "insert_check" compares the key and checks for duplicates. "insert_commit" uses information from the previous function and inserts a real value type. The key used in "insert_check" does not need to be enough to create a value type, just enough to find one element. Flyweight certainly does not need this flexibility and I don't think it's the place for such a general low level mechanism. But keys that are convertible to value types (const char * vs. std::string) might be interesting use cases to try to implement that third potential variant in a safe way. Regards, Ion

Ion Gaztañaga ha escrito:
JoaquÃn M LópezMuñoz wrote:
Besides regular flyweight and keyed_flyweight, there's a third potential variant, namely tha in which the flyweight is keyed but the key is not stored separately and can be obtained from T. I'd like to ask you: in your particular use cases, do your mesh classes store (apart from their data) the name of the file the data was brought from, i.e. can you obtain K from T? I wonder whether this third variant is worth implementing after all, given ther difficulties STL containers have at comparing elements with external keys.
This approach has been implemented in Boost.Intrusive. The user gives a key and a comparison functor compatible with the container Predicate and compares that key with stored values types. Basically this approach was provided to be able to efficiently implemented map-like containers over intrusive sets. This approach can be also used to search for prefix keys in ordered containers so that one can efficiently get the first element that begins with a certain key prefix. The approach is certainly efficient and I like it, but it can be a bit dangerous for the general case.
In intrusive "insert_check" compares the key and checks for duplicates. "insert_commit" uses information from the previous function and inserts a real value type. The key used in "insert_check" does not need to be enough to create a value type, just enough to find one element. Flyweight certainly does not need this flexibility and I don't think it's the place for such a general low level mechanism. But keys that are convertible to value types (const char * vs. std::string) might be interesting use cases to try to implement that third potential variant in a safe way.
As I explained in my previous post to Alberto, the problem is that variants 1 (classical) and 2 (external key) can be implemented with dumb factories, while 3 seems to demand more intelligent components based on B.MI, B.Intrusive or sometihng like that. Wouldn't users be satisfied with variant 2 as a substitute? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Joaquín Mª López Muñoz wrote:
As I explained in my previous post to Alberto, the problem is that variants 1 (classical) and 2 (external key) can be implemented with dumb factories, while 3 seems to demand more intelligent components based on B.MI, B.Intrusive or sometihng like that. Wouldn't users be satisfied with variant 2 as a substitute?
Yes, I think variant 2 will be satisfactory.
Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
Regards, Ion

JOAQUIN LOPEZ MU?Z ha scritto:
OK, I'd classify these two examples as scenarios where K and T contain esentially the same info but the translation function f() is computationally expensive. This is a valid context, just not the one I deem the most common. Maybe the particular case K!=T could be solved by a different class keyed_flyweight used like this (the example is similar to that of GoF you referred to):
struct character { character(char c):c(c){} char c; };
int main() { // fw_t objects are constructed from chars but are convertible // to character. typedef keyed_flyweight<char,character> fw_t;
// creates an internal character('a') fw_t x1('a');
// same shared value as x1, zero temporary character's fw_t x2('a');
// creates an internal character('b') fw_t x3('b'); }
Is this approach more convincing to you? It turns out it can be implemented with exactly the same machinery as flyweight (i.e. holders, factories, locking and tracking policies), so it could be provided as a companion to classic flyweight<> if there is interest in it. The attached code shows the idea is viable, the prototype keyed_flyweight implemented there has all their aspects (holder, factory, etc.) fixed for simplicity, and lacks some boilerplate features, but it proves its point. The idea is that the factory keeps elements of type similar to
entry = pair<K,T*>
So, only when a new entry is created need we equip it with a new T, hence completely avoding T temporaries. This incurs a cost in the form of additional storage and a slightly more expensive creation process (when no duplicates are found), so we shouldn't take flyweight<T> as shorthand for keyed_flyweight<T,T>, both classes serve different scenarios and ought to be provided separately.
Well, if you think this line of research is interesting I can pursue it.
First of all, thanks for having considered my feedback. I was tempted to say that I liked the keyed_flyweight approach, but I preferred reading the rest of the thread before replying. When I reached the post of yours where you mention the "third" case where the keys are stored inside or can otherwise be computed from values, I realized that maybe we could not only get that case, but let it all happen (including the non-keyed case) with just one flyweight class template. Here's how I see it: 1) we have two types: key_type for keys and value_type for values. The two types can be coincident (as in your non-keyed proposal) or be different (keyed case) 2) the constructor of flyweight semantically takes keys, not values 3) keys (not values) are fed as-is to the factory insert() method: here's the key point, it's up to the factory to determine whether the key is already present and if it's not present to construct the corresponding value. There's actually no need for a find() method. The flyweight template need not know how to compare keys with values, actually it does not even need to know that it is happening. Let me say it again with different words: the flyweight/factory interface has only one method, namely insert() as in the current design. A possible map-base factory might implement the factory::insert() method as a double call map::find() and map::insert(), this would just be an implementation detail of the factory, the flyweight machinery needs not be aware of that. 4) the insert() method returns a handle 5) the factory provides a way to extract values from handles The rest of the design would more or less stays exactly the same. The "third" case can easily be implemented in this framework using Boost.MultiIndex. Does it make sense? Is there something obvious (o even less obvious :) that I'm missing?
Well, thank you for giving me plenty of food for thought :)
You're welcome! :-) Ganesh

Alberto Ganesh Barbati ha escrito:
JOAQUIN LOPEZ MU?Z ha scritto:
Well, if you think this line of research is interesting I can pursue it.
First of all, thanks for having considered my feedback.
This is what reviews are for :) Thanks for providing the feedback in the first place.
I was tempted to say that I liked the keyed_flyweight approach, but I preferred reading the rest of the thread before replying. When I reached the post of yours where you mention the "third" case where the keys are stored inside or can otherwise be computed from values, I realized that maybe we could not only get that case, but let it all happen (including the non-keyed case) with just one flyweight class template. Here's how I see it:
1) we have two types: key_type for keys and value_type for values. The two types can be coincident (as in your non-keyed proposal) or be different (keyed case)
2) the constructor of flyweight semantically takes keys, not values
OK so far.
3) keys (not values) are fed as-is to the factory insert() method: here's the key point, it's up to the factory to determine whether the key is already present and if it's not present to construct the corresponding value. There's actually no need for a find() method.
find() would still be useful in combination with rw locks.
The flyweight template need not know how to compare keys with values, actually it does not even need to know that it is happening.
How do I know if value_type is convertible to key_type? I need asistance from the user.
Let me say it again with different words: the flyweight/factory interface has only one method, namely insert() as in the current design. A possible map-base factory might implement the factory::insert() method as a double call map::find() and map::insert(), this would just be an implementation detail of the factory, the flyweight machinery needs not be aware of that.
4) the insert() method returns a handle
5) the factory provides a way to extract values from handles
The rest of the design would more or less stays exactly the same.
Well, this is a valid approach, but I'd rather factories remained dumber so that all this intelligence is implemented in the core, because of the following reason (among others): STL associative containers (which are the natural components upon which to implement factories) are either set-like (key_type=value_type) or map-like (key_type is a fixed part of value_type, which does not coincide with user's provided T). To be able to accept both the keyed and non-keyed variants, the container would have to deal with a more general notion of key extraction. Boost.MultiIndex can do that, but I don't want to tie flyweight to that lib. On the other hand, I don't know how to solve the third case (T convertible to K) with dumb factories :-( Maybe this third case is not worth implementing, given that the external key variant shouldn't require that much extra storage (shared values ought to be comparatively few with respect to the number of flyweight objects pointing to them.) Maybe I can provide an external flyweight wrapping both behaviors flyweight<T,...> // classical flyweight flyweight<T,key<K>,...> keyed flyweight or something to that effect. I definitely have to think this carefully. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Joaquín Mª López Muñoz ha scritto:
Alberto Ganesh Barbati ha escrito:
1) we have two types: key_type for keys and value_type for values. The two types can be coincident (as in your non-keyed proposal) or be different (keyed case)
2) the constructor of flyweight semantically takes keys, not values
OK so far.
3) keys (not values) are fed as-is to the factory insert() method: here's the key point, it's up to the factory to determine whether the key is already present and if it's not present to construct the corresponding value. There's actually no need for a find() method.
find() would still be useful in combination with rw locks.
If factories are kept "dumb" as they are, yes, it might be useful. See below for another possible approach.
The flyweight template need not know how to compare keys with values, actually it does not even need to know that it is happening.
How do I know if value_type is convertible to key_type? I need asistance from the user.
The user might be allowed to provide that as an additional extension point. However, calling the value_type constructor with the key_type as parameter is a reasonable default behaviour.
Let me say it again with different words: the flyweight/factory interface has only one method, namely insert() as in the current design. A possible map-base factory might implement the factory::insert() method as a double call map::find() and map::insert(), this would just be an implementation detail of the factory, the flyweight machinery needs not be aware of that.
4) the insert() method returns a handle
5) the factory provides a way to extract values from handles
The rest of the design would more or less stays exactly the same.
Well, this is a valid approach, but I'd rather factories remained dumber so that all this intelligence is implemented in the core, because of the following reason (among others): STL associative containers (which are the natural components upon which to implement factories) are either set-like (key_type=value_type) or map-like (key_type is a fixed part of value_type, which does not coincide with user's provided T). To be able to accept both the keyed and non-keyed variants, the container would have to deal with a more general notion of key extraction. Boost.MultiIndex can do that, but I don't want to tie flyweight to that lib.
If you stick to that, you really should consider a name different from factories, which indeed are nothing more than storage. Anyway, I get your point. The design I have in mind is a bit different, though, leaving more intelligence to the factory. Even in your design, std::set can't be used directly, it's been wrapped in set_factory<>. So what's wrong in having this added intelligence put inside the wrapper, rather than into the core? About the "third" case, you don't need to tie Boost.MultiIndex with the core. The library would simply propose (among several others) one factory implemented as a wrapper onto Boost.MultiIndex. Only user that needs that feature would need to Boost.MultiIndex. This would be no different than the current design, where hashed_factory depends on Boost.MultiIndex and let's not forget that hashed_factory is the default factory!
On the other hand, I don't know how to solve the third case (T convertible to K) with dumb factories :-( Maybe this third case is not worth implementing, given that the external key variant shouldn't require that much extra storage (shared values ought to be comparatively few with respect to the number of flyweight objects pointing to them.)
Maybe I can provide an external flyweight wrapping both behaviors
flyweight<T,...> // classical flyweight flyweight<T,key<K>,...> keyed flyweight
or something to that effect. I definitely have to think this carefully.
If I could rethink the design from scratch (and I know I can't as the library has already been accepted) I would probably come up with something like: 1) the core class flyweight with has only two parameters: the factory and the holder policy 2) a bunch of different factories implementations. In particular there should be at least three factories to cover the three basic cases: K==T, K!=T non-intrusive and K!=T intrusive. These factories may be parametrized with a user-provided container or a separate parametrizable factory should be provided ad hoc. Most important, all library-provided factories shall be parametrized by the locking and tracking policy 3) a bunch of holder, locking, tracking policy implementations The choice to have locking and tracking parametrize the factories instead of the core is that actually no locking nor tracking happens in the core! Locking happens only in the execution of insert() and erase(), which would be moved in the factory, and tracking can be obtained as a RAII wrapper provided by the factory (I admit that I didn't thought about tracking deeply enough to pretend that this is actually possible, but I'm sure about locking). Just my opinion, Ganesh

Alberto Ganesh Barbati ha escrito:
JoaquÃn Mª López Muñoz ha scritto:
[...]
How do I know if value_type is convertible to key_type? I need asistance from the user.
The user might be allowed to provide that as an additional extension point. However, calling the value_type constructor with the key_type as parameter is a reasonable default behaviour.
I was referring to the reverse process, from value_type to key_type, looks to me you're talking about key_type-->value_type here. As for the construction key_type-->value_type, I agree with you calling value_type ctor is a resonable default, but this should be replaceable by the user (for instance, keyed_flyweight<K,T*> typically will need a function calling new T rather than a pointer's ctor). [...]
Well, this is a valid approach, but I'd rather factories remained dumber so that all this intelligence is implemented in the core, because of the following reason (among others): STL associative containers (which are the natural components upon which to implement factories) are either set-like (key_type=value_type) or map-like (key_type is a fixed part of value_type, which does not coincide with user's provided T). To be able to accept both the keyed and non-keyed variants, the container would have to deal with a more general notion of key extraction. Boost.MultiIndex can do that, but I don't want to tie flyweight to that lib.
If you stick to that, you really should consider a name different from factories, which indeed are nothing more than storage.
I don't mind doing that (despite the tradition "factory" has in the expositions of this pattern) if there's an agreement about the name. Some reviewer proposed "pool", this does not look bad at all to me.
Anyway, I get your point. The design I have in mind is a bit different, though, leaving more intelligence to the factory. Even in your design, std::set can't be used directly, it's been wrapped in set_factory<>. So what's wrong in having this added intelligence put inside the wrapper, rather than into the core?
Well, although containers themselves are not directly usable as factories, as you point out, the work one needs to do in order to wrap it into an appropriate factory is trivial, and this is how I'd like to keep it. The following is a design rationale and you might legitimately disagree, but my goal is to decompose the fylweight ocnfiguration into a number of aspects that are: 1. Orthogonal 2. Simple 3. Natural The idea is that the core provides the extra intelligence needed to glue the components together. Hence my reluctance to add intelligence to the factories or fuse tracking or locking with factories.
About the "third" case, you don't need to tie Boost.MultiIndex with the core. The library would simply propose (among several others) one factory implemented as a wrapper onto Boost.MultiIndex. Only user that needs that feature would need to Boost.MultiIndex. This would be no different than the current design, where hashed_factory depends on Boost.MultiIndex and let's not forget that hashed_factory is the default factory!
My point is not that B.MI should be avoided, bu that it *can* be avoided: mandating that factories support in-place key_value's (i.e. some sort of key extraction from value_type) implies that you're going to have to use sophisticated data structures to implement your own factories --confront that with the fact that set_factory can de build in a couple dozen lines from a std::set.
On the other hand, I don't know how to solve the third case (T convertible to K) with dumb factories :-( Maybe this third case is not worth implementing, given that the external key variant shouldn't require that much extra storage (shared values ought to be comparatively few with respect to the number of flyweight objects pointing to them.)
Maybe I can provide an external flyweight wrapping both behaviors
flyweight<T,...> // classical flyweight flyweight<T,key<K>,...> keyed flyweight
or something to that effect. I definitely have to think this carefully.
If I could rethink the design from scratch (and I know I can't as the library has already been accepted)
AFAICS nothing is still cast in stone: we enjoy design freedom until the first release of Boost containing Boost.Flyweight, which won't happen in a number of months. So, do not hesitate to contribute your opinions!
I would probably come up with something like:
1) the core class flyweight with has only two parameters: the factory and the holder policy
2) a bunch of different factories implementations. In particular there should be at least three factories to cover the three basic cases: K==T, K!=T non-intrusive and K!=T intrusive. These factories may be parametrized with a user-provided container or a separate parametrizable factory should be provided ad hoc. Most important, all library-provided factories shall be parametrized by the locking and tracking policy
3) a bunch of holder, locking, tracking policy implementations
The choice to have locking and tracking parametrize the factories instead of the core is that actually no locking nor tracking happens in the core!
[no tracking happens at the factory either]
Locking happens only in the execution of insert() and erase(), which would be moved in the factory, and tracking can be obtained as a RAII wrapper provided by the factory (I admit that I didn't thought about tracking deeply enough to pretend that this is actually possible, but I'm sure about locking).
I think that tracking performs adequately as it is, given that no interaction problem have been detected. As for locking, I'd prefer that it be managed by the core for the reasons explained above, but I'm not totally opposed to te alternative you propose if that proves to be the most efficient design. I need to do some prototyping before taking a decision. One final note: if we analyze this paragraph of yours: "a bunch of different factories implementations. In particular there should be at least three factories to cover the three basic cases: K==T, K!=T non-intrusive and K!=T intrusive. These factories may be parametrized with a user-provided container or a separate parametrizable factory should be provided ad hoc. Most important, all library-provided factories shall be parametrized by the locking and tracking policy" and do the following replacement "factory"->"core" "container" -> "factory" we've got this: "a bunch of different core implementations. In particular there should be at least three cores to cover the three basic cases: K==T, K!=T non-intrusive and K!=T intrusive. These cores may be parametrized with a user-provided factory or a separate parametrizable core should be provided ad hoc. Most important, all library-provided cores shall be parametrized by the locking and tracking policy" which is disturbingly similar to the approach I'm favoring... Maybe we're disagreeing just in names after all? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Here are some questions you might want to answer in your review:
* What is your evaluation of the design?
Ok.
* What is your evaluation of the implementation?
I like it. Did not go into too much detail, but from what I've looked at, it's ok. One thing I don't like is the initialization of boost::flyweight: http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/doc/tutorial... I think there could be a simpler way to do this: http://www.entropygames.net/globals.htm
* What is your evaluation of the documentation?
Very nice - I had no problem reading through the basics. Did not look much at extending the lib.
* What is your evaluation of the potential usefulness of the library?
Very useful.
* Did you try to use the library? With what compiler? Did you have any problems?
No.
* How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I've looked at the docs a few times; the time spent would be around 1.5h .
* Are you knowledgeable about the problem domain?
I've implemented something like this, but at a much smaller scale.
And finally, every review should answer this question:
* Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.
Yes. Best, John
Ion Gaztañaga - Review Manager - _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- http://John.Torjo.com -- C++ expert ... call me only if you want things done right

Hello John, thanks for your review John Torjo ha escrito:
* What is your evaluation of the implementation?
I like it. Did not go into too much detail, but from what I've looked at, it's ok.
One thing I don't like is the initialization of boost::flyweight: http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/doc/tutorial... I think there could be a simpler way to do this: http://www.entropygames.net/globals.htm
Well, static_holder (which is conceptually very similar to a singleton) does in fact relies on Meyers' idiom, all the complications described at the referred section stems from the fact that initialization is forced to happen in pre-main() time to avoid multi-threaded clashes (which are deemed more unlikely before than after main() starts). Note that the singletons described at your article can pose problems if first accessed simultaneously from two different threads --this is what I tried to avoid in my lib. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

One thing I don't like is the initialization of boost::flyweight: http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/doc/tutorial... I think there could be a simpler way to do this: http://www.entropygames.net/globals.htm
Well, static_holder (which is conceptually very similar to a singleton) does in fact relies on Meyers' idiom, all the complications described at the referred section stems from the fact that initialization is forced to happen in pre-main() time to avoid multi-threaded clashes (which are deemed more unlikely before than after main() starts). Note that the singletons described at your article can pose problems if first accessed simultaneously from two different threads --this is what I tried to avoid in my lib.
I was talking about depends_on class Best, John -- http://John.Torjo.com -- C++ expert ... call me only if you want things done right

John Torjo ha escrito:
One thing I don't like is the initialization of boost::flyweight: http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/doc/tutorial... I think there could be a simpler way to do this: http://www.entropygames.net/globals.htm
Well, static_holder (which is conceptually very similar to a singleton) does in fact relies on Meyers' idiom, all the complications described at the referred section stems from the fact that initialization is forced to happen in pre-main() time to avoid multi-threaded clashes (which are deemed more unlikely before than after main() starts). Note that the singletons described at your article can pose problems if first accessed simultaneously from two different threads --this is what I tried to avoid in my lib.
I was talking about depends_on class
Oh, excuse my misunderstanding. In which way do you think that the depends_on class can simplify the static initialization issues for the flyweight lib? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Thanks for extending this period! Ion Gaztañaga <igaztanaga <at> gmail.com> writes:
* What is your evaluation of the design? * What is your evaluation of the implementation? * What is your evaluation of the documentation?
I only took a short glance at the docs and at the issue page
* What is your evaluation of the potential usefulness of the library?
I think this lib _could_ become useful for me some day. No idea yet, but it is nice to know the solution to problems that I do not have - yet. This is how I use boost all the time ...
* Did you try to use the library? With what compiler? Did you have any problems?
no try
* How much effort did you put into your evaluation?
A quick reading
* Are you knowledgeable about the problem domain? no
And finally, every review should answer this question:
* Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.
I think it should be accepted. My remarks to the current discusion: Equality semantics: This is where I'd like to see the library changed. I did not think about this in depth, but it let me feel uncomfortable immediately after reading that passage in the docs. Flyweight is advertized as "drop-in replacement for const T." and can be used everywhere a "T const &" is expected and as such it should behave also for operator==. I found it odd, that something that is "not a pointer" must be dereferenced to be compared. This breaks too much code for a true drop-in replacement. Also I think that intuitive behaviour is a GoodThing(TM) for a C++ library. We get so used to all these little oddities that make you look into the book even after years in C++, but I think it's complicated enough now. The constant time comparison is nice, but it is due to the IMHO counter-intuitive basic interface (though I can see its advantages). In any case I'd prefer comparison of the underlying values, even for flyweights from different factories (!), just in order to keep the "drop-in replacement for const T." semantics everywhere and in any case. If I need a safe access that ensures both fws come from the same factory, and const compare times, etc, I'd prefer a special free function compare(...) which then has all the semantics operator== has now. Alternative/simpler configuration interface: If you use multiindex you get used to the 10-liner-typedefs here and there in your code, but of course they look ugly. Same with this library here. OTOH together with the high quality of the docs (they are perfect, thanks Joaquín!) this might not be a problem. If there is an elegant solution that does make the code simpler I'd appreciate its inclusion to flyweight, but it's not a must-have-before-acceptance. best regards, Markus

Hi Markus, thank you for submitting your review. Markus Werle ha escrito:
My remarks to the current discusion:
Equality semantics: This is where I'd like to see the library changed. I did not think about this in depth, but it let me feel uncomfortable immediately after reading that passage in the docs.
Flyweight is advertized as "drop-in replacement for const T." and can be used everywhere a "T const &" is expected and as such it should behave also for operator==. I found it odd, that something that is "not a pointer" must be dereferenced to be compared.
Sorry, I'm not getting this. Where is dereferencing needed?
This breaks too much code for a true drop-in replacement. Also I think that intuitive behaviour is a GoodThing(TM) for a C++ library. We get so used to all these little oddities that make you look into the book even after years in C++, but I think it's complicated enough now.
The constant time comparison is nice, but it is due to the IMHO counter-intuitive basic interface (though I can see its advantages). In any case I'd prefer comparison of the underlying values, even for flyweights from different factories (!), just in order to keep the "drop-in replacement for const T." semantics everywhere and in any case.
For flyweights of different types comparison resorts to the underlying values. The optimization is only introduced when the flyweight types compared are the same. This is explained in detail at http://svn.boost.org/svn/boost/sandbox/flyweight/libs/flyweight/ doc/reference/flyweight.html#comparison
If I need a safe access that ensures both fws come from the same factory, and const compare times, etc, I'd prefer a special free function compare(...) which then has all the semantics operator== has now.
I tended to think that way before the review, but I'm not so sure now: the cases where having &-based equality semantics could deviate from the underlying semantics are truly pathological (see for instance http://lists.boost.org/Archives/boost/2008/01/132710.php ). The main advantage of &-based equality is its nice interaction with composite patterns: having a separate compare function would not play so nicely. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

I vote for flyweight to be accepted into boost. Earlier this week I have tried to plug a flyweight<string> type into Boost.Wave but failed at doing so for reasons related to Boost.Wave or my inexperience with Boost.Wave. So this review is based on a look at its implementation and a longer look at its documentation. I am knowledgable about the domain as I have implemented my own flyweight type which was only customizable in respect to the parameters for the hash container and which used a refcount tracking. First of all I don't mind the name factory but I would prefer flyweight_pool. I think the design is very good and the amount of work that Joaquin has done for a seemingly simple library is amazing. The documentation is very detailed minus some minor spelling errors. It could use a better example in the Introduction with a custom object and a custom hash function. Because 'how do I use boost.hash again' was the question in my head. As was already mentioned, a way to get usage statistics about the flyweights would be nice. Kevin

Hello Kevin, thank you for your review. Kevin Sopp ha escrito:
First of all I don't mind the name factory but I would prefer flyweight_pool.
This name looks interesting to me. But I don't know how to arrange things so as to reach a consensus on terminology among the reviewers. Maybe a separate poll.
The documentation is very detailed minus some minor spelling errors.
Could you please point me to those spelling errors? Thank you!
It could use a better example in the Introduction with a custom object and a custom hash function. Because 'how do I use boost.hash again' was the question in my head.
I precisely chose a natively hashed type to not deviate the user's attention from these details :) I can include a reference to the appropriate section at Boost.Hash docs, though. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
participants (12)
-
"JOAQUIN LOPEZ MU?Z"
-
Alberto Ganesh Barbati
-
Corrado Zoccolo
-
Ion Gaztañaga
-
Joaquín M López Muñoz
-
Joaquín Mª López Muñoz
-
John Torjo
-
Kevin Sopp
-
Markus Werle
-
Michael Fawcett
-
Steven Watanabe
-
vicente.botet