
After hacking around with some simple dynamic programming, it occurred to me that it should be possible to write a simple class template to handle memoization of functions. With the changes in C++0x that help solve the forwarding problem, writing a generic memoizer should be fairly straightforward. (Or so I thought!)

The interface is straightforward, so I'll let the following code snippet speak for itself:

memoizer<int (int,int)> memo(my_func);
int ret = memo(i,j);

As you'd expect, this creates a memoizer that caches results of calls to my_func. Pretty simple, right? Well, not exactly. :)

(If you'd rather not read about the implementation, you can skip ahead to the end, where there is a link to download the source.)

There are a number of issues that make this a fairly complicated problem, even with the improved generic programming facilities in C++0x. Chief among these is copying. With the naive implementation of the memoizer, I found that calculating the result of an unmemoized value created 3 copies (one for creating a tuple to search a map for, and two to insert the tuple into the map), and returning a memoized value created 1 copy (to create a tuple to search the map).

To make a long story short, I set about trying to reduce the number of copies to the expected amount: 1 copy for unmemoized values and 0 for memoized values. I'm happy to say that this was a success! The trick was creating a tuple that holds references to each argument and can be upgraded to hold the arguments by value. In this way, we can compare by reference and store by value. This is simple on the surface, but requires some care when dealing with rvalue references.

There are also issues with passing convertible types to the memoizer (e.g. passing floats when the function expects ints). For complicated types, you certainly wouldn't want to copy them more than once, and you'd want to use move constructors where possible. This is resolved with some creative manipulation of template types (lines 52-58 of memoizer.hpp).

There are still a number of things missing in this code, such as the ability to use function objects with the memoizer, or the ability to specify the map type, but I believe that the most important issues are resolved:

* Minimizes copying of arguments
* Supports move constructors wherever possible
* Safely converts arguments to the appropriate types

The full source code for this is available here (requires gcc 4.3 or a compiler that supports variadic templates and rvalue refs): http://www.teamboxel.com/misc/memoizer.tar.gz

Comments and criticisms are welcome. If there's interest, I will - of course - work on a version that supports the current C++ standard.

- Jim
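For illustration, here is a minimal sketch of the compare-by-reference, store-by-value idea for a fixed two-argument signature. It uses a std::map with a C++14 transparent comparator rather than the upgradable tuple described above, and every name in it is invented for the example; on a miss it costs roughly one copy plus one move per argument, and on a hit no copies at all.

#include <map>
#include <tuple>

template<typename Ret, typename A, typename B>
class simple_memoizer
{
public:
    explicit simple_memoizer(Ret (*f)(A, B)) : f_(f) {}

    const Ret& operator()(const A& a, const B& b)
    {
        // Look up with a tuple of references: no argument copies on a hit.
        std::tuple<const A&, const B&> key_ref(a, b);
        auto it = cache_.lower_bound(key_ref);
        if(it != cache_.end() && !(key_ref < it->first))
            return it->second;
        // Miss: build a by-value key exactly once and store the result.
        return cache_.emplace_hint(it, std::tuple<A, B>(a, b), f_(a, b))->second;
    }

private:
    Ret (*f_)(A, B);
    // std::less<> is transparent, so tuples of references and tuples of
    // values can be compared against each other element by element.
    std::map<std::tuple<A, B>, Ret, std::less<>> cache_;
};

Usage then looks like the snippet above: simple_memoizer<int, int, int> memo(my_func); int ret = memo(i, j);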

AMDG James Porter wrote:
There are a number of issues that make this a fairly complicated problem, even with the improved generic programming facilities in C++0x. Chief among these is copying. With the naive implementation of the memoizer, I found that calculating the result of an unmemoized value created 3 copies (one for creating a tuple to search a map for, and two to insert the tuple into the map), and returning a memoized value created 1 copy (to create a tuple to search the map).
To make a long story short, I set about trying to reduce the number of copies to the expected amount: 1 copy for unmemoized values and 0 for memoized values. I'm happy to say that this was a success! The trick was creating a tuple that holds references to each argument and can be upgraded to hold the arguments by value. In this way, we can compare by reference and store by value. This is simple on the surface, but requires some care when dealing with rvalue references.
You might want to look at Boost.Intrusive, which allows you to avoid the extra copies without adding indirection. http://www.boost.org/doc/html/intrusive/advanced_lookups_insertions.html In Christ, Steven Watanabe

Steven Watanabe wrote:
You might want to look at Boost.Intrusive, which allows you to avoid the extra copies without adding indirection. http://www.boost.org/doc/html/intrusive/advanced_lookups_insertions.html
Unless I'm reading the documentation wrong, I'd still need some kind of indirection to perform the lookups. That is, I'd have to create a tuple of references to pass to find(). Besides that, creating a pair<const key_type,data_type> would also require a copy if key_type were an ordinary tuple. (Using the my_map[key] = value syntax does this under the hood, too). I'm not very familiar with Boost.Intrusive, so correct me if I'm wrong. - Jim

AMDG James Porter wrote:
After hacking around with some simple dynamic programming, it occurred to me that it should be possible to write a simple class template to handle memoization of functions. With the changes in C++0x that help solve the forwarding problem, writing a generic memoizer should be fairly straightforward. (Or so I thought!)
Comments and criticisms are welcome. If there's interest, I will - of course - work on a version that supports the current C++ standard.
The implementation of less in upgradable_tuple_inner doesn't look quite right.

return get(as_ref) < rhs.get(rhs_as_ref) || tail.less(as_ref,rhs.tail,rhs_as_ref);

Shouldn't this be

return get(as_ref) < rhs.get(rhs_as_ref) || (!(rhs.get(rhs_as_ref) < get(as_ref)) && tail.less(as_ref,rhs.tail,rhs_as_ref));

Also, upgradable_tuple_inner::upgrade isn't exception safe. Fairly minor and maybe I shouldn't be worrying about them at this stage, but still...

In Christ,
Steven Watanabe
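To make the strict weak ordering point concrete, here is a tiny standalone version of the two comparisons, using plain ints in place of the tuple code:

#include <cassert>

// mirrors: a0 < b0 || less(tail)
bool broken_less(int a0, int a1, int b0, int b1)
{ return a0 < b0 || a1 < b1; }

// mirrors: a0 < b0 || (!(b0 < a0) && less(tail))
bool fixed_less(int a0, int a1, int b0, int b1)
{ return a0 < b0 || (!(b0 < a0) && a1 < b1); }

int main()
{
    // Lexicographically, (2,1) is not less than (1,5), but the short-circuit
    // form claims each is less than the other, which breaks the asymmetry a
    // strict weak ordering (and therefore std::map) requires.
    assert(broken_less(2, 1, 1, 5) && broken_less(1, 5, 2, 1));
    assert(!fixed_less(2, 1, 1, 5) && fixed_less(1, 5, 2, 1));
}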

Steven Watanabe wrote:
The implementation of less in upgradable_tuple_inner doesn't look quite right.
return get(as_ref) < rhs.get(rhs_as_ref) || tail.less(as_ref,rhs.tail,rhs_as_ref);
Shouldn't this be
return get(as_ref) < rhs.get(rhs_as_ref) || (!(rhs.get(rhs_as_ref) < get(as_ref)) && tail.less(as_ref,rhs.tail,rhs_as_ref));
This is true. I was operating under the assumption of totally ordered sets. Thankfully, it's a simple issue to fix!
Also, upgradable_tuple_inner::upgrade isn't exception safe.
Also true. I'll have to consider the best route to take with this. The upgrade() function isn't exactly pretty to begin with (though it would look much nicer with unrestricted unions :) ). - Jim

James Porter wrote:
The interface is straightforward, so I'll let the following code snippet speak for itself:
memoizer<int (int,int)> memo(my_func);
int ret = memo(i,j);
It would be better if it worked with any function object. Thus the return type cannot be memoizer<int (int, int)>, unless you also perform type erasure, which would be fairly silly.

Mathias Gaunard wrote:
It would be better if it worked with any function object. Thus the return type cannot be memoizer<int (int, int)>, unless you also perform type erasure, which would be fairly silly.
I'm still considering the syntax for that. It's simple enough for boost::functions, but for generic function objects, it's not clear to me what the syntax should be, since a function object could potentially have many overloads of operator(). Perhaps syntax like memoizer<int (FunctionObj::*)(int, int)> would work? - Jim

James Porter wrote:
I'm still considering the syntax for that. It's simple enough for boost::functions, but for generic function objects, it's not clear to me what the syntax should be, since a function object could potentially have many overloads of operator().
Why not simply

auto g = memoize(f); /* type of g is unspecified, using auto for automatic type deduction */

just like bind & friends.

Does your memoize adaptor require knowing the signatures of the function object?

Mathias Gaunard wrote:
Why not simply
auto g = memoize(f); /* type of g is unspecified, using auto for automatic type deduction */
That's the plan, but I was writing for gcc 4.3, and auto is only available in gcc 4.4. :)
Does your memoize adaptor require knowing the signatures of the function object?
It can only memoize one function (or method) at a time, since it stores tuples of the arguments used to call the function, and assumes that the tuples are all of the same type. It's simple for function pointers, since they only have one type list for arguments, and simple for Boost/TR1 function objects for the same reason, but arbitrary function objects can have many overloads of operator(), and there would need to be a way to specify which overload is associated with the memoizer. - Jim

James Porter wrote:
That's the plan, but I was writing for gcc 4.3, and auto is only available in gcc 4.4. :)
Well, bind has an unspecified return type, and that never bothered anyone. If you don't have auto, just don't make it a named variable.
Does your memoize adaptor require knowing the signatures of the function object?
arbitrary function objects can have many overloads of operator(), and there would need to be a way to specify which overload is associated with the memoizer.
Can't you just do something like

template<typename F>
struct memoizer
{
    memoizer(F f_) : f(f_) {}

    template<typename T1, typename T2, ..., typename Tn>
    typename boost::result_of<F(T1, T2, ..., Tn)>::type
    operator()(T1 t1, T2 t2, ..., Tn tn)
    {
        typedef typename boost::result_of<F(T1, T2, ..., Tn)>::type result_type;
        typedef boost::tuple<T1, T2, ..., Tn> tuple_type;
        typedef std::map<tuple_type, result_type> map_type;
        static map_type values;

        tuple_type tuple = boost::make_tuple(t1, t2, ..., tn);
        typename map_type::iterator it = values.find(tuple);
        if(it == values.end())
            it = values.insert(std::make_pair(tuple, f(t1, t2, ..., tn))).first;
        return it->second;
    }

private:
    F f;
};

template<typename F>
memoizer<F> memoize(F f)
{
    return memoizer<F>(f);
}

With ... standing for preprocessing magic.
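Roughly the same idea can be written with C++0x variadic templates instead of the preprocessor (a sketch, not the library's code); note that the conversion caveat raised below applies to both versions:

#include <functional>
#include <map>
#include <tuple>
#include <utility>

template<typename F>
class memoizer
{
public:
    explicit memoizer(F f) : f_(f) {}

    template<typename... Args>
    typename std::result_of<F(Args...)>::type operator()(Args... args)
    {
        typedef typename std::result_of<F(Args...)>::type result_type;
        typedef std::tuple<Args...> key_type;
        static std::map<key_type, result_type> cache;   // one map per Args...

        key_type key(args...);
        auto it = cache.find(key);
        if(it == cache.end())
            it = cache.insert(std::make_pair(std::move(key), f_(args...))).first;
        return it->second;
    }

private:
    F f_;
};

template<typename F>
memoizer<F> memoize(F f) { return memoizer<F>(f); }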

AMDG Mathias Gaunard wrote:
arbitrary function objects can have many overloads of operator(), and there would need to be a way to specify which overload is associated with the memoizer.
Can't you just do something like
template<typename T1, typename T2, ..., typename Tn>
typename boost::result_of<F(T1, T2, ..., Tn)>::type
operator()(T1 t1, T2 t2, ..., Tn tn)
{
    typedef boost::tuple<T1, T2, ..., Tn> tuple_type;
You'd need to be a bit careful about implicit conversions. If you have a monomorphic function object that takes double arguments, then passing ints and passing doubles will get memoized separately this way. In Christ, Steven Watanabe
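A standalone illustration of that caveat, using a deliberately oversimplified one-argument memoizer whose key type is deduced at the call site (everything here is made up for the example):

#include <cstdio>
#include <map>
#include <tuple>
#include <utility>

double scaled(double x)
{
    std::printf("computing scaled(%f)\n", x);
    return x * 2.5;
}

template<typename Arg>
double memo_scaled(Arg a)    // argument type deduced, as in the sketch above
{
    static std::map<std::tuple<Arg>, double> cache;  // a separate map per Arg
    auto it = cache.find(std::tuple<Arg>(a));
    if(it == cache.end())
        it = cache.insert(std::make_pair(std::tuple<Arg>(a), scaled(a))).first;
    return it->second;
}

int main()
{
    memo_scaled(2.0);  // Arg = double: computes
    memo_scaled(2);    // Arg = int: computes again, despite the equal value
    memo_scaled(2.0);  // cached
}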

Mathias Gaunard wrote:
Well, bind has an unspecified return type, and that never bothered anyone. If you don't have auto, just don't make it a named variable.
Since you generally want to store the memoizer somewhere (it's basically a container), this isn't always appropriate. However, I realized that for monomorphic function objects, I can just deduce the signature of operator() so that the memoizer's type is simply memoizer<my_function_object>.
Can't you just do something like [snip]
I'm intentionally avoiding (type-) polymorphic function objects (or rather, I'm restricting them to one overload), since knowing the exact function signature allows for better type-safety and, as Steven mentioned, eliminates the chance of creating spurious maps to store the memoized values. I think the added restriction in terms of polymorphic function objects is worth the added safety with monomorphic types. If you are interested in looking at the new code, I have updated source available here: http://www.teamboxel.com/misc/memoizer-0.2.tar.gz . There's obviously a lot of work left to do, but hopefully this will help resolve some of your questions regarding function objects. - Jim
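For what it's worth, one common way to deduce that single operator() signature is a trait over the type of &F::operator(). The signature_of trait below is hypothetical, not the updated source, and it is deliberately ill-formed for function objects with overloaded or templated operator(), which matches the restriction described above:

#include <type_traits>

template<typename T>
struct signature_of : signature_of<decltype(&T::operator())> {};

template<typename C, typename R, typename... Args>
struct signature_of<R (C::*)(Args...)>       { typedef R type(Args...); };

template<typename C, typename R, typename... Args>
struct signature_of<R (C::*)(Args...) const> { typedef R type(Args...); };

template<typename R, typename... Args>
struct signature_of<R (*)(Args...)>          { typedef R type(Args...); };

// e.g. memoizer<adder> could then look up signature_of<adder>::type:
struct adder { int operator()(int a, int b) const { return a + b; } };
static_assert(std::is_same<signature_of<adder>::type, int(int, int)>::value,
              "signature deduced from a monomorphic function object");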

----- Original Message ----- From: "James Porter" <porterj@alum.rit.edu> To: <boost@lists.boost.org> Sent: Monday, January 26, 2009 11:55 PM Subject: Re: [boost] Interest check: memoization
If you are interested in looking at the new code, I have updated source available here: http://www.teamboxel.com/misc/memoizer-0.2.tar.gz . There's obviously a lot of work left to do, but hopefully this will help resolve some of your questions regarding function objects.
Hi,

IMO the main issue is that we need to declare a variable, and in order to share this variable we need to either use a global variable or pass it as a parameter to the inner functions. What do you think about defining a singleton type that avoids the need for this variable declaration, as Boost.Flyweight does?

What about adding a _singleton variant

typedef memoizer_singleton<doit> doit_memo;

and using the type doit_memo directly, as follows:

doit_memo(5.0,7);

Or

class factorial : public recursive_memoizer_singleton<factorial,int(int)>
{
public:
    int call(int n)
    {
        if(n<=1) return 1;
        else return n*memo(n-1);
    }
};

and use

factorial(5); // 5!

In this way the memoizer would be a generalization of the Flyweight. Of course this does not preclude providing the non-singleton versions as well.

Thanks,
Vicente

vicente.botet wrote:
IMO the main issue is that we need to declare a variable, and in order to share this variable we need to either use a global variable or pass it as a parameter to the inner functions. What do you think about defining a singleton type that avoids the need for this variable declaration, as Boost.Flyweight does?
what about adding a _singleton variant
typedef memoizer_singleton<doit> doit_memo;
and use directly the type doit_memo as follows
doit_memo(5.0,7);
I think I prefer this syntax, since it still allows for recursive memoization, and doesn't allow for stateful function objects (results of boost::bind, etc.). In order to share the results of boost::bind, you'd have to do as you say and make a global or pass it to inner functions, and if it's good enough in those situations, it should be sufficient here as well. These are all good suggestions, though. I think a library like this will only really see use if the memoization code is largely invisible to users, which means it has to mimic all the use cases for functions, function objects, etc. - Jim

----- Original Message ----- From: "James Porter" <porterj@alum.rit.edu> To: <boost@lists.boost.org> Sent: Sunday, January 25, 2009 3:39 AM Subject: [boost] Interest check: memoization
After hacking around with some simple dynamic programming, it occurred to me that it should be possible to write a simple class template to handle memoization of functions. With the changes in C++0x that help solve the forwarding problem, writing a generic memoizer should be fairly straightforward. (Or so I thought!)
The interface is straightforward, so I'll let the following code snippet speak for itself:
memoizer<int (int,int)> memo(my_func);
int ret = memo(i,j);
As you'd expect, this creates a memoizer that caches results of calls to my_func. Pretty simple, right? Well, not exactly. :)
(If you'd rather not read about the implementation, you can skip ahead to the end, where there is a link to download the source.)
There are a number of issues that make this a fairly complicated problem, even with the improved generic programming facilities in C++0x. Chief among these is copying. With the naive implementation of the memoizer, I found that calculating the result of an unmemoized value created 3 copies (one for creating a tuple to search a map for, and two to insert the tuple into the map), and returning a memoized value created 1 copy (to create a tuple to search the map).
To make a long story short, I set about trying to reduce the number of copies to the expected amount: 1 copy for unmemoized values and 0 for memoized values. I'm happy to say that this was a success! The trick was creating a tuple that holds references to each argument and can be upgraded to hold the arguments by value. In this way, we can compare by reference and store by value. This is simple on the surface, but requires some care when dealing with rvalue references.
There are also issues with passing convertible types to the memoizer (e.g. passing floats when the function expects ints). For complicated types, you certainly wouldn't want to copy them more than once, and you'd want to use move constructors where possible. This is resolved with some creative manipulation of template types (lines 52-58 of memoizer.hpp).
There are still a number of things missing in this code, such as the ability to use function objects with the memoizer, or the ability to specify the map type, but I believe that the most important issues are resolved:
* Minimizes copying of arguments * Supports move constructors wherever possible * Safely converts arguments to the appropriate types
The full source code for this is available here (requires gcc 4.3 or a compiler that supports variadic templates and rvalue refs): http://www.teamboxel.com/misc/memoizer.tar.gz
Comments and criticisms are welcome. If there's interest, I will - of course - work on a version that supports the current C++ standard.
Hi,

The implementation is very elegant, and the use of the C++0x facilities reduces the code to the minimum. I remember that something like this was requested for the Flyweight library. The following is extracted from the documentation:

"Example 5: flyweight-based memoization

Memoization is an optimization technique consisting in caching the results of a computation for later reuse; this can dramatically improve performance when calculating recursive numerical functions, for instance. Key-value flyweights can be used to implement memoization for a numerical function f by modeling a memoized invocation of the function as a value of type flyweight<key_value<int,compute_f> >, where compute_f is a type that does the computation of f(n) at its compute_f::compute_f(int) constructor. For instance, the Fibonacci numbers can be computed with memoization like this:

struct compute_fibonacci;

typedef flyweight<key_value<int,compute_fibonacci>,no_tracking> fibonacci;

struct compute_fibonacci{
    compute_fibonacci(int n):
        result(n==0?0:n==1?1:fibonacci(n-2).get()+fibonacci(n-1).get())
    {}

    operator int()const{return result;}
    int result;
};

for(int n=0;n<40;++n){
    std::cout<<"F("<<n<<")="<<fibonacci(n)<<std::endl;
}

The no_tracking policy is used so that the memoized computations persist for future use throughout the program."

How do the two solutions compare?

int fibonacci_impl(int n);
memoizer<int (int)> fibonacci(fibonacci_impl);

int fibonacci_impl(int n) {
    return (n==0?0:n==1?1:fibonacci(n-2)+fibonacci(n-1));
}

for(int n=0;n<40;++n){
    std::cout<<"F("<<n<<")="<<fibonacci(n)<<std::endl;
}

We can memoize only unary functions with flyweight, and we can have flyweight values with memoize(identity function). So which one performs better?

If I understand correctly, memoizer can only be used with pure functions, right? Is there a way to check for a pure function in C++0x? Is there a way to constrain the template parameter to only pure functions?

Should the following compile?

memoizer<int&& (int,int)> memo(my_func); // move result on calling
int ret = memo(i,j);

Should the following compile?

memoizer<int (int&,int)> memo(my_func); // allows modifying the first argument
int ret = memo(i,j);

Some minor changes in

typedef const Ret & result_reference;

will be needed to return references:

memoizer<int& (int,int)> memo(my_func);
int& ret = memo(i,j);

Best,
Vicente

vicente.botet wrote:
The implementation is very elegant, and the use of the C++0x facilities reduces the code to the minimum.
Thanks! I certainly don't relish writing this in C++03, though removing the need to address rvalue references will simplify some of the code.
We can memoize only unary functions with flyweight, and we can have flyweight values with memoize(identity function). So which one performs better?
I'll obviously have to run some tests to see which one has better performance, but for simple cases, I'd expect performance to be approximately the same. Your example code with the memoizer does highlight one current limitation of my code, however: without support for function objects, recursive functions can only be memoized with a global memoizer.
If I understand correctly, memoizer can only be used with pure functions, right? Is there a way to check for a pure function in C++0x? Is there a way to constrain the template parameter to only pure functions?
This is more-or-less true. You can allow functions with side effects, but it requires considerable care to use them properly.
Should the following compile?

memoizer<int&& (int,int)> memo(my_func); // move result on calling
int ret = memo(i,j);
This probably should compile, and move-construct the return value into the map. The return value would then be a const-reference.
Should the following compile?

memoizer<int (int&,int)> memo(my_func); // allows modifying the first argument
int ret = memo(i,j);
Perhaps not.
Some minor changes in

typedef const Ret & result_reference;
will be needed to return references:

memoizer<int& (int,int)> memo(my_func);
int& ret = memo(i,j);
Definitely. I also have to examine the behavior with expensive-to-copy return values. This is still very much a work-in-progress. I suppose this is just an example of the law of library writing: generic, efficient, and simple; choose 2. :) - Jim

vicente.botet wrote:
int fibonacci_impl(int n);
memoizer<int (int)> fibonacci(fibonacci_impl);

int fibonacci_impl(int n) {
    return (n==0?0:n==1?1:fibonacci(n-2)+fibonacci(n-1));
}

for(int n=0;n<40;++n){
    std::cout<<"F("<<n<<")="<<fibonacci(n)<<std::endl;
}
I've been thinking about this implementation and how it's fairly inflexible, especially for function objects. I've added support for function objects, and I've been working on a general solution to the recursive memoization problem. How does this look?

class factorial : public recursive_memoizer<factorial, int(int)>
{
    friend memoizer_core_access;
private:
    int call(int n)
    {
        if(n==0) return 1;
        else return n*memo(n-1);
    }
};

// Usage
factorial f;
f(5);

The (in-progress) implementation I have allows for stateful function objects to be memoized, though clearly you wouldn't want to change the state once you start memoizing values. I'll post updated code when I'm satisfied that it actually works in more than the single case I've tested it with. :)

Recursive memoization seems like one of the more common use cases for memoization in general, so I definitely want it to be as straightforward as possible to use for this purpose.

- Jim
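For anyone curious how such a base might hang together, here is a rough map-backed sketch of the CRTP idea; memoizer_core_access and the hooks in the actual in-progress code may well differ:

#include <map>
#include <tuple>
#include <utility>

template<typename Derived, typename Signature>
class recursive_memoizer;   // primary template intentionally undefined

// Grants the base access to Derived::call without making it public.
class memoizer_core_access
{
    template<typename D, typename S> friend class recursive_memoizer;

    template<typename R, typename Derived, typename... Args>
    static R call(Derived& d, Args... args) { return d.call(args...); }
};

template<typename Derived, typename R, typename... Args>
class recursive_memoizer<Derived, R(Args...)>
{
public:
    R operator()(Args... args)
    {
        std::tuple<Args...> key(args...);
        auto it = cache_.find(key);
        if(it == cache_.end()) {
            R value = memoizer_core_access::call<R>(
                static_cast<Derived&>(*this), args...);
            it = cache_.insert(std::make_pair(std::move(key),
                                              std::move(value))).first;
        }
        return it->second;
    }

protected:
    // recursive calls from Derived::call go back through the cache
    R memo(Args... args) { return (*this)(args...); }

private:
    std::map<std::tuple<Args...>, R> cache_;
};

// usage, in the same shape as the factorial example above:
class fib : public recursive_memoizer<fib, long(int)>
{
    friend class memoizer_core_access;
    long call(int n) { return n < 2 ? n : memo(n - 1) + memo(n - 2); }
};

int main()
{
    fib f;
    return f(40) == 102334155L ? 0 : 1;   // memoized: linear, not exponential
}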

----- Original Message ----- From: "James Porter" <porterj@alum.rit.edu> To: <boost@lists.boost.org> Sent: Monday, January 26, 2009 9:56 AM Subject: Re: [boost] Interest check: memoization
vicente.botet wrote:
int fibonacci_impl(int n);
memoizer<int (int)> fibonacci(fibonacci_impl);

int fibonacci_impl(int n) {
    return (n==0?0:n==1?1:fibonacci(n-2)+fibonacci(n-1));
}

for(int n=0;n<40;++n){
    std::cout<<"F("<<n<<")="<<fibonacci(n)<<std::endl;
}
I've been thinking about this implementation and how it's fairly inflexible, especially for function objects.
The global object didn't look very pretty.
I've added support for function objects, and I've been working on a general solution to the recursive memoization problem. How does this look?
class factorial : public recursive_memoizer<factorial, int(int)>
{
    friend memoizer_core_access;
private:
    int call(int n)
    {
        if(n==0) return 1;
        else return n*memo(n-1);
    }
};
// Usage
factorial f;
f(5);
Yes, this is very interesting. I'm wondering if Boost.Flyweight shouldn't provide a similar recursive class, Joaquin?
The (in-progress) implementation I have allows for stateful function objects to be memoized, though clearly you wouldn't want to change the state once you start memoizing values. I'll post updated code when I'm satisfied that it actually works in more than the single case I've tested it with. :)
Thanks, Vicente

For those who are still interested, I've uploaded a new version of the memoizer code: http://www.teamboxel.com/misc/memoizer-0.3.tar.gz

The main changes are: 1) adding recursive and static memoizers, and 2) switching over to Boost.Build. There are also a handful of unit tests, as well as some examples. The basic memoizer template works the same as before (though I fixed a bug that made it impossible to use std::/boost::function objects), so I'll focus on the new memoizer types.

The recursive memoizer is useful for memoizable functions that call themselves. The basic memoizer doesn't easily allow for this without the use of global objects, prompting me to create a specialized version. The example below shows typical usage:

class factorial : public recursive_memoizer<factorial,int(int)>
{
    friend class memoizer_core_access;
public:
    factorial(int k) : k(k) {}
private:
    int call(int n)
    {
        if(n<=1) return 1;
        else return n*memo(n-k);
    }

    int k;
};

// ...
// usage
factorial f(1);
f(5); // == 120

The recursive memoizer is useful, but it still requires you to hold onto an object to handle the memoization. That's where the static memoizer comes in: it has the ability for recursion just like the recursive memoizer, but has static duration and therefore can be called at any time from anywhere in your program (standard caveat for calls from other static functions applies). Example:

class fibonacci_;
typedef static_memoizer<fibonacci_> fibonacci;

class fibonacci_
{
public:
    int operator()(int n)
    {
        if(n==1) return 1;
        else if(n<1) return 0;
        else return fibonacci(n-1)+fibonacci(n-2);
    }
};

// ...
// usage
fibonacci(10); // == 55

There are still, I'm sure, plenty of bugs and strange edge cases for me to look at, but I'm pretty happy with how the interface is shaping up so far. As usual, comments and criticisms are welcome.

- Jim

Mathias Gaunard wrote:
James Porter wrote:
The recursive memoizer is useful for memoizable functions that call themselves. The basic memoizer doesn't easily allow for this
Couldn't it be done using a Y combinator?
Possibly. I wouldn't exactly call that "easy" though (unless you already have a Y combinator handy). :) The recursive_memoizer class template is designed to make that subtype of memoization problems trivial to implement. In general, I'd say that memoization is not a conceptually difficult idea, so the interfaces should reflect that. - Jim
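For completeness, here is roughly what the fixed-point route could look like with std::function. The memo_fixed wrapper below is invented for the example and is not part of the library:

#include <functional>
#include <map>
#include <utility>

// Memoizing "fixator": the user writes an open-recursive function that
// receives its own memoized self as the first parameter.
template<typename R, typename Arg>
class memo_fixed
{
public:
    typedef std::function<R(std::function<R(Arg)>, Arg)> open_function;
    explicit memo_fixed(open_function f) : f_(std::move(f)) {}

    R operator()(Arg a)
    {
        auto it = cache_.find(a);
        if(it == cache_.end()) {
            // hand *this back to f_ so recursive calls go through the cache
            std::function<R(Arg)> self = std::ref(*this);
            it = cache_.insert(std::make_pair(a, f_(self, a))).first;
        }
        return it->second;
    }

private:
    open_function f_;
    std::map<Arg, R> cache_;
};

int main()
{
    memo_fixed<long, int> fib(
        [](std::function<long(int)> self, int n) -> long {
            return n < 2 ? n : self(n - 1) + self(n - 2);
        });
    return fib(40) == 102334155L ? 0 : 1;
}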

-------------------------------------------------- From: "James Porter" <porterj@alum.rit.edu> Sent: Saturday, January 31, 2009 2:28 AM To: <boost@lists.boost.org> Subject: Re: [boost] Interest check: memoization
For those who are still interested, I've uploaded a new version of the memoizer code: http://www.teamboxel.com/misc/memoizer-0.3.tar.gz
This sounds great, James. I had written something for use at my work which I called a LazyCache (I hadn't ever heard the term memoization). I would very much like to see a fully fleshed-out implementation of this so I can replace mine and stop hacking it. :D Cheers, Brandon
participants (5)

- Brandon Kohn
- James Porter
- Mathias Gaunard
- Steven Watanabe
- vicente.botet