Memoization - is that even a word?
Piotr Staniów • Posted on 28 Apr 2020 • 8 mins
This article is a part of Learning by implementing series.
I've learnt the hard way that, as with many tools, memoization can be both a blessing and a curse. Adequate memoization techniques can vastly boost the performance of your applications, but when used improperly, memoization may itself become a source of slowdowns.
We are often caught unaware by common pitfalls of memoization and some popular libraries will actually lure you into them. With that in mind, I've decided to write a few words — quite a few actually — about what memoization is and how to implement it. We'll review a full spectrum of options ranging from the most basic ones, frequently implemented in libraries, to more advanced ones, which all of course have their trade-offs.
What is function memoization?
Memoization is a performance optimisation technique that reduces the time required to get the result of a (computationally) expensive function. It allows us to skip the computation entirely when a function is called with the same set of arguments as before.
There are two important takeaways here. Firstly, the memoized function should be rather costly to run (either on CPU or on memory), otherwise the memoization overhead could actually make things worse. Secondly, to actually benefit from it, the same arguments should be likely to be passed to the function more than once, though not necessarily in a row.
The basic idea here is that we will be striving to implement a `memoize` function that takes another function as an argument and returns a memoized variant of it.
That's a good moment to take a pause and ask yourself — have you ever heard of thunks? They have recently gained some popularity, especially amongst React developers, and that idea will also be useful here, so let's first answer…
What is a thunk?
Simply speaking, a thunk is a function returned by another function. Thunks are usually created to delay a call to a function, optionally inserting some code before or after the call. They may also be used when not all arguments are available immediately.
Let's get our hands dirty with writing a few thunks:
```js
function thunkFactory() {
  return function thunk() {
    console.log('I\'ve called a thunk and I liked it!');
  };
}

thunkFactory()(); // One way of calling a thunk

const aThunk = thunkFactory();
aThunk(); // And another way of calling a thunk
```
Notice that we have written a thunk and a thunk factory, which are both functions. To invoke the thunk, we first need to create it, hence the doubled parentheses.
This pattern may also be leveraged when not all arguments are available at hand. Instead of invoking the function with all of them at once, we can gradually pass them as they arrive. Let's imagine that we have an API that responds with a number, and we want to increase that number by some factor that is only known in a completely different part of the project:
```js
function sum(a) {
  return function thunk(b) {
    return a + b;
  };
}

const sumWith5 = sum(5);

// Let's print whatever the API returns, increased by 5!
fetch('/api')
  .then((response) => response.json()) // the API responds with a number
  .then(sumWith5)
  .then(console.log);
```
How to implement simple function memoization?
In this part, we will combine our knowledge about memoization and thunks. Our goal here is to write a `memoize` function that fits into this pattern:
```js
function memoize(functionToMemoize) {
  return function memoizedFunction() { // A thunk
    return functionToMemoize();
  };
}
```
The goal here, however, is to ensure that `functionToMemoize` will be called as rarely as possible, because it's a computationally expensive function.
The somewhat naive implementation that we will start with assumes that the function takes only a single argument (such functions are called unary). For the sake of simplicity, we will also only store the previous argument it was called with, and the previous result of a call.
```js
function memoize(functionToMemoize) {
  let previousArg;
  let cachedValue;

  return function memoizedFunction(arg0) {
    // If functionToMemoize was called with this argument previously
    if (previousArg === arg0) {
      // Return the previous value without the actual call
      return cachedValue;
    }
    // Otherwise do the expensive call
    const nextResult = functionToMemoize(arg0);
    // And store both the argument and the result for further use
    previousArg = arg0;
    cachedValue = nextResult;
    // Finally, return the computed value
    return nextResult;
  };
}
```
There are three main limitations of the `memoize` function we have written above:
- The function to memoize is unary (receiving only a single argument).
- Only a single call is memoized — when the argument changes, the previous result is lost (what a waste!).
- A strict equality (reference) comparison is performed, so calling the function with two objects of identical shape but created separately (with different references) results in calling the expensive function twice.
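To see these limitations in action, here is a quick sketch; the naive `memoize` is repeated so the example stands on its own, and the call counter is only there for illustration:

```js
function memoize(functionToMemoize) {
  let previousArg;
  let cachedValue;
  return function memoizedFunction(arg0) {
    if (previousArg === arg0) {
      return cachedValue;
    }
    const nextResult = functionToMemoize(arg0);
    previousArg = arg0;
    cachedValue = nextResult;
    return nextResult;
  };
}

let calls = 0;
const square = memoize((n) => {
  calls += 1; // count the "expensive" invocations
  return n * n;
});

square(4); // computed
square(4); // cached, the inner function is not called again
square(5); // computed, and the result for 4 is evicted
square(4); // computed again (only one previous call is remembered)
console.log(calls); // 3
```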
As we progress through this article, the `memoize` function will be tweaked and tuned, so that we actually get to a far more potent variant. Our goal is to get a version of it that:
- Memoizes functions of any arity (those taking any number of arguments)
- Remembers any number of calls or a fixed number of calls
- Performs a deep equality check so that we correctly recognize that two objects are equivalent
With these goals in mind, let's reflect on…
Building blocks of memoization
As we iterate over all those implementations, a few concepts will prevail in all of them:
- Cache — a place to store previously used arguments and their corresponding values, which enables remembering more than one call
- Serializer — a function that takes a fingerprint of an object, so that we correctly recognize two equivalent objects
- Strategy — the actual implementation of `memoize`, which combines a cache and a serializer in a certain way
Comparing two objects
In the previous implementation we haven't used any serializer. Therefore, we're only able to cache a single previous response, returned when exactly the same object is passed — with the same reference as previously. Nonetheless, this is not always the case and in fact, that's a rather rare scenario. To address that issue, we may modify our memoize function by adding a serializer, like the well-known `JSON.stringify`, which serves this purpose just great:
```js
const serializer = JSON.stringify;

function memoize(functionToMemoize) {
  let previousArg;
  let cachedValue;

  return function memoizedFunction(arg0) {
    // We will cache and compare the serialized argument
    const serializedArg0 = serializer(arg0);
    if (previousArg === serializedArg0) {
      return cachedValue;
    }
    const nextResult = functionToMemoize(arg0);
    previousArg = serializedArg0;
    cachedValue = nextResult;
    return nextResult;
  };
}
```
This approach allows us to peek into the internal structure of an object. It is also a relatively fast method, since `JSON.stringify` is implemented in the browser's native code. The trade-off, though, is that it brings significant overhead for primitive values that don't require serialization.
A key thing to note here is that `JSON.stringify` is sensitive to key order. It means that `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` are considered two completely different objects, because their respective serialized strings are different. This can be worked around by sorting the keys of the object, or by using an alternative library — for instance JSON stable stringify.
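A sketch of such a key-sorting serializer might look like this (the `stableSerializer` name is ours and this is a simplified take, not the library mentioned above; note that objects nested inside arrays keep their original key order here):

```js
// Serialize objects with their keys sorted, recursively, so that two
// equivalent objects always produce the same string.
function stableSerializer(value) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    // Primitives, null and arrays are delegated to JSON.stringify as-is
    return JSON.stringify(value);
  }
  const sortedKeys = Object.keys(value).sort();
  const parts = sortedKeys.map(
    (key) => JSON.stringify(key) + ':' + stableSerializer(value[key])
  );
  return '{' + parts.join(',') + '}';
}

console.log(
  stableSerializer({ a: 1, b: 2 }) === stableSerializer({ b: 2, a: 1 })
); // true
```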
When to use memoization?
Memoization comes at a cost. We're wrapping the function in extra layers, which certainly adds some complexity to every call. We're using a cache which, provided that we cache all calls, may grow uncontrollably. It all depends on the use case — even a trivial function may be memoized with benefits, if it's a Redux selector in React and it triggers a costly rerender of a complex component.
Nonetheless, a rule of thumb here is to memoize functions that are expensive to run and carry out heavy computations. This also applies to a function implementing an algorithm that runs few operations but is heavy on memory.
Another requirement is that the function has to be a pure function. It means that it should be deterministic with respect to its input, always returning the same values for the same arguments. It also means that it has no side effects, like setting a variable outside of it — you can think of pure functions as if they were mathematical functions.
Memoization may also be useful for recursive computations, where the same input is frequently reused — for instance when you compute values of the Fibonacci sequence.
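For instance, a memoized recursive Fibonacci might look like this; it uses a simple serialized-key object cache (like the one built in the next section), repeated here so the example runs on its own:

```js
const serializer = JSON.stringify;

// A single-argument memoize with an object cache that remembers all calls
function memoize(functionToMemoize) {
  const cache = {};
  return function memoizedFunction(arg0) {
    const key = serializer(arg0);
    if (cache.hasOwnProperty(key)) {
      return cache[key];
    }
    const result = functionToMemoize(arg0);
    cache[key] = result;
    return result;
  };
}

// The recursive calls go through the memoized wrapper, so each fib(n)
// is computed only once instead of exponentially many times.
const fib = memoize(function (n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
});

console.log(fib(40)); // 102334155, returned instantly
```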
How to memoize multiple calls of a function?
The idea here is to add a cache, which is responsible for keeping track of what the response was for a given argument passed to the function. We can use a simple JavaScript object for that purpose, storing the serialized argument as the key, and the previously returned value as the value.
Please bear in mind that the cache has to be recreated with every `memoize` call; otherwise we would have a shared cache for all memoized functions regardless of what they do, which is probably not what we aim for here. 😉
```js
const serializer = JSON.stringify;

function memoize(functionToMemoize) {
  const cache = {}; // Re-created for every function memoized

  return function memoizedFunction(arg0) {
    const serializedArg0 = serializer(arg0);
    if (cache.hasOwnProperty(serializedArg0)) {
      return cache[serializedArg0];
    }
    const nextResult = functionToMemoize(arg0);
    cache[serializedArg0] = nextResult;
    return nextResult;
  };
}
```
How to memoize multiple arguments of a function?
The way of memoizing multiple arguments may depend on the requirements that we have for the `memoize` function. If we can accept a somewhat slow argument comparison in exchange for an in-depth lookup into objects, we can use our serializer on the whole array of arguments passed to the function:
```js
const serializer = JSON.stringify;

function memoize(functionToMemoize) {
  const cache = {};

  return function memoizedFunction(...args) {
    const serializedArgs = serializer(args);
    if (cache.hasOwnProperty(serializedArgs)) {
      return cache[serializedArgs];
    }
    const nextResult = functionToMemoize(...args);
    cache[serializedArgs] = nextResult;
    return nextResult;
  };
}
```
This may however turn out to be a bit sluggish, in particular if multiple large objects are passed to the function, or if it only ever gets called with primitive values. A more effective way would be to perform a shallow comparison of all arguments, to check if they match previously used arguments. Unfortunately, our cache (a plain object) can only store string keys, so we need to adapt the cache too.
One idea would be to store a list of tuples containing both a list of arguments and the corresponding value. It is a naive approach that takes a long time to search, but it certainly works:
```js
const areArraysEqual = (arr1, arr2) =>
  arr1.length === arr2.length &&
  arr1.every((el, idx) => el === arr2[idx]);

// `notFound` is just some fixed value
const notFound = Symbol.for('@memoize/notFound');

function memoize(functionToMemoize) {
  const cache = {
    _calls: [],
    set(args, value) {
      this._calls.push([args, value]);
    },
    get(args) {
      const entry = this._calls.find(
        ([prevArgs]) => areArraysEqual(prevArgs, args)
      );
      return entry ? entry[1] : notFound;
    },
  };

  return function memoizedFunction(...args) {
    const previousValue = cache.get(args);
    if (previousValue !== notFound) {
      return previousValue;
    }
    const nextResult = functionToMemoize(...args);
    cache.set(args, nextResult);
    return nextResult;
  };
}
```
One thing that I probably need to explain is the use of `Symbol.for` — this function returns a symbol, a value unlike any other, which the memoized function is very unlikely to return. We could instead use `undefined` or `-1`, provided that the function to memoize never returns them, as such a result would be indistinguishable from the value not being found.
A more performant solution
The cache presented above has one very serious problem: it requires searching through all previous calls to find the previous result of a call. However, the main reason we write `memoize` is to improve performance — it's a performance optimisation technique, after all — so we can't afford this kind of inefficiency.
Another idea for the cache would be to create a recursively nested object, where each key corresponds to one of the arguments the function was called with. The top-level object would keep the first arguments of all calls as keys. Each value would either be the return value of the function or another cache object which, in turn, would keep the second arguments of calls as keys, and so on.
For a function call `heavy('arg1', 'arg2', 'arg3')` that returns `17`, the cache structure after the first call would look like this one:
```json
{
  "arg1": {
    "arg2": {
      "arg3": 17
    }
  }
}
```
Determining whether the cache structure has already encountered the given arguments would be straightforward and relatively fast:
```js
const cache = {
  _calls: {},
  get(args) {
    let lookup = this._calls;
    for (const arg of args) {
      if (!lookup.hasOwnProperty(arg)) {
        return notFound;
      }
      lookup = lookup[arg];
    }
    return lookup;
  },
};
```
I'm not using `reduce` here on purpose, because it's slower 🚀
Unfortunately, this approach won't work for arguments that are objects, since the cache is an object too, and objects can't be used as its keys. If you called a function twice with two different objects, both would be converted to the string `[object Object]`. To work around this problem, we may use a `Map`, which allows storing both objects and primitive values as keys.
The full implementation of the cache may look like this one:
```js
function createCache() {
  return {
    _calls: new Map(),
    set(args, value) {
      let lookup = this._calls;
      for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        if (i === args.length - 1) {
          // Last argument: store the value
          lookup.set(arg, value);
        } else if (lookup.has(arg)) {
          // Argument already cached at that position: descend one level
          lookup = lookup.get(arg);
        } else {
          // First encounter of the argument at that position
          const newLevel = new Map();
          lookup.set(arg, newLevel);
          lookup = newLevel;
        }
      }
    },
    get(args) {
      let lookup = this._calls;
      for (const arg of args) {
        if (!lookup.has(arg)) {
          return notFound;
        }
        lookup = lookup.get(arg);
      }
      return lookup;
    },
  };
}
```
The `memoize` function remains the same, apart from the fact that you now create a cache using the `createCache` helper function from the snippet above.
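Putting the pieces together, the wired-up version might look as follows; `createCache` is repeated so the snippet runs on its own, and the call counter is only there to demonstrate the cache hit:

```js
const notFound = Symbol.for('@memoize/notFound');

function createCache() {
  return {
    _calls: new Map(),
    set(args, value) {
      let lookup = this._calls;
      for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        if (i === args.length - 1) {
          lookup.set(arg, value); // last argument: store the value
        } else if (lookup.has(arg)) {
          lookup = lookup.get(arg); // descend into the existing level
        } else {
          const newLevel = new Map();
          lookup.set(arg, newLevel);
          lookup = newLevel;
        }
      }
    },
    get(args) {
      let lookup = this._calls;
      for (const arg of args) {
        if (!lookup.has(arg)) {
          return notFound;
        }
        lookup = lookup.get(arg);
      }
      return lookup;
    },
  };
}

function memoize(functionToMemoize) {
  const cache = createCache();
  return function memoizedFunction(...args) {
    const previousValue = cache.get(args);
    if (previousValue !== notFound) {
      return previousValue;
    }
    const nextResult = functionToMemoize(...args);
    cache.set(args, nextResult);
    return nextResult;
  };
}

let calls = 0;
const obj = { a: 1 };
const add = memoize((x, y) => {
  calls += 1;
  return x + y.a;
});

add(10, obj); // computed
add(10, obj); // cached: same references at both argument positions
console.log(calls); // 1
```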
Memory-wise optimisation
Until now, we have focused only on avoiding as many calls of the original function as possible, by memoizing with respect to multiple arguments, objects and many calls. The latest cache implementation is still not ideal — it will grow out of control in size as we keep calling the function with new arguments. It will also prevent JavaScript's garbage collector from freeing unused resources, because the cache forever keeps references to objects that were once arguments to the memoized function.
There are two possible improvements in that area. One is keeping track of how often or how recently given arguments were passed to the function, and evicting those that are infrequent or stale. A hint here would be to experiment with an LRU (least recently used) cache and its underlying algorithm.
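As a hint of what such an eviction policy could look like, here is a minimal LRU-style cache sketch built on a `Map`, which preserves insertion order; the capacity of 2 is chosen purely for illustration:

```js
// A Map remembers insertion order, so the first key is always the least
// recently used one; deleting and re-setting a key moves it to the end.
function createLruCache(maxSize) {
  const entries = new Map();
  return {
    get(key) {
      if (!entries.has(key)) return undefined;
      const value = entries.get(key);
      entries.delete(key); // refresh: move the key to the "most recent" end
      entries.set(key, value);
      return value;
    },
    set(key, value) {
      entries.delete(key);
      entries.set(key, value);
      if (entries.size > maxSize) {
        // Evict the least recently used entry (first in insertion order)
        const oldestKey = entries.keys().next().value;
        entries.delete(oldestKey);
      }
    },
  };
}

const lru = createLruCache(2);
lru.set('a', 1);
lru.set('b', 2);
lru.get('a');    // 'a' is now the most recently used
lru.set('c', 3); // evicts 'b', the least recently used
console.log(lru.get('b')); // undefined
console.log(lru.get('a')); // 1
```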
Another improvement would be to use a `WeakMap`, which holds its keys weakly, so it does not stop the garbage collector from releasing otherwise unused objects. However, `WeakMap` is kind of the opposite of a plain object: its keys cannot be primitive values. Using it as the data structure powering our cache would mean the function could not be called with anything that is not an object. We could, however, use a trick and build a combined nested cache structure that uses plain objects for primitive keys and WeakMaps for object keys. Just to give you a hint of how it may look:
```js
// Pseudo-notation: each level holds a pair of [primitiveKeys, objectKeys]
_calls = [
  {
    10: [
      { 15: null }, // fn(10, 15) -> null
      WeakMap {},
    ],
  },
  WeakMap {
    { a: 1 } => [
      { 15: 16 }, // fn({ a: 1 }, 15) -> 16
      WeakMap {},
    ],
  },
]
```
We will not implement those in this article, however, for the sake of brevity. Likely though, with the number of checks needed and the complexity of the data structure, you would need a solid justification to go this far.
Other implementations of memoization
Memoization is a difficult subject — there are many implementations available in various libraries, each with its own limitations.
The popular `lodash.memoize` function is rather slow and memoizes only with respect to the first argument (the other arguments are still passed to the original function, though). You can check the implementation here.
The `createSelector` function from the `reselect` library performs only a shallow arguments comparison (which is faster than serializing), but it only keeps track of a single call. The implementation is available here.
Another variant is `memoizeWith` from `ramda`, which is considered one of the slowest implementations (probably due to its functional paradigm style) and is only capable of caching by string keys: it requires passing a serializer to it. See the implementation.
There's also the `fast-memoize` library, which claims to be the world's fastest JavaScript memoization library.
Wrap up
There are some key takeaways from this article I would like you to remember.
Firstly, there is no one-size-fits-all solution for memoization, and you should properly identify which `memoize` you actually need.
It is also worth noting that memoization comes at a cost, and depending on its implementation the cost may be huge. The memoization technique, however, is extremely useful, and it's no wonder it's so popular.
The ideal solution here remains, as always, to be curious, not accept things as they are, and dig deeper: you can always find the sweet spot for your functions by benchmarking them. And you can use one of the many versatile libraries dedicated to the sole purpose of memoization. 😎
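For instance, a rough micro-benchmark sketch might look like this; `slowSquare`, the iteration counts and the argument cycle are arbitrary placeholders for your own workload:

```js
// A simple single-argument memoize with a serialized-key object cache
function memoize(fn) {
  const cache = {};
  return function (arg) {
    const key = JSON.stringify(arg);
    if (!cache.hasOwnProperty(key)) {
      cache[key] = fn(arg);
    }
    return cache[key];
  };
}

// An artificially expensive function that just computes n squared
function slowSquare(n) {
  let acc = 0;
  for (let i = 0; i < 100000; i++) acc += n;
  return (acc * n) / 100000;
}

// Time many calls that cycle through a limited set of arguments,
// so the memoized variant mostly hits the cache
function benchmark(label, fn, iterations) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn(i % 100);
  }
  console.log(label + ': ' + (Date.now() - start) + 'ms');
}

benchmark('plain', slowSquare, 10000);
benchmark('memoized', memoize(slowSquare), 10000);
```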