Wikifunctions function calls are good candidates for memoization, because:
- Functions can be expensive to run.
- Wikifunctions functions are pure -- the output of a function is determined solely by its inputs.
- To the extent that we can forecast usage patterns, we think certain functions will be called often with the same inputs.
This creates a performance optimization opportunity: cache function results so that repeated invocations with the same inputs can be served from the cache. The standard practice at the Wikimedia Foundation is to use memcached with mcrouter for this.
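The cache-aside pattern described above can be sketched as follows. This is a minimal illustration, not an actual Wikifunctions implementation: an in-process `Map` stands in for memcached, and the names (`memoized`, `Invoke`) are hypothetical.

```typescript
// Sketch of cache-aside memoization for pure function invocations.
// A Map stands in for memcached; a real deployment would use a
// memcached client behind the same get/set shape.
type Invoke = (...args: unknown[]) => unknown;

const cache = new Map<string, unknown>();

function memoized(name: string, fn: Invoke): Invoke {
  return (...args: unknown[]) => {
    // Naive key for illustration; real keys would hash the
    // normalized parameters, as discussed below.
    const key = `${name}:${JSON.stringify(args)}`;
    if (cache.has(key)) {
      return cache.get(key); // cache hit: skip evaluation
    }
    const result = fn(...args); // cache miss: evaluate, then store
    cache.set(key, result);
    return result;
  };
}
```

Because Wikifunctions functions are pure, serving a previously computed result is always semantically safe; the only trade-offs are capacity and staleness of the implementation version.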
The memcached key used to look up memoized data will be derived from the function name, the normalized parameters, and a software version identifier. Including the parameters in the key is tricky: inputs can be large, so we can't include them in full, and memcached keys are length-limited. We'll probably have to compute a hash of the normalized inputs.
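One possible shape for the key derivation, assuming the normalized parameters can be serialized to a stable string form (the function and key layout here are assumptions for illustration, not a settled design):

```typescript
import { createHash } from 'crypto';

// Sketch of memoization-key derivation. Hashing the serialized
// arguments keeps the key short and fixed-length even when the
// inputs are large.
function cacheKey(
  functionName: string,
  normalizedArgs: unknown[],
  version: string
): string {
  // Caveat: JSON.stringify is not a true canonical form (object key
  // order is not guaranteed stable across producers); real
  // normalization would need an order-stable serialization.
  const argsHash = createHash('sha256')
    .update(JSON.stringify(normalizedArgs))
    .digest('hex');
  return `wikifunctions:${version}:${functionName}:${argsHash}`;
}
```

Embedding the version identifier means a software upgrade naturally invalidates old entries without an explicit purge.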
Questions / TODOs:
- Where are we going to use memcached?
  - If only from the PHP extension, MediaWiki already provides APIs for memcached.
  - If the Node.js services will also use memcached, we need to check whether a standard client library exists.
- Figure out how to handle caching in dev environments: disabling caching entirely is easiest, but it widens the gap between dev and prod and makes cache-handling code harder to test.
- How can we estimate the capacity requirements?
- Do we want persistence? (If so, memcached might not be the right solution.)
- How will we handle cold starts?
The caching setup will need to be compatible with multi-datacenter deployment, per the Wikimedia services policy. We should look at the design and implementation of MediaWiki's WAN object cache.