The next step for Lift Wing is probably to decide how to cache score results to avoid expensive recomputations. There are a couple of high-level strategies to follow:
- HTTP caching at the CDN edge. We don't currently return any HTTP cache headers in our responses to clients, and the API gateway sets no-cache if nothing is already specified. Having scores cached at the CDN layer would give us basic protection against high traffic spikes, especially if clients request the same scores repeatedly. The downside is that we could offer the caching only to external users (namely the ones using the API gateway), not the internal ones.
- Score cache in Cassandra. We could basically replicate the ORES Redis score cache, but in Cassandra. When a score is requested, we'd query Cassandra to check whether a value was already computed and, if so, return it immediately. Otherwise, we'd compute the result and store it. The upside is that both internal and external clients would benefit from the cache; the downside is that bursts in traffic would still hit our backend services (since the CDN wouldn't protect us).
Both strategies have some challenges to solve:
- How to invalidate the cache?
- How long can a cached value remain in the cache?
- When/if the cache gets full, what is the policy for new data?
- etc.
The CDN option doesn't seem viable for the moment, since we don't expose a complete REST API: most of the parameters (like features) that really make one score different from another are carried in the POST payload, which the Varnish/ATS layer usually doesn't cache (it would be very expensive). For example, let's pick:
curl -s https://inference.svc.eqiad.wmnet:30443/v1/models/cswiki-goodfaith:predict -X POST -d '{"rev_id": 23040023}' -i -H "Host: cswiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org" --http1.1
The cached URL would be /v1/models/cswiki-goodfaith:predict and its value would be the JSON payload of the response. That would clearly be wrong, since the cached content wouldn't vary based on the rev_id: every revision would get the score cached for the first one requested.
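To make the collision concrete, here is a minimal Python sketch. Both key functions are hypothetical illustrations, not how Varnish/ATS actually builds its keys: `url_only_key` approximates a cache keyed on the URL alone, while `payload_aware_key` shows what a key that also depends on the POST body could look like. The second rev_id is made up for the comparison.

```python
import hashlib
import json


def url_only_key(path: str, body: dict) -> str:
    # Approximation of a URL-keyed HTTP cache: the POST body is ignored.
    return path


def payload_aware_key(path: str, body: dict) -> str:
    # Hypothetical alternative: hash a canonical form of the JSON body
    # (sorted keys, no extra whitespace) and append it to the path.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{path}#{digest}"


path = "/v1/models/cswiki-goodfaith:predict"
req_a = {"rev_id": 23040023}
req_b = {"rev_id": 23040024}  # made-up second revision

# Two different revisions share one cache entry when only the URL is used.
assert url_only_key(path, req_a) == url_only_key(path, req_b)

# Folding the payload into the key keeps them apart.
assert payload_aware_key(path, req_a) != payload_aware_key(path, req_b)
```

Hashing the body is exactly the kind of per-request work that makes caching POSTs expensive at the edge, which is why it is shown here only as an illustration of the problem.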
We could think about adding an extra "translation" layer in front of the current one, basically offering a real (cacheable) REST API, but at this stage of Lift Wing it would be a major endeavor (and we have already made the API gateway's Lift Wing API public).
The remaining solution to try could be the Cassandra cache, but we'd need to plan it carefully.
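As a starting point for that planning, the lookup-then-compute flow could be sketched roughly as below. This is only an illustration under several assumptions: `ScoreCache` is an in-memory stand-in for a Cassandra table, `get_score` and `fake_model` are hypothetical names, the key is simplified to (model, rev_id) even though other parameters such as features may also need to be part of it, and a fixed TTL is just one possible answer to the "how long" question (Cassandra can expire rows natively via `USING TTL` on writes, so the manual expiry check here would not be needed there).

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

Key = Tuple[str, int]  # (model name, rev_id): a simplifying assumption


class ScoreCache:
    """In-memory stand-in for a Cassandra-backed score table (hypothetical)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows: Dict[Key, Tuple[float, Any]] = {}

    def get(self, model: str, rev_id: int) -> Optional[Any]:
        row = self._rows.get((model, rev_id))
        if row is None:
            return None
        expires_at, score = row
        if self._clock() >= expires_at:
            # Expired: drop the row and report a miss.
            del self._rows[(model, rev_id)]
            return None
        return score

    def put(self, model: str, rev_id: int, score: Any) -> None:
        self._rows[(model, rev_id)] = (self._clock() + self._ttl, score)


def get_score(cache: ScoreCache, model: str, rev_id: int,
              compute: Callable[[str, int], Any]) -> Any:
    """Return the cached score if present, else compute and store it."""
    cached = cache.get(model, rev_id)
    if cached is not None:
        return cached
    score = compute(model, rev_id)
    cache.put(model, rev_id, score)
    return score


# Usage: the second call is served from the cache, so the model runs once.
calls = []

def fake_model(model: str, rev_id: int) -> dict:
    calls.append(rev_id)
    return {"model": model, "rev_id": rev_id, "prediction": True}

demo_cache = ScoreCache(ttl_seconds=3600)
first = get_score(demo_cache, "cswiki-goodfaith", 23040023, fake_model)
second = get_score(demo_cache, "cswiki-goodfaith", 23040023, fake_model)
assert first == second and len(calls) == 1
```

Note that this sketch does nothing about traffic bursts on cache misses: concurrent requests for the same uncached revision would all reach the model server, which is the downside mentioned above for this option.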
Please add ideas/suggestions/doubts/etc.. :)