
Epic: Implement prototype inference service that uses Cassandra for request caching
Open, Needs Triage, Public

Description

The service most likely to be useful for this is RevertRisk-Multilingual: it is a service we want to continue supporting, it would benefit from faster response times, and its request/response semantics are well suited to caching.

The steps needed to implement this are:

  • Set up user(s) and permissions on Cassandra
  • Add Istio config to make Cassandra reachable from LW
  • Copy the RRML repo to a new RRMLca repo for experimentation
  • Set up the RRMLca LW service config (Deployment, Service, etc.), including the Cassandra config
  • Decide on a schema for the caching table(s)
  • Add code to the RRMLca repo to read from and write to the cache
  • Benchmark the speed difference between cache hits and misses
  • Decide how cache invalidation should be handled and implement it
  • Benchmark again, and also check what a good cache size might be
  • Distill insights into documentation and/or a dummy example service that illustrates how to add caching to a LW service
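The "read from and write to cache" step above can be sketched as a read-through flow. This is an illustrative, self-contained sketch: all names are hypothetical, and an in-memory dict stands in for Cassandra (the real service would issue CQL via cassandra-driver against whatever schema gets decided):

```python
# Read-through caching sketch. An in-memory dict stands in for Cassandra;
# the real service would use cassandra-driver instead.
import time
from typing import Callable, Optional

class InMemoryCache:
    """Stand-in for the Cassandra-backed result cache, with per-row TTLs."""
    def __init__(self):
        # (wiki_db, rev_id, model_version) -> (expires_at, cached result)
        self._rows = {}

    def get(self, key) -> Optional[dict]:
        entry = self._rows.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._rows.pop(key, None)  # expired or absent: a miss
            return None
        return entry[1]

    def put(self, key, result: dict, ttl_s: int = 86400) -> None:
        self._rows[key] = (time.monotonic() + ttl_s, result)

def predict_with_cache(cache, model: Callable, wiki_db: str,
                       rev_id: int, model_version: str) -> dict:
    """Return a cached result if present, else run the model and cache it."""
    key = (wiki_db, rev_id, model_version)
    cached = cache.get(key)
    if cached is not None:
        return {**cached, "cache_hit": True}
    result = model(wiki_db, rev_id)  # the expensive inference call
    cache.put(key, result)           # write-through on a miss
    return {**result, "cache_hit": False}
```

The `cache_hit` flag mirrors the field of the same name in the JSON response shown further down in this task.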

Event Timeline

klausman updated the task description.

Nice progress! I have a couple of questions/points/etc., nothing urgent, just writing them down here so I don't forget them:

  • What is the schema selected for the data stored in Cassandra? We should document it here so people can find it, and probably discuss the replication strategy, etc. (for example, do we want to eventually be able to replicate a write made in eqiad to codfw and vice versa? Cassandra does a lot of things automatically, but they need to be stated).
  • Related to the point above, ideally in the long term all Cassandra data/workloads will be managed by SRE's Data Persistence team. We should keep them in the loop so that, at the end of the experiment, we know whether a handover is possible or whether it will be something to maintain on our side.
  • I see a tick related to the code added to read from/write to the cache; is it already merged?
  • What is the schema selected for the data stored in Cassandra? We should document it here so people can find it, and probably discuss the replication strategy, etc. (for example, do we want to eventually be able to replicate a write made in eqiad to codfw and vice versa? Cassandra does a lot of things automatically, but they need to be stated).

For the POC, the schema is very model-specific (or rather, specific to the JSON the model returns). For the result cache (i.e., what is sent to the client), it has these fields:

 wiki_db | rev_id | model_version | last_used                       | p_false  | p_true   | prediction
---------+--------+---------------+---------------------------------+----------+----------+------------
  frwiki | 153221 |             3 | 2024-02-28 15:26:39.164000+0100 | 0.722097 | 0.277903 |      False
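As a rough CQL sketch of what such a table could look like (keyspace and table names, column types, primary key, and TTL value are all assumptions here, not the decided schema):

```sql
-- Hypothetical DDL for the result cache; partitioning and TTL are guesses.
CREATE TABLE IF NOT EXISTS revertrisk_cache.results (
    wiki_db        text,
    rev_id         bigint,
    model_version  text,
    last_used      timestamp,
    p_false        double,
    p_true         double,
    prediction     boolean,
    PRIMARY KEY ((wiki_db, rev_id), model_version)
) WITH default_time_to_live = 86400;  -- rows expire after 24h by default
```

Clustering on model_version would let a lookup hit the right row even when several model versions have cached results for the same revision.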

which is derived from this JSON:

{
  "model_name": "revertrisk-language-agnostic",
  "model_version": "3",
  "wiki_db": "frwiki",
  "revision_id": 153221,
  "cache_hit": false,
  "output": {
    "prediction": false,
    "probabilities": {
      "true": 0.2779030203819275,
      "false": 0.7220969796180725
    }
  }
}

The last_used field is not used for expiration; that is handled by Cassandra's own expiry of rows/fields (TTLs). The field is there mostly for debugging, but it may also be useful in the future to cap the cache size at X number of rows or similar (if TTL expiry is not fast enough).
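As a sketch, the TTL would be attached at write time rather than derived from last_used (keyspace/table names and the TTL value are hypothetical):

```sql
-- Hypothetical write path: the row expires 24h after insertion,
-- independently of last_used.
INSERT INTO revertrisk_cache.results
    (wiki_db, rev_id, model_version, last_used, p_false, p_true, prediction)
VALUES ('frwiki', 153221, '3', toTimestamp(now()), 0.722097, 0.277903, false)
USING TTL 86400;
```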

If caching for the preprocessing stage is desired, the schema becomes more complex, as the preprocessing step can produce very deep and large objects. I did a bit of exploratory coding there, but soon realized that for some objects the only option is a hex- or base64-encoded Python pickle, which has interesting size implications.
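To make the size implication concrete, here is a small illustration with synthetic stand-in data (real preprocessing output can be much larger and deeper than this):

```python
# Rough illustration of the size cost of storing a base64-encoded pickle.
import base64
import pickle

# Synthetic stand-in for a deep preprocessing result.
features = {
    "revision": {"text_len": 1234, "tags": ["mobile edit"] * 50},
    "parents": [{"rev_id": i, "score": i / 7} for i in range(200)],
}

raw = pickle.dumps(features)
b64 = base64.b64encode(raw)  # what would go into a Cassandra text column
# base64 inflates the payload by ~33% on top of the pickle itself.
print(f"pickle: {len(raw)} bytes, base64: {len(b64)} bytes")
```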

  • Related to the point above, ideally in the long term all Cassandra data/workloads will be managed by SRE's Data Persistence team. We should keep them in the loop so that, at the end of the experiment, we know whether a handover is possible or whether it will be something to maintain on our side.

So far I've been operating on the assumption that we use the dedicated ml-caching hosts for our caches.

  • I see a tick related to the code added to read from/write to the cache; is it already merged?

It's not merged yet, since the production side (Cassandra server schema/user setup/credentials, environment for the isvc) isn't in place. The code so far is here:

https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/995001

It still has some hardcoded elements. I want to update those once some of the prod-side pieces, like credentials, are in place.