
Epic: Implement prototype inference service that uses Cassandra for request caching
Open, Needs Triage, Public

Description

The service most likely to be useful for this is RevertRisk-Multilingual: it is a service we want to continue supporting, it would benefit from faster response times, and its request/response semantics are well suited to caching.

The steps needed to implement this are:

  • Set up user(s) and permissions on Cassandra
  • Add Istio config to make Cassandra reachable from LW
  • Copy the RRML repo to a new RRMLca repo for experimentation
  • Set up the RRMLca LW service config (Deployment, Service, etc.), including the Cassandra config
  • Decide on a schema for the caching table(s)
  • Add code to the RRMLca repo to read from and write to the cache
  • Benchmark the speed difference between cache hits and misses
  • Decide how cache invalidation should be handled and implement it
  • Benchmark again, and also check what a good cache size might be
  • Distill insights into documentation and/or a dummy example service that illustrates how to add caching to a LW service
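The "read from and write to cache" step above can be sketched as a read-through flow. This is an illustrative, self-contained sketch: all names are hypothetical, and an in-memory dict stands in for Cassandra (the real service would issue CQL via cassandra-driver against whatever schema gets decided):

```python
# Read-through caching sketch. An in-memory dict stands in for Cassandra;
# the real service would use cassandra-driver instead.
import time
from typing import Callable, Optional

class InMemoryCache:
    """Stand-in for the Cassandra-backed result cache, with per-row TTLs."""
    def __init__(self):
        # (wiki_db, rev_id, model_version) -> (expires_at, cached result)
        self._rows = {}

    def get(self, key) -> Optional[dict]:
        entry = self._rows.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._rows.pop(key, None)  # expired or absent: a miss
            return None
        return entry[1]

    def put(self, key, result: dict, ttl_s: int = 86400) -> None:
        self._rows[key] = (time.monotonic() + ttl_s, result)

def predict_with_cache(cache, model: Callable, wiki_db: str,
                       rev_id: int, model_version: str) -> dict:
    """Return a cached result if present, else run the model and cache it."""
    key = (wiki_db, rev_id, model_version)
    cached = cache.get(key)
    if cached is not None:
        return {**cached, "cache_hit": True}
    result = model(wiki_db, rev_id)  # the expensive inference call
    cache.put(key, result)           # write-through on a miss
    return {**result, "cache_hit": False}
```

The `cache_hit` flag mirrors the field of the same name in the JSON response shown further down in this task.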

Event Timeline

klausman updated the task description.

Nice progress! I have a couple of questions/points/etc., nothing urgent, just writing them down here so I don't forget them:

  • What is the schema selected for the data stored in Cassandra? We should document it here so people can find it, and probably discuss the replication strategy, etc. (for example, do we want to eventually be able to replicate a write made in eqiad to codfw and vice versa? Cassandra does a lot of things automatically, but they need to be stated).
  • Related to the point above, ideally in the long term all Cassandra data/workloads will be managed by SRE's Data Persistence team. We should keep them in the loop so that, at the end of the experiment, we know whether a handover is possible or whether it will be something to maintain on our side.
  • I see a tick related to the code added to read from/write to the cache; is it already merged?
  • What is the schema selected for the data stored in Cassandra? We should document it here so people can find it, and probably discuss the replication strategy, etc. (for example, do we want to eventually be able to replicate a write made in eqiad to codfw and vice versa? Cassandra does a lot of things automatically, but they need to be stated).

For the POC, the schema is very model-specific (or rather, specific to the JSON the model returns). For the result cache (i.e., what is sent to the client), it has these fields:

 wiki_db | rev_id | model_version | last_used                       | p_false  | p_true   | prediction
---------+--------+---------------+---------------------------------+----------+----------+------------
  frwiki | 153221 |             3 | 2024-02-28 15:26:39.164000+0100 | 0.722097 | 0.277903 |      False
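As a rough CQL sketch of what such a table could look like (keyspace and table names, column types, primary key, and TTL value are all assumptions here, not the decided schema):

```sql
-- Hypothetical DDL for the result cache; partitioning and TTL are guesses.
CREATE TABLE IF NOT EXISTS revertrisk_cache.results (
    wiki_db        text,
    rev_id         bigint,
    model_version  text,
    last_used      timestamp,
    p_false        double,
    p_true         double,
    prediction     boolean,
    PRIMARY KEY ((wiki_db, rev_id), model_version)
) WITH default_time_to_live = 86400;  -- rows expire after 24h by default
```

Clustering on model_version would let a lookup hit the right row even when several model versions have cached results for the same revision.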

which is derived from this JSON:

{
  "model_name": "revertrisk-language-agnostic",
  "model_version": "3",
  "wiki_db": "frwiki",
  "revision_id": 153221,
  "cache_hit": false,
  "output": {
    "prediction": false,
    "probabilities": {
      "true": 0.2779030203819275,
      "false": 0.7220969796180725
    }
  }
}

The last_used field is not used for expiration; that is handled by Cassandra's own expiry of rows/fields (TTLs). The field is there mostly for debugging, but it may also be useful in the future to cap the cache size at X number of rows or similar (if TTL expiry is not fast enough).
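As a sketch, the TTL would be attached at write time rather than derived from last_used (keyspace/table names and the TTL value are hypothetical):

```sql
-- Hypothetical write path: the row expires 24h after insertion,
-- independently of last_used.
INSERT INTO revertrisk_cache.results
    (wiki_db, rev_id, model_version, last_used, p_false, p_true, prediction)
VALUES ('frwiki', 153221, '3', toTimestamp(now()), 0.722097, 0.277903, false)
USING TTL 86400;
```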

If caching for the preprocessing stage is desired, the schema becomes more complex, as the preprocessing step can produce very deep and large objects. I did a bit of exploratory coding there, but soon realized that for some objects the only option is a hex- or base64-encoded Python pickle, which has interesting size implications.
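To make the size implication concrete, here is a small illustration with synthetic stand-in data (real preprocessing output can be much larger and deeper than this):

```python
# Rough illustration of the size cost of storing a base64-encoded pickle.
import base64
import pickle

# Synthetic stand-in for a deep preprocessing result.
features = {
    "revision": {"text_len": 1234, "tags": ["mobile edit"] * 50},
    "parents": [{"rev_id": i, "score": i / 7} for i in range(200)],
}

raw = pickle.dumps(features)
b64 = base64.b64encode(raw)  # what would go into a Cassandra text column
# base64 inflates the payload by ~33% on top of the pickle itself.
print(f"pickle: {len(raw)} bytes, base64: {len(b64)} bytes")
```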

  • Related to the point above, ideally in the long term all Cassandra data/workloads will be managed by SRE's Data Persistence team. We should keep them in the loop so that, at the end of the experiment, we know whether a handover is possible or whether it will be something to maintain on our side.

So far I've been operating on the assumption that we use the dedicated ml-caching hosts for our caches.

  • I see a tick related to the code added to read from/write to the cache; is it already merged?

It's not merged yet, since the production side (Cassandra server schema/user setup/credentials, environment for the isvc) isn't in place. The code so far is here:

https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/995001

It still has some hardcoded elements. I want to update those once some of the prod-side pieces, like credentials, are in place.