The Revise Tone Task Generator is deployed to production.
According to our Istio dashboard, traffic oscillates between 1 and 2 requests per second, with response times of ~180 ms at p50 (median), ~500 ms at p95 and ~1 s at p99. This traffic comes from Changeprop and consists of edit events on the en, pt, fr, ar and test wikis. We don't expect any traffic other than from Changeprop.
We are currently able to sustain this traffic. However, if it were to double or more, we want to explore scaling options that do not require multiple GPUs, in order to save resources:
- Explore the performance of a CPU-only deployment
- Explore running multiple workers in one pod. This would let several workers share a single GPU, but it has not yet been done for any other LiftWing model.
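To put the doubling scenario into numbers, here is a rough back-of-the-envelope sketch based on Little's law (average requests in flight = arrival rate × average latency). It rests on several assumptions that are not part of the measurements above: each worker is assumed to process one request at a time, the p50 latency stands in for the mean latency (the dashboard figures don't give the mean, and because of the long tail the true mean is likely higher), and the 70% utilization ceiling is an arbitrary illustrative value rather than a LiftWing target. The function names are hypothetical.

```python
import math


def mean_in_flight(rate_rps: float, latency_s: float) -> float:
    """Little's law: average number of requests in the system, L = lambda * W."""
    return rate_rps * latency_s


def workers_needed(rate_rps: float, latency_s: float,
                   max_utilization: float = 0.7) -> int:
    """Hypothetical sizing helper.

    Assumes each worker serves exactly one request at a time, so the
    average number of busy workers equals the average number of requests
    in flight. Rounds up so that average utilization per worker stays at
    or below max_utilization (0.7 is an illustrative assumption).
    """
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    busy = mean_in_flight(rate_rps, latency_s)
    return max(1, math.ceil(busy / max_utilization))


if __name__ == "__main__":
    # Current peak (2 rps) vs. doubled traffic (4 rps), using p50 as a
    # proxy for mean latency; p95 gives a more pessimistic estimate.
    for label, latency in (("p50", 0.18), ("p95", 0.5)):
        for rate in (2.0, 4.0):
            print(f"{rate:.0f} rps, {label} latency: "
                  f"{mean_in_flight(rate, latency):.2f} in flight, "
                  f"{workers_needed(rate, latency)} worker(s)")
```

Under these assumptions, 2 rps keeps about 0.36 requests in flight (one worker is enough), while 4 rps keeps about 0.72 in flight, just above the 70% ceiling for a single worker, so the estimate rounds up to two workers. Using p95 instead, 4 rps would call for three workers. The estimate ignores burstiness and queueing effects, so it should be read as a lower bound to check against load tests, not a sizing guarantee.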