
Editing Suggestions - api
Open, Needs Triage, Public

Description

Add editing-suggestions KServe model server

Introduce a lookup-based inference service that serves pre-computed
editing suggestions by wiki_id and page_title, with pipeline, Docker
Compose, and unit test scaffolding.
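
A minimal sketch of the lookup idea, assuming the pre-computed suggestions live in a CSV with columns `wiki_id`, `page_title`, and `suggestion`. The real schema and the actual KServe model class are not shown in this task, so the class name, column names, and sample rows below are hypothetical and only illustrate the key-based lookup.

```python
import csv
import io
from collections import defaultdict


class SuggestionLookup:
    """In-memory index of pre-computed suggestions keyed by (wiki_id, page_title)."""

    def __init__(self, csv_file):
        self._index = defaultdict(list)
        for row in csv.DictReader(csv_file):
            # Column names are assumptions, not the real schema.
            key = (row["wiki_id"], row["page_title"])
            self._index[key].append(row["suggestion"])

    def predict(self, payload):
        """Return the stored suggestions for the requested page, or an empty list."""
        key = (payload["wiki_id"], payload["page_title"])
        return {
            "wiki_id": payload["wiki_id"],
            "page_title": payload["page_title"],
            "suggestions": list(self._index.get(key, [])),
        }


# Hypothetical sample data standing in for the real CSV.
SAMPLE_CSV = """wiki_id,page_title,suggestion
enwiki,Ada_Lovelace,Add a citation to the lead section
enwiki,Ada_Lovelace,Link the first mention of Charles Babbage
trwiki,Ankara,Expand the history section
"""

lookup = SuggestionLookup(io.StringIO(SAMPLE_CSV))
result = lookup.predict({"wiki_id": "enwiki", "page_title": "Ada_Lovelace"})
missing = lookup.predict({"wiki_id": "enwiki", "page_title": "No_Such_Page"})
```

Because nothing is computed at request time, each request costs a single dictionary lookup, and a page with no stored suggestions returns an empty list rather than an error.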

Currently, Editing Suggestions works by fetching suggestions from a model API; for example, addlink uses https://api.wikimedia.org/service/linkrecommendation/apidocs/

Therefore, until the next phases, we want to implement an API that exposes the suggestions, so that the UI can fetch them directly.
This also requires making the API public, so that it is easier to fetch from the UI.
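
As a rough illustration of what a client call could look like, the sketch below builds (without sending) a POST request to the predict path that appears in the load test later in this task. The host is a placeholder, and the JSON body shape with `wiki_id` and `page_title` fields is an assumption based on the description above, not a confirmed request schema.

```python
import json
import urllib.request

# Placeholder host: the public hostname is not stated in this task.
BASE_URL = "https://example.invalid"
# Path as it appears in the load test output in this task.
PREDICT_PATH = "/v1/models/editing-suggestions:predict"


def build_predict_request(wiki_id, page_title, base_url=BASE_URL):
    """Build (but do not send) a POST request for one page's suggestions."""
    # The body field names are assumed, not taken from a documented schema.
    body = json.dumps({"wiki_id": wiki_id, "page_title": page_title}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + PREDICT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request("enwiki", "Ada_Lovelace")
# Sending it would be: urllib.request.urlopen(request, timeout=5)
```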

Event Timeline

Change #1296559 had a related patch set uploaded (by Ozge; author: Ozge):

[integration/config@master] feat: adds editing suggestions

https://gerrit.wikimedia.org/r/1296559

Change #1296559 merged by jenkins-bot:

[integration/config@master] inference-services: Add LLM generated editing suggestions CI/CD pipelines.

https://gerrit.wikimedia.org/r/1296559

Mentioned in SAL (#wikimedia-releng) [2026-06-03T08:26:39Z] <hashar> Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1296559 "inference-services: Add LLM generated editing suggestions CI/CD pipelines." # T427794

Change #1297106 had a related patch set uploaded (by Ozge; author: ozge):

[operations/deployment-charts@master] feat: adds editing suggestions to ml experimental

https://gerrit.wikimedia.org/r/1297106

Change #1297106 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: add editing-suggestions isvc to experimental (ml-staging-codfw)

https://gerrit.wikimedia.org/r/1297106

Deployment fails with the following message:

kubectl describe revision editing-suggestions-predictor-00001

    Last Failure Info:
      Exit Code:  1
      Message:    2026-06-04 10:04:27.797 1 storage.initializer INFO [storage-initializer-entrypoint:<module>():16] Initializing, args: (src_uri, dest_path): [('s3://wmf-ml-models/editing-suggestions/v1/', '/mnt/models')]
2026-06-04 10:04:27.797 1 storage.initializer INFO [kserve_storage.py:download():161] Copying contents of s3://wmf-ml-models/editing-suggestions/v1/ to local
2026-06-04 10:04:27.853 1 storage.initializer INFO [kserve_storage.py:_get_s3_client_kwargs():361] ca bundle file(/etc/ssl/certs/wmf-ca-certificates.crt) exists.
2026-06-04 10:04:28.027 1 storage.initializer ERROR [storage-initializer-entrypoint:<module>():21] Storage initialization failed: Failed to fetch model. No model found in editing-suggestions/v1/.

As I understand it, this is because KServe does not accept the CSV file. I'll try a pkl file, and look for other solutions if that does not work.

Change #1297687 had a related patch set uploaded (by Ozge; author: Ozge):

[machinelearning/liftwing/inference-services@main] Fix editing-suggestions KServe model download

https://gerrit.wikimedia.org/r/1297687

Change #1297693 had a related patch set uploaded (by Ozge; author: Ozge):

[operations/deployment-charts@master] ml-services: editing-suggestions updating storage uri

https://gerrit.wikimedia.org/r/1297693

Change #1297687 abandoned by Ozge:

[machinelearning/liftwing/inference-services@main] Fix editing-suggestions KServe model download

Reason:

it can read csv

https://gerrit.wikimedia.org/r/1297687

Change #1297693 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: editing-suggestions updating storage uri

https://gerrit.wikimedia.org/r/1297693

The staging deployment handled about 250 req/sec with no failures in the 30-second load test below (probably more is possible).
As I understand from the addalink dashboards, we currently get ~7 req/sec from editing suggestions.

(venv) ozge@stat1010:~/repos/wiki/gerrit/inference-services/test/locust$ MODEL=editing_suggestions locust   --host=https://inference-staging.svc.codfw.wmnet:30443   --users 200   --spawn-rate 10   --run-time 30s

[2026-06-04 14:40:56,066] stat1010/INFO/locust.main: Run time limit set to 30 seconds
[2026-06-04 14:40:56,066] stat1010/INFO/locust.main: Starting Locust 2.31.5
[2026-06-04 14:40:56,067] stat1010/INFO/locust.runners: Ramping to 200 users at a rate of 10.00 per second
[2026-06-04 14:41:15,079] stat1010/INFO/locust.runners: All users spawned: {"EditingSuggestions": 200} (200 total users)
[2026-06-04 14:41:25,467] stat1010/INFO/locust.main: --run-time limit reached, shutting down
Load test results are within the threshold
[2026-06-04 14:41:25,607] stat1010/INFO/locust.main: Shutting down (exit code 0)
Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
POST     /v1/models/editing-suggestions:predict                                          7396     0(0.00%) |     40      31     299     34 |  250.72        0.00
--------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
         Aggregated                                                                      7396     0(0.00%) |     40      31     299     34 |  250.72        0.00

Response time percentiles (approximated)
Type     Name                                                                                  50%    66%    75%    80%    90%    95%    98%    99%  99.9% 99.99%   100% # reqs
--------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------
POST     /v1/models/editing-suggestions:predict                                                 34     34     35     35     35     45    220    250    280    300    300   7396
--------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------
         Aggregated                                                                             34     34     35     35     35     45    220    250    280    300    300   7396
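
The headroom implied by these numbers can be worked out directly. The figures below are copied from the Locust summary and from the ~7 req/sec estimate in the comment; the variable names are only for illustration.

```python
# Figures copied from the Locust summary above.
total_requests = 7396
measured_rps = 250.72
failures = 0

# Approximate current production rate quoted in the comment (about 7 req/sec).
current_rps = 7

# Duration implied by the request count and the reported rate.
effective_seconds = total_requests / measured_rps

# How many times the current load the staging test sustained.
headroom = measured_rps / current_rps

print(f"effective duration: {effective_seconds:.1f} s")
print(f"headroom over current load: {headroom:.1f}x")
print(f"failure rate: {failures / total_requests:.2%}")
```

This is a single 30-second run with 200 users against staging, so it gives a lower bound on capacity rather than a measured ceiling. The median stays at 34 ms, while the 98th and 99th percentiles rise to 220 ms and 250 ms.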

Change #1297717 had a related patch set uploaded (by Ozge; author: Ozge):

[operations/deployment-charts@master] ml-services: editing-suggestions eqiad deployment

https://gerrit.wikimedia.org/r/1297717

Change #1297725 had a related patch set uploaded (by Ozge; author: Ozge):

[machinelearning/liftwing/inference-services@main] Adds editing-suggestions locust test

https://gerrit.wikimedia.org/r/1297725

Change #1297717 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: editing-suggestions eqiad deployment

https://gerrit.wikimedia.org/r/1297717

Change #1297748 had a related patch set uploaded (by Ozge; author: Ozge):

[operations/deployment-charts@master] ml-services: makes editing-suggestions eqiad publicly available

https://gerrit.wikimedia.org/r/1297748

Change #1298101 had a related patch set uploaded (by Ozge; author: Ozge):

[operations/deployment-charts@master] ml-services: makes editing-suggestions available in both eqiad and codfw

https://gerrit.wikimedia.org/r/1298101

Change #1298101 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: makes editing-suggestions available in both eqiad and codfw

https://gerrit.wikimedia.org/r/1298101