
Add Multilingual RevertRisk predictions to mediawiki.page_revert_risk_prediction_change
Open, Needs Triage, Public

Description

Slack thread: https://wikimedia.slack.com/archives/C05F8ERE2CV/p1769524602443549

Use Case

Query multilingual revert risk scores for a month of revisions via SQL instead of multiple API calls, needed for analysis in T374698.

Currently, the event.mediawiki_page_revert_risk_prediction_change_v1 table contains only RRLA (revertrisk-language-agnostic) predictions, not revertrisk-multilingual predictions.

Proposed Solution

Produce Multilingual RevertRisk predictions to the mediawiki.page_revert_risk_prediction_change stream, following the RRLA implementation approach. The Multilingual model is slower and more resource-intensive than RRLA, so model optimization may be needed first.

Related Work

Event Timeline

Sounds easy enough (as long as the model can scale for scoring every revision! :) )

Should these be emitted into the same stream, or should we make a new stream for this?

Right now there is one language-agnostic revert risk score event per revision in mediawiki.page_revert_risk_prediction_change. The data model we chose does not allow us to put multiple model predictions in the same event. If we put the multilingual ones in the existing stream, there will be multiple prediction events per revision.

This might be okay, but perhaps it makes more sense to put the multilingual prediction events into their own stream? They really are a different model prediction, yes?

If someone wants to consume both kinds of revert risk predictions, they can still do so from multiple streams at once, and it will be pretty much equivalent to having them in the same stream. They will end up in different Hive tables in the Data Lake though.

Should these be emitted into the same stream, or should we make a new stream for this?

I was thinking to use the same stream. When I proposed the name in T326179#10711809, my idea was to put all the predictions from revert-risk models (rr-language-agnostic, rr-multilingual, rr-wikidata) in one stream.

This might be okay, but perhaps it makes more sense to put the multilingual prediction events into their own stream? They really are a different model prediction, yes?

I'm okay with multiple prediction events per revision in the same stream. But I get your point - having one prediction event per revision is cleaner from a design perspective. From a model perspective, these are different models but they're serving the same purpose (predicting the risk of being reverted).

Does using the same stream have any risks or maintenance issues compared to having their own stream?

Also looking at the query in T405358#11557401, it seems that putting the prediction results in the same table would be more convenient for users to query.

When I proposed the name in T326179#10711809,

Thank you for the historical context and reminder! And also for the future-proofy thinking! <3

Does using the same stream have any risks or maintenance issues compared to having their own stream?

No maintenance issues.

I'm only a little worried about the expectation for users of a change stream like this. This stream is exposed in EventStreams. What will happen if folks start getting multiple events (with different model prediction) for the same revision? I'd expect the latest event in a change(log) to represent the current state of whatever the key is. The (informal) key on this is page_id (and rev_id? maybe.), so I'd expect the latest event to contain the latest (current) revert risk prediction for the page. If there are 2 events for the same page_id (and rev_id?), and I was trying to keep some downstream state up to date based on the events, I would not expect to have to save (and merge) state from multiple events. I would overwrite whatever I have for the key with the latest event's state.
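To make the concern concrete, here is a minimal Python sketch of a consumer that treats the stream as a changelog keyed by (wiki_id, page_id). The event shapes are illustrative only, not the real schema: each incoming event overwrites the stored state for its key, so a second event for the same key from a different model silently clobbers the first.

```python
# Minimal changelog-consumer sketch. Event shapes are illustrative,
# not the actual mediawiki.page_revert_risk_prediction_change schema.
state = {}

def apply_event(event):
    # Changelog semantics: the latest event for a key IS the current state.
    key = (event["wiki_id"], event["page"]["page_id"])
    state[key] = event  # overwrite, no merging

events = [
    {"wiki_id": "enwiki", "page": {"page_id": 123},
     "predicted_classification": {"model_name": "revertrisk-language-agnostic",
                                  "probabilities": {"true": 0.10}}},
    # A second event for the same key, but from a different model...
    {"wiki_id": "enwiki", "page": {"page_id": 123},
     "predicted_classification": {"model_name": "revertrisk-multilingual",
                                  "probabilities": {"true": 0.21}}},
]
for e in events:
    apply_event(e)

# ...clobbers the RRLA prediction: only the multilingual one survives.
print(state[("enwiki", 123)]["predicted_classification"]["model_name"])
# prints: revertrisk-multilingual
```

A consumer that wanted to keep both predictions would instead have to key its state by (wiki_id, page_id, model_name), which is exactly the merge burden a plain changelog reader would not expect.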

Answering @gkyziridis's questions:

… what kind of optimization do you have in mind?

I meant optimizing for latency and throughput, since the model server will need to handle every new edit once we produce rr-multilingual predictions to an event stream. The source is the page_change event stream (every Wikipedia edit triggers a predict request), so the model server needs to be fast enough to keep up with the incoming edit rate.
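As a back-of-envelope check of what "fast enough" means, the sketch below relates per-request latency to the required number of replicas. Both numbers are assumptions for illustration: the latency is in line with the predict_ms timings shown in the local test logs on this task, and the edit rate is a hypothetical placeholder, not a measured production figure.

```python
import math

# Back-of-envelope capacity check. Both numbers are assumptions for
# illustration, not measured production figures.
predict_ms = 400.0    # per-request latency, roughly the local predict_ms in this task
edits_per_sec = 20.0  # hypothetical global edit rate feeding page_change

# With synchronous single-threaded serving, one replica handles:
per_replica_throughput = 1000.0 / predict_ms  # requests/sec (2.5 here)

# Replicas (or concurrent workers) needed to keep up, with no queueing headroom:
replicas_needed = math.ceil(edits_per_sec / per_replica_throughput)
print(replicas_needed)  # prints: 8
```

Halving the latency halves the replica count, which is why model optimization may be needed before wiring the model into the full edit firehose.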

… can we just use the same functionality but for the multilingual one?

Yes, we can reuse the existing RRLA functionality for the multilingual one. Note that we may need to produce predictions to a separate stream instead of mediawiki.page_revert_risk_prediction_change. This is currently being discussed with Andrew above.

https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Streams is a good resource to start, or the previous work on RRLA. This work requires changes in different repos: inference-service, changeprop, and mediawiki-config.

we may need to produce predictions to a separate stream instead of mediawiki.page_revert_risk_prediction_change.

Another option: make a .v2 stream with a new major version 2.0.0 schema that supports multiple model predictions per event, either via an array of them or a map of them. The downside is that evolving the items in the array or map would not be easily supported (it's complicated).
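For comparison, here is a rough sketch of what such a multi-prediction event might look like, using a map keyed by model name. The field names and the 2.0.0 schema path are hypothetical, not a concrete schema proposal.

```python
# Illustrative only: a hypothetical 2.0.0-style event carrying all
# revert-risk model predictions for one revision in a single map.
event_v2 = {
    "$schema": "/mediawiki/page/prediction_classification_change/2.0.0",  # hypothetical
    "wiki_id": "enwiki",
    "page": {"page_id": 123},
    "revision": {"rev_id": 456789},
    "predicted_classifications": {
        "revertrisk-language-agnostic": {
            "model_version": "3",
            "probabilities": {"true": 0.10, "false": 0.90},
        },
        "revertrisk-multilingual": {
            "model_version": "4",
            "probabilities": {"true": 0.21, "false": 0.79},
        },
    },
}

# One event per revision again, at the cost of schema-evolution pain:
# changing the per-model value type later means another major version.
assert len(event_v2["predicted_classifications"]) == 2
```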

Milimetric subscribed.

Andrew - what's the timeline on this for DE? Do we just track it in radar and advise or does it need active development from us?

Hey, I am working on this. I think I have finished the implementation for publishing the predictions as events, and I am now testing it locally.
Based on this: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Streams I think there are these steps:

  1. Implementation on inference-services side (this is what I am testing).
  2. Test it and deploy the new model server versions.
  3. Configure Changeprop.
  4. Configure the new changes in the mediawiki-config repo.

I will update the task after the testing is finished.

Andrew - what's the timeline on this for DE? Do we just track it in radar and advise or does it need active development from us?

@Milimetric, radar is likely fine. ML has all the tools they need to do this. Event Platform folks should advise on data modeling, but I don't think we need to track this in a DE sprint.

@achou Do we know if we have any active consumers of mediawiki.page_revert_risk_prediction_change stream or event.mediawiki_page_revert_risk_prediction_change_v1 table? Is the stream used in recentchanges somehow? Should/could it be?

Whatever y'all decide, some documentation somewhere should explicitly describe what this dataset is meant to represent and how it is expected to be used.

Update

Finished the implementation of the event mechanism in inference-services for the rr-multilingual model.
This is the local testing on my machine:

# Run revertrisk-multilingual model server locally
$ make revertrisk-multilingual
MODEL_PATH=models/revertrisk/multilingual/20230810110019/model.pkl MODEL_NAME=revertrisk-multilingual \
        my_venv/bin/python src/models/revert_risk_model/model_server/model.py

Model Server: RevertRiskMultilingualGPU
INFO:root:Successfully loaded 342 canonical wiki languages.
WARNING:root:CUDA is not available or PyTorch is CPU-only; using CPU instead.
2026-02-11 10:30:07.482 71669 kserve INFO [model_server.py:register_model():402] Registering model: revertrisk-multilingual
2026-02-11 10:30:07.483 71669 kserve INFO [model_server.py:setup_event_loop():282] Setting max asyncio worker threads as 20
2026-02-11 10:30:07.517 71669 kserve INFO [server.py:_register_endpoints():110] OpenAI endpoints not registered
2026-02-11 10:30:07.517 71669 kserve INFO [server.py:start():161] Starting uvicorn with 1 workers
2026-02-11 10:30:07.555 71669 uvicorn.error INFO:     Started server process [71669]
2026-02-11 10:30:07.555 71669 uvicorn.error INFO:     Waiting for application startup.
2026-02-11 10:30:07.557 71669 kserve INFO [server.py:start():70] Starting gRPC server with 4 workers
2026-02-11 10:30:07.557 71669 kserve INFO [server.py:start():71] Starting gRPC server on [::]:8081
2026-02-11 10:30:07.557 71669 uvicorn.error INFO:     Application startup complete.
2026-02-11 10:30:07.557 71669 uvicorn.error INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:root:Received request for revision 456789 (en).
INFO:root:Opening a new Asyncio session for mwapi.
ERROR:root:EVENTGATE_URL or EVENTGATE_STREAM is not configured; skipping event emission for multilingual revert risk.
2026-02-11 10:30:22.831 71669 kserve.trace requestId: N.A., preprocess_ms: 340.643882751, explain_ms: 0, predict_ms: 395.059108734, postprocess_ms: 0.006914139
2026-02-11 10:30:22.831 uvicorn.access INFO:     127.0.0.1:54578 71669 - "POST /v1/models/revertrisk-multilingual%3Apredict HTTP/1.1" 200 OK
2026-02-11 10:30:22.831 71669 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.7373721599578857 ['http_status:200', 'http_method:POST', 'time:wall']
2026-02-11 10:30:22.831 71669 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.7488080000000004 ['http_status:200', 'http_method:POST', 'time:cpu']

Create a file sample_page_change.json:

{
  "event": {
    "$schema": "/mediawiki/page/change/1.0.0",
    "dt": "2025-01-01T00:00:00Z",
    "meta": {
      "domain": "en.wikipedia.org",
      "request_id": "test-request-id",
      "uri": "https://en.wikipedia.org/wiki/Foo"
    },
    "wiki_id": "enwiki",
    "page": {
      "page_id": 123,
      "page_title": "Foo",
      "namespace_id": 0,
      "is_redirect": false
    },
    "revision": {
      "rev_id": 456789,
      "rev_dt": "2025-01-01T00:00:00Z",
      "rev_parent_id": 456788
    },
    "performer": {
      "user_text": "Example",
      "groups": [],
      "is_bot": false
    }
  }
}

In another terminal, send the mock event:

# Hit the localhost API
$ curl -s localhost:8080/v1/models/revertrisk-multilingual:predict \
  -H 'Content-Type: application/json' \
  -d @sample_page_change.json

# Response: 
{"model_name":"revertrisk-multilingual","model_version":"4","wiki_db":"enwiki","revision_id":456789,"output":{"prediction":false,"probabilities":{"true":0.10331014929193036,"false":0.8966898507080696}}}

I will configure changeprop and then deploy it on staging for extra testing.

Change #1238685 had a related patch set uploaded (by Gkyziridis; author: Gkyziridis):

[machinelearning/liftwing/inference-services@main] Revertrisk-multilingual: Add predictions to events stream.

https://gerrit.wikimedia.org/r/1238685

Change #1238692 had a related patch set uploaded (by Gkyziridis; author: Gkyziridis):

[operations/deployment-charts@master] changeprop: Add revertrisk-multilingual model to changeprop staging configuration.

https://gerrit.wikimedia.org/r/1238692

Hi @gkyziridis and @achou!

Before deploying to production, we should resolve the data modeling questions! I don't think the current approach is quite right, but the alternatives are quite annoying. If we go with the current approach, it should be well documented so that users of this data know to expect 2 prediction update events for the same revision, and not to apply them both to the same downstream state store.

These change streams are modeled as changelog events. Each event in the stream represents the updated state for a given key (in this case, the key is (wiki_id, page_id)). The latest event with the same key can be used to overwrite the full state.

See also:
T308017: Design Schema for page state and page state with content (enriched) streams
T310082: [Shared Event Platform] - Research Flink Changelog semantics to inform POC MW schema design

@Ottomata thank you for the comments.
We are not at the point of deploying this to production. I just built it like this in order to understand the flow and test it on staging as well.
Currently many people from our team are absent, so we will make the final decisions when they are back.
For now I have just implemented this and we can test things on staging.
I will experiment with the alternatives as well:

we may need to produce predictions to a separate stream instead of mediawiki.page_revert_risk_prediction_change.

Another option: make a .v2 stream with a new major version 2.0.0 schema that supports multiple model predictions per event, either via an array of them or a map of them. The downside is that evolving the items in the array or map would not be easily supported (it's complicated).

Whatever we decide we will first post it here in the discussion and we will fully document it.
Then we will deploy on prod.

Great stuff, thank you so much!

Update

I was experimenting with the option:

Another option: make a .v2 stream with a new major version 2.0.0 schema that supports multiple model predictions per event, either via an array of them or a map of them. The downside is that evolving the items in the array or map would not be easily supported (it's complicated).

And I found it kinda complicated. I think we can go with the option of creating a separate (dedicated) stream for the rr-multilingual predictions, something like EVENTGATE_STREAM=mediawiki.page_revert_risk_multilingual_prediction_change.v1. This will separate the streams, right?
This way we have two different streams pointing to the same schema, and in the deployment charts we set the corresponding EVENTGATE_STREAM value for each of the rr models. We also set the correct values under changeprop so we maintain two different streams.
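Sketched as data, the idea looks like this. The env var names follow the ones already used in this task, the schema version matches the one seen in the mock EventGate output, and the multilingual stream name is the proposal above, not a deployed configuration.

```python
# Sketch of the per-model settings: same schema, different streams.
# The multilingual stream name is the proposal from this task, not a
# deployed configuration.
PREDICTION_SCHEMA = "/mediawiki/page/prediction_classification_change/1.2.0"

model_streams = {
    "revertrisk-language-agnostic": "mediawiki.page_revert_risk_prediction_change",
    "revertrisk-multilingual": "mediawiki.page_revert_risk_multilingual_prediction_change.v1",
}

def env_for(model_name):
    # Each model server gets its own EVENTGATE_STREAM value in the
    # deployment charts; the schema (and EVENTGATE_URL) stays shared.
    return {"MODEL_NAME": model_name, "EVENTGATE_STREAM": model_streams[model_name]}

print(env_for("revertrisk-multilingual")["EVENTGATE_STREAM"])
# prints: mediawiki.page_revert_risk_multilingual_prediction_change.v1
```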

How does this sound folks?
@Ottomata and @achou

creating a different (dedicated) stream for the rr-multilingual predictions

That is my preferred option! Sounds good to me!

How does this sound folks?

@gkyziridis Sounds good to me! :)

EVENTGATE_STREAM=mediawiki.page_revert_risk_multilingual_prediction_change.v1. This will separate the streams, right?

In addition to this, we'll need to create a mediawiki-config change like https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1133603 (this is for adding the RRLA stream).
And please test the whole workflow in staging changeprop + staging Lift Wing first before moving to production.

Update

I updated and tested the model on Docker using a newer Blubber version, etc.
I am pasting the results from local testing on my machine:

Build Image:

docker build -f .pipeline/revertrisk/multilingual.yaml --target production --platform=linux/amd64 -t multilingual:events .

Running the image:

docker run -p 8080:8080 \
  -v $(pwd)/models/revertrisk/multilingual/20230810110019:/mnt/models \
  -e MODEL_NAME="revertrisk-multilingual" \
  -e MODEL_PATH="/mnt/models/model.pkl" \
  -e ALLOW_REVISION_JSON_INPUT="true" \
  -e FORCE_HTTP="true" \
  -e EVENTGATE_URL="http://host.docker.internal:8192" \
  -e EVENTGATE_STREAM="eqiad.mediawiki.page_revert_risk_prediction_change" \
  --platform linux/amd64 \
  --add-host=host.docker.internal:host-gateway \
  multilingual:events

In a second terminal, create a mock Python HTTP server to mimic EventGate by running the following:

python3 -c "
import http.server
import socketserver

class Handler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        content_length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(content_length)
        print('\n=== EventGate Event Received ===')
        print(body.decode('utf-8'))
        print('================================\n')
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'OK')
    def log_message(self, format, *args):
        pass

socketserver.TCPServer.allow_reuse_address = True
with socketserver.TCPServer(('0.0.0.0', 8192), Handler) as httpd:
    print('Mock EventGate on 0.0.0.0:8192')
    httpd.serve_forever()
"

You can also skip Python and run a netcat mock EventGate (note that nc will print the incoming request but won't return an HTTP response by itself):

nc -l 8192

In a third terminal send a fake event by running:

curl -X POST http://localhost:8080/v1/models/revertrisk-multilingual:predict \
  -H "Content-Type: application/json" \
  -d '{
    "event": {
      "$schema": "/mediawiki/page/change/1",
      "meta": {
        "domain": "en.wikipedia.org",
        "stream": "mediawiki.page_change"
      },
      "wiki_id": "enwiki",
      "page": {
        "page_id": 12345,
        "page_title": "Test Article",
        "page_namespace": 0
      },
      "revision": {
        "rev_id": 123456789
      },
      "user_id": 98765,
      "user_text": "TestUser",
      "timestamp": "2026-02-19T12:00:00Z"
    }
  }'
Results:

Docker Terminal (Server side):

INFO:root:Model Server: RevertRiskMultilingualGPU
INFO:root:Successfully loaded 342 canonical wiki languages.
WARNING:root:CUDA is not available or PyTorch is CPU-only; using CPU instead.
2026-02-24 15:26:07.120 1 kserve INFO [model_server.py:register_model():402] Registering model: revertrisk-multilingual
2026-02-24 15:26:07.121 1 kserve INFO [model_server.py:setup_event_loop():282] Setting max asyncio worker threads as 20
2026-02-24 15:26:07.166 1 kserve INFO [server.py:_register_endpoints():110] OpenAI endpoints not registered
2026-02-24 15:26:07.166 1 kserve INFO [server.py:start():161] Starting uvicorn with 1 workers
2026-02-24 15:26:07.188 1 uvicorn.error INFO:     Started server process [1]
2026-02-24 15:26:07.188 1 uvicorn.error INFO:     Waiting for application startup.
2026-02-24 15:26:07.217 1 kserve INFO [server.py:start():70] Starting gRPC server with 4 workers
2026-02-24 15:26:07.217 1 kserve INFO [server.py:start():71] Starting gRPC server on [::]:8081
2026-02-24 15:26:07.220 1 uvicorn.error INFO:     Application startup complete.
2026-02-24 15:26:07.220 1 uvicorn.error INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:root:Received request for revision 123456789 (en).
INFO:root:Opening a new Asyncio session for mwapi.
INFO:root:Opening a new Asyncio session for eventgate.
2026-02-24 15:26:09.819 1 kserve.trace requestId: N.A., preprocess_ms: 626.885652542, explain_ms: 0, predict_ms: 379.244089127, postprocess_ms: 0.032663345
/opt/lib/venv/lib/python3.11/site-packages/fastapi/routing.py:313: FastAPIDeprecationWarning: ORJSONResponse is deprecated, FastAPI now serializes data directly to JSON bytes via Pydantic when a return type or response model is set, which is faster and doesn't need a custom response class. Read more in the FastAPI docs: https://fastapi.tiangolo.com/advanced/custom-response/#orjson-or-response-model and https://fastapi.tiangolo.com/tutorial/response-model/
  return await dependant.call(**values)
2026-02-24 15:26:09.821 uvicorn.access INFO:     172.17.0.1:58338 1 - "POST /v1/models/revertrisk-multilingual%3Apredict HTTP/1.1" 200 OK
2026-02-24 15:26:09.821 1 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 1.012237548828125 ['http_status:200', 'http_method:POST', 'time:wall']
2026-02-24 15:26:09.821 1 kserve.trace kserve.io.kserve.protocol.rest.v1_endpoints.predict: 0.9870409999999978 ['http_status:200', 'http_method:POST', 'time:cpu']

Mock HTTP Listener Terminal (mock EventGate):

Mock EventGate on 0.0.0.0:8192

=== EventGate Event Received ===
{"$schema": "mediawiki/page/prediction_classification_change/1.2.0", "meta": {"stream": "eqiad.mediawiki.page_revert_risk_prediction_change", "id": "ac3ffbe2-b947-42a7-b24f-667a31375453", "domain": "en.wikipedia.org"}, "wiki_id": "enwiki", "page": {"page_id": 12345, "page_title": "Test Article", "page_namespace": 0}, "revision": {"rev_id": 123456789}, "user_id": 98765, "user_text": "TestUser", "timestamp": "2026-02-19T12:00:00Z", "predicted_classification": {"model_name": "revertrisk-multilingual", "model_version": "4", "predictions": ["false"], "probabilities": {"true": 0.20824493352400839, "false": 0.7917550664759916}}}
================================

Curl Terminal (sending the mock event):

{"model_name":"revertrisk-multilingual","model_version":"4","wiki_db":"enwiki","revision_id":123456789,"output":{"prediction":false,"probabilities":{"true":0.20824493352400839,"false":0.7917550664759916}}}

Change #1238685 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] Revertrisk-multilingual: Add predictions to events stream.

https://gerrit.wikimedia.org/r/1238685

Change #1247025 had a related patch set uploaded (by Gkyziridis; author: Gkyziridis):

[operations/deployment-charts@master] changeprop: Add revertrisk-multilingual model to changeprop staging configuration.

https://gerrit.wikimedia.org/r/1247025

Change #1238692 abandoned by Gkyziridis:

[operations/deployment-charts@master] changeprop: Add revertrisk-multilingual model to changeprop staging configuration.

Reason:

Abandoned because of conflicts. Conflicts resolved in 1247025.

https://gerrit.wikimedia.org/r/1238692

Change #1247025 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop: Add revertrisk-multilingual model to changeprop staging configuration.

https://gerrit.wikimedia.org/r/1247025

Change #1249294 had a related patch set uploaded (by Gkyziridis; author: Gkyziridis):

[machinelearning/liftwing/inference-services@main] rr-multilingual: Fix typo for using gpu.

https://gerrit.wikimedia.org/r/1249294

Change #1249294 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] rr-multilingual: Fix typo for using gpu.

https://gerrit.wikimedia.org/r/1249294

Change #1249330 had a related patch set uploaded (by Gkyziridis; author: Gkyziridis):

[operations/deployment-charts@master] ml-services: Deploy the newest version of rr-multilingual model on staging.

https://gerrit.wikimedia.org/r/1249330

Change #1249330 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: Deploy the newest version of rr-multilingual model on staging.

https://gerrit.wikimedia.org/r/1249330