
Send score to eventgate when requested
Open, Needs Triage · Public

Description

There is a complex workflow that powers the Kafka stream mediawiki.revision-score, used in various parts of our infrastructure:

  • An editor makes a change to a wiki page; MediaWiki serves it and creates a mediawiki.revision-create event.
  • The event is sent to EventGate, which validates it against a schema and saves it to a Kafka topic (called mediawiki.revision-create).
  • ChangeProp is instructed to read every event in the Kafka revision-create topic, calling ORES to get a score (with the /precache URI). The score is then sent to EventGate as a mediawiki.revision-score event.
  • EventGate validates it and inserts it into the corresponding Kafka revision-score topic.

Multiple consumers of the mediawiki.revision-create topic use the data for various purposes:

The main problem with the above architecture is that ChangeProp has a special hack to call ORES and create the revision-score event, whereas it should be ORES itself that sends it (keeping ChangeProp simpler and more agnostic).

Since we are building Lift Wing, we should probably add the config/code needed to send the revision-score event when requested. Some thoughts:

  • The first thing that comes to mind is Knative Eventing, but we are not currently deploying it via helm and our version of Knative is very old (since more recent ones require a higher version of Kubernetes). So I'd avoid using it for the moment.
  • We could add a simple Python function that sends an HTTP POST to EventGate; it should be relatively easy to do (see the sketch after this list).
  • We can implement the above with a flag passed as a parameter, but then we shouldn't expose it via the API Gateway (since we want only some clients to be able to trigger this). ChangeProp will probably call the inference.discovery.wmnet endpoint, so it could be the only one allowed to use the feature flag.
  • To avoid duplicated events, we should probably use (at least for the moment) a new EventGate schema, not mediawiki.revision-score. We can come up with a new name and talk with Data Engineering about it. Then we could experiment with it, maybe instructing ChangeProp to call Lift Wing once the MVP is ready. We'd have a nice test pipeline to see how things go, and where fixes are needed.
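
As a rough illustration of the second bullet, a helper along these lines could do the POST (the endpoint below is the eventgate-main URL used later in this task; the function name and the synchronous requests client are just placeholder choices):

import requests

# Assumed endpoint; the same one used with curl later in this task.
EVENTGATE_URL = "https://eventgate-main.discovery.wmnet:4492/v1/events"


def send_event(event: dict, timeout: float = 5.0) -> None:
    """POST a single event to EventGate and raise if it is rejected."""
    response = requests.post(EVENTGATE_URL, json=event, timeout=timeout)
    response.raise_for_status()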

Details

Project | Branch | Lines +/-
operations/deployment-charts | master | +14 -1
machinelearning/liftwing/inference-services | main | +10 -14
machinelearning/liftwing/inference-services | main | +21 -1
operations/deployment-charts | master | +1 -3
operations/deployment-charts | master | +14 -0
machinelearning/liftwing/inference-services | main | +52 -28
operations/deployment-charts | master | +5 -8
machinelearning/liftwing/inference-services | main | +51 -27
operations/deployment-charts | master | +12 -7
operations/deployment-charts | master | +15 -0
operations/deployment-charts | master | +27 -4
machinelearning/liftwing/inference-services | main | +51 -27
integration/config | master | +6 -0
machinelearning/liftwing/inference-services | main | +40 -29
machinelearning/liftwing/inference-services | main | +129 -94
operations/deployment-charts | master | +1 -1
machinelearning/liftwing/inference-services | main | +1 -0
operations/deployment-charts | master | +1 -1
machinelearning/liftwing/inference-services | main | +2 -2
operations/deployment-charts | master | +12 -5
machinelearning/liftwing/inference-services | main | +114 -9
operations/deployment-charts | master | +42 -0
operations/mediawiki-config | master | +9 -0

Related Objects

Status | Assigned | Task
Open | None |
Open | elukey |

Event Timeline


I know nothing about KServe. Are the 'KServe services' that will respond with predictions Knative services? I'm sure there is some way with Knative Eventing to route events to those services and then produce the results.

Yes correct, KServe is basically a controller that watches for new InferenceService CRD resources, creating Knative services etc. So we don't directly control Knative configs, only KServe ones. The support for eventing is sadly really limited at the moment: we are stuck on Knative 0.18 since newer releases dropped support for k8s 1.16, so until we are able to upgrade we can't really use it (eventing in such old versions is not super stable IIRC, and not very feature-rich).

I discussed the use case with the team, and what we'd like to strive for is a simple API that targets specific model/wiki combinations, without following the road ORES took of scoring the same revision with multiple models by default. That is costly in terms of resources, and we believe that most of the time people are interested in the result of one specific model rather than multiple ones when they check the revision-score event stream. For us it would be very convenient to have something more granular like mediawiki.revision-score.<model-type>, so one stream for each kind of model. This would eventually mean deprecating the revision-score stream, but it would be way more flexible for us in the long run (and probably for people interested in this data).

@Ottomata what do you think?

mediawiki.revision-score.<model-type>

Makes a lot of sense to me!

FYI in T308017: Design Schema for page state and page state with content (enriched) streams and T310082: [Shared Event Platform] - Research Flink Changelog semantics to inform POC MW schema design, we are changing the way we model state change event streams to be entity based. That is, rather than having a stream for each type of state change (create, delete, etc.), an individual state change stream should be able to represent all state changes to the entity. E.g. a 'page-change' stream would be like a combined page-create + page-delete + revision-create, but with a change_kind field set to one of create, delete or update. But page-change on its own clearly won't have all of the possible state for a page. E.g. a page-content-change stream might have a very similar schema, but with extra fields to capture more state info, like the actual content.

If we have the ability to make new streams and schemas for revision scores, we should probably try to use the same model. So perhaps what we want is a mediawiki/page-score-change schema, and multiple mediawiki.page-<model-type>-score-change streams (naming to be bikeshedded :) that all use the mediawiki/page-score-change schema.

In that way, you can track the model score changes to pages by page entity and use that (and other page change streams) to keep some kind of externalized current state store or cache of the current score of all pages.
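
Purely to illustrate the entity-based modelling described above (every field and stream name below is invented for the example, not an actual schema proposal), such an event could look roughly like this:

# Hypothetical event shape, only to illustrate the entity-based model sketched above.
example_page_score_change = {
    "$schema": "/mediawiki/page-score-change/1.0.0",              # invented schema URI
    "meta": {"stream": "mediawiki.page-goodfaith-score-change"},  # invented stream name
    "change_kind": "update",       # create / delete / update, as described above
    "database": "enwiki",
    "page_id": 70553880,
    "rev_id": 1096766853,
    "model_name": "goodfaith",
    "predictions": {"prediction": True, "probability": {"true": 0.97, "false": 0.03}},
}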

Change 808247 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] WIP - add support for revision-score events

https://gerrit.wikimedia.org/r/808247

@Ottomata thanks for all the insights! I really like the proposal for page-<model>-score-change, +1

I chose to add an async kafka client to our code (see code review above, still WIP) to avoid sending events to eventgate and test them directly on Kafka instead (IIRC we discussed this use case too, and IIUC pushing directly to Kafka with the correct schema would also be acceptable). If you are ok with it, I'd proceed in the following way:

  1. Use the current revision-score schema for the first version of the code, pushing to a test topic. This will allow my team to play with the new functionality and see how it goes (maybe even adding a test ChangeProp rule too).
  2. Once the page-change consolidation happens, we could decide the new schema for score-change as well and change the code (I'll make sure that it will be configurable etc..).

Does that make sense? Otherwise we can start working directly on the new schema, but from what I gathered above it may take a while before it is completely done, and I'd like to start testing asap.
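
A rough sketch of the async-producer idea mentioned above (broker, topic name and the choice of aiokafka are all placeholders, not the actual configuration in the WIP patch):

import json

from aiokafka import AIOKafkaProducer  # one possible async Kafka client


async def send_score_event(event: dict, topic: str = "test.mediawiki.revision-score") -> None:
    # Broker and topic are placeholders for whatever the service would be configured with.
    producer = AIOKafkaProducer(
        bootstrap_servers="kafka-main.example.wmnet:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    await producer.start()
    try:
        await producer.send_and_wait(topic, event)
    finally:
        await producer.stop()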

Use the current revision-score schema for the first version of the code, pushing to a test topic

Sounds good!

I chose to add an async kafka client to our code [...] to avoid sending events to eventgate

Also sounds good. To do this right, the producer client should do all of the things EventGate does to ensure that the event is valid and so on. Ideally, we'd have an official Event Platform Python lib for this. Want to make one? :) We have this for Java. Before producing the event, the lib must:

  • Initialization of default values:
      • set meta.dt to the ingestion time (current time is fine)
      • set meta.stream to the proper value (if not set by caller)
      • set $schema to the schema that can validate the generated event (if not set by caller)
      • set meta.id to a uuid if not set
  • Verification of schema vs stream (this requires looking up stream config)
  • Validation of the event against its schema
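
A rough Python sketch of the default-setting part of that list (the schema/stream verification and the schema validation are only indicated with comments here, since they need stream config and schema lookups):

import uuid
from datetime import datetime, timezone


def prepare_event(event: dict, stream: str, schema: str) -> dict:
    """Fill in the default fields described above before producing the event."""
    meta = event.setdefault("meta", {})
    meta.setdefault("dt", datetime.now(timezone.utc).isoformat())  # ingestion time
    meta.setdefault("stream", stream)
    meta.setdefault("id", str(uuid.uuid4()))
    event.setdefault("$schema", schema)
    # TODO: look up stream config to verify that `schema` is allowed in `stream`,
    # then validate the event against the schema (e.g. with the jsonschema library).
    return event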

Hm, I need to write more specific docs about this, as more folks are going to do this soon. Adding a todo here.

Totally makes sense, thanks for the clarifications!

Given the work to do on the Python side (no time for the new lib sorry), I am wondering if we could (for the moment) do the following:

  1. Add a new stream to mediawiki-config called mediawiki.revision-score-editquality.
  2. Push events related to the editquality models from Lift Wing to EventGate. The events would follow the revision-score schema.

In this way Lift Wing would just need to send a POST to EventGate, without extra bits. We'd use the new stream to test the new workflow while waiting for the new events mentioned in T301878#8008932.

Would it be feasible? I had totally missed your earlier comment about a new stream, sorry for the confusion and thanks for the patience :)

Ya that sounds good!

FWIW, if you are just trying to test stuff, you are welcome to produce directly to a test kafka topic. There will be no validation or automated Hive ingestion, but you can at least test the stream content?

Let's bikeshed the final stream/schema names later in CR. :)

Ya that sounds good!

FWIW, if you are just trying to test stuff, you are welcome to produce directly to a test kafka topic. There will be no validation or automated Hive ingestion, but you can at least test the stream content?

Yep yep, but I'd prefer to test the whole pipeline, so the new stream seems better overall! If I can avoid a Kafka dependency + extra code in my pods, that is also good :)

Let's bikeshed the final stream/schema names later in CR. :)

Sure! Going to prep some work and add you later on. Thanks again!

Change 810007 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/mediawiki-config@master] Add a new Eventgate stream for revision-score events

https://gerrit.wikimedia.org/r/810007

Looks good. My only worry is that making these new streams now, and planning to refactor their data model later based on T308017, might confuse folks? How do you expect people to use 'mediawiki.revision-score-editquality' now?

Looks good. My only worry is that making these new streams now, and planning to refactor their data model later based on T308017, might confuse folks? How do you expect people to use 'mediawiki.revision-score-editquality' now?

There is a little risk, yes, but people will likely reach out to ML or DE if they want to use it and we can explain the use case. My idea is to test one new stream end to end (possibly using Data Platform's workflows instead of ChangeProp) to validate Lift Wing and the whole pipeline while we wait for the new refactor. If you want we could rename it mediawiki.revision-score-editquality-test or mediawiki.revision-score-test to add more clarity, what do you think?

Otherwise I can go back to the kafka approach, but it would limit a bit the overall testing.

One thing to note: it is a bit difficult to change the schema a stream uses after the fact. So maybe while doing the test and development, using a 'test' stream name of some kind does make sense.

Ah ok I was assuming that the revision score streams would have changed their name anyway after T308017, but it may not be the case so keeping the names available is good.

I updated the mediawiki config change, thanks!

Ah ok I was assuming that the revision score streams would have changed their name anyway after T308017

Yeah they might, so maybe it doesn't matter! But, probably best to keep them in a 'test' namespace anyway unless you are excited about someone finding them and using them :)

@Ottomata if you have time I'd need some help in deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/810007, I haven't done this in a while :) (otherwise I can add it to the deployment window schedule).

After the mw deployment, do I need to roll restart the eventgate main pods?

Change 810007 merged by jenkins-bot:

[operations/mediawiki-config@master] Add a new Eventgate stream for revision-score events

https://gerrit.wikimedia.org/r/810007

Mentioned in SAL (#wikimedia-operations) [2022-07-06T13:53:21Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:810007|Add a new Eventgate stream for revision-score events (T301878)]] (duration: 03m 46s)

I tried to send an event manually with curl (see below) and I am getting:

context":{"message":"event 50571b69-47d4-4923-8502-8524fe5dfc61 of schema at /mediawiki/revision/score/2.0.0 destined to stream mediawiki.revision-score-test is not allowed in stream; mediawiki.revision-score-test is not configured.
{
   "$schema":"/mediawiki/revision/score/2.0.0",
   "meta":{
      "stream":"mediawiki.revision-score-test"
   },
   "database":"enwiki",
   "page_id":70553880,
   "page_title":"Victoria_Jubilee_Government_High_School",
   "page_namespace":0,
   "page_is_redirect":false,
   "performer":{
      "user_text":"Tmail4770",
      "user_groups":[
         "*",
         "user"
      ],
      "user_is_bot":false,
      "user_id":43710286,
      "user_registration_dt":"2022-04-12T11:55:08Z",
      "user_edit_count":1
   },
   "rev_id":1096766853,
   "rev_parent_id":1096765819,
   "rev_timestamp":"2022-07-06T14:14:40Z",
   "model_name":{
      "model_name":"goodfaith",
      "model_version":"0.5.1",
      "predictions":{
         "prediction":true,
         "probability":{
            "false":0.034048135316042005,
            "true":0.965951864683958
         }
      }
   }
}
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d @test-revision-score.json https://eventgate-main.discovery.wmnet:4492/v1/events

Yup! And just fixed that link, thank you!

Mentioned in SAL (#wikimedia-operations) [2022-07-07T13:19:29Z] <elukey> roll restart eventgate-main pods to add a new stream - T301878

The curl command above works now! I can see the eqiad.mediawiki.revision-score-test topic in Kafka main too.

Change 812010 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: Add knative and egress config for eventgate-main

https://gerrit.wikimedia.org/r/812010

Change 812010 merged by Elukey:

[operations/deployment-charts@master] ml-services: Add knative and egress config for eventgate-main

https://gerrit.wikimedia.org/r/812010

Change 808247 merged by Elukey:

[machinelearning/liftwing/inference-services@main] editquality: add support for revision-score events

https://gerrit.wikimedia.org/r/808247

Change 813633 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: test event production for editquality in ml-staging

https://gerrit.wikimedia.org/r/813633

Change 813633 merged by Elukey:

[operations/deployment-charts@master] ml-services: test event production for editquality in ml-staging

https://gerrit.wikimedia.org/r/813633

Change 813834 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] editquality: use json.dumps instead of urlencode for EventGate

https://gerrit.wikimedia.org/r/813834

Change 813834 merged by Elukey:

[machinelearning/liftwing/inference-services@main] editquality: use json.dumps instead of urlencode for EventGate

https://gerrit.wikimedia.org/r/813834

Change 813866 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update image in ml-staging for enwiki editquality-goodfaith

https://gerrit.wikimedia.org/r/813866

Change 813866 merged by Elukey:

[operations/deployment-charts@master] ml-services: update image in ml-staging for enwiki editquality-goodfaith

https://gerrit.wikimedia.org/r/813866

Change 814098 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] editquality: set Content-type when sending events to EventGate

https://gerrit.wikimedia.org/r/814098

Change 814098 merged by Elukey:

[machinelearning/liftwing/inference-services@main] editquality: set Content-type when sending events to EventGate

https://gerrit.wikimedia.org/r/814098

Change 814718 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update Docker image for editquality goodfaith

https://gerrit.wikimedia.org/r/814718

Change 814718 merged by Elukey:

[operations/deployment-charts@master] ml-services: update Docker image for editquality goodfaith

https://gerrit.wikimedia.org/r/814718

I was able to generate a revision-score-test event from the enwiki editquality goodfaith model in ml-staging (verified that the event landed correctly on kafka).

Next steps: extend the event generation code to all other revscoring-based models. We should probably use this task to figure out if there is a way to factor out some common code in a base/common module, to avoid code repetition (now and in the future).
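
For instance, a small shared helper along these lines (module layout, names and the choice of aiohttp are hypothetical) could be imported by every revscoring model server instead of duplicating the EventGate-posting code:

import json

import aiohttp  # assumed async HTTP client, since the model servers are async


async def send_event(event: dict, eventgate_url: str) -> None:
    """POST a single generated event to EventGate and raise if it is rejected."""
    async with aiohttp.ClientSession() as session:
        async with session.post(
            eventgate_url,
            data=json.dumps(event),
            headers={"Content-Type": "application/json"},
        ) as response:
            response.raise_for_status()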

Change 820134 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] editquality: refactor Blubber config to share code

https://gerrit.wikimedia.org/r/820134

Change 820134 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] editquality: refactor Blubber config to share code

https://gerrit.wikimedia.org/r/820134

Change 820381 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] editquality: move rev-id preprocess functions to a separate module

https://gerrit.wikimedia.org/r/820381

Change 820388 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] articlequality: add code to send events to EventGate

https://gerrit.wikimedia.org/r/820388

Change 820401 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] draftquality: add code to send events to EventGate

https://gerrit.wikimedia.org/r/820401

Change 820418 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] drafttopic: add code to send events to EventGate

https://gerrit.wikimedia.org/r/820418

Change 820456 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] outlink: move Blubber config to the new standard

https://gerrit.wikimedia.org/r/820456

Change 821166 had a related patch set uploaded (by Elukey; author: Elukey):

[integration/config@master] zuul: trigger inference-services Docker image rebuild with more files

https://gerrit.wikimedia.org/r/821166

Change 821167 had a related patch set uploaded (by Elukey; author: Elukey):

[machinelearning/liftwing/inference-services@main] python: Add more info about Docker image rebuild

https://gerrit.wikimedia.org/r/821167

Change 820381 merged by Elukey:

[machinelearning/liftwing/inference-services@main] editquality: move rev-id preprocess functions to a separate module

https://gerrit.wikimedia.org/r/820381

Change 820388 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] articlequality: add code to send events to EventGate

https://gerrit.wikimedia.org/r/820388

Change 821166 merged by jenkins-bot:

[integration/config@master] zuul: trigger inference-services Docker image rebuild with more files

https://gerrit.wikimedia.org/r/821166

Change 821247 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update editquality's Docker image and settings

https://gerrit.wikimedia.org/r/821247

Change 821248 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: test the new Docker image for articlequality

https://gerrit.wikimedia.org/r/821248

Change 821247 merged by Elukey:

[operations/deployment-charts@master] ml-services: update editquality's Docker image and settings

https://gerrit.wikimedia.org/r/821247

Change 821248 merged by Elukey:

[operations/deployment-charts@master] ml-services: test the new Docker image for articlequality

https://gerrit.wikimedia.org/r/821248

Change 821263 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: add environment variables to editquality pods/isvcs

https://gerrit.wikimedia.org/r/821263

Change 821263 merged by Elukey:

[operations/deployment-charts@master] ml-services: add environment variables to editquality pods/isvcs

https://gerrit.wikimedia.org/r/821263

@Ottomata Hi! I am slowly rolling out the code to allow all revscoring-based models to push mediawiki.revision-score events to EventGate main (more precisely, to a test stream). Now I'd like to take the next step, namely having something that:

  • listens to mediawiki.revision-create events
  • based on some simple rules, decides which Lift Wing endpoint/model to call (for example, calling articlequality/editquality/etc. when an enwiki revision is created)

The current workflow, based on ChangeProp, assumes that ORES knows which models to call for any given revision/wiki combination. So ChangeProp simply calls ORES passing a revision-create event, and ORES generates as many scores as it is configured to. Lift Wing is more granular and its models are more independent, so our end goal is to have separate streams for separate models. I thought about Airflow at the beginning, but it would be a "batch" tool running every X minutes, whereas ChangeProp seems more oriented towards streaming. What do you think is best? Any new tools in DE that I should try? :)

Hello! Separate streams for different models seem fine, but perhaps what you want are separate events for each model, not necessarily different streams? I guess it depends on how you expect people to consume these events. If you expect a given consumer to only ever be interested in a single model, then separate streams make sense. However, if you expect a consumer to want some or all model scores, then keeping them in the same stream, or even in the same event, might be better. It makes reasoning about ordering easier. If a page is edited often, you'll want to make it as easy as possible to consume the scores in the order that the edits happen. The more events and streams you have, the more you have to worry about ordering.

Anyway, more directly, I think you want a streaming model, no? We don't use spark streaming much, but here's a python example that pulls together Event Platform stuff like stream config and schemas, to get you a Spark streaming DataFrame: https://gist.github.com/ottomata/26e35c39ff4025d73238fba87c4e8e68

There are also ways to do a similar thing with Flink, which might be nice. Old example here (but don't use this as is, talk to me if you want to use Flink, we have better stuff now).

You could also just consume with a Kafka consumer :)
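
To make the plain Kafka consumer option concrete, it could look roughly like this (broker, topic name, routing table and the Lift Wing URL layout are all assumptions for the sake of the example):

import json

import requests
from kafka import KafkaConsumer  # kafka-python, just one possible client

# Hypothetical routing table: which Lift Wing models to call for each wiki.
ROUTES = {
    "enwiki": ["enwiki-goodfaith", "enwiki-articlequality"],
}

consumer = KafkaConsumer(
    "eqiad.mediawiki.revision-create",                   # assumed topic name
    bootstrap_servers="kafka-main.example.wmnet:9092",   # placeholder broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    for model in ROUTES.get(event.get("database"), []):
        # Assumed Lift Wing URL layout; the revision-create event is sent as the request body.
        url = f"https://inference.discovery.wmnet:30443/v1/models/{model}:predict"
        requests.post(url, json=event, timeout=10)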

Change 821599 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update articlequality's docker image

https://gerrit.wikimedia.org/r/821599

Change 820401 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] draftquality: add code to send events to EventGate

https://gerrit.wikimedia.org/r/820401

Change 821599 merged by Elukey:

[operations/deployment-charts@master] ml-services: update articlequality's docker image

https://gerrit.wikimedia.org/r/821599

Change 821696 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update settings for draftquality to test the new image

https://gerrit.wikimedia.org/r/821696

Change 820418 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] drafttopic: add code to send events to EventGate

https://gerrit.wikimedia.org/r/820418

Change 821696 merged by Elukey:

[operations/deployment-charts@master] ml-services: update settings for draftquality to test the new image

https://gerrit.wikimedia.org/r/821696

Change 821699 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: apply the staging's draftquality image to prod

https://gerrit.wikimedia.org/r/821699

Change 821699 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: apply the staging's draftquality image to prod

https://gerrit.wikimedia.org/r/821699

Change 820456 merged by Elukey:

[machinelearning/liftwing/inference-services@main] outlink: move Blubber config to the new standard

https://gerrit.wikimedia.org/r/820456

Change 821167 merged by Elukey:

[machinelearning/liftwing/inference-services@main] python: Add more info about Docker image rebuild

https://gerrit.wikimedia.org/r/821167

Change 821749 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: update drafttopic docker image and settings

https://gerrit.wikimedia.org/r/821749

Change 821749 merged by Elukey:

[operations/deployment-charts@master] ml-services: update drafttopic docker image and settings

https://gerrit.wikimedia.org/r/821749

All revscoring-based models are now able to accept a revision-create event, generate a revision-score one and send it to EventGate! \o/

Hello! Separate streams for different models seem fine, but perhaps what you want are separate events for each model, not necessarily different streams? I guess it depends on how you expect people to consume these events. If you expect a given consumer to only ever be interested in a single model, then separate streams make sense. However, if you expect a consumer to want some or all model scores, then keeping them in the same stream, or even in the same event, might be better. It makes reasoning about ordering easier. If a page is edited often, you'll want to make it as easy as possible to consume the scores in the order that the edits happen. The more events and streams you have, the more you have to worry about ordering.

Definitely, ordering is a valid point. From my team's perspective, people should (in theory) care about single model predictions, so we are more oriented towards the separate-streams approach. Moreover, the Lift Wing architecture points people to this view of things, since neither the API nor the way models are deployed via KServe offers a way to score a rev-id with multiple models and aggregate the results. We can proceed with this assumption and change course if anybody has doubts/suggestions as we move forward :)

Anyway, more directly, I think you want a streaming model, no? We don't use spark streaming much, but here's a python example that pulls together Event Platform stuff like stream config and schemas, to get you a Spark streaming DataFrame: https://gist.github.com/ottomata/26e35c39ff4025d73238fba87c4e8e68

Very nice thanks a lot! So this spark job would run on the Hadoop cluster as always, configured via puppet etc.. It may be a good alternative to the current ChangeProp ORES config and code!

There are also ways to do a similar thing with Flink, which might be nice. Old example here (but don't use this as is, talk to me if you want to use Flink, we have better stuff now).

You could also just consume with a Kafka consumer :)

Will explore some options and report back, thanks as always!

So this spark job would run on the Hadoop cluster as always, configured via puppet

We don't currently maintain any streaming jobs in Hadoop for production, and I'd be reluctant to do so. However, it's a great way to develop!

You'd probably want to run whatever streaming job in k8s.