We are currently publishing events from Lift Wing to Event Gate:
- All revscoring model servers can read a page-change event, generate a revision-score event and publish it to Event Gate (the stream name varies, usually it is mediawiki.revision_score_$model).
- The outlink model server can read a page-change event, generate a prediction_classification_change event and publish it to Event Gate (the stream name is mediawiki.page_outlink_topic_prediction_change.v1).
The stream names are defined in MediaWiki config. When we test in staging we have the following settings:
- revscoring model servers POST events to Event Gate production, stream mediawiki.revision-score-test.
- the outlink model server POSTs events to Event Gate production as well, but using the "prod" stream (mediawiki.page_outlink_topic_prediction_change.v1).
The latter is not ideal, of course, because we cannot easily test the pipeline without interfering with the prod streams. It is unclear what the future of our streams is (whether Lift Wing will keep emitting events, or whether a stream processor will do it instead), but we should find a strategy for the current settings.
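To make the problem concrete, here is a minimal Python sketch of the staging settings listed above. The `EventTarget` class, the `STAGING_TARGETS` mapping and the "test streams end in -test" convention are assumptions made up for illustration; they are not how the isvc configs are actually structured. Only the two stream names come from this task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventTarget:
    """Hypothetical description of where a model server sends its events."""
    eventgate: str  # which Event Gate instance: "production" or "staging"
    stream: str     # stream name as declared in MediaWiki config


# Current Lift Wing staging settings, as described in this task.
STAGING_TARGETS = {
    "revscoring": EventTarget("production", "mediawiki.revision-score-test"),
    "outlink": EventTarget(
        "production", "mediawiki.page_outlink_topic_prediction_change.v1"
    ),
}


def writes_to_prod_stream(target: EventTarget) -> bool:
    # Assumption: a dedicated test stream is recognizable by a "-test" suffix.
    return target.eventgate == "production" and not target.stream.endswith("-test")


def staging_conflicts(targets: dict[str, EventTarget]) -> list[str]:
    """Return the staging model servers that publish to a prod stream."""
    return sorted(name for name, t in targets.items() if writes_to_prod_stream(t))


print(staging_conflicts(STAGING_TARGETS))
```

With the settings above, only the outlink model server is flagged, which is the case this task is about.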
Multiple possibilities:
- We keep using Event Gate production for Lift Wing staging, but we create a new testing stream for prediction-change events (the schema is and will be shared by multiple model servers). This will allow us to have something like mediawiki.revision-score-test and test current and future models.
- We can use the staging endpoint of Event Gate for Lift Wing staging, which is configured to prefix staging. to all target topics (instead of eqiad|codfw, like the prod ones), so in Kafka the testing data will end up in a different queue, keeping things separated. In this case the stream names in the isvc configs will stay the same; we'll vary only the Event Gate endpoint.
The main drawback of the latter is that there is no discovery endpoint for wikikube staging, but only these endpoints:
staging.svc.{eqiad,codfw}.wmnet are simple CNAMEs to some kubestage worker nodes, and it is not clear which endpoint we should call at any given time (for example, right now eqiad works and codfw hangs, which suggests that the Event Gate staging pods are only in eqiad, but will it be like this in the future?).
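For reference, the separation the second option relies on can be sketched as follows. This only illustrates the prefixing behaviour described above (staging. instead of eqiad|codfw); the `kafka_topic` helper is hypothetical, and the real topic naming is done by the Event Gate configuration, not by Lift Wing.

```python
def kafka_topic(prefix: str, stream: str) -> str:
    """Hypothetical helper: build the Kafka topic name for a stream.

    Assumes the topic is simply "<prefix>.<stream>", where the prefix is
    "eqiad" or "codfw" for Event Gate production and "staging" for the
    Event Gate staging endpoint.
    """
    return f"{prefix}.{stream}"


STREAM = "mediawiki.page_outlink_topic_prediction_change.v1"

# Same stream name in the isvc config, different Event Gate endpoint:
prod_topics = {kafka_topic(dc, STREAM) for dc in ("eqiad", "codfw")}
staging_topic = kafka_topic("staging", STREAM)

# The staging data never lands in one of the production topics.
print(staging_topic, staging_topic in prod_topics)
```

Under that assumption the stream name stays untouched and the test events still end up in a topic of their own.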
Whatever we decide, we also need to update https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Streams
