
Add a new outlink topic stream for EventGate main
Closed, Resolved (Public)

Description

In T315994, we've demonstrated with a test stream that the outlink topic model is able to send predictions to EventGate when requested. To support task T328276, we need to add an official outlink stream for EventGate main.

There are some requirements that need to be fulfilled to add a new stream with Lift Wing, documented in https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Streams

We've done them for the outlink topic model:

  • A model server needs to be deployed to Lift Wing, and it must have passed basic sanity checks from the ML team
  • Decide the source event stream
      • mediawiki.page_change
  • Decide whether you need to filter the traffic in the topic
      • According to the model card (#Users and uses), the model can be used for all projects within Wikipedia (all languages) and for pages in the main namespace (namespace=0).
  • Decide the schema for the created event
      • $ref: /fragment/common/2.0.0#
      • $ref: /fragment/mediawiki/state/change/page/1.0.0
      • predicted_topics

Details

Related Changes in Gerrit:
Repo | Branch | Lines +/-
operations/deployment-charts | master | +1 -1
operations/deployment-charts | master | +1 -1
machinelearning/liftwing/inference-services | main | +7 -3
operations/deployment-charts | master | +1 -1
machinelearning/liftwing/inference-services | main | +2 -0
operations/deployment-charts | master | +2 -2
operations/deployment-charts | master | +5 -1
operations/mediawiki-config | master | +1 -1
operations/deployment-charts | master | +17 -1
operations/deployment-charts | master | +7 -3
operations/deployment-charts | master | +2 -2
machinelearning/liftwing/inference-services | main | +9 -0
operations/deployment-charts | master | +51 -31
operations/deployment-charts | master | +4 -4
operations/deployment-charts | master | +27 -2
operations/mediawiki-config | master | +5 -0

Event Timeline


@Isaac @Ottomata I dug a bit more into the event schemas (https://schema.wikimedia.org/#!/) today and have some thoughts after yesterday's meeting.

Option 1:
input event: mediawiki.page_change
output schema:

  • $ref: /fragment/common/2.0.0#
  • $ref: /fragment/mediawiki/state/change/page/1.0.0
  • predicted_topics

Option 2:
input event: mediawiki.page-links-change
output schema:

  • $ref: /fragment/mediawiki/page/common/1.0.0#
  • predicted_topics

Yesterday we seemed to settle on option 1, but I'm considering option 2 and wondering if it is feasible. Option 2 is more in line with the nature of the outlink topic model (link-based), since a link change is the only type of edit that would actually affect model predictions, and it can capture link changes coming from templates, which page_change does not capture. But if we use page-links-change as the source event, we can't model the output stream based on the entity 'change', because we don't have all the info needed. It can only be based on the old(?) page entity schema that is used by page-links-change.

I think the outlink topic model is quite special because (1) it is link based rather than content based (2) it is for the current revision of the page (param is page_title instead of rev_id). For the revert-risk stream (T326179) in the future, I think it's more suitable that we use a generic score model (e.g. mediawiki.page-score-change) based on entity change schema, because the revert-risk model is (1) content-based, and (2) for revision (param is rev_id)

Option 2 is more in line with the nature of the outlink topic model (link-based), since a link change is the only type of edit that would actually affect model predictions, and it can capture link changes coming from templates, which page_change does not capture.

I think the ideal option we discussed, which is not available to us now, is:

Option 3:
input event: a new mediawiki.page_links_change (new stream + schema model, based on mediawiki/state/change/page)
output schema:

  • $ref: /fragment/common/2.0.0#
  • $ref: /fragment/mediawiki/state/change/page/1.0.0
  • predicted_topics

Since we don't have this new mediawiki.page_links_change as an input, we talked about just using mediawiki.page_change for now, with the plan to switch to a new mediawiki.page_links_change if/when it is created (we should make a task for this).

(1) it is link based rather than content based
if we use page-links-change as the source event, we can't model the output stream based on the entity 'change', because we don't have all the info needed.

A new mediawiki.page_links_change would solve this problem.

(2) it is for the current revision of the page (param is page_title instead of rev_id).

This actually fits very well with the page entity model. The new page state change streams do not include changes to revisions from the past. They only represent changes that affect the 'current' state of the page. E.g. revision-visibility (AKA suppression) is represented in page_change stream only if the current revision is suppressed.

the revert-risk model is (1) content-based, and (2) for revision (param is rev_id)

Interesting! Yeah, we should probably get together and talk about an overarching data model strategy for T326179: Emit revision revert risk scores as a stream and expose in EventStreams API, and for other mediawiki page scoring models in streams. If we need to score past revisions (or anything that represents changes to revisions, not pages), we'd probably use a revision-entity-based model and streams.

Anyway, how do you feel about doing Option 1 now, but with a documented plan/goal to do Option 3? I think having this and other page-related changes use the same entity model will be appealing to the Search Platform team, as they are planning to use page_change and page_content_change in their new Search Update Pipeline. cc @dcausse

Your need for a new page_links_change could expedite the work in Event Platform to get it done. cc @lbowmaker @gmodena

Thank you for sharing your thoughts on this. They all make sense! I agree that doing Option 1 for now and having a documented plan to switch to Option 3 in the future would be a good approach :)

Change 915789 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/mediawiki-config@master] Add mediawiki.page_outlink_topic_prediction_change stream

https://gerrit.wikimedia.org/r/915789

Change 915789 merged by jenkins-bot:

[operations/mediawiki-config@master] Add mediawiki.page_outlink_topic_prediction_change stream

https://gerrit.wikimedia.org/r/915789

Mentioned in SAL (#wikimedia-operations) [2023-05-08T14:08:31Z] <otto@deploy1002> Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - Add mediawiki.page_outlink_topic_prediction_change stream - T328899 (duration: 06m 54s)

Deployed stream config. It should now be possible to produce events to the mediawiki.page_outlink_topic_prediction_change stream via eventgate-main.

@achou, the search team started working on ingesting redirects in their update pipeline. Therefore, we proposed adding redirect information to the page_change stream. As of now, we model the redirect as a separate entity, link_target, a representation of the LinkTarget DTO in MW. As @Ottomata pointed out in "Create new mediawiki.page_links_change stream", the model of that link_target might be reused for page_links_changed. Now the question is: would that transport all the information you need if page_links_changed consisted of a representation of /fragment/mediawiki/state/entity/page/1.1.0 plus a list of outgoing (and incoming?) links, each represented as /fragment/mediawiki/state/entity/link_target/1.0.0 (see the change request mentioned above)?

@pfischer As far as I know, the Outlink topic model does not use redirect information to predict the topic of an article, so it should not be a problem. :)

In the model card, it states:

Don't use this model for: namespaces outside of 0, disambiguations, and redirects. The training data for this model explicitly excludes draft pages, talk pages, disambiguations, redirects, and other non-article pages, as they do not have a training label that could be associated with them.

Change 920282 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] changeprop: add liftwing outlink topic stream

https://gerrit.wikimedia.org/r/920282

Change 920282 merged by Elukey:

[operations/deployment-charts@master] changeprop: add liftwing outlink topic stream

https://gerrit.wikimedia.org/r/920282

Change deployed to Change-Prop staging!

In theory now we should be able to send an event to the new outlink test topic following what's written in https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Streams, and if everything works correctly we'll see another event generated in the target kafka topic.

@achou Lemme know when you want to test it and if you need help!

Change 923292 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update docker images for outlink

https://gerrit.wikimedia.org/r/923292

Change 923292 merged by Elukey:

[operations/deployment-charts@master] ml-services: update docker images for outlink

https://gerrit.wikimedia.org/r/923292

@achou Oh, you know, we should probably version this stream.
https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration

We just adopted this stream naming convention to do this.

I'm going to edit the task description here to add this as a TODO

Change 923571 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/mediawiki-config@master] Declare mediawiki.page_outlink_topic_prediction_change.v1 stream

https://gerrit.wikimedia.org/r/923571

I found two problems while testing the following Change-Prop staging config:

outlink-topic-model:
  concurrency: 2
  match_config_need_quotes: ['page_change_kind', 'wiki_id']
  match_config:
    wiki_id: 'enwiki'
    page_change_kind: '/^(edit|create)$/'
    page:
      is_redirect: false
      namespace_id: 0
  namespace: articletopic-outlink
  kafka_topic: 'liftwing.test-outlink-events'

I used a test event https://phabricator.wikimedia.org/P48594 from eqiad.rc1.mediawiki.page_change.
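
For readers less familiar with Change-Prop, the match_config above can be read as a set of conditions that a source event must satisfy before it is forwarded. The snippet below is only a simplified Python sketch of that reading, not Change-Prop's actual implementation. The `matches` helper and `MATCH_CONFIG` name are hypothetical, and it assumes that strings wrapped in slashes are regular expressions and that nested objects are matched key by key.

```python
import re

# Hypothetical mirror of the staging match_config, for illustration only.
MATCH_CONFIG = {
    "wiki_id": "enwiki",
    "page_change_kind": "/^(edit|create)$/",
    "page": {"is_redirect": False, "namespace_id": 0},
}


def matches(pattern, value):
    """Return True if `value` satisfies `pattern` (simplified semantics)."""
    if isinstance(pattern, dict):
        # Nested objects: every key in the pattern must be present and match.
        return isinstance(value, dict) and all(
            key in value and matches(sub, value[key])
            for key, sub in pattern.items()
        )
    if (
        isinstance(pattern, str)
        and len(pattern) > 1
        and pattern.startswith("/")
        and pattern.endswith("/")
    ):
        # Assumption: slash-wrapped strings are regular expressions.
        return isinstance(value, str) and re.search(pattern[1:-1], value) is not None
    # Compare types as well, because False == 0 in Python.
    return type(pattern) is type(value) and pattern == value
```

Under these assumptions, an enwiki edit to a non-redirect page in namespace 0 matches, while a move or a talk-page edit does not.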

  1. The source event contains a content_slots field in both the revision field and the prior_state.revision field, which caused a validation error:
'.prior_state.revision' should NOT have additional properties, '.revision' should NOT have additional properties

This error occurred when Lift Wing sent the generated page_outlink_topic_prediction_change event to EventGate.

Reason: When we designed the schema for prediction_classification_change, we referred to /fragment/mediawiki/state/change/page/1.0.0, which doesn't have the content_slots field. The content_slots field was later added to mediawiki.page_change (see the schema).

I tested an event without the content_slots field, and the generated event was successfully posted to EventGate without errors.

Possible solution: In Lift Wing when generating the prediction classification event, we remove the content_slots field.

  2. When I checked mediawiki.page_outlink_topic_prediction_change to verify that the generated event was posted correctly, I found many events like the one below:
{"$schema":"/mediawiki/page/prediction_classification_change/1.0.0","changelog_kind":"update","comment":"changed a thing","dt":"2021-01-01T00:00:00.0Z","meta":{"domain":"canary","stream":"mediawiki.page_outlink_topic_prediction_change","id":"080c79b2-e937-44be-84ef-b81cbb0f52d7","dt":"2023-05-30T18:15:34.069Z","request_id":"d5bc6ce7-2a03-4f4c-b6ce-3b3f01a99b7d"},"page":{"is_redirect":false,"namespace_id":1,"page_id":1,"page_title":"example","revision_count":1},"page_change_kind":"edit","performer":{"user_id":123,"user_text":"yoohoo"},"predicted_classification":{"model_name":"example_model","model_version":"1.0.1","predictions":["yes","mostly"],"probabilities":{"hardly":0.01,"mostly":0.9,"yes":0.99}},"revision":{"comment":"changed a thing","editor":{"user_id":123,"user_text":"example"},"is_comment_visible":true,"is_content_visible":true,"is_editor_visible":true,"is_minor_edit":false,"rev_dt":"2021-01-01T00:00:00.0Z","rev_id":2,"rev_parent_id":1,"rev_sha1":"16619839a55cfb5c61bcf520bf9734e0c67f98cc","rev_size":100},"wiki_id":"example"}

I currently have no idea why these events are being produced on the target Kafka topic. I'll investigate further.

@Ottomata @elukey any thoughts on these two issues? I would appreciate any input. :)

Ah, yes, you'll need to filter out canary events. We need better docs on this. I'm working on them now.

Your processor code should discard all events where meta.domain == "canary".
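
As a minimal sketch of that rule in Python (the function names `is_canary` and `drop_canary_events` are made up for illustration; only the `meta.domain == "canary"` condition comes from the comment above):

```python
def is_canary(event: dict) -> bool:
    """Return True for canary events, identified by meta.domain == "canary"."""
    meta = event.get("meta") or {}
    return meta.get("domain") == "canary"


def drop_canary_events(events):
    """Keep only the events that are not canary events."""
    return [event for event in events if not is_canary(event)]
```

A consumer would apply this check before any further processing, so that canary events like the example above never reach downstream logic.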

This comment was removed by pfischer.

Change 925852 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] changeprop: allow match_not in match_config for liftwing

https://gerrit.wikimedia.org/r/925852

Change 925852 merged by Elukey:

[operations/deployment-charts@master] changeprop: allow match_not in match_config for liftwing

https://gerrit.wikimedia.org/r/925852

Change 928583 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] events: remove content_slots field from prediction classification event

https://gerrit.wikimedia.org/r/928583

Change 928583 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] events: remove content_slots field from prediction classification event

https://gerrit.wikimedia.org/r/928583
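
The merged patch itself is not shown in this thread. As a rough sketch of the idea only, and not the actual inference-services code, removing content_slots from both places named in the validation error could look like this (`strip_content_slots` is a hypothetical name):

```python
import copy


def strip_content_slots(event: dict) -> dict:
    """Return a copy of `event` without content_slots in revision
    and prior_state.revision; the input event is left unmodified."""
    cleaned = copy.deepcopy(event)
    prior_state = cleaned.get("prior_state") or {}
    for revision in (cleaned.get("revision"), prior_state.get("revision")):
        if isinstance(revision, dict):
            # pop with a default so events without the field pass through
            revision.pop("content_slots", None)
    return cleaned
```

Working on a deep copy keeps the source page_change event intact in case it is also logged or reused elsewhere.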

Change 929335 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update docker images for outlink

https://gerrit.wikimedia.org/r/929335

Change 929335 merged by Elukey:

[operations/deployment-charts@master] ml-services: update docker images for outlink

https://gerrit.wikimedia.org/r/929335

Change 930000 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update outlink stream name and config autoscaling

https://gerrit.wikimedia.org/r/930000

Change 930002 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] changeprop: add outlink stream to changeprop prod

https://gerrit.wikimedia.org/r/930002

Change 930000 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update outlink stream name and config autoscaling

https://gerrit.wikimedia.org/r/930000

Change 930002 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop: add outlink stream to changeprop prod

https://gerrit.wikimedia.org/r/930002

Change 923571 merged by jenkins-bot:

[operations/mediawiki-config@master] Declare mediawiki.page_outlink_topic_prediction_change.v1 stream

https://gerrit.wikimedia.org/r/923571

Mentioned in SAL (#wikimedia-operations) [2023-06-14T14:22:36Z] <otto@deploy1002> Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Declare mediawiki.page_outlink_topic_prediction_change.v1 stream - T328899 (duration: 10m 25s)

We now can see some traffic hitting the outlink model server on LiftWing!
https://grafana.wikimedia.org/d/zsdYRV7Vk/istio-sidecar?orgId=1&var-cluster=eqiad%20prometheus%2Fk8s-mlserve&var-namespace=articletopic-outlink&var-backend=All&var-response_code=All&var-quantile=0.5&var-quantile=0.95&var-quantile=0.99

and here is the changeprop dashboard:
https://grafana.wikimedia.org/d/000300/change-propagation?orgId=1&refresh=1m&from=now-1h&to=now

The events were correctly produced to the eqiad.mediawiki.page_outlink_topic_prediction_change.v1 topic.

Next step:

  • Remove the match condition on 'enwiki' for the changeprop, so it applies to all wikis

Change 930610 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] changeprop: remove match on specific wiki_id for outlink stream

https://gerrit.wikimedia.org/r/930610

Change 930748 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/deployment-charts@master] ml-services: update outlink replica counts to 3/5

https://gerrit.wikimedia.org/r/930748

Change 930748 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update outlink replica counts

https://gerrit.wikimedia.org/r/930748

Change 930610 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop: set wiki_id match config for outlink stream

https://gerrit.wikimedia.org/r/930610

Change 930610 has been pushed to prod, so now we get the full feed from changeprop.

CPU usage of the outlink pods has increased accordingly, but is still way under the limit we've set.

I could see no errors in the outlink pod logs, and the request rate has increased as expected.

This is VERRRRRY exciting! Thank you all! I took a look at the event table in Hive and did some basic quality checks. Everything looked good, but the checks did raise one issue that I'd missed when helping with the design: when a page is moved, that doesn't trigger the model. If it's just a page name change, it's not a big deal. But when a page is moved from the draft namespace to the main namespace (example), it'd be nice to have a prediction. I don't see this as a major blocker because it's a rarer edge case and we will get a prediction once there's an edit (which is quite common after publishing a draft article). But it's something to consider for other streams where predictions are perhaps more important to have for brand-new articles. And if it's an easy fix, it obviously makes sense to make it here too.

Query:

year = 2023
month = 6
day = 20
hour = 6
wikidb = 'enwiki'

# grab an hour of revisions that'd be expected to be scored
# and make sure they all are by checking to see if each has an associated prediction.
# Note: to avoid hour boundary issues, I check the preceding and following hour too for predictions.

query = f"""
WITH revs AS (
    SELECT
      rev_id AS rev_id_edit
    FROM event.mediawiki_revision_create
    WHERE
      year = {year} and month = {month} and day = {day} and hour = {hour}
      AND `database` = '{wikidb}'
      AND page_namespace = 0 AND NOT page_is_redirect
),
preds AS (
    SELECT
      revision.rev_id AS rev_id_pred
    FROM event.mediawiki_page_outlink_topic_prediction_change_v1
    WHERE
      year = {year} and month = {month} and day = {day} and (hour = {hour-1} OR hour = {hour} OR hour = {hour+1})
      AND wiki_id = '{wikidb}'
)
SELECT
  rev_id_edit
FROM revs r
ANTI JOIN preds p
  ON (r.rev_id_edit = p.rev_id_pred)
LIMIT 50
"""

spark.sql(query).show(50, False)

+-----------+
|rev_id_edit|
+-----------+
|1161036659 |
|1161034421 |
|1161036701 |
|1161035224 |
|1161034828 |
|1161033668 |
|1161036764 |
|1161033259 |
|1161036277 |
|1161034911 |
+-----------+
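
One caveat if the query above is reused for other hours: `hour-1` and `hour+1` work for hour = 6, but they would not wrap correctly at hour 0 or 23, where the neighbouring hour falls on a different day. A possible way to handle that, sketched with the standard library (the helper names are made up for illustration), is to compute the neighbouring partitions with datetime arithmetic and build the partition predicate from them:

```python
from datetime import datetime, timedelta


def adjacent_hour_partitions(year, month, day, hour):
    """Return (year, month, day, hour) for the previous, current and next hour."""
    center = datetime(year, month, day, hour)
    return [
        (t.year, t.month, t.day, t.hour)
        for t in (center + timedelta(hours=delta) for delta in (-1, 0, 1))
    ]


def partition_predicate(partitions):
    """Build a SQL predicate that selects exactly the given hourly partitions."""
    return " OR ".join(
        f"(year = {y} AND month = {m} AND day = {d} AND hour = {h})"
        for y, m, d, h in partitions
    )
```

The resulting string could replace the year/month/day/hour condition in the `preds` subquery, wrapped in parentheses so that it combines correctly with the `wiki_id` filter.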

Change 932201 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] outlink: add logging of input page_change event

https://gerrit.wikimedia.org/r/932201

Change 932201 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] outlink: add logging of input page_change event

https://gerrit.wikimedia.org/r/932201

Change 932261 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update outlink transformer docker image

https://gerrit.wikimedia.org/r/932261

Change 932261 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update outlink transformer docker image

https://gerrit.wikimedia.org/r/932261

Change 932826 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] outlink: add logging of source event for get_outlinks function

https://gerrit.wikimedia.org/r/932826

Change 933060 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] changeprop: update page_change_kind for outlink stream

https://gerrit.wikimedia.org/r/933060

Change 932826 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] outlink: add logging of source event for get_outlinks function

https://gerrit.wikimedia.org/r/932826

Change 933080 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update outlink transformer docker image

https://gerrit.wikimedia.org/r/933080

Change 933080 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update outlink transformer docker image

https://gerrit.wikimedia.org/r/933080

Change 933060 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop: update page_change_kind for outlink stream

https://gerrit.wikimedia.org/r/933060
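
The thread does not show which page_change_kind values this change configured. Purely as an illustration of the draft-to-main move case raised earlier, a predicate covering it might look like the sketch below. It assumes that move events use page_change_kind "move" and expose the previous namespace under prior_state.page.namespace_id; both are assumptions not confirmed in this thread, and `should_score` is a hypothetical name.

```python
def should_score(event: dict) -> bool:
    """Decide whether a page_change event should trigger a prediction."""
    page = event.get("page") or {}
    # Only non-redirect pages currently in the main namespace are in scope.
    if page.get("namespace_id") != 0 or page.get("is_redirect"):
        return False

    kind = event.get("page_change_kind")
    if kind in ("edit", "create"):
        return True
    if kind == "move":
        # Assumption: the prior namespace is available on move events.
        prior_page = (event.get("prior_state") or {}).get("page") or {}
        prior_namespace = prior_page.get("namespace_id")
        # Score only moves into the main namespace, not plain renames.
        return prior_namespace is not None and prior_namespace != 0
    return False
```

With this reading, a rename within namespace 0 is skipped, while a move from another namespace into namespace 0 is scored.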