
[Event Platform] Design and Implement realtime enrichment pipeline for MW page change with content
Closed, Resolved (Public)

Description

User Story
As a platform engineer, I need to design, implement and deploy a streaming job that produces event streams of mediawiki page changes with raw content.
The service will:
  • Call MW API to get the wikitext for the article
  • Format the input stream data and wikitext into the new topic format
  • Output the formatted data to a new Kafka topic
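The three steps above can be sketched roughly as follows. This is a minimal illustration, not the deployed job: the event field names (`domain`, `rev_id`, `content_body`), the User-Agent string, and the `produce` callable standing in for a Kafka producer are all assumptions made for the example. The wikitext lookup uses the MediaWiki Action API (`action=query`, `prop=revisions`).

```python
import json
from typing import Callable, Iterable
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical User-Agent; Wikimedia sites expect clients to identify themselves.
USER_AGENT = "page-content-change-sketch/0.1 (example only)"


def build_content_url(domain: str, rev_id: int) -> str:
    """Build an Action API URL that returns the wikitext of one revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": rev_id,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    return f"https://{domain}/w/api.php?{urlencode(params)}"


def fetch_wikitext(domain: str, rev_id: int) -> str:
    """Step 1: call the MW API and extract the main-slot wikitext."""
    request = Request(build_content_url(domain, rev_id),
                      headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        body = json.load(response)
    return body["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]


def enrich(event: dict, fetch: Callable[[str, int], str] = fetch_wikitext) -> dict:
    """Step 2: copy the input event and attach the content (assumed field names)."""
    enriched = dict(event)
    enriched["content_body"] = fetch(event["domain"], event["rev_id"])
    return enriched


def run(events: Iterable[dict],
        fetch: Callable[[str, int], str],
        produce: Callable[[str], None]) -> None:
    """Step 3: serialize each enriched event and hand it to the output sink."""
    for event in events:
        produce(json.dumps(enrich(event, fetch)))
```

Passing `fetch` and `produce` as parameters keeps the enrichment logic independent of the HTTP client and of whichever Kafka client the real job uses, so it can be exercised with stubs.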
Expected Spikes:
  • Data modeling exercise for new consolidated stream - T308017
Why are we doing this?
  • Simplify event stream consumption. Consumers can listen to a single stream that represents the state of a page rather than a page action (the current design)
  • Add content to the streams so that consumers can use them without having to enrich the events themselves
What is needed for GA internal release
Follow up work that needs to be done

Related Objects

Status     Subtype  Assigned  Task
Resolved            gmodena
Resolved            Ottomata
Resolved            Ottomata
Declined            None
Resolved            Ottomata
Resolved            Ottomata
Resolved            Ottomata
Resolved            dcausse
Resolved            gmodena
Resolved            gmodena
Resolved            gmodena
Resolved            Ottomata
Open                None
Resolved            OwenRB
Resolved            OwenRB
Resolved            Ottomata
Resolved            gmodena
Resolved            Ottomata
Resolved            Ottomata
Resolved            Ottomata
Duplicate           None
Resolved            gmodena
Resolved            Ottomata
Resolved            Ottomata
Resolved            Eevans
Resolved            gmodena
Resolved            Ottomata
Open                None
Declined            None
Resolved            bking
Resolved            bking
Resolved            None
Declined            None
Resolved            gmodena
Resolved            gmodena
Resolved            gmodena
Resolved            gmodena
Resolved            gmodena
Resolved            gmodena
Resolved            gmodena
Resolved            BTullis

Event Timeline


Change 852909 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/mediawiki-config@master] Set eventgate service for rc0.mediawiki.page_content_change stream

https://gerrit.wikimedia.org/r/852909

Change 852909 merged by jenkins-bot:

[operations/mediawiki-config@master] Set eventgate service for rc0.mediawiki.page_content_change stream

https://gerrit.wikimedia.org/r/852909

Ottomata renamed this task from "[Shared Event Platform] Design and Implement POC Flink Service to Combine Existing Streams, Enrich and Output to New Topic" to "[Event Platform] Design and Implement realtime enrichment pipeline for MW page change with content". Jan 23 2023, 6:18 PM
Ottomata updated the task description.

I archived the mediawiki-stream-enrichment repo associated with this task. It contained a Scala PoC that helped inform the current (WIP) Python implementation deployed on DSE.

Ottomata updated the task description.
Ottomata updated the task description.
gmodena updated the task description.

Change 951444 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/mediawiki-config@master] Declare v1 of the page_content_change stream.

https://gerrit.wikimedia.org/r/951444

Change 951446 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/deployment-charts@master] mw-page-content-change-enrich: stream version bump

https://gerrit.wikimedia.org/r/951446

Change 951444 merged by jenkins-bot:

[operations/mediawiki-config@master] Declare v1 of the page_content_change stream.

https://gerrit.wikimedia.org/r/951444

Mentioned in SAL (#wikimedia-operations) [2023-08-22T20:16:03Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:951444|Declare v1 of the page_content_change stream. (T307959)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-22T20:17:34Z] <urbanecm@deploy1002> urbanecm and gmodena: Backport for [[gerrit:951444|Declare v1 of the page_content_change stream. (T307959)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-22T20:27:22Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:951444|Declare v1 of the page_content_change stream. (T307959)]] (duration: 11m 19s)

Change 951446 merged by jenkins-bot:

[operations/deployment-charts@master] mw-page-content-change-enrich: stream version bump

https://gerrit.wikimedia.org/r/951446

Change 951929 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/mediawiki-config@master] Remove rc1.mediawiki.page_content_change stream

https://gerrit.wikimedia.org/r/951929

Change 951959 had a related patch set uploaded (by Gmodena; author: Gmodena):

[operations/alerts@master] data-engineering: flink: alert when TM is missing for 5m.

https://gerrit.wikimedia.org/r/951959

Change 952160 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Increase the kafka-jumbo maximum message size to 10 MB

https://gerrit.wikimedia.org/r/952160

Change 951959 merged by jenkins-bot:

[operations/alerts@master] data-engineering: flink: alert when TM is missing for 5m.

https://gerrit.wikimedia.org/r/951959

Change 951929 merged by jenkins-bot:

[operations/mediawiki-config@master] Remove rc1.mediawiki.page_content_change stream

https://gerrit.wikimedia.org/r/951929

Mentioned in SAL (#wikimedia-operations) [2023-08-31T13:14:28Z] <sgimeno@deploy1002> Started scap: Backport for [[gerrit:951929|Remove rc1.mediawiki.page_content_change stream (T307959)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-31T13:16:03Z] <sgimeno@deploy1002> gmodena and sgimeno: Backport for [[gerrit:951929|Remove rc1.mediawiki.page_content_change stream (T307959)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-31T13:25:01Z] <sgimeno@deploy1002> Finished scap: Backport for [[gerrit:951929|Remove rc1.mediawiki.page_content_change stream (T307959)]] (duration: 10m 33s)

Change 954968 had a related patch set uploaded (by Btullis; author: Btullis):

[analytics/refinery@master] Increase the max kafka message size for gobblin

https://gerrit.wikimedia.org/r/954968

Change 954968 abandoned by Joal:

[analytics/refinery@master] Increase the max kafka message size for gobblin

Reason:

Change not actually needed.

https://gerrit.wikimedia.org/r/954968

Change 952160 merged by Btullis:

[operations/puppet@production] Increase the kafka-jumbo maximum message size to 10 MB

https://gerrit.wikimedia.org/r/952160

> @lbowmaker @gmodena Should we resolve and close this?

Yeah!