Deploy mediawiki-page-content-change-enrichment to wikikube k8s
Closed, Resolved · Public

Description

User Story
As a platform engineer I need to deploy the PyFlink stream enrichment service to wikikube k8s
Why?

So that it can be deployed to production to help us understand how well things scale

Dependencies:
  • Finalizing page content change schema
Expected Sub Tasks:

Related Objects

Status     Subtype  Assigned  Task
Resolved            gmodena
Resolved            Ottomata
Resolved            gmodena
Resolved            Ottomata
Resolved            Ottomata
Resolved            Ottomata
Duplicate           None
Resolved            gmodena
Resolved            Ottomata
Resolved            Ottomata
Resolved            Eevans
Resolved            gmodena
Resolved            Ottomata
Open                None
Declined            None
Resolved            bking
Resolved            bking
Resolved            None
Declined            None
Resolved            gmodena
Resolved            gmodena
Resolved            gmodena

Event Timeline

Restricted Application added a subscriber: Aklapper.

I did some initial prep work to add support for EventRowTypeInfo to eventutilities-python. This allows us to reuse all of the Flink Row SerDe, Sink, and validation logic from the Java eventutilities library. The same is true for the Source side of the pipeline.

Right now there's a PyFlink version of the mediawiki stream enrichment pipeline deployed on YARN that produces page_content_change into Kafka, effectively superseding the Scala one. Deployment is documented at https://www.mediawiki.org/wiki/Platform_Engineering_Team/Event_Platform_Value_Stream/Pyflink_Enrichment_Service_Deployment.

lbowmaker updated the task description. (Show Details)
lbowmaker updated the task description. (Show Details)
Ottomata renamed this task from Productionize PyFlink Enrichment Service to Deploy mediawiki-page-content-change-enrichment to wikikube k8s. Feb 8 2023, 3:59 PM

@JMeybohm @akosiaris, we plan to deploy to wikikube by the end of this quarter (end of March). Are there any blockers in wikikube to creating new namespaces and deploying the flink-kubernetes operator there?

Make sure to file the paperwork via https://phabricator.wikimedia.org/project/profile/1305/ but no, we don't anticipate any blockers for the end of March. We will need an estimate of the resources required.
In mid-March (no definite date yet) we will probably be re-initializing the wikikube eqiad cluster, though. Keep that in mind.

By the way, which eventgate (and thus which kafka cluster) will this produce to?

This doesn't produce to EventGate, it produces to Kafka directly. It will produce to Kafka main clusters.

Thanks. Are the kafka-main clusters ready to receive this? serviceops owns them, but I don't think we can answer that question easily right now.

Good question. I think so, but we should consult.

Perhaps, we should just continue producing to Kafka jumbo-eqiad from wikikube until we feel ready for kafka main. However, this means that the app in wikikube codfw would produce cross DC. I think this will be fine as we are still in 'release candidate' mode...and MirrorMaker already replicates all data in codfw -> eqiad cross DC anyway.

We did some research on this stream's volume as part of T307944: Evaluate Kafka Stretch cluster potential, and if possible, request hardware ASAP.

I'll start with @Milimetric's handy numbers from a comment below. For 2021, the average revision size was 20623 bytes (the average is quite inflated by big outliers). Let's just round this up and say 30000 bytes. In 2021 there were 533764132 revisions, so 533764132*30000/365/24/60/60 ≈ 507767 bytes / second. Compared with what Kafka main is already doing (<5 MB / second), adding a revision text event stream wouldn't be a significant addition of traffic. I suppose if some bot edits e.g. 2MB pages 10 or 100 times a second this could matter though.

Assuming growth, more use cases, [...], let's just round this up to 10MB per second.

Um, not sure why I wrote 10MB / second in that task. I should have written "round this up to 1MB / second", but more realistically it will be around 0.5MB / second.
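For reference, the estimate quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the revision count and the rounded-up 30000-byte size come from the quoted comment, while the constant and function names are made up for illustration.

```python
# Back-of-the-envelope average throughput for a revision-content stream.
# Inputs are the 2021 figures quoted in this task; names are illustrative only.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # non-leap year, as in the quoted formula
REVISIONS_2021 = 533_764_132            # revisions created in 2021
ASSUMED_BYTES_PER_REVISION = 30_000     # 20623-byte average, rounded up


def average_bytes_per_second(revisions_per_year: int, bytes_per_revision: int) -> float:
    """Average sustained byte rate if revisions were spread evenly over the year."""
    return revisions_per_year * bytes_per_revision / SECONDS_PER_YEAR


rate = average_bytes_per_second(REVISIONS_2021, ASSUMED_BYTES_PER_REVISION)
print(f"{rate:,.0f} bytes/s (~{rate / 1e6:.2f} MB/s)")
```

This is an average only; it says nothing about bursts or the large-message outliers discussed below.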

Message throughput will be the same as mediawiki.page_change now, ~30 messages/second.

The main worry is outliers of large messages. Here's some context for the new search pipeline design.
There is a list of max and average revision sizes per year here.

Kafka main is configured to reject messages > 4MB. We will be using (snappy) compression. There may still be cases where we try to produce messages with content larger than 4MB. We can discuss later if we should increase this limit in Kafka, but for now, we should keep it. We may choose to deal with these large messages differently (not producing them, but instead producing with a URI pointer to content).
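To make the "URI pointer" idea concrete, here is a minimal sketch of what such a fallback could look like. Everything in it is an assumption for illustration: the field names `content_body` and `content_uri`, the helper names, and reading "4MB" as 4 MiB are hypothetical and not taken from the actual schema or job. It also checks the uncompressed JSON size, which is conservative given that snappy compression is applied before the message reaches the broker.

```python
import json

# Assumption: "4MB" is read as 4 MiB; the real broker setting may differ.
MAX_MESSAGE_BYTES = 4 * 1024 * 1024


def serialized_size(event: dict) -> int:
    """Size in bytes of the event as UTF-8 encoded JSON (before compression)."""
    return len(json.dumps(event, ensure_ascii=False).encode("utf-8"))


def fit_to_limit(event: dict, content_uri: str, max_bytes: int = MAX_MESSAGE_BYTES) -> dict:
    """Return the event unchanged if it fits; otherwise swap the body for a URI.

    'content_body' and 'content_uri' are hypothetical field names.
    """
    if serialized_size(event) <= max_bytes:
        return event
    slim = dict(event)                  # shallow copy; the caller's event is untouched
    slim.pop("content_body", None)      # drop the oversized payload
    slim["content_uri"] = content_uri   # point consumers at the content instead
    return slim


small = {"page_id": 1, "content_body": "short text"}
large = {"page_id": 2, "content_body": "x" * (5 * 1024 * 1024)}

small_out = fit_to_limit(small, "https://example.org/content/1")
large_out = fit_to_limit(large, "https://example.org/content/2")
print(sorted(small_out), sorted(large_out))
```

Whether to drop such events entirely or emit a pointer like this is the open question in the comment above; the sketch only shows the second option.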

Change 895241 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] New wikikube service: mediawiki-page-content-change-enrichment - staging

https://gerrit.wikimedia.org/r/895241

Change 895241 merged by jenkins-bot:

[operations/deployment-charts@master] New wikikube service: mediawiki-page-content-change-enrichment - staging

https://gerrit.wikimedia.org/r/895241

Change 920377 had a related patch set uploaded (by Ottomata; author: Ottomata):

[mediawiki/extensions/EventBus@master] Change default page_change stream name to use major versioning

https://gerrit.wikimedia.org/r/920377

Ottomata updated the task description. (Show Details)

Change 924956 had a related patch set uploaded (by Ottomata; author: Ottomata):

[wikimedia-event-utilities@master] eventutillities-flink - kafkaSourceBuilder now allows for specific topics

https://gerrit.wikimedia.org/r/924956

Change 924956 merged by Ottomata:

[wikimedia-event-utilities@master] eventutillities-flink - kafkaSourceBuilder now allows for specific topics

https://gerrit.wikimedia.org/r/924956

Change 924979 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] mw-page-content-change-enrich - bump image to 1.12.0

https://gerrit.wikimedia.org/r/924979

Change 924979 merged by Ottomata:

[operations/deployment-charts@master] mw-page-content-change-enrich - bump image to 1.20.0

https://gerrit.wikimedia.org/r/924979

Ottomata updated the task description. (Show Details)

Change 926601 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] mw-page-content-change-enrich - enable upgradeMode: savepoint, and take periodic savepoints.

https://gerrit.wikimedia.org/r/926601

Change 926601 merged by jenkins-bot:

[operations/deployment-charts@master] mw-page-content-change-enrich - enable upgradeMode: savepoint, and take periodic savepoints.

https://gerrit.wikimedia.org/r/926601

Change 927219 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] mw-page-content-change-enrich - use kafka at least once delivery guarantee

https://gerrit.wikimedia.org/r/927219

Change 927224 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] Remove dse mediawiki-page-content-change-enrichment and stream-enrichment-poc ns

https://gerrit.wikimedia.org/r/927224

Change 927219 merged by Ottomata:

[operations/deployment-charts@master] mw-page-content-change-enrich - use kafka at least once delivery guarantee

https://gerrit.wikimedia.org/r/927219

Change 927224 merged by jenkins-bot:

[operations/deployment-charts@master] Remove dse mediawiki-page-content-change-enrichment and stream-enrichment-poc ns

https://gerrit.wikimedia.org/r/927224

Mentioned in SAL (#wikimedia-operations) [2023-06-20T17:44:41Z] <ottomata> remove stream-enrichment-poc namespace and related resources from dse-k8s-eqiad - T325303