
gmodena (GModena (WMF))
User

User Details

User Since
Nov 2 2020, 1:15 PM (135 w, 5 d)
Availability
Available
LDAP User
Gmodena
MediaWiki User
GModena (WMF) [ Global Accounts ]

Recent Activity

Thu, Jun 8

gmodena updated the task description for T309699: [Event Platform] Understand, document, and implement error handling and retry logic when fetching data from the MW api.
Thu, Jun 8, 8:29 AM · Event-Platform Value Stream (Sprint 14 B), Data-Engineering-Planning

Wed, Jun 7

gmodena updated subscribers of T309699: [Event Platform] Understand, document, and implement error handling and retry logic when fetching data from the MW api.

@gmodena what should we do with the page_content_change event when we get badrevids? Right now, we discard it, so e.g. the page create event from page_change won't exist in the page_content_change stream.

I have a feeling we should just produce it as is into page_content_change with no content...or maybe some way of indicating that we could not get the content? Not sure though. cc also @Milimetric

Wed, Jun 7, 7:44 PM · Event-Platform Value Stream (Sprint 14 B), Data-Engineering-Planning
gmodena moved T309699: [Event Platform] Understand, document, and implement error handling and retry logic when fetching data from the MW api from Next Up to In progress on the Event-Platform Value Stream (Sprint 14 B) board.
Wed, Jun 7, 7:35 PM · Event-Platform Value Stream (Sprint 14 B), Data-Engineering-Planning
gmodena moved T336488: eventutilities-python: review and clean up in preparation for a GA release. from In progress to Blocked/Paused on the Event-Platform Value Stream (Sprint 14 B) board.
Wed, Jun 7, 7:35 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena created T338380: eventutilities-python: http event process function should report latency..
Wed, Jun 7, 7:30 PM · Data-Engineering, Event-Platform Value Stream

Tue, Jun 6

gmodena added a comment to T337475: eventutillities-python should publish python doc to doc.wikimedia.org.

Project documentation is available at
https://doc.wikimedia.org/data-engineering/eventutilities-python/eventutilities_python/

Tue, Jun 6, 7:30 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Release-Engineering-Team
gmodena placed T337421: Fix wikimedia-event-utilities Guava dependencies issues up for grabs.
Tue, Jun 6, 6:02 PM · Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Data Pipelines
gmodena moved T337475: eventutillities-python should publish python doc to doc.wikimedia.org from In progress to In Review on the Event-Platform Value Stream (Sprint 14 B) board.
Tue, Jun 6, 6:01 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Release-Engineering-Team
gmodena moved T337475: eventutillities-python should publish python doc to doc.wikimedia.org from In Review to In progress on the Event-Platform Value Stream (Sprint 14 B) board.
Tue, Jun 6, 6:01 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Release-Engineering-Team
gmodena moved T336488: eventutilities-python: review and clean up in preparation for a GA release. from Blocked/Paused to In progress on the Event-Platform Value Stream (Sprint 14 B) board.
Tue, Jun 6, 2:19 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena added a comment to T337400: Get coverage artifacts from Kokkuri.

@tchin an alternative path for coverage reporting could be integrating with https://gitlab.wikimedia.org/repos/releng/docpub/-/blob/main/README.md and linking the published coverage report from GitLab (we'd lose metric reporting in the badge, but so be it).

Tue, Jun 6, 9:07 AM · Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena moved T337421: Fix wikimedia-event-utilities Guava dependencies issues from Next Up to In progress on the Event-Platform Value Stream (Sprint 14 B) board.
Tue, Jun 6, 9:05 AM · Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Data Pipelines
gmodena claimed T337421: Fix wikimedia-event-utilities Guava dependencies issues .
Tue, Jun 6, 9:05 AM · Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Data Pipelines
gmodena moved T337475: eventutillities-python should publish python doc to doc.wikimedia.org from In progress to In Review on the Event-Platform Value Stream (Sprint 14 B) board.
Tue, Jun 6, 9:04 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Release-Engineering-Team
gmodena added a comment to T337475: eventutillities-python should publish python doc to doc.wikimedia.org.

Couple of questions re integrating this workflow in our pipeline. See the attached patch and this pipeline.

  • As a test, I wanted to trigger .docpub:publish-docs (derived) manually from a working (unprotected) branch, but I'm not authorized to do so. Will this job automatically execute on a protected branch?
  • We do releases by manually triggering a publish_gitlab_release on main. If I wanted to automatically trigger .docpub:publish-docs on release, can I just declare a needs: publish_gitlab_release condition?
Tue, Jun 6, 8:43 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Release-Engineering-Team
gmodena claimed T337475: eventutillities-python should publish python doc to doc.wikimedia.org.
Tue, Jun 6, 6:24 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Release-Engineering-Team
gmodena moved T337475: eventutillities-python should publish python doc to doc.wikimedia.org from Next Up to In progress on the Event-Platform Value Stream (Sprint 14 B) board.
Tue, Jun 6, 6:23 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering, Release-Engineering-Team
gmodena moved T336488: eventutilities-python: review and clean up in preparation for a GA release. from In progress to Blocked/Paused on the Event-Platform Value Stream (Sprint 14 B) board.
Tue, Jun 6, 6:23 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering

Mon, Jun 5

gmodena updated the task description for T336488: eventutilities-python: review and clean up in preparation for a GA release..
Mon, Jun 5, 8:53 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena updated the task description for T336488: eventutilities-python: review and clean up in preparation for a GA release..
Mon, Jun 5, 8:15 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering

Wed, May 31

gmodena updated the task description for T336488: eventutilities-python: review and clean up in preparation for a GA release..
Wed, May 31, 8:55 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena added a comment to T336488: eventutilities-python: review and clean up in preparation for a GA release..

Think we can publish pydocs as part of this task!?

Wed, May 31, 12:47 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering

Tue, May 30

gmodena added a comment to T333833: Define Service Level Objective (SLO) for mediawiki-page-content-change-enrichment.

I marked the Google Doc as read-only and moved the draft to https://wikitech.wikimedia.org/wiki/MediaWiki_Event_Enrichment/SLO/Mediawiki_Page_Content_Change_Enrichment.

Tue, May 30, 3:04 PM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning

Thu, May 25

gmodena claimed T336488: eventutilities-python: review and clean up in preparation for a GA release..
Thu, May 25, 9:44 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena moved T336488: eventutilities-python: review and clean up in preparation for a GA release. from Next Up to In progress on the Event-Platform Value Stream (Sprint 14 A) board.
Thu, May 25, 9:29 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering

Wed, May 24

gmodena added a comment to T330693: Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing.

mw_page_content_change_enrich__dse-k8s-eqiad is not a valid S3 bucket name, because the S3 bucket naming rules do not allow _ in names: https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html.
As a test, I replaced _ with - and checkpoints are now stored.
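
The naming issue above can be checked before deployment. The following is a minimal sketch, assuming a simplified reading of the bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit). The helper names is_valid_bucket_name and sanitize_bucket_name are hypothetical and are not part of any WMF tooling.

```python
import re

# Simplified pattern (an assumption, not the full rule set): 3-63 characters,
# lowercase letters, digits, dots and hyphens, with a letter or digit at
# both ends. Underscores are rejected.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the simplified check above."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


def sanitize_bucket_name(name: str) -> str:
    """Hypothetical helper: replace underscores with hyphens."""
    return name.replace("_", "-")


original = "mw_page_content_change_enrich__dse-k8s-eqiad"
fixed = sanitize_bucket_name(original)
print(original, is_valid_bucket_name(original))
print(fixed, is_valid_bucket_name(fixed))
```

The sketch only mirrors the manual fix described in the comment (replacing _ with -); it does not cover every restriction in the linked rules, such as reserved prefixes.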

Wed, May 24, 3:29 PM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning, SRE-swift-storage
gmodena added a comment to T330693: Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing.

This log entry is also relevant to the error above:

com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket is not valid. (Service: Amazon S3; Status Code: 400; Error Code: InvalidBucketName; Request ID: tx07b821592df14024bd0fb-00646dbe61; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null
Wed, May 24, 3:11 PM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning, SRE-swift-storage
gmodena added a comment to T333833: Define Service Level Objective (SLO) for mediawiki-page-content-change-enrichment.

There's a draft at https://docs.google.com/document/d/1U2bYVqmEsn7ryP0dtFUr-S5xPqF9_plLIFdzk883HBc/edit#. After a first round of feedback, I'll move it to wikitech.

Wed, May 24, 10:42 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning
gmodena added a comment to T330693: Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing.

Hopping on this thread to confirm that we are now able to store snapshots / savepoints in swift using the provided containers.

Wed, May 24, 9:58 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning, SRE-swift-storage

Tue, May 23

gmodena added a comment to T336656: mediawiki-page-content-change-enrichment checkpoints should be stored in Swift.

@gmodena I think(?) I've deployed in dse-k8s-eqiad staging. HA has been disabled, but swift checkpointing should be enabled? I'm not entirely sure how to check though.

Tue, May 23, 5:46 PM · Data-Engineering, Event-Platform Value Stream (Sprint 14 A)

Mon, May 22

gmodena added a comment to T329629: Improve Event Platform and MediaWiki Event Enrichment wikitech documentation.

As for other docs to move: probably all links at
https://www.mediawiki.org/wiki/Platform_Engineering_Team/Event_Platform_Value_Stream (except maybe for the RFCs on Use Cases?).

Mon, May 22, 6:38 PM · Event-Platform Value Stream (Sprint 14 B), Data-Engineering-Planning
gmodena added a comment to T329629: Improve Event Platform and MediaWiki Event Enrichment wikitech documentation.

There is some deployment documentation (dse-k8s) at https://www.mediawiki.org/wiki/Platform_Engineering_Team/Event_Platform_Value_Stream/Pyflink_Enrichment_Service_Deployment.

Mon, May 22, 3:05 PM · Event-Platform Value Stream (Sprint 14 B), Data-Engineering-Planning

Fri, May 19

gmodena added a comment to T330507: New Service Request mediawiki-page-content-change-enrichment.

@Ottomata should we adopt this naming convention also for DSE?

Fri, May 19, 11:43 AM · Data-Engineering, Event-Platform Value Stream (Sprint 14 A), serviceops, Service-deployment-requests
gmodena added a comment to T336656: mediawiki-page-content-change-enrichment checkpoints should be stored in Swift.

s3://<namespace>-<k8s-cluster-name-env>/<release>/flink/

Let's go with flink <job_name> instead of <namespace>. In https://phabricator.wikimedia.org/T330507#8860655 I just ran into an issue where the helmfile/namespace name is too long for our WMF prod k8s/helm conventions, and in that case,

Fri, May 19, 11:12 AM · Data-Engineering, Event-Platform Value Stream (Sprint 14 A)
gmodena moved T333833: Define Service Level Objective (SLO) for mediawiki-page-content-change-enrichment from Next Up to In progress on the Event-Platform Value Stream (Sprint 14 A) board.
Fri, May 19, 10:49 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning
gmodena claimed T333833: Define Service Level Objective (SLO) for mediawiki-page-content-change-enrichment.
Fri, May 19, 10:49 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning

Wed, May 17

gmodena created T336901: flink-app: swift bucket and zookeeper paths should be templated..
Wed, May 17, 7:12 PM · Data-Engineering, Event-Platform Value Stream
gmodena moved T331283: Store Flink HA metadata in Zookeeper from In progress to In Review on the Event-Platform Value Stream (Sprint 14 A) board.
Wed, May 17, 6:26 PM · Event-Platform Value Stream (Sprint 14 B), serviceops-radar, Data-Engineering
gmodena added a comment to T331283: Store Flink HA metadata in Zookeeper.

As we have one zookeeper cluster per DC, I think it's not required to include the DC name in the Zookeeper path.

Hm, true, but we probably shouldn't try to un-suffix the k8s cluster names. It is redundant, but it is probably better if we just refer to clusters as their full names.

Wed, May 17, 12:45 PM · Event-Platform Value Stream (Sprint 14 B), serviceops-radar, Data-Engineering
gmodena added a comment to T331283: Store Flink HA metadata in Zookeeper.
Wed, May 17, 10:18 AM · Event-Platform Value Stream (Sprint 14 B), serviceops-radar, Data-Engineering
gmodena closed T322125: [NEEDS GROOMING] Improve reliability of simple stateless services as Resolved.
Wed, May 17, 6:57 AM · Data-Engineering-Planning, Event-Platform Value Stream
gmodena added a comment to T322125: [NEEDS GROOMING] Improve reliability of simple stateless services.

@gmodena can we close this task?

Wed, May 17, 6:57 AM · Data-Engineering-Planning, Event-Platform Value Stream

Tue, May 16

gmodena updated subscribers of T336656: mediawiki-page-content-change-enrichment checkpoints should be stored in Swift.

Checkpointing to Swift (S3 protocol) has been enabled.

Tue, May 16, 6:46 PM · Data-Engineering, Event-Platform Value Stream (Sprint 14 A)
gmodena added a comment to T331283: Store Flink HA metadata in Zookeeper.

Flink docs recommend setting high-availability.zookeeper.path.root: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#high-availability-zookeeper-path-root. I need some help to brainstorm / bikeshed a naming convention.

Tue, May 16, 3:00 PM · Event-Platform Value Stream (Sprint 14 B), serviceops-radar, Data-Engineering
gmodena moved T331283: Store Flink HA metadata in Zookeeper from Next Up to In progress on the Event-Platform Value Stream (Sprint 14 A) board.
Tue, May 16, 2:10 PM · Event-Platform Value Stream (Sprint 14 B), serviceops-radar, Data-Engineering
gmodena moved T336656: mediawiki-page-content-change-enrichment checkpoints should be stored in Swift from In progress to In Review on the Event-Platform Value Stream (Sprint 14 A) board.
Tue, May 16, 2:10 PM · Data-Engineering, Event-Platform Value Stream (Sprint 14 A)
gmodena claimed T331283: Store Flink HA metadata in Zookeeper.
Tue, May 16, 2:10 PM · Event-Platform Value Stream (Sprint 14 B), serviceops-radar, Data-Engineering
gmodena added a comment to T331283: Store Flink HA metadata in Zookeeper.

@elukey @JMeybohm since it seems we reached consensus, I'd like to enable Zookeeper HA in our app. Who's currently responsible for Zookeeper? Is there a request process I should follow, or can I just go ahead and configure the application helmfile?

Tue, May 16, 1:05 PM · Event-Platform Value Stream (Sprint 14 B), serviceops-radar, Data-Engineering

Mon, May 15

gmodena moved T336656: mediawiki-page-content-change-enrichment checkpoints should be stored in Swift from Next Up to In progress on the Event-Platform Value Stream (Sprint 14 A) board.
Mon, May 15, 8:14 PM · Data-Engineering, Event-Platform Value Stream (Sprint 14 A)
gmodena created T336656: mediawiki-page-content-change-enrichment checkpoints should be stored in Swift.
Mon, May 15, 9:07 AM · Data-Engineering, Event-Platform Value Stream (Sprint 14 A)
gmodena placed T336488: eventutilities-python: review and clean up in preparation for a GA release. up for grabs.
Mon, May 15, 8:20 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena moved T336488: eventutilities-python: review and clean up in preparation for a GA release. from In progress to Next Up on the Event-Platform Value Stream (Sprint 14 A) board.
Mon, May 15, 8:20 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena claimed T336488: eventutilities-python: review and clean up in preparation for a GA release..
Mon, May 15, 8:18 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena moved T336488: eventutilities-python: review and clean up in preparation for a GA release. from Next Up to In progress on the Event-Platform Value Stream (Sprint 14 A) board.
Mon, May 15, 8:18 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena awarded T330693: Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing a Love token.
Mon, May 15, 7:31 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning, SRE-swift-storage

Fri, May 12

gmodena added a comment to T330693: Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing.

Ok, this is set up and has been tested. I created the two containers discussed as well (mediawiki-page-content-change-enrichment-{eqiad,codfw}).

Fri, May 12, 6:55 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning, SRE-swift-storage

May 11 2023

gmodena updated the task description for T331526: eventutilities-python should support using Kafka TLS ports.
May 11 2023, 6:46 PM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering
gmodena moved T331526: eventutilities-python should support using Kafka TLS ports from In progress to In Review on the Event-Platform Value Stream (Sprint 12) board.
May 11 2023, 3:20 PM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering
gmodena renamed T336488: eventutilities-python: review and clean up in preparation for a GA release. from [NEEDS GROOMING] eventutilities-python: review and clean up to eventutilities-python: review and clean up in preparation for a GA release..
May 11 2023, 12:06 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena moved T336488: eventutilities-python: review and clean up in preparation for a GA release. from Backlog to Sprint 14 A on the Event-Platform Value Stream board.
May 11 2023, 12:05 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
gmodena created T336488: eventutilities-python: review and clean up in preparation for a GA release..
May 11 2023, 12:05 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 14 B), Data-Engineering
elukey awarded T331526: eventutilities-python should support using Kafka TLS ports a Love token.
May 11 2023, 8:34 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering
gmodena claimed T331526: eventutilities-python should support using Kafka TLS ports.
May 11 2023, 8:22 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering
gmodena moved T331526: eventutilities-python should support using Kafka TLS ports from Next Up to In progress on the Event-Platform Value Stream (Sprint 12) board.
May 11 2023, 8:22 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering

May 9 2023

gmodena moved T335706: eventutilities-python EventProcessFunction throws NPE if user func returns None from In progress to Done on the Event-Platform Value Stream (Sprint 12) board.
May 9 2023, 2:50 PM · Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena updated the task description for T335706: eventutilities-python EventProcessFunction throws NPE if user func returns None.
May 9 2023, 12:25 PM · Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena moved T335706: eventutilities-python EventProcessFunction throws NPE if user func returns None from Next Up to In progress on the Event-Platform Value Stream (Sprint 12) board.
May 9 2023, 10:38 AM · Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena moved T335706: eventutilities-python EventProcessFunction throws NPE if user func returns None from Backlog to Sprint 12 on the Event-Platform Value Stream board.
May 9 2023, 10:38 AM · Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena added a comment to T335706: eventutilities-python EventProcessFunction throws NPE if user func returns None.

Discussed with @Ottomata, and we opted for option 2: We log and raise a more functional error message.
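
A minimal sketch of the "log and raise" option, assuming the wrapper simply checks the return value of the user function. The name call_user_func and the choice of ValueError are hypothetical; the actual eventutilities-python implementation may differ.

```python
import logging

logger = logging.getLogger(__name__)


def call_user_func(func, event):
    """Call a user-supplied function and fail loudly if it returns None.

    Hypothetical wrapper: raising a descriptive error here avoids a less
    informative failure further downstream.
    """
    result = func(event)
    if result is None:
        name = getattr(func, "__name__", repr(func))
        message = (
            f"User function {name} returned None; "
            "it must return an event (for example a dict)."
        )
        logger.error(message)
        raise ValueError(message)
    return result
```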

May 9 2023, 10:36 AM · Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena claimed T335706: eventutilities-python EventProcessFunction throws NPE if user func returns None.
May 9 2023, 5:21 AM · Event-Platform Value Stream (Sprint 12), Data-Engineering

Apr 19 2023

gmodena added a comment to T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction.

We have a seemingly stable job running on YARN. Using an ad hoc process function that combines count and time triggers on a keyed data stream seems to have reduced memory pressure on the beam workers.
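
The idea of combining count and time triggers can be illustrated outside of Flink. This is a plain-Python sketch, not the PyFlink process function used in the job: the class CountOrTimeBuffer and its parameters are hypothetical, and in a real keyed stream the buffer would live in per-key state with the time trigger driven by timers.

```python
import time
from typing import Any, Callable, List, Optional


class CountOrTimeBuffer:
    """Buffer items and release a batch on a count or an age threshold."""

    def __init__(
        self,
        max_count: int,
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_count = max_count
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._items: List[Any] = []
        self._first_seen: Optional[float] = None

    def add(self, item: Any) -> Optional[List[Any]]:
        """Count trigger: return a batch once max_count items are buffered."""
        if not self._items:
            self._first_seen = self._clock()
        self._items.append(item)
        if len(self._items) >= self.max_count:
            return self.flush()
        return None

    def poll(self) -> Optional[List[Any]]:
        """Time trigger: return a batch if the oldest item is too old."""
        if self._items and self._clock() - self._first_seen >= self.max_age_seconds:
            return self.flush()
        return None

    def flush(self) -> List[Any]:
        """Return the buffered items and reset the buffer."""
        batch, self._items = self._items, []
        self._first_seen = None
        return batch
```

Bounding batches by both size and age keeps the number of in-flight elements limited while still flushing slow keys in a timely way.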

Apr 19 2023, 12:26 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering

Apr 4 2023

gmodena added a comment to T330693: Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing.

[...]

@MatthewVernon brought up persistent volume claim. I have no experience with it, but it sounds like a good fit; I'd like to explore that a bit.

I will investigate.

Apr 4 2023, 11:46 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning, SRE-swift-storage

Apr 3 2023

gmodena renamed T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction from mediawiki-event-enrichment issue async requests from MapFunction context to mediawiki-event-enrichment: issue async requests from MapFunction context.
Apr 3 2023, 6:31 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena added a comment to T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction.

Next step will be measuring latency/throughput on YARN and possibly tune settings (batch size, thread pool size). If this works, we can look at integrating it into eventutilities-python.

Apr 3 2023, 6:30 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena renamed T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction from eventutilities-python: issue async requests from MapFunction context to mediawiki-event-enrichment issue async requests from MapFunction context.
Apr 3 2023, 6:19 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena updated the task description for T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction.
Apr 3 2023, 6:16 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena added a comment to T330507: New Service Request mediawiki-page-content-change-enrichment.

Caveat: on DSE we run the application with the same memory settings, and noticed a slightly higher memory footprint (in the order of 20%). Might be due to different Java versions (java8 on YARN, java11 on k8s). We have tasks for load testing and tuning planned for upcoming sprints.

Apr 3 2023, 11:35 AM · Data-Engineering, Event-Platform Value Stream (Sprint 14 A), serviceops, Service-deployment-requests

Mar 27 2023

gmodena updated the task description for T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction.
Mar 27 2023, 6:38 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena updated the task description for T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction.
Mar 27 2023, 6:38 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena added a comment to T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction.

I have been able to validate that the idea can work. I was able to init a thread pool local to an operator and execute parallel requests on a batch of elements (PoC implementation here). No pickling issues, because what is put on the wire is the function output (instead of the closure). I think we could fall back to using a ProcessWindowFunction for compute + managing side output, without too much impact on our API. I have a rough implementation outside of eventutilities-python that I got to work locally (docker/minikube). Next step will be measuring latency/throughput on YARN and possibly tuning settings (batch size, thread pool size). If this works, we can look at integrating it into eventutilities-python.
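
The approach described above (a thread pool owned by the operator, running requests for a batch in parallel) can be sketched with the standard library. This is not the linked PoC: the class BatchEnricher, its open/close lifecycle methods and the fetch callable are hypothetical stand-ins for an operator and an HTTP call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


class BatchEnricher:
    """Run a blocking fetch function in parallel over a batch of events."""

    def __init__(self, fetch: Callable[[Event], Event], max_workers: int = 4) -> None:
        self._fetch = fetch
        self._max_workers = max_workers
        self._pool: Optional[ThreadPoolExecutor] = None

    def open(self) -> None:
        # The pool is created here rather than in __init__, so the object
        # can be serialized before the pool (which is not picklable) exists.
        self._pool = ThreadPoolExecutor(max_workers=self._max_workers)

    def process_batch(self, events: List[Event]) -> List[Event]:
        if self._pool is None:
            raise RuntimeError("open() must be called before process_batch()")
        # Executor.map returns results in input order.
        return list(self._pool.map(self._fetch, events))

    def close(self) -> None:
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None
```

Only the outputs of process_batch would be emitted downstream, which matches the observation above that the function output, not the closure, is what goes on the wire.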

Mar 27 2023, 6:35 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering

Mar 24 2023

gmodena moved T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction from Next Up to In Progress on the Event-Platform Value Stream (Sprint 10) board.
Mar 24 2023, 7:45 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena renamed T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction from [NEEDS GROOMING] eventutilities-python: issue async requests from MapFunction context to eventutilities-python: issue async requests from MapFunction context.
Mar 24 2023, 8:39 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena claimed T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction.
Mar 24 2023, 8:34 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena moved T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction from Investigate to Sprint 10 on the Event-Platform Value Stream board.
Mar 24 2023, 8:34 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena moved T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction from Backlog to Investigate on the Event-Platform Value Stream board.
Mar 24 2023, 8:33 AM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena added a comment to T332166: [SPIKE] tune memory and latency of mediawiki-event-enrichment on k8s.

After a few long-lasting runs on YARN and k8s, all I can say is that I see correlation (not necessarily causation!) between OOMs and:

  1. calls to dict_to_row conversion
  2. python fn-bundle size (beam micro-batching)
  3. messages throughput and kafka topic lag

To some degree I think that blocking I/O is contributing to memory pressure (saying this because of the effect of tuning microbatching via fn-bundle size)

Mar 24 2023, 8:33 AM · Data-Engineering, Event-Platform Value Stream (Sprint 10)
gmodena moved T332166: [SPIKE] tune memory and latency of mediawiki-event-enrichment on k8s from In Progress to In Review on the Event-Platform Value Stream (Sprint 10) board.
Mar 24 2023, 8:30 AM · Data-Engineering, Event-Platform Value Stream (Sprint 10)

Mar 23 2023

gmodena renamed T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction from [NEEDS GROOMING] eventutilities-python: issue async requests from FlatMap context to [NEEDS GROOMING] eventutilities-python: issue async requests from MapFunction context.
Mar 23 2023, 9:24 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena added a comment to T332166: [SPIKE] tune memory and latency of mediawiki-event-enrichment on k8s.

I created https://phabricator.wikimedia.org/T332948 as follow up work for this spike.

Mar 23 2023, 9:07 PM · Data-Engineering, Event-Platform Value Stream (Sprint 10)
gmodena renamed T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction from [NEEDS GROOMING] eventutilities-python: issue async request from FlatMap context to [NEEDS GROOMING] eventutilities-python: issue async requests from FlatMap context.
Mar 23 2023, 9:02 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering
gmodena created T332948: mediawiki-event-enrichment: issue async requests from ProcessFunction.
Mar 23 2023, 9:02 PM · Patch-For-Review, Event-Platform Value Stream (Sprint 12), Data-Engineering

Mar 22 2023

gmodena added a comment to T332166: [SPIKE] tune memory and latency of mediawiki-event-enrichment on k8s.

I experimented with tunables recommended on Flink's Slack, and was able to run the application for over 6 hours. Memory kept growing till container OOM.
Some additional info that I was able to observe:

  1. Memory consumption grows over time, but the number of objects tracked by Python's GC seems to stabilise after a few minutes of runtime and does not grow linearly with memory.
  2. How quickly memory grows seems to be proportional to python.fn-execution.bundle.size

I enabled the python profiler (python.profile.enabled)
and was able to identify some hot paths that do correlate to memory
allocation (SerDes and dict <-> Row conversion).

Mar 22 2023, 1:58 PM · Data-Engineering, Event-Platform Value Stream (Sprint 10)

Mar 16 2023

gmodena added a comment to T330693: Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing.

Hi Eric,

Mar 16 2023, 10:24 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning, SRE-swift-storage
gmodena added a comment to T330693: Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing.

This is a k8s application running on the WMF OpenStack, yes?

Might it be appropriate to use a persistent volume claim, backed by the OpenStack storage itself for this? At my last job that's the sort of solution we'd have looked at for this kind of workflow. That would give you small amounts of fast storage local to your compute environment.

Mar 16 2023, 10:15 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning, SRE-swift-storage

Mar 15 2023

gmodena claimed T332166: [SPIKE] tune memory and latency of mediawiki-event-enrichment on k8s.
Mar 15 2023, 12:47 PM · Data-Engineering, Event-Platform Value Stream (Sprint 10)
gmodena created T332166: [SPIKE] tune memory and latency of mediawiki-event-enrichment on k8s.
Mar 15 2023, 12:46 PM · Data-Engineering, Event-Platform Value Stream (Sprint 10)
gmodena added a comment to T330693: Storage request: swift s3 bucket for mediawiki-page-content-change-enrichment checkpointing.
Mar 15 2023, 11:15 AM · Event-Platform Value Stream (Sprint 14 A), Data-Engineering-Planning, SRE-swift-storage

Mar 9 2023

gmodena added a comment to T326536: Streaming services errors should be routed to an error event topic..

So: since we decided to use plain datastream.map when error_destination = False (instead of our datastream.process with an error handler), the job was dying when it encountered these errors. This makes sense I think. If stream manager error handling is disabled, then it is up to the user to catch exceptions in their own map function, right?
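
One way a user could catch exceptions in their own map function is a small wrapper. This is a sketch under assumptions: the name catch_errors and the ("ok" / "error") tagging are hypothetical and not part of eventutilities-python.

```python
from typing import Any, Callable, Dict, Tuple

Event = Dict[str, Any]


def catch_errors(func: Callable[[Event], Event]) -> Callable[[Event], Tuple[str, Any]]:
    """Wrap a map function so that exceptions become tagged error records."""

    def wrapper(event: Event) -> Tuple[str, Any]:
        try:
            return ("ok", func(event))
        except Exception as exc:
            # Keep the original event and a short description of the failure
            # instead of letting the exception stop the job.
            return ("error", {"event": event, "error": f"{type(exc).__name__}: {exc}"})

    return wrapper


def parse_revision(event: Event) -> Event:
    """Example user function: fails on non-numeric revision ids."""
    return {"rev_id": int(event["rev_id"])}


safe_parse = catch_errors(parse_revision)
```

Downstream code can then route records by their tag, for example sending "error" records to a separate destination.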

Mar 9 2023, 11:14 AM · Event-Platform Value Stream (Sprint 11), Patch-For-Review, Data-Engineering-Planning

Mar 6 2023

gmodena updated the task description for T330994: mediawiki-event-enrichment should support the latest eventutilities-python changes.
Mar 6 2023, 12:57 PM · Event-Platform Value Stream (Sprint 10), Patch-For-Review, Data-Engineering-Planning
gmodena moved T330994: mediawiki-event-enrichment should support the latest eventutilities-python changes from In Progress to In Review on the Event-Platform Value Stream (Sprint 09) board.
Mar 6 2023, 12:57 PM · Event-Platform Value Stream (Sprint 10), Patch-For-Review, Data-Engineering-Planning