While we ought to consider an upgrade for all 4 clusters, from what I understand, Jumbo can be upgraded independently. Are there any concerns with that approach?
What are the upgrade considerations for Kafka clients?
Specifically, are there clients that publish to Kafka Jumbo directly, or are all of its topics mirrored from main (and possibly logging)?

Dec 2 2023, 12:59 AM · Data-Platform-SRE, Event-Platform, Epic, Data-Engineering, SRE, observability, serviceops

Dec 1 2023

odimitrijevic added a comment to T351387: Requesting access to wmf and analytics-privatedata-users for EHughes (superset access with no server access).

Approved

Dec 1 2023, 6:17 PM · SRE, SRE-Access-Requests

odimitrijevic reassigned T351387: Requesting access to wmf and analytics-privatedata-users for EHughes (superset access with no server access) from odimitrijevic to eoghan.

Dec 1 2023, 6:17 PM · SRE, SRE-Access-Requests

odimitrijevic reassigned T350918: Requesting access to WMF LDAP group and deployment and analytics-privatedata-users shell access group for Grace (ecarg) from odimitrijevic to eoghan.

Dec 1 2023, 6:17 PM · SRE, SRE-Access-Requests

odimitrijevic added a comment to T350918: Requesting access to WMF LDAP group and deployment and analytics-privatedata-users shell access group for Grace (ecarg).

Approved

Dec 1 2023, 6:16 PM · SRE, SRE-Access-Requests

odimitrijevic added a comment to T352098: Requesting access to "researchers" and "analytics-privatedata-users" for Xiao Xiao.

Approved

Dec 1 2023, 6:08 AM · Patch-For-Review, SRE, SRE-Access-Requests

Oct 31 2023

odimitrijevic closed T310229: Data Catalog Documentation Style Guide as Resolved.

This was delivered as part of the "documentathon": https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/DataHub/Data_Catalog_Documentation_Guide

Oct 31 2023, 4:34 PM · Documentation, Data-Engineering, Data-Catalog

odimitrijevic closed T310229: Data Catalog Documentation Style Guide, a subtask of T349103: Define dataset documentation strategy, as Resolved.

Oct 31 2023, 4:34 PM · Goal, Tech-Docs-Team, Data-Engineering

Oct 25 2023

odimitrijevic added a comment to T344199: Grant Access to analytics-privatedata-users for ATsay-WMF.

Approved!

Oct 25 2023, 8:00 PM · SRE-Access-Requests, SRE, LDAP-Access-Requests

Oct 4 2023

odimitrijevic created T348204: Indicate cluster location metadata for Druid datasets.

Oct 4 2023, 10:45 PM · Data-Engineering

Sep 22 2023

odimitrijevic added a comment to T345726: Requesting Creation of a new POSIX group and system user for the Analytics WMDE team.

approved

Sep 22 2023, 9:07 PM · Data-Platform-SRE, SRE, SRE-Access-Requests

Sep 19 2023

odimitrijevic added a comment to T346694: Requesting access to analytics and search resources for dr0ptp4kt.

Approved

Sep 19 2023, 4:50 PM · SRE, SRE-Access-Requests

Sep 8 2023

odimitrijevic added a comment to T345959: Requesting access to analytics-privatedata-users for ahoelzl.

Approved!

Sep 8 2023, 9:41 PM · SRE, SRE-Access-Requests

Sep 6 2023

odimitrijevic moved T345390: eventutilities-python: cookiecutter template example should be updated from In progress to In Review on the Data Engineering and Event Platform Team (Sprint 1) board.

Sep 6 2023, 3:05 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform

Sep 5 2023

odimitrijevic moved T345193: Document the onboarding journey on Event Platform from In progress to In Review on the Data Engineering and Event Platform Team (Sprint 1) board.

Sep 5 2023, 5:58 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform

odimitrijevic added a comment to T341848: Consult on investigation of relation between UA deprecation and increase in automated traffic.

@MGerlach does the prefetch traffic carry headers that identify it as such when it comes through as webrequests?

Sep 5 2023, 3:44 PM · Research (FY2023-24-Research-July-September)

Sep 1 2023

odimitrijevic added a comment to T345455: Requesting access to analytics-admins for cjming.

Approved

Sep 1 2023, 5:16 PM · SRE, SRE-Access-Requests

Aug 24 2023

odimitrijevic added a comment to T333011: [Maintenance] Define Migration/Deprecation Plan for Hue.

It looks like this has also been requested in PyHive, where the following PR is still open: https://github.com/dropbox/PyHive/pull/328

The Superset bug was closed as stale without being fixed: https://github.com/apache/superset/issues/3243

Aug 24 2023, 11:56 PM · Data-Engineering, Data Pipelines (Sprint 14)

Aug 23 2023

odimitrijevic added a comment to T333011: [Maintenance] Define Migration/Deprecation Plan for Hue.

@JAllemandou is the data formatting limitation coming from Presto or Superset (or both :)?

Aug 23 2023, 6:30 PM · Data-Engineering, Data Pipelines (Sprint 14)

odimitrijevic moved T341277: mediawiki page_content_change should generate new meta.id field from In Review to Done on the Data Engineering and Event Platform Team (Sprint 1) board.

Aug 23 2023, 3:06 PM · Data Engineering and Event Platform Team (Sprint 1), Data-Engineering, Event-Platform

odimitrijevic updated subscribers of T318863: [Event Platform] Event Platform and DataHub Integration.

@BTullis we'll need the SRE team's help with deploying the Event Platform schema ingestion into DataHub. The deployment involves a) creating the Event Streams custom platform and b) deploying the ingestion code/transformer.

Aug 23 2023, 6:08 AM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

Aug 18 2023

odimitrijevic edited projects for T326002: [Event Platform] eventgate-wikimedia occasionally fails to produce events due to stream config fetch errors, added: Data Engineering and Event Platform Team (Sprint 2); removed Data Engineering and Event Platform Team.

When this job fails it requires a manual rerun, and a recent assessment showed this happens fairly often (about once a day on average). Let's bring this into the current sprint and continue troubleshooting.

Aug 18 2023, 11:07 PM · Data-Engineering (Sprint 6), MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), Event-Platform, Data Pipelines

odimitrijevic created Data Engineering and Event Platform Team (Sprint 2).

Aug 18 2023, 10:36 PM

odimitrijevic added a comment to T344199: Grant Access to analytics-privatedata-users for ATsay-WMF.

I approve

Aug 18 2023, 10:27 PM · SRE-Access-Requests, SRE, LDAP-Access-Requests

Aug 17 2023

odimitrijevic added a comment to T344257: Requesting membership in analytics-privatedata-users group, sql_lab role, Kerberos Principal for Omari Sefu.

Approving group membership

Aug 17 2023, 10:30 PM · SRE, SRE-Access-Requests

Aug 16 2023

odimitrijevic added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

Here are some considerations we discussed that we still need to explore further and decide on:

Explore creating a custom platform for Event Streams
Add the top-level event schema description as the dataset documentation. How to accomplish this given the import options is TBD.
The schema import automatically adds subgroups under Kafka based on the first dot-separated segment of the schema name. The production instance of DataHub also has streams named like analytics/mediawiki/web_ab_test_enrollment. Can “/” be used as a separator to designate the top-level category?
Can we import Gobblin lineage to propagate lineage from Kafka > Hive?
There would be value in importing the Hive event_raw database to complete the lineage.
Can we add a link to the event platform schema/datahub documentation to hive tables in event and event_sanitized? Lineage would be one way to trace this. Another would be to add links in the documentation to datasets with equivalent schema both upstream and downstream. This falls into the larger consideration on how to propagate metadata between equivalent datasets stored across different platforms and refinements.
Some of the Kafka topics are remnants of tests and misconfigurations/misnamings. Ideally we'd delete these in Kafka; otherwise, we can add them to an exclusion list.
Given that the prod DataHub holds the current Kafka metadata for the event streams, can we delete and reimport all of the Kafka metadata? If a fresh backup is not available, it would be good to have one handy.
Is there a way to add ownership data to the event schema JSON and import it from there? This would benefit Metrics Platform work and allow alerting the right parties about event publishing errors. Some discussion about adding this data already happened https://phabricator.wikimedia.org/T201063#4546544
What is the best way to ingest the metadata? DataHub transformer vs. Airflow vs. TBD?
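A minimal sketch of the grouping and exclusion questions above, assuming hypothetical helpers that are not part of DataHub or its ingestion code: names containing “/” are grouped by their first “/” segment, everything else falls back to the first dot-separated segment (the current import behavior described above), and topics on an exclusion list are dropped before ingestion. The example names are illustrative only.

```python
# Hypothetical helpers for illustration only; these are NOT DataHub APIs.


def top_level_category(name: str) -> str:
    """Return the top-level grouping segment for a schema/stream name.

    Names using "/" (e.g. analytics/mediawiki/web_ab_test_enrollment) are
    split on "/"; otherwise fall back to the first dot-separated segment.
    """
    sep = "/" if "/" in name else "."
    # Split only once: we just need the leading segment.
    return name.split(sep, 1)[0]


def filter_topics(topics, excluded):
    """Drop topics that appear on an exclusion list (e.g. leftover test topics)."""
    excluded = set(excluded)
    return [t for t in topics if t not in excluded]


if __name__ == "__main__":
    print(top_level_category("analytics/mediawiki/web_ab_test_enrollment"))  # analytics
    print(top_level_category("mediawiki.page_content_change"))  # mediawiki
    print(filter_topics(["a", "test_topic", "b"], ["test_topic"]))  # ['a', 'b']
```

Splitting only on the first separator leaves the rest of the name intact, so deeper segments could still be mapped to nested sub-groups if we decide that's useful.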

Aug 16 2023, 10:25 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

odimitrijevic updated the task description for T318863: [Event Platform] Event Platform and DataHub Integration.

Aug 16 2023, 10:08 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

odimitrijevic updated the task description for T318863: [Event Platform] Event Platform and DataHub Integration.

Aug 16 2023, 9:45 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

odimitrijevic added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

@tchin as discussed today, that sounds like a good approach. Before deploying to production, let's wipe out the Kafka metadata, given that the original POC was imported under the Kafka platform. I'll add these to the acceptance criteria.

Aug 16 2023, 9:43 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

odimitrijevic closed T288254: Explore Containerization Solutions for DE Applications as Resolved.

Aug 16 2023, 4:24 PM · Data-Platform-SRE, Data Engineering and Event Platform Team, Spike, Data Pipelines, Data-Engineering

odimitrijevic added a comment to T288254: Explore Containerization Solutions for DE Applications.

The work related to this has been done as part of standing up the DSE K8s cluster. I will go ahead and close the ticket.

Aug 16 2023, 4:24 PM · Data-Platform-SRE, Data Engineering and Event Platform Team, Spike, Data Pipelines, Data-Engineering