User Details
- User Since
- Apr 28 2021, 12:42 AM
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- ODimitrijevic (WMF)
Mon, Apr 15
With the migration to Lift Wing, these settings are no longer applicable. cc @calbon
Mar 8 2024
Approved!
Mar 6 2024
Yes, that's correct! Approve x 2
Approved
Mar 5 2024
Approved
Approved
Mar 4 2024
Yes, approved
Approved
Feb 23 2024
The annotations have been added to the graphs:
Feb 21 2024
Approved
Approved
Approved
Jan 30 2024
Approved.
Jan 26 2024
Approved.
Jan 19 2024
Approved
Dec 21 2023
Approved
Approved
Dec 13 2023
Thank you @elukey!
Dec 11 2023
@Milimetric What was the root cause of this issue (the cause of missing datasets)?
Dec 7 2023
Decommissioning EventLogging would be EPIC!
Dec 5 2023
Would the header be translated into an x-analytics value?
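For reference, X-Analytics is a semicolon-delimited list of key=value pairs, so translating an incoming request header into an x-analytics entry would essentially be a parse-and-append. A minimal sketch (the `preview` key below is a hypothetical placeholder, not an agreed-upon name):

```python
def parse_x_analytics(value: str) -> dict:
    """Parse a semicolon-delimited X-Analytics string, e.g. "ns=0;WikipediaApp=1"."""
    pairs = {}
    for item in value.split(";"):
        if not item:
            continue
        key, _, val = item.partition("=")
        pairs[key.strip()] = val.strip()
    return pairs

def serialize_x_analytics(pairs: dict) -> str:
    """Serialize key/value pairs back into the X-Analytics wire format."""
    return ";".join(f"{k}={v}" for k, v in pairs.items())

# Adding a flag derived from an incoming request header
# ("preview" is illustrative only):
tags = parse_x_analytics("ns=0;WikipediaApp=1")
tags["preview"] = "1"
```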
Dec 2 2023
A few questions:
- While we ought to consider an upgrade for all 4 clusters, from what I understand Jumbo can be upgraded independently. Are there any concerns with that approach?
- What are the upgrade considerations for Kafka clients?
  - Specifically, are there clients that publish to Kafka Jumbo directly, or do all Kafka topics get mirrored from main (and possibly logging)?
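On the independent-upgrade question: Kafka's documented rolling-upgrade procedure is two-phase (upgrade broker binaries with `inter.broker.protocol.version` pinned to the old version, then bump it in a second rolling restart), which is what makes a per-cluster upgrade feasible without coordinating all clients at once. A minimal sketch of that sequencing (the helper name and version strings are illustrative, not our actual automation):

```python
def upgrade_steps(current: str, target: str) -> list:
    """Sketch Kafka's two-phase rolling broker upgrade: new binaries first
    with the inter-broker protocol pinned, then the protocol bump."""
    return [
        f"rolling-restart brokers on {target} binaries with "
        f"inter.broker.protocol.version={current}",
        f"rolling-restart brokers with inter.broker.protocol.version={target}",
    ]
```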
Dec 1 2023
Approved
Approved
Approved
Oct 31 2023
This was delivered as part of the "documentathon": https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/DataHub/Data_Catalog_Documentation_Guide
Oct 25 2023
Approved!
Sep 22 2023
approved
Sep 19 2023
Approved
Sep 8 2023
Approved!
Sep 5 2023
@MGerlach does the pre-fetch traffic carry headers that identify it as such when it comes through as webrequests?
Sep 1 2023
Approved
Aug 24 2023
It looks like the same request is also open against PyHive, with the following PR still unmerged: https://github.com/dropbox/PyHive/pull/328
The Superset bug was closed as stale rather than fixed: https://github.com/apache/superset/issues/3243
Aug 23 2023
@JAllemandou is the limitation in data formatting coming from Presto or Superset (or both :) ?
@BTullis we'll need the SRE team's help with the deployment of the event platform schema ingestion into DataHub. The deployment involves a) creating the event streams custom platform and b) deploying the ingestion code/transformer.
Aug 18 2023
The failure of this job requires a manual rerun, and based on a recent assessment this happens with some frequency (on average once daily). Let's bring this into the current sprint and continue to troubleshoot.
I approve
Aug 17 2023
Approving group membership
Aug 16 2023
Here are some considerations that we discussed and need to further explore and decide on:
- Explore creating a custom platform for Event Streams
- Add top level event schema description as the dataset documentation. TBD on how to accomplish this given import options.
- The schema import automatically adds subgroups under kafka based on the first dot segment of the schema name. In the production instance of DataHub there are also streams named like analytics/mediawiki/web_ab_test_enrollment. Can “/” be used as a separator to designate the top-level category?
- Can we import Gobblin lineage to propagate lineage from Kafka > Hive?
- There would be value in importing the Hive event_raw database to complete the lineage of events.
- Can we add a link to the event platform schema/datahub documentation to hive tables in event and event_sanitized? Lineage would be one way to trace this. Another would be to add links in the documentation to datasets with equivalent schema both upstream and downstream. This falls into the larger consideration on how to propagate metadata between equivalent datasets stored across different platforms and refinements.
- Some of the Kafka topics are remnants of tests and misconfigurations/misnamings. Ideally we'd delete these in Kafka; otherwise there is the option of adding them to an exclusion list.
- Given that the prod DataHub already has the current Kafka metadata for event streams, can we delete and reimport all the Kafka metadata? If a fresh backup is not available it would be good to have one handy.
- Is there a way to add ownership data to the event schema JSON and import it from there? This would benefit Metrics Platform work and allow alerting the right parties about event publishing errors. Some discussion about adding this data already happened: https://phabricator.wikimedia.org/T201063#4546544
- What is the best way to ingest the metadata? Datahub transformer vs airflow vs TBD?
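The “/”-separator question above could be prototyped with a small helper (hypothetical, not the actual DataHub importer code), assuming we prefer “/” as the hierarchy separator when present and otherwise fall back to the first dot segment, which is what the importer does today:

```python
def category_segments(stream_name: str) -> list:
    """Return hierarchy segments for a stream name: split on '/' when
    present (e.g. analytics/mediawiki/web_ab_test_enrollment nests as
    analytics > mediawiki > web_ab_test_enrollment); otherwise split off
    the first dot segment."""
    if "/" in stream_name:
        return stream_name.split("/")
    return stream_name.split(".", 1)
```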
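The exclusion-list idea above maps naturally onto regex deny patterns, similar in spirit to DataHub's allow/deny pattern config. A minimal sketch (function name and patterns are illustrative only):

```python
import re

def filter_topics(topics, deny_patterns):
    """Drop topics matching any deny regex -- an exclusion list for test
    topics and misnamed remnants we can't (yet) delete in Kafka."""
    compiled = [re.compile(p) for p in deny_patterns]
    return [t for t in topics if not any(c.match(t) for c in compiled)]

# Example: excluding test topics by prefix.
kept = filter_topics(
    ["eqiad.mediawiki.page_create", "test.topic"],
    [r"^test\."],
)
```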
@tchin as discussed today, that sounds like a good approach. Before deploying to production, let's wipe out the kafka metadata given that the original POC was imported under the kafka platform. I'll add these to the acceptance criteria.
The work related to this has been done as part of standing up the DSE K8s cluster. I will go ahead and close the ticket.
Aug 14 2023
@BTullis These are good to be removed
Aug 10 2023
Done. Are there any recovery keys to be had in case I am not able to access my phone for whatever reason?
Aug 2 2023
Approved.
Aug 1 2023
@Htriedman we are picking this work up again. Is the POC that you did available in a repository on GitLab?
Thank you @jbond!
Jul 28 2023
Approved
Jul 27 2023
@Mayakp.wiki @nshahquinn-wmf Is this still an issue?
This dataset is no longer subscribed to. We should remove the database from the download list.
Jul 26 2023
Approved
Jul 11 2023
@BTullis do the permissions need to be removed before closing the task?
Jul 7 2023
@Antoine does this still need to be implemented?
Jul 6 2023
So gratifying to be able to close this task!