User Details
- User Since
- Apr 28 2021, 12:42 AM (203 w, 2 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- ODimitrijevic (WMF)
Jan 19 2025
Congrats! Can this task be closed?
Jan 10 2025
Thanks @xcollazo. Will this delay impact the timeliness of any of the downstream pipelines?
Dec 22 2024
Based on my understanding, given that these are partial dumps they won't have downstream cascading effects on the internal use cases. We can wait to start the full run when folks are back after the New Year. @BTullis @Marostegui let's go ahead and pause the run.
Dec 10 2024
Yes, I approve streamlining the access to WMDE staff in the same way that we do for WMF staff as proposed in https://phabricator.wikimedia.org/T370424
Dec 4 2024
For reference: https://phabricator.wikimedia.org/T119070
Nov 12 2024
I will go ahead and close this task since the POCs are complete. The proposed integration architecture introduces a proxy service that will provide experiment configurations to the wikis in the same way, regardless of the experimentation platform. When this service is proposed, we'll engage the appropriate teams for review and approval.
This task along with https://phabricator.wikimedia.org/T369178 can be closed since the POCs are complete. The proposed integration architecture introduces a proxy service that will provide experiment configurations to the wikis in the same way, regardless of the experimentation platform. Once this service is in design, we'll engage the appropriate teams for review and approval.
Hi @Legoktm, the linked document has been abandoned and is no longer under consideration.
Nov 2 2024
Oct 25 2024
Oct 24 2024
@BTullis Let's pause the production of the dumps given that they are stuck in a retry loop that's putting load on the production servers.
Oct 22 2024
Oct 17 2024
Approved
Aug 20 2024
Aug 19 2024
Hi @elukey, @joanna_borun, the reason the plugin was chosen is that there are no other equivalent open source options. The plugin underwent an internal security review and signoff as part of the provisioning process, which should provide sufficient assurances with regard to the security concerns above. Additionally, see Ben's note about the codebase. If anyone wishes to reuse our configurations for Matomo, they can disable the corresponding plugin. Does that address your concerns? I would like to request that we proceed with the install.
Aug 8 2024
Jul 18 2024
Thanks @Ottomata. Ftr, I approve the proposal.
Jul 16 2024
I approve the install of the plugin. The Matomo software has passed the security review, it is not distributed as part of MediaWiki, and it is used for the analytics purposes stated above.
@BTullis Please go ahead with the deployment. The goal of the POC is to understand whether this is a viable solution. The decision around non-OSI licensing will follow once a recommendation is made.
Jun 25 2024
Jun 7 2024
Approved
Approved
approved
Approved
Approved
Approved
May 31 2024
Approved.
May 23 2024
Apr 25 2024
Apr 23 2024
Apr 15 2024
With the migration to liftwing these settings are no longer applicable. cc @calbon
Apr 12 2024
Mar 8 2024
Approved!
Mar 6 2024
Yes, that's correct! Approve x 2
Approved
Mar 5 2024
Approved
Approved
Mar 4 2024
Yes, approved
Approved
Feb 23 2024
The annotations have been added to the graphs:
Feb 22 2024
Feb 21 2024
Approved
Approved
Approved
Jan 30 2024
Approved.
Jan 26 2024
Approved.
Jan 19 2024
Approved
Jan 12 2024
Dec 21 2023
Approved
Approved
Dec 13 2023
Thank you @elukey!
Dec 11 2023
@Milimetric What was the root cause of this issue (the cause of missing datasets)?
Dec 7 2023
Decommissioning EventLogging would be EPIC!
Dec 6 2023
Dec 5 2023
Would the header be translated into an x-analytics value?
Dec 2 2023
A few questions:
- While we ought to consider an upgrade for all 4 clusters, from what I understand Jumbo can be upgraded independently. Are there any concerns with that approach?
- What are the upgrade considerations for Kafka clients?
- Specifically, are there clients that publish to Kafka Jumbo directly, or do all Kafka topics get mirrored from main (possibly logging?)?
Dec 1 2023
Approved
Approved
Approved
Oct 31 2023
This was delivered as part of the "documentathon": https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/DataHub/Data_Catalog_Documentation_Guide
Oct 25 2023
Approved!
Oct 4 2023
Sep 22 2023
approved
Sep 19 2023
Approved
Sep 8 2023
Approved!
Sep 6 2023
Sep 5 2023
@MGerlach does the pre-fetch traffic have headers that identify it as such when it comes through as webrequests?
Sep 1 2023
Approved
Aug 24 2023
It looks like the request is also in PyHive with the following PR still open: https://github.com/dropbox/PyHive/pull/328
The bug was closed for being too old, not because it was fixed: https://github.com/apache/superset/issues/3243
Aug 23 2023
@JAllemandou is the limitation in data formatting coming from Presto or Superset (or both :) ?
@BTullis we'll need the SRE team's help with the deployment of the event platform schema ingestion into DataHub. The deployment involves a) creating the event streams custom platform and b) deploying the ingestion code/transformer.
Aug 18 2023
The failure of this job requires a manual rerun, and based on a recent assessment this happens with some frequency (on average once daily). Let's bring this into the current sprint and continue to troubleshoot.
I approve
Aug 17 2023
Approving group membership
Aug 16 2023
Here are some considerations that we discussed, that we need to further explore and decide on:
- Explore creating a custom platform for Event Streams
- Add the top-level event schema description as the dataset documentation. TBD how to accomplish this given the import options.
- The schema import automatically adds subgroups under kafka based on the first dot segment of the schema name. In the production instance of DataHub there are also streams with the naming analytics/mediawiki/web_ab_test_enrollment. Can “/” be used as a separator to designate the top level category?
- Can we import Gobblin lineage to propagate lineage from kafka > hive?
- There would be value in importing the hive event_raw database to complete the lineage of events
- Can we add a link to the event platform schema/datahub documentation to hive tables in event and event_sanitized? Lineage would be one way to trace this. Another would be to add links in the documentation to datasets with equivalent schema both upstream and downstream. This falls into the larger consideration on how to propagate metadata between equivalent datasets stored across different platforms and refinements.
- Some of the kafka topics are remnants of tests and misconfigurations/misnamings. Ideally we'd delete these in Kafka; otherwise there is an option to add them to an exclusion list.
- Given that the prod DataHub has the event streams' current Kafka metadata, can we delete and reimport all the Kafka metadata? If a fresh backup is not available, it would be good to have one handy
- Is there a way to add ownership data to event schema json and import it from there? This would benefit Metrics Platform work and allow alerting the right parties about event publishing errors. Some discussion about adding this data already happened https://phabricator.wikimedia.org/T201063#4546544
- What is the best way to ingest the metadata? Datahub transformer vs airflow vs TBD?
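As a rough, hypothetical sketch of the exclusion-list option above (server addresses and deny patterns are placeholders, and this assumes the DataHub CLI's standard kafka source rather than whatever custom ingestion we end up building), a recipe that skips the stray test topics might look like:

```yaml
# Hypothetical DataHub ingestion recipe; hosts and patterns are placeholders.
source:
  type: kafka
  config:
    connection:
      bootstrap: "kafka-jumbo.example:9092"
      schema_registry_url: "http://schema-registry.example:8081"
    topic_patterns:
      # Exclude topics known to be test remnants or misnamings.
      deny:
        - ".*test.*"
        - ".*tmp.*"
sink:
  type: datahub-rest
  config:
    server: "http://datahub-gms.example:8080"
```

Whether we use the CLI source or a custom transformer, keeping the deny list in a versioned recipe would make the exclusions reviewable.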
@tchin as discussed today, that sounds like a good approach. Before deploying to production, let's wipe out the kafka metadata given that the original POC was imported under the kafka platform. I'll add these to the acceptance criteria.
The work related to this has been done as part of standing up the DSE K8s cluster. I will go ahead and close the ticket.
Aug 14 2023
@BTullis These are good to be removed
Aug 10 2023
Done. Are there any recovery keys to be had in case I am not able to access my phone for whatever reason?