User Details
- User Since
- Apr 28 2021, 12:42 AM (175 w, 6 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- ODimitrijevic (WMF)
Tue, Aug 20
Mon, Aug 19
Hi @elukey, @joanna_borun, the plugin was chosen because there are no other equivalent open source options. The plugin underwent an internal security review and signoff as part of the provisioning process, which should provide sufficient assurances wrt the security concerns above. Additionally, see Ben's note about the codebase. If anyone wishes to reuse our configurations for Matomo, they can disable the corresponding plugin. Does that address your concerns? I would like to request that we proceed with the install.
Aug 8 2024
Jul 18 2024
Thanks @Ottomata. Ftr, I approve the proposal.
Jul 16 2024
I approve the install of the plugin. The Matomo software has passed the security review, is not distributed as part of MediaWiki, and is used for the analytics purposes stated above.
@BTullis Please go ahead with the deployment. The goal with the POC is to understand if this is a viable solution. The decision around non-OSI licensing will follow once a recommendation is made.
Jun 25 2024
Jun 7 2024
Approved
Approved
approved
Approved
Approved
Approved
May 31 2024
Approved.
May 23 2024
Apr 25 2024
Apr 23 2024
Apr 15 2024
With the migration to liftwing these settings are no longer applicable. cc @calbon
Apr 12 2024
Mar 8 2024
Approved!
Mar 6 2024
Yes, that's correct! Approve x 2
Approved
Mar 5 2024
Approved
Approved
Mar 4 2024
Yes, approved
Approved
Feb 23 2024
The annotations have been added to the graphs:
Feb 22 2024
Feb 21 2024
Approved
Approved
Approved
Jan 30 2024
Approved.
Jan 26 2024
Approved.
Jan 19 2024
Approved
Jan 12 2024
Dec 21 2023
Approved
Approved
Dec 13 2023
Thank you @elukey!
Dec 11 2023
@Milimetric What was the root cause of this issue (the cause of missing datasets)?
Dec 7 2023
Decommissioning EventLogging would be EPIC!
Dec 6 2023
Dec 5 2023
Would the header be translated into an x-analytics value?
Dec 2 2023
A few questions:
- While we ought to consider an upgrade for all 4 clusters, from what I understand Jumbo can be upgraded independently. Are there any concerns with that approach?
- What are the upgrade considerations for Kafka clients?
- Specifically, are there clients that publish to Kafka Jumbo directly, or do all Kafka topics get mirrored from main (possibly logging?)?
Dec 1 2023
Approved
Approved
Approved
Oct 31 2023
This was delivered as part of the "documentathon": https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/DataHub/Data_Catalog_Documentation_Guide
Oct 25 2023
Approved!
Oct 4 2023
Sep 22 2023
approved
Sep 19 2023
Approved
Sep 8 2023
Approved!
Sep 6 2023
Sep 5 2023
@MGerlach does the pre-fetch traffic have headers that can identify it as such when it comes through as webrequests?
Sep 1 2023
Approved
Aug 24 2023
It looks like the request is also in PyHive with the following PR still open: https://github.com/dropbox/PyHive/pull/328
Bug closed because too old, and not fixed: https://github.com/apache/superset/issues/3243
Aug 23 2023
@JAllemandou is the limitation in data formatting coming from Presto or Superset (or both :) ?
@BTullis we'll need the SRE team's help with the deployment of the event platform schema ingestion into Datahub. The deployment involves a) creating the event streams custom platform and b) deploying the ingestion code/transformer.
Aug 18 2023
The failure of this job requires a manual rerun, and based on a recent assessment this happens with some frequency (on average once daily). Let's bring this into the current sprint and continue to troubleshoot.
I approve
Aug 17 2023
Approving group membership
Aug 16 2023
Here are some considerations that we discussed, that we need to further explore and decide on:
- Explore creating a custom platform for Event Streams
- Add top level event schema description as the dataset documentation. TBD on how to accomplish this given import options.
- The schema import automatically adds subgroups under kafka based on the first dot segment of the schema name. In the production instance of DataHub there are also streams with the naming analytics/mediawiki/web_ab_test_enrollment. Can "/" be used as a separator to designate the top level category?
- Can we import Gobblin lineage to propagate lineage from kafka > hive?
- There would be value in importing the hive event_raw database to complete the lineage events
- Can we add a link to the event platform schema/datahub documentation to hive tables in event and event_sanitized? Lineage would be one way to trace this. Another would be to add links in the documentation to datasets with equivalent schema both upstream and downstream. This falls into the larger consideration on how to propagate metadata between equivalent datasets stored across different platforms and refinements.
- Some of the kafka topics are remnants of tests and misconfigurations/misnamings. Ideally we'd delete these in Kafka; otherwise, there is an option to add them to an exclusion list.
- Given that the prod datahub has the event streams' current Kafka metadata, can we delete and reimport all the Kafka metadata? If a fresh backup is not available, it would be good to have one handy
- Is there a way to add ownership data to event schema json and import it from there? This would benefit Metrics Platform work and allow alerting the right parties about event publishing errors. Some discussion about adding this data has already happened: https://phabricator.wikimedia.org/T201063#4546544
- What is the best way to ingest the metadata? Datahub transformer vs airflow vs TBD?
@tchin as discussed today, that sounds like a good approach. Before deploying to production, let's wipe out the kafka metadata given that the original POC was imported under the kafka platform. I'll add these to the acceptance criteria.
The work related to this has been done as part of standing up the DSE K8s cluster. I will go ahead and close the ticket.
Aug 14 2023
@BTullis These are good to be removed
Aug 10 2023
Done. Are there any recovery keys to be had in case I am not able to access my phone for whatever reason?
Aug 2 2023
Approved.
Aug 1 2023
@Htriedman we are picking this work up again. Is the POC that you did available in a repository on gitlab?
Thank you @jbond!
Jul 28 2023
Approved
Jul 27 2023
@Mayakp.wiki @nshahquinn-wmf Is this still an issue?
This dataset is no longer subscribed to. We should remove the database from the download list.
Jul 26 2023
Approved
Jul 11 2023
@BTullis do the permissions need to be removed before closing the task?
Jul 7 2023
@Antoine does this still need to be implemented?