User Details
- User Since
- Apr 28 2021, 12:42 AM (99 w, 1 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- ODimitrijevic (WMF) [ Global Accounts ]
Tue, Mar 21
Approved
Thu, Mar 16
Wed, Mar 15
@xcollazo @Antoine_Quhen Does this apply to the Airflow CI/CD that DE manages? Are there any improvements that we wish to adopt or expand?
Another ping to @Maryana and @MMiller_WMF . Do you have opinions on asking the TikTok team vs parsing the user-agent string as is?
Thu, Mar 9
Approved
Fri, Mar 3
Is this the same issue reported in https://phabricator.wikimedia.org/T328127?
Feb 17 2023
unique_devices_project_wide_daily and unique_devices_project_wide_monthly have no data and have been marked as deprecated. Ticket to delete: https://phabricator.wikimedia.org/T329978
Consider doing this at the same time as https://phabricator.wikimedia.org/T329978
Feb 14 2023
Currently only unique_devices_per_domain_monthly has the dataset description.
For unique devices, let's document all of:
- unique_devices_per_domain_daily
- unique_devices_per_domain_monthly
- unique_devices_per_project_family_daily
- unique_devices_per_project_family_monthly
- unique_devices_project_wide_daily
- unique_devices_project_wide_monthly
Feb 10 2023
Feb 8 2023
- The [[ mediawiki_api_request entry | https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,event.mediawiki_api_request,PROD)/Schema?is_lineage_mode=false ]] is missing information for the normalized host. I'm not sure why, given that the rest of the structs are filled in. The other aspects look good. Who would be the best person to fill it in? Who should be assigned as the data owner?
- Are there links to external documentation that can be added?
- The description of the field is_wmf_domain explains how it is derived but not what the field means. Is it a boolean that indicates whether the request came from a WMF domain vs. externally, e.g. a bot or a Toolforge tool?
- Who should be the owner of this dataset?
Jan 31 2023
Yes, thank you @Antoine_Quhen! This is very exciting. Will the lineage be emitted with the upgrade or will there be another step to configure lineage to be emitted? Doing the latter will minimize the risk and give us some time to understand what lineage looks like and how to explain it... which leads to the question about the best way to test it out.
Jan 30 2023
@Maryana @KinneretG please see the spreadsheet provided for your review.
Jan 26 2023
Jan 25 2023
Jan 23 2023
Pinging Product Analytics for review.
I approve the request!
Jan 20 2023
@JAllemandou Can you please provide some guidance on how to go about estimating the data loss?
It is possible to add annotations to the Wikistats dashboards via Meta. See https://meta.wikimedia.org/w/index.php?title=Config:Dashiki:Annotations/Wikistats/totalPageViews. Annotations have been created for the two incidents above. Let's review them and assess whether this is sufficient.
Adding @Snwachukwu and @Antoine_Quhen as subscribers.
For a reference of datasets that have already been documented, and ones that still need to be added to the data catalog see: https://docs.google.com/spreadsheets/d/1lyl92MVVhfFPQva_fPMUnXtSCFa3axHliM6eUaSFjNU/edit#gid=812088000
Jan 19 2023
Approved
Jan 10 2023
Jan 9 2023
@Antoine_Quhen the dashboard doesn't show any results currently.
Thanks Andrew for the write up!
Jan 6 2023
Jan 4 2023
The report has been published on: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Data_Issues/2021-02-09_Unique_Devices_By_Family_Overcount
The data issue summary report has been published on https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Data_Issues/2021-06-04_Traffic_Data_Loss
Dec 21 2022
Dec 19 2022
@CDanis thanks for bubbling this up. We'll discuss it when we get back in January to understand what the effort entails. We may have some additional questions about your specific use case to understand how to prioritize it against the many other requests.
@JAllemandou can you please provide the delta between the two totals (per family and per project) and the actual number, or failing that an estimate, of how much of that delta can be attributed to the special page handling?
I believe this is the same problem discussed in https://phabricator.wikimedia.org/T276472. Can they both be closed at the same time?
Thank you @JAllemandou! That explains things clearly. I have added the follow up work to the planning board.
Dec 16 2022
Dec 14 2022
Dec 12 2022
@JArguello-WMF ideally before end of year. I would like to add it to the unexpected work in the current sprint.
Dec 8 2022
Dec 6 2022
@kzimmerman let's discuss prioritizing. A significantly larger overcount may exist for the wikimedia project family.
Nov 28 2022
I am arriving at the conversation a little late. What is the reason for separating geoeditor from the editing analytics services?
Nov 14 2022
@Gehel putting this on your radar
Spark 2 deprecation was announced via Slack and analytics-alerts:
Nov 7 2022
Approved
Oct 23 2022
Adding to data-pipelines to assess if there is data loss that should be noted for the first outage where the backfill might not have been run.
Oct 21 2022
@Cmjohnson Thank you!
@Ottomata makes sense. Thanks for posting the ticket.
Oct 20 2022
DataHub has an API that we can use to import the schema. That schema should ideally be tied to the Kafka topics, as they are the true source.
Oct 18 2022
Thanks @mforns for creating the list. Would it be helpful to add a risk vs. complexity column, similar to the one in the Oozie migration spreadsheet, for both our jobs and other teams' jobs?
Improving bot traffic detection is on the longer-term planning horizon; however, we are not likely to get to it in the near future given other priorities.
Sep 22 2022
Sep 19 2022
Approved!
Sep 12 2022
Approved
Sep 7 2022
Thank you @JAllemandou
Aug 29 2022
Aug 8 2022
Jul 28 2022
Request is approved.
Approved
Approved
Approved!