AndrewTavis_WMDE (Andrew Tavis McAllister)
Data analyst for Wikidata at Wikimedia Deutschland

User Details

User Since
Apr 14 2023, 9:33 AM (67 w, 20 h)
Availability
Available
IRC Nick
andrewtavis-wmde
LDAP User
Andrew McAllister (WMDE)
MediaWiki User
Andrew McAllister (WMDE)

Hello! I'm Andrew (he/him), an Oregonian from the US who's been living in Berlin since 2016. I'm a data analyst for the Wikidata team at Wikimedia Deutschland.

Outside my work at Wikimedia Deutschland I'm also an initiator of activist.org, a platform for progressive political activism, and Scribe, an open-source organization that leverages Wikidata to create keyboards for second language learners.

Note: this Phabricator account is solely for my work at Wikimedia Deutschland. My private account is AndrewTavis.

Recent Activity

Fri, Jul 5

AndrewTavis_WMDE updated the task description for T356618: [EPIC] Check of legacy wmde analytics infrastructure.
Fri, Jul 5, 7:13 PM · Epic, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE updated the task description for T356618: [EPIC] Check of legacy wmde analytics infrastructure.
Fri, Jul 5, 7:12 PM · Epic, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE moved T365457: Bring in all Purdue Porgram PRs and upload Mismatch Finder mismatches from Prioritized backlog to In progress on the Wikidata Analytics (Kanban) board.
Fri, Jul 5, 7:03 PM · Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE moved T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs from In progress to Product verification on the Wikidata Analytics (Kanban) board.
Fri, Jul 5, 7:03 PM · Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE added a comment to T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs.

DAG has passed in production, and the data can be seen at published/datasets/wmde/analytics/wd_rest_api_metrics_monthly/.

Fri, Jul 5, 7:03 PM · Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE updated the task description for T349285: [Analytics] Quartely/monthly User Agents using Wikidata's new REST API.
Fri, Jul 5, 6:49 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T349285: [Analytics] Quartely/monthly User Agents using Wikidata's new REST API.

The DAG and its job are ready at this point. The merge request has been updated and can be seen here: gitlab.wikimedia.org/repos/data-engineering/airflow-dags/merge_requests/738. I don't have time to do the local testing now, as I tried earlier today and the query was taking too long, but this will be finished quickly after I'm back.

Fri, Jul 5, 6:47 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T349285: [Analytics] Quartely/monthly User Agents using Wikidata's new REST API from Prioritized backlog to In progress on the Wikidata Analytics (Kanban) board.
Fri, Jul 5, 6:24 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time from In progress to Product verification on the Wikidata Analytics (Kanban) board.
Fri, Jul 5, 6:23 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs from Product verification to In progress on the Wikidata Analytics (Kanban) board.
Fri, Jul 5, 6:23 PM · Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE moved T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs from In progress to Product verification on the Wikidata Analytics (Kanban) board.
Fri, Jul 5, 6:23 PM · Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE added a comment to T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time .

Data from the DAG can be found at published/datasets/wmde/analytics/wd_item_sitelink_segments_weekly/. There's a zeroed-out row for 13-5-2024 as we'd gotten the data from the previous week in testing, but didn't for that week. It'll be collected in T363583 when we get the historical segments.

Fri, Jul 5, 6:22 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time .
Fri, Jul 5, 6:21 PM · Wikidata Analytics (Kanban), Wikidata

Wed, Jul 3

AndrewTavis_WMDE added a comment to T342559: [PERIODIC] Monthly repeating tasks (next: August 2024).

@Manuel, report has been updated. The queries ran fine, but for some reason the data export didn't work even though it did for testing. Checking with WMF about this. Will get to the rest of the tasks for this month today/tomorrow and the other DAG will be deployed once the issues with the current ones are figured out.

Wed, Jul 3, 12:16 PM · periodic-update, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE updated the task description for T342559: [PERIODIC] Monthly repeating tasks (next: August 2024).
Wed, Jul 3, 12:14 PM · periodic-update, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE added a comment to T342559: [PERIODIC] Monthly repeating tasks (next: August 2024).

Hey @Manuel. Still not up, which might be because of changes to the queries given the new export to published datasets step. I'm deploying the updated DAG now as tests passed, so we'll see if all finishes.

Wed, Jul 3, 9:48 AM · periodic-update, Wikidata, Wikidata Analytics (Kanban)

Tue, Jul 2

AndrewTavis_WMDE updated the task description for T368944: Improve Airflow DAG testing process.
Tue, Jul 2, 9:18 AM · Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines

Mon, Jul 1

AndrewTavis_WMDE updated the task description for T368944: Improve Airflow DAG testing process.
Mon, Jul 1, 4:54 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines
AndrewTavis_WMDE updated the task description for T368944: Improve Airflow DAG testing process.
Mon, Jul 1, 4:50 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines
AndrewTavis_WMDE updated the task description for T368944: Improve Airflow DAG testing process.
Mon, Jul 1, 4:31 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines
AndrewTavis_WMDE created T368944: Improve Airflow DAG testing process.
Mon, Jul 1, 4:26 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines
AndrewTavis_WMDE added a comment to T342559: [PERIODIC] Monthly repeating tasks (next: August 2024).

Hi @Manuel: I just ran the query and it's not yet been updated. Likely that wmf.webrequest isn't updated yet and thus the sensor hasn't fired to trigger the DAG. Will check again tomorrow.

Mon, Jul 1, 12:08 PM · periodic-update, Wikidata, Wikidata Analytics (Kanban)

Jun 20 2024

AndrewTavis_WMDE added a comment to T349285: [Analytics] Quartely/monthly User Agents using Wikidata's new REST API.

MR is already open with the basic structure of the DAG (basically just the first REST API one with minor modifications). I'll finalize this once the REST API metrics one and the sitelinks one are finalized as far as exporting to the published datasets. Only question on those now is where I should be sending test data to (HDFS or the stat machines).

Jun 20 2024, 3:39 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T349285: [Analytics] Quartely/monthly User Agents using Wikidata's new REST API.
Jun 20 2024, 3:00 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T349285: [Analytics] Quartely/monthly User Agents using Wikidata's new REST API.
Jun 20 2024, 3:00 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE claimed T349285: [Analytics] Quartely/monthly User Agents using Wikidata's new REST API.
Jun 20 2024, 3:00 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T349285: [Analytics] Quartely/monthly User Agents using Wikidata's new REST API.

A question on this as I'm writing the basics while I'm waiting on info for the testing of the other DAGs:

Jun 20 2024, 2:59 PM · Wikidata Analytics (Kanban), Wikidata

Jun 19 2024

AndrewTavis_WMDE updated the task description for T356618: [EPIC] Check of legacy wmde analytics infrastructure.
Jun 19 2024, 9:40 AM · Epic, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE updated the task description for T356618: [EPIC] Check of legacy wmde analytics infrastructure.
Jun 19 2024, 9:40 AM · Epic, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE updated subscribers of T356618: [EPIC] Check of legacy wmde analytics infrastructure.

Thank you for this, @Michael! Really appreciate you following along with this and helping out :) :)

Jun 19 2024, 9:37 AM · Epic, Wikidata, Wikidata Analytics (Kanban)

Jun 14 2024

AndrewTavis_WMDE added a comment to T367568: Cloud VPS "wmde-dashboards" project Buster deprecation.

Thanks for making this! I've marked the wmde-dashboards project as good for deletion as seen here. Please let us know if further information is needed :)

Jun 14 2024, 5:14 PM · Cloud-VPS (Debian Buster Deprecation)

Jun 12 2024

AndrewTavis_WMDE added a project to T356618: [EPIC] Check of legacy wmde analytics infrastructure: Epic.
Jun 12 2024, 11:52 AM · Epic, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE updated the task description for T356618: [EPIC] Check of legacy wmde analytics infrastructure.
Jun 12 2024, 11:25 AM · Epic, Wikidata, Wikidata Analytics (Kanban)

Jun 11 2024

AndrewTavis_WMDE moved T366621: [Analytics] Analysis of REST API user agents for May 2024 from In progress to Product verification on the Wikidata Analytics (Kanban) board.

@Manuel and @Lydia_Pintscher, just shared a folder with the two CSVs on Wolke. Let me know if there's anything else needed, and I will set a reminder that they should be deleted on my end in 89 days (they were generated yesterday). Sharing has been disabled on the directory, so if others need access, then let me know :)

Jun 11 2024, 3:47 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T366621: [Analytics] Analysis of REST API user agents for May 2024.
Jun 11 2024, 3:46 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T360296: [Analytics] Implement data process to identify missing Wiktionary entries .

Hi @MarcoSwart 👋 Thanks for the communication here :) I guess I'm a bit confused by how the other one would be used. You're roughly talking about:

Jun 11 2024, 2:33 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T366621: [Analytics] Analysis of REST API user agents for May 2024.

@Manuel, my assumption was that you could help any non-analytics PMs or go through the results with them as you have the needed access. Using Google for PII is not something we're supposed to do if it can be avoided, but I have no experience with Wolke. Please let me know if you'd like me to look into Wolke or send the files over Drive.

Jun 11 2024, 1:56 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T360296: [Analytics] Implement data process to identify missing Wiktionary entries .

Talked further with WMF about this just now. One basic question for the end users: would it make it more convenient for you all if the exported datasets were per Wiktionary? There are two options here, with missing entries being used as an example:

Jun 11 2024, 1:12 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata

Jun 10 2024

AndrewTavis_WMDE updated the task description for T366621: [Analytics] Analysis of REST API user agents for May 2024.
Jun 10 2024, 5:06 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T366621: [Analytics] Analysis of REST API user agents for May 2024.
Jun 10 2024, 3:43 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T366621: [Analytics] Analysis of REST API user agents for May 2024.
Jun 10 2024, 12:48 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T366621: [Analytics] Analysis of REST API user agents for May 2024.

I can also prepare a notebook with quick functions to load and explore the data, if that would make the option I suggested a bit easier.

Jun 10 2024, 11:02 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T366621: [Analytics] Analysis of REST API user agents for May 2024.

Would it be possible to send us a spreadsheet (and schedule it for deletion after 90 days)?

Jun 10 2024, 11:01 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T366621: [Analytics] Analysis of REST API user agents for May 2024.

Base queries for all of this are ready :) Let me know on the above and I'll finalize them. Actually running them will take some time.

Jun 10 2024, 10:46 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T366621: [Analytics] Analysis of REST API user agents for May 2024.

Checking on the numbers here really quick: the request is for the top 1000 user agents by number of requests and then a sample of 1000 user agents, but the total is 1221. Would an ordered list of all of them make more sense, as we're talking about a sample of 82%? There really isn't going to be a difference between the first two sets. How about an ordered list of all of them and another ordered list of all who were active in May and not in April?
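A minimal sketch of the two lists suggested above, assuming per-month request counts keyed by user agent. The function names and the sample counts are hypothetical and only illustrate the idea; they are not the queries used for the task.

```python
from collections import Counter


def ordered_agents(counts: Counter) -> list[tuple[str, int]]:
    """Return all user agents ordered by request count, highest first."""
    return counts.most_common()


def newly_active(current: Counter, previous: Counter) -> list[tuple[str, int]]:
    """Return agents seen in the current month but not the previous one, ordered by count."""
    return [(agent, n) for agent, n in current.most_common() if agent not in previous]


# Share of all user agents that a 1000-agent sample would cover (1221 in total).
sample_share = 1000 / 1221

# Hypothetical counts, for illustration only.
april = Counter({"agent-a": 500, "agent-b": 40})
may = Counter({"agent-a": 450, "agent-b": 60, "agent-c": 300})

all_may = ordered_agents(may)
new_in_may = newly_active(may, april)
```

With only 1221 user agents in total, a sample of 1000 covers most of the population, which is why a full ordered list carries nearly the same information.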

Jun 10 2024, 10:18 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T366621: [Analytics] Analysis of REST API user agents for May 2024.
Jun 10 2024, 9:57 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T363583: Generate historical weekly segments of Wikidata item sitelink segmentations from Waiting for input/support to Prioritized backlog on the Wikidata Analytics (Kanban) board.
Jun 10 2024, 9:42 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T363583: Generate historical weekly segments of Wikidata item sitelink segmentations.

Status is open as T364045 has been resolved :)

Jun 10 2024, 9:38 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE changed the status of T363583: Generate historical weekly segments of Wikidata item sitelink segmentations from Stalled to Open.
Jun 10 2024, 9:37 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T366621: [Analytics] Analysis of REST API user agents for May 2024 from Prioritized backlog to In progress on the Wikidata Analytics (Kanban) board.
Jun 10 2024, 9:37 AM · Wikidata Analytics (Kanban), Wikidata

Jun 7 2024

AndrewTavis_WMDE added a comment to T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history.

Thank you for the efforts here, @JAllemandou! Really great to have this back, and glad that it's worked out in a way where others are not adversely affected :)

Jun 7 2024, 8:46 AM · Wikidata, Wikidata Analytics, Data-Engineering (Q4 2024 April 1st - June 30th)

Jun 6 2024

AndrewTavis_WMDE added a comment to T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs.

Unstalled as the plan for the data export has been approved in T365699 :)

Jun 6 2024, 12:32 PM · Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE changed the status of T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs from Stalled to Open.
Jun 6 2024, 12:31 PM · Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE added a comment to T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time .

Unstalled as the table has been created :)

Jun 6 2024, 12:31 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE changed the status of T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time , a subtask of T343019: [EPIC] Segments of Wikidata's data over time [up to milestone 3], from Stalled to Open.
Jun 6 2024, 12:30 PM · Wikidata Analytics (Kanban), Epic, Wikidata
AndrewTavis_WMDE changed the status of T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time from Stalled to Open.
Jun 6 2024, 12:30 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T360296: [Analytics] Implement data process to identify missing Wiktionary entries .

Hi @MarcoSwart, sorry for changing the status without explanation. Was in a meeting and we were moving things around, but obviously context should have been added. This is stalled for now as we're waiting for WMF to advise us on the best way forward on migrating data from MariaDB to HDFS. The data processes we need to use for this cannot be run directly on MariaDB in a sustainable way that's in line with long term supported data practices, so first we need to migrate the data to the private data cluster, and then our normal workflows take over. This migration is non-standard, and they're looking into how best to support/guide us.

Jun 6 2024, 12:27 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T363583: Generate historical weekly segments of Wikidata item sitelink segmentations.

Note, work that will unblock this task is being done in T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history.

Jun 6 2024, 11:21 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE claimed T366621: [Analytics] Analysis of REST API user agents for May 2024.
Jun 6 2024, 11:18 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T366621: [Analytics] Analysis of REST API user agents for May 2024.

Quick note on this, in discussion, something to check as well would be those user agents that were present in May 2024, but were not active in April 2024 :)

Jun 6 2024, 11:18 AM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE changed the status of T360296: [Analytics] Implement data process to identify missing Wiktionary entries , a subtask of T332899: [EPIC] Migrate selected R-based Wikidata products , from Open to Stalled.
Jun 6 2024, 11:13 AM · Wikidata Analytics (Kanban), Epic, Wikidata
AndrewTavis_WMDE changed the status of T360296: [Analytics] Implement data process to identify missing Wiktionary entries from Open to Stalled.
Jun 6 2024, 11:13 AM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata

Jun 4 2024

AndrewTavis_WMDE added a comment to T360296: [Analytics] Implement data process to identify missing Wiktionary entries .

There's now a MR draft for the DAGs open on GitLab. There's still lots to do as WMF wants to sync on suggestions they'll give me on how to do the MariaDB to HDFS data transfer, but the DAGs are mapped out and the hive queries they're calling have been prepared :)

Jun 4 2024, 7:28 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T356618: [EPIC] Check of legacy wmde analytics infrastructure.
Jun 4 2024, 4:20 PM · Epic, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE updated the task description for T356618: [EPIC] Check of legacy wmde analytics infrastructure.
Jun 4 2024, 4:18 PM · Epic, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE added a comment to T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days).

Thanks so much for the support here, @BTullis! I'll update the epic with this being done. So close to being finished with all this :)

Jun 4 2024, 4:17 PM · Data-Platform-SRE (2024.05.27 - 2024.06.16), Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

Quick note on this, @Htriedman: I was talking with y'all's Data Engineering about this, and we decided that the query will be reverted to what it was before so that the data lake keeps the bigint values. Then, in the step that generates the CSV for export to the published datasets directories, we'll cast the values to strings and make sure that the values that call for it are replaced with <25!
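The export step described above could be sketched roughly as follows. This assumes that counts below 25 are the ones replaced with the <25 marker; the function names, column names, and sample row are hypothetical and do not come from the actual DAG.

```python
import csv
import io

THRESHOLD = 25  # assumed cutoff: counts below this are written as "<25"


def mask_count(value: int, threshold: int = THRESHOLD) -> str:
    """Cast a count to a string, replacing small values with a '<threshold' marker."""
    return f"<{threshold}" if value < threshold else str(value)


def rows_to_csv(rows: list[dict[str, int]], count_columns: list[str]) -> str:
    """Render rows as CSV text, masking only the listed count columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {k: mask_count(v) if k in count_columns else str(v) for k, v in row.items()}
        )
    return buffer.getvalue()


# Hypothetical data lake row with integer (bigint-style) counts.
example_rows = [{"month": 202405, "total_requests": 1200, "unique_user_agents": 12}]
csv_text = rows_to_csv(example_rows, count_columns=["total_requests", "unique_user_agents"])
```

Keeping the integers in the data lake and masking only at export time means the stored table stays numeric, while the published CSV carries strings.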

Jun 4 2024, 12:47 PM · Privacy Engineering

Jun 3 2024

AndrewTavis_WMDE updated the task description for T360296: [Analytics] Implement data process to identify missing Wiktionary entries .
Jun 3 2024, 5:26 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T360296: [Analytics] Implement data process to identify missing Wiktionary entries .
Jun 3 2024, 4:18 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T360296: [Analytics] Implement data process to identify missing Wiktionary entries from In progress to Prioritized backlog on the Wikidata Analytics (Kanban) board.
Jun 3 2024, 3:30 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T360296: [Analytics] Implement data process to identify missing Wiktionary entries .
Jun 3 2024, 2:57 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T360296: [Analytics] Implement data process to identify missing Wiktionary entries .

wmde/analytics/hql/airflow_jobs/wiktionary_cognate on GitLab now has all the needed queries for missing entries, most popular entries and comparing Wiktionaries. Was easier to write all three at once rather than lose some context later. Note that these are Hive queries as the goal is to first migrate them to HDFS.

Jun 3 2024, 2:14 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T342559: [PERIODIC] Monthly repeating tasks (next: August 2024).

Table has been updated with the new data from the most recent DAG run. Lots more user agents - almost a 3x increase. Noting this for now as maybe grounds for further investigation later, but IPs are also increasing (just not by as much).

Jun 3 2024, 9:11 AM · periodic-update, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE renamed T342559: [PERIODIC] Monthly repeating tasks (next: August 2024) from [Analytics] Monthly repeating tasks (next: June 2024) to [Analytics] Monthly repeating tasks (next: July 2024).
Jun 3 2024, 9:03 AM · periodic-update, Wikidata, Wikidata Analytics (Kanban)

May 29 2024

AndrewTavis_WMDE closed T351072: Remove the WDCM clone (stats1007) as Resolved.
May 29 2024, 4:13 PM · Wikidata Dev Team, Wikidata Analytics (Kanban), Puppet, wmde-wikidata-tech, Wikidata, Technical-Debt
AndrewTavis_WMDE closed T351072: Remove the WDCM clone (stats1007), a subtask of T351070: [EPIC] Clean up Wikidata Grafana cronjobs , as Resolved.
May 29 2024, 4:12 PM · Wikidata, Wikidata-Campsite, Wikidata Analytics, Epic
AndrewTavis_WMDE closed T351072: Remove the WDCM clone (stats1007), a subtask of T364965: stat1007 to stat1011 migration pipeline output check, as Resolved.
May 29 2024, 4:12 PM · Wikidata Dev Team, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE added a comment to T351072: Remove the WDCM clone (stats1007).

Perfect, @Lucas_Werkmeister_WMDE! Glad to have this all cleared up :)

May 29 2024, 4:12 PM · Wikidata Dev Team, Wikidata Analytics (Kanban), Puppet, wmde-wikidata-tech, Wikidata, Technical-Debt
AndrewTavis_WMDE closed T364965: stat1007 to stat1011 migration pipeline output check as Resolved.

Sounds good to me! :) Thanks for the help here, @Lucas_Werkmeister_WMDE and @BTullis!

May 29 2024, 4:11 PM · Wikidata Dev Team, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE awarded T351072: Remove the WDCM clone (stats1007) a Like token.
May 29 2024, 4:08 PM · Wikidata Dev Team, Wikidata Analytics (Kanban), Puppet, wmde-wikidata-tech, Wikidata, Technical-Debt
AndrewTavis_WMDE awarded T364965: stat1007 to stat1011 migration pipeline output check a Like token.
May 29 2024, 4:08 PM · Wikidata Dev Team, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE added a comment to T321666: Wiktionary Cognate Dashboard is not accessible [timeboxed 0.5 days].

Hi @Bicolino34 👋 Thanks for reaching out :) We are still working on tasks related to this dashboard - at least bringing back some of the data processes.

May 29 2024, 10:05 AM · Wikidata Analytics, Wikidata-Campsite, User-ItamarWMDE, Cognate, Wikidata
AndrewTavis_WMDE added a comment to T351072: Remove the WDCM clone (stats1007).

Moving this to verification given the work in T364965. Thanks for all of this, @Lucas_Werkmeister_WMDE! Maybe we can resolve this and leave T364965 until stat1007 is deprecated, or resolve both?

May 29 2024, 7:56 AM · Wikidata Dev Team, Wikidata Analytics (Kanban), Puppet, wmde-wikidata-tech, Wikidata, Technical-Debt
AndrewTavis_WMDE added a comment to T364965: stat1007 to stat1011 migration pipeline output check.

None of the files listed in your comment above look like things we should worry about, @Lucas_Werkmeister_WMDE. Similarly that there's a different commit for this, as to my knowledge stat1005 was the main server for the related work.

May 29 2024, 7:54 AM · Wikidata Dev Team, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE moved T364965: stat1007 to stat1011 migration pipeline output check from Prioritized backlog to Product verification on the Wikidata Analytics (Kanban) board.
May 29 2024, 7:47 AM · Wikidata Dev Team, Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE moved T351072: Remove the WDCM clone (stats1007) from Prioritized backlog to Product verification on the Wikidata Analytics (Kanban) board.
May 29 2024, 7:47 AM · Wikidata Dev Team, Wikidata Analytics (Kanban), Puppet, wmde-wikidata-tech, Wikidata, Technical-Debt
AndrewTavis_WMDE closed T365700: Published datasets data release request for Wikidata sitelink segments metrics as Resolved.

Given comments in T365700 - https://phabricator.wikimedia.org/T365699#9838686 - resolving this task as the metrics are for the underlying dataset and not user/editor/reader activity :)

May 29 2024, 7:46 AM · Privacy Engineering
AndrewTavis_WMDE added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

Thanks for confirming, @Htriedman! I'm not seeing the message in T365700, but will quote the above message when I resolve it :) Will resolve this one post MR merge!

May 29 2024, 7:44 AM · Privacy Engineering

May 28 2024

AndrewTavis_WMDE added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

Thanks for the help here, @Htriedman :) I've now changed the SELECT to the following locally as well as switched over the schema to using string:

May 28 2024, 11:59 AM · Privacy Engineering
AndrewTavis_WMDE added a comment to T360296: [Analytics] Implement data process to identify missing Wiktionary entries .

I've been asking around about the data source and connecting the tables and have yet to get concrete answers. Based on general assumptions of the names of the tables/columns though, the path forward for getting missing entries for a Wiktionary will be to:

May 28 2024, 11:52 AM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE updated the task description for T356618: [EPIC] Check of legacy wmde analytics infrastructure.
May 28 2024, 9:16 AM · Epic, Wikidata, Wikidata Analytics (Kanban)

May 24 2024

AndrewTavis_WMDE added a comment to T365699: Published datasets data release request for Wikidata REST API metrics.

Hey @Htriedman! Is very helpful, yes :) Considering the schema creation for the given table, would it then make sense to convert all of the bigint over to string? I'll also add an annotation of what <25 would mean to the schema :)

May 24 2024, 9:59 AM · Privacy Engineering

May 23 2024

AndrewTavis_WMDE changed the status of T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time from Open to Stalled.
May 23 2024, 3:56 PM · Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE changed the status of T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs from Open to Stalled.
May 23 2024, 3:55 PM · Wikidata, Wikidata Analytics (Kanban)
AndrewTavis_WMDE updated the task description for T365699: Published datasets data release request for Wikidata REST API metrics.
May 23 2024, 3:54 PM · Privacy Engineering
AndrewTavis_WMDE updated the task description for T365700: Published datasets data release request for Wikidata sitelink segments metrics.
May 23 2024, 3:54 PM · Privacy Engineering
AndrewTavis_WMDE updated the task description for T365699: Published datasets data release request for Wikidata REST API metrics.
May 23 2024, 3:53 PM · Privacy Engineering
AndrewTavis_WMDE updated the task description for T360296: [Analytics] Implement data process to identify missing Wiktionary entries .
May 23 2024, 3:38 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE moved T360296: [Analytics] Implement data process to identify missing Wiktionary entries from Prioritized backlog to In progress on the Wikidata Analytics (Kanban) board.
May 23 2024, 3:25 PM · Wikidata Integration in Wikimedia projects, Wikidata Analytics (Kanban), Wikidata
AndrewTavis_WMDE added a comment to T364965: stat1007 to stat1011 migration pipeline output check.

Thanks for taking care of this, @Lucas_Werkmeister_WMDE! We'll be able to close both this and T351072 after Tuesday next week if/when the Puppet change is deployed :)

May 23 2024, 3:11 PM · Wikidata Dev Team, Wikidata, Wikidata Analytics (Kanban)