Scope
Understand the complexity of bringing back the data processes for:
- Cognate “I Miss You”
  - The main CSV in https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wiktionary/ should be regenerated. It is unclear which CSV is the right one, but we know it was linked from the legacy dashboard as a download.
  - Generally, people wanted “I Miss You”, “Compare”, and “Most Popular”. We can investigate all three, but are committed to implementing only “I Miss You” for now.
- Usage Dashboard
  - The CSV in https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/WD_percentUsage/ should be regenerated.
Acceptance criteria
- We understand the complexity of #1 (Cognate “I Miss You”).
  - An aggregation process based on the cognate_wiktionary table
- We understand the complexity of #2 (Usage Dashboard).
  - An aggregation process based on the wbc_entity_usage table
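
To make the two aggregation processes more concrete, here is a minimal Python sketch of what each one might compute. It is not the legacy R logic, which this task has yet to review. The row shapes are assumptions: `pages` stands in for (site, title) pairs derived from the Cognate data, and `usage_rows` for (wiki, page_id) pairs derived from wbc_entity_usage. The function names, the ranking rule for “I Miss You”, and the reading of "percent usage" as the share of a wiki's pages with at least one usage row are all hypothetical and need to be checked against the legacy dashboards.

```python
from collections import defaultdict


def i_miss_you(pages, target_site, top_n=10):
    """Rank titles that exist on other sites but are missing on target_site.

    pages: iterable of (site, title) pairs (assumed, flattened input rows).
    Returns up to top_n (title, number_of_sites_having_it) tuples,
    ordered by site count descending, then by title.
    """
    sites_by_title = defaultdict(set)
    for site, title in pages:
        sites_by_title[title].add(site)

    # Keep only titles the target site does not have.
    missing = [
        (title, len(sites))
        for title, sites in sites_by_title.items()
        if target_site not in sites
    ]
    missing.sort(key=lambda row: (-row[1], row[0]))
    return missing[:top_n]


def percent_usage(usage_rows, total_pages):
    """Percentage of each wiki's pages with at least one entity-usage row.

    usage_rows: iterable of (wiki, page_id) pairs (assumed input rows).
    total_pages: dict mapping wiki -> total page count (assumed to come
    from a separate source, since usage rows only list pages with usage).
    """
    used = defaultdict(set)
    for wiki, page_id in usage_rows:
        used[wiki].add(page_id)  # a set deduplicates repeated pages

    return {
        wiki: round(100.0 * len(used[wiki]) / total, 2) if total else 0.0
        for wiki, total in total_pages.items()
    }


if __name__ == "__main__":
    pages = [
        ("enwiktionary", "water"),
        ("dewiktionary", "water"),
        ("dewiktionary", "Haus"),
        ("frwiktionary", "Haus"),
    ]
    print(i_miss_you(pages, "enwiktionary"))
    print(percent_usage([("enwiki", 1), ("enwiki", 2)], {"enwiki": 4}))
```

In production these aggregations would more likely run as queries against the tables themselves. The in-memory version is only meant to pin down the expected output shape for the regenerated CSVs.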
Information below this point is filled out by the Wikidata Analytics team.
Assignee Planning
Information is filled out by the assignee of this task.
Estimation
Estimate: 1.5 days
Actual: 1.5 days
Sub Tasks
Full breakdown of the steps to complete this task:
- Check process to derive data for the Wiktionary Cognate tables
- Find another source for this data, as the R process isn't clear
- Check process to derive data for the Usage Dashboard tables
Data to be used
See Analytics/Data_Lake for the breakdown of the data lake databases and tables.
The following tables will be referenced in this task:
- N/A: all work consists of investigating prior R-based codebases
Notes and Questions
Things that came up during the completion of this task, questions to be answered, and follow-up tasks:
- Note