
[L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra
Closed, Resolved, Public

Description

NOTE: blocked by T299408 and T299789

User story

As a product manager, I want a single Source Of Truth for image suggestions.
As a developer, I need a place to store whether an image suggestion is accepted or rejected, so I can use that data in the future.
So we need to store image suggestions with confidence scores and acceptance/rejection data.


Cassandra will be the Source Of Truth for image suggestions, so once we have all the data we want to store it there.

We want to store all the data in the suggestions, title_cache and instanceof_cache tables in T293808
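
For illustration only, here is a minimal sketch of how one suggestion row might be turned into a parameterized CQL insert. Every field name below (wiki, page_id, image, reason, confidence, accepted) and the build_insert helper are hypothetical; the real columns are whatever T293808 defines. The %s placeholder style is also an assumption and may need adjusting for the client actually used.

```python
from dataclasses import dataclass, astuple, fields
from typing import Optional, Tuple


@dataclass(frozen=True)
class Suggestion:
    # Hypothetical columns; the actual schema lives in T293808.
    wiki: str
    page_id: int
    image: str
    reason: str                      # why the image was suggested
    confidence: float                # confidence score for the suggestion
    accepted: Optional[bool] = None  # None until accepted or rejected


def build_insert(table: str, row: Suggestion) -> Tuple[str, tuple]:
    """Return a parameterized CQL INSERT statement and its bound values."""
    columns = [f.name for f in fields(row)]
    placeholders = ", ".join(["%s"] * len(columns))
    cql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return cql, astuple(row)


if __name__ == "__main__":
    # Made-up values, purely to show the shape of the output.
    example = Suggestion("enwiki", 42, "Example.jpg", "wikidata", 0.9)
    statement, values = build_insert("suggestions", example)
    print(statement)
    print(values)
```

Keeping the values separate from the statement text means they are bound by the client rather than spliced into the CQL string.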

To do this we'll need at least a preliminary dataset from T299789, and will probably use T297934

Event Timeline

Cparle updated the task description.
Cparle updated the task description.
Cparle removed Cparle as the assignee of this task.

Note that we currently have a notebook for gathering the data, but we don't yet have agreement about how to get the data into Cassandra.

CBogen renamed this task from Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra to [L] Push unillustrated articles with their suggestions, suggestion reasons and confidence scores to Cassandra. Mar 23 2022, 4:22 PM

Moving back to ready: preliminary data is available now, thanks @Cparle!

mfossati changed the task status from Open to In Progress. Apr 11 2022, 9:10 AM
  • Manual executions from stat1008's command line populated the target Cassandra tables without problems
  • Two manual executions of the relevant tasks in the Airflow DAG were successful

Merge request at https://gitlab.wikimedia.org/repos/generated-data-platform/datapipelines/-/merge_requests/47