
Sqoop: remove cuc_comment and join to comment table
Open, Medium, Public


Per T166732, the cuc_comment field will be removed, but we're afraid we'll miss the change because we don't have a good way to keep track of that work. This task is a reminder to check the schema before every sqoop run and, once the change is applied, to update our sqoop job and deploy it.
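The pre-sqoop schema check could be automated. Below is a minimal Python sketch. It assumes the cu_changes column list has already been fetched from the replica (for example via MySQL/MariaDB `information_schema.COLUMNS`). It also assumes `cuc_comment_id` is the replacement column pointing at `comment.comment_id`. The helper names are hypothetical and are not part of any existing job.

```python
# Hypothetical pre-sqoop schema check for the cu_changes comment migration.
# The column list would come from the replica, e.g.:
#   SELECT COLUMN_NAME FROM information_schema.COLUMNS
#   WHERE TABLE_SCHEMA = '<wiki_db>' AND TABLE_NAME = 'cu_changes';
# Here it is passed in as a plain list so the logic stands on its own.

LEGACY_COLUMN = "cuc_comment"     # text column slated for removal (T166732)
NEW_COLUMN = "cuc_comment_id"     # ASSUMED replacement: FK to comment.comment_id


def comment_migration_status(columns):
    """Classify the cu_changes schema with respect to the comment migration."""
    cols = set(columns)
    if LEGACY_COLUMN in cols and NEW_COLUMN in cols:
        return "transition"   # both present: current sqoop still works, plan the switch
    if LEGACY_COLUMN in cols:
        return "legacy"       # nothing has changed yet
    if NEW_COLUMN in cols:
        return "migrated"     # cuc_comment is gone: the sqoop query must change
    return "unknown"          # neither column present: inspect by hand


def check_before_sqoop(columns):
    """Return the status, or raise if the current sqoop query would break."""
    status = comment_migration_status(columns)
    if status in ("migrated", "unknown"):
        raise RuntimeError(
            f"cu_changes comment schema is '{status}'; "
            "update the sqoop job before running it"
        )
    return status
```

For example, `check_before_sqoop(["cuc_id", "cuc_comment", "cuc_timestamp"])` returns `"legacy"`. Once `cuc_comment` disappears, the same call raises, which turns "remember to look at the schema" into a failing step.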

Event Timeline

Restricted Application added a subscriber: Aklapper. Mar 7 2019, 4:34 PM
fdans triaged this task as Medium priority. Mar 7 2019, 5:41 PM
fdans moved this task from Incoming to Ops Week on the Analytics board.
Nuria assigned this task to JAllemandou. Aug 29 2019, 5:14 PM
Nuria added a project: Analytics-Kanban.

Changing the sqoop SQL query to join a base table to the comment table has been tested for mediawiki-history, and it led to unacceptable performance (the queries took too long).
To keep run times acceptable, we now sqoop the comment and actor tables separately from the analytics production replicas, and for mediawiki-history we join the base tables to them in Spark.
I suggest using the same approach for cu_changes: wait until the actor and comment tables are available, then join to them.
I'll wait for feedback before starting work in that direction.
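To illustrate the "sqoop separately, join afterwards" approach, here is a minimal plain-Python sketch of the left join that the Spark job would perform on cu_changes. It is not the actual Spark code. In Spark this would be a DataFrame `join` with `how="left"`. The column names `cuc_comment_id` and `cuc_actor` are assumptions about the post-migration cu_changes schema. `comment_id`/`comment_text` and `actor_id`/`actor_name` follow the MediaWiki comment and actor tables. The function names are hypothetical.

```python
def index_by(rows, key):
    """Build a hash index on `key` (the 'build side' of a hash join)."""
    return {row[key]: row for row in rows}


def join_cu_changes(changes, comments, actors):
    """Left-join cu_changes rows to the comment and actor tables.

    ASSUMED columns: cuc_comment_id -> comment.comment_id,
                     cuc_actor      -> actor.actor_id.
    Every change row is kept; missing matches yield None.
    """
    comment_by_id = index_by(comments, "comment_id")
    actor_by_id = index_by(actors, "actor_id")
    joined = []
    for change in changes:
        comment = comment_by_id.get(change.get("cuc_comment_id"))
        actor = actor_by_id.get(change.get("cuc_actor"))
        joined.append({
            **change,
            "comment_text": comment["comment_text"] if comment else None,
            "actor_name": actor["actor_name"] if actor else None,
        })
    return joined


changes = [
    {"cuc_id": 1, "cuc_comment_id": 10, "cuc_actor": 100},
    {"cuc_id": 2, "cuc_comment_id": 99, "cuc_actor": 100},  # no matching comment
]
comments = [{"comment_id": 10, "comment_text": "fix typo"}]
actors = [{"actor_id": 100, "actor_name": "Example"}]
rows = join_cu_changes(changes, comments, actors)
```

A left join is used so that a change whose comment row is missing is kept with a null comment rather than silently dropped. Whether an inner join would be acceptable depends on how complete the sqooped comment and actor snapshots are.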

JAllemandou removed JAllemandou as the assignee of this task. Sep 6 2019, 7:01 AM
JAllemandou moved this task from In Progress to Paused on the Analytics-Kanban board.
JAllemandou added a subscriber: JAllemandou.
Nuria added a subscriber: Nuria. Sep 10 2019, 7:50 PM

cu_changes is always sqooped from production, right?

Nuria added a comment.Sep 10 2019, 8:37 PM

The task to follow already CCs Analytics; that refactor has not started yet.

Nuria moved this task from Ops Week to Smart Tools for Better Data on the Analytics board.

I fixed a broken link: the task we need to follow is now T233004, and it looks like work on it is moving forward, so we'll need a patch here. I'm happy to take this, but I'll wait for grooming before jumping back in.

@Nuria, I also see a bug in Phabricator. You referenced the task by its full URL, which doesn't pick up the task's status. If you use the T-shorthand, like T232531, Phab shows it crossed out, which would have helped catch the miscommunication.