Add change tag tables to monthly mediawiki_history sqoop
Closed, Resolved · Public · 5 Estimated Story Points

Description

We now have change tag data in the Data Lake starting from September 2018 (T201062), but we still need the older tag data in the same schema so the two can be easily unioned (T205932). Product-Analytics will take care of converting it into the proper schema if the raw data can be loaded into the Data Lake.

A one-time sqoop would be enough for this, but we might as well add these tables to the regular monthly mediawiki_history sqoop, since we'll ultimately need them there for T161149.

There is a refactor of the change_tag tables underway (T185355), but the new ct_tag_id column and the change_tag_def table are already being written. As long as we avoid SELECT * FROM change_tag and name the columns explicitly, we won't have to change the workflow again when the refactor lands.
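
To illustrate why an explicit column list keeps the sqoop output stable, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for MariaDB. The two table definitions are hypothetical simplifications: only a few columns are modeled, and the assumption that the refactor eventually drops the old string ct_tag column is made for illustration only. The helper output_columns is likewise hypothetical.

```python
import sqlite3

# Hypothetical, simplified change_tag schemas during and after the T185355
# refactor. Only a subset of columns is modeled; dropping the string ct_tag
# column at the end is an assumption made for illustration.
SCHEMAS = {
    "during_refactor": "CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag TEXT, ct_tag_id INTEGER)",
    "after_refactor": "CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag_id INTEGER)",
}


def output_columns(schema_sql: str, query: str) -> list[str]:
    """Return the column names that `query` produces against `schema_sql`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(schema_sql)
        cursor = conn.execute(query)
        # cursor.description has one 7-tuple per result column; item 0 is the name.
        return [col[0] for col in cursor.description]
    finally:
        conn.close()


for name, ddl in SCHEMAS.items():
    star = output_columns(ddl, "SELECT * FROM change_tag")
    explicit = output_columns(ddl, "SELECT ct_rev_id, ct_tag_id FROM change_tag")
    print(f"{name}: SELECT * -> {star}; explicit -> {explicit}")
```

Under these assumptions the SELECT * output loses a column between the two schemas, while the explicit list produces the same columns in both, so a downstream job reading it would not need to change.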

Essentially, what we want sqooped is:

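SELECT
    database(),  -- name of the wiki database being queried (MySQL/MariaDB)
    ct_rev_id,
    ct_tag_id,
    ctd_name     -- NULL when a ct_tag_id has no change_tag_def row
FROM change_tag
-- LEFT JOIN keeps every change_tag row, even without a matching definition
LEFT JOIN change_tag_def
    ON ct_tag_id = ctd_id;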
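
A minimal sketch of what the query above returns, using toy data and SQLite in place of MariaDB. SQLite has no database() function, so the wiki name is passed in as a bound parameter instead; the name "examplewiki", the tag IDs, and the helper tagged_revisions are all hypothetical. The row whose ct_tag_id has no matching change_tag_def entry still appears, with ctd_name set to None, which is what the LEFT JOIN (rather than an inner join) provides.

```python
import sqlite3

# Toy, hypothetical rows; the real tables live in each wiki's MariaDB database.
CHANGE_TAG = [(101, 1), (102, 2), (103, 99)]  # (ct_rev_id, ct_tag_id); 99 has no definition
CHANGE_TAG_DEF = [(1, "mobile edit"), (2, "visualeditor")]  # (ctd_id, ctd_name)

# SQLite has no database() function, so the wiki name is a bound parameter here.
QUERY = """
SELECT ? AS wiki_db, ct_rev_id, ct_tag_id, ctd_name
FROM change_tag
LEFT JOIN change_tag_def ON ct_tag_id = ctd_id
ORDER BY ct_rev_id
"""


def tagged_revisions(wiki_db: str) -> list[tuple]:
    """Load the toy rows into an in-memory database and run the join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag_id INTEGER)")
        conn.execute("CREATE TABLE change_tag_def (ctd_id INTEGER PRIMARY KEY, ctd_name TEXT)")
        conn.executemany("INSERT INTO change_tag VALUES (?, ?)", CHANGE_TAG)
        conn.executemany("INSERT INTO change_tag_def VALUES (?, ?)", CHANGE_TAG_DEF)
        return conn.execute(QUERY, (wiki_db,)).fetchall()
    finally:
        conn.close()


for row in tagged_revisions("examplewiki"):
    print(row)
```

With an inner JOIN instead, revision 103 would silently disappear from the output; keeping it with a NULL name makes missing definitions visible downstream.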

Event Timeline

Change 467320 had a related patch set uploaded (by Fdans; owner: Fdans):
[operations/puppet@production] Add change_tag to the list of tables to sqoop in cron

https://gerrit.wikimedia.org/r/467320

Change 467320 abandoned by Fdans:
Add change_tag to the list of tables to sqoop in cron

Reason:
Opening a new change that reflects the current status of the modules

https://gerrit.wikimedia.org/r/467320

Change 470593 had a related patch set uploaded (by Fdans; owner: Fdans):
[operations/puppet@production] Add change_tag to list of tables to sqoop

https://gerrit.wikimedia.org/r/470593

Change 470793 had a related patch set uploaded (by Fdans; owner: Fdans):
[analytics/refinery@master] Add change_tag to list of mediawiki tables to be dropped

https://gerrit.wikimedia.org/r/470793

Change 470793 merged by Elukey:
[analytics/refinery@master] Add change_tag to list of mediawiki tables to be dropped

https://gerrit.wikimedia.org/r/470793

Change 470593 merged by Elukey:
[operations/puppet@production] Add change_tag to list of tables to sqoop

https://gerrit.wikimedia.org/r/470593

Change 471689 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] analytics refinery-sqoop-mediawiki: add change_tag table

https://gerrit.wikimedia.org/r/471689

Change 471689 merged by Elukey:
[operations/puppet@production] analytics refinery-sqoop-mediawiki: add change_tag table

https://gerrit.wikimedia.org/r/471689

We tried to sqoop the change_tag table in the 2018-10 sqoop, but it is not working. We will consult with the team and revert the changes if needed.

Change 488927 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Add change_tag and change_tag_def to sqoop script

https://gerrit.wikimedia.org/r/488927

Change 488927 merged by Milimetric:
[analytics/refinery@master] Add change_tag and change_tag_def to sqoop script

https://gerrit.wikimedia.org/r/488927

Change 490828 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Remove old change_tag definition from sqoop script

https://gerrit.wikimedia.org/r/490828

Change 490828 merged by Joal:
[analytics/refinery@master] Correct sqoop script for change_tag

https://gerrit.wikimedia.org/r/490828

Change 491246 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet@production] Update sqoop launchers used by timers

https://gerrit.wikimedia.org/r/491246

Change 491246 merged by Elukey:
[operations/puppet@production] Update sqoop launchers used by timers

https://gerrit.wikimedia.org/r/491246

Change 491838 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Add change_tag and change_tag_def to hive

https://gerrit.wikimedia.org/r/491838

Change 491838 merged by Joal:
[analytics/refinery@master] Add change_tag and change_tag_def to hive

https://gerrit.wikimedia.org/r/491838

Nuria set the point value for this task to 5.