Adapt Sqoop for categorylinks schema change
Closed, Resolved, Public

Description

Sqoop will need to be updated to be compatible with the categorylinks schema changes.

Done is:

  • cl_target_id and cl_collation_id added to Sqoop's categorylinks table definition in python/refinery/sqoop.py
  • Columns cl_to and cl_collation dropped from Sqoop's categorylinks table definition
  • Table collation added to Sqoop
  • Table collation added to Sqoop's Puppet script invocation definition
  • Table categorylinks CREATE HQL script updated with two new columns added
  • Table categorylinks CREATE HQL script updated with two columns dropped
  • Table wmf_raw.mediawiki_categorylinks updated in Hive with two new columns (ALTER TABLE ADD COLUMN...)
  • Table wmf_raw.mediawiki_collation created in Hive
  • Table wmf_raw.mediawiki_collation added to mediawiki history load DAG
  • Table wmf_raw.mediawiki_collation added to bin/refinery-drop-mediawiki-snapshots script
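
As a rough illustration of the Hive step in the list above, the following sketch builds an `ALTER TABLE ... ADD COLUMNS` statement for the two new columns. It is not the HQL from the merged patches: the helper name `build_add_columns_hql`, the column types, and the column comments are assumptions made for this example, and the table name uses the `mediawiki_` prefix seen in the patch titles in the timeline.

```python
def build_add_columns_hql(table, columns):
    """Return a Hive ALTER TABLE ... ADD COLUMNS statement.

    `columns` is a list of (name, hive_type, comment) tuples.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_defs = ",\n".join(
        "    `{}` {} COMMENT '{}'".format(
            name, hive_type, comment.replace("'", "\\'")
        )
        for name, hive_type, comment in columns
    )
    return "ALTER TABLE {} ADD COLUMNS (\n{}\n);".format(table, column_defs)


# Hypothetical column definitions: the types and comments are assumptions,
# not values taken from the refinery repository.
NEW_CATEGORYLINKS_COLUMNS = [
    ("cl_target_id", "bigint", "Assumed: id of the link target"),
    ("cl_collation_id", "int", "Assumed: id of a row in the collation table"),
]

if __name__ == "__main__":
    print(
        build_add_columns_hql(
            "wmf_raw.mediawiki_categorylinks", NEW_CATEGORYLINKS_COLUMNS
        )
    )
```

Generating the statement from a list of tuples keeps the column definitions in one place, so the same list could also feed the CREATE script if the two need to stay in sync.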

Event Timeline

@xcollazo, Dan and I are grooming, and this needs to happen soon. We'd like to spread a bit of knowledge around; would you be willing to make this happen? @Milimetric can help.

No problem.

Change #1174563 had a related patch set uploaded (by Aleksandar Mastilovic; author: Aleksandar Mastilovic):

[analytics/refinery@master] Update Sqoop's schema for the categorylinks table

https://gerrit.wikimedia.org/r/1174563

Change #1175924 had a related patch set uploaded (by Aleksandar Mastilovic; author: Aleksandar Mastilovic):

[operations/puppet@production] Add collation to the list of sqooped table

https://gerrit.wikimedia.org/r/1175924

Change #1174563 merged by Aleksandar Mastilovic:

[analytics/refinery@master] Update Sqoop's schema for the categorylinks table

https://gerrit.wikimedia.org/r/1174563

Change #1175924 merged by Btullis:

[operations/puppet@production] Add collation to the list of sqooped table

https://gerrit.wikimedia.org/r/1175924

I approved the MR above.
And actually while we're at it, we should review if the list here https://github.com/wikimedia/analytics-refinery/blob/master/bin/refinery-drop-mediawiki-snapshots#L85 matches the list of tables we sqoop!
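
The comparison suggested here could be sketched as a simple set difference. This is only an illustration: the function name `compare_table_lists` and the example table lists are hypothetical, and it assumes both lists have already been normalized to the same naming (for instance, with any Hive table prefix stripped).

```python
def compare_table_lists(sqooped_tables, snapshot_drop_tables):
    """Report tables that appear in only one of the two lists."""
    sqooped = set(sqooped_tables)
    dropped = set(snapshot_drop_tables)
    return {
        # Sqooped tables whose snapshots would never be cleaned up.
        "missing_from_drop_list": sorted(sqooped - dropped),
        # Tables in the drop list that are no longer sqooped.
        "not_sqooped": sorted(dropped - sqooped),
    }


if __name__ == "__main__":
    # Hypothetical, abbreviated lists for illustration only.
    sqooped = ["categorylinks", "collation", "page"]
    drop_list = ["categorylinks", "page"]
    print(compare_table_lists(sqooped, drop_list))
```

Any entry under `missing_from_drop_list` would be a table whose snapshots accumulate indefinitely.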

Thank you! Re: the "drop mediawiki snapshots" thing, the new collation table is not on the list, but I'm not quite sure if it should be.

It should indeed be, like any other table we get through Sqoop. We don't want to keep those snapshots forever :)

Change #1177507 had a related patch set uploaded (by Aleksandar Mastilovic; author: Aleksandar Mastilovic):

[analytics/refinery@master] Add wmf_raw.mediawiki_collation table to snapshot maintenance

https://gerrit.wikimedia.org/r/1177507

Change #1177507 merged by Aleksandar Mastilovic:

[analytics/refinery@master] Add wmf_raw.mediawiki_collation table to snapshot maintenance

https://gerrit.wikimedia.org/r/1177507

amastilovic updated the task description.

Change #1182877 had a related patch set uploaded (by Aleksandar Mastilovic; author: Aleksandar Mastilovic):

[analytics/refinery@master] Set cl_to and cl_collation columns in categorylinks to NULL

https://gerrit.wikimedia.org/r/1182877

Change #1182877 merged by Aleksandar Mastilovic:

[analytics/refinery@master] Set cl_to and cl_collation columns in categorylinks to NULL

https://gerrit.wikimedia.org/r/1182877

amastilovic updated the task description.

Change #1185169 had a related patch set uploaded (by Aleksandar Mastilovic; author: Aleksandar Mastilovic):

[analytics/refinery@master] Fix wmf_raw.categorylinks column types

https://gerrit.wikimedia.org/r/1185169

Change #1185169 merged by Aleksandar Mastilovic:

[analytics/refinery@master] Fix wmf_raw.categorylinks column types

https://gerrit.wikimedia.org/r/1185169