Pre-generate mysql ORM code for sqoop
Closed, Resolved | Public | 13 Estimated Story Points

Description

By default, sqoop queries the database it is connected to in order to generate a set of Java (ORM) classes, which it then uses to transfer the data. It also accepts a parameter to use a pre-generated jar instead. Passing that parameter should make our sqoop jobs faster, because the code-generation step is skipped.
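To make the difference concrete, here is a minimal Python sketch that only builds the `sqoop import` command line for one table. `--connect`, `--table`, `--target-dir`, `--jar-file` and `--class-name` are standard sqoop import options; the function name, JDBC URL and paths are hypothetical and not taken from the refinery code.

```python
from typing import List, Optional


def sqoop_import_command(connect: str, table: str, target_dir: str,
                         jar_file: Optional[str] = None,
                         class_name: Optional[str] = None) -> List[str]:
    """Build the argv for a `sqoop import` of one table.

    Without jar_file, sqoop inspects the table's columns and generates,
    compiles and jars an ORM class on every run. With jar_file, code
    generation is skipped and the named class from that jar is used.
    """
    cmd = ["sqoop", "import",
           "--connect", connect,
           "--table", table,
           "--target-dir", target_dir]
    if jar_file is not None:
        # sqoop expects --class-name alongside --jar-file to pick the class.
        if class_name is None:
            raise ValueError("--jar-file needs --class-name")
        cmd += ["--jar-file", jar_file, "--class-name", class_name]
    return cmd


if __name__ == "__main__":
    # Hypothetical connection string and paths, for illustration only.
    print(" ".join(sqoop_import_command(
        "jdbc:mysql://db.example:3306/wiki_a", "page", "/tmp/sqoop/page",
        jar_file="/srv/orm/tables.jar", class_name="page")))
```

The only change between the two modes is the pair of extra flags; everything else about the import stays the same.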

Event Timeline

Milimetric triaged this task as Medium priority. Aug 22 2016, 3:35 PM
Milimetric moved this task from Incoming to Backlog (Later) on the Analytics board.

@Milimetric: is this still relevant for Q2 if we run sqoop once a week?

Oh yeah, it's relevant even if we run sqoop only once, because for every table in every database it repeats the column detection process (so roughly 5000 times every run). I probably should not have skipped it in the first place, but I was afraid I'd find different schemas on different DBs (which is actually the case), so I didn't want to hold up the sqooping task any longer.
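The cost argument above is simple multiplication: without a jar, column detection and codegen run once per table per database on every pass. The Python sketch below counts that on made-up toy schemas and also flags tables whose columns differ between databases, which is the schema concern mentioned above. The data shape, function names and toy data are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical shape: database name -> table name -> ordered column names.
Schemas = Dict[str, Dict[str, Tuple[str, ...]]]


def codegen_runs_per_pass(schemas: Schemas, pregenerated: bool) -> int:
    """How many times sqoop detects columns and generates ORM code in one pass."""
    if pregenerated:
        # Code was generated ahead of time; imports just load the jar.
        return 0
    # Without a jar, every table in every database is inspected again.
    return sum(len(tables) for tables in schemas.values())


def divergent_tables(schemas: Schemas) -> List[str]:
    """Tables whose column list differs between at least two databases.

    One pre-generated class per table is likely only safe to reuse where
    the columns match, so these tables need extra care.
    """
    variants: Dict[str, Set[Tuple[str, ...]]] = {}
    for tables in schemas.values():
        for table, columns in tables.items():
            variants.setdefault(table, set()).add(columns)
    return sorted(t for t, cols in variants.items() if len(cols) > 1)


# Made-up toy data, not real wiki schemas.
TOY_SCHEMAS: Schemas = {
    "wiki_a": {"page": ("id", "title"), "revision": ("id", "page")},
    "wiki_b": {"page": ("id", "title", "extra"), "revision": ("id", "page")},
}

if __name__ == "__main__":
    print(codegen_runs_per_pass(TOY_SCHEMAS, pregenerated=False))  # 4
    print(codegen_runs_per_pass(TOY_SCHEMAS, pregenerated=True))   # 0
    print(divergent_tables(TOY_SCHEMAS))                           # ['page']
```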

  • we need a jar with bindings for the MySQL schema. How do we generate it?
  • we could run the script with a single parameter, --generate-jar, or do this automatically when we deploy refinery; the jars would need to be somewhere the script can find them at runtime. The script needs to run on 1002 to be able to generate bindings from MySQL
  • changing the sqoop job to have a parameter that passes the jar along
  • passing the jar to the sqoop job that will generate the ORM code
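The plan in the list above could look roughly like this hypothetical Python wrapper that exposes the two modes as `--generate-jar` and `--jar-file`. It only builds command lines rather than running them. The argument layout (a `DIR` for `--generate-jar`, a `--tables` list, and using the table name as the class name) is assumed, and packaging the per-table codegen output into a single jar is not shown.

```python
import argparse
import os
from typing import List


def codegen_command(connect: str, table: str, out_dir: str) -> List[str]:
    """`sqoop codegen` generates and compiles the ORM class without importing data.

    Java sources go to <out_dir>/src, compiled output to <out_dir>/bin.
    """
    return ["sqoop", "codegen",
            "--connect", connect,
            "--table", table,
            "--class-name", table,
            "--outdir", os.path.join(out_dir, "src"),
            "--bindir", os.path.join(out_dir, "bin")]


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical sqoop wrapper")
    # The two modes are exclusive: either generate code, or import with a jar.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--generate-jar", metavar="DIR",
                      help="only run codegen, writing output under DIR")
    mode.add_argument("--jar-file", metavar="JAR",
                      help="import using a pre-generated ORM jar")
    parser.add_argument("--connect", required=True)
    parser.add_argument("--tables", nargs="+", required=True)
    return parser.parse_args(argv)


def plan(argv: List[str]) -> List[List[str]]:
    """Return the sqoop commands this wrapper would run, one per table."""
    args = parse_args(argv)
    if args.generate_jar:
        return [codegen_command(args.connect, t, args.generate_jar)
                for t in args.tables]
    commands = []
    for table in args.tables:
        cmd = ["sqoop", "import", "--connect", args.connect, "--table", table]
        if args.jar_file:
            # Skip codegen; --class-name picks the class inside the jar.
            cmd += ["--jar-file", args.jar_file, "--class-name", table]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in plan(["--jar-file", "/srv/orm/tables.jar",
                     "--connect", "jdbc:mysql://db.example/wiki_a",
                     "--tables", "page", "revision"]):
        print(" ".join(cmd))
```

Keeping the modes mutually exclusive means a generation run on the database host never accidentally imports data, and an import run never falls back to per-table codegen.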
Nuria set the point value for this task to 13.

Ping @Milimetric: this is lower priority than our design work, but if you feel you need to grab an item, you could do this one

Change 349723 had a related patch set uploaded (by Milimetric):
[analytics/refinery@master] [WIP] Add just-generate-jar and jar-file options

https://gerrit.wikimedia.org/r/349723

Change 349723 merged by Ottomata:
[analytics/refinery@master] Add --generate-jar and --jar-file options

https://gerrit.wikimedia.org/r/349723

Change 351667 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery@master] Add README.mediawiki-tables-sqoop-orm

https://gerrit.wikimedia.org/r/351667

Change 351857 had a related patch set uploaded (by Milimetric; owner: Milimetric):
[operations/puppet@production] Sqoop using the pre-generated orm jar

https://gerrit.wikimedia.org/r/351857

Change 351857 merged by Elukey:
[operations/puppet@production] Sqoop using the pre-generated orm jar

https://gerrit.wikimedia.org/r/351857

Change 351667 merged by Ottomata:
[analytics/refinery@master] Add README.mediawiki-tables-sqoop-orm

https://gerrit.wikimedia.org/r/351667