clean up page content generation code and file listing methods as prep work for splitting page content generation across multiple servers
Closed, Resolved (Public)

Description

By "clean up" here I mean:

  • refactor everything so it's unit testable and add those tests
  • extricate all the input/output file list methods from the Dump class, moving them into separate classes
  • standardize the args to the file listing methods

This includes methods for selecting prefetch files from previous runs and stub files for the current run.
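As a rough illustration of what "standardize the args" and "separate classes" could look like, here is a minimal sketch. It is not the code from the patches: FileListArgs, OutputFileLister, list_output_files and the filename pattern are all hypothetical names invented for this example. The only detail taken from this ticket is that 'parts' is either None or a list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileListArgs:
    """Hypothetical common argument bundle passed to every file listing method."""
    dump_name: str
    date: Optional[str] = None         # None stands for "the current run" in this sketch
    parts: Optional[List[int]] = None  # None means no parts; otherwise a list of part numbers


class OutputFileLister:
    """Hypothetical lister that lives outside the Dump class so it can be tested alone."""

    def __init__(self, wiki: str, file_ext: str = "bz2"):
        self.wiki = wiki
        self.file_ext = file_ext

    def list_output_files(self, args: FileListArgs) -> List[str]:
        # The filename layout here is illustrative only, not the real dumps naming scheme.
        date = args.date or "current"
        if args.parts is None:
            return [f"{self.wiki}-{date}-{args.dump_name}.xml.{self.file_ext}"]
        return [
            f"{self.wiki}-{date}-{args.dump_name}{part}.xml.{self.file_ext}"
            for part in args.parts
        ]
```

Because every listing method would take the same argument object and hold no reference to a running job, a unit test can construct the lister directly and compare the returned names against expected values.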

Event Timeline

Change 575581 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] cleanup of page content dumps run()

https://gerrit.wikimedia.org/r/575581

Change 575584 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] fix up file list methods

https://gerrit.wikimedia.org/r/575584

Change 575585 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] convert all file list methods to use common args

https://gerrit.wikimedia.org/r/575585

Change 575586 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] move StubProvider out to its own module

https://gerrit.wikimedia.org/r/575586

Change 575587 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] move some dfname/pagerange munging methods to their own class

https://gerrit.wikimedia.org/r/575587

Change 575588 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] move some output file listing methods to their own module

https://gerrit.wikimedia.org/r/575588

Change 575589 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] use only jobFileLister instance methods in other modules

https://gerrit.wikimedia.org/r/575589

Change 575591 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] add some unit tests for prefetch arg generation

https://gerrit.wikimedia.org/r/575591

Because I have already thoroughly tested these patches, and the window for deploys is basically today and tomorrow, I'll merge and deploy tomorrow so that they are in time for the March 1 run. This will let me stack up another set of commits for the March 20th run, rather than bundling both sets together at that point.

Change 575581 merged by ArielGlenn:
[operations/dumps@master] cleanup of page content dumps run()

https://gerrit.wikimedia.org/r/575581

Change 575584 merged by ArielGlenn:
[operations/dumps@master] fix up file list methods

https://gerrit.wikimedia.org/r/575584

Change 575585 merged by ArielGlenn:
[operations/dumps@master] convert all file list methods to use common args

https://gerrit.wikimedia.org/r/575585

Change 575586 merged by ArielGlenn:
[operations/dumps@master] move StubProvider out to its own module

https://gerrit.wikimedia.org/r/575586

Change 575587 merged by ArielGlenn:
[operations/dumps@master] move some dfname/pagerange munging methods to their own class

https://gerrit.wikimedia.org/r/575587

Change 575588 merged by ArielGlenn:
[operations/dumps@master] move some output file listing methods to their own module

https://gerrit.wikimedia.org/r/575588

Change 575589 merged by ArielGlenn:
[operations/dumps@master] use only jobFileLister instance methods in other modules

https://gerrit.wikimedia.org/r/575589

Change 575591 merged by ArielGlenn:
[operations/dumps@master] add some unit tests for prefetch arg generation

https://gerrit.wikimedia.org/r/575591

Mentioned in SAL (#wikimedia-operations) [2020-03-01T06:02:48Z] <ariel@deploy1001> Started deploy [dumps/dumps@8376c62]: refactor page content jobs, prefetch, and output file listings: see T246465

Mentioned in SAL (#wikimedia-operations) [2020-03-01T06:02:53Z] <ariel@deploy1001> Finished deploy [dumps/dumps@8376c62]: refactor page content jobs, prefetch, and output file listings: see T246465 (duration: 00m 04s)

Change 577225 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] rename 'parts' attribute of Dump subclasses to something more accurate

https://gerrit.wikimedia.org/r/577225

Change 577226 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] make value of 'parts' in the file listing methods be None or a list

https://gerrit.wikimedia.org/r/577226

Change 577228 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] New class for output file listing methods to move them out of jobs code

https://gerrit.wikimedia.org/r/577228

Change 578477 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/dumps@master] clean up file list method docs, tighten up code

https://gerrit.wikimedia.org/r/578477

These last patches are ready to go once the Wikidata dumps run finishes up. After that, this ticket can be closed.

Change 577225 merged by ArielGlenn:
[operations/dumps@master] rename 'parts' attribute of Dump subclasses to something more accurate

https://gerrit.wikimedia.org/r/577225

Change 577226 merged by ArielGlenn:
[operations/dumps@master] make value of 'parts' in the file listing methods be None or a list

https://gerrit.wikimedia.org/r/577226

Change 577228 merged by ArielGlenn:
[operations/dumps@master] New class for output file listing methods to move them out of jobs code

https://gerrit.wikimedia.org/r/577228

Change 578477 merged by ArielGlenn:
[operations/dumps@master] clean up file list method docs, tighten up code

https://gerrit.wikimedia.org/r/578477

The new run has started with these patches deployed. If any issues come up during the run, they can be discussed in a new ticket.