Add documentation and tests for `--updatedb` and get-latest-run-date.js
Closed, Declined (Public)

Description

--updatedb tells the script to take its yesterday parameter from the database, using get-latest-run-date.js to look it up. Neither the option nor the script is documented or tested, and both should be.

Acceptance Criteria:

  • README.md documents the --updatedb parameter and get-latest-run-date.js script
  • get-latest-run-date.js is a module
  • get-latest-run-date.js is covered by tests

Event Timeline

Thinking through what happens next in the scenario where the import for a day (say 19990102) is incomplete...

When we attempt to update 19990102 to 19990103, the script will instead update 19990101 to 19990102.

If we again attempt to update 19990102 to 19990103, getImportStatus will return { status: NOT_STARTED, yesterday: 19990102, today: 19990103 } and will run the update - great.

But suppose the next thing we do is try to update 19990103 to 19990104 (maybe because there was no time for another attempt on 3rd Jan, so the next run happens on 4th Jan). getImportStatus will return { status: NOT_STARTED, yesterday: 19990103, today: 19990104 }, and we'll go ahead and make that update, wrongly assuming that the DB matches 19990103. That leads to data drift.

This is prevented by using the --updatedb option, which gets the date from the database. (We are using that option in production!)
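To make the difference concrete, here is a minimal sketch of the scenario above. The function names (previousDay, planUpdate) and the drift flag are hypothetical and not taken from the real script; the sketch only models how yesterday is chosen, and assumes dates are YYYYMMDD strings.

```javascript
// Hypothetical sketch: these helpers are illustrative, not the real script's API.

// Return the day before a YYYYMMDD date string (computed in UTC).
function previousDay(yyyymmdd) {
  const year = Number(yyyymmdd.slice(0, 4));
  const month = Number(yyyymmdd.slice(4, 6));
  const day = Number(yyyymmdd.slice(6, 8));
  const date = new Date(Date.UTC(year, month - 1, day - 1));
  return date.toISOString().slice(0, 10).replace(/-/g, "");
}

// Choose the `yesterday` parameter for an update run.
// useDb = true mimics what --updatedb is described as doing: trust the
// latest run date recorded in the database rather than the calendar.
function planUpdate({ today, dbLatestRunDate, useDb }) {
  const yesterday = useDb ? dbLatestRunDate : previousDay(today);
  return {
    yesterday,
    today,
    // Drift: we are about to apply a diff against a state the DB is not in.
    drift: yesterday !== dbLatestRunDate,
  };
}

// The DB is stuck at 19990102 and the next run happens on 19990104.
console.log(planUpdate({ today: "19990104", dbLatestRunDate: "19990102", useDb: false }));
console.log(planUpdate({ today: "19990104", dbLatestRunDate: "19990102", useDb: true }));
```

In the first call the calendar-derived yesterday (19990103) disagrees with the database, which is the drift case; in the second the database value is used, so the two cannot disagree.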

Given all this, I don't think --updatedb should be an option. The error handling behaviour isn't configurable and relies on getting the date from the database, so getting the date from the database shouldn't be configurable either.