data pipeline: Don't attempt to re-import current day if full import already existed
Closed, Duplicate (Public)

Description

The parent task (T344941) and its merge request (https://gitlab.wikimedia.org/repos/mediawiki/services/ipoid/-/merge_requests/172) were intended to ensure that we stop full imports if there was an error in the previous attempt. That seems to work (T355246), but the cron job for daily updates now does the following:

  1. First scheduled job of the day - do the full import from yesterday to today (good!)
  2. Second scheduled job of the day - there are no errors, but the script erroneously attempts a full import, using the current date for both $yesterday and $today.

Thankfully, diffing the file against itself makes this a no-op, but we want the script to exit early when the previous import had no errors and today's import has already completed.
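
The desired early-exit condition could be sketched roughly as follows. This is only an illustration, not the ipoid implementation: the function name `can_exit_early` and its parameters are hypothetical, and how the script actually records the date of the last completed import or an error in the previous attempt is not stated in this task.

```python
from datetime import date
from typing import Optional


def can_exit_early(
    today: date,
    last_completed_import: Optional[date],
    previous_import_had_errors: bool,
) -> bool:
    """Return True when the scheduled job has nothing left to do today.

    All names here are hypothetical; they stand in for whatever state the
    real script keeps about previous import attempts.
    """
    # If the previous attempt failed, do not skip: that case is handled by
    # the error-recovery logic from the parent task.
    if previous_import_had_errors:
        return False
    # Exit early only if an import for today's date has already concluded.
    # None means no completed import is known, so the job must run.
    return last_completed_import == today


# Second scheduled job on 2024-01-19, after the first job finished cleanly:
print(can_exit_early(date(2024, 1, 19), date(2024, 1, 19), False))
# First scheduled job of the day, when the last import was for the 18th:
print(can_exit_early(date(2024, 1, 19), date(2024, 1, 18), False))
```

With a check like this placed before the download and diff steps, the second run would stop right after the database update check instead of pulling and sorting the feed again.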

Logs from second run of the day via kubectl logs -f ipoid-production-daily-updates-28427820-x6zsr:

{"log.level":"info","@timestamp":"2024-01-19T13:00:14.782Z","process.pid":12,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Update init already run, skipping..."}
{"log.level":"info","@timestamp":"2024-01-19T13:00:14.784Z","process.pid":12,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Update remove-tunnel-anonymous-property already run, skipping..."}
{"log.level":"info","@timestamp":"2024-01-19T13:00:14.785Z","process.pid":12,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Database updated!"}
{"log.level":"debug","@timestamp":"2024-01-19T13:00:15.329Z","process.pid":1,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Starting normal import."}
{"log.level":"info","@timestamp":"2024-01-19T13:00:16.309Z","process.pid":62,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Feed for 20240119 exists, pulling now..."}
{"log.level":"info","@timestamp":"2024-01-19T13:00:43.641Z","process.pid":62,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Feed downloaded to /tmp/ipoid/20240119.json.gz"}
{"log.level":"info","@timestamp":"2024-01-19T13:00:43.744Z","process.pid":73,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Feed for 20240119 already exists, skipping attempt to download."}
{"log.level":"info","@timestamp":"2024-01-19T13:00:43.755Z","process.pid":84,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Removing potential stale files..."}
{"log.level":"info","@timestamp":"2024-01-19T13:00:43.795Z","process.pid":84,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Unzipping yesterday's file..."}
{"log.level":"info","@timestamp":"2024-01-19T13:01:40.605Z","process.pid":84,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Sorting yesterday's file..."}
...
{"log.level":"info","@timestamp":"2024-01-19T13:23:46.924Z","process.pid":84,"host.hostname":"ipoid-production-daily-updates-28427820-x6zsr","ecs.version":"8.10.0","message":"Joining and sorting unique files..."}
sort: cannot read: '/tmp/ipoid/chunk*': No such file or directory