
Backing up wikimetrics data fails if data is written while we back it up
Closed, Resolved (Public)

Description

The run of the hourly script for 2014-07-28 05:00 failed with

tar: /var/lib/wikimetrics/public/69987: file changed as we read it
Error: Either failed to get lock on /data/project/wikimetrics/backup/wikimetrics1/hourly, or tar-ing failed.

I checked locks, and they were properly cleaned up. So it seems the issue
was only that the file was written while we tried to tar it up.

Since we expect more writes over time, should we guard against this
happening again?
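
One possible guard, sketched here in Python rather than in the hourly script's own language: GNU tar documents exit status 1 for a create run in which some files changed while being archived, and exit status 2 for fatal errors. A wrapper could retry only in the first case. The function name, the retry count, and the delay are assumptions for illustration, not taken from the wikimetrics backup script.

```python
import time
from typing import Callable

# GNU tar exit statuses: 0 = success, 1 = some files changed while being
# archived (when creating an archive), 2 = fatal error.
TAR_OK = 0
TAR_FILES_CHANGED = 1


def backup_with_retry(run_tar: Callable[[], int],
                      attempts: int = 3,
                      delay_seconds: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run tar up to `attempts` times, retrying only if files changed mid-read.

    `run_tar` is a hypothetical callable that invokes tar and returns its
    exit status, for example:
        lambda: subprocess.call(["tar", "-czf", dest, src])
    Returns True once a run exits 0, False if every attempt saw changed files.
    Raises RuntimeError on any other exit status.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        status = run_tar()
        if status == TAR_OK:
            return True
        if status != TAR_FILES_CHANGED:
            raise RuntimeError(f"tar failed with exit status {status}")
        if attempt < attempts:
            sleep(delay_seconds)  # give the writer time to finish
    return False
```

Retrying only narrows the race window. If reports are written continuously, copying the directory to a staging area first and archiving the copy may be the more robust option.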


Version: unspecified
Severity: enhancement
Whiteboard: u=AnalyticsEng c=Wikimetrics p=5 s=2014-08-07

Details

Reference
bz68731

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 3:27 AM)
bzimport set Reference to bz68731.

It happened again for the 2014-07-28 14:00 run:

tar: /var/lib/wikimetrics/public/69989: file changed as we read it
tar: /var/lib/wikimetrics/public/69987: file changed as we read it
Error: Either failed to get lock on /data/project/wikimetrics/backup/wikimetrics1/hourly, or tar-ing failed.

While the bug is of course valid as is, I'll stop reporting further
instances for now, as wikimetrics1 seems to be having
more severe issues (bug 68743).
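
The error message in these runs does not say whether the lock or the tar step failed, which is why the locks had to be checked by hand. A minimal Python sketch of reporting the two cases separately follows. How the real script takes its lock is not shown in this report, so the use of `fcntl.flock`, the `run_locked` helper, and the `LockUnavailable` exception are all assumptions.

```python
import fcntl
import os
import subprocess
import sys


class LockUnavailable(Exception):
    """Raised when another process already holds the backup lock."""


def run_locked(lock_path, command):
    """Run `command` while holding an exclusive lock on `lock_path`.

    Returns the command's exit status, so the caller can tell a failed
    command apart from a lock that could not be acquired.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking: fail immediately instead of waiting for the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise LockUnavailable(lock_path) from exc
        return subprocess.call(command)
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

A caller can then log "could not get lock" when `LockUnavailable` is raised and "tar exited with status N" otherwise.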

Change 153388 had a related patch set uploaded by QChris:
Reschedule backups to not interfer with queue runs so easily

https://gerrit.wikimedia.org/r/153388

Change 153395 had a related patch set uploaded by QChris:
Force redis dump before backing up

https://gerrit.wikimedia.org/r/153395

Change 153568 had a related patch set uploaded by QChris:
Make hourly backup keep around known-good full backups in case of issues

https://gerrit.wikimedia.org/r/153568

Change 153388 merged by Ottomata:
Reschedule backups to not interfer with queue runs so easily

https://gerrit.wikimedia.org/r/153388

Change 153568 merged by Ottomata:
Make hourly backup keep around known-good full backups in case of issues

https://gerrit.wikimedia.org/r/153568
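
The idea of keeping a known-good full backup can be illustrated with a small Python sketch. It is not taken from change 153568, whose contents are not shown here. The sketch writes each new archive under a temporary name and replaces the previous good archive only when the build reports success. `promote_if_good`, `build_archive`, and the `.partial` suffix are hypothetical names.

```python
import os
from typing import Callable


def promote_if_good(build_archive: Callable[[str], bool], final_path: str) -> bool:
    """Build an archive at a temporary path and promote it only on success.

    `build_archive` is a hypothetical callable that writes the archive to the
    path it is given and returns True if the result is trustworthy.
    On failure the previous file at `final_path` is left untouched.
    """
    tmp_path = final_path + ".partial"
    ok = False
    try:
        ok = build_archive(tmp_path)
    finally:
        # Discard a failed or interrupted archive so it is never mistaken
        # for a good one.
        if not ok and os.path.exists(tmp_path):
            os.remove(tmp_path)
    if ok:
        # os.replace is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, final_path)
    return ok
```

With this shape, a run that hits "file changed as we read it" costs at most one backup interval and never overwrites the last good archive.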

Change 153395 merged by Ottomata:
Force redis dump before backing up

https://gerrit.wikimedia.org/r/153395

Tested thoroughly on dev, but this of course needs baking time in prod. I wish we had a "READY_TO_DEPLOY" status; that is how bugs should be left at the end of a sprint.