
Make sure rsynced dump status/html files don't contain links to files not yet copied over
Closed, Resolved, Public

Description

We used to write dumps directly to the web server. Now we rsync them over periodically, along with various status files and an index.html file for each wiki and each date that links to every file completed so far. This means the two can now get out of sync. An index.html file can be rewritten with links to files that were completed after the current rsync started, and the index.html itself can then be picked up during that same sweep. The web server then serves links to files that have not been copied over yet.

We'll handle this by tarring up all such files, rsyncing that over as a tarball, and having a little script on the remote end that checks for updates and unpacks as necessary. There are other things we might want that script to do as well, which will be discussed in future tickets.
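A minimal sketch of the producer side follows. It assumes that "status files" means every `.html`, `.txt` and `.json` file under a dumps root directory, and the function name `build_status_tarball` is hypothetical. It is not the code that was actually deployed. The archive is written to a temporary file and then renamed into place, so a concurrent rsync sees either the previous tarball or the complete new one, never a half-written file.

```python
import os
import tarfile
import tempfile
from pathlib import Path

# Assumed set of status-file suffixes; the ticket only says
# "plaintext, html and json output".
STATUS_SUFFIXES = {".html", ".txt", ".json"}


def build_status_tarball(dumps_root: str, tarball_path: str) -> int:
    """Pack every status file under dumps_root into tarball_path (gzipped).

    Returns the number of files packed. The tarball is built under a
    temporary name in the destination directory and atomically renamed,
    so readers never observe a partial archive.
    """
    root = Path(dumps_root)
    dest_dir = os.path.dirname(os.path.abspath(tarball_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    os.close(fd)
    count = 0
    try:
        with tarfile.open(tmp_path, "w:gz") as tar:
            # Materialize the file list first so files appearing mid-walk
            # (including our own temp file) cannot change the iteration.
            for path in sorted(root.rglob("*")):
                if path.is_file() and path.suffix in STATUS_SUFFIXES:
                    # Store paths relative to the dumps root so the remote
                    # end can unpack into its own copy of the tree.
                    tar.add(str(path), arcname=str(path.relative_to(root)))
                    count += 1
        os.replace(tmp_path, tarball_path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise
    return count
```

Because the status files are snapshotted at tarball time, any file an index.html links to already existed when the tarball was built. Building the tarball before the data rsync pass starts is one way (an assumption here, not stated in the ticket) to ensure those files are copied no later than the page that links to them.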

TODO:

  • compression for siteinfo-namespaces and abstracts output
  • generate tarball of status files (all plaintext, html and json output) for rsync
  • job on the remote host that checks the tarball's timestamp every 5 minutes and unpacks it if it's new. Or something like that.
  • stop rsyncing the status files along with the dump output files (add --exclude=*.json etc)
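The remote-side job from the TODO list could look roughly like the sketch below. The function name `unpack_if_new` and the use of a separate stamp file to remember the last unpacked mtime are assumptions for illustration; the job actually deployed in change 392875 may work differently. The sketch unpacks whenever the mtime *differs* from the recorded one rather than only when it is larger, which also copes with the timestamp moving backwards. It refuses members that would land outside the destination directory.

```python
import os
import tarfile


def unpack_if_new(tarball_path: str, dest_dir: str, stamp_path: str) -> bool:
    """Unpack tarball_path into dest_dir if its mtime changed since last run.

    The tarball's mtime (nanoseconds) is written to stamp_path after a
    successful unpack. Returns True if the tarball was unpacked.
    """
    try:
        mtime = os.stat(tarball_path).st_mtime_ns
    except FileNotFoundError:
        return False  # nothing has been rsynced over yet
    try:
        with open(stamp_path) as f:
            last = int(f.read().strip() or 0)
    except FileNotFoundError:
        last = 0
    if mtime == last:
        return False  # same tarball as last time; nothing to do

    dest = os.path.realpath(dest_dir)
    with tarfile.open(tarball_path, "r:*") as tar:
        members = tar.getmembers()
        for m in members:
            target = os.path.realpath(os.path.join(dest, m.name))
            # Only plain files/dirs, and only inside dest_dir.
            if not (m.isfile() or m.isdir()) or os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"unsafe member in tarball: {m.name}")
        tar.extractall(dest, members=members)

    # Record the stamp only after a successful unpack, so a failed run
    # is retried on the next invocation.
    with open(stamp_path, "w") as f:
        f.write(str(mtime))
    return True
```

Run every 5 minutes, for example from a crontab entry with the schedule `*/5 * * * *`, this gives the "check the timestamp and unpack if new" behaviour described above. The status files would also need to be excluded from the main data rsync, as the last TODO item says.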

Event Timeline

ArielGlenn triaged this task as Medium priority. Nov 6 2017, 6:57 PM
ArielGlenn created this task.

Change 389667 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] switch dumps monitor to read status from and write results to dumpsdata host

https://gerrit.wikimedia.org/r/389667

Change 389667 merged by ArielGlenn:
[operations/puppet@production] switch dumps monitor to read status from and write results to dumpsdata host

https://gerrit.wikimedia.org/r/389667

Change 392875 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] rsync all dumps status files to web servers and unpack them periodically

https://gerrit.wikimedia.org/r/392875

Change 392875 merged by ArielGlenn:
[operations/puppet@production] rsync all dumps status files to web servers and unpack them periodically

https://gerrit.wikimedia.org/r/392875

Done, deployed, runs, closing this ticket.