
cron job rsyncing dumps webserver logs to stat1005 is broken
Closed, Resolved · Public · 3 Estimated Story Points

Description

Seeing cronspam from labstore1006 and labstore1007:

Cron <root@labstore1007> /usr/bin/rsync -rt --perms --chmod=go+r --bwlimit=50000 /var/log/nginx/*.gz stat1005.eqiad.wmnet::srv/log/webrequest/archive/dumps.wikimedia.org/

...
rsync: mkstemp "/log/webrequest/archive/dumps.wikimedia.org/.access.log-20181107.gz.o34AbA" (in srv) failed: Permission denied (13)
rsync: mkstemp "/log/webrequest/archive/dumps.wikimedia.org/.access.log-20181108.gz.HL6EXm" (in srv) failed: Permission denied (13)

etc.
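The file names in these errors follow a recognizable pattern. rsync writes each incoming file to a hidden temp file (`.<name>.<random suffix>`) in the destination directory and renames it once the transfer completes. "(in srv)" names the rsync daemon module the path is relative to, and errno 13 is EACCES. So the receiving daemon was not allowed to create files in that directory at all. Below is a minimal sketch for pulling those fields out of cronspam lines when triaging. The regex is inferred only from the two lines above, and `parse_mkstemp_failure` and `MkstempFailure` are hypothetical names, not part of any existing tooling.

```python
import errno
import re
from typing import NamedTuple, Optional

# Pattern inferred from the cronspam above (an assumption, not rsync's spec):
#   rsync: mkstemp "<dir>/.<name>.<suffix>" (in <module>) failed: <reason> (<errno>)
MKSTEMP_RE = re.compile(
    r'rsync: mkstemp "(?P<dir>.*)/\.(?P<name>.+)\.(?P<suffix>[A-Za-z0-9]+)" '
    r'\(in (?P<module>[^)]+)\) failed: (?P<reason>.+) \((?P<code>\d+)\)'
)


class MkstempFailure(NamedTuple):
    module: str      # rsync daemon module, e.g. "srv"
    directory: str   # destination dir, relative to the module root
    filename: str    # the real file name, without the temp-file dot and suffix
    errno_code: int  # numeric errno reported by the receiver
    errno_name: str  # symbolic name, e.g. "EACCES"
    reason: str      # strerror text, e.g. "Permission denied"


def parse_mkstemp_failure(line: str) -> Optional[MkstempFailure]:
    """Return the parsed failure, or None if the line is not an mkstemp error."""
    m = MKSTEMP_RE.match(line.strip())
    if m is None:
        return None
    code = int(m.group("code"))
    return MkstempFailure(
        module=m.group("module"),
        directory=m.group("dir"),
        filename=m.group("name"),
        errno_code=code,
        errno_name=errno.errorcode.get(code, "UNKNOWN"),
        reason=m.group("reason"),
    )
```

Fed the first error line above, this yields module `srv`, directory `/log/webrequest/archive/dumps.wikimedia.org`, file `access.log-20181107.gz` and `EACCES`. That points at write permission on the receiving side rather than at the sender.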

Event Timeline

ArielGlenn triaged this task as Medium priority. Dec 6 2018, 12:47 PM
ArielGlenn created this task.

The commit that broke the rsync is this one: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/471035/
(BTW, that commit also introduced a typo in compute.pp, "publshed", which is still present in the manifests.)

Having rsyncs to and from the webserver configured on a bunch of other hosts proved unmanageable, so all rsyncs to and from labstore1006/7 are managed on the labstore side; we learned this during the migration from the old dataset host.

The rsyncs were originally set up for this ticket: https://phabricator.wikimedia.org/T119070
I don't know whether broader reports are generated for downloads of XML/SQL dumps, though it seems plausible.

Change 478022 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::statistics::private: allow labsdb to push nginx logs

https://gerrit.wikimedia.org/r/478022

Change 478022 merged by Elukey:
[operations/puppet@production] profile::statistics::private: allow labstore to push nginx logs

https://gerrit.wikimedia.org/r/478022

fdans raised the priority of this task from Medium to High. Dec 10 2018, 5:06 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

I'm not getting cronspam about this; is it still a problem? Also, I thought stat1005 was basically gone now.

@ArielGlenn after https://gerrit.wikimedia.org/r/478022 we should be OK. I think it is fine to leave this rule as is; thoughts, @Ottomata? stat1005 now has a separate role (dedicated to GPU testing), so the rsyncs now target stat1007.
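Since the destination host moved once already, a rough sketch of the same push with the target host as a parameter may help. It mirrors the flags of the cron entry in the description: `-rt` (recurse, preserve mtimes), `--perms`, `--chmod=go+r`, and `--bwlimit` in KiB/s. The `host::module/path` form talks to an rsync daemon, so the receiver's module config has to accept writes from labstore. This assumes only the destination host changed and that `stat1007.eqiad.wmnet` follows the same naming as stat1005. The real command is puppet-managed, and `build_rsync_argv` / `expand_sources` are hypothetical helpers.

```python
import glob
import shlex
from typing import List


def expand_sources(pattern: str = "/var/log/nginx/*.gz") -> List[str]:
    """Expand the glob the cron line leaves to the shell (sorted for stable order)."""
    return sorted(glob.glob(pattern))


def build_rsync_argv(dest_host: str, sources: List[str],
                     module: str = "srv",
                     dest_path: str = "log/webrequest/archive/dumps.wikimedia.org/",
                     bwlimit_kib_per_s: int = 50000) -> List[str]:
    """Build argv for pushing nginx logs to an rsync daemon module (hypothetical helper)."""
    if not sources:
        raise ValueError("no source files to push")
    return [
        "/usr/bin/rsync", "-rt", "--perms", "--chmod=go+r",
        f"--bwlimit={bwlimit_kib_per_s}",
        *sources,
        # Double colon = rsync daemon syntax: <host>::<module>/<path>
        f"{dest_host}::{module}/{dest_path}",
    ]


def format_command(argv: List[str]) -> str:
    """Render argv as a shell-safe command line, e.g. for a cron entry or a log."""
    return shlex.join(argv)
```

For example, `format_command(build_rsync_argv("stat1007.eqiad.wmnet", ["/var/log/nginx/access.log-20181107.gz"]))` gives `/usr/bin/rsync -rt --perms --chmod=go+r --bwlimit=50000 /var/log/nginx/access.log-20181107.gz stat1007.eqiad.wmnet::srv/log/webrequest/archive/dumps.wikimedia.org/`. Passing an argv list to `subprocess.run` would avoid relying on the shell for glob expansion.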

elukey set the point value for this task to 3.