Connection timeout from tin to new ores servers
Closed, ResolvedPublic

Description

I'm not sure if this is a scap or network thing:

scap deploy-log -f scap/log/scap-sync-2017-11-29-0003-1-g532bd0b.log

Related Objects

There are a very large number of changes, so older changes are hidden. Show Older Changes

So, scap closed the connection, 1m, 6s after the login. scap complained about deploy-local failing 50 secs after successful SSH login, but way before the SSH connection closing.

Interesting...

How do I get more verbosity, i.e. for ssh?

Hm, judging by https://phabricator.wikimedia.org/source/scap/browse/master/scap/ssh.py;0d2afa233f3ee489ddbfb9e7dbfaf467452e50db$41, it should probably require some changes in scap's code so the scap -v would propagate to SSH as well.

That sounds like a thoroughly sensible change. I'll create a patch now.

I'd like to push the latest scap code to production this week if I can get an opsen to upload the package. I'll create a new release tag today. That will at least help to debug this issue since -v will be passed on to ssh as suggested by @akosiaris above.

I can handle that. I 'll try and build it tomorrow and upload it if successful

This is affecting me in production, now:

Timeout, server scb2004.codfw.wmnet not responding.

16:01:39 connection to scb2004.codfw.wmnet failed and future stages will not be attempted for this target

@awight: can you try it now with the -v flag, scap should include the ssh logs in verbose output now.

Mentioned in SAL (#wikimedia-operations) [2017-12-11T19:17:56Z] <awight@tin> Started deploy [ores/deploy@1c0ede0]: (non-production) Testing parallel ORES deployment, T181661

Mentioned in SAL (#wikimedia-operations) [2017-12-11T19:19:07Z] <awight@tin> Finished deploy [ores/deploy@1c0ede0]: (non-production) Testing parallel ORES deployment, T181661 (duration: 01m 12s)

Done. Logs are in scap-sync-2017-12-09-0004.log (sic., note that the dates are still misleading)

Thanks for this feature, it looks helpful!

mmodell added a comment.EditedDec 11 2017, 7:28 PM

Something really strange is going on...

debug1: Offering RSA public key: /etc/keyholder.d/deploy_service.pub
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug1: Authentication succeeded (publickey).
Authenticated to ores1004.eqiad.wmnet ([10.64.16.95]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending command: /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config
http://tin.eqiad.wmnet/ores/deploy/.git
Timeout, server ores1004.eqiad.wmnet not responding.

19:18:57 [tin] connection to ores1004.eqiad.wmnet failed and future stages will not be attempted for this target

So it would appear that this deploy-local command freezes: /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config http://tin.eqiad.wmnet/ores/deploy/.git

@thcipriani, @akosiaris does that seem right to you or am I missing something?

I'd like to see the corresponding syslog on the target machine, there might be clues in /var/log/auth.log /var/log/daemon.log

Pardon me, but I have to ask why a file with timestamps in the log file dating Dec 11th, and with a local fs system timestamp (on tin) of December 11th as well is named 2017-12-09 giving the impression it was created on Dec 9th. I guess a bug ?

In any case. daemon.log is not really expected to hold anything of significant value for Dec 11th 19:00-19:30 but here goes

Dec 11 19:03:15 ores1004 puppet-agent[7736]: Retrieving pluginfacts
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Retrieving plugin
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Loading facts
Dec 11 19:03:27 ores1004 puppet-agent[7736]: Caching catalog for ores1004.eqiad.wmnet
Dec 11 19:03:28 ores1004 puppet-agent[7736]: Applying configuration version '1513018998'
Dec 11 19:03:30 ores1004 systemd[1]: Reloading.
Dec 11 19:03:30 ores1004 systemd[1]: Started ACPI event daemon.
Dec 11 19:03:30 ores1004 systemd[1]: Listening on ACPID Listen Socket.
Dec 11 19:03:30 ores1004 systemd[1]: Mounted /.
Dec 11 19:03:36 ores1004 puppet-agent[7736]: Finished catalog run in 8.37 seconds
Dec 11 19:22:44 ores1004 dbus[921]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Dec 11 19:22:44 ores1004 systemd[1]: Starting Time & Date Service...
Dec 11 19:22:44 ores1004 systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument
Dec 11 19:22:44 ores1004 dbus[921]: [system] Successfully activated service 'org.freedesktop.timedate1'
Dec 11 19:22:44 ores1004 systemd-timedated[13695]: /etc/localtime should be a symbolic link to a time zone data file in /usr/share/zoneinfo/.
Dec 11 19:22:44 ores1004 systemd[1]: Started Time & Date Service.
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Retrieving pluginfacts
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Retrieving plugin
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Loading facts
Dec 11 19:33:28 ores1004 puppet-agent[16060]: Caching catalog for ores1004.eqiad.wmnet
Dec 11 19:33:30 ores1004 puppet-agent[16060]: Applying configuration version '1513020803'
Dec 11 19:33:31 ores1004 systemd[1]: Reloading.
Dec 11 19:33:31 ores1004 systemd[1]: Started ACPI event daemon.
Dec 11 19:33:31 ores1004 systemd[1]: Listening on ACPID Listen Socket.
Dec 11 19:33:31 ores1004 systemd[1]: Mounted /.
Dec 11 19:33:37 ores1004 puppet-agent[16060]: Finished catalog run in 7.96 seconds
Dec 11 19:52:44 ores1004 dbus[921]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Dec 11 19:52:44 ores1004 systemd[1]: Starting Time & Date Service...
Dec 11 19:52:44 ores1004 systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument
Dec 11 19:52:44 ores1004 dbus[921]: [system] Successfully activated service 'org.freedesktop.timedate1'
Dec 11 19:52:44 ores1004 systemd-timedated[20231]: /etc/localtime should be a symbolic link to a time zone data file in /usr/share/zoneinfo/.
Dec 11 19:52:44 ores1004 systemd[1]: Started Time & Date Service.
Dec 11 20:04:05 ores1004 puppet-agent[22822]: Retrieving pluginfacts

auth.log is just a tad more useful, just a repeat of what we 've seen before

Dec 11 19:17:50 ores1004 sshd[11976]: Set /proc/self/oom_score_adj to 0
Dec 11 19:17:50 ores1004 sshd[11976]: Connection from 208.80.155.119 port 46296 on 10.64.16.95 port 22
Dec 11 19:17:50 ores1004 sshd[11976]: Connection closed by 208.80.155.119 [preauth]
Dec 11 19:17:55 ores1004 sshd[11978]: Set /proc/self/oom_score_adj to 0
Dec 11 19:17:55 ores1004 sshd[11978]: Connection from 208.80.153.74 port 45262 on 10.64.16.95 port 22
Dec 11 19:17:55 ores1004 sshd[11978]: Connection closed by 208.80.153.74 [preauth]
Dec 11 19:17:56 ores1004 sshd[11980]: Set /proc/self/oom_score_adj to 0
Dec 11 19:17:56 ores1004 sshd[11980]: Connection from 10.64.0.196 port 57196 on 10.64.16.95 port 22
Dec 11 19:17:56 ores1004 sshd[11980]: Postponed publickey for deploy-service from 10.64.0.196 port 57196 ssh2 [preauth]
Dec 11 19:17:56 ores1004 sshd[11980]: Accepted publickey for deploy-service from 10.64.0.196 port 57196 ssh2: RSA 6d:54:92:8b:39:10:f5:9b:84:40:36:ef:3c:9a:6d:d8
Dec 11 19:17:56 ores1004 sshd[11980]: pam_unix(sshd:session): session opened for user deploy-service by (uid=0)
Dec 11 19:17:56 ores1004 sshd[11980]: User child is on pid 11995
Dec 11 19:17:56 ores1004 sshd[11995]: Starting session: command for deploy-service from 10.64.0.196 port 57196
Dec 11 19:18:01 ores1004 sudo:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ferm
Dec 11 19:18:01 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:18:01 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:18:01 ores1004 CRON[12265]: pam_unix(cron:session): session opened for user prometheus by (uid=0)
Dec 11 19:18:01 ores1004 CRON[12265]: pam_unix(cron:session): session closed for user prometheus
Dec 11 19:18:04 ores1004 sudo:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/lib/nagios/plugins/check_raid md
Dec 11 19:18:04 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:18:04 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:18:06 ores1004 sudo:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ferm
Dec 11 19:18:06 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:18:06 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:18:17 ores1004 sudo:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/lib/nagios/plugins/check_raid md
Dec 11 19:18:17 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:18:17 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:18:50 ores1004 sshd[12515]: Set /proc/self/oom_score_adj to 0
Dec 11 19:18:50 ores1004 sshd[12515]: Connection from 208.80.155.119 port 58488 on 10.64.16.95 port 22
Dec 11 19:18:50 ores1004 sshd[12515]: Connection closed by 208.80.155.119 [preauth]
Dec 11 19:18:55 ores1004 sshd[12517]: Set /proc/self/oom_score_adj to 0
Dec 11 19:18:55 ores1004 sshd[12517]: Connection from 208.80.153.74 port 57728 on 10.64.16.95 port 22
Dec 11 19:18:55 ores1004 sshd[12517]: Connection closed by 208.80.153.74 [preauth]
Dec 11 19:19:01 ores1004 sudo:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ferm
Dec 11 19:19:01 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:19:01 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:19:01 ores1004 CRON[12550]: pam_unix(cron:session): session opened for user prometheus by (uid=0)
Dec 11 19:19:02 ores1004 CRON[12550]: pam_unix(cron:session): session closed for user prometheus
Dec 11 19:19:04 ores1004 sudo:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/lib/nagios/plugins/check_raid md
Dec 11 19:19:04 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:19:04 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:19:06 ores1004 sudo:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ferm
Dec 11 19:19:06 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:19:06 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:19:13 ores1004 sshd[11995]: Connection closed by 10.64.0.196
Dec 11 19:19:13 ores1004 sshd[11995]: Transferred: sent 76296, received 3484 bytes
Dec 11 19:19:13 ores1004 sshd[11995]: Closing connection to 10.64.0.196 port 57196
Dec 11 19:19:13 ores1004 sshd[11980]: pam_unix(sshd:session): session closed for user deploy-service

Since syslog was mentioned, here it is as well

Dec 11 19:00:01 ores1004 CRON[7031]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:01:01 ores1004 CRON[7244]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:02:01 ores1004 CRON[7464]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:03:01 ores1004 CRON[7690]: (root) CMD (/usr/local/sbin/puppet-run > /dev/null 2>&1)
Dec 11 19:03:01 ores1004 CRON[7691]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Retrieving pluginfacts
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Retrieving plugin
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Loading facts
Dec 11 19:03:27 ores1004 puppet-agent[7736]: Caching catalog for ores1004.eqiad.wmnet
Dec 11 19:03:28 ores1004 puppet-agent[7736]: Applying configuration version '1513018998'
Dec 11 19:03:30 ores1004 systemd[1]: Reloading.
Dec 11 19:03:30 ores1004 systemd[1]: Started ACPI event daemon.
Dec 11 19:03:30 ores1004 systemd[1]: Listening on ACPID Listen Socket.
Dec 11 19:03:30 ores1004 systemd[1]: Mounted /.
Dec 11 19:03:32 ores1004 crontab[8226]: (root) LIST (root)
Dec 11 19:03:32 ores1004 crontab[8228]: (root) LIST (prometheus)
Dec 11 19:03:36 ores1004 puppet-agent[7736]: Finished catalog run in 8.37 seconds
Dec 11 19:04:01 ores1004 CRON[8868]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:05:01 ores1004 CRON[9089]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 11 19:05:01 ores1004 CRON[9090]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:06:01 ores1004 CRON[9302]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:07:01 ores1004 CRON[9530]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:08:01 ores1004 CRON[9762]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:09:01 ores1004 CRON[9975]: (root) CMD (  [ -x /usr/lib/php5/sessionclean ] && /usr/lib/php5/sessionclean)
Dec 11 19:09:01 ores1004 CRON[9976]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:10:01 ores1004 CRON[10222]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:11:01 ores1004 CRON[10433]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:12:02 ores1004 CRON[10660]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:13:01 ores1004 CRON[10888]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:14:01 ores1004 CRON[11109]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:15:01 ores1004 CRON[11329]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 11 19:15:01 ores1004 CRON[11330]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:16:01 ores1004 CRON[11553]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:17:01 ores1004 CRON[11782]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Dec 11 19:17:01 ores1004 CRON[11783]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:18:01 ores1004 CRON[12266]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:19:01 ores1004 CRON[12551]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:20:01 ores1004 CRON[12765]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:21:01 ores1004 CRON[13215]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:22:01 ores1004 CRON[13522]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:22:44 ores1004 dbus[921]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Dec 11 19:22:44 ores1004 systemd[1]: Starting Time & Date Service...
Dec 11 19:22:44 ores1004 systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument
Dec 11 19:22:44 ores1004 dbus[921]: [system] Successfully activated service 'org.freedesktop.timedate1'
Dec 11 19:22:44 ores1004 systemd-timedated[13695]: /etc/localtime should be a symbolic link to a time zone data file in /usr/share/zoneinfo/.
Dec 11 19:22:44 ores1004 systemd[1]: Started Time & Date Service.
Dec 11 19:23:01 ores1004 CRON[13781]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:24:01 ores1004 CRON[14005]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:25:01 ores1004 CRON[14210]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 11 19:25:01 ores1004 CRON[14211]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:26:01 ores1004 CRON[14431]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:27:01 ores1004 CRON[14657]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:28:01 ores1004 CRON[14900]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:29:01 ores1004 CRON[15121]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:30:01 ores1004 CRON[15341]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:31:01 ores1004 CRON[15560]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:32:01 ores1004 CRON[15786]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:32:17 ores1004 systemd-timesyncd[791]: interval/delta/delay/jitter/drift 2048s/+0.000s/0.000s/0.000s/+8ppm
Dec 11 19:33:01 ores1004 CRON[15999]: (root) CMD (/usr/local/sbin/puppet-run > /dev/null 2>&1)
Dec 11 19:33:01 ores1004 CRON[16004]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Retrieving pluginfacts
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Retrieving plugin
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Loading facts
Dec 11 19:33:28 ores1004 puppet-agent[16060]: Caching catalog for ores1004.eqiad.wmnet
Dec 11 19:33:30 ores1004 puppet-agent[16060]: Applying configuration version '1513020803'
Dec 11 19:33:31 ores1004 systemd[1]: Reloading.
Dec 11 19:33:31 ores1004 systemd[1]: Started ACPI event daemon.
Dec 11 19:33:31 ores1004 systemd[1]: Listening on ACPID Listen Socket.
Dec 11 19:33:31 ores1004 systemd[1]: Mounted /.
Dec 11 19:33:33 ores1004 crontab[16520]: (root) LIST (root)
Dec 11 19:33:33 ores1004 crontab[16522]: (root) LIST (prometheus)
Dec 11 19:33:37 ores1004 puppet-agent[16060]: Finished catalog run in 7.96 seconds
Dec 11 19:34:01 ores1004 CRON[17085]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)

Now for the more interesting stuff. I 've tried to run /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config manually as deploy-service on ores1004.eqiad.wmnet and it croaks with a stacktrace

{"name": "deploy-local", "created": 1513091889.277656, "args": [], "msecs": 277.65607833862305, "filename": "cli.py", "levelno": 30, "msg": "Unhandled error:", "lineno": 230, "exc_text": "Traceback (most recent call last):\n  File \"/usr/lib/python2.7/dist-packages/scap/cli.py\", line 329, in run\n    exit_status = app.main(app.extra_arguments)\n  File \"/usr/lib/python2.7/dist-packages/scap/deploy.py\", line 146, in main\n    getattr(self, stage)()\n  File \"/usr/lib/python2.7/dist-packages/scap/deploy.py\", line 316, in fetch\n    git.checkout(rev_dir, self.rev)\n  File \"/usr/lib/python2.7/dist-packages/scap/git.py\", line 349, in checkout\n    git.checkout('--force', '--quiet', rev)\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 1428, in __call__\n    return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 775, in __init__\n    self.wait()\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 793, in wait\n    self.handle_command_exit_code(exit_code)\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 816, in handle_command_exit_code\n    raise exc\nErrorReturnCode_128: \n\n  RAN: /usr/bin/git checkout --force --quiet b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n\n  STDOUT:\n\n\n  STDERR:\nfatal: reference is not a tree: b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n", "funcName": "_handle_exception", "relativeCreated": 1288.093090057373}
{"name": "deploy-local", "created": 1513091889.280495, "args": ["deploy-local", "ErrorReturnCode_128", {"full_cmd": "/usr/bin/git checkout --force --quiet b4f2b0249a60754cfa4e18650fdd45c62ef4a09a", "stderr": "fatal: reference is not a tree: b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n", "stdout": ""}], "msecs": 280.49492835998535, "filename": "cli.py", "levelno": 40, "msg": "%s failed: <%s> %s", "lineno": 232, "exc_text": null, "funcName": "_handle_exception", "relativeCreated": 1290.9319400787354}

simply put b4f2b0249a60754cfa4e18650fdd45c62ef4a09a does not exist in whichever repo scap tries to find it at. For some more information

deploy-service@ores1004:/srv/deployment/ores/deploy-cache$ ls -la . revs/
.:
total 20
drwxr-xr-x  4 deploy-service deploy-service 4096 Dec  1 14:24 .
drwxr-xr-x  4 deploy-service root           4096 Dec  1 14:23 ..
-rw-r--r--  1 deploy-service deploy-service 1851 Dec 12 15:18 .config
lrwxrwxrwx  1 deploy-service deploy-service   45 Dec  1 14:24 .done -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec 12 15:18 cache
lrwxrwxrwx  1 deploy-service deploy-service   45 Dec  1 14:22 current -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 revs

revs/:
total 40
drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 .
drwxr-xr-x  4 deploy-service deploy-service 4096 Dec  1 14:24 ..
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov  6 14:40 29905e509984050a2ea675725f3c110033fc67a0
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov 29 18:04 532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec  5 15:53 6baed71ef9a02264660f8637fb2313c9c560f71d
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov  6 17:33 82a13ae1173ec570ea563f389ab22e1a69aa154a
drwxr-xr-x  6 deploy-service deploy-service 4096 Oct 26 16:43 971be2294a51772d309c967465b32f4f9a9b66f5
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec  1 20:18 9afbf148ec3007a6787ab099978bf5a08360879d
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec 11 19:21 b4f2b0249a60754cfa4e18650fdd45c62ef4a09a
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov 29 13:17 e58bfbf2827be1ac453bc3d07dfe7fe41a08dc66

So the object from a first look does exist in the deploy-cache. However...

deploy-service@ores1004:/srv/deployment/ores/deploy-cache/revs/b4f2b0249a60754cfa4e18650fdd45c62ef4a09a$ git show 
commit 82dfd5652c1550dbee362100204d7aa5242e59ab
Author: halfak <aaron.halfaker@gmail.com>
Date:   Mon Jun 26 17:15:23 2017 -0500

    Bumps submodules for editquality/ores/wheels (1.3.15 update)
    
    Change-Id: I72eabc15cf84ca1abc38b1f19883eb229d2bf778

OK, what's going on here ?

Pardon me, but I have to ask why a file with timestamps in the log file dating Dec 11th, and with a local fs system timestamp (on tin) of December 11th as well is named 2017-12-09 giving the impression it was created on Dec 9th. I guess a bug ?

This is a known bug, T181171: scap logs sometimes have incorrect date. The date is derived from the git commit being deployed, rather than the current system date.

Pardon me, but I have to ask why a file with timestamps in the log file dating Dec 11th, and with a local fs system timestamp (on tin) of December 11th as well is named 2017-12-09 giving the impression it was created on Dec 9th. I guess a bug ?

This is a known bug, T181171: scap logs sometimes have incorrect date. The date is derived from the git commit being deployed, rather than the current system date.

Aha. That's one piece of information I was missing. Thanks!

Mentioned in SAL (#wikimedia-operations) [2017-12-12T15:58:58Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661

Mentioned in SAL (#wikimedia-operations) [2017-12-12T15:59:08Z] <akosiaris@tin> Finished deploy [ores/deploy@b4f2b02]: T181661 (duration: 00m 09s)

Mentioned in SAL (#wikimedia-operations) [2017-12-12T15:59:19Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661

Mentioned in SAL (#wikimedia-operations) [2017-12-12T15:59:28Z] <akosiaris@tin> Finished deploy [ores/deploy@b4f2b02]: T181661 (duration: 00m 04s)

Now for the more interesting stuff. I 've tried to run /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config manually as deploy-service on ores1004.eqiad.wmnet and it croaks with a stacktrace

{"name": "deploy-local", "created": 1513091889.277656, "args": [], "msecs": 277.65607833862305, "filename": "cli.py", "levelno": 30, "msg": "Unhandled error:", "lineno": 230, "exc_text": "Traceback (most recent call last):\n  File \"/usr/lib/python2.7/dist-packages/scap/cli.py\", line 329, in run\n    exit_status = app.main(app.extra_arguments)\n  File \"/usr/lib/python2.7/dist-packages/scap/deploy.py\", line 146, in main\n    getattr(self, stage)()\n  File \"/usr/lib/python2.7/dist-packages/scap/deploy.py\", line 316, in fetch\n    git.checkout(rev_dir, self.rev)\n  File \"/usr/lib/python2.7/dist-packages/scap/git.py\", line 349, in checkout\n    git.checkout('--force', '--quiet', rev)\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 1428, in __call__\n    return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 775, in __init__\n    self.wait()\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 793, in wait\n    self.handle_command_exit_code(exit_code)\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 816, in handle_command_exit_code\n    raise exc\nErrorReturnCode_128: \n\n  RAN: /usr/bin/git checkout --force --quiet b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n\n  STDOUT:\n\n\n  STDERR:\nfatal: reference is not a tree: b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n", "funcName": "_handle_exception", "relativeCreated": 1288.093090057373}
{"name": "deploy-local", "created": 1513091889.280495, "args": ["deploy-local", "ErrorReturnCode_128", {"full_cmd": "/usr/bin/git checkout --force --quiet b4f2b0249a60754cfa4e18650fdd45c62ef4a09a", "stderr": "fatal: reference is not a tree: b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n", "stdout": ""}], "msecs": 280.49492835998535, "filename": "cli.py", "levelno": 40, "msg": "%s failed: <%s> %s", "lineno": 232, "exc_text": null, "funcName": "_handle_exception", "relativeCreated": 1290.9319400787354}

FWIW, if you pass -Dlog_json:False you get non-json output that's easier to read: /usr/bin/scap deploy-local -v -Dlog_json:False --repo ores/deploy -g cluster fetch --refresh-config

simply put b4f2b0249a60754cfa4e18650fdd45c62ef4a09a does not exist in whichever repo scap tries to find it at. For some more information

deploy-service@ores1004:/srv/deployment/ores/deploy-cache$ ls -la . revs/
.:
total 20
drwxr-xr-x  4 deploy-service deploy-service 4096 Dec  1 14:24 .
drwxr-xr-x  4 deploy-service root           4096 Dec  1 14:23 ..
-rw-r--r--  1 deploy-service deploy-service 1851 Dec 12 15:18 .config
lrwxrwxrwx  1 deploy-service deploy-service   45 Dec  1 14:24 .done -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec 12 15:18 cache
lrwxrwxrwx  1 deploy-service deploy-service   45 Dec  1 14:22 current -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 revs

revs/:
total 40
drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 .
drwxr-xr-x  4 deploy-service deploy-service 4096 Dec  1 14:24 ..
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov  6 14:40 29905e509984050a2ea675725f3c110033fc67a0
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov 29 18:04 532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec  5 15:53 6baed71ef9a02264660f8637fb2313c9c560f71d
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov  6 17:33 82a13ae1173ec570ea563f389ab22e1a69aa154a
drwxr-xr-x  6 deploy-service deploy-service 4096 Oct 26 16:43 971be2294a51772d309c967465b32f4f9a9b66f5
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec  1 20:18 9afbf148ec3007a6787ab099978bf5a08360879d
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec 11 19:21 b4f2b0249a60754cfa4e18650fdd45c62ef4a09a
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov 29 13:17 e58bfbf2827be1ac453bc3d07dfe7fe41a08dc66

So the object from a first look does exist in the deploy-cache. However...

deploy-service@ores1004:/srv/deployment/ores/deploy-cache/revs/b4f2b0249a60754cfa4e18650fdd45c62ef4a09a$ git show 
commit 82dfd5652c1550dbee362100204d7aa5242e59ab
Author: halfak <aaron.halfaker@gmail.com>
Date:   Mon Jun 26 17:15:23 2017 -0500

    Bumps submodules for editquality/ores/wheels (1.3.15 update)
    
    Change-Id: I72eabc15cf84ca1abc38b1f19883eb229d2bf778

OK, what's going on here ?

This is a bit of scap internals. The origin remote of anything in the revs dirs is the deploy-cache/cache dir. The object is likely not there because the directory exists (somehow) and you didn't pass --force see:

https://github.com/wikimedia/scap/blob/master/scap/deploy.py#L289-L295

Mentioned in SAL (#wikimedia-operations) [2017-12-12T16:22:38Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661

akosiaris@tin:/srv/deployment/ores/deploy$ scap deploy -v -l 'ores1004.eqiad.wmnet' T181661

fails reproducibly.

Log is

16:22:37 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info', pid 33404>: process completed
16:22:37 Started deploy [ores/deploy@b4f2b02]: T181661
16:22:37 
== CLUSTER ==
:* ores1004.eqiad.wmnet
16:22:37 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'cluster', 'fetch', '--refresh-config']
16:22:37 Using key: /etc/keyholder.d/deploy_service.pub
16:22:39 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'cluster', 'fetch', '--refresh-config'] on ores1004.eqiad.wmnet returned [70]: OpenSSH_6.7p1 Debian-5+deb8u3, OpenSSL 1.0.2m  2 Nov 2017
debug1: Reading configuration data /dev/null
debug1: Connecting to ores1004.eqiad.wmnet [10.64.16.95] port 22.
debug1: Connection established.
debug1: identity file /etc/keyholder.d/deploy_service.pub type 1
debug1: key_load_public: No such file or directory
debug1: identity file /etc/keyholder.d/deploy_service.pub-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.7p1 Debian-5+deb8u3
debug1: match: OpenSSH_6.7p1 Debian-5+deb8u3 pat OpenSSH* compat 0x04000000
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr umac-128-etm@openssh.com none
debug1: kex: client->server aes128-ctr umac-128-etm@openssh.com none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 5d:bd:36:84:e8:2a:57:85:03:ef:80:cc:3e:fc:fb:6f
debug1: Host 'ores1004.eqiad.wmnet' is known and matches the ECDSA host key.
debug1: Found key in /etc/ssh/ssh_known_hosts:1116
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,keyboard-interactive
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /etc/keyholder.d/deploy_service.pub
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug1: Authentication succeeded (publickey).
Authenticated to ores1004.eqiad.wmnet ([10.64.16.95]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending command: /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config
http://tin.eqiad.wmnet/ores/deploy/.git
/srv/deployment/ores/deploy-cache/cache
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
debug1: channel 0: free: client-session, nchannels 1
debug1: fd 1 clearing O_NONBLOCK
Transferred: sent 3432, received 119092 bytes, in 1.4 seconds
Bytes per second: sent 2459.9, received 85360.0
debug1: Exit status 70

This correlates with trying to run scap deploy-local manually.

@mmodell Ideas on how I can debug this further ?

Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:03:29Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661

Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:11:14Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661

Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:11:50Z] <akosiaris@tin> Finished deploy [ores/deploy@b4f2b02]: T181661 (duration: 00m 36s)

Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:12:00Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661

Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:12:10Z] <akosiaris@tin> Finished deploy [ores/deploy@b4f2b02]: T181661 (duration: 00m 03s)

Now for the more interesting stuff. I 've tried to run /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config manually as deploy-service on ores1004.eqiad.wmnet and it croaks with a stacktrace

{"name": "deploy-local", "created": 1513091889.277656, "args": [], "msecs": 277.65607833862305, "filename": "cli.py", "levelno": 30, "msg": "Unhandled error:", "lineno": 230, "exc_text": "Traceback (most recent call last):\n  File \"/usr/lib/python2.7/dist-packages/scap/cli.py\", line 329, in run\n    exit_status = app.main(app.extra_arguments)\n  File \"/usr/lib/python2.7/dist-packages/scap/deploy.py\", line 146, in main\n    getattr(self, stage)()\n  File \"/usr/lib/python2.7/dist-packages/scap/deploy.py\", line 316, in fetch\n    git.checkout(rev_dir, self.rev)\n  File \"/usr/lib/python2.7/dist-packages/scap/git.py\", line 349, in checkout\n    git.checkout('--force', '--quiet', rev)\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 1428, in __call__\n    return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 775, in __init__\n    self.wait()\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 793, in wait\n    self.handle_command_exit_code(exit_code)\n  File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 816, in handle_command_exit_code\n    raise exc\nErrorReturnCode_128: \n\n  RAN: /usr/bin/git checkout --force --quiet b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n\n  STDOUT:\n\n\n  STDERR:\nfatal: reference is not a tree: b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n", "funcName": "_handle_exception", "relativeCreated": 1288.093090057373}
{"name": "deploy-local", "created": 1513091889.280495, "args": ["deploy-local", "ErrorReturnCode_128", {"full_cmd": "/usr/bin/git checkout --force --quiet b4f2b0249a60754cfa4e18650fdd45c62ef4a09a", "stderr": "fatal: reference is not a tree: b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n", "stdout": ""}], "msecs": 280.49492835998535, "filename": "cli.py", "levelno": 40, "msg": "%s failed: <%s> %s", "lineno": 232, "exc_text": null, "funcName": "_handle_exception", "relativeCreated": 1290.9319400787354}

FWIW, if you pass -Dlog_json:False you get non-json output that's easier to read: /usr/bin/scap deploy-local -v -Dlog_json:False --repo ores/deploy -g cluster fetch --refresh-config

Ah indeed, that's more useful.

simply put b4f2b0249a60754cfa4e18650fdd45c62ef4a09a does not exist in whichever repo scap tries to find it at. For some more information

deploy-service@ores1004:/srv/deployment/ores/deploy-cache$ ls -la . revs/
.:
total 20
drwxr-xr-x  4 deploy-service deploy-service 4096 Dec  1 14:24 .
drwxr-xr-x  4 deploy-service root           4096 Dec  1 14:23 ..
-rw-r--r--  1 deploy-service deploy-service 1851 Dec 12 15:18 .config
lrwxrwxrwx  1 deploy-service deploy-service   45 Dec  1 14:24 .done -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec 12 15:18 cache
lrwxrwxrwx  1 deploy-service deploy-service   45 Dec  1 14:22 current -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 revs

revs/:
total 40
drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 .
drwxr-xr-x  4 deploy-service deploy-service 4096 Dec  1 14:24 ..
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov  6 14:40 29905e509984050a2ea675725f3c110033fc67a0
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov 29 18:04 532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec  5 15:53 6baed71ef9a02264660f8637fb2313c9c560f71d
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov  6 17:33 82a13ae1173ec570ea563f389ab22e1a69aa154a
drwxr-xr-x  6 deploy-service deploy-service 4096 Oct 26 16:43 971be2294a51772d309c967465b32f4f9a9b66f5
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec  1 20:18 9afbf148ec3007a6787ab099978bf5a08360879d
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec 11 19:21 b4f2b0249a60754cfa4e18650fdd45c62ef4a09a
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov 29 13:17 e58bfbf2827be1ac453bc3d07dfe7fe41a08dc66

So the object from a first look does exist in the deploy-cache. However...

deploy-service@ores1004:/srv/deployment/ores/deploy-cache/revs/b4f2b0249a60754cfa4e18650fdd45c62ef4a09a$ git show 
commit 82dfd5652c1550dbee362100204d7aa5242e59ab
Author: halfak <aaron.halfaker@gmail.com>
Date:   Mon Jun 26 17:15:23 2017 -0500

    Bumps submodules for editquality/ores/wheels (1.3.15 update)
    
    Change-Id: I72eabc15cf84ca1abc38b1f19883eb229d2bf778

OK, what's going on here ?

This is a bit of scap internals. The origin remote of anything in the revs dirs is the deploy-cache/cache dir. The object is likely not there because the directory exists (somehow) and you didn't pass --force see:

https://github.com/wikimedia/scap/blob/master/scap/deploy.py#L289-L295

--force did not solve the issue unfortunately :(

Moving the deploy-cache/cache directory aside did solve the issue (partially?) and moved on until...

7:18:27 acquiring buffering lock for flushing buffer
17:18:27 got buffering lock for flushing buffer
17:18:27 released buffering lock for flushing buffer
17:18:27 <Command u'/usr/bin/git submodule update --init --recursive --reference /srv/deployment/ores/deploy-cache/cache'>.<Process 3440 ['/usr/bin/git', 'submodule', 'update', '--init', '--recursive', '--reference', '/srv/deployment/ores/deploy-cache/cache']>.stdout: got chunk size 0 to flush: ''
17:18:27 acquiring buffering lock for flushing buffer
17:18:27 got buffering lock for flushing buffer
17:18:27 released buffering lock for flushing buffer
17:18:27 got chunk size 0 to flush: ''
17:18:27 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 329, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 146, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 321, in fetch
    reference=self.context.cache_dir)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 388, in update_submodules
    git.submodule(*cmd)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 1428, in __call__
    return RunningCommand(cmd, call_args, stdin, stdout, stderr)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 775, in __init__
    self.wait()
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 793, in wait
    self.handle_command_exit_code(exit_code)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 816, in handle_command_exit_code
    raise exc
ErrorReturnCode_1: 

  RAN: /usr/bin/git submodule update --init --recursive --reference /srv/deployment/ores/deploy-cache/cache

  STDOUT:
Submodule 'submodules/draftquality' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/draftquality) registered for path 'submodules/draftquality'
Submodule 'submodules/editquality' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality) registered for path 'submodules/editquality'
Submodule 'submodules/ores' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/ores) registered for path 'submodules/ores'
Submodule 'submodules/wheels' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wheels) registered for path 'submodules/wheels'
Submodule 'submodules/wikiclass' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wikiclass) registered for path 'submodules/wikiclass'
Submodule path 'submodul... (258 more, please see e.stdout)

  STDERR:
Cloning into 'submodules/draftquality'...
Cloning into 'submodules/editquality'...
fatal: reference is not a tree: 15d5283b7422919d85203b5ba907027f9356e421
Cloning into 'submodules/ores'...
fatal: reference is not a tree: b7413d6abf80688d44b42da4a7d3a4d1a80ca150
Cloning into 'submodules/wheels'...
Cloning into 'submodules/wikiclass'...
Unable to checkout '15d5283b7422919d85203b5ba907027f9356e421' in submodule path 'submodules/editquality'
Unable to checkout 'b7413d6abf80688d44b42da4a7d3a4d1a80ca150' in submodule path 'submodules/ores'

17:18:27 deploy-local failed: <ErrorReturnCode_1> 

  RAN: /usr/bin/git submodule update --init --recursive --reference /srv/deployment/ores/deploy-cache/cache

  STDOUT:
Submodule 'submodules/draftquality' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/draftquality) registered for path 'submodules/draftquality'
Submodule 'submodules/editquality' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality) registered for path 'submodules/editquality'
Submodule 'submodules/ores' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/ores) registered for path 'submodules/ores'
Submodule 'submodules/wheels' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wheels) registered for path 'submodules/wheels'
Submodule 'submodules/wikiclass' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wikiclass) registered for path 'submodules/wikiclass'
Submodule path 'submodul... (258 more, please see e.stdout)

  STDERR:
Cloning into 'submodules/draftquality'...
Cloning into 'submodules/editquality'...
fatal: reference is not a tree: 15d5283b7422919d85203b5ba907027f9356e421
Cloning into 'submodules/ores'...
fatal: reference is not a tree: b7413d6abf80688d44b42da4a7d3a4d1a80ca150
Cloning into 'submodules/wheels'...
Cloning into 'submodules/wikiclass'...
Unable to checkout '15d5283b7422919d85203b5ba907027f9356e421' in submodule path 'submodules/editquality'
Unable to checkout 'b7413d6abf80688d44b42da4a7d3a4d1a80ca150' in submodule path 'submodules/ores'

15d5283b7422919d85203b5ba907027f9356e421 doesn't exist in the editquality repo. Somehow the submodule pointers are wrong.

But it does exist on tin

akosiaris@tin:/srv/deployment/ores/deploy/.git/modules/submodules/editquality$ git show 15d5283b7422919d85203b5ba907027f9356e421
commit 15d5283b7422919d85203b5ba907027f9356e421
Merge: 529371c 4813659
Author: Aaron Halfaker <aaron.halfaker@gmail.com>
Date:   Wed Dec 6 11:47:01 2017 -0600

    Merge pull request #108 from wiki-ai/eswikiquote
    
    Add eswikiquote reverted models and revisions for review

For completeness sake

deploy-service@ores1004:/srv/deployment/ores/deploy-cache/cache$ cat .gitmodules 
[submodule "submodules/wikiclass"]
	path = submodules/wikiclass
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wikiclass
[submodule "submodules/editquality"]
	path = submodules/editquality
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality
[submodule "submodules/ores"]
	path = submodules/ores
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/ores
[submodule "submodules/wheels"]
	path = submodules/wheels
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wheels
[submodule "submodules/draftquality"]
	path = submodules/draftquality
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/draftquality
deploy-service@ores1004:/srv/deployment/ores/deploy-cache/cache$ cat .git/config 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url = http://tin.eqiad.wmnet/ores/deploy/.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[submodule "submodules/draftquality"]
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/draftquality
[submodule "submodules/editquality"]
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality
[submodule "submodules/ores"]
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/ores
[submodule "submodules/wheels"]
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wheels
[submodule "submodules/wikiclass"]
	url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wikiclass

But a git fetch indeed does not fetch anything new.

Another thing: I'm having difficulty just cloning the editquality submodule.

It's so large that git pack-objects takes over 100 seconds of cpu time and uses ~5.5gigs of ram.

RES    SHR    S  %CPU %MEM     TIME+ COMMAND 
5.665g 486352 S  1822  9.0  102:16.34 git pack-objects --revs --thin --stdout --delta-base-offset

I don't think there is anything really wrong with scap, it's just an unusably-large git repo.

@mmodell is on to something though with the comment about that commit not being in the repo

akosiaris@tin:/srv/deployment/ores/deploy/.git/modules/submodules/editquality$ git branch --contains 15d5283b7422919d85203b5ba907027f9356e421
akosiaris@tin:/srv/deployment/ores/deploy/.git/modules/submodules/editquality$ git tag --contains 15d5283b7422919d85203b5ba907027f9356e421
akosiaris@tin:/srv/deployment/ores/deploy/.git/modules/submodules/editquality$

So this commit does not exist in any of the branches or tags

mmodell added a comment.EditedDec 13 2017, 10:49 AM

Hmm, indeed, if the object does not exist on any branch or tag then it likely won't be fetched by the "dumb" git http transport (which is what we still use in scap.)

Probably wouldn't be fetched by other transports either, though I guess it might get fetched along with other objects if it's in the same pack file.

Just fetching this one repo (editquality) from phabricator is causing inordinate load on the server. It's nothing that phab1001 can't handle, however, it's surprising how much work it takes for git to clone a single ~3gb repository.

load average: 18.23, 26.87, 28.41

Behavior is erratic as well

akosiaris@bast1001:~$ git clone https://phabricator.wikimedia.org/source/editquality.git
Cloning into 'editquality'...
error: RPC failed; result=22, HTTP code = 504
fatal: The remote end hung up unexpectedly

Ahem...

mmodell added a comment.EditedDec 13 2017, 11:02 AM

Yeah that repo is 334M in the current workdir but the .git is 2.1 gigs. That doesn't seem too unreasonable but apparently git can't handle it. It's especially bad cloning from phabricator over https (upstream:T4369)

akosiaris added a comment.EditedDec 13 2017, 3:07 PM

A fresh clone of http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality on bast1001 does not have 15d5283b7422919d85203b5ba907027f9356e421 either so aside from the difficult to fetch repo we have at least identified 1 more cause for the problem. It seems like the editquality submodule referenced by the deploy repo needs to be updated to contain that change as well and not be at HEAD detached at ac3042e state. @awight care to have the honours of updating that submodule ?

I can't help but wonder why scap is trying to checkout 15d5283b7422919d85203b5ba907027f9356e421 however. Where does it get that info from ? master ?

@akosiaris: scap should be getting the hash from the submodule pointers contained at HEAD of tin.eqiad.wmnet/ores/deploy/.git

That said, I can't actually find that hash anywhere in the repo on tin so I'm baffled really, I can't figure out where that commit is coming from at all.

the ores submodule btw is in the exact same state and also fails to checkout

akosiaris@tin:/srv/deployment/ores/deploy/ores$ git status
HEAD detached at f3a4a63
nothing to commit, working tree clean
akosiaris@tin:/srv/deployment/ores/deploy/ores$ cd ../
akosiaris@tin:/srv/deployment/ores/deploy$ git status
On branch master
mmodell added a comment.EditedDec 13 2017, 3:26 PM

ah ha! I figured _something_ out at least! The 15d5283b commit is in origin/master it just hasn't been merged to master on tin. git pull would fast forward it and then that commit exists. The question that this raises, however, is this:

Why is scap trying to fetch that commit? It would appear as if scap is fetching the repo from origin instead of tin, at least initially, then attempting to fetch only the submodules from tin (but not finding the commit in question because it hasn't been pulled there)

(If this is true, it's incorrect behavior, I think, as scap should always fetch from tin unless instructed otherwise by scap.cfg)

@akosiaris: scap should be getting the hash from the submodule pointers contained at HEAD of tin.eqiad.wmnet/ores/deploy/.git

Well, it's getting the submodule pointers from origin/master instead I 'd say. See 42cf53249ffacc078aa786a581fe165e00bb1520

That said, I can't actually find that hash anywhere in the repo on tin so I'm baffled really, I can't figure out where that commit is coming from at all.

@akosiaris: scap should be getting the hash from the submodule pointers contained at HEAD of tin.eqiad.wmnet/ores/deploy/.git

Well, it's getting the submodule pointers from origin/master instead I 'd say. See 42cf53249ffacc078aa786a581fe165e00bb1520

Indeed, and that's the important detail that was eluding me before. And I think that I have found the culprit: scap/scap.cfg has this line: git_rev: origin/master

mmodell added a comment.EditedDec 13 2017, 3:35 PM

so @awight, can you enlighten me about your scap.cfg? Is git_rev: origin/master intentional? If you want to deploy from origin then I think we need to use git_upstream_submodules = True

Aha! nice find. It looks like it's been there since the very beginning. See fd1067ff4da. It has undergone a few minor changes since then but nothing major.

I 've crafted a commit on tin removing that line and retrying a scap deploy from tin just for ores1004.

I think I should add a NOTICE to scap that says something along the lines of "Deploying from non-default origin: $GIT_REV" just to make this more obvious ;)

I 've crafted a commit on tin removing that line and retrying a scap deploy from tin just for ores1004.

OK, success! deploy for the first time proceeded fine on ores1004. @mmodell your rock! Thanks!

Now to see if we have solved the parallel scap issue on all ores* boxes as well

I 've crafted a commit on tin removing that line and retrying a scap deploy from tin just for ores1004.

OK, success! deploy for the first time proceeded fine on ores1004. @mmodell your rock! Thanks!

Now to see if we have solved the parallel scap issue on all ores* boxes as well

Hurray!!! I had to manually (viva cumin) rm -rf /srv/deployment/ores/deploy/deploy-cache/cache on all ores* hosts, but it now works.

AFAICT, next steps forward should probably be:

  • A patch to remove git_rev: origin/master from scap config (I 'll do that) for ores
  • The above NOTICE that @mmodell suggested would probably be nice
  • Figure out why the deletion of the deploy-cache/cache directory was required and whether that's correct behavior or not

Change 398067 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[mediawiki/services/ores/deploy@master] scap: Remove git_rev setting

https://gerrit.wikimedia.org/r/398067

Change 398067 merged by Awight:
[mediawiki/services/ores/deploy@master] scap: Remove git_rev setting

https://gerrit.wikimedia.org/r/398067

@mmodell Tangential note, I've been happy using git clone --depth 1 on personal projects. Would that make any sense for scap?

@awight: from what I understand, git has to do a lot of extra work on the server side in order to build to shallow clone. I may look into this a bit further, it would certainly save space on the deployment target but it probably doesn't solve the issue of slow fetches.

@awight: from what I understand, git has to do a lot of extra work on the server side in order to build to shallow clone. I may look into this a bit further, it would certainly save space on the deployment target but it probably doesn't solve the issue of slow fetches.

I may actually be wrong about this, upon reviewing my sources I think I was just remembering something incorrectly. It's only hard on the server when you subsequently try to fetch into a shallow clone - git has to do more work to figure out what needs to be fetched and it makes the server work harder than simply cloning the whole repo to begin with.

So --depth=1 might work as long as we don't have to fetch from the repo later. I'm not sure how that would be better than the --reference method that scap is using right now but I am definitely open to considering anything which can improve performance and stability.

Looks like I'm getting the same error.

commit b67bba77acb7c0ffc678201c9f3f54f198da6650

scap deploy -v -l "ores*" "(non-production) Update ORES on new cluster"

scap-sync-2017-12-12-0006-3-gb67bba7.log

Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:12:07Z] <akosiaris@tin> Started deploy [ores/deploy@b67bba7]: T181661

Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:13:02Z] <akosiaris@tin> Finished deploy [ores/deploy@b67bba7]: T181661 (duration: 00m 55s)

Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:13:59Z] <akosiaris@tin> Started deploy [ores/deploy@b67bba7]: T181661

Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:14:08Z] <akosiaris@tin> Finished deploy [ores/deploy@b67bba7]: T181661 (duration: 00m 03s)

Looks like I'm getting the same error.

commit b67bba77acb7c0ffc678201c9f3f54f198da6650

scap deploy -v -l "ores*" "(non-production) Update ORES on new cluster"

scap-sync-2017-12-12-0006-3-gb67bba7.log

I can not reproduce this. A deploy of b67bba77a (see T181661#3836939) just worked fine. In fact it worked fine 2 times in a row. Command used was scap deploy -v -l "ores*" T181661.

Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:16:12Z] <akosiaris@tin> Started deploy [ores/deploy@b67bba7]: T181661

Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:16:21Z] <akosiaris@tin> Finished deploy [ores/deploy@b67bba7]: T181661 (duration: 00m 03s)

Not sure if this is related, but now I'm seeing a deploy-local failure with no diagnostics other than error code 70:

1awight@tin:/srv/deployment/ores/deploy$ scap deploy -v -f -l "ores1001*" "(non-production) Update ORES on new cluster"
211:23:51 INFO - <Command u'/usr/bin/git describe --always'>: starting process
311:23:51 INFO - <Command u'/usr/bin/git describe --always', pid 4191>: process started
411:23:51 INFO - <Command u'/usr/bin/git describe --always', pid 4191>: process completed
511:23:51 INFO - <Command u'/usr/bin/git describe --always', pid 4191>: process completed
611:23:51 INFO - <Command u'/usr/bin/git describe --always', pid 4191>: process completed
711:23:51 <Command u'/usr/bin/git show -s --format=%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>: starting process
811:23:51 <Command u'/usr/bin/git show -s --format=%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>: started process
911:23:51 <Command u'/usr/bin/git show -s --format=%ct b67bba77acb7c0ffc678201c9f3f54f198da6650', pid 4196>: process started
1011:23:51 <Command u'/usr/bin/git show -s --format=%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>: acquiring wait lock to wait for completion
1111:23:51 <Command u'/usr/bin/git show -s --format=%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>: got wait lock
1211:23:51 <Command u'/usr/bin/git show -s --format=%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>: exit code not set, waiting on pid
1311:23:51 <Command u'/usr/bin/git show -s --format=%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>: <scap.sh.StreamReader object at 0x7f0b81f00850> ready to be read from
1411:23:51 <Command u'/usr/bin/git show -s --format=%%%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>.stdout: got chunk size 11: '1513181123\n'
1511:23:51 acquiring buffering lock to process chunk (buffering: 1)
1611:23:51 got buffering lock to process chunk (buffering: 1)
1711:23:51 released buffering lock for processing chunk (buffering: 1)
1811:23:51 <Command u'/usr/bin/git show -s --format=%%%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>.stdout: putting chunk onto pipe: '1513181123\n'
1911:23:51 <Command u'/usr/bin/git show -s --format=%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>: <scap.sh.StreamReader object at 0x7f0b81f00fd0> ready to be read from
2011:23:51 got no chunk, done reading
2111:23:51 <Command u'/usr/bin/git show -s --format=%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>: <scap.sh.StreamReader object at 0x7f0b81f00850> ready to be read from
2211:23:51 <Command u'/usr/bin/git show -s --format=%%%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>.stdout: got no chunk, done reading
2311:23:51 acquiring buffering lock for flushing buffer
2411:23:51 got buffering lock for flushing buffer
2511:23:51 released buffering lock for flushing buffer
2611:23:51 <Command u'/usr/bin/git show -s --format=%%%%ct b67bba77acb7c0ffc678201c9f3f54f198da6650'>.<Process 4196 ['/usr/bin/git', 'show', '-s', '--format=%%ct', 'b67bba77acb7c0ffc678201c9f3f54f198da6650']>.stdout: got chunk size 0 to flush: ''
2711:23:51 acquiring buffering lock for flushing buffer
2811:23:51 got buffering lock for flushing buffer
2911:23:51 released buffering lock for flushing buffer
3011:23:51 got chunk size 0 to flush: ''
3111:23:51 <Command u'/usr/bin/git show -s --format=%ct b67bba77acb7c0ffc678201c9f3f54f198da6650', pid 4196>: process completed
3211:23:51 <Command u'/usr/bin/git show -s --format=%ct b67bba77acb7c0ffc678201c9f3f54f198da6650', pid 4196>: process completed
3311:23:51 <Command u'/usr/bin/git show -s --format=%ct b67bba77acb7c0ffc678201c9f3f54f198da6650', pid 4196>: process completed
3411:23:51 <Command u'/usr/bin/git ls-remote --get-url'>: starting process
3511:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>: started process
3611:23:51 <Command u'/usr/bin/git ls-remote --get-url', pid 4200>: process started
3711:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>: acquiring wait lock to wait for completion
3811:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>: got wait lock
3911:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>: exit code not set, waiting on pid
4011:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f0b81ef6d90> ready to be read from
4111:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got chunk size 68: 'https://gerrit.wikimedia.org/r'
4211:23:51 acquiring buffering lock to process chunk (buffering: 1)
4311:23:51 got buffering lock to process chunk (buffering: 1)
4411:23:51 released buffering lock for processing chunk (buffering: 1)
4511:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: putting chunk onto pipe: 'https://gerrit.wikimedia.org/r'
4611:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f0b81f17590> ready to be read from
4711:23:51 got no chunk, done reading
4811:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>: <scap.sh.StreamReader object at 0x7f0b81ef6d90> ready to be read from
4911:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got no chunk, done reading
5011:23:51 acquiring buffering lock for flushing buffer
5111:23:51 got buffering lock for flushing buffer
5211:23:51 released buffering lock for flushing buffer
5311:23:51 <Command u'/usr/bin/git ls-remote --get-url'>.<Process 4200 ['/usr/bin/git', 'ls-remote', '--get-url']>.stdout: got chunk size 0 to flush: ''
5411:23:51 acquiring buffering lock for flushing buffer
5511:23:51 got buffering lock for flushing buffer
5611:23:51 released buffering lock for flushing buffer
5711:23:51 got chunk size 0 to flush: ''
5811:23:51 <Command u'/usr/bin/git ls-remote --get-url', pid 4200>: process completed
5911:23:51 <Command u'/usr/bin/git ls-remote --get-url', pid 4200>: process completed
6011:23:51 <Command u'/usr/bin/git ls-remote --get-url', pid 4200>: process completed
6111:23:51 Started deploy [ores/deploy@b67bba7]
6211:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>: starting process
6311:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>: started process
6411:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*', pid 4204>: process started
6511:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>: acquiring wait lock to wait for completion
6611:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>: got wait lock
6711:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>: exit code not set, waiting on pid
6811:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>: <scap.sh.StreamReader object at 0x7f0b8125f090> ready to be read from
6911:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>.stdout: got chunk size 104: 'scap/sync/2017-12-14/0001\nscap'
7011:23:51 acquiring buffering lock to process chunk (buffering: 1)
7111:23:51 got buffering lock to process chunk (buffering: 1)
7211:23:51 released buffering lock for processing chunk (buffering: 1)
7311:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>.stdout: putting chunk onto pipe: 'scap/sync/2017-12-14/0001\n'
7411:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>.stdout: putting chunk onto pipe: 'scap/sync/2017-12-14/0002\n'
7511:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>.stdout: putting chunk onto pipe: 'scap/sync/2017-12-14/0003\n'
7611:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>.stdout: putting chunk onto pipe: 'scap/sync/2017-12-14/0004\n'
7711:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>: <scap.sh.StreamReader object at 0x7f0b8125f090> ready to be read from
7811:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>.stdout: got no chunk, done reading
7911:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>: <scap.sh.StreamReader object at 0x7f0b8125f1d0> ready to be read from
8011:23:51 got no chunk, done reading
8111:23:51 acquiring buffering lock for flushing buffer
8211:23:51 got buffering lock for flushing buffer
8311:23:51 released buffering lock for flushing buffer
8411:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*'>.<Process 4204 ['/usr/bin/git', 'tag', '--list', 'scap/sync/2017-12-14/*']>.stdout: got chunk size 0 to flush: ''
8511:23:51 acquiring buffering lock for flushing buffer
8611:23:51 got buffering lock for flushing buffer
8711:23:51 released buffering lock for flushing buffer
8811:23:51 got chunk size 0 to flush: ''
8911:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*', pid 4204>: process completed
9011:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*', pid 4204>: process completed
9111:23:51 <Command u'/usr/bin/git tag --list scap/sync/2017-12-14/*', pid 4204>: process completed
9211:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>: starting process
9311:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>: started process
9411:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD', pid 4208>: process started
9511:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>: acquiring wait lock to wait for completion
9611:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>: got wait lock
9711:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>: exit code not set, waiting on pid
9811:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>: <scap.sh.StreamReader object at 0x7f0b81f17850> ready to be read from
9911:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>.stdout: got chunk size 41: 'b67bba77acb7c0ffc678201c9f3f54'
10011:23:51 acquiring buffering lock to process chunk (buffering: 1)
10111:23:51 got buffering lock to process chunk (buffering: 1)
10211:23:51 released buffering lock for processing chunk (buffering: 1)
10311:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>.stdout: putting chunk onto pipe: 'b67bba77acb7c0ffc678201c9f3f54'
10411:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>: <scap.sh.StreamReader object at 0x7f0b81f17850> ready to be read from
10511:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>.stdout: got no chunk, done reading
10611:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>: <scap.sh.StreamReader object at 0x7f0b81f17750> ready to be read from
10711:23:51 got no chunk, done reading
10811:23:51 acquiring buffering lock for flushing buffer
10911:23:51 got buffering lock for flushing buffer
11011:23:51 released buffering lock for flushing buffer
11111:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD'>.<Process 4208 ['/usr/bin/git', 'rev-parse', '--verify', 'HEAD']>.stdout: got chunk size 0 to flush: ''
11211:23:51 acquiring buffering lock for flushing buffer
11311:23:51 got buffering lock for flushing buffer
11411:23:51 released buffering lock for flushing buffer
11511:23:51 got chunk size 0 to flush: ''
11611:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD', pid 4208>: process completed
11711:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD', pid 4208>: process completed
11811:23:51 <Command u'/usr/bin/git rev-parse --verify HEAD', pid 4208>: process completed
11911:23:51 Deploying Rev: b67bba77acb7c0ffc678201c9f3f54f198da6650
12011:23:51 Update DEPLOY_HEAD
12111:23:51 Creating /srv/deployment/ores/deploy/.git/DEPLOY_HEAD
12211:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags'>: starting process
12311:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%(refname)', 'refs/tags']>: started process
12411:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags', pid 4214>: process started
12511:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%(refname)', 'refs/tags']>: acquiring wait lock to wait for completion
12611:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%(refname)', 'refs/tags']>: got wait lock
12711:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%(refname)', 'refs/tags']>: exit code not set, waiting on pid
12811:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%(refname)', 'refs/tags']>: <scap.sh.StreamReader object at 0x7f0b8125f8d0> ready to be read from
12911:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: got chunk size 756: 'refs/tags/scap/sync/2017-12-05'
13011:23:51 acquiring buffering lock to process chunk (buffering: 1)
13111:23:51 got buffering lock to process chunk (buffering: 1)
13211:23:51 released buffering lock for processing chunk (buffering: 1)
13311:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-05'
13411:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-09'
13511:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-09'
13611:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-09'
13711:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-09'
13811:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-11'
13911:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-12'
14011:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-12'
14111:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-12'
14211:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-12'
14311:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-12'
14411:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-12'
14511:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-13'
14611:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-13'
14711:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-13'
14811:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-13'
14911:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-14'
15011:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-14'
15111:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-14'
15211:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-14'
15311:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: putting chunk onto pipe: 'refs/tags/scap/sync/2017-12-14'
15411:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%(refname)', 'refs/tags']>: <scap.sh.StreamReader object at 0x7f0b8125f8d0> ready to be read from
15511:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: got no chunk, done reading
15611:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%(refname)', 'refs/tags']>: <scap.sh.StreamReader object at 0x7f0b8125f950> ready to be read from
15711:23:51 got no chunk, done reading
15811:23:51 acquiring buffering lock for flushing buffer
15911:23:51 got buffering lock for flushing buffer
16011:23:51 released buffering lock for flushing buffer
16111:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%%%%(refname) refs/tags'>.<Process 4214 ['/usr/bin/git', 'for-each-ref', '--sort=taggerdate', '--format=%%(refname)', 'refs/tags']>.stdout: got chunk size 0 to flush: ''
16211:23:51 acquiring buffering lock for flushing buffer
16311:23:51 got buffering lock for flushing buffer
16411:23:51 released buffering lock for flushing buffer
16511:23:51 got chunk size 0 to flush: ''
16611:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags', pid 4214>: process completed
16711:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags', pid 4214>: process completed
16811:23:51 <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags', pid 4214>: process completed
16911:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>: starting process
17011:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>: started process
17111:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001', pid 4218>: process started
17211:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>: acquiring wait lock to wait for completion
17311:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>: got wait lock
17411:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>: exit code not set, waiting on pid
17511:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>: <scap.sh.StreamReader object at 0x7f0b81f00a50> ready to be read from
17611:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>.stdout: got chunk size 54: "Deleted tag 'scap/sync/2017-12"
17711:23:51 acquiring buffering lock to process chunk (buffering: 1)
17811:23:51 got buffering lock to process chunk (buffering: 1)
17911:23:51 released buffering lock for processing chunk (buffering: 1)
18011:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>.stdout: putting chunk onto pipe: "Deleted tag 'scap/sync/2017-12"
18111:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>: <scap.sh.StreamReader object at 0x7f0b81f00a50> ready to be read from
18211:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>.stdout: got no chunk, done reading
18311:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>: <scap.sh.StreamReader object at 0x7f0b81f00590> ready to be read from
18411:23:51 got no chunk, done reading
18511:23:51 acquiring buffering lock for flushing buffer
18611:23:51 got buffering lock for flushing buffer
18711:23:51 released buffering lock for flushing buffer
18811:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001'>.<Process 4218 ['/usr/bin/git', 'tag', '-d', 'scap/sync/2017-12-05/0001']>.stdout: got chunk size 0 to flush: ''
18911:23:51 acquiring buffering lock for flushing buffer
19011:23:51 got buffering lock for flushing buffer
19111:23:51 released buffering lock for flushing buffer
19211:23:51 got chunk size 0 to flush: ''
19311:23:51 <Command u'/usr/bin/git tag -d scap/sync/2017-12-05/0001', pid 4218>: process completed
19411:23:51 Update server info
19511:23:51 <Command u'/usr/bin/git update-server-info'>: starting process
19611:23:51 <Command u'/usr/bin/git update-server-info'>.<Process 4222 ['/usr/bin/git', 'update-server-info']>: started process
19711:23:51 <Command u'/usr/bin/git update-server-info', pid 4222>: process started
19811:23:51 <Command u'/usr/bin/git update-server-info'>.<Process 4222 ['/usr/bin/git', 'update-server-info']>: acquiring wait lock to wait for completion
19911:23:51 <Command u'/usr/bin/git update-server-info'>.<Process 4222 ['/usr/bin/git', 'update-server-info']>: got wait lock
20011:23:51 <Command u'/usr/bin/git update-server-info'>.<Process 4222 ['/usr/bin/git', 'update-server-info']>: exit code not set, waiting on pid
20111:23:51 <Command u'/usr/bin/git update-server-info'>.<Process 4222 ['/usr/bin/git', 'update-server-info']>: <scap.sh.StreamReader object at 0x7f0b8126e910> ready to be read from
20211:23:51 <Command u'/usr/bin/git update-server-info'>.<Process 4222 ['/usr/bin/git', 'update-server-info']>.stdout: got no chunk, done reading
20311:23:51 <Command u'/usr/bin/git update-server-info'>.<Process 4222 ['/usr/bin/git', 'update-server-info']>: <scap.sh.StreamReader object at 0x7f0b8126ea50> ready to be read from
20411:23:51 got no chunk, done reading
20511:23:51 acquiring buffering lock for flushing buffer
20611:23:51 got buffering lock for flushing buffer
20711:23:51 released buffering lock for flushing buffer
20811:23:51 <Command u'/usr/bin/git update-server-info'>.<Process 4222 ['/usr/bin/git', 'update-server-info']>.stdout: got chunk size 0 to flush: ''
20911:23:51 acquiring buffering lock for flushing buffer
21011:23:51 got buffering lock for flushing buffer
21111:23:51 released buffering lock for flushing buffer
21211:23:51 got chunk size 0 to flush: ''
21311:23:51 <Command u'/usr/bin/git update-server-info', pid 4222>: process completed
21411:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>: starting process
21511:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: started process
21611:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info', pid 4226>: process started
21711:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: acquiring wait lock to wait for completion
21811:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: got wait lock
21911:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: exit code not set, waiting on pid
22011:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: <scap.sh.StreamReader object at 0x7f0b8125fc10> ready to be read from
22111:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: got chunk size 35: "Entering 'submodules/draftqual"
22211:23:51 acquiring buffering lock to process chunk (buffering: 1)
22311:23:51 got buffering lock to process chunk (buffering: 1)
22411:23:51 released buffering lock for processing chunk (buffering: 1)
22511:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: putting chunk onto pipe: "Entering 'submodules/draftqual"
22611:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: <scap.sh.StreamReader object at 0x7f0b8125fc10> ready to be read from
22711:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: got chunk size 34: "Entering 'submodules/editquali"
22811:23:51 acquiring buffering lock to process chunk (buffering: 1)
22911:23:51 got buffering lock to process chunk (buffering: 1)
23011:23:51 released buffering lock for processing chunk (buffering: 1)
23111:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: putting chunk onto pipe: "Entering 'submodules/editquali"
23211:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: <scap.sh.StreamReader object at 0x7f0b8125fc10> ready to be read from
23311:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: got chunk size 27: "Entering 'submodules/ores'\n"
23411:23:51 acquiring buffering lock to process chunk (buffering: 1)
23511:23:51 got buffering lock to process chunk (buffering: 1)
23611:23:51 released buffering lock for processing chunk (buffering: 1)
23711:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: putting chunk onto pipe: "Entering 'submodules/ores'\n"
23811:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: <scap.sh.StreamReader object at 0x7f0b8125fc10> ready to be read from
23911:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: got chunk size 29: "Entering 'submodules/wheels'\n"
24011:23:51 acquiring buffering lock to process chunk (buffering: 1)
24111:23:51 got buffering lock to process chunk (buffering: 1)
24211:23:51 released buffering lock for processing chunk (buffering: 1)
24311:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: putting chunk onto pipe: "Entering 'submodules/wheels'\n"
24411:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: <scap.sh.StreamReader object at 0x7f0b8125fc10> ready to be read from
24511:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: got chunk size 32: "Entering 'submodules/wikiclass"
24611:23:51 acquiring buffering lock to process chunk (buffering: 1)
24711:23:51 got buffering lock to process chunk (buffering: 1)
24811:23:51 released buffering lock for processing chunk (buffering: 1)
24911:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: putting chunk onto pipe: "Entering 'submodules/wikiclass"
25011:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: <scap.sh.StreamReader object at 0x7f0b8125fc10> ready to be read from
25111:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: got no chunk, done reading
25211:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>: <scap.sh.StreamReader object at 0x7f0b8125ff90> ready to be read from
25311:23:51 got no chunk, done reading
25411:23:51 acquiring buffering lock for flushing buffer
25511:23:51 got buffering lock for flushing buffer
25611:23:51 released buffering lock for flushing buffer
25711:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>.<Process 4226 ['/usr/bin/git', 'submodule', 'foreach', '--recursive', 'git update-server-info']>.stdout: got chunk size 0 to flush: ''
25811:23:51 acquiring buffering lock for flushing buffer
25911:23:51 got buffering lock for flushing buffer
26011:23:51 released buffering lock for flushing buffer
26111:23:51 got chunk size 0 to flush: ''
26211:23:51 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info', pid 4226>: process completed
26311:23:51 Started deploy [ores/deploy@b67bba7]: (non-production) Update ORES on new cluster
26411:23:51
265== CLUSTER ==
266:* ores1001.eqiad.wmnet
26711:23:51 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '--force', '-g', 'cluster', 'fetch', '--refresh-config']
26811:23:51 Using key: /etc/keyholder.d/deploy_service.pub
26911:23:53 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '--force', '-g', 'cluster', 'fetch', '--refresh-config'] on ores1001.eqiad.wmnet returned [70]: OpenSSH_6.7p1 Debian-5+deb8u3, OpenSSL 1.0.2m 2 Nov 2017
270debug1: Reading configuration data /dev/null
271debug1: Connecting to ores1001.eqiad.wmnet [10.64.0.51] port 22.
272debug1: Connection established.
273debug1: identity file /etc/keyholder.d/deploy_service.pub type 1
274debug1: key_load_public: No such file or directory
275debug1: identity file /etc/keyholder.d/deploy_service.pub-cert type -1
276debug1: Enabling compatibility mode for protocol 2.0
277debug1: Local version string SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
278debug1: Remote protocol version 2.0, remote software version OpenSSH_6.7p1 Debian-5+deb8u3
279debug1: match: OpenSSH_6.7p1 Debian-5+deb8u3 pat OpenSSH* compat 0x04000000
280debug1: SSH2_MSG_KEXINIT sent
281debug1: SSH2_MSG_KEXINIT received
282debug1: kex: server->client aes128-ctr umac-128-etm@openssh.com none
283debug1: kex: client->server aes128-ctr umac-128-etm@openssh.com none
284debug1: sending SSH2_MSG_KEX_ECDH_INIT
285debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
286debug1: Server host key: ECDSA 18:71:ca:a4:ae:29:83:c6:46:55:43:56:7a:74:42:7c
287debug1: Host 'ores1001.eqiad.wmnet' is known and matches the ECDSA host key.
288debug1: Found key in /etc/ssh/ssh_known_hosts:1086
289debug1: SSH2_MSG_NEWKEYS sent
290debug1: expecting SSH2_MSG_NEWKEYS
291debug1: SSH2_MSG_NEWKEYS received
292debug1: SSH2_MSG_SERVICE_REQUEST sent
293debug1: SSH2_MSG_SERVICE_ACCEPT received
294debug1: Authentications that can continue: publickey,keyboard-interactive
295debug1: Next authentication method: publickey
296debug1: Offering RSA public key: /etc/keyholder.d/deploy_service.pub
297debug1: Server accepts key: pkalg ssh-rsa blen 279
298debug1: Authentication succeeded (publickey).
299Authenticated to ores1001.eqiad.wmnet ([10.64.0.51]:22).
300debug1: channel 0: new [client-session]
301debug1: Requesting no-more-sessions@openssh.com
302debug1: Entering interactive session.
303debug1: Sending command: /usr/bin/scap deploy-local -v --repo ores/deploy --force -g cluster fetch --refresh-config
304http://tin.eqiad.wmnet/ores/deploy/.git
305/srv/deployment/ores/deploy-cache/cache
306debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
307debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
308debug1: channel 0: free: client-session, nchannels 1
309debug1: fd 1 clearing O_NONBLOCK
310Transferred: sent 3448, received 162856 bytes, in 2.0 seconds
311Bytes per second: sent 1707.3, received 80639.4
312debug1: Exit status 70
313
314ores/deploy: fetch stage(s): 100% (ok: 0; fail: 1; left: 0)
31511:23:53 1 targets had deploy errors
31611:23:53 1 targets failed
31711:23:53 1 of 1 cluster targets failed, exceeding limit

scap deploy-log -v reveals more:

11:27:10 [ores1001.eqiad.wmnet] Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 329, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 146, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 321, in fetch
    reference=self.context.cache_dir)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 388, in update_submodules
    git.submodule(*cmd)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 1428, in __call__
    return RunningCommand(cmd, call_args, stdin, stdout, stderr)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 775, in __init__
    self.wait()
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 793, in wait
    self.handle_command_exit_code(exit_code)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 816, in handle_command_exit_code
    raise exc
ErrorReturnCode_1: 

  RAN: /usr/bin/git submodule update --init --recursive --reference /srv/deployment/ores/deploy-cache/cache

  STDOUT:


  STDERR:
fatal: reference is not a tree: 15d5283b7422919d85203b5ba907027f9356e421
fatal: reference is not a tree: a3a50bb0b5aea50178e428430438ae3323d7376f
Unable to checkout '15d5283b7422919d85203b5ba907027f9356e421' in submodule path 'submodules/editquality'
Unable to checkout 'a3a50bb0b5aea50178e428430438ae3323d7376f' in submodule path 'submodules/ores'

Those revisions aren't in gerrit. I think the github -> gerrit mirroring broke when we were messing around with pointing to git-lfs repos?

The Phabricator control panels look happy,
https://phabricator.wikimedia.org/source/editquality/manage/uris/ shows that we're observing GitHub,
https://github.com/wiki-ai/editquality/commit/15d5283b7422919d85203b5ba907027f9356e421 that commit has been in GitHub for 3 days,
https://gerrit.wikimedia.org/r/#/q/15d5283b7422919d85203b5ba907027f9356e421 It's not in Gerrit yet,
https://phabricator.wikimedia.org/source/editquality/manage/basics/ "Last updated Thu, Dec 14, 7:07 AM (9 m, 19 s ago)."

Oh what. A clue,
https://phabricator.wikimedia.org/source/editquality/history/master/ <-- this does contain the revision.

Oops—we aren't expecting this repo to be mirrored to gerrit. So the surprise is that the revision exists in Phabricator but can be found by the deployment tool. Making a new task: T182865: Source revision is in Phabricator, but can't be found by deployment tools

@awight: yeah, I'm getting to the bottom of it now. The issue is that the commit does not exist on a local branch on tin, it's only in refs/remotes/origin/ which scap doesn't fetch automatically as the default refspec only fetches refs/heads/* and not refs/remotes/*

awight closed this task as Resolved.Dec 14 2017, 12:45 PM
awight claimed this task.

Using a workaround for T182865, where we go into submodules and checkout master before submodule updating from the ORES root, I was able to successfully parallel deploy to ores*, and in under a minute! It looks like the SSH timeout is resolved!!

I'll close this task for now, and will reopen if anything untoward rears its head again. Thanks everyone for the great work debugging!