I'm not sure if this is a scap or network thing:
scap deploy-log -f scap/log/scap-sync-2017-11-29-0003-1-g532bd0b.log
Subject | Repo | Branch | Lines +/-
---|---|---|---
scap: Remove git_rev setting | mediawiki/services/ores/deploy | master | +0 -1
Restricted Differential Revision | | |
Restricted Differential Revision | | |
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | None | | T176324 Scoring platform team FY18 Q2
Declined | None | | T179501 Use external dsh group to list pooled ORES nodes
Resolved | | akosiaris | T168073 Switch ORES to dedicated cluster
Resolved | | awight | T181661 Connection timeout from tin to new ores servers
Resolved | | akosiaris | T182347 SCAP: Upload debian package version 3.7.4-3
Resolved | | akosiaris | T183046 scap 3.7.4-2 is broken
Invalid | None | | T182801 New, mysterious scap failure
Interesting...
How do I get more verbosity, i.e. for ssh?
Hm, judging by https://phabricator.wikimedia.org/source/scap/browse/master/scap/ssh.py;0d2afa233f3ee489ddbfb9e7dbfaf467452e50db$41, it would probably require some changes in scap's code for scap's -v to propagate to ssh as well.
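For illustration, the change could look roughly like this. `build_ssh_argv` and its argument layout are hypothetical, not scap's actual API; the point is just that the ssh argv is built in one place, so a verbosity flag can be appended there.

```python
# Hypothetical sketch: propagate scap's own -v into the ssh argv.
# BASE_SSH and build_ssh_argv are illustrative names, not scap code.
BASE_SSH = ['/usr/bin/ssh', '-oBatchMode=yes']

def build_ssh_argv(host, remote_cmd, verbose=False):
    """Return the ssh argv for a remote deploy command.

    When scap itself runs with -v, append ssh's -v as well so the
    debug1: lines show up in scap's verbose output.
    """
    argv = list(BASE_SSH)
    if verbose:
        argv.append('-v')
    argv += [host, '--', remote_cmd]
    return argv
```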
That sounds like a thoroughly sensible change. I'll create a patch now.
I'd like to push the latest scap code to production this week if I can get an opsen to upload the package. I'll create a new release tag today. That will at least help to debug this issue since -v will be passed on to ssh as suggested by @akosiaris above.
This is affecting me in production, now:
Timeout, server scb2004.codfw.wmnet not responding.
16:01:39 connection to scb2004.codfw.wmnet failed and future stages will not be attempted for this target
@awight: can you try it now with the -v flag? scap should include the ssh logs in verbose output now.
Mentioned in SAL (#wikimedia-operations) [2017-12-11T19:17:56Z] <awight@tin> Started deploy [ores/deploy@1c0ede0]: (non-production) Testing parallel ORES deployment, T181661
Mentioned in SAL (#wikimedia-operations) [2017-12-11T19:19:07Z] <awight@tin> Finished deploy [ores/deploy@1c0ede0]: (non-production) Testing parallel ORES deployment, T181661 (duration: 01m 12s)
Done. Logs are in scap-sync-2017-12-09-0004.log (sic; note that the dates are still misleading)
Thanks for this feature, it looks helpful!
Something really strange is going on...
debug1: Offering RSA public key: /etc/keyholder.d/deploy_service.pub
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug1: Authentication succeeded (publickey).
Authenticated to ores1004.eqiad.wmnet ([10.64.16.95]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending command: /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config http://tin.eqiad.wmnet/ores/deploy/.git
Timeout, server ores1004.eqiad.wmnet not responding.
19:18:57 [tin] connection to ores1004.eqiad.wmnet failed and future stages will not be attempted for this target
So it would appear that this deploy-local command freezes: /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config http://tin.eqiad.wmnet/ores/deploy/.git
@thcipriani, @akosiaris does that seem right to you or am I missing something?
I'd like to see the corresponding syslog on the target machine; there might be clues in /var/log/auth.log or /var/log/daemon.log.
Pardon me, but I have to ask: why is a file whose log entries are dated Dec 11th, and whose local filesystem timestamp (on tin) is also December 11th, named 2017-12-09, giving the impression it was created on Dec 9th? I guess a bug?
In any case, daemon.log is not really expected to hold anything of significant value for Dec 11th 19:00-19:30, but here goes:
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Retrieving pluginfacts
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Retrieving plugin
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Loading facts
Dec 11 19:03:27 ores1004 puppet-agent[7736]: Caching catalog for ores1004.eqiad.wmnet
Dec 11 19:03:28 ores1004 puppet-agent[7736]: Applying configuration version '1513018998'
Dec 11 19:03:30 ores1004 systemd[1]: Reloading.
Dec 11 19:03:30 ores1004 systemd[1]: Started ACPI event daemon.
Dec 11 19:03:30 ores1004 systemd[1]: Listening on ACPID Listen Socket.
Dec 11 19:03:30 ores1004 systemd[1]: Mounted /.
Dec 11 19:03:36 ores1004 puppet-agent[7736]: Finished catalog run in 8.37 seconds
Dec 11 19:22:44 ores1004 dbus[921]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Dec 11 19:22:44 ores1004 systemd[1]: Starting Time & Date Service...
Dec 11 19:22:44 ores1004 systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument
Dec 11 19:22:44 ores1004 dbus[921]: [system] Successfully activated service 'org.freedesktop.timedate1'
Dec 11 19:22:44 ores1004 systemd-timedated[13695]: /etc/localtime should be a symbolic link to a time zone data file in /usr/share/zoneinfo/.
Dec 11 19:22:44 ores1004 systemd[1]: Started Time & Date Service.
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Retrieving pluginfacts
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Retrieving plugin
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Loading facts
Dec 11 19:33:28 ores1004 puppet-agent[16060]: Caching catalog for ores1004.eqiad.wmnet
Dec 11 19:33:30 ores1004 puppet-agent[16060]: Applying configuration version '1513020803'
Dec 11 19:33:31 ores1004 systemd[1]: Reloading.
Dec 11 19:33:31 ores1004 systemd[1]: Started ACPI event daemon.
Dec 11 19:33:31 ores1004 systemd[1]: Listening on ACPID Listen Socket.
Dec 11 19:33:31 ores1004 systemd[1]: Mounted /.
Dec 11 19:33:37 ores1004 puppet-agent[16060]: Finished catalog run in 7.96 seconds
Dec 11 19:52:44 ores1004 dbus[921]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Dec 11 19:52:44 ores1004 systemd[1]: Starting Time & Date Service...
Dec 11 19:52:44 ores1004 systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument
Dec 11 19:52:44 ores1004 dbus[921]: [system] Successfully activated service 'org.freedesktop.timedate1'
Dec 11 19:52:44 ores1004 systemd-timedated[20231]: /etc/localtime should be a symbolic link to a time zone data file in /usr/share/zoneinfo/.
Dec 11 19:52:44 ores1004 systemd[1]: Started Time & Date Service.
Dec 11 20:04:05 ores1004 puppet-agent[22822]: Retrieving pluginfacts
auth.log is just a tad more useful, though mostly a repeat of what we've seen before:
Dec 11 19:17:50 ores1004 sshd[11976]: Set /proc/self/oom_score_adj to 0
Dec 11 19:17:50 ores1004 sshd[11976]: Connection from 208.80.155.119 port 46296 on 10.64.16.95 port 22
Dec 11 19:17:50 ores1004 sshd[11976]: Connection closed by 208.80.155.119 [preauth]
Dec 11 19:17:55 ores1004 sshd[11978]: Set /proc/self/oom_score_adj to 0
Dec 11 19:17:55 ores1004 sshd[11978]: Connection from 208.80.153.74 port 45262 on 10.64.16.95 port 22
Dec 11 19:17:55 ores1004 sshd[11978]: Connection closed by 208.80.153.74 [preauth]
Dec 11 19:17:56 ores1004 sshd[11980]: Set /proc/self/oom_score_adj to 0
Dec 11 19:17:56 ores1004 sshd[11980]: Connection from 10.64.0.196 port 57196 on 10.64.16.95 port 22
Dec 11 19:17:56 ores1004 sshd[11980]: Postponed publickey for deploy-service from 10.64.0.196 port 57196 ssh2 [preauth]
Dec 11 19:17:56 ores1004 sshd[11980]: Accepted publickey for deploy-service from 10.64.0.196 port 57196 ssh2: RSA 6d:54:92:8b:39:10:f5:9b:84:40:36:ef:3c:9a:6d:d8
Dec 11 19:17:56 ores1004 sshd[11980]: pam_unix(sshd:session): session opened for user deploy-service by (uid=0)
Dec 11 19:17:56 ores1004 sshd[11980]: User child is on pid 11995
Dec 11 19:17:56 ores1004 sshd[11995]: Starting session: command for deploy-service from 10.64.0.196 port 57196
Dec 11 19:18:01 ores1004 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ferm
Dec 11 19:18:01 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:18:01 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:18:01 ores1004 CRON[12265]: pam_unix(cron:session): session opened for user prometheus by (uid=0)
Dec 11 19:18:01 ores1004 CRON[12265]: pam_unix(cron:session): session closed for user prometheus
Dec 11 19:18:04 ores1004 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/lib/nagios/plugins/check_raid md
Dec 11 19:18:04 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:18:04 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:18:06 ores1004 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ferm
Dec 11 19:18:06 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:18:06 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:18:17 ores1004 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/lib/nagios/plugins/check_raid md
Dec 11 19:18:17 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:18:17 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:18:50 ores1004 sshd[12515]: Set /proc/self/oom_score_adj to 0
Dec 11 19:18:50 ores1004 sshd[12515]: Connection from 208.80.155.119 port 58488 on 10.64.16.95 port 22
Dec 11 19:18:50 ores1004 sshd[12515]: Connection closed by 208.80.155.119 [preauth]
Dec 11 19:18:55 ores1004 sshd[12517]: Set /proc/self/oom_score_adj to 0
Dec 11 19:18:55 ores1004 sshd[12517]: Connection from 208.80.153.74 port 57728 on 10.64.16.95 port 22
Dec 11 19:18:55 ores1004 sshd[12517]: Connection closed by 208.80.153.74 [preauth]
Dec 11 19:19:01 ores1004 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ferm
Dec 11 19:19:01 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:19:01 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:19:01 ores1004 CRON[12550]: pam_unix(cron:session): session opened for user prometheus by (uid=0)
Dec 11 19:19:02 ores1004 CRON[12550]: pam_unix(cron:session): session closed for user prometheus
Dec 11 19:19:04 ores1004 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/lib/nagios/plugins/check_raid md
Dec 11 19:19:04 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:19:04 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:19:06 ores1004 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_ferm
Dec 11 19:19:06 ores1004 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Dec 11 19:19:06 ores1004 sudo: pam_unix(sudo:session): session closed for user root
Dec 11 19:19:13 ores1004 sshd[11995]: Connection closed by 10.64.0.196
Dec 11 19:19:13 ores1004 sshd[11995]: Transferred: sent 76296, received 3484 bytes
Dec 11 19:19:13 ores1004 sshd[11995]: Closing connection to 10.64.0.196 port 57196
Dec 11 19:19:13 ores1004 sshd[11980]: pam_unix(sshd:session): session closed for user deploy-service
Since syslog was mentioned, here it is as well
Dec 11 19:00:01 ores1004 CRON[7031]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:01:01 ores1004 CRON[7244]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:02:01 ores1004 CRON[7464]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:03:01 ores1004 CRON[7690]: (root) CMD (/usr/local/sbin/puppet-run > /dev/null 2>&1)
Dec 11 19:03:01 ores1004 CRON[7691]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Retrieving pluginfacts
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Retrieving plugin
Dec 11 19:03:15 ores1004 puppet-agent[7736]: Loading facts
Dec 11 19:03:27 ores1004 puppet-agent[7736]: Caching catalog for ores1004.eqiad.wmnet
Dec 11 19:03:28 ores1004 puppet-agent[7736]: Applying configuration version '1513018998'
Dec 11 19:03:30 ores1004 systemd[1]: Reloading.
Dec 11 19:03:30 ores1004 systemd[1]: Started ACPI event daemon.
Dec 11 19:03:30 ores1004 systemd[1]: Listening on ACPID Listen Socket.
Dec 11 19:03:30 ores1004 systemd[1]: Mounted /.
Dec 11 19:03:32 ores1004 crontab[8226]: (root) LIST (root)
Dec 11 19:03:32 ores1004 crontab[8228]: (root) LIST (prometheus)
Dec 11 19:03:36 ores1004 puppet-agent[7736]: Finished catalog run in 8.37 seconds
Dec 11 19:04:01 ores1004 CRON[8868]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:05:01 ores1004 CRON[9089]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 11 19:05:01 ores1004 CRON[9090]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:06:01 ores1004 CRON[9302]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:07:01 ores1004 CRON[9530]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:08:01 ores1004 CRON[9762]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:09:01 ores1004 CRON[9975]: (root) CMD ( [ -x /usr/lib/php5/sessionclean ] && /usr/lib/php5/sessionclean)
Dec 11 19:09:01 ores1004 CRON[9976]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:10:01 ores1004 CRON[10222]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:11:01 ores1004 CRON[10433]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:12:02 ores1004 CRON[10660]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:13:01 ores1004 CRON[10888]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:14:01 ores1004 CRON[11109]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:15:01 ores1004 CRON[11329]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 11 19:15:01 ores1004 CRON[11330]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:16:01 ores1004 CRON[11553]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:17:01 ores1004 CRON[11782]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 11 19:17:01 ores1004 CRON[11783]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:18:01 ores1004 CRON[12266]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:19:01 ores1004 CRON[12551]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:20:01 ores1004 CRON[12765]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:21:01 ores1004 CRON[13215]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:22:01 ores1004 CRON[13522]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:22:44 ores1004 dbus[921]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service'
Dec 11 19:22:44 ores1004 systemd[1]: Starting Time & Date Service...
Dec 11 19:22:44 ores1004 systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument
Dec 11 19:22:44 ores1004 dbus[921]: [system] Successfully activated service 'org.freedesktop.timedate1'
Dec 11 19:22:44 ores1004 systemd-timedated[13695]: /etc/localtime should be a symbolic link to a time zone data file in /usr/share/zoneinfo/.
Dec 11 19:22:44 ores1004 systemd[1]: Started Time & Date Service.
Dec 11 19:23:01 ores1004 CRON[13781]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:24:01 ores1004 CRON[14005]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:25:01 ores1004 CRON[14210]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 11 19:25:01 ores1004 CRON[14211]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:26:01 ores1004 CRON[14431]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:27:01 ores1004 CRON[14657]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:28:01 ores1004 CRON[14900]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:29:01 ores1004 CRON[15121]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:30:01 ores1004 CRON[15341]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:31:01 ores1004 CRON[15560]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:32:01 ores1004 CRON[15786]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:32:17 ores1004 systemd-timesyncd[791]: interval/delta/delay/jitter/drift 2048s/+0.000s/0.000s/0.000s/+8ppm
Dec 11 19:33:01 ores1004 CRON[15999]: (root) CMD (/usr/local/sbin/puppet-run > /dev/null 2>&1)
Dec 11 19:33:01 ores1004 CRON[16004]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Retrieving pluginfacts
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Retrieving plugin
Dec 11 19:33:20 ores1004 puppet-agent[16060]: Loading facts
Dec 11 19:33:28 ores1004 puppet-agent[16060]: Caching catalog for ores1004.eqiad.wmnet
Dec 11 19:33:30 ores1004 puppet-agent[16060]: Applying configuration version '1513020803'
Dec 11 19:33:31 ores1004 systemd[1]: Reloading.
Dec 11 19:33:31 ores1004 systemd[1]: Started ACPI event daemon.
Dec 11 19:33:31 ores1004 systemd[1]: Listening on ACPID Listen Socket.
Dec 11 19:33:31 ores1004 systemd[1]: Mounted /.
Dec 11 19:33:33 ores1004 crontab[16520]: (root) LIST (root)
Dec 11 19:33:33 ores1004 crontab[16522]: (root) LIST (prometheus)
Dec 11 19:33:37 ores1004 puppet-agent[16060]: Finished catalog run in 7.96 seconds
Dec 11 19:34:01 ores1004 CRON[17085]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Now for the more interesting stuff. I've tried to run /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config manually as deploy-service on ores1004.eqiad.wmnet, and it croaks with a stack trace:
{"name": "deploy-local", "created": 1513091889.277656, "args": [], "msecs": 277.65607833862305, "filename": "cli.py", "levelno": 30, "msg": "Unhandled error:", "lineno": 230, "exc_text": "Traceback (most recent call last):\n File \"/usr/lib/python2.7/dist-packages/scap/cli.py\", line 329, in run\n exit_status = app.main(app.extra_arguments)\n File \"/usr/lib/python2.7/dist-packages/scap/deploy.py\", line 146, in main\n getattr(self, stage)()\n File \"/usr/lib/python2.7/dist-packages/scap/deploy.py\", line 316, in fetch\n git.checkout(rev_dir, self.rev)\n File \"/usr/lib/python2.7/dist-packages/scap/git.py\", line 349, in checkout\n git.checkout('--force', '--quiet', rev)\n File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 1428, in __call__\n return RunningCommand(cmd, call_args, stdin, stdout, stderr)\n File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 775, in __init__\n self.wait()\n File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 793, in wait\n self.handle_command_exit_code(exit_code)\n File \"/usr/lib/python2.7/dist-packages/scap/sh.py\", line 816, in handle_command_exit_code\n raise exc\nErrorReturnCode_128: \n\n RAN: /usr/bin/git checkout --force --quiet b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n\n STDOUT:\n\n\n STDERR:\nfatal: reference is not a tree: b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n", "funcName": "_handle_exception", "relativeCreated": 1288.093090057373} {"name": "deploy-local", "created": 1513091889.280495, "args": ["deploy-local", "ErrorReturnCode_128", {"full_cmd": "/usr/bin/git checkout --force --quiet b4f2b0249a60754cfa4e18650fdd45c62ef4a09a", "stderr": "fatal: reference is not a tree: b4f2b0249a60754cfa4e18650fdd45c62ef4a09a\n", "stdout": ""}], "msecs": 280.49492835998535, "filename": "cli.py", "levelno": 40, "msg": "%s failed: <%s> %s", "lineno": 232, "exc_text": null, "funcName": "_handle_exception", "relativeCreated": 1290.9319400787354}
Simply put, b4f2b0249a60754cfa4e18650fdd45c62ef4a09a does not exist in whichever repo scap tries to find it in. For some more information:
deploy-service@ores1004:/srv/deployment/ores/deploy-cache$ ls -la . revs/
.:
total 20
drwxr-xr-x  4 deploy-service deploy-service 4096 Dec  1 14:24 .
drwxr-xr-x  4 deploy-service root           4096 Dec  1 14:23 ..
-rw-r--r--  1 deploy-service deploy-service 1851 Dec 12 15:18 .config
lrwxrwxrwx  1 deploy-service deploy-service   45 Dec  1 14:24 .done -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec 12 15:18 cache
lrwxrwxrwx  1 deploy-service deploy-service   45 Dec  1 14:22 current -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 revs

revs/:
total 40
drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 .
drwxr-xr-x  4 deploy-service deploy-service 4096 Dec  1 14:24 ..
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov  6 14:40 29905e509984050a2ea675725f3c110033fc67a0
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov 29 18:04 532bd0bc6e923b8594aba71295e8c6db9bbc5f8e
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec  5 15:53 6baed71ef9a02264660f8637fb2313c9c560f71d
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov  6 17:33 82a13ae1173ec570ea563f389ab22e1a69aa154a
drwxr-xr-x  6 deploy-service deploy-service 4096 Oct 26 16:43 971be2294a51772d309c967465b32f4f9a9b66f5
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec  1 20:18 9afbf148ec3007a6787ab099978bf5a08360879d
drwxr-xr-x  6 deploy-service deploy-service 4096 Dec 11 19:21 b4f2b0249a60754cfa4e18650fdd45c62ef4a09a
drwxr-xr-x  6 deploy-service deploy-service 4096 Nov 29 13:17 e58bfbf2827be1ac453bc3d07dfe7fe41a08dc66
So at first look the object does exist in the deploy-cache. However...
deploy-service@ores1004:/srv/deployment/ores/deploy-cache/revs/b4f2b0249a60754cfa4e18650fdd45c62ef4a09a$ git show
commit 82dfd5652c1550dbee362100204d7aa5242e59ab
Author: halfak <aaron.halfaker@gmail.com>
Date:   Mon Jun 26 17:15:23 2017 -0500

    Bumps submodules for editquality/ores/wheels (1.3.15 update)

    Change-Id: I72eabc15cf84ca1abc38b1f19883eb229d2bf778
OK, what's going on here?
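Worth noting: a revs/<sha> directory existing only means scap created a checkout directory named after that revision; whether the sha exists as a git object in that checkout is a separate question. A generic git probe (illustrative helper, not scap code) for distinguishing the two:

```python
import subprocess

def commit_exists(repo, sha):
    """Return True iff `sha` is an object in `repo`'s object database.

    git cat-file -e exits 0 when the object exists (including objects
    reachable via the repo's alternates), non-zero otherwise.
    """
    return subprocess.call(
        ['git', '-C', repo, 'cat-file', '-e', sha],
        stderr=subprocess.DEVNULL) == 0
```

Running that inside revs/b4f2b02... would show whether "reference is not a tree" is about a missing object rather than a missing directory.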
This is a known bug, T181171: scap logs sometimes have incorrect date. The date is derived from the git commit being deployed, rather than the current system date.
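If that is right, the filename comes from something like the deployed commit's date rather than the wall clock. A generic git sketch of that derivation (illustrative, not scap's actual code):

```python
import subprocess

def log_date_from_commit(repo):
    """Date string derived from the last commit's committer date.

    A log named from this value can lag the actual deploy time,
    e.g. scap-sync-2017-12-09-... for a deploy run on Dec 11.
    """
    out = subprocess.check_output(
        ['git', '-C', repo, 'log', '-1',
         '--format=%cd', '--date=format:%Y-%m-%d'])
    return out.decode().strip()
```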
Mentioned in SAL (#wikimedia-operations) [2017-12-12T15:58:58Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661
Mentioned in SAL (#wikimedia-operations) [2017-12-12T15:59:08Z] <akosiaris@tin> Finished deploy [ores/deploy@b4f2b02]: T181661 (duration: 00m 09s)
Mentioned in SAL (#wikimedia-operations) [2017-12-12T15:59:19Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661
Mentioned in SAL (#wikimedia-operations) [2017-12-12T15:59:28Z] <akosiaris@tin> Finished deploy [ores/deploy@b4f2b02]: T181661 (duration: 00m 04s)
FWIW, if you pass -Dlog_json:False you get non-json output that's easier to read: /usr/bin/scap deploy-local -v -Dlog_json:False --repo ores/deploy -g cluster fetch --refresh-config
> simply put b4f2b0249a60754cfa4e18650fdd45c62ef4a09a does not exist in whichever repo scap tries to find it at. [...] So the object from a first look does exist in the deploy-cache. However... [...] OK, what's going on here ?
This is a bit of scap internals. The origin remote of anything in the revs dirs is the deploy-cache/cache dir. The object is likely not there because the directory exists (somehow) and you didn't pass --force; see:
https://github.com/wikimedia/scap/blob/master/scap/deploy.py#L289-L295
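Given that model, one way to pin down where the object is actually missing is to compare the rev dir's origin remote against the cache dir and probe both repos for the sha. The diagnose_rev helper below is illustrative, not part of scap:

```python
import subprocess

def sha_in_repo(repo, sha):
    # git cat-file -e exits 0 iff the object exists in this repo.
    return subprocess.call(
        ['git', '-C', repo, 'cat-file', '-e', sha],
        stderr=subprocess.DEVNULL) == 0

def diagnose_rev(rev_dir, cache_dir, sha):
    """Report whether the rev checkout's origin is the cache dir,
    and whether `sha` is present in each of the two repos."""
    origin = subprocess.check_output(
        ['git', '-C', rev_dir, 'config', '--get', 'remote.origin.url']
    ).decode().strip()
    return {
        'origin_is_cache': origin.rstrip('/') == cache_dir.rstrip('/'),
        'sha_in_rev_dir': sha_in_repo(rev_dir, sha),
        'sha_in_cache': sha_in_repo(cache_dir, sha),
    }
```

If sha_in_cache comes back False, the "reference is not a tree" failure in the rev dir follows directly, since the rev checkout fetches from the cache.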
Mentioned in SAL (#wikimedia-operations) [2017-12-12T16:22:38Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661
akosiaris@tin:/srv/deployment/ores/deploy$ scap deploy -v -l 'ores1004.eqiad.wmnet' T181661
fails reproducibly. The log is:
16:22:37 <Command u'/usr/bin/git submodule foreach --recursive git update-server-info', pid 33404>: process completed
16:22:37 Started deploy [ores/deploy@b4f2b02]: T181661
16:22:37 == CLUSTER == :* ores1004.eqiad.wmnet
16:22:37 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'cluster', 'fetch', '--refresh-config']
16:22:37 Using key: /etc/keyholder.d/deploy_service.pub
16:22:39 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'cluster', 'fetch', '--refresh-config'] on ores1004.eqiad.wmnet returned [70]:
OpenSSH_6.7p1 Debian-5+deb8u3, OpenSSL 1.0.2m 2 Nov 2017
debug1: Reading configuration data /dev/null
debug1: Connecting to ores1004.eqiad.wmnet [10.64.16.95] port 22.
debug1: Connection established.
debug1: identity file /etc/keyholder.d/deploy_service.pub type 1
debug1: key_load_public: No such file or directory
debug1: identity file /etc/keyholder.d/deploy_service.pub-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.7p1 Debian-5+deb8u3
debug1: match: OpenSSH_6.7p1 Debian-5+deb8u3 pat OpenSSH* compat 0x04000000
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr umac-128-etm@openssh.com none
debug1: kex: client->server aes128-ctr umac-128-etm@openssh.com none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 5d:bd:36:84:e8:2a:57:85:03:ef:80:cc:3e:fc:fb:6f
debug1: Host 'ores1004.eqiad.wmnet' is known and matches the ECDSA host key.
debug1: Found key in /etc/ssh/ssh_known_hosts:1116
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,keyboard-interactive
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /etc/keyholder.d/deploy_service.pub
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug1: Authentication succeeded (publickey).
Authenticated to ores1004.eqiad.wmnet ([10.64.16.95]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending command: /usr/bin/scap deploy-local -v --repo ores/deploy -g cluster fetch --refresh-config http://tin.eqiad.wmnet/ores/deploy/.git /srv/deployment/ores/deploy-cache/cache
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
debug1: channel 0: free: client-session, nchannels 1
debug1: fd 1 clearing O_NONBLOCK
Transferred: sent 3432, received 119092 bytes, in 1.4 seconds
Bytes per second: sent 2459.9, received 85360.0
debug1: Exit status 70
This correlates with trying to run scap deploy-local manually.
@mmodell Ideas on how I can debug this further?
Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:03:29Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661
Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:11:14Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661
Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:11:50Z] <akosiaris@tin> Finished deploy [ores/deploy@b4f2b02]: T181661 (duration: 00m 36s)
Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:12:00Z] <akosiaris@tin> Started deploy [ores/deploy@b4f2b02]: T181661
Mentioned in SAL (#wikimedia-operations) [2017-12-12T17:12:10Z] <akosiaris@tin> Finished deploy [ores/deploy@b4f2b02]: T181661 (duration: 00m 03s)
Ah indeed, that's more useful.
simply put b4f2b0249a60754cfa4e18650fdd45c62ef4a09a does not exist in whichever repo scap tries to find it at. For some more information
deploy-service@ores1004:/srv/deployment/ores/deploy-cache$ ls -la . revs/ .: total 20 drwxr-xr-x 4 deploy-service deploy-service 4096 Dec 1 14:24 . drwxr-xr-x 4 deploy-service root 4096 Dec 1 14:23 .. -rw-r--r-- 1 deploy-service deploy-service 1851 Dec 12 15:18 .config lrwxrwxrwx 1 deploy-service deploy-service 45 Dec 1 14:24 .done -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e drwxr-xr-x 6 deploy-service deploy-service 4096 Dec 12 15:18 cache lrwxrwxrwx 1 deploy-service deploy-service 45 Dec 1 14:22 current -> revs/532bd0bc6e923b8594aba71295e8c6db9bbc5f8e drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 revs revs/: total 40 drwxr-xr-x 10 deploy-service deploy-service 4096 Dec 11 19:21 . drwxr-xr-x 4 deploy-service deploy-service 4096 Dec 1 14:24 .. drwxr-xr-x 6 deploy-service deploy-service 4096 Nov 6 14:40 29905e509984050a2ea675725f3c110033fc67a0 drwxr-xr-x 6 deploy-service deploy-service 4096 Nov 29 18:04 532bd0bc6e923b8594aba71295e8c6db9bbc5f8e drwxr-xr-x 6 deploy-service deploy-service 4096 Dec 5 15:53 6baed71ef9a02264660f8637fb2313c9c560f71d drwxr-xr-x 6 deploy-service deploy-service 4096 Nov 6 17:33 82a13ae1173ec570ea563f389ab22e1a69aa154a drwxr-xr-x 6 deploy-service deploy-service 4096 Oct 26 16:43 971be2294a51772d309c967465b32f4f9a9b66f5 drwxr-xr-x 6 deploy-service deploy-service 4096 Dec 1 20:18 9afbf148ec3007a6787ab099978bf5a08360879d drwxr-xr-x 6 deploy-service deploy-service 4096 Dec 11 19:21 b4f2b0249a60754cfa4e18650fdd45c62ef4a09a drwxr-xr-x 6 deploy-service deploy-service 4096 Nov 29 13:17 e58bfbf2827be1ac453bc3d07dfe7fe41a08dc66So the object from a first look does exist in the deploy-cache. However...
deploy-service@ores1004:/srv/deployment/ores/deploy-cache/revs/b4f2b0249a60754cfa4e18650fdd45c62ef4a09a$ git show
commit 82dfd5652c1550dbee362100204d7aa5242e59ab
Author: halfak <aaron.halfaker@gmail.com>
Date:   Mon Jun 26 17:15:23 2017 -0500

    Bumps submodules for editquality/ores/wheels (1.3.15 update)

    Change-Id: I72eabc15cf84ca1abc38b1f19883eb229d2bf778

OK, what's going on here?
This is a bit of scap internals. The origin remote of anything in the revs dirs is the deploy-cache/cache dir. The object is likely not there because the directory exists (somehow) and you didn't pass --force; see:
https://github.com/wikimedia/scap/blob/master/scap/deploy.py#L289-L295
--force did not solve the issue unfortunately :(
Moving the deploy-cache/cache directory aside did solve the issue (partially?) and moved on until...
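For context on why a broken cache directory takes the rev checkouts down with it: scap's rev clones borrow objects from the cache via git's `--reference`/alternates mechanism. Here's a minimal local sketch of that layout, with made-up temp paths standing in for the deploy-cache directories:

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/server"
git -C "$tmp/server" -c user.email=a@b -c user.name=a commit -q --allow-empty -m c1
# this clone plays the role of deploy-cache/cache
git clone -q --no-local "$tmp/server" "$tmp/cache"
# a rev checkout borrows objects from the cache via --reference
git clone -q --no-local --reference "$tmp/cache" "$tmp/server" "$tmp/rev"
# the rev clone's object lookups fall through to the cache's object store
cat "$tmp/rev/.git/objects/info/alternates"
```

Because every rev checkout's object lookups fall through to the cache's object store, a corrupted or stale cache can surface as missing objects in any rev — which is consistent with moving the cache aside helping here.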
17:18:27 acquiring buffering lock for flushing buffer
17:18:27 got buffering lock for flushing buffer
17:18:27 released buffering lock for flushing buffer
17:18:27 <Command u'/usr/bin/git submodule update --init --recursive --reference /srv/deployment/ores/deploy-cache/cache'>.<Process 3440 ['/usr/bin/git', 'submodule', 'update', '--init', '--recursive', '--reference', '/srv/deployment/ores/deploy-cache/cache']>.stdout: got chunk size 0 to flush: ''
17:18:27 acquiring buffering lock for flushing buffer
17:18:27 got buffering lock for flushing buffer
17:18:27 released buffering lock for flushing buffer
17:18:27 got chunk size 0 to flush: ''
17:18:27 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 329, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 146, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 321, in fetch
    reference=self.context.cache_dir)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 388, in update_submodules
    git.submodule(*cmd)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 1428, in __call__
    return RunningCommand(cmd, call_args, stdin, stdout, stderr)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 775, in __init__
    self.wait()
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 793, in wait
    self.handle_command_exit_code(exit_code)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 816, in handle_command_exit_code
    raise exc
ErrorReturnCode_1:

  RAN: /usr/bin/git submodule update --init --recursive --reference /srv/deployment/ores/deploy-cache/cache

  STDOUT:
Submodule 'submodules/draftquality' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/draftquality) registered for path 'submodules/draftquality'
Submodule 'submodules/editquality' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality) registered for path 'submodules/editquality'
Submodule 'submodules/ores' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/ores) registered for path 'submodules/ores'
Submodule 'submodules/wheels' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wheels) registered for path 'submodules/wheels'
Submodule 'submodules/wikiclass' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wikiclass) registered for path 'submodules/wikiclass'
Submodule path 'submodul... (258 more, please see e.stdout)

  STDERR:
Cloning into 'submodules/draftquality'...
Cloning into 'submodules/editquality'...
fatal: reference is not a tree: 15d5283b7422919d85203b5ba907027f9356e421
Cloning into 'submodules/ores'...
fatal: reference is not a tree: b7413d6abf80688d44b42da4a7d3a4d1a80ca150
Cloning into 'submodules/wheels'...
Cloning into 'submodules/wikiclass'...
Unable to checkout '15d5283b7422919d85203b5ba907027f9356e421' in submodule path 'submodules/editquality'
Unable to checkout 'b7413d6abf80688d44b42da4a7d3a4d1a80ca150' in submodule path 'submodules/ores'

17:18:27 deploy-local failed: <ErrorReturnCode_1>

  RAN: /usr/bin/git submodule update --init --recursive --reference /srv/deployment/ores/deploy-cache/cache

  STDOUT:
Submodule 'submodules/draftquality' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/draftquality) registered for path 'submodules/draftquality'
Submodule 'submodules/editquality' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality) registered for path 'submodules/editquality'
Submodule 'submodules/ores' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/ores) registered for path 'submodules/ores'
Submodule 'submodules/wheels' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wheels) registered for path 'submodules/wheels'
Submodule 'submodules/wikiclass' (http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wikiclass) registered for path 'submodules/wikiclass'
Submodule path 'submodul... (258 more, please see e.stdout)

  STDERR:
Cloning into 'submodules/draftquality'...
Cloning into 'submodules/editquality'...
fatal: reference is not a tree: 15d5283b7422919d85203b5ba907027f9356e421
Cloning into 'submodules/ores'...
fatal: reference is not a tree: b7413d6abf80688d44b42da4a7d3a4d1a80ca150
Cloning into 'submodules/wheels'...
Cloning into 'submodules/wikiclass'...
Unable to checkout '15d5283b7422919d85203b5ba907027f9356e421' in submodule path 'submodules/editquality'
Unable to checkout 'b7413d6abf80688d44b42da4a7d3a4d1a80ca150' in submodule path 'submodules/ores'
rOEQ15d5283b7422 doesn't exist in the editquality repo. Somehow the submodule pointers are wrong.
But it does exist on tin:
akosiaris@tin:/srv/deployment/ores/deploy/.git/modules/submodules/editquality$ git show 15d5283b7422919d85203b5ba907027f9356e421
commit 15d5283b7422919d85203b5ba907027f9356e421
Merge: 529371c 4813659
Author: Aaron Halfaker <aaron.halfaker@gmail.com>
Date:   Wed Dec 6 11:47:01 2017 -0600

    Merge pull request #108 from wiki-ai/eswikiquote

    Add eswikiquote reverted models and revisions for review
For completeness' sake:
deploy-service@ores1004:/srv/deployment/ores/deploy-cache/cache$ cat .gitmodules
[submodule "submodules/wikiclass"]
    path = submodules/wikiclass
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wikiclass
[submodule "submodules/editquality"]
    path = submodules/editquality
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality
[submodule "submodules/ores"]
    path = submodules/ores
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/ores
[submodule "submodules/wheels"]
    path = submodules/wheels
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wheels
[submodule "submodules/draftquality"]
    path = submodules/draftquality
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/draftquality
deploy-service@ores1004:/srv/deployment/ores/deploy-cache/cache$ cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = http://tin.eqiad.wmnet/ores/deploy/.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
[submodule "submodules/draftquality"]
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/draftquality
[submodule "submodules/editquality"]
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality
[submodule "submodules/ores"]
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/ores
[submodule "submodules/wheels"]
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wheels
[submodule "submodules/wikiclass"]
    url = http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/wikiclass
But a git fetch indeed does not fetch anything new.
Another thing: I'm having difficulty just cloning the editquality submodule.
It's so large that git pack-objects takes over 100 seconds of CPU time and uses ~5.5 GB of RAM:
   RES    SHR S  %CPU %MEM     TIME+ COMMAND
5.665g 486352 S  1822  9.0 102:16.34 git pack-objects --revs --thin --stdout --delta-base-offset
I don't think there is anything really wrong with scap; it's just an unusably large git repo.
@mmodell is on to something, though, with the comment about that commit not being in the repo:
akosiaris@tin:/srv/deployment/ores/deploy/.git/modules/submodules/editquality$ git branch --contains 15d5283b7422919d85203b5ba907027f9356e421
akosiaris@tin:/srv/deployment/ores/deploy/.git/modules/submodules/editquality$ git tag --contains 15d5283b7422919d85203b5ba907027f9356e421
akosiaris@tin:/srv/deployment/ores/deploy/.git/modules/submodules/editquality$
So this commit does not exist in any of the branches or tags.
Hmm, indeed, if the object does not exist on any branch or tag then it likely won't be fetched by the "dumb" git http transport (which is what we still use in scap).
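That behaviour is straightforward to reproduce locally (a sketch with throwaway repos; nothing here is the actual ORES setup): an object that is not reachable from any branch or tag simply isn't transferred.

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/server"
git -C "$tmp/server" -c user.email=a@b -c user.name=a commit -q --allow-empty -m base
git -C "$tmp/server" -c user.email=a@b -c user.name=a commit -q --allow-empty -m orphan
orphan=$(git -C "$tmp/server" rev-parse HEAD)
git -C "$tmp/server" reset -q --hard HEAD~1   # orphan is now unreachable
# --no-local forces a real object transfer instead of hardlinking the store
git clone -q --no-local "$tmp/server" "$tmp/client"
git -C "$tmp/client" cat-file -e "$orphan" 2>/dev/null \
  && echo "orphan transferred" || echo "orphan NOT transferred"
```

The `--no-local` flag matters for the sketch: a plain local-path clone hardlinks the whole object store, unreachable objects included, which would hide the effect.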
Probably wouldn't be fetched by other transports either, though I guess it might get fetched along with other objects if it's in the same pack file.
Just fetching this one repo (editquality) from phabricator is causing inordinate load on the server. It's nothing that phab1001 can't handle; still, it's surprising how much work it takes for git to clone a single ~3 GB repository.
load average: 18.23, 26.87, 28.41
Behavior is erratic as well
akosiaris@bast1001:~$ git clone https://phabricator.wikimedia.org/source/editquality.git
Cloning into 'editquality'...
error: RPC failed; result=22, HTTP code = 504
fatal: The remote end hung up unexpectedly
Ahem...
Yeah, that repo is 334M in the current workdir, but the .git is 2.1 GB. That doesn't seem too unreasonable, but apparently git can't handle it. It's especially bad cloning from phabricator over https (upstream: T4369).
A fresh clone of http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality on bast1001 does not have rOEQ15d5283b7422 either, so aside from the difficult-to-fetch repo we have identified at least one more cause of the problem. It seems the editquality submodule referenced by the deploy repo needs to be updated to contain that change as well, rather than sitting in the "HEAD detached at ac3042e" state. @awight, care to have the honours of updating that submodule?
I can't help but wonder why scap is trying to checkout rOEQ15d5283b7422, however. Where does it get that info from? master?
@akosiaris: scap should be getting the hash from the submodule pointers contained at HEAD of tin.eqiad.wmnet/ores/deploy/.git
That said, I can't actually find that hash anywhere in the repo on tin, so I'm baffled really; I can't figure out where that commit is coming from at all.
The ores submodule, btw, is in the exact same state and also fails to checkout:
akosiaris@tin:/srv/deployment/ores/deploy/ores$ git status
HEAD detached at f3a4a63
nothing to commit, working tree clean
akosiaris@tin:/srv/deployment/ores/deploy/ores$ cd ../
akosiaris@tin:/srv/deployment/ores/deploy$ git status
On branch master
Ah ha! I figured _something_ out at least! The rOEQ15d5283b7422 commit is in origin/master; it just hasn't been merged to master on tin. A git pull would fast-forward it, and then that commit would exist. The question that this raises, however, is this:
Why is scap trying to fetch that commit? It would appear as if scap is fetching the repo from origin instead of tin, at least initially, then attempting to fetch only the submodules from tin (but not finding the commit in question because it hasn't been pulled there)
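This state is easy to reproduce locally (a sketch with throwaway repos; the paths are made up, with one clone standing in for tin): after a fetch, a new upstream commit is reachable only from the remote-tracking ref, so `git branch --contains` on local branches finds nothing until a pull/merge.

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.email=a@b -c user.name=a commit -q --allow-empty -m c1
git clone -q --no-local "$tmp/origin" "$tmp/tin"
# upstream gains a commit after the clone
git -C "$tmp/origin" -c user.email=a@b -c user.name=a commit -q --allow-empty -m c2
new=$(git -C "$tmp/origin" rev-parse HEAD)
git -C "$tmp/tin" fetch -q origin
git -C "$tmp/tin" branch --contains "$new"      # empty: no local branch has it
git -C "$tmp/tin" branch -r --contains "$new"   # only the remote-tracking ref does
```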
(If this is true, it's incorrect behavior, I think, as scap should always fetch from tin unless instructed otherwise by scap.cfg)
Well, it's getting the submodule pointers from origin/master instead, I'd say. See 42cf53249ffacc078aa786a581fe165e00bb1520.
That said, I can't actually find that hash anywhere in the repo on tin, so I'm baffled really; I can't figure out where that commit is coming from at all.
Indeed, and that's the important detail that was eluding me before. And I think that I have found the culprit: scap/scap.cfg has this line: git_rev: origin/master
So @awight, can you enlighten me about your scap.cfg? Is git_rev: origin/master intentional? If you want to deploy from origin, then I think we need to use git_upstream_submodules = True
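For reference, the two configurations discussed would look roughly like this in scap.cfg. This is a sketch based only on the option names mentioned in this thread; check the scap documentation for the exact spelling and syntax before relying on it:

```ini
; Option 1: drop the override so scap deploys the local checkout on tin
; (i.e. simply delete the git_rev line)

; Option 2: keep deploying from origin, but then also take the
; submodule pointers from upstream rather than from tin
git_rev: origin/master
git_upstream_submodules: True
```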
Aha! Nice find. It looks like it's been there since the very beginning; see fd1067ff4da. It has undergone a few minor changes since then, but nothing major.
I've crafted a commit on tin removing that line and am retrying a scap deploy from tin just for ores1004.
I think I should add a NOTICE to scap that says something along the lines of "Deploying from non-default origin: $GIT_REV" just to make this more obvious ;)
I've crafted a commit on tin removing that line and am retrying a scap deploy from tin just for ores1004.
OK, success! The deploy proceeded fine on ores1004 for the first time. @mmodell you rock! Thanks!
Now to see if we have solved the parallel scap issue on all ores* boxes as well
Hurray!!! I had to manually (via cumin) rm -rf /srv/deployment/ores/deploy/deploy-cache/cache on all ores* hosts, but it now works.
AFAICT, next steps forward should probably be:
Change 398067 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[mediawiki/services/ores/deploy@master] scap: Remove git_rev setting
Change 398067 merged by Awight:
[mediawiki/services/ores/deploy@master] scap: Remove git_rev setting
@mmodell Tangential note, I've been happy using git clone --depth 1 on personal projects. Would that make any sense for scap?
@awight: from what I understand, git has to do a lot of extra work on the server side in order to build a shallow clone. I may look into this a bit further; it would certainly save space on the deployment target, but it probably doesn't solve the issue of slow fetches.
I may actually be wrong about this; upon reviewing my sources, I think I was just remembering something incorrectly. It's only hard on the server when you subsequently try to fetch into a shallow clone: git has to do more work to figure out what needs to be fetched, and it makes the server work harder than simply cloning the whole repo to begin with.
So --depth=1 might work as long as we don't have to fetch from the repo later. I'm not sure how that would be better than the --reference method that scap is using right now but I am definitely open to considering anything which can improve performance and stability.
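As a quick sanity check on what `--depth 1` buys: a shallow clone really does carry only the tip of history (throwaway repo, made-up paths):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/server"
for i in 1 2 3; do
  git -C "$tmp/server" -c user.email=a@b -c user.name=a commit -q --allow-empty -m "c$i"
done
# a file:// URL forces a real transport, so --depth actually applies
git clone -q --depth 1 "file://$tmp/server" "$tmp/shallow"
git -C "$tmp/shallow" rev-list --count HEAD   # → 1
```

So the history (and the corresponding pack data) is gone from the target; the trade-off discussed above is what happens on later fetches into that shallow clone.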
Looks like I'm getting the same error.
commit b67bba77acb7c0ffc678201c9f3f54f198da6650
scap deploy -v -l "ores*" "(non-production) Update ORES on new cluster"
scap-sync-2017-12-12-0006-3-gb67bba7.log
Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:12:07Z] <akosiaris@tin> Started deploy [ores/deploy@b67bba7]: T181661
Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:13:02Z] <akosiaris@tin> Finished deploy [ores/deploy@b67bba7]: T181661 (duration: 00m 55s)
Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:13:59Z] <akosiaris@tin> Started deploy [ores/deploy@b67bba7]: T181661
Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:14:08Z] <akosiaris@tin> Finished deploy [ores/deploy@b67bba7]: T181661 (duration: 00m 03s)
I cannot reproduce this. A deploy of b67bba77a (see T181661#3836939) just worked fine. In fact, it worked fine twice in a row. The command used was scap deploy -v -l "ores*" T181661.
Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:16:12Z] <akosiaris@tin> Started deploy [ores/deploy@b67bba7]: T181661
Mentioned in SAL (#wikimedia-operations) [2017-12-14T11:16:21Z] <akosiaris@tin> Finished deploy [ores/deploy@b67bba7]: T181661 (duration: 00m 03s)
Not sure if this is related, but now I'm seeing a deploy-local failure with no diagnostics other than error code 70:
scap deploy-log -v reveals more:
11:27:10 [ores1001.eqiad.wmnet] Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 329, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 146, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 321, in fetch
    reference=self.context.cache_dir)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 388, in update_submodules
    git.submodule(*cmd)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 1428, in __call__
    return RunningCommand(cmd, call_args, stdin, stdout, stderr)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 775, in __init__
    self.wait()
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 793, in wait
    self.handle_command_exit_code(exit_code)
  File "/usr/lib/python2.7/dist-packages/scap/sh.py", line 816, in handle_command_exit_code
    raise exc
ErrorReturnCode_1:

  RAN: /usr/bin/git submodule update --init --recursive --reference /srv/deployment/ores/deploy-cache/cache

  STDOUT:

  STDERR:
fatal: reference is not a tree: 15d5283b7422919d85203b5ba907027f9356e421
fatal: reference is not a tree: a3a50bb0b5aea50178e428430438ae3323d7376f
Unable to checkout '15d5283b7422919d85203b5ba907027f9356e421' in submodule path 'submodules/editquality'
Unable to checkout 'a3a50bb0b5aea50178e428430438ae3323d7376f' in submodule path 'submodules/ores'
Those revisions aren't in gerrit. I think the github -> gerrit mirroring broke when we were messing around with pointing to git-lfs repos?
The Phabricator control panels look happy:
- https://phabricator.wikimedia.org/source/editquality/manage/uris/ shows that we're observing GitHub.
- https://github.com/wiki-ai/editquality/commit/15d5283b7422919d85203b5ba907027f9356e421 shows that commit has been in GitHub for 3 days.
- https://gerrit.wikimedia.org/r/#/q/15d5283b7422919d85203b5ba907027f9356e421 shows it's not in Gerrit yet.
- https://phabricator.wikimedia.org/source/editquality/manage/basics/ says "Last updated Thu, Dec 14, 7:07 AM (9 m, 19 s ago)."
Oh, what's this? A clue:
https://phabricator.wikimedia.org/source/editquality/history/master/ <-- this does contain the revision.
Oops, we aren't expecting this repo to be mirrored to gerrit. So the surprise is that the revision exists in Phabricator but can't be found by the deployment tool. Making a new task: T182865: Source revision is in Phabricator, but can't be found by deployment tools
@awight: yeah, I'm getting to the bottom of it now. The issue is that the commit does not exist on a local branch on tin; it's only in refs/remotes/origin/, which scap doesn't fetch automatically, as the default refspec only fetches refs/heads/* and not refs/remotes/*.
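That default refspec is easy to confirm on any fresh clone (sketch with a throwaway repo): only the server's refs/heads/* are mapped into the clone, so anything the server keeps under refs/remotes/* is never pulled over.

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/server"
git -C "$tmp/server" -c user.email=a@b -c user.name=a commit -q --allow-empty -m c1
git clone -q --no-local "$tmp/server" "$tmp/clone"
# only refs/heads/* on the remote are covered; refs/remotes/* are not
git -C "$tmp/clone" config remote.origin.fetch   # → +refs/heads/*:refs/remotes/origin/*
```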
Using a workaround for T182865, where we go into submodules and checkout master before submodule updating from the ORES root, I was able to successfully parallel deploy to ores*, and in under a minute! It looks like the SSH timeout is resolved!!
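For the record, the workaround pattern looks roughly like this as a local sketch (throwaway repos with made-up paths; on tin the equivalent was done inside the deploy repo's submodule directories): make the wanted commit reachable inside the submodule, then record the updated pointer from the superproject root.

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/sub"
git -C "$tmp/sub" -c user.email=a@b -c user.name=a commit -q --allow-empty -m s1
git init -q "$tmp/super"
cd "$tmp/super"
git -c protocol.file.allow=always submodule --quiet add "$tmp/sub" sub
git -c user.email=a@b -c user.name=a commit -q -m 'add sub'
# upstream submodule moves ahead; the superproject's pointer is now stale
git -C "$tmp/sub" -c user.email=a@b -c user.name=a commit -q --allow-empty -m s2
new=$(git -C "$tmp/sub" rev-parse HEAD)
# the workaround: inside the submodule, fetch and check out the branch tip...
git -C sub fetch -q origin
git -C sub checkout -q "$new"
# ...then record the updated gitlink from the superproject root
git add sub
git -c user.email=a@b -c user.name=a commit -q -m 'bump sub pointer'
git rev-parse :sub   # the staged gitlink now matches the new submodule commit
```

(The thread's actual workaround checked out master in each submodule; the sketch checks out the fetched tip by hash to avoid assuming a branch name.)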
I'll close this task for now, and will reopen if anything untoward rears its head again. Thanks everyone for the great work debugging!