
Find a solution for tools-exec-gift on Trusty
Closed, Resolved (Public)

Description

Currently there is a custom node, and a custom queue was created for this purpose. There is a very real chance we can collapse this added complexity and rely on standard grid scheduling, but we need to verify that everything works.

Current node:

id: 7e79e49d-b540-459b-a59e-3faa934d730e
flavor: m1.medium (3)
image: ubuntu-12.04-precise

Most of this process is outlined on https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Admin/new_exec_host

Event Timeline

chasemp created this task. Feb 1 2017, 11:26 PM
scfc moved this task from Triage to Backlog on the Toolforge board. Feb 7 2017, 2:49 PM

@Giftpflanze if we stand up a Trusty host with the same characteristics as the existing Precise node, are you up for migrating things? That is our current thinking: transition the existing scheme. I don't think any of us have enough insight into how this all works to do this transition in a timely manner. It could be fairly benign, depending on how much relies on Precise version(s).

< annika> chasemp: […] I guess you mean: You will provide a trusty exec node with an appropriate queue? And I will just use that instead of the current one?
< chasemp> annika: basically, a second trusty node in the same queue for you to migrate to
< annika> that would be fine

Details on the existing instance:

tools-exec-gift ubuntu-12.04-precise (deprecated 2014-04-17) 10.68.16.40 m1.medium - Active nova None Running 2 years, 11 months

1
    tools-exec-gift
ID
    7e79e49d-b540-459b-a59e-3faa934d730e
Status
    Active
Availability Zone
    nova
Created
    Feb. 28, 2014, 4:40 a.m.
Time Since Created
    2 years, 11 months
Host
    labvirt1002

Specs

Flavor Name
    m1.medium
Flavor ID
    3
RAM
    4GB
VCPUs
    2 VCPU
Disk
    40GB

IP Addresses

Public
    10.68.16.40 

Security Groups

default

        ALLOW -1:-1/icmp from 0.0.0.0/0
        ALLOW 5666:5666/tcp from 10.0.0.0/8
        ALLOW 22:22/tcp from 0.0.0.0/0
        ALLOW 60000:61000/udp from 0.0.0.0/0
        ALLOW 6666:6666/tcp from 0.0.0.0/0
        ALLOW 8090:8090/tcp from 0.0.0.0/0
        ALLOW 1:65535/tcp from default
        ALLOW 1:65535/udp from default

execnode

        ALLOW 6444:6445/tcp from 10.4.0.0/21
        ALLOW 6444:6445/udp from 10.4.0.0/21
        ALLOW 113:113/tcp from 0.0.0.0/0
        ALLOW 3000:3000/tcp from 10.4.0.0/21

Metadata

Key Name
    None
Image Name
    ubuntu-12.04-precise (deprecated 2014-04-17)
Image ID
    ff0a06ae-e7bf-4533-a3aa-176c366fdb4a
puppetstatus
    changed
puppettimestamp
    1429716563
project-id
    tools

Volumes Attached

Volume
    No volumes attached.

I created tools-exec-gift-trusty.tools.eqiad.wmflabs but it seems to be having issues with puppet certificates. I haven't figured out why yet, but I imagine it's connected to the project-specific master.

Actually, it looks like it may be a somewhat known issue:

hostname: Name or service not known

 * Stopping Send an event to indicate plymouth is up [ OK ]
 * Starting Mount filesystems on boot [ OK ]
 * Starting Populate and link to /run filesystem [ OK ]
 * Stopping Populate and link to /run filesystem [ OK ]
 * Stopping Track if upstart is running in a container [ OK ]
 * Starting Signal sysvinit that the rootfs is mounted [ OK ]
 * Starting Initialize or finalize resolvconf [ OK ]
 * Starting Clean /tmp directory [ OK ]
 * Stopping Clean /tmp directory [ OK ]
Cloud-init v. 0.7.5 running 'init-local' at Fri, 17 Feb 2017 12:42:04 +0000. Up 2.67 seconds.
cloud-init-nonet[3.03]: waiting 10 seconds for network device
 * Starting set console keymap [ OK ]
 * Starting Signal sysvinit that virtual filesystems are mounted [ OK ]
 * Starting Signal sysvinit that virtual filesystems are mounted [ OK ]
 * Starting Bridge udev events into upstart [ OK ]
 * Starting Signal sysvinit that remote filesystems are mounted [ OK ]
 * Starting device node and kernel event manager [ OK ]
 * Starting load modules from /etc/modules [ OK ]
 * Starting cold plug devices [ OK ]
 * Starting log initial device creation [ OK ]
 * Stopping load modules from /etc/modules [ OK ]
 * Stopping set console keymap [ OK ]
 * Starting Uncomplicated firewall [ OK ]
 * Starting configure network device security [ OK ]
 * Starting configure network device security [ OK ]
 * Starting Mount network filesystems [ OK ]
 * Starting Upstart job to start rpcbind on boot only [ OK ]
 * Stopping Upstart job to start rpcbind on boot only [ OK ]
 * Stopping Mount network filesystems [ OK ]
 * Starting RPC portmapper replacement [ OK ]
 * Starting LLDP daemon [ OK ]
 * Starting NSM status monitor [ OK ]
 * Starting configure network device [ OK ]
 * Starting Bridge socket events into upstart [ OK ]
 * Starting Mount network filesystems [ OK ]
 * Stopping Mount network filesystems [ OK ]
cloud-init-nonet[4.76]: static networking is now up
 * Starting configure network device [ OK ]
 * Stopping cold plug devices [ OK ]
 * Stopping log initial device creation [ OK ]
 * Starting set console font [ OK ]
Cloud-init v. 0.7.5 running 'init' at Fri, 17 Feb 2017 12:42:06 +0000. Up 5.19 seconds.
ci-info: ++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++
ci-info: +--------+------+-------------+---------------+-------------------+
ci-info: | Device | Up | Address | Mask | Hw-Address |
ci-info: +--------+------+-------------+---------------+-------------------+
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | . |
ci-info: | eth0 | True | 10.68.21.65 | 255.255.248.0 | fa:16:3e:21:ca:d4 |
ci-info: +--------+------+-------------+---------------+-------------------+
ci-info: +++++++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++++++
ci-info: +-------+-------------+------------+---------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-------------+------------+---------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 10.68.16.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.68.16.0 | 0.0.0.0 | 255.255.248.0 | eth0 | U |
ci-info: +-------+-------------+------------+---------------+-----------+-------+
 * Stopping set console font [ OK ]
 * Starting userspace bootsplash [ OK ]
 * Starting Send an event to indicate plymouth is up [ OK ]
 * Stopping userspace bootsplash [ OK ]
 * Stopping Send an event to indicate plymouth is up [ OK ]
2017-02-17 12:42:11,775 - __init__.py[WARNING]: Format for 'users' key must be a comma separated string or a dictionary or a list and not NoneType
2017-02-17 12:42:12,081 - __init__.py[WARNING]: Format for 'users' key must be a comma separated string or a dictionary or a list and not NoneType
Generating public/private rsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_rsa_key.
Your public key has been saved in /etc/ssh/ssh_host_rsa_key.pub.
The key fingerprint is:
50:71:8a:d8:bf:c0:04:30:c6:2d:75:41:31:4d:4a:21 root@tools-exec-gift-trusty
The key's randomart image is:
+--[ RSA 2048]----+
| .++E.O*+.. |
| .o..B =.o |
| .. * . |
| o o |
| o S |
| . . |
| . |
| |
| |
+-----------------+
Generating public/private dsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_dsa_key.
Your public key has been saved in /etc/ssh/ssh_host_dsa_key.pub.
The key fingerprint is:
0c:93:f4:6c:9b:c7:bd:1e:2c:27:b4:46:be:d4:ba:81 root@tools-exec-gift-trusty
The key's randomart image is:
+--[ DSA 1024]----+
| . |
| . + |
| + + |
| = + . |
| S = . |
| * + . |
| E O * |
| o O . |
| +.. |
+-----------------+
Generating public/private ecdsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_ecdsa_key.
Your public key has been saved in /etc/ssh/ssh_host_ecdsa_key.pub.
The key fingerprint is:
1d:3f:14:5c:c5:ce:34:ed:fd:f9:1a:f9:01:d1:3f:62 root@tools-exec-gift-trusty
The key's randomart image is:
+--[ECDSA 256]---+
| ....oo|
| ....+|
| . .. *o|
| . + . *|
| S . oE .+|
| ..ooo|
| o..|
| oo|
| ...|
+-----------------+
Generating public/private ed25519 key pair.
Your identification has been saved in /etc/ssh/ssh_host_ed25519_key.
Your public key has been saved in /etc/ssh/ssh_host_ed25519_key.pub.
The key fingerprint is:
2a:cc:f2:0c:f8:71:d1:02:ed:b9:ad:4a:c0:f3:94:77 root@tools-exec-gift-trusty
The key's randomart image is:
+--[ED25519 256--+
| |
| . |
| . . |
|. o.o |
|.o o=..ES |
| o+o.=.. |
|. =.* o |
| o B o |
| o.+ |
+-----------------+
 * Starting Signal sysvinit that local filesystems are mounted [ OK ]
 * Starting configure network device security [ OK ]
 * Stopping Mount filesystems on boot [ OK ]
 * Starting flush early job output to logs [ OK ]
 * Stopping Failsafe Boot Delay [ OK ]
 * Starting System V initialisation compatibility [ OK ]
 * Stopping flush early job output to logs [ OK ]
 * Starting D-Bus system message bus [ OK ]
 * Starting configure virtual network devices [ OK ]
 * Starting NFSv4 id <-> name mapper [ OK ]
 * Starting SystemD login management service [ OK ]
 * Stopping rpcsec_gss daemon [ OK ]
 * Starting Bridge file events into upstart [ OK ]
 * Starting system logging daemon [ OK ]
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
 * Starting Handle applying cloud-config [ OK ]
 * Starting AppArmor profiles [ OK ]
 * Setting up X socket directories... [ OK ]
 * Stopping System V initialisation compatibility [ OK ]
 * Starting System V runlevel compatibility [ OK ]
 * Starting Salt Minion [ OK ]
 * Starting save kernel messages [ OK ]
 * Starting regular background program processing daemon [ OK ]
 * Stopping save kernel messages [ OK ]
 * Starting OpenSSH server [ OK ]
 * Starting CPU interrupts balancing daemon [ OK ]
Turning on process accounting, file set to '/var/log/account/pacct'.
 * Done.
Cloud-init v. 0.7.5 running 'modules:config' at Fri, 17 Feb 2017 12:42:13 +0000. Up 11.40 seconds.
 * Starting MTA [ OK ]
 * Starting Name Service Cache Daemon nscd [ OK ]
hostname: Name or service not known
 * Starting LDAP connection daemon nslcd [ OK ]
Generating locales...
 en_US.UTF-8... up-to-date
Generation complete.
 * Starting puppet agent [ OK ]
 * Restoring resolver state... [ OK ]
 * Stopping System V runlevel compatibility [ OK ]
 * Stopping Handle applying cloud-config [ OK ]
Cloud-init v. 0.7.5 running 'modules:final' at Fri, 17 Feb 2017 12:42:20 +0000. Up 18.50 seconds.
+ echo 'Enabling console logging for puppet while it does the initial run'
Enabling console logging for puppet while it does the initial run
+ echo 'daemon.* |/dev/console'
+ restart rsyslog
rsyslog start/running, process 1283
+ /sbin/vgdisplay -c vd
File descriptor 3 (socket:[10411]) leaked on vgdisplay invocation. Parent PID 1274: /bin/bash
 Volume group "vd" not found
+ echo 'Creating the volume group'
Creating the volume group
+ /sbin/parted -s /dev/vda print
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number Start End Size Type File system Flags
 1 32.3kB 19.5GB 19.5GB primary ext4
 2 19.5GB 20.0GB 512MB primary linux-swap(v1)

+ /sbin/parted -ms /dev/vda print
BYT;
/dev/vda:42.9GB:virtblk:512:512:msdos:Virtio Block Device;
1:32.3kB:19.5GB:19.5GB:ext4::;
2:19.5GB:20.0GB:512MB:linux-swap(v1)::;
+ /sbin/parted -s /dev/vda print free
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number Start End Size Type File system Flags
 1 32.3kB 19.5GB 19.5GB primary ext4
 19.5GB 19.5GB 279kB Free Space
 2 19.5GB 20.0GB 512MB primary linux-swap(v1)
 20.0GB 42.9GB 23.0GB Free Space

+ /sbin/parted -ms /dev/vda print free
BYT;
/dev/vda:42.9GB:virtblk:512:512:msdos:Virtio Block Device;
1:32.3kB:19.5GB:19.5GB:ext4::;
1:19.5GB:19.5GB:279kB:free;
2:19.5GB:20.0GB:512MB:linux-swap(v1)::;
1:20.0GB:42.9GB:23.0GB:free;
++ /sbin/parted -ms /dev/vda print free
++ /usr/bin/tail -n 1
++ /usr/bin/cut -d : -f 2,3 '--output-delimiter= '
+ /sbin/parted -s /dev/vda mkpart primary 20.0GB 42.9GB
++ /sbin/parted -ms /dev/vda print
++ /usr/bin/cut -d : -f 1
++ /usr/bin/tail -n 1
+ part=3
+ '[' 3 '!=' '' ']'
+ '[' 3 -gt 1 ']'
+ /sbin/parted -s /dev/vda set 3 lvm on
+ /sbin/pvcreate /dev/vda3
File descriptor 3 (socket:[10411]) leaked on pvcreate invocation. Parent PID 1274: /bin/bash
 Physical volume "/dev/vda3" successfully created
+ /sbin/vgcreate vd /dev/vda3
File descriptor 3 (socket:[10411]) leaked on vgcreate invocation. Parent PID 1274: /bin/bash
 Volume group "vd" successfully created
+ /sbin/partprobe
++ curl http://169.254.169.254/openstack/latest/meta_data.json/
++ sed -r 's/^.*project_id\": \"//'
++ sed -r 's/\".*$//g'
 % Total % Received % Xferd Average Speed Time Time Time Current
 Dload Upload Total Spent Left Speed
 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 908 100 908 0 0 128k 0 --:--:-- --:--:-- --:--:-- 147k
+ project=tools
++ curl http://169.254.169.254/1.0/meta-data/local-ipv4
+ ip=10.68.21.65
++ hostname
+ hostname=tools-exec-gift-trusty
++ hostname -d
++ sed -r 's/.*\.([^.]+\.[^.]+)$/\1/'
hostname: Name or service not known
+ domain=
+ fqdn=tools-exec-gift-trusty.tools.
+ saltfinger=c5:b1:35:45:3e:0a:19:70:aa:5f:3a:cf:bf:a0:61:dd
+ '[' '' == eqiad.wmflabs ']'
+ '[' '' == codfw.wmflabs ']'
+ sed -i s/_PROJECT_/tools/g /etc/security/access.conf
+ sed -i s/_PROJECT_/tools/g /etc/ldap/ldap.conf
+ sed -i s/_PROJECT_/tools/g /etc/sudo-ldap.conf
+ sed -i s/_PROJECT_/tools/g /etc/nslcd.conf
+ sed -i s/_FQDN_/tools-exec-gift-trusty.tools./g /etc/puppet/puppet.conf
+ sed -i s/_MASTER_//g /etc/puppet/puppet.conf
+ echo ''
+ mkdir /etc/dhcp/dhclient-enter-hooks.d
mkdir: cannot create directory '/etc/dhcp/dhclient-enter-hooks.d': File exists
+ cat
++ /usr/bin/dig +short labs-recursor0.wikimedia.org
+ nameserver=208.80.155.118
+ cat
+ echo '10.68.21.65 tools-exec-gift-trusty.tools.'
+ /etc/init.d/nslcd restart
hostname: Name or service not known
 * Restarting LDAP connection daemon nslcd
 ...done.
+ /etc/init.d/nscd restart
 * Restarting Name Service Cache Daemon nscd
 ...done.
+ dpkg-reconfigure -fnoninteractive -pcritical openssh-server
ssh stop/waiting
ssh start/running, process 1457
+ /etc/init.d/ssh stop
ssh stop/waiting
+ /etc/init.d/ssh start
ssh start/running, process 1490
+ nscd -i hosts
+ echo tools-exec-gift-trusty.tools.
+ echo -e 'master: \n'
+ echo 'id: tools-exec-gift-trusty.tools.'
+ echo 'master_finger: c5:b1:35:45:3e:0a:19:70:aa:5f:3a:cf:bf:a0:61:dd'
+ echo tools-exec-gift-trusty.tools.
+ /etc/init.d/salt-minion restart
salt-minion stop/waiting
salt-minion start/running, process 1501
+ puppet agent --enable
+ puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff --waitforcert=10 --certname=tools-exec-gift-trusty.tools. --server=
Info: Creating a new SSL key for tools-exec-gift-trusty.tools.
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
Error: Could not request certificate: Connection refused - connect(2)
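
Reading the trace above: `hostname -d` printed nothing, so `domain` stayed empty, the FQDN became `tools-exec-gift-trusty.tools.`, neither domain check matched, and puppet was started with an empty `--server=`, which would explain the refused connections. The Python sketch below replays that derivation. It is reconstructed from the traced commands only, not from the actual first-boot script; `derive_identity` and the `puppetmaster.<domain>` value are hypothetical placeholders.

```python
import re

# Domains the trace compares against ('[' '' == eqiad.wmflabs ']' etc.).
KNOWN_DOMAINS = {"eqiad.wmflabs", "codfw.wmflabs"}


def derive_identity(hostname, hostname_d_output, project):
    """Replay the domain/fqdn/master derivation seen in the boot trace."""
    # Equivalent of: sed -r 's/.*\.([^.]+\.[^.]+)$/\1/'
    # It keeps only the last two DNS labels; empty input stays empty.
    domain = re.sub(r".*\.([^.]+\.[^.]+)$", r"\1", hostname_d_output)
    fqdn = f"{hostname}.{project}.{domain}"
    # Assumption: a master is only chosen for a recognised domain. The real
    # master name is not visible in the trace, so this value is a placeholder.
    master = f"puppetmaster.{domain}" if domain in KNOWN_DOMAINS else ""
    return domain, fqdn, master


# What happened on the failing boot: `hostname -d` produced no output.
broken = derive_identity("tools-exec-gift-trusty", "", "tools")

# What a boot with working name resolution would presumably have produced.
working = derive_identity("tools-exec-gift-trusty", "tools.eqiad.wmflabs", "tools")
```

With an empty domain the sketch yields the same truncated certname and blank master that appear in the `puppet agent` command line of the trace.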

OK, well, I created tools-exec-gift-trusty-01 successfully, but it has failed to migrate to the tools-specific master :)

Error: Could not retrieve catalog from remote server: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: tools-puppetmaster-02.tools.eqiad.wmflabs]
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not send report: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: tools-puppetmaster-02.tools.eqiad.wmflabs]
root@tools-exec-gift-trusty-01:~#

OK @Giftpflanze.

tools-exec-gift-trusty-01.tools.eqiad.wmflabs

qconf -Ae /var/lib/gridengine/etc/exechosts/tools-exec-gift.tools.eqiad.wmflabs
exechost "tools-exec-gift.eqiad.wmflabs" already exists

qconf -mq giftbot

qname                 giftbot
hostlist              tools-exec-gift.eqiad.wmflabs \
                      tools-exec-gift-trusty-01.tools.eqiad.wmflabs

root@tools-bastion-03.tools.eqiad.wmflabs modified "giftbot" in cluster queue list

/usr/bin/qconf -sel | grep -e gift | grep trusty
tools-exec-gift-trusty-01.tools.eqiad.wmflabs

qmod -e "*@tools-exec-gift-trusty-01.tools.eqiad.wmflabs"
Queue instance "giftbot@tools-exec-gift-trusty-01.tools.eqiad.wmflabs" is already in the specified state: enabled

qhost -q -h tools-exec-gift-trusty-01.tools.eqiad.wmflabs

HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
tools-exec-gift-trusty-01.tools.eqiad.wmflabs -               -     -       -       -       -       -
   giftbot              BC    0/0/1000      au
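
The `au` at the end of the `giftbot` row is a set of Grid Engine queue-state flags. As a reading aid, the sketch below splits that row and decodes the flags. The letter table is a subset taken from the `qstat` documentation as I recall it, the column names are my own labels, and `parse_queue_row` and `decode_states` are hypothetical helpers, so check the man page for your Grid Engine version before relying on it.

```python
# Subset of the queue-state letters described in the qstat man page.
QUEUE_STATES = {
    "a": "load alarm",
    "A": "suspend alarm",
    "c": "configuration ambiguous",
    "C": "suspended by calendar",
    "d": "disabled",
    "D": "disabled by calendar",
    "E": "error",
    "o": "orphaned",
    "s": "suspended",
    "S": "suspended by subordination",
    "u": "unknown",
}


def decode_states(flags):
    """Translate a state string such as 'au' into readable names."""
    return [QUEUE_STATES.get(ch, f"unrecognised ({ch})") for ch in flags]


def parse_queue_row(row):
    """Split one queue row of `qhost -q` output into labelled fields."""
    fields = row.split()
    name, qtype, slots = fields[0], fields[1], fields[2]
    # The state column is absent when the queue instance has no flags set.
    states = fields[3] if len(fields) > 3 else ""
    return {"name": name, "type": qtype, "slots": slots,
            "states": decode_states(states)}


row = parse_queue_row("   giftbot              BC    0/0/1000      au")
```

Here `au` decodes to a load alarm plus `unknown`; the latter generally means the master cannot reach the execution daemon on that host, which would match the dashes in the load and memory columns.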
chasemp updated the task description. Feb 17 2017, 8:20 PM

@Giftpflanze friendly ping: this is ready for you to migrate off Precise, or let us know what's not working out. We have about 35 days until we have to turn off the existing tools-exec-gift.

Killing the Precise instance (and defaulting to Trusty) shouldn't pose a problem. I switched my code to Trusty (I didn't even know anymore that I had to hardcode that) and I will test it with the next regular run starting on the 1st of March. (Also, I'm kinda swamped with school stuff but I'm glad to have found a minute for this.)

A note from yesterday:

annika: chasemp: can you take a look at job 1820616, please? it doesn't schedule on tools-exec-gift-trusty-01.tools.eqiad.wmflabs and i don't exactly see why

root@tools-bastion-03:~# qstat -j 1820616
==============================================================
job_number:                 1820616
exec_file:                  job_scripts/1820616
submission_time:            Wed Mar  1 00:07:04 2017
owner:                      tools.giftbot
uid:                        51072
group:                      tools.giftbot
gid:                        51072
sge_o_home:                 /data/project/giftbot
sge_o_log_name:             tools.giftbot
sge_o_path:                 /usr/bin:/bin
sge_o_shell:                /bin/sh
sge_o_tz:                   Europe/Berlin
sge_o_workdir:              /mnt/nfs/labstore-secondary-tools-project/giftbot
sge_o_host:                 tools-cron-01
account:                    sge
stderr_path_list:           NONE:NONE:dwl01.out-20170301
merge:                      y
hard resource_list:         h_vmem=524288k,release=trusty
mail_list:                  tools.giftbot@tools.wmflabs.org
notify:                     FALSE
job_name:                   dwl01
stdout_path_list:           NONE:NONE:dwl01.out-20170301
jobshare:                   0
hard_queue_list:            giftbot
env_list:
script_file:                /mnt/nfs/labstore-secondary-tools-project/giftbot/dwl01.tcl
scheduling info:            queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1201.eqiad.wmflabs" dropped because it is disabled
                            cannot run in queue "webgrid-lighttpd" because it is not contained in its hard queue list (-q)
                            (-l h_vmem=524288k,release=trusty) cannot run at host "tools-exec-gift.eqiad.wmflabs" because it offers only hf:release=precise
                            cannot run in queue "mailq" because it is not contained in its hard queue list (-q)
                            cannot run in queue "task" because it is not contained in its hard queue list (-q)
                            cannot run in queue "continuous" because it is not contained in its hard queue list (-q)
                            (-l h_vmem=524288k,release=trusty) cannot run in queue "giftbot@tools-exec-gift-trusty-01.tools.eqiad.wmflabs" because job requests unknown resource (release)
                            cannot run in queue "webgrid-generic" because it is not contained in its hard queue list (-q)
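
Most of the `scheduling info` lines are noise from queues the job never asked for; the two that matter are the ones about the `release` resource. A small Python sketch for pulling those out of `qstat -j` output follows. The regular expressions are written against the exact wording shown above and may not match other Grid Engine versions; `summarize` is a hypothetical helper name.

```python
import re

# Host offers the resource, but with a different value.
VALUE_MISMATCH = re.compile(
    r'cannot run at host "(?P<host>[^"]+)" '
    r'because it offers only (?P<offer>\S+)'
)
# Queue instance does not know the requested resource at all.
UNKNOWN_RESOURCE = re.compile(
    r'cannot run in queue "(?P<queue>[^"]+)" '
    r'because job requests unknown resource \((?P<resource>[^)]+)\)'
)


def summarize(scheduling_info):
    """Return (kind, where, detail) tuples for resource-related rejections."""
    findings = []
    for line in scheduling_info.splitlines():
        match = VALUE_MISMATCH.search(line)
        if match:
            findings.append(("value mismatch", match["host"], match["offer"]))
            continue
        match = UNKNOWN_RESOURCE.search(line)
        if match:
            findings.append(("unknown resource", match["queue"], match["resource"]))
    return findings


SAMPLE = """\
cannot run in queue "task" because it is not contained in its hard queue list (-q)
(-l h_vmem=524288k,release=trusty) cannot run at host "tools-exec-gift.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=524288k,release=trusty) cannot run in queue "giftbot@tools-exec-gift-trusty-01.tools.eqiad.wmflabs" because job requests unknown resource (release)
"""

findings = summarize(SAMPLE)
```

Run against the sample, this leaves two findings: the Precise host offers `release=precise`, and the new Trusty queue instance does not define `release` at all.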
Giftpflanze closed this task as Resolved. Mar 6 2017, 9:24 PM
Giftpflanze claimed this task.

< yuvipanda> !log tools set complex_values slots=300,release=trusty for tools-exec-gift-trusty-01.tools.eqiad.wmflabs
This seems to have done the trick, thank you!
@chasemp The old instance can be cleaned up now.
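
For anyone hitting the same symptom later: the job asked for `release=trusty`, and the new host only became eligible once its `complex_values` advertised that value. The Python sketch below is a deliberately simplified model of that check, not how the Grid Engine scheduler is implemented. It compares only the resource names you pass in (here `release`), because limits such as `h_vmem` are not plain string matches. `parse_pairs` and `unmet_requests` are hypothetical names, and the "before the fix" host value is an assumption, since the thread does not show it.

```python
def parse_pairs(spec):
    """Turn 'a=1,b=2' into {'a': '1', 'b': '2'}; empty input gives {}."""
    return dict(item.split("=", 1) for item in spec.split(",") if item)


def unmet_requests(hard_resource_list, complex_values, names):
    """Map each named request the host does not satisfy to (wanted, offered)."""
    wanted = parse_pairs(hard_resource_list)
    offered = parse_pairs(complex_values)
    return {
        name: (wanted[name], offered.get(name))
        for name in names
        if name in wanted and offered.get(name) != wanted[name]
    }


JOB = "h_vmem=524288k,release=trusty"  # hard resource_list from qstat -j

# Old Precise host, as reported by the scheduler.
on_precise = unmet_requests(JOB, "release=precise", ["release"])

# New host before the fix (assumed: no release value configured).
before_fix = unmet_requests(JOB, "slots=300", ["release"])

# New host after: complex_values slots=300,release=trusty
after_fix = unmet_requests(JOB, "slots=300,release=trusty", ["release"])
```

An empty result for `after_fix` corresponds to the job finally being schedulable on tools-exec-gift-trusty-01.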