toolforge: webservicemonitor for stretch/sge
Closed, Resolved, Public

Description

We need to get webservicemonitor working in our new Toolforge (Stretch/SGE).

Ideally we could redesign this so that services nodes don't need to be grid submit hosts, but the daemon currently queries the grid directly:

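webservicemonitor.py
import subprocess
import xml.etree.ElementTree as ET
def run(self):
    # Needs a grid submit host: list every user's jobs ('-u *') as XML
    qstat_xml = ET.fromstring(subprocess.check_output(['/usr/bin/qstat', '-u', '*', '-xml']))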

Package source code is at https://gerrit.wikimedia.org/r/operations/software/tools-manifest
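For context on the qstat call in run(): the following is a minimal sketch, not the actual webservicemonitor code, of how that XML could be turned into a flat job list. It assumes the usual Grid Engine `qstat -xml` layout, where each job is a `job_list` element with `JB_owner`, `JB_name` and `state` children. The function `parse_qstat_jobs` and the sample tool and job names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample of `qstat -u '*' -xml` output (assumed SGE layout):
# running jobs sit under queue_info, pending jobs under a nested job_info.
SAMPLE_QSTAT_XML = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>101</JB_job_number>
      <JB_name>lighttpd-example</JB_name>
      <JB_owner>tools.example</JB_owner>
      <state>r</state>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>102</JB_job_number>
      <JB_name>uwsgi-other</JB_name>
      <JB_owner>tools.other</JB_owner>
      <state>qw</state>
    </job_list>
  </job_info>
</job_info>
"""


def parse_qstat_jobs(xml_text):
    """Return (owner, job name, state) tuples for every job in qstat XML output."""
    root = ET.fromstring(xml_text)  # accepts str, or the bytes from check_output()
    jobs = []
    # iter() walks the whole tree in document order, so both running
    # and pending job_list elements are collected.
    for job in root.iter('job_list'):
        jobs.append((
            job.findtext('JB_owner'),
            job.findtext('JB_name'),
            job.findtext('state'),
        ))
    return jobs


if __name__ == '__main__':
    for owner, name, state in parse_qstat_jobs(SAMPLE_QSTAT_XML):
        print(owner, name, state)
```

Walking the tree with `iter()` instead of a fixed path keeps the parser indifferent to whether a job is currently scheduled on a queue or still waiting.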

aborrero created this task. Dec 14 2018, 2:11 PM
aborrero triaged this task as Normal priority.
aborrero updated the task description.
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

webservicemonitor is just a service that starts and restarts tools' web services, similar to bigbrother but for webservices.
To do this, it has to interact with the grid.
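A minimal sketch of the monitoring idea, under stated assumptions: `expected` is a hypothetical mapping of tool account to expected webservice job name (for instance built from per-tool manifests, as the tools-manifest package name suggests), and `running` holds the (owner, job name) pairs read from qstat. `find_missing_webservices`, `monitor_once` and the `restart` callback are hypothetical; how the real daemon restarts a webservice is not shown.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def find_missing_webservices(
    expected: Dict[str, str],
    running: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return tool accounts whose expected webservice job is not on the grid."""
    running_set = set(running)
    return sorted(
        tool
        for tool, job_name in expected.items()
        if (tool, job_name) not in running_set
    )


def monitor_once(
    expected: Dict[str, str],
    running: Iterable[Tuple[str, str]],
    restart: Callable[[str], None],
) -> List[str]:
    """One monitoring pass: call restart(tool) for every missing webservice."""
    missing = find_missing_webservices(expected, running)
    for tool in missing:
        restart(tool)  # hypothetical hook; the real restart mechanism is not shown
    return missing


if __name__ == "__main__":
    expected = {"tools.example": "lighttpd-example", "tools.other": "uwsgi-other"}
    running = [("tools.example", "lighttpd-example")]
    restarted: List[str] = []
    print(monitor_once(expected, running, restarted.append))  # ['tools.other']
```

Keeping the comparison separate from the restart side effect makes the decision logic testable without a grid; only the pass that feeds it qstat data needs a submit host.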

Change 479737 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: webservicemonitor is now in cron nodes

https://gerrit.wikimedia.org/r/479737

Change 479737 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: webservicemonitor is now in cron nodes

https://gerrit.wikimedia.org/r/479737

The tools-manifest package is missing:

aborrero@toolsbeta-sgecron-01:~$ sudo puppet agent -t -v
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for toolsbeta-sgecron-01.toolsbeta.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1545053879'
Error: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install tools-manifest' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package tools-manifest
Error: /Stage[main]/Profile::Toolforge::Grid::Webservicemonitor/Package[tools-manifest]/ensure: change from purged to latest failed: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install tools-manifest' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package tools-manifest

Mentioned in SAL (#wikimedia-cloud) [2018-12-17T13:46:11Z] <arturo> T211977 aborrero@tools-services-01:~$ sudo aptly repo move trusty-tools stretch-toolsbeta 'tools-manifest (=0.12)'

Change 480085 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/software/tools-manifest@master] tools-manifest: bump debhelper version to 11

https://gerrit.wikimedia.org/r/480085

Change 480086 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/software/tools-manifest@master] tools-manifest: add systemd support

https://gerrit.wikimedia.org/r/480086

Change 480132 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/software/tools-manifest@master] tools-manifest: bump debhelper version to 11

https://gerrit.wikimedia.org/r/480132

Change 480134 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/software/tools-manifest@master] tools-manifest: bump debhelper version to 11

https://gerrit.wikimedia.org/r/480134

Change 480135 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/software/tools-manifest@stretch] tools-manifest: bump debhelper version to 11

https://gerrit.wikimedia.org/r/480135

Change 480135 merged by Arturo Borrero Gonzalez:
[operations/software/tools-manifest@stretch] tools-manifest: bump debhelper version to 11

https://gerrit.wikimedia.org/r/480135

Change 480141 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/software/tools-manifest@stretch] tools-manifest: add systemd support

https://gerrit.wikimedia.org/r/480141

Change 480141 merged by Arturo Borrero Gonzalez:
[operations/software/tools-manifest@stretch] tools-manifest: add systemd support

https://gerrit.wikimedia.org/r/480141

Mentioned in SAL (#wikimedia-cloud) [2018-12-17T19:02:04Z] <arturo> T211977 add package tools-manifest 0.13 to stretch-tools & stretch-toolsbeta in aptly

aborrero closed this task as Resolved. Mon, Dec 17, 7:20 PM

This seems to work. The webservicemonitor daemon now runs on the cronrunner nodes, so services nodes no longer need to be grid submit hosts.