Research which services run on the tools-services nodes and attempt to get them running on Stretch in the toolsbeta cluster.
Puppetize all loose ends.
Task from WMCS 2018 offsite meetings.
Status | Assigned | Task
---|---|---
Resolved | GTirloni | T207591 tools-services: Migrate to Stretch
Resolved | GTirloni | T208221 tools-service: Build missing packages on Stretch
Resolved | GTirloni | T208357 toolforge - Deprecate BigBrother in Grid Engine
Resolved | bd808 | T211684 Toolforge: Port sge.py stats to Prometheus
Resolved | Bstorm | T215845 Add monitoring for disabled grid nodes to the prometheus collector
Open | None | T213567 Toolforge: refresh grafana dashboard
Resolved | aborrero | T211977 toolforge: webservicemonitor for stretch/sge
From https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Services:
These are services that run off service manifests for each tool; currently this is just the webservicemonitor service. They run in warm standby and require manual switchover. tools-services-01 and tools-services-02 both run exactly the same code, but only one of them is 'active' at a time, as determined by the Puppet role parameter role::labs::tools::services::active_host. To switch, set that parameter via [[1]] to the FQDN of the host that should be active and run Puppet on all the services hosts. This starts the services on the active host and stops them on the others. Since the services should not hold any internal state, they can run from any host, so there is no need to switch back afterwards. Bigbrother also runs on this host, via Upstart; its log file is /var/log/upstart/bigbrother.log.
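The switchover rule quoted above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual Puppet code: it assumes each services host simply compares its own FQDN with the value of `active_host` and derives whether its services should be running. The function names `service_state` and `plan` are invented for illustration, and the FQDN used for tools-services-02 is an assumption that it follows the same pattern as tools-services-01.

```python
"""Hypothetical sketch of the active/standby rule for tools-services hosts.

Not the real Puppet implementation: it only models the idea that every
host runs the same code and checks whether it is the configured active host.
"""


def service_state(host_fqdn: str, active_host: str) -> str:
    """Return the desired state of the services on host_fqdn.

    Only the host whose FQDN equals active_host runs the services;
    every other host keeps them stopped (warm standby).
    """
    return "running" if host_fqdn == active_host else "stopped"


def plan(hosts: list[str], active_host: str) -> dict[str, str]:
    """Desired service state for each host once Puppet has run on all of them."""
    return {host: service_state(host, active_host) for host in hosts}


if __name__ == "__main__":
    hosts = [
        "tools-services-01.tools.eqiad.wmflabs",
        # Assumed FQDN, following the same pattern as tools-services-01.
        "tools-services-02.tools.eqiad.wmflabs",
    ]
    # A switchover amounts to changing active_host and re-running Puppet everywhere.
    for active in hosts:
        print(active, plan(hosts, active))
```

Because the services are stateless, the only thing a switchover changes in this model is which single host reports "running"; nothing has to be migrated between hosts.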
Relevant files in Puppet:
Part of the Puppet configuration comes from OpenStack through the use of a prefix for tools-services hosts:
Role: role::toollabs::services
Parameters:
- active_host: 'tools-services-01.tools.eqiad.wmflabs'
@aborrero I've created toolsbeta-services-01 to serve as a testbed for trying this on Stretch.
Some points as discussed on IRC:
Possible next steps:
Change 469614 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: bootstrap service node puppet code
Change 469614 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: refactor/bootstrap service node puppet code
Change 470386 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] tools-services: Fix typo in updatetools service exec path
Change 470386 merged by GTirloni:
[operations/puppet@production] tools-services: Fix typo in updatetools service exec path
Change 470397 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: add missing grid base profile to services role
Change 470397 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: add missing grid base profile to services role
Removing bigbrother from the next iteration of services would make sense. I might check whether it is possible to put it on a bastion for now, but an alternative is simply to tell these users that we are dropping it: P7717
There are a couple of big names in there.
Change 470683 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] tools-services: Add updatetools_enabled key
Change 470683 merged by GTirloni:
[operations/puppet@production] tools-services: Add updatetools_enabled key
This is all done, specifically in T213421: Toolforge: move services nodes from eqiad to eqiad1.