Phase out Nodepool from production
Closed, Resolved · Public

Assigned To
Authored By
hashar
Nov 13 2018, 2:22 PM
Tokens
"Love" token, awarded by Jdforrester-WMF. "Barnstar" token, awarded by greg. "Yellow Medal" token, awarded by thcipriani. "Love" token, awarded by chasemp.

Description

All CI jobs have been migrated off Nodepool (T190097). This task is to decommission Nodepool from the infrastructure.

WMCS

  • delete leftover instances
  • delete images and snapshots
  • purge custom disk images from WMCS infrastructure
  • delete contintcloud WMCS project - T209644
  • disable nodepoolmanager user from LDAP - T217064
  • clean up router/firewall rules between production and wmcs - T215173

Prod / puppet

See also server decom T209642

  • archive operations/debs/nodepool
  • remove nodepool package from apt.wikimedia.org
      • Kept around per operations
  • clean up puppet
  • drop sudo rules
  • clean up passwords/tokens from private repository - done via T212230
  • remove firewall rules on contint1001/contint2001 (ferm should clean them up though)
  • drop database nodepooldb and user 'nodepool'@'10.64.16.155' on MySQL database (production-m5) - T212230
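For reference, the database cleanup in the last item comes down to two SQL statements. The following is only a minimal sketch of how they could be generated, assuming a MySQL/MariaDB version that supports `IF EXISTS` on both statements. The helper name `drop_statements` is hypothetical, and the commands actually run under T212230 are not recorded in this task.

```python
from typing import List


def drop_statements(database: str, user: str, host: str) -> List[str]:
    """Build the SQL that removes one database and one account (hypothetical helper)."""

    def ident(name: str) -> str:
        # Quote an identifier with backticks, doubling any embedded backtick.
        return "`" + name.replace("`", "``") + "`"

    def literal(value: str) -> str:
        # Quote a string literal, escaping backslashes and single quotes.
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

    return [
        f"DROP DATABASE IF EXISTS {ident(database)};",
        f"DROP USER IF EXISTS {literal(user)}@{literal(host)};",
    ]


# Values taken from the checklist item above.
statements = drop_statements("nodepooldb", "nodepool", "10.64.16.155")
for statement in statements:
    print(statement)
```

The sketch only prints the statements; they would still need to be reviewed and run by someone with access to the production-m5 database.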

Misc

  • Delete Grafana boards
  • Purge Graphite metrics - T215172
  • Archive wiki documentation

Jenkins

Event Timeline

Restricted Application added a subscriber: Aklapper.

I have migrated the last jobs still using Nodepool. Ready to phase out Nodepool :-]

Change 473202 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Remove Nodepool / diskimage-builder material

https://gerrit.wikimedia.org/r/473202

Change 473202 merged by jenkins-bot:
[integration/config@master] Remove Nodepool / diskimage-builder material

https://gerrit.wikimedia.org/r/473202

Change 473824 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Clean Zuul config

https://gerrit.wikimedia.org/r/473824

Change 473827 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Clean JJB config

https://gerrit.wikimedia.org/r/473827

Change 473824 merged by jenkins-bot:
[integration/config@master] Clean Zuul config

https://gerrit.wikimedia.org/r/473824

Change 473827 merged by jenkins-bot:
[integration/config@master] Clean JJB config

https://gerrit.wikimedia.org/r/473827

Change 473834 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] nodepool: labtestservices2003 is not used for testing

https://gerrit.wikimedia.org/r/473834

Change 473846 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] ci: stop monitoring zmq on Jenkins

https://gerrit.wikimedia.org/r/473846

Mentioned in SAL (#wikimedia-operations) [2018-11-15T21:05:38Z] <hashar> Stopped nodepool on labnodepool1001.eqiad.wmnet . Service is no more used. T209361 T209642

Change 473834 merged by Andrew Bogott:
[operations/puppet@production] nodepool: labtestservices2003 is not used for testing

https://gerrit.wikimedia.org/r/473834

Mentioned in SAL (#wikimedia-releng) [2018-11-23T11:09:22Z] <hashar> Jenkins: removing plugins "Single Use Slave" and "Event Publisher (via ZMQ PUB SUB)". Were used for Nodepool | T209361

greg triaged this task as Medium priority. Nov 27 2018, 9:13 PM

Change 473846 merged by Andrew Bogott:
[operations/puppet@production] ci: stop monitoring zmq on Jenkins

https://gerrit.wikimedia.org/r/473846

hashar updated the task description.

Change 480546 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] admin: remove CI sudo rule for "nodepool"

https://gerrit.wikimedia.org/r/480546

Change 480659 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] nodepool is gone, no need to assign a cluster

https://gerrit.wikimedia.org/r/480659

Change 480659 merged by Andrew Bogott:
[operations/puppet@production] nodepool is gone, no need to assign a cluster

https://gerrit.wikimedia.org/r/480659

Change 480663 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] cumin: remove nodepool from misc-releng group

https://gerrit.wikimedia.org/r/480663

Change 480663 merged by CRusnov:
[operations/puppet@production] cumin: remove nodepool from misc-releng group

https://gerrit.wikimedia.org/r/480663

Change 480546 merged by Muehlenhoff:
[operations/puppet@production] admin: remove CI sudo rule for "nodepool"

https://gerrit.wikimedia.org/r/480546

Change 481201 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] contint: remove unused classes

https://gerrit.wikimedia.org/r/481201

Change 481201 merged by Dzahn:
[operations/puppet@production] contint: remove unused classes

https://gerrit.wikimedia.org/r/481201

Mentioned in SAL (#wikimedia-releng) [2019-02-04T15:25:43Z] <hashar> removed Jenkins user "nodepoolmanager" as well as related authorizations | T209361

I have filed subtasks for the other teams to act on :)

hashar changed the task status from Open to Stalled. Feb 4 2019, 3:35 PM

Change 488019 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] jenkins: stop purging nodepool agents config history

https://gerrit.wikimedia.org/r/488019

Change 488019 merged by Dzahn:
[operations/puppet@production] jenkins: stop purging nodepool agents config history

https://gerrit.wikimedia.org/r/488019

Mentioned in SAL (#wikimedia-operations) [2019-02-05T18:17:39Z] <mutante> contint1001/contint2001 -manually deleting crontab lines unpuppetized in gerrit:488019 (T209361)

I caught up with some more cleanup. Last thing to do is to disable the nodepoolmanager user in LDAP. I am not quite sure how to do that though.

Filed as subtask T217064

hashar updated the task description.

All cleanup tasks have been completed.

The only remaining step is to dispose of the hardware, which is tracked in T209642.