Service implementation for wdqs101[4,5,6]
Closed, Resolved (Public)

Description

Creating this ticket to track the work required to bring wdqs hosts wdqs101[4,5,6] into service.

Event Timeline

Change 821785 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: bring more hosts online

https://gerrit.wikimedia.org/r/821785

Change 821785 merged by Bking:

[operations/puppet@production] wdqs: bring more hosts online

https://gerrit.wikimedia.org/r/821785

Mentioned in SAL (#wikimedia-operations) [2022-08-09T19:55:27Z] <bking@cumin1001> START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: T314890

Mentioned in SAL (#wikimedia-operations) [2022-08-09T19:55:51Z] <bking@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: T314890

Mentioned in SAL (#wikimedia-operations) [2022-08-09T19:56:26Z] <bking@cumin1001> START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: T314890

Mentioned in SAL (#wikimedia-operations) [2022-08-09T19:56:40Z] <bking@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: T314890

Mentioned in SAL (#wikimedia-operations) [2022-08-09T19:57:06Z] <bking@cumin1001> START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: T314890

Mentioned in SAL (#wikimedia-operations) [2022-08-09T19:57:12Z] <bking@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: T314890

On new hosts only, the data transfer cookbook fails at the repooling step. I believe this is because the newly provisioned server is not yet enabled in the load balancer pool.

To enable it, run the following command from cumin:

confctl select name=wdqs1014.eqiad.wmnet set/weight=10:pooled=yes
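Since this ticket covers three hosts, the same step can be sketched as a loop. This is only a dry run under assumptions: it prints the confctl command for each host rather than executing it, it reuses the exact syntax and weight from the command above, and it assumes wdqs1015 and wdqs1016 need the same weight as wdqs1014. The `pool_cmd` helper is hypothetical and exists only for this illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the confctl command for each new host instead of running it.
set -euo pipefail

hosts=(wdqs1014 wdqs1015 wdqs1016)   # hosts named in this ticket
weight=10                            # assumed: same weight as the command above

# Hypothetical helper: build the confctl invocation for one short hostname.
pool_cmd() {
  printf 'confctl select name=%s.eqiad.wmnet set/weight=%s:pooled=yes\n' "$1" "$weight"
}

for h in "${hosts[@]}"; do
  pool_cmd "$h"
done
```

Review the printed commands, then run them one at a time from cumin once each host's data transfer has finished.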

Mentioned in SAL (#wikimedia-operations) [2022-08-22T13:37:47Z] <bking@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on wdqs[1014-1016].eqiad.wmnet with reason: T314890

Mentioned in SAL (#wikimedia-operations) [2022-08-22T13:38:02Z] <bking@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wdqs[1014-1016].eqiad.wmnet with reason: T314890

This should be finished, closing...