move mw241[2-9].codfw.wmnet into production
Closed, Resolved (Public)

Description

follow-up task after T290192 is resolved

a couple new mw hosts in codfw are not fully in production yet.

see the last couple comments on the task above

roles added: https://gerrit.wikimedia.org/r/785147
conftool-data: https://gerrit.wikimedia.org/r/785918

  • mw2412
  • mw2413
  • mw2414
  • mw2415
  • mw2416
  • mw2417
  • mw2418
  • mw2419

Event Timeline

roles added: https://gerrit.wikimedia.org/r/785147

conftool-data: https://gerrit.wikimedia.org/r/785918


Mentioned in SAL (#wikimedia-operations) [2022-04-26T19:48:18Z] <mutante> mw2419 - set weight to 25 in conftool, scap pull, first time in production, jobrunner/videoscaler T290192


after https://gerrit.wikimedia.org/r/c/operations/puppet/+/785918 the conftool-data change does not appear on https://config-master.wikimedia.org/pybal/codfw/ ?

mw241[2-9] were pooled during an incident this morning (an accidental depool and repool of the codfw datacenter). I ran a scap pull on all machines to make sure they are up to date.

I've just read about the conftool-data change not being visible on config-master. However, sudo -i confctl select "name=mw241[2-9].codfw.wmnet" get returns all the new mw hosts fine.

@Dzahn do you think these hosts should be depooled again until the issue with config-master is found? Meanwhile, I'll try to find out why the new hosts are missing.
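
For reference, the bracket range in the selector above covers eight hosts. A minimal Python sketch that expands such a range into full hostnames, handy for comparing against what confctl or config-master returns. The helper name expand_host_range is made up for illustration and only handles a single one-digit range; it is not part of conftool.

```python
import re
from typing import List


def expand_host_range(pattern: str) -> List[str]:
    """Expand one single-digit bracket range, e.g. 'mw241[2-9].codfw.wmnet'.

    Hypothetical helper for illustration only; patterns without a
    '[a-b]' range are returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"(.*)\[(\d)-(\d)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    # Build one hostname per digit in the inclusive range.
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    for host in expand_host_range("mw241[2-9].codfw.wmnet"):
        print(host)
```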

@Jelto what is the config-master issue exactly?
I see the boxes here

https://config-master.wikimedia.org/pybal/codfw/api-https

and here

https://config-master.wikimedia.org/pybal/codfw/appservers-https

and here

https://config-master.wikimedia.org/pybal/codfw/jobrunner

Maybe you got confused by the stale files there that we should remove for the non-https LVSes?

Jelto claimed this task.
Jelto updated the task description.

Maybe you got confused by the stale files there that we should remove for the non-https LVSes?

exactly this ^. Thanks for the clarification.

New mw hosts are also in config-master (https LVSes). I'm closing this task, the hosts are in production now.
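
A rough Python sketch of how the "are the new hosts in the https pool files" check could be scripted. It assumes each pool file lists one Python-style dict literal per line with 'host', 'weight' and 'enabled' keys; that format, the sample text and the function names are assumptions for illustration, not taken from this task. The text would have to be fetched separately from one of the config-master URLs mentioned above.

```python
import ast
from typing import Dict, Iterable, List

# Hypothetical excerpt in the assumed one-dict-per-line format.
SAMPLE_POOL = """\
{ 'host': 'mw2412.codfw.wmnet', 'weight':25, 'enabled': False }
{ 'host': 'mw2413.codfw.wmnet', 'weight':25, 'enabled': True }
"""


def parse_pool(text: str) -> Dict[str, dict]:
    """Map hostname -> entry for every non-empty, non-comment line."""
    entries: Dict[str, dict] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entry = ast.literal_eval(line)  # safe parsing of a literal dict
        entries[entry["host"]] = entry
    return entries


def pool_report(text: str, expected: Iterable[str]) -> Dict[str, List[str]]:
    """List expected hosts that are absent from the file or not enabled."""
    entries = parse_pool(text)
    expected = list(expected)
    missing = [h for h in expected if h not in entries]
    disabled = [
        h for h in expected
        if h in entries and not entries[h].get("enabled", False)
    ]
    return {"missing": missing, "disabled": disabled}


if __name__ == "__main__":
    wanted = ["mw2412.codfw.wmnet", "mw2413.codfw.wmnet", "mw2414.codfw.wmnet"]
    print(pool_report(SAMPLE_POOL, wanted))
```

A host listed under "disabled" is present in the file but not pooled, which is a different situation from a host that is missing entirely.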

@Jelto mw2412 is not pooled. expected?

Maybe you got confused by the stale files there that we should remove for the non-https LVSes?

I _thought_ I had checked the https versions as well when I wrote that. But maybe not.

After yesterday's incident, mw2412 got depooled again to restore the state before the incident (see SAL). I'm going to adjust this and pool mw2412 again. This host is ready for production, like the other hosts of mw241[2-9].