Puppet broken on VMs in deployment-prep
Closed, Resolved · Public

Description

Due to some services moving to k8s in production, a few puppet classes have been deleted, breaking the following VMs:

deployment-mathoid.deployment-prep.eqiad.wmflabs
deployment-sca[01-02].deployment-prep.eqiad.wmflabs

With puppet broken there, those VMs will soon stop working entirely. I know that actually setting up a k8s cluster in deployment-prep is a big job, but I don't know what else to suggest.

The current name server those VMs are using will be removed in a couple of days, at which point they will probably cease to function and may become unreachable.

Related: T220235 T221183 T218609#5034042 T200832#5122766

Event Timeline

Andrew created this task. Apr 23 2019, 3:42 PM
Restricted Application added a subscriber: Aklapper. Apr 23 2019, 3:42 PM
Andrew updated the task description. Apr 23 2019, 3:46 PM
Joe added a subscriber: Joe. May 9 2019, 3:26 PM

The way to go for such things is to use role::beta::docker_services on a fresh VM.

I've already created deployment-docker-mathoid01, which should replace the old mathoid server once all the hiera vars and the mediawiki config are changed (I won't do that part).

Change 509595 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] deployment-prep: Move to working Mathoid service

https://gerrit.wikimedia.org/r/509595

Change 509596 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/mediawiki-config@master] deployment-prep: Move to working Mathoid service

https://gerrit.wikimedia.org/r/509596

Change 509596 merged by jenkins-bot:
[operations/mediawiki-config@master] deployment-prep: Move to working Mathoid service

https://gerrit.wikimedia.org/r/509596

Change 509595 merged by Alexandros Kosiaris:
[operations/puppet@production] deployment-prep: Move to working Mathoid service

https://gerrit.wikimedia.org/r/509595

Deleted deployment-mathoid.

Mentioned in SAL (#wikimedia-releng) [2019-05-15T02:35:10Z] <Krenair> Logged into deployment-sca0[12] as root and given them the correct nameservers to try to unbreak things. T221654

Removed citoid role from them, and now:
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Could not find class role::cxserver for deployment-sca01.deployment-prep.eqiad.wmflabs on node deployment-sca01.deployment-prep.eqiad.wmflabs
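
When several instances fail the same way, the missing class can be pulled out of the agent output rather than read by eye. The following is a minimal sketch, assuming the error text always has the shape quoted above; the regex and the function name `find_missing_class` are illustrative and not part of Puppet, and other Puppet versions may word the message differently.

```python
import re

# Pattern inferred from the single error message quoted in this task.
# This is an assumption, not a documented Puppet output format.
MISSING_CLASS_RE = re.compile(
    r"Could not find class (?P<cls>\S+) for (?P<node>\S+)"
)


def find_missing_class(error_text):
    """Return (class_name, node) if the text reports a missing class, else None."""
    match = MISSING_CLASS_RE.search(error_text)
    if match is None:
        return None
    return match.group("cls"), match.group("node")


error = (
    "Error: Could not retrieve catalog from remote server: "
    "Error 500 on SERVER: Server Error: Could not find class role::cxserver "
    "for deployment-sca01.deployment-prep.eqiad.wmflabs "
    "on node deployment-sca01.deployment-prep.eqiad.wmflabs"
)

print(find_missing_class(error))
```

For the message in this task, the class reported is role::cxserver, which is the next role that has to be removed or migrated before the catalog compiles.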

Krenair closed this task as Resolved. May 15 2019, 6:58 PM
Krenair claimed this task.

Cxserver migration commits are https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/510588/ and https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/510586/
Removed role::cxserver and puppet now runs on these sca instances.

Dzahn awarded a token. May 15 2019, 9:02 PM