
Puppet broken on mediawiki instances in deployment-prep
Closed, Resolved (Public)

Description

Puppet on mediawiki instances in deployment-prep (beta) has been failing since August 3. A Puppet run produces this message:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Operator '[]' is not applicable to an Undef Value. (file: /etc/puppet/modules/profile/manifests/mediawiki/mcrouter_wancache.pp, line: 69, column: 30) on node deployment-mediawiki12.deployment-prep.eqiad1.wikimedia.cloud

When looking at the specified file and line number, I see

$wikifunctions_servers = $servers_by_datacenter_category['wikifunctions'][$::site]

which was introduced in I482290db26c029b1a9494d7783a154d69fd40f82 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/944248) by @Joe on Aug 1, so I'm guessing that's the cause. Hoping he can sort this out.
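For illustration, here is a minimal sketch of why that line fails and one way a lookup could tolerate the missing key. The hash contents, the `example_category` key, the local `$site` variable and the `$found` helper are assumptions made up for this example; they are not taken from the real manifest or Hiera data, and this is not necessarily how the module should be fixed.

```puppet
# Hypothetical stand-in for the shard data as seen on beta:
# note that there is no 'wikifunctions' key.
$servers_by_datacenter_category = {
  'example_category' => {
    'eqiad' => {
      'shard02' => { 'host' => 'deployment-memc09' },
    },
  },
}

# Local stand-in for the global $::site used in the real manifest.
$site = 'eqiad'

# The chained lookup fails because the first [] yields undef and the
# second [] is then applied to that undef value:
#   $wikifunctions_servers = $servers_by_datacenter_category['wikifunctions'][$site]

# dig() returns undef instead of raising when an intermediate key is missing.
$found = $servers_by_datacenter_category.dig('wikifunctions', $site)

# Fall back to an empty hash when nothing is defined for this site.
$wikifunctions_servers = $found ? {
  undef   => {},
  default => $found,
}

notice($wikifunctions_servers)
```

Whether a silent fallback is preferable to a hard failure is a design choice: failing loudly exposes missing Hiera data early, while a default keeps environments such as beta compiling.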

Event Timeline

ArielGlenn created this task.

Note that since dumps snapshot instances are sorta-kinda mediawiki instances, this affects them too.

This is caused by the wikifunctions definition missing from profile::mediawiki::mcrouter_wancache::shards in hieradata/cloud/eqiad1/deployment-prep/common.yaml. Adding an empty declaration fixes the Puppet symptom, but I'm 95% sure Wikifunctions should have some memcached routes in beta, too.

I placed the following locally on beta to unbreak Puppet (for an unrelated issue):

root@deployment-puppetmaster04:/var/lib/git/operations/puppet(production u+8)# git show HEAD
commit 1d88723fb64936b36d81373781d7e8d7072b486a (HEAD -> production)
Author: Martin Urbanec <murbanec@wikimedia.org>
Date:   Wed Aug 23 15:15:32 2023 +0000

    [BETA HACK] unbreak puppet

diff --git a/hieradata/cloud/eqiad1/deployment-prep/common.yaml b/hieradata/cloud/eqiad1/deployment-prep/common.yaml
index dd1eefbaace..4f8899c9010 100644
--- a/hieradata/cloud/eqiad1/deployment-prep/common.yaml
+++ b/hieradata/cloud/eqiad1/deployment-prep/common.yaml
@@ -556,6 +556,8 @@ profile::mediawiki::mcrouter_wancache::shards:
         host: deployment-memc08
       shard02:
         host: deployment-memc09
+  wikifunctions:
+     eqiad: {}
 
 profile::query_service::federation_user_agent: 'Wikimedia Commons Query Service; test'
 profile::prometheus::memcached_exporter::arguments: ''
root@deployment-puppetmaster04:/var/lib/git/operations/puppet(production u+8)#

CC @Jdforrester-WMF for an actual fix, as I expect they'd know which memcached instance would be suitable for beta. I tried to take inspiration from the production config, which uses separate memcached servers. I guess that means we would need a deployment-memc11 on beta to do that?

It's fine for this to be blank and fall back to the main memcached servers. No need to provision a new set in Beta Cluster.

Change 952000 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/puppet@production] mediawiki::mcrouter_wancache: add wikifunctions entry

https://gerrit.wikimedia.org/r/952000

Change 952000 merged by Effie Mouzeli:

[operations/puppet@production] mediawiki::mcrouter_wancache: add wikifunctions entry

https://gerrit.wikimedia.org/r/952000

jijiki changed the task status from Open to In Progress. Aug 29 2023, 2:41 PM
jijiki added subscribers: Urbanecm, jijiki.

@ArielGlenn please mark it as resolved if everything is as expected. (thanks @Urbanecm!)

Thanks for merging the patch in operations/puppet! I've removed the beta-specific patch and verified that Puppet still runs correctly.

ArielGlenn assigned this task to Urbanecm_WMF.

Thanks for the fix(es), everything is working as expected now.