Update toolhub helm chart to use the mcrouter helm chart module
Open, High, Public

Description

toolhub is tightly coupled to how the mediawiki chart configures mcrouter in k8s. Since the mediawiki chart moved to the cache.mcrouter helm chart module, toolhub needs to follow:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/878004
https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/870903
https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/874908

This blocks T327664: Update staging-eqiad to k8s 1.23

Error: Failed to render chart: exit status 1
Error: template: toolhub/templates/deployment.yaml:24:28: executing "toolhub/templates/deployment.yaml" at <include (print $.Template.BasePath "/configmap.yaml") .>: error calling include: template: toolhub/templates/configmap.yaml:17:3: executing "toolhub/templates/configmap.yaml" at <include "mcrouter.config" .>: error calling include: template: toolhub/templates/_mcrouter_helpers.tpl:256:3: executing "mcrouter.config" at <include "mcrouter.config_template" .>: error calling include: template: toolhub/templates/_mcrouter_helpers.tpl:37:21: executing "mcrouter.config_template" at <.Values.mw.mcrouter.pools>: nil pointer evaluating interface {}.mcrouter

This was not caught by CI because the required data structure (from /etc/helmfile-defaults/mediawiki/mcrouter_pools.yaml) is not available in CI. toolhub provides its own fixture instead, and that fixture still had the old layout, so the template rendered fine there.

Event Timeline

JMeybohm created this task.

In the short term I'm going to fix this by changing only the references to the data structure, but ultimately toolhub should migrate to the cache.mcrouter module to prevent this from happening again.
Also, @Joe: do we need a way for modules to provide their own fixtures, so that charts and helmfile setups don't have to?

Change 883177 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Fix reference to mcrouter pools

https://gerrit.wikimedia.org/r/883177

JMeybohm renamed this task from "toolhub is undeployable since introduction of the mcrouter helm chart module" to "Update toolhub helm chart to use the mcrouter helm chart module". Jan 24 2023, 3:40 PM
JMeybohm updated the task description.

Change 883177 merged by jenkins-bot:

[operations/deployment-charts@master] Fix reference to mcrouter pools

https://gerrit.wikimedia.org/r/883177

My ragged patch creates a diff that removes the proxy pools from the mcrouter config.json (I guess because they are no longer present in the backing data structure). toolhub can be deployed without them (I did that in staging-codfw), but I'm not 100% sure whether they are required, so I have not yet deployed to staging-eqiad or anywhere in production. @bd808, could you please take a look?

The relevant section of the helmfile -e staging -i diff output is:

toolhub, toolhub-main-mcrouter-config, ConfigMap (v1) has changed:
  # Source: toolhub/templates/configmap.yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: toolhub-main-mcrouter-config
    labels:
      app: toolhub
-     chart: toolhub-1.1.3
+     chart: toolhub-1.1.4
      release: main
      heritage: Helm
  data:
    config.json: |-
      {
        "pools": {
          "eqiad-servers": {
            "servers": [
            "10.64.0.124:11211:ascii:plain",
            "10.64.0.125:11211:ascii:plain",
            "10.64.0.64:11211:ascii:plain",
            "10.64.0.65:11211:ascii:plain",
            "10.64.16.140:11211:ascii:plain",
            "10.64.16.21:11211:ascii:plain",
            "10.64.16.102:11211:ascii:plain",
            "10.64.16.190:11211:ascii:plain",
            "10.64.32.133:11211:ascii:plain",
            "10.64.32.148:11211:ascii:plain",
            "10.64.32.151:11211:ascii:plain",
            "10.64.32.153:11211:ascii:plain",
            "10.64.32.157:11211:ascii:plain",
            "10.64.32.158:11211:ascii:plain",
            "10.64.48.90:11211:ascii:plain",
            "10.64.48.91:11211:ascii:plain",
            "10.64.48.92:11211:ascii:plain",
            "10.64.48.93:11211:ascii:plain"
          ]
          },
          "eqiad-servers-failover": {
            "servers": [
            "10.64.0.53:11211:ascii:plain",
            "10.64.32.101:11211:ascii:plain",
            "10.64.48.32:11211:ascii:plain"
          ]
          },
          "codfw-servers": {
            "servers": [
            "10.192.0.191:11211:ascii:plain",
            "10.192.0.22:11211:ascii:plain",
            "10.192.0.148:11211:ascii:plain",
            "10.192.0.200:11211:ascii:plain",
            "10.192.16.117:11211:ascii:plain",
            "10.192.16.116:11211:ascii:plain",
            "10.192.16.118:11211:ascii:plain",
            "10.192.16.119:11211:ascii:plain",
            "10.192.16.120:11211:ascii:plain",
            "10.192.32.79:11211:ascii:plain",
            "10.192.32.80:11211:ascii:plain",
            "10.192.32.81:11211:ascii:plain",
            "10.192.32.82:11211:ascii:plain",
            "10.192.48.74:11211:ascii:plain",
            "10.192.48.176:11211:ascii:plain",
            "10.192.48.177:11211:ascii:plain",
            "10.192.48.178:11211:ascii:plain",
            "10.192.0.205:11211:ascii:plain"
          ]
          },
          "codfw-servers-failover": {
            "servers": [
            "10.192.0.156:11211:ascii:plain",
            "10.192.16.147:11211:ascii:plain",
            "10.192.48.138:11211:ascii:plain"
-         ]
-         },
-         "eqiad-proxies": {
-           "servers": [
-           "10.64.0.124:11211:ascii:plain",
-           "10.64.0.125:11211:ascii:plain",
-           "10.64.0.64:11211:ascii:plain",
-           "10.64.0.65:11211:ascii:plain",
-           "10.64.16.140:11211:ascii:plain",
-           "10.64.16.21:11211:ascii:plain",
-           "10.64.16.102:11211:ascii:plain",
-           "10.64.16.190:11211:ascii:plain",
-           "10.64.32.133:11211:ascii:plain",
-           "10.64.32.148:11211:ascii:plain",
-           "10.64.32.151:11211:ascii:plain",
-           "10.64.32.153:11211:ascii:plain",
-           "10.64.32.157:11211:ascii:plain",
-           "10.64.32.158:11211:ascii:plain",
-           "10.64.48.90:11211:ascii:plain",
-           "10.64.48.91:11211:ascii:plain",
-           "10.64.48.92:11211:ascii:plain",
-           "10.64.48.93:11211:ascii:plain"
-         ]
-         },
-         "eqiad-proxies-failover": {
-           "servers": [
-           "10.64.0.124:11211:ascii:plain",
-           "10.64.0.125:11211:ascii:plain",
-           "10.64.0.64:11211:ascii:plain",
-           "10.64.0.65:11211:ascii:plain",
-           "10.64.16.140:11211:ascii:plain",
-           "10.64.16.21:11211:ascii:plain",
-           "10.64.16.102:11211:ascii:plain",
-           "10.64.16.190:11211:ascii:plain",
-           "10.64.32.133:11211:ascii:plain",
-           "10.64.32.148:11211:ascii:plain",
-           "10.64.32.151:11211:ascii:plain",
-           "10.64.32.153:11211:ascii:plain",
-           "10.64.32.157:11211:ascii:plain",
-           "10.64.32.158:11211:ascii:plain",
-           "10.64.48.90:11211:ascii:plain",
-           "10.64.48.91:11211:ascii:plain",
-           "10.64.48.92:11211:ascii:plain",
-           "10.64.48.93:11211:ascii:plain"
-         ]
-         },
-         "codfw-proxies": {
-           "servers": [
-           "10.192.0.191:11211:ascii:plain",
-           "10.192.0.22:11211:ascii:plain",
-           "10.192.0.148:11211:ascii:plain",
-           "10.192.0.200:11211:ascii:plain",
-           "10.192.16.117:11211:ascii:plain",
-           "10.192.16.116:11211:ascii:plain",
-           "10.192.16.118:11211:ascii:plain",
-           "10.192.16.119:11211:ascii:plain",
-           "10.192.16.120:11211:ascii:plain",
-           "10.192.32.79:11211:ascii:plain",
-           "10.192.32.80:11211:ascii:plain",
-           "10.192.32.81:11211:ascii:plain",
-           "10.192.32.82:11211:ascii:plain",
-           "10.192.48.74:11211:ascii:plain",
-           "10.192.48.176:11211:ascii:plain",
-           "10.192.48.177:11211:ascii:plain",
-           "10.192.48.178:11211:ascii:plain",
-           "10.192.0.205:11211:ascii:plain"
-         ]
-         },
-         "codfw-proxies-failover": {
-           "servers": [
-           "10.192.0.191:11211:ascii:plain",
-           "10.192.0.22:11211:ascii:plain",
-           "10.192.0.148:11211:ascii:plain",
-           "10.192.0.200:11211:ascii:plain",
-           "10.192.16.117:11211:ascii:plain",
-           "10.192.16.116:11211:ascii:plain",
-           "10.192.16.118:11211:ascii:plain",
-           "10.192.16.119:11211:ascii:plain",
-           "10.192.16.120:11211:ascii:plain",
-           "10.192.32.79:11211:ascii:plain",
-           "10.192.32.80:11211:ascii:plain",
-           "10.192.32.81:11211:ascii:plain",
-           "10.192.32.82:11211:ascii:plain",
-           "10.192.48.74:11211:ascii:plain",
-           "10.192.48.176:11211:ascii:plain",
-           "10.192.48.177:11211:ascii:plain",
-           "10.192.48.178:11211:ascii:plain",
-           "10.192.0.205:11211:ascii:plain"
          ]
          }
        },
        "routes": [
          {
            "aliases": [
              "/eqiad/toolhub/"
            ],
            "route": {
              "failover": "PoolRoute|eqiad-servers-failover",
              "failover_errors": [
                "tko"
              ],
              "failover_exptime": 600,
              "normal": "PoolRoute|eqiad-servers",
              "type": "FailoverWithExptimeRoute"
            }
          }
        ]
      }

This config section is driven by the data in the profile::mediawiki::mcrouter_wancache::shards hiera map. rOPUP9d304b502859 ("P:mediawiki::mcrouter_wancache minor refactoring") removed the "proxies" subset of this collection across all environments. That change preceded rDEPLOYCHARTS13e2649d63c8 ("mediawiki: use the mcrouter module") by 5 days, so I would assume the generated diff is expected. @jijiki, as the author of the change that removed the proxy listings from hiera, can likely confirm this. In any case, the helmfile.d config for Toolhub only uses the "eqiad-servers" and "codfw-servers" pools so I am not worried about errors caused by this diff arising from rolling the toolhub-1.1.4 chart out in other clusters.

> [...]
> In any case, the helmfile.d config for Toolhub only uses the "eqiad-servers" and "codfw-servers" pools so I am not worried about errors caused by this diff arising from rolling the toolhub-1.1.4 chart out in other clusters.

Great. Thanks for checking!

I've deployed the updated chart to all clusters.
Moving this to serviceops-radar for now, but of course feel free to ask for help with the transition to the mcrouter module.