Unable to deploy wikifunctions services in production: Pool wf-codfw has no failover servers list, route /local/wf
Closed, Resolved, Public

Description

Presumably fall-out from 3eb2065a9ed82bd41450143c6f5c3f4a2fa8977f for T391986?

jforrester@deploy1003:/srv/deployment-charts/helmfile.d/services/wikifunctions$ helmfile -e codfw -i apply --context 5
skipping missing values file matching "/etc/helmfile-defaults/private/main_services/wikifunctions/codfw.yaml"
skipping missing values file matching "values-codfw.yaml"
skipping missing values file matching "/etc/helmfile-defaults/private/main_services/wikifunctions/codfw.yaml"
skipping missing values file matching "values-codfw.yaml"
skipping missing values file matching "values-python-evaluator-codfw.yaml"
skipping missing values file matching "/etc/helmfile-defaults/private/main_services/wikifunctions/codfw.yaml"
skipping missing values file matching "values-codfw.yaml"
skipping missing values file matching "values-javascript-evaluator-codfw.yaml"
Comparing release=main-orchestrator, chart=wmf-stable/function-orchestrator, namespace=wikifunctions
Comparing release=javascript-evaluator, chart=wmf-stable/function-evaluator, namespace=wikifunctions
wikifunctions, function-evaluator-javascript-evaluator, Deployment (apps) has changed:
...
      spec:
        automountServiceAccountToken: false
        containers:
          # The main application container
          - name: function-evaluator-javascript-evaluator
-           image: "docker-registry.discovery.wmnet/repos/abstract-wiki/wikifunctions/function-evaluator/wasm-javascript-all:2025-05-21-192515"
+           image: "docker-registry.discovery.wmnet/repos/abstract-wiki/wikifunctions/function-evaluator/wasm-javascript-all:2025-06-03-205630"
            imagePullPolicy: IfNotPresent
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                 drop:
...

Comparing release=python-evaluator, chart=wmf-stable/function-evaluator, namespace=wikifunctions
wikifunctions, function-evaluator-python-evaluator, Deployment (apps) has changed:
...
      spec:
        automountServiceAccountToken: false
        containers:
          # The main application container
          - name: function-evaluator-python-evaluator
-           image: "docker-registry.discovery.wmnet/repos/abstract-wiki/wikifunctions/function-evaluator/wasm-python3-all:2025-05-21-192515"
+           image: "docker-registry.discovery.wmnet/repos/abstract-wiki/wikifunctions/function-evaluator/wasm-python3-all:2025-06-03-205630"
            imagePullPolicy: IfNotPresent
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                 drop:
...

in ./helmfile.yaml: command "/usr/bin/helm3.11" exited with non-zero status:

PATH:
  /usr/bin/helm3.11

ARGS:
  0: helm3.11 (8 bytes)
  1: diff (4 bytes)
  2: upgrade (7 bytes)
  3: --allow-unreleased (18 bytes)
  4: main-orchestrator (17 bytes)
  5: wmf-stable/function-orchestrator (32 bytes)
  6: --namespace (11 bytes)
  7: wikifunctions (13 bytes)
  8: --values (8 bytes)
  9: /tmp/helmfile3740832668/wikifunctions-main-orchestrator-values-568cf796bf (73 bytes)
  10: --values (8 bytes)
  11: /tmp/helmfile1746087393/wikifunctions-main-orchestrator-values-554cfbbbcb (73 bytes)
  12: --values (8 bytes)
  13: /tmp/helmfile747802997/wikifunctions-main-orchestrator-values-b48884c4c (71 bytes)
  14: --values (8 bytes)
  15: /tmp/helmfile756501242/wikifunctions-main-orchestrator-values-5d86847d9b (72 bytes)
  16: --values (8 bytes)
  17: /tmp/helmfile1360282044/wikifunctions-main-orchestrator-values-7d7f47867c (73 bytes)
  18: --detailed-exitcode (19 bytes)
  19: --color (7 bytes)
  20: --context (9 bytes)
  21: 5 (1 bytes)
  22: --reset-values (14 bytes)
  23: --kubeconfig (12 bytes)
  24: /etc/kubernetes/wikifunctions-deploy-codfw.config (49 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/kubernetes/wikifunctions-deploy-codfw.config
  Error: Failed to render chart: exit status 1: WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/kubernetes/wikifunctions-deploy-codfw.config
  Error: execution error at (function-orchestrator/templates/configmap.yaml:2:3): Pool wf-codfw has no failover servers list, route /local/wf
  Use --debug flag to render out invalid YAML
  Error: plugin "diff" exited with error

Event Timeline

Joe triaged this task as Unbreak Now! priority.

Change #1153638 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] wikifunctions: disable mcrouter failover

https://gerrit.wikimedia.org/r/1153638

Change #1153638 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: disable mcrouter failover

https://gerrit.wikimedia.org/r/1153638
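
For context, the render failure above looks like a guard in the chart template: a route that expects failover is rejected when its pool has no failover server list. Below is a minimal sketch of that kind of check, written in Python rather than Helm templating. The `Pool`, `Route`, and `validate_route` names and the `failover_enabled` flag are hypothetical illustrations, not the chart's actual values or logic; only the error text is taken from the output above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical stand-in for an mcrouter pool definition."""
    name: str
    servers: list[str]
    failover: list[str] = field(default_factory=list)


@dataclass
class Route:
    """Hypothetical stand-in for a route bound to one pool."""
    path: str
    pool: Pool
    failover_enabled: bool = True


def validate_route(route: Route) -> None:
    """Raise if failover is requested but the pool lists no failover servers."""
    if route.failover_enabled and not route.pool.failover:
        raise ValueError(
            f"Pool {route.pool.name} has no failover servers list, "
            f"route {route.path}"
        )


if __name__ == "__main__":
    pool = Pool(name="wf-codfw", servers=["cache.example:11211"])

    # Failover expected but no failover servers defined: validation fails.
    try:
        validate_route(Route(path="/local/wf", pool=pool))
    except ValueError as err:
        print(f"render would fail: {err}")

    # With failover disabled for the route, the same pool passes.
    validate_route(Route(path="/local/wf", pool=pool, failover_enabled=False))
    print("render would succeed with failover disabled")
```

Under these assumptions there are two ways out: supply a failover server list for the pool, or switch failover off for the service. The merged change took the second route.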

The code is deployable again. helmfile diff now returns a proper diff.

Thank you!

The logs show the following:

E20250606 14:46:30.010773     1 Server-inl.h:593] mcrouter error (router name '11213', flavor 'unknown', service 'mcrouter'): Failed to configure, initial error 'Failed to reconfigure: Unknown RouteHandle: KeyModifyRoute line: 81', from backup 'Failed to reconfigure: Unknown RouteHandle: KeyModifyRoute line: 81'

Presumably aligned with https://sal.toolforge.org/log/drexRZcBvg159pQr5-r8 ? I filed T396074 but perhaps they're not distinct issues?

Indeed, I commented in the wrong task. My mistake, thanks.