Issue with custom domains not having their ingress created
Closed, Resolved (Public)

Description

We've had two reports of custom domains not working. Instead, people get a
`` response from their domain.

This is symptomatic of a missing ingress: the one that should have been created for the domain does not exist.

Indeed, the logs show the ingress creation request being rejected:

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "authenticationInfo": {
      "principalEmail": "system:serviceaccount:default:api-defaultrole"
    },
    "authorizationInfo": [
      {
        "granted": true,
        "permission": "io.k8s.networking.v1.ingresses.create",
        "resource": "networking.k8s.io/v1/namespaces/api-jobs/ingresses/mediawiki-site-589"
      }
    ],
    "methodName": "io.k8s.networking.v1.ingresses.create",
    "request": {
      "@type": "networking.k8s.io/v1.Ingress",
      "apiVersion": "networking.k8s.io/v1",
      "kind": "Ingress",
      "metadata": {
        "annotations": {
          "cert-manager.io/cluster-issuer": "letsencrypt-prod",
          "kubernetes.io/ingress.class": "nginx",
          "nginx.ingress.kubernetes.io/force-ssl-redirect": "true"
        },
        "creationTimestamp": null,
        "labels": {
          "app.kubernetes.io/managed-by": "wbstack-platform",
          "wbstack-ingress-generation": "2020-04-18.1",
          "wbstack-wiki-domain": "christian-nationalists.org",
          "wbstack-wiki-id": "589"
        },
        "name": "mediawiki-site-589",
        "namespace": "default"
      },
      "spec": {
        "rules": [
          {
            "host": "christian-nationalists.org",
            "http": {
              "paths": [
                {
                  "backend": {
                    "service": {
                      "name": "platform-nginx",
                      "port": {
                        "number": 8080
                      }
                    }
                  },
                  "path": "/",
                  "pathType": "Prefix"
                }
              ]
            }
          }
        ],
        "tls": [
          {
            "hosts": [
              "christian-nationalists.org"
            ],
            "secretName": "mediawiki-site-tls-589"
          }
        ]
      },
      "status": {
        "loadBalancer": {}
      }
    },
    "requestMetadata": {
      "callerIp": "192.168.0.2",
      "callerSuppliedUserAgent": "GuzzleHttp/6.5.5 curl/7.74.0 PHP/7.4.33"
    },
    "resourceName": "networking.k8s.io/v1/namespaces/api-jobs/ingresses/mediawiki-site-589",
    "response": {
      "@type": "core.k8s.io/v1.Status",
      "apiVersion": "v1",
      "code": 400,
      "kind": "Status",
      "message": "the namespace of the provided object does not match the namespace sent on the request",
      "metadata": {},
      "reason": "BadRequest",
      "status": "Failure"
    },
    "serviceName": "k8s.io",
    "status": {
      "code": 3,
      "message": "the namespace of the provided object does not match the namespace sent on the request"
    }
  },
  "insertId": "947db941-12d4-4920-a1cf-0b0fd2c4c0e5",
  "resource": {
    "type": "k8s_cluster",
    "labels": {
      "location": "europe-west3-a",
      "cluster_name": "wbaas-3",
      "project_id": "wikibase-cloud"
    }
  },
  "timestamp": "2023-08-31T18:57:49.670095Z",
  "labels": {
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"api-defaultrole\" of ClusterRole \"api-defaultrole\" to ServiceAccount \"api-defaultrole/default\"",
    "authorization.k8s.io/decision": "allow"
  },
  "logName": "projects/wikibase-cloud/logs/cloudaudit.googleapis.com%2Factivity",
  "operation": {
    "id": "947db941-12d4-4920-a1cf-0b0fd2c4c0e5",
    "producer": "k8s.io",
    "first": true,
    "last": true
  },
  "receiveTimestamp": "2023-08-31T18:57:57.371393304Z"
}

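This has probably been happening since we moved the API jobs to run in their own namespace: the request above is sent to the api-jobs namespace, while the ingress object itself declares the default namespace.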
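The mismatch can be read straight off the log entry by comparing the namespace in the request path with the one declared on the submitted object. A minimal Python sketch of that comparison follows, assuming the audit entry is available as JSON with the same field names as above. The helper name namespace_mismatch is hypothetical, and the entry is trimmed to the two fields the check reads.

```python
import json


def namespace_mismatch(entry):
    """Return (url_namespace, object_namespace) if they differ, else None."""
    payload = entry["protoPayload"]
    # resourceName looks like "<group>/<version>/namespaces/<ns>/ingresses/<name>"
    parts = payload["resourceName"].split("/")
    url_namespace = parts[parts.index("namespaces") + 1]
    # Namespace declared on the object that was sent in the request body
    object_namespace = payload["request"]["metadata"].get("namespace")
    if url_namespace != object_namespace:
        return url_namespace, object_namespace
    return None


# Trimmed copy of the audit log entry, keeping only the fields used above.
entry = json.loads("""
{
  "protoPayload": {
    "resourceName": "networking.k8s.io/v1/namespaces/api-jobs/ingresses/mediawiki-site-589",
    "request": {
      "metadata": {"name": "mediawiki-site-589", "namespace": "default"}
    }
  }
}
""")

print(namespace_mismatch(entry))  # ('api-jobs', 'default')
```

Run over a batch of exported audit entries, a check like this could help list the wikis whose ingress creation was rejected for the same reason.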

Event Timeline

Attempting to manually resolve this for one wiki with:

kubectl exec -it deployment/api-app-backend -- sh -c "php artisan job:dispatchNow KubernetesIngressCreate 589 christian-nationalists.org"

fails with:

In Client.php line 415:

  Authentication Exception: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"ingresses.networking.k8s.io \"mediawiki-site-589\" is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"ingresses\" in API group \"networking.k8s.io\" in the namespace \"default\"","reason":"Forbidden","details":{"name":"mediawiki-site-589","group":"networking.k8s.io","kind":"ingresses"},"code":403}

It turns out this was semi-expected: the api-app-backend pod runs as a service account that is not allowed to manage ingresses, whereas the right credentials are available on the queue pod:

kubectl exec -it deployment/api-queue -- sh -c "php artisan job:dispatchNow KubernetesIngressCreate 589 christian-nationalists.org" 
[2023-09-05 11:20:10] production.INFO: Creating k8s client  
[2023-09-05 11:20:10] production.INFO: Getting ingress resources  
[2023-09-05 11:20:10] production.INFO: Checking if resource exists: mediawiki-site-589  
[2023-09-05 11:20:10] production.INFO: Creating ingress resource

I've been back through the logs for these errors and manually resolved the issue for the affected people, going back 12 weeks.

We still need to fix the root cause, which is likely that the k8s client is submitting the ingress to the wrong namespace. This probably requires adding something like $kubernetesClient->setNamespace('default'); to app/Jobs/KubernetesIngressCreate.php, since the client probably retains a leftover namespace setting from https://github.com/wbstack/api/blob/4e545fd98cc6bba07d193218a7d1545ba15e0b7d/app/Jobs/ProcessMediaWikiJobsJob.php#L52
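The suspected failure mode and the proposed fix can be illustrated with a small sketch. This is Python rather than the project's PHP, and FakeK8sClient, set_namespace and create_ingress_job are hypothetical stand-ins, not the real client API. It assumes the client keeps whatever namespace it was last given and that the server rejects an object whose metadata.namespace differs from the request namespace, as in the audit log above.

```python
class FakeK8sClient:
    """Hypothetical stand-in for a client that remembers its namespace."""

    def __init__(self, namespace="default"):
        self.namespace = namespace

    def set_namespace(self, namespace):
        self.namespace = namespace

    def create_ingress(self, manifest):
        # Mimics the server-side check seen in the audit log: the namespace
        # in the request path must match metadata.namespace on the object.
        if manifest["metadata"]["namespace"] != self.namespace:
            raise ValueError(
                "the namespace of the provided object does not match "
                "the namespace sent on the request"
            )
        return f"{self.namespace}/{manifest['metadata']['name']}"


def create_ingress_job(client, wiki_id, target_namespace="default"):
    # The suggested fix: set the namespace explicitly rather than relying on
    # whatever the client was last configured with.
    client.set_namespace(target_namespace)
    manifest = {
        "metadata": {
            "name": f"mediawiki-site-{wiki_id}",
            "namespace": target_namespace,
        }
    }
    return client.create_ingress(manifest)


client = FakeK8sClient()
client.set_namespace("api-jobs")  # leftover setting from an earlier job
print(create_ingress_job(client, 589))  # default/mediawiki-site-589
```

Without the set_namespace call inside the job, the same client would send the request to api-jobs and hit the 400 response shown in the log.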

Evelien_WMDE claimed this task.