Move datahub to dse-k8s cluster
Closed, Resolved (Public)

Description

[WIP]
Datahub is currently hosted on the WikiKube cluster, and we would like to move it to the dse-k8s cluster. This task tracks all work related to the move.

  • Preferably ensure we are on v0.12.1 pending the move to Java 17 before further upgrades T361688 T355211
  • Provision staging and production namespaces on dse-k8s T363298
  • Add kubeadm configs for the namespaces T363832
  • Move datahub and datahub-staging helmfile deployments to dse-k8s T363300
  • Provision datahub-next subdomain T365576
  • Cleanup datahub staging/next oidc settings and links T365674
  • Point datahub and datahub-next subdomains to traffic server T365668
  • Create Internal service DNS record for datahub services pointing to dse-k8s T363299
  • Deploy datahub (next and prod) to dse-k8s-eqiad
  • Adjust Airflow DAGs to point to the new endpoint datahub-gms.svc.eqiad.wmnet T366135
  • Cleanup datahub LVS and WikiKube endpoints T366137
  • Monitor the availability of datahub deployment on dse-k8s T363301
  • Create saved views for datahub deployment logs on dse-k8s T363304
  • Delete the WikiKube datahub release. T366338
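
As an illustration of the endpoint change in the checklist above, here is a minimal sketch of how a client-side URL could be rewritten from the old WikiKube record to the new dse-k8s one. The two hostnames come from this task; the `HOST_MIGRATION` table, the `migrate_endpoint` helper, and the port and path in the usage note are hypothetical and are not taken from the actual Airflow DAGs.

```python
from urllib.parse import urlsplit, urlunsplit

# Old -> new hostnames as named in this task. The mapping table itself is an
# illustrative assumption, not a copy of any real configuration.
HOST_MIGRATION = {
    "datahub-gms.discovery.wmnet": "datahub-gms.svc.eqiad.wmnet",
}


def migrate_endpoint(url: str) -> str:
    """Return url with its host swapped for the dse-k8s host, if one is mapped.

    Scheme, port, path, query and fragment are preserved. URLs whose host is
    not in HOST_MIGRATION are returned unchanged. Userinfo (user:pass@) is not
    preserved by this sketch.
    """
    parts = urlsplit(url)
    new_host = HOST_MIGRATION.get(parts.hostname or "")
    if new_host is None:
        return url
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `migrate_endpoint("https://datahub-gms.discovery.wmnet:8080/config")` yields a URL on `datahub-gms.svc.eqiad.wmnet` with the same port and path; the port `8080` and path `/config` are placeholders.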

Event Timeline

Adding here for visibility:
After some discussion on how best to implement the move, we are first going to get datahub-next up and running on dse-k8s, then set a time to bring datahub production up there as well. This will require some downtime, roughly an hour, which will be communicated to the team beforehand.

Change #1036202 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] Add dse range to an-test coordinator

https://gerrit.wikimedia.org/r/1036202

Change #1036202 merged by Stevemunene:

[operations/puppet@production] Add dse range to an-test coordinator

https://gerrit.wikimedia.org/r/1036202

Change #1036237 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/deployment-charts@master] Enable mesh for datahub-next

https://gerrit.wikimedia.org/r/1036237

Change #1036237 merged by jenkins-bot:

[operations/deployment-charts@master] Enable mesh for datahub-next

https://gerrit.wikimedia.org/r/1036237

Change #1036263 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] datahub-next: make sure subcharts get the environment default values

https://gerrit.wikimedia.org/r/1036263

Change #1036263 merged by Brouberol:

[operations/deployment-charts@master] datahub-next: make sure subcharts get the environment default values

https://gerrit.wikimedia.org/r/1036263

Change #1036266 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] datahub-next: fix the ingress by restoring default gateway host

https://gerrit.wikimedia.org/r/1036266

Change #1036266 abandoned by Brouberol:

[operations/deployment-charts@master] datahub-next: fix the ingress by restoring default gateway host

Reason:

Looking at the CI job output, this is not what we want after all

https://gerrit.wikimedia.org/r/1036266

Change #1036346 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/deployment-charts@master] Configure datahub-gms-next not to wait for upgrade before starting

https://gerrit.wikimedia.org/r/1036346

Change #1036346 merged by jenkins-bot:

[operations/deployment-charts@master] Configure datahub-gms-next not to wait for upgrade before starting

https://gerrit.wikimedia.org/r/1036346

Change #1036557 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/deployment-charts@master] admin_ng: create datahub-next namespace tlsHostnames and extra SANs

https://gerrit.wikimedia.org/r/1036557

Change #1036557 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: create datahub-next namespace tlsHostnames and extra SANs

https://gerrit.wikimedia.org/r/1036557

Change #1036994 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] Remove datahub from LVS

https://gerrit.wikimedia.org/r/1036994

Hello, as DPE-SRE we have been working on moving one of our services, datahub, to the dse-k8s cluster. Datahub is currently hosted on WikiKube and accessed via LVS.
For some of the final steps towards making datahub-wikimedia.org available, we need to provision the datahub service records and bring up the new endpoints (datahub-frontend.svc.eqiad.wmnet) while simultaneously removing the WikiKube records datahub-frontend.discovery.wmnet and datahub-gms.discovery.wmnet. This might cause some warnings/alerts; however, we do have page: false set.
The next step would be to remove datahub from LVS, which would require some input from wikimedia-traffic.
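
A small sketch of how the DNS side of this cutover could be sanity-checked: the new svc records should resolve and the old discovery records should stop resolving. The hostnames are the ones named in this task; the `resolves` and `check_cutover` helpers are hypothetical and not part of any existing Wikimedia tooling. The resolver is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Dict, Iterable

# Hostnames as named in this task.
NEW_HOSTS = ["datahub-frontend.svc.eqiad.wmnet", "datahub-gms.svc.eqiad.wmnet"]
OLD_HOSTS = ["datahub-frontend.discovery.wmnet", "datahub-gms.discovery.wmnet"]


def resolves(hostname: str) -> bool:
    """Return True if the system resolver can resolve hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def check_cutover(
    new_hosts: Iterable[str],
    old_hosts: Iterable[str],
    resolver: Callable[[str], bool] = resolves,
) -> Dict[str, str]:
    """Report, per hostname, whether its DNS state matches the post-cutover goal."""
    report: Dict[str, str] = {}
    for host in new_hosts:
        report[host] = "ok" if resolver(host) else "missing"
    for host in old_hosts:
        report[host] = "still present" if resolver(host) else "removed"
    return report
```

Calling `check_cutover(NEW_HOSTS, OLD_HOSTS)` from a host inside the production network would use real DNS; passing a stub `resolver` makes it usable in a unit test.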

Mentioned in SAL (#wikimedia-analytics) [2024-05-29T14:04:07Z] <stevemunene> getting started on moving datahub to dse-k8s T361185

Change #1037471 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] Remove datahub from LVS

https://gerrit.wikimedia.org/r/1037471

Change #1037479 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] Remove datahub service entry

https://gerrit.wikimedia.org/r/1037479

Change #1036994 merged by Stevemunene:

[operations/puppet@production] Set datahub LVS to state lvs_setup

https://gerrit.wikimedia.org/r/1036994

Change #1037471 merged by Stevemunene:

[operations/puppet@production] Set datahub LVS to service_setup

https://gerrit.wikimedia.org/r/1037471

Change #1037479 merged by Stevemunene:

[operations/puppet@production] Remove datahub service entry

https://gerrit.wikimedia.org/r/1037479

Mentioned in SAL (#wikimedia-analytics) [2024-06-04T12:30:46Z] <stevemunene> delete WikiKube datahub release T361185

Change #1038773 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/deployment-charts@master] Clean up datahub from main cluster

https://gerrit.wikimedia.org/r/1038773

Change #1038773 merged by jenkins-bot:

[operations/deployment-charts@master] Clean up datahub from main cluster

https://gerrit.wikimedia.org/r/1038773

Change #1039618 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/puppet@production] Delete datahub kubeconfigs on main

https://gerrit.wikimedia.org/r/1039618

Change #1039618 merged by Stevemunene:

[operations/puppet@production] Delete datahub kubeconfigs on main

https://gerrit.wikimedia.org/r/1039618