
move query.wikidata.org to kubernetes
Closed, Resolved, Public

Description

Subtask to move the commons-query.wikimedia.org (T381909), query.wikidata.org, query-main.wikidata.org, and query-scholarly.wikidata.org microsites to the Kubernetes wikikube cluster.

Migration checklist (see complete steps here):

Details

Related Changes in Gerrit:
Repo | Branch | Lines +/-
operations/puppet | production | +1 -1
operations/puppet | production | +0 -69
operations/puppet | production | +12 -2
operations/deployment-charts | master | +1 -0
operations/deployment-charts | master | +13 -5
operations/puppet | production | +0 -6
operations/puppet | production | +0 -3
operations/deployment-charts | master | +13 -5
operations/deployment-charts | master | +28 -7
operations/puppet | production | +6 -0
operations/puppet | production | +0 -6
operations/deployment-charts | master | +2 -0
operations/deployment-charts | master | +1 -1
operations/puppet | production | +3 -0
operations/puppet | production | +10 -10
operations/puppet | production | +3 -3
operations/puppet | production | +15 -9
operations/puppet | production | +2 -2
operations/deployment-charts | master | +1 -1
operations/puppet | production | +9 -0
operations/puppet | production | +1 -1
operations/puppet | production | +1 -1
operations/puppet | production | +1 -1
operations/deployment-charts | master | +1 -1
operations/deployment-charts | master | +2 -2
operations/deployment-charts | master | +2 -2
operations/deployment-charts | master | +3 -0
operations/deployment-charts | master | +19 -0
operations/deployment-charts | master | +1 -1
operations/deployment-charts | master | +185 -40
operations/deployment-charts | master | +151 -0
operations/deployment-charts | master | +2 -1
operations/deployment-charts | master | +28 -1
operations/deployment-charts | master | +52 -2
operations/deployment-charts | master | +166 -41
operations/deployment-charts | master | +0 -6
operations/deployment-charts | master | +2 -0
operations/deployment-charts | master | +122 -0
operations/deployment-charts | master | +9 -0
operations/puppet | production | +4 -0
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Updated Deploy instructions to WMF kubernetes deployments | repos/wmde/wikidata-query-gui!5 | wmde-leszek | deploy-docs | main
add wikidata-query-builder to trusted runners | repos/releng/gitlab-trusted-runner!100 | jelto | add-wikidata-query-builder | main
add ci to build the container image for query-builder | repos/wmde/wikidata-query-builder!1 | jelto | add-ci | master
also add query-main and query-scholarly to Apache ServerAlias | repos/wmde/wikidata-query-gui!3 | jelto | add-additional-ServerAlias | main
update wikidata-query-gui path | repos/releng/gitlab-trusted-runner!94 | jelto | update-query-path | main
add ci to build the container image for query-gui | repos/wmde/wikidata-query-gui!1 | jelto | add-ci | master
add wikidata query service to Trusted Runners | repos/releng/gitlab-trusted-runner!90 | jelto | add-wikidata-query | main

Event Timeline


Thanks @Jelto for making the move and for addressing the issue that emerged. I wonder if I should use this as an opportunity to clear up the VCS confusion a bit and archive the Gerrit repository, stating that development has moved to GitLab.

  1. Do you think now is the right time from your point of view, or do you still intend to monitor the situation for a while before calling the query service UI moved to k8s?
  2. Should this wait until T381909 is resolved?

Change #1118074 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] wcqs: proxy requests to query qui to new wikikube endpoint

https://gerrit.wikimedia.org/r/1118074

> 1. Do you think now is the right time from your point of view, or do you still intend to monitor the situation for a while before calling the query service UI moved to k8s?

The query GUI on wikikube Kubernetes looks good to me. We should archive the Gerrit repository soon to reduce confusion, because people might still create changes against the old Gerrit repo. Let me know if you are fine with this step, and I'll mark the Gerrit repo as archived. I'm fine with calling the migration done (traffic looks good; some cleanup still has to happen afterwards).

> 2. Should this wait until T381909 is resolved?

I can't see any progress or plan for that task. As far as I can tell, it also depends on the query GUI, because it proxies some requests to the GUI. I created https://gerrit.wikimedia.org/r/1118074, which might be the fix to proxy all GUI requests from wcqs to wikikube as well. But someone from the Search Platform team might have more insight and know of further dependencies.

This took me forever, sorry. I requested archiving the Gerrit repo in T387199.

From WMDE's perspective, it can be archived any time, and we can then call the migration done.
I'll leave Commons SPARQL GUI to be handled by the Search Platform team.

thank you!

Change #1118074 merged by Bking:

[operations/puppet@production] wcqs: proxy requests to query qui to new wikikube endpoint

https://gerrit.wikimedia.org/r/1118074

Change #1128987 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/puppet@production] query_service gui: Allow proxy to k8s miscweb

https://gerrit.wikimedia.org/r/1128987

Change #1128987 merged by Bking:

[operations/puppet@production] query_service gui: Allow proxy to k8s miscweb

https://gerrit.wikimedia.org/r/1128987

Thanks a lot @EBernhardson for the work in T381909!

I checked the traffic and logs for the legacy miscweb machine, and it looks like the traffic dropped significantly on March 18 (when T381909 was closed).

The drop is clearly visible in the miscweb Logstash dashboard (with health checks and other system traffic filtered out).

[Image: query-traffic-legacy.png (legacy miscweb query traffic)]

So this is already a big step forward in the migration to Kubernetes. There's still some traffic hitting the legacy machine, though, mostly on the /querybuilder path. @ItamarWMDE, do you think we can move forward here? From the git log, it doesn't look like there has been much activity on the Gerrit query-builder repo lately, so I'd assume it's a good time to switch over to the GitLab + Kubernetes setup?

I've already prepared the GitLab repo and the Kubernetes deployment, so we'd just need to rebase the latest changes, switch the traffic, and run some final tests.
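The remaining legacy traffic was identified in the miscweb Logstash dashboard. For a quick cross-check directly on a legacy host, a minimal sketch could count requests per top-level path in the Apache access log while skipping health checks. It assumes the Apache "combined" log format; the health-check patterns (check_http, kube-probe, /healthz), the function name summarize_paths, and the log path in the usage line are hypothetical placeholders, not the dashboard's actual filters.

```shell
#!/bin/sh
# Sketch: count requests per first path segment in an Apache access log,
# skipping health checks. ASSUMPTIONS (hypothetical): "combined" log format,
# and health checks are recognisable by the patterns in the first awk rule.
summarize_paths() {
  awk '
    # Skip lines matching the assumed health-check patterns.
    /check_http|kube-probe|\/healthz/ { next }
    {
      # In combined format, field 7 is the request path, e.g. /querybuilder/
      path = $7
      sub(/\?.*/, "", path)      # drop any query string
      split(path, parts, "/")    # parts[2] holds the first path segment
      key = (parts[2] == "") ? "/" : ("/" parts[2])
      counts[key]++
    }
    END { for (k in counts) print counts[k], k }
  ' "$1" | sort -rn
}

# Usage (log path is a placeholder):
# summarize_paths /var/log/apache2/access.log
```

The output lists the busiest path prefixes first, which is enough to see whether /querybuilder still dominates the leftover traffic.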

Hey @Jelto, EM for Wikidata here. Things are good on our side. Feel free to make the switch whenever it is convenient for you.

Change #1133120 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] trafficserver: switch /querybuilder to wikikube miscweb

https://gerrit.wikimedia.org/r/1133120

Change #1133122 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/deployment-charts@master] wikidata-query-builder: bump image version

https://gerrit.wikimedia.org/r/1133122

> Hey @Jelto, EM for Wikidata here. Things are good on our side. Feel free to make the switch whenever is convenient for you

Great, that sounds good!

I rebased the GitLab repository one last time with the latest changes from Gerrit and bumped the version in the change above.

We can switch the querybuilder tomorrow or on Monday, if that works for you. We should merge https://gerrit.wikimedia.org/r/1133122 first and https://gerrit.wikimedia.org/r/1133120 after that for the actual traffic switch. Rolling back would simply mean reverting https://gerrit.wikimedia.org/r/1133120.

Change #1133122 merged by jenkins-bot:

[operations/deployment-charts@master] wikidata-query-builder: bump image version

https://gerrit.wikimedia.org/r/1133122

Change #1133120 merged by Jelto:

[operations/puppet@production] trafficserver: switch querybuilder scholarly and main to wikikube

https://gerrit.wikimedia.org/r/1133120

Change #1133388 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/deployment-charts@master] wikidata-query-gui: add query-main and query-scholarly to querybuilder hosts

https://gerrit.wikimedia.org/r/1133388

Change #1133388 merged by jenkins-bot:

[operations/deployment-charts@master] wikidata-query-gui: add query-main and query-scholarly to querybuilder hosts

https://gerrit.wikimedia.org/r/1133388

Change #1133395 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] Revert "trafficserver: switch querybuilder scholarly and main to wikikube"

https://gerrit.wikimedia.org/r/1133395

Change #1133395 merged by Jelto:

[operations/puppet@production] Revert "trafficserver: switch querybuilder scholarly and main to wikikube"

https://gerrit.wikimedia.org/r/1133395

We tested /querybuilder on wikikube but had to revert it. The querybuilder returned a 404 for both https://query-scholarly.wikidata.org/querybuilder/ and https://query-main.wikidata.org/querybuilder/.

I'll dig a bit deeper into this. Luckily I can reproduce the error behind the CDN as well, so I should be able to work on a fix without causing further disruptions. I’ll upload patches once I’ve figured out what’s causing the 404.

Change #1134656 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/deployment-charts@master] wikidata-query-gui: add gateway route / for gui services

https://gerrit.wikimedia.org/r/1134656

Change #1134656 merged by jenkins-bot:

[operations/deployment-charts@master] wikidata-query-gui: add gateway route / for gui services

https://gerrit.wikimedia.org/r/1134656

Change #1134686 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/deployment-charts@master] wikidata-query-gui: add gateway route "/" for query-main

https://gerrit.wikimedia.org/r/1134686

Change #1134686 merged by jenkins-bot:

[operations/deployment-charts@master] wikidata-query-gui: add gateway route "/" for query-main

https://gerrit.wikimedia.org/r/1134686

I found the issue in the query-service ingress config that was causing the 404s. The GUI ingress needed an explicit mapping for the / path to make sure /querybuilder could be routed to a different Kubernetes service. The two changes above include this fix. I’ve already deployed it to all wikikube clusters.
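To illustrate the idea behind the fix, here is a deliberately simplified model of path-prefix routing for a single host: the route with the longest matching prefix wins, and a request that matches no route gets a 404. With both / and /querybuilder declared, /querybuilder/ goes to the query builder and everything else goes to the GUI. This is a hypothetical sketch, not the actual wikikube gateway configuration; the route table, the route function, and the backend names are illustrative.

```shell
#!/bin/sh
# Simplified, hypothetical model of per-host path-prefix routing.
# Each route line is "host prefix backend"; the longest matching prefix wins,
# and a request that matches no route for its host gets "404".
ROUTES='query-scholarly.wikidata.org / wikidata-query-gui
query-scholarly.wikidata.org /querybuilder wikidata-query-builder'

route() {
  host=$1 path=$2 best=404 best_len=0
  while read -r r_host r_prefix r_backend; do
    [ "$r_host" = "$host" ] || continue
    # "/" matches every path; other prefixes match whole path segments,
    # so /querybuilder matches /querybuilder and /querybuilder/... only.
    case $path in
      "$r_prefix"|"$r_prefix"/*) matched=1 ;;
      *) matched=0 ;;
    esac
    [ "$r_prefix" = / ] && matched=1
    if [ "$matched" -eq 1 ] && [ "${#r_prefix}" -gt "$best_len" ]; then
      best=$r_backend best_len=${#r_prefix}
    fi
  done <<EOF
$ROUTES
EOF
  echo "$best"
}

# Example: route query-scholarly.wikidata.org /querybuilder/   -> wikidata-query-builder
```

In this model, removing the /querybuilder line sends every path to the GUI, and a host with no route lines at all answers 404 for every path.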

Before the patch, /querybuilder returned a 404:

curl -I --resolve query-scholarly.wikidata.org:30443:$(dig +short k8s-ingress-staging.discovery.wmnet) https://query-scholarly.wikidata.org:30443/querybuilder
HTTP/2 404

After the patch, the /querybuilder endpoint now responds with 200:

curl -I --resolve query-scholarly.wikidata.org:30443:$(dig +short k8s-ingress-staging.discovery.wmnet) https://query-scholarly.wikidata.org:30443/querybuilder/
HTTP/2 200

I also ran some more detailed checks for all eqiad endpoints, and all return HTTP 200 now:

# query
curl -s -I --resolve query.wikidata.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://query.wikidata.org:30443 | grep HTTP
HTTP/2 200
curl -s -I --resolve query.wikidata.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://query.wikidata.org:30443/querybuilder/ | grep HTTP
HTTP/2 200

# query-main
curl -s -I --resolve query-main.wikidata.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://query-main.wikidata.org:30443 | grep HTTP
HTTP/2 200
curl -s -I --resolve query-main.wikidata.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://query-main.wikidata.org:30443/querybuilder/ | grep HTTP
HTTP/2 200

# query-scholarly
curl -s -I --resolve query-scholarly.wikidata.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://query-scholarly.wikidata.org:30443 | grep HTTP
HTTP/2 200
curl -s -I --resolve query-scholarly.wikidata.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://query-scholarly.wikidata.org:30443/querybuilder/ | grep HTTP
HTTP/2 200
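The per-host checks above can also be run as a single loop. This minimal sketch reuses the ingress name, port 30443, and the three hostnames from the commands above and sends the same HEAD requests, but prints only the status code per URL; its exit status is the number of non-200 responses. The function name check_all and the INGRESS override are hypothetical conveniences.

```shell
#!/bin/sh
# Sketch: probe all query frontends on one wikikube ingress, bypassing the CDN.
# Ingress name, port and hosts are taken from the manual checks above;
# set INGRESS to probe a different cluster.
INGRESS=${INGRESS:-k8s-ingress-wikikube.svc.eqiad.wmnet}
PORT=30443

check_all() {
  ip=$(dig +short "$INGRESS" | head -n 1)
  failures=0
  for host in query.wikidata.org query-main.wikidata.org query-scholarly.wikidata.org; do
    for path in / /querybuilder/; do
      # -I sends a HEAD request, -o /dev/null discards the headers,
      # and -w prints only the HTTP status code.
      code=$(curl -s -I -o /dev/null -w '%{http_code}' \
        --resolve "$host:$PORT:$ip" "https://$host:$PORT$path")
      echo "$code $host$path"
      [ "$code" = 200 ] || failures=$((failures + 1))
    done
  done
  return "$failures"
}

# Usage: check_all && echo "all endpoints return 200"
```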

This looks good now, so I feel confident giving the /querybuilder traffic switch to wikikube another try. I'll upload a patch shortly.

Change #1134697 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] trafficserver: switch querybuilder scholarly to wikikube

https://gerrit.wikimedia.org/r/1134697

Change #1134697 merged by Jelto:

[operations/puppet@production] trafficserver: switch querybuilder scholarly to wikikube

https://gerrit.wikimedia.org/r/1134697

query-scholarly.wikidata.org/querybuilder/ was switched to wikikube successfully, and the service now responds with 200 instead of 404:

curl -s -I https://query-scholarly.wikidata.org | grep -E 'HTTP|server|last'
HTTP/2 200 
server: Apache/2.4.59 (Debian)
last-modified: Fri, 31 Jan 2025 09:28:51 GMT


curl -s -I https://query-scholarly.wikidata.org/querybuilder/ | grep -E 'HTTP|server|last'
HTTP/2 200 
server: Apache/2.4.59 (Debian)
last-modified: Tue, 01 Apr 2025 12:17:03 GMT

I'll upload a patch to switch all other querybuilder backends to wikikube too.

Change #1134988 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] trafficserver: switch all querybuilder backends to wikikube

https://gerrit.wikimedia.org/r/1134988

At this point, I think that the work for Data-Platform-SRE is done. Ping me if there is an expectation that we'll need to do some more work here.

Change #1134988 merged by Jelto:

[operations/puppet@production] trafficserver: switch all querybuilder backends to wikikube

https://gerrit.wikimedia.org/r/1134988

Change #1135383 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/deployment-charts@master] wikidata-query-gui: add query-legacy-full to existing gateway

https://gerrit.wikimedia.org/r/1135383

The querybuilder for query, query-main, and query-scholarly was switched to wikikube successfully. The services return HTTP 200, and some test queries worked fine for me.

I'll archive the old Gerrit projects and link to the new GitLab projects.

Besides the three originally planned services (query, query-main, and query-scholarly), a new service called query-legacy-full was added. The change above adds the /querybuilder mapping for query-legacy-full as well.

Change #1135383 merged by jenkins-bot:

[operations/deployment-charts@master] wikidata-query-gui: add query-legacy-full to existing gateway

https://gerrit.wikimedia.org/r/1135383

Change #1136714 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/deployment-charts@master] wikidata-query-gui: add query-legacy-full.w.o to querybuilder hosts

https://gerrit.wikimedia.org/r/1136714

Change #1136714 merged by jenkins-bot:

[operations/deployment-charts@master] wikidata-query-gui: add query-legacy-full.w.o to querybuilder hosts

https://gerrit.wikimedia.org/r/1136714

Change #1136724 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] miscweb: remove query-service from legacy vms

https://gerrit.wikimedia.org/r/1136724

All (known) queryservice frontends run in wikikube now (gui and querybuilder).

The change above removes the query service frontend from the old miscweb hosts. Before removing all components, it could make sense to run a less disruptive test first by combining a2dissite with disabling Puppet. None of the query services should be affected during this test.

Mentioned in SAL (#wikimedia-operations) [2025-04-16T11:37:37Z] <jelto> temporarily disable query sites on miscweb vms - T350793

I temporarily disabled all query sites on the legacy miscweb VMs to verify that no service is still using the old GUI. All query services still work for me.

sudo disable-puppet " temporarily disable query sites on miscweb vms - T350793"
sudo a2dissite 50-query-scholarly-wikidata-org.conf
sudo a2dissite 50-query-main-wikidata-org.conf
sudo a2dissite 50-query-wikidata-org.conf
sudo a2dissite 50-commons-query-wikimedia-org.conf
sudo systemctl reload apache2

Just enabling puppet should revert this. I'll monitor the alerts for the rest of the day and will re-enable puppet before the weekend.
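The site toggling above could be wrapped in a small helper. This is a hypothetical sketch: it reuses the four site names from the commands above, assumes Puppet has already been disabled with disable-puppet as shown, and prints the commands instead of running them when DRY_RUN=1. It also runs apache2ctl configtest before reloading, so Apache is only reloaded when the configuration is valid. The helper names run and toggle_query_sites are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: disable or re-enable the query sites on a legacy
# miscweb VM. Assumes Puppet was already disabled with disable-puppet,
# as in the commands above. With DRY_RUN=1 commands are only printed.
SITES="50-query-scholarly-wikidata-org 50-query-main-wikidata-org 50-query-wikidata-org 50-commons-query-wikimedia-org"

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

toggle_query_sites() {
  case $1 in
    disable) cmd=a2dissite ;;
    enable) cmd=a2ensite ;;
    *) echo "usage: toggle_query_sites disable|enable" >&2; return 2 ;;
  esac
  for site in $SITES; do
    run sudo "$cmd" "$site.conf"
  done
  # Check the Apache configuration first and only reload if it is valid.
  run sudo apache2ctl configtest && run sudo systemctl reload apache2
}

# Usage: DRY_RUN=1 toggle_query_sites disable   (print only)
#        toggle_query_sites enable              (re-enable the sites by hand)
```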

I re-enabled Puppet on the legacy miscweb VMs. No impact on the query service was noticed, so we should be good to remove the query service from the legacy VMs next week (https://gerrit.wikimedia.org/r/1136724).

Change #1136724 merged by Jelto:

[operations/puppet@production] miscweb: remove query-service from legacy vms

https://gerrit.wikimedia.org/r/1136724

Change #1138255 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] microsites: fix regex_matches for query.wikidata.org

https://gerrit.wikimedia.org/r/1138255

Change #1138255 merged by Jelto:

[operations/puppet@production] microsites: fix regex_matches for query.wikidata.org

https://gerrit.wikimedia.org/r/1138255

Change #1138296 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] microsites: remove profile::microsites::query_service*

https://gerrit.wikimedia.org/r/1138296

Change #1138296 merged by Jelto:

[operations/puppet@production] microsites: remove profile::microsites::query_service*

https://gerrit.wikimedia.org/r/1138296

All query GUI services now run on wikikube, and the legacy setup has been removed. Everything within the scope of this task is done, so I'll resolve it.

Thanks again for all the help from WMDE and Search Platform folks! This was a big step forward for decommissioning the legacy miscweb setup.

If you have problems or need help with the query-gui or querybuilder deployment, you can reach out on IRC in #wikimedia-sre-collab or tag our team, collaboration-services.