Cleanup after decommission of the WDQS full graph endpoint
Open, In Progress, High, Public

Description

At least the following needs to be cleaned up:

  • puppet role / profiles / configuration related to the full graph
  • DNS entries
  • UI minisite

Event Timeline

Gehel removed Gehel as the assignee of this task. Feb 3 2026, 3:39 PM

Grafana is showing data for performance metrics for the legacy endpoint and it appears queries still run on query-legacy-full.wikidata.org (example). Is this because the cleanup work is still in progress?

Damn, Blazegraph was restarted on that node. Let me kill it again.

Mentioned in SAL (#wikimedia-operations) [2026-03-04T09:03:24Z] <gehel> switching off Blazegraph on wdqs2009 (legacy full graph endpoint is end of life) - T411410 / T415073

Change #1247926 had a related patch set uploaded (by Gehel; author: Gehel):

[operations/dns@master] wdqs: remove query-legacy-full.wikidata.org - end of life

https://gerrit.wikimedia.org/r/1247926

Change #1247933 had a related patch set uploaded (by Gehel; author: Gehel):

[operations/puppet@production] wdqs: cleanup code related to query-legacy-full.wikidata.org

https://gerrit.wikimedia.org/r/1247933

Change #1247947 had a related patch set uploaded (by Gehel; author: Gehel):

[operations/deployment-charts@master] wdqs: remove query-legacy-full

https://gerrit.wikimedia.org/r/1247947

Change #1247926 merged by Gehel:

[operations/dns@master] wdqs: remove query-legacy-full.wikidata.org - end of life

https://gerrit.wikimedia.org/r/1247926

I've removed the DNS records, so query-legacy-full.wd.o is truly unreachable now.
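
For anyone who wants to double-check the DNS removal from their own machine, here is a minimal sketch using only the Python standard library. The function name `still_resolves` and the injectable `resolver` parameter are illustrative assumptions, not part of any WMF tooling. The result reflects only what the local resolver sees, so a cached answer may linger for a while after the records are deleted.

```python
import socket


def still_resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves to at least one address.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the check can be exercised without touching the network (assumption
    made for this sketch).
    """
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Name resolution failed (for example NXDOMAIN), so from this
        # resolver's point of view the record is gone.
        return False


# Example call (performs a real lookup, so it is left commented out):
# print(still_resolves("query-legacy-full.wikidata.org"))
```

A `False` result is consistent with the records having been removed; a `True` result may simply mean a resolver cache has not expired yet.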

Change #1247933 merged by Ryan Kemper:

[operations/puppet@production] wdqs: cleanup code related to query-legacy-full.wikidata.org

https://gerrit.wikimedia.org/r/1247933

Change #1247947 merged by jenkins-bot:

[operations/deployment-charts@master] wdqs: remove query-legacy-full

https://gerrit.wikimedia.org/r/1247947

Change #1248694 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/deployment-charts@master] wdqs: remove stale legacy-full-gui release entry

https://gerrit.wikimedia.org/r/1248694

Merged puppet cleanup (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1247933)
and deployment-charts cleanup (https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1247947).
Also ran helmfile apply across staging/eqiad/codfw to tear down the
wikidata-query-legacy-full-gui helm release.

Follow-up patch to remove stale releases block entry:
https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1248694

AC Checklist status:

  • puppet role / profiles / configuration related to the full graph
  • DNS entries (already removed — query-legacy-full.wikidata.org no longer resolves)
  • UI minisite (helmfile reference removed + helm release uninstalled)

Change #1248694 merged by jenkins-bot:

[operations/deployment-charts@master] wdqs: remove stale legacy-full-gui release entry

https://gerrit.wikimedia.org/r/1248694

Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host wdqs2009.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host wdqs2009.codfw.wmnet with OS bullseye completed:

  • wdqs2009 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603062313_ryankemper_95432_wdqs2009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1249298 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: repurpose wdqs2009 to test blazegraph alternatives

https://gerrit.wikimedia.org/r/1249298

Change #1249298 merged by Bking:

[operations/puppet@production] wdqs: repurpose wdqs2009 to test blazegraph alternatives

https://gerrit.wikimedia.org/r/1249298

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs2009.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs2009.codfw.wmnet with OS bullseye executed with errors:

  • wdqs2009 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603091420_bking_860931_wdqs2009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs2009.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by bking@cumin2002 for host wdqs2009.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by bking@cumin2002 for host wdqs2009.codfw.wmnet with OS bookworm executed with errors:

  • wdqs2009 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202603091512_bking_874523_wdqs2009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console wdqs2009.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Change #1250683 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] wdqs: allow NFS mount from wdqs2009

https://gerrit.wikimedia.org/r/1250683

Change #1250683 merged by Bking:

[operations/puppet@production] wdqs: allow NFS mount from wdqs2009

https://gerrit.wikimedia.org/r/1250683

RKemper claimed this task.

I spoke too soon. Came across some entries in deployment-charts, as well as some dangling puppet aliases. Working on a more rigorous enumeration of the remaining AC.
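
One possible way to do that enumeration is to scan local checkouts of the affected repositories for leftover references. The sketch below is hedged: the search strings are taken from names that appear in this task, while the function name `find_leftovers` and the idea of running it against a local checkout are assumptions for illustration, not the method actually used here.

```python
from pathlib import Path

# Assumed search terms, taken from names mentioned in this task.
NEEDLES = ("query-legacy-full", "legacy-full-gui")


def find_leftovers(root, needles=NEEDLES):
    """Yield (path, line_number, line) for every line mentioning a needle."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the .git metadata directory.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if any(needle in line for needle in needles):
                yield path, number, line.strip()


# Example usage against a hypothetical local checkout:
# for path, number, line in find_leftovers("./deployment-charts"):
#     print(f"{path}:{number}: {line}")
```

Plain substring matching will also flag historical mentions such as changelog entries, so each hit still needs a manual look before it is treated as remaining cleanup.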

Change #1278562 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/deployment-charts@master] wdqs: drop dangling query-legacy-full helm refs

https://gerrit.wikimedia.org/r/1278562

Change #1278602 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: nuke dead config from legacy-full decom

https://gerrit.wikimedia.org/r/1278602

Change #1278603 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] cumin: repurpose wdqs-public, add wdqs-internal

https://gerrit.wikimedia.org/r/1278603

Change #1278602 merged by Bking:

[operations/puppet@production] wdqs: nuke dead config from legacy-full decom

https://gerrit.wikimedia.org/r/1278602