WDQS: write a playbook for restarting blazegraph
Closed, Resolved · Public

Description

There have been many recent incidents of WDQS being overloaded. The band-aid fix is to restart the WDQS service on all hosts across the entire datacenter. While SREs can do this via cumin, SWEs cannot. An Ansible playbook could help SWEs (who do have shell access to the WDQS servers) restart the service safely and quickly.

Creating this ticket to:

  • Write an Ansible playbook to restart services for WDQS
  • Train SWEs on its use
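
For illustration, a rolling restart of this kind could be sketched roughly as follows. This is only a sketch under assumptions, not necessarily the playbook written for this task: the inventory group `wdqs`, the systemd unit name `wdqs-blazegraph`, the port `9999`, and the need for `become` are all assumed and should be checked against the real hosts.

```yaml
---
# Hypothetical sketch: restart Blazegraph on WDQS hosts one at a time.
- name: Rolling restart of Blazegraph on WDQS hosts
  hosts: wdqs              # assumed inventory group name
  serial: 1                # one host at a time, so the rest keep serving queries
  become: true             # assumes the operator may escalate privileges
  tasks:
    - name: Restart the Blazegraph service
      ansible.builtin.systemd:
        name: wdqs-blazegraph   # assumed unit name
        state: restarted

    - name: Wait for Blazegraph to accept connections again
      ansible.builtin.wait_for:
        port: 9999              # assumed listening port
        timeout: 120            # seconds before the play fails on this host
```

With `serial: 1`, a failed restart or a timeout stops the play on that host before the next one is touched, which keeps a bad restart from spreading across the datacenter.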

Event Timeline

bking changed the task status from Open to In Progress. Dec 4 2025, 6:38 PM
bking claimed this task.
bking triaged this task as Medium priority.

I've created this repo, which includes a playbook to restart wdqs.

bking added a subscriber: gmodena.

This is great, @bking!

How do I sign up for training?

Nit: could we move the playbook to the Wikidata Platform gitlab org, once it becomes available?

bking updated the task description.

@gmodena paired on the playbook today.

I think we're good, so I'm gonna close this one out. Feel free to ping me here or in Slack/IRC if you do have any other questions.