wdqs1004 is lagging
Closed, Duplicate (Public)

Event Timeline

Zabe renamed this task from 1004 is lagging 5 hours more than all others to wdqs1004 is lagging 5 hours more than all others. Sep 12 2021, 8:11 PM
So9q closed this task as Resolved. Edited Sep 13 2021, 5:12 AM
So9q claimed this task.

Problem solved, according to Grafana!

I was looking at Icinga for other reasons and noticed:

wdqs1004 - "..Query Service HTTP Port on wdqs1004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable ".

(Unhandled CRIT for about 18 hours; does it have notifications?)

I did a systemctl restart wdqs-blazegraph and that caused:

RECOVERY - Query Service HTTP Port on wdqs1004 is OK: HTTP OK: HTTP/1.1 200 OK

but in turn also a new:

<+icinga-wm> PROBLEM - WDQS high update lag on wdqs1004 is CRITICAL: 1.224e+05 ge....
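
For scale, the truncated alert value above can be converted to hours. This is only a minimal sketch: it assumes the check reports lag in seconds (the alert text does not state the unit), and `lag_hours` is a hypothetical helper name, not part of any WDQS tooling.

```python
def lag_hours(raw_value: str) -> float:
    """Convert a lag reading such as '1.224e+05' to hours.

    Assumption: the reading is in seconds; the alert text does not say so.
    """
    return float(raw_value) / 3600.0


if __name__ == "__main__":
    # Value copied from the icinga-wm alert in this task.
    print(f"{lag_hours('1.224e+05'):.1f} hours")
```

If the unit assumption holds, the reading corresponds to well over a day of lag, which fits a host whose Blazegraph service had been unavailable for many hours.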

https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag told me to also restart the wdqs-updater service, so I did that.

When that did not seem to resolve it immediately, I also depooled the server, as the docs above say to do until it catches up.
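
The sequence described above (restart Blazegraph, restart the updater, depool until the host catches up) could be sketched as a dry-run helper. The first two commands are the ones logged in this task; the bare `depool` command, the `RECOVERY_STEPS` list and the `run_recovery` function are assumptions for illustration, and nothing is executed unless `dry_run` is turned off.

```python
import subprocess
from typing import List

# Steps in the order they were performed in this task.
# The first two commands appear in the log below; a bare "depool"
# is an assumption about how the host is taken out of rotation.
RECOVERY_STEPS = [
    ["systemctl", "restart", "wdqs-blazegraph"],
    ["service", "wdqs-updater", "restart"],
    ["depool"],
]


def run_recovery(dry_run: bool = True) -> List[str]:
    """Print (or, if dry_run is False, run) each step; return the command lines."""
    handled = []
    for cmd in RECOVERY_STEPS:
        line = " ".join(cmd)
        if dry_run:
            print(f"[dry-run] {line}")
        else:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)
        handled.append(line)
    return handled


if __name__ == "__main__":
    run_recovery()  # prints the three steps without executing them
```

Defaulting to a dry run keeps the sketch safe to paste on a workstation; the runbook linked above remains the reference for the actual procedure.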

Reusing this ticket.


2021-09-20
22:14 mutante: wdqs1004 - depool
22:10 mutante: wdqs1004 - service wdqs-updater restart
22:06 mutante: wdqs1004 - HTTP/1.1 503 Service Unavailable - systemctl restart wdqs-blazegraph

dcausse moved this task from Incoming to Waiting on the Discovery-Search (Current work) board.

Thanks @So9q for the report and @Dzahn for the depool!
We'll repool once the lag is back to normal.

dcausse renamed this task from wdqs1004 is lagging 5 hours more than all others to wdqs1004 is lagging. Sep 21 2021, 12:20 PM
dcausse closed this task as a duplicate of T291488: wdqs1004 lags 16 hours please depool.