For about 20 hours, Wikidata dispatch lag has been higher than 10 minutes. Further investigation is needed.
See https://grafana.wikimedia.org/d/000000156/wikidata-dispatch?orgId=1&from=now-2d&to=now&refresh=1m
Authored by Bugreporter, May 16 2020, 8:03 PM
Dispatching is slowing down because the db server is lagging and the dispatch process has a waitForReplication call in the code.
Specifically, this is in SqlChangeDispatchCoordinator::releaseClient.
This db server probably needs to be depooled.
Change 596824 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] SqlChangeDispatchCoordinator statsd time the wait for replication
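To illustrate the idea behind timing the replication wait, here is a minimal Python sketch. It is not the Wikibase patch itself (that code is PHP and is not reproduced here). The function names, the metric name `wikibase.dispatch.replication_wait`, and the simulated 50 ms wait are all assumptions made up for the example. The only thing taken as given is the StatsD timing line format, `<metric>:<value>|ms`.

```python
import time
from typing import Callable, List


def timed_wait(wait: Callable[[], None], metric: str, sink: List[str]) -> float:
    """Run a blocking wait and record how long it took as a StatsD timing line."""
    start = time.monotonic()
    wait()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    # StatsD timing format: <metric>:<value>|ms
    sink.append(f"{metric}:{elapsed_ms:.0f}|ms")
    return elapsed_ms


def fake_wait_for_replication() -> None:
    # Hypothetical stand-in for the real replication wait; sleeps to simulate lag.
    time.sleep(0.05)


# Collect metric lines in a list instead of sending them over UDP.
lines: List[str] = []
elapsed = timed_wait(
    fake_wait_for_replication,
    "wikibase.dispatch.replication_wait",  # hypothetical metric name
    lines,
)
```

With a measurement like this on a dashboard, a replica that lags only a few seconds but does so constantly would show up as a steady per-pass wait rather than having to be inferred from the overall dispatch lag.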
Marking as resolved, as the impact on Wikidata is now gone.
10:32 PM <marostegui> This host is lagging just a few seconds but all the time
10:32 PM <marostegui> Due to:
10:32 PM <marostegui> yes
10:33 PM <jynus> Query | 95861
10:33 PM <marostegui> that
10:33 PM <marostegui> I am going to kill all that
10:33 PM <marostegui> the query killer is disabled due to a 10.4 bug
Some more details.
There were a few long-running queries:
| 669154401 | wikiuser | 10.64.0.59:40112 | wikidatawiki | Query | 95780 | Sending data | SELECT /* SpecialRecentChanges::doMainQuery */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_
Those queries had been running for hours, and because the query killer was not enabled due to a MariaDB 10.4 bug (T247728), they were causing 3-5 seconds of continuous lag.
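For readers unfamiliar with what a query killer does, a rough Python sketch of the selection logic follows. This is not the tool that was disabled on the host. The row layout, the 60-second threshold, and the two extra rows are assumptions for illustration. Only the first row's id and time come from the process list entry above. `KILL <thread_id>` is the standard MariaDB statement for terminating a connection.

```python
from typing import Dict, List


def kill_statements(rows: List[Dict], max_seconds: int) -> List[str]:
    """Build KILL statements for queries that have run longer than max_seconds."""
    statements = []
    for row in rows:
        # Only target active queries; idle (Sleep) connections are left alone.
        if row["command"] == "Query" and row["time"] > max_seconds:
            statements.append(f"KILL {row['id']};")
    return statements


processlist = [
    # Taken from the process list entry quoted above.
    {"id": 669154401, "user": "wikiuser", "command": "Query", "time": 95780},
    # Hypothetical rows, added only to show what gets skipped.
    {"id": 1, "user": "wikiuser", "command": "Sleep", "time": 120000},
    {"id": 2, "user": "wikiuser", "command": "Query", "time": 3},
]

# The 60-second threshold is an assumption, not the production setting.
statements = kill_statements(processlist, max_seconds=60)
```

Here only the long-running SpecialRecentChanges query would be selected. A real killer would typically also filter by user and query type before acting.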
Change 596824 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Observability for SqlChangeDispatchCoordinator wait for replication