
Define a hard upper limit of query service based maxlag
Open, High, Public

Description

Between 3:00 and 4:00 today, Wikidata reported a maxlag of 26490422 seconds (306 days). This may break many bots that sleep according to the maxlag. I propose that Wikidata should never report a maxlag of more than 5 minutes (300 seconds). For comparison, the usual maxlag limit is 5 seconds.
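A minimal sketch of the proposed cap, in Python for illustration only. The function name `clamp_maxlag` and the constant name are hypothetical and do not come from the Wikibase code; only the 300-second limit is taken from the proposal above.

```python
# Hypothetical illustration of the proposed hard upper limit.
# The 300-second value is the cap proposed in this task; the names are made up.
MAX_REPORTED_LAG_SECONDS = 300


def clamp_maxlag(raw_lag_seconds: float, cap: float = MAX_REPORTED_LAG_SECONDS) -> float:
    """Return the lag to report: never negative, never above the cap."""
    return max(0.0, min(raw_lag_seconds, cap))


# The bogus value from today would be reported as 300 instead of 26490422.
print(clamp_maxlag(26_490_422))
# A normal lag value passes through unchanged.
print(clamp_maxlag(4.2))
```

With a cap like this, a broken lag source could still pause bots, but only for a bounded time.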
https://grafana.wikimedia.org/d/000000170/wikidata-edits?orgId=1&from=1589422190411&to=1589431678958

Event Timeline

Restricted Application added a subscriber: Aklapper. May 14 2020, 5:43 AM

Note that backend QuickStatements batches seem to be broken for this reason.

This breaks Widar too, as Widar sleeps 3*maxlag+1 seconds before edits, and now Widar sleeps for 919 days.
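The arithmetic behind that figure can be checked with a short Python sketch. The function name `widar_sleep_seconds` is hypothetical; only the formula 3*maxlag+1 and the reported value come from this task, and the 300-second input is the cap proposed in the description.

```python
SECONDS_PER_DAY = 86_400


def widar_sleep_seconds(maxlag: int) -> int:
    """Sleep time as described above: 3 * maxlag + 1 seconds."""
    return 3 * maxlag + 1


# With the maxlag reported today:
sleep = widar_sleep_seconds(26_490_422)
print(f"{sleep} s = {sleep / SECONDS_PER_DAY:.1f} days")  # 79471267 s = 919.8 days

# With the proposed 300-second cap, the same formula gives about 15 minutes:
capped_sleep = widar_sleep_seconds(300)
print(f"{capped_sleep} s = {capped_sleep / 60:.1f} minutes")  # 901 s = 15.0 minutes
```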

@Gehel and @Zbyszko any idea what's causing these huge spikes? Work on the server? Something else?

Looking at WDQS lag and [[ URL | Wikidata lag ]] in parallel, I don't see a clear correlation between the two. There were a few data reloads last week (you can see the spike in lag on the WDQS side), but the servers should be depooled during that operation. And the spike in lag is only 2h.

It looks like the Wikidata lag is always at 43.80 weeks. This looks like a NaN converted to something wrong, or an overflow of some kind.

@Addshore: you might have a better idea of what could be happening on the Wikidata side.

Side note: exposing both the MySQL replication lag and the WDQS replication lag through the same value seems like a bit of an abuse of the system. Would it be better to expose both separately and let the clients have a more specific interpretation of those numbers?

The number of seconds was the current Unix time / 60, which means unavailable servers are treated as if their last update was at epoch zero.
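This can be checked from the reported value alone. A small Python sketch, where the variable names are only illustrative: multiplying the reported maxlag by 60 should give a Unix timestamp inside the window mentioned in the description.

```python
from datetime import datetime, timezone

reported_maxlag = 26_490_422  # value from the task description

# If maxlag == unix_time / 60, then unix_time == maxlag * 60.
implied_unix_time = reported_maxlag * 60
moment = datetime.fromtimestamp(implied_unix_time, tz=timezone.utc)

print(implied_unix_time)   # 1589425320
print(moment.isoformat())  # 2020-05-14T03:02:00+00:00
```

The result falls at 03:02 UTC on May 14 2020, which matches the "between 3:00 and 4:00 today" window and is consistent with the lag being computed against a timestamp of zero.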

I guess this is partly T252077

Gehel triaged this task as High priority. Sep 15 2020, 7:59 AM