Replication lag starting a script on tools.taxonbot
Closed, Resolved · Public

Description

(tools.taxonbot)

There has been a replication lag of about 10 minutes since about 12:12 UTC on Jan. 20th, 2016. It shows up when starting the script catstruct.tcl in the Tcl shell and in other scripts running on the grid engine.

Event Timeline

doctaxon raised the priority of this task from to Unbreak Now!.
doctaxon updated the task description.
doctaxon added a project: Toolforge.
doctaxon subscribed.
Restricted Application added subscribers: Luke081515, Aklapper.

Replication lag of 10 minutes

Are you talking about the production replica databases or something else?

I think it is MariaDB, but I am not sure.

It's dewiki_p, but I think it's related to changes being made on prod (a schema change), so this may not be an actual issue.

Then yes, this is a known issue. A production schema change is ongoing on wikidata, which is increasing replication lag for wikidata and dewiki on some servers, both in production and on labs. The maintenance is expected to finish in about 3 hours, after which replication should slowly go back to normal.

More details:

https://de.wikipedia.org/wiki/Wikipedia:Projektdiskussion#Datenbankwartung
https://www.wikidata.org/wiki/Wikidata:Project_chat#Wikidata_under_database_maintenance
https://tools.wmflabs.org/replag/
https://wikitech.wikimedia.org/wiki/Server_Admin_Log

I can let you know when the maintenance has finished.

It has been lagging since about 12:12 UTC, and it is now 15:18 UTC, so the 3 hours should be over. Replication lag should be going back to normal any minute now ... Is it possible to monitor it?

@doctaxon I meant 3 hours from now (a bit less by now). You can monitor the exact replication lag at https://tools.wmflabs.org/replag/

I will announce the end of the maintenance on the Server Admin Log page above, and I can do it here, too. The ETA for the end of the maintenance is 18:15 UTC, plus the time the replicas take to return to 0 lag. Please note that this is not a labs-only issue; the same "lag" will be found on most servers in production, too.
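
For tools that want to react to lag like this on their own, here is a minimal sketch in Python (not the Tcl used by catstruct.tcl) of how a script might estimate the lag and decide whether to run. It assumes the script has already fetched the timestamp of the newest change visible on the replica, for example with a query such as `SELECT MAX(rc_timestamp) FROM recentchanges`, which returns MediaWiki's 14-digit `YYYYMMDDHHMMSS` format. The function names and the 5-minute threshold are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a MediaWiki 14-digit timestamp (YYYYMMDDHHMMSS) as UTC."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def replication_lag(newest_replicated: datetime, now: datetime) -> timedelta:
    """Estimate lag as 'now' minus the newest change seen on the replica.

    The result is clamped at zero so clock skew never yields a negative lag.
    On a quiet wiki this overestimates the lag, since no recent edits
    looks the same as a stalled replica.
    """
    return max(now - newest_replicated, timedelta(0))


def is_fresh_enough(lag: timedelta,
                    max_lag: timedelta = timedelta(minutes=5)) -> bool:
    """Return True if the replica is within the (hypothetical) tolerated lag."""
    return lag <= max_lag


if __name__ == "__main__":
    # Illustrative values only: newest replicated change at 15:08 UTC,
    # checked at 15:18 UTC on the day of the maintenance.
    newest = parse_mw_timestamp("20160120150800")
    now = datetime(2016, 1, 20, 15, 18, tzinfo=timezone.utc)
    lag = replication_lag(newest, now)
    if is_fresh_enough(lag):
        print("replica is fresh enough, running")
    else:
        print(f"replica is lagging by {lag}, skipping this run")
```

A check like this only tells the script to wait; it does not replace the replag page above, which reports the lag measured on the servers themselves.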

@doctaxon, maintenance has finished on production. How long it will take for labsdb to catch up will depend on how loaded the wikidata (s5) shard is on labs. I will monitor it to make sure the lag shrinks as intended, and kill blocking queries if needed to restore the service.

jcrespo claimed this task.

All lag finally went away at 3:25 UTC.