Replication lag starting a script on tools.taxonbot
Closed, Resolved (Public)

Assigned To

Authored By

	doctaxon
	Jan 20 2016, 2:52 PM

Description

(tools.taxonbot)

There has been a replication lag of about 10 minutes when starting the script catstruct.tcl in the Tcl shell, and in other scripts running on the grid engine, since about 12:12 UTC on Jan 20, 2016.

Event Timeline

doctaxon created this task. Jan 20 2016, 2:52 PM

doctaxon raised the priority of this task from to Unbreak Now!.

doctaxon updated the task description.

doctaxon added a project: Toolforge.

doctaxon subscribed.

Restricted Application added a project: Cloud-Services. Jan 20 2016, 2:52 PM

Restricted Application added subscribers: Luke081515, Aklapper.

doctaxon set Security to None. Jan 20 2016, 2:54 PM

doctaxon added a subscriber: Giftpflanze.

Replication lag of 10 minutes

Are you talking about the production replica databases or something else?

I think it is MariaDB, but I am not sure.

In T124172#1948028, @jcrespo wrote:

Replication lag of 10 minutes

Are you talking about the production replica databases or something else?

It's dewiki_p, but I think it's related to changes being made in production (a schema change), so this may not be an actual issue.

Then yes, this is a known issue. A production schema change is ongoing on wikidata, and it is affecting replication lag on some wikidata and dewiki servers, both in production and on labs. The ETA for the end of the maintenance is 3 hours, after which replication should slowly go back to normal.

More details:

https://de.wikipedia.org/wiki/Wikipedia:Projektdiskussion#Datenbankwartung
https://www.wikidata.org/wiki/Wikidata:Project_chat#Wikidata_under_database_maintenance
https://tools.wmflabs.org/replag/
https://wikitech.wikimedia.org/wiki/Server_Admin_Log

I can let you know when the maintenance has finished.

It has been going on since about 12:12 UTC and it is now 15:18 UTC, so the 3 hours should be over. Replication lag should be going back to normal any minute now. Is it possible to monitor it?

@doctaxon I meant 3 hours from now (a bit less by now). You can monitor the exact replication lag at https://tools.wmflabs.org/replag/

I will announce the end of the maintenance on the Server Admin Log page above, and I can do it here, too. The ETA for the end of the maintenance is 18:15 UTC, plus the time the replicas take to get back to 0. Please note that this is not a labs-only issue; the same "lag" will be found on most servers in production, too.
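
For anyone who wants to check the lag from a script rather than the replag page, here is a minimal sketch in Python. It assumes you already have the timestamp of the newest replicated change as a 14-digit MediaWiki UTC string (for example from something like `SELECT MAX(rc_timestamp) FROM recentchanges` on dewiki_p; that query is an assumption on my part and not necessarily how the replag tool measures it). The function name `replication_lag_seconds` is made up for illustration.

```python
from datetime import datetime, timezone


def replication_lag_seconds(latest_replicated_ts: str, now: datetime) -> float:
    """Estimate replica lag in seconds.

    latest_replicated_ts: 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS, UTC)
        of the newest row visible on the replica (assumed input).
    now: the current time as a timezone-aware datetime.
    """
    replicated = datetime.strptime(latest_replicated_ts, "%Y%m%d%H%M%S")
    replicated = replicated.replace(tzinfo=timezone.utc)
    # Clamp at zero so small clock differences never yield a negative lag.
    return max(0.0, (now - replicated).total_seconds())


if __name__ == "__main__":
    # Hypothetical values: newest replicated change at 15:08 UTC, checked at 15:18 UTC.
    checked_at = datetime(2016, 1, 20, 15, 18, 0, tzinfo=timezone.utc)
    print(replication_lag_seconds("20160120150800", checked_at))
```

On a quiet wiki this overestimates the lag, because the newest change may simply be old, so treat it as a rough upper bound.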

@doctaxon, maintenance has finished on production. How long labsdb takes to catch up will depend on how loaded the wikidata (s5) shard is on labs. I will monitor it to make sure the lag shrinks as intended, and kill blocking queries if needed to restore the service.

Giftpflanze unsubscribed. Jan 20 2016, 9:29 PM

Boshomi subscribed. Jan 20 2016, 9:52 PM

All lag finally went away at 3:25 UTC.
