Replication lag on multiple databases on tool-labs
Closed, Resolved · Public

Assigned To

Authored By

	Steinsplitter
	Jul 11 2015, 11:33 AM

Description

https://tools.wmflabs.org/betacommand-dev/cgi-bin/replag

MariaDB [dewiki_p]>  SELECT UNIX_TIMESTAMP() - UNIX_TIMESTAMP(MAX(rc_timestamp)) FROM recentchanges;
+------------------------------------------------------+
| UNIX_TIMESTAMP() - UNIX_TIMESTAMP(MAX(rc_timestamp)) |
+------------------------------------------------------+
|                                         32647.000000 |
+------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [dewiki_p]> USE commonswiki_p;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [commonswiki_p]>  SELECT UNIX_TIMESTAMP() - UNIX_TIMESTAMP(MAX(rc_timestamp)) FROM recentchanges;
+------------------------------------------------------+
| UNIX_TIMESTAMP() - UNIX_TIMESTAMP(MAX(rc_timestamp)) |
+------------------------------------------------------+
|                                         32659.000000 |
+------------------------------------------------------+
1 row in set (0.00 sec)
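The query above measures lag as the gap between the current time and the newest row in recentchanges. The same arithmetic can be sketched in Python. This is only an illustration, assuming that rc_timestamp is a 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS) in UTC; the function names and the sample timestamp below are hypothetical and not taken from the task.

```python
from datetime import datetime, timezone


def replag_seconds(max_rc_timestamp: str, now: datetime) -> float:
    """Seconds between `now` and the newest replicated change.

    Assumes `max_rc_timestamp` is a 14-digit YYYYMMDDHHMMSS string in UTC.
    """
    latest = datetime.strptime(max_rc_timestamp, "%Y%m%d%H%M%S")
    latest = latest.replace(tzinfo=timezone.utc)
    return (now - latest).total_seconds()


def format_lag(seconds: float) -> str:
    """Render a lag in seconds as hours, minutes and seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    # Hypothetical sample values chosen to reproduce the dewiki_p figure.
    now = datetime(2015, 7, 11, 11, 33, 7, tzinfo=timezone.utc)
    lag = replag_seconds("20150711022900", now)
    print(lag, format_lag(lag))
```

Read this way, the 32647 seconds reported for dewiki_p is a little over nine hours of lag.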

Related Objects

Mentioned Here: T105503: Tables corrupted or impossible to work with them

Event Timeline

Steinsplitter created this task. Jul 11 2015, 11:33 AM

Steinsplitter raised the priority of this task to Unbreak Now!.

Steinsplitter updated the task description.

Steinsplitter added a project: Toolforge.

Steinsplitter subscribed.

Restricted Application added a project: Cloud-Services. Jul 11 2015, 11:33 AM

Restricted Application added a subscriber: Aklapper.

Steinsplitter renamed this task from Replications lag on multiple databases to Replication lag on multiple databases on tool-labs. Jul 11 2015, 11:34 AM

Steinsplitter set Security to None.

Steinsplitter added a subscriber: doctaxon. Jul 11 2015, 11:40 AM

Restricted Application added a subscriber: Luke081515. Jul 11 2015, 11:40 AM

labsdb1002 crashed yesterday at 2:38 UTC due to excessive memory usage. I've restarted replication, there is not much to do now but wait.

doctaxon edited subscribers, added: coren, yuvipanda; removed: jcrespo. Jul 11 2015, 11:46 AM

ah, replag has been corrected now. It's over. Thanks!

But if it crashed yesterday at 2:38 UTC, why was it still running until 10:15 UTC today?

db was running, replication wasn't. Replication being stopped for 8 hours is consistent with the effects seen. Does that answer your question?

However, there seems to be corruption in some user-created tables (T105503), as some people use unsafe engines such as MyISAM.

Jcrespo: Why could my tasks work until 10:15 AM UTC today if, as you say, it crashed at 2:38 UTC yesterday?

When mysql crashes, mysqld_safe, the watchdog process, restarts mysql automatically. To avoid replication errors, replication is configured not to restart automatically and to require human intervention.
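The distinction between "the database is up" and "replication is running" can be illustrated with a small Python sketch. It assumes a row from SHOW SLAVE STATUS has already been fetched into a dict keyed by column name; the function name replication_state and the sample rows are hypothetical, and this is not the tooling actually used on labsdb1002.

```python
def replication_state(status: dict) -> str:
    """Classify a SHOW SLAVE STATUS row (passed as a dict; an assumption).

    Both replication threads must report "Yes" for replication to be
    running; the server itself can be up and serving queries either way.
    """
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    if io_ok and sql_ok:
        lag = status.get("Seconds_Behind_Master")
        return f"replicating, {lag}s behind"
    return "server up, replication stopped"


if __name__ == "__main__":
    # Hypothetical rows: one after an automatic restart where replication
    # was not started again, one after replication was resumed by hand.
    after_crash = {
        "Slave_IO_Running": "No",
        "Slave_SQL_Running": "No",
        "Seconds_Behind_Master": None,
    }
    after_restart = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Seconds_Behind_Master": 32647,
    }
    print(replication_state(after_crash))
    print(replication_state(after_restart))
```

In the first state, queries against the replica still succeed, but no new changes arrive, which would match tools appearing to work while the data grows stale.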

While I am ok with answering questions on IRC, please do not reopen a task unless it has been closed incorrectly. If there is another issue with the databases, open a new task. Thank you!

This task is resolved only for dewiki_p, not for commonswiki_p, where there is still replication lag.

In T105585#1447296, @jcrespo wrote:

labsdb1002 crashed yesterday at 2:38 UTC due to excessive memory usage. I've restarted replication, there is not much to do now but wait.

Are you aware that there is still a replag on commonswiki_p which is blocking a lot of stuff on commons?

MariaDB [commonswiki_p]> SELECT UNIX_TIMESTAMP() - UNIX_TIMESTAMP(MAX(rc_timestamp)) AS replag FROM recentchanges;
+--------------+
| replag       |
+--------------+
| 45998.000000 |
+--------------+
1 row in set (0.01 sec)

doctaxon added a subscriber: valhallasw. Jul 12 2015, 7:54 AM

doctaxon added a subscriber: Andrew. Jul 12 2015, 8:06 AM

Nemo_bis added a project: DBA. Jul 12 2015, 9:14 PM

Nemo_bis updated the task description.

Nemo_bis subscribed.

Krenair subscribed. Jul 12 2015, 9:15 PM

Betacommand subscribed. Jul 12 2015, 10:55 PM

Regardless of what happened to the primary MariaDB server (WMF switched to this a while back), the actual database replication is still non-functional. As of now we are at one day and 4 hours of lag, and growing.

Lag seems to have suddenly resolved: it is mostly under a few minutes, with few exceptions.

@Betacommand, can you check again?

yes, looks fine now.

Ricordisamoa subscribed. Jul 13 2015, 9:16 PM

This issue seems to have caused some serious problems on replica servers. See T105713.

Steinsplitter closed this task as Resolved. Jul 15 2015, 2:53 PM

Superyetkin reopened this task as Stalled. Jul 15 2015, 2:57 PM

Superyetkin, your issue is not about replication and is already tracked in its own report; please don't reopen this one.

• Phabricator_maintenance removed a subscriber: yuvipanda. Jun 7 2017, 6:49 PM

Restricted Application added subscribers: Jay8g, TerraCodes. Jun 7 2017, 6:49 PM
