
db2042 reimage
Closed, Resolved · Public

Description

The tables were either already corrupted or got corrupted during the transfer. After reinstalling, a "WARNING: Slot 0: Predictive Failure: 1I:1:11" message also appeared. This means we have lost all rc hosts in codfw.

I will try the transfer again or clone from another host and repartition.

Details

Related Gerrit Patches:
operations/mediawiki-config : master · Repool db2042 after maintenance
operations/mediawiki-config : master · Depool db2048 for maintenance

Event Timeline

jcrespo created this task. Nov 9 2016, 3:17 PM
Restricted Application added a subscriber: Aklapper. Nov 9 2016, 3:17 PM
jcrespo moved this task from Triage to In progress on the DBA board. Nov 9 2016, 3:17 PM

Change 320608 had a related patch set uploaded (by Jcrespo):
Depool db2048 for maintenance

https://gerrit.wikimedia.org/r/320608

Change 320608 merged by jenkins-bot:
Depool db2048 for maintenance

https://gerrit.wikimedia.org/r/320608

I have recovered the data from db2048 while both hosts are depooled. However, that means I have to repartition. I will do that once replication lag goes down.

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2016-11-10T09:38:27Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: wmf-config/db-codfw.php Depool db1059 - T149079. Repool db2048 T150334 (duration: 00m 50s)

jcrespo closed this task as Resolved. Nov 14 2016, 7:38 PM

Partitioning done:

Query OK, 691803971 rows affected (3 days 3 hours 22 min 9.74 sec)

We will copy to db2034 (or whatever replaces it) from here.
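For a rough sense of the throughput behind that "Query OK" line, the arithmetic can be sketched as below. This is only an illustration: the helper name duration_seconds is hypothetical, and the figures are copied from the output above rather than measured again.

```python
# Rough throughput estimate for the partitioning run reported above.
# duration_seconds is a hypothetical helper, not part of any tool used here.

def duration_seconds(days, hours, minutes, seconds):
    """Convert a days/hours/minutes/seconds duration into total seconds."""
    return days * 86400 + hours * 3600 + minutes * 60 + seconds

# Figures taken from: Query OK, 691803971 rows affected
# (3 days 3 hours 22 min 9.74 sec)
ROWS = 691_803_971
elapsed = duration_seconds(3, 3, 22, 9.74)
rate = ROWS / elapsed

print(f"{elapsed:.2f} s elapsed, about {rate:.0f} rows/s")
```

That works out to roughly 2,550 rows per second on average over the whole run, which may help when estimating how long a similar repartition of another host would take.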

jcrespo reopened this task as Open. Nov 14 2016, 7:38 PM

Actually, we still need to repool it.

Change 321476 had a related patch set uploaded (by Jcrespo):
Repool db2042 after maintenance

https://gerrit.wikimedia.org/r/321476

Change 321476 merged by jenkins-bot:
Repool db2042 after maintenance

https://gerrit.wikimedia.org/r/321476

jcrespo closed this task as Resolved. Nov 16 2016, 10:14 AM