Feb 19 2019

Marostegui added a project to T216526: Degraded RAID on cloudvirt1018: cloud-services-team.
Feb 19 2019, 5:26 PM · cloud-services-team, ops-eqiad, SRE
Marostegui added a comment to T172410: Replace the current multisource analytics-store setup.

I just noticed that the tables related to the Echo extension are (surprisingly) not yet available in the enwiki shard (s1-analytics-replica.eqiad.wmnet), but are in analytics-store.eqiad.wmnet. Is there a page we can refer to to check on parity/status of data availability?

Feb 19 2019, 4:29 PM · Analytics-Radar, Product-Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
Marostegui closed T216273: New cronspam from db clusters as Resolved.

Nothing has arrived since the restart without debug, so I think we are good

Feb 19 2019, 2:09 PM · SRE
Marostegui closed T216273: New cronspam from db clusters, a subtask of T132324: Tracking and Reducing cron-spam to root@ , as Resolved.
Feb 19 2019, 2:09 PM · Patch-For-Review, Tracking-Neverending, SRE
Marostegui added a comment to T149077: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki.

Could this be another case of MariaDB getting the optimizer fixed with a new version as it doesn't happen on 10.1.36 or 10.1.37 for the original query?

root@db1070.eqiad.wmnet[dewiki]> EXPLAIN SELECT /* ApiQueryRecentChanges::run */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20161024013525') AND rc_namespace IN ('0', '120') AND rc_type IN ('0', '1', '3', '6') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: recentchanges
         type: range
possible_keys: rc_timestamp,rc_ns_usertext,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_namespace_title_timestamp
          key: rc_timestamp
      key_len: 16
          ref: NULL
         rows: 518658
        Extra: Using index condition; Using where
1 row in set (0.00 sec)
Feb 19 2019, 1:47 PM · Platform Team Workboards (Clinic Duty Team), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), Wikimedia-production-error, MediaWiki-Action-API, DBA
Marostegui added a subtask for T216491: Decommission dbstore1002: T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].
Feb 19 2019, 10:44 AM · Analytics-Radar, Patch-For-Review, decommission-hardware, ops-eqiad, SRE
Marostegui added a parent task for T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5]: T216491: Decommission dbstore1002.
Feb 19 2019, 10:44 AM · Analytics-Radar, Patch-For-Review, User-Banyek, Analytics-Kanban, DBA
Marostegui changed the status of T216491: Decommission dbstore1002 from Open to Stalled.
Feb 19 2019, 10:44 AM · Analytics-Radar, Patch-For-Review, decommission-hardware, ops-eqiad, SRE
Marostegui created T216491: Decommission dbstore1002.
Feb 19 2019, 10:44 AM · Analytics-Radar, Patch-For-Review, decommission-hardware, ops-eqiad, SRE
Marostegui lowered the priority of T213670: dbstore1002 Mysql errors from High to Low.

Reducing priority: the errors on dbstore1002 no longer matter much, as this host shouldn't be used anymore and everything using it should migrate to the new hosts (T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5]).

Feb 19 2019, 10:39 AM · Patch-For-Review, SRE, Product-Analytics, Analytics-Kanban, Analytics
Marostegui added a comment to T213670: dbstore1002 Mysql errors.

For what it's worth, dbstore1002 is now lagging 7 days behind on s8 (wikidatawiki) and it keeps lagging; I doubt it will ever catch up.
Yesterday the migration of the staging database to dbstore1003-1005 happened (T210478#4963411), so everyone should start using that one as soon as possible, especially after seeing so many crashes, lags that will never recover, and corrupted data (due to the above crashes).

Feb 19 2019, 10:37 AM · Patch-For-Review, SRE, Product-Analytics, Analytics-Kanban, Analytics
Marostegui added a comment to T215589: Migrate users to dbstore100[3-5].

For what it's worth, dbstore1002 is now lagging 7 days behind on s8 (wikidatawiki) and it keeps lagging; I doubt it will ever catch up.

Feb 19 2019, 10:36 AM · User-Marostegui, Analytics-Kanban, Analytics
Marostegui closed T174802: Archive and drop education program (ep_*) tables on all wikis as Resolved.

This is all done.
The only pending follow-up is removing the views, which has its own task: T216481: Remove views on ep_* tables on the wikireplicas hosts

Feb 19 2019, 9:23 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui closed T174802: Archive and drop education program (ep_*) tables on all wikis, a subtask of T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking), as Resolved.
Feb 19 2019, 9:23 AM · Epic, DBA, Tracking-Neverending
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 19 2019, 9:22 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 19 2019, 8:41 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui triaged T216481: Remove views on ep_* tables on the wikireplicas hosts as Medium priority.
Feb 19 2019, 8:36 AM · Patch-For-Review, cloud-services-team (Kanban), Data-Services
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 19 2019, 8:34 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 19 2019, 8:24 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 19 2019, 8:06 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui added a comment to T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092.

db1106 has been rebooted (and kernel was upgraded)

Feb 19 2019, 7:56 AM · DBA
Marostegui updated the task description for T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092.
Feb 19 2019, 7:56 AM · DBA
Marostegui added a comment to T216273: New cronspam from db clusters.

I have rebooted db1106. I will give it some time to confirm the spam is gone before closing this task.

Feb 19 2019, 7:56 AM · SRE
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 19 2019, 7:49 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui changed the status of T216444: Global rename of Дагиров Умар → Takhirgeran Umar: supervision needed from Open to Stalled.
Feb 19 2019, 7:15 AM · DBA, Wikimedia-Site-requests
Marostegui changed the status of T216444: Global rename of Дагиров Умар → Takhirgeran Umar: supervision needed, a subtask of T169440: Pending global renames in need of sysadmin supervision (tracking), from Open to Stalled.
Feb 19 2019, 7:14 AM · MediaWiki-extensions-CentralAuth, GlobalRename, Tracking-Neverending
Marostegui updated the task description for T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].
Feb 19 2019, 6:10 AM · Analytics-Radar, Patch-For-Review, User-Banyek, Analytics-Kanban, DBA
Marostegui added a comment to T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].

The migration finished. These are the times in UTC from 18th Feb 2019:

Feb 19 2019, 6:10 AM · Analytics-Radar, Patch-For-Review, User-Banyek, Analytics-Kanban, DBA

Feb 18 2019

Marostegui added a comment to T216444: Global rename of Дагиров Умар → Takhirgeran Umar: supervision needed.

This should wait until T215107 is unblocked and resolved T215107#4962933

Feb 18 2019, 8:07 PM · DBA, Wikimedia-Site-requests
Marostegui added a comment to T216441: Evaluate transferring the non-replicated tables to the new toolsdb server.

Of course :-). Just mentioning this as an idea to Cloud Team

Feb 18 2019, 8:05 PM · Data-Services, cloud-services-team (Kanban)
Marostegui added a comment to T216441: Evaluate transferring the non-replicated tables to the new toolsdb server.

Just saying: we have a testing host where we could try to import those databases from labsdb1005 and see if they fail or what fails during the import process.
Let me know if I can help with this.

Feb 18 2019, 7:55 PM · Data-Services, cloud-services-team (Kanban)
Marostegui added a comment to T215107: Global rename of The_Photographer → Wilfredor: supervision needed.

There have been no retries from what I can see on: https://logstash.wikimedia.org/goto/65afdb88fef30982130c53e40a644b06

Feb 18 2019, 5:44 PM · MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui added a comment to T215107: Global rename of The_Photographer → Wilfredor: supervision needed.

It timed out on Commonswiki:
https://logstash.wikimedia.org/goto/34de73560ce6692f0012e846f7a4de0c
Maybe @Legoktm can help to unblock it?

Feb 18 2019, 2:56 PM · MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui added a comment to T215107: Global rename of The_Photographer → Wilfredor: supervision needed.

Go for it!

Feb 18 2019, 2:41 PM · MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 18 2019, 2:22 PM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui added a comment to T193264: Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020.

I have been talking to @aborrero about the new instance on clouddb1001 - and I have been taking a general look.
While comparing the grants, I realised that clouddb1001 is missing a grant for the following user: s52716 (that grant exists on labsdb1005); it could be a new user. I can easily copy that grant over to clouddb1001, but I want the green light from @Bstorm just in case this has something to do with maintain-dbusers or something :-)

Feb 18 2019, 2:21 PM · Machine-Learning-Team, Wikilabels, cloud-services-team (Kanban), Patch-For-Review, Epic, Cloud-VPS
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 18 2019, 10:50 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui claimed T216273: New cronspam from db clusters.

I will take care of db1106 as I need to depool it anyways today or tomorrow.

Feb 18 2019, 10:08 AM · SRE
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 18 2019, 9:32 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Feb 18 2019, 9:05 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210713: Drop change_tag.ct_tag column in production.

s1 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • dbstore1002
  • dbstore1001
  • db1124
  • db1119
  • db1118
  • db1106
  • db1105
  • db1099
  • db1089
  • db1083
  • db1080
  • db1067 T210713#4967984
Feb 18 2019, 9:04 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Feb 18 2019, 9:02 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T213406: Purchase and setup remaining hosts for database backups.

I think we can close this, we already have the tasks in place:
T216142
T216138
T216137
T214069
T214066

Feb 18 2019, 8:37 AM · Goal, DBA
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Feb 18 2019, 8:32 AM · User-notice-archive, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui added a comment to T216273: New cronspam from db clusters.

db2085 has been rebooted - let's see if that stops the amount of emails.

Feb 18 2019, 7:01 AM · SRE
Marostegui added a comment to T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092.

I have rebooted db2085 without debug option on kernel as part of (T216273) and I have taken the opportunity to upgrade its kernel too.

Feb 18 2019, 6:59 AM · DBA
Marostegui updated the task description for T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092.
Feb 18 2019, 6:58 AM · DBA
Marostegui removed a project from T216213: s52481__stats_global running CREATE DATABASE IF NOT EXISTS on too many queries causing locking issues: DBA.
Feb 18 2019, 6:18 AM · Data-Services, Tracking-Neverending, Toolforge

Feb 17 2019

Marostegui edited projects for T216353: toolsdb: firewalling changes for new setup (temporal mysql replication), added: User-Marostegui; removed DBA.
Feb 17 2019, 7:26 PM · User-Marostegui, netops, SRE, cloud-services-team (Kanban), Cloud-VPS

Feb 16 2019

Marostegui added a comment to T216273: New cronspam from db clusters.

We probably just need to reboot them without the kernel running in debug mode, as discussed on Friday

Feb 16 2019, 9:03 PM · SRE

Feb 15 2019

Marostegui added a comment to T215589: Migrate users to dbstore100[3-5].

Ah, never mind my comment, you decided to completely move away from dbstore1002 :-)
Thanks!

Feb 15 2019, 9:23 PM · User-Marostegui, Analytics-Kanban, Analytics
Marostegui added a comment to T215589: Migrate users to dbstore100[3-5].

Another staging database where? Just to clarify: dbstore1002 will be fully read-only after the migration (MySQL doesn't allow setting read-only at the database level; it is a global flag).
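
To illustrate the point about the global flag, here is a minimal sketch using standard MySQL/MariaDB statements. It is not taken from the migration itself and only shows that the setting applies to the whole server rather than to a single database:

```sql
-- Check the current value of the server-wide flag (0 = writable, 1 = read-only).
SELECT @@global.read_only;

-- Make the whole server read-only. There is no per-database equivalent,
-- so every database on the instance is affected at once.
-- Note: accounts holding the SUPER privilege can typically still write.
SET GLOBAL read_only = 1;

-- Confirm the change.
SELECT @@global.read_only;
```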

Feb 15 2019, 9:21 PM · User-Marostegui, Analytics-Kanban, Analytics
Marostegui added a comment to T216133: Increase visibility of auto-generated tasks for RAID errors.

For what it's worth, I do have a Herald rule that automatically subscribes me to any degraded RAID ticket for the databases, and that has proved to be a good way to get my attention; otherwise, monitoring the Operations queue is hard and it is easy to miss things.

Feb 15 2019, 5:41 PM · Sustainability (Incident Followup), cloud-services-team (Kanban), SRE, DC-Ops
Marostegui updated subscribers of T216223: Degraded RAID on labsdb1005.
Feb 15 2019, 10:05 AM · cloud-services-team (Kanban), Toolforge, ops-eqiad, SRE
Marostegui added a watcher for Schema-change: Marostegui.
Feb 15 2019, 9:56 AM
Marostegui added a comment to T216213: s52481__stats_global running CREATE DATABASE IF NOT EXISTS on too many queries causing locking issues.

I don't know what the situation was last night, as I wasn't present during the troubleshooting. However, we had pretty much the same issue at around 6AM UTC today, and from what I could see, your user wasn't among the ones creating issues (see T216208#4956626)

Feb 15 2019, 9:29 AM · Data-Services, Tracking-Neverending, Toolforge
Marostegui triaged T216170: toolsdb - Per-user connection limits as High priority.
Feb 15 2019, 7:55 AM · cloud-services-team (Kanban), Toolforge, Data-Services
Marostegui updated subscribers of T216170: toolsdb - Per-user connection limits.

For what it's worth, the server has looked stable for one hour now, since I enabled the global max_user_connections. It might be preventing some tools from working if they require more than 20 connections, but at least the rest of the tools/users do not suffer the outage.
As per my conversation with @Bstorm, this is a temporary mitigation to get the server under control again. If we eventually want to go for a per-user limit, we should look at the individual cases where we will need to increase the connection limit, as we do with the wikireplicas.
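
As a rough sketch of the two options discussed here (a global cap versus per-user limits), the statements below show how they might look. The account name `example_tool` and the value 50 are hypothetical, and the `GRANT ... WITH MAX_USER_CONNECTIONS` form is the one used by MariaDB and older MySQL releases, so it should be checked against the server version in use:

```sql
-- Inspect the global per-account cap (20 in the mitigation described above).
SHOW GLOBAL VARIABLES LIKE 'max_user_connections';

-- Hypothetical per-account override for a tool that needs more connections.
-- A non-zero account limit takes precedence over the global value.
-- 'example_tool' is a placeholder for an existing account, not a real one.
GRANT USAGE ON *.* TO 'example_tool'@'%' WITH MAX_USER_CONNECTIONS 50;

-- Review which accounts carry their own limit (0 means "use the global value").
SELECT user, host, max_user_connections
FROM mysql.user
WHERE max_user_connections <> 0;
```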

Feb 15 2019, 7:55 AM · cloud-services-team (Kanban), Toolforge, Data-Services
Marostegui moved T216213: s52481__stats_global running CREATE DATABASE IF NOT EXISTS on too many queries causing locking issues from Triage to Blocked external/Not db team on the DBA board.
Feb 15 2019, 7:50 AM · Data-Services, Tracking-Neverending, Toolforge
Marostegui edited projects for T216223: Degraded RAID on labsdb1005, added: Toolforge; removed Data-Services.
Feb 15 2019, 7:24 AM · cloud-services-team (Kanban), Toolforge, ops-eqiad, SRE
Marostegui updated subscribers of T216223: Degraded RAID on labsdb1005.

cloud-services-team I would suggest you coordinate with @Cmjohnson to get this disk replaced

Feb 15 2019, 7:23 AM · cloud-services-team (Kanban), Toolforge, ops-eqiad, SRE
Marostegui added projects to T216223: Degraded RAID on labsdb1005: cloud-services-team, Data-Services.
Feb 15 2019, 7:21 AM · cloud-services-team (Kanban), Toolforge, ops-eqiad, SRE
Marostegui lowered the priority of T216183: Special:ProtectedPages times out on enwiki for Module namespace from High to Medium.

Another brilliant analysis from @Anomie :-)
(Decreasing priority as this doesn't seem to happen very often as per: https://logstash.wikimedia.org/goto/4854d6d92b272ad88d23696570c7dad6)

Feb 15 2019, 6:49 AM · Performance-Team, User-Marostegui, Wikimedia-production-error, MediaWiki-libs-Rdbms, MediaWiki-Special-pages
Marostegui added a comment to T216170: toolsdb - Per-user connection limits.

Cross posting from the main track task as an emergency mitigation: T216208#4956634

Feb 15 2019, 6:40 AM · cloud-services-team (Kanban), Toolforge, Data-Services
Marostegui added a comment to T216208: ToolsDB overload and cleanup.

I have restarted the server with max_user_connections = 20 to try to mitigate this, the server was unusable anyways.

Feb 15 2019, 6:39 AM · TCB-Team (now WMDE-TechWish), Phragile, Data-Services, cloud-services-team (Kanban)
Marostegui added a comment to T216208: ToolsDB overload and cleanup.

The server is hitting "too many connections" again

Feb 15 2019, 6:31 AM · TCB-Team (now WMDE-TechWish), Phragile, Data-Services, cloud-services-team (Kanban)
Marostegui added a project to T216208: ToolsDB overload and cleanup: Phragile.
root@labsdb1005:~# mysql --skip-ssl information_schema -e "select user, count(*) as count FROM information_Schema.processlist GROUP BY user ORDER BY count DESC limit 10"
+----------+-------+
| user     | count |
+----------+-------+
| u2815    |   169 |
| s52552   |   151 |
| watchdog |   121 |
| s53098   |    87 |
| s51344   |    78 |
| s52524   |    45 |
| s53213   |    40 |
| s51434   |    23 |
| s52585   |    22 |
| s52680   |    20 |
+----------+-------+
Feb 15 2019, 6:25 AM · TCB-Team (now WMDE-TechWish), Phragile, Data-Services, cloud-services-team (Kanban)
Marostegui added a comment to T216213: s52481__stats_global running CREATE DATABASE IF NOT EXISTS on too many queries causing locking issues.

That can be a consequence and not really the cause. If the server is too overloaded, it might not be able to create that and the code might be retrying, and as we don't have a per-user limit...
Both things should probably be fixed: 1) the code, 2) establishing a per-user limit.

Feb 15 2019, 6:19 AM · Data-Services, Tracking-Neverending, Toolforge

Feb 14 2019

Marostegui added a comment to T216173: labsdb1005/6 - Upgrade to Stretch.

And also very very old hardware.

Feb 14 2019, 9:26 PM · Data-Services, cloud-services-team (Kanban)
Marostegui edited projects for T216183: Special:ProtectedPages times out on enwiki for Module namespace, added: User-Marostegui; removed DBA.

I believe this is the query, which is too slow and gets killed by the query killer:

root@db1106.eqiad.wmnet[enwiki]> explain SELECT /* IndexPager::buildQueryInfo (ProtectedPagesPager) */ pr_id, page_namespace, page_title, page_len, pr_type, pr_level, pr_expiry, pr_cascade, log_timestamp, log_deleted, comment_log_comment.comment_text AS `log_comment_text`, comment_log_comment.comment_data AS `log_comment_data`, comment_log_comment.comment_id AS `log_comment_cid`, log_user, log_user_text, NULL AS `log_actor` FROM `page`, `page_restrictions` LEFT JOIN `log_search` ON (ls_field = 'pr_id' AND (ls_value = pr_id)) LEFT JOIN (`logging` JOIN `comment` `comment_log_comment` ON ((comment_log_comment.comment_id = log_comment_id))) ON ((ls_log_id = log_id)) WHERE (pr_expiry > '20190214194211' OR pr_expiry IS NULL) AND (page_id=pr_page) AND (pr_type='edit') AND (page_namespace='828') ORDER BY pr_id LIMIT 101 ;
+------+-------------+---------------------+--------+------------------------------+------------+---------+-------------------------------+--------+---------------------------------+
| id   | select_type | table               | type   | possible_keys                | key        | key_len | ref                           | rows   | Extra                           |
+------+-------------+---------------------+--------+------------------------------+------------+---------+-------------------------------+--------+---------------------------------+
|    1 | SIMPLE      | page                | ref    | PRIMARY,name_title           | name_title | 4       | const                         |  25438 | Using temporary; Using filesort |
|    1 | SIMPLE      | page_restrictions   | eq_ref | PRIMARY,pr_page,pr_typelevel | PRIMARY    | 261     | enwiki.page.page_id,const     |      1 | Using where                     |
|    1 | SIMPLE      | log_search          | ref    | PRIMARY                      | PRIMARY    | 34      | const                         | 293280 | Using where; Using index        |
|    1 | SIMPLE      | logging             | eq_ref | PRIMARY                      | PRIMARY    | 4       | enwiki.log_search.ls_log_id   |      1 | Using where                     |
|    1 | SIMPLE      | comment_log_comment | eq_ref | PRIMARY                      | PRIMARY    | 8       | enwiki.logging.log_comment_id |      1 |                                 |
+------+-------------+---------------------+--------+------------------------------+------------+---------+-------------------------------+--------+---------------------------------+
5 rows in set (0.00 sec)
Feb 14 2019, 9:24 PM · Performance-Team, User-Marostegui, Wikimedia-production-error, MediaWiki-libs-Rdbms, MediaWiki-Special-pages
Marostegui added a comment to T215993: tools.db.svc.eqiad.wmflabs hitting it's limit?.

From what I can see, labsdb1005 has no connection limits at all; maybe we need to establish a per-user connection limit similar to what we have on the replicas. Better to "break" a tool than the whole server.
We can probably also take a look at those specific tools that might need more than X connections (X being the number of connections we decide to set).
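
One way to find the tools that would exceed a given X is to filter the processlist counts. This is only a sketch: it assumes X = 20 (the value used in the later mitigation elsewhere in this feed) and reflects only the connections open at the moment the query runs:

```sql
-- List accounts currently holding more than X connections (X = 20 assumed here).
SELECT user, COUNT(*) AS connections
FROM information_schema.processlist
GROUP BY user
HAVING COUNT(*) > 20
ORDER BY connections DESC;
```

Running this periodically, rather than once, would give a better picture of which tools regularly need more than the chosen limit.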

Feb 14 2019, 9:14 PM · Data-Services
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

Sure - go ahead :-)

Feb 14 2019, 4:23 PM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

All eqiad servers from the same batch as db1106 are running 4.9.0-8 already
db1096-db1106

Feb 14 2019, 4:19 PM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui closed T214840: db2085/db1106 don't boot with 4.9.0-8-amd64 as Resolved.

Reboot tests with db2085 4.9.0-8 after getting the BIOS and FW upgraded by Papaul (T214840#4954418)

Feb 14 2019, 3:44 PM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

Thank you! I will delete the idrac logs and start testing

Feb 14 2019, 3:12 PM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

@Papaul thanks - I am going to put it down now. Will ping you on IRC once it is down
Thanks!

Feb 14 2019, 2:20 PM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui closed Restricted Task, a subtask of T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5], as Resolved.
Feb 14 2019, 10:57 AM · Analytics-Radar, Patch-For-Review, User-Banyek, Analytics-Kanban, DBA
Marostegui added a comment to T216067: Recover from corrupted beta MySQL slave (deployment-db04).

@Marostegui /srv/sqldata has 39G on it on db04, presumably that's pretty close to the amount of data on the master.

Feb 14 2019, 10:20 AM · User-Ryasmeen, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
Marostegui moved T215616: Improve interlingual links across wikis through Wikidata IDs from Triage to Blocked external/Not db team on the DBA board.
Feb 14 2019, 8:41 AM · Data-Engineering-Icebox, Analytics-Radar, Research-Freezer, MediaWiki-General, Wikidata
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

After the FW and BIOS were upgraded, I rebooted db1106 a number of times with 4.9.0-8 and this is the result:

Feb 14 2019, 8:01 AM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T216067: Recover from corrupted beta MySQL slave (deployment-db04).

As per

Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Starting crash recovery from checkpoint LSN=407832716048
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [ERROR] InnoDB: checksum mismatch in tablespace ./enwiki/logging.ibd (table enwiki/logging)
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size:1024 Pages to analyze:64
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size: 1024, Possible space_id count:0
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size:2048 Pages to analyze:64
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size: 2048, Possible space_id count:0
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size:4096 Pages to analyze:64
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size: 4096, Possible space_id count:0
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size:8192 Pages to analyze:64
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size: 8192, Possible space_id count:0
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size:16384 Pages to analyze:64
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:1 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:2 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:3 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:4 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:5 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:6 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:7 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:8 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:9 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:10 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:11 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:12 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:13 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:14 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:15 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:16 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:17 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:18 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:19 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:20 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:21 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:22 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:23 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:24 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:25 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:26 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:27 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:28 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:29 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:30 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:31 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:32 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:33 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:34 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:35 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:36 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:37 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:38 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:39 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:40 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:41 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:42 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:43 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:44 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:45 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:46 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:47 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:48 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:49 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:50 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:51 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:52 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:53 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:54 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:55 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:56 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:57 page_size:16384
Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:58 page_size:16384
70Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:59 page_size:16384
71Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:60 page_size:16384
72Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:61 page_size:16384
73Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:62 page_size:16384
74Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: VALID: space:24829 page_no:63 page_size:16384
75Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Page size: 16384, Possible space_id count:1
76Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: space_id:24829, Number of pages matched: 63/63 (16384)
77Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Chosen space:24829
78Feb 13 18:12:54 deployment-db04 mysqld[1483]:
79Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Note] InnoDB: Restoring page 0 of tablespace 24829
80Feb 13 18:12:54 deployment-db04 mysqld[1483]: 190213 18:12:54 [Warning] InnoDB: Doublewrite does not have page_no=0 of space: 24829
81Feb 13 18:12:54 deployment-db04 mysqld[1483]: 2019-02-13 18:12:54 7f238e13e780 InnoDB: Operating system error number 2 in a file operation.
82Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: The error means the system cannot find the path specified.
83Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: If you are installing InnoDB, remember that you must create
84Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: directories yourself, InnoDB does not create them.
85Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: Error: could not open single-table tablespace file ./enwiki/logging.ibd
86Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: We do not continue the crash recovery, because the table may become
87Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
88Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: To fix the problem and start mysqld:
89Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: 1) If there is a permission problem in the file and mysqld cannot
90Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: open the file, you should modify the permissions.
91Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: 2) If the table is not needed, or you can restore it from a backup,
92Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
93Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: crash recovery and ignore that table.
94Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: 3) If the file system or the disk is broken, and you cannot remove
95Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
96Feb 13 18:12:54 deployment-db04 mysqld[1483]: InnoDB: and force InnoDB to continue crash recovery here.
The data looks to be in a pretty bad state :(
Probably the best approach would be to re-clone that new instance directly from the master. How big is the data on the master?

Feb 14 2019, 6:43 AM · User-Ryasmeen, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
Marostegui added a comment to T214720: db1114 crashed (HW memory issues).

@Cmjohnson should we also try to exchange the DIMM modules listed at T214720#4937872 and see if they fail again?

Feb 14 2019, 6:07 AM · Patch-For-Review, DBA, SRE, ops-eqiad
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

@Marostegui in most cases the "CPU1/CPU2 Machine check error detected" message is caused by an outdated BIOS. I would recommend that we first update the BIOS. The system BIOS is currently at 2.4.3 and there is a newer version (2.9.1) from 11/02/2019. After this we can check some settings in the BIOS under BIOS profile.

Feb 14 2019, 6:01 AM · ops-codfw, Patch-For-Review, SRE, DBA

Feb 13 2019

Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

Chris has upgraded FW/BIOS on db1106 (thanks!) - so tomorrow I will do a few more reboots to keep debugging this.

Feb 13 2019, 5:53 PM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T215902: Investigate normalization of data stored in wb_terms table.

term_type, small table holding the types of strings to be indexed in the db. Right now this would be labels, descriptions and aliases, but this would scale to allowing more similar terms into the index (if desired). It might be the case that not having a table here would be better, and we just keep INT ids in code.

There may be little point in normalizing the term_type for example. This thing only has 3 rows. (languages is also pretty small)

We could also turn term types and languages into short IDs via a hash function: as far as I’m aware, Wikibase only needs the string→ID direction (hash function), and if we need the ID→string direction (e. g. during manual investigation) we can hash all the known term types / language codes and look for the value we have.
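A minimal sketch of the hashing idea described above, assuming a truncated SHA-256 digest as the hash function. The function names, the 4-byte ID width, and the example strings ("label", "description", "alias", and a few language codes) are hypothetical illustrations, not what Wikibase actually uses:

```python
import hashlib


def short_id(value: str, num_bytes: int = 4) -> int:
    """String -> ID direction: take the first bytes of a SHA-256 digest."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:num_bytes], "big")


def build_reverse_lookup(known_values):
    """ID -> string direction: hash every known value once and index by ID.

    Raises if two distinct known values collide, since the IDs would then
    be ambiguous.
    """
    lookup = {}
    for value in known_values:
        key = short_id(value)
        if key in lookup and lookup[key] != value:
            raise ValueError(f"hash collision: {lookup[key]!r} vs {value!r}")
        lookup[key] = value
    return lookup


# Hypothetical sets of known term types and language codes.
TERM_TYPES = ["label", "description", "alias"]
LANGUAGES = ["en", "de", "fr"]

term_type_by_id = build_reverse_lookup(TERM_TYPES)
language_by_id = build_reverse_lookup(LANGUAGES)

# Manual investigation: given an ID found in a row, recover the string.
stored_id = short_id("alias")
print(term_type_by_id[stored_id])
```

The reverse lookup only works for values in the known list, which matches the use case described: application code needs only `short_id`, and the lookup table is built ad hoc when someone needs to read the IDs by hand.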

Why not combine strings langstring and langstringtype in the same table?

For certain common types of items – especially people, but also e. g. cities – it is common to have the same label in a lot of different languages (see also T188992#4026839), so I think a strings table without a language code should help a lot. I’m not sure about the distinction between langstring and langstringtype though.

I guess the easiest approach is to migrate things while writing to both and, at some point once both sets of tables are in sync, switch the writes to the new tables only?

Do the DB servers have enough storage space for this?

Feb 13 2019, 1:35 PM · Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)), User-Addshore, [DEPRECATED] wdwb-tech, Wikidata
Marostegui added a comment to T215902: Investigate normalization of data stored in wb_terms table.

My first investigation into table normalization went for full normalization:

Feb 13 2019, 10:37 AM · Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)), User-Addshore, [DEPRECATED] wdwb-tech, Wikidata
Marostegui updated subscribers of T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

db1106 with 4.9.0-8 with debug enabled on the kernel, reboot sequence:

Feb 13 2019, 10:05 AM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui assigned T214840: db2085/db1106 don't boot with 4.9.0-8-amd64 to Papaul.

After power cycling db2085, this is what happened:

Feb 13 2019, 8:59 AM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

db2085 got stuck when booting up on:

[    0.560579] x86: Booting SMP configuration:
[    0.565246] .... node  #1, CPUs:        #1
[    0.674090] .... node  #0, CPUs:    #2
Feb 13 2019, 8:40 AM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

db2085 reboots with 4.9.0-8 with debug enabled:

Feb 13 2019, 8:24 AM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

db2085 reboots with 4.9.0-7 with debug enabled - all fine:

Feb 13 2019, 8:04 AM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

db2085: debug added to the kernel boot, to see if we catch something

	linux	/boot/vmlinuz-4.9.0-7-amd64 root=UUID=63e5ddbd-3c18-4bf5-ad22-88458ec175b7 ro ixgbe.allow_unsupported_sfp=1 console=ttyS1,115200n8 elevator=deadline debug
Feb 13 2019, 7:27 AM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

db2085 current BIOS setting:

Captura de pantalla 2019-02-13 a las 8.00.50.png (211×557 px, 30 KB)

Feb 13 2019, 7:01 AM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

db2085 with kernel 4.9.0-7-amd64 reboots, another FAIL at the 6th and 7th reboot (similar pattern to the one with kernel -9 at T214840#4948016):

Feb 13 2019, 6:56 AM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

db2085:
So I can confirm that the BIOS setting for Serial Communication is being sent to COM2 (which is ttyS1).
Which is the same as:

linux   /boot/vmlinuz-4.9.0-7-amd64 root=UUID=63e5ddbd-3c18-4bf5-ad22-88458ec175b7 ro ixgbe.allow_unsupported_sfp=1 console=ttyS1,115200n8 elevator=deadline
Feb 13 2019, 6:37 AM · ops-codfw, Patch-For-Review, SRE, DBA

Feb 12 2019

Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

@MoritzMuehlenhoff has removed the -8 kernel from db2085 and I have rebooted it 8 times with -7 now

Feb 12 2019, 5:02 PM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

After restarting with the previous kernel 4.9.0-7-amd64 on db2085, the first time it didn't boot up, the second time it did.

Feb 12 2019, 4:32 PM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui added a comment to T214840: db2085/db1106 don't boot with 4.9.0-8-amd64.

@MoritzMuehlenhoff has installed 4.9.144-3 on db2085.
Out of 8 reboots, two of them got stuck (in a row).
1st reboot by @MoritzMuehlenhoff OK
2nd reboot by @MoritzMuehlenhoff OK
3rd reboot by @Marostegui OK
4th reboot by @Marostegui OK
5th reboot by @Marostegui OK
6th reboot by @Marostegui FAIL
7th reboot by @Marostegui FAIL
8th reboot by @Marostegui OK

Feb 12 2019, 4:11 PM · ops-codfw, Patch-For-Review, SRE, DBA
Marostegui updated the task description for T211613: rack/setup/install db11[26-38].eqiad.wmnet.
Feb 12 2019, 3:32 PM · Goal, DBA, ops-eqiad, User-Marostegui, SRE
Marostegui added a comment to T215589: Migrate users to dbstore100[3-5].

Read only time would be around 16h (T210478#4942371)

Feb 12 2019, 9:42 AM · User-Marostegui, Analytics-Kanban, Analytics