
dbstore2002 s2 crashed
Closed, Resolved, Public

Description

dbstore2002 s2 instance crashed:

Aug 24 09:10:59 dbstore2002 mysqld[3550]: 2018-08-24 9:10:59 140612790758144 [ERROR] Slave I/O: error reconnecting to master 'repl@db2035.codfw.wmnet:3306' - retry-time: 60 maximum-retries: 86400 message: Can't connect to MySQL server on 'db2035.codfw.wmnet' (111 "Connection refused"), Internal MariaDB error code: 2003
Aug 24 09:12:59 dbstore2002 mysqld[3550]: 2018-08-24 9:12:59 140612790758144 [Note] Slave: connected to master 'repl@db2035.codfw.wmnet:3306',replication resumed in log 'db2035-bin.004846' at position 924796334
Sep 17 17:23:58 dbstore2002 mysqld[3550]: InnoDB: tried to purge sec index entry not marked for deletion in
Sep 17 17:23:58 dbstore2002 mysqld[3550]: InnoDB: index "cl_sortkey" of table "nowiki"."categorylinks"
Sep 17 17:23:58 dbstore2002 mysqld[3550]: InnoDB: tuple DATA TUPLE: 4 fields;
Sep 17 17:23:58 dbstore2002 mysqld[3550]: 0: len 36; hex 417274696b6c65725f6d65645f66696c6d6c656e6b65725f6672615f57696b6964617461; asc Artikler_med_filmlenker_fra_Wikidata;;
Sep 17 17:23:58 dbstore2002 mysqld[3550]: 1: len 1; hex 01; asc ;;
Sep 17 17:23:58 dbstore2002 mysqld[3550]: 2: len 25; hex 2b454b4f314333454b044d4f393f3f37314f3143011801dc17; asc +EKO1C3EK MO9??71O1C ;;
Sep 17 17:23:58 dbstore2002 mysqld[3550]: 3: len 4; hex 00058d34; asc 4;;
Sep 17 17:23:58 dbstore2002 mysqld[3550]: InnoDB: record PHYSICAL RECORD: n_fields 4; compact format; info bits 0
Sep 17 17:23:58 dbstore2002 mysqld[3550]: 0: len 30; hex 417274696b6c65725f6d65645f66696c6d6c656e6b65725f6672615f5769; asc Artikler_med_filmlenker_fra_Wi; (total 36 bytes);
Sep 17 17:23:58 dbstore2002 mysqld[3550]: 1: len 1; hex 01; asc ;;
Sep 17 17:23:58 dbstore2002 mysqld[3550]: 2: len 25; hex 2b454b4f314333454b044d4f393f3f37314f3143011801dc17; asc +EKO1C3EK MO9??71O1C ;;
Sep 17 17:23:58 dbstore2002 mysqld[3550]: 3: len 4; hex 00058d34; asc 4;;
Sep 17 17:25:05 dbstore2002 mysqld[3550]: 2018-09-17 17:25:05 7fe2f76d0b00 InnoDB: Assertion failure in thread 140612790455040 in file row0ins.cc line 285
Sep 17 17:25:05 dbstore2002 mysqld[3550]: InnoDB: Failing assertion: *cursor->index->name == TEMP_INDEX_PREFIX
Sep 17 17:25:05 dbstore2002 mysqld[3550]: InnoDB: We intentionally generate a memory trap.
Sep 17 17:25:05 dbstore2002 mysqld[3550]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Sep 17 17:25:05 dbstore2002 mysqld[3550]: InnoDB: If you get repeated assertion failures or crashes, even
Sep 17 17:25:05 dbstore2002 mysqld[3550]: InnoDB: immediately after the mysqld startup, there may be
Sep 17 17:25:05 dbstore2002 mysqld[3550]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Sep 17 17:25:05 dbstore2002 mysqld[3550]: InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
Sep 17 17:25:05 dbstore2002 mysqld[3550]: InnoDB: about forcing recovery.
Sep 17 17:25:05 dbstore2002 mysqld[3550]: 180917 17:25:05 [ERROR] mysqld got signal 6 ;
Sep 17 17:25:05 dbstore2002 mysqld[3550]: This could be because you hit a bug. It is also possible that this binary
Sep 17 17:25:05 dbstore2002 mysqld[3550]: or one of the libraries it was linked against is corrupt, improperly built,
Sep 17 17:25:05 dbstore2002 mysqld[3550]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 17 17:25:05 dbstore2002 mysqld[3550]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Sep 17 17:25:05 dbstore2002 mysqld[3550]: We will try our best to scrape up some info that will hopefully help
Sep 17 17:25:05 dbstore2002 mysqld[3550]: diagnose the problem, but since we have already crashed,
Sep 17 17:25:05 dbstore2002 mysqld[3550]: something is definitely wrong and this may fail.
Sep 17 17:25:05 dbstore2002 mysqld[3550]: Server version: 10.1.35-MariaDB
Sep 17 17:25:05 dbstore2002 mysqld[3550]: key_buffer_size=1048576
Sep 17 17:25:05 dbstore2002 mysqld[3550]: read_buffer_size=131072
Sep 17 17:25:05 dbstore2002 mysqld[3550]: max_used_connections=20
Sep 17 17:25:05 dbstore2002 mysqld[3550]: max_threads=252
Sep 17 17:25:05 dbstore2002 mysqld[3550]: thread_count=15
Sep 17 17:25:05 dbstore2002 mysqld[3550]: It is possible that mysqld could use up to
Sep 17 17:25:05 dbstore2002 mysqld[3550]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 554601 K bytes of memory
Sep 17 17:25:05 dbstore2002 mysqld[3550]: Hope that's ok; if not, decrease some variables in the equation.
Sep 17 17:25:05 dbstore2002 mysqld[3550]: Thread pointer: 0x7fdd7c42b008
Sep 17 17:25:05 dbstore2002 mysqld[3550]: Attempting backtrace. You can use the following information to find out
Sep 17 17:25:05 dbstore2002 mysqld[3550]: where mysqld died. If you see no messages after this, something went
Sep 17 17:25:05 dbstore2002 mysqld[3550]: terribly wrong...
Sep 17 17:25:05 dbstore2002 mysqld[3550]: stack_bottom = 0x7fe2f76cf838 thread_stack 0x48400
Sep 17 17:25:06 dbstore2002 mysqld[3550]: *** buffer overflow detected ***: /opt/wmf-mariadb101/bin/mysqld terminated
Sep 17 17:25:06 dbstore2002 mysqld[3550]: ======= Backtrace: =========
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7fe3051a7bfb]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fe3052301f7]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /lib/x86_64-linux-gnu/libc.so.6(+0xf7330)[0x7fe30522e330]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /lib/x86_64-linux-gnu/libc.so.6(+0xf916a)[0x7fe30523016a]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(my_addr_resolve+0xd8)[0x562689cabcd8]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(my_print_stacktrace+0x1bb)[0x562689c9502b]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(handle_fatal_signal+0x3bd)[0x5626897d62fd]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7fe306a880c0]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7fe305169fff]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fe30516b42a]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(+0x8dbd90)[0x562689b01d90]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(+0x8dde8b)[0x562689b03e8b]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(+0x8de2b4)[0x562689b042b4]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(+0x8ea4f7)[0x562689b104f7]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(+0x83f615)[0x562689a65615]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(_ZN7handler12ha_write_rowEPh+0x4cf)[0x5626897e0eff]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x72)[0x562689622af2]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0x1216)[0x56268962ce66]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(_Z21mysql_execute_commandP3THD+0x3a52)[0x562689641bc2]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x311)[0x5626896473b1]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x131d)[0x5626898a2d5d]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(+0x39903b)[0x5626895bf03b]
Sep 17 17:25:06 dbstore2002 mysqld[3550]: /opt/wmf-mariadb101/bin/mysqld(handle_slave_sql+0x2bf3)[0x5626895ca2c3]
Sep 17 17:25:07 dbstore2002 systemd[1]: mariadb@s2.service: Main process exited, code=killed, status=6/ABRT
Sep 17 17:25:07 dbstore2002 systemd[1]: mariadb@s2.service: Unit entered failed state.
Sep 17 17:25:07 dbstore2002 systemd[1]: mariadb@s2.service: Failed with result 'signal'.
Sep 17 17:25:13 dbstore2002 systemd[1]: mariadb@s2.service: Service hold-off time over, scheduling restart.
Sep 17 17:25:13 dbstore2002 systemd[1]: Stopped mariadb database server.
Sep 17 17:25:13 dbstore2002 systemd[1]: Starting mariadb database server...
Sep 17 17:25:13 dbstore2002 mysqld[23213]: 2018-09-17 17:25:13 140666393803008 [Note] /opt/wmf-mariadb101/bin/mysqld (mysqld 10.1.35-MariaDB) starting as process 23213 ...
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 140666393803008 [ERROR] Plugin 'unix_socket' already installed
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 7fef726f3900 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future releases. Please use READ COMMITTED transaction isolation level instead, see http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html.
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 140666393803008 [Note] InnoDB: Using mutexes to ref count buffer pool pages
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 140666393803008 [Note] InnoDB: The InnoDB memory heap is disabled
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 140666393803008 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 140666393803008 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 140666393803008 [Note] InnoDB: Compressed tables use zlib 1.2.3
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 140666393803008 [Note] InnoDB: Using Linux native AIO
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 140666393803008 [Note] InnoDB: Using SSE crc32 instructions
Sep 17 17:25:14 dbstore2002 mysqld[23213]: 2018-09-17 17:25:14 140666393803008 [Note] InnoDB: Initializing buffer pool, size = 20.0G
Sep 17 17:25:15 dbstore2002 mysqld[23213]: 2018-09-17 17:25:15 140666393803008 [Note] InnoDB: Completed initialization of buffer pool
Sep 17 17:25:15 dbstore2002 mysqld[23213]: 2018-09-17 17:25:15 140666393803008 [Note] InnoDB: Highest supported file format is Barracuda.
Sep 17 17:25:15 dbstore2002 mysqld[23213]: 2018-09-17 17:25:15 140666393803008 [Note] InnoDB: Starting crash recovery from checkpoint LSN=34186413407073
Sep 17 17:25:31 dbstore2002 mysqld[23213]: 2018-09-17 17:25:31 140666393803008 [Note] InnoDB: Processed 1650 .ibd/.isl files
Sep 17 17:25:32 dbstore2002 mysqld[23213]: 2018-09-17 17:25:32 140666393803008 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
Sep 17 17:25:32 dbstore2002 mysqld[23213]: 2018-09-17 17:25:32 140666393803008 [Note] InnoDB: Read redo log up to LSN=34186413537792
Sep 17 17:25:47 dbstore2002 mysqld[23213]: 2018-09-17 17:25:47 140666393803008 [Note] InnoDB: To recover: 194027 pages from log
Sep 17 17:25:48 dbstore2002 mysqld[23213]: InnoDB: 1 transaction(s) which must be rolled back or cleaned up
Sep 17 17:25:48 dbstore2002 mysqld[23213]: InnoDB: in total 1 row operations to undo
Sep 17 17:25:48 dbstore2002 mysqld[23213]: InnoDB: Trx id counter is 43927584256
Sep 17 17:25:48 dbstore2002 mysqld[23213]: 2018-09-17 17:25:48 140666393803008 [Note] InnoDB: Starting final batch to recover 193656 pages from redo log
Sep 17 17:26:02 dbstore2002 mysqld[23213]: 2018-09-17 17:26:02 140642936145664 [Note] InnoDB: To recover: 160469 pages from log
Sep 17 17:26:17 dbstore2002 mysqld[23213]: 2018-09-17 17:26:17 140642927752960 [Note] InnoDB: To recover: 116522 pages from log
Sep 17 17:26:32 dbstore2002 mysqld[23213]: 2018-09-17 17:26:32 140642919360256 [Note] InnoDB: To recover: 71081 pages from log
Sep 17 17:26:47 dbstore2002 mysqld[23213]: 2018-09-17 17:26:47 140642927752960 [Note] InnoDB: To recover: 33458 pages from log
Sep 17 17:26:59 dbstore2002 mysqld[23213]: InnoDB: In a MySQL replication slave the last master binlog file
Sep 17 17:26:59 dbstore2002 mysqld[23213]: InnoDB: position 0 713880055, file name db1024-bin.000534
Sep 17 17:26:59 dbstore2002 mysqld[23213]: InnoDB: Last MySQL binlog file position 0 140307185, file name ./db2056-bin.002629
Sep 17 17:27:24 dbstore2002 mysqld[23213]: 2018-09-17 17:27:24 140666393803008 [Note] InnoDB: 128 rollback segment(s) are active.
Sep 17 17:27:24 dbstore2002 mysqld[23213]: 2018-09-17 17:27:24 140642516723456 [Note] InnoDB: Starting in background the rollback of recovered transactions
Sep 17 17:27:24 dbstore2002 mysqld[23213]: 2018-09-17 17:27:24 140666393803008 [Note] InnoDB: Waiting for purge to start
Sep 17 17:27:24 dbstore2002 mysqld[23213]: 2018-09-17 17:27:24 140642516723456 [Note] InnoDB: To roll back: 1 transactions, 1 rows
Sep 17 17:27:24 dbstore2002 mysqld[23213]: 2018-09-17 17:27:24 140666393803008 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.39-83.1 started; log sequence number 34188143428820
Sep 17 17:27:24 dbstore2002 mysqld[23213]: 2018-09-17 17:27:24 140642516723456 [Note] InnoDB: Rollback of trx with id 43927583911 completed
Sep 17 17:27:24 dbstore2002 mysqld[23213]: 2018-09-17 17:27:24 140642516723456 [Note] InnoDB: Rollback of non-prepared transactions completed
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 140642516723456 [Note] InnoDB: Dumping buffer pool(s) not yet started
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 7fe9e33fe700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 140666393803008 [Note] Plugin 'FEEDBACK' is disabled.
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 140666393803008 [Note] Recovering after a crash using tc.log
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 140666393803008 [Note] Starting crash recovery...
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 140666393803008 [Note] Crash recovery finished.
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 140666393803008 [Note] Server socket created on IP: '::'.
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 140666393803008 [ERROR] mysqld: Table './mysql/user' is marked as crashed and should be repaired
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 140666393803008 [Warning] Checking table: './mysql/user'
Sep 17 17:28:52 dbstore2002 mysqld[23213]: 2018-09-17 17:28:52 140666393803008 [ERROR] mysql.user: 1 client is using or hasn't closed the table properly
Sep 17 17:28:53 dbstore2002 mysqld[23213]: 2018-09-17 17:28:53 140666393803008 [ERROR] mysqld: Table './mysql/db' is marked as crashed and should be repaired
Sep 17 17:28:53 dbstore2002 mysqld[23213]: 2018-09-17 17:28:53 140666393803008 [Warning] Checking table: './mysql/db'
Sep 17 17:28:53 dbstore2002 mysqld[23213]: 2018-09-17 17:28:53 140666393803008 [ERROR] mysql.db: 1 client is using or hasn't closed the table properly
Sep 17 17:28:53 dbstore2002 mysqld[23213]: 2018-09-17 17:28:53 140666393803008 [ERROR] mysqld: Table './mysql/event' is marked as crashed and should be repaired
Sep 17 17:28:53 dbstore2002 mysqld[23213]: 2018-09-17 17:28:53 140666393803008 [Warning] Checking table: './mysql/event'
Sep 17 17:28:53 dbstore2002 mysqld[23213]: 2018-09-17 17:28:53 140666393803008 [ERROR] mysql.event: 1 client is using or hasn't closed the table properly
Sep 17 17:28:53 dbstore2002 mysqld[23213]: 2018-09-17 17:28:53 140666393242368 [Note] Event Scheduler: scheduler thread started with id 1
Sep 17 17:28:53 dbstore2002 mysqld[23213]: 2018-09-17 17:28:53 140666393803008 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--log-basename=#' or '--relay-log=dbstore2002-relay-bin' to avoid this problem.
Sep 17 17:28:53 dbstore2002 mysqld[23213]: 2018-09-17 17:28:53 140666393803008 [Note] /opt/wmf-mariadb101/bin/mysqld: ready for connections.
Sep 17 17:28:53 dbstore2002 mysqld[23213]: Version: '10.1.35-MariaDB' socket: '/run/mysqld/mysqld.s2.sock' port: 3312 MariaDB Server
Sep 17 17:28:53 dbstore2002 systemd[1]: Started mariadb database server.
Sep 17 17:31:32 dbstore2002 mysqld[23213]: 2018-09-17 17:31:32 7fe9e33fe700 InnoDB: Buffer pool(s) load completed at 180917 17:31:32
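The purge error above prints the offending index tuple as raw hex. Here is a minimal bash sketch for decoding it by hand. It assumes the MediaWiki definition of the cl_sortkey index as (cl_to, cl_type, cl_sortkey, cl_from). Under that assumption, field 0 is the category name (cl_to) and field 3 is a 4-byte big-endian unsigned page id (cl_from). The function name hex_to_ascii is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a hex string (as printed in InnoDB "hex ..." fields)
# into ASCII. Assumes printable bytes; $(...) would drop trailing newlines.
hex_to_ascii() {
  local hex=$1 out="" i
  for ((i = 0; i < ${#hex}; i += 2)); do
    out+=$(printf '%b' "\\x${hex:i:2}")
  done
  printf '%s\n' "$out"
}

# Field 0 (assumed cl_to, 36 bytes): the category name.
hex_to_ascii 417274696b6c65725f6d65645f66696c6d6c656e6b65725f6672615f57696b6964617461
# prints Artikler_med_filmlenker_fra_Wikidata

# Field 3 (assumed cl_from, INT UNSIGNED stored big-endian): the page id.
echo $((16#00058d34))
# prints 363828
```

Under these assumptions, the purge failed on the row linking page 363828 to the category Artikler_med_filmlenker_fra_Wikidata in nowiki.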

https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&from=1537192110814&to=1537213710814&var-dc=codfw%20prometheus%2Fops&var-server=dbstore2002&var-port=13312
https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?orgId=1&var-server=dbstore2002&var-datasource=codfw%20prometheus%2Fops&from=1537192187956&to=1537213727956

Event Timeline

Nothing on HW logs that could indicate something is wrong with storage or similar.

Triaging this as High, as this host is the backup source.
I think we should just reimport s2 there.

I will do that; I'll decide which host to clone it from.

Change 461098 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db2041 to recover dbstore2002 s2

https://gerrit.wikimedia.org/r/461098

Change 461098 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db2041 to recover dbstore2002 s2

https://gerrit.wikimedia.org/r/461098

Cloning from db2041 to dbstore2002:/srv/sqldata is ongoing. The directory will later be renamed to /srv/sqldata.s2, and replication will be restarted there quickly. After the backups run, the table compression will have to be rerun, which may take a long time on the dbstore host given its heavy load.

jcrespo lowered the priority of this task from High to Medium.

dbstore2002:s2 is up and running (although still catching up on replication), which is enough for the backups to run in codfw unaltered.

Grants have been altered to allow the dump process.

However, dbstore2002 is at 93% disk usage, so we need to compress the tables to bring it back to a reasonable size. This will be done after tonight's backups run.

This one-liner will compress the tables, starting with the smallest and moving towards the largest:

mysql -BN -S /run/mysqld/mysqld.s2.sock -e "SELECT table_schema, table_name FROM information_Schema.tables WHERE engine='INNODB' and row_format <> 'COMPRESSED' ORDER BY DATA_LENGTH ASC" | while read db table; do mysql --skip-ssl --socket /run/mysqld/mysqld.s2.sock -e "ALTER TABLE $db.$table ROW_FORMAT=COMPRESSED;"; done

It should be run inside a tmux/screen session.

That looks good to me, but you might want to add: SET SESSION sql_log_bin=0. I tend to use it to avoid dealing with GTID and related issues, just in case, as we don't trust MariaDB GTID much :-)

And stop replication beforehand!

mysql -BN -S /run/mysqld/mysqld.s2.sock -e "STOP SLAVE";
mysql -BN -S /run/mysqld/mysqld.s2.sock -e "SELECT table_schema, table_name FROM information_Schema.tables WHERE engine='INNODB' and row_format <> 'COMPRESSED' ORDER BY DATA_LENGTH ASC" | while read db table; do mysql --skip-ssl --socket /run/mysqld/mysqld.s2.sock -e "SET SESSION sql_log_bin=0; ALTER TABLE $db.$table ROW_FORMAT=COMPRESSED, FORCE"; done
mysql -BN -S /run/mysqld/mysqld.s2.sock -e "START SLAVE";
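Before running the loop for real, you may want to review the exact statements it will execute. Here is a minimal bash sketch of a dry run. It assumes the mysql -BN output format of one schema<TAB>table pair per line, and the function name gen_compress_sql is hypothetical. Unlike the one-liner above, it also backtick-quotes the identifiers.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: read "schema<TAB>table" lines (as printed by
# `mysql -BN`) on stdin and print one compression statement per table,
# without executing anything. sql_log_bin=0 keeps each ALTER out of this
# host's binary log, as suggested in the comment above.
gen_compress_sql() {
  local db table
  while IFS=$'\t' read -r db table; do
    [[ -z $db || -z $table ]] && continue   # skip blank or malformed lines
    printf 'SET SESSION sql_log_bin=0; ALTER TABLE `%s`.`%s` ROW_FORMAT=COMPRESSED, FORCE;\n' \
      "$db" "$table"
  done
}

# Example with two sample rows; in practice, pipe in the output of the
# information_schema.tables query used above.
printf 'nowiki\tcategorylinks\nnowiki\tpage\n' | gen_compress_sql
```

Once the list looks right, each printed line can be passed to mysql --socket /run/mysqld/mysqld.s2.sock -e, still with replication stopped as above.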

The tables are being compressed now.

A quick note before I forget: before the compression began, we had 470G of free space on that host.

The compression has been stopped and replication has been resumed.
The compression will continue after tomorrow's backup is done.

The disk is performing quite slowly, and the reason might be T205257.
Until replication catches up, I have enabled the write cache with:

hpssacli ctrl slot=0 modify dwc=enable
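One way to tell when replication has caught up, and so when the write cache can be switched back, is to read Seconds_Behind_Master from SHOW SLAVE STATUS\G. Here is a minimal bash sketch. The function name slave_lag is hypothetical, and it parses text on stdin so it can be tried without a server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Seconds_Behind_Master value from
# `SHOW SLAVE STATUS\G` output on stdin. The server itself reports NULL
# when replication is not running; that value is printed as-is.
slave_lag() {
  awk -F': ' '$1 ~ /^ *Seconds_Behind_Master$/ { print $2 }'
}

# Usage against the s2 instance (socket path from this task):
#   mysql -S /run/mysqld/mysqld.s2.sock -e 'SHOW SLAVE STATUS\G' | slave_lag

# Self-contained example with a trimmed, made-up status excerpt:
printf '%s\n' \
  '             Slave_IO_Running: Yes' \
  '            Slave_SQL_Running: Yes' \
  '        Seconds_Behind_Master: 5423' | slave_lag
# prints 5423
```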
jcrespo lowered the priority of this task from Medium to Low. (Sep 24 2018, 2:56 PM)

This certainly has a lower priority now, as most of the compression needed to avoid running out of space is done and the instance has been rebuilt. You can also close it as resolved and create a lower-priority ticket to review the missing compression on the dbstore200X tables.

Given that you enabled the write cache until replication catches up, feel free to also disable sync_binlog and innodb_flush_log_at_trx_commit (FLATC), as long as you remember to re-enable them once it has caught up.
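The settings mentioned above, sync_binlog and innodb_flush_log_at_trx_commit (abbreviated FLATC), are both dynamic in MariaDB 10.1, so they can be changed without a restart. Here is a minimal bash sketch that only prints the statements for each mode, so they can be reviewed before being passed to mysql. The function name durability_sql is hypothetical. The "catchup" values (0 and 2) and the "normal" values (1 and 1) are assumptions, not this host's documented configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print SET GLOBAL statements for a durability mode.
# Assumed values: "catchup" relaxes flushing on commit, "normal" restores
# full durability. Check the host's real defaults before relying on these.
durability_sql() {
  case $1 in
    catchup) printf 'SET GLOBAL sync_binlog=0; SET GLOBAL innodb_flush_log_at_trx_commit=2;\n' ;;
    normal)  printf 'SET GLOBAL sync_binlog=1; SET GLOBAL innodb_flush_log_at_trx_commit=1;\n' ;;
    *)       echo "usage: durability_sql catchup|normal" >&2; return 1 ;;
  esac
}

# Usage: relax while lagging, restore once caught up, e.g.
#   mysql -S /run/mysqld/mysqld.s2.sock -e "$(durability_sql catchup)"
#   mysql -S /run/mysqld/mysqld.s2.sock -e "$(durability_sql normal)"
durability_sql catchup
```

Keeping both modes in one function makes the revert step explicit, which is the part that is easy to forget.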

This certainly has a lower priority now, as most of the compression needed to avoid running out of space is done and the instance has been rebuilt. You can also close it as resolved and create a lower-priority ticket to review the missing compression on the dbstore200X tables.

That is already done! :-) T204930

sync_binlog was already disabled, and I didn't find innodb_flatc.

You can also close it as resolved and create a lower priority ticket to review missing compression on dbstore200X tables.

+1 to close this as resolved and follow up on T204930

The original problem is solved; the compression work will continue in T204930.