
Enable replication eqiad -> codfw and other checks
Closed, Resolved · Public

Description

This needs to be done before we switch back from codfw to eqiad:

Enable replication on the codfw masters from eqiad (with GTID disabled):

  • s1
  • s2
  • s3
  • s4
  • s5
  • s6
  • s7
  • s8
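A minimal sketch of how the statement for each codfw master could be built. It assumes file/position coordinates and MariaDB's `CHANGE MASTER TO` syntax: `MASTER_USE_GTID=no` keeps GTID disabled, and `MASTER_SSL=1` matches the `Master_SSL_Allowed: Yes` seen in the status output in this task. The `repl` user and port 3306 come from that output. The `Coordinates` class, the helper names and the password placeholder are hypothetical, and which coordinates to start from is left as an input rather than guessed. After running the statement, `START SLAVE;` followed by `SHOW SLAVE STATUS\G` would confirm that replication is running.

```python
from dataclasses import dataclass


@dataclass
class Coordinates:
    """Hypothetical container for the eqiad binlog position to start from."""
    master_host: str  # eqiad master to replicate from
    log_file: str     # binlog file on that master
    log_pos: int      # position within log_file


def sql_quote(value: str) -> str:
    """Quote a string literal for MariaDB (escape backslashes and quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_sql(c: Coordinates, user: str, password: str, port: int = 3306) -> str:
    """Build a CHANGE MASTER TO statement using file/position coordinates,
    with GTID explicitly disabled via MASTER_USE_GTID=no."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={sql_quote(c.master_host)}, "
        f"MASTER_PORT={port}, "
        f"MASTER_USER={sql_quote(user)}, "
        f"MASTER_PASSWORD={sql_quote(password)}, "
        f"MASTER_LOG_FILE={sql_quote(c.log_file)}, "
        f"MASTER_LOG_POS={c.log_pos}, "
        "MASTER_SSL=1, "
        "MASTER_USE_GTID=no;"
    )


if __name__ == "__main__":
    # Sample input only: the s1 values recorded in this task; "<password>" is a placeholder.
    sample = Coordinates("db1083.eqiad.wmnet", "db1083-bin.005389", 208315905)
    print(change_master_sql(sample, "repl", "<password>"))
```

Generating the statement rather than typing it per section keeps the eight invocations uniform, so only the host and coordinates vary between s1 and s8.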

Double-check that the following sections still have it enabled (replication was never disconnected for them):

  • x1
  • es4
  • es5
  • pc1
  • pc2
  • pc3
  • Check and disable GTID on codfw/eqiad masters for the above sections.
  • Check that all eqiad slaves have GTID enabled
  • Check which notifications are disabled for eqiad hosts
  • Check event scheduler is enabled on eqiad hosts
  • Check that query killers are installed and enabled on eqiad hosts
  • Check semisync
  • Compare codfw and eqiad tables (T261914#6571134), as done in T260042
    • s1
    • s2
    • s3
    • s4
    • s5
    • s6
    • s7
    • s8
  • Check all eqiad hosts are pooled and check their weights.
  • Warm up tables on the following sections (s1-s8, x1, pc1, pc2, pc3, es1, es2, es3, es4, es5)
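Several of these checks are plain system-variable lookups. A rough sketch follows, assuming the output of `mysql -BN -e "SHOW GLOBAL VARIABLES LIKE ..."` (tab-separated, no header) has been collected per host. `event_scheduler`, `rpl_semi_sync_master_enabled` and `rpl_semi_sync_slave_enabled` are real MariaDB variables. The expected value per role (master vs replica) is an assumption for illustration, and the helper names are hypothetical.

```python
from typing import Dict, List

# Expected values are assumptions for this sketch; the checklist only says
# "check event scheduler" and "check semisync".
EXPECTED_ALL = {"event_scheduler": "ON"}
EXPECTED_BY_ROLE = {
    "master": {"rpl_semi_sync_master_enabled": "ON"},
    "replica": {"rpl_semi_sync_slave_enabled": "ON"},
}


def parse_variables(output: str) -> Dict[str, str]:
    """Parse tab-separated `mysql -BN -e 'SHOW GLOBAL VARIABLES ...'` output."""
    result = {}
    for line in output.splitlines():
        name, sep, value = line.partition("\t")
        if sep:
            result[name.strip()] = value.strip()
    return result


def check_host(role: str, variables: Dict[str, str]) -> List[str]:
    """List the expected settings that are missing or wrong on one host."""
    expected = {**EXPECTED_ALL, **EXPECTED_BY_ROLE[role]}
    problems = []
    for name, want in expected.items():
        got = variables.get(name)
        if got is None:
            problems.append(f"{name} not set")
        elif got.upper() != want:
            problems.append(f"{name}={got}, expected {want}")
    return problems
```

Running `check_host` over every eqiad host and printing only non-empty results gives a short list of hosts that still need attention.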

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 3 2020, 6:31 AM
Marostegui triaged this task as Medium priority. · Sep 3 2020, 6:31 AM
Marostegui moved this task from Triage to Next on the DBA board.
Marostegui updated the task description.

Replication eqiad -> codfw was disconnected. For the record, these are the coordinates from the codfw masters:

s1:

root@db2112.codfw.wmnet[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1083.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1083-bin.005389
          Read_Master_Log_Pos: 208315905
               Relay_Log_File: db2112-relay-bin.000054
                Relay_Log_Pos: 1687558
        Relay_Master_Log_File: db1083-bin.005389
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 208315905
              Relay_Log_Space: 1687901
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171970661
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos: 180355171-180355171-148310907,180363268-180363268-30421226,171970637-171970637-2116621969,180363372-180363372-23365506,171970661-171970661-2220951585,171974720-171974720-2572451842,180359172-180359172-49702203,171978774-171978774-5,0-171970637-5484646134
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.034 sec)

root@db2112.codfw.wmnet[(none)]> reset slave all;
Query OK, 0 rows affected (0.035 sec)

s2

root@db2107.codfw.wmnet[(none)]> stop slave;
Query OK, 0 rows affected (0.036 sec)

root@db2107.codfw.wmnet[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1122.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1122-bin.002213
          Read_Master_Log_Pos: 554936297
               Relay_Log_File: db2107-relay-bin.000034
                Relay_Log_Pos: 8901944
        Relay_Master_Log_File: db1122-bin.002213
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 554936297
              Relay_Log_Space: 8902260
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171978786
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos: 171966670-171966670-2410812544,0-180359173-4880477695,180359173-180359173-70825087,171978786-171978786-2084954068,180359271-180359271-33539640,171966574-171966574-2221092918,180359241-180359241-121693516,180363270-180363270-170,171970567-171970567-390719905
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.034 sec)

root@db2107.codfw.wmnet[(none)]> reset slave all;
Query OK, 0 rows affected (0.038 sec)

s3

root@db2105.codfw.wmnet[(none)]> stop slave;
Query OK, 0 rows affected (0.035 sec)

root@db2105.codfw.wmnet[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1123.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1123-bin.003387
          Read_Master_Log_Pos: 173749265
               Relay_Log_File: db2105-relay-bin.000052
                Relay_Log_Pos: 2052200
        Relay_Master_Log_File: db1123-bin.003387
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 173749265
              Relay_Log_Space: 2052543
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171978787
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos: 171978787-171978787-1981135417,0-171966669-4075108480,171966669-171966669-4196523483,171974792-171974792-378345284,180363271-180363271-245332,180355192-180355192-32955270,180363367-180363367-134174373,180359174-180359174-94123433
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.034 sec)

root@db2105.codfw.wmnet[(none)]> reset slave all;
Query OK, 0 rows affected (0.036 sec)

s4

root@db2090.codfw.wmnet[(none)]> stop slave;
Query OK, 0 rows affected (0.038 sec)

root@db2090.codfw.wmnet[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1081.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1081-bin.005538
          Read_Master_Log_Pos: 174671689
               Relay_Log_File: db2090-relay-bin.000100
                Relay_Log_Pos: 901626
        Relay_Master_Log_File: db1081-bin.005538
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 174671689
              Relay_Log_Space: 901969
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171966557
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos: 0-180359175-3368394787,180363436-180363436-38819606,171966557-171966557-1250318470,171978876-171978876-1972981824,171978775-171978775-4822899280,180359175-180359175-43143523,171970589-171970589-201132050,180359190-180359190-192195477
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.035 sec)

root@db2090.codfw.wmnet[(none)]> reset slave all;
Query OK, 0 rows affected (0.035 sec)

s5

root@db2123.codfw.wmnet[(none)]> stop slave;
Query OK, 0 rows affected (0.036 sec)

root@db2123.codfw.wmnet[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1100.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1100-bin.002665
          Read_Master_Log_Pos: 187349402
               Relay_Log_File: db2123-relay-bin.000036
                Relay_Log_Pos: 2544911
        Relay_Master_Log_File: db1100-bin.002665
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 187349402
              Relay_Log_Space: 2545254
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171974853
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos: 171974884-171974884-1473084269,0-180359179-5734605861,171974853-171974853-1151510846,171970704-171970704-351094624,180363367-180363367-133158799,171978768-171978768-202416,171978777-171978777-2596176831,180367364-180367364-67917352,180359179-180359179-96523837,180359180-180359180-32243036
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.034 sec)

root@db2123.codfw.wmnet[(none)]> reset slave all;
Query OK, 0 rows affected (0.036 sec)

s6

root@db2129.codfw.wmnet[(none)]> stop slave;
Query OK, 0 rows affected (0.035 sec)

root@db2129.codfw.wmnet[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1093.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1093-bin.003314
          Read_Master_Log_Pos: 781155989
               Relay_Log_File: db2129-relay-bin.000032
                Relay_Log_Pos: 11663750
        Relay_Master_Log_File: db1093-bin.003314
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 781155989
              Relay_Log_Space: 11664093
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171978904
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos: 180367475-180367475-31642404,0-180359184-3049354376,180363274-180363274-7949150,180363370-180363370-7087924,180367474-180367474-91976046,180359184-180359184-35598956,171970705-171970705-239075862,171978766-171978766-1375989821,171970594-171970594-1063329989,171978904-171978904-177867593,171974883-171974883-1921892293
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.034 sec)

root@db2129.codfw.wmnet[(none)]> reset slave all;
Query OK, 0 rows affected (0.039 sec)

s7

root@db2118.codfw.wmnet[(none)]> stop slave;
Query OK, 0 rows affected (0.035 sec)

root@db2118.codfw.wmnet[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1086.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1086-bin.004835
          Read_Master_Log_Pos: 982007049
               Relay_Log_File: db2118-relay-bin.000038
                Relay_Log_Pos: 14603160
        Relay_Master_Log_File: db1086-bin.004835
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 982007049
              Relay_Log_Space: 14603503
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171970664
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos: 180367395-180367395-31049562,0-180359185-3359637071,180363371-180363371-19393161,171978767-171978767-4484858466,180355111-180355111-131673159,171970664-171970664-1628618366,180359185-180359185-71998080,171970590-171970590-196280066,180363275-180363275-7694462
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.034 sec)

root@db2118.codfw.wmnet[(none)]> reset slave all;
Query OK, 0 rows affected (0.041 sec)

s8

root@db2079.codfw.wmnet[(none)]> stop slave;
Query OK, 0 rows affected (0.035 sec)

root@db2079.codfw.wmnet[(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1109.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1109-bin.004891
          Read_Master_Log_Pos: 384489469
               Relay_Log_File: db2079-relay-bin.000074
                Relay_Log_Pos: 2164571
        Relay_Master_Log_File: db1109-bin.004891
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 384489469
              Relay_Log_Space: 2164914
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171978924
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos: 0-180359179-5734605861,171974884-171974884-1473084269,171978777-171978777-329020349,171978778-171978778-3298185533,171970704-171970704-351094624,171978768-171978768-202416,180363369-180363369-16071499,180355078-180355078-42362447,180359179-180359179-96523837,171978924-171978924-3473953670,171970645-171970645-288070551,180359242-180359242-170963125
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.034 sec)

root@db2079.codfw.wmnet[(none)]> reset slave all;
Query OK, 0 rows affected (0.036 sec)

root@cumin1001:/home/marostegui# for i in db2112 db2107 db2105 db2090 db2123 db2129 db2118 db2079; do echo $i; mysql.py -h$i -e "show slave status\G";done
db2112
db2107
db2105
db2090
db2123
db2129
db2118
db2079
root@cumin1001:/home/marostegui#
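The coordinates worth recording from that output are `Relay_Master_Log_File` and `Exec_Master_Log_Pos`, that is, the last event the SQL thread actually executed. Here they match `Master_Log_File`/`Read_Master_Log_Pos` because the SQL thread had caught up with the IO thread before it was stopped. Below is a small parser sketch for single-row `\G` output; the function names are hypothetical. An empty result, as in the final loop, means `reset slave all` left no replication configured on that host.

```python
import re
from typing import Dict, Optional, Tuple

# Banner line printed by the mysql client in vertical (\G) mode.
ROW_HEADER = re.compile(r"^\*+ \d+\. row \*+$")


def parse_vertical(output: str) -> Dict[str, str]:
    """Parse single-row vertical (\\G) output into {field: value}."""
    fields: Dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line or ROW_HEADER.match(line):
            continue
        # Field names never contain ':', so split on the first one only;
        # lines without a colon (e.g. "1 row in set") are ignored.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def resume_coordinates(fields: Dict[str, str]) -> Optional[Tuple[str, str, int]]:
    """(master host, binlog file, position) of the last event executed by
    the SQL thread, or None if no replication is configured."""
    if not fields:
        return None
    return (
        fields["Master_Host"],
        fields["Relay_Master_Log_File"],
        int(fields["Exec_Master_Log_Pos"]),
    )
```

Saving `resume_coordinates(parse_vertical(output))` per host before running `reset slave all` keeps a machine-readable copy of what is pasted above.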

LSobanski moved this task from Next to Ready on the DBA board.
Marostegui updated the task description.
Marostegui moved this task from Next to In Progress on the Data-Persistence board.

The event scheduler has been checked across all hosts in eqiad, and it was ON everywhere.

Marostegui updated the task description. · Oct 19 2020, 6:41 AM

Events are enabled everywhere within eqiad.

Marostegui updated the task description. · Oct 19 2020, 7:06 AM

Notifications are enabled everywhere within eqiad

Marostegui updated the task description. · Oct 19 2020, 7:56 AM

I have disabled GTID on the eqiad masters (sX, x1, esX).
The rest of the slaves have been checked, and they all have GTID enabled.
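The GTID policy from the checklist (masters replicate without GTID, every other eqiad replica with it) can be checked mechanically from the `Using_Gtid` column of `SHOW SLAVE STATUS`, whose MariaDB values are `No`, `Slave_Pos` and `Current_Pos`. This is a minimal sketch with a hypothetical function name; how the host list and the set of masters are gathered is left out.

```python
from typing import Dict, List, Set

# Using_Gtid values that mean GTID-based replication in MariaDB.
GTID_ENABLED = {"Slave_Pos", "Current_Pos"}


def gtid_violations(using_gtid: Dict[str, str], masters: Set[str]) -> List[str]:
    """using_gtid maps host -> Using_Gtid. Masters must report "No";
    every other replica must use GTID."""
    problems = []
    for host, value in sorted(using_gtid.items()):
        if host in masters:
            if value != "No":
                problems.append(f"{host}: master has Using_Gtid={value}, expected No")
        elif value not in GTID_ENABLED:
            problems.append(
                f"{host}: replica has Using_Gtid={value}, expected Slave_Pos or Current_Pos"
            )
    return problems
```

An empty list means the hosts match the policy described in this comment.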

Marostegui updated the task description. · Oct 20 2020, 5:46 AM
Marostegui updated the task description. · Oct 20 2020, 12:51 PM

Mentioned in SAL (#wikimedia-operations) [2020-10-22T08:31:19Z] <kormat> enabling replication from eqiad to codfw T261914

Kormat updated the task description. · Oct 22 2020, 8:41 AM
Kormat updated the task description. · Oct 22 2020, 8:51 AM
Kormat updated the task description. · Oct 22 2020, 8:56 AM
Kormat updated the task description. · Oct 22 2020, 9:01 AM
Kormat updated the task description. · Oct 22 2020, 9:04 AM
Kormat updated the task description. · Oct 22 2020, 9:08 AM
Kormat updated the task description. · Oct 22 2020, 9:10 AM
Kormat updated the task description. · Oct 22 2020, 9:12 AM
Kormat added a subscriber: Kormat.

Replication re-enabling tracked here: https://phabricator.wikimedia.org/P13049

Thanks Stevie for working on that!
We have double-checked independently, and replication is running without GTID everywhere within core, in both directions:

db1081
                  Master_Host: db2090.codfw.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db1083
                  Master_Host: db2112.codfw.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db1086
                  Master_Host: db2118.codfw.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db1100
                  Master_Host: db2123.codfw.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db1103
                   Master_Host: db2096.codfw.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
db1104
                  Master_Host: db2079.codfw.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db1122
                  Master_Host: db2107.codfw.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db1123
                  Master_Host: db2105.codfw.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db1131
                  Master_Host: db2129.codfw.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db2079
                  Master_Host: db1104.eqiad.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db2090
                  Master_Host: db1081.eqiad.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db2096
                   Master_Host: db1103.eqiad.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
db2105
                  Master_Host: db1123.eqiad.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db2107
                  Master_Host: db1122.eqiad.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db2112
                  Master_Host: db1083.eqiad.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db2118
                  Master_Host: db1086.eqiad.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db2123
                  Master_Host: db1100.eqiad.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
db2129
                  Master_Host: db1131.eqiad.wmnet
        Seconds_Behind_Master: 0
                   Using_Gtid: No
es1021
                   Master_Host: es2021.codfw.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
es1024
                   Master_Host: es2023.codfw.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
es2021
                   Master_Host: es1021.eqiad.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
es2023
                   Master_Host: es1024.eqiad.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
pc1007
                   Master_Host: pc2007.codfw.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
pc1008
                   Master_Host: pc2008.codfw.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
pc1009
                   Master_Host: pc2009.codfw.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
pc2007
                   Master_Host: pc1007.eqiad.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
pc2008
                   Master_Host: pc1008.eqiad.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
pc2009
                   Master_Host: pc1009.eqiad.wmnet
         Seconds_Behind_Master: 0
                    Using_Gtid: No
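Since each pair of masters must point at each other, the list above can be checked for symmetry instead of by eye. This sketch assumes the `Master_Host`, `Seconds_Behind_Master` and `Using_Gtid` fields have already been collected per host, with a NULL lag represented as `None`; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# host -> (Master_Host, Seconds_Behind_Master or None for NULL, Using_Gtid)
Status = Dict[str, Tuple[str, Optional[int], str]]


def short_name(fqdn: str) -> str:
    """Strip the domain: 'db2090.codfw.wmnet' -> 'db2090'."""
    return fqdn.split(".", 1)[0]


def check_circular(status: Status) -> List[str]:
    """Flag hosts whose master does not replicate back from them,
    that are lagging (or have NULL lag), or that replicate using GTID."""
    problems = []
    for host, (master, lag, gtid) in sorted(status.items()):
        peer = short_name(master)
        back = status.get(peer)
        if back is None or short_name(back[0]) != host:
            problems.append(f"{host}: {peer} does not replicate from {host}")
        if lag is None:
            problems.append(f"{host}: Seconds_Behind_Master is NULL")
        elif lag != 0:
            problems.append(f"{host}: {lag}s behind master")
        if gtid != "No":
            problems.append(f"{host}: Using_Gtid={gtid}")
    return problems
```

For example, feeding it the db1083/db2112 and pc1007/pc2007 entries above returns an empty list.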

I am going to give it a few hours before starting the table comparisons, to make sure that any gaps would show up if we messed something up.

Marostegui added a comment. (Edited) · Oct 22 2020, 11:32 AM

The following tables will be checked across all the wikis from s1 to s8 (the rest of the sections never had replication disconnected):

  • revision (rev_id)
  • text (old_id)
  • user (user_id)
  • change_tag (ct_id)
  • actor (actor_id)
  • ipblocks (ipb_id)
  • comment (comment_id)
  • watchlist (wl_id)
  • logging (log_id)
  • page (page_id)
  • revision_actor_temp (revactor_rev)
  • revision_comment_temp (revcomment_rev)
  • slots (slot_revision_id)
  • archive (ar_id)
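One generic way to compare a table across DCs is to split it into key ranges, checksum each range on both sides, and only look row by row at the ranges whose checksums differ. The sketch below shows that idea on in-memory rows. It is not necessarily how the comparison used in T260042 works, and the chunk size and function names are hypothetical. Against real hosts, each range would be fetched with something like `WHERE rev_id BETWEEN ...` on the key column listed above.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# One row as a tuple of column values, with the integer key column first.
Row = Tuple


def chunk_checksums(rows: Iterable[Row], chunk_size: int) -> Dict[int, str]:
    """Bucket rows into key ranges of chunk_size and hash each bucket."""
    hashes = {}
    for row in sorted(rows):  # sort so both sides hash rows in the same order
        bucket = row[0] // chunk_size
        hashes.setdefault(bucket, hashlib.sha256()).update(repr(row).encode())
    return {bucket: h.hexdigest() for bucket, h in hashes.items()}


def differing_ranges(a_rows: Iterable[Row], b_rows: Iterable[Row],
                     chunk_size: int = 1000) -> List[Tuple[int, int]]:
    """Return inclusive (first_key, last_key) ranges whose contents differ,
    including ranges that exist on only one side."""
    a = chunk_checksums(a_rows, chunk_size)
    b = chunk_checksums(b_rows, chunk_size)
    return [
        (bucket * chunk_size, (bucket + 1) * chunk_size - 1)
        for bucket in sorted(a.keys() | b.keys())
        if a.get(bucket) != b.get(bucket)
    ]
```

Checksumming per range keeps the cross-DC traffic small: only ranges that come back different need a row-level look.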
Marostegui updated the task description. · Oct 22 2020, 11:33 AM

Mentioned in SAL (#wikimedia-operations) [2020-10-22T11:38:36Z] <marostegui> Compare s1-s8 tables - T261914

Marostegui updated the task description. · Oct 22 2020, 11:42 AM
Marostegui updated the task description. · Oct 23 2020, 5:24 AM
Marostegui updated the task description. · Oct 23 2020, 5:34 AM
Marostegui updated the task description. · Oct 23 2020, 6:11 AM
Marostegui updated the task description. · Oct 23 2020, 6:56 AM
Marostegui updated the task description. · Oct 23 2020, 9:29 AM
Marostegui updated the task description. · Oct 23 2020, 11:32 AM
Marostegui updated the task description. · Oct 23 2020, 12:07 PM

All tables came back clean. The ones that reported differences were confirmed as false positives by a second run.
The false positives were found in:

enwiki.page
itwiki.page
enwiktionary.page
commonswiki.user
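A reported difference is only treated as real if it reproduces on a second run. One plausible reason for one-off differences, assumed here rather than stated in this task, is rows being written while the comparison is running. The re-check logic itself reduces to intersecting the ranges flagged by the two runs. The sketch below uses hypothetical names.

```python
from typing import Iterable, Set, Tuple

# (wiki, table, first_key, last_key) of a range reported as different
Range = Tuple[str, str, int, int]


def confirmed_differences(first_run: Iterable[Range],
                          second_run: Iterable[Range]) -> Set[Range]:
    """Keep only the ranges flagged by both runs; anything flagged once is
    treated as a false positive."""
    return set(first_run) & set(second_run)
```

In this task the second runs flagged nothing, so the result would be an empty set.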

Mentioned in SAL (#wikimedia-operations) [2020-10-26T06:10:48Z] <marostegui> Warm up tables T261914

Double-checked that replication is enabled on all masters (in both DCs).

Marostegui updated the task description. · Mon, Oct 26, 7:19 AM
Marostegui updated the task description. · Mon, Oct 26, 7:47 AM
LSobanski moved this task from Ready to In progress on the DBA board. · Mon, Oct 26, 2:45 PM
Marostegui closed this task as Resolved. · Mon, Oct 26, 5:14 PM
Marostegui updated the task description.

This is all done.