Event Timeline
$ ./compare.py db1111 db1112 commonswiki page page_id --step=10000
Starting comparison between id 1 and 64894894
DIFFERENCE: WHERE page_id BETWEEN 1 AND 10000
100 iterations, id 990001/64894894, 1 chunk(s) found different
200 iterations, id 1990001/64894894, 1 chunk(s) found different
300 iterations, id 2990001/64894894, 1 chunk(s) found different
400 iterations, id 3990001/64894894, 1 chunk(s) found different
500 iterations, id 4990001/64894894, 1 chunk(s) found different
600 iterations, id 5990001/64894894, 1 chunk(s) found different
700 iterations, id 6990001/64894894, 1 chunk(s) found different
800 iterations, id 7990001/64894894, 1 chunk(s) found different
$ ./compare.py db1111 db1112 commonswiki page page_id --step=10000 Starting comparison between id 1 and 64894894 DIFFERENCE: WHERE page_id BETWEEN 1 AND 10000 2018-01-16T15:05:41.084014: row id 990001/64894894, 1 chunk(s) found different 2018-01-16T15:05:51.302744: row id 1990001/64894894, 1 chunk(s) found different 2018-01-16T15:06:01.520885: row id 2990001/64894894, 1 chunk(s) found different 2018-01-16T15:06:11.738242: row id 3990001/64894894, 1 chunk(s) found different
Now you can compare more than 1 server at a time (as fast as doing only 2), and an eta for finishing is printed:
$ ./compare.py commonswiki page page_id db2073 db1111 db1112 Starting comparison between id 1 and 64894894 DIFFERENCE on db1111.eqiad.wmnet:3306: WHERE page_id BETWEEN 1 AND 10000 DIFFERENCE on db1112.eqiad.wmnet:3306: WHERE page_id BETWEEN 1 AND 10000 DIFFERENCE on db1111.eqiad.wmnet:3306: WHERE page_id BETWEEN 10001 AND 20000 DIFFERENCE on db1112.eqiad.wmnet:3306: WHERE page_id BETWEEN 10001 AND 20000 ... 2018-01-16T19:05:19.702697: row id 990001/64894894, ETA: 20m41s, 99 chunk(s) found different
I would find useful an option to ONLY print differences.
That is to say, not to print things like:
2018-01-17T07:19:03.238318: row id 33990001/40566930, ETA: 01m07s, 0 chunk(s) found different 2018-01-17T07:19:13.460769: row id 34990001/40566930, ETA: 00m57s, 0 chunk(s) found different 2018-01-17T07:19:23.677512: row id 35990001/40566930, ETA: 00m46s, 0 chunk(s) found different 2018-01-17T07:19:33.899286: row id 36990001/40566930, ETA: 00m36s, 0 chunk(s) found different 2018-01-17T07:19:44.120978: row id 37990001/40566930, ETA: 00m26s, 0 chunk(s) found different 2018-01-17T07:19:54.344143: row id 38990001/40566930, ETA: 00m16s, 0 chunk(s) found different 2018-01-17T07:20:04.566056: row id 39990001/40566930, ETA: 00m05s, 0 chunk(s) found different
And only print any possible differences like:
DIFFERENCE on db1089.eqiad.wmnet:3306: WHERE ct_id BETWEEN 40560001 AND 40566930
Now, with --print-every=0, only differences are printed (unless there is an error). Return value still must be checked to test if they are equal or there are differences:
./compare.py commonswiki page page_id db1111 db1112 --print-every=0 DIFFERENCE on db1112.eqiad.wmnet:3306: WHERE page_id BETWEEN 1 AND 10000
host:port not working?
root@neodymium:/home/marostegui# /home/jynus/compare.py enwiki change_tag ct_id db1089 db1099:3311 Traceback (most recent call last): File "/home/jynus/compare.py", line 184, in <module> main() File "/home/jynus/compare.py", line 74, in main connections = connect_in_parallel(options.hosts, options.database, options.threads) File "/home/jynus/compare.py", line 46, in connect_in_parallel mysql = async_result[host].get() KeyError: 'db1099:3311'
This should be fixed, I just changed internally references to e.g. db1099:3311 by db1099_3311 and assume we will never use _ in a hostname/dns:
./compare.py enwiki change_tag ct_id db1089 db1099:3311 Starting comparison between id 1 and 40627513 2018-01-18T10:45:59.628063: row id 990001/40627513, ETA: 06m44s, 0 chunk(s) found different 2018-01-18T10:46:09.936190: row id 1990001/40627513, ETA: 06m36s, 0 chunk(s) found different 2018-01-18T10:46:20.155734: row id 2990001/40627513, ETA: 06m25s, 0 chunk(s) found different 2018-01-18T10:46:30.377224: row id 3990001/40627513, ETA: 06m15s, 0 chunk(s) found different 2018-01-18T10:46:40.694896: row id 4990001/40627513, ETA: 06m05s, 0 chunk(s) found different 2018-01-18T10:46:50.915050: row id 5990001/40627513, ETA: 05m55s, 0 chunk(s) found different