
Move more wikis from s3 to s5
Open, Medium · Public

Description

As we did in the past at T184805: Move some wikis to s5, and with the DC failover possibly approaching, we should discuss whether we want to keep moving more wikis from s3 to s5 (or to s6) to reduce some of the load on s3.

Initial figures to find out which wikis could be moved:

root@db1115.eqiad.wmnet[zarcillo]> select file_path, sum(size) as size FROM backup_files where backup_id = 1901 GROUP BY file_path ORDER BY SIZE DESC LIMIT 20;
+---------------+-------------+
| file_path     | size        |
+---------------+-------------+
| ruwiktionary  | 24144430804 |
| mediawikiwiki | 22529077439 |
| loginwiki     | 19756739841 |
| plwiktionary  | 19098819796 |
| dewiktionary  | 18069974270 |
| frwikisource  | 15744003279 |
| enwikisource  | 15316723187 |
| euwiki        | 14257190411 |
| ruwikisource  | 13555986560 |
| dawiki        | 13537448066 |
| cewiki        | 13426855110 |
| enwikinews    | 13365490330 |
| hywiki        | 13267092634 |
| bewiki        | 11361132645 |
| mswiki        | 11148073926 |
| specieswiki   | 11104854093 |
| elwiki        | 10762890706 |
| warwiki       | 10698456384 |
| hiwiki        | 10219741148 |
| incubatorwiki |  9629467018 |
+---------------+-------------+
20 rows in set (0.12 sec)

root@db1115.eqiad.wmnet[zarcillo]> select file_path, sum(size) as size FROM backup_files where backup_id = 1897 GROUP BY file_path ORDER BY SIZE DESC LIMIT 20;
+---------------+-------------+
| file_path     | size        |
+---------------+-------------+
| mediawikiwiki | 21638206609 |
| ruwiktionary  | 20839107238 |
| loginwiki     | 18500473043 |
| plwiktionary  | 17091109030 |
| dewiktionary  | 16592277712 |
| frwikisource  | 14803931297 |
| enwikisource  | 13998713285 |
| ruwikisource  | 12573463634 |
| cewiki        | 12215604647 |
| dawiki        | 12198392916 |
| enwikinews    | 11925682228 |
| hywiki        | 11193870444 |
| euwiki        | 11147549149 |
| bewiki        | 10722511174 |
| warwiki       | 10589307154 |
| mswiki        |  9967837080 |
| specieswiki   |  9778234399 |
| elwiki        |  9734582692 |
| hiwiki        |  9031959470 |
| incubatorwiki |  8940258652 |
+---------------+-------------+
20 rows in set (0.12 sec)
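
The two result sets above come from two different backup runs, so the same metadata can also rank wikis by growth between runs. A minimal sketch, assuming backup_id 1897 is the older run and 1901 the newer run of the same section (an assumption based on the ordering above):

-- Hypothetical: per-wiki growth between two snapshot runs.
-- Wikis present in only one run are dropped by the inner join.
SELECT n.file_path, n.size - o.size AS growth_bytes
FROM (SELECT file_path, SUM(size) AS size
      FROM backup_files WHERE backup_id = 1901
      GROUP BY file_path) n
JOIN (SELECT file_path, SUM(size) AS size
      FROM backup_files WHERE backup_id = 1897
      GROUP BY file_path) o USING (file_path)
ORDER BY growth_bytes DESC
LIMIT 20;
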
root@db1115.eqiad.wmnet[zarcillo]> select SUBSTRING_INDEX(file_name, '.', 1) as db, sum(size) as size FROM backup_files WHERE backup_id = 1862 and file_name not like '%-schema-create.sql.gz' GROUP BY db ORDER BY size desc LIMIT 20;
+---------------+------------+
| db            | size       |
+---------------+------------+
| mediawikiwiki | 4539665654 |
| loginwiki     | 2829553543 |
| plwiktionary  | 2358407852 |
| ruwiktionary  | 2355786729 |
| frwikisource  | 2315194029 |
| dewiktionary  | 2192119974 |
| enwikisource  | 1933894489 |
| dawiki        | 1818772589 |
| sourceswiki   | 1795213418 |
| dewikivoyage  | 1495157942 |
| skwiki        | 1449949778 |
| euwiki        | 1377163347 |
| elwiki        | 1371151371 |
| bewiki        | 1303039764 |
| enwikinews    | 1267055352 |
| simplewiki    | 1266661189 |
| specieswiki   | 1241517305 |
| ruwikisource  | 1218843714 |
| hywiki        | 1213727020 |
| slwiki        | 1212863732 |
+---------------+------------+
20 rows in set (0.20 sec)

root@db1115.eqiad.wmnet[zarcillo]> select SUBSTRING_INDEX(file_name, '.', 1) as db, sum(size) as size FROM backup_files WHERE backup_id = 1866 and file_name not like '%-schema-create.sql.gz' GROUP BY db ORDER BY size desc LIMIT 20;
+---------------+------------+
| db            | size       |
+---------------+------------+
| mediawikiwiki | 4539656355 |
| loginwiki     | 2829590706 |
| plwiktionary  | 2358419554 |
| ruwiktionary  | 2355797164 |
| frwikisource  | 2315213285 |
| dewiktionary  | 2204128063 |
| enwikisource  | 1933912943 |
| dawiki        | 1818751991 |
| sourceswiki   | 1795214102 |
| dewikivoyage  | 1495165313 |
| skwiki        | 1449944125 |
| elwiki        | 1371124258 |
| euwiki        | 1371034939 |
| bewiki        | 1303054008 |
| enwikinews    | 1267056284 |
| simplewiki    | 1266639862 |
| hywiki        | 1247715009 |
| specieswiki   | 1241538353 |
| ruwikisource  | 1218846265 |
| slwiki        | 1212866336 |
+---------------+------------+
20 rows in set (0.21 sec)
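
The first pair of queries above measures physical sizes and this pair measures compressed dump files, so the two can be joined to estimate per-wiki compression ratios. A sketch, assuming backup_id 1901 is a physical run and 1862 a compressed dump of the same section taken around the same time (not verified here):

-- Hypothetical: physical vs compressed dump size per wiki.
SELECT p.wiki,
       p.size AS physical_bytes,
       d.size AS dump_bytes,
       ROUND(p.size / d.size, 1) AS ratio
FROM (SELECT file_path AS wiki, SUM(size) AS size
      FROM backup_files WHERE backup_id = 1901
      GROUP BY file_path) p
JOIN (SELECT SUBSTRING_INDEX(file_name, '.', 1) AS wiki, SUM(size) AS size
      FROM backup_files
      WHERE backup_id = 1862
        AND file_name NOT LIKE '%-schema-create.sql.gz'
      GROUP BY wiki) d USING (wiki)
ORDER BY p.size DESC
LIMIT 20;
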
root@db1075[sys]> select table_schema, sum(total_latency) as latency FROM x$schema_table_statistics GROUP BY table_schema ORDER BY latency DESC LIMIT 20;
+---------------+-------------------+
| table_schema  | latency           |
+---------------+-------------------+
| loginwiki     | 27408946574524658 |
| heartbeat     |  6598870761431520 |
| ruwiktionary  |  5763622545480800 |
| ruwikisource  |  5387585333028756 |
| enwikisource  |  4369249543450792 |
| ruwikinews    |  4348592997866546 |
| cywiki        |  4345551731211834 |
| dawiki        |  4010741717824270 |
| specieswiki   |  3772207160558122 |
| elwiki        |  3411122321623874 |
| urwiki        |  3097049706021414 |
| mediawikiwiki |  2947554293969428 |
| hiwiki        |  2758917533134906 |
| hywiki        |  2708470437012054 |
| azwiki        |  2464451878996822 |
| etwiki        |  2353731533405538 |
| cewiki        |  2123102752972630 |
| simplewiki    |  2120499753908536 |
| incubatorwiki |  2078290888311924 |
| mswiki        |  2075837350189746 |
+---------------+-------------------+
20 rows in set (0.14 sec)

root@db1075[sys]> select table_schema, sum(rows_fetched) as row_reads FROM x$schema_table_statistics GROUP BY table_schema ORDER BY row_reads DESC LIMIT 20;
+--------------+------------+
| table_schema | row_reads  |
+--------------+------------+
| heartbeat    | 1987401108 |
| ruwikinews   | 1178327443 |
| eswikinews   |  903064930 |
| ruwiktionary |  573260200 |
| enwikisource |  404222322 |
| dawiki       |  345920511 |
| elwiki       |  335306424 |
| ptwikinews   |  328507775 |
| urwiki       |  299433460 |
| specieswiki  |  295739411 |
| cywiki       |  292026525 |
| plwikisource |  261066955 |
| ruwikisource |  243940983 |
| simplewiki   |  225388431 |
| hiwiki       |  216349589 |
| etwiki       |  216315626 |
| azwiki       |  190692054 |
| mswiki       |  181966577 |
| hywiki       |  179103722 |
| zhwikisource |  175792365 |
+--------------+------------+
20 rows in set (0.14 sec)

root@db1075[sys]> select table_schema, sum(rows_inserted + rows_updated + rows_deleted) as row_writes FROM x$schema_table_statistics GROUP BY table_schema ORDER BY row_writes DESC LIMIT 20;
+---------------+------------+
| table_schema  | row_writes |
+---------------+------------+
| cywiki        |   33084444 |
| incubatorwiki |   24994139 |
| ruwikisource  |   21520204 |
| specieswiki   |   18228253 |
| loginwiki     |   17504340 |
| heartbeat     |   14354040 |
| elwiki        |   13954298 |
| cewiki        |   12531174 |
| azwiki        |   12429440 |
| ruwiktionary  |   12311758 |
| euwiki        |   11372492 |
| enwikisource  |   10696608 |
| dawiki        |   10316007 |
| hywiki        |    9536860 |
| dewiktionary  |    6881746 |
| plwikisource  |    6844363 |
| hiwiki        |    6397090 |
| bewiki        |    6359947 |
| ruwikinews    |    6149052 |
| ruwikiquote   |    5561521 |
+---------------+------------+
20 rows in set (0.14 sec)
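
Since reads and writes are ranked separately above, a single combined ranking can make the load candidates easier to spot. A sketch along the same lines, run from the sys schema as above, excluding the heartbeat schema (which is replication infrastructure, not a wiki):

-- Hypothetical: combined row read+write ranking per schema.
SELECT table_schema,
       SUM(rows_fetched) AS row_reads,
       SUM(rows_inserted + rows_updated + rows_deleted) AS row_writes,
       SUM(rows_fetched + rows_inserted + rows_updated + rows_deleted) AS total_rows
FROM x$schema_table_statistics
WHERE table_schema <> 'heartbeat'
GROUP BY table_schema
ORDER BY total_rows DESC
LIMIT 20;
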

Initially we could be looking at:

ruwiktionary
ruwikisource
mediawikiwiki
loginwiki
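
As a rough estimate of how much data would leave s3, the same backup metadata can be summed over just those four candidates. A sketch using the newer run above (backup_id 1901); from the figures already listed this works out to roughly 80 GB of physical data:

-- Hypothetical: total physical size of the four candidate wikis.
SELECT SUM(size) AS bytes_to_move
FROM backup_files
WHERE backup_id = 1901
  AND file_path IN ('ruwiktionary', 'ruwikisource', 'mediawikiwiki', 'loginwiki');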

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript · Jul 1 2019, 5:35 AM
Marostegui triaged this task as Medium priority. · Jul 1 2019, 5:35 AM
Marostegui moved this task from Triage to Backlog on the DBA board.
Marostegui moved this task from Backlog to Next on the DBA board. · Aug 1 2019, 1:55 PM

How is it possible that loginwiki is so big?

Reedy added a subscriber: Reedy. · Dec 22 2019, 4:40 PM

How is it possible that loginwiki is so big?

A quick look suggests it's about 20G, with nearly half of that being the logging table, and just over 4G for the user table.
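
For reference, one way to get that kind of per-table breakdown is information_schema; a sketch (note these are on-disk estimates for the live tables, not the backup sizes quoted elsewhere in this task):

-- Hypothetical: largest tables in loginwiki by on-disk size.
SELECT table_name,
       ROUND((data_length + index_length) / POW(1024, 3), 2) AS size_gb
FROM information_schema.TABLES
WHERE table_schema = 'loginwiki'
ORDER BY data_length + index_length DESC
LIMIT 5;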

@Marostegui Why is the compressed size bigger than the physical one? Or is it supposed to be the uncompressed size?

The largest tables on loginwiki are:

user
logging
actor
watchlist
spoofuser

Together they take 2.9 GB compressed, or 96% of the total compressed size. Note that this is small by our standards (large wikis take 300-400 GB compressed).

Why is the compressed size bigger than the physical one?

It is not larger. The current physical size is 21493084009 bytes, or around 20 GB, while compressed it takes 2936021259 bytes, or around 2.7 GB. This is typical of the expected ~5x compression ratio for text, plus other inefficiencies a live database keeps for performance, especially for smaller databases.
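
For the record, the overall ratio works out like this (plain arithmetic on the two figures above):

-- Physical vs compressed size of loginwiki, from the figures quoted above.
SELECT 21493084009 AS physical_bytes,
       2936021259  AS compressed_bytes,
       ROUND(21493084009 / 2936021259, 1) AS compression_ratio;  -- ~7.3x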

revi added a subscriber: revi. · Fri, Jan 24, 12:27 AM

How is it possible that loginwiki is so big?

All new accounts are auto-created on Meta, loginwiki, and mediawikiwiki, so it's quite natural that the DB is large, as virtually all users have an account on loginwiki.