
Move more wikis from s3 to s5
Open, Medium, Public

Description

As we did in the past in T184805: Move some wikis to s5, and with the DC failover possibly approaching, we should discuss whether we want to keep moving more wikis from s3 to s5 (or to s6) to reduce some load on s3.

Initial figures to find out which wikis could be moved:

root@db1115.eqiad.wmnet[zarcillo]> select file_path, sum(size) as size FROM backup_files where backup_id = 1901 GROUP BY file_path ORDER BY SIZE DESC LIMIT 20;
+---------------+-------------+
| file_path     | size        |
+---------------+-------------+
| ruwiktionary  | 24144430804 |
| mediawikiwiki | 22529077439 |
| loginwiki     | 19756739841 |
| plwiktionary  | 19098819796 |
| dewiktionary  | 18069974270 |
| frwikisource  | 15744003279 |
| enwikisource  | 15316723187 |
| euwiki        | 14257190411 |
| ruwikisource  | 13555986560 |
| dawiki        | 13537448066 |
| cewiki        | 13426855110 |
| enwikinews    | 13365490330 |
| hywiki        | 13267092634 |
| bewiki        | 11361132645 |
| mswiki        | 11148073926 |
| specieswiki   | 11104854093 |
| elwiki        | 10762890706 |
| warwiki       | 10698456384 |
| hiwiki        | 10219741148 |
| incubatorwiki |  9629467018 |
+---------------+-------------+
20 rows in set (0.12 sec)

root@db1115.eqiad.wmnet[zarcillo]> select file_path, sum(size) as size FROM backup_files where backup_id = 1897 GROUP BY file_path ORDER BY SIZE DESC LIMIT 20;
+---------------+-------------+
| file_path     | size        |
+---------------+-------------+
| mediawikiwiki | 21638206609 |
| ruwiktionary  | 20839107238 |
| loginwiki     | 18500473043 |
| plwiktionary  | 17091109030 |
| dewiktionary  | 16592277712 |
| frwikisource  | 14803931297 |
| enwikisource  | 13998713285 |
| ruwikisource  | 12573463634 |
| cewiki        | 12215604647 |
| dawiki        | 12198392916 |
| enwikinews    | 11925682228 |
| hywiki        | 11193870444 |
| euwiki        | 11147549149 |
| bewiki        | 10722511174 |
| warwiki       | 10589307154 |
| mswiki        |  9967837080 |
| specieswiki   |  9778234399 |
| elwiki        |  9734582692 |
| hiwiki        |  9031959470 |
| incubatorwiki |  8940258652 |
+---------------+-------------+
20 rows in set (0.12 sec)
root@db1115.eqiad.wmnet[zarcillo]> select SUBSTRING_INDEX(file_name, '.', 1) as db, sum(size) as size FROM backup_files WHERE backup_id = 1862 and file_name not like '%-schema-create.sql.gz' GROUP BY db ORDER BY size desc LIMIT 20;
+---------------+------------+
| db            | size       |
+---------------+------------+
| mediawikiwiki | 4539665654 |
| loginwiki     | 2829553543 |
| plwiktionary  | 2358407852 |
| ruwiktionary  | 2355786729 |
| frwikisource  | 2315194029 |
| dewiktionary  | 2192119974 |
| enwikisource  | 1933894489 |
| dawiki        | 1818772589 |
| sourceswiki   | 1795213418 |
| dewikivoyage  | 1495157942 |
| skwiki        | 1449949778 |
| euwiki        | 1377163347 |
| elwiki        | 1371151371 |
| bewiki        | 1303039764 |
| enwikinews    | 1267055352 |
| simplewiki    | 1266661189 |
| specieswiki   | 1241517305 |
| ruwikisource  | 1218843714 |
| hywiki        | 1213727020 |
| slwiki        | 1212863732 |
+---------------+------------+
20 rows in set (0.20 sec)

root@db1115.eqiad.wmnet[zarcillo]> select SUBSTRING_INDEX(file_name, '.', 1) as db, sum(size) as size FROM backup_files WHERE backup_id = 1866 and file_name not like '%-schema-create.sql.gz' GROUP BY db ORDER BY size desc LIMIT 20;
+---------------+------------+
| db            | size       |
+---------------+------------+
| mediawikiwiki | 4539656355 |
| loginwiki     | 2829590706 |
| plwiktionary  | 2358419554 |
| ruwiktionary  | 2355797164 |
| frwikisource  | 2315213285 |
| dewiktionary  | 2204128063 |
| enwikisource  | 1933912943 |
| dawiki        | 1818751991 |
| sourceswiki   | 1795214102 |
| dewikivoyage  | 1495165313 |
| skwiki        | 1449944125 |
| elwiki        | 1371124258 |
| euwiki        | 1371034939 |
| bewiki        | 1303054008 |
| enwikinews    | 1267056284 |
| simplewiki    | 1266639862 |
| hywiki        | 1247715009 |
| specieswiki   | 1241538353 |
| ruwikisource  | 1218846265 |
| slwiki        | 1212866336 |
+---------------+------------+
20 rows in set (0.21 sec)
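For anyone wanting to reproduce the per-database grouping outside zarcillo: the query above splits each dump file name on the first `.` to recover the database name, and excludes the tiny schema-create files. A toy sketch using sqlite3 (which has no `SUBSTRING_INDEX`, so `substr`/`instr` emulate it; the file names and sizes here are invented):

```python
# Toy reproduction of the grouping above: per-table dump files named
# "<db>.<table>.sql.gz" are summed per database. sqlite3 stands in for
# the real zarcillo metadata DB; names and sizes are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE backup_files (file_name TEXT, size INTEGER)")
conn.executemany("INSERT INTO backup_files VALUES (?, ?)", [
    ("loginwiki.user.sql.gz", 900),
    ("loginwiki.logging.sql.gz", 1500),
    ("dawiki.page.sql.gz", 700),
    ("loginwiki-schema-create.sql.gz", 10),  # excluded, as in the query above
])

# SQLite lacks SUBSTRING_INDEX; substr+instr achieve the same split on '.'
rows = conn.execute("""
    SELECT substr(file_name, 1, instr(file_name, '.') - 1) AS db,
           sum(size) AS size
    FROM backup_files
    WHERE file_name NOT LIKE '%-schema-create.sql.gz'
    GROUP BY db ORDER BY size DESC
""").fetchall()
print(rows)  # [('loginwiki', 2400), ('dawiki', 700)]
```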
root@db1075[sys]> select table_schema, sum(total_latency) as latency FROM x$schema_table_statistics GROUP BY table_schema ORDER BY latency DESC LIMIT 20;
+---------------+-------------------+
| table_schema  | latency           |
+---------------+-------------------+
| loginwiki     | 27408946574524658 |
| heartbeat     |  6598870761431520 |
| ruwiktionary  |  5763622545480800 |
| ruwikisource  |  5387585333028756 |
| enwikisource  |  4369249543450792 |
| ruwikinews    |  4348592997866546 |
| cywiki        |  4345551731211834 |
| dawiki        |  4010741717824270 |
| specieswiki   |  3772207160558122 |
| elwiki        |  3411122321623874 |
| urwiki        |  3097049706021414 |
| mediawikiwiki |  2947554293969428 |
| hiwiki        |  2758917533134906 |
| hywiki        |  2708470437012054 |
| azwiki        |  2464451878996822 |
| etwiki        |  2353731533405538 |
| cewiki        |  2123102752972630 |
| simplewiki    |  2120499753908536 |
| incubatorwiki |  2078290888311924 |
| mswiki        |  2075837350189746 |
+---------------+-------------------+
20 rows in set (0.14 sec)

root@db1075[sys]> select table_schema, sum(rows_fetched) as row_reads FROM x$schema_table_statistics GROUP BY table_schema ORDER BY row_reads DESC LIMIT 20;
+--------------+------------+
| table_schema | row_reads  |
+--------------+------------+
| heartbeat    | 1987401108 |
| ruwikinews   | 1178327443 |
| eswikinews   |  903064930 |
| ruwiktionary |  573260200 |
| enwikisource |  404222322 |
| dawiki       |  345920511 |
| elwiki       |  335306424 |
| ptwikinews   |  328507775 |
| urwiki       |  299433460 |
| specieswiki  |  295739411 |
| cywiki       |  292026525 |
| plwikisource |  261066955 |
| ruwikisource |  243940983 |
| simplewiki   |  225388431 |
| hiwiki       |  216349589 |
| etwiki       |  216315626 |
| azwiki       |  190692054 |
| mswiki       |  181966577 |
| hywiki       |  179103722 |
| zhwikisource |  175792365 |
+--------------+------------+
20 rows in set (0.14 sec)

root@db1075[sys]> select table_schema, sum(rows_inserted + rows_updated + rows_deleted) as row_writes FROM x$schema_table_statistics GROUP BY table_schema ORDER BY row_writes DESC LIMIT 20;
+---------------+------------+
| table_schema  | row_writes |
+---------------+------------+
| cywiki        |   33084444 |
| incubatorwiki |   24994139 |
| ruwikisource  |   21520204 |
| specieswiki   |   18228253 |
| loginwiki     |   17504340 |
| heartbeat     |   14354040 |
| elwiki        |   13954298 |
| cewiki        |   12531174 |
| azwiki        |   12429440 |
| ruwiktionary  |   12311758 |
| euwiki        |   11372492 |
| enwikisource  |   10696608 |
| dawiki        |   10316007 |
| hywiki        |    9536860 |
| dewiktionary  |    6881746 |
| plwikisource  |    6844363 |
| hiwiki        |    6397090 |
| bewiki        |    6359947 |
| ruwikinews    |    6149052 |
| ruwikiquote   |    5561521 |
+---------------+------------+
20 rows in set (0.14 sec)
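Taken together, the three views above (size on disk, total latency, row reads/writes) point at the same handful of wikis. Purely as an illustration, here is a hypothetical rank-sum over a small subset of the figures above; this is not the actual selection procedure, just a sketch of how a shortlist falls out of the numbers:

```python
# Rank wikis by each metric and sum the ranks; a lower total means a
# stronger candidate to move. Values are copied from the query output
# above (subset only); the rank-sum heuristic itself is illustrative.
size = {"ruwiktionary": 24144430804, "mediawikiwiki": 22529077439,
        "loginwiki": 19756739841, "dawiki": 13537448066}
latency = {"loginwiki": 27408946574524658, "ruwiktionary": 5763622545480800,
           "dawiki": 4010741717824270, "mediawikiwiki": 2947554293969428}
row_writes = {"loginwiki": 17504340, "ruwiktionary": 12311758,
              "dawiki": 10316007, "mediawikiwiki": 0}  # not in the top 20; 0 as placeholder

def ranks(metric):
    # 0 = highest value for this metric
    ordered = sorted(metric, key=metric.get, reverse=True)
    return {wiki: i for i, wiki in enumerate(ordered)}

totals = {}
for metric in (size, latency, row_writes):
    for wiki, r in ranks(metric).items():
        totals[wiki] = totals.get(wiki, 0) + r

shortlist = sorted(totals, key=totals.get)
print(shortlist)  # ruwiktionary and loginwiki come out on top
```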

Initially we could be looking at:

ruwiktionary
ruwikisource
mediawikiwiki
loginwiki

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jul 1 2019, 5:35 AM
Marostegui triaged this task as Medium priority. · Jul 1 2019, 5:35 AM
Marostegui moved this task from Triage to Backlog on the DBA board.
Marostegui moved this task from Backlog to Next on the DBA board. · Aug 1 2019, 1:55 PM
Kizule added a subscriber: Kizule. · Dec 22 2019, 4:33 PM

How is it possible that loginwiki is so big?

Reedy added a subscriber: Reedy. · Dec 22 2019, 4:40 PM
In T226950#5759466, @Zoranzoki21 wrote:

How is it possible that loginwiki is so big?

A quick look suggests it's about 20G, with nearly half of that being the logging table, and just over 4G for the user table.

@Marostegui Why is compressed size bigger than the physical? Or is it supposed to be uncompressed size?

The largest tables on loginwiki are:

user
logging
actor
watchlist
spoofuser

Together they take 2.9 GB compressed, or 96% of the total compressed size. Note that this is small by our standards (large wikis take 300-400GB compressed).

Why is compressed size bigger than the physical

It is not larger. The current physical size is 21493084009 bytes, or around 20GB, while compressed it takes 2936021259 bytes, or around 2.7GB. This is typical: the ~5x compression ratio expected for text, plus other inefficiencies kept on a live database for performance reasons, especially for smaller databases.
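For illustration, the arithmetic behind those figures (byte counts copied from the comment above; GB here means GiB):

```python
# Sanity check of the loginwiki figures quoted above: physical
# (uncompressed, on-disk) size vs. compressed backup size.
physical_bytes = 21493084009   # current live size quoted above
compressed_bytes = 2936021259  # compressed backup size quoted above

ratio = physical_bytes / compressed_bytes
print(f"physical   ~ {physical_bytes / 1024**3:.1f} GB")    # ~ 20.0 GB
print(f"compressed ~ {compressed_bytes / 1024**3:.1f} GB")  # ~ 2.7 GB
print(f"ratio      ~ {ratio:.1f}x")  # ~ 7.3x: the ~5x text compression plus live-DB overhead
```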

revi added a subscriber: revi. · Jan 24 2020, 12:27 AM
In T226950#5759466, @Zoranzoki21 wrote:

How is it possible that loginwiki is so big?

All new accounts are auto-created on Meta, LoginWiki, and mediawikiwiki; so it's quite natural that the DB is quite large, as virtually all users have an account on loginwiki.

Ok, thanks everyone for explaining!

What is the situation today?

In T226950#6083247, @Zoranzoki21 wrote:

What is the situation today?

For loginwiki?

Top used tables:

-rw-rw---- 1 mysql mysql 2.3G Apr 27 05:35 spoofuser.ibd
-rw-rw---- 1 mysql mysql 2.6G Apr 27 05:35 user_properties.ibd
-rw-rw---- 1 mysql mysql 3.2G Apr 27 05:35 actor.ibd
-rw-rw---- 1 mysql mysql 4.9G Apr 27 05:35 watchlist.ibd
-rw-rw---- 1 mysql mysql 7.3G Apr 27 05:35 user.ibd
-rw-rw---- 1 mysql mysql  13G Apr 27 05:35 logging.ibd
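A quick back-of-the-envelope sum of the rounded sizes in that listing:

```python
# Approximate total of the top loginwiki tables listed above, using the
# rounded sizes from the ls output (G here means GiB, as ls -h reports).
sizes_gb = {"spoofuser": 2.3, "user_properties": 2.6, "actor": 3.2,
            "watchlist": 4.9, "user": 7.3, "logging": 13.0}
total = sum(sizes_gb.values())
print(f"top tables together ~ {total:.1f}G")  # ~ 33.3G
```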

I mean generally, not just for loginwiki.

We haven't moved any more s3 wikis to s5.
It is a complex procedure and it is normally best tried when we have a DC switchover scheduled.


Yes, I know that it is complicated.

s3, s5 and s6 are mentioned here. What is with s2?

What does "What is with..." mean exactly? What's your actual underlying question / intention?
(This task is about moving from s3 to s5, not s2. For general info, please see https://wikitech.wikimedia.org/wiki/MariaDB#Core_MediaWiki_databases )

Marostegui moved this task from Next to Backlog on the DBA board. · Oct 13 2020, 11:20 AM
Marostegui moved this task from Backlog to Ready on the DBA board. · Nov 27 2020, 12:47 PM