
flaggedtemplates table should not keep the whole history of all revisions
Closed, Resolved · Public

Description

The flaggedtemplates table, which keeps track of the templates used in stable versions, keeps that tracking data forever. In other words, it is the templatelinks table, but for the whole history of the wiki. That is why in ruwiki it is 168GB, bigger than all other ruwiki tables combined. Most of these rows are not needed and should simply be deleted.

Fixing this would drastically reduce the size of s6 and possibly of lots of other places.

Event Timeline


In ruwiki it's 1.9 billion rows. Jeez.

Out of 1859153710 rows, 1741198491 of them can be deleted. That's 94% of the table. Very likely only 11GB will remain.
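
As a quick check of the arithmetic above, here is a small sketch that recomputes the deletable share and the estimated remaining size. It assumes the on-disk size scales linearly with the row count, which is only an approximation (indexes and fragmentation are ignored).

```python
total_rows = 1_859_153_710      # rows in ruwiki's flaggedtemplates (from the comment above)
deletable_rows = 1_741_198_491  # rows that can be deleted (from the comment above)
table_size_gb = 168             # table size quoted in the task description

deletable_fraction = deletable_rows / total_rows

# Assumption: size shrinks in proportion to the number of rows removed.
remaining_gb = table_size_gb * (1 - deletable_fraction)

print(f"deletable share: {deletable_fraction:.1%}")
print(f"estimated remaining size: {remaining_gb:.1f} GB")
```

Rounded to whole numbers, this agrees with the 94% and 11GB figures quoted above.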

I can write a patch that removes the old entries of a page, but at this scale running it would bring down the whole wiki.

Mentioned in SAL (#wikimedia-operations) [2021-08-19T15:50:04Z] <Amir1> test2wiki)> delete from flaggedtemplates where ft_rev_id not in (select fp_stable from flaggedpages); (T289249)
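
To illustrate what the one-off query run on test2wiki does, here is a minimal sketch using Python's sqlite3 with an in-memory database. The table layouts are heavily simplified assumptions (only the columns the query touches, plus an illustrative `ft_title`), and the sample rows are made up. The point is only that every flaggedtemplates row whose revision is not some page's current stable revision gets deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical schemas: not the real FlaggedRevs table definitions
    CREATE TABLE flaggedpages (fp_page_id INTEGER PRIMARY KEY, fp_stable INTEGER NOT NULL);
    CREATE TABLE flaggedtemplates (ft_rev_id INTEGER NOT NULL, ft_title TEXT NOT NULL);

    -- Two pages whose current stable revisions are 30 and 45
    INSERT INTO flaggedpages VALUES (1, 30), (2, 45);

    -- Template tracking rows for both old and current stable revisions
    INSERT INTO flaggedtemplates VALUES
        (10, 'Infobox'), (20, 'Infobox'), (30, 'Infobox'),
        (40, 'Citation'), (45, 'Citation');
""")

# Same shape as the query from the log entry above
cur = conn.execute(
    "DELETE FROM flaggedtemplates "
    "WHERE ft_rev_id NOT IN (SELECT fp_stable FROM flaggedpages)"
)
conn.commit()

deleted = cur.rowcount
remaining = [row[0] for row in conn.execute(
    "SELECT ft_rev_id FROM flaggedtemplates ORDER BY ft_rev_id"
)]
print(deleted, remaining)
```

A single unbatched statement like this is fine on a tiny wiki such as test2wiki, but it is the kind of delete that would be far too heavy on a table with billions of rows.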

There is a PruneFRIncludeData script to clean these up. Probably could use a cron.

Mentioned in SAL (#wikimedia-operations) [2021-08-19T21:20:33Z] <Amir1> ladsgroup@mwmaint2002:~$ mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=huwiki --prune (T289249)

It currently errors out:

ladsgroup@mwmaint2002:~$ mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=huwiki --prune
Pruning old flagged revision inclusion data...
...doing fp_page_id from 1 to 500
InvalidArgumentException from line 2583 of /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/database/Database.php: Wikimedia\Rdbms\Database::makeList: empty input for field ft_rev_id
#0 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/database/Database.php(3565): Wikimedia\Rdbms\Database->makeList(Array, 1)
#1 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->delete('`flaggedtemplat...', Array, 'PruneFRIncludeD...')
#2 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/database/DBConnRef.php(526): Wikimedia\Rdbms\DBConnRef->__call('delete', Array)
#3 /srv/mediawiki/php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php(119): Wikimedia\Rdbms\DBConnRef->delete('flaggedtemplate...', Array, 'PruneFRIncludeD...')
#4 /srv/mediawiki/php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php(37): PruneFRIncludeData->pruneFlaggedRevs('1', 1)
#5 /srv/mediawiki/php-1.37.0-wmf.19/maintenance/doMaintenance.php(108): PruneFRIncludeData->execute()
#6 /srv/mediawiki/php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php(164): require_once('/srv/mediawiki/...')
#7 /srv/mediawiki/multiversion/MWScript.php(116): require_once('/srv/mediawiki/...')
#8 {main}
[83cf01f29d71f7e671f5ebe9] [no req]   Wikimedia\Rdbms\DBTransactionError: Explicit transaction still active. A caller may have caught an error. Open transactions: 
Backtrace:
from /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/database/Database.php(1622)
#0 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1753): Wikimedia\Rdbms\Database->assertNoOpenTransactions()
#1 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/loadbalancer/LoadBalancer.php(2285): Wikimedia\Rdbms\LoadBalancer->Wikimedia\Rdbms\{closure}(Wikimedia\Rdbms\DatabaseMysqli)
#2 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1777): Wikimedia\Rdbms\LoadBalancer->forEachOpenMasterConnection(Closure)
#3 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1783): Wikimedia\Rdbms\LoadBalancer->approvePrimaryChanges(array, string, integer)
#4 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/lbfactory/LBFactory.php(249): Wikimedia\Rdbms\LoadBalancer->approveMasterChanges(array, string, integer)
#5 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/lbfactory/LBFactoryMulti.php(236): Wikimedia\Rdbms\LBFactory::Wikimedia\Rdbms\{closure}(Wikimedia\Rdbms\LoadBalancer, string, array)
#6 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/lbfactory/LBFactory.php(251): Wikimedia\Rdbms\LBFactoryMulti->forEachLB(Closure, array)
#7 /srv/mediawiki/php-1.37.0-wmf.19/includes/libs/rdbms/lbfactory/LBFactory.php(310): Wikimedia\Rdbms\LBFactory->forEachLBCallMethod(string, array)
#8 /srv/mediawiki/php-1.37.0-wmf.19/maintenance/includes/Maintenance.php(1242): Wikimedia\Rdbms\LBFactory->commitMasterChanges(string)
#9 /srv/mediawiki/php-1.37.0-wmf.19/maintenance/doMaintenance.php(130): Maintenance->shutdown()
#10 /srv/mediawiki/php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php(164): require_once(string)
#11 /srv/mediawiki/multiversion/MWScript.php(116): require_once(string)
#12 {main}

Change 714169 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/extensions/FlaggedRevs@master] Avoid calling delete() with empty arrays in PruneFRIncludeData

https://gerrit.wikimedia.org/r/714169

Change 714169 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Avoid calling delete() with empty arrays in PruneFRIncludeData

https://gerrit.wikimedia.org/r/714169
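
Going by the change title and the stack trace above, the failure happens when a page-ID batch contains nothing to prune and delete() is still called with an empty ft_rev_id list. The sketch below shows the general guard pattern in Python with sqlite3. It is not the actual PHP patch: `prune_batch`, the simplified schemas, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

def prune_batch(conn, start_page_id, end_page_id):
    """Prune flaggedtemplates rows of non-stable revisions for one page-ID range."""
    rev_ids = [row[0] for row in conn.execute(
        "SELECT fr_rev_id FROM flaggedrevs "
        "JOIN flaggedpages ON fp_page_id = fr_page_id "
        "WHERE fr_page_id BETWEEN ? AND ? AND fr_rev_id <> fp_stable",
        (start_page_id, end_page_id),
    )]
    if not rev_ids:
        # Guard: nothing outdated in this range, so issue no DELETE at all.
        # This is the analogue of the empty ft_rev_id list in the trace above.
        return 0
    placeholders = ",".join("?" * len(rev_ids))
    cur = conn.execute(
        f"DELETE FROM flaggedtemplates WHERE ft_rev_id IN ({placeholders})",
        rev_ids,
    )
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical schemas and sample data
    CREATE TABLE flaggedpages (fp_page_id INTEGER PRIMARY KEY, fp_stable INTEGER);
    CREATE TABLE flaggedrevs (fr_page_id INTEGER, fr_rev_id INTEGER);
    CREATE TABLE flaggedtemplates (ft_rev_id INTEGER, ft_title TEXT);
    INSERT INTO flaggedpages VALUES (1, 30);
    INSERT INTO flaggedrevs VALUES (1, 10), (1, 20), (1, 30);
    INSERT INTO flaggedtemplates VALUES (10, 'Infobox'), (20, 'Infobox'), (30, 'Infobox');
""")

first = prune_batch(conn, 1, 500)      # page 1 has two outdated flagged revisions
second = prune_batch(conn, 501, 1000)  # no pages in this range: the guard returns early
print(first, second)
```

Returning early keeps the batch loop going instead of aborting the whole run on the first empty range.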

Change 714177 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@master] Add extra sleep option between each batch in pruneRevData.php

https://gerrit.wikimedia.org/r/714177

Change 714151 had a related patch set uploaded (by Ladsgroup; author: Aaron Schulz):

[mediawiki/extensions/FlaggedRevs@wmf/1.37.0-wmf.19] Avoid calling delete() with empty arrays in PruneFRIncludeData

https://gerrit.wikimedia.org/r/714151

Change 714151 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.37.0-wmf.19] Avoid calling delete() with empty arrays in PruneFRIncludeData

https://gerrit.wikimedia.org/r/714151

Mentioned in SAL (#wikimedia-operations) [2021-08-23T07:28:14Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714151|Avoid calling delete() with empty arrays in PruneFRIncludeData (T289249)]] (duration: 00m 59s)

Change 714152 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@wmf/1.37.0-wmf.19] Add extra sleep option between each batch in pruneRevData.php

https://gerrit.wikimedia.org/r/714152

Change 714177 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Add extra sleep option between each batch in pruneRevData.php

https://gerrit.wikimedia.org/r/714177

Change 714152 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.37.0-wmf.19] Add extra sleep option between each batch in pruneRevData.php

https://gerrit.wikimedia.org/r/714152

Mentioned in SAL (#wikimedia-operations) [2021-08-23T09:56:40Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714152|Add extra sleep option between each batch in pruneRevData.php (T289249)]] (duration: 00m 58s)
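
The idea behind a sleep option between batches can be sketched as a generic loop like the one below. This is an illustration in Python, not the code in pruneRevData.php: `run_in_batches` and its parameters are hypothetical names, and the assumption is that the pause exists to spread the delete load over time so the database is not overwhelmed.

```python
import time

def run_in_batches(max_id, batch_size, sleep_seconds, process_batch, sleep=time.sleep):
    """Call process_batch(start, end) over IDs 1..max_id, pausing between batches."""
    total = 0
    start = 1
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        total += process_batch(start, end)
        start = end + 1
        # Pause only if another batch follows and a positive delay was requested.
        if start <= max_id and sleep_seconds > 0:
            sleep(sleep_seconds)
    return total

ranges = []
# Demo with no delay: record each range and pretend one row was pruned per batch.
pruned = run_in_batches(12, 5, 0, lambda s, e: ranges.append((s, e)) or 1)
print(ranges, pruned)
```

Smaller batches with a longer pause trade total run time for a lower sustained write rate, which matters on tables this large.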

Per my IRC chat with Amir, this is the list of wikis sorted by flaggedtemplates table size:

root@db1159.eqiad.wmnet[dbbackups]> select max(size), file_path FROM backup_files where file_name like '%flaggedtemplates%' and backup_id > 9000 GROUP BY file_path ORDER BY max(size) desc limit 100;
+-------------+---------------------------+
| max(size)   | file_path                 |
+-------------+---------------------------+
| 89481281536 | arwiki                    |
| 75040292864 | ruwiki                    |
| 55234789376 | dewiki                    |
| 33818673152 | plwiki                    |
| 23978835968 | ukwiki                    |
| 13723762688 | huwiki                    |
| 13274972160 | trwiki                    |
|  8514437120 | cewiki                    |
|  7759462400 | plwiktionary              |
|  7667187712 | idwiki                    |
|  6484393984 | dewiktionary              |
|  4366270464 | fiwiki                    |
|  3850371072 | ruwikinews                |
|  3649044480 | ruwiktionary              |
|  2332033024 | ruwikisource              |
|  2160066560 | bewiki                    |
|  1946157056 | mkwiki                    |
|  1807745024 | eowiki                    |
|  1400897536 | kawiki                    |
|  1312817152 | vecwiki                   |
|  1009754433 |                           |
|   918552576 | bswiki                    |
|   696254464 | plwikisource              |
|   369098752 | alswiki                   |
|   327155712 | sqwiki                    |
|   293601280 | ukwiktionary              |
|   255852544 | enwikibooks               |
|   218103808 | eswikinews                |
|   192937984 | frwikinews                |
|   155189248 | ptwikinews                |
|   134217728 | enwikinews                |
|    71303168 | iswiktionary              |
|    67108864 | ptwikisource              |
|    46137344 | hewikisource              |
|    31457280 | ruwikiquote               |
|    27262976 | cawikinews                |
|    13631488 | trwikiquote               |
|    11534336 | iawiki                    |
|    10485760 | fawikinews                |
|     9437184 | ptwikibooks               |
|     7340032 | tawikinews                |
|     7340032 | lawikisource              |
|     7340032 | zh_classicalwiki          |
|     7340032 | elwikinews                |
|     6291456 | test2wiki                 |
|     5242880 | hiwiki                    |
|     5242880 | flaggedrevs_labswikimedia |
|     5242880 | dewikiquote               |
|      163840 | bnwiki                    |
|      147456 | en_labswikimedia          |
|       90112 | de_labswikimedia          |
|       65536 | enwiki                    |
|       65536 | siwiki                    |
|       65536 | ckbwiki                   |
|       65536 | frwiki                    |
|       65536 | bawiki                    |
|       65536 | fawiki                    |
+-------------+---------------------------+
57 rows in set (28.824 sec)

Mentioned in SAL (#wikimedia-operations) [2021-08-24T08:51:53Z] <Amir1> start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=arwiki --prune --batch-size=5 --sleep=5 (T289249)

Mentioned in SAL (#wikimedia-operations) [2021-08-24T08:54:20Z] <Amir1> start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=dewiki --prune --batch-size=5 --sleep=5 (T289249)

Mentioned in SAL (#wikimedia-operations) [2021-08-26T14:24:14Z] <Amir1> start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=plwiki --prune --batch-size=10 --sleep=2 (T289249)

From that list:

Done:

  • huwiki
  • ruwiki
  • cewiki
  • plwiki
  • arwiki (s7)
  • plwiktionary (s3)
  • idwiki (s3)
  • trwiki (s2)
  • dewiktionary (s3)
  • ruwikinews (s3)
  • ruwiktionary (s3)
  • ruwikisource (s3)
  • fiwiki (s2)
  • ukwiki (s7)
  • dewiki (s5)
  • bewiki (s3)
  • mkwiki (s3)
  • eowiki (s3)
  • kawiki (s3)
  • vecwiki (s3)

Two stats:

  • arwiki's flaggedtemplates is bigger than the biggest table of enwiki 🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️
  • Cleaning ruwiki caused the database backup size of ALL of s6 (ruwiki, frwiki, jawiki) to decrease by 7%

So this is now cleaned everywhere. I would like to know the size of the backups, maybe next week.

This is done. I'll create a ticket to add this to a regular job.