Page MenuHomePhabricator

flaggedtemplates table is still too big
Closed, ResolvedPublic

Description

T289249: flaggedtemplates table should not keep the whole history of all revisions is done now but still flaggedtemplates table in arwiki (for example) is bigger than revision table of enwiki or wikidatawiki.

Two (mutually inclusive) actions that can be taken:

  • The current pruning script, deletes data on revisions that have 50 newer revision in a page (and are older than an hour). That is not the case for most revisions because they are spread among a lot of pages.
    • Make the script reduce that number to five revisions (instead of 50) for revisions older than a year.
  • Drop ft_namespace and ft_title columns.
    • the table mirrors templatelinks table (by picking up its worst parts), for templatelinks it makes sense to keep title instead of page_id or rev_id but for flaggedtemplates, we are keeping track of stable revisions of transcluded templates. That concept is useless here as a non-existent template doesn't have a stable version.
  • Fully truncate this table in ruwikisource (not needed after T226054)

Event Timeline

Marostegui moved this task from Triage to Refine on the DBA board.
Marostegui subscribed.

I would go for dropping those columns if they are not really needed.

Change 742170 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@master] Drop using ft_title and ft_namespace

https://gerrit.wikimedia.org/r/742170

I would go for dropping those columns if they are not really needed.

I managed to make it work like this locally.

Change 742170 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Drop using ft_title and ft_namespace

https://gerrit.wikimedia.org/r/742170

Change 742853 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.9] Drop using ft_title and ft_namespace

https://gerrit.wikimedia.org/r/742853

Change 742853 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.9] Drop using ft_title and ft_namespace

https://gerrit.wikimedia.org/r/742853

Mentioned in SAL (#wikimedia-operations) [2021-12-01T09:39:16Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevision.php: Backport: [[gerrit:742853|Drop using ft_title and ft_namespace (T296380)]] (duration: 00m 56s)

So far it looks good and there is nothing slow queries or massive increase in rows read, etc.

If there is no complaints by the next week, I move on to remove the columns from the database.

Change 743403 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@master] Drop ft_title and ft_namespace from flaggedtemplates table

https://gerrit.wikimedia.org/r/743403

Change 743403 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Drop ft_title and ft_namespace from flaggedtemplates table

https://gerrit.wikimedia.org/r/743403

Change 745325 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@master] Change logic of pruneChange to delete old revisions and not buried ones

https://gerrit.wikimedia.org/r/745325

Change 745325 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Change logic of pruneChange to allow deleting rows more flexibly

https://gerrit.wikimedia.org/r/745325

Change 745378 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.12] Change logic of pruneChange to allow deleting rows more flexibly

https://gerrit.wikimedia.org/r/745378

Change 745529 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@master] Fix the mistake in passing parameter

https://gerrit.wikimedia.org/r/745529

Change 745378 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.12] Change logic of pruneChange to allow deleting rows more flexibly

https://gerrit.wikimedia.org/r/745378

Mentioned in SAL (#wikimedia-operations) [2021-12-09T12:57:10Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.12/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:745378|Change logic of pruneChange to allow deleting rows more flexibly (T296380)]] (duration: 01m 05s)

Change 745529 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Fix the mistake in passing parameter

https://gerrit.wikimedia.org/r/745529

Change 745383 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.12] Fix the mistake in passing parameter

https://gerrit.wikimedia.org/r/745383

Change 745383 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.12] Fix the mistake in passing parameter

https://gerrit.wikimedia.org/r/745383

Mentioned in SAL (#wikimedia-operations) [2021-12-09T22:05:55Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.12/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:745383|Fix the mistake in passing parameter (T296380)]] (duration: 02m 11s)

I'm started to delete information on rows older than a year and having 5 reviewed revisions on top. It's gonna take a while.

Change 745883 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.9] Change logic of pruneChange to allow deleting rows more flexibly

https://gerrit.wikimedia.org/r/745883

Change 745883 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.9] Change logic of pruneChange to allow deleting rows more flexibly

https://gerrit.wikimedia.org/r/745883

Change 745884 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.9] Fix the mistake in passing parameter

https://gerrit.wikimedia.org/r/745884

Change 745884 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.9] Fix the mistake in passing parameter

https://gerrit.wikimedia.org/r/745884

Mentioned in SAL (#wikimedia-operations) [2021-12-13T07:35:44Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.9/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:745883|Change logic of pruneChange to allow deleting rows more flexibly (T296380)]] (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2021-12-13T07:39:59Z] <Amir1> start of clean up of flaggedtemplates on all flaggedrevs wikis: T296380

The clean up will take a month or so but since it was done in arwiki:

root@db2118:/srv/sqldata/arwiki# ls -Ssh
total 197G
 63G flaggedtemplates.ibd

After optimization:

root@db2118:/srv/sqldata/arwiki# ls -Ssh
total 157G
 27G pagelinks.ibd
 25G templatelinks.ibd
 22G flaggedtemplates.ibd

Before optimization it was the biggest table of s7.

We should truncate this table in ruwikisource: T226054

The clean up is still onging, gonna take a while.

Change 762097 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@master] Clean up flaggedtemplate rows for deleted pages too

https://gerrit.wikimedia.org/r/762097

Change 762097 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Clean up flaggedtemplate rows for deleted pages too

https://gerrit.wikimedia.org/r/762097

Change 762911 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.20] Clean up flaggedtemplate rows for deleted pages too

https://gerrit.wikimedia.org/r/762911

Change 762911 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.20] Clean up flaggedtemplate rows for deleted pages too

https://gerrit.wikimedia.org/r/762911

Change 762912 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.21] Clean up flaggedtemplate rows for deleted pages too

https://gerrit.wikimedia.org/r/762912

Change 762912 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.38.0-wmf.21] Clean up flaggedtemplate rows for deleted pages too

https://gerrit.wikimedia.org/r/762912

Mentioned in SAL (#wikimedia-operations) [2022-02-16T06:26:53Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.21/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:762912|Clean up flaggedtemplate rows for deleted pages too (T296380)]] (duration: 00m 52s)

Thanks. I continue doing the clean up.

Change 819111 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@master] pruneRevData: Make cleaning in larger batches

https://gerrit.wikimedia.org/r/819111

Change 819111 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] pruneRevData: Make cleaning in larger batches

https://gerrit.wikimedia.org/r/819111

Change 819077 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@wmf/1.39.0-wmf.22] pruneRevData: Make cleaning in larger batches

https://gerrit.wikimedia.org/r/819077

Change 819077 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@wmf/1.39.0-wmf.22] pruneRevData: Make cleaning in larger batches

https://gerrit.wikimedia.org/r/819077

Mentioned in SAL (#wikimedia-operations) [2022-08-02T06:56:24Z] <ladsgroup@deploy1002> Synchronized php-1.39.0-wmf.22/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:819077|pruneRevData: Make cleaning in larger batches (T296380)]] (duration: 03m 26s)

After many many months of running this and many optimizations on the way, I actually finally managed to finish running this everywhere. In just one run it dropped 1B rows from ruwiki. This number used to mean something. Anyway, we need a long-term solution but this immediate problem is addressed now.