
Drop extensions from closed wikis where the database tables are unused
Open, Needs Triage (Public)

Description

Summary

Closed wikis are unlikely to be re-opened and so keeping extensions which have no use on the wikis provides no extra value. Additionally, the extensions with database tables add to the issue of too many database tables in a cluster.

We can undeploy extensions where their database tables have no rows (or possibly where these rows provide no historical value)

Background

  • Extensions like AbuseFilter, CheckUser, FlaggedRevs, etc. are only useful while the wiki is open
    • While in some cases they may have some data, their database tables are likely empty or hold no data which is useful to keep long term
  • There are issues that s3 is approaching too many database tables
    • Cutting the number of database tables would help push back this issue and would avoid the need to create a new section

Acceptance criteria

  • Explore which extensions add tables and are on closed wikis
  • Decide which of these extensions can be safely undeployed from which closed wikis
  • Undeploy the extensions from the closed wikis and drop the database tables once this is stably achieved
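
As a rough illustration of the first two criteria, the sketch below picks out (wiki, extension) pairs whose tables are all empty. It assumes per-table row counts for closed wikis have already been gathered by some other means; the `ROW_COUNTS` data, the wiki names, and the `undeploy_candidates` helper are hypothetical and are not part of any existing tooling.

```python
from collections import defaultdict

# Hypothetical input: (wiki, extension, table) -> row count, gathered beforehand
# for closed wikis only. Wiki names and counts are made up for illustration.
ROW_COUNTS = {
    ("examplewiki", "AbuseFilter", "abuse_filter"): 0,
    ("examplewiki", "AbuseFilter", "abuse_filter_log"): 0,
    ("otherwiki", "AbuseFilter", "abuse_filter"): 3,
    ("otherwiki", "AbuseFilter", "abuse_filter_log"): 120,
    ("otherwiki", "CheckUser", "cu_log"): 0,
}


def undeploy_candidates(row_counts):
    """Return sorted (wiki, extension) pairs whose tables all have zero rows."""
    totals = defaultdict(int)
    for (wiki, extension, _table), rows in row_counts.items():
        totals[(wiki, extension)] += rows
    return sorted(pair for pair, total in totals.items() if total == 0)


if __name__ == "__main__":
    for wiki, extension in undeploy_candidates(ROW_COUNTS):
        print(f"{wiki}: {extension} has only empty tables")
```

A pair only becomes a candidate when every one of its tables is empty, so a wiki with an empty `abuse_filter` table but a populated `abuse_filter_log` table is left alone.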

Event Timeline

Extensions which may be able to be dropped (these have database tables):

  • CheckUser - This will only log actions if stewards take them on the wiki, as editing is disabled (with stewards able to bypass if necessary). It seems unnecessary however for a steward to check one of these wikis
  • AbuseFilter - As editing is disabled, there is no need to keep AbuseFilter around. While the Special:AbuseLog page may have historical entries, it doesn't appear necessary to keep these entries around on wikis that have been closed for years. Wikis with no Special:AbuseLog entries don't need the extension installed, as there is no historical data to compare to
  • DiscussionTools - Only used for wikis that can be edited, so wikis that have been closed long enough are unlikely to need the extension
  • GlobalBlocking - Because editing and account creation are disabled, there should be no need to have GlobalBlocking global blocks applied on the wiki
  • MediaWiki-extensions-SecurePoll - Assuming no local elections ever existed, there should be no need for those wikis to have the tables or the interface to view elections
  • AntiSpoof - As account creation is disabled, there should be no need for it. However, MediaWiki-extensions-CentralAuth requires the extension and we would not be able to disable MediaWiki-extensions-CentralAuth on those wikis (so this probably has to be left)
  • BetaFeatures - Only useful for users who are actively using the wiki to edit, as experiments are often based around editing or interfaces which are only really used for a wiki where edits are allowed.
  • Wikifunctions - Client wikis where the database table is empty should have that extension uninstalled. (It seems no closed wikis have WikiLambda installed.)
  • ORES - Once a wiki has been closed for long enough, there shouldn't be a need to view ORES data for edits (as they should be stable enough to not need patrolling through a steward editing the wiki)
  • FlaggedRevs - Closed wikis cannot be edited, so there is no need for protections on pages or for a system of reviewing edits. (It seems no closed wikis have FlaggedRevs.)

from the task description

Closed wikis are unlikely to be re-opened and so keeping extensions which have no use on the wikis provides no extra value.

Maybe I'm misunderstanding something, but this sounds like quite a categorical statement that I'm not currently (personally) confident in. For example, AIUI, extensions can define their own log-types; the logs for which may not be as easily viewable onwiki if such an extension is just uninstalled without any mitigations (e.g. T89426: Define LiquidThreads logs config and i18n messages in WikimediaMessages ahead of uninstalling LiquidThreads for wikis that're having LQT undeployed). For this sort of example, undeploying such an extension seems like it'd in turn impact the ability to inspect the history of a now-closed wiki.

I'm therefore not currently confident that many extensions can be automatically determined to have no extra value simply because the wiki they're installed on is closed.

For wikis closed for years, are we sure that they would ever be reopened / this kind of data be inspected? Perhaps it's better to limit this to only apply a year after their closure?

I've specifically raised extensions, like CheckUser, which are unlikely to have public logs which would be useful to inspect beyond the closure of the wiki

I've been discussing this with @Ladsgroup (DBA) who has been saying the number of wikis on s3 is causing issues because the number of tables is reaching the maximum for the section (due to file handle limits). Dropping tables on wikis where the wiki is not being used again and the tables are empty seems useful. The alternative would be to split the section (which brings extra database infra cost to handle)

For extensions which provide public logs, it would make sense to add something to WikimediaMessages to handle these logs, but we would still need to undeploy the extensions to be able to drop the database tables

The list above is just my first pass on this. I'll make sure that the extensions are fine to undeploy before actually doing anything, and I intend to delete the database tables only once it's been confirmed that doing so is fine. If you have specific extensions that shouldn't be undeployed, then please do raise them and I can drop them from the proposed list

I'd keep AbuseFilter, just because I think dropping historical data is scary and contrary to the wiki way. And, unlike the others, AbuseFilter stores canonical data rather than derived data, whereas for all of the others no data would actually be lost.

(FlaggedRevs technically stores canonical data but the logging table also stores a copy of it so I'm not worried there)

Sure, though it doesn't feel necessary to skip wikis where the AbuseFilter DB tables are empty :D. Thanks for the comment

If the AbuseFilter db tables are all empty then feel free to drop them.

In T420052#11708969, @Dreamy_Jazz wrote [quoted in a mixed order]:

I've been discussing this with @Ladsgroup (DBA) who has been saying the number of wikis on s3 is causing issues because the number of tables is reaching the maximum for the section (due to file handle limits).

Okay -- I will be honest that I don't think my brain actually took in the bit about database load written in the task description before I left my previous comment, even though it is quite clearly included within the description. I apologise for that.
However, I would still say that FWICS, the task description saying that these extensions provide 'no extra value' may be an oversimplification IMHO. IMO it would be better if - where e.g. removing an extension would include trade-offs - those proposed trade-offs are described, rather than stating in broad terms that the extensions just provide no extra value.

Dropping tables on wikis where the wiki is not being used again and the tables are empty seems useful.

I'm not familiar with all the referenced extensions so I can't say for certain for all of them, but if the tables are empty then that seems potentially better on its face (I assume that, in that case, there is no historical data that would be permanently deleted). For extensions that e.g. modify the UX in some way (e.g. by defining an extra special page), it might mess a bit with the historical accuracy of visiting a closed wiki, as e.g. a special-page that would've been viewable when the wiki was open would no longer exist on the (currently-closed) wiki. But maybe that's e.g. an acceptable trade-off, I'm not sure.

For wikis closed for years, are we sure that they would ever be reopened? Perhaps it's better to limit this to only apply a year after their closure?

For extensions which provide public logs, it would make sense to add something to WikimediaMessages to handle these logs, but we would still need to undeploy the extensions to be able to drop the database tables

I guess I'm concerned about deleting data forever, even from wikis that've been closed for a while (and which may never be reopened). I suppose, for one reason, I worry in case it would be effectively tampering with the historical / preserved record of these wikis (I suppose, potentially similarly to @Pppery above). But I will let others opine :)

So, for AbuseFilter we have 113 closed wikis with no rows in abuse_filter_log, and 30 closed wikis that do have rows. Of the wikis that do have rows, there are some where the logged abuse dates from 2014.

Perhaps we can focus on dropping when the database tables have no rows for the time being? I feel like that will still have an impact

Dreamy_Jazz renamed this task from Drop extensions adding database tables which are unused on closed wikis to Drop extensions where database tables have no rows on closed wikis. Fri, Mar 13, 10:47 PM
Dreamy_Jazz updated the task description.

Change #1251582 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[operations/mediawiki-config@master] Uninstall AbuseFilter from closed wikis with no AbuseFilter logs

https://gerrit.wikimedia.org/r/1251582

Dreamy_Jazz renamed this task from Drop extensions where database tables have no rows on closed wikis to Drop extensions from closed wikis where the database tables are unused. Sat, Mar 14, 10:40 AM

Change #1251888 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[operations/mediawiki-config@master] DiscussionTools: Uninstall from closed wikis

https://gerrit.wikimedia.org/r/1251888

AntiSpoof is required by MediaWiki-extensions-CentralAuth, and my understanding is that this requirement was added to avoid complications in MediaWiki-extensions-CentralAuth. Therefore, I'll skip that extension for now.

In T420052#11709099, @Dreamy_Jazz updated the task description (my added emphasis):

We can undeploy extensions where their database tables have no rows (or possibly where these rows provide no historical value)

I would personally just advise caution in general around making this sort of determination for a given set of data. I fear that it may be difficult to accurately determine in the present whether any historical records may/may not be likely to provide historical value to anyone in the future.


from the task description
  • There are issues that s3 is approaching too many database tables
    • Cutting the number of database tables would help push back this issue and would avoid the need to create a new section

Out of interest, is there a certain/rough number of tables that DBAs would like to reduce s3 by at least? Just in case there is an ideal sort of number that we can be aware of around this.

In T420052#11709099, @Dreamy_Jazz updated the task description (my added emphasis):

We can undeploy extensions where their database tables have no rows (or possibly where these rows provide no historical value)

I would personally just advise caution in general around making this sort of determination for a given set of data. I fear that it may be difficult to accurately determine in the present whether any historical records may/may not be likely to provide historical value to anyone in the future.

Sure. The only places I'm actively considering dropping any tables which are not empty are:

  1. CheckUser result tables (i.e. any checkuser table that is not cu_log)
  2. DiscussionTools tables on wikis that were closed before the permalink feature existed (per Matamarex's comments on gerrit)
  3. MediaWiki-extensions-SecurePoll tables (these only have global elections for all closed wikis, so this data is essentially a duplicate)

The rest are empty tables that would be dropped.
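
Read as a rule, the list above might be sketched as the predicate below. This is only an illustration of the reasoning in this comment, not the actual procedure used: `may_drop`, its parameters, and the `closed_before_permalinks` flag are hypothetical names, and the table names in the usage example are just sample inputs.

```python
def may_drop(extension, table, rows, closed_before_permalinks=False):
    """Sketch of the rule: empty tables can go; non-empty ones only in three cases."""
    if rows == 0:
        # Empty tables hold no historical data.
        return True
    if extension == "CheckUser":
        # Non-empty CheckUser result tables may go, but cu_log is kept.
        return table != "cu_log"
    if extension == "DiscussionTools":
        # Only on wikis closed before the permalink feature existed.
        return closed_before_permalinks
    if extension == "SecurePoll":
        # Closed wikis only hold global elections, so the data is a duplicate.
        return True
    return False


if __name__ == "__main__":
    print(may_drop("CheckUser", "cu_log", 10))
    print(may_drop("CheckUser", "cu_changes", 10))
    print(may_drop("AbuseFilter", "abuse_filter_log", 5))
```

Anything not covered by one of the three listed cases falls through to `False`, which matches the intent that non-empty tables are otherwise kept.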

Change #1251888 merged by jenkins-bot:

[operations/mediawiki-config@master] DiscussionTools: Uninstall wikis closed before permalinks were deployed

https://gerrit.wikimedia.org/r/1251888

Mentioned in SAL (#wikimedia-operations) [2026-03-16T21:12:35Z] <dreamyjazz@deploy2002> Started scap sync-world: Backport for [[gerrit:1251848|Disable CheckUser on closed wikis where no checks were ever made (T420062)]], [[gerrit:1251865|Uninstall SecurePoll from closed wikis (T420062)]], [[gerrit:1251888|DiscussionTools: Uninstall wikis closed before permalinks were deployed (T420052)]]

Mentioned in SAL (#wikimedia-operations) [2026-03-16T21:14:20Z] <dreamyjazz@deploy2002> dreamyjazz: Backport for [[gerrit:1251848|Disable CheckUser on closed wikis where no checks were ever made (T420062)]], [[gerrit:1251865|Uninstall SecurePoll from closed wikis (T420062)]], [[gerrit:1251888|DiscussionTools: Uninstall wikis closed before permalinks were deployed (T420052)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified the

Mentioned in SAL (#wikimedia-operations) [2026-03-16T21:18:40Z] <dreamyjazz@deploy2002> Finished scap sync-world: Backport for [[gerrit:1251848|Disable CheckUser on closed wikis where no checks were ever made (T420062)]], [[gerrit:1251865|Uninstall SecurePoll from closed wikis (T420062)]], [[gerrit:1251888|DiscussionTools: Uninstall wikis closed before permalinks were deployed (T420052)]] (duration: 06m 10s)

Change #1254225 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[operations/puppet@production] maintenance: Disable scripts for closed wikis on various extensions

https://gerrit.wikimedia.org/r/1254225

Change #1254225 merged by JHathaway:

[operations/puppet@production] mw::maintenance: Disable scripts for closed wikis on various extensions

https://gerrit.wikimedia.org/r/1254225

DiscussionTools also modified the appearance of pages, potentially making archives less readable. But this is a pretty minor issue.