Currently, only on each replica on s3 we have 9477 tables that are created because of secure poll elections. For example:
root@db1157:/srv/sqldata/fawikiquote# ls -Ssh *_edits.ibd 368K bv2015_edits.ibd 176K bv2022_edits.ibd 176K ucoc2023_edits.ibd 160K bv2021_edits.ibd 144K bv2017_edits.ibd 96K pif_edits.ibd 192K bv2011_edits.ibd 176K u4c2024_edits.ibd 176K ucoc_edits.ibd 160K mcdc2021_edits.ibd 128K bv2009_edits.ibd
This is still holding data for elections from 15 years ago and is adding a lot of maintenance burden (e.g. contributing to issues in T349360: Clean up dbbackups.backup_files table (backups tracking table growing really large), files opened by mariadb, inode table, etc.). It's hard to track down what these tables supposed to be (it took me a while to figure out what pif_edits is) and it's inefficient: The table from the 2009 election has been backed up thousands of times now.
And this will get worse every year with more elections and is not sustainable.
We either need to stop creating a new table for each election or phase out previous ones. If it's needed for legal reasons, we can create a swift container and dump them there.
Acceptance criteria
- Current workflow is to run multiple maintenance scripts in a certain order, and these maintenance scripts first write their data to the intermediate tables such as ucoc2023_edits, then a final maintenance scripts writes its results to centralauth db -> securepoll_lists. Rewrite the maintenance scripts to skip writing to intermediate tables such as ucoc2023_edits and to write directly to centralauth db -> securepoll_lists. List of maintenance scripts to re-write:
- TBD
Ideas for additional tickets
- Investigate if / how many years the data in the intermediate tables such as ucoc2023_edits needs to be kept (by consulting T&S, legal, DBAs, etc). Make a decision on which / how many of these tables to delete. Then delete them. Expected workflow impact from deleting these tables: zero impact on completed elections. The valuable data (final voter eligibility whitelist) seems to be stored in centralauth db -> securepoll_lists. These intermediate tables are just a middle step used to generate the final voter eligibility whitelist.
- T393247: SecurePoll: Drop old rows from the securepoll_lists table on WMF production - Investigate / decide if it's worth deleting a couple million rows of old voter eligibility data from securepoll_lists tables on non-centalauth wikis.