Page MenuHomePhabricator

For global elections, SecurePoll creates an eligible voters table for each election on every wiki and keeps it forever
Open, Needs TriagePublic

Description

Currently, only on each replica on s3 we have 9477 tables that are created because of secure poll elections. For example:

root@db1157:/srv/sqldata/fawikiquote# ls -Ssh *_edits.ibd 
368K bv2015_edits.ibd  176K bv2022_edits.ibd   176K ucoc2023_edits.ibd	160K bv2021_edits.ibd	 144K bv2017_edits.ibd	 96K pif_edits.ibd
192K bv2011_edits.ibd  176K u4c2024_edits.ibd  176K ucoc_edits.ibd	160K mcdc2021_edits.ibd  128K bv2009_edits.ibd

This is still holding data for elections from 15 years ago and is adding a lot of maintenance burden (e.g. contributing to issues in T349360: Clean up dbbackups.backup_files table (backups tracking table growing really large), files opened by mariadb, inode table, etc.). It's hard to track down what these tables supposed to be (it took me a while to figure out what pif_edits is) and it's inefficient: The table from the 2009 election has been backed up thousands of times now.

And this will get worse every year with more elections and is not sustainable.

We either need to stop creating a new table for each election or phase out previous ones. If it's needed for legal reasons, we can create a swift container and dump them there.

Event Timeline

I spoke with Legal and it doesn't seem that they would have a problem with deleting these old tables.

It might be worth keeping the last X years of tables just as a failsafe (since compiling them is a pain) but the ones from 2017 and earlier don't need to stick around and create additional debt.

Where in the code are these tables created? I don't recall the core code needing to create separate tables for elections. I know WMF has occasionally created tables (see ./cli/wm-scripts) but those are not for every single election either. And at least the present code (same link) only relates to 2022 and later.

I do see them being removed: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/SecurePoll/+/750803

Obviously, they are still accessible in VCS and can be restored and run again if needed.

Mentioned in SAL (#wikimedia-operations) [2024-01-23T11:11:53Z] <Amir1> dropping pif_edits table from all wikis (T355594)

Mentioned in SAL (#wikimedia-operations) [2024-01-23T11:22:49Z] <Amir1> dropping bv2011_edits table from all wikis (T355594)

Mentioned in SAL (#wikimedia-operations) [2024-01-23T12:02:59Z] <Amir1> dropping bv2009_edits table from all wikis (T355594)

Mentioned in SAL (#wikimedia-operations) [2024-01-23T12:13:35Z] <Amir1> dropping bv2015_edits table from all wikis (T355594)

Now number of securepolls tables have been dropped to 5479. Roughly a 4,000 tables redaction.

Where in the code are these tables created? I don't recall the core code needing to create separate tables for elections. I know WMF has occasionally created tables (see ./cli/wm-scripts) but those are not for every single election either. And at least the present code (same link) only relates to 2022 and later.

Wikitech documentation details the process: https://wikitech.wikimedia.org/wiki/SecurePoll

For a global election we need to ensure voters are eligible no matter which project they made their edits on. Since that could be hundreds of separate projects (in theory), we need to make these tables on every wiki. It is far from an ideal solution of course but that's the reasoning for it right now.

We reduced a bit of maintenance debt, for future devs need to find a better solution to this.

Has a "one set of tables per wiki instead of per election" approach been considered? We could add an election_id column to each table. I think this would be much more scalable in the long run than a separate set of tables for each election.

It's because the columns depends on the election. You might have a different criteria that could not be fit inside the table.

OTOH, it can turn into a json blob instead (depends on the query patterns but I highly doubt you'd need to query anything beside looking up values of a given user)

Or it could be pivoted into a long format (one row for each property, as opposed to one column for each property).

Can we at least prefix each table with securepoll_ or whatever (in future) so they're more visibly grouped together in reports?

Can we at least prefix each table with securepoll_ or whatever (in future) so they're more visibly grouped together in reports?

We can update the technical docs to advise that; I've boldly done so on Wikitech.

As for the task itself I briefly spoke with @Ladsgroup and I think generally we can just drop these tables after some arbitrary time after the election is over (and the people elected are seated, etc., just in case we need to re-tally). He mentioned a year, and I think that's fine, since we can just re-run the scripts if necessary.

Can we at least prefix each table with securepoll_ or whatever (in future) so they're more visibly grouped together in reports?

We can update the technical docs to advise that; I've boldly done so on Wikitech.

Thanks, that'd be great!

As for the task itself I briefly spoke with @Ladsgroup and I think generally we can just drop these tables after some arbitrary time after the election is over (and the people elected are seated, etc., just in case we need to re-tally). He mentioned a year, and I think that's fine, since we can just re-run the scripts if necessary.

Ack. +1

Thank you, I turn on the chainsaw now.

Novem_Linguae renamed this task from SecurePoll creates a table for each election and keeps it forever to For global elections, SecurePoll creates an eligible voters table for each election on every wiki and keeps it forever.Oct 29 2024, 10:09 AM