Page MenuHomePhabricator

Investigate if some SecurePoll SQL tables can be merged
Open, Needs TriagePublic

Description

Why

DBAs are indicating that SecurePoll having 12 SQL tables per wiki as a default installation is not ideal, and that less tables would be better.

What

The current SecurePoll SQL table schema, without including the global election voter list tables, is 12 tables. This ticket is to investigate if some of these 12 tables can be merged into some of these other 12 tables, which should reduce the overall table count.

In particular, securepoll_msgs, securepoll_questions and securepoll_cookie_match may be worth further investigation.

image.png (394×257 px, 16 KB)

Notes

  • SecurePoll installs 12 tables on all 1000 wikis, so creates more of a scalability problem than, say, PageTriage, which only installs 4 tables on a couple wikis.

Original comment in other ticket

In s3 currently there ~190,000 files that need to be opened by mariadb and around ~30,000 files are just securepoll:

For example this is hywiki:

root@db1212:/srv/sqldata/hywiki# ls | grep -i securepoll
securepoll_cookie_match.frm
securepoll_cookie_match.ibd
securepoll_elections.frm
securepoll_elections.ibd
securepoll_entity.frm
securepoll_entity.ibd
securepoll_lists.frm
securepoll_lists.ibd
securepoll_msgs.frm
securepoll_msgs.ibd
securepoll_options.frm
securepoll_options.ibd
securepoll_properties.frm
securepoll_properties.ibd
securepoll_questions.frm
securepoll_questions.ibd
securepoll_strike.frm
securepoll_strike.ibd
securepoll_u4c2025_edits.frm
securepoll_u4c2025_edits.ibd
securepoll_ucocar2025_edits.frm
securepoll_ucocar2025_edits.ibd
securepoll_voters.frm
securepoll_voters.ibd
securepoll_votes.frm
securepoll_votes.ibd

(plus some other tables like bv2024_edits.frm and wmcr2024_edits.frm)

Can you at least drop some of these tables? Do we really need securepoll_msgs, securepoll_questions or securepoll_cookie_match?

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
  • securepoll_cookie_match is used on the vote details page to display the securepoll-cookie-dup-list message. Do the scrutineers actually rely on this?
  • securepoll_msgs is used for translation of vote options. That entire system could probably be rewritten to read from a wiki page instead.
  • securepoll_questions is where the actual questions voters see are stored, which is probably needed and I can't think of anywhere to merge it.

A better idea is probably to migrate SecurePoll logging to use Special:Log, which would obsolete the securepoll_log and securepoll_strike tables and obsoleting even more of them.

And then there's the even more radical idea of declaring the SecurePoll namespace rather than the DB tables the canonical form of an election.

  • securepoll_msgs is used for translation of vote options. That entire system could probably be rewritten to read from a wiki page instead.

And then there's the even more radical idea of declaring the SecurePoll namespace rather than the DB tables the canonical form of an election.

$wgSecurePollUseNamespace does exist, but is turned off by default, and has been buggy on localhost in the past (T378322). We may already have this feature, but I have not wrapped my head around it yet. I suppose it is worth investigating turning this on by default and converting everything to this if it allows us to drop a table.

@Ladsgroup Can we instead resolve the underlying issue by moving tables to the extension cluster? SecurePoll doesn't seem that important in the scheme of things to be in the primary cluster. Except for securepoll_lists (which needs to be joined with the user table to render the voter eligibility page), none of the other tables require any joins with core tables.

Merging tables will make the code hackier. The current schema design is very clean and heavily normalized. Even for T387701, we should ideally use yet another table as as the alternative seemingly involves fitting multiple 300 kb blobs into a single db field.

@Ladsgroup Can we instead resolve the underlying issue by moving tables to the extension cluster? SecurePoll doesn't seem that important in the scheme of things to be in the primary cluster. Except for securepoll_lists (which needs to be joined with the user table to render the voter eligibility page), none of the other tables require any joins with core tables.

As far as DBA side of things is concerned, as long as number of tables on s3 is reduced, It doesn't matter to me how (I actually suggested migration to x1 too). If you manage to bring down the number by sacrificing a newborn every full moon, I'm not here to judge.

That being said, I personally (which please feel free to ignore, this is my personal opinion) like T395928: Delete non-essential SecurePoll tables +90% of wikis are not going to ever hold a secure poll election so the actual useless stuff is gone, if a wiki wants to hold a local election for the first time, we just create the tables as needed.

Merging tables will make the code hackier. The current schema design is very clean and heavily normalized. Even for T387701, we should ideally use yet another table as as the alternative seemingly involves fitting multiple 300 kb blobs into a single db field.

As long the 300kb is not retrieved a lot, it's fine. Our ES blobs are much larger.

I took a look at the SQL tables yesterday by creating a localhost election and peeking at the data in HeidiSQL. I documented each table at https://www.mediawiki.org/wiki/Extension:SecurePoll#SQL_tables.

Two ideas jumped out at me:

  • Merge securepoll_entity into securepoll_msgs. They appear to map 1:1. It would just involve moving one column (en_type).
  • The name of the election is stored in two places: securepoll_elections and securepoll_msgs. This is not ideal because it'd be easy for these to get out of sync if a patch writer forgot to update both places. I think we should get rid of securepoll_entity -> en_type = election (which maps to a string in securepoll_msgs), and just use securepoll_elections -> el_title as the source of truth for the election name.

That being said, I personally (which please feel free to ignore, this is my personal opinion) like T395928: Delete non-essential SecurePoll tables +90% of wikis are not going to ever hold a secure poll election so the actual useless stuff is gone, if a wiki wants to hold a local election for the first time, we just create the tables as needed.

That sounds the most promising.

  • Merge securepoll_entity into securepoll_msgs. They appear to map 1:1. It would just involve moving one column (en_type).

It's a 1:many mapping. securepoll_entity represents an entity, securepoll_msgs stores the name of the entity in different languages.

  • The name of the election is stored in two places: securepoll_elections and securepoll_msgs. This is not ideal because it'd be easy for these to get out of sync if a patch writer forgot to update both places. I think we should get rid of securepoll_entity -> en_type = election (which maps to a string in securepoll_msgs), and just use securepoll_elections -> el_title as the source of truth for the election name.

el_title stores the election name in the original language. The msgs table can store it in other languages. I think the voters are shown the localised name from msgs. There's a likely a duplication of the name in the wiki content language, but at max you could optimize away el_title and do a join with the msgs table at every place where it's used (quite messy). Either way it doesn't help reduce the number of tables.

And then there's the even more radical idea of declaring the SecurePoll namespace rather than the DB tables the canonical form of an election.

The namespace uses multiple pages (one for ids and one per language for strings). I'm not sure if there's a way to update multiple pages transactionally.

One way to remove a table could be to do away with securepoll_msgs and store the values like regular i18n messages in MediaWiki: namespace, establishing some naming convention for messages, say securepoll-election-<electionid>-<questionid>-<optionid>. Seems like a pretty big refactor.