Page MenuHomePhabricator

Seek DBA approval for new logging table in SecurePoll
Closed, ResolvedPublic3 Estimated Story PointsJan 12 2021

Description

What will be logged

When election admins access voter data (T270342):

  • Who accessed the voter data
  • Which election
  • Timestamp

When admins are added to or removed from an election (T270313):

  • Which election was the new admin added/removed for
  • Who added the new admin
  • Username for the new admin
  • Which action was performed (addition/removal/something else?)
  • Timestamp for this action

Proposed table structure
Table:
securepoll_log

Columns:
spl_id (primary key)
spl_timestamp
spl_election_id (securepoll_elections.el_entity)
spl_user (user ID of admin)
spl_type (type of action logged, e.g. adding an admin)
spl_target (user ID of admin target, for adding and removing admins only)

Ideally we'd be ale to filter by election, performer, type, target and timestamp (T271268#6730209). I expect we'll want to sort by timestamp, so we'll need an index for that. Do we also need an index for each field that we filter by? (If so, do we need choose fewer fields that can be filtered?)

Usage expectations

We expect this table to be small, given the current use of SecurePoll. If SecurePoll is used for more types of elections in the future, the number of elections per year would increase. Everything below is based on the SecurePoll's current useage.

Assumptions (erring on overestimating):

  • 2-3 elections a year (based on https://vote.wikimedia.org/w/index.php?title=Special%3ASecurePoll)
  • 20 admins per election
  • each admin gets added and removed once
  • each admin looks up data 10 times
  • the logging feature will only be enabled on votewiki, so the table will only be on votewiki (and beta wikis, testwiki)
  • the table contains no private data, so can be public

Expectations for the table based on these assumptions (questions taken from T260372#6482475):

  • Size of the table (number of rows expected): <250 per election
  • Expected growth per year (number of rows): hundreds
  • Expected writes to the table (per minute, per hour...per day, any of those are ok): <10 per day
  • Expected amount of reads: <10 per day
  • Can this table be public or private (so we know if it can be replicated to our public cloud infra or it needs to be filtered): public
  • The release plan for the feature (are there specific wikis you'd like to test first etc): Test on the beta cluster first, followed by testwiki and then the rest of the wikis.

Related investigations
T270313: Investigate work involved in adding log for election admins for SecurePoll [8 hours]
T270342: Investigate work involved in adding log for scrutiny of voter data [8 hours]

Event Timeline

Restricted Application added a subscriber: Aklapper. ยท View Herald TranscriptJan 5 2021, 9:54 PM
ARamirez_WMF set the point value for this task to 3.Jan 6 2021, 5:19 PM
Niharika triaged this task as High priority.Jan 6 2021, 5:25 PM
ARamirez_WMF changed the subtype of this task from "Task" to "Deadline".

@Tchanders -

Do we need to record which users' data was seen? - Ok without the full list of users, more than happy to proceed with the Election ID which we can then use to lookup voters should we need to.

What filters and sort orders do we expect to need? - Filter on all except the primary key

Usage Expectations - Do these assumptions look correct? - Yes, we don't anticipate any huge surge in usage post launch for at least 2 quarters

Thanks @drochford

The release plan for the feature (are there specific wikis you'd like to test first etc)

@Niharika I think this is the only outstanding question - which wikis do we expect the table to be enabled for?

Thanks @drochford

The release plan for the feature (are there specific wikis you'd like to test first etc)

@Niharika I think this is the only outstanding question - which wikis do we expect the table to be enabled for?

I updated the task description to answer this. I propose we test on beta cluster first (behind a feature flag) where @jrbs and @drochford can help test it for us and then testwiki (first production cluster wiki) and once we think it works well we can deploy to all other wikis. @Tchanders what do you think of this?

Change 655133 had a related patch set uploaded (by Tchanders; owner: Tchanders):
[mediawiki/extensions/SecurePoll@master] WIP Add securepoll_log table for logging actions performed by admins

https://gerrit.wikimedia.org/r/655133

I updated the task description to answer this. I propose we test on beta cluster first (behind a feature flag) where @jrbs and @drochford can help test it for us and then testwiki (first production cluster wiki) and once we think it works well we can deploy to all other wikis. @Tchanders what do you think of this?

That sounds fine to me - thanks @Niharika

Tchanders renamed this task from [WIP] Seek DBA approval for new logging table(s) in SecurePoll to Seek DBA approval for new logging table(s) in SecurePoll.Jan 11 2021, 3:54 PM
Tchanders renamed this task from Seek DBA approval for new logging table(s) in SecurePoll to Seek DBA approval for new logging table in SecurePoll.

Tagging DBA team. We'd be grateful to hear your comments on our proposed new table for SecurePoll, outlined in the task description (and in more detail in the above patch). Thank you!

Adding a few folks, as this was mentioned on weekly Scrum of Scrums, for increased visibillity. @LSobanski Is Data-Persistence (work done) adequate here?

@Tchanders Note the DBA tag wasn't really added to the task, only to your comment, I happened to run into the task by pure accident. Adding at least the tag or relevant people is suggested.

I will comment on the patchset. As Jaime mentioned, if you want us to be notified please add the tag as writing the tag on the comment won't unfortunately ping us.

Thanks @jcrespo and @Marostegui - will make a note to add the tag next time!

I should add that we've contacted you early in our process, so we're not right up against a deadline or anything.

Ah cool, we appreciate that. The priority set as High was a bit misleading :-)

Ah cool, we appreciate that. The priority set as High was a bit misleading :-)

@Niharika Should we lower it?

@Marostegui @Ladsgroup Thanks for the reviews.

I'm trying to understand the next steps for deployment after agreeing the table structure. I've found https://wikitech.wikimedia.org/wiki/Creating_new_tables and a recent task that added a table in another extension (T261410) which seem to suggest we need to update and run createExtensionTables.php, but I'm struggling to locate any detailed instructions. Would you be able to help us or point us in the right direction?

Depends on where do you want to add the table. if it's going to be everywhere. that'd be in createExtensionTables in github.com/wikimedia/mediawiki-extensions-WikimediaMaintenance/ and then run it using forwikiineachdblist in production but if it's going to be only in votewiki, you can run it directly using sql votewiki --write and pasting the sql.

Thanks @Ladsgroup.

@Marostegui Is our new table considered approved at this stage? If so, do we need to do any more steps before we add the table, like informing anyone of our intention to add the table or scheduling when we add it?

After the changes discussed on the patchset, I am fine with that.
I am not sure how the table creation process goes, but I would guess you'd probably need to get it done via a normal deployment window, you might want to talk to Release-Engineering-Team about it.

@Marostegui @thcipriani Hey y'all, we're receiving conflicting messages about whom to ask about the process for creating a new table for an extension. We're being told DBAs and RelEng. Who can help us here? Thanks in advance!

It's not conflicting. Every new table in production needs to be signed off by DBAs and once it's signed off, anyone with production access can create it (and notify DBAs for replication to labs).

@Marostegui @thcipriani Hey y'all, we're receiving conflicting messages about whom to ask about the process for creating a new table for an extension. We're being told DBAs and RelEng. Who can help us here? Thanks in advance!

DBAs would be the right folks to talk to to approve tables/table changes. After that, anyone with production access can create tables using: https://wikitech.wikimedia.org/wiki/Creating_new_tables

Release-Engineering-Team doesn't usually create tables and isn't needed. If you think table change needs to happen outside of other changes being deployed you can add a work window to the deployment calendar: https://wikitech.wikimedia.org/wiki/Deployments although that's not typical either.

Same thing in other words, DBAs ask to be notified in advance for performance, security, filtering and operational reasons (*what* new tables are deployed), but don't gatekeep when/how it is deployed.

This is in contrast with schema changes, where we gatekeep everything as those are dangerous actions that affect live queries.

This is documented at:

Probably the confusion was on the different workflow of new table creation vs schema changes.

Thanks everyone.

After that, anyone with production access can create tables

This is what we were missing, as a point of process. I've attempted to document this: https://wikitech.wikimedia.org/w/index.php?title=Creating_new_tables&type=revision&diff=1895054&oldid=1846803 - please update if I got anything wrong.

Probably the confusion was on the different workflow of new table creation vs schema changes.

I don't think this was a problem - that documentation seemed clear to me.

Some section like https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change on the Creating new tables page would be useful. I can try to update it retrospectively with what AHT ended up doing this time, but it might be more useful coming from someone who is more in the know.

Change 655133 merged by jenkins-bot:
[mediawiki/extensions/SecurePoll@master] Add securepoll_log table for logging actions performed by admins

https://gerrit.wikimedia.org/r/655133

Thanks everybody for helping with this, and @Ladsgroup for helping us deploy.