Create oathauth_types and oathauth_devices tables
Closed, Resolved · Public

Description

Patch (schema already reviewed by Amir): https://gerrit.wikimedia.org/r/c/mediawiki/extensions/OATHAuth/+/873892

Following https://wikitech.wikimedia.org/wiki/Creating_new_tables:

  • Should this table be replicated to wiki replicas (does it not contain private data)? oathauth_devices is private. oathauth_types technically is not, although I don't see any value in replicating it without _devices.
  • Will you be doing cross-joins with the wiki metadata? Not sure what this means. No cross-wiki joins will be performed.
  • Size of the table (number of rows expected). Initially about the same as the existing oathauth_users table (which is being replaced).
  • Expected growth per year (number of rows). See answer to last question.
  • Expected amount of queries, both writes and reads (per minute, per hour... per day, any of those are ok). This table is queried on each login attempt where a correct password was provided. This dashboard suggests that happens roughly 100-200 times per second. Writes happen when someone sets up or disables two-factor authentication. Again, these tables replace the existing oathauth_users table, with similar query patterns.
  • Examples of queries that will be using the table.
    • SELECT oad_data,oat_name FROM oathauth_devices JOIN oathauth_types ON ((oat_id = oad_type)) WHERE oad_user = 123456
  • The release plan for the feature (are there specific wikis you'd like to test first etc). These tables are shared by all public wikis, so unfortunately testing on a single wiki is not possible.
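For illustration, the example query above can be exercised against a minimal sqlite sketch of the two tables. The column types, the oad_id column, and the index name here are assumptions made for this sketch; the authoritative schema is in the Gerrit patch linked above.

```python
import sqlite3

# Sketch only: column names come from the example query in this task;
# actual types live in the OATHAuth schema patch and may differ.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE oathauth_types (
    oat_id   INTEGER PRIMARY KEY,
    oat_name TEXT NOT NULL
);
CREATE TABLE oathauth_devices (
    oad_id   INTEGER PRIMARY KEY,
    oad_user INTEGER NOT NULL,
    oad_type INTEGER NOT NULL REFERENCES oathauth_types (oat_id),
    oad_data BLOB
);
-- Per-user index: all reads in the task are keyed on oad_user.
CREATE INDEX oad_user_idx ON oathauth_devices (oad_user);
""")

cur.execute("INSERT INTO oathauth_types (oat_id, oat_name) VALUES (1, 'totp')")
cur.execute(
    "INSERT INTO oathauth_devices (oad_user, oad_type, oad_data) VALUES (?, ?, ?)",
    (123456, 1, b'{"secret": "..."}'),
)

# The example query from the task description:
cur.execute(
    "SELECT oad_data, oat_name FROM oathauth_devices "
    "JOIN oathauth_types ON oat_id = oad_type WHERE oad_user = ?",
    (123456,),
)
print(cur.fetchall())  # [(b'{"secret": "..."}', 'totp')]
```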

Event Timeline

Ladsgroup moved this task from Triage to Blocked on the DBA board. Edited · Feb 26 2023, 3:14 AM
Ladsgroup subscribed.

We usually proceed with table creation once the schema patch is merged and deployed (mostly to allow time in case it gets reverted and such).

It looks like it's better to simply add both to the private tables; this extension is about authentication, so it doesn't make sense to make any of its data public.

I was briefly confused that 100 reads per second seemed too high for users with 2FA, but the table needs to be checked to see whether 2FA exists for a given user, meaning it does a read on every login while the data is actually used maybe 1% of the time at most. The total number of logins could probably be reduced, though. The usual contributors are stray bots logging in before every action; last time, I asked a bot operator to fix their bot, which was responsible for 40% of all logins. You can track them down in Logstash (by building a dashboard of the top "name" values in the goodpass channel) if you feel like it. Not a blocker, just an anecdote.

Since there's an index on the user column, we should be fine for basically all reads, and the table is also quite small given its nature.

Quick q: Where is oathauth_users atm? x1 or centralauth?

To recap: wait for the patch to be merged and deployed, then add the tables to the list of private tables in puppet (and make sure it works); after that, you have sign-off from the DBAs to create the tables (you can do it yourself).

Change 892369 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] realm.pp: Add private tables

https://gerrit.wikimedia.org/r/892369

I have created https://gerrit.wikimedia.org/r/892369 for both tables. It needs to be merged, and the sanitarium hosts restarted, before the tables are created (if they go to sX).

Change 892369 merged by Marostegui:

[operations/puppet@production] realm.pp: Add private tables

https://gerrit.wikimedia.org/r/892369

Mentioned in SAL (#wikimedia-operations) [2023-02-27T10:26:12Z] <marostegui> Restart codfw sanitarium hosts T330502

Mentioned in SAL (#wikimedia-operations) [2023-02-27T10:32:24Z] <marostegui> Restart eqiad sanitarium hosts T330502

I have restarted the sanitarium hosts in both DCs, so these tables can be created whenever convenient; they won't be replicated to clouddb*. (If they go to x1, that's of course not even a question.)

> I was briefly confused that 100 reads per second seemed too high for users with 2FA, but the table needs to be checked to see whether 2FA exists for a given user, meaning it does a read on every login while the data is actually used maybe 1% of the time at most. The total number of logins could probably be reduced, though. The usual contributors are stray bots logging in before every action; last time, I asked a bot operator to fix their bot, which was responsible for 40% of all logins. You can track them down in Logstash (by building a dashboard of the top "name" values in the goodpass channel) if you feel like it. Not a blocker, just an anecdote.

Yeah, and there's some caching going on as well, so the 100/s is more likely a worst-case estimate and the actual number is smaller.
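As a toy illustration of why caching lowers the effective database read rate, here is a minimal cache-aside sketch; this is not MediaWiki's actual caching code, and all function and variable names are invented for the example.

```python
# Hypothetical cache-aside lookup: repeated checks for the same user are
# served from the cache, so only the first login triggers a database read.
db_reads = 0

def fetch_devices_from_db(user_id):
    """Stand-in for the real oathauth_devices query."""
    global db_reads
    db_reads += 1
    return []  # pretend this user has no 2FA devices

cache = {}

def get_devices(user_id):
    if user_id not in cache:
        cache[user_id] = fetch_devices_from_db(user_id)
    return cache[user_id]

get_devices(123456)
get_devices(123456)  # served from cache; no second database read
print(db_reads)  # 1
```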

> Quick q: Where is oathauth_users atm? x1 or centralauth?

Centralauth.

> To recap: wait for the patch to be merged and deployed, then add the tables to the list of private tables in puppet (and make sure it works); after that, you have sign-off from the DBAs to create the tables (you can do it yourself).

Ok, thank you!

I'm going to close this task, as everything from the DBA side was already done and the patch was merged. I'm holding off on the table creation until the train with the new code rolls out, to be 100% sure any errors would fail loudly instead of silently. Probably overkill, but you can never be too careful when dealing with authentication stuff :-P. Anyway, I'll be tracking the rest of the migration steps in T242031: Allow multiple different 2FA devices, so it makes sense to track the table creation there as well instead of splitting it off into this task. Thanks everyone for the help so far!