
Review request for a new database table for OAuthRateLimiter
Closed, Resolved (Public)

Description

According to https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change we should block on DBA review when creating new tables.

Dear DBAs, please review the new table we're adding in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/OAuthRateLimiter/+/613282/9/schema/oauth_ratelimit_client_tier.sql

The table will only be present on the OAuth central wiki, and will hold a mapping 'client_id' -> 'client_tier_name'. It's a one-to-one relationship. The tier name is an arbitrary string, currently listed in the config, but in the future we might add an extra table to hold tier name -> tier config mappings.

We do not put the ratelimit tier into the oauth_client table because it is not a native oauth concept; thus the code supporting it was extracted into a separate extension, and its storage into a separate table.
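For illustration, a table of the shape described above would look roughly like this (a minimal sketch only, using MediaWiki's schema-file conventions; the actual column names and types are in the Gerrit change linked above):

-- Sketch only: column names here are illustrative, not the actual patch.
CREATE TABLE /*_*/oauth_ratelimit_client_tier (
  -- OAuth client (consumer) identifier; one row per client.
  oarct_client_id VARBINARY(32) NOT NULL PRIMARY KEY,
  -- Name of the rate-limit tier; tier settings currently live in config.
  oarct_tier_name VARBINARY(255) NOT NULL
) /*$wgDBTableOptions*/;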

Event Timeline

Change 613282 had a related patch set uploaded (by Clarakosi; owner: Clarakosi):
[mediawiki/extensions/OAuthRateLimiter@master] Implement OAuth hook & add functionality to set/get ratelimit tiers

https://gerrit.wikimedia.org/r/613282

Hey. There are a couple of comments/questions to start with:

  • Will this table contain private/sensitive information? If so we'll need to filter it so it doesn't end up publicly queryable.
  • I see you're using a foreign key. This is an unusual feature; we currently have none in production. How important is it to your feature?

We can take a deeper look at things once you've had a chance to respond.

Will this table contain private/sensitive information? If so we'll need to filter it so it doesn't end up publicly queryable.

The table should follow the same pattern as the OAuth extension tables, since without them it's entirely useless. I can see the oauth tables are listed in $private_tables [[ https://gerrit.wikimedia.org/g/operations/puppet/+/c1bcf721ad32aec61714e02b61f2d0f3ab3e3d81/manifests/realm.pp#177 | in puppet ]], so this one should be filtered as well.

I see you're using a foreign key. This is an unusual feature; we currently have none in production. How important is it to your feature?

We were not aware foreign keys were to be avoided. The constraint was added as a layer of protection against adding bogus client-ids into the table. We can achieve the same semantics by checking that the client-id exists before writing into the new table. Cascade delete was added in case we ever delete clients, but AFAIK the client records are never actually deleted from the oauth table; they're just marked as deleted, so we would never need cascade delete. TLDR: we can remove the foreign key constraint.
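A sketch of that application-level check in plain SQL (the OAuth table and column names here are assumptions for illustration, not the real schema):

-- Instead of a FOREIGN KEY, only insert the tier row when the client
-- actually exists in the OAuth consumer table. Names are illustrative.
INSERT INTO oauth_ratelimit_client_tier (oarct_client_id, oarct_tier_name)
SELECT oarc_consumer_key, 'tier_default'
FROM oauth_registered_consumer
WHERE oarc_consumer_key = 'example-client-key';
-- 0 rows affected means the client does not exist, which the application
-- can detect and report instead of relying on a constraint violation.

In practice the extension would issue this through MediaWiki's database abstraction layer rather than raw SQL, but the shape of the check is the same.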

Following up as I think Kormat may be done for the day, so you don't have to wait an extra day:

so this one should be filtered as well

Cool, this is the biggest blocker, and it is now understood. @Kormat, you now know what to do (this is similar to the "creating a private wiki" use case, except for single tables). @Pchelolo, DO NOT deploy anything until he confirms filters are in place. Thank you for pinging us ahead of time. FYI, leaks will not happen by default, but this makes it easier to be vigilant in case there is any kind of bug in the filtering process and its monitoring.

Regarding FKs, please note that foreign keys are a basic component of any properly designed database, but traditionally no code in production uses them (there are some 3rd-party usages). The largest issue is that they limit concurrency when updating the values they refer to, so at large scales like Wikipedia's the "consistency logic" is optimized by moving the burden to the application logic to prevent that extra contention, especially given the sometimes poor quality of mw queries. Please note this is not really a DBA-driven development guideline so much as something that mw maintainers decided, I think, although I can see it as a sensible design decision for our scale.

One last thing: I am not sure if new table creations should follow the new abstract schema system. Maybe it only applies to core, or maybe it is optional for existing extensions; not sure.

For topics 2 (FK) and 3 (abstract schema) I think other people on your team would be the best ones to guide you; we (SRE) are "mere deployers/implementors" for these kinds of things, and only "canonical blockers" for topic 1 (filtering). Hopefully that helps a bit?

Thanks a lot @jcrespo. We will remove the FK to follow the current pattern; as for the abstract schema, I'll consult with the team.

One last thing: I am not sure if new table creations should follow the new abstract schema system. Maybe it only applies to core, or maybe it is optional for existing extensions; not sure.

I don't think it's required yet (obviously some tables in core are now using it), but there's certainly no reason you couldn't use it now, to save having to migrate in the future. The table is pretty simple, so it should be relatively easy to do.

Basically, you write a json file instead (see maintenance/tables.json in MW core for a good example), run maintenance/generateSchemaSql.php (with the required parameters) to generate the SQL files, then plumb those into MW's LoadExtensionSchemaUpdates hook subscriber as usual (and obviously git add the lot)
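As a rough illustration of that workflow, an abstract schema file for a table like this one might look something like the following (column names and types are guesses for the sake of the example, not the actual extension schema):

[
	{
		"name": "oauth_ratelimit_client_tier",
		"comment": "Maps an OAuth client to its rate-limit tier (illustrative sketch)",
		"columns": [
			{
				"name": "oarct_client_id",
				"type": "binary",
				"options": { "length": 32, "notnull": true }
			},
			{
				"name": "oarct_tier_name",
				"type": "binary",
				"options": { "length": 255, "notnull": true }
			}
		],
		"indexes": [],
		"pk": [ "oarct_client_id" ]
	}
]

Running maintenance/generateSchemaSql.php against that file (it takes options for the input JSON, the output SQL path and the database type; check its help output for the exact flags) then produces the per-database DDL to commit alongside it.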

And you get MySQL/MariaDB, SQLite and Postgres support for the extension from day 1

I'm sure @Ladsgroup (or myself) would be happy for people to help test and lend a hand if you encounter any issues

I'm doing the migration to abstract schema (so far 6 tables have been moved) and wrote this help page, which should help you.

I highly recommend using the abstract schema, for several reasons:

  • Currently, as I migrate tables to abstract schema, I find at least one drift per table, which has been causing all sorts of issues. If you want schemas to stop drifting, use abstract schema.
  • Postgres and SQLite are significantly different from MySQL, and since it's hard to test them, we end up giving the impression that we support them while they are actually completely broken (e.g. not having the proper data types).
  • It will make schema changes much easier in the future (once I start migrating schema changes to the abstract system, you'll notice the difference).

I can help in any way you want :)

The extension now uses abstract schema, so both the foreign key and abstract schema concerns have been addressed.

As for filtering: is that something we can help with, or would you ping us when it's done?

or would you ping us when it's done?

^This, as I mentioned before. Likely to be done by tomorrow.

Change 616018 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/puppet@production] realm: Add oauth_ratelimit_client_tier to private_tables.

https://gerrit.wikimedia.org/r/616018

Change 616018 merged by Kormat:
[operations/puppet@production] realm: Add oauth_ratelimit_client_tier to private_tables.

https://gerrit.wikimedia.org/r/616018

Marostegui triaged this task as Medium priority. Jul 24 2020, 8:25 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2020-07-24T08:40:02Z] <kormat> restarting mariadb on all sanitarium hosts T258711

@Pchelolo: the filtering is now in place. Let's keep the task open, and please ping it again when the table has been created in production. I'll then double-check to ensure it's getting filtered correctly.

Kormat moved this task from Unsorted 💣 to Watching 👀 on the User-Kormat board.
Kormat moved this task from Watching 👀 to Blocked 🚧 on the User-Kormat board.

@Pchelolo - Do you know what this task is blocked on your workboard? How can it be unblocked?

@Pchelolo - Do you know what this task is blocked on your workboard? How can it be unblocked?

It is blocked because we need to ping the DBAs right before we start deploying the new extension, and it is not ready to be deployed yet. The 'blocked' status usually applies to the whole tree of connected tasks.

@Pchelolo - Do you know what this task is blocked on your workboard? How can it be unblocked?

It is blocked because we need to ping the DBAs right before we start deploying the new extension, and it is not ready to be deployed yet. The 'blocked' status usually applies to the whole tree of connected tasks.

It doesn't have to be before; it is ok to ping us once it is created, so we can double-check it has been filtered correctly.

And it's also fine to create the table well in advance of the extension actually being deployed (this doesn't necessarily need to be done by a DBA either), if the schema is (probably) not going to change

Now that the security reviews are done, the patches are merged, and we're ready for deployment, how do we actually create the new table in the database?

Now that the security reviews are done, the patches are merged, and we're ready for deployment, how do we actually create the new table in the database?

Where does it want to go? With the other OAuth tables on meta which is the "central" wiki?

Where does it want to go? With the other OAuth tables on meta which is the "central" wiki?

Yes.

sql.php --wiki=metawiki /path/to/schema.sql

As the extension patch isn't in WMF branches yet, you can just create a sql file in your home dir, and then use something like (I think offhand it needs to be the full path)

sql.php --wiki=metawiki ~/OAuthRateLimiter.sql

Change 613282 merged by jenkins-bot:
[mediawiki/extensions/OAuthRateLimiter@master] Implement OAuth hook & add functionality to set/get ratelimit tiers

https://gerrit.wikimedia.org/r/613282

Mentioned in SAL (#wikimedia-operations) [2020-09-02T16:39:17Z] <Pchelolo> creating oauth_ratelimit_client_tier table T258711

@Pchelolo: the filtering is now in place. Let's keep the task open, and please ping it again when the table has been created in production. I'll then double-check to ensure it's getting filtered correctly.

The table has been created in beta and in production. @Kormat would you be able to confirm filtering works?

Filtering is confirmed to be functioning; there's no sign of the table on the labsdb hosts.
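For the record, a quick spot-check from the replica side looks roughly like this (the _p database name follows the usual Wiki Replicas convention; illustrative only):

-- Run against a labsdb / Wiki Replicas host: the private table should
-- simply not be visible in the public view of the wiki's database.
SHOW TABLES FROM metawiki_p LIKE 'oauth_ratelimit_client_tier';
-- Expected result: an empty set.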