Page MenuHomePhabricator

Configure Toolforge replica views and dumps for the new MCR tables
Closed, ResolvedPublic

Description

There's no formal Schema-change here, since simply adding tables isn't classified as a schema change at https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change. But we still need to make them visible in Toolforge and possibly dumps.

There are four new tables added in rMW943c724198f1: MCR database schema.

  • slot_roles can be exposed without restriction.
  • content_models can be exposed without restriction.
  • slots will need some filtering to match the existing behavior for revision.
    • The slot_content_id column should be nulled if the corresponding revision table row (join on rev_id = slot_revision_id) has rev_deleted & 1, or if the corresponding archive table row (join on ar_rev_id = slot_revision_id) has ar_deleted & 1
  • content will also need some filtering to match the existing behavior for revision.
    • A row should be hidden entirely if there is no row in slots that publicly refers to it (join on content_id = slot_content_id). You'll probably want to add an index on slot_content_id to be able to efficiently check this.
    • We don't currently expose anything equivalent to content_address at all. It roughly corresponds to the text table's old_text when old_flags contains external (unlike the text table, there's no provision for storing actual revision content here).

See also T174047: Hide deprecated/unused fields on toolforge replica [MCR], which is about providing views that pretend fields that are being removed or replaced in MCR still exist.

Event Timeline

Anomie triaged this task as Medium priority.Jan 8 2018, 4:03 PM
Anomie created this task.

slot_roles and content_models can be dumped as is by adding them to the list of tables to be dumped: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/snapshot/files/dumps/table_jobs.yaml
For the other two, we can either have views or fall back to using MediaWiki to determine which rows are viewable by an anonymous user, and write only those, probably in xml format. Appropriate stanzas would need to be added to the dumpBackup.php maintenance script, or perhaps a new script should be written which we could extend to dump public pieces of other restricted tables (see T140977).

So should content.content_address be NULL in replica views? Just trying to clarify how the comments here affect how it should be handled.

content.content_address doesn't have to be nulled. There will just be no mechanism on labs for resolving these addresses.

Some rows in the slots and content tables belong to archived revisions, and some may refer to suppressed (oversighted) content. I don't think it's necessary to filter them out, though. They don't contain any sensitive data. The "worst" thing there is an SHA1 hash of the content blob, which would allow a nosy user to check if their guess at the content of some hidden revision is correct.

Change 426325 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: Add new MCR tables to views

https://gerrit.wikimedia.org/r/426325

I need to add an index to that patch. It's a first scribble.

Change 426325 merged by Bstorm:
[operations/puppet@production] wiki replicas: Add new MCR tables to views

https://gerrit.wikimedia.org/r/426325

When are these tables available in the prod dbs? Are they up now? I can run the adds of the new tables as soon as they are generally available.

Someone needs to run https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/maintenance/archives/patch-slots.sql on all wikis (see T190153), the rest should already be available.

We (DBAs) normally do not create tables, as that can be done during the deployments: https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change so maybe ping Releng for that?

I can do it myself too, if @Bstorm doesn't want to.

I think I'm supposed to hang back in cloud-land rather than pushing out production changes. That's as far as I know, anyway. :)

The slots table should now exist on all wikis.

Change 428037 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: Depool labsdb1010 for MCR table additions

https://gerrit.wikimedia.org/r/428037

Change 428037 merged by Marostegui:
[operations/puppet@production] wiki replicas: Depool labsdb1010 for MCR table additions

https://gerrit.wikimedia.org/r/428037

Change 428361 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: Depool labsdb1011 for MCR table additions

https://gerrit.wikimedia.org/r/428361

Change 428361 merged by Marostegui:
[operations/puppet@production] wiki replicas: Depool labsdb1011 for MCR table additions

https://gerrit.wikimedia.org/r/428361

Mentioned in SAL (#wikimedia-operations) [2018-04-24T07:23:43Z] <marostegui> Reload haproxy on dbproxy1010 to depool labsdb1011 - T184446

Mentioned in SAL (#wikimedia-operations) [2018-04-24T16:08:03Z] <marostegui> Reload haproxy on dbproxy1010 to repool labsdb1011 - https://phabricator.wikimedia.org/T184446

Bstorm claimed this task.
238482n375 added a project: acl*security.
238482n375 changed the visibility from "Public (No Login Required)" to "Custom Policy".
238482n375 subscribed.

SG9tZVBoYWJyaWNhdG9yCk5vIG1lc3NhZ2VzLiBObyBub3RpZmljYXRpb25zLgoKICAgIFNlYXJjaAoKQ3JlYXRlIFRhc2sKTWFuaXBoZXN0ClQxOTcyODEKRml4IGZhaWxpbmcgd2VicmVxdWVzdCBob3VycyAodXBsb2FkIGFuZCB0ZXh0IDIwMTgtMDYtMTQtMTEpCk9wZW4sIE5lZWRzIFRyaWFnZVB1YmxpYwoKICAgIEVkaXQgVGFzawogICAgRWRpdCBSZWxhdGVkIFRhc2tzLi4uCiAgICBFZGl0IFJlbGF0ZWQgT2JqZWN0cy4uLgogICAgUHJvdGVjdCBhcyBzZWN1cml0eSBpc3N1ZQoKICAgIE11dGUgTm90aWZpY2F0aW9ucwogICAgQXdhcmQgVG9rZW4KICAgIEZsYWcgRm9yIExhdGVyCgpFVzZSC3IERpc2NsYWltZXIgtyBDQy1CWS1TQSC3IEdQTApZb3VyIGJyb3dzZXIgdGltZXpvbmUgc2V0dGluZyBkaWZmZXJzIGZyb20gdGhlIHRpbWV6b25lIHNldHRpbmcgaW4geW91ciBwcm9maWxlLCBjbGljayB0byByZWNvbmNpbGUu

Aklapper changed the visibility from "Custom Policy" to "Public (No Login Required)".
Aklapper removed a subscriber: 238482n375.