Page MenuHomePhabricator

Create views for the schema change for refactored actor storage
Closed, ResolvedPublic

Description

The schema change for T188299 is now done on all the slaves and replicas, only pending the master which will be done probably on Wednesday.

Please go ahead and get the views ready for these changes.

  • All columns in the actor table can be made available without restriction. Rows must only be available if (1) actor_user refers to a visible row in the user table or (2) the row is publicly referenced from one of the _actor columns being added in this patch. Extensions converted to have _actor columns will eventually need to be added to that list. Note: From the MediaWiki side, this table should receive manly INSERTs, no UPDATEs, and DELETEs only due to potential maintenance. Also note {T91706}.
  • All columns in revision_actor_temp can be made available without restriction. Rows must only be available if the corresponding revision table row exists (join on revactor_rev = rev_id) and has (rev_deleted & 4) = 0. Note: From the MediaWiki side, this table should receive INSERTs and DELETEs but no UPDATEs.
  • All the _actor columns added to existing tables should be available under the same conditions as the corresponding _user or _user_text columns. These seem to be:
  • archive.ar_actor: Available when (ar_deleted & 4) = 0.
  • ipblocks.ipb_by_actor: Always available (whenever the row itself is).
  • image.img_actor: Always available.
  • oldimage.oi_actor: Available when (oi_deleted & 4) = 0.
  • filearchive.fa_actor: Available when (fa_deleted & 4) = 0.
  • recentchanges.rc_actor: Available when (rc_deleted & 4) = 0.
  • logging.log_actor: Available when (log_deleted & 4) = 0

Event Timeline

Marostegui created this task.

Change 431823 had a related patch set uploaded (by Anomie; owner: Bstorm):
[operations/puppet@production] WIP: wiki replicas - prepare for refactored actor storage

https://gerrit.wikimedia.org/r/431823

Vvjjkkii renamed this task from Create views for the schema change for refactored actor storage to w5baaaaaaa.Jul 1 2018, 1:07 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: Aklapper, gerritbot.
Marostegui renamed this task from w5baaaaaaa to Create views for the schema change for refactored actor storage.Jul 1 2018, 7:58 PM
Marostegui lowered the priority of this task from High to Medium.
Marostegui updated the task description. (Show Details)
Marostegui added a subscriber: Aklapper.

I've rebased and set up the patch for the replicas. Is the actor table ready to go with img_actor and all that now?
Also, any comments on the patch?

The img_actor column isn't present on the masters yet (T187089) as that is blocked on the DC failover.
All the slaves and replicas do have the img_actor column on the image table

With how we are handling T174047, where is this now? Should I change my review to something more like that or leave it as is.
If all replicas have the column, I should be able to deploy it, but I realized I should check if we are continuing the backward-compat approach or changing it.

Change 431823 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas - prepare for refactored actor storage

https://gerrit.wikimedia.org/r/431823

There are two aspects to compatibility joins here:

  • The forward-compat join from revision → revision_actor_temp to emulate rev_actor.
  • The back-compat joins from all the tables to emulate xx_user and xx_user_text once those start being not populated.

Per T174047 the back-compat joins probably shouldn't happen. The forward-compat join might still make sense though. In the end, though, I leave the decision up to you and the DBAs since you know the performance issues better than I do.

The performance issues are likely to be the most severe around the biggest tables. We could probably get away with almost anything on smaller ones. The needs of the actor table re: what should be shown and what not means we'll be doing some joining at least. I'll take another look at it with more of an eye to this issue.

are likely to be the most severe around the biggest tables

I agree with this statement, the most issues we had so far is with page (small to normal) joining revision (huge). I don't have a problem with the idea, just with the effects on certain cases, like the one I just mentioned.

My vote would be, however, with exposing the underlying tables if public for those who want raw access (at least, in addition to the compat views).

Funny thing about this one, I think a call to a joined-up table that references actor, using backward compatibility logic, might be faster than correctly calling the new actor table because getting a row from actor requires checking 9 other tables to be filtered right.

Actor and revision are probably some of the largest tables we'll have, so, I'm not sure there is an easy way around this badly impacting performance like for MCR (where one simply can not do the joins).

I think it might be worth it to do like we did for the "user_index" views, and change ones with an unnecessary, backward compat join to "_compat" version, with the straight $table view being joined only as needed for foward compatibility or privacy, just in general. However, we are likely to still find the actor table will be a slow one or a burden on the servers due to privacy filtering that references nine other tables.

I'll start this by first putting up a patch that provides this on MCR for when we are ready with communications and all that.

Ok, I believe, from everything I'm seeing here and on the replicas that I can push out the patch and rebuild views (with depools) as in the current patch set https://gerrit.wikimedia.org/r/c/operations/puppet/+/431823. I've tested it on my local setup.

This should add the backward compatibility steps to the compat tables and expose the new fields in the main tables. This should be good until we start dropping columns. Correct, @Marostegui and @jcrespo ?

Change 431823 merged by Bstorm:
[operations/puppet@production] wiki replicas - prepare for refactored actor storage

https://gerrit.wikimedia.org/r/431823

Going to test this on a wiki tomorrow to ensure that it looks good on the replicas as well as on my local setup.

Finally got this deployed to a set of views (scwiki) as a litmus test.

Change 464569 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: depool labsdb1010 to add initial actor table changes to views

https://gerrit.wikimedia.org/r/464569

Change 464569 merged by Banyek:
[operations/puppet@production] wiki replicas: depool labsdb1010 to add initial actor table changes to views

https://gerrit.wikimedia.org/r/464569

Change 464619 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: depool labsdb1011 to add initial actor table changes to views

https://gerrit.wikimedia.org/r/464619

Change 464619 merged by Banyek:
[operations/puppet@production] wiki replicas: depool labsdb1011 to add initial actor table changes to views

https://gerrit.wikimedia.org/r/464619

Change 464859 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: depool labsdb1009 for actor table changes to views

https://gerrit.wikimedia.org/r/464859

Change 464859 merged by Banyek:
[operations/puppet@production] wiki replicas: depool labsdb1009 for actor table changes to views

https://gerrit.wikimedia.org/r/464859

I think this is completed, and I just forgot to close it.