Page MenuHomePhabricator

Change views for the new columns of the refactored comment storage
Closed, ResolvedPublic

Description

Hi,

As part of T174569 there are some views that need to be change to reflect the new columns with some conditionals.
Cross posting from the original task:

  • All the _id columns added to existing tables should be available under the same conditions as the corresponding column without the _id suffix. These seem to be:
    • archive.ar_comment_id: Never available.
    • ipblocks.ipb_reason_id: Always available (whenever the row itself is).
    • oldimage.oi_description_id: Available when (oi_deleted & 2) = 0.
    • filearchive.fa_description_id: Available when (fa_deleted & 2) = 0.
    • filearchive.fa_deleted_reason_id: Always available.
    • recentchanges.rc_comment_id: Available when (rc_deleted & 2) = 0.
    • logging.log_comment_id: Available when (log_deleted & 2) = 0.
    • protectedtitles.pt_reason_id: Always available.

The original task is not yet finished (we are aiming for mid Feb) but creating this so you guys can have a heads up.

Note: I will be merging the new columns filters soon: https://gerrit.wikimedia.org/r/#/c/393725/2 so pending would be the views with the conditions specified above.

Thanks!

Details

SubjectRepoBranchLines +/-
operations/puppetproduction+4 -4
operations/puppetproduction+4 -4
operations/puppetproduction+2 -2
operations/puppetproduction+23 -80
operations/puppetproduction+11 -9
operations/puppetproduction+4 -4
operations/puppetproduction+3 -3
operations/puppetproduction+3 -3
operations/puppetproduction+4 -0
operations/puppetproduction+11 -0
operations/puppetproduction+5 -2
operations/puppetproduction+2 -2
operations/puppetproduction+4 -3
operations/puppetproduction+10 -3
operations/puppetproduction+230 -0
operations/puppetproduction+1 -1
operations/puppetproduction+347 -86
operations/puppetproduction+12 -0
Show related patches Customize query in gerrit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Rather than "create views", this task seems to be focusing on the columns that need to be added to the existing views.

There are also three views that will need creating:

  • A view for the new comment table. All columns can be made available without restriction, but only expose rows referenced from a visible _id column (as detailed in the task description or from one of the two views below).
  • A view for the new revision_comment_temp table. All columns can be made available without restriction, but only rows where the corresponding revision table row exists (join on revcomment_rev = rev_id) and has (rev_deleted & 2) = 0.
  • A view for the new image_comment_temp table. All columns can be made available without restriction, but only rows where the corresponding image table row exists (join on imgcomment_name = img_name).

Note that the latter two tables will eventually go away. If you want to avoid a user-visible change at that time, you could just make your revision and image views LEFT JOIN in the temporary tables and expose the revcomment_comment_id as rev_comment_id (or null when hidden by rev_deleted) and imgcomment_description_id as img_description_id.

Marostegui renamed this task from Create views for the new columns of the refactored comment storage to Change views for the new columns of the refactored comment storage.Nov 29 2017, 5:52 PM
Marostegui updated the task description. (Show Details)

Rather than "create views" :-) I think it is a way of speaking, as views are not tangible, I think current scripts drops and recreates.

Change 394254 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] filtered_tables: Add new columns

https://gerrit.wikimedia.org/r/394254

Change 394254 merged by Marostegui:
[operations/puppet@production] filtered_tables: Add new columns

https://gerrit.wikimedia.org/r/394254

this can now proceed as the schema change has been finished and the triggers are now in place.

Change 415384 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] views: Accommodate new comments table with rules and compatibility

https://gerrit.wikimedia.org/r/415384

Change 416496 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki-replicas: Accommodate new comments table with rules and compatibility

https://gerrit.wikimedia.org/r/416496

Change 416496 merged by Bstorm:
[operations/puppet@production] wiki-replicas: Accommodate new comments table with rules and compatibility

https://gerrit.wikimedia.org/r/416496

Change 417024 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: Small fix in maintain-views changes for comment table

https://gerrit.wikimedia.org/r/417024

Change 417024 merged by Bstorm:
[operations/puppet@production] wiki replicas: Small fix in maintain-views changes for comment table

https://gerrit.wikimedia.org/r/417024

Change 417357 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: script index creation for easier maintenance

https://gerrit.wikimedia.org/r/417357

Change 417357 merged by Bstorm:
[operations/puppet@production] wiki replicas: script index creation for easier maintenance

https://gerrit.wikimedia.org/r/417357

Change 422199 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: record grants to be added to index-management user

https://gerrit.wikimedia.org/r/422199

Change 422199 merged by Andrew Bogott:
[operations/puppet@production] wiki replicas: record grants and set user for index maintenance script

https://gerrit.wikimedia.org/r/422199

Change 423730 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: set the index script to run without args in expected ways

https://gerrit.wikimedia.org/r/423730

Change 423730 merged by Bstorm:
[operations/puppet@production] wiki replicas: set the index script to run without args in expected ways

https://gerrit.wikimedia.org/r/423730

Change 423754 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: make dry-run command the same as others and fix a typo

https://gerrit.wikimedia.org/r/423754

Change 423754 merged by Bstorm:
[operations/puppet@production] wiki replicas: make dry-run command the same as others and fix a typo

https://gerrit.wikimedia.org/r/423754

Change 423833 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: correct small but critical bugs in maintain-views

https://gerrit.wikimedia.org/r/423833

Change 423833 merged by Bstorm:
[operations/puppet@production] wiki replicas: correct small but critical bugs in maintain-views

https://gerrit.wikimedia.org/r/423833

Change 423965 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: fix complex where with regex substitution

https://gerrit.wikimedia.org/r/423965

Change 423965 merged by Bstorm:
[operations/puppet@production] wiki replicas: fix complex where with regex substitution

https://gerrit.wikimedia.org/r/423965

Change 424023 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: regex-add database name in joins for WHERE statements

https://gerrit.wikimedia.org/r/424023

Change 424023 merged by Bstorm:
[operations/puppet@production] wiki replicas: regex-add database name in joins for WHERE statements

https://gerrit.wikimedia.org/r/424023

Change 425095 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: depool labsdb1010

https://gerrit.wikimedia.org/r/425095

Change 425095 merged by Marostegui:
[operations/puppet@production] wiki replicas: depool labsdb1010

https://gerrit.wikimedia.org/r/425095

Change 425301 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: depool labsdb1011

https://gerrit.wikimedia.org/r/425301

Change 425301 merged by Marostegui:
[operations/puppet@production] wiki replicas: depool labsdb1011

https://gerrit.wikimedia.org/r/425301

Change 425553 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: depool labsdb1009 for view updates

https://gerrit.wikimedia.org/r/425553

Change 425553 merged by Marostegui:
[operations/puppet@production] wiki replicas: depool labsdb1009 for view updates

https://gerrit.wikimedia.org/r/425553

All views on the wiki replicas are now updated with the new settings. It remains to be seen for sure if they are altogether usable. I think I'll try some tests to be sure.

In enwiki_p I can actually select from the comment table. Some selecting the last few entries in that table hangs, but I kind of understand that. The last few entries of ipblocks come up quite nicely with both the ipb_reason_id and the ipb_reason in the view.

However, there are apparently only 36 rows in the resultant comment view, I've just discovered. That means that somewhere in that crazy bunch of selects to create the view, something or the whole approach isn't right. I think I've spotted the error in the resultant SQL. Fixing that.

I reran the script on just enwiki and got 4053401 rows for comment, which is far more reasonable. It had assigned the wrong db for the where portion of the view. That sort of a scope leak should not happen. With the number of DBs involved here, I cannot go back over all of them one-at-a-time, but I need to check the other servers for signs of that condition. I'm reviewing the program to see where the leak could have happened, and am not yet seeing it.

Just a guess since I don't really know python, but perhaps https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/profile/files/labs/db/views/maintain-views.py#294 where it assigns back to view_details["where"] it should instead use a local variable?

No, that shouldn't be it. The issue here is one of regex caching or compiling, I think, which would be happening--something in the global scope, basically. It's really weird. It uses only the very first database name it encounters for every single regex sub. That makes sense only if it caches it globally (which would be typical).

Except, I did an experiment and that is it (go figure). Fixing things. I'd figured that it was being handled in local scope, but I don't think it is somehow.

Or rather it is something like that, because removing the regex, it has another issue that does include a scope leak. It's not at all what I'd expect, for that matter, but there is one there.

Change 425741 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: comment view scope fix

https://gerrit.wikimedia.org/r/425741

Change 425741 merged by Bstorm:
[operations/puppet@production] wiki replicas: comment view scope fix

https://gerrit.wikimedia.org/r/425741

Ok, now it all works, and the comment view is surprisingly responsive on all servers. Thanks so much for your help @Anomie

At this point, I think this is done. The views seem to work in a backward compatible way, and things all check out in my sample tests.

You're welcome.

My guess was that python might treat the view_details object (dictionary?) like PHP does its objects, rather than making a deep copy of the object, so the assignment would affect the input to subsequent runs. The regexes only did any replacements on the first run: they look for fragments like from table and join table, so the first run changed from table1 join table2 to from `bewiki`.`table1` join `bewiki`.`table2`. Subsequent runs didn't find anything to replace because of the backticks the first run inserted.

Yeah, that's basically how I figured it in the end.

My suggestion in the future, for new code/new views (not for regular things like dropping views or adding new wikis) would be to test extensively on a depooled host, to avoid bugs and security issues (maybe you did that already, I didn't follow all details).

Another thing that could be done is to ask trusted users (ones that signed the NDA) to test the validity of changes.

Having said that, remember we already do a first pass of nuking the criticaly private data at sanitarium (password hashes, etc.) so to avoid larger issues, it is just the row-level one we cannot do with triggers.

Thanks for the great work. I also warned you it was not easy to handle production (metadata locking and other issues when in use)! :-)

Bstorm removed a project: Patch-For-Review.

Great ideas, too. I'll keep that in mind and document it.

@Anomie Since it is clear that the joins are impacting performance badly for some tables for this as well (especially some of the userindex views and friends) where they are only there for better performance. I am thinking it might be good to move the comment stuff over to compat views. It seems we are already moving to the "write new" phase of this, so I'm not sure exactly how much impact it'll be if I do that. Are we presently at "write both" or "write new" @Marostegui ?

We're currently at "write both". Testing wikis and mediawiki.org may be changed to "write new" soon, but other wikis will wait until after the datacenter switch-back.

Change 463541 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: Remove most comment joins from non-compat tables

https://gerrit.wikimedia.org/r/463541

Change 463541 merged by Bstorm:
[operations/puppet@production] wiki replicas: Remove most comment joins from non-compat tables

https://gerrit.wikimedia.org/r/463541

Deployed changes to scwiki as a litmus test.

Change 467820 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: depool labsdb1010 for view updates

https://gerrit.wikimedia.org/r/467820

Change 467820 merged by Banyek:
[operations/puppet@production] wiki replicas: depool labsdb1010 for view updates

https://gerrit.wikimedia.org/r/467820

Change 467970 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: repool labsdb1010 and depool labsdb1011 for view updates

https://gerrit.wikimedia.org/r/467970

Change 467970 merged by Marostegui:
[operations/puppet@production] wiki replicas: repool labsdb1010 and depool labsdb1011 for view updates

https://gerrit.wikimedia.org/r/467970

Mentioned in SAL (#wikimedia-operations) [2018-10-17T14:01:36Z] <marostegui> Repool labsdb1010, depool labsdb1011 - T181650

Change 468035 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wiki replicas: depool labsdb1009 for view updates

https://gerrit.wikimedia.org/r/468035

Change 468035 merged by Banyek:
[operations/puppet@production] wiki replicas: depool labsdb1009 for view updates

https://gerrit.wikimedia.org/r/468035

The new version is now out everywhere. The only funny joins are where needed for temp tables now.

As Jaime mentioned in T209031, we're having trouble joining the new comment views with the logging and revision tables. Here's my original question with guesses as to what might work, but we're open to anything that helps us with our use case: T209031#4736006

Looks like this was done in October 2018. Feel free to reopen if I'm mistaken.

Continued slowness of comment itself is tracked in T215445: comment and actor view challenges for Cloud Services.