
Deploy labsdbuser and views to new clouddb hosts
Closed, Resolved, Public

Description

The new clouddb hosts need to have:

  • labsdb role
  • Grants for labsdb role
  • The _p databases
  • Views for _p databases.
  • The following users:
    • labsdbadmin
    • maintainviews
    • maintainindexes
    • viewmaster

Hosts:

  • clouddb1013:3311
  • clouddb1013:3313
  • clouddb1014:3312
  • clouddb1014:3317
  • clouddb1015:3314
  • clouddb1015:3316
  • clouddb1016:3315
  • clouddb1016:3318
  • clouddb1017:3311
  • clouddb1017:3313
  • clouddb1018:3312
  • clouddb1018:3317
  • clouddb1019:3314
  • clouddb1019:3316
  • clouddb1020:3315
  • clouddb1020:3318
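Later comments rely on a port convention: port 331N serves section sN (3311 holds s1/enwiki, 3316 holds s6/jawiki), and each section has two replicas. A minimal Python sketch, assuming that convention holds for ports 3311 to 3318, groups the instances above by section:

```python
from collections import defaultdict
from typing import Dict, List

# The instances listed above, as "host:port" strings.
INSTANCES = [
    "clouddb1013:3311", "clouddb1013:3313",
    "clouddb1014:3312", "clouddb1014:3317",
    "clouddb1015:3314", "clouddb1015:3316",
    "clouddb1016:3315", "clouddb1016:3318",
    "clouddb1017:3311", "clouddb1017:3313",
    "clouddb1018:3312", "clouddb1018:3317",
    "clouddb1019:3314", "clouddb1019:3316",
    "clouddb1020:3315", "clouddb1020:3318",
]


def section_for_port(port: int) -> str:
    """Map a MariaDB port to a section, assuming port 3310 + N serves sN."""
    n = port - 3310
    if not 1 <= n <= 8:
        raise ValueError(f"unexpected port {port}")
    return f"s{n}"


def group_by_section(instances: List[str]) -> Dict[str, List[str]]:
    """Return {section: [host, ...]} from "host:port" strings, in input order."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for instance in instances:
        host, port = instance.rsplit(":", 1)
        sections[section_for_port(int(port))].append(host)
    return dict(sections)


for section, hosts in sorted(group_by_section(INSTANCES).items()):
    print(section, ", ".join(hosts))
```

With this list, every section maps to exactly two hosts, one from clouddb1013-1016 and one from clouddb1017-1020.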

Details

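+----------------------+------------+-----------+
| Project              | Branch     | Lines +/- |
+----------------------+------------+-----------+
| operations/puppet    | production | +1 -8     |
| operations/puppet    | production | +5 -2     |
| operations/puppet    | production | +34 -14   |
| operations/puppet    | production | +43 -14   |
| operations/cookbooks | master     | +9 -1     |
| operations/puppet    | production | +2 -1     |
| operations/puppet    | production | +1 -1     |
| operations/puppet    | production | +51 -18   |
| operations/puppet    | production | +46 -18   |
| operations/puppet    | production | +2 -2     |
| operations/puppet    | production | +1 -1     |
| operations/puppet    | production | +5 -12    |
| operations/puppet    | production | +125 -96  |
| operations/puppet    | production | +2 -2     |
| operations/puppet    | production | +2 -2     |
| operations/puppet    | production | +2 -2     |
| operations/puppet    | production | +254 -119 |
+----------------------+------------+-----------+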

Event Timeline

Marostegui triaged this task as Medium priority. Nov 20 2020, 8:21 AM
Marostegui moved this task from Triage to Ready on the DBA board.

@Bstorm we should start with this "early" (meaning: before all the hosts are ready, in case we find issues).
Next week I will deploy the user, role and _p databases to clouddb1015:3316 and clouddb1019:3316. I will ping you once that is done so you can try to create the views and do all the magic behind them.

Cheeky question, should we go with a "clouddb" user and role to be consistent or would that be too much work to untangle from the current set up?

That's a good point - up to cloud-services-team, as they'd need to change their scripts. From our side it doesn't make much difference.
But probably they'd need to adapt their scripts to have both users available, as both setups will co-exist for a period of time.

Thanks! I'll get the scripts ready. I am inclined to not change the role name right now because it will complicate this already-complicated transition.

+1 I will go for labsdbuser next week then!

@Marostegui Random question: where does centralauth live in this setup? We are so far planning on keeping meta_p on s7 for historical reasons (or possibly on all sections if meta_p becomes a much better thing with tooling assuming it ends up on s7).

Change 642503 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: modify views scripts to work on any replica style

https://gerrit.wikimedia.org/r/642503

bd808 added a subscriber: bd808. Nov 20 2020, 9:58 PM

Per operations/mediawiki-config.git, centralauth is located in s7. Semi-confusingly it is not listed in the s7 dblist, but that is because those lists are for automating various wiki config and maintenance tasks and centralauth is not really a wiki.

Change 642570 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: extend maintain_dbusers to multiinstance replicas

https://gerrit.wikimedia.org/r/642570

I saw you already got an answer on IRC, but I'm answering here for the record.
centralauth lives in s7 and will remain on s7 on the new hosts.
It is not listed in the s7 dblist because that list only includes wikis, and centralauth is not really a wiki.
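Since centralauth lives on s7 but is absent from every dblist, anything that builds its work list from sN.dblist has to add it explicitly. A hypothetical helper (the name and signature are illustrative, not the actual wikireplicas script; skipping '#' comments is an assumption about the dblist format):

```python
from typing import List


def databases_for_section(section: str, dblist_text: str) -> List[str]:
    """Return the databases to process for one section.

    dblist_text is the content of an sN.dblist file, one wiki per line.
    Blank lines and '#' comments are skipped (an assumption about the format).
    centralauth is appended for s7 only: it lives there but is not a wiki,
    so it never appears in a dblist.
    """
    dbs = []
    for line in dblist_text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            dbs.append(name)
    if section == "s7" and "centralauth" not in dbs:
        dbs.append("centralauth")
    return dbs


print(databases_for_section("s7", "arwiki\ncawiki\n"))
```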

@Bstorm can you test the script against clouddb1013:3311?
On that host I have:

  • Created the labsdbuser role
  • Granted SELECT, SHOW VIEW on all the _p databases to it (even though only enwiki lives there), for example (excerpt):
+----------------------------------------------------------------------------+
| Grants for labsdbuser                                                      |
+----------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO `labsdbuser`                                         |
| GRANT SELECT, SHOW VIEW ON `amwikiquote\_p`.* TO `labsdbuser`              |
| GRANT SELECT, SHOW VIEW ON `kkwiki\_p`.* TO `labsdbuser`                   |
| GRANT SELECT, SHOW VIEW ON `ndswiki\_p`.* TO `labsdbuser`                  |
| GRANT SELECT, SHOW VIEW ON `elwikivoyage\_p`.* TO `labsdbuser`             |
+----------------------------------------------------------------------------+
  • Created: enwiki_p

Do your testing there, and if you are happy with it, I will deploy the same to the other hosts that already have mysql up.
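The labsdbuser grants above follow one pattern per _p database. A sketch that renders equivalent statements from a list of wiki names. It escapes _ (and %) because both are wildcards in database names in MariaDB GRANT statements, which is why the SHOW GRANTS output reads amwikiquote\_p rather than amwikiquote_p. The function names are hypothetical:

```python
from typing import Iterable, List


def escape_db_name(name: str) -> str:
    """Escape the grant wildcards '_' and '%' so the name matches only itself."""
    return name.replace("_", "\\_").replace("%", "\\%")


def labsdbuser_grants(wikis: Iterable[str], role: str = "labsdbuser") -> List[str]:
    """Build one GRANT SELECT, SHOW VIEW statement per <wiki>_p database."""
    return [
        f"GRANT SELECT, SHOW VIEW ON `{escape_db_name(wiki + '_p')}`.* TO `{role}`;"
        for wiki in wikis
    ]


for statement in labsdbuser_grants(["enwiki", "kkwiki"]):
    print(statement)
```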

Marostegui moved this task from Ready to In progress on the DBA board. Nov 23 2020, 7:41 AM
Bstorm added a comment. Edited Nov 23 2020, 11:42 PM

I'm aiming to set up https://gerrit.wikimedia.org/r/c/operations/puppet/+/642503 to be more or less a no-op on the existing labsdb* hosts and to set things up correctly on the new ones. Since it runs things in order, once it's deployed I can manually edit the config to only run against s1 and just try it there.

On a quick check (since that's not merged yet), I know it will fail without the viewmaster user (which acts as the definer for the views) and likely without the maintainviews user, which actually writes the views. The other thing that will fail is the maintain_dbusers service, because it depends on the user from $::passwords::mysql::labsdb::user being set up with the grants in modules/role/templates/mariadb/grants/wiki-replicas.sql.

I should be able to make the indexes without either of those when the code is merged, I think.

Ah, nope. That'll need the maintainindexes user. I can also create these users, if you prefer. As far as I can tell, we've recorded most of the grants and related setup.
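The failures described above come down to missing accounts, so a small preflight check could list them before a run. The step-to-account mapping comes from this thread, except maintain_dbusers to labsdbadmin, which is an assumption (the thread only refers to $::passwords::mysql::labsdb::user). The existing names would come from something like SELECT user FROM mysql.user:

```python
from typing import Dict, Iterable, List, Set

# Accounts each step needs. maintain_dbusers -> labsdbadmin is an assumption.
REQUIRED_ACCOUNTS: Dict[str, Set[str]] = {
    "views": {"viewmaster", "maintainviews"},  # view definer + the user writing views
    "indexes": {"maintainindexes"},
    "maintain_dbusers": {"labsdbadmin"},  # assumed
}


def missing_accounts(step: str, existing_users: Iterable[str]) -> List[str]:
    """Return the accounts a step needs that are not present, sorted."""
    return sorted(REQUIRED_ACCOUNTS[step] - set(existing_users))


print(missing_accounts("views", ["root", "viewmaster"]))
```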

@Bstorm I am going to apply all the grants that show up at: templates/mariadb/grants/wiki-replicas.sql (not the uX ones).
So we also need to create:
labsdbadmin
maintainviews
maintainindexes
viewmaster

Marostegui updated the task description. Nov 24 2020, 6:58 AM

Created users (self note: grants_labsdb file) on clouddb1013:3313. Can you try again?

GRANT SUPER, CREATE USER ON *.* TO 'labsdbadmin'@'10.64.37.20' IDENTIFIED BY PASSWORD '*x' WITH GRANT OPTION;
GRANT labsdbuser TO 'labsdbadmin'@'10.64.37.20' WITH ADMIN OPTION;
GRANT SELECT, INSERT, UPDATE ON `mysql`.* TO 'labsdbadmin'@'10.64.37.20';
GRANT SELECT, SHOW VIEW ON `%wik%`.* TO 'labsdbadmin'@'10.64.37.20';
GRANT SELECT, SHOW VIEW ON `%_p`.* TO 'labsdbadmin'@'10.64.37.20' WITH GRANT OPTION;
GRANT SUPER, CREATE USER ON *.* TO 'labsdbadmin'@'10.64.37.19' IDENTIFIED BY PASSWORD '*x' WITH GRANT OPTION;
GRANT labsdbuser TO 'labsdbadmin'@'10.64.37.19' WITH ADMIN OPTION;
GRANT SELECT, INSERT, UPDATE ON `mysql`.* TO 'labsdbadmin'@'10.64.37.19';
GRANT SELECT, SHOW VIEW ON `%wik%`.* TO 'labsdbadmin'@'10.64.37.19';
GRANT SELECT, SHOW VIEW ON `%_p`.* TO 'labsdbadmin'@'10.64.37.19' WITH GRANT OPTION;
GRANT SUPER ON *.* TO 'maintainviews'@'localhost' IDENTIFIED BY PASSWORD '*x';
GRANT ALL PRIVILEGES ON `meta\_p`.* TO 'maintainviews'@'localhost';
GRANT ALL PRIVILEGES ON `heartbeat\_p`.* TO 'maintainviews'@'localhost';
GRANT SELECT ON `heartbeat`.* TO 'maintainviews'@'localhost';
GRANT SELECT ON `centralauth`.* TO 'maintainviews'@'localhost';
GRANT ALL PRIVILEGES ON `centralauth\_p`.* TO 'maintainviews'@'localhost';
GRANT ALL PRIVILEGES ON `%\_p`.* TO 'maintainviews'@'localhost';
GRANT SELECT, DROP, CREATE VIEW ON `%wik%`.* TO 'maintainviews'@'localhost';
GRANT ALL PRIVILEGES ON `%wik%\_p`.* TO 'maintainviews'@'localhost';
GRANT SELECT (user, host) ON `mysql`.`user` TO 'maintainviews'@'localhost';
GRANT SUPER ON *.* TO 'maintainindexes'@'localhost' IDENTIFIED BY PASSWORD '*x';
GRANT SELECT, INDEX, ALTER ON `%wik%`.* TO 'maintainindexes'@'localhost';
GRANT SELECT ON *.* TO 'viewmaster'@'%' IDENTIFIED BY PASSWORD '*x';
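Several of these grants use database-name wildcards: in MariaDB/MySQL database-level grants, % matches any sequence and _ matches one character unless escaped as \_. So %_p (unescaped, as in the labsdbadmin grant) and %\_p (escaped, as in the maintainviews grant) are different patterns. A rough Python emulation of that matching, useful only for reasoning about which databases a grant covers (it ignores case-sensitivity settings such as lower_case_table_names):

```python
import re


def grant_db_regex(pattern: str):
    """Translate a grant database pattern into a compiled regex.

    '%' matches any sequence, '_' matches exactly one character, and a
    backslash makes the next character literal. Case sensitivity
    (lower_case_table_names) is deliberately ignored here.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def grant_covers(pattern: str, db: str) -> bool:
    """True if a database-level grant on `pattern` would apply to `db`."""
    return grant_db_regex(pattern).fullmatch(db) is not None


candidates = ["enwiki_p", "enwikip", "heartbeat_p", "centralauth"]
for pattern in ("%_p", "%\\_p", "%wik%"):
    print(pattern, [db for db in candidates if grant_covers(pattern, db)])
```

In this emulation %\_p matches enwiki_p and heartbeat_p but not a hypothetical enwikip, while the unescaped %_p matches all three; %wik% also matches enwiki_p, so it covers the _p databases as well as the base ones.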

Change 643356 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: format maintain-meta_p.py with black

https://gerrit.wikimedia.org/r/643356

Change 643363 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: Upgrade maintain-meta_p.py to python 3

https://gerrit.wikimedia.org/r/643363

Change 643578 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: add maintain-meta_p only to s7 and legacy replicas

https://gerrit.wikimedia.org/r/643578

@Bstorm have you found any other grant issues or should I go ahead and deploy all those roles/users to the rest of the clouddb hosts that are up for now?

Sorry, I haven't had a chance to test. I plan to today.

Change 642503 merged by Bstorm:
[operations/puppet@production] wikireplicas: modify views scripts to work on any replica style

https://gerrit.wikimedia.org/r/642503

Change 644328 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: fiddle with the template spacing a bit

https://gerrit.wikimedia.org/r/644328

Change 644328 merged by Bstorm:
[operations/puppet@production] wikireplicas: fiddle with the template spacing a bit

https://gerrit.wikimedia.org/r/644328

Change 644348 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: extend, don't append when adding to your lists

https://gerrit.wikimedia.org/r/644348

Change 644348 merged by Bstorm:
[operations/puppet@production] wikireplicas: extend, don't append when adding to your lists

https://gerrit.wikimedia.org/r/644348
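The change title above names a classic Python pitfall. A generic illustration of the difference (not the actual diff from that change; the section names are only sample data):

```python
sections = ["s1", "s3"]
more = ["s5", "s7"]

nested = list(sections)
nested.append(more)  # adds ONE element: the whole list object
flat = list(sections)
flat.extend(more)  # adds each element of `more` individually

print(nested)  # ['s1', 's3', ['s5', 's7']]
print(flat)  # ['s1', 's3', 's5', 's7']
```

Iterating over the appended version would yield a list where a single section or database name was expected.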

@Marostegui The views are created on s1@clouddb1013. That was nice and smooth.
The indexes are in progress. That part is taking a little time... and I already messed up one because I made the mistake of not starting a screen session, so I'll have to redo that index, but that's ok.

I have puppet disabled so that the configs only use s1 rather than s1 and s3.

Change 644351 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: fix the spacing for the index script template

https://gerrit.wikimedia.org/r/644351

Change 644351 merged by Bstorm:
[operations/puppet@production] wikireplicas: fix the spacing for the index script template

https://gerrit.wikimedia.org/r/644351

I haven't created all the users yet either. I'm going to need to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/642570, and see how that goes before it will even work. That said, the indexes are done as well. So we've got views and indexes on that instance :) The settings there are sufficient for that.

PS - I am aware of the wmf-pt-killer script setup causing puppet to fail. I'll get that tomorrow.

Excellent, I am going to add the users part to the task so we know it needs to be done before considering a host complete. I will go ahead and deploy the same I deployed to clouddb1013:3311 to the rest of hosts.

This is being tracked at T260511

Mentioned in SAL (#wikimedia-operations) [2020-12-01T07:15:43Z] <marostegui> Deploy labsdb role on all clouddb instances (except clouddb1020*) T268312

Mentioned in SAL (#wikimedia-operations) [2020-12-01T07:31:11Z] <marostegui> Deploy "_p" databases to all clouddb hosts (except clouddb1020*) T268312

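labsdb user and grants deployed:

+------------------+-------+
| Instance         | Count |
+------------------+-------+
| clouddb1013:3311 |   929 |
| clouddb1013:3313 |   929 |
| clouddb1014:3312 |   929 |
| clouddb1014:3317 |   929 |
| clouddb1015:3314 |   929 |
| clouddb1015:3316 |   929 |
| clouddb1016:3315 |   929 |
| clouddb1016:3318 |   929 |
| clouddb1017:3311 |   929 |
| clouddb1017:3313 |   929 |
| clouddb1018:3312 |   929 |
| clouddb1018:3317 |   929 |
| clouddb1019:3314 |   929 |
| clouddb1019:3316 |   929 |
+------------------+-------+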

The following users were also deployed:

  • labsdbadmin
  • maintainviews
  • maintainindexes
  • viewmaster
clouddb1013:3311
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1013:3313
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1014:3312
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1014:3317
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1015:3314
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1015:3316
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1016:3315
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1016:3318
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1017:3311
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1017:3313
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1018:3312
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1018:3317
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1019:3314
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
clouddb1019:3316
+-----------------+-------------+
| User            | Host        |
+-----------------+-------------+
| viewmaster      | %           |
| labsdbadmin     | 10.64.37.19 |
| labsdbadmin     | 10.64.37.20 |
| maintainindexes | localhost   |
| maintainviews   | localhost   |
+-----------------+-------------+
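The per-instance tables above are meant to be identical. A sketch that compares each instance's (User, Host) rows against the expected set instead of eyeballing them; the rows would come from a query along the lines of SELECT user, host FROM mysql.user WHERE user IN (...) (assumed; the exact query used above isn't shown), and "example:3311" is a made-up instance:

```python
from typing import Dict, Iterable, Set, Tuple

Account = Tuple[str, str]  # (User, Host)

EXPECTED: Set[Account] = {
    ("viewmaster", "%"),
    ("labsdbadmin", "10.64.37.19"),
    ("labsdbadmin", "10.64.37.20"),
    ("maintainindexes", "localhost"),
    ("maintainviews", "localhost"),
}


def account_drift(
    per_instance: Dict[str, Iterable[Account]]
) -> Dict[str, Tuple[Set[Account], Set[Account]]]:
    """Return {instance: (missing, unexpected)} for instances that differ."""
    drift = {}
    for instance, rows in per_instance.items():
        found = set(rows)
        missing, unexpected = EXPECTED - found, found - EXPECTED
        if missing or unexpected:
            drift[instance] = (missing, unexpected)
    return drift


print(account_drift({"clouddb1013:3311": EXPECTED, "example:3311": [("viewmaster", "%")]}))
```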

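The _p databases (including information_schema_p) are also deployed:

+------------------+--------------+
| Instance         | _p databases |
+------------------+--------------+
| clouddb1013:3311 |            2 |
| clouddb1013:3313 |          905 |
| clouddb1014:3312 |           18 |
| clouddb1014:3317 |           13 |
| clouddb1015:3314 |            3 |
| clouddb1015:3316 |            4 |
| clouddb1016:3315 |           16 |
| clouddb1016:3318 |            2 |
| clouddb1017:3311 |            2 |
| clouddb1017:3313 |          905 |
| clouddb1018:3312 |           18 |
| clouddb1018:3317 |           13 |
| clouddb1019:3314 |            3 |
| clouddb1019:3316 |            4 |
+------------------+--------------+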
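Because both replicas of a section share a port, their _p counts should agree. A sketch that checks the numbers reported above by grouping them per port; ports 3315 and 3318 have a single reading each because clouddb1020 was not set up yet, and each count includes information_schema_p:

```python
from collections import defaultdict
from typing import Dict, List, Set

# _p database counts reported above (information_schema_p included).
P_COUNTS: Dict[str, int] = {
    "clouddb1013:3311": 2, "clouddb1013:3313": 905,
    "clouddb1014:3312": 18, "clouddb1014:3317": 13,
    "clouddb1015:3314": 3, "clouddb1015:3316": 4,
    "clouddb1016:3315": 16, "clouddb1016:3318": 2,
    "clouddb1017:3311": 2, "clouddb1017:3313": 905,
    "clouddb1018:3312": 18, "clouddb1018:3317": 13,
    "clouddb1019:3314": 3, "clouddb1019:3316": 4,
}


def mismatched_ports(counts: Dict[str, int]) -> Dict[str, List[int]]:
    """Group counts by port and return the ports whose replicas disagree."""
    by_port: Dict[str, Set[int]] = defaultdict(set)
    for instance, count in counts.items():
        port = instance.rsplit(":", 1)[1]
        by_port[port].add(count)
    return {port: sorted(vals) for port, vals in by_port.items() if len(vals) > 1}


print(mismatched_ports(P_COUNTS))  # {} for the counts above
```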

clouddb1020 isn't set up yet as it is being used by @Bstorm.
@Bstorm you can go ahead from your side, and let's mark the hosts as done once everything is done from your end

Marostegui updated the task description. Dec 1 2020, 7:59 AM

@Bstorm clouddb1017:3311 and clouddb1013:3311 are currently down, as I need to rebuild them due to some data inconsistency which may be a leftover from T269072. I am rebuilding them now, while they are not in service, to avoid future issues.

Change 643356 merged by Bstorm:
[operations/puppet@production] wikireplicas: format maintain-meta_p.py with black

https://gerrit.wikimedia.org/r/643356

Change 643363 merged by Bstorm:
[operations/puppet@production] wikireplicas: Upgrade maintain-meta_p.py to python 3

https://gerrit.wikimedia.org/r/643363

Change 644552 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/cookbooks@master] wmcs: set the add_wiki cookbook to only run meta_p on some hosts

https://gerrit.wikimedia.org/r/644552

Change 644606 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: add cumin aliases that include multiinstance servers

https://gerrit.wikimedia.org/r/644606

Bstorm added a comment. Dec 1 2020, 8:00 PM

So far, I haven't created the actual end-user accounts on any of them (I haven't merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/642570). I will quickly check the script to make sure it will operate sanely with those settings and then try it out. I haven't merged it yet because, if it misbehaves, it takes down that service for the existing replicas as well, so it needs babysitting. Should I consider that part of finishing an instance, or set that part up in another task?

Bstorm added a comment. Dec 1 2020, 9:37 PM

Running create views across all the hosts except clouddb1013 and clouddb1017, I got this anomaly:

2020-12-01 21:30:17,981 WARNING DB frwiki does not exist to create views
2020-12-01 21:30:17,982 WARNING DB jawiki does not exist to create views
2020-12-01 21:30:17,983 WARNING DB ruwiki does not exist to create views
2020-12-01 21:30:23,224 WARNING DB arwiki does not exist to create views
2020-12-01 21:30:23,226 WARNING DB cawiki does not exist to create views
2020-12-01 21:30:23,227 WARNING DB eswiki does not exist to create views
2020-12-01 21:30:23,229 WARNING DB fawiki does not exist to create views
2020-12-01 21:30:23,230 WARNING DB frwiktionary does not exist to create views
2020-12-01 21:30:23,232 WARNING DB hewiki does not exist to create views
2020-12-01 21:30:23,233 WARNING DB huwiki does not exist to create views
2020-12-01 21:30:23,234 WARNING DB kowiki does not exist to create views
2020-12-01 21:30:23,236 WARNING DB metawiki does not exist to create views
2020-12-01 21:30:23,237 WARNING DB rowiki does not exist to create views
2020-12-01 21:30:23,238 WARNING DB ukwiki does not exist to create views
2020-12-01 21:30:23,240 WARNING DB viwiki does not exist to create views
2020-12-01 21:30:23,241 WARNING DB centralauth does not exist to create views

Is that expected at this time or is there a possibility that the config checkout for mediawiki has something incorrect about instance locations? @Marostegui

Mentioned in SAL (#wikimedia-operations) [2020-12-02T00:14:56Z] <bstorm> created views and wikireplicas indexes on clouddb10[13-19] sans s1 T268312

@Bstorm clouddb1013:3311 and clouddb1017:3311 are back, with all the grants and the _p databases. Can you try those?

I am not fully sure how the script works, but given that each instance has its own wikis, I would expect that to be normal.
For instance, if you connect to port 3311, only enwiki will be there. So it is expected that jawiki, for example, isn't there, as jawiki is part of s6 (3316).
So I would expect that if we read ./usr/local/lib/mediawiki-config/dblists/s1.dblist to create the views on hosts with 3311, only the wikis listed there will succeed.
Attempting to create views on the 3311 instances using any dblist other than s1.dblist is expected to fail.
Does that make sense, and would it explain those errors?

Let's create a separate task for the end-user accounts then and use this one just for "internal" grants and _p databases.

Marostegui updated the task description. Dec 2 2020, 6:06 AM
Bstorm added a comment. Dec 2 2020, 3:19 PM

That's what the script does (it reads the s#.dblist). This is why I am concerned that one or more sections reported missing DBs. s1 and s3, for instance, report no missing DBs; the same goes for s4. I can run the script across each of the hosts to figure out which section is throwing the errors and compare the dblist files. I do see metawiki and centralauth (which should only be tried on s7). For the others, I don't know which section it was trying at the time. I can also add a log of which section it is on when it throws the error. This definitely should not happen and is a sign of *something* being strange. I'm just not sure what :)
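Adding the section (and instance) to each warning, as proposed above, shows immediately which dblist was being applied to which instance. A sketch, assuming the script already has the section's dblist entries and the instance's SHOW DATABASES output in hand; the function and logger names are hypothetical:

```python
import logging
from typing import Iterable, List

log = logging.getLogger("views-sketch")


def wikis_to_process(
    section: str, instance: str, dblist: Iterable[str], live_dbs: Iterable[str]
) -> List[str]:
    """Return the dblist wikis that exist on the instance.

    Every missing wiki is logged together with the section and instance,
    so a warning points straight at the dblist/instance pair that disagrees.
    """
    live = set(live_dbs)
    present = []
    for wiki in dblist:
        if wiki in live:
            present.append(wiki)
        else:
            log.warning(
                "DB %s does not exist to create views (section %s, instance %s)",
                wiki, section, instance,
            )
    return present


logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
print(wikis_to_process("s7", "clouddb1014:3317", ["arwiki", "jawiki"], ["arwiki", "arwiki_p"]))
```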

It would be strange if databases were missing, as the copies are being done from the sanitarium hosts or even their masters.
Let's try to narrow down which section, and which specific databases, it is complaining about so we can double-check.
If anything, I would have guessed s3 could be the one whose dblist file differs from what is live, as there are more than 900 databases there.

Bstorm added a comment. Dec 2 2020, 3:29 PM

I'll get more info today by trying again on servers that have possible issues with debug logging (s7, s6, s5...I think all the others ran fine). Issues could be in the script, the dblists, the dbs. I'll also do s1 again now that they are back.

As for clouddb1020, that's all waiting on comments on T267376#6660382 and whether we can get some IPs and such. I may not be able to actually test on clouddb1020 in the end (due to network constraints) and might need to just deploy to "real" proxies 😱

I would be surprised if s7 had issues, as we haven't touched wikis there in years.
s5 was changed some months ago when we moved wikis (can't find the task number).

For the record:

s7 live

# mysql.py -hclouddb1014:3317 -e "show databases" -BN
arwiki
arwiki_p
cawiki
cawiki_p
centralauth
eswiki
eswiki_p
fawiki
fawiki_p
frwiktionary
frwiktionary_p
heartbeat
hewiki
hewiki_p
huwiki
huwiki_p
information_schema
information_schema_p
kowiki
kowiki_p
metawiki
metawiki_p
mysql
ops
performance_schema
rowiki
rowiki_p
sys
ukwiki
ukwiki_p
viwiki
viwiki_p

s5 live

# mysql.py -hclouddb1016:3315 -e "show databases" -BN
apiportalwiki
apiportalwiki_p
arbcom_ruwiki_p
avkwiki
avkwiki_p
cebwiki
cebwiki_p
dewiki
dewiki_p
enwikivoyage
enwikivoyage_p
heartbeat
information_schema
information_schema_p
jawikivoyage
jawikivoyage_p
lldwiki
lldwiki_p
mgwiktionary
mgwiktionary_p
mhwiktionary
mhwiktionary_p
muswiki
muswiki_p
mysql
ops
performance_schema
shwiki
shwiki_p
smnwiki
smnwiki_p
srwiki
srwiki_p
sys
thankyouwiki
thankyouwiki_p
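Listings like these can also be checked mechanically for _p databases that have no matching base database. A sketch, run here against the s5 listing above; standalone databases such as meta_p can be passed in as expected exceptions. The result only flags names for review and says nothing about whether they are intentional:

```python
from typing import Iterable, List

# "show databases" output for clouddb1016:3315 (s5), copied from above.
S5_LIVE = """
apiportalwiki apiportalwiki_p arbcom_ruwiki_p avkwiki avkwiki_p cebwiki
cebwiki_p dewiki dewiki_p enwikivoyage enwikivoyage_p heartbeat
information_schema information_schema_p jawikivoyage jawikivoyage_p lldwiki
lldwiki_p mgwiktionary mgwiktionary_p mhwiktionary mhwiktionary_p muswiki
muswiki_p mysql ops performance_schema shwiki shwiki_p smnwiki smnwiki_p
srwiki srwiki_p sys thankyouwiki thankyouwiki_p
""".split()


def orphan_p_databases(databases: Iterable[str], expected: Iterable[str] = ()) -> List[str]:
    """Return *_p databases whose base database is absent, minus expected ones."""
    dbs = set(databases)
    allowed = set(expected)
    return sorted(
        db for db in dbs
        if db.endswith("_p") and db[:-2] not in dbs and db not in allowed
    )


print(orphan_p_databases(S5_LIVE, expected=["meta_p"]))  # ['arbcom_ruwiki_p']
```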

I will give it a look - but not sure I will have much to say there :-)
Remember we can try to move all the services to one of the proxies and use the other one to deploy your tests if that helps to make it less scary!

Bstorm added a comment. Dec 2 2020, 6:02 PM

Yeah, I think I'll update the task over there today to take clouddb1020 off it. It just makes it more confusing anyway.

Change 644906 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: fix error in the logic for multiinstance selections

https://gerrit.wikimedia.org/r/644906

Change 644906 merged by Bstorm:
[operations/puppet@production] wikireplicas: fix error in the logic for multiinstance selections

https://gerrit.wikimedia.org/r/644906

Change 644934 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: Fix another mistake in the script

https://gerrit.wikimedia.org/r/644934

Change 644934 merged by Bstorm:
[operations/puppet@production] wikireplicas: Fix another mistake in the script

https://gerrit.wikimedia.org/r/644934

Bstorm added a comment. Edited Dec 2 2020, 10:39 PM

So I'm glad I noticed that warning. A few things have come out of it:

  1. I've fixed the script to handle the multi-instance settings much better.
  2. It was creating _p databases for wikis that aren't on the replicas and should not be (so I am cleaning them up).
  3. It was only running half the loop in many cases (fixed).
  4. The script can do the CREATE DB and role stuff just fine on its own now (based on 2 above).

This is both tedious because of the mistake, and good news :) EDIT: it was two when I started typing, but it became four things.

Change 642570 merged by Bstorm:
[operations/puppet@production] wikireplicas: extend maintain_dbusers to multiinstance replicas

https://gerrit.wikimedia.org/r/642570

Change 644949 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: extend maintain_dbusers to multiinstance replicas

https://gerrit.wikimedia.org/r/644949

Change 644949 merged by Bstorm:
[operations/puppet@production] wikireplicas: extend maintain_dbusers to multiinstance replicas

https://gerrit.wikimedia.org/r/644949

Change 644950 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: extend maintain_dbusers to multiinstance--test 2

https://gerrit.wikimedia.org/r/644950

Change 644950 merged by Bstorm:
[operations/puppet@production] wikireplicas: extend maintain_dbusers to multiinstance--test 2

https://gerrit.wikimedia.org/r/644950

Change 644952 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: extend maintain_dbusers to multiinstance

https://gerrit.wikimedia.org/r/644952

So I'm glad I noticed that warning. A few things have come out of it:

  1. I've fix the script to be much better around the multi-instance settings.
  2. It was creating _p databases for wikis that aren't on the replicas and should not be (so I am cleaning them up).
  3. It was only running half the loop in many cases (fixed).
  4. The script can do the CREATE DB and role stuff just fine on its own now (based on 2 above).

This is both tedious, because of the mistake, and good news :) EDIT: it was two things when I started typing, but it became four.

Excellent news!
Should we consider clouddb1013:3311 and clouddb1017:3311 as done?

Bstorm updated the task description. Dec 3 2020, 3:13 PM
Bstorm added a comment. Dec 3 2020, 3:15 PM

Yep, marking them done. Also I think I worked out the kinks in the maintain-dbusers process yesterday, so I should be able to get the user accounts syncing soon.

I think that also gave me more confidence in my Puppet code that uses instance resources to dynamically generate the proxy configs, so let's go ahead and finish up with clouddb1020. I don't think network restrictions are going to allow a meaningful test there. I'll put up a patch today to apply the role to it.

Change 644606 merged by Bstorm:
[operations/puppet@production] wikireplicas: add cumin aliases that include multiinstance servers

https://gerrit.wikimedia.org/r/644606

Change 644552 merged by jenkins-bot:
[operations/cookbooks@master] wmcs: set the add_wiki cookbook to only run meta_p on some hosts

https://gerrit.wikimedia.org/r/644552

Change 643578 merged by Bstorm:
[operations/puppet@production] wikireplicas: add maintain-meta_p only to s7 and legacy replicas

https://gerrit.wikimedia.org/r/643578

Change 645114 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: let clouddb1020 join the party

https://gerrit.wikimedia.org/r/645114

Change 644952 merged by Bstorm:
[operations/puppet@production] wikireplicas: extend maintain_dbusers to multiinstance

https://gerrit.wikimedia.org/r/644952

Change 645173 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: fix the harvest-replicas functionality

https://gerrit.wikimedia.org/r/645173

Change 645173 merged by Bstorm:
[operations/puppet@production] wikireplicas: fix the harvest-replicas functionality

https://gerrit.wikimedia.org/r/645173

Bstorm added a comment. Dec 3 2020, 9:46 PM

So all the existing replicas will also now have the Toolforge user accounts. When we set up clouddb1020, we just need to run the harvest-replica bit again.

Great news, Brooke. Maybe it won't be needed for clouddb1020, as I plan to clone it from clouddb1016. I will ask you to double-check once it's done, though.

Change 645114 merged by Marostegui:
[operations/puppet@production] wikireplicas: let clouddb1020 join the party

https://gerrit.wikimedia.org/r/645114

@Bstorm clouddb1020:3315 and clouddb1020:3318 are now up.
Can you double-check that everything is there?

Thanks

@Bstorm I have seen some warnings regarding the labsdbadmin user on clouddb1016:3315 (and :3318), but they seem to appear on all the hosts, or at least on the ones I have checked:

Dec 04 11:09:49 clouddb1016 mysqld[3591]: 2020-12-04 11:09:49 4486 [Warning] Aborted connection 4486 to db: 'unconnected' user: 'labsdbadmin' host: '10.64.37.19' (Got an error reading communication packets)
Dec 04 11:12:02 clouddb1016 mysqld[3591]: 2020-12-04 11:12:02 4557 [Warning] Aborted connection 4557 to db: 'unconnected' user: 'labsdbadmin' host: '10.64.37.19' (Got an error reading communication packets)
Dec 04 13:36:22 clouddb1013 mysqld[7286]: 2020-12-04 13:36:22 115747 [Warning] Aborted connection 115747 to db: 'unconnected' user: 'labsdbadmin' host: '10.64.37.19' (Got an error reading communication packets)

The grants are only for labstore1004 and labstore1005; I have tested them from both hosts and they work fine:

root@labstore1004:~# mysql -ulabsdbadmin -p -h clouddb1016 -P3315
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8695
Server version: 10.4.15-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:labsdbadmin@clouddb1016 [(none)]>

root@labstore1005:~# mysql -ulabsdbadmin -p -h clouddb1016 -P3315
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8733
Server version: 10.4.15-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:labsdbadmin@clouddb1016 [(none)]>

This is probably not a big deal, but maybe those scripts are not properly closing their connections?

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board. Dec 4 2020, 11:19 PM
Bstorm added a comment. Dec 5 2020, 8:59 PM

This is probably not a big deal, but maybe those scripts are not properly closing their connections?

I'd consider it highly likely that the backfill script for user accounts does not properly close connections. I can look into that. We don't use it often, so the code has suffered from a lot of rot :)
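The "Aborted connection ... Got an error reading communication packets" warnings are what MariaDB logs when a client goes away without closing its session cleanly. A minimal sketch of the usual fix follows, assuming a DB-API style connection object with `cursor()`, `commit()` and `close()` methods (for example one returned by PyMySQL). `run_statements` and the `connect` callable are hypothetical names for illustration, not the actual backfill script's code. `contextlib.closing` guarantees `close()` runs even when a statement raises.

```python
from contextlib import closing
from typing import Any, Callable, Iterable, List, Tuple


# Hypothetical helper: `connect` is any callable returning a DB-API style
# connection for a (host, port) pair. This is an assumption for
# illustration, not the real maintain-dbusers/backfill API.
def run_statements(
    connect: Callable[[str, int], Any],
    instances: Iterable[Tuple[str, int]],
    statements: List[str],
) -> int:
    """Run every statement on each (host, port) instance, always closing connections."""
    executed = 0
    for host, port in instances:
        # closing() calls conn.close() on exit, even if a statement raises,
        # so the driver can end the session cleanly instead of the process
        # simply dropping the socket.
        with closing(connect(host, port)) as conn:
            with closing(conn.cursor()) as cur:
                for stmt in statements:
                    cur.execute(stmt)
                    executed += 1
            # Only reached if every statement succeeded.
            conn.commit()
    return executed
```

In practice `connect` could be a small lambda wrapping `pymysql.connect(host=..., port=..., user=..., password=...)`, with one `(host, port)` tuple per multi-instance section such as `("clouddb1016", 3315)`.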

Bstorm added a comment. Dec 7 2020, 7:21 PM

@Bstorm clouddb1020:3315 and clouddb1020:3318 are now up.
Can you double-check that everything is there?

Thanks

Looks good.

Bstorm closed this task as Resolved. Dec 7 2020, 7:21 PM
Bstorm updated the task description.