Check and fix GRANT issues of wikiuser
Closed, Resolved (Public)

Description

Similar to T296274: Clean up wikiadmin GRANTs mess, but run for wikiuser, the user that handles web requests.

Result of check:

General issues

DB unavailable
No further check was done on this db
None!


10.64.% user missing
No further checks were done on the grants of the 10.64.% user on these dbs

  • db1140.eqiad.wmnet:3311 (s1)
  • db1133.eqiad.wmnet:3306 (s1)
  • db1128.eqiad.wmnet:3306 (s1)
  • db1125.eqiad.wmnet:3306 (s4)
  • db1124.eqiad.wmnet:3306 (s4)

10.192.% user missing
No further checks were done on the grants of the 10.192.% user on these dbs

  • db1140.eqiad.wmnet:3311 (s1)
  • db1133.eqiad.wmnet:3306 (s1)
  • db1128.eqiad.wmnet:3306 (s1)
  • db1125.eqiad.wmnet:3306 (s4)
  • db1124.eqiad.wmnet:3306 (s4)

localhost user found
None!


10.64.%

appserver grant missing
None!


replication grant missing
None!


heartbeat grant missing
None!


centralauth grant missing
None!


10.192.%

appserver grant missing
None!


replication grant missing
None!


heartbeat grant missing
None!


centralauth grant missing
None!
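
The report categories above can be read as a small decision procedure. The following is only a minimal sketch of that logic, not the actual analyze code used for this task: the function name `check_instance`, the input shape (a mapping from host pattern to the kinds of grants found) and the labels in `EXPECTED` are all assumptions made for illustration.

```python
# Hypothetical sketch of the report logic; not the real analyze code.
RANGES = ("10.64.%", "10.192.%")
# Assumed labels, taken from the report headings above.
EXPECTED = ("appserver", "replication", "heartbeat", "centralauth")


def check_instance(accounts):
    """Return the list of issues for one database instance.

    `accounts` maps a wikiuser host pattern (e.g. "10.64.%") to the set
    of grant kinds that account holds on the instance.
    """
    issues = []
    if "localhost" in accounts:
        issues.append("localhost user found")
    for host_range in RANGES:
        if host_range not in accounts:
            issues.append(f"{host_range} user missing")
            continue  # as in the report: no further checks for a missing user
        for kind in EXPECTED:
            if kind not in accounts[host_range]:
                issues.append(f"{host_range}: {kind} grant missing")
    return issues


if __name__ == "__main__":
    # Made-up example input: one range present, one stray localhost user.
    example = {"10.64.%": set(EXPECTED), "localhost": set()}
    for issue in check_instance(example):
        print(issue)
```

The sketch treats the centralauth grant as expected everywhere only because the report lists it as a category; the discussion further down narrows that expectation to s7.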


Event Timeline

Ladsgroup triaged this task as Medium priority. Nov 26 2021, 2:09 PM
Ladsgroup moved this task from Triage to Ready on the DBA board.
Ladsgroup moved this task from Ready to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2021-12-07T13:52:11Z] <Amir1> removing wikiuser@localhost on s6 (T296537)

I removed the localhost wikiuser from s6. I'll let it stay like this for a bit before moving on to the rest of the infra.

Mentioned in SAL (#wikimedia-operations) [2021-12-07T14:15:17Z] <Amir1> fixing heartbeat grants for wikiuser across the cluster (T296537)

Fixed the heartbeat mess and then ran the analyze code again. Much cleaner; will clean up more tomorrow.

Mentioned in SAL (#wikimedia-operations) [2021-12-08T15:04:06Z] <Amir1> removing rest of wikiuser@localhost (T296537)

Removed all of the reported localhost users and ran the script again. It seems that several codfw hosts that were shut down for de-racking are now showing up in the error list. Fixing those now.

Regarding the "a" grant:

ladsgroup@cumin1001:~$ grep a s3_dbs | grep -v wik
Database
boards
boardvote
boardvote2005
boardvote2006
boardvote2007_test
boardvotetest
defoundation
heartbeat
information_schema
jamestemp
katesdb
oai
performance_schema
steward
ladsgroup@cumin1001:~$ grep a s7_dbs | grep -v wik
Database
centralauth
heartbeat
information_schema
performance_schema

The s3 list was made from -hdb2149 and the s7 list was made from -hdb2118.

GRANT SELECT, INSERT, UPDATE, DELETE ON `centralauth`.* TO `wikiuser`@`10.64.%`

^ This should only exist on s7. If it exists somewhere else, it should be dropped.

GRANT SELECT, INSERT, UPDATE, DELETE ON `%a%`.* TO `wikiuser`@`10.64.%`

^ This should be deleted: first on one eqiad host, left like that for 1 week, and then everywhere else.

GRANT SELECT, INSERT, UPDATE, DELETE ON `%a%`.* TO `wikiuser`@`10.192.%`

^ Same as above.

GRANT SELECT, INSERT, UPDATE, DELETE ON `centralauth`.* TO `wikiuser`@`10.64.%`

^ This should only exist on s7. If it exists somewhere else, it should be dropped.

GRANT SELECT, INSERT, UPDATE, DELETE ON `%a%`.* TO `wikiuser`@`10.64.%`

^ This should be deleted: first on one eqiad host, left like that for 1 week, and then everywhere else.

It's a bit weirder than that. The %a% grant is actually needed on s7 in eqiad (not codfw) because those hosts lack the centralauth grant. I will of course add the centralauth grant before removing this one. According to my check, centralauth is the only db that doesn't follow the "wik" pattern and does contain "a", so I think it's fine to go ahead for all of s7 in one go.

s3 is a different story altogether. There we definitely need to remove it on one host first and wait a while.

Thanks. I will run the analyze code again soon after fixing s7 grants and fix whatever is in db1102.

To recap:

  • wikiuser and wikiadmin are inconsistent in the source of truth (puppet) on centralauth
  • We have an interesting extra grant on any database with the letter “a” in its name. This (I think) was added to let wikiuser access s7 for centralauth, as it’s only in s7 and s3. I checked, and no other database in s7 becomes accessible through this grant (except performance_schema and information_schema, but wikiuser should not have that access anyway).
    • This grant also exists on most hosts of s3, but only for half of the ranges (it is there for 10.64.% but missing for 10.192.%), and this is where things get interesting:
      • There are a lot of random databases that wikiuser had access to just because of this grant, including “jamestemp” and “katesdb”: T296537#7556536
      • There are a lot of dbs that don’t have the letter “a” in their name, which means they are only accessible to root: T297297
      • There are also lots of databases that have “%wik%” in their name but are not in the s3 dblist, so mediawiki wouldn’t route any request to them, yet wikiuser has access to them: T297297#7556651
      • Except for the %wik% ones, wikiadmin doesn’t have any grants like this, meaning any maintenance or script run on these databases would just fail.
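
To make the reach of the `%a%` grant concrete, here is a minimal sketch that applies MySQL-style wildcard matching (`%` for any run of characters, `_` for a single character) to the non-wiki database names from the grep listings above. The helper names `like_to_regex` and `matches` are made up for this illustration, and the matching is case-sensitive, which is an assumption about the server configuration rather than something stated in this task.

```python
import re

# Non-wiki databases from the s3 and s7 listings earlier in this task.
S3_NON_WIKI = [
    "boards", "boardvote", "boardvote2005", "boardvote2006",
    "boardvote2007_test", "boardvotetest", "defoundation", "heartbeat",
    "information_schema", "jamestemp", "katesdb", "oai",
    "performance_schema", "steward",
]
S7_NON_WIKI = ["centralauth", "heartbeat", "information_schema", "performance_schema"]


def like_to_regex(pattern):
    """Translate a MySQL-style wildcard pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, name):
    """True if the whole database name matches the wildcard pattern."""
    return like_to_regex(pattern).fullmatch(name) is not None


if __name__ == "__main__":
    for section, names in (("s3", S3_NON_WIKI), ("s7", S7_NON_WIKI)):
        only_via_a = [n for n in names
                      if matches("%a%", n) and not matches("%wik%", n)]
        print(section, "reachable only through %a%:", only_via_a)
```

Every name in both lists contains an “a” and none contains “wik”, so in this sketch all of them are reachable only through the `%a%` pattern, which is the accidental access described in the bullets above.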

So let's remove that %a% grant from everywhere, and keep a centralauth grant only on s7 hosts.
I wouldn't touch the %wiki% grants as those are the normal ones we have across the board.
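
The plan agreed on in this thread can be sketched as a function that, for one wikiuser account on one host, lists the statements needed to reach the intended state. This is an illustration only, not the commands that were actually run: `plan_statements` and its arguments are hypothetical, and the set of database patterns currently granted is assumed to have been collected beforehand (for example from SHOW GRANTS output).

```python
PRIVS = "SELECT, INSERT, UPDATE, DELETE"


def plan_statements(section, db_patterns, host_range):
    """Return SQL statements moving one wikiuser account to the intended state.

    section     -- section name such as "s3" or "s7"
    db_patterns -- set of database patterns the account currently has grants on
    host_range  -- host part of the account, e.g. "10.64.%"
    """
    account = f"`wikiuser`@`{host_range}`"
    statements = []
    has_centralauth = "centralauth" in db_patterns
    # Add the explicit centralauth grant first, so s7 never loses access
    # when the %a% grant that was covering for it is removed.
    if section == "s7" and not has_centralauth:
        statements.append(f"GRANT {PRIVS} ON `centralauth`.* TO {account};")
    # The centralauth grant should only exist on s7.
    if section != "s7" and has_centralauth:
        statements.append(f"REVOKE {PRIVS} ON `centralauth`.* FROM {account};")
    # The %a% grant goes away everywhere; %wik% style grants are left alone.
    if "%a%" in db_patterns:
        statements.append(f"REVOKE {PRIVS} ON `%a%`.* FROM {account};")
    return statements


if __name__ == "__main__":
    # Made-up example: an s7 host that only has the %a% grant.
    for stmt in plan_statements("s7", {"%a%"}, "10.64.%"):
        print(stmt)
```

The ordering inside the function mirrors the point made earlier in the thread: on s7 the centralauth grant is added before the `%a%` grant is dropped. The staged rollout on s3 (one host first, then the rest) is a scheduling decision and is not modelled here.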

Mentioned in SAL (#wikimedia-operations) [2021-12-13T08:45:42Z] <Amir1> fixing centralauth grants of wikiuser on all of s7 T296537

Mentioned in SAL (#wikimedia-operations) [2021-12-13T08:48:30Z] <Amir1> removing grant of '%a%' on db1123 (s3) T296537

I removed it on db1123 in s3 and will let it stay like that for a bit before moving on to removing it from the rest of s3.

Change 746826 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] [WIP] mariadb: Make centralauth GRANTs conditional to s7

https://gerrit.wikimedia.org/r/746826

Change 747548 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[labs/private@master] Set dummy wikiuser and wikiadmin passwords

https://gerrit.wikimedia.org/r/747548

Change 747548 merged by Ladsgroup:

[labs/private@master] Set dummy wikiuser and wikiadmin passwords

https://gerrit.wikimedia.org/r/747548

Mentioned in SAL (#wikimedia-operations) [2021-12-15T17:33:30Z] <Amir1> removing grant on letter a on all of s3 hosts (T296537)

Clean now \o/ I just need to get the patch merged to call this done.

Change 746826 merged by Ladsgroup:

[operations/puppet@production] mariadb: Make centralauth GRANTs conditional to s7

https://gerrit.wikimedia.org/r/746826

Ladsgroup moved this task from In progress to Done on the DBA board.

Well, wikiuser decided to go out with a bang and cause puppet failures. Fixed now and puppetboard is clean ^^