
Change the MySQL passwords
Closed, Resolved · Public

Description

The MySQL root password has not been changed since before I got shell access in 2003. The replication password has been the same since replication was first set up in 2004.

According to the SAL, I changed the wikiuser and wikiadmin passwords in 2007. They probably haven't been changed since then. The procedure I used to change the wikiuser password was (a rough SQL sketch of these steps follows below the list):

  • Create a new user "wikiuser2" with the new password
  • Deploy the MW configuration change to use wikiuser2
  • Verify that no wikiuser connections remain
  • Change the password for wikiuser
  • Deploy the MW configuration change to use wikiuser with the new password
Accounts covered:

  • wikiadmin T232106
  • wikiuser
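
A minimal SQL sketch of those steps, assuming MariaDB; the host pattern, grant list, and password below are illustrative placeholders, not the production values:

-- Hypothetical host pattern and grants; production values differ per section.
CREATE USER 'wikiuser2'@'10.%' IDENTIFIED BY '<new-password>';
GRANT SELECT, INSERT, UPDATE, DELETE ON `enwiki`.* TO 'wikiuser2'@'10.%';
-- Deploy the MW config change, then verify no old connections remain:
SELECT id, user, host FROM information_schema.processlist WHERE user = 'wikiuser';
-- Once the list is empty, rotate the old account:
SET PASSWORD FOR 'wikiuser'@'10.%' = PASSWORD('<new-password>');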

Event Timeline

tstarling triaged this task as Medium priority. Aug 10 2018, 1:37 AM
tstarling created this task.
tstarling created this object with visibility "Custom Policy".

The wikiadmin password is being changed as part of {T232106}.

jcrespo added a subscriber: ArielGlenn.

@Marostegui Given the update on T234014, I would like to 1) verify no leak is present and 2) schedule the rollover ASAP, with your help as well as someone from Service Ops and, if possible, someone with MediaWiki config experience (Security or CPT). We should also document the procedure properly and/or automate it (based on Tim's guidance above) to make future changes easier and more regular. Dumps support (@ArielGlenn) would be helpful, too.

Given the long-running queries on dumps, cron, and the jobqueue, among others, this could take more than a few hours.

[Note that to fully close this ticket, we should also change the production root and replication passwords.]

For minimal disruption on my side, we'd want the xml/sql dumps to be either in the 7z compression phase or idle. That will happen around the 17th, I think. If we have to go earlier, then it can be any time between now and then, as all times are equally likely to be disruptive. Just give me a heads-up so that I can keep an eye out.

We'll probably take a hit on the wikidata entity dumps no matter what, since they are on a weekly schedule that won't coincide with the xml/sql ones, but the failed bits will rerun.

+1 to change the password, but I would suggest we resume this conversation in January, after the break.

> +1 to change the password, but I would suggest we resume this conversation in January, after the break.

Two Januaries have passed since then :D

We talked about this a bit yesterday, and I think with the automation of grant reporting it's now easier to roll out new users (I estimate it to be a couple of days of work at most now). That being said, wikiuser currently has 26 distinct grants across the cluster, of which 11 are for wikitech (and wikiadmin has 10 more). Maybe we should roll this out once T292707 is done, so we can simplify our grants? Maybe we could push for that?
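
For reference, a minimal sketch of how one might list those grants on a single host, assuming MariaDB and direct access to information_schema (aggregating across the cluster would go through the reporting automation):

-- Schema-level privileges held by any wikiuser account on this host:
SELECT grantee, table_schema, privilege_type
FROM information_schema.schema_privileges
WHERE grantee LIKE "'wikiuser'%";
-- Or, for one concrete account (the host pattern here is illustrative):
SHOW GRANTS FOR 'wikiuser'@'10.%';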

Ladsgroup moved this task from Ready to In progress on the DBA board.

So I duplicated wikiuser and its grants everywhere the grant reporting system could reach. The new user is wikiuser2022 (so hopefully we will change it next year). I will run the report again tomorrow to catch any dbs that were missed (in case they were shut down today, though I can't find any), and then deploy MW with the new user.

Next week we should start dropping the wikiuser user and its grants.

I'm seeing a user without grants on some hosts, called wikiuser@localhost.localdomain. I simply won't move this forward to wikiuser2022.

> I'm seeing a user without grants on some hosts, called wikiuser@localhost.localdomain. I simply won't move this forward to wikiuser2022.

This can be deleted

I just deployed wikiuser2022 in appservers everywhere. No errors so far.

Created this patch to make it easier to change the wikiuser username in puppet: https://gerrit.wikimedia.org/r/c/operations/puppet/+/770890

Going to remove all wikiuser users and grants everywhere in an hour or so. I'll keep a way to roll it back if needed.
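
The removal itself is a single statement per account; a sketch assuming MariaDB, with an illustrative host pattern:

-- DROP USER removes the account together with all of its grants.
DROP USER IF EXISTS 'wikiuser'@'10.%';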

Changing the replication user implies running a CHANGE MASTER TO on all the hosts.
Given how limited the replication user is, I wouldn't include it here: it is a pretty risky operation.
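
For context, the rotation would mean running something like the following on every replica (a sketch assuming MariaDB; the user name and password are placeholders):

STOP SLAVE;
-- Point replication at the new credentials; the master coordinates are kept.
CHANGE MASTER TO MASTER_USER = 'repl_new', MASTER_PASSWORD = '<new-password>';
START SLAVE;
-- Confirm replication resumed cleanly:
SHOW SLAVE STATUS\G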

I just removed wikiuser from every host we have. This was automated with the code I've written for grant handling. The previous grants are stored in my home directory on cumin under omg_with_old_wikiuser.json; from that, one could easily write a script to bring back the old wikiuser and its grants if needed.

But the error logs are clean, so I think it's safe to call it done. I agree with Manuel that the replication user is a different beast with different needs and MO. I'll create a follow-up and call this done.

I'll leave it to the security team to decide whether to make this public or not.

Ladsgroup updated the task description. (Show Details)
Ladsgroup moved this task from In progress to Done on the DBA board.

Created {T303920} for repl user.

I need to document the process of changing the password so we can do it for wikiadmin and, next year, for wikiuser. For that, I need to get the omg code published somewhere. It's on my todo list.

I double-checked with git blame: the root db password was NOT changed at T104900#3569900, only the clouddb password was separated (not touching the production one!).

Filed T303930: Change mysql root password. Will get to it later.

Working on any security ticket feels like fighting a hydra: closing one causes two new ones to show up.

The replication user has also been rotated. Now only one user with a rather old password is left: ops. Let me take a look at that first; it should probably be dropped.

ops is the user under which the query-killer events & logs run. If you drop it, events will fail and dbs will be overloaded, as usually happens when the events for a db haven't been loaded properly.

Well, it's not in the puppet grants file, and it's missing from roughly a third of the hosts. It's also sometimes on localhost and sometimes on '%', which at the very least needs to be cleaned up.
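
A quick way to see which variants of the account exist on a given host (a sketch, assuming direct access to the mysql.user table):

SELECT user, host FROM mysql.user WHERE user = 'ops';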

If I remember correctly, the ops user was migrated to root (I haven't checked, but I recall something like that). The database is still ops, but I believe the definer was migrated to root. Otherwise, if the user is missing on so many hosts, that's definitely something we need to look at.

@Ladsgroup can you give me a list of hosts without that user?

This is one set I found:

db1163 (s1)
db1179 (x1)
db1216:3320 (x1)
db1220 (x1)
db1225:3312 (s2)
db1225:3320 (x1)
db1237 (x1)
db1240:3311 (s1)
dbstore1009:3320 (x1)
es1026 (es2)
es1027 (es1)
es1028 (es3)
es1029 (es1)
es1030 (es2)
es1031 (es3)
es1032 (es1)
es1033 (es2)
es1034 (es3)
es1035 (es7)
es1036 (es6)
es1037 (es6)
es1038 (es6)
es1039 (es7)
es1040 (es7)
es2026 (es2)
es2027 (es3)
es2028 (es1)
es2029 (es3)
es2030 (es1)
es2031 (es2)
es2032 (es1)
es2033 (es2)
es2034 (es3)
es2035 (es6)
es2036 (es6)
es2037 (es6)
es2038 (es7)
es2039 (es7)
es2040 (es7)
pc1011 (pc1)
pc1012 (pc2)
pc1013 (pc3)
pc1014 (pc1)
pc1015 (pc4)
pc1016 (pc4)
pc2011 (pc1)
pc2012 (pc2)
pc2013 (pc3)
pc2014 (pc1)
pc2015 (pc4)
pc2016 (pc4)

Looking at the events file we run every time we do a switchover, it is indeed from root:

drop event if exists wmf_slave_wikiuser_sleep;;

create definer='root'@'localhost' event wmf_slave_wikiuser_sleep
on schedule every 30 second starts date(now()) + interval 5 second
do begin
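
One can also confirm the definer live on a host; a sketch using information_schema (the definer column shows which account each event runs as):

SELECT event_schema, event_name, definer FROM information_schema.events;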

Yes, I can see events being killed on hosts which don't have the ops user.

cumin2024@db1163.eqiad.wmnet[ops]> select stamp from event_log order by stamp desc limit 1;
+---------------------+
| stamp               |
+---------------------+
| 2024-06-04 04:54:33 |
+---------------------+
1 row in set (0.001 sec)

cumin2024@db1163.eqiad.wmnet[ops]> select user from mysql.user;
+---------------+
| User          |
+---------------+
| wikiadmin2023 |
| wikiuser2023  |
| repl2024      |
| cumin2024     |
| repl2024      |
| cumin2024     |
| orchestrator  |
| mariadb.sys   |
| nagios        |
| prometheus    |
| root          |
+---------------+
11 rows in set (0.002 sec)

Then I mistook the ops user for the ops db, sorry.

Yes, the ops database is where the events are installed and that is not being touched.

Yes, I think we can. Let's do it in a controlled way just in case: drop it on all of codfw, check the event_log in a few hours to make sure events are still being killed, and if so, drop it on eqiad.

es2022 (es4), es2021 (es4), es2020 (es4), es1022 (es4), es1021 (es4), es1020 (es4), db2215 (x1), db2201:3320 (x1), db2197:3320 (x1), db2196 (x1), db2191 (x1), db2131 (x1), db2115 (x1), db2097:3320 (x1), db2144 (x2), db2143 (x2), db2142 (x2), es2025 (es5), es2024 (es5), es2023 (es5), es1025 (es5), es1024 (es5), es1023 (es5), dbstore1008:3311 (s1), db2216 (s1), db2212 (s1), db2203 (s1), db2188 (s1), db2186:3311 (s1), db2176 (s1), db2174 (s1), db2173 (s1), db2170 (s1), db2153 (s1), db2146 (s1), db2145 (s1), db2141:3311 (s1), db2130 (s1), db2116 (s1), dbstore1007:3312 (s2), db2207 (s2), db2204 (s2), db2197:3312 (s2), db2189 (s2), db2187:3312 (s2), db2175 (s2), db2148 (s2), db2138 (s2), db2126 (s2), db2125 (s2), db2097:3312 (s2), dbstore1007:3313 (s3), db2209 (s3), db2205 (s3), db2194 (s3), db2190 (s3), db2186:3313 (s3), db2177 (s3), db2156 (s3), db2149 (s3), db2139:3313 (s3), db2127 (s3), dbstore1007:3314 (s4), db2219 (s4), db2210 (s4), db2206 (s4), db2199:3314 (s4), db2187:3314 (s4), db2179 (s4), db2172 (s4), db2155 (s4), db2147 (s4), db2140 (s4), db2139:3314 (s4), db2137 (s4), db2136 (s4), db2099:3314 (s4), dbstore1008:3315 (s5), db2213 (s5), db2211 (s5), db2192 (s5), db2186:3315 (s5), db2178 (s5), db2171 (s5), db2157 (s5), db2128 (s5), db2123 (s5), dbstore1009:3316 (s6), db2217 (s6), db2214 (s6), db2197:3316 (s6), db2193 (s6), db2187:3316 (s6), db2180 (s6), db2169 (s6), db2158 (s6), db2151 (s6), db2129 (s6), db2124 (s6), db2114 (s6), db2097:3316 (s6), dbstore1008:3317 (s7), db2220 (s7), db2218 (s7), db2208 (s7), db2187:3317 (s7), db2182 (s7), db2168 (s7), db2159 (s7), db2150 (s7), db2122 (s7), db2121 (s7), dbstore1009:3318 (s8), db2200:3318 (s8), db2198:3318 (s8), db2195 (s8), db2186:3318 (s8), db2181 (s8), db2167 (s8), db2166 (s8), db2165 (s8), db2164 (s8), db2163 (s8), db2162 (s8), db2161 (s8), db2154 (s8), db2152 (s8), db2098:3318 (s8)

Dropping it on these hosts ^
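
The drop itself amounts to the following per host (a sketch assuming MariaDB; both host variants are included since, as noted above, they vary):

DROP USER IF EXISTS 'ops'@'localhost';
DROP USER IF EXISTS 'ops'@'%';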

It seems to be working:

cumin2024@db2166.codfw.wmnet[ops]> select stamp from event_log order by stamp desc limit 1;
+---------------------+
| stamp               |
+---------------------+
| 2024-06-10 08:00:35 |
+---------------------+
1 row in set (0.032 sec)

cumin2024@db2166.codfw.wmnet[ops]> select user from mysql.user;
+---------------+
| User          |
+---------------+
| wikiadmin2023 |
| wikiuser2023  |
| repl2024      |
| cumin2024     |
| repl2024      |
| cumin2024     |
| orchestrator  |
| mariadb.sys   |
| nagios        |
| prometheus    |
| root          |
+---------------+
11 rows in set (0.033 sec)

Dropping it everywhere now.

Dropped everywhere. We can now make these tickets public if @Marostegui thinks it's fine: {T364985}, T303930: Change mysql root password, and this ticket. cc @sbassett (T364985#9813011)

Any concerns, @Marostegui? If not, happy to make these tasks public now.

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".
sbassett added a project: SecTeam-Processed.
sbassett removed subscribers: Kormat, chasemp.