Marostegui (Manuel Aróstegui)
Staff Database Administrator

User Details

User Since
Sep 1 2016, 6:48 AM (280 w, 3 d)
Availability
Available
IRC Nick
marostegui
LDAP User
Marostegui
MediaWiki User
MArostegui (WMF) [ Global Accounts ]

TZ: UTC +1/+2

Recent Activity

Fri, Jan 14

Marostegui added a comment to T299046: Upgrade parsercache infra to Bullseye.

All codfw is now on Bullseye.

Fri, Jan 14, 1:32 PM · DBA
Marostegui updated the task description for T299046: Upgrade parsercache infra to Bullseye.
Fri, Jan 14, 1:32 PM · DBA
Marostegui updated the task description for T299046: Upgrade parsercache infra to Bullseye.
Fri, Jan 14, 9:19 AM · DBA
Marostegui updated the task description for T299046: Upgrade parsercache infra to Bullseye.
Fri, Jan 14, 8:22 AM · DBA
Marostegui changed the status of T299046: Upgrade parsercache infra to Bullseye, a subtask of T298585: Upgrade WMF database-and-backup-related hosts to bullseye, from Stalled to Open.
Fri, Jan 14, 6:59 AM · DBA
Marostegui changed the status of T299046: Upgrade parsercache infra to Bullseye from Stalled to Open.

I haven't seen anything relevant performance-wise on pc1011 so I think it is ok to go ahead and migrate our parsercache infra to Bullseye.

Fri, Jan 14, 6:59 AM · DBA
Marostegui added a comment to T263127: Remove groups from db configs.

logpager removed from s3 eqiad

Fri, Jan 14, 6:36 AM · Platform Engineering Code Jam, Performance-Team (Radar), Platform Engineering Roadmap Decision Making, DBA
Marostegui updated the task description for T298586: Upgrade all dbproxy hosts to Bullseye.
Fri, Jan 14, 6:25 AM · Patch-For-Review, DBA
Marostegui added a comment to T298586: Upgrade all dbproxy hosts to Bullseye.

m5 proxy failed over to dbproxy1021. Normally m5 services take quite a while to move all the connections thru the new proxy, but that's not a problem as I won't reimage the "old" proxy today.

Fri, Jan 14, 6:25 AM · Patch-For-Review, DBA
Marostegui added a comment to T298485: MW scripts should reload the database config.

I haven't checked the code fully, but it is my general understanding that if a connection fails, MW will try to reconnect, and it can also fall back to one of the other replicas. If one of the replicas refuses new connections, or is otherwise too lagged or unavailable, I would expect MW to pick another one without needing to load new configs (and to remember that for a short time afterwards via php-apcu).

Fri, Jan 14, 6:24 AM · MediaWiki-Maintenance-system, Performance-Team, User-Ladsgroup, DBA
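
The fallback behaviour described in the comment above can be illustrated with a short sketch. This is not MediaWiki code: the class name, the lag threshold, the TTL and the host names are all hypothetical, and an in-process dictionary stands in for the php-apcu cache mentioned in the comment.

```python
import random
import time


class ReplicaPicker:
    """Illustrative only: weighted replica choice with a short-lived 'down' memory."""

    def __init__(self, weights, max_lag=10.0, down_ttl=30.0, clock=time.monotonic):
        self.weights = dict(weights)   # host -> load weight (hypothetical values)
        self.max_lag = max_lag         # seconds of replication lag tolerated (assumption)
        self.down_ttl = down_ttl       # how long a failed host is skipped (assumption)
        self.clock = clock
        self._down_until = {}          # stand-in for a shared cache such as php-apcu

    def mark_down(self, host):
        """Remember that a host refused a connection, for down_ttl seconds."""
        self._down_until[host] = self.clock() + self.down_ttl

    def pick(self, lag_by_host, rng=random):
        """Return a usable replica chosen by weight, or None if none qualifies."""
        now = self.clock()
        candidates = [
            host
            for host, weight in self.weights.items()
            if weight > 0
            and self._down_until.get(host, 0.0) <= now
            and lag_by_host.get(host, float("inf")) <= self.max_lag
        ]
        if not candidates:
            return None
        weights = [self.weights[host] for host in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]
```

With two equally weighted replicas, marking one as down leaves only the other eligible until the TTL expires, with no configuration reload involved.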

Thu, Jan 13

Marostegui added a comment to T263127: Remove groups from db configs.

contributions removed from s3 eqiad

Thu, Jan 13, 12:44 PM · Platform Engineering Code Jam, Performance-Team (Radar), Platform Engineering Roadmap Decision Making, DBA
Marostegui updated the task description for T263127: Remove groups from db configs.
Thu, Jan 13, 12:42 PM · Platform Engineering Code Jam, Performance-Team (Radar), Platform Engineering Roadmap Decision Making, DBA
Marostegui added a comment to T263127: Remove groups from db configs.

I have removed all special groups from s3 codfw. Slowly starting in eqiad too now.

Thu, Jan 13, 12:42 PM · Platform Engineering Code Jam, Performance-Team (Radar), Platform Engineering Roadmap Decision Making, DBA
Marostegui updated the task description for T295965: Test MariaDB 10.4 with Bullseye.
Thu, Jan 13, 11:15 AM · Patch-For-Review, DBA
Marostegui updated the task description for T299123: es1022 troubles with PXE.
Thu, Jan 13, 11:10 AM · SRE, DBA, ops-eqiad
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

es2022 worked fine, so it could be restricted to either es1022 or es10XX hosts. We'll see...

Thu, Jan 13, 10:57 AM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

I am trying a reimage on es2022 to see if this was a one time thing or could be something related to all the new es hosts.

Thu, Jan 13, 10:42 AM · Patch-For-Review, DBA
Marostegui updated the task description for T295965: Test MariaDB 10.4 with Bullseye.
Thu, Jan 13, 10:27 AM · Patch-For-Review, DBA
Marostegui triaged T299123: es1022 troubles with PXE as Medium priority.
Thu, Jan 13, 10:22 AM · SRE, DBA, ops-eqiad
Marostegui created T299123: es1022 troubles with PXE.
Thu, Jan 13, 10:22 AM · SRE, DBA, ops-eqiad
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

The reimage was actually fine; what failed was changing the BIOS parameters, but I have forced that manually.
This host probably needs a firmware and BIOS upgrade to start with, because the reimage failed with:

Running IPMI command: ipmitool -I lanplus -H es1022.mgmt.eqiad.wmnet -U root -E chassis bootparam get 5
Exception raised while executing cookbook sre.hosts.reimage:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 234, in run
    raw_ret = runner.run()
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/reimage.py", line 487, in run
    self.ipmi.check_bootparams()
  File "/usr/lib/python3/dist-packages/spicerack/ipmi.py", line 125, in check_bootparams
    raise IpmiCheckError(f"Expected BIOS boot params in {IPMI_SAFE_BOOT_PARAMS} got: {param}")
spicerack.ipmi.IpmiCheckError: Expected BIOS boot params in ('0000000000', '8000020000') got: 0000020000
Thu, Jan 13, 10:14 AM · Patch-For-Review, DBA
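
The `IpmiCheckError` in the traceback above is raised when the boot parameters returned by IPMI are not in an allowed set. A minimal sketch of that kind of check follows, reconstructed only from the error message; it is not the spicerack source, and the function signature here is an assumption.

```python
# Hypothetical re-creation of the membership check; not the spicerack implementation.
# The two allowed values are the ones quoted in the error message above.
IPMI_SAFE_BOOT_PARAMS = ("0000000000", "8000020000")


class IpmiCheckError(Exception):
    """Raised when the BIOS boot parameters are not in the expected set."""


def check_bootparams(param: str) -> None:
    """Raise IpmiCheckError unless param is one of the allowed boot parameter values."""
    if param not in IPMI_SAFE_BOOT_PARAMS:
        raise IpmiCheckError(
            f"Expected BIOS boot params in {IPMI_SAFE_BOOT_PARAMS} got: {param}"
        )
```

Passing `0000020000`, the value es1022 reported, fails the check, while either value in the tuple passes.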
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

I was able to get into the Debian installer by forcing PXE from the iDRAC manually. I am not sure whether it is trying to PXE on the wrong interface or something else (the screen goes blank after PXE gets selected, and then it times out and boots from disk).
Using F12 to make sure it PXE boots seems to have worked. I will create a task for DC-Ops about this.

Thu, Jan 13, 9:55 AM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

I am trying to reinstall es1022, it does PXE boot but it is not getting into the Debian Installer, so I am investigating.

Thu, Jan 13, 9:29 AM · Patch-For-Review, DBA
Marostegui updated the task description for T295965: Test MariaDB 10.4 with Bullseye.
Thu, Jan 13, 8:57 AM · Patch-For-Review, DBA
Marostegui updated the task description for T263127: Remove groups from db configs.
Thu, Jan 13, 8:40 AM · Platform Engineering Code Jam, Performance-Team (Radar), Platform Engineering Roadmap Decision Making, DBA
Marostegui added a comment to T263127: Remove groups from db configs.

recentchanges removed from s7 eqiad. There are no more groups on s7. Going to see if I need to re-adjust weights.

Thu, Jan 13, 8:40 AM · Platform Engineering Code Jam, Performance-Team (Radar), Platform Engineering Roadmap Decision Making, DBA
Marostegui updated the task description for T295965: Test MariaDB 10.4 with Bullseye.
Thu, Jan 13, 8:26 AM · Patch-For-Review, DBA
Marostegui updated the task description for T287244: Considering switching innodb_checksum_algorithm=full_crc32.
Thu, Jan 13, 8:15 AM · Patch-For-Review, DBA
Marostegui added a comment to T287244: Considering switching innodb_checksum_algorithm=full_crc32.

Deployed this on db_inventory, which is db2093 and db1115. Unfortunately, this cannot be changed on 10.1, but it doesn't really matter as db1115 will be reimaged to 10.4 in less than a month for T297605: Shutdown Tendril and dbtree

Thu, Jan 13, 8:15 AM · Patch-For-Review, DBA
Marostegui updated the task description for T287244: Considering switching innodb_checksum_algorithm=full_crc32.
Thu, Jan 13, 8:09 AM · Patch-For-Review, DBA
Marostegui updated the task description for T287244: Considering switching innodb_checksum_algorithm=full_crc32.
Thu, Jan 13, 8:06 AM · Patch-For-Review, DBA
Marostegui added a comment to T287244: Considering switching innodb_checksum_algorithm=full_crc32.

Deployed on eqiad sanitarium hosts (db1154, db1155)

Thu, Jan 13, 8:06 AM · Patch-For-Review, DBA
Marostegui added a comment to T263127: Remove groups from db configs.

recentchangeslinked removed from s7 eqiad

Thu, Jan 13, 7:50 AM · Platform Engineering Code Jam, Performance-Team (Radar), Platform Engineering Roadmap Decision Making, DBA
Marostegui updated the task description for T298586: Upgrade all dbproxy hosts to Bullseye.
Thu, Jan 13, 7:48 AM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

db1169 is now fully pooled back in s1.

Thu, Jan 13, 7:47 AM · Patch-For-Review, DBA
Marostegui updated the task description for T298586: Upgrade all dbproxy hosts to Bullseye.
Thu, Jan 13, 6:39 AM · Patch-For-Review, DBA
Marostegui added a comment to T298586: Upgrade all dbproxy hosts to Bullseye.

m3 proxy has been failed over from dbproxy1016 to dbproxy1020

Thu, Jan 13, 6:38 AM · Patch-For-Review, DBA
Marostegui updated the task description for T285149: Schema change for dropping rev_page_id index.
Thu, Jan 13, 6:29 AM · MW-1.38-notes (1.38.0-wmf.9; 2021-11-16), Dumps-Generation, Blocked-on-schema-change, DBA
Marostegui added a comment to T285149: Schema change for dropping rev_page_id index.

s6 progress

  • dbstore1005
  • db2141
  • db2129
  • db2124
  • db2117
  • db2114
  • db2095
  • db2089
  • db2087
  • db2076
  • db1180
  • db1173
  • db1168
  • db1165
  • db1155
  • db1140
  • db1131
  • db1113
  • db1098
  • db1096
  • clouddb1021
  • clouddb1019
  • clouddb1015
Thu, Jan 13, 6:28 AM · MW-1.38-notes (1.38.0-wmf.9; 2021-11-16), Dumps-Generation, Blocked-on-schema-change, DBA
Marostegui added a comment to T285149: Schema change for dropping rev_page_id index.

I have manually removed the index from db1096:3316. Let's wait till Monday to see how this host performs and, if there are no regressions, let's deploy it everywhere.

Thu, Jan 13, 6:27 AM · MW-1.38-notes (1.38.0-wmf.9; 2021-11-16), Dumps-Generation, Blocked-on-schema-change, DBA
Marostegui updated the task description for T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.
Thu, Jan 13, 6:23 AM · Blocked-on-schema-change, DBA
Marostegui closed T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites as Resolved.

All done

Thu, Jan 13, 6:09 AM · Blocked-on-schema-change, DBA
Marostegui updated the task description for T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.
Thu, Jan 13, 6:09 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

Views recreated on s8 clouddb* hosts.

Thu, Jan 13, 6:09 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

s8 eqiad

  • dbstore1005
  • db1178
  • db1177
  • db1172
  • db1171
  • db1167
  • db1154
  • db1126
  • db1116
  • db1114
  • db1111
  • db1109
  • db1104
  • db1101
  • db1099
  • clouddb1021
  • clouddb1020
  • clouddb1016
Thu, Jan 13, 6:08 AM · Blocked-on-schema-change, DBA
Marostegui updated the task description for T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.
Thu, Jan 13, 6:02 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

s8 deployment needs to be done manually, as right now there are dumps threads connected to all the s8 eqiad hosts, so the script will not be able to advance:

db1178.eqiad.wmnet:3306
2367520963	wikiadmin	10.64.0.156:50602	wikidatawiki	Sleep	4		NULL	0.000
2367790498	wikiadmin	10.64.0.156:50808	wikidatawiki	Sleep	52		NULL	0.000
2367839053	wikiadmin	10.64.0.156:50868	wikidatawiki	Sleep	21		NULL	0.000
2367886157	wikiadmin	10.64.0.156:50944	wikidatawiki	Sleep	11		NULL	0.000
2367898478	wikiadmin	10.64.0.156:51004	wikidatawiki	Sleep	5		NULL	0.000
2367946148	wikiadmin	10.64.0.156:51112	wikidatawiki	Sleep	9		NULL	0.000
2367964115	wikiadmin	10.64.0.156:51136	wikidatawiki	Sleep	11		NULL	0.000
2367985433	wikiadmin	10.64.0.156:51172	wikidatawiki	Sleep	31		NULL	0.000
2368262205	wikiadmin	10.64.0.156:51256	wikidatawiki	Sleep	0		NULL	0.000
db1177.eqiad.wmnet:3306
3045009910	wikiadmin	10.64.0.156:32788	wikidatawiki	Sleep	59		NULL	0.000
3045023541	wikiadmin	10.64.0.156:32822	wikidatawiki	Sleep	57		NULL	0.000
3045053749	wikiadmin	10.64.0.156:32874	wikidatawiki	Sleep	4		NULL	0.000
3045064041	wikiadmin	10.64.0.156:32914	wikidatawiki	Sleep	8		NULL	0.000
3045089987	wikiadmin	10.64.0.156:32950	wikidatawiki	Sleep	1		NULL	0.000
3045095577	wikiadmin	10.64.0.156:32974	wikidatawiki	Sleep	457		NULL	0.000
3045109253	wikiadmin	10.64.0.156:33002	wikidatawiki	Sleep	0		NULL	0.000
3045119873	wikiadmin	10.64.0.156:33030	wikidatawiki	Sleep	94		NULL	0.000
3045121010	wikiadmin	10.64.0.156:33040	wikidatawiki	Sleep	22		NULL	0.000
3045152357	wikiadmin	10.64.0.156:33082	wikidatawiki	Sleep	0		NULL	0.000
3045577640	wikiadmin	10.64.0.156:33196	wikidatawiki	Sleep	35		NULL	0.000
db1172.eqiad.wmnet:3306
2088185998	wikiadmin	10.64.0.156:43628	wikidatawiki	Sleep	8		NULL	0.000
2088187935	wikiadmin	10.64.0.156:43638	wikidatawiki	Sleep	0		NULL	0.000
2088229649	wikiadmin	10.64.0.156:43726	wikidatawiki	Sleep	36		NULL	0.000
2088239729	wikiadmin	10.64.0.156:43774	wikidatawiki	Sleep	0		NULL	0.000
2088268619	wikiadmin	10.64.0.156:43864	wikidatawiki	Sleep	0		NULL	0.000
2088325425	wikiadmin	10.64.0.156:44008	wikidatawiki	Sleep	574		NULL	0.000
2088396353	wikiadmin	10.64.0.156:44018	wikidatawiki	Sleep	365		NULL	0.000
db1171.eqiad.wmnet:3318
db1167.eqiad.wmnet:3306
455720230	wikiadmin	10.64.0.156:43424	wikidatawiki	Sleep	3001		NULL	0.000
455720231	wikiadmin	10.64.0.156:43426	wikidatawiki	Sleep	3001		NULL	0.000
455720266	wikiadmin	10.64.0.156:43432	wikidatawiki	Sleep	3000		NULL	0.000
455720267	wikiadmin	10.64.0.156:43434	wikidatawiki	Sleep	3000		NULL	0.000
455720272	wikiadmin	10.64.0.156:43440	wikidatawiki	Sleep	3000		NULL	0.000
455720273	wikiadmin	10.64.0.156:43442	wikidatawiki	Sleep	3000		NULL	0.000
455720276	wikiadmin	10.64.0.156:43446	wikidatawiki	Sleep	3000		NULL	0.000
455720277	wikiadmin	10.64.0.156:43448	wikidatawiki	Sleep	3000		NULL	0.000
455720351	wikiadmin	10.64.0.156:43454	wikidatawiki	Sleep	2998		NULL	0.000
455720352	wikiadmin	10.64.0.156:43456	wikidatawiki	Sleep	2998		NULL	0.000
455720376	wikiadmin	10.64.0.156:43462	wikidatawiki	Sleep	2998		NULL	0.000
455720377	wikiadmin	10.64.0.156:43464	wikidatawiki	Sleep	2998		NULL	0.000
455720390	wikiadmin	10.64.0.156:43468	wikidatawiki	Sleep	2997		NULL	0.000
455720391	wikiadmin	10.64.0.156:43470	wikidatawiki	Sleep	2997		NULL	0.000
455720404	wikiadmin	10.64.0.156:43476	wikidatawiki	Sleep	2997		NULL	0.000
455720405	wikiadmin	10.64.0.156:43478	wikidatawiki	Sleep	2997		NULL	0.000
455720416	wikiadmin	10.64.0.156:43482	wikidatawiki	Sleep	2997		NULL	0.000
455720419	wikiadmin	10.64.0.156:43484	wikidatawiki	Sleep	2997		NULL	0.000
455720423	wikiadmin	10.64.0.156:43490	wikidatawiki	Sleep	2997		NULL	0.000
455720424	wikiadmin	10.64.0.156:43492	wikidatawiki	Sleep	2997		NULL	0.000
455720437	wikiadmin	10.64.0.156:43496	wikidatawiki	Sleep	2997		NULL	0.000
455720438	wikiadmin	10.64.0.156:43498	wikidatawiki	Sleep	2997		NULL	0.000
455720444	wikiadmin	10.64.0.156:43504	wikidatawiki	Sleep	2996		NULL	0.000
455720445	wikiadmin	10.64.0.156:43506	wikidatawiki	Sleep	2996		NULL	0.000
455720465	wikiadmin	10.64.0.156:43512	wikidatawiki	Sleep	2996		NULL	0.000
455720466	wikiadmin	10.64.0.156:43514	wikidatawiki	Sleep	2996		NULL	0.000
455720495	wikiadmin	10.64.0.156:43526	wikidatawiki	Sleep	2996		NULL	0.000
455720496	wikiadmin	10.64.0.156:43528	wikidatawiki	Sleep	2996		NULL	0.000
455720507	wikiadmin	10.64.0.156:43532	wikidatawiki	Sleep	2995		NULL	0.000
455720508	wikiadmin	10.64.0.156:43534	wikidatawiki	Sleep	2995		NULL	0.000
455720512	wikiadmin	10.64.0.156:43538	wikidatawiki	Sleep	2995		NULL	0.000
455720514	wikiadmin	10.64.0.156:43540	wikidatawiki	Sleep	2995		NULL	0.000
455720528	wikiadmin	10.64.0.156:43546	wikidatawiki	Sleep	2995		NULL	0.000
455720529	wikiadmin	10.64.0.156:43548	wikidatawiki	Sleep	2995		NULL	0.000
455720540	wikiadmin	10.64.0.156:43552	wikidatawiki	Sleep	2995		NULL	0.000
455720541	wikiadmin	10.64.0.156:43554	wikidatawiki	Sleep	2995		NULL	0.000
455720543	wikiadmin	10.64.0.156:43560	wikidatawiki	Sleep	2995		NULL	0.000
455720545	wikiadmin	10.64.0.156:43562	wikidatawiki	Sleep	2995		NULL	0.000
455720553	wikiadmin	10.64.0.156:43566	wikidatawiki	Sleep	2995		NULL	0.000
455720554	wikiadmin	10.64.0.156:43568	wikidatawiki	Sleep	2995		NULL	0.000
455720568	wikiadmin	10.64.0.156:43572	wikidatawiki	Sleep	2994		NULL	0.000
455720570	wikiadmin	10.64.0.156:43576	wikidatawiki	Sleep	2994		NULL	0.000
455720577	wikiadmin	10.64.0.156:43580	wikidatawiki	Sleep	2994		NULL	0.000
455720578	wikiadmin	10.64.0.156:43582	wikidatawiki	Sleep	2994		NULL	0.000
455720605	wikiadmin	10.64.0.156:43588	wikidatawiki	Sleep	2994		NULL	0.000
455720606	wikiadmin	10.64.0.156:43590	wikidatawiki	Sleep	2994		NULL	0.000
455720613	wikiadmin	10.64.0.156:43594	wikidatawiki	Sleep	2994		NULL	0.000
455720614	wikiadmin	10.64.0.156:43596	wikidatawiki	Sleep	2994		NULL	0.000
455720627	wikiadmin	10.64.0.156:43610	wikidatawiki	Sleep	2993		NULL	0.000
455720628	wikiadmin	10.64.0.156:43612	wikidatawiki	Sleep	2993		NULL	0.000
455720647	wikiadmin	10.64.0.156:43620	wikidatawiki	Sleep	2993		NULL	0.000
455720648	wikiadmin	10.64.0.156:43622	wikidatawiki	Sleep	2993		NULL	0.000
455796785	wikiadmin	10.64.16.16:55856	wikidatawiki	Sleep	0		NULL	0.000
455796819	wikiadmin	10.64.16.16:55894	wikidatawiki	Sleep	0		NULL	0.000
455796824	wikiadmin	10.64.16.16:55898	wikidatawiki	Sleep	0		NULL	0.000
455796840	wikiadmin	10.64.16.16:55930	wikidatawiki	Sleep	0		NULL	0.000
455796881	wikiadmin	10.64.16.16:55978	wikidatawiki	Sleep	0		NULL	0.000
455796901	wikiadmin	10.64.16.16:56012	wikidatawiki	Sleep	0		NULL	0.000
455797327	wikiadmin	10.64.16.16:56042	wikidatawiki	Sleep	0		NULL	0.000
455799507	wikiadmin	10.64.16.16:56088	wikidatawiki	Sleep	0		NULL	0.000
db1154.eqiad.wmnet:3318
db1126.eqiad.wmnet:3306
1327641935	wikiadmin	10.64.0.156:42162	wikidatawiki	Sleep	1		NULL	0.000
1327949355	wikiadmin	10.64.0.156:42452	wikidatawiki	Sleep	1		NULL	0.000
1327963606	wikiadmin	10.64.0.156:42490	wikidatawiki	Sleep	9		NULL	0.000
1328012695	wikiadmin	10.64.0.156:42602	wikidatawiki	Sleep	23		NULL	0.000
db1116.eqiad.wmnet:3318
db1114.eqiad.wmnet:3306
2765968284	wikiadmin	10.64.0.156:41922	wikidatawiki	Sleep	13		NULL	0.000
2766038368	wikiadmin	10.64.0.156:42002	wikidatawiki	Sleep	66		NULL	0.000
2766101402	wikiadmin	10.64.0.156:42048	wikidatawiki	Sleep	323		NULL	0.000
2766242835	wikiadmin	10.64.0.156:42116	wikidatawiki	Sleep	0		NULL	0.000
2766459814	wikiadmin	10.64.0.156:42550	wikidatawiki	Sleep	0		NULL	0.000
db1111.eqiad.wmnet:3306
3147662143	wikiadmin	10.64.0.156:38382	wikidatawiki	Sleep	435		NULL	0.000
3147735043	wikiadmin	10.64.0.156:38444	wikidatawiki	Sleep	0		NULL	0.000
3147842727	wikiadmin	10.64.0.156:38610	wikidatawiki	Sleep	2		NULL	0.000
3147851360	wikiadmin	10.64.0.156:38642	wikidatawiki	Sleep	3		NULL	0.000
3147899283	wikiadmin	10.64.0.156:38744	wikidatawiki	Sleep	4		NULL	0.000
db1109.eqiad.wmnet:3306
db1104.eqiad.wmnet:3306
3570393581	wikiadmin	10.64.0.156:57358	wikidatawiki	Sleep	15		NULL	0.000
3570599417	wikiadmin	10.64.0.156:57502	wikidatawiki	Sleep	437		NULL	0.000
3570738019	wikiadmin	10.64.0.156:57668	wikidatawiki	Sleep	2		NULL	0.000
3570803173	wikiadmin	10.64.0.156:57824	wikidatawiki	Sleep	455		NULL	0.000
3570829177	wikiadmin	10.64.0.156:57888	wikidatawiki	Sleep	41		NULL	0.000
db1101.eqiad.wmnet:3318
3158792082	wikiadmin	10.64.0.156:46916	wikidatawiki	Sleep	324		NULL	0.000
3158899405	wikiadmin	10.64.0.156:47050	wikidatawiki	Sleep	6		NULL	0.000
3159236580	wikiadmin	10.64.0.156:47498	wikidatawiki	Sleep	598		NULL	0.000
db1099.eqiad.wmnet:3318
3836702749	wikiadmin	10.64.0.156:48400	wikidatawiki	Sleep	70		NULL	0.000
3836936731	wikiadmin	10.64.0.156:48882	wikidatawiki	Sleep	0		NULL	0.000
Thu, Jan 13, 5:59 AM · Blocked-on-schema-change, DBA
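
The per-host listing above can be summarised with a short script. This is a sketch that assumes the layout shown in the comment: a `host:port` header line followed by whitespace-separated processlist rows whose first field is a numeric connection id. The function name is hypothetical.

```python
from collections import Counter


def count_connections(listing: str) -> Counter:
    """Count processlist rows under each 'host:port' header line."""
    counts: Counter = Counter()
    host = None
    for raw in listing.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if len(fields) == 1 and ":" in fields[0]:
            # Header line such as "db1178.eqiad.wmnet:3306"
            host = fields[0]
            counts.setdefault(host, 0)  # keep hosts with no connections visible
        elif host is not None and fields[0].isdigit():
            # Data row: first column is the connection id
            counts[host] += 1
    return counts
```

Hosts that print a header but no rows (db1109 in the listing, for example) end up with a count of zero, which makes it easy to see which instances are free of dump connections.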
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

Views recreated on s1 clouddb* hosts.

Thu, Jan 13, 5:58 AM · Blocked-on-schema-change, DBA
Marostegui updated the task description for T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.
Thu, Jan 13, 5:56 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

s1 master done

Thu, Jan 13, 5:56 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

s1 replicas (including db1169) are done

Thu, Jan 13, 5:55 AM · Blocked-on-schema-change, DBA

Wed, Jan 12

Marostegui added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.

This huge spike of lag correlates with a huge spike in writes (which generated lots of lag): https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&from=1642018237886&to=1642020538564
They all seem to match this deployment:

20:21 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 01m 21s)
Wed, Jan 12, 9:17 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
Marostegui added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.

There's no more lag in the databases

Wed, Jan 12, 9:13 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

db1169 is back up after Chris fixed the HW issue on-site. I have reimaged it to Bullseye and tomorrow I will start pooling it (after running a schema change that has been deployed on s1).

Wed, Jan 12, 5:25 PM · Patch-For-Review, DBA
Marostegui added a comment to T299025: db1169 reimage/idrac failure.

@Cmjohnson the host got reimaged fine. Thank you for fixing this so fast!

Wed, Jan 12, 5:10 PM · SRE, ops-eqiad
Marostegui added a comment to T299025: db1169 reimage/idrac failure.

Thanks Chris, going to try a reimage then! I will let you know how it goes

Wed, Jan 12, 4:59 PM · SRE, ops-eqiad
Marostegui added a comment to T299025: db1169 reimage/idrac failure.

Thank you @Cmjohnson - once it is able to boot up, I can take it from there and attempt a reimage.

Wed, Jan 12, 4:20 PM · SRE, ops-eqiad
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

db1128 is on Bullseye and is now serving on s1 with normal weight. If you notice something strange: dbctl instance db1128 depool ; dbctl config commit -m "Depooling db1128"
Let's wait till next week to make sure it is serving fine before giving the green light for Bullseye!

Wed, Jan 12, 2:00 PM · Patch-For-Review, DBA
Marostegui updated the task description for T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.
Wed, Jan 12, 1:47 PM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

Started deployment on s1 - reminder that db1169 is down (T299025) and will need the schema change applied manually

Wed, Jan 12, 1:46 PM · Blocked-on-schema-change, DBA
Marostegui updated the task description for T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.
Wed, Jan 12, 1:44 PM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

s3 master done

Wed, Jan 12, 1:44 PM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

s3 replicas done

Wed, Jan 12, 1:33 PM · Blocked-on-schema-change, DBA
Marostegui added a comment to T263127: Remove groups from db configs.

watchlist removed from s7 eqiad

Wed, Jan 12, 12:54 PM · Platform Engineering Code Jam, Performance-Team (Radar), Platform Engineering Roadmap Decision Making, DBA
Marostegui changed the status of T299046: Upgrade parsercache infra to Bullseye, a subtask of T298585: Upgrade WMF database-and-backup-related hosts to bullseye, from Open to Stalled.
Wed, Jan 12, 12:39 PM · DBA
Marostegui changed the status of T299046: Upgrade parsercache infra to Bullseye from Open to Stalled.

On hold until we are happy with the performance of pc1011 (T295965)

Wed, Jan 12, 12:39 PM · DBA
Marostegui created T299046: Upgrade parsercache infra to Bullseye.
Wed, Jan 12, 12:38 PM · DBA
Marostegui updated the task description for T295965: Test MariaDB 10.4 with Bullseye.
Wed, Jan 12, 10:57 AM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

Doing a data check on db1128 before letting it serve traffic. I am checking these tables:

  • user
  • recentchanges
  • watchlist
  • logging
  • actor
  • slots
  • revision
  • archive
  • page
Wed, Jan 12, 10:25 AM · Patch-For-Review, DBA
Marostegui updated the task description for T295965: Test MariaDB 10.4 with Bullseye.
Wed, Jan 12, 10:18 AM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

pc1011 is now serving pc1 as a master and it is running Bullseye, so far so good. Let's see how the performance is and if there's any regression.

Wed, Jan 12, 10:18 AM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

Replication position for pc1014:

root@pc1014.eqiad.wmnet[(none)]> show master status\G
*************************** 1. row ***************************
            File: pc1014-bin.050274
        Position: 406581448
    Binlog_Do_DB:
Binlog_Ignore_DB:
Wed, Jan 12, 9:05 AM · Patch-For-Review, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

Views recreated on s3 clouddb* hosts

Wed, Jan 12, 8:05 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

Doing a data check on db1128 before letting it serve traffic. I am checking these tables:

  • user
  • recentchanges
  • watchlist
  • logging
  • actor
  • slots
  • revision
  • archive
  • page
Wed, Jan 12, 7:28 AM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

I think I am going to convert db1128 to s1 slave (it is already replicating there) to cover for db1169 so we can test live traffic there.

Wed, Jan 12, 7:05 AM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

Created T299025, as I am unable to do anything because the iDRAC doesn't seem to be working on db1169. The host gets stuck somewhere during the reboot: it never reaches the Debian installer (or a normal boot), and it doesn't respond to ping either.

Wed, Jan 12, 7:03 AM · Patch-For-Review, DBA
Marostegui triaged T299025: db1169 reimage/idrac failure as High priority.

I am setting this to High as this is a live s1 host and we need to test Bullseye there to make sure we are ready for it. Confirming that would unblock T297913: Confirm support of PERC 750 raid controller, so we could go ahead, order the hosts and install them directly with Bullseye.

Wed, Jan 12, 7:02 AM · SRE, ops-eqiad
Marostegui created T299025: db1169 reimage/idrac failure.
Wed, Jan 12, 7:00 AM · SRE, ops-eqiad
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

I am facing issues with the reboot/pxe of db1169 - investigating

Wed, Jan 12, 6:46 AM · Patch-For-Review, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

Recreated views on s4 clouddb* hosts.

Wed, Jan 12, 6:19 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

Deploying on s3 replicas

Wed, Jan 12, 6:00 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

s4 replicas were done overnight and I just did the primary master.

Wed, Jan 12, 5:59 AM · Blocked-on-schema-change, DBA
Marostegui updated the task description for T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.
Wed, Jan 12, 5:59 AM · Blocked-on-schema-change, DBA

Tue, Jan 11

Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

Ah it is probably that the main service isn't disabled (as this is a multi instance host). I will do that tomorrow morning. Thanks for the heads up!

Tue, Jan 11, 6:36 PM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

In preparation for the pc1 failover to reimage pc1011, I have upgraded MySQL on pc1014 to 10.4.22; pc1014 will most likely become pc1 master tomorrow.

Tue, Jan 11, 2:36 PM · Patch-For-Review, DBA
Marostegui updated the task description for T298586: Upgrade all dbproxy hosts to Bullseye.
Tue, Jan 11, 2:15 PM · Patch-For-Review, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

Deploying on s4 replicas.

Tue, Jan 11, 1:26 PM · Blocked-on-schema-change, DBA
Marostegui updated the task description for T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.
Tue, Jan 11, 1:25 PM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

s7 master done

Tue, Jan 11, 1:25 PM · Blocked-on-schema-change, DBA
Marostegui triaged T298959: Upgrade dborch1001 to Bullseye as Medium priority.

@Kormat I have taken the liberty to assign this directly to you - congratulations!

Tue, Jan 11, 11:40 AM · DBA
Marostegui created T298959: Upgrade dborch1001 to Bullseye.
Tue, Jan 11, 11:40 AM · DBA
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

s7 replicas done, waiting for the last repool before going for the primary master.

Tue, Jan 11, 11:37 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T297913: Confirm support of PERC 750 raid controller.

Thanks @MoritzMuehlenhoff - if this is only available from Bullseye, I think that's fine from the DB point of view. We are almost finished with our Bullseye testing and, if nothing changes dramatically in the next few tests I have to do, we are planning to install the new DB roles with Bullseye directly to avoid having to migrate those too.

Tue, Jan 11, 11:04 AM · DC-Ops, SRE
Marostegui added a comment to T297191: Schema change for dropping page_restrictions.pr_user field on wmf sites.

Recreated the views on s7:

  • clouddb1014
  • clouddb1018
  • clouddb1021
Tue, Jan 11, 10:49 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

I am going to include the reimage of an eqiad parsercache host in the testing. Given that parsercache has a very peculiar workload (lots of REPLACEs and DELETEs), it would be good to see if there's any regression on the OS side when dealing with writes.

Tue, Jan 11, 9:22 AM · Patch-For-Review, DBA
Marostegui updated the task description for T295965: Test MariaDB 10.4 with Bullseye.
Tue, Jan 11, 9:20 AM · Patch-For-Review, DBA
Marostegui updated the task description for T295965: Test MariaDB 10.4 with Bullseye.
Tue, Jan 11, 9:14 AM · Patch-For-Review, DBA
Marostegui added a comment to T295965: Test MariaDB 10.4 with Bullseye.

db2078 (misc host in codfw used for backups) has finished running the dumps, so I am going to go ahead and reimage it to Bullseye.

Tue, Jan 11, 8:44 AM · Patch-For-Review, DBA
Marostegui updated the task description for T298940: Reimage WMCS db proxies to Bullseye.
Tue, Jan 11, 8:36 AM · Data-Services, cloud-services-team (Kanban)
Marostegui updated the task description for T298586: Upgrade all dbproxy hosts to Bullseye.
Tue, Jan 11, 8:25 AM · Patch-For-Review, DBA