
Marostegui (Manuel Aróstegui)
User

User Details

User Since
Sep 1 2016, 6:48 AM (194 w, 4 d)
Availability
Available
IRC Nick
marostegui
LDAP User
Marostegui
MediaWiki User
MArostegui (WMF)

TZ: UTC +1/+2

Recent Activity

Yesterday

Marostegui added a comment to T252195: In-place conversion from LVM to normal partition.

Thanks for working on this.
This is probably not an issue, but it is worth checking that this works as expected with both HP and Dell controllers, just in case there are underlying differences in how each controller sees/treats partitions.

Mon, May 25, 1:22 PM · Operations, DBA
Marostegui updated the task description for T253342: Apply Babel schema change expanding babel_lang in Wikimedia production.
Mon, May 25, 12:30 PM · DBA, Blocked-on-schema-change
Marostegui added a comment to T252812: [draft] Check into watchlist sizes (limiting or handling large ones properly).

Thanks for poking me!
We definitely do care about the size of the watchlist table, which is already pretty big on enwiki (54GB on disk) and on commonswiki (28GB).
Having such big, monolithic tables is dangerous, and we should not add more data to them: they are already big enough to cause issues.
Big tables are hard to maintain (purging, running schema changes, backups, copies...) and there is a point where they simply become unmaintainable, i.e. we cannot even run schema changes on them. While we are not at that point yet with watchlist, throwing more data into it will certainly not help.

Mon, May 25, 12:10 PM · Expiring-Watchlist-Items, Community-Tech
Marostegui added a comment to T252331: tendril_purge_global_status_log_5m and global_status_log needs more frequent purging.

This is approximately what global_status_log has accumulated since 10th May (15 days):

mysql:root@localhost [(none)]> show explain  for 102846891;
+------+-------------+-------------------+-------+---------------+---------+---------+------+------------+-------------+
| id   | select_type | table             | type  | possible_keys | key     | key_len | ref  | rows       | Extra       |
+------+-------------+-------------------+-------+---------------+---------+---------+------+------------+-------------+
|    1 | SIMPLE      | global_status_log | index | NULL          | PRIMARY | 24      | NULL | 4962983672 | Using index |
+------+-------------+-------------------+-------+---------------+---------+---------+------+------------+-------------+
1 row in set, 1 warning (0.00 sec)
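A quick back-of-the-envelope check of the growth rate implied by the output above, as a small Python sketch. It assumes the `rows` column from SHOW EXPLAIN is a reasonable proxy for the table size; that value is an optimizer estimate, not an exact count, so the resulting rate is only approximate.

```python
# Approximate accumulation rate of global_status_log, based on the
# optimizer's row estimate from SHOW EXPLAIN (not an exact COUNT(*)).
ESTIMATED_ROWS = 4_962_983_672
DAYS = 15  # since 10th May

SECONDS_PER_DAY = 86_400


def rows_per_day(rows: int, days: int) -> float:
    """Average number of rows added per day over the period."""
    return rows / days


def rows_per_second(rows: int, days: int) -> float:
    """Average number of rows added per second over the period."""
    return rows / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    print(f"~{rows_per_day(ESTIMATED_ROWS, DAYS):,.0f} rows/day")
    print(f"~{rows_per_second(ESTIMATED_ROWS, DAYS):,.0f} rows/s")
```

That is on the order of 331 million rows per day (roughly 3,800 rows per second), which gives a sense of how much the purge job has to remove just to keep the table size stable.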
Mon, May 25, 11:44 AM · DBA
Marostegui moved T253342: Apply Babel schema change expanding babel_lang in Wikimedia production from Backlog to In progress on the Blocked-on-schema-change board.
Mon, May 25, 5:13 AM · DBA, Blocked-on-schema-change
Marostegui moved T253342: Apply Babel schema change expanding babel_lang in Wikimedia production from Backlog to In progress on the DBA board.
Mon, May 25, 5:13 AM · DBA, Blocked-on-schema-change
Marostegui claimed T253342: Apply Babel schema change expanding babel_lang in Wikimedia production.
Mon, May 25, 5:13 AM · DBA, Blocked-on-schema-change
Marostegui added a comment to T252331: tendril_purge_global_status_log_5m and global_status_log needs more frequent purging.

With the latest changes it looks like we have global_status_log_5m under control. It doesn't grow beyond 35-40M rows.
I am now going to spend time on the other big one: global_status_log.

Mon, May 25, 4:58 AM · DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

labsdb1011 has been working fine since Thursday. The lag, though, keeps growing; more slowly now, but it is still there.
I am going to go ahead and do the above test:

Mon, May 25, 4:51 AM · cloud-services-team (Kanban), DBA
Marostegui triaged T253276: Normalise MW Core database language fields length as Medium priority.

I have checked the gerrit patch; it should be an easy schema change, I think.
We receive quite a lot of schema changes, so please let's use the template at https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change once this is merged. It is a lot easier for us if we can always follow the same template.
Moving this to our Blocked column until the change is merged. Once it is, please edit the task description to use the above template.

Mon, May 25, 4:49 AM · MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), DBA, Schema-change, Technical-Debt, MediaWiki-General

Fri, May 22

Marostegui committed rOSTDf68da8515c0c: dashboard.sql: Change storing time (authored by Marostegui).
dashboard.sql: Change storing time
Fri, May 22, 3:06 PM
Marostegui updated the task description for T252512: Productionize db114[1-9].
Fri, May 22, 2:36 PM · DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

So labsdb1011 looks stable. CPU seems to be stable at around 30% usage (which is a big improvement compared to the previous values).

Fri, May 22, 11:33 AM · cloud-services-team (Kanban), DBA
Marostegui updated the task description for T252512: Productionize db114[1-9].
Fri, May 22, 8:34 AM · DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

labsdb1011 seems to be working fine.
The good news is that 10.4 + Buster seems to confirm that CPU usage is a lot better: the host is no longer at almost 100% usage as it used to be.

Fri, May 22, 4:46 AM · cloud-services-team (Kanban), DBA
Marostegui created P11273 (An Untitled Masterwork).
Fri, May 22, 4:41 AM
Marostegui triaged T253342: Apply Babel schema change expanding babel_lang in Wikimedia production as Medium priority.
Fri, May 22, 4:32 AM · DBA, Blocked-on-schema-change

Thu, May 21

Marostegui added a comment to T226546: babel database doesn't support language codes longer than 10 characters (e.g. de-x-formal).

I would prefer a different task following the template at: https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change

Thu, May 21, 7:53 PM · MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Schema-change, affects-translatewiki.net, MediaWiki-extensions-Babel
Marostegui added a comment to T250602: db1140 (backup source) crashed .

Per my IRC chat with John

Could you tell me more, as before a processor error was mentioned, but then a board change?

Thu, May 21, 1:45 PM · DC-Ops, ops-eqiad, Operations, DBA
Marostegui added a comment to T251719: Quarry or the Analytics wikireplicas role creates lots of InnoDB Purge Lag.

And the effect on purge and lag is huge too after the host was removed from that role:

Thu, May 21, 1:04 PM · Quarry, Data-Services, cloud-services-team (Kanban)
Marostegui added a comment to T251719: Quarry or the Analytics wikireplicas role creates lots of InnoDB Purge Lag.

This is proof of how CPU-intensive the Analytics role is.
This is a CPU graph from labsdb1010 right after I depooled it from the analytics role:

Thu, May 21, 1:01 PM · Quarry, Data-Services, cloud-services-team (Kanban)
Elitre awarded T251982: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May a Like token.
Thu, May 21, 12:40 PM · DBA
Marostegui closed T250647: Read only windows for database primary masters, a subtask of T239791: DB: perform rolling restart of mariadb daemons to pick up CA changes, as Resolved.
Thu, May 21, 12:33 PM · DBA, User-jbond, Puppet, Operations
Marostegui closed T250647: Read only windows for database primary masters as Resolved.
Thu, May 21, 12:33 PM · CommRel-Specialists-Support (Apr-Jun-2020)
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

labsdb1011 is now serving queries.
Quarry seems to be working fine too: https://quarry.wmflabs.org/query/45075

Thu, May 21, 12:14 PM · cloud-services-team (Kanban), DBA
Marostegui updated the task description for T252512: Productionize db114[1-9].
Thu, May 21, 12:10 PM · DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

labsdb1011 is up-to-date:

#  mysql.py -hlabsdb1011 -e "show all slaves status\G" | grep Seconds
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
Thu, May 21, 9:47 AM · cloud-services-team (Kanban), DBA
Aklapper awarded T251982: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May a Like token.
Thu, May 21, 9:45 AM · DBA
Marostegui created T253289: Remove USE INDEX user_timestamp from code.
Thu, May 21, 7:53 AM · MediaWiki-General, MW-1.35-release, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
Marostegui moved T253217: Relocate "old" s4 hosts from Backlog to Next on the DBA board.
Thu, May 21, 7:47 AM · DBA
Marostegui created P11262 (An Untitled Masterwork).
Thu, May 21, 7:01 AM
Marostegui added a comment to T252802: Improve output message readabiliy of transfer.py.

Oh nice work! :-)
I will definitely not miss all the verbosity there hehe

Thu, May 21, 5:14 AM · DBA
Marostegui closed T239791: DB: perform rolling restart of mariadb daemons to pick up CA changes, a subtask of T236277: Extend Puppet CA Expiry date , as Resolved.
Thu, May 21, 5:09 AM · Patch-For-Review, User-jbond, Puppet, Operations
Marostegui closed T239791: DB: perform rolling restart of mariadb daemons to pick up CA changes, a subtask of T237259: Document all uses of the puppetCA certificate, as Resolved.
Thu, May 21, 5:09 AM · Patch-For-Review, User-jbond, Puppet, Operations
Marostegui closed T239791: DB: perform rolling restart of mariadb daemons to pick up CA changes as Resolved.

This is all done!

Thu, May 21, 5:09 AM · DBA, User-jbond, Puppet, Operations
Marostegui updated the task description for T239791: DB: perform rolling restart of mariadb daemons to pick up CA changes.
Thu, May 21, 5:08 AM · DBA, User-jbond, Puppet, Operations
Marostegui added a comment to T250647: Read only windows for database primary masters.

Everything is completed, this task can be closed.
Thanks for your support!

Thu, May 21, 5:08 AM · CommRel-Specialists-Support (Apr-Jun-2020)
Marostegui added a comment to T251985: Read only time window needed for enwiki.

All done - not closing this task, leaving that for @Trizek-WMF

Thu, May 21, 5:08 AM · User-notice, CommRel-Specialists-Support
Marostegui closed T251982: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May, a subtask of T239791: DB: perform rolling restart of mariadb daemons to pick up CA changes, as Resolved.
Thu, May 21, 5:07 AM · DBA, User-jbond, Puppet, Operations
Marostegui closed T251982: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May as Resolved.

This was done.
RO starts: 05:00:30
RO stops: 05:03:28
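The length of the read-only window follows directly from the two timestamps above. A minimal Python sketch, assuming both timestamps fall on the same day (no midnight crossing); the helper name is hypothetical:

```python
from datetime import datetime


def ro_duration_seconds(start: str, stop: str) -> int:
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    # Read-only start/stop times for the s1 (enwiki) master restart.
    minutes, seconds = divmod(ro_duration_seconds("05:00:30", "05:03:28"), 60)
    print(f"enwiki was read-only for {minutes}m {seconds:02d}s")
```

This gives 2m 58s, well inside the 30-minute window that was requested for enwiki. The same helper applied to the s2/s8 maintenance on May 19 (05:00:44 to 05:03:47) gives 3m 03s.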

Thu, May 21, 5:06 AM · DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

labsdb1011 keeps catching up nicely:

root@labsdb1011:~# mysql -e "show all slaves status\G" | grep Seconds
         Seconds_Behind_Master: 115303
         Seconds_Behind_Master: 9768
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 139544
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 48016
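When tracking catch-up progress across all replication channels, pasted `show all slaves status\G | grep Seconds` output like the one above can be summarized programmatically. A minimal Python sketch; treating `NULL` as "replication not running" follows MariaDB's behaviour for Seconds_Behind_Master, while the function names and summary fields are hypothetical:

```python
import re
from typing import Dict, List, Optional

# Output from the comment above, one line per replication channel.
SAMPLE = """\
         Seconds_Behind_Master: 115303
         Seconds_Behind_Master: 9768
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 139544
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 0
         Seconds_Behind_Master: 48016
"""

LAG_RE = re.compile(r"Seconds_Behind_Master:\s*(\d+|NULL)")


def parse_lags(output: str) -> List[Optional[int]]:
    """One entry per channel; None when the value is NULL (replication not running)."""
    return [None if value == "NULL" else int(value) for value in LAG_RE.findall(output)]


def summarize(lags: List[Optional[int]]) -> Dict[str, object]:
    """Count channels, how many are caught up or stopped, and the worst lag in hours."""
    known = [lag for lag in lags if lag is not None]
    return {
        "channels": len(lags),
        "caught_up": sum(1 for lag in known if lag == 0),
        "stopped": len(lags) - len(known),
        "max_lag_hours": max(known) / 3600 if known else None,
    }


if __name__ == "__main__":
    print(summarize(parse_lags(SAMPLE)))
```

With the output above, 4 of the 8 channels are caught up and the slowest one is still about 38.8 hours behind.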
Thu, May 21, 4:48 AM · cloud-services-team (Kanban), DBA
Marostegui reassigned T250602: db1140 (backup source) crashed from Jclark-ctr to jcrespo.

Per my IRC chat with John, assigning this back to Jaime as the on-site part is done
Thank you John!

Thu, May 21, 4:43 AM · DC-Ops, ops-eqiad, Operations, DBA

Wed, May 20

Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

The alter table on all the categorylinks tables finished. I have now started replication.

root@labsdb1011:~# mysql -e "show all slaves status\G" | grep Seconds
         Seconds_Behind_Master: 216483
         Seconds_Behind_Master: 215119
         Seconds_Behind_Master: 214472
         Seconds_Behind_Master: 216846
         Seconds_Behind_Master: 218187
         Seconds_Behind_Master: 213065
         Seconds_Behind_Master: 213982
         Seconds_Behind_Master: 215624
root@labsdb1011:~#
Wed, May 20, 5:39 PM · cloud-services-team (Kanban), DBA
Marostegui committed rOSTDb3cc602cdf7e: dashboard: Change tendril_purge_global_status_log_5m (authored by Marostegui).
dashboard: Change tendril_purge_global_status_log_5m
Wed, May 20, 3:52 PM
Marostegui added a comment to T238966: Apply updates for MCR, actor migration, and content migration, to production wikis..

@daniel I won't proceed with testwiki until the FORCE INDEX is removed from the code, in case this generates more noise, as testwiki is more heavily used.

Wed, May 20, 1:47 PM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
Marostegui assigned T251410: Prepare and check storage layer for awawiki to Kormat.

Thanks - I will work with @Kormat on this!
Assigning to @Kormat so this gets blocked on us and not sent for views creation yet.

Wed, May 20, 1:35 PM · cloud-services-team (Kanban), Data-Services, DBA
Marostegui assigned T250706: Prepare and check storage layer for gomwiktionary to Kormat.

Thanks - I will work with @Kormat on this!
Assigning to @Kormat so this gets blocked on us and not sent for views creation yet.

Wed, May 20, 1:33 PM · cloud-services-team (Kanban), Data-Services, DBA
Marostegui added a comment to T238966: Apply updates for MCR, actor migration, and content migration, to production wikis..

We need to check whether any of the above indexes that will be dropped are being FORCEd anywhere in the code.

Wed, May 20, 12:58 PM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
Marostegui added a comment to T252331: tendril_purge_global_status_log_5m and global_status_log needs more frequent purging.
root@db1115.eqiad.wmnet[tendril]> select count(*) from global_status_log_5m;
+-----------+
| count(*)  |
+-----------+
| 331934990 |
+-----------+
1 row in set (1 min 53.61 sec)
Wed, May 20, 11:58 AM · DBA
Marostegui added a comment to T253217: Relocate "old" s4 hosts.

sure

Wed, May 20, 11:46 AM · DBA
Marostegui changed the status of T253217: Relocate "old" s4 hosts from Open to Stalled.

This is not yet ready until T252512: Productionize db114[1-9] is done

Wed, May 20, 11:30 AM · DBA
Marostegui changed the status of T253217: Relocate "old" s4 hosts, a subtask of T252512: Productionize db114[1-9], from Open to Stalled.
Wed, May 20, 11:30 AM · DBA
Marostegui created T253217: Relocate "old" s4 hosts.
Wed, May 20, 11:30 AM · DBA
Marostegui updated the task description for T252512: Productionize db114[1-9].
Wed, May 20, 11:10 AM · DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

We have seen more weird behaviour with the grants; to track it, I have created: https://jira.mariadb.org/browse/MDEV-22645

Wed, May 20, 10:22 AM · cloud-services-team (Kanban), DBA
Marostegui updated the task description for T251982: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May.
Wed, May 20, 10:18 AM · DBA
Marostegui added a comment to T251985: Read only time window needed for enwiki.

Let's do 30 minutes. I will fix the other one.
Thank you!

Wed, May 20, 10:18 AM · User-notice, CommRel-Specialists-Support
Marostegui added a comment to P11248 (An Untitled Masterwork).
root@db1077:~/mysql_dump# zcat mysql.user.sql.gz | grep -i u15343
("%","u15343","*x","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","N","","","","",0,0,0,10,"","","N","N","labsdbuser",0.000000),
Wed, May 20, 9:49 AM
Marostegui created P11248 (An Untitled Masterwork).
Wed, May 20, 9:47 AM
Marostegui moved T252512: Productionize db114[1-9] from Next to In progress on the DBA board.
Wed, May 20, 9:36 AM · DBA
Marostegui added a comment to T251982: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May.

Maintenance day:

  • Silence all hosts in s1
  • Set read only on s1:
dbctl --scope eqiad section s1 ro "Maintenance on enwiki T251982" && dbctl config commit -m "Set enwiki as read-only for maintenance T251982"
  • Confirm read only on enwiki
  • Set read-only on the master on mysql: db1083
  • Restart mysql on db1083
  • Run puppet
  • Confirm the slaves are connected
  • Remove read only:
dbctl --scope eqiad section s1 rw && dbctl config commit -m "Set enwiki as read-only=off after maintenance T251982"
  • Confirm writes can go through
  • Run mysql_upgrade db1083
  • Close task
Wed, May 20, 9:30 AM · DBA
Marostegui added a comment to T251982: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May.

Package upgraded on db1083 to 10.1.43-2 in preparation for tomorrow's maintenance.

Wed, May 20, 9:27 AM · DBA
Marostegui moved T251982: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May from Next to In progress on the DBA board.
Wed, May 20, 9:25 AM · DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

labsdb1011 has finished dropping and recreating the cl_sortkey index on the categorylinks table across all wikis.
Right now I am also rebuilding categorylinks everywhere on labsdb1011 before starting replication.

Wed, May 20, 8:45 AM · cloud-services-team (Kanban), DBA
Marostegui updated subscribers of T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

@Kormat and @jcrespo have finished the cloning from backup1001 (the backup that contains the data right after the original import from the mydumper files) to db1141 (as a temporary test host).
db1141 is now catching up:

root@db1141:~# mysql -e "show all slaves status\G" | grep Seconds
         Seconds_Behind_Master: 1730158
         Seconds_Behind_Master: 1730752
         Seconds_Behind_Master: 1729120
         Seconds_Behind_Master: 1731311
         Seconds_Behind_Master: 1722920
         Seconds_Behind_Master: 1728384
         Seconds_Behind_Master: 1728070
         Seconds_Behind_Master: 1728692
Wed, May 20, 8:43 AM · cloud-services-team (Kanban), DBA
Marostegui created P11243 (An Untitled Masterwork).
Wed, May 20, 8:04 AM
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

The alter table to regenerate categorylinks indexes:

for i in `mysql information_schema -e "select table_schema from tables where table_name='categorylinks' and table_type='BASE TABLE'"  -BN`; do echo $i; mysql $i -e "set session sql_log_bin=0; alter table categorylinks drop key if exists cl_sortkey, add key cl_sortkey (cl_to,cl_type,cl_sortkey,cl_from)" ; done
Wed, May 20, 7:32 AM · cloud-services-team (Kanban), DBA
Marostegui created P11242 (An Untitled Masterwork).
Wed, May 20, 7:11 AM
Marostegui updated the task description for T252512: Productionize db114[1-9].
Wed, May 20, 7:05 AM · DBA
Marostegui triaged T250063: inverse_timestamp column exists in text table, it shouldn't as Medium priority.
Wed, May 20, 7:01 AM · DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

These are the positions where labsdb1011 stopped before sending the "up-to-date" binary log to backups1002:

May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 406 [Note] Master 's1': Slave SQL thread exiting, replication stopped in log 'db1124-bin.002644' at position 756734814
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 405 [Note] Master 's1': Slave I/O thread exiting, read up to log 'db1124-bin.002644', position 756743357
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 408 [Note] Master 's5': Slave SQL thread exiting, replication stopped in log 'db1124-bin.001209' at position 661970737
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 407 [Note] Master 's5': Slave I/O thread exiting, read up to log 'db1124-bin.001209', position 661970737
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 410 [Note] Master 's2': Slave SQL thread exiting, replication stopped in log 'db1125-bin.002350' at position 916087667
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 409 [Note] Master 's2': Slave I/O thread exiting, read up to log 'db1125-bin.002350', position 916156546
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 412 [Note] Master 's3': Slave SQL thread exiting, replication stopped in log 'db1124-bin.002036' at position 394796134
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 411 [Note] Master 's3': Slave I/O thread exiting, read up to log 'db1124-bin.002036', position 394796134
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 414 [Note] Master 's4': Slave SQL thread exiting, replication stopped in log 'db1125-bin.003546' at position 936049523
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 413 [Note] Master 's4': Slave I/O thread exiting, read up to log 'db1125-bin.003546', position 936049523
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 416 [Note] Master 's8': Slave SQL thread exiting, replication stopped in log 'db1124-bin.003661' at position 860109621
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 415 [Note] Master 's8': Slave I/O thread exiting, read up to log 'db1124-bin.003661', position 860109621
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 418 [Note] Master 's6': Slave SQL thread exiting, replication stopped in log 'db1125-bin.001487' at position 587410550
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 417 [Note] Master 's6': Slave I/O thread exiting, read up to log 'db1125-bin.001487', position 587410550
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 420 [Note] Master 's7': Slave SQL thread exiting, replication stopped in log 'db1125-bin.002147' at position 10141175
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 419 [Note] Master 's7': Slave I/O thread exiting, read up to log 'db1125-bin.002147', position 10141667
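The log lines above can be turned into per-section coordinates with a small parser. In each section, the SQL thread position is where applying events stopped, while the I/O thread position shows how far the master's binary log had been fetched into the relay log; where the two differ (s1, s2 and s7 above), those extra events had been downloaded but not yet applied. A minimal Python sketch, assuming the exact message format shown above; only the s1 and s5 lines are used as sample input, and the function names are hypothetical:

```python
import re
from typing import Dict, Optional, Tuple

# A subset of the log lines above (s1 and s5), used as sample input.
SAMPLE = """\
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 406 [Note] Master 's1': Slave SQL thread exiting, replication stopped in log 'db1124-bin.002644' at position 756734814
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 405 [Note] Master 's1': Slave I/O thread exiting, read up to log 'db1124-bin.002644', position 756743357
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 408 [Note] Master 's5': Slave SQL thread exiting, replication stopped in log 'db1124-bin.001209' at position 661970737
May 18 05:01:04 labsdb1011 mysqld[13194]: 2020-05-18  5:01:04 407 [Note] Master 's5': Slave I/O thread exiting, read up to log 'db1124-bin.001209', position 661970737
"""

# Patterns for the two [Note] messages mysqld logs when replication stops.
SQL_RE = re.compile(
    r"Master '(\w+)': Slave SQL thread exiting, "
    r"replication stopped in log '([^']+)' at position (\d+)"
)
IO_RE = re.compile(
    r"Master '(\w+)': Slave I/O thread exiting, "
    r"read up to log '([^']+)', position (\d+)"
)

Coordinates = Tuple[str, int]  # (binlog file, position)


def parse_positions(log: str) -> Dict[str, Dict[str, Coordinates]]:
    """Map each section to its 'sql' (applied) and 'io' (fetched) coordinates."""
    positions: Dict[str, Dict[str, Coordinates]] = {}
    for regex, thread in ((SQL_RE, "sql"), (IO_RE, "io")):
        for section, binlog, pos in regex.findall(log):
            positions.setdefault(section, {})[thread] = (binlog, int(pos))
    return positions


def io_ahead_bytes(coords: Dict[str, Coordinates]) -> Optional[int]:
    """Bytes fetched but not yet applied; None if the threads are on different binlogs."""
    sql_file, sql_pos = coords["sql"]
    io_file, io_pos = coords["io"]
    if sql_file != io_file:
        return None
    return io_pos - sql_pos


if __name__ == "__main__":
    for section, coords in sorted(parse_positions(SAMPLE).items()):
        binlog, pos = coords["sql"]
        print(f"{section}: applied up to {binlog}:{pos}, "
              f"I/O ahead by {io_ahead_bytes(coords)} bytes")
```

For s1 this reports the I/O thread 8543 bytes ahead of the SQL thread, while for s5 both threads stopped at the same position.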
Wed, May 20, 5:13 AM · cloud-services-team (Kanban), DBA

Tue, May 19

Marostegui added a comment to T252476: Give access to the Analytics Cluster to Research Inter (Rodolfo).

Yeah, I would suggest tracking that in a different task.

Tue, May 19, 7:43 PM · SRE-Access-Requests, Operations
Marostegui added a comment to T250602: db1140 (backup source) crashed .

Downtime expired - I have acked the alerts in Icinga

Tue, May 19, 7:31 PM · DC-Ops, ops-eqiad, Operations, DBA
Marostegui added a comment to T238966: Apply updates for MCR, actor migration, and content migration, to production wikis..

s6 progress

Tue, May 19, 1:54 PM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
Marostegui updated the task description for T238966: Apply updates for MCR, actor migration, and content migration, to production wikis..
Tue, May 19, 1:41 PM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
Marostegui updated the task description for T238966: Apply updates for MCR, actor migration, and content migration, to production wikis..
Tue, May 19, 1:40 PM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
Marostegui added a comment to T238966: Apply updates for MCR, actor migration, and content migration, to production wikis..

I have run the first alter on s6, on db2124 (frwiki, jawiki and ruwiki).
I will leave it running until tomorrow to make sure replication doesn't break (which would mean we still have inserts there).

Tue, May 19, 1:38 PM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
Marostegui moved T238966: Apply updates for MCR, actor migration, and content migration, to production wikis. from Next to In progress on the DBA board.
Tue, May 19, 1:36 PM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
Marostegui closed T112473: Better mysql monitoring for number of connections and processlist strange patterns as Declined.

Closing this in favour of T253120 which has more concrete points of action

Tue, May 19, 1:08 PM · observability, Patch-For-Review, Operations, DBA
Marostegui closed T112473: Better mysql monitoring for number of connections and processlist strange patterns, a subtask of T172492: Improve database alerting (tracking), as Declined.
Tue, May 19, 1:08 PM · Epic, observability, DBA
Marostegui added a subtask for T172492: Improve database alerting (tracking): T253120: Create prometheus alert to detect lag spikes.
Tue, May 19, 1:07 PM · Epic, observability, DBA
Marostegui added a parent task for T253120: Create prometheus alert to detect lag spikes: T172492: Improve database alerting (tracking).
Tue, May 19, 1:07 PM · DBA
Marostegui triaged T253120: Create prometheus alert to detect lag spikes as Medium priority.
Tue, May 19, 1:07 PM · DBA
Marostegui created T253120: Create prometheus alert to detect lag spikes.
Tue, May 19, 1:07 PM · DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

The action plan for now is:

Tue, May 19, 9:36 AM · cloud-services-team (Kanban), DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

The issue with the roles has been found:

Tue, May 19, 8:48 AM · cloud-services-team (Kanban), DBA
Marostegui added a comment to P11229 (An Untitled Masterwork).
mysql:root@localhost [(none)]> grant labsdbuser to u15343;
Query OK, 0 rows affected (0.000 sec)
Tue, May 19, 8:29 AM
Marostegui created P11229 (An Untitled Masterwork).
Tue, May 19, 8:25 AM
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

More tests:

Tue, May 19, 8:08 AM · cloud-services-team (Kanban), DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.

More food for thought: this also happens on the CLI:

root@labsdb1011:~# mysql --skip-ssl -uu15343 -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8146
Server version: 10.4.12-MariaDB MariaDB Server
Tue, May 19, 7:12 AM · cloud-services-team (Kanban), DBA
Marostegui added a comment to T249188: Reimage labsdb1011 to Buster and MariaDB 10.4.
root@cumin1001:/home/marostegui# mysql.py -hlabsdb1010 mysql -e "select count(*) from user"
+----------+
| count(*) |
+----------+
|     4048 |
+----------+
root@cumin1001:/home/marostegui# mysql.py -hlabsdb1011 mysql -e "select count(*) from user"
+----------+
| count(*) |
+----------+
|     4049 |
+----------+
Tue, May 19, 5:21 AM · cloud-services-team (Kanban), DBA
Marostegui closed T248086: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki), a subtask of T208425: [EPIC] Kill the wb_terms table, as Resolved.
Tue, May 19, 5:09 AM · MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), User-Addshore, wikidata-tech-focus, Wikidata-Ugly-Cat-Trailblaze (wb_terms trail blazing), Wikidata
Marostegui closed T248086: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki), a subtask of T251981: Upgrade and restart s2 and s8 (wikidatawiki) primary database masters: Tue 19th May, as Resolved.
Tue, May 19, 5:09 AM · DBA
Marostegui closed T248086: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) as Resolved.

This has been dropped everywhere! \o/

Tue, May 19, 5:09 AM · DBA
Marostegui updated the task description for T248086: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki).
Tue, May 19, 5:07 AM · DBA
Marostegui updated the task description for T239791: DB: perform rolling restart of mariadb daemons to pick up CA changes.
Tue, May 19, 5:07 AM · DBA, User-jbond, Puppet, Operations
Marostegui closed T251981: Upgrade and restart s2 and s8 (wikidatawiki) primary database masters: Tue 19th May, a subtask of T239791: DB: perform rolling restart of mariadb daemons to pick up CA changes, as Resolved.
Tue, May 19, 5:06 AM · DBA, User-jbond, Puppet, Operations
Marostegui closed T251981: Upgrade and restart s2 and s8 (wikidatawiki) primary database masters: Tue 19th May as Resolved.

This was done.
RO starts: 05:00:44
RO stops: 05:03:47

Tue, May 19, 5:06 AM · DBA
Marostegui closed T251984: Read only time window needed for s2 wikis and s8 (wikidatawiki), a subtask of T251981: Upgrade and restart s2 and s8 (wikidatawiki) primary database masters: Tue 19th May, as Resolved.
Tue, May 19, 5:06 AM · DBA
Marostegui closed T251984: Read only time window needed for s2 wikis and s8 (wikidatawiki), a subtask of T250647: Read only windows for database primary masters, as Resolved.
Tue, May 19, 5:06 AM · CommRel-Specialists-Support (Apr-Jun-2020)