Marostegui (Manuel Aróstegui)
Staff Database Administrator

User Details

User Since
Sep 1 2016, 6:48 AM (240 w, 5 d)
Availability
Available
IRC Nick
marostegui
LDAP User
Marostegui
MediaWiki User
MArostegui (WMF)

TZ: UTC +1/+2

Recent Activity

Yesterday

Marostegui added a comment to T279640: Ingest user similarity data for March 2021.

All good from my side too.
Let's do the next ingestion without giving us (DBAs) a heads-up, to see whether it is as transparent as this one.

Tue, Apr 13, 1:59 PM · Data-Persistence (Consultation), Platform Team Workboards (Green)
Marostegui closed T279848: labsdb1009:s2, replication broken as Resolved.

labsdb1009:s2 caught up:

# mysql.py -hlabsdb1009 -e "show slave 's2' status\G" | grep Seconds
        Seconds_Behind_Master: 3
Tue, Apr 13, 1:56 PM · Data-Services, DBA
Marostegui claimed T263127: Remove groups from db configs.
Tue, Apr 13, 1:27 PM · Performance-Team (Radar), Platform Engineering Roadmap Decision Making, User-Kormat, DBA, SRE
Marostegui added a comment to T275633: Productionize db21[45-52] and db11[76-84].

Pooled db1184 into s1 with minimal weight. If all goes well, I will start pooling it in gradually and automatically.

Tue, Apr 13, 12:55 PM · Patch-For-Review, DBA
Marostegui updated the task description for T274752: decommission db1076.eqiad.wmnet.
Tue, Apr 13, 11:07 AM · DBA, decommission-hardware
Marostegui added a comment to T279053: Grant ALTER privileges to adminlinkrecommendation user on m2.

ALTER grant added:

root@db1107.eqiad.wmnet[mysql]> show grants for 'adminlinkrecommendation'@'10.192.16.9';
| GRANT SELECT, INSERT, DELETE, CREATE, DROP, ALTER ON `mwaddlink`.* TO `adminlinkrecommendation`@`10.192.16.9`                    |
2 rows in set (0.001 sec)
Tue, Apr 13, 10:41 AM · Growth-Team (Current Sprint), DBA, Add-Link
Marostegui added a comment to T275633: Productionize db21[45-52] and db11[76-84].

db1180 is automatically being pooled in s6

Tue, Apr 13, 10:26 AM · Patch-For-Review, DBA
Marostegui updated the task description for T275633: Productionize db21[45-52] and db11[76-84].
Tue, Apr 13, 10:26 AM · Patch-For-Review, DBA
Marostegui updated the task description for T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.
Tue, Apr 13, 9:30 AM · Patch-For-Review, DBA
Marostegui renamed T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC from Failover m1 master: db1080 -> db1159 to Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.
Tue, Apr 13, 9:01 AM · Patch-For-Review, DBA
Marostegui added a comment to T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.

Thank you all!

Tue, Apr 13, 9:01 AM · Patch-For-Review, DBA
Marostegui updated subscribers of T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.

Thank you Jaime.

Tue, Apr 13, 8:59 AM · Patch-For-Review, DBA
Marostegui added a comment to T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.

What about 10 UTC? Would that work for the backups? I will ping the other owners if this works for you.

Tue, Apr 13, 8:41 AM · Patch-For-Review, DBA
Marostegui added a comment to T279053: Grant ALTER privileges to adminlinkrecommendation user on m2.

We are debugging some connection issues; I won't merge the above patch until that is figured out, to avoid adding more stuff to the mix :)

Tue, Apr 13, 8:40 AM · Growth-Team (Current Sprint), DBA, Add-Link
Marostegui created P15279 (An Untitled Masterwork).
Tue, Apr 13, 8:38 AM
Marostegui added a comment to T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.

@jcrespo I would like to do this on Wednesday 14th April. Is this a good day, or will it mess with the backups? I have no problem scheduling it on any other day.

What time is this happening?

Over the last few months, the latest the backups have finished is 8 UTC (normally they finish by 5 UTC). As long as this happens after that, we will be OK.

Tue, Apr 13, 8:36 AM · Patch-For-Review, DBA
Marostegui claimed T279053: Grant ALTER privileges to adminlinkrecommendation user on m2.
Tue, Apr 13, 8:26 AM · Growth-Team (Current Sprint), DBA, Add-Link
Marostegui moved T279587: Create database table to cache data about mentees from Refine to Done on the DBA board.

All sanitarium hosts have been restarted

Tue, Apr 13, 8:25 AM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), DBA, User-Urbanecm_WMF (Engineering), Growth-Team (Current Sprint), GrowthExperiments-MentorDashboard
Marostegui added a comment to T279587: Create database table to cache data about mentees.

Sure, here you go:

CREATE TABLE /*_*/growthexperiments_mentee_data (
  mentee_id INT UNSIGNED NOT NULL,
  mentee_data BLOB NOT NULL,
  PRIMARY KEY(mentee_id)
) /*$wgDBTableOptions*/;

I don't need any other index besides the primary key. Example read query: SELECT mentee_id, mentee_data FROM growthexperiments_mentee_data WHERE mentee_id IN (123, 342, 965, ...). The JSON blob will be processed by application-level logic.

Does that look good to you?

Tue, Apr 13, 8:15 AM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), DBA, User-Urbanecm_WMF (Engineering), Growth-Team (Current Sprint), GrowthExperiments-MentorDashboard
Marostegui moved T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY from In progress to Unclear/blocked to proceed on the Blocked-on-schema-change board.
Tue, Apr 13, 8:05 AM · DBA, Blocked-on-schema-change
Marostegui changed the status of T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY from Open to Stalled.

This is all done and is only waiting for the masters, which will be handled by either a master switchover or a DC switchover.

Tue, Apr 13, 8:05 AM · DBA, Blocked-on-schema-change
Marostegui changed the status of T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY, a subtask of T274336: Switchover s7 from db1086 to db1136, from Open to Stalled.
Tue, Apr 13, 8:05 AM · DBA
Marostegui changed the status of T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY, a subtask of T278214: Switchover s1 from db1083 to db1163, from Open to Stalled.
Tue, Apr 13, 8:05 AM · DBA
Marostegui updated the task description for T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY.
Tue, Apr 13, 8:04 AM · DBA, Blocked-on-schema-change
Marostegui closed T276156: Drop default of rc_timestamp as Resolved.

This is all done

Tue, Apr 13, 8:04 AM · DBA, Blocked-on-schema-change
Marostegui updated the task description for T276156: Drop default of rc_timestamp.
Tue, Apr 13, 8:04 AM · DBA, Blocked-on-schema-change
Marostegui added a comment to T279640: Ingest user similarity data for March 2021.

Good - thanks for the heads up

Tue, Apr 13, 7:08 AM · Data-Persistence (Consultation), Platform Team Workboards (Green)
Marostegui updated the task description for T277116: fa_deleted_timestamp and fa_timestamp are binary(14) in code but varbinary(14) in production.
Tue, Apr 13, 7:08 AM · DBA
Marostegui added a comment to T278614: Create production databases for mailman3.

Excellent!

Tue, Apr 13, 7:04 AM · DBA, SRE, Wikimedia-Mailing-lists
Marostegui updated the task description for T277118: iw_url in interwiki is varbinary(127) in production but blob in code.
Tue, Apr 13, 7:04 AM · DBA
Marostegui claimed T279848: labsdb1009:s2, replication broken.

It's been a bit of a pain to fix these drifts.
It was a huge transaction involving recentchanges and ores_classification.
5 rows were missing on recentchanges and 27 on ores_classification.

Tue, Apr 13, 7:03 AM · Data-Services, DBA
Marostegui added a comment to T278655: Appservers latency spike / parser cache growth 2021-03-28.

Thanks @Krinkle!
Yes, we are well aware of the trend parsercache has been following lately: it is growing steadily and faster than we expected. We have some options in case this becomes a (more) serious issue, which essentially consist of reducing the expiration time for the keys (currently set to 30 days), as we've done in the past, and then manually purging and optimizing the tables to get some disk space back.

Tue, Apr 13, 5:24 AM · Sustainability (Incident Followup), Performance-Team (Radar), DBA, Platform Engineering, SRE
Marostegui added a comment to T278614: Create production databases for mailman3.

With wikitech-l imported, my latest offer is now 34GB.

Tue, Apr 13, 5:08 AM · DBA, SRE, Wikimedia-Mailing-lists
Marostegui added a comment to T277118: iw_url in interwiki is varbinary(127) in production but blob in code.

Come up with the ALTER TABLE statement and identify which hosts really need it in all sections apart from s1 (as far as I remember, your script only checked certain hosts, not all of them?)

Tue, Apr 13, 5:06 AM · DBA
Marostegui added a comment to T277116: fa_deleted_timestamp and fa_timestamp are binary(14) in code but varbinary(14) in production.

Come up with the ALTER TABLE statement and identify which hosts really need it in all sections apart from s1 (as far as I remember, your script only checked certain hosts, not all of them?)

Tue, Apr 13, 5:05 AM · DBA

Sun, Apr 11

Marostegui added a comment to T279848: labsdb1009:s2, replication broken.

@nskaggs @Bstorm If restoring replication turns out to be non-trivial, is it OK to wait until after the failover (in case it turns out it doesn't make sense to try and fix it after all)? In practical terms, the wikis above would be out of sync for over a week.

Sun, Apr 11, 11:44 AM · Data-Services, DBA
Marostegui triaged T279848: labsdb1009:s2, replication broken as Medium priority.
Sun, Apr 11, 8:53 AM · Data-Services, DBA
Marostegui created T279848: labsdb1009:s2, replication broken.
Sun, Apr 11, 8:53 AM · Data-Services, DBA

Thu, Apr 8

Marostegui updated subscribers of T279640: Ingest user similarity data for March 2021.

Hey @Marostegui. Tuesday works! I can kick off the job between 8 and 9 CEST and monitor it as it chugs along. If it runs over (past 14:00), would that be a problem?

Thu, Apr 8, 4:10 PM · Data-Persistence (Consultation), Platform Team Workboards (Green)
Marostegui added a comment to T278573: Create growthexperiments_mentor_mentee database table on extension1 for wikis in growthexperiments.dblist.

heh, it took me a while to realize that the INDEX was different from the PK first column, so many similar column names :-)
That looks good to me.

Thu, Apr 8, 2:44 PM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), DBA, User-Urbanecm_WMF (Engineering), Growth-Team (Current Sprint), GrowthExperiments-MentorDashboard, GrowthExperiments-Mentorship
Marostegui added a comment to T279587: Create database table to cache data about mentees.

Thanks for the answers. They look good to me. x1 should be fine, as this doesn't look like a table that will have much load (neither writes nor reads).
As for next steps, could you come up with a draft of the schema and some of the read queries it will be receiving?
If we use BLOB and you need an index on that column, we'd need to specify how long a prefix (in bytes) we'd be indexing.

Thu, Apr 8, 2:21 PM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), DBA, User-Urbanecm_WMF (Engineering), Growth-Team (Current Sprint), GrowthExperiments-MentorDashboard
Marostegui triaged T279657: Upgrade mysql on db1128 (m5 db master) as Medium priority.
Thu, Apr 8, 12:52 PM · DBA, cloud-services-team (Kanban)
Marostegui created T279657: Upgrade mysql on db1128 (m5 db master).
Thu, Apr 8, 12:52 PM · DBA, cloud-services-team (Kanban)
Marostegui added a comment to T261868: Create a database CPU saturation dashboard for codfw.

I am not sure we should keep this open anymore. We are no longer having CPU usage problems in s8 (most hosts are running 10.4, and we already have lots of hosts).
If we do want to keep the codfw CPU dashboard, we'd also need to make sure it is populated automatically, as the eqiad one is maintained manually: we have to add new hosts to and remove old hosts from that dashboard by hand.
With the latest server movements in s8, I am pretty sure the eqiad dashboard no longer reflects reality.

Thu, Apr 8, 10:47 AM · DBA
Marostegui added a project to T279640: Ingest user similarity data for March 2021: Data-Persistence (Consultation).

Any time between 7 and 14 CEST on Tuesday, for instance, should work for me.
Would that work for you too?

Thu, Apr 8, 10:18 AM · Data-Persistence (Consultation), Platform Team Workboards (Green)
Marostegui added a comment to T275633: Productionize db21[45-52] and db11[76-84].

Ran: sudo cumin 'db11[76-84].eqiad.wmnet' 'sudo lvextend -L+1100G /dev/mapper/tank-data && sudo xfs_growfs /srv'

Thu, Apr 8, 9:48 AM · Patch-For-Review, DBA
Marostegui added a comment to T274752: decommission db1076.eqiad.wmnet.

I will fully decommission this host on Tuesday. No point in decommissioning it today before a long weekend, just in case.

Thu, Apr 8, 9:28 AM · DBA, decommission-hardware
Marostegui added a comment to T275633: Productionize db21[45-52] and db11[76-84].

db1177 is now replicating in s8. I will check its tables and won't pool it before the long weekend.

Thu, Apr 8, 9:24 AM · Patch-For-Review, DBA
Marostegui updated the task description for T279625: Upgrade mysql on db1132 (phabricator db master).
Thu, Apr 8, 9:08 AM · Phabricator, DBA
Marostegui updated the task description for T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.
Thu, Apr 8, 8:47 AM · Patch-For-Review, DBA
Marostegui updated the task description for T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.
Thu, Apr 8, 8:45 AM · Patch-For-Review, DBA
Marostegui added a comment to T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.

@jcrespo I would like to do this on Wednesday 14th April. Is this a good day, or will it mess with the backups? I have no problem scheduling it on any other day.

Thu, Apr 8, 8:44 AM · Patch-For-Review, DBA
Marostegui updated the task description for T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.
Thu, Apr 8, 8:43 AM · Patch-For-Review, DBA
Marostegui updated the task description for T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.
Thu, Apr 8, 8:26 AM · Patch-For-Review, DBA
Marostegui updated subscribers of T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.

@jcrespo I would like to do this on Wednesday 14th April. Is this a good day, or will it mess with the backups? I have no problem scheduling it on any other day.

Thu, Apr 8, 8:24 AM · Patch-For-Review, DBA
Marostegui added a parent task for T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC: T279281: Upgrade 10.4.13 hosts to a higher version.
Thu, Apr 8, 8:22 AM · Patch-For-Review, DBA
Marostegui added a subtask for T279281: Upgrade 10.4.13 hosts to a higher version: T276448: Failover m1 master: db1080 -> db1159 Wed 14th April at 10 AM UTC.
Thu, Apr 8, 8:22 AM · DBA
Marostegui triaged T279625: Upgrade mysql on db1132 (phabricator db master) as Medium priority.
Thu, Apr 8, 8:14 AM · Phabricator, DBA
Marostegui created T279625: Upgrade mysql on db1132 (phabricator db master).
Thu, Apr 8, 8:14 AM · Phabricator, DBA
Marostegui updated the task description for T279281: Upgrade 10.4.13 hosts to a higher version.
Thu, Apr 8, 7:04 AM · DBA
Marostegui created P15230 (An Untitled Masterwork).
Thu, Apr 8, 6:35 AM
Marostegui triaged T279587: Create database table to cache data about mentees as Medium priority.
Thu, Apr 8, 5:52 AM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), DBA, User-Urbanecm_WMF (Engineering), Growth-Team (Current Sprint), GrowthExperiments-MentorDashboard
Marostegui added a comment to T279587: Create database table to cache data about mentees.

I was going to ask what @Tgr actually explained, so thanks for that :-)
Just some more questions:

  • What would happen if this table isn't writable for any reason? (e.g. host down, host under maintenance...)
  • What would happen if the script doesn't run for a day or a week? (e.g. broken for whatever reason, mwmaint host down?)
  • What would happen if the script gets killed in the middle of a run?
  • Is this a table per wiki or a table containing all the wikis and living in x1?
Thu, Apr 8, 5:52 AM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), DBA, User-Urbanecm_WMF (Engineering), Growth-Team (Current Sprint), GrowthExperiments-MentorDashboard
Marostegui added a comment to T278655: Appservers latency spike / parser cache growth 2021-03-28.

I have done some testing with pc000 on a testing host.
Deleted everything older than 20 days, simulating keeping only 20 days instead of 30.
Doing the delete + the optimize would give us 2.4GB back on that table (around 300k rows deleted). If we extrapolate, that's around 500GB back on disk.
That means the servers would go from 80% to around 70%.

Thu, Apr 8, 5:44 AM · Sustainability (Incident Followup), Performance-Team (Radar), DBA, Platform Engineering, SRE
Marostegui added a comment to T278655: Appservers latency spike / parser cache growth 2021-03-28.

I am not sure I am reading the disk space graph correctly, as I don't see an increase there. The line on the graph does rise, but looking at the Y axis, the value seems to stay the same?
Looking at this date, I cannot find a correlation with the servers' disk growth: https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=12&orgId=1&from=1616889601000&to=1616975999000&var-server=pc1007&var-datasource=thanos&var-cluster=mysql

Thu, Apr 8, 5:12 AM · Sustainability (Incident Followup), Performance-Team (Radar), DBA, Platform Engineering, SRE
Marostegui added a comment to T276156: Drop default of rc_timestamp.

s3 eqiad

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • db1175
  • db1171
  • db1166
  • db1157
  • db1154
  • db1124
  • db1123
  • db1112
  • clouddb1021
  • clouddb1017
  • clouddb1013
Thu, Apr 8, 4:59 AM · DBA, Blocked-on-schema-change
Marostegui added a comment to T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY.

s3 eqiad

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • db1175
  • db1171
  • db1166
  • db1157
  • db1154
  • db1124
  • db1123
  • db1112
  • clouddb1021
  • clouddb1017
  • clouddb1013
Thu, Apr 8, 4:59 AM · DBA, Blocked-on-schema-change

Wed, Apr 7

Marostegui updated the task description for T276156: Drop default of rc_timestamp.
Wed, Apr 7, 2:39 PM · DBA, Blocked-on-schema-change
Marostegui updated the task description for T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY.
Wed, Apr 7, 2:39 PM · DBA, Blocked-on-schema-change
Marostegui added a comment to T275633: Productionize db21[45-52] and db11[76-84].

db1180 is now replicating. I am checking all its tables, as this host won't be pooled in before the long weekend anyway.

Wed, Apr 7, 1:47 PM · Patch-For-Review, DBA
Marostegui added a comment to T275633: Productionize db21[45-52] and db11[76-84].

Started transfer from db1173 to db1180

Wed, Apr 7, 12:29 PM · Patch-For-Review, DBA
Marostegui created P15223 (An Untitled Masterwork).
Wed, Apr 7, 12:18 PM
Marostegui moved T278214: Switchover s1 from db1083 to db1163 from Refine to Ready on the DBA board.
Wed, Apr 7, 11:54 AM · DBA
Marostegui updated the task description for T276150: Schema change to make rc_id unsigned and rc_timestamp BINARY.
Wed, Apr 7, 11:39 AM · DBA, Blocked-on-schema-change
Marostegui updated the task description for T276156: Drop default of rc_timestamp.
Wed, Apr 7, 11:38 AM · DBA, Blocked-on-schema-change
Marostegui closed T249085: Convert Tendril TokuDB tables to InnoDB, a subtask of T224589: Migrate dbmonitor hosts to Buster, as Resolved.
Wed, Apr 7, 11:25 AM · Patch-For-Review, SRE
Marostegui closed T249085: Convert Tendril TokuDB tables to InnoDB as Resolved.

I am going to close this; we are not going to convert any more tables than the ones we've already done. The above tables are still running TokuDB, but unless we have issues, we shouldn't spend time converting them to InnoDB (some of them are huge).
Let's kill tendril instead.

Wed, Apr 7, 11:25 AM · DBA
Marostegui added a comment to T224589: Migrate dbmonitor hosts to Buster.

I have stopped apache on dbmonitor1001 (and ran chmod -x on the apache2 binary so Puppet doesn't bring it back up). Let's leave it until next week, and if nothing breaks, let's decommission it.

Wed, Apr 7, 10:54 AM · Patch-For-Review, SRE
Marostegui added a comment to T279406: db2106 and db2147 crashed.

Thank you <3

Wed, Apr 7, 10:02 AM · DBA
Marostegui added a comment to T263443: Evaluate the impact of changing innodb_change_buffering to inserts .

Changed this on a few roles:

  • Misc
  • Phabricator
  • dbstore_multiinstance
Wed, Apr 7, 8:51 AM · DBA
Marostegui added a comment to T275633: Productionize db21[45-52] and db11[76-84].

db1184 is ready, but I am not going to pool it before next week. Let's leave it replicating for a few days and over the long weekend, just in case.

Wed, Apr 7, 8:49 AM · Patch-For-Review, DBA
Marostegui updated the task description for T279505: Read only time window needed for s1 (enwiki).
Wed, Apr 7, 7:24 AM · CommRel-Specialists-Support (Apr-Jun-2021)
Marostegui added a comment to T278214: Switchover s1 from db1083 to db1163.

kernel upgraded on db1163

Wed, Apr 7, 7:18 AM · DBA
Marostegui updated the task description for T278214: Switchover s1 from db1083 to db1163.
Wed, Apr 7, 7:08 AM · DBA
Marostegui updated the task description for T278214: Switchover s1 from db1083 to db1163.
Wed, Apr 7, 7:08 AM · DBA
Marostegui created T279505: Read only time window needed for s1 (enwiki).
Wed, Apr 7, 7:07 AM · CommRel-Specialists-Support (Apr-Jun-2021)
Marostegui updated the task description for T278214: Switchover s1 from db1083 to db1163.
Wed, Apr 7, 7:00 AM · DBA
Marostegui added a comment to T278214: Switchover s1 from db1083 to db1163.

I think T276150 will be done by next week, so I am going to schedule this switchover for 28th April at 05:00 AM UTC

Wed, Apr 7, 7:00 AM · DBA
Marostegui moved T278655: Appservers latency spike / parser cache growth 2021-03-28 from Triage to Refine on the DBA board.

I was on holiday when all this happened. Is there anything else to follow up on?

Wed, Apr 7, 5:37 AM · Sustainability (Incident Followup), Performance-Team (Radar), DBA, Platform Engineering, SRE
Marostegui updated the task description for T279281: Upgrade 10.4.13 hosts to a higher version.
Wed, Apr 7, 5:11 AM · DBA
Marostegui added a comment to T278614: Create production databases for mailman3.

That's a very doable number, thanks @Ladsgroup!

Wed, Apr 7, 5:04 AM · DBA, SRE, Wikimedia-Mailing-lists
Marostegui added a comment to T269211: Convert labsdb1012 from multi-source to multi-instance.

@Marostegui could you expand on why the check isn't realistic? From what I can tell all it's monitoring is the total used memory, which shouldn't be affected by the number of mysqld processes.

Wed, Apr 7, 5:02 AM · Analytics-Kanban, cloud-services-team (Kanban), Data-Services, DBA, Patch-For-Review, Analytics-Clusters

Tue, Apr 6

Marostegui added a comment to T267404: Add reply links to the parser cache.

Do we even know for sure that this is caused by DiscussionTools?

Tue, Apr 6, 4:49 PM · Editing-team (Tracking), Data-Persistence (Consultation), Verified, MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), Technical-Debt, Performance Issue, DiscussionTools
Marostegui added a comment to T267404: Add reply links to the parser cache.

Looking at the 90-day view, parsercache usage was at 75% in January, so this is a very gradual increase of about 5% due to DiscussionTools.

Tue, Apr 6, 4:18 PM · Editing-team (Tracking), Data-Persistence (Consultation), Verified, MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), Technical-Debt, Performance Issue, DiscussionTools
Marostegui added a comment to T268715: Occasional "Cannot access the database: Unknown error" in Wikimedia production.

Without a more concrete error it is hard to troubleshoot from the DB side :-(
If it is randomly distributed and/or not specific to either app servers or DB servers, could it be small network glitches? Again, without a more concrete error it is unfortunately hard to debug further.

Tue, Apr 6, 2:19 PM · Platform Team Workboards (Clinic Duty Team), Release-Engineering-Team (Logspam), Wikimedia-General-or-Unknown, Wikimedia-database-error, Wikimedia-production-error
Marostegui added a comment to T279095: Sqoop on multi-instance clouddb1021 is very slow for some tables.

We can try to give wikidata (s8) more memory and take it from some other sections (e.g. s5 and s6).
That table alone is 230GB, though, which will never fit in the buffer pool.

Tue, Apr 6, 1:56 PM · Cloud-Services, Data-Persistence (Consultation), Analytics
Marostegui added a comment to T269211: Convert labsdb1012 from multi-source to multi-instance.

Almost! There are a couple of things left:

  • clouddb1021 is still running with Icinga notifications disabled, plus there is a WARNING related to MariaDB memory usage that needs to be tweaked for our use case (since we basically use all the RAM available). @razzi can you check when you have a moment?
Tue, Apr 6, 1:54 PM · Analytics-Kanban, cloud-services-team (Kanban), Data-Services, DBA, Patch-For-Review, Analytics-Clusters
Marostegui added a comment to T279411: Determine why service responses are slow and what we can do about it.

In production the results are the same:

Tue, Apr 6, 9:23 AM · Growth-Team (Current Sprint), serviceops, Data-Persistence (Consultation), Add-Link
Marostegui added a comment to T275633: Productionize db21[45-52] and db11[76-84].

checking tables on db1184

Tue, Apr 6, 9:17 AM · Patch-For-Review, DBA
Marostegui added a comment to T275633: Productionize db21[45-52] and db11[76-84].

db1184 is now replicating.

Tue, Apr 6, 8:53 AM · Patch-For-Review, DBA