Ladsgroup (Amir Sarabadani)
Shah of Bugs, Emir of database architecture, World-renowned rubber duck
Administrator

Projects (35)

User Details

User Since
Oct 6 2014, 9:53 PM (498 w, 4 d)
Roles
Administrator
Availability
Available
IRC Nick
Amir1
LDAP User
Ladsgroup
MediaWiki User
Ladsgroup [ Global Accounts ]

Staff Database Architect on the SRE Data Persistence team at WMF. Formerly a Wikidata software engineer at WMDE.

I'm also an open source enthusiast, MediaWiki volunteer developer, and long-term Wikipedian.

All edits on tickets about databases are in my work capacity and anything else is in my volunteer capacity unless mentioned otherwise.

Babel: fa-N, en-4, de-2, tr-1, hu-1

Recent Activity

Yesterday

Ladsgroup closed T363573: Maintenance_bot not removing patch-for-review when all conditions are seemingly met as Resolved.
Fri, Apr 26, 5:51 PM · Phabricator maintenance bot
Ladsgroup closed T363232: Phabricator maintenance bot removed Patch-for-Review from task with open gitlab MR as Resolved.
Fri, Apr 26, 5:49 PM · Phabricator maintenance bot
Ladsgroup added a comment to T363232: Phabricator maintenance bot removed Patch-for-Review from task with open gitlab MR.

We actually had support for gitlab but it had a bug. That's fixed now.

Fri, Apr 26, 5:48 PM · Phabricator maintenance bot
Ladsgroup added a comment to T363573: Maintenance_bot not removing patch-for-review when all conditions are seemingly met.

That's fixed.

Fri, Apr 26, 5:48 PM · Phabricator maintenance bot
Ladsgroup added a comment to T363573: Maintenance_bot not removing patch-for-review when all conditions are seemingly met.

We probably should actually fix the gitlab part before running it again

Fri, Apr 26, 4:49 PM · Phabricator maintenance bot
Ladsgroup added a comment to T363573: Maintenance_bot not removing patch-for-review when all conditions are seemingly met.

Lol, my first conflict in gitlab.

Fri, Apr 26, 4:48 PM · Phabricator maintenance bot
Ladsgroup added a comment to T363573: Maintenance_bot not removing patch-for-review when all conditions are seemingly met.

Fixed in https://gitlab.wikimedia.org/ladsgroup/Phabricator-maintenance-bot/-/commit/982cfc2f33f8190728363c5ecdc55fbbab1ea40e

Fri, Apr 26, 4:47 PM · Phabricator maintenance bot
Ladsgroup updated the task description for T363589: Migrate foreign-resources files to CDX SBOM format.
Fri, Apr 26, 4:27 PM · MediaWiki-General, Security-Team, Security
Ladsgroup created T363589: Migrate foreign-resources files to CDX SBOM format.
Fri, Apr 26, 4:19 PM · MediaWiki-General, Security-Team, Security
Ladsgroup added a comment to T363581: Build a machine-readable catalogue of mariadb tables in production.

How will this be kept in sync with tables.json in MW core and extensions?

Fri, Apr 26, 3:24 PM · DBA
Ladsgroup added a comment to T180648: Expand the access to 2FA on fawiki.

Actually we made a lot of progress in this regard. Now people can disable 2FA via a special page rather than running a maint script in production. As a result, the user right to disable 2FA can now be easily[1] expanded, potentially to stewards and checkusers, to reduce the work for T&S and slowly allow us to roll it out to larger groups (maybe extended confirmed) in some wikis. Persian Wikipedia is actually a good candidate wiki given the sensitivity of editing in Iran or related to Iran.

Fri, Apr 26, 3:10 PM · Trust-and-Safety, Wikimedia-Site-requests
Ladsgroup added a comment to T363581: Build a machine-readable catalogue of mariadb tables in production.

I like the idea! A few questions

  • Do you intend to have some monitoring checks or a dashboard to compare the catalog to actual live prod state?
Fri, Apr 26, 3:03 PM · DBA
Ladsgroup placed T363487: Remove the cuc_ip, cule_ip, and cupe_ip columns from the cu_changes, cu_log_event, and cu_private_event tables respectively as duplicated to the IP hex columns up for grabs.

I reviewed this, no objection from our side. I'd guess it was added for readability, which is not a good reason. We'd appreciate making these tables slightly smaller.

Fri, Apr 26, 2:54 PM · Data-Persistence (work done), Trust and Safety Product Team, CheckUser
Ladsgroup renamed T363581: Build a machine-readable catalogue of mariadb tables in production from Build a machine-readable catalogue of tables in production to Build a machine-readable catalogue of mariadb tables in production.
Fri, Apr 26, 2:18 PM · DBA
Ladsgroup triaged T363581: Build a machine-readable catalogue of mariadb tables in production as Medium priority.
Fri, Apr 26, 2:17 PM · DBA
Ladsgroup created T363581: Build a machine-readable catalogue of mariadb tables in production.
Fri, Apr 26, 2:16 PM · DBA
Ladsgroup awarded T363572: Reimage physical lists hosts to have public IPs a Love token.
Fri, Apr 26, 1:05 PM · collaboration-services, Wikimedia-Mailing-lists, SRE

Thu, Apr 25

Ladsgroup updated the task description for T352010: Gradually drop old pagelinks columns.
Thu, Apr 25, 5:11 PM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data
Ladsgroup added a comment to T355730: Provide developer access to the cassandra-dev cluster.

I was asked to provide feedback from mariadb perspective (and how consistent we want to be across different technologies but in the same team).

We don't usually hand over dev accounts to the staging environment. Much development/staging work gets done on mariadb instances outside of production, most notably beta cluster (which has its own issues, but I assume setting up a dedicated project for cassandra in cloud VPS and giving access to that wouldn't be too hard). Given that it's outside of prod, the impact of mistakes or compromise is quite limited. It also discourages "testing in production" situations. I know the staging cluster has different data but it's still in prod infra, with all the complexities/downsides that brings with it.

Ok, so to be clear: This ticket was the result of needs arising from the Generated Data (née AQS) cluster, and not the RESTBase cluster. @Jgiannelos —who may yet weigh in with their requirements— is awaiting the outcome of this issue in relation to the latter. The cassandra-dev cluster is hosting use-cases for both, and I hadn't planned on differentiating when it came to providing developer access.

Thu, Apr 25, 4:45 PM · Patch-For-Review, Cassandra

Wed, Apr 24

Ladsgroup awarded T363407: Proper service names in trace data a Love token.
Wed, Apr 24, 8:06 PM · Observability-Tracing
Ladsgroup added a comment to T355730: Provide developer access to the cassandra-dev cluster.

I was asked to provide feedback from mariadb perspective (and how consistent we want to be across different technologies but in the same team).

Wed, Apr 24, 12:02 PM · Patch-For-Review, Cassandra
Ladsgroup added a comment to T361644: Create Wikipedia Igala.

Removed. The bot won't update a closed ticket.

Wed, Apr 24, 9:59 AM · MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), Wiki-Setup (Create)
Ladsgroup added a comment to T363276: Prepare and check storage layer for sysop_plwiki.

This will be messy. It's a private wiki.

Wed, Apr 24, 9:59 AM · Data-Services, DBA
Ladsgroup updated the task description for T361644: Create Wikipedia Igala.
Wed, Apr 24, 9:57 AM · MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), Wiki-Setup (Create)
Ladsgroup moved T363262: Prepare and check storage layer for iglwiki from Ready to Done on the DBA board.

Ready for data engineering people to take over

Wed, Apr 24, 9:57 AM · cloud-services-team, Data-Services, DBA
Ladsgroup moved T363269: Prepare and check storage layer for mywikisource from Ready to Done on the DBA board.

Ready for data engineering people to take over

Wed, Apr 24, 9:57 AM · cloud-services-team, Data-Services, DBA
Ladsgroup moved T363255: Prepare and check storage layer for kaawiktionary from Ready to Done on the DBA board.

Ready for data engineering people to take over

Wed, Apr 24, 9:56 AM · cloud-services-team, Data-Services, DBA
Ladsgroup moved T360302: Prepare and check storage layer for kuswiki from Ready to Done on the DBA board.

Ready for data engineering people to take over

Wed, Apr 24, 9:56 AM · cloud-services-team, Data-Services, DBA
Ladsgroup moved T360309: Prepare and check storage layer for bewwiki from Ready to Done on the DBA board.

Ready for data engineering people to take over

Wed, Apr 24, 9:56 AM · cloud-services-team, Data-Services, DBA
Ladsgroup moved T363249: Prepare and check storage layer for mswikisource from Ready to Done on the DBA board.

Ready for data engineering people to take over

Wed, Apr 24, 9:56 AM · cloud-services-team, Data-Services, DBA
Ladsgroup moved T363242: Prepare and check storage layer for kawikisource from Ready to Done on the DBA board.

Ready for data engineering people to take over

Wed, Apr 24, 9:55 AM · cloud-services-team, Data-Services, DBA
Ladsgroup triaged T360309: Prepare and check storage layer for bewwiki as Medium priority.
Wed, Apr 24, 9:48 AM · cloud-services-team, Data-Services, DBA
Ladsgroup triaged T360302: Prepare and check storage layer for kuswiki as Medium priority.
Wed, Apr 24, 9:48 AM · cloud-services-team, Data-Services, DBA
Ladsgroup moved T360302: Prepare and check storage layer for kuswiki from Blocked to Ready on the DBA board.
Wed, Apr 24, 9:48 AM · cloud-services-team, Data-Services, DBA
Ladsgroup moved T360309: Prepare and check storage layer for bewwiki from Blocked to Ready on the DBA board.
Wed, Apr 24, 9:48 AM · cloud-services-team, Data-Services, DBA
Ladsgroup updated the task description for T342697: Audit of unused indexes, 2023.
Wed, Apr 24, 9:46 AM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), MediaWiki-Platform-Team (Radar), DBA
Ladsgroup updated the task description for T342697: Audit of unused indexes, 2023.
Wed, Apr 24, 9:45 AM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), MediaWiki-Platform-Team (Radar), DBA
Ladsgroup updated the task description for T352010: Gradually drop old pagelinks columns.
Wed, Apr 24, 9:42 AM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data
Ladsgroup claimed T359529: Future of flaggedtemplates feature.
Wed, Apr 24, 9:35 AM · DBA, Patch-For-Review, MediaWiki-extensions-FlaggedRevs
Ladsgroup updated the task description for T361041: Create wikipedia-pl-sysop.wikimedia.org (was: sysop-pl.wikipedia.org).
Wed, Apr 24, 9:34 AM · Patch-For-Review, Wiki-Setup (Create)

Tue, Apr 23

Ladsgroup moved T359529: Future of flaggedtemplates feature from Triage to In progress on the DBA board.
Tue, Apr 23, 11:06 PM · DBA, Patch-For-Review, MediaWiki-extensions-FlaggedRevs
Ladsgroup added a project to T359529: Future of flaggedtemplates feature: DBA.
Tue, Apr 23, 11:06 PM · DBA, Patch-For-Review, MediaWiki-extensions-FlaggedRevs
Ladsgroup closed T358024: Captcha overflow as Resolved.
Tue, Apr 23, 11:06 PM · MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), ConfirmEdit (CAPTCHA extension)
Ladsgroup awarded T254201: Compile, organize and schedule various Wikimedia security-related user audits a Love token.
Tue, Apr 23, 9:54 PM · Security-Team, Wikimedia-GitHub, user-sbassett
Ladsgroup added a comment to T361041: Create wikipedia-pl-sysop.wikimedia.org (was: sysop-pl.wikipedia.org).

Change #1022447 merged by Ladsgroup:

[operations/puppet@production] Add sysop_plwiki to private wikis

https://gerrit.wikimedia.org/r/1022447

Tue, Apr 23, 9:52 PM · Patch-For-Review, Wiki-Setup (Create)
Ladsgroup added a comment to T363232: Phabricator maintenance bot removed Patch-for-Review from task with open gitlab MR.

When the bot was created, gitlab in wikimedia didn't exist. We need to add support for it. It shouldn't be hard.

Tue, Apr 23, 9:40 PM · Phabricator maintenance bot
Ladsgroup removed a project from T359425: API:alllinks and API:alltransclusions query fails with RequestTimeout for several wikis: mariadb-optimizer-bug.

It's not a bug of the optimizer.

Tue, Apr 23, 3:39 PM · Regression, MW-1.42-notes (1.42.0-wmf.24; 2024-03-26), MediaWiki-Action-API, Wikimedia-production-error, Wikimedia-Slow-DB-Query, WME-API-Usability
Ladsgroup closed T363165: Changing the logo of Persian Wikipedia on the occasion of one million articles as Resolved.
Tue, Apr 23, 3:30 PM · Wikimedia-Site-requests, Logos
Ladsgroup closed T363161: s2 replication to cloud broken as Resolved.
Tue, Apr 23, 2:00 PM · DBA
Ladsgroup claimed T363165: Changing the logo of Persian Wikipedia on the occasion of one million articles.

I do it ASAP.

Tue, Apr 23, 1:51 PM · Wikimedia-Site-requests, Logos
Ladsgroup triaged T363161: s2 replication to cloud broken as High priority.

Running optimize table.

Tue, Apr 23, 12:19 PM · DBA
Ladsgroup created T363161: s2 replication to cloud broken.
Tue, Apr 23, 12:16 PM · DBA
Ladsgroup added a comment to T346293: ipblocks schema redesign for multiblocks.

I know there is a backward compatibility layer added to wikireplicas but I highly recommend announcing the change and removing that layer a couple of months later. The main use of views is to hide private information, and combining b/c with that adds a lot of complexity and can cause, and has caused, private data to be leaked before (e.g. an extra schema change has been made but didn't take the b/c view into account). Especially given that we hide private information via views in block data already.

Tue, Apr 23, 10:07 AM · Patch-For-Review, MW-1.42-notes (1.42.0-wmf.14; 2024-01-16), Multiblocks, Community-Tech (CommTech-Kanban), MediaWiki-Blocks
Ladsgroup added a comment to T363119: db1246 crashed.

oh thanks for looking into it.

Tue, Apr 23, 10:03 AM · SRE, ops-eqiad, DBA
Ladsgroup added a comment to T363119: db1246 crashed.

It recovered on its own, might be a network issue. I will take a look.

Tue, Apr 23, 9:47 AM · SRE, ops-eqiad, DBA
Ladsgroup added a comment to T363119: db1246 crashed.

nvm, it didn't recover

Tue, Apr 23, 9:46 AM · SRE, ops-eqiad, DBA
Ladsgroup added a comment to T334623: How do we log unsuccessful first edits for temporary users?.

I was summoned: It's fine for now but some should be put in place after roll out.

Tue, Apr 23, 9:36 AM · Trust and Safety Product Sprint (Sprint Pennywhistle (23rd April - 3rd May)), Patch-For-Review, Data-Persistence, AbuseFilter, Temporary accounts

Mon, Apr 22

Novem_Linguae awarded T363077: High replication lag for enwiki (db1154 s1 replication crashed) a Party Time token.
Mon, Apr 22, 8:20 PM · DBA, Data-Services
Ladsgroup closed T363077: High replication lag for enwiki (db1154 s1 replication crashed) as Resolved.

The rebuild fixed the issue. The replag is zero.

Mon, Apr 22, 5:36 PM · DBA, Data-Services
Ladsgroup created T363102: db1234 has hardware issues.
Mon, Apr 22, 2:32 PM · SRE, ops-eqiad, DBA
Ladsgroup added a comment to T363077: High replication lag for enwiki (db1154 s1 replication crashed).

Anyway, the lag is decreasing. It'll take two hours or so to fully recover (unless there is another replication crash). I will close this once the lag goes below one second.

Mon, Apr 22, 1:58 PM · DBA, Data-Services
Ladsgroup added a comment to T363077: High replication lag for enwiki (db1154 s1 replication crashed).

Sorry to butt in, but, my understanding from the above is that the corrupted table coincidentally happens to be the same table which was just recently "normalized", correct?

Mon, Apr 22, 1:56 PM · DBA, Data-Services
Ladsgroup added a comment to T363077: High replication lag for enwiki (db1154 s1 replication crashed).

I started an optimize table on pagelinks. It's unlikely to fix the issue but worth a try.

Mon, Apr 22, 1:51 PM · DBA, Data-Services
Ladsgroup added a comment to T363077: High replication lag for enwiki (db1154 s1 replication crashed).

Thanks I will try that!

Mon, Apr 22, 12:48 PM · DBA, Data-Services
Ladsgroup added a comment to T363077: High replication lag for enwiki (db1154 s1 replication crashed).

FWIW, this is a data corruption issue. Last time it happened on sanitarium hosts the whole system went down for a week. I'm trying to figure out if I can avoid re-cloning the whole host (which would take a lot of time, potentially weeks) and only reclone the corrupted table, but there is no easy way to do this AFAICS. I'll keep you posted.

Mon, Apr 22, 9:59 AM · DBA, Data-Services

Sun, Apr 21

Ladsgroup added a comment to T363077: High replication lag for enwiki (db1154 s1 replication crashed).

I started an optimize table on pagelinks. It's unlikely to fix the issue but worth a try.

Sun, Apr 21, 10:56 PM · DBA, Data-Services
Ladsgroup added a comment to T352010: Gradually drop old pagelinks columns.

Oh, no, this task isn't done yet? System lag again? I hope this one is shorter than the last one.

Sun, Apr 21, 10:46 PM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data

Fri, Apr 19

Ladsgroup updated the task description for T352010: Gradually drop old pagelinks columns.
Fri, Apr 19, 2:37 PM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data
Ladsgroup added a comment to T120242: Eventually-Consistent MediaWiki state change events | MediaWiki events as source of truth.

There is a lil discussion about this topic in T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable". Moving that discussion to here.

@Ladsgroup wrote:

those data could be regenerated from canonical data on the wiki

@Ottomata wrote:

This is very expensive, and requires complicated logic and maintenance to do.

@Ladsgroup wrote:

if you have a way to find mismatch and redo for a small portion of changes (e.g. even 1% of changes), it should be fine.

Agree, but how?

How should one find out they are missing records, and which records to fetch?

Fri, Apr 19, 2:33 PM · Data-Engineering, Analytics, DBA, WMF-Architecture-Team, Platform Team Legacy (Later), Event-Platform, Services (later)
Ladsgroup added a comment to T352010: Gradually drop old pagelinks columns.

To people who are subscribed: These automatic depool and repool comments are going to continue for weeks; feel free to unsubscribe.

Fri, Apr 19, 11:33 AM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data
Ladsgroup updated subscribers of T341775: Discourage, deprecate and stop using Xml methods for building HTML markup.
Fri, Apr 19, 11:03 AM · MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), MW-1.42-notes (1.42.0-wmf.19; 2024-02-20), Technical-Debt, Epic, HTML5, MediaWiki-General
Ladsgroup awarded T341775: Discourage, deprecate and stop using Xml methods for building HTML markup a Love token.
Fri, Apr 19, 11:02 AM · MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), MW-1.42-notes (1.42.0-wmf.19; 2024-02-20), Technical-Debt, Epic, HTML5, MediaWiki-General
Ladsgroup added a comment to T362899: Wikimedia Cloud Services Wiki Replicas replication lag in wikidata.

FWIW, it has already recovered and indeed it was caused by the schema change.

Fri, Apr 19, 10:50 AM · Data-Services
Ladsgroup added a comment to T344000: lists.wikimedia.org pages should have a "who to contact" link.

The idea is to send an email to -owner@ mail and give it a grace period, if they all are inactive, then ask "central admins" to do something about it. I did the exact same thing with the gendergap mailing list.

Fri, Apr 19, 3:47 AM · SRE, Wikimedia-Mailing-lists
Ladsgroup closed T329647: Puppet failing on mailman03.mailman.eqiad1.wikimedia.cloud as Declined.
Fri, Apr 19, 3:45 AM · SRE, Wikimedia-Mailing-lists

Thu, Apr 18

Ladsgroup added a comment to T359425: API:alllinks and API:alltransclusions query fails with RequestTimeout for several wikis.

yeah, I'm honestly inclined to move that to behind misermode (=disabling it in production). Going through every link in a wiki doesn't make much sense, doesn't provide any benefit.

Thu, Apr 18, 12:54 PM · Regression, MW-1.42-notes (1.42.0-wmf.24; 2024-03-26), MediaWiki-Action-API, Wikimedia-production-error, Wikimedia-Slow-DB-Query, WME-API-Usability
Ladsgroup added a comment to T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable".

search index not getting updated in 0.001% of edits

Search is probably fine.

Thu, Apr 18, 12:23 PM · MediaWiki-Engineering, Data-Engineering, Unstewarded-production-error, User-brennen, serviceops, WMF-JobQueue, Wikimedia-production-error
Ladsgroup added a comment to T359757: Create Wikipedia Kusaal.

Nothing on your side, the wiki creation script was broken for a while. I'm planning to create wikis soon.

Thu, Apr 18, 10:40 AM · MW-1.42-notes (1.42.0-wmf.23; 2024-03-19), Wiki-Setup (Create)
Ladsgroup added a comment to T352010: Gradually drop old pagelinks columns.

Looks like everything is back to normal. Thanks for all of the work you do for the project.

Thu, Apr 18, 9:56 AM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data

Wed, Apr 17

Ladsgroup updated the task description for T352010: Gradually drop old pagelinks columns.
Wed, Apr 17, 10:41 PM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data
Ladsgroup updated Ladsgroup.
Wed, Apr 17, 9:50 PM
Ladsgroup triaged T361824: PHP Notice: Undefined offset in rdbms/loadbalancer/LoadBalancer.php as Medium priority.
Wed, Apr 17, 9:43 PM · MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), DBA, MediaWiki-libs-Rdbms, Wikimedia-production-error
Ladsgroup claimed T361824: PHP Notice: Undefined offset in rdbms/loadbalancer/LoadBalancer.php.
Wed, Apr 17, 9:42 PM · MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), DBA, MediaWiki-libs-Rdbms, Wikimedia-production-error
Ladsgroup closed T361943: Decide on a Software Bill of Materials (SBOM) format for MediaWiki as Resolved.

No comment for over a week after some outreach work as well. CycloneDX it is.

Wed, Apr 17, 9:27 PM · SecTeam-Processed, Security-Team, Security
Ladsgroup closed T361943: Decide on a Software Bill of Materials (SBOM) format for MediaWiki, a subtask of T359634: Adopt Software Bill of Materials (SBOM) for MediaWiki, as Resolved.
Wed, Apr 17, 9:26 PM · SecTeam-Processed, Security-Team, Security
Ladsgroup placed T304653: Release and package auto_schema 0.1.0 up for grabs.

Avoiding cookie licking.

Wed, Apr 17, 9:20 PM · Auto schema, User-Ladsgroup, DBA
Ladsgroup closed T299445: Add tests to auto schema as Resolved.
Wed, Apr 17, 9:20 PM · Auto schema, DBA
Ladsgroup closed T299445: Add tests to auto schema, a subtask of T304653: Release and package auto_schema 0.1.0, as Resolved.
Wed, Apr 17, 9:19 PM · Auto schema, User-Ladsgroup, DBA
Ladsgroup added a comment to T358308: AssembleUploadChunksJob & PublishStashedFile jobs seem to be timing out at about 3 minutes, but should be ~20 minutes.

How can I get into a pod in job runners namespace(?) via shell.php? I want to try some stuff

Wed, Apr 17, 9:05 PM · Patch-For-Review, WMF-JobQueue, MediaWiki-File-management
Ladsgroup added a comment to T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable".

For replicating state changes (T120242) [...]

Why though? Why is 99.9999% (or 99.999999% or 99.99%) not enough?

There is a "Why do we need this?" section in T120242's description. Let's keep this discussion there?

That task wants MW state propagation using events to be equally consistent with MariaDB. That way folks can trust the state they are getting to build read-model products outside of MW, where it is difficult to do so (search indexes, AI backed patrolling tools, WDQS, etc. etc.)

Wed, Apr 17, 8:06 PM · MediaWiki-Engineering, Data-Engineering, Unstewarded-production-error, User-brennen, serviceops, WMF-JobQueue, Wikimedia-production-error
Ladsgroup added a comment to T352010: Gradually drop old pagelinks columns.

Just wanted to note we're past the previous estimate and the lag is past 25 hours and seemingly getting longer. This is affecting a Quarry query I've been needing to run for literal days and database reports I have aren't keeping current. Any new estimates?

Wed, Apr 17, 7:51 PM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data
Ladsgroup added a comment to T360029: Integrate dbctl IP changes as part of VLAN changes.

I don't see why you couldn't do a simple subprocess.run to do a commit, probably checking first that dbctl diff returns empty.

Wed, Apr 17, 5:37 PM · conftool, Data-Persistence, SRE, Infrastructure-Foundations
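
The suggestion in the T360029 comment above (confirm that the dbctl diff is empty, then commit through subprocess.run) might look roughly like the following Python sketch. The command lines (`dbctl config diff`, `dbctl config commit -m`), the function name commit_if_clean, the apply_change callable, and the injectable run parameter are assumptions made for illustration and are not taken from the task; check them against the installed dbctl before relying on any of this.

```python
import subprocess

# Assumed dbctl invocations; adjust to match the real CLI.
DIFF_CMD = ["dbctl", "config", "diff"]
COMMIT_CMD = ["dbctl", "config", "commit", "-m"]


def commit_if_clean(message, apply_change, run=subprocess.run):
    """Apply a change and commit it, but only if nothing else is pending.

    `apply_change` is a hypothetical callable that performs the edit
    (for example, updating an instance IP). `run` defaults to
    subprocess.run and can be swapped for a fake in tests.
    """
    pending = run(DIFF_CMD, capture_output=True, text=True)
    # Treat any diff output or a non-zero exit status as "not clean".
    if pending.returncode != 0 or pending.stdout.strip():
        raise RuntimeError("dbctl has uncommitted changes; refusing to commit")
    apply_change()
    # check=True raises CalledProcessError if the commit itself fails.
    run(COMMIT_CMD + [message], check=True)
```

Refusing to proceed when a diff is already pending keeps an automated run from committing someone else's unrelated, half-finished change along with its own.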
Ladsgroup committed rEPSae6598bc466d: Remove call to LoadBalancer::reuseConnection() (authored by gerritbot).
Wed, Apr 17, 3:43 PM
Ladsgroup added a comment to T362786: Enable dbctl for parsercache.

We do need to list the spares somehow somewhere because otherwise we will forget where they are (or even their hostnames).

Wed, Apr 17, 3:43 PM · Infrastructure-Foundations, Data-Persistence, conftool
Ladsgroup updated subscribers of T362786: Enable dbctl for parsercache.

First we need to add them to dbctl, I'm not sure how that can be done. We currently have this in mw config:

                'parsercache-dbs' => [
                        'pc1' => '10.64.0.57',   # pc1011, A1 8.8TB 512GB # pc1
                        'pc2' => '10.64.16.65',  # pc1012, B1 8.8TB 512GB # pc2
                        'pc3' => '10.64.32.163', # pc1013, C5 8.8TB 512GB # pc3
                        'pc4' => '10.64.32.53',  # pc1016, C6 8.6TB 512GB # pc4
                        # spare: '10.64.48.89',  # pc1014, D6 8.8TB 512GB
                        # spare: '10.64.0.17',   # pc1015, A6 8.8TB 512GB
                        # Use spare(s) to replace any of the above if needed
                ],
and
                'parsercache-dbs' => [ 
                        'pc1' => '10.192.0.72',   # pc2011, A5 8.8TB 512GB # pc1 
                        'pc2' => '10.192.16.55',  # pc2012, B5 8.8TB 512GB # pc2
                        'pc3' => '10.192.32.57',  # pc2013, C1 8.8TB 512GB # pc3
                        'pc4' => '10.192.48.92',  # pc2016, D3 8.8TB 512GB # pc4
                        # spare: '10.192.48.52',  # pc2014, D1 8.8TB 512GB
                        # spare: '10.192.32.132', # pc2015, C5 8.8TB 512GB
                        # Use spare(s) to replace any of the above if needed
                ],
Wed, Apr 17, 3:35 PM · Infrastructure-Foundations, Data-Persistence, conftool
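
One way to keep the spares listed in a machine-readable form, rather than only in comments, is sketched below in Python. The host names and IPs are copied from the first excerpt above; the PcHost type, the SECTIONS and SPARES names, and the swap_in_spare helper are hypothetical and do not reflect how dbctl actually models parsercache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcHost:
    hostname: str
    ip: str


# Active hosts per parsercache section (values from the first excerpt).
SECTIONS = {
    "pc1": PcHost("pc1011", "10.64.0.57"),
    "pc2": PcHost("pc1012", "10.64.16.65"),
    "pc3": PcHost("pc1013", "10.64.32.163"),
    "pc4": PcHost("pc1016", "10.64.32.53"),
}

# Spares are tracked as data, so they cannot be forgotten in a comment.
SPARES = [
    PcHost("pc1014", "10.64.48.89"),
    PcHost("pc1015", "10.64.0.17"),
]


def swap_in_spare(sections, spares, section):
    """Return (new_sections, remaining_spares, replaced_host).

    The first available spare takes over `section`. The inputs are not
    modified, so the caller decides when to persist the new state.
    """
    if section not in sections:
        raise KeyError(f"unknown section: {section}")
    if not spares:
        raise ValueError("no spare available")
    new_sections = dict(sections)
    replaced = new_sections[section]
    new_sections[section] = spares[0]
    return new_sections, list(spares[1:]), replaced
```

For example, swap_in_spare(SECTIONS, SPARES, "pc2") returns a mapping in which pc2 is served by pc1014, leaves pc1015 as the only remaining spare, and reports pc1012 as the host that was replaced.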
Ladsgroup committed rLPRIe4b4ae97286e: Setting dummy password for cumin dedicated mysql user.
Wed, Apr 17, 1:27 PM
Ladsgroup closed T361653: Drop flaggedtemplates in production as Resolved.
Wed, Apr 17, 12:11 PM · DBA
Ladsgroup updated the task description for T352010: Gradually drop old pagelinks columns.
Wed, Apr 17, 11:26 AM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data
Ladsgroup added a comment to T360029: Integrate dbctl IP changes as part of VLAN changes.

Just to make sure I understand, the request here is an easy-to-automate way of dbctl to change the instance IP address?

It probably wouldn't be too hard to add support for editing IP or just any single field in general, and then we could do that from a cookbook + a dbctl commit. Would that meet needs here?

Wed, Apr 17, 11:00 AM · conftool, Data-Persistence, SRE, Infrastructure-Foundations
Ladsgroup added a comment to T352010: Gradually drop old pagelinks columns.

Do you have any idea when this task will be done, the system will catch up to the 10 hour time lag and everything goes back to normal?

Wed, Apr 17, 10:06 AM · Schema-change-in-production, DBA, MediaWiki-Page-derived-data