jcrespo (Jaime Crespo)
Sr Site Reliability Engineer

Projects (15)

User Details

User Since
May 11 2015, 8:31 AM (578 w, 4 d)
Availability
Available
IRC Nick
jynus
LDAP User
Jcrespo
MediaWiki User
JCrespo (WMF)

Recent Activity

Wed, Jun 10

jcrespo added a comment to T427897: Upgrade Cumin hosts to Trixie.

I will try on Friday or ASAP.

Wed, Jun 10, 4:21 PM · Patch-For-Review, Infrastructure-Foundations, SRE
jcrespo closed T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section, a subtask of T420506: Setup backup[12]01[456789] & backup[12]020 and migrate data to them; prepare for decommission backup[12]00[34567], as Resolved.
Wed, Jun 10, 2:09 PM · Data-Persistence-Backup, media-backups, database-backups, bacula
jcrespo closed T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section as Resolved.

This is resolved, decom tasks will be done on the parent task.

Wed, Jun 10, 2:09 PM · database-backups, bacula
jcrespo updated the task description for T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section.
Wed, Jun 10, 2:08 PM · database-backups, bacula
jcrespo added a comment to T428020: codfw: rack A5 maintenance.

Re: backup2013: it needs no special treatment other than downtime. A temporary network maintenance is not a problem unless it extends over a few days, as backups usually run during the UTC night.

Wed, Jun 10, 10:59 AM · Patch-For-Review, Infrastructure-Foundations, netops, ServiceOps new

Mon, Jun 8

jcrespo added a comment to T427357: codfw: rack A4 maintenance.

I have moved the primary function (except the backups) of db2183 to db2184 (T428467), so db2183 can lose connectivity for an extended period of time for this task. CC @FCeratto-WMF: no further action is needed except setting and removing downtime for db2183, db2198, db2198 before and after the maintenance.

Mon, Jun 8, 3:59 PM · Infrastructure-Foundations, netops, Observability-Logging, Machine-Learning-Team, Traffic, ServiceOps new, Discovery-Search
jcrespo renamed T428467: Switchover backup1-codfw primary before network maintenance from Switchover backup1-eqiad primary before network maintenance to Switchover backup1-codfw primary before network maintenance.
Mon, Jun 8, 3:56 PM · Data-Persistence, media-backups
jcrespo closed T428467: Switchover backup1-codfw primary before network maintenance, a subtask of T427357: codfw: rack A4 maintenance, as Resolved.
Mon, Jun 8, 3:56 PM · Infrastructure-Foundations, netops, Observability-Logging, Machine-Learning-Team, Traffic, ServiceOps new, Discovery-Search
jcrespo closed T428467: Switchover backup1-codfw primary before network maintenance as Resolved.

Done: db2184 is now the primary and db2183 is the replica/candidate, so db2183 can now lose network connectivity without issues for the parent task.

Mon, Jun 8, 3:55 PM · Data-Persistence, media-backups
jcrespo added a comment to T428467: Switchover backup1-codfw primary before network maintenance.

Tendril updated, heartbeat table cleaned up, orchestrator looks good.

Mon, Jun 8, 3:42 PM · Data-Persistence, media-backups
jcrespo created T428467: Switchover backup1-codfw primary before network maintenance.
Mon, Jun 8, 3:18 PM · Data-Persistence, media-backups

Thu, Jun 4

jcrespo closed T411111: Database Creation request for requestctl.wikimedia.org as Resolved.

Backups are now correctly configured. I would like to propose some ownership changes, though, on the next Data Persistence meeting to avoid issues like this in the future.

Thu, Jun 4, 4:33 PM · DBA, Hiddenparma
jcrespo closed T411111: Database Creation request for requestctl.wikimedia.org, a subtask of T409264: Data storage for HP, as Resolved.
Thu, Jun 4, 4:33 PM · Hiddenparma
jcrespo added a comment to T427949: Uncompressed TIFFs on commons.

I don't think anyone is disputing that orthophotos can be educationally useful. The question is whether storing massive numbers of huge TIFFs on Commons is the best use of Commons infrastructure and budget.

Thu, Jun 4, 11:02 AM · media-backups, MediaWiki-File-management, SRE, SRE-swift-storage, Commons
jcrespo added a comment to T411111: Database Creation request for requestctl.wikimedia.org.

Waiting for confirmation that the database is included in a new backup run before resolving. Meanwhile, I will audit any other db that could have been created without the requested backups configured.

Thu, Jun 4, 10:01 AM · DBA, Hiddenparma
jcrespo reopened T411111: Database Creation request for requestctl.wikimedia.org as "Open".

Backups were not set up.

Thu, Jun 4, 9:17 AM · DBA, Hiddenparma
jcrespo reopened T411111: Database Creation request for requestctl.wikimedia.org, a subtask of T409264: Data storage for HP, as Open.
Thu, Jun 4, 9:17 AM · Hiddenparma

Tue, Jun 2

jcrespo added a project to T427949: Uncompressed TIFFs on commons: media-backups.
Tue, Jun 2, 4:46 PM · media-backups, MediaWiki-File-management, SRE, SRE-swift-storage, Commons
jcrespo added a comment to T427897: Upgrade Cumin hosts to Trixie.

I tested remote backups, and the packages seem to be in a working state, but cumin (a dependency) seems not to be working well or to lack some extra setup. No worries, because I don't think I have anything actionable on my side that would block this, but please do not remove cumin2002 yet because of it (nor am I in a hurry for the upgrade).

Tue, Jun 2, 3:05 PM · Patch-For-Review, Infrastructure-Foundations, SRE
jcrespo created P93507 (An Untitled Masterwork).
Tue, Jun 2, 9:11 AM

Fri, May 29

jcrespo added a comment to T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section.

After the issues encountered with bacula versioning, the most important/difficult blockers are gone; now we are only waiting on generating extra backups before removing the old host with the old ones.

Fri, May 29, 9:59 AM · database-backups, bacula
jcrespo updated the task description for T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section.
Fri, May 29, 9:58 AM · database-backups, bacula

Thu, May 28

jcrespo added a comment to T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section.

We need to reimage the trixie host backup2014 back to bookworm, as all backup hosts (storage hosts and directors) need to be on the same bacula version, or backups will error out with "3913 Bad use command".

Thu, May 28, 1:02 PM · database-backups, bacula
jcrespo updated the task description for T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section.
Thu, May 28, 12:59 PM · database-backups, bacula
jcrespo updated the task description for T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section.
Thu, May 28, 12:59 PM · database-backups, bacula
jcrespo updated the task description for T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section.
Thu, May 28, 10:46 AM · database-backups, bacula
jcrespo added a comment to T427465: Move thumbnail caching from upload cluster to text.

I have only one suggestion regarding the ticket name and the framing (not the project itself, which looks to me like a great idea): I would pitch it as "having a dedicated cluster for thumbs only", which is what the description seems to imply. We may have limited resources, and the cluster may have to share resources with the existing text one for the reasons stated, but the different domain still establishes a "logical" separation. Framed that way: 1) I wouldn't rule out getting additional resources for the main cluster so early in the timeline, and 2) it may be more attractive, and still relatively accurate as a summary despite later trade-offs, to stress the need to separate thumbs from originals rather than "moving" them. I would do the same for Swift, pitching it as a separation, even if due to resource constraints it may end up sharing the same physical cluster hardware at implementation time. Feel free to disagree with my opinion.

Thu, May 28, 8:58 AM · Patch-For-Review, Data-Persistence, Traffic
jcrespo added a comment to T420506: Setup backup[12]01[456789] & backup[12]020 and migrate data to them; prepare for decommission backup[12]00[34567].

Verification has finished, all files transferred ok. I found some metadata issues while doing the checking, but those do not block the decommissioning of the old hosts.

Thu, May 28, 8:12 AM · Data-Persistence-Backup, media-backups, database-backups, bacula

Wed, May 27

jcrespo added a comment to T427357: codfw: rack A4 maintenance.

db2183 will require stopping mediabackups in advance, to prevent losing metadata. I will take care of that.

Wed, May 27, 8:44 AM · Infrastructure-Foundations, netops, Observability-Logging, Machine-Learning-Team, Traffic, ServiceOps new, Discovery-Search
jcrespo added a comment to T427301: codfw: rack A3 maintenance.

Thanks for the heads up, @Marostegui

Wed, May 27, 8:17 AM · DBA, ServiceOps new, netops, Infrastructure-Foundations

Mon, May 25

jcrespo added a comment to T426199: codfw: rack A2 maintenance.

Thanks, then I will start the depool at 11:30 UTC in order to minimize backup lag.

Mon, May 25, 3:33 PM · ServiceOps-Upgrades-Hardware, ServiceOps new, Infrastructure-Foundations, netops
jcrespo added a comment to T426199: codfw: rack A2 maintenance.

@ayounsi not urgent, but please ping me with the maintenance start time when you have it, so I can do what I mention above in advance.

Mon, May 25, 10:47 AM · ServiceOps-Upgrades-Hardware, ServiceOps new, Infrastructure-Foundations, netops

Fri, May 22

jcrespo closed T423570: wiki.openstreetmap.org Commons thumbs rate limit allowance as Resolved.

I am not seeing any 429 from this source in the last 15 days, so tentatively resolving. Please reopen if you disagree.

Fri, May 22, 11:07 AM · Traffic, SRE
jcrespo updated the task description for T341504: Migrate SRE repositories to GitLab - operations/software.
Fri, May 22, 9:29 AM · GitLab (Project Migration), collaboration-services

Wed, May 20

jcrespo closed T425637: Migrate backup1-* sections to Debian Trixie, a subtask of T422365: Migration to Debian Trixie of production database-related hosts, as Resolved.
Wed, May 20, 2:51 PM · DBA
jcrespo closed T425637: Migrate backup1-* sections to Debian Trixie as Resolved.
Wed, May 20, 2:51 PM · media-backups, DBA
jcrespo updated the task description for T425637: Migrate backup1-* sections to Debian Trixie.
Wed, May 20, 2:08 PM · media-backups, DBA
jcrespo added a comment to T415165: Install a clouddb host with Debian Trixie.

Percona Toolkit, which this package is a fork of, doesn't depend on that for trixie (https://packages.debian.org/trixie/percona-toolkit), so I don't know for sure, but my guess is that the dependency could be ignored on the package, although it could be worth revisiting on some future update + repackage, just in case.

Wed, May 20, 11:21 AM · tools-platform-team, Data-Services, Data-Persistence

Tue, May 19

jcrespo added a comment to T384274: Backups for x3.

(Screenshot attached: image.png)

Tue, May 19, 8:36 AM · database-backups

Thu, May 14

jcrespo added a comment to T426293: Commons files doesn't update.

Hi, once an issue is resolved, discussion on it is expected to be kept to a minimum, unless there is a chance the ticket needs reopening because it hasn't been properly solved.

Thu, May 14, 3:31 PM · MediaWiki-File-management
jcrespo added a comment to T426293: Commons files doesn't update.

The only thing I did was purging, following the instructions at Manual:Purge; I didn't do anything that anyone else can't do on their own.

Thu, May 14, 1:33 PM · MediaWiki-File-management
jcrespo closed T426293: Commons files doesn't update as Resolved.

Purging is a MediaWiki concept, so technically it doesn't work on files directly, but it is my understanding that purging certain pages also purges the files referenced by them. Don't worry, it sometimes gets complex, and sometimes you have bad luck, e.g. if a partial outage or error happens during normal purging. Even if 0.0000001% of files get affected, that means many dozens of them happen to have that issue, and as long as it is reversible it is something we have to live with (making cache purging ultra-reliable would have a very large cost).

Thu, May 14, 11:40 AM · MediaWiki-File-management
jcrespo added a comment to T426199: codfw: rack A2 maintenance.

@ayounsi not urgent, but please ping me with the maintenance start time when you have it, so I can do what I mention above in advance.

Thu, May 14, 10:24 AM · ServiceOps-Upgrades-Hardware, ServiceOps new, Infrastructure-Foundations, netops
jcrespo edited projects for T426293: Commons files doesn't update, added: MediaWiki-File-management; removed Commons, SRE-swift-storage.

I did a few manual purges, can you check if that helped? I removed the Swift and Commons tags because bypassing the cache shows me the right files (I believe) in both datacenters.

Thu, May 14, 10:07 AM · MediaWiki-File-management
jcrespo added a comment to T426293: Commons files doesn't update.

I seem to be getting different responses for https://upload.wikimedia.org/wikipedia/commons/3/36/CitationHelper_-_VE_Editor_Toolbar.png from different datacenters. Potentially(?) notably, eqiad serves me the version of the file shown in @Aklapper's screenshot, while codfw serves the version seen in my screenshot.

Thu, May 14, 10:02 AM · MediaWiki-File-management
jcrespo added a comment to T426293: Commons files doesn't update.

Hi @Jcubic, thanks for the report. On upload of a new version, caches are normally purged from our content delivery network; however, how long that takes to propagate depends on several factors, from our servers to your browser. If that happens to you, please do as @Aklapper mentions above and reload the page after clearing your browser cache or, if that still doesn't fix it, you can also purge the cache for the page manually (https://www.mediawiki.org/wiki/Manual:Purge). It is not impossible that, out of the millions of purges happening on millions of pages every day across all of our datacenters, one could fail or be delayed temporarily.

Thu, May 14, 10:00 AM · MediaWiki-File-management
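For reference, the manual purge described in Manual:Purge can also be triggered through the MediaWiki Action API with action=purge, which only accepts POST requests. The sketch below builds such a request with Python's standard library without sending it. The Commons endpoint URL and the User-Agent string are assumptions for illustration (the User-Agent value is hypothetical), and whether purging a File: page also clears every cached copy of that file on the CDN is not something this snippet checks.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the Action API endpoint for Wikimedia Commons.
API_URL = "https://commons.wikimedia.org/w/api.php"

# Hypothetical User-Agent; replace with a descriptive one including contact info.
USER_AGENT = "purge-sketch/0.1 (hypothetical; replace with your contact info)"


def build_purge_request(titles, api_url=API_URL):
    """Build (but do not send) a POST request for MediaWiki's action=purge.

    Multiple page titles are joined with '|', as the Action API expects.
    """
    if not titles:
        raise ValueError("at least one title is required")
    body = urlencode({
        "action": "purge",
        "titles": "|".join(titles),
        "format": "json",
    }).encode("ascii")
    # Passing data= together with method="POST" makes the request a form POST.
    return Request(
        api_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "User-Agent": USER_AGENT,
        },
    )


if __name__ == "__main__":
    req = build_purge_request(["File:CitationHelper - VE Editor Toolbar.png"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

To actually send it, the request could be passed to urllib.request.urlopen(). Wikimedia's User-Agent policy expects a descriptive User-Agent, and purges by anonymous users may be rate limited.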

Wed, May 13

jcrespo added a comment to T426199: codfw: rack A2 maintenance.

backup2015 is part of the media backup hosts, I can stop media backups for codfw before the maintenance on my own.

Wed, May 13, 2:34 PM · ServiceOps-Upgrades-Hardware, ServiceOps new, Infrastructure-Foundations, netops

May 8 2026

jcrespo added a comment to T423570: wiki.openstreetmap.org Commons thumbs rate limit allowance.

I sent an email to Grant with additional data I can share, as allowed by staff SREs.

May 8 2026, 12:45 PM · Traffic, SRE
jcrespo added a comment to T170327: (re)establish offsite backups for fundraising.

We are working on this for production network backups, feel free to talk to me if this gets prioritized in the future.

May 8 2026, 9:56 AM · fundraising-tech-ops
jcrespo added a comment to T425759: Security Issue Access Request for CWilliams-WMF.

Hey @CWilliams-WMF, this is not a requirement, but it would be quite useful for you to link your Phabricator account with your LDAP one at: https://phabricator.wikimedia.org/settings/user/CWilliams-WMF/page/external/ It helps people handle certain requests and find you faster. :-)

May 8 2026, 9:31 AM · SecTeam-Processed, Security-Team, Security
jcrespo renamed T425759: Security Issue Access Request for CWilliams-WMF from Security Issue Access Request for (Your Phabricator Username) to Security Issue Access Request for CWilliams-WMF.
May 8 2026, 9:26 AM · SecTeam-Processed, Security-Team, Security

May 7 2026

jcrespo updated the task description for T425637: Migrate backup1-* sections to Debian Trixie.
May 7 2026, 4:46 PM · media-backups, DBA
jcrespo triaged T424541: Upgrade backup sources to Debian Trixie as Medium priority.
May 7 2026, 11:51 AM · database-backups
jcrespo triaged T425637: Migrate backup1-* sections to Debian Trixie as Medium priority.
May 7 2026, 11:49 AM · media-backups, DBA
jcrespo updated the task description for T425637: Migrate backup1-* sections to Debian Trixie.
May 7 2026, 11:41 AM · media-backups, DBA
jcrespo created T425637: Migrate backup1-* sections to Debian Trixie.
May 7 2026, 10:46 AM · media-backups, DBA

May 4 2026

jcrespo added a comment to T424852: Investigate performance issues in cloudelastic.

All in favor of installing any tool necessary for profiling/debugging a specific issue, absolutely no problems there, but in the long term I would talk to the Observability team about the gaps in observability, as existing tools are supposed to cover all essential needs regarding basic monitoring, and standardization would help everyone and prevent multiple tools from doing the same thing.

May 4 2026, 3:52 PM · Patch-For-Review, Data-Platform-SRE (2026-04-24 - 2026-05-15), Discovery-Search (2026.04.06 - 2026.05.01)
jcrespo added a comment to T425328: db2187 and es1046 no events found on the database.

All fixed - grants issue.

May 4 2026, 9:41 AM · DBA
jcrespo added a comment to T425328: db2187 and es1046 no events found on the database.

Maybe it is related to grants, i.e. it fails for the icinga user (?).

May 4 2026, 9:41 AM · DBA
jcrespo added a comment to T425328: db2187 and es1046 no events found on the database.

BTW, the warning is also @ es1045 (3 in total).

May 4 2026, 9:40 AM · DBA

Apr 29 2026

jcrespo updated the task description for T424028: Decommission db2141-db2152.
Apr 29 2026, 8:00 AM · DBA

Apr 28 2026

jcrespo added a comment to T424028: Decommission db2141-db2152.

db2141 is done on our side and was sent to dcops after the backups on the replacement were tested.

Apr 28 2026, 4:10 PM · DBA
jcrespo added a comment to T424327: decommission db2141.codfw.wmnet.

This is ready for dc ops.

Apr 28 2026, 4:08 PM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo placed T424327: decommission db2141.codfw.wmnet up for grabs.
Apr 28 2026, 4:08 PM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo added a comment to T423619: Should we skip some directories from deploy backups?.

How challenging would it be to restore to a different host than the one from which the backup was taken?

Apr 28 2026, 3:47 PM · User-Raine, ServiceOps-SharedInfra, ServiceOps new, DC-Ops
jcrespo added a comment to T420993: Rotate discovery intermediate certificate (expires 2026-05-03).

I thought my puppet code had a bug caused by a refresh not being enough to reload the TLS configuration, but that wasn't the issue: the automatic refresh from puppet was enough. Thus the problem I had above was due to long-running client requests, not the server. Nevertheless, I did a full restart of the service for testing and then verified that everything got the discovery2026 cert (checked with openssl on all open ports). All good on my side.

Apr 28 2026, 3:45 PM · ServiceOps new, Infrastructure-Foundations, Patch-For-Review
jcrespo added a subtask for T420506: Setup backup[12]01[456789] & backup[12]020 and migrate data to them; prepare for decommission backup[12]00[34567]: T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section.
Apr 28 2026, 1:57 PM · Data-Persistence-Backup, media-backups, database-backups, bacula
jcrespo added a parent task for T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section: T420506: Setup backup[12]01[456789] & backup[12]020 and migrate data to them; prepare for decommission backup[12]00[34567].
Apr 28 2026, 1:57 PM · database-backups, bacula
jcrespo triaged T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section as High priority.
Apr 28 2026, 1:57 PM · database-backups, bacula
jcrespo created T424661: Migrate ES backups from backup[12]003 to backup[12]014 and archive the es read only ES clusters in ro-es bacula section.
Apr 28 2026, 1:57 PM · database-backups, bacula
jcrespo updated the task description for T424327: decommission db2141.codfw.wmnet.
Apr 28 2026, 1:27 PM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo updated the task description for T424327: decommission db2141.codfw.wmnet.
Apr 28 2026, 1:13 PM · SRE, DC-Ops, ops-codfw, decommission-hardware

Apr 27 2026

jcrespo updated subscribers of T424204: profile/module violations in use of profile::pki::get_cert().

I had a look when D.P. was added, but probably @Eevans should be the primary contact for cassandra related changes.

Apr 27 2026, 2:58 PM · SRE Observability (FY2025/2026-Q4), Data-Persistence, ServiceOps new, Infrastructure-Foundations
jcrespo added a comment to T423619: Should we skip some directories from deploy backups?.

@Scott_French would it be possible to schedule a recovery test with someone in your team, around 2 weeks to 1 month from now, to double check that we are able to recover data as needed and make sure we are backing up everything we need? I think that would be my only prerequisite for resolving this (scheduling the test).

Apr 27 2026, 2:54 PM · User-Raine, ServiceOps-SharedInfra, ServiceOps new, DC-Ops
jcrespo closed T423689: netbox2003 backups (maybe others?) are missconfigured or failing to find the backup directory as Resolved.

Resolving unless issues are seen after deployment; reopen if anything happens.

Apr 27 2026, 2:51 PM · Data-Persistence-Backup, bacula, SRE-tools, netops, netbox, Infrastructure-Foundations
jcrespo created T424541: Upgrade backup sources to Debian Trixie.
Apr 27 2026, 2:47 PM · database-backups
jcrespo renamed T422365: Migration to Debian Trixie of production database-related hosts from Migration to Debian Trixie to Migration to Debian Trixie of database-related hosts.
Apr 27 2026, 2:34 PM · DBA

Apr 24 2026

jcrespo closed T421729: Create cluster32 and cluster33 in existing es6 and es7 hosts as Resolved.

This is now tested; I will do the archiving in another task as part of decommissioning backup1003/backup2003 to prevent duplicate work.

Apr 24 2026, 12:30 PM · MW-1.46-notes (1.46.0-wmf.26; 2026-04-28), database-backups, DBA
jcrespo updated the task description for T424327: decommission db2141.codfw.wmnet.
Apr 24 2026, 11:10 AM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo updated subscribers of T424327: decommission db2141.codfw.wmnet.

FYI @Marostegui: this is almost ready to proceed, but I am waiting on the backups check until Tuesday before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1276407 and proceeding.

Apr 24 2026, 11:09 AM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo claimed T424327: decommission db2141.codfw.wmnet.
Apr 24 2026, 11:06 AM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo added a parent task for T424327: decommission db2141.codfw.wmnet: T424028: Decommission db2141-db2152.
Apr 24 2026, 11:05 AM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo added a subtask for T424028: Decommission db2141-db2152: T424327: decommission db2141.codfw.wmnet.
Apr 24 2026, 11:05 AM · DBA
jcrespo updated the task description for T424028: Decommission db2141-db2152.
Apr 24 2026, 11:05 AM · DBA
jcrespo added a subtask for T418979: Productionize db225[0-3]: T424327: decommission db2141.codfw.wmnet.
Apr 24 2026, 11:05 AM · Patch-For-Review, DBA
jcrespo added a parent task for T424327: decommission db2141.codfw.wmnet: T418979: Productionize db225[0-3].
Apr 24 2026, 11:05 AM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo updated the task description for T424327: decommission db2141.codfw.wmnet.
Apr 24 2026, 11:04 AM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo renamed T424327: decommission db2141.codfw.wmnet from decommission db2142.codfw.wmnet to decommission db2141.codfw.wmnet.
Apr 24 2026, 11:03 AM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo created T424327: decommission db2141.codfw.wmnet.
Apr 24 2026, 11:02 AM · SRE, DC-Ops, ops-codfw, decommission-hardware
jcrespo updated the task description for T418979: Productionize db225[0-3].
Apr 24 2026, 11:01 AM · Patch-For-Review, DBA

Apr 23 2026

jcrespo created P91376 decom instances 2026 Q4.
Apr 23 2026, 3:23 PM

Apr 21 2026

jcrespo added a comment to T418979: Productionize db225[0-3].

@jcrespo db2250 is for you to replace db2141

Apr 21 2026, 3:28 PM · Patch-For-Review, DBA
jcrespo added a comment to T421729: Create cluster32 and cluster33 in existing es6 and es7 hosts.

New backups worked: they went from almost 10h and 2.2TB to 8 minutes and 23GB.

Apr 21 2026, 10:16 AM · MW-1.46-notes (1.46.0-wmf.26; 2026-04-28), database-backups, DBA

Apr 20 2026

jcrespo added a comment to T423570: wiki.openstreetmap.org Commons thumbs rate limit allowance.

Let me check: while all the data I have access to is already anonymous, it is still users' private data; the OSM wiki is just the referrer. Let me ask what parts (if any) I can disclose to people without an NDA.

Apr 20 2026, 5:08 PM · Traffic, SRE

Apr 17 2026

jcrespo edited projects for T406745: MediaWiki periodic job db-lag-stats-reporter failed, added: MediaWiki-General, MediaWiki-Core-JobQueue; removed Data-Persistence.

I am then changing the tags to reflect that this is not something we own. Whether we want to remove it or fix it is a different matter, but someone who uses it should come forward and say how/why it is used.

Apr 17 2026, 9:45 PM · MW-Interfaces-Team, MediaWiki-Core-JobQueue
jcrespo updated subscribers of T423745: Purge orphaned resources on bacula db.

CC @Marostegui just FYI, this is what created some lag on m1 (it may happen again in the future), but it will allow us to remove some garbage from the db. You can unsubscribe from the ticket after reading this, or disable notifications.

Apr 17 2026, 9:37 PM · Data-Persistence-Backup, bacula
jcrespo triaged T423745: Purge orphaned resources on bacula db as Medium priority.
Apr 17 2026, 9:36 PM · Data-Persistence-Backup, bacula
jcrespo created T423745: Purge orphaned resources on bacula db.
Apr 17 2026, 9:35 PM · Data-Persistence-Backup, bacula
jcrespo added a comment to T422596: Failing Trixie VM installations on routed Ganeti.

On a trixie system I had an issue in which the host decided to kill big processes rather than becoming slow (something related to stall detection, not to VM/swapping), as the Debian default assumed more of a k8s-style management. Modifying a kernel parameter worked for me, in case that is relevant.

Apr 17 2026, 2:04 PM · Infrastructure-Foundations, SRE
jcrespo claimed T423689: netbox2003 backups (maybe others?) are missconfigured or failing to find the backup directory.
Apr 17 2026, 2:02 PM · Data-Persistence-Backup, bacula, SRE-tools, netops, netbox, Infrastructure-Foundations