
MoritzMuehlenhoff (Moritz Mühlenhoff)
User

User Details

User Since
Apr 1 2015, 4:33 PM (426 w, 2 d)
Availability
Available
LDAP User
Moritz Mühlenhoff
MediaWiki User
MMuhlenhoff (WMF) [ Global Accounts ]

Recent Activity

Today

MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Fri, Jun 2, 12:13 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T230712: sre.ganeti.makevm cook book only allows specifying RAM size in full gigabytes as Resolved.

sre.ganeti.makevm now supports fractions of gigabytes.

Fri, Jun 2, 11:00 AM · Ganeti, Infrastructure-Foundations, SRE-tools
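As a rough illustration of what accepting fractional gigabytes involves, the sketch below converts a RAM size such as "1.5" into whole mebibytes. The function name `gigabytes_to_mebibytes` is hypothetical, and treating one gigabyte as 1024 MiB is an assumption; this is not taken from the sre.ganeti.makevm cookbook itself.

```python
from decimal import Decimal


def gigabytes_to_mebibytes(size_gb: str) -> int:
    """Convert a RAM size in (possibly fractional) gigabytes to whole MiB.

    Hypothetical helper: assumes 1 gigabyte == 1024 MiB.
    """
    value = Decimal(size_gb)
    if value <= 0:
        raise ValueError("RAM size must be positive")
    mib = value * 1024
    # Reject sizes that do not map to a whole number of MiB.
    if mib != mib.to_integral_value():
        raise ValueError("RAM size must be a whole number of MiB")
    return int(mib)


if __name__ == "__main__":
    for size in ("0.5", "1.5", "4"):
        print(size, "GB ->", gigabytes_to_mebibytes(size), "MiB")
```

Parsing through `Decimal` rather than `float` avoids rounding surprises for inputs such as "0.1".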
MoritzMuehlenhoff added a comment to T333614: Upgrade mwlog hosts to Bullseye.

Is there a task about the udp2log porting work to Python 3, or will that be unnecessary due to T205856?

Fri, Jun 2, 10:38 AM · User-herron, SRE Observability (FY2022/2023-Q4)
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Fri, Jun 2, 8:46 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T258686: Stop using mod_access_compat as Declined.

The old-style syntax is used all over the place and it would be a significant effort to change. Since it will continue to be supported by Apache for the foreseeable future, marking as declined for now.

Fri, Jun 2, 7:16 AM · Puppet-Core, Infrastructure-Foundations, User-MoritzMuehlenhoff, SRE

Yesterday

MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Thu, Jun 1, 3:12 PM · Infrastructure-Foundations, SRE

Wed, May 31

MoritzMuehlenhoff closed T337003: Decommission labstore1004 and labstore1005 once they're no longer used, a subtask of T247045: Migrate all of production metal and VMs to Buster or later, as Resolved.
Wed, May 31, 8:56 AM · SRE, Infrastructure-Foundations, Epic
MoritzMuehlenhoff closed T337003: Decommission labstore1004 and labstore1005 once they're no longer used as Resolved.

This happened via T337269

Wed, May 31, 8:56 AM · Data-Services, cloud-services-team

Tue, May 30

MoritzMuehlenhoff created T337711: Remove home/HDFS leftovers of xihua.
Tue, May 30, 5:38 AM · Product-Analytics, Data-Engineering

Fri, May 26

MoritzMuehlenhoff claimed T203964: Create a spicerack cookbook to empty a ganeti node from VMs.
Fri, May 26, 12:13 PM · Patch-For-Review, Ganeti, Spicerack, Infrastructure-Foundations, SRE-tools, User-Joe, SRE
MoritzMuehlenhoff closed T207666: Redefine privileges and access for perf-roots group as Invalid.

Indeed, there's nothing really to fix here (or something changed between 2018 and now): perf-roots grants a few people root access on app servers and caches to help them debug some issues. Deployment access is mostly unrelated to that.

Fri, May 26, 12:10 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff removed a project from T258686: Stop using mod_access_compat: Infrastructure-Foundations.
Fri, May 26, 12:07 PM · Puppet-Core, Infrastructure-Foundations, User-MoritzMuehlenhoff, SRE
MoritzMuehlenhoff added a comment to T258686: Stop using mod_access_compat.

I'd say let's just remove legacy_compat, nothing should rely on it anymore.

Fri, May 26, 12:06 PM · Puppet-Core, Infrastructure-Foundations, User-MoritzMuehlenhoff, SRE
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Fri, May 26, 12:02 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T216043: Sort out which RAID packages are still needed, a subtask of T220787: Fix RAID handler alert and puppet facter to work with Gen10 hosts and ssacli tool, as Resolved.
Fri, May 26, 11:07 AM · SRE, Icinga, observability
MoritzMuehlenhoff closed T216043: Sort out which RAID packages are still needed as Resolved.

I think this is resolved. Since this task was opened we obsoleted some controllers and generally shrunk the list of packages we imported for later distro releases.

Fri, May 26, 11:07 AM · Infrastructure-Foundations, Packaging, User-MoritzMuehlenhoff
MoritzMuehlenhoff added a project to T214489: Improve LDAP logging: Infrastructure-Foundations.
Fri, May 26, 11:05 AM · Infrastructure-Foundations, Observability-Logging, Infrastructure Security, LDAP
MoritzMuehlenhoff added a comment to T142821: Synchronise groups defined in data.yaml to LDAP.

I'm going to close this. Although I think it's probably still valuable, I think it's already captured in the IDM planning, but please re-open if you disagree.

Fri, May 26, 11:05 AM · Infrastructure-Foundations, Bitu, LDAP
MoritzMuehlenhoff closed T154665: Look into behaviour of /etc/exim4/update-exim4.conf.conf related to updates as Declined.

Yes, I think we can close this. This didn't cause any other issue AFAICT (in fact I don't remember the issue that prompted filing the task) and we're moving away from Exim anyway.

Fri, May 26, 11:04 AM · Infrastructure-Foundations, Mail
MoritzMuehlenhoff renamed T337544: Investigate crypto KDC deprecations after Bullseye update from Investigate crypto deprecations after Bullseye update to Investigate crypto KDC deprecations after Bullseye update.
Fri, May 26, 9:42 AM · Data-Engineering, Infrastructure-Foundations, SRE
MoritzMuehlenhoff created T337544: Investigate crypto KDC deprecations after Bullseye update.
Fri, May 26, 9:42 AM · Data-Engineering, Infrastructure-Foundations, SRE

Mon, May 22

MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Mon, May 22, 3:59 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T336845: puppet: profile::auto_restarts::service: have a way to don't deploy the systemd timers.

The auto restarts are needed in Cloud VPS as well; with unattended-upgrades installing a security update of libfoo, this ensures that all services using a copy of libfoo get restarted, otherwise they'd continue to use the old library.

I agree with this. However, the wmf-auto-restart.py config options currently depend on the debdeploy config file, so I think we need to refactor and decouple this dependency before we can configure options in the cloud environment. Worth noting: the script will work, but /etc/debdeploy-client/config.json will not exist, so the relevant settings in hiera will not apply.

Mon, May 22, 12:17 PM · Patch-For-Review, Infrastructure-Foundations, cloud-services-team, Puppet-Infrastructure
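To make the stale-library problem concrete, here is a minimal sketch of one common way to spot it on Linux: after a package upgrade replaces a shared library, processes still using the old copy show it as "(deleted)" in /proc/<pid>/maps. The function name `stale_libraries` is hypothetical, and this is not claimed to be how wmf-auto-restart.py detects affected services.

```python
import re
from typing import Set

# Matches a mapped shared library path that the kernel marks as deleted,
# e.g. ".../libfoo.so.1 (deleted)" at the end of a /proc/<pid>/maps line.
DELETED_LIB = re.compile(r"\s(/\S+\.so\S*) \(deleted\)$")


def stale_libraries(maps_text: str) -> Set[str]:
    """Return shared libraries that are mapped but no longer exist on disk."""
    stale = set()
    for line in maps_text.splitlines():
        match = DELETED_LIB.search(line)
        if match:
            stale.add(match.group(1))
    return stale


if __name__ == "__main__":
    # Inspect the current process; normally this prints an empty set.
    with open("/proc/self/maps", encoding="utf-8") as handle:
        print(stale_libraries(handle.read()))
```

A service whose process returns a non-empty set here is a candidate for a restart.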
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Mon, May 22, 10:06 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T336491: Merge reimaging cookbooks.

I've updated https://wikitech.wikimedia.org/wiki/Ganeti to point to the new cookbook

Mon, May 22, 9:18 AM · Spicerack, SRE-tools, Infrastructure-Foundations
MoritzMuehlenhoff added a comment to T336995: decommission bast2002.wikimedia.org.

This host is still in Icinga.. so not removed from puppet db or something...

We have a rare race in the decom cookbook, where it sometimes fails to properly remove the puppetdb entry (likely a case where a submission is ongoing while the record is pruned), I've now re-run the cookbook.

Mon, May 22, 8:49 AM · SRE, ops-codfw, decommission-hardware
MoritzMuehlenhoff closed T330889: Retire sre.aqs.roll-restart cookbook as Resolved.

The old cookbook has been removed and the docs were updated.

Mon, May 22, 8:20 AM · Infrastructure-Foundations, SRE-tools, Spicerack, SRE
MoritzMuehlenhoff updated the task description for T330889: Retire sre.aqs.roll-restart cookbook.
Mon, May 22, 8:19 AM · Infrastructure-Foundations, SRE-tools, Spicerack, SRE
MoritzMuehlenhoff triaged T337208: Import/create samplicator source package as Low priority.
Mon, May 22, 7:00 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff created T337208: Import/create samplicator source package.
Mon, May 22, 7:00 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T330884: Upgrade Fastnetmon to 1.2.4.

I copied over samplicator from bullseye-wikimedia to bookworm-wikimedia (the only dependency is glibc itself), but there wasn't a source package on apt.wikimedia.org, do you by chance still have it on your laptop or the build host so that we can import it?

Unfortunately, no. If it helps, I think I got it from https://github.com/sleinen/samplicator
Some people tried to generate deb files as well, see https://github.com/hfeeki/samplicator-debian

Mon, May 22, 6:59 AM · Patch-For-Review, SRE-tools, Infrastructure-Foundations
MoritzMuehlenhoff added a comment to T336995: decommission bast2002.wikimedia.org.

This host is still in Icinga.. so not removed from puppet db or something...

Mon, May 22, 6:40 AM · SRE, ops-codfw, decommission-hardware

Fri, May 19

MoritzMuehlenhoff added a comment to T336856: Docker Registry: Catalog API endpoint can lead to OOM via malicious user input (CVE-2023-2253 ).

Unless there are still general concerns about more widely disclosing this issue.

Fri, May 19, 2:49 PM · Vuln-VulnComponent, SecTeam-Processed, serviceops, Security
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Fri, May 19, 2:14 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T336688: Add ApereoCAS as SSO provider for Semgrep Cloud Dashboard.

So far we've only been using Apereo CAS for authentication against our self-hosted infrastructure. Given that SemGrep is more along the lines of other SaaS solutions provided to staff, I think it makes more sense to integrate it into Okta.

Fri, May 19, 1:07 PM · Infrastructure-Foundations, CAS-SSO
MoritzMuehlenhoff added a comment to T336173: Update Proton to include Chromium 113.0.5672.126.

New target version is 113.0.5672.126:
https://lists.debian.org/debian-security-announce/2023/msg00095.html

Fri, May 19, 12:40 PM · Content-Transform-Team-WIP, Proton
MoritzMuehlenhoff renamed T336173: Update Proton to include Chromium 113.0.5672.126 from Update Proton to include Chromium 113.0.5672.63 to Update Proton to include Chromium 113.0.5672.126.
Fri, May 19, 12:40 PM · Content-Transform-Team-WIP, Proton
MoritzMuehlenhoff added a project to T336995: decommission bast2002.wikimedia.org: ops-codfw.
Fri, May 19, 10:58 AM · SRE, ops-codfw, decommission-hardware
MoritzMuehlenhoff reassigned T336995: decommission bast2002.wikimedia.org from MoritzMuehlenhoff to Papaul.
Fri, May 19, 10:58 AM · SRE, ops-codfw, decommission-hardware
MoritzMuehlenhoff updated the task description for T336995: decommission bast2002.wikimedia.org.
Fri, May 19, 10:58 AM · SRE, ops-codfw, decommission-hardware
MoritzMuehlenhoff added a comment to T330884: Upgrade Fastnetmon to 1.2.4.

@ayounsi There's now netflow2003 running Bookworm with FNM 1.2.4. If that works fine, we can reimage the other netflow* VMs in-place once Bookworm is stable.

Fri, May 19, 8:53 AM · Patch-For-Review, SRE-tools, Infrastructure-Foundations
MoritzMuehlenhoff updated subscribers of T336856: Docker Registry: Catalog API endpoint can lead to OOM via malicious user input (CVE-2023-2253 ).

Question: do we open this task to the public?

Fri, May 19, 8:00 AM · Vuln-VulnComponent, SecTeam-Processed, serviceops, Security
MoritzMuehlenhoff claimed T336995: decommission bast2002.wikimedia.org.
Fri, May 19, 6:34 AM · SRE, ops-codfw, decommission-hardware
MoritzMuehlenhoff created T336995: decommission bast2002.wikimedia.org.
Fri, May 19, 6:33 AM · SRE, ops-codfw, decommission-hardware

Wed, May 17

MoritzMuehlenhoff added a comment to T336856: Docker Registry: Catalog API endpoint can lead to OOM via malicious user input (CVE-2023-2253 ).

I think I'll simply backport/release the patch to bullseye-security, then we can simply re-use the resulting binary and import it to buster-wikimedia (it only depends on adduser, libc6 (>= 2.4), lsb-base)

Wed, May 17, 6:42 PM · Vuln-VulnComponent, SecTeam-Processed, serviceops, Security
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Wed, May 17, 2:49 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff edited projects for T336856: Docker Registry: Catalog API endpoint can lead to OOM via malicious user input (CVE-2023-2253 ), added: serviceops; removed Security-Team.
Wed, May 17, 12:31 PM · Vuln-VulnComponent, SecTeam-Processed, serviceops, Security
MoritzMuehlenhoff created T336856: Docker Registry: Catalog API endpoint can lead to OOM via malicious user input (CVE-2023-2253 ).
Wed, May 17, 12:30 PM · Vuln-VulnComponent, SecTeam-Processed, serviceops, Security
MoritzMuehlenhoff added a comment to T336845: puppet: profile::auto_restarts::service: have a way to don't deploy the systemd timers.

I think you should simply add nfsv4 to profile::debdeploy::client::exclude_filesystems, that will make the issue vanish entirely. We have similar config snippets in production for systems with an HDFS mount (such as the stat* nodes).

Wed, May 17, 12:10 PM · Patch-For-Review, Infrastructure-Foundations, cloud-services-team, Puppet-Infrastructure
MoritzMuehlenhoff added a comment to T333861: Host open source LLM (bloom, etc.) on Lift Wing.

Usual IANAL disclaimer ahead: If this were a software license this would not meet the standard required by OSI. They e.g. cover this in the FAQ at https://opensource.org/faq/#evil and one infamous example is the JSON license (http://www.json.org/license.html) for which https://lwn.net/Articles/707510/ is a nice writeup. That said, restrictions might not be fully enforced (I have no idea if "You agree not to use the Model or Derivatives of the Model" is a binding restriction?)

Wed, May 17, 10:45 AM · Patch-For-Review, Machine-Learning-Team

Tue, May 16

MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Tue, May 16, 1:45 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T335042: codfw row D switches upgrade.
Tue, May 16, 1:27 PM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, netops, Machine-Learning-Team, Traffic, serviceops-collab, SRE Observability, serviceops, cloud-services-team, Infrastructure-Foundations, Platform Engineering
MoritzMuehlenhoff updated the task description for T335042: codfw row D switches upgrade.
Tue, May 16, 12:54 PM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, netops, Machine-Learning-Team, Traffic, serviceops-collab, SRE Observability, serviceops, cloud-services-team, Infrastructure-Foundations, Platform Engineering
MoritzMuehlenhoff added a comment to T331699: Migrate the r/w LDAP servers to Bullseye.

Thanks for all the input, much appreciated! I'll revise the plan and update the task in the next few days.

Tue, May 16, 9:53 AM · LDAP, Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T268735: debdeploy skipped hosts and assumed they're up to date(?) as Declined.

Old task, no longer really actionable at this point and this hasn't been seen since then.

Tue, May 16, 8:42 AM · Infrastructure-Foundations, User-Kormat, SRE
MoritzMuehlenhoff added a project to T235163: Investigate GID allocation for system users: Infrastructure-Foundations.
Tue, May 16, 8:40 AM · Infrastructure-Foundations, User-MoritzMuehlenhoff, User-jbond, SRE
MoritzMuehlenhoff updated the task description for T335042: codfw row D switches upgrade.
Tue, May 16, 8:24 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, netops, Machine-Learning-Team, Traffic, serviceops-collab, SRE Observability, serviceops, cloud-services-team, Infrastructure-Foundations, Platform Engineering
MoritzMuehlenhoff updated the task description for T335042: codfw row D switches upgrade.
Tue, May 16, 8:19 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, netops, Machine-Learning-Team, Traffic, serviceops-collab, SRE Observability, serviceops, cloud-services-team, Infrastructure-Foundations, Platform Engineering
MoritzMuehlenhoff updated the task description for T335042: codfw row D switches upgrade.
Tue, May 16, 8:16 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, netops, Machine-Learning-Team, Traffic, serviceops-collab, SRE Observability, serviceops, cloud-services-team, Infrastructure-Foundations, Platform Engineering
MoritzMuehlenhoff updated the task description for T335042: codfw row D switches upgrade.
Tue, May 16, 8:14 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, netops, Machine-Learning-Team, Traffic, serviceops-collab, SRE Observability, serviceops, cloud-services-team, Infrastructure-Foundations, Platform Engineering

Thu, May 11

MoritzMuehlenhoff created T336497: Add support for nftables in profile::base::firewall.
Thu, May 11, 1:23 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T271587: Create auto-populated LDAP group of those who have production shell access.

@SLyngshede-WMF, @MoritzMuehlenhoff this seems like something that fits with the IDM. I'm not sure it needs to be part of the bitu software. However, perhaps a timer on that machine? Thoughts?

Thu, May 11, 9:34 AM · Bitu, Infrastructure-Foundations, LDAP, SRE
MoritzMuehlenhoff lowered the priority of T271587: Create auto-populated LDAP group of those who have production shell access from Medium to Low.
Thu, May 11, 9:34 AM · Bitu, Infrastructure-Foundations, LDAP, SRE
MoritzMuehlenhoff added a comment to T330495: Prepare our custom installer for Bookworm.

The installer is working fine for baremetal and VM installations, but there will be a few more RC releases before the final release, so keeping the task open for now.

Thu, May 11, 7:32 AM · Patch-For-Review, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T336231: Migrate jgreen out of the ops group and into the fr-tech-admins group.

This cert is used by the frack banner loggers to connect to production kafka. I'm not sure how we would put cergen in frack unless we're also importing the production puppet CA, since fundraising doesn't use production puppet at all. One hitch with the current setup is that we don't get notified when that cert is approaching expiration. We can fix that internally with our own cert monitoring, but that would still leave SRE getting notified separately for a cert used externally.

Thu, May 11, 7:27 AM · Infrastructure Security, Infrastructure-Foundations, fundraising-tech-ops
MoritzMuehlenhoff removed a project from T247045: Migrate all of production metal and VMs to Buster or later: Infrastructure Security.
Thu, May 11, 6:40 AM · SRE, Infrastructure-Foundations, Epic

Wed, May 10

MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Wed, May 10, 3:10 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T334455: Failover all db1115 services to db1215.

I am not sure if @MoritzMuehlenhoff was involved and might recall something, or could help us figure this out.

Wed, May 10, 3:10 PM · DBA

Tue, May 9

MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Tue, May 9, 3:26 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T331699: Migrate the r/w LDAP servers to Bullseye.
Tue, May 9, 2:58 PM · LDAP, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T331699: Migrate the r/w LDAP servers to Bullseye.
Tue, May 9, 2:47 PM · LDAP, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Tue, May 9, 12:50 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Tue, May 9, 11:30 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T336231: Migrate jgreen out of the ops group and into the fr-tech-admins group.

One other actionable: We'd need a new LDAP group (e.g. cn=fr-tech-admins or cn=fr-tech-sres) which grants +2 on operations/git.dns

Tue, May 9, 10:33 AM · Infrastructure Security, Infrastructure-Foundations, fundraising-tech-ops
MoritzMuehlenhoff added a comment to T336231: Migrate jgreen out of the ops group and into the fr-tech-admins group.

AFAICS there are two remaining use cases for the private repo:

Tue, May 9, 9:58 AM · Infrastructure Security, Infrastructure-Foundations, fundraising-tech-ops
MoritzMuehlenhoff claimed T331699: Migrate the r/w LDAP servers to Bullseye.
Tue, May 9, 8:42 AM · LDAP, Infrastructure-Foundations, SRE

Mon, May 8

MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Mon, May 8, 4:15 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T331699: Migrate the r/w LDAP servers to Bullseye.

The current VMs are quite overdimensioned in terms of CPU cores: I'd go with 4G RAM, 4 CPUs and 20G disk space instead for ldap-rw1001/2001.

Mon, May 8, 3:50 PM · LDAP, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Mon, May 8, 3:33 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Mon, May 8, 3:23 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T334154: As an FR-Tech SRE, we want to be able to designate a host for decommissioning.

Dear Infrastructure-Foundations, please choose a name for this new group

Mon, May 8, 2:57 PM · SRE, Infrastructure Security, Infrastructure-Foundations, SRE-Access-Requests, fundraising-tech-ops
MoritzMuehlenhoff created T336173: Update Proton to include Chromium 113.0.5672.126.
Mon, May 8, 1:11 PM · Content-Transform-Team-WIP, Proton
MoritzMuehlenhoff added a comment to T335981: let Eoghan see security tickets in Phabricator.

@MoritzMuehlenhoff - I'd agree with @Aklapper that you should probably just link to https://www.mediawiki.org/wiki/Security/SOP/Access_to_Phabricator_Security_Issues in the internal SRE doc as that describes the ideal process to follow. If folks email security-help@ about this, we would just create a Phab task for the request anyways.

Mon, May 8, 8:57 AM · SecTeam-Processed, Security, Security-Team, serviceops-collab

Fri, May 5

MoritzMuehlenhoff added a comment to T335981: let Eoghan see security tickets in Phabricator.

Maybe it's time to add "Mail security-help@wikimedia.org to get Security access in Phabricator" as part of the onboarding checklist for SREs. Anyone in SRE needs to be able to react to Security tasks opened by users, so this seems like a sensible default.

Fri, May 5, 8:50 AM · SecTeam-Processed, Security, Security-Team, serviceops-collab
MoritzMuehlenhoff added a comment to T305147: ipmiseld not running reliably.

@herron we've seen this alert flapping on db2180 a lot lately:

[08:15:42]  <jinxer-wm> (SystemdUnitFailed) firing: ipmiseld.service Failed on db2180:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:30:42]  <jinxer-wm> (SystemdUnitFailed) resolved: ipmiseld.service Failed on db2180:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed

Even during the time of the alert, ipmi-tool keeps working fine, so I am not sure what's going on. Also, there is no Wikitech page associated with it and I haven't been able to find one while searching. Can you give some insights here?
Thanks!

Fri, May 5, 8:36 AM · User-MoritzMuehlenhoff, Infrastructure-Foundations, observability, SRE

Thu, May 4

MoritzMuehlenhoff added a comment to T335981: let Eoghan see security tickets in Phabricator.

You need to email security-help@wikimedia.org, they make the change.

Thu, May 4, 5:06 PM · SecTeam-Processed, Security, Security-Team, serviceops-collab

May 3 2023

MoritzMuehlenhoff created T335879: spicerack.phabricator: Don't fail when logging to a restricted task.
May 3 2023, 3:30 PM · Infrastructure-Foundations, SRE-tools
MoritzMuehlenhoff closed T335282: Deal with archival of Stretch on Debian mirrors as Resolved.

This is resolved:

  • apt sources on remaining stretch servers stopped using the mirrors
  • stretch-based images are no longer being reported by docker-reporter and no longer being built in the core images
  • support for stretch has been removed from the cowbuilders on build2001

May 3 2023, 3:18 PM · serviceops-radar, Patch-For-Review, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T334863: libmagickcore / libmagickwand 8:6.9.10.23+dfsg-2.1+deb10u4 causing test failures.

I could track this down to the upstream security fix for https://www.cve.org/CVERecord?id=CVE-2020-27759.

May 3 2023, 10:07 AM · Platform Team Workboards (Platform Engineering Reliability), Thumbor Migration
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
May 3 2023, 7:22 AM · Infrastructure-Foundations, SRE

May 2 2023

MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
May 2 2023, 3:23 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
May 2 2023, 2:40 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T334049: codfw row C switches upgrade.
May 2 2023, 1:25 PM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, serviceops, Infrastructure-Foundations, SRE Observability, serviceops-collab, Platform Engineering, Traffic, Data-Engineering, Machine-Learning-Team, netops, cloud-services-team
MoritzMuehlenhoff updated the task description for T334049: codfw row C switches upgrade.
May 2 2023, 12:11 PM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, serviceops, Infrastructure-Foundations, SRE Observability, serviceops-collab, Platform Engineering, Traffic, Data-Engineering, Machine-Learning-Team, netops, cloud-services-team
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
May 2 2023, 8:08 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T334049: codfw row C switches upgrade.
May 2 2023, 7:45 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, serviceops, Infrastructure-Foundations, SRE Observability, serviceops-collab, Platform Engineering, Traffic, Data-Engineering, Machine-Learning-Team, netops, cloud-services-team
MoritzMuehlenhoff removed a member for acl*sre-team: Legoktm.
May 2 2023, 7:03 AM
MoritzMuehlenhoff removed a member for acl*sre-team: Cmjohnson.
May 2 2023, 7:03 AM

Apr 28 2023

MoritzMuehlenhoff added a comment to T334863: libmagickcore / libmagickwand 8:6.9.10.23+dfsg-2.1+deb10u4 causing test failures.

Narrowed down to these three patches:

Apr 28 2023, 3:28 PM · Platform Team Workboards (Platform Engineering Reliability), Thumbor Migration
MoritzMuehlenhoff updated the task description for T335575: Integrate Bullseye 11.7 point update.
Apr 28 2023, 3:27 PM · Infrastructure-Foundations, SRE