MoritzMuehlenhoff (Moritz Mühlenhoff)
User

User Details

User Since
Apr 1 2015, 4:33 PM (228 w, 3 d)
Availability
Available
LDAP User
Moritz Mühlenhoff
MediaWiki User
MMuhlenhoff (WMF)

Recent Activity

Fri, Aug 16

MoritzMuehlenhoff updated subscribers of T230609: Site: 2 VMs for puppetdb.
Fri, Aug 16, 12:55 PM · vm-requests, Operations
MoritzMuehlenhoff created T230609: Site: 2 VMs for puppetdb.
Fri, Aug 16, 12:55 PM · vm-requests, Operations
MoritzMuehlenhoff updated the task description for T210704: Migrate node-based services in production to node10.
Fri, Aug 16, 7:50 AM · Core Platform Team (Needs Cleaning - Services Operations), serviceops, Operations

Fri, Aug 9

MoritzMuehlenhoff added a comment to T230126: LDAP: multiples accounts for Phamhi.

Our admin module is in serious need of a revamp; I don't trust it to properly handle a rename. Hence, I'd suggest you handle it in two steps: absent hpham first, then re-add phamhi in a subsequent step.

Fri, Aug 9, 10:24 AM · Patch-For-Review, LDAP, cloud-services-team (Kanban)

Thu, Aug 8

MoritzMuehlenhoff added a comment to T230024: Update component/php72 to 7.2.20.

Status: php7.2 currently fails to build on boron due to a build-time hostname check that fails there; I still need to get to the bottom of that.

Thu, Aug 8, 11:42 AM · serviceops, Operations
MoritzMuehlenhoff added a comment to T180761: Move XHGui from tungsten to webperf-002.

Regarding multi-dc, we have four options I know of:

  1. Or; Push back this problem and migrate from tungsten to webperf1002 first.
    • no standby/failover. no backup.
    • performance.wikimedia.org/xhgui will remain SPOF.
Thu, Aug 8, 9:49 AM · Beta-Cluster-Infrastructure, Performance-Team

Wed, Aug 7

MoritzMuehlenhoff updated the task description for T228942: Onboard Hieu Pham to Wikimedia Foundation as SRE in Cloud Services.
Wed, Aug 7, 4:43 PM · cloud-services-team (Kanban)
MoritzMuehlenhoff updated the task description for T229860: SRE Onboarding for Sukhbir Singh.
Wed, Aug 7, 4:42 PM · SRE-Access-Requests, Traffic, Operations
MoritzMuehlenhoff added a comment to T224677: Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003).

The update has been accepted by the Debian stable release managers and was uploaded: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932175#24, so the 9.10 point release for Stretch will contain the updated package.

Wed, Aug 7, 1:43 PM · Patch-For-Review, Release-Engineering-Team-TODO (201907), Release-Engineering-Team (Development services), Upstream, Packaging, User-zeljkofilipin, Operations, Diffusion
MoritzMuehlenhoff added a project to T230024: Update component/php72 to 7.2.20: serviceops.
Wed, Aug 7, 1:41 PM · serviceops, Operations
MoritzMuehlenhoff created T230024: Update component/php72 to 7.2.20.
Wed, Aug 7, 1:41 PM · serviceops, Operations
MoritzMuehlenhoff added a comment to T230022: Create a cookbook to restart the jvms on a Cassandra cluster.

It supports single-instance Cassandra clusters as well (for maps), so all it should take is to add "aqs" to the list of clusters.

Wed, Aug 7, 1:28 PM · SRE-tools, Operations
MoritzMuehlenhoff updated subscribers of T230022: Create a cookbook to restart the jvms on a Cassandra cluster.

@jbond added that a few days ago in https://gerrit.wikimedia.org/r/#/c/operations/cookbooks/+/528133/ :-)

Wed, Aug 7, 1:22 PM · SRE-tools, Operations
MoritzMuehlenhoff added a comment to T180761: Move XHGui from tungsten to webperf-002.

This is blocking the removal of tungsten, what are the remaining blockers/work to do?

Wed, Aug 7, 11:29 AM · Beta-Cluster-Infrastructure, Performance-Team
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, Aug 7, 11:27 AM · Operations
MoritzMuehlenhoff closed T224572: Migrate pool counters to Buster, a subtask of T224549: Track remaining jessie systems in production, as Resolved.
Wed, Aug 7, 11:25 AM · Operations
MoritzMuehlenhoff closed T224572: Migrate pool counters to Buster as Resolved.

We now have the main pool counters running on Buster using the stock Debian package of poolcounter (poolcounter1004, poolcounter1005, poolcounter2003, poolcounter2004), the old Jessie instances have been removed.

Wed, Aug 7, 11:24 AM · serviceops, Operations
MoritzMuehlenhoff renamed T224572: Migrate pool counters to Buster from Migrate pool counters to Stretch/Buster to Migrate pool counters to Buster.
Wed, Aug 7, 11:23 AM · serviceops, Operations
MoritzMuehlenhoff created T230002: puppetdb queue size went up since July 30.
Wed, Aug 7, 9:31 AM · Patch-For-Review, Operations
MoritzMuehlenhoff renamed T229998: decom cookbook: dry-run mode not working / PuppetDB and Debmonitor removals can fail from decom cookbook: dry-run mode not working / PuppetDB removal failed to decom cookbook: dry-run mode not working / PuppetDB and Debmonitor removals can fail.
Wed, Aug 7, 9:10 AM · Operations
MoritzMuehlenhoff added a comment to T229998: decom cookbook: dry-run mode not working / PuppetDB and Debmonitor removals can fail.

The removal in Debmonitor has a similar race to the PuppetDB removal: I seem to be really lucky, hitting two different races in two subsequent decom runs :-)

Wed, Aug 7, 9:10 AM · Operations
MoritzMuehlenhoff added a comment to T229998: decom cookbook: dry-run mode not working / PuppetDB and Debmonitor removals can fail.

There's more: Next I ran the cookbook for a host on which the dry-run mode had not previously been used (to rule out that the incomplete dry-run skews the effective run):
(Started at 2019-08-07 08:37:09,859)

Wed, Aug 7, 8:54 AM · Operations
MoritzMuehlenhoff added a comment to T229998: decom cookbook: dry-run mode not working / PuppetDB and Debmonitor removals can fail.

After running the deactivate step a second time, poolcounter1003 got correctly removed. Looking at PuppetDB logs there might be some kind of race in PuppetDB:

Wed, Aug 7, 8:35 AM · Operations
MoritzMuehlenhoff created T229998: decom cookbook: dry-run mode not working / PuppetDB and Debmonitor removals can fail.
Wed, Aug 7, 8:17 AM · Operations

Tue, Aug 6

MoritzMuehlenhoff added a comment to T203963: Convert makevm to spicerack cookbook.

From my PoV yes, I've used this multiple times successfully to create Ganeti instances; all further enhancements can be done via separate patches/tasks.

Tue, Aug 6, 12:34 PM · serviceops-radar, Patch-For-Review, User-crusnov, SRE-tools, User-jijiki, User-Joe, Operations
MoritzMuehlenhoff added a comment to T229915: Clean up "nobarrier" mount options for Buster.

The nobarrier option wasn't ever supported in the Debian installer. partman-xfs only supports the following options and the last change to that file was 12 years ago :-)
https://salsa.debian.org/installer-team/partman-xfs/blob/master/mountoptions/xfs

Tue, Aug 6, 11:56 AM · DBA
MoritzMuehlenhoff added a comment to T226633: PDF renderer needs better CJK font.

It would be best if we can do that; Noto/Source Serif CJK is the most comprehensive free serif CJK font available out there as of now. It would also be sweet if we can back-port fonts-noto-cjk-extra, which includes additional font weights. I'm just not sure how we should do that though.

Tue, Aug 6, 9:50 AM · Operations, Patch-For-Review, Chinese-Sites, PDF-Rendering, Reading-Infrastructure-Team-Backlog, Proton
MoritzMuehlenhoff added a project to T226633: PDF renderer needs better CJK font: Operations.
Tue, Aug 6, 9:49 AM · Operations, Patch-For-Review, Chinese-Sites, PDF-Rendering, Reading-Infrastructure-Team-Backlog, Proton
MoritzMuehlenhoff added a comment to T226633: PDF renderer needs better CJK font.

Looks like the font is already installed in T184664, but its CJK variations weren't included in fc-list. I'm not sure, but it seems we're running on Stretch? fonts-noto-cjk on Buster also includes Noto Serif CJK, in addition to Noto Sans CJK.

Tue, Aug 6, 8:20 AM · Operations, Patch-For-Review, Chinese-Sites, PDF-Rendering, Reading-Infrastructure-Team-Backlog, Proton
akosiaris awarded T224559: Migrate Failoid hosts to Stretch/Buster a Like token.
Tue, Aug 6, 7:39 AM · Traffic, serviceops, Operations
MoritzMuehlenhoff added a comment to T229903: eqiad/codfw: One VM for Failoid.

LGTM. Naming-wise I'd say let's do failoid{1,2}001.(eqiad|codfw).wmnet instead of the less obvious tureis/roentgenium that we have now.

Tue, Aug 6, 7:38 AM · vm-requests, Operations
MoritzMuehlenhoff claimed T229903: eqiad/codfw: One VM for Failoid.
Tue, Aug 6, 7:30 AM · vm-requests, Operations
MoritzMuehlenhoff created T229903: eqiad/codfw: One VM for Failoid.
Tue, Aug 6, 7:29 AM · vm-requests, Operations
MoritzMuehlenhoff closed T104699: Firewall configurations for database hosts as Resolved.

This is complete.

Tue, Aug 6, 6:55 AM · DBA, Operations, Patch-For-Review
MoritzMuehlenhoff placed T187673: Build and deploy php-luasandbox 3.0.1 to Wikimedia wikis up for grabs.

This can wait until HHVM is undeployed, removing myself for now

Tue, Aug 6, 6:54 AM · Operations, LuaSandbox
MoritzMuehlenhoff renamed T187673: Build and deploy php-luasandbox 3.0.1 to Wikimedia wikis from Build and deploy hhvm-luasandbox 3.0.1 to Wikimedia wikis to Build and deploy php-luasandbox 3.0.1 to Wikimedia wikis.
Tue, Aug 6, 6:53 AM · Operations, LuaSandbox
MoritzMuehlenhoff closed T227778: Create an LDAP replica in codfw (using LVS), a subtask of T227650: Migrate web services using LDAP authentication towards the readonly LDAP replicas, as Resolved.
Tue, Aug 6, 6:52 AM · LDAP, Operations
MoritzMuehlenhoff closed T227778: Create an LDAP replica in codfw (using LVS) as Resolved.

This is completed and all services not requiring writes have been switched over.

Tue, Aug 6, 6:52 AM · LDAP, Operations

Mon, Aug 5

MoritzMuehlenhoff added a comment to T220504: Decommission sarin.

@RobH This needs to wait until https://phabricator.wikimedia.org/T229796 is complete, I'll reassign the bug to you when that's done.

Mon, Aug 5, 10:31 AM · Patch-For-Review, Operations, decommission, ops-codfw
MoritzMuehlenhoff added a comment to T220503: Decommission neodymium.

Please comment and, if it's ready to start the decom process, check off the boxes and assign to me for followup. Thanks in advance!

Mon, Aug 5, 10:30 AM · Patch-For-Review, decommission, Operations, ops-eqiad
MoritzMuehlenhoff updated subscribers of T151304: tmpreaper possible race condition.

Toolforge/Toollabs also uses tmpreaper (but not the puppetised version with the tmpreaper Puppet class). I'm adding @Andrew and @aborrero for comments on whether we should keep it open for this or whether it's not worth tracking there.

Mon, Aug 5, 10:15 AM · serviceops, Operations
MoritzMuehlenhoff added a comment to T151304: tmpreaper possible race condition.

The patch seems sane, but I'm wondering whether we actually need to pursue this further? tmpreaper is dead upstream (the Debian maintainer keeps it alive a little for security fixes, but the origin of the codebase is a 20-year-old tmpwatch RPM from Red Hat) and has significant bit rot on modern systems (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=881725). Notably we only use it on app servers; it seems to have been added back in 2015 to address core dumps from HHVM clogging up /tmp.

Mon, Aug 5, 7:35 AM · serviceops, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Mon, Aug 5, 7:05 AM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Mon, Aug 5, 6:55 AM · Operations
MoritzMuehlenhoff updated the task description for T224561: Migrate remaining cloudvirt hosts to Stretch/Mitaka.
Mon, Aug 5, 6:53 AM · cloud-services-team, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Mon, Aug 5, 6:32 AM · Operations

Thu, Jul 25

MoritzMuehlenhoff added a comment to T228346: PHP 7.2 garbage collector segfault.

We build our own PHP 7.2 packages already, so we can just cherry-pick that patch ourselves.

Thu, Jul 25, 1:47 PM · Patch-For-Review, Parsoid-PHP, PHP 7.2 support
MoritzMuehlenhoff added a comment to T224260: restbase-dev1006 has a broken disk.

The failed install might be due to https://phabricator.wikimedia.org/T222960#5327461 ?

Thu, Jul 25, 11:17 AM · Core Platform Team (Needs Cleaning - Cassandra Operational), Cassandra, RESTBase, Services (watching), Operations
MoritzMuehlenhoff reopened T227496: Access to WikimediaFoundation.org analytics for Deb as "Open".

@herron: You've added her to the wrong group. Staff members need to be members of cn=wmf; cn=nda is for people who have access to PII-relevant data but are not staff members of the Foundation (i.e. community members or staff of Wikimedia Deutschland).

Thu, Jul 25, 6:18 AM · Operations, LDAP-Access-Requests, wikimediafoundation.org, Analytics

Wed, Jul 24

MoritzMuehlenhoff updated the task description for T226782: a1-eqiad pdu refresh (Thursday 9/12 @11am UTC).
Wed, Jul 24, 3:16 PM · DC-Ops, Operations, ops-eqiad
MoritzMuehlenhoff added a comment to T226104: Set up a generic workflow to create Kerberos accounts.

Is this limited to an-tool1006 or also other hosts?
Is this limited to the HDFS command or are other commands also affected? Do basic operations like klist work as expected?

Wed, Jul 24, 1:29 PM · Analytics-Kanban, User-Elukey, Analytics
MoritzMuehlenhoff reopened T227496: Access to WikimediaFoundation.org analytics for Deb as "Open".

@herron: If you add an account which does not have shell access to the production cluster to a PII-relevant LDAP group, it needs to be added to modules/admin/data/data.yaml.

Wed, Jul 24, 9:03 AM · Operations, LDAP-Access-Requests, wikimediafoundation.org, Analytics

Tue, Jul 23

MoritzMuehlenhoff claimed T224572: Migrate pool counters to Buster.
Tue, Jul 23, 1:53 PM · serviceops, Operations
MoritzMuehlenhoff closed T199876: Migrate pool counters to stretch as Resolved.

Duplicate of T224572

Tue, Jul 23, 1:52 PM · PoolCounter, Operations
MoritzMuehlenhoff claimed T227650: Migrate web services using LDAP authentication towards the readonly LDAP replicas.

The following services have been converted to use the read-only replicas:

Tue, Jul 23, 1:42 PM · LDAP, Operations
MoritzMuehlenhoff updated the task description for T227139: a3-eqiad pdu refresh.
Tue, Jul 23, 12:34 PM · DC-Ops, Operations, ops-eqiad
MoritzMuehlenhoff added a comment to T227139: a3-eqiad pdu refresh.

restbase / logstash / graphite / prometheus hosts should be fine in the event of power loss,

Tue, Jul 23, 12:33 PM · DC-Ops, Operations, ops-eqiad

Fri, Jul 19

MoritzMuehlenhoff added a comment to T224260: restbase-dev1006 has a broken disk.

This server is still on Jessie; might the best option be to simply reimage as Stretch and re-bootstrap?

Fri, Jul 19, 1:14 PM · Core Platform Team (Needs Cleaning - Cassandra Operational), Cassandra, RESTBase, Services (watching), Operations

Jul 18 2019

MoritzMuehlenhoff added a comment to T228086: Swift TCP retransmits increase.

We could try rebooting the Thumbor hosts to the kernel version with the SACK fixes, they are currently running with SACKs disabled.

Jul 18 2019, 2:08 PM · User-fgiunchedi, Operations, media-storage
akosiaris awarded T228403: eqiad: One VM request for identity provider a Like token.
Jul 18 2019, 12:16 PM · Patch-For-Review, vm-requests, Operations
MoritzMuehlenhoff renamed T228403: eqiad: One VM request for identity provider from Site: (QUANTITY) VM %request for SERVICE[S] to eqiad: One VM request for identity provider.
Jul 18 2019, 10:43 AM · Patch-For-Review, vm-requests, Operations
MoritzMuehlenhoff claimed T228403: eqiad: One VM request for identity provider.
Jul 18 2019, 10:41 AM · Patch-For-Review, vm-requests, Operations
MoritzMuehlenhoff created T228403: eqiad: One VM request for identity provider.
Jul 18 2019, 10:41 AM · Patch-For-Review, vm-requests, Operations
MoritzMuehlenhoff added a comment to T228288: debmonitor send status update before the package actually got upgraded.

There is thus a possibility for a package to fail to upgrade but be listed as having been upgraded.

Jul 18 2019, 7:09 AM · SRE-tools

Jul 17 2019

MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Jul 17 2019, 8:21 AM · Operations
MoritzMuehlenhoff added a comment to T227288: eqiad: 1 misc node for the Kerberos KDC service.

Also followed up on the codfw task, but adding here for completeness as well: This looks good to me!

Jul 17 2019, 8:02 AM · hardware-requests, Operations, User-Elukey, Analytics
MoritzMuehlenhoff added a comment to T227425: codfw: 1 misc node for the Kerberos KDC service.

Ack, this looks good to me!

Jul 17 2019, 8:02 AM · hardware-requests, Operations, User-Elukey, Analytics
MoritzMuehlenhoff added a comment to T211881: graphoid: Code stewardship request.

Graphoid is based on NodeJS, so it should be migrated to Node 10 (and thus Stretch) either this or next quarter, see T210704.

Jul 17 2019, 7:13 AM · Release-Engineering-Team-TODO (201908), Release-Engineering-Team (Code Health), Core Platform Team Legacy (Watching / External), Services (watching), Operations, Code-Stewardship-Reviews, Graphoid
MoritzMuehlenhoff added a comment to T227734: Investigate whether GD is still needed on appservers.

On the Debian packaging level there are also no reverse dependencies on php-gd or php7.2-gd.

Jul 17 2019, 7:02 AM · Patch-For-Review, Release-Engineering-Team-TODO, Release-Engineering-Team (Deployment services), Technical-Debt, Operations

Jul 16 2019

MoritzMuehlenhoff closed T226236: Upload docker-ce 18.06.3 upstream package for Stretch, a subtask of T224591: Migrate contint* hosts to Stretch/Buster, as Resolved.
Jul 16 2019, 11:58 AM · Continuous-Integration-Infrastructure (phase-out-jessie), Operations
MoritzMuehlenhoff closed T226236: Upload docker-ce 18.06.3 upstream package for Stretch, a subtask of T226233: Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch, as Resolved.
Jul 16 2019, 11:58 AM · Release-Engineering-Team-TODO (201908), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie)
MoritzMuehlenhoff closed T226236: Upload docker-ce 18.06.3 upstream package for Stretch as Resolved.

Packages have been synched to thirdparty/ci for stretch-wikimedia.

Jul 16 2019, 11:58 AM · serviceops, Operations, Continuous-Integration-Infrastructure (phase-out-jessie)
MoritzMuehlenhoff added a comment to T228086: Swift TCP retransmits increase.

I've also rebooted the remaining frontends, but with some more data it doesn't actually seem as if this is caused by the disabled SACKs: if one limits the dashboard to e.g. "stat1005" (the blank period is where the server was depooled for the reboot), it seems as spiky as before: https://grafana.wikimedia.org/d/SxmTH3IZk/arzhels-playground?orgId=1&panelId=3&fullscreen&from=now-3h&to=now

Jul 16 2019, 11:28 AM · User-fgiunchedi, Operations, media-storage
MoritzMuehlenhoff added a comment to T228086: Swift TCP retransmits increase.

The effect is pretty visible for ms-be1005 on https://grafana.wikimedia.org/d/SxmTH3IZk/arzhels-playground?orgId=1&panelId=3&fullscreen&from=now-1h&to=now ; I'll also reboot the other frontends.

Jul 16 2019, 11:02 AM · User-fgiunchedi, Operations, media-storage
MoritzMuehlenhoff added a comment to T198939: Decommission servermon.

What about the puppet database on m1?

Jul 16 2019, 10:34 AM · Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T224677: Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003).

I've submitted a proposed update to fix the underlying OpenSSH bug in Debian Stretch: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932175

Jul 16 2019, 9:47 AM · Patch-For-Review, Release-Engineering-Team-TODO (201907), Release-Engineering-Team (Development services), Upstream, Packaging, User-zeljkofilipin, Operations, Diffusion
MoritzMuehlenhoff closed T170152: mc2023 / mc2025 fail to mount root partition within 90 seconds using Linux 4.9 as Resolved.

I think we can close this, the error didn't reoccur with the subsequent reboots and might have just been a race condition on the OS level.

Jul 16 2019, 8:28 AM · Operations, ops-codfw
MoritzMuehlenhoff added a comment to T186550: Anycast recdns.

It's my understanding that with this, the steps necessary to restart our recursors are now reduced to a simple depool/repool and that the previous, complex approach from
https://wikitech.wikimedia.org/wiki/Service_restarts#DNS_recursors_(in_production_and_labservices) is now obsolete, right?

Jul 16 2019, 6:50 AM · Patch-For-Review, netops, Operations, Traffic

Jul 15 2019

MoritzMuehlenhoff triaged T227650: Migrate web services using LDAP authentication towards the readonly LDAP replicas as Normal priority.
Jul 15 2019, 1:48 PM · LDAP, Operations
MoritzMuehlenhoff added a comment to T212231: Remove Diamond from production.

It got removed from all production hosts (i.e. including cloudstore*) in fcd6990165c7ec8922a531d11782e21f1a5de04f and made specific to Cloud VPS instances with 3afb8303f164ced695dd5977d70c14611d54be7d

Jul 15 2019, 1:06 PM · observability, Operations
MoritzMuehlenhoff closed T212231: Remove Diamond from production as Resolved.

Diamond is now gone from production.

Jul 15 2019, 9:57 AM · observability, Operations
MoritzMuehlenhoff added a comment to T210850: WMCS-related dashboards using Diamond metrics.

Cole fixed the remaining dashboards. Andrew, can you have a final look whether everything works as expected, then we can close the task?

Jul 15 2019, 7:44 AM · cloud-services-team (Kanban), Operations
MoritzMuehlenhoff updated the task description for T210850: WMCS-related dashboards using Diamond metrics.
Jul 15 2019, 7:44 AM · cloud-services-team (Kanban), Operations
MoritzMuehlenhoff updated the task description for T210850: WMCS-related dashboards using Diamond metrics.
Jul 15 2019, 7:43 AM · cloud-services-team (Kanban), Operations

Jul 12 2019

MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Jul 12 2019, 8:09 AM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Jul 12 2019, 8:09 AM · Operations
MoritzMuehlenhoff updated the task description for T224553: Migrate remaining Restbase servers to Stretch.
Jul 12 2019, 8:08 AM · Core Platform Team (Needs Cleaning - Services Operations), RESTBase-Cassandra, Cassandra, RESTBase, Operations
MoritzMuehlenhoff added a comment to T226948: Degraded RAID on mw2250.

We don't use a lot of disk space on mw servers, let's go with option 2.

Jul 12 2019, 7:37 AM · User-jijiki, serviceops, Operations, ops-codfw

Jul 11 2019

MoritzMuehlenhoff closed T216384: Integrate Stretch 9.8 point update as Resolved.

All complete

Jul 11 2019, 4:42 PM · Operations
MoritzMuehlenhoff updated the task description for T216384: Integrate Stretch 9.8 point update.
Jul 11 2019, 4:42 PM · Operations
MoritzMuehlenhoff triaged T227778: Create an LDAP replica in codfw (using LVS) as Normal priority.
Jul 11 2019, 2:12 PM · LDAP, Operations
MoritzMuehlenhoff created T227778: Create an LDAP replica in codfw (using LVS).
Jul 11 2019, 2:12 PM · LDAP, Operations
MoritzMuehlenhoff closed T227669: codfw: 2 VMs for LDAP replicas as Resolved.

VMs have been created.

Jul 11 2019, 2:10 PM · vm-requests, Operations
MoritzMuehlenhoff added a comment to T209260: Integrate Stretch 9.6 point update.

These updates have been fully deployed:

Jul 11 2019, 10:41 AM · Operations
MoritzMuehlenhoff updated the task description for T216384: Integrate Stretch 9.8 point update.
Jul 11 2019, 10:14 AM · Operations

Jul 10 2019

MoritzMuehlenhoff added a comment to T198939: Decommission servermon.

We discussed this in the SRE Infrastructure Foundations meeting; given that there are other issues with Servermon blocking the Buster migration of the Puppet masters, servermon/netmon1003 can go away now. An alternative solution will be found for the use case described by Alex when the need comes up again.

Jul 10 2019, 4:50 PM · Patch-For-Review, Operations
akosiaris awarded T227657: Reduce memory allocation for ldap-eqiad-replica instances a Like token.
Jul 10 2019, 2:09 PM · Operations
akosiaris awarded T227669: codfw: 2 VMs for LDAP replicas a Like token.
Jul 10 2019, 2:00 PM · vm-requests, Operations
MoritzMuehlenhoff claimed T227669: codfw: 2 VMs for LDAP replicas.
Jul 10 2019, 1:59 PM · vm-requests, Operations
MoritzMuehlenhoff created T227669: codfw: 2 VMs for LDAP replicas.
Jul 10 2019, 1:59 PM · vm-requests, Operations