
MoritzMuehlenhoff (Moritz Mühlenhoff)
User

User Details

User Since
Apr 1 2015, 4:33 PM (255 w, 4 d)
Availability
Available
LDAP User
Moritz Mühlenhoff
MediaWiki User
MMuhlenhoff (WMF) [ Global Accounts ]

Recent Activity

Fri, Feb 21

MoritzMuehlenhoff added a comment to T245810: Standard partman recipe for druid hosts.

One option would be to manually create /srv/druid symlinks on the existing installed base and then switch Puppet to use them; with Buster reimages and hardware refreshes, the remaining underlying uses of /var/lib/druid would vanish over time.

Fri, Feb 21, 10:18 AM · Analytics, User-Elukey
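
The manual symlink step suggested above needs to be safe to re-run across the installed base. A minimal sketch in Python, assuming the link is /srv/druid pointing at the existing /var/lib/druid; `ensure_symlink` is a hypothetical helper for illustration, not something that exists in Puppet or the Druid packages:

```python
import os
from pathlib import Path


def ensure_symlink(link: Path, target: Path) -> bool:
    """Make `link` a symlink to `target`; return True if anything changed.

    Idempotent: an existing, correct symlink is left alone, so the step can
    be repeated on every host without side effects.
    """
    if link.is_symlink():
        if os.readlink(link) == str(target):
            return False  # already points where we want
        link.unlink()  # stale symlink pointing elsewhere: replace it
    elif link.exists():
        # A real directory or file is in the way; refuse rather than clobber data.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target)
    return True
```

On an existing host this would be called as `ensure_symlink(Path("/srv/druid"), Path("/var/lib/druid"))`; freshly reimaged Buster hosts would use /srv/druid directly and never need the link.
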
MoritzMuehlenhoff closed T136094: Race condition in setting net.netfilter.nf_conntrack_tcp_timeout_time_wait as Resolved.

I think this bug can in fact be closed.

Fri, Feb 21, 9:27 AM · User-Elukey, Operations
MoritzMuehlenhoff triaged T245808: Ferm rules for cloudbackup2001/2001 as High priority.
Fri, Feb 21, 9:22 AM · cloud-services-team, Operations
MoritzMuehlenhoff created T245808: Ferm rules for cloudbackup2001/2001.
Fri, Feb 21, 9:22 AM · cloud-services-team, Operations
MoritzMuehlenhoff renamed T165136: Ferm rules for labstore1004/1005 NFS hosts from Ferm rules for labstore NFS hosts to Ferm rules for labstore1004/1005 NFS hosts.
Fri, Feb 21, 9:21 AM · cloud-services-team (Kanban), Cloud-VPS, Operations

Thu, Feb 20

MoritzMuehlenhoff added a comment to T245754: (No date provided) setup/install sretest100[12].eqiad.wmnet.

sretest100[12] is fine with me, but given that these are meant for various tests, let's use an internal IP instead, unless @akosiaris has specific needs.

Thu, Feb 20, 6:34 PM · ops-eqiad, DC-Ops, Operations
MoritzMuehlenhoff closed T245747: Revoke LDAP access for Tobias Schumann (WMDE) as Resolved.

Thanks for opening a task. I've removed him from the nda and wmde groups.

Thu, Feb 20, 3:45 PM · LDAP-Access-Requests, Operations
MoritzMuehlenhoff claimed T245747: Revoke LDAP access for Tobias Schumann (WMDE).
Thu, Feb 20, 3:36 PM · LDAP-Access-Requests, Operations
MoritzMuehlenhoff created T245743: Icinga check for CAS-protected web services.
Thu, Feb 20, 2:37 PM · Security-Team, User-jbond, Operations
MoritzMuehlenhoff added a comment to T214024: Two test hosts for SREs.

I had missed the follow-up, sorry. These two spare hosts would be fine as test hosts!

Thu, Feb 20, 1:54 PM · Operations, hardware-requests
MoritzMuehlenhoff added a comment to T240187: mw1280 crashed logging correctable memory errors.

The server went down with the following error today:

Thu, Feb 20, 11:22 AM · Operations, ops-eqiad
MoritzMuehlenhoff updated the task description for T244693: Integrate Buster 10.3 point update.
Thu, Feb 20, 11:07 AM · Operations

Wed, Feb 19

MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, Feb 19, 9:21 AM · Operations
MoritzMuehlenhoff updated the task description for T245161: Track down and replace very old HW.
Wed, Feb 19, 8:16 AM · DC-Ops
MoritzMuehlenhoff added a comment to T245161: Track down and replace very old HW.

tungsten is currently running XHGui; once https://phabricator.wikimedia.org/T180761 is resolved, it can be decommissioned.

Wed, Feb 19, 8:15 AM · DC-Ops

Mon, Feb 17

MoritzMuehlenhoff updated the task description for T244695: Integrate Stretch 9.12 point update.
Mon, Feb 17, 6:36 PM · Operations
MoritzMuehlenhoff updated the task description for T244695: Integrate Stretch 9.12 point update.
Mon, Feb 17, 12:06 PM · Operations
MoritzMuehlenhoff added a comment to T244499: Upgrade the Hadoop test cluster to BigTop.

Looking at "hadoop checknative", it opens /usr/lib/x86_64-linux-gnu/libcrypto.so, which is a symlink to /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.2. But EVP_CIPHER_CTX_encrypting was only introduced in OpenSSL 1.1.0.

Mon, Feb 17, 11:09 AM · User-Elukey, Analytics-Cluster, Analytics
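
The mismatch described above can be confirmed outside of Hadoop by loading the library and looking up the symbol, which is roughly what a native loader does via dlopen()/dlsym(). A minimal sketch using Python's ctypes; `exports_symbol` is a hypothetical helper and this is not how "hadoop checknative" itself is implemented:

```python
import ctypes


def exports_symbol(lib: str, symbol: str) -> bool:
    """Return True if the shared library `lib` exports `symbol`.

    ctypes.CDLL() calls dlopen() (raising OSError if the library can't be
    loaded); attribute access on the handle calls dlsym(), and a missing
    symbol surfaces as AttributeError, which hasattr() turns into False.
    """
    handle = ctypes.CDLL(lib)
    return hasattr(handle, symbol)
```

Under the situation in the comment, `exports_symbol("/usr/lib/x86_64-linux-gnu/libcrypto.so", "EVP_CIPHER_CTX_encrypting")` should return False while that symlink resolves to the 1.0.2 library, and True once it points at an OpenSSL 1.1 build.
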

Fri, Feb 14

MoritzMuehlenhoff updated the task description for T244693: Integrate Buster 10.3 point update.
Fri, Feb 14, 4:20 PM · Operations
MoritzMuehlenhoff updated the task description for T244693: Integrate Buster 10.3 point update.
Fri, Feb 14, 4:14 PM · Operations
MoritzMuehlenhoff updated the task description for T244693: Integrate Buster 10.3 point update.
Fri, Feb 14, 4:10 PM · Operations
MoritzMuehlenhoff closed T178575: Add require_package() variant with repository component to wmflib as Declined.

With package_from_component(), I don't think we need this any longer; it serves a similar purpose and allows proper dependencies to be defined.

Fri, Feb 14, 3:59 PM · User-jijiki, Puppet, Operations
MoritzMuehlenhoff created P10412 root partition usage on mw*.
Fri, Feb 14, 12:44 PM
MoritzMuehlenhoff updated the task description for T244693: Integrate Buster 10.3 point update.
Fri, Feb 14, 9:56 AM · Operations

Thu, Feb 13

MoritzMuehlenhoff added a comment to T244792: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP.

@HMarcus Sure, we can do that. Let's do Thursday (2/20), 7am PST / 4pm CET.

Thu, Feb 13, 9:35 PM · User-jbond, Security-Team, Operations
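
As a quick check of the proposed meeting time, the conversion can be done with fixed UTC offsets. This sketch assumes both zones are on standard time on 20 Feb 2020 (PST = UTC-8, CET = UTC+1), so no DST rules are involved:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are only valid because neither zone observes DST in February.
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")

meeting = datetime(2020, 2, 20, 7, 0, tzinfo=PST)
print(meeting.astimezone(CET).strftime("%a %d %b %H:%M %Z"))  # prints: Thu 20 Feb 16:00 CET
```

7am PST is 15:00 UTC, which is 16:00 CET, matching the time given above.
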
MoritzMuehlenhoff added a comment to T245158: ganeti doesn't change the boot order to network.

How did the OVMF come about? Some snowflake setting from earlier tests, or something we might run into again?

Thu, Feb 13, 3:49 PM · Operations
MoritzMuehlenhoff added a comment to T245158: ganeti doesn't change the boot order to network.

How did the OVMF come about? Some snowflake setting from earlier tests, or something we might run into again?

Thu, Feb 13, 3:43 PM · Operations
MoritzMuehlenhoff triaged T245127: Full root partition/disk on gerrit1002 as Medium priority.
Thu, Feb 13, 9:51 AM · Operations
MoritzMuehlenhoff created T245127: Full root partition/disk on gerrit1002.
Thu, Feb 13, 9:51 AM · Operations
MoritzMuehlenhoff created T245114: Migrate Cumin hosts to Buster.
Thu, Feb 13, 8:56 AM · Operations
MoritzMuehlenhoff added a comment to T245071: mirrors.wikimedia.org libgtk-3-common all 3.22.11-1 hash mismatch.

I think we can rule out a change in the upstream repository. Initially I had a hunch that this could be caused by a stale mirror after the latest Stretch point release, but that deb (libgtk-3-common) hasn't changed since it was initially uploaded to Debian on 24 Mar 2017, and the expected hashes are the same as what's currently found on the mirrors.

Thu, Feb 13, 6:39 AM · cloud-services-team (Kanban), Operations, Toolforge
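
A "hash mismatch" means the digest of the fetched file differs from the one recorded in the repository's Packages index. apt performs this check itself; the sketch below only illustrates the comparison for manually verifying a downloaded .deb. The helper names are hypothetical, and the expected value would be copied from the SHA256 field of the package's Packages entry:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large .debs needn't fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_index(path: Path, expected_sha256: str) -> bool:
    """Compare a downloaded file against the SHA256 value from a Packages index."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

If the file served by the mirror matches the index but clients still report a mismatch, the problem is more likely on the client or proxy side than in the upstream repository.
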
MoritzMuehlenhoff added a comment to T244792: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP.

There are two different angles to consider:

Thu, Feb 13, 6:31 AM · User-jbond, Security-Team, Operations

Wed, Feb 12

MoritzMuehlenhoff added a comment to T244719: Create a replacement for kraz.wikimedia.org.

Given that Luca also had an error during initial setup related to name resolution, this sounds like an issue with the DNS records for the new host?

Wed, Feb 12, 7:39 PM · serviceops, Operations, vm-requests, User-Elukey, Analytics
MoritzMuehlenhoff updated the task description for T244693: Integrate Buster 10.3 point update.
Wed, Feb 12, 5:32 PM · Operations
MoritzMuehlenhoff updated the task description for T244693: Integrate Buster 10.3 point update.
Wed, Feb 12, 4:54 PM · Operations
MoritzMuehlenhoff updated the task description for T244695: Integrate Stretch 9.12 point update.
Wed, Feb 12, 1:43 PM · Operations
MoritzMuehlenhoff added a comment to T244719: Create a replacement for kraz.wikimedia.org.

Does this really need 8 GB RAM and 8 CPUs? The machine that this will replace (kraz) uses a single CPU (and hardly uses it) and has an average memory usage of 0.25 GB. I'm all for adding some headroom, but that seems a little excessive :-)

Wed, Feb 12, 8:58 AM · serviceops, Operations, vm-requests, User-Elukey, Analytics
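
For scale, the requested sizing relative to kraz's figures quoted above (average memory use, so peaks would be somewhat higher) works out as:

```latex
\frac{8\,\text{GB}}{0.25\,\text{GB}} = 32\times \text{ the average memory use},
\qquad
\frac{8\,\text{CPUs}}{1\,\text{CPU}} = 8\times \text{ the current CPU count}
```
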
MoritzMuehlenhoff updated the task description for T244693: Integrate Buster 10.3 point update.
Wed, Feb 12, 8:24 AM · Operations
MoritzMuehlenhoff added a comment to T244792: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP.

Can you elaborate further on why this replication is needed/necessary? I understand it was in place for legacy reasons before the move to Google Enterprise, but I'm curious what needs it fills at this point. For instance, when we have had (multiple) LDAP outages at the office, it did not seem to impact mail flow from the SRE side (nobody yelled fire, at least), so it would be great if you could give me a rundown on why this is needed.

Wed, Feb 12, 8:10 AM · User-jbond, Security-Team, Operations
MoritzMuehlenhoff added a comment to T244410: LDAP access to the wmf group for CherRaye Glenn (superset, turnilo, hue).

Hi @Dzahn, I was able to log into Superset! Do I use the same credentials for Turnilo as well?

Wed, Feb 12, 8:06 AM · Analytics-Kanban, Analytics, Operations, LDAP-Access-Requests

Tue, Feb 11

MoritzMuehlenhoff updated the task description for T244693: Integrate Buster 10.3 point update.
Tue, Feb 11, 11:34 AM · Operations
MoritzMuehlenhoff added a comment to T244792: Determine any impacts to SRE from OIT's planned move to JumpCloud for LDAP.

The LDAP replicas are critical to the wikimedia.org mail servers: we currently have a replication setup between two OpenLDAP servers in the production realm (ldap-corp1001.wikimedia.org in Virginia and ldap-corp2001.wikimedia.org in Texas) and ldap1.corp.wikimedia.org in the OIT network. The mail servers then query the ldap-corp* systems in production to determine whether a given @wikimedia.org address is legitimate.

Tue, Feb 11, 11:10 AM · User-jbond, Security-Team, Operations
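
The per-address check described above boils down to an LDAP equality search on the recipient address. A minimal sketch of building such a search filter safely, following the escaping rules of RFC 4515; the `mail` attribute name is an assumption, since the actual schema and mail server configuration used against ldap-corp* aren't described here:

```python
def escape_filter_value(value: str) -> str:
    """Escape an assertion value for an LDAP search filter (RFC 4515, section 3).

    Only '*', '(', ')', '\\' and NUL must be escaped, as a backslash followed
    by two hex digits.
    """
    return "".join("\\%02x" % ord(ch) if ch in "*()\\\x00" else ch for ch in value)


def address_filter(address: str, attribute: str = "mail") -> str:
    """Filter matching entries whose `attribute` equals `address` exactly.

    `mail` is a hypothetical default; substitute whatever attribute the
    directory actually stores addresses in.
    """
    return f"({attribute}={escape_filter_value(address)})"
```

Escaping matters here because `*` is a legal character in an email local part; unescaped, it would turn an exact-match lookup into a wildcard match.
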
MoritzMuehlenhoff reopened T244410: LDAP access to the wmf group for CherRaye Glenn (superset, turnilo, hue) as "Open".

This needs an entry in data.yaml; reopening.

Tue, Feb 11, 8:12 AM · Analytics-Kanban, Analytics, Operations, LDAP-Access-Requests

Mon, Feb 10

MoritzMuehlenhoff added a comment to T185195: tmpreaper doesn't play along with PrivateTmp systemd units.

JFTR, this got fixed in tmpreaper via the 9.12 point release: https://packages.qa.debian.org/t/tmpreaper/news/20200130T211747Z.html and the 10.2 point release: https://packages.qa.debian.org/t/tmpreaper/news/20191013T191724Z.html

Mon, Feb 10, 3:25 PM · Patch-For-Review, Operations, User-Elukey
MoritzMuehlenhoff added a comment to T244626: vm requests for APT repo / webserver.

Yeah, a separate VM is the idea. The specs look good; we could probably even lower them a bit, but it also won't hurt to have some headroom for further JunOS files etc. These will need a public IP.

Mon, Feb 10, 10:46 AM · Operations, serviceops-radar, vm-requests
MoritzMuehlenhoff added a comment to T244624: Beta puppet patch "prometheus: make ferm DNS record type configurable".

ferm has been fixed in stretch-wikimedia and buster-wikimedia to properly resolve AAAA records with a fallback. Once all jessie instances are gone from deployment-prep, this patch can be removed (provided all stretch/buster hosts are running ferm 2.4-1+wmf2+deb10u1 or 2.4-1+wmf2+deb9u1).

Mon, Feb 10, 10:42 AM · Beta-Cluster-Infrastructure, Operations, observability
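
Deciding whether a host already runs the fixed ferm build is a Debian version comparison. On a host, `dpkg --compare-versions <installed> ge 2.4-1+wmf2+deb10u1` answers this directly; the sketch below reimplements the Debian Policy ordering in Python purely for illustration (`compare_versions` is a hypothetical helper, not part of any library):

```python
def _order(s: str, i: int) -> int:
    """Sort weight of s[i] in a non-digit run: '~' < end < letters < other."""
    if i >= len(s) or s[i].isdigit():
        return 0
    if s[i] == "~":
        return -1
    return ord(s[i]) if s[i].isalpha() else ord(s[i]) + 256


def _verrevcmp(a: str, b: str) -> int:
    """Compare upstream or revision strings; negative, zero or positive."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit runs are compared character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac, bc = _order(a, i), _order(b, j)
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit runs are compared numerically (leading zeros ignored).
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and a[i].isdigit() and j < len(b) and b[j].isdigit():
            first_diff = first_diff or ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i].isdigit():
            return 1  # a's number has more digits
        if j < len(b) and b[j].isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(v: str) -> tuple:
    """Split [epoch:]upstream[-revision]; the revision follows the last '-'."""
    epoch, sep, rest = v.partition(":")
    if not sep:
        epoch, rest = "0", v
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Negative if a < b, zero if equal, positive if a > b (Debian ordering)."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
```

Each host should be compared against the fixed version for its own distribution (deb9u1 for stretch, deb10u1 for buster), since a stretch build will always sort below the buster one.
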
MoritzMuehlenhoff updated the task description for T244695: Integrate Stretch 9.12 point update.
Mon, Feb 10, 10:03 AM · Operations
MoritzMuehlenhoff triaged T244695: Integrate Stretch 9.12 point update as Medium priority.
Mon, Feb 10, 9:07 AM · Operations
MoritzMuehlenhoff created T244695: Integrate Stretch 9.12 point update.
Mon, Feb 10, 9:07 AM · Operations
MoritzMuehlenhoff triaged T244693: Integrate Buster 10.3 point update as Medium priority.
Mon, Feb 10, 9:03 AM · Operations
MoritzMuehlenhoff created T244693: Integrate Buster 10.3 point update.
Mon, Feb 10, 8:23 AM · Operations

Fri, Feb 7

MoritzMuehlenhoff closed T244438: codfw: new mw servers not getting an IP when default to Stretch as Resolved.

Papaul confirmed this works when using the stretch-bootif tftpboot environment; closing.

Fri, Feb 7, 8:19 AM · serviceops-radar, Operations, ops-codfw

Thu, Feb 6

MoritzMuehlenhoff added a comment to T244438: codfw: new mw servers not getting an IP when default to Stretch.

The ethernet adapter is slightly different from the BCM5720 we otherwise already run on stretch. For example, on ms-be2050 it reports as

Thu, Feb 6, 3:48 PM · serviceops-radar, Operations, ops-codfw
MoritzMuehlenhoff closed T209260: Integrate Stretch 9.6 point update as Resolved.

This has been done for quite a while; closing.

Thu, Feb 6, 1:41 PM · Operations
MoritzMuehlenhoff added a comment to T242309: Onboarding Hugh Nowlan.

@hnowlan There's an error in the username configured in https://gerrit.wikimedia.org/r/566823, let me fix that.

Thu, Feb 6, 11:12 AM · serviceops-radar, Core Platform Team Workboards (Clinic Duty Team), Operations, SRE-Access-Requests

Wed, Feb 5

MoritzMuehlenhoff added projects to T244390: VM requests for install_server replacements: vm-requests, Operations.
Wed, Feb 5, 7:38 PM · Operations, vm-requests
MoritzMuehlenhoff added a comment to T244390: VM requests for install_server replacements.

Why the "Networking Requirements: public"? With the repository split off, those should be fine with an internal IP.

Wed, Feb 5, 7:37 PM · Operations, vm-requests

Mon, Feb 3

MoritzMuehlenhoff added a project to T225604: log spam from mtail 3.0.0~rc19 on wezen: Operations.
Mon, Feb 3, 1:13 PM · Operations, Patch-For-Review, observability
MoritzMuehlenhoff added a comment to T225604: log spam from mtail 3.0.0~rc19 on wezen.

I noticed that we have rc24 on mx1001, which is flagged for downgrade. Should the remaining hosts running rc24 also be downgraded to rc19?

Mon, Feb 3, 1:12 PM · Operations, Patch-For-Review, observability

Jan 24 2020

MoritzMuehlenhoff changed the status of T242000: Allow LDAP access to superset dashboards for Moushira Elamrawy from Open to Stalled.
Jan 24 2020, 3:11 PM · LDAP-Access-Requests, Operations

Jan 23 2020

MoritzMuehlenhoff closed T241046: Add a second CPU to debmonitor hosts as Resolved.

Both debmonitor instances now have two CPUs.

Jan 23 2020, 1:36 PM · vm-requests, Operations
MoritzMuehlenhoff triaged T212395: cergen CI fails to run on Debian Stretch because cryptography dependency cannot be built against newer openssl version as Medium priority.

This task is from 2018; is it still an issue?

Jan 23 2020, 8:24 AM · Continuous-Integration-Config, Operations
MoritzMuehlenhoff triaged T243475: vm request for etherpad1002 as Medium priority.
Jan 23 2020, 8:23 AM · Wikimedia-Etherpad, serviceops, Operations
MoritzMuehlenhoff triaged T243444: Request took down both zotero and citoid (exceeding memory) as High priority.
Jan 23 2020, 8:23 AM · Operations, Citoid

Jan 22 2020

Dzahn awarded T224580: Migrate etherpad1001 to Buster a Mountain of Wealth token.
Jan 22 2020, 6:41 PM · Patch-For-Review, Wikimedia-Etherpad, serviceops, Operations
MoritzMuehlenhoff added a comment to T243288: Retire the Tor relay.

torrelay1001 is being reclaimed to the spare pool via https://phabricator.wikimedia.org/T243390 (only DC ops steps such as the disk wipe are still pending).

Jan 22 2020, 1:06 PM · Tor, Operations
MoritzMuehlenhoff moved T243390: Reclaim torrelay1001 to spares from Backlog to pending onsite steps (eqiad) on the decommission board.
Jan 22 2020, 1:03 PM · Operations, DC-Ops, decommission
MoritzMuehlenhoff renamed T243390: Reclaim torrelay1001 to spares from decommission torrelay1001 to Reclaim torrelay1001 to spares.
Jan 22 2020, 1:03 PM · Operations, DC-Ops, decommission
MoritzMuehlenhoff updated the task description for T243390: Reclaim torrelay1001 to spares.
Jan 22 2020, 12:43 PM · Operations, DC-Ops, decommission
MoritzMuehlenhoff updated the task description for T243390: Reclaim torrelay1001 to spares.
Jan 22 2020, 12:26 PM · Operations, DC-Ops, decommission
MoritzMuehlenhoff moved T243319: decommission labstore2003.codfw.wmnet and labstore2004.codfw.wmnet from Backlog to pending onsite steps (codfw) on the decommission board.
Jan 22 2020, 11:55 AM · ops-codfw, cloud-services-team (Hardware), Operations, DC-Ops, decommission
MoritzMuehlenhoff moved T243329: decommission labstore2001.codfw.wmnet and labstore2002.codfw.wmnet from Backlog to pending onsite steps (codfw) on the decommission board.
Jan 22 2020, 11:55 AM · ops-codfw, Operations, DC-Ops, decommission
MoritzMuehlenhoff updated the task description for T243390: Reclaim torrelay1001 to spares.
Jan 22 2020, 11:53 AM · Operations, DC-Ops, decommission
MoritzMuehlenhoff claimed T243390: Reclaim torrelay1001 to spares.
Jan 22 2020, 11:53 AM · Operations, DC-Ops, decommission
MoritzMuehlenhoff created T243390: Reclaim torrelay1001 to spares.
Jan 22 2020, 11:52 AM · Operations, DC-Ops, decommission
MoritzMuehlenhoff updated the task description for T232308: Integrate Stretch 9.10/9.11 point updates.
Jan 22 2020, 11:46 AM · Operations
MoritzMuehlenhoff placed T212395: cergen CI fails to run on Debian Stretch because cryptography dependency cannot be built against newer openssl version up for grabs.
Jan 22 2020, 10:02 AM · Continuous-Integration-Config, Operations
MoritzMuehlenhoff closed T243354: Requesting access to wmf LDAP group for dpifke as Resolved.

@dpifke I've added you to cn=wmf, let me know if you run into any issues.

Jan 22 2020, 9:59 AM · LDAP-Access-Requests, Operations
MoritzMuehlenhoff added a comment to T224580: Migrate etherpad1001 to Buster.

The following packages are used by the puppet role but so far missing on buster:

  • prometheus-etherpad-exporter
Jan 22 2020, 9:01 AM · Patch-For-Review, Wikimedia-Etherpad, serviceops, Operations

Jan 21 2020

Bawolff awarded T243288: Retire the Tor relay a Heartbreak token.
Jan 21 2020, 7:28 PM · Tor, Operations
MoritzMuehlenhoff assigned T243226: Upgrade puppet in deployment-prep (Puppet agent broken in Beta Cluster) to jbond.
Jan 21 2020, 2:34 PM · Operations, Beta-Cluster-Infrastructure
MoritzMuehlenhoff triaged T243288: Retire the Tor relay as Medium priority.
Jan 21 2020, 2:25 PM · Tor, Operations
MoritzMuehlenhoff created T243288: Retire the Tor relay.
Jan 21 2020, 2:25 PM · Tor, Operations

Jan 20 2020

MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Jan 20 2020, 12:58 PM · Operations
MoritzMuehlenhoff closed T224551: Migrate URL downloaders to Buster, a subtask of T224549: Track remaining jessie systems in production, as Resolved.
Jan 20 2020, 12:58 PM · Operations
MoritzMuehlenhoff closed T224551: Migrate URL downloaders to Buster as Resolved.

This is complete. The new Buster instances are urldownloader[12]00[12] and the old jessie systems have been removed.

Jan 20 2020, 12:58 PM · Operations
MoritzMuehlenhoff added a comment to T224551: Migrate URL downloaders to Buster.

But running Puppet recreates the squid3 file, so this will happen again the next time it gets restarted and needs a follow-up fix.

Jan 20 2020, 12:57 PM · Operations
MoritzMuehlenhoff triaged T243167: Upgrade BIOS and IDRAC firmware on R440 cp systems as High priority.
Jan 20 2020, 8:28 AM · DC-Ops, Traffic, Operations, ops-esams
MoritzMuehlenhoff created T243167: Upgrade BIOS and IDRAC firmware on R440 cp systems.
Jan 20 2020, 8:28 AM · DC-Ops, Traffic, Operations, ops-esams

Jan 17 2020

MoritzMuehlenhoff added a comment to T243056: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview.

We have role::webserver_misc_static (bromine/vega) for this.

Jan 17 2020, 12:14 PM · Patch-For-Review, Operations, MediaWiki-extensions-CodeReview
MoritzMuehlenhoff triaged T243056: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview as Medium priority.
Jan 17 2020, 12:13 PM · Patch-For-Review, Operations, MediaWiki-extensions-CodeReview
MoritzMuehlenhoff triaged T243057: Move Prometheus off eqsin/ulsfo/esams bastions as Medium priority.
Jan 17 2020, 12:13 PM · Operations, observability
MoritzMuehlenhoff triaged T243065: Provision plaintext syslog collectors in esams/ulsfo/eqsin as Medium priority.
Jan 17 2020, 12:12 PM · netops, observability, Operations
MoritzMuehlenhoff added a comment to T224551: Migrate URL downloaders to Buster.

Good catch! I'll review the difference between the logrotate config shipped in the Debian package and our Puppet one; maybe we can simply stick with the Debian default entirely.

Jan 17 2020, 9:37 AM · Operations
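
Reviewing the difference between the packaged logrotate config and the Puppet-managed one is a plain file diff; `diff -u` on the host does the same job. A small Python sketch for illustration, where `config_diff` is a hypothetical helper and the two paths passed in are whatever files are being compared:

```python
import difflib
from pathlib import Path


def config_diff(ours: Path, upstream: Path) -> str:
    """Unified diff from our config to the upstream default; '' if identical."""
    a = ours.read_text().splitlines(keepends=True)
    b = upstream.read_text().splitlines(keepends=True)
    return "".join(difflib.unified_diff(a, b, fromfile=str(ours), tofile=str(upstream)))
```

An empty result would mean the Puppet copy adds nothing over the Debian default and could simply be dropped.
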
MoritzMuehlenhoff closed T242807: Requesting access to analytics-privatedata-users, researchers & wmf for jennifer wang (jwang) as Resolved.

@jwang Your access is now enabled; let me know if you run into any issues logging in with SSH. If you have any specific questions regarding Hadoop access, it's best to ask in the #wikimedia-analytics channel on IRC.

Jan 17 2020, 9:07 AM · Operations, SRE-Access-Requests
MoritzMuehlenhoff closed T242813: Requesting kerberos access for jwang as Resolved.

I created your Kerberos account. You should have received an email about your Kerberos account (required to access Hadoop) with further instructions.

Jan 17 2020, 9:00 AM · Analytics
MoritzMuehlenhoff triaged T241481: deployment-logstash03: UDP listener died EADDRINUSE as Medium priority.
Jan 17 2020, 8:10 AM · Operations, Beta-Cluster-Infrastructure
MoritzMuehlenhoff triaged T243048: python3.4 broken on deployment-logstash2 as Medium priority.
Jan 17 2020, 8:10 AM · Operations, Beta-Cluster-Infrastructure
MoritzMuehlenhoff added a comment to T242481: d-i fails to install on servers with BRCM 2P 1G BT + 2P 10G SFP NDC.

Thanks Papaul! I don't think we need to pursue the "let's disable the unused port" option any further; the current solution within the debian-installer addresses this just fine (plus, if we disable the 10G port, it'll take extra effort down the road to re-enable it once we have a 10G rack).

Jan 17 2020, 7:55 AM · Operations, DBA, ops-codfw

Jan 16 2020

MoritzMuehlenhoff added a comment to T242807: Requesting access to analytics-privatedata-users, researchers & wmf for jennifer wang (jwang).

@jwang: I already enabled your LDAP access via the "wmf" group, so the services listed at https://wikitech.wikimedia.org/wiki/LDAP/Groups#wmf_group can now be accessed.

Jan 16 2020, 3:51 PM · Operations, SRE-Access-Requests