MoritzMuehlenhoff (Moritz Mühlenhoff)
User

User Details

User Since
Apr 1 2015, 4:33 PM (220 w, 6 d)
Availability
Available
LDAP User
Moritz Mühlenhoff
MediaWiki User
MMuhlenhoff (WMF) [ Global Accounts ]

Recent Activity

Today

MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Tue, Jun 25, 2:42 PM · Operations
MoritzMuehlenhoff reassigned T200209: Decom graphite2001 from fgiunchedi to RobH.
Tue, Jun 25, 10:15 AM · decommission, ops-codfw, Operations, observability
MoritzMuehlenhoff updated the task description for T200209: Decom graphite2001.
Tue, Jun 25, 10:15 AM · decommission, ops-codfw, Operations, observability
MoritzMuehlenhoff added a comment to T225998: Study performance impact of disabling TCP selective acknowledgments.

Breakdown of servers and their config

Tue, Jun 25, 8:18 AM · Traffic, Performance-Team, Performance, Operations
MoritzMuehlenhoff added a comment to T224677: Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003).

I'll file a bug against the Debian OpenSSH package; this seems like a suitable candidate for a point release, as the patch is small and fixes a genuine bug.

Tue, Jun 25, 8:08 AM · Upstream, Packaging, User-zeljkofilipin, Release-Engineering-Team (Kanban), Operations, Diffusion

Yesterday

Restricted Application added a project to T226405: Remove access to network gear for Casey Dentinger: Operations.
Mon, Jun 24, 2:26 PM · netops, Operations
MoritzMuehlenhoff created T226404: Check home leftovers of cwdent.
Mon, Jun 24, 2:20 PM · Analytics
MoritzMuehlenhoff added a comment to T226382: Hardware Request: puppet master eqiad.

Once T201342 is done, it seems like the best candidate for this.

Mon, Jun 24, 10:21 AM · Operations, ops-eqiad, DC-Ops
MoritzMuehlenhoff added a comment to T224188: rack/setup/install (3) new osd ceph nodes.

our network interface saturation monitoring is still diamond-based

What does that mean?

Mon, Jun 24, 7:24 AM · ops-eqiad, Operations, cloud-services-team (Kanban), Cloud-Services

Fri, Jun 21

MoritzMuehlenhoff added a comment to T220811: Test Thumbor OpenCL smart cropping on stat1005.

Thumbor relies on the python-opencv library for that stuff. I imagine that maybe python-opencv and/or dependencies may have to be compiled from source to leverage the GPU?

Fri, Jun 21, 8:26 AM · Patch-For-Review, User-jijiki, Thumbor, Performance-Team

Thu, Jun 20

MoritzMuehlenhoff added a comment to T202966: Make cp1099 the new pinkunicorn.

Or maybe use one of cp1071-cp1074, the servers which were used for the original ATS tests? These were bought in 2015 and are currently unused.

Thu, Jun 20, 11:02 AM · Patch-For-Review, Traffic, Operations
MoritzMuehlenhoff added a comment to T224260: restbase-dev1006 has a broken disk.

Can we please move forward with ordering a replacement disk? This broken disk causes subtle errors for all fleet-wide Cumin/debdeploy runs touching e.g. dpkg, as it stalls I/O almost indefinitely.

Thu, Jun 20, 7:18 AM · Cassandra, RESTBase, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Watching / External), Services (watching), Operations, DC-Ops, ops-eqiad

Wed, Jun 19

MoritzMuehlenhoff updated the task description for T226089: Make the Kerberos infrastructure production ready.
Wed, Jun 19, 1:38 PM · User-Elukey, Analytics
MoritzMuehlenhoff added a comment to T226104: Set up a generic workflow to create Kerberos accounts.

JFTR, this is implemented using the +needchange flag, e.g.

Wed, Jun 19, 1:35 PM · User-Elukey, Analytics
MoritzMuehlenhoff added a comment to T226089: Make the Kerberos infrastructure production ready.

add puppet automation to bootstrap a KDC service from scratch on a node (caveat: this might mean only partial automation since currently the kdc packages, when installing, require manual inputs)

Wed, Jun 19, 10:50 AM · User-Elukey, Analytics
MoritzMuehlenhoff updated the task description for T216384: Integrate Stretch 9.8 point update.
Wed, Jun 19, 10:27 AM · Operations
MoritzMuehlenhoff updated the task description for T216384: Integrate Stretch 9.8 point update.
Wed, Jun 19, 10:25 AM · Operations
MoritzMuehlenhoff added a comment to T210704: Migrate node-based services in production to node10.

Are you using component/node10? This should be fixed already, see https://phabricator.wikimedia.org/T215562#5066711 and followups.

Wed, Jun 19, 7:02 AM · serviceops, Core Platform Team Backlog (Later), Services (next), Operations

Tue, Jun 18

MoritzMuehlenhoff added a comment to T220590: Decom ms-be101[345].

Can we please move forward with the decom steps for at least 1013? This host has been down due to hardware trouble for nearly two months (T220907) and always shows up as failing in fleet-wide Cumin runs.

Tue, Jun 18, 2:40 PM · decommission, User-fgiunchedi, media-storage, Operations
MoritzMuehlenhoff created T225998: Study performance impact of disabling TCP selective acknowledgments.
Tue, Jun 18, 9:36 AM · Traffic, Performance-Team, Performance, Operations
MoritzMuehlenhoff added a comment to T200209: Decom graphite2001.

I'm taking graphite2001 now to do some tests for the Prometheus v2 upgrade in T187987: 100% of Prometheus traffic served by Prometheus v2.

Tue, Jun 18, 9:08 AM · decommission, ops-codfw, Operations, observability

Tue, Jun 11

MoritzMuehlenhoff added a comment to T224572: Migrate pool counters to Stretch/Buster.

I don't remember if there was a reason I didn't build it for stretch-backports at the time, but that should be relatively straightforward if we decide to go with stretch instead of buster.

Tue, Jun 11, 11:02 PM · serviceops, Operations
Legoktm awarded T224572: Migrate pool counters to Stretch/Buster a Like token.
Tue, Jun 11, 2:09 PM · serviceops, Operations

Fri, Jun 7

MoritzMuehlenhoff added a comment to T225306: Reboot all Analytics hosts for kernel + openjdk upgrades.

Note that the Ganeti reboots are a little different here, as we also need to load the new QEMU for the new instruction. To reboot a Ganeti instance, one needs to log in to the respective Ganeti master (ganeti2003.codfw.wmnet for codfw, ganeti1001.eqiad.wmnet for eqiad) and run

Fri, Jun 7, 2:34 PM · Analytics-Kanban, Analytics
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Fri, Jun 7, 6:38 AM · Operations

Thu, Jun 6

MoritzMuehlenhoff added a comment to T218233: Doxygen search.php no longer works on doc.wikimedia.org.

There are two angles to address:

Thu, Jun 6, 1:23 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Upstream, Regression, MediaWiki-Documentation, Continuous-Integration-Infrastructure

Wed, Jun 5

MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, Jun 5, 7:20 AM · Operations

Tue, Jun 4

MoritzMuehlenhoff added a comment to T212257: Set up a Kerberos KDC service in production with minimal puppet automation.

Adding some ideas about a possible layout:

  • keytabs will be stored on puppetmaster1001's puppet private repo under /srv/private/modules/secrets/secrets/kerberos/FQDN/role (basically grouping keytabs by host and role - for example, analytics1028.eqiad.wmnet/hadoop/hdfs.keytab).
Tue, Jun 4, 9:13 PM · User-Elukey, Patch-For-Review, Analytics-Kanban, Analytics
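
The keytab layout proposed in the comment above (grouping by host FQDN, then by role) could be sketched roughly as follows. This is only an illustration: the helper name `keytab_path` and its validation rules are hypothetical and not part of the original comment. Only the base directory and the example host, role, and file name are taken from it.

```python
from pathlib import PurePosixPath

# Base directory quoted in the comment above.
KEYTAB_BASE = PurePosixPath("/srv/private/modules/secrets/secrets/kerberos")


def keytab_path(fqdn: str, role: str, filename: str) -> str:
    """Hypothetical helper: build <base>/<FQDN>/<role>/<filename>."""
    for part in (fqdn, role, filename):
        # Reject empty components and anything that could escape the base dir.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return str(KEYTAB_BASE / fqdn / role / filename)


# Example from the comment: the hdfs keytab for the hadoop role on analytics1028.
print(keytab_path("analytics1028.eqiad.wmnet", "hadoop", "hdfs.keytab"))
```

Keeping the host and role as separate path components means all keytabs for one host can be found by listing a single directory.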
MoritzMuehlenhoff added a comment to T212257: Set up a Kerberos KDC service in production with minimal puppet automation.

Added the following to the Analytics VLAN firewall rules:

elukey@re0.cr1-eqiad# show | compare
[edit firewall family inet filter analytics-in4]
       term schema { ... }
+      term kerberos {
+          from {
+              destination-address {
+                  /* kerberos1001 */
+                  10.64.0.182/32;
+              }
+              protocol tcp;
+              destination-port 88;
Tue, Jun 4, 2:46 PM · User-Elukey, Patch-For-Review, Analytics-Kanban, Analytics
MoritzMuehlenhoff triaged T224988: Reduce memory allocation for kafkamon instances as Normal priority.
Tue, Jun 4, 1:30 PM · Analytics, Operations
MoritzMuehlenhoff created T224988: Reduce memory allocation for kafkamon instances.
Tue, Jun 4, 1:30 PM · Analytics, Operations

Mon, Jun 3

MoritzMuehlenhoff created T224904: Check home leftovers of juliaglen.
Mon, Jun 3, 4:21 PM · Analytics-Kanban, Analytics
MoritzMuehlenhoff added a comment to T224577: Migrate etcd networking cluster to Stretch/Buster.

Actually, this is probably entirely unused; Fabián pointed me to T212934.

Mon, Jun 3, 2:29 PM · serviceops, Kubernetes, Operations
MoritzMuehlenhoff added a comment to T224877: prometheus-pdns-exporter: add stretch support.

prometheus-pdns-rec-exporter should be available for Stretch, it's used on the production recursors, which are on Stretch:
https://debmonitor.wikimedia.org/packages/prometheus-pdns-rec-exporter

Mon, Jun 3, 12:19 PM · Patch-For-Review, Cloud-VPS, cloud-services-team (Kanban)
MoritzMuehlenhoff added a comment to T224857: Enhance MediaWiki deployments for support of php7.x.

If the server is pooled, wait for a lock on poolcounter (we can tune appropriately the concurrency allowed)

Mon, Jun 3, 11:23 AM · Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Patch-For-Review, User-jijiki, PHP 7.2 support, Scap, serviceops
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Mon, Jun 3, 10:49 AM · Operations
MoritzMuehlenhoff added a comment to T224562: Decommission darmstadtium.

Yep, it could be decomm'd. The registry has been served by the new servers for two weeks and I haven't seen any hiccups yet, so this server can go.

I can take care of the decom process, but I don't know whether it should be added to spares or whether to contact dcops for complete obliteration.

Mon, Jun 3, 9:28 AM · Operations, Kubernetes
MoritzMuehlenhoff added a comment to T224723: Import AMD rocm packages in wikimedia-buster.

If Tensorflow works fine without hsa-ext-rocr-dev, we also have a third option, which seems cleaner and easier:

  • Import the existing repository (sans hsa-ext-rocr-dev) to a new thirdparty/rocm component
  • Create a dummy hsa-ext-rocr-dev deb using https://packages.debian.org/stable/equivs and import that to component/rocm
Mon, Jun 3, 7:59 AM · User-Elukey, Operations, Analytics
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Mon, Jun 3, 7:05 AM · Operations

Wed, May 29

MoritzMuehlenhoff added a comment to T224589: Migrate dbmonitor hosts to Stretch/Buster.

Apache should be harmless, it's just different versions of Apache 2.4, but I vaguely remember an issue with something requiring PHP5. But I might be completely off track here, it's just a vague recollection.

Wed, May 29, 2:25 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:54 PM · Operations
MoritzMuehlenhoff created T224591: Migrate contint* hosts to Stretch/Buster.
Wed, May 29, 1:54 PM · Continuous-Integration-Infrastructure (phase-out-jessie), Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:53 PM · Operations
MoritzMuehlenhoff created T224590: Migrate mendelevium/OTRS host to Stretch/Buster.
Wed, May 29, 1:52 PM · OTRS, Operations
MoritzMuehlenhoff created T224589: Migrate dbmonitor hosts to Stretch/Buster.
Wed, May 29, 1:51 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:43 PM · Operations
MoritzMuehlenhoff created T224586: Migrate fermium to stretch/buster.
Wed, May 29, 1:42 PM · Operations
MoritzMuehlenhoff created T224585: Migrate labmon* to Stretch.
Wed, May 29, 1:40 PM · cloud-services-team, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:33 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:32 PM · Operations
MoritzMuehlenhoff created T224583: Migrate labstore1006/1007 to Stretch/Buster.
Wed, May 29, 1:32 PM · cloud-services-team (Kanban), Operations
MoritzMuehlenhoff created T224582: Migrate labstore1004/labstore1005 to Stretch/Buster.
Wed, May 29, 1:31 PM · cloud-services-team (Kanban), Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:23 PM · Operations
MoritzMuehlenhoff created T224580: Migrate etherpad1001 to Stretch/Buster.
Wed, May 29, 1:23 PM · Wikimedia-Etherpad, serviceops, Operations
MoritzMuehlenhoff created T224579: Migrate irc.wikimedia.org/kraz to Stretch/Buster.
Wed, May 29, 1:22 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:17 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:15 PM · Operations
MoritzMuehlenhoff created T224577: Migrate etcd networking cluster to Stretch/Buster.
Wed, May 29, 1:14 PM · serviceops, Kubernetes, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:13 PM · Operations
MoritzMuehlenhoff created T224576: Upgrade install servers to Stretch/Buster.
Wed, May 29, 1:13 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:10 PM · Operations
MoritzMuehlenhoff added a parent task for T176774: Reimage cobalt as stretch: T224549: Track remaining jessie systems in production.
Wed, May 29, 1:09 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Gerrit, Operations
MoritzMuehlenhoff added a subtask for T224549: Track remaining jessie systems in production: T176774: Reimage cobalt as stretch.
Wed, May 29, 1:09 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:09 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 1:08 PM · Operations
MoritzMuehlenhoff created T224575: Migrate ununpentium/RT to Stretch/Buster.
Wed, May 29, 1:07 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:56 PM · Operations
MoritzMuehlenhoff created T224574: Migrate Kubernetes etcd clusters to Stretch/Buster.
Wed, May 29, 12:56 PM · serviceops, Kubernetes, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:47 PM · Operations
MoritzMuehlenhoff created T224572: Migrate pool counters to Stretch/Buster.
Wed, May 29, 12:47 PM · serviceops, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:36 PM · Operations
MoritzMuehlenhoff created T224571: Migrate auth* servers to Stretch/Buster.
Wed, May 29, 12:36 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:34 PM · Operations
MoritzMuehlenhoff created T224570: Migrate pybal-test2001 away from jessie.
Wed, May 29, 12:34 PM · Pybal, Traffic, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:28 PM · Operations
Restricted Application added a project to T224569: Migrate ORES Redis servers to Stretch/Buster: Scoring-platform-team.
Wed, May 29, 12:28 PM · Scoring-platform-team, ORES, serviceops, Operations
MoritzMuehlenhoff created T224569: Migrate ORES Redis servers to Stretch/Buster.
Wed, May 29, 12:27 PM · Scoring-platform-team, ORES, serviceops, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:18 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:17 PM · Operations
MoritzMuehlenhoff created T224568: Migrate etcd cluster for Kubernetes staging cluster to Stretch/Buster.
Wed, May 29, 12:17 PM · Kubernetes, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:14 PM · Operations
MoritzMuehlenhoff created T224567: Migrate debug proxies to Stretch/Buster.
Wed, May 29, 12:13 PM · serviceops, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:10 PM · Operations
MoritzMuehlenhoff created T224566: Reimage cloudvirtan* to Stretch.
Wed, May 29, 12:10 PM · cloud-services-team, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 12:07 PM · Operations
MoritzMuehlenhoff created T224565: Migrate mwlog/udp2log servers to Stretch/Buster.
Wed, May 29, 12:07 PM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 11:44 AM · Operations
MoritzMuehlenhoff created T224564: Reimage wezen to Stretch (and rename to centrallog2001).
Wed, May 29, 11:44 AM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 11:41 AM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 11:38 AM · Operations
MoritzMuehlenhoff created T224563: Migrate dumpsdata hosts to Stretch/Buster.
Wed, May 29, 11:38 AM · Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 11:35 AM · Operations
MoritzMuehlenhoff added a subtask for T224549: Track remaining jessie systems in production: T224562: Decommission darmstadtium.
Wed, May 29, 11:35 AM · Operations
MoritzMuehlenhoff added a parent task for T224562: Decommission darmstadtium: T224549: Track remaining jessie systems in production.
Wed, May 29, 11:35 AM · Operations, Kubernetes
MoritzMuehlenhoff created T224562: Decommission darmstadtium.
Wed, May 29, 11:34 AM · Operations, Kubernetes
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 11:31 AM · Operations
MoritzMuehlenhoff created T224561: Migrate remaining cloudvirt hosts to Stretch/Mitaka.
Wed, May 29, 11:31 AM · cloud-services-team, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 11:26 AM · Operations
MoritzMuehlenhoff created T224560: Migrate Zookeeper/etcd conf cluster in codfw to Stretch.
Wed, May 29, 11:26 AM · serviceops, Operations
MoritzMuehlenhoff updated the task description for T224549: Track remaining jessie systems in production.
Wed, May 29, 11:20 AM · Operations