
MoritzMuehlenhoff (Moritz Mühlenhoff)
User

User Details

User Since
Apr 1 2015, 4:33 PM (211 w, 4 d)
Availability
Available
LDAP User
Moritz Mühlenhoff
MediaWiki User
MMuhlenhoff (WMF)

Recent Activity

Thu, Apr 18

MoritzMuehlenhoff created T221376: Broken network connection on ms-be1044 after reboot.
Thu, Apr 18, 2:31 PM · Operations, ops-eqiad
MoritzMuehlenhoff added a comment to T221343: puppet fails to run in cp1008 under certain conditions.

but this was working as expected before

Thu, Apr 18, 9:09 AM · Packaging, Puppet, Operations

Wed, Apr 17

MoritzMuehlenhoff created T221256: Jenkins plugins security update 2019-04-17.
Wed, Apr 17, 4:02 PM · Jenkins, Continuous-Integration-Infrastructure

Tue, Apr 16

MoritzMuehlenhoff updated the task description for T216384: Integrate Stretch 9.8 point update.
Tue, Apr 16, 10:49 AM · Patch-For-Review, Operations

Mon, Apr 15

MoritzMuehlenhoff added a comment to T220931: deploy1001 cannot reach cloudweb2001-dev.wikimedia.org when running scap.

JFTR; this needs to be dropped from hieradata/common/scap/dsh.yaml

Mon, Apr 15, 9:54 AM · Patch-For-Review, Scap, Operations, cloud-services-team
MoritzMuehlenhoff closed T203069: Deploy wikidiff2 v1.8.1 with changed signature as Resolved.

The PHP extension is now also fully deployed (with the exception of the labweb/wikitech hosts, but they still need the migration to PHP 7.2 and are currently HHVM-only)

Mon, Apr 15, 9:46 AM · Patch-For-Review, WMDE-QWERTY-Season-Sprint-2019-03-20, WMDE-QWERTY-Sprint-2019-03-06, WMDE-QWERTY-Sprint-2019-01-23, WMDE-QWERTY-Sprint-2019-01-10, WMDE-QWERTY-Sprint-2018-08-29, wikidiff2, MediaWiki-History-and-Diffs, TCB-Team
MoritzMuehlenhoff closed T203069: Deploy wikidiff2 v1.8.1 with changed signature, a subtask of T194272: Clean up config variable handling, as Resolved.
Mon, Apr 15, 9:46 AM · MW-1.33-notes (1.33.0-wmf.23; 2019-03-26), Patch-For-Review, WMDE-QWERTY-Sprint-2019-01-10, WMDE-QWERTY-X-Mas-Sprint-2018-12-18, WMDE-QWERTY-Sprint-2018-08-29, MW-1.32-notes (WMF-deploy-2018-08-28 (1.32.0-wmf.19)), WMDE-QWERTY-Sprint-2018-08-14, WMDE-MediaWiki-maintenance, wikidiff2, WMDE-QWERTY-Team, MediaWiki-History-and-Diffs, TCB-Team

Fri, Apr 12

MoritzMuehlenhoff added a comment to T209707: tagged_interface sometimes exceeds IFNAMSIZ.

We could also look into a backport of https://github.com/systemd/systemd/commit/9009d3b5c3b6d191be69215736be77583e0f23f9 to Stretch. That seems totally doable, and once it's confirmed to work fine in our environment, we can submit it as a merge request to the Debian systemd maintainers for a Stretch point release (every point release ships backports of important bugfixes, e.g. https://tracker.debian.org/news/1037358/accepted-systemd-232-25deb9u10-source-into-proposed-updates-stable-new-proposed-updates/ ).

Fri, Apr 12, 2:22 PM · Patch-For-Review, Traffic, Operations
MoritzMuehlenhoff triaged T220820: Add a CI check for the use of hiera() function as Low priority.
Fri, Apr 12, 1:30 PM · Puppet, Operations
MoritzMuehlenhoff created T220820: Add a CI check for the use of hiera() function.
Fri, Apr 12, 1:30 PM · Puppet, Operations
MoritzMuehlenhoff added a comment to T203069: Deploy wikidiff2 v1.8.1 with changed signature.

Please remember that, when you do the upgrade to PHP 7.2, you also need to clear the whole opcache for the same reasons we needed to clear the HHVM cache.

Fri, Apr 12, 9:53 AM · Patch-For-Review, WMDE-QWERTY-Season-Sprint-2019-03-20, WMDE-QWERTY-Sprint-2019-03-06, WMDE-QWERTY-Sprint-2019-01-23, WMDE-QWERTY-Sprint-2019-01-10, WMDE-QWERTY-Sprint-2018-08-29, wikidiff2, MediaWiki-History-and-Diffs, TCB-Team
MoritzMuehlenhoff added a comment to T220787: Fix RAID handler alert and puppet facter to work with Gen10 hosts and ssacli tool.

In addition to T220787#5106275, off the top of my head I think we also need to:

  • check if the DSA script we're using to alarm on HP raid ( modules/raid/files/dsa-check-hpssacli ) has been updated upstream (Debian) and update it or patch it and send the patch upstream (cc @faidon )
Fri, Apr 12, 9:20 AM · Patch-For-Review, Operations, Icinga, monitoring
MoritzMuehlenhoff added a comment to T219461: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups.

@RobH @faidon Re: T219461#5103942 I wonder if we should document this step as one to do for these models.

Fri, Apr 12, 7:55 AM · Patch-For-Review, ops-codfw, Operations, DBA
MoritzMuehlenhoff added a comment to T220787: Fix RAID handler alert and puppet facter to work with Gen10 hosts and ssacli tool.

We need to extend the "raid" fact in modules/raid/lib/facter/raid.rb to also detect the Gen10 controller and then return a custom fact (e.g. "ssa"). modules/raid/manifests/init.pp can then be updated in a subsequent step to automatically install the ssacli tool on the Smart Array Gen10 RAID systems.

Fri, Apr 12, 7:13 AM · Patch-For-Review, Operations, Icinga, monitoring
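A minimal sketch of the detection step described above, written in Python for illustration only. The real fact lives in modules/raid/lib/facter/raid.rb and is Ruby; this sketch assumes (hypothetically) that the fact inspects `lspci -v` style output and keys on the subsystem string a Gen10 Smart Array controller reports. The function name `raid_fact` is hypothetical, and "ssa" is the example value suggested in the comment.

```python
"""Hypothetical sketch of Gen10 Smart Array detection for the "raid" fact.

Assumption: input is `lspci -v` style text. The production fact is Ruby
(modules/raid/lib/facter/raid.rb) and may detect controllers differently.
"""


def raid_fact(lspci_output: str) -> list[str]:
    """Return the sorted, de-duplicated RAID fact values found in lspci output."""
    detected = set()
    for line in lspci_output.splitlines():
        stripped = line.strip()
        # Only "Subsystem:" lines carry the controller's product name,
        # e.g. "Hewlett-Packard Company Smart Array P408i-a SR Gen10".
        if not stripped.startswith("Subsystem:"):
            continue
        if "Smart Array" in stripped and "Gen10" in stripped:
            detected.add("ssa")  # example fact value suggested in the comment
    return sorted(detected)


if __name__ == "__main__":
    sample = (
        "3b:00.0 RAID bus controller: example controller line\n"
        "\tSubsystem: Hewlett-Packard Company Smart Array P408i-a SR Gen10\n"
    )
    print(raid_fact(sample))
```

With a fact like this in place, init.pp could then install ssacli only on hosts where the fact includes "ssa", which is the follow-up step the comment describes.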
MoritzMuehlenhoff added a comment to T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.

HPE renamed the tool, I installed "ssacli" and now "ssacli controller all show config" works fine.

Fri, Apr 12, 6:22 AM · DBA
MoritzMuehlenhoff added a comment to T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.

I tried 4.19 on db2102, doesn't make a difference.

Fri, Apr 12, 6:07 AM · DBA

Thu, Apr 11

MoritzMuehlenhoff created T220716: Juniper security advisories (April 2019).
Thu, Apr 11, 4:22 PM · netops, Operations
Krenair awarded T213546: Prepare puppet infrastructure for Debian buster a Like token.
Thu, Apr 11, 4:07 PM · Patch-For-Review, Packaging, Puppet, Operations
MoritzMuehlenhoff added a comment to T213546: Prepare puppet infrastructure for Debian buster.

The only strange thing I see in puppet runs on a couple of buster instances I deal with is this:

Thu, Apr 11, 4:03 PM · Patch-For-Review, Packaging, Puppet, Operations
MoritzMuehlenhoff updated subscribers of T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.

The RAID controller shows up in early device detection by the kernel:

Thu, Apr 11, 3:59 PM · DBA
MoritzMuehlenhoff added a comment to T213546: Prepare puppet infrastructure for Debian buster.

I think we can close this one?

Thu, Apr 11, 3:24 PM · Patch-For-Review, Packaging, Puppet, Operations
MoritzMuehlenhoff added a comment to T203069: Deploy wikidiff2 v1.8.1 with changed signature.

The HHVM extension has been fully rolled out to production. The PHP extension (built from a different source package) is still TBD.

Thu, Apr 11, 3:09 PM · Patch-For-Review, WMDE-QWERTY-Season-Sprint-2019-03-20, WMDE-QWERTY-Sprint-2019-03-06, WMDE-QWERTY-Sprint-2019-01-23, WMDE-QWERTY-Sprint-2019-01-10, WMDE-QWERTY-Sprint-2018-08-29, wikidiff2, MediaWiki-History-and-Diffs, TCB-Team
MoritzMuehlenhoff added a comment to T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host.

This is weird; do we have a second server of that model for comparison? I don't even see the controller in lspci (it should identify as "Subsystem: Hewlett-Packard Company Smart Array P408i-a SR Gen10"), so I'd like to rule out a hardware/connection issue with that specific server.

Thu, Apr 11, 3:03 PM · DBA
MoritzMuehlenhoff added a comment to T40010: Re-evaluate librsvg as SVG renderer on Wikimedia wikis.

It has everything we need, then. If you feel like backporting this to Stretch, I can write the Thumbor engine and tests for it. Then we can test it easily via config override on a specific Thumbor server.

Thu, Apr 11, 10:30 AM · TechCom-RFC, MediaWiki-File-management, Commons, Multimedia, Wikimedia-SVG-rendering
MoritzMuehlenhoff updated the task description for T220565: offboard tilman bayer.
Thu, Apr 11, 9:10 AM · Operations, SRE-Access-Requests
MoritzMuehlenhoff added a comment to T40010: Re-evaluate librsvg as SVG renderer on Wikimedia wikis.

I can't find the man page for the resvg command-line tool. What it needs to support is rendering to a specific width and the ability to set the language you want rendered (for multilingual SVGs).

Thu, Apr 11, 8:13 AM · TechCom-RFC, MediaWiki-File-management, Commons, Multimedia, Wikimedia-SVG-rendering
MoritzMuehlenhoff added a comment to T218575: Reallocate LDAP database from labtestservices2001.

These test hosts don't use replication and are standalone, right? I think then we can simply do a "slapcat > foo.ldif" on the old trusty host, then stop slapd on the new stretch host and use slapadd to transfer the LDIF data.

Thu, Apr 11, 7:39 AM · Patch-For-Review, cloud-services-team (Kanban)
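The migration suggested above can be written down as an ordered dry-run plan. This is a hypothetical Python sketch, not the procedure actually used: `migration_plan` is an illustrative name, and the chown/start steps assume Debian's default layout (database in /var/lib/ldap, slapd running as the openldap user) and an empty database on the new host. Copying the LDIF file between hosts is deliberately left out.

```python
# Hypothetical dry-run planner for the slapcat/slapadd migration above.
# Assumptions (not from the ticket): Debian default layout with the database
# in /var/lib/ldap owned by the openldap user, and an empty target database.


def migration_plan(ldif_path: str = "foo.ldif") -> list[tuple[str, list[str]]]:
    """Return (host, argv) pairs in the order they should run."""
    return [
        # Export the whole directory to LDIF on the old trusty host.
        ("old", ["slapcat", "-l", ldif_path]),
        # (Transferring the LDIF file to the new host is not covered here.)
        # slapadd writes the database directly, so slapd must be stopped first.
        ("new", ["systemctl", "stop", "slapd"]),
        ("new", ["slapadd", "-l", ldif_path]),
        # slapadd run as root leaves root-owned files; hand them back to slapd.
        ("new", ["chown", "-R", "openldap:openldap", "/var/lib/ldap"]),
        ("new", ["systemctl", "start", "slapd"]),
    ]


if __name__ == "__main__":
    # Print the plan instead of executing anything.
    for host, argv in migration_plan():
        print(f"[{host}] {' '.join(argv)}")
```

Keeping it as a printed plan rather than executing commands makes the ordering explicit (stop before slapadd) without touching either host.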
MoritzMuehlenhoff updated the task description for T212772: Track remaining trusty servers in production.
Thu, Apr 11, 7:14 AM · cloud-services-team (Kanban), Patch-For-Review, Operations
MoritzMuehlenhoff closed T186288: replace all Ubuntu (trusty) hosts in production with Debian as Resolved.

This ticket is superseded by https://phabricator.wikimedia.org/T212772

Thu, Apr 11, 7:13 AM · Epic, cloud-services-team, Operations

Wed, Apr 10

MoritzMuehlenhoff triaged T220600: Remove PHP 7.0 from production application servers as Normal priority.
Wed, Apr 10, 12:35 PM · serviceops, Operations
MoritzMuehlenhoff created T220600: Remove PHP 7.0 from production application servers.
Wed, Apr 10, 12:35 PM · serviceops, Operations
MoritzMuehlenhoff added a comment to T215810: Package envoy 1.9.X for stretch and use it as redis proxy on docker registry.

Building 1.9.1 due to CVE

Wed, Apr 10, 9:15 AM · Patch-For-Review, User-fsero, serviceops, Prod-Kubernetes, Kubernetes, Operations
MoritzMuehlenhoff added a comment to T220565: offboard tilman bayer.

He wasn't removed from the cn=wmf LDAP group, I fixed that:

Wed, Apr 10, 7:41 AM · Operations, SRE-Access-Requests

Tue, Apr 9

MoritzMuehlenhoff triaged T220505: Decommission iron as Normal priority.
Tue, Apr 9, 1:49 PM · ops-eqiad, decommission, Operations
MoritzMuehlenhoff created T220505: Decommission iron.
Tue, Apr 9, 1:48 PM · ops-eqiad, decommission, Operations
MoritzMuehlenhoff triaged T220503: Decommission neodymium as Normal priority.
Tue, Apr 9, 1:47 PM · decommission, ops-eqiad, Operations
MoritzMuehlenhoff triaged T220504: Decommission sarin as Normal priority.
Tue, Apr 9, 1:47 PM · Operations, decommission, ops-codfw
MoritzMuehlenhoff created T220504: Decommission sarin.
Tue, Apr 9, 1:45 PM · Operations, decommission, ops-codfw
MoritzMuehlenhoff created T220503: Decommission neodymium.
Tue, Apr 9, 1:44 PM · decommission, ops-eqiad, Operations
MoritzMuehlenhoff closed T219274: cronspam: cross-validate-accounts, a subtask of T132324: Tracking and Reducing cron-spam to root@ , as Resolved.
Tue, Apr 9, 10:48 AM · Patch-For-Review, Operations
MoritzMuehlenhoff closed T219274: cronspam: cross-validate-accounts as Resolved.

I would say the team responsible is the entire SRE which historically was the same as people receiving root mail.

Tue, Apr 9, 10:48 AM · Operations
MoritzMuehlenhoff added a comment to T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.

The https://rocm.github.io/ROCmInstall.html module lists among the things to do the following:

  • echo 'SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"' | sudo tee /etc/udev/rules.d/70-kfd.rules
Tue, Apr 9, 9:57 AM · Patch-For-Review, User-Elukey, Operations, Analytics, Research-management

Mon, Apr 8

MoritzMuehlenhoff added a comment to T192457: Reallocate former image scalers.

During the HHVM updates I noticed that mw2151 is in site.pp as a jobrunner, but not listed in conftool-data.

Mon, Apr 8, 1:49 PM · Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T215415: mw2206.codfw.wmnet memory issues .

Also, the error I have here doesn't tell me which memory row or channel it refers to, so it's difficult to tell which DIMM to replace. Maybe the memory is about to fail, which is why it's not logged in the HW log yet. I will take the system down and run memtest to see if that helps me find the bad DIMM.

Mon, Apr 8, 1:23 PM · User-jijiki, serviceops, ops-codfw, Operations
MoritzMuehlenhoff created T220362: Evaluate SSO solutions.
Mon, Apr 8, 11:02 AM · Operations
MoritzMuehlenhoff created T220361: Audit our infrastructure for authenticated services.
Mon, Apr 8, 11:02 AM · Operations
MoritzMuehlenhoff added a project to T220342: Only one thumbor server (thumbor1002) upgraded to librsvg 2.40.20-3: Operations.
Mon, Apr 8, 9:35 AM · Patch-For-Review, serviceops, Operations, Thumbor, Multimedia, Commons
MoritzMuehlenhoff added a comment to T220342: Only one thumbor server (thumbor1002) upgraded to librsvg 2.40.20-3.

We currently only have an apt::pin for wikimedia-thumbor, we also need one for librsvg2-2, librsvg2-bin and librsvg2-common.

Mon, Apr 8, 9:34 AM · Patch-For-Review, serviceops, Operations, Thumbor, Multimedia, Commons
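For illustration, the extra pins would end up as apt_preferences(5) stanzas with the standard `Package`, `Pin` and `Pin-Priority` fields. The Python sketch below just renders such stanzas for the three librsvg packages named above; the version glob `2.40.20-3*` and priority 1001 are assumptions chosen for the example (a priority above 1000 also permits downgrades), not the values in our Puppet tree, and `pin_stanzas` is a hypothetical helper.

```python
# Packages named in the comment; wikimedia-thumbor is already pinned.
LIBRSVG_PACKAGES = ["librsvg2-2", "librsvg2-bin", "librsvg2-common"]


def pin_stanzas(packages: list[str], pin: str, priority: int) -> str:
    """Render one apt_preferences(5) stanza per package, separated by blank lines."""
    stanzas = [
        f"Package: {pkg}\nPin: {pin}\nPin-Priority: {priority}\n"
        for pkg in packages
    ]
    return "\n".join(stanzas)


if __name__ == "__main__":
    # Assumed values: the version glob and priority are illustrative only.
    print(pin_stanzas(LIBRSVG_PACKAGES, "version 2.40.20-3*", 1001))
```

One stanza per package keeps each pin independent, so a package can later be unpinned without touching the others.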
MoritzMuehlenhoff reopened T219776: labtestnet2003.codfw.wmnet: rename to cloudnet2003-dev.codfw.wmnet and reimage to stretch as "Open".

labtestnet2003 is still in puppetdb:

Mon, Apr 8, 7:33 AM · Operations, DC-Ops, ops-codfw, Patch-For-Review, Cloud-VPS, cloud-services-team (Kanban)
MoritzMuehlenhoff reopened T219776: labtestnet2003.codfw.wmnet: rename to cloudnet2003-dev.codfw.wmnet and reimage to stretch, a subtask of T217891: CloudVPS: rework codfw deployments, as Open.
Mon, Apr 8, 7:33 AM · Cloud-VPS, cloud-services-team (Kanban)

Fri, Apr 5

MoritzMuehlenhoff added a comment to T219803: upgrade facter and puppet across the fleet.

Ah, got it. Sorry for not reading more of the context here, just saw that one line and thought "uh oh" :)

Fri, Apr 5, 4:45 PM · Patch-For-Review, Packaging, Puppet, Operations
MoritzMuehlenhoff added a comment to T220217: PHP Warning: wgWikiDiff2MovedParagraphDetectionCutoff is set WikiDiff2 does not support it.

@Krinkle, @WMDE-Fisch : Shall we depool the five servers already upgraded until that is resolved?

Fri, Apr 5, 4:25 PM · Patch-For-Review, Operations, wikidiff2, Wikimedia-production-error
MoritzMuehlenhoff added a comment to T203069: Deploy wikidiff2 v1.8.1 with changed signature.

Sure, I can do that. Once the freeze is over, I can get it into buster-backports and I assume stretch-backports-sloppy, though I'm not sure the latter would help since we'd still need to rebuild it for 7.2.

Fri, Apr 5, 7:59 AM · Patch-For-Review, WMDE-QWERTY-Season-Sprint-2019-03-20, WMDE-QWERTY-Sprint-2019-03-06, WMDE-QWERTY-Sprint-2019-01-23, WMDE-QWERTY-Sprint-2019-01-10, WMDE-QWERTY-Sprint-2018-08-29, wikidiff2, MediaWiki-History-and-Diffs, TCB-Team
MoritzMuehlenhoff added a comment to T40010: Re-evaluate librsvg as SVG renderer on Wikimedia wikis.

resvg is now available in Debian unstable: https://packages.qa.debian.org/r/resvg/news/20190403T150642Z.html

Fri, Apr 5, 7:17 AM · TechCom-RFC, MediaWiki-File-management, Commons, Multimedia, Wikimedia-SVG-rendering
MoritzMuehlenhoff added a comment to T219764: jessie rsyslog upgrade problems.

Thanks. Do we know how many production hosts are affected, if any?

Fri, Apr 5, 7:16 AM · User-fgiunchedi, Operations

Thu, Apr 4

MoritzMuehlenhoff added a comment to T198939: Decommission servermon.

Is anyone still using Servermon at this point?

Thu, Apr 4, 3:55 PM · Patch-For-Review, Operations
MoritzMuehlenhoff reassigned T215562: npm 6 consistently fails with "Z_DATA_ERROR: invalid distance too far back" on some repos from MoritzMuehlenhoff to Krinkle.

OK. Looks like the image will already be tested as part of another service deployment. Assigning back to Moritz to notify once it's up on apt-wikimedia so that I can rebuild the relevant CI images after that.

Thu, Apr 4, 11:41 AM · Operations, User-zeljkofilipin, Patch-For-Review, Continuous-Integration-Config
MoritzMuehlenhoff added a comment to T219764: jessie rsyslog upgrade problems.

I looked into this as well: the rsyslog 8.38 prerm lacks the extra [ "$1" = remove ] test guarding invoke-rc.d (8.1901 does have it), hence rsyslog gets stopped by the prerm on upgrade when it shouldn't be. Adding the extra guard to the prerm makes the upgrade work as expected:

Thu, Apr 4, 10:20 AM · User-fgiunchedi, Operations
MoritzMuehlenhoff added a comment to T220003: Add security apt security suites to pbuilder base images .

Ah, so jessie-security is partly behaving like backports in a sense. OK, so my assumption wasn't entirely correct. Thanks for explaining it. Would it make sense to guard this somehow with a condition for jessie specifically, or with some other tunable? Or would that just make things worse?

Thu, Apr 4, 10:16 AM · Patch-For-Review, Packaging, Operations
MoritzMuehlenhoff added a comment to T220003: Add security apt security suites to pbuilder base images .

Currently the /etc/apt/sources.list files for the pbuilder base images are missing entries for the security suites. These files should be updated and managed by puppet.

Why though? When creating that puppet module I avoided that on purpose, relying on the fact that security updates would make it to our hosts anyway and that package names for them remain constant. That assumption might not be true anymore, or I may very well have erred back then, but I'd like to know which of the two (or something else entirely) it is.

Thu, Apr 4, 9:47 AM · Patch-For-Review, Packaging, Operations

Wed, Apr 3

MoritzMuehlenhoff updated subscribers of T220003: Add security apt security suites to pbuilder base images .
Wed, Apr 3, 3:38 PM · Patch-For-Review, Packaging, Operations
MoritzMuehlenhoff added a comment to T219803: upgrade facter and puppet across the fleet.

Clang-4.0 is provided by the jessie/updates security suite, and I have managed to get pbuilder working by adding the following:

Wed, Apr 3, 3:14 PM · Patch-For-Review, Packaging, Puppet, Operations
MoritzMuehlenhoff added a comment to T219803: upgrade facter and puppet across the fleet.

It's a tough nut to crack, I've made progress on a number of issues, but still not fully done yet:

Wed, Apr 3, 2:00 PM · Patch-For-Review, Packaging, Puppet, Operations
MoritzMuehlenhoff updated subscribers of T219933: parsoid-vd on scandium randomly died.
Wed, Apr 3, 7:57 AM · Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T148843: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models.
Wed, Apr 3, 7:45 AM · Patch-For-Review, User-Elukey, Operations, Analytics, Research-management

Tue, Apr 2

MoritzMuehlenhoff added a comment to T219803: upgrade facter and puppet across the fleet.

This makes the build phase work (it now fails in the test suite, but that's unrelated).

Tue, Apr 2, 12:54 PM · Patch-For-Review, Packaging, Puppet, Operations
MoritzMuehlenhoff added a comment to T219803: upgrade facter and puppet across the fleet.

The later cmake version in combination with the debian/rules file tries to enable position independent ELF files, which doesn't work with libcurl-openssl from standard jessie as it's not yet built as a PIC binary. See the "hardening" section of https://manpages.debian.org/stretch/dpkg-dev/dpkg-buildflags.1.en.html for some background.

Tue, Apr 2, 12:50 PM · Patch-For-Review, Packaging, Puppet, Operations
MoritzMuehlenhoff added a comment to T219803: upgrade facter and puppet across the fleet.

I had a look at the missing packages:

Tue, Apr 2, 10:06 AM · Patch-For-Review, Packaging, Puppet, Operations
MoritzMuehlenhoff added a comment to T219764: jessie rsyslog upgrade problems.

Running the steps from the prerm on a jessie system with 8.38 works fine:

Tue, Apr 2, 9:39 AM · User-fgiunchedi, Operations
MoritzMuehlenhoff created T219854: Broken disk on ms-be2026.
Tue, Apr 2, 7:49 AM · Patch-For-Review, Operations, ops-codfw

Mon, Apr 1

MoritzMuehlenhoff added a comment to T215562: npm 6 consistently fails with "Z_DATA_ERROR: invalid distance too far back" on some repos.

@Krinkle I've prepared a new build and uploaded it to https://people.wikimedia.org/~jmm/node/

Mon, Apr 1, 1:20 PM · Operations, User-zeljkofilipin, Patch-For-Review, Continuous-Integration-Config
MoritzMuehlenhoff closed T218193: Switch dumps to component/php7.2 as Resolved.

This is done

Mon, Apr 1, 7:29 AM · Dumps-Generation, Operations
MoritzMuehlenhoff closed T218193: Switch dumps to component/php7.2, a subtask of T216712: Switch PHP 7.2 packages to an internal component, as Resolved.
Mon, Apr 1, 7:29 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge, Operations

Fri, Mar 29

MoritzMuehlenhoff added a comment to T219279: Some pages will become completely unreachable after PHP7 update due to Unicode changes.

@Anomie so you're suggesting we need to complete the switchover, then amend the problematic situations?

I would've assumed such a bug would be a blocker for larger deployments.

As I see it, we can go down the following paths:

  • "Patch" php 7.2 to behave as the preceding versions, complete the deployment, fix the now-duplicate pages, and remove the patch.
Fri, Mar 29, 11:05 AM · Core Platform Team Kanban (Doing), MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), Patch-For-Review, Core Platform Team (PHP7 (TEC4)), serviceops, Operations, PHP 7.2 support, MediaWiki-General-or-Unknown
MoritzMuehlenhoff added a comment to T215562: npm 6 consistently fails with "Z_DATA_ERROR: invalid distance too far back" on some repos.

As such, it is effectively our fault for packaging it this way. We need to either:

Fri, Mar 29, 8:28 AM · Operations, User-zeljkofilipin, Patch-For-Review, Continuous-Integration-Config

Thu, Mar 28

MoritzMuehlenhoff closed T216711: Audit our puppet tree for uses of jessie-backports as Resolved.

Closing, the remaining work for this is handled via T219333

Thu, Mar 28, 9:09 PM · Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T217055: labtestmetal2001.codfw.wmnet needs to be rebuilt.

labtestmetal2001 had /etc/apt/apt.conf.d/00backports-default-release pointing to jessie-backports, which broke debmonitor as jessie-backports has been archived, I removed the file to unbreak it (and given that the host will be rebuilt anyway).

Thu, Mar 28, 11:50 AM · cloud-services-team (Kanban)

Mar 22 2019

MoritzMuehlenhoff updated subscribers of T203069: Deploy wikidiff2 v1.8.1 with changed signature.

@Legoktm How shall we handle the PHP update of wikidiff2 now that buster is frozen? Maybe upload 1.8.1 to experimental and I'll rebuild it in component/php72 for deployment to production?

Mar 22 2019, 9:22 AM · Patch-For-Review, WMDE-QWERTY-Season-Sprint-2019-03-20, WMDE-QWERTY-Sprint-2019-03-06, WMDE-QWERTY-Sprint-2019-01-23, WMDE-QWERTY-Sprint-2019-01-10, WMDE-QWERTY-Sprint-2018-08-29, wikidiff2, MediaWiki-History-and-Diffs, TCB-Team
MoritzMuehlenhoff added a comment to T218815: Access to yarn.wikimedia.org for julia.glen.

I confirm that she has an NDA in place, so I've added uid=julianglen to the nda group in LDAP.

Mar 22 2019, 8:29 AM · Analytics-Kanban, Analytics-Cluster, Analytics
MoritzMuehlenhoff added a comment to T213493: Install PHP7 on scandium.

So first of all, why do wtp servers have php installed even? They should not, and they don't.

Mar 22 2019, 7:43 AM · Patch-For-Review, Operations, Parsoid-PHP

Mar 21 2019

MoritzMuehlenhoff added a comment to T218875: ms-be1043 - /dev/sdk disappeared.

There's already https://phabricator.wikimedia.org/T218544

Mar 21 2019, 12:53 PM · media-storage, Operations
MoritzMuehlenhoff added a comment to T215415: mw2206.codfw.wmnet memory issues .

Could it simply be a broken CPU? If we have the same CPU type in a decommissioned host, we could loot it from there.

Mar 21 2019, 12:05 PM · User-jijiki, serviceops, ops-codfw, Operations

Mar 18 2019

MoritzMuehlenhoff added a comment to T218448: Volunteer NDA for Alex Monk.

Adding SRE-Access-Requests tag as well because I'm not 100% certain if the NDA needed is just L2 or if the Cobblestone process is required.

Mar 18 2019, 3:21 PM · Operations, WMF-NDA-Requests

Mar 15 2019

MoritzMuehlenhoff updated the task description for T212772: Track remaining trusty servers in production.
Mar 15 2019, 3:24 PM · cloud-services-team (Kanban), Patch-For-Review, Operations
MoritzMuehlenhoff closed T212219: wmf-auto-restart fails on certain legacy services as Resolved.

Fix has been deployed.

Mar 15 2019, 10:07 AM · Patch-For-Review, Operations

Mar 14 2019

MoritzMuehlenhoff added a comment to T215775: Check home leftovers of ISI researchers.

@MoritzMuehlenhoff do I recall correctly that you were working with 1+ of the researchers to figure out how to keep their data?

Mar 14 2019, 10:02 AM · Research, Analytics
MoritzMuehlenhoff added a comment to T213089: Upgrade memcached for Debian Stretch/Buster.

Leaving here also a reference of https://github.com/memcached/memcached/issues/359:

Regression in systemd-based sandboxing in 1.5.6

Mar 14 2019, 9:41 AM · User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey

Mar 13 2019

MoritzMuehlenhoff added a comment to T212219: wmf-auto-restart fails on certain legacy services.

Ah, I completely forgot to merge it, it's https://gerrit.wikimedia.org/r/480520, will do that later on

Mar 13 2019, 3:02 PM · Patch-For-Review, Operations
MoritzMuehlenhoff claimed T216712: Switch PHP 7.2 packages to an internal component.
Mar 13 2019, 11:32 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge, Operations
MoritzMuehlenhoff added a subtask for T216712: Switch PHP 7.2 packages to an internal component: T218193: Switch dumps to component/php7.2.
Mar 13 2019, 10:24 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge, Operations
MoritzMuehlenhoff added a parent task for T218193: Switch dumps to component/php7.2: T216712: Switch PHP 7.2 packages to an internal component.
Mar 13 2019, 10:24 AM · Dumps-Generation, Operations
MoritzMuehlenhoff created T218193: Switch dumps to component/php7.2.
Mar 13 2019, 10:23 AM · Dumps-Generation, Operations
MoritzMuehlenhoff added a comment to T216712: Switch PHP 7.2 packages to an internal component.

Phabricator uses thirdparty/php72 I think.

Mar 13 2019, 9:32 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge, Operations
MoritzMuehlenhoff added a comment to T216712: Switch PHP 7.2 packages to an internal component.

Actually, I had only been looking at servers with php7.2-fpm installed, the deployment, maintenance and snapshot hosts will also need to be converted to the component.

Mar 13 2019, 9:31 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge, Operations
MoritzMuehlenhoff added a comment to T216712: Switch PHP 7.2 packages to an internal component.

Production has been switched to the new component, all working fine. The new approach is also fairly straightforward, the upgrade from 7.2.15 to 7.2.16 was a straightforward import/build of the new php7.2 source package with all extensions continuing to work fine.

Mar 13 2019, 9:06 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge, Operations

Mar 12 2019

MoritzMuehlenhoff added a comment to T217412: Enable encryption and authentication for TLS-based Hadoop services.

use a self-signed CA, generate one certificate for each hostname via cergen, and deploy them via puppet. This is more cumbersome maintenance-wise, but it would completely separate concerns from puppet.

Mar 12 2019, 3:06 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
MoritzMuehlenhoff updated subscribers of T130593: investigate slapd memory leak.

@GTirloni upgraded OpenLDAP on serpens to 2.4.47, but that doesn't change the memory leak.

Mar 12 2019, 8:31 AM · LDAP, cloud-services-team (Kanban), Operations, Cloud-VPS

Mar 11 2019

MoritzMuehlenhoff added a comment to T212774: Upgrade jenkins-debian-glue to v0.20.0.
  • the source package is a Debian native package, seems to qualify for the main component.
Mar 11 2019, 10:52 AM · Release-Engineering-Team (Kanban), Patch-For-Review, Operations, Packaging, Continuous-Integration-Infrastructure
MoritzMuehlenhoff added a comment to T213077: Migrate Kartotherian/Tilerator to Node 10.

@MSantos : The service::node class which is used by Kartotherian/Tilerator recently gained an option $use_nodejs10, you can extend the kartotherian/tilerator Puppet classes with a new parameter for node10 which then gets passed down to the service::node class and then enable it in deployment-prep. See modules/aqs/manifests/init.pp which already does that.

Mar 11 2019, 10:23 AM · Maps (Kartotherian), Epic, Reading-Infrastructure-Team-Backlog
MoritzMuehlenhoff closed T216493: Proton fails with Chromium 72.0.3626.96 as Resolved.

The Chromium update has been rolled out, closing the task. I've also notified https://github.com/GoogleChrome/puppeteer/issues/4040 that this seems to be caused by a Chromium regression.

Mar 11 2019, 10:13 AM · Reading-Infrastructure-Team-Backlog (Kanban), Core Platform Team Backlog (Watching / External), Services (watching), Proton, Operations