MoritzMuehlenhoff (Moritz Mühlenhoff)
User

User Details

User Since
Apr 1 2015, 4:33 PM (181 w, 5 d)
Availability
Available
LDAP User
Moritz Mühlenhoff
MediaWiki User
MMuhlenhoff (WMF) [ Global Accounts ]

Recent Activity

Today

MoritzMuehlenhoff added a project to T205364: helium (bacula) - Device not healthy -SMART-: ops-eqiad.
Tue, Sep 25, 7:11 AM · ops-eqiad, Operations

Yesterday

MoritzMuehlenhoff created T205287: Degraded RAID on rdb1004.
Mon, Sep 24, 2:08 PM · Operations
MoritzMuehlenhoff added a comment to T201343: rack/setup/install mwmaint1002.eqiad.wmnet.

Disk space on the root partition of mwmaint1002 is depleted, which results in failing Puppet runs.

Mon, Sep 24, 12:17 PM · Patch-For-Review, ops-eqiad, Operations
MoritzMuehlenhoff added a comment to T203625: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild .

Compared to the rest, mwdebug* are VMs. How large is the difference from the other servers that you were seeing?

Mon, Sep 24, 11:16 AM · Release-Engineering-Team, Scap, Operations
MoritzMuehlenhoff triaged T204993: Update certspotter as Normal priority.
Mon, Sep 24, 10:10 AM · Traffic, Operations
MoritzMuehlenhoff updated subscribers of T204993: Update certspotter.

Adding the Debian maintainer :-) This seems to be fixed in 0.9-1, so updating stretch-backports to 0.9 could fix this.

Mon, Sep 24, 10:10 AM · Traffic, Operations
MoritzMuehlenhoff triaged T203520: decommission thulium.frack.eqiad.wmnet as Normal priority.
Mon, Sep 24, 10:08 AM · ops-eqiad, Operations
MoritzMuehlenhoff triaged T205240: MCE errors on mw2181 / temperature warnings as Normal priority.
Mon, Sep 24, 10:07 AM · Operations, ops-codfw
MoritzMuehlenhoff assigned T204567: ms-be2030 spontaneous reboot to Papaul.
Mon, Sep 24, 10:07 AM · ops-codfw, Operations
MoritzMuehlenhoff added a comment to T204450: Why doesn't profile::mediawiki::nutcracker create /var/run/nutcracker/ ?.

The .sock file is created via systemd-tmpfiles, which is only read during boot; the socket will be created with the next restart.

Mon, Sep 24, 10:05 AM · Puppet, Operations
MoritzMuehlenhoff assigned T204479: Heating alerts on kafka1014 to elukey.
Mon, Sep 24, 10:04 AM · Operations, ops-eqiad
MoritzMuehlenhoff closed T204491: Heating alerts / memory errors on mw1254 as Resolved.

This error hasn't resurfaced, I'm closing the task.

Mon, Sep 24, 10:02 AM · Operations, ops-eqiad
MoritzMuehlenhoff added a comment to T203254: labstore1004 and labstore1005 high load issues following upgrades.

Nice. If these are confirmed working, we should import my nfs-utils backport to apt.wikimedia.org. Should these go to a separate component (something like component/nfs13, which is then added to selected NFS servers) or be added in general? Apart from labstore100[4/5], we also have labstore100[67] and dumpsdata; should these be updated as well? If yes, we can also simply import the packages to apt.wikimedia.org/main.

Mon, Sep 24, 7:17 AM · Patch-For-Review, cloud-services-team (Kanban)
MoritzMuehlenhoff created T205240: MCE errors on mw2181 / temperature warnings.
Mon, Sep 24, 6:41 AM · Operations, ops-codfw

Fri, Sep 21

MoritzMuehlenhoff renamed T199029: 1.31.0 tarball is missing .htaccess files (CVE-2018-13258) from 1.31.0 tarball is missing .htaccess files to 1.31.0 tarball is missing .htaccess files (CVE-2018-13258).
Fri, Sep 21, 7:26 AM · MW-1.27-release-notes, MW-1.32-release-notes (WMF-deploy-2018-09-25 (1.32.0-wmf.23)), MW-1.31-release-notes, MW-1.30-release-notes, MW-1.29-release-notes, MW-1.31-release, Security
MoritzMuehlenhoff renamed T187638: When a log event is (partially) hidden Special:Redirect/logid can link to the incorrect log and reveal hidden information (CVE-2018-0504) from When a log event is (partially) hidden Special:Redirect/logid can link to the incorrect log and reveal hidden information to When a log event is (partially) hidden Special:Redirect/logid can link to the incorrect log and reveal hidden information (CVE-2018-0504).
Fri, Sep 21, 7:25 AM · Patch-For-Review, Vuln-Infoleak, MediaWiki-Revision-deletion, MediaWiki-Special-pages, Security
MoritzMuehlenhoff renamed T194605: BotPassword can bypass CentralAuth's account lock (CVE-2018-0505) from BotPassword can bypass CentralAuth's account lock to BotPassword can bypass CentralAuth's account lock (CVE-2018-0505).
Fri, Sep 21, 7:24 AM · MW-1.32-release-notes (WMF-deploy-2018-09-25 (1.32.0-wmf.23)), MW-1.31-release-notes, MW-1.27-release-notes, MW-1.30-release-notes, MW-1.29-release-notes, MediaWiki-Authentication-and-authorization, MediaWiki-extensions-CentralAuth, Security
MoritzMuehlenhoff renamed T169545: $wgRateLimits (rate limit / ping limiter) entry for 'user' overrides that for 'newbie' (CVE-2018-0503) from $wgRateLimits (rate limit / ping limiter) entry for 'user' overrides that for 'newbie' to $wgRateLimits (rate limit / ping limiter) entry for 'user' overrides that for 'newbie' (CVE-2018-0503).
Fri, Sep 21, 7:24 AM · MW-1.27-release-notes, MW-1.32-release-notes (WMF-deploy-2018-09-25 (1.32.0-wmf.23)), MW-1.31-release-notes, MW-1.30-release-notes, MW-1.29-release-notes, Security, MediaWiki-General-or-Unknown

Thu, Sep 20

MoritzMuehlenhoff closed T204667: Ferm leftovers on labtestnet2003 as Resolved.

@MoritzMuehlenhoff, is there anything else that needs to be done for this task?

Thu, Sep 20, 3:34 PM · cloud-services-team, Operations
MoritzMuehlenhoff added a comment to T204491: Heating alerts / memory errors on mw1254.

Ok, I've repooled the server for now.

Thu, Sep 20, 1:20 PM · Operations, ops-eqiad
MoritzMuehlenhoff closed T204812: mc1021 boot failure as Resolved.

Thanks, I finished the reimage via install_console and re-added it to Icinga; everything looks fine now.

Thu, Sep 20, 8:36 AM · ops-eqiad, Operations

Wed, Sep 19

MoritzMuehlenhoff created T204812: mc1021 boot failure.
Wed, Sep 19, 10:07 AM · ops-eqiad, Operations
MoritzMuehlenhoff added a comment to T201470: Add contint-roots to releases{1,2}001.

For Jenkins, Release-Engineering-Team and Moritz receive the security email notification. Release Engineering manually files a task upon receipt, and we already coordinate with Moritz for the upload to apt.wikimedia.org. That seems to be working smoothly.

Wed, Sep 19, 7:23 AM · Patch-For-Review, Release-Engineering-Team (Watching / External), SRE-Access-Requests, Operations

Tue, Sep 18

MoritzMuehlenhoff created T204730: Enable cumin2001 in router ACLs.
Tue, Sep 18, 3:31 PM · Operations, netops
MoritzMuehlenhoff added a comment to T204567: ms-be2030 spontaneous reboot.

Server went down again at 10:45 UTC.

Tue, Sep 18, 10:54 AM · ops-codfw, Operations
MoritzMuehlenhoff created T204667: Ferm leftovers on labtestnet2003.
Tue, Sep 18, 9:28 AM · cloud-services-team, Operations

Mon, Sep 17

MoritzMuehlenhoff added a comment to T204604: Add "do not use this server" login message to non active mwmaint* server.

I've added the "Datacenter-Switchover-2018" project, as this was filed in response to a question in the staff channel (where the active maintenance server wasn't obvious). I'm not sure whether that's over-stretching the use case of that project; if so, please remove it.

Mon, Sep 17, 8:53 PM · Datacenter-Switchover-2018, Operations
MoritzMuehlenhoff added a project to T204604: Add "do not use this server" login message to non active mwmaint* server: Datacenter-Switchover-2018.
Mon, Sep 17, 8:51 PM · Datacenter-Switchover-2018, Operations
MoritzMuehlenhoff created T204491: Heating alerts / memory errors on mw1254.
Mon, Sep 17, 11:06 AM · Operations, ops-eqiad
MoritzMuehlenhoff added a comment to T203239: Create Debian packages for Node.js 10 upgrade.

nodejs 10 packages for stretch-wikimedia are now available in the repository component "component/node10" for testing. I'm keeping this bug open to track possible further additions (addons, etc.).

Mon, Sep 17, 9:53 AM · Patch-For-Review, Readers-Web-Backlog (Tracking), Services (watching), Operations
MoritzMuehlenhoff created T204479: Heating alerts on kafka1014.
Mon, Sep 17, 8:59 AM · Operations, ops-eqiad

Fri, Sep 14

MoritzMuehlenhoff added a comment to T199125: rack/setup/install cloudvirt102[34].

As I had already backported the megaraid_sas driver for the Perc 740/840 to the 4.9 stretch kernel, I ran some tests on backup2001 (which has the new controller) and acamar (which has an older Perc controller running the megaraid_sas driver); both were successful. I submitted the backport to the Debian kernel team in https://salsa.debian.org/kernel-team/linux/merge_requests/61

Fri, Sep 14, 10:49 AM · Patch-For-Review, cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
MoritzMuehlenhoff added a comment to T203254: labstore1004 and labstore1005 high load issues following upgrades.

I've created a backport of the nfs-utils package from stretch for jessie. It's not yet uploaded to apt.wikimedia.org, but it is available at https://people.wikimedia.org/~jmm/nfs/

Fri, Sep 14, 10:25 AM · Patch-For-Review, cloud-services-team (Kanban)
MoritzMuehlenhoff closed T202363: Requesting access to restricted production access and analytics-privatedata-users for Ty Hargrove as Resolved.

Closing the task, please reopen if it doesn't work for you.

Fri, Sep 14, 7:13 AM · Patch-For-Review, Operations, SRE-Access-Requests
MoritzMuehlenhoff reassigned T202486: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou from Kalliope to ayounsi.
Fri, Sep 14, 7:12 AM · Patch-For-Review, Operations, SRE-Access-Requests

Thu, Sep 13

MoritzMuehlenhoff closed T202563: Access to restbase servers (including sudo) for Imarlier as Resolved.

I've added Ian to restbase-roots.

Thu, Sep 13, 5:02 PM · Patch-For-Review, Operations, SRE-Access-Requests
MoritzMuehlenhoff added a comment to T202563: Access to restbase servers (including sudo) for Imarlier.

This was approved in the SRE meeting on Monday.

Thu, Sep 13, 4:55 PM · Patch-For-Review, Operations, SRE-Access-Requests
MoritzMuehlenhoff closed T204156: setup/install cumin2001.eqiad.wmnet as Resolved.

Closing this task; the actual implementation will happen via T177385.

Thu, Sep 13, 12:58 PM · Operations
MoritzMuehlenhoff added a comment to T204156: setup/install cumin2001.eqiad.wmnet.

@RobH Ack, I'll take care of that.

Thu, Sep 13, 6:51 AM · Operations

Mon, Sep 10

MoritzMuehlenhoff added a comment to T203840: Add which ldap groups can login on netbox login form.

That would require code changes in Netbox and doesn't seem worth the overhead. Alex documented the access at https://wikitech.wikimedia.org/wiki/LDAP/Groups#Specific_groups and I think that's good enough.

Mon, Sep 10, 4:02 PM · Operations
MoritzMuehlenhoff added a comment to T202476: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2).

@thiemowmde Does the access work for you?

Mon, Sep 10, 2:50 PM · Patch-For-Review, SRE-Access-Requests, Operations, User-Addshore, wikidiff2
MoritzMuehlenhoff added a comment to T202658: request to add phedenskog to perf-roots.

@Peter Does the access work for you?

Mon, Sep 10, 2:50 PM · Patch-For-Review, SRE-Access-Requests, Operations
MoritzMuehlenhoff closed T194835: mw2182 crash as Resolved.

The server has been running fine for a while; closing the task.

Mon, Sep 10, 1:22 PM · ops-codfw, Operations
MoritzMuehlenhoff closed T169290: New anti-stackclash (4.9.25-1~bpo8+3 ) kernel super bad for NFS as Resolved.

This is resolved; the jessie-based labstore servers have been running 4.9 for a few weeks.

Mon, Sep 10, 1:20 PM · Upstream, cloud-services-team, Operations
MoritzMuehlenhoff closed T169290: New anti-stackclash (4.9.25-1~bpo8+3 ) kernel super bad for NFS, a subtask of T169289: Tool Labs 2017-06-29 Labstore100[45] kernel upgrade issues, as Resolved.
Mon, Sep 10, 1:20 PM · Toolforge, Cloud-Services
MoritzMuehlenhoff closed T203851: Trying to install updated versions of "linux-meta linux-meta-4.9" fails as Resolved.

This is fixed in 1.20+deb9u2 which only builds the "linux-meta-4.14" package for stretch, "linux-meta-4.9" isn't relevant/needed for stretch.

Mon, Sep 10, 10:28 AM · Operations
MoritzMuehlenhoff closed T201196: analytics-privatedata-users access for Dario Rossi (username drossi) as Resolved.
Mon, Sep 10, 9:07 AM · Patch-For-Review, SRE-Access-Requests, Operations
MoritzMuehlenhoff closed T201196: analytics-privatedata-users access for Dario Rossi (username drossi), a subtask of T200800: NDA access for Telecom Paristech Research Team, as Resolved.
Mon, Sep 10, 9:06 AM · Operations, SRE-Access-Requests
MoritzMuehlenhoff closed T201199: analytics-privatedata-users access for Flavia Salutari as Resolved.
Mon, Sep 10, 9:06 AM · Patch-For-Review, SRE-Access-Requests, Operations
MoritzMuehlenhoff closed T201199: analytics-privatedata-users access for Flavia Salutari, a subtask of T200800: NDA access for Telecom Paristech Research Team, as Resolved.
Mon, Sep 10, 9:06 AM · Operations, SRE-Access-Requests
MoritzMuehlenhoff assigned T194176: wtp2020 correctable memory errors to Papaul.
Mon, Sep 10, 7:26 AM · Parsing-Team, Operations, ops-codfw
MoritzMuehlenhoff merged task T203265: wtp2020 - Memory correctable errors -EDAC- into T194176: wtp2020 correctable memory errors.
Mon, Sep 10, 7:25 AM · Operations
MoritzMuehlenhoff merged T203265: wtp2020 - Memory correctable errors -EDAC- into T194176: wtp2020 correctable memory errors.
Mon, Sep 10, 7:25 AM · Parsing-Team, Operations, ops-codfw

Sun, Sep 9

Krinkle awarded T203239: Create Debian packages for Node.js 10 upgrade an Orange Medal token.
Sun, Sep 9, 10:55 PM · Patch-For-Review, Readers-Web-Backlog (Tracking), Services (watching), Operations

Fri, Sep 7

MoritzMuehlenhoff added a comment to T202255: Support for QLogic FastLinQ 41112 Dual Port 10Gb SFP+ Adapter.

I now have a stretch netboot image with a 4.14 kernel which PXE boots via the QLogic 41xx adapter. In d-i I'm getting a strange error message saying that no modules could be found (although lsmod shows plenty of kernel modules loaded); I need to dig a bit further into d-i to find out what is causing that.

Fri, Sep 7, 11:32 AM · Operations
MoritzMuehlenhoff added a comment to T203239: Create Debian packages for Node.js 10 upgrade.

nodejs 10 packages will be in a separate repository component, allowing applications to gradually move over. We'll continue to support nodejs 6 with security updates until all applications are migrated to 10.

Fri, Sep 7, 8:43 AM · Patch-For-Review, Readers-Web-Backlog (Tracking), Services (watching), Operations

Thu, Sep 6

MoritzMuehlenhoff added a comment to T199125: rack/setup/install cloudvirt102[34].

See https://phabricator.wikimedia.org/T202255#4563157 for the 4.14 kernel.

Thu, Sep 6, 3:19 PM · Patch-For-Review, cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
MoritzMuehlenhoff added a comment to T202255: Support for QLogic FastLinQ 41112 Dual Port 10Gb SFP+ Adapter.

Status update: I've created a stretch backport of a 4.14 kernel which should properly support both the QLogic 41xx and the new HP Perc megaraid controller. To allow this kernel to be used for PXE booting, I've been working on an updated stretch netboot image with the 4.14 kernel integrated. This has been quite cumbersome; I've fixed a bunch of issues so far, but the netboot image still fails to load the initrd. I'm seeing the error message

Thu, Sep 6, 3:18 PM · Operations
MoritzMuehlenhoff added a comment to T203674: Debian package or files managed my puppet for pt-kill-wmf.

+1 for creating a deb. I can give you an introduction on how to do that if you want.

Thu, Sep 6, 2:37 PM · User-Banyek, Puppet, Operations
MoritzMuehlenhoff added a comment to T199125: rack/setup/install cloudvirt102[34].

I also tried disabling HTTP-based PXE boot via https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/458463/, but that didn't work either (same symptoms as above).

Thu, Sep 6, 9:41 AM · Patch-For-Review, cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
MoritzMuehlenhoff updated the task description for T203489: Onboard gtirloni to WMF.
Thu, Sep 6, 7:45 AM · Patch-For-Review, Operations, cloud-services-team
MoritzMuehlenhoff added a comment to T203489: Onboard gtirloni to WMF.

Added to pwstore.

Thu, Sep 6, 7:45 AM · Patch-For-Review, Operations, cloud-services-team
MoritzMuehlenhoff updated the task description for T202136: Onboarding Cole White.
Thu, Sep 6, 7:44 AM · Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T202136: Onboarding Cole White.

Added to pwstore.

Thu, Sep 6, 7:44 AM · Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T201816: Onboarding Effie Mouzeli.

Added to pwstore.

Thu, Sep 6, 7:44 AM · SRE-Access-Requests, Patch-For-Review, Operations
MoritzMuehlenhoff updated the task description for T201816: Onboarding Effie Mouzeli.
Thu, Sep 6, 7:44 AM · SRE-Access-Requests, Patch-For-Review, Operations
MoritzMuehlenhoff placed T203434: Decom mw2213 up for grabs.
Thu, Sep 6, 7:19 AM · Patch-For-Review, decommission, ops-codfw, Operations
MoritzMuehlenhoff updated the task description for T203434: Decom mw2213.
Thu, Sep 6, 7:18 AM · Patch-For-Review, decommission, ops-codfw, Operations

Wed, Sep 5

MoritzMuehlenhoff added a comment to T190424: modify labs-hosts1-vlans for http load of installer kernel.

Arzhel and I had a look, and this doesn't seem to be ACL-related: the TFTP packets are flowing in both directions. This is possibly a bug in the firmware, see https://phabricator.wikimedia.org/T199125#4560182

Wed, Sep 5, 5:03 PM · Patch-For-Review, cloud-services-team, Operations, netops
MoritzMuehlenhoff added a comment to T199125: rack/setup/install cloudvirt102[34].

I tried an installation on cloudvirt1023, but the PXELINUX version on the Broadcom NIC is affected by a bug in syslinux 6.03 and fails to fetch the install image:

Wed, Sep 5, 5:02 PM · Patch-For-Review, cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
MoritzMuehlenhoff updated the task description for T203434: Decom mw2213.
Wed, Sep 5, 10:06 AM · Patch-For-Review, decommission, ops-codfw, Operations

Tue, Sep 4

MoritzMuehlenhoff added a comment to T196477: rack/setup/install backup2001.

@Papaul: That's expected; this also needs a change to the DHCP config to use the netboot image based on 4.14, e.g. by using the patch at https://gerrit.wikimedia.org/r/457930 or by setting this manually on install2002. I'll test this tomorrow (or feel free to go ahead!). The installation still won't be 100% complete, as the 4.14 kernel is not yet uploaded to apt.wikimedia.org and we need another patch to install it in late-setup. With the current image the installer uses 4.14, but then installs the 4.9 kernel at the end, which lacks the updated driver.

Tue, Sep 4, 3:53 PM · Patch-For-Review, ops-codfw, Operations
MoritzMuehlenhoff closed T194172: mw2213 correctable memory errors as Resolved.

Closing this task, opened T203434 for decom.

Tue, Sep 4, 2:53 PM · Operations, ops-codfw
MoritzMuehlenhoff added a comment to T196477: rack/setup/install backup2001.

I've created a custom Linux 4.14 kernel which worked fine in my tests with an updated firmware-qlogic. I've also created a netboot image based on Linux 4.14.
It's based on the last version which was in unstable for 4.14.x (4.14.17), but that's good enough for initial tests. If it's working fine and we decide to keep using it, I'll update the packages to the latest 4.14.x kernel.

Tue, Sep 4, 1:18 PM · Patch-For-Review, ops-codfw, Operations
MoritzMuehlenhoff added projects to T203434: Decom mw2213: Operations, ops-codfw, decommission.
Tue, Sep 4, 8:20 AM · Patch-For-Review, decommission, ops-codfw, Operations
MoritzMuehlenhoff created T203434: Decom mw2213.
Tue, Sep 4, 8:20 AM · Patch-For-Review, decommission, ops-codfw, Operations
MoritzMuehlenhoff closed T202301: Release and deploy wikidiff2 v1.7.3 as Resolved.

1.7.3 has been rolled out to the app servers (some in codfw still need the update; this will be piggybacked on other maintenance later this week).

Tue, Sep 4, 8:06 AM · WMDE-QWERTY-Sprint-2018-08-29, WMDE-QWERTY-Team, Operations, wikidiff2, TCB-Team

Mon, Sep 3

MoritzMuehlenhoff updated subscribers of T194172: mw2213 correctable memory errors.

@Joe, @elukey: Any objections? Otherwise I'll turn this into a decom ticket.

Mon, Sep 3, 3:55 PM · Operations, ops-codfw
MoritzMuehlenhoff closed T127825: Re-add intel-microcode as Resolved.

Microcode is now enabled on all baremetal servers with an Intel CPU and we haven't seen any issues so far. Closing the task.

Mon, Sep 3, 11:48 AM · Patch-For-Review, Operations

Fri, Aug 31

MoritzMuehlenhoff updated the task description for T202521: Onboarding Balazs Pocze.
Fri, Aug 31, 3:28 PM · SRE-Access-Requests, Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T202521: Onboarding Balazs Pocze.

Balazs has been added to pwstore.

Fri, Aug 31, 3:28 PM · SRE-Access-Requests, Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T203239: Create Debian packages for Node.js 10 upgrade.

One notable change to be expected from moving to 10:
Some node modules ship binary blobs, and the official node packages are built against OpenSSL 1.0.2. nodejs 10 only supports OpenSSL 1.1 (which has a different ABI/API), so those modules fail to load or throw runtime errors.
Upstream discussion is at https://github.com/nodejs/node/issues/21897, but there's no real solution.

Fri, Aug 31, 2:58 PM · Patch-For-Review, Readers-Web-Backlog (Tracking), Services (watching), Operations
MoritzMuehlenhoff claimed T203239: Create Debian packages for Node.js 10 upgrade.
Fri, Aug 31, 11:25 AM · Patch-For-Review, Readers-Web-Backlog (Tracking), Services (watching), Operations
MoritzMuehlenhoff created T203239: Create Debian packages for Node.js 10 upgrade.
Fri, Aug 31, 11:05 AM · Patch-For-Review, Readers-Web-Backlog (Tracking), Services (watching), Operations
MoritzMuehlenhoff added a comment to T190424: modify labs-hosts1-vlans for http load of installer kernel.

@ayounsi: I can still reproduce this with an installation of cloudvirt1023. I can see in syslog that atftpd is serving lpxelinux.0 to 10.64.20.42, and I can see on the serial console that the PXE boot firmware doesn't get a reply. Can you ping me when you have some time to debug this?

Fri, Aug 31, 10:33 AM · Patch-For-Review, cloud-services-team, netops, Operations
MoritzMuehlenhoff added a comment to T202255: Support for QLogic FastLinQ 41112 Dual Port 10Gb SFP+ Adapter.

I worked on a backport of the driver to 4.9 and got to the point where the driver loaded along with the firmware, but there were runtime issues which caused connection failures. The errors were related to statistics gathering in the driver (a change I had to backport and which seems to need additional changes). I tried to keep my backport to the qede driver minimal, but all the QLogic drivers share a common base (e.g. qede also requires the qed kernel module), and to fully correct this I'd probably need to cherry-pick additional upstream changes for qed. Ideally there would be an officially blessed upstream backport for the 4.9 LTS kernel series; I've asked upstream whether they have something like this.

Fri, Aug 31, 8:36 AM · Operations
MoritzMuehlenhoff added a comment to T202910: add performance team members to webserver_misc_static servers to maintain sitemaps.

@aaron: To clarify/confirm: you don't need cluster-wide root access anymore? The only reason we're having this discussion is that you were (accidentally) added to the new groups created for the performance team. But none of those actually add anything to your privileges, as you already have global root. So please confirm either that

  1. you want to keep your existing global root access (which you might be using for debugging during outages), in which case we strip the superfluous performance team groups, or
  2. you don't actually need global root anymore, in which case we keep you in the performance team groups and remove the global root access.
Fri, Aug 31, 8:27 AM · Patch-For-Review, Performance-Team (Radar), SRE-Access-Requests, Operations

Thu, Aug 30

MoritzMuehlenhoff added a comment to T196477: rack/setup/install backup2001.

The hardware side is fixed, but I'm seeing a kernel error, looking into it.

Thu, Aug 30, 2:44 PM · Patch-For-Review, ops-codfw, Operations
MoritzMuehlenhoff added a comment to T202521: Onboarding Balazs Pocze.

@jcrespo Can you please push the signed key to the keyservers?

Thu, Aug 30, 2:39 PM · SRE-Access-Requests, Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T202335: Have a more active developer take over as release manager for wikidiff2.

The 1.7.3 release is now signed by a freshly generated key from Christoph which doesn't carry any signatures (yet), while all previous releases were signed by Kunal. @Legoktm, it would be good if you could sign keys with everyone now involved in wikidiff2 releases.

Thu, Aug 30, 9:17 AM · User-Addshore, wikidiff2
MoritzMuehlenhoff added a comment to T196477: rack/setup/install backup2001.

Thanks! I've installed my backported test kernel and figured out which additional firmware we need. It looks promising; the driver gets loaded along with the firmware:

Thu, Aug 30, 8:59 AM · Patch-For-Review, ops-codfw, Operations
MoritzMuehlenhoff added a comment to T202910: add performance team members to webserver_misc_static servers to maintain sitemaps.

Same issue as https://phabricator.wikimedia.org/T202650#4541158; Aaron has global root and can access these hosts without sitemaps-admins.

Thu, Aug 30, 6:56 AM · Patch-For-Review, Performance-Team (Radar), SRE-Access-Requests, Operations

Wed, Aug 29

MoritzMuehlenhoff added a comment to T196477: rack/setup/install backup2001.

@Papaul: Does this maybe need some additional change in the BIOS to make the server PXE-boot from the internal NIC?

Wed, Aug 29, 11:35 AM · Patch-For-Review, ops-codfw, Operations
MoritzMuehlenhoff closed T199027: Obtain CVEs for 1.27.5/1.29.3/1.30.1/1.31.1 security releases as Resolved.

I've added the CVE IDs to the task description.

Wed, Aug 29, 10:59 AM · MediaWiki-Releasing, Security
MoritzMuehlenhoff closed T199027: Obtain CVEs for 1.27.5/1.29.3/1.30.1/1.31.1 security releases, a subtask of T199021: Release MediaWiki 1.27.5/1.29.3/1.30.1/1.31.1, as Resolved.
Wed, Aug 29, 10:58 AM · MediaWiki-Releasing, Security
MoritzMuehlenhoff updated the task description for T199027: Obtain CVEs for 1.27.5/1.29.3/1.30.1/1.31.1 security releases.
Wed, Aug 29, 10:58 AM · MediaWiki-Releasing, Security
MoritzMuehlenhoff reopened T202650: Please add aaron to perf-team as "Open".

I don't understand this task. Aaron already has global root; why is this needed at all?

Wed, Aug 29, 6:59 AM · Patch-For-Review, Operations, SRE-Access-Requests
MoritzMuehlenhoff reopened T202650: Please add aaron to perf-team, a subtask of T202648: Please add everyone on the performance team to perf-roots, as Open.
Wed, Aug 29, 6:59 AM · Operations, SRE-Access-Requests

Tue, Aug 28

MoritzMuehlenhoff closed T196901: Replace memory bank on scb1002 as Resolved.

Thanks, I've repooled the server. Closing the task, will reopen in case there are still issues.

Tue, Aug 28, 3:29 PM · Operations, ops-eqiad, DC-Ops
MoritzMuehlenhoff added a comment to T196701: rack/setup/install torrelay1001.wikimedia.org.

migration plan:

goal: keep the same fingerprints

  • stop tor service on radium
  • rsync datadir contents (/var/lib/tor/) from radium to torrelay1001
  • delete datadir and config on radium or otherwise ensure it can't come back with the same fingerprints
  • start service on torrelay1001
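
The migration plan above can be sketched as an ordered list of commands. This is a minimal, hypothetical Python helper (tor_migration_commands is not an existing tool) that only builds the command lines and executes nothing. It assumes the Tor systemd unit is called tor, that the commands run as root over SSH, that radium can SSH to torrelay1001 for the rsync, and that the config to delete lives at /etc/tor/torrc.

```python
import shlex

# Hypothetical sketch of the torrelay1001 migration plan. Assumptions (not from
# the plan itself): systemd unit "tor", commands run as root over ssh, radium can
# ssh to the new host for rsync, and the config file is /etc/tor/torrc.
DATADIR = "/var/lib/tor/"
CONFIG = "/etc/tor/torrc"


def tor_migration_commands(old_host: str, new_host: str,
                           datadir: str = DATADIR,
                           config: str = CONFIG) -> list[list[str]]:
    """Return the migration steps as argv lists, in execution order."""
    if not datadir.endswith("/"):
        # A trailing slash makes rsync copy the directory's contents.
        datadir += "/"
    return [
        # 1. Stop Tor on the old relay so its identity keys are no longer in use.
        ["ssh", old_host, "systemctl", "stop", "tor"],
        # 2. Copy the datadir (which holds the keys, hence the fingerprint);
        #    -a preserves permissions and, when run as root, ownership.
        ["ssh", old_host, "rsync", "-a", datadir, f"{new_host}:{datadir}"],
        # 3. Remove datadir and config so the old host can't come back
        #    with the same fingerprint.
        ["ssh", old_host, "rm", "-rf", datadir, config],
        # 4. Start Tor on the new relay.
        ["ssh", new_host, "systemctl", "start", "tor"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    for argv in tor_migration_commands("radium", "torrelay1001"):
        print(shlex.join(argv))
```

Printing the commands rather than running them keeps the ordering reviewable; in particular, step 3 is destructive and should only follow a verified copy.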
Tue, Aug 28, 8:49 AM · Patch-For-Review, Tor, Operations