MoritzMuehlenhoff (Moritz Mühlenhoff)
User

User Details

User Since
Apr 1 2015, 4:33 PM (151 w, 1 d)
Availability
Available
LDAP User
Moritz Mühlenhoff
MediaWiki User
MMuhlenhoff (WMF)

Recent Activity

Wed, Feb 21

MoritzMuehlenhoff created P6727 (An Untitled Masterwork).
Wed, Feb 21, 6:20 PM
MoritzMuehlenhoff closed T182656: Integrate jessie 8.10 point release as Resolved.

This is fully rolled out.

Wed, Feb 21, 6:04 PM · Operations
MoritzMuehlenhoff added a comment to T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O.

Happened again on ganeti1007, again with page allocation errors.

Wed, Feb 21, 1:32 PM · ops-eqiad, Operations
MoritzMuehlenhoff closed T182655: Integrate stretch 9.3 point update as Resolved.

This is completely rolled out.

Wed, Feb 21, 1:14 PM · Operations

Tue, Feb 20

MoritzMuehlenhoff closed T164703: Integrate jessie 8.8 point release as Resolved.

This is complete

Tue, Feb 20, 5:37 PM · Operations

Mon, Feb 19

MoritzMuehlenhoff added a comment to T187442: Request to be added to the ldap/wmde group.

Rachel wanted to doublecheck within the WMF Legal department, but that will take until tomorrow at least due to the WMF holiday for US staff.

Mon, Feb 19, 6:30 PM · LDAP-Access-Requests, Operations, WMF-NDA-Requests
MoritzMuehlenhoff added a comment to T170567: Support TLSv1.3.

Yeah, that's the most plausible option (and we're already using custom OpenSSL 1.1 packages on Debian jessie to support e.g. chacha), but 1.1.1 has only just seen its first alpha release (and they won't release it in a final version until TLS 1.3 is final)

Mon, Feb 19, 10:57 AM · Patch-For-Review, Operations, Traffic
MoritzMuehlenhoff placed T187466: Decommission mw1259-mw1260 up for grabs.

The two hosts have been switched to role::spare, dropped from conftool and marked as downtime until the end of the year. Unclaiming myself again, the rest is DC ops territory.

Mon, Feb 19, 10:11 AM · hardware-requests, Patch-For-Review, Operations, ops-eqiad
MoritzMuehlenhoff added a project to T187466: Decommission mw1259-mw1260: hardware-requests.
Mon, Feb 19, 10:10 AM · hardware-requests, Patch-For-Review, Operations, ops-eqiad
MoritzMuehlenhoff updated the task description for T187466: Decommission mw1259-mw1260.
Mon, Feb 19, 10:09 AM · hardware-requests, Patch-For-Review, Operations, ops-eqiad

Fri, Feb 16

MoritzMuehlenhoff claimed T187466: Decommission mw1259-mw1260.
Fri, Feb 16, 1:11 PM · hardware-requests, Patch-For-Review, Operations, ops-eqiad

Thu, Feb 15

MoritzMuehlenhoff added a comment to T135991: Automated service restarts for common low-level system services.

Yeah, definitely, this is currently only meant for the many common system services we use across the fleet (nrpe, diamond, systemd-timesyncd, atd, prometheus exporters, site-local exim, sshd etc.) and not for outward-facing LVSed services; those need separate tooling improvements.

Thu, Feb 15, 3:10 PM · Patch-For-Review, Operations
MoritzMuehlenhoff added a comment to T166081: rack/setup/install conf1004-conf1006.

Ack, the deb8u2 patch for jessie was for a security fix which is also fixed in the stretch version.

Thu, Feb 15, 2:35 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, User-Joe, Operations
MoritzMuehlenhoff added a comment to T187442: Request to be added to the ldap/wmde group.

If this request is only about getting added to the wmde group (which controls some Gerrit settings related to WMDE projects) you don't need to sign an NDA with the WMF Legal department (but let me clarify that with them, I'll update this task).

Thu, Feb 15, 1:44 PM · LDAP-Access-Requests, Operations, WMF-NDA-Requests
MoritzMuehlenhoff added a comment to T182397: Decomission eventlog2001.

Not showing there now, someone did a cleanup.

Thu, Feb 15, 11:21 AM · DC-Ops, ops-codfw, Analytics, Operations
MoritzMuehlenhoff added a comment to T166081: rack/setup/install conf1004-conf1006.
  1. Check in labs what zookeeper version would end up in stretch. On conf100[123] we have 3.4.5+dfsg-2+deb8u2 and I believe that we should keep the same on stretch.
Thu, Feb 15, 11:06 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, User-Joe, Operations

Wed, Feb 14

MoritzMuehlenhoff added a comment to T164703: Integrate jessie 8.8 point release.

These are fully rolled out:
ca-certificates
uwsgi
python-cryptography

Wed, Feb 14, 5:44 PM · Operations
MoritzMuehlenhoff updated the task description for T187035: Ops Onboarding for Valentín Gutiérrez.
Wed, Feb 14, 11:30 AM · Patch-For-Review, Ops-Access-Requests, Traffic, Operations
MoritzMuehlenhoff added a comment to T187035: Ops Onboarding for Valentín Gutiérrez.

Valentín has been added to pwstore.

Wed, Feb 14, 11:29 AM · Patch-For-Review, Ops-Access-Requests, Traffic, Operations
MoritzMuehlenhoff created T187292: labvirt1008 rebooted / system was overheated.
Wed, Feb 14, 8:00 AM · Patch-For-Review, Cloud-VPS, cloud-services-team, ops-eqiad, Operations

Tue, Feb 13

MoritzMuehlenhoff added a comment to T182832: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state.

I am in favor of upgrading phab* hosts to buster as soon as that is feasible. Maybe we should aim for that rather than sinking a lot of time into importing/backporting packages from external repos and all that goes with it.

Tue, Feb 13, 5:00 PM · Wikimedia-Incident, Patch-For-Review, User-Elukey, Release-Engineering-Team (Kanban), Operations, Phabricator
MoritzMuehlenhoff added a comment to T187063: Remove video scaler instances from deployment-prep.

This should not have any impact on the production scalers. To be extra sure, I've only shut them down for now, and if no one complains in the next few days I'll remove them completely.

Tue, Feb 13, 11:01 AM · Patch-For-Review, Operations, Beta-Cluster-Infrastructure
MoritzMuehlenhoff added a comment to T182832: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state.

Still, PHP 7.1 should be considered independently. Has this specific bug been reported upstream (or is that blocked by Wednesday's upgrade, as upstream doesn't accept bug reports for outdated versions)?

Tue, Feb 13, 10:35 AM · Wikimedia-Incident, Patch-For-Review, User-Elukey, Release-Engineering-Team (Kanban), Operations, Phabricator
MoritzMuehlenhoff added a comment to T182832: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state.

We can import the PHP 7.1 packages from Ondrej Sury to a separate repository component (like component/php71); the maintainer can be trusted. But this would be specific for use by Phabricator, since using an external repo has a number of notable downsides (e.g. no update guarantees as for the Debian updates and, more importantly, no integration with the wider PHP extensions ecosystem, i.e. all PHP packages not built from the main PHP package need to be imported/adapted manually). Looking at phab1001 we have

  • php-apcu (needs an update)
  • php5-json (in PHP7 this is part of the main package)
  • php5-mailparse (is a custom package anyway)
Tue, Feb 13, 10:28 AM · Wikimedia-Incident, Patch-For-Review, User-Elukey, Release-Engineering-Team (Kanban), Operations, Phabricator
MoritzMuehlenhoff added a comment to T186289: Remove cloud-admin rights from YuviPanda.

Yuvi's shell access was removed via https://gerrit.wikimedia.org/r/407577 and I've also just removed him from the wmflabs.org root mail alias and from the cn=nda LDAP group.

Tue, Feb 13, 9:48 AM · Patch-For-Review, cloud-services-team (Kanban), wikitech.wikimedia.org, Operations
MoritzMuehlenhoff updated the task description for T187035: Ops Onboarding for Valentín Gutiérrez.
Tue, Feb 13, 9:09 AM · Patch-For-Review, Ops-Access-Requests, Traffic, Operations
MoritzMuehlenhoff updated the task description for T187035: Ops Onboarding for Valentín Gutiérrez.
Tue, Feb 13, 9:09 AM · Patch-For-Review, Ops-Access-Requests, Traffic, Operations
MoritzMuehlenhoff added a comment to T187035: Ops Onboarding for Valentín Gutiérrez.

Added to cn=ops and cn=wmf LDAP groups.

Tue, Feb 13, 9:08 AM · Patch-For-Review, Ops-Access-Requests, Traffic, Operations

Mon, Feb 12

Eevans awarded T186619: Upload cassandra package(s) to wikimedia apt repository a Cookie token.
Mon, Feb 12, 2:42 PM · Services (done), Patch-For-Review, User-fgiunchedi, User-Eevans, RESTBase-Cassandra, Cassandra, Operations
MoritzMuehlenhoff closed T186619: Upload cassandra package(s) to wikimedia apt repository as Resolved.

Uploaded to apt.wikimedia.org. To add it to a server you can use

Mon, Feb 12, 2:41 PM · Services (done), Patch-For-Review, User-fgiunchedi, User-Eevans, RESTBase-Cassandra, Cassandra, Operations
MoritzMuehlenhoff closed T177739: Integrate stretch 9.2 point release as Resolved.

Fully rolled out now.

Mon, Feb 12, 2:08 PM · User-fgiunchedi, Operations
MoritzMuehlenhoff added a comment to T143931: Update ICU version to 55.1.

Beta has been upgraded to ICU 57, we'll also upgrade production to that version at some point (no timeline established yet).

Mon, Feb 12, 1:40 PM · Operations
MoritzMuehlenhoff added a comment to T177498: Provide a forward port of ICU 52 for stretch / Investigate best ICU update strategy.

Beta/deployment-prep has been upgraded to an HHVM build using ICU 57.

Mon, Feb 12, 1:39 PM · Patch-For-Review, User-Elukey, HHVM, Operations
MoritzMuehlenhoff created T187063: Remove video scaler instances from deployment-prep.
Mon, Feb 12, 12:45 PM · Patch-For-Review, Operations, Beta-Cluster-Infrastructure

Fri, Feb 9

MoritzMuehlenhoff added a comment to T186619: Upload cassandra package(s) to wikimedia apt repository.

@Eevans wrote:

Even as of right now we have versions 2.1.13 and 2.2.6, (in addition to 3.11.0) in play. Version 2.1.13 is used by maps (which depending on who you talk to Doesn't Matter(tm)), and AQS uses 2.2.6, and probably will for the foreseeable future (they have no plans to upgrade). Even if you disqualify maps, we did keep AQS on a 2.1.x release for a considerable period of time (months?) after we'd moved RESTBase to a 2.2 release.

Fri, Feb 9, 3:02 PM · Services (done), Patch-For-Review, User-fgiunchedi, User-Eevans, RESTBase-Cassandra, Cassandra, Operations
MoritzMuehlenhoff added a comment to T182656: Integrate jessie 8.10 point release.

These are fully rolled out:
libxtst
icu
libio-socket-ssl-perl

Fri, Feb 9, 2:52 PM · Operations
MoritzMuehlenhoff added a comment to T182655: Integrate stretch 9.3 point update.

These are fully rolled out:
linux
icu

Fri, Feb 9, 2:51 PM · Operations
MoritzMuehlenhoff added a comment to T183888: php-luasandbox in Wikimedia's Stretch apt repo depends on php5.

Our internal php-luasandbox package has been rebuilt to only provide the hhvm-luasandbox package (that's kind of confusing given the source package name, but it's only temporary given our migration to PHP 7).

Fri, Feb 9, 11:35 AM · Patch-For-Review, Operations, MediaWiki-extensions-Scribunto, MediaWiki-Vagrant
MoritzMuehlenhoff added a comment to T186814: Missing servers in racktables.

The Jupyterhub spare (notebook1002) was repurposed as kafka1023 in https://phabricator.wikimedia.org/T181518

Fri, Feb 9, 9:54 AM · Operations, ops-eqiad
MoritzMuehlenhoff created T186866: Etherpad 1.6.3 security release.
Fri, Feb 9, 9:26 AM · Operations
MoritzMuehlenhoff added a comment to T186289: Remove cloud-admin rights from YuviPanda.

@MoritzMuehlenhoff could you take care of removing @yuvipanda from the ops LDAP group?

Fri, Feb 9, 8:04 AM · Patch-For-Review, cloud-services-team (Kanban), wikitech.wikimedia.org, Operations

Thu, Feb 8

MoritzMuehlenhoff created T186808: Non-redundant power supply on helium.
Thu, Feb 8, 3:54 PM · Operations, ops-eqiad
MoritzMuehlenhoff added a comment to T164456: Migrate to nginx-light.

The first step is to upgrade the tlsproxy hosts to 1.13.6-2+wmf1 (but still on the existing nginx-full packages)

Thu, Feb 8, 12:55 PM · Traffic, Operations
MoritzMuehlenhoff renamed T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O from Hardware errors on ganeti1005- ganeti1008 to Kernels errors on ganeti1005- ganeti1008 under high I/O.
Thu, Feb 8, 11:01 AM · ops-eqiad, Operations

Wed, Feb 7

MoritzMuehlenhoff added a comment to T186619: Upload cassandra package(s) to wikimedia apt repository.

Is there a specific reason for calling the repo component cassandra311? That's very specific and adding/removing components requires some puppet churn. IOW do we have reason to believe that Cassandra 3.12 will be incompatible with a 3.11 cluster?

Wed, Feb 7, 6:10 PM · Services (done), Patch-For-Review, User-fgiunchedi, User-Eevans, RESTBase-Cassandra, Cassandra, Operations
MoritzMuehlenhoff closed T184722: Hardware check on mw1271 as Resolved.

Thanks, I ran "scap pull" and repooled the server.

Wed, Feb 7, 4:00 PM · Operations, ops-eqiad

Fri, Feb 2

MoritzMuehlenhoff added a comment to T172487: decom iridium.

That host has a broken sshd config (coming from Phabricator), but it's possible to login via mgmt and the root password.

Fri, Feb 2, 5:48 PM · ops-eqiad, hardware-requests, Operations
MoritzMuehlenhoff added a comment to T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O.

Happened again on ganeti1005, similar errors, but this time triggered by a copy of the Archiva data.

Fri, Feb 2, 2:43 PM · ops-eqiad, Operations
MoritzMuehlenhoff added a comment to T172487: decom iridium.

No, no point in debugging indeed. Instead it would be really nice if it could be shut down after running for such a long time doing nothing.

Fri, Feb 2, 1:09 PM · ops-eqiad, hardware-requests, Operations
MoritzMuehlenhoff closed T183888: php-luasandbox in Wikimedia's Stretch apt repo depends on php5 as Resolved.

Our internal wikidiff2 package has been rebuilt to only provide the hhvm-wikidiff2 package now (and after some fiddling with reprepro I removed the old php-wikidiff2 from apt.wikimedia.org).

Fri, Feb 2, 12:15 PM · Patch-For-Review, Operations, MediaWiki-extensions-Scribunto, MediaWiki-Vagrant
MoritzMuehlenhoff closed T183888: php-luasandbox in Wikimedia's Stretch apt repo depends on php5, a subtask of T181353: [EPIC] Migrate base image to Debian Stretch, as Resolved.
Fri, Feb 2, 12:15 PM · Patch-For-Review, Epic, MediaWiki-Vagrant
MoritzMuehlenhoff added a comment to T52864: Have a conversation about migrating from GNU Mailman 2.1 to GNU Mailman 3.0.

That's dependent on goal planning / road map considerations, I only meant to point out the availability in backports since it was mentioned earlier on this task.

Fri, Feb 2, 12:05 PM · Operations, Wikimedia-Mailing-lists
MoritzMuehlenhoff added a comment to T52864: Have a conversation about migrating from GNU Mailman 2.1 to GNU Mailman 3.0.

mailman3-core, mailman3-hyperkitty, postorius and mailmanclient have been accepted into stretch-backports today.

Fri, Feb 2, 11:52 AM · Operations, Wikimedia-Mailing-lists
MoritzMuehlenhoff added a comment to T184270: rebuild php-wikidiff2 and php-luasandbox for php7 and stretch.

In addition I'll drop the php-wikidiff2 from our internal src:php-wikidiff2 package (so that it only builds hhvm-wikidiff2).

Fri, Feb 2, 11:30 AM · Packaging, Operations
MoritzMuehlenhoff added a comment to T183888: php-luasandbox in Wikimedia's Stretch apt repo depends on php5.

I've uploaded a backport of Kunal's 1.5.1-3 package from Debian testing to stretch-backports. The packages in Debian only support Zend PHP (since Debian doesn't feature more in-depth integration of HHVM in the wider module ecosystem), but we still need hhvm-wikidiff2, so I'll update the internal source package to only build the hhvm-wikidiff2 binary package. (And when we've migrated to PHP7 we can remove the internal package entirely.)

Fri, Feb 2, 11:17 AM · Patch-For-Review, Operations, MediaWiki-extensions-Scribunto, MediaWiki-Vagrant
MoritzMuehlenhoff closed T186193: clamav errors on mendelevium as Resolved.

@grin: Thanks for the pointer! Since ClamAV has retracted the broken signature (and will make sure this doesn't reoccur) I'll close this task. We're following ClamAV via jessie-updates, so when this is fixed upstream, we'll pick up the new version on short notice once it is released.

Fri, Feb 2, 11:09 AM · Mail

Thu, Feb 1

MoritzMuehlenhoff added a comment to T185667: setup/install eventlog1002.eqiad.wmnet.

I actually tried to move to systemd a couple of years ago.

Thu, Feb 1, 6:27 PM · Analytics, Operations
MoritzMuehlenhoff added a comment to T184722: Hardware check on mw1271.

@Cmjohnson Is this ready to be re-pooled with the new DIMM or are you planning further tests which require the server to be out of service?

Thu, Feb 1, 3:44 PM · Operations, ops-eqiad
MoritzMuehlenhoff added a comment to T168407: rack/setup/install labnodepool1002.eqiad.wmnet.

@RobH : Given Antoine's comment, let's reclaim, then? This host has almost 2.5 years remaining warranty

Thu, Feb 1, 1:19 PM · cloud-services-team (Kanban), Cloud-VPS, Operations
MoritzMuehlenhoff added a comment to T86209: set up DMARC aggregate report collection into a database for research and reporting .

I'll tear down the systems from T169566 as well

Thu, Feb 1, 12:31 PM · Operations, Mail
MoritzMuehlenhoff added a comment to T186193: clamav errors on mendelevium.

clamav is socket-activated, maybe it tripped over some rule? I installed the new version and the errors are gone for now, let's keep an eye on it.

Thu, Feb 1, 9:37 AM · Mail
MoritzMuehlenhoff added a comment to T186169: varnishkafka fails to build on Alpine Linux (strndupa).

Let's simply use a Debian base image, then? With the overhead that Kafka adds, the disk space saving of musl over glibc is negligible anyway.

Thu, Feb 1, 8:12 AM · Patch-For-Review, User-Elukey, Varnish

Wed, Jan 31

MoritzMuehlenhoff updated subscribers of T176370: Migrate to PHP 7 in WMF production.

@MoritzMuehlenhoff Is someone actively working on dumps? I haven't seen movement on https://phabricator.wikimedia.org/T117534

Wed, Jan 31, 5:33 PM · TechCom-RFC (TechCom-Approved), User-ArielGlenn, NewPHP, HHVM, MediaWiki-Platform-Team, Operations
MoritzMuehlenhoff added a comment to T176370: Migrate to PHP 7 in WMF production.

During the SRE offsite/onsite we came up with the following plan:

Wed, Jan 31, 2:52 PM · TechCom-RFC (TechCom-Approved), User-ArielGlenn, NewPHP, HHVM, MediaWiki-Platform-Team, Operations
MoritzMuehlenhoff removed projects from T185004: Decommission mw1201-mw1220: hardware-requests, HHVM.
Wed, Jan 31, 2:39 PM · ops-eqiad, Patch-For-Review, User-Joe, Operations
MoritzMuehlenhoff assigned T185004: Decommission mw1201-mw1220 to Cmjohnson.
Wed, Jan 31, 2:39 PM · ops-eqiad, Patch-For-Review, User-Joe, Operations
MoritzMuehlenhoff closed T167225: Upload hhvm to stretch apt repo in apt.wikimedia.org as Resolved.

HHVM has been available for stretch-wikimedia for quite a while now (used by the video scalers).

Wed, Jan 31, 2:38 PM · Operations, HHVM
MoritzMuehlenhoff closed T167225: Upload hhvm to stretch apt repo in apt.wikimedia.org, a subtask of T168494: tracking task: jessie -> stretch, as Resolved.
Wed, Jan 31, 2:38 PM · Operations

Tue, Jan 30

MoritzMuehlenhoff added a comment to T185024: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail.

A revised fix has been released (along with 3.18.8), I'll roll that into our packages: https://hhvm.com/blog/2018/01/30/hhvm-3.24.1.html

Tue, Jan 30, 6:23 PM · MediaWiki-Core-Tests, Operations, Continuous-Integration-Infrastructure, HHVM
MoritzMuehlenhoff added a comment to T186020: Expand meitnerium's root partition to 100G.

Yeah, it's probably easiest to add a new disk and move /var/lib/archiva to it

Tue, Jan 30, 5:54 PM · Operations, Analytics-Kanban, User-Elukey
MoritzMuehlenhoff added a comment to T182656: Integrate jessie 8.10 point release.

These are fully rolled out:
krb5
libx11
libxfixes
libxi
libxrandr
ncurses
sudo

Tue, Jan 30, 12:29 PM · Operations
MoritzMuehlenhoff created T185997: Transcode logging should also log the server on which the transcode process ran.
Tue, Jan 30, 11:46 AM · TimedMediaHandler, Multimedia

Thu, Jan 25

MoritzMuehlenhoff added a comment to T185024: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail.

Also:

  • From Facebook's perspective, HHVM 3.18 is unsupported as of 2018-01-16; that said, I'm likely to make an exception for a fix for this issue given how recently we introduced the regression, if backporting it isn't too involved.
Thu, Jan 25, 6:15 PM · MediaWiki-Core-Tests, Operations, Continuous-Integration-Infrastructure, HHVM

Jan 23 2018

MoritzMuehlenhoff added a comment to T185024: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail.

For the HHVM builds on apt.wikimedia.org this has been fixed in 3.18.5+dfsg-1+wmf4 (jessie) and 3.18.5+dfsg-1+wmf4+deb9u1 (stretch). Does Travis use the deb packages provided by Facebook?

Jan 23 2018, 8:27 AM · MediaWiki-Core-Tests, Operations, Continuous-Integration-Infrastructure, HHVM

Jan 19 2018

MoritzMuehlenhoff added a comment to T184788: mw2140 unresponsive, mgmt not accessible.

@Papaul: ethtool shows "Link detected: no" for both network interfaces, the next time you're in the DC could you please check the cabling? (Not time-critical)

Jan 19 2018, 3:50 PM · Patch-For-Review, Operations, ops-codfw
MoritzMuehlenhoff reassigned T184788: mw2140 unresponsive, mgmt not accessible from MoritzMuehlenhoff to elukey.
Jan 19 2018, 2:28 PM · Patch-For-Review, Operations, ops-codfw
MoritzMuehlenhoff closed T183507: Create 'releng' LDAP group as Resolved.

Done, see below. @greg, please also add a description of the purpose (and which privileges this group entails) to https://wikitech.wikimedia.org/wiki/LDAP_Groups

Jan 19 2018, 1:55 PM · Patch-For-Review, Release-Engineering-Team (Watching / External), Operations, LDAP
MoritzMuehlenhoff claimed T183507: Create 'releng' LDAP group.
Jan 19 2018, 1:50 PM · Patch-For-Review, Release-Engineering-Team (Watching / External), Operations, LDAP
MoritzMuehlenhoff added a comment to T178392: Replacement hardware for cumin masters.

The hardware configuration from T181419 seems perfectly fine for Cumin masters. I don't have a good estimate how much cheaper a single CPU/32 GB machine would be compared to this setup, so I'll leave it at your (and Faidon/Mark's) discretion whether evaluating a lower spec option is actually worthwhile.

Jan 19 2018, 11:33 AM · hardware-requests, Operations
MoritzMuehlenhoff added a comment to T185024: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail.

Stretch packages have also been uploaded in the meantime.

Jan 19 2018, 9:25 AM · MediaWiki-Core-Tests, Operations, Continuous-Integration-Infrastructure, HHVM

Jan 18 2018

MoritzMuehlenhoff added a comment to T185236: Password Vault for Security Team.

Some general docs (targeted at ops pwstore, but pretty similar) are at https://office.wikimedia.org/wiki/Pwstore

Jan 18 2018, 6:18 PM · Security-Team, Operations, Security
MoritzMuehlenhoff updated subscribers of T185236: Password Vault for Security Team.

Ops and Releng are using pwstore, which is just using a simple git repository underneath for storage.

Jan 18 2018, 6:07 PM · Security-Team, Operations, Security
MoritzMuehlenhoff added a comment to T174465: Puppet admin module should support adding system users to managed groups.

Is this solely for T174110, or are we anticipating other use cases?

Jan 18 2018, 9:58 AM · Analytics-Kanban, Patch-For-Review, Operations

Jan 17 2018

MoritzMuehlenhoff added a comment to T185024: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail.

I've built/uploaded new HHVM packages for jessie (stretch following soon) which disable the broken patch and also reported this upstream at https://github.com/facebook/hhvm/issues/8104

Jan 17 2018, 1:32 PM · MediaWiki-Core-Tests, Operations, Continuous-Integration-Infrastructure, HHVM

Jan 16 2018

MoritzMuehlenhoff added a comment to T172487: decom iridium.

I rebooted this spare host for completeness wrt the Meltdown kernel update and while it's now running the fixed kernel, sshd came up running the /etc/ssh/sshd_config.phabricator instead of the regular /etc/ssh/sshd_config. iridium can still be reached via mgmt and is up for decom, so no point in debugging/fixing this IMO.

Jan 16 2018, 1:01 PM · ops-eqiad, hardware-requests, Operations

Jan 15 2018

MoritzMuehlenhoff created P6584 (An Untitled Masterwork).
Jan 15 2018, 11:09 AM

Jan 12 2018

MoritzMuehlenhoff created T184788: mw2140 unresponsive, mgmt not accessible.
Jan 12 2018, 10:05 AM · Patch-For-Review, Operations, ops-codfw

Jan 11 2018

MoritzMuehlenhoff created T184722: Hardware check on mw1271.
Jan 11 2018, 2:36 PM · Operations, ops-eqiad
MoritzMuehlenhoff added a comment to T166081: rack/setup/install conf1004-conf1006.

Given that this task has been stalled for a while now, should we reimage these servers with stretch before eventually putting them into production?

Jan 11 2018, 11:13 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, User-Joe, Operations

Jan 10 2018

MoritzMuehlenhoff added a comment to T184189: Cloud: Labvirt and instance reboots for Meltdown.

Linux jessie-meltdown-image 4.9.0-0.bpo.5-amd64 #1 SMP Debian 4.9.65-3+deb9u1~bpo8+2 (2018-01-04) x86_64 GNU/Linux

(So... apparently we are running 4.9 kernels on Jessie even though the security patch for Jessie is only in the 3.16 kernel.  Not sure how to move forward from this.  That also raises concerns about the upgrade path for existing VMs.)
Jan 10 2018, 2:22 PM · Patch-For-Review, Operations, Toolforge, Cloud-VPS, cloud-services-team (Kanban)
MoritzMuehlenhoff created P6566 (An Untitled Masterwork).
Jan 10 2018, 11:28 AM
MoritzMuehlenhoff added a comment to T184443: Reboot snapshot*, dumpsdata*, dataset1001, ms1001, francium.

Fixed kernels are available for trusty now, I've installed them on francium and snapshot100[1,5-7].

Jan 10 2018, 9:21 AM · Dumps-Generation, Operations
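
The bracket shorthand in the comment above (`snapshot100[1,5-7]`) names several hosts at once. A minimal Python sketch of how such an expression could be expanded, assuming a single bracket group containing comma-separated numbers and inclusive ranges that are appended to the prefix; `expand_hosts` is a hypothetical helper for illustration, not a tool mentioned in the task:

```python
import re


def expand_hosts(expr: str) -> list[str]:
    """Expand 'prefix[1,5-7]suffix' into individual host names.

    Assumption: at most one bracket group, holding comma-separated
    numbers and inclusive numeric ranges.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        # No bracket group: treat the expression as a single host name.
        return [expr]
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            width = len(start)  # keep zero padding such as 01-03
            for number in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{str(number).zfill(width)}{suffix}")
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts


print(expand_hosts("snapshot100[1,5-7]"))
```

Under these assumptions the expression covers snapshot1001 and snapshot1005 through snapshot1007.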

Jan 8 2018

MoritzMuehlenhoff added a comment to T184434: prometheus-blazegraph-exporter failing to start after reboot.

That's a bug in the systemd unit of prometheus-blazegraph-exporter, it needs to start after Blazegraph, but the current version doesn't declare that, so systemd tries to start it when multi-user.target is reached. I can fix it some time this week.

Jan 8 2018, 2:50 PM · Patch-For-Review, Discovery, Wikidata, Operations, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service
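
As an illustration of the ordering problem described in the comment above (not taken from the task or the actual fix), a systemd drop-in along the following lines would declare the missing dependency. The unit names `prometheus-blazegraph-exporter.service` and `blazegraph.service` and the drop-in path are assumptions; the real unit names on the hosts may differ.

```ini
# Hypothetical drop-in file, e.g.
# /etc/systemd/system/prometheus-blazegraph-exporter.service.d/ordering.conf
[Unit]
# Start the exporter only after Blazegraph when both are part of the same boot transaction.
After=blazegraph.service
# Pull Blazegraph in if the exporter is started on its own (assumed unit name).
Wants=blazegraph.service
```

`After=` only affects ordering, while `Wants=` adds a weak dependency; whether a stricter `Requires=` would be appropriate depends on how the exporter should behave when Blazegraph is down.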
MoritzMuehlenhoff added a comment to T184239: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch.

Can you also remove apt::use_experimental from the Hiera settings for deployment-prep? There's no point for deployment-prep to use "experimental" at this point.

Jan 8 2018, 11:14 AM · Patch-For-Review, Puppet, Beta-Cluster-Infrastructure

Jan 5 2018

MoritzMuehlenhoff updated subscribers of T184270: rebuild php-wikidiff2 and php-luasandbox for php7 and stretch.

@Legoktm already prepared a stretch-backports upload of php-luasandbox, so we can use that one. We could update wikidiff2 in stretch-backports to 1.5.1-3 and stick with the Debian releases?

Jan 5 2018, 1:24 PM · Packaging, Operations

Jan 4 2018

MoritzMuehlenhoff added a comment to T184189: Cloud: Labvirt and instance reboots for Meltdown.

@Paladox: Most of WMCS runs trusty with either the 3.13 or 4.4 kernel and needs an update by Canonical (which isn't available).

Jan 4 2018, 10:42 PM · Patch-For-Review, Operations, Toolforge, Cloud-VPS, cloud-services-team (Kanban)

Jan 3 2018

MoritzMuehlenhoff added a comment to T184018: Remove overlay from kernel blacklist on toolforge.

The module was initially blacklisted since there were multiple security issues which exploited privilege escalation bugs in overlayfs. Since then trusty has gained support for disabling unprivileged user namespaces (which was enabled), which was the biggest risk. I'm fine with adding a Hiera setting to disable the blacklist for Docker hosts. For the rest of the fleet we don't have a use for it and should keep it blacklisted.

Jan 3 2018, 9:45 AM · Patch-For-Review, Toolforge

Jan 2 2018

MoritzMuehlenhoff added a comment to T182993: TLS security review of the Kafka stack.

K! Kafka SSLTransportLayer uses javax.net.ssl with SSL Engine, which is part of the standard JSSE implementation provided with Java 8. I'm pretty sure this is not Mozilla NSS/JSS.

Jan 2 2018, 6:38 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Operations, Analytics-Cluster
MoritzMuehlenhoff reopened T182397: Decomission eventlog2001 as "Open".

This host still shows up in puppetdb, i.e. misses the deactivate step (e.g. visible in https://servermon.wikimedia.org/hosts/)

Jan 2 2018, 12:39 PM · DC-Ops, ops-codfw, Analytics, Operations
MoritzMuehlenhoff added a comment to T175264: Decommission db1049.

This host still shows up in puppetdb, i.e. misses the deactivate step (e.g. visible in https://servermon.wikimedia.org/hosts/)

Jan 2 2018, 12:36 PM · hardware-requests, ops-eqiad, Operations, DBA