
Krenair (Alex Monk)
Wikimedia volunteer

User Details

User Since
Oct 3 2014, 2:34 PM (237 w, 6 d)
Availability
Available
IRC Nick
Krenair
LDAP User
Alex Monk
MediaWiki User
Krenair [ Global Accounts ]

I am a Wikimedia volunteer helping in various technical ways. These days it's usually Beta Cluster, Cloud VPS, or Operations related. Since 2012 I've spent significant amounts of time involved in MediaWiki development, software deployments to the Wikimedia cluster, OTRS (email response to e.g. info-en@wikimedia.org addresses), and various other things.

Some of my old VisualEditor and other work (2014-2016) can be found under @AlexMonk-WMF instead.

I have opinions on things, which do not necessarily represent those of any organisation I am, have previously been, or will in the future be affiliated with.

Recent Activity

Today

Krenair updated the task description for T218729: Migrate away from Debian Jessie to Debian Stretch.
Thu, Apr 25, 3:53 AM · Beta-Cluster-Infrastructure
Krenair added a comment to T221157: Request creation of Gratitude VPS project.

Just to be clear - this system is just going to be processing publicly available data from the wikis, right? You won't be processing anything directly from users with it? There won't be surveys hosted on it or anything?

Thu, Apr 25, 12:40 AM · Cloud-VPS (Project-requests)

Yesterday

Krenair added a project to T221796: Moving a page over a protected page results in no protection: MediaWiki-Page-protection.
Wed, Apr 24, 5:52 PM · MediaWiki-Page-protection
Krenair closed T214455: shinken-wm hasn't alerted in -releng since 2018-12-09 as Resolved.

It's been alerting recently

Wed, Apr 24, 4:34 PM · Shinken, Beta-Cluster-Infrastructure, monitoring

Tue, Apr 23

Krenair created T221726: Add test checking all registered special pages are basically functioning.
Tue, Apr 23, 11:08 PM · MediaWiki-Core-Testing, MediaWiki-Special-pages
Krenair closed T205344: Inconsistent lists of labs-ns* nameservers as Resolved.

With the shutdown of labs-ns* looming, @Andrew made a change to make T221531: Update RIPE about changes in WMCS auth servers possible, and it cleaned this up in the process:

<XioNoX> getting a RIPE error:
<XioNoX>  Parent has nameserver(s) not listed at the child (cloud-ns0.wikimedia.org; cloud-ns1.wikimedia.org).
<XioNoX> None of the nameservers listed at the parent are listed at the child. 
<XioNoX> looking at what it mean exactly
<Krenair> Maybe it wants us to update our end first XioNoX ?
<Krenair> Right now pri.authdns.ripe.net serves this:
<Krenair> 56.15.185.in-addr.arpa. 172800 IN NS labs-ns0.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 172800 IN NS labs-ns1.wikimedia.org.
<Krenair> Whereas we serve:
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS labs-ns1.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS labs-ns2.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS labs-ns0.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS labs-ns3.wikimedia.org.
<Krenair> Maybe to get RIPE to add cloud-ns0 we have to add the cloud-ns0 on our end etc.?
<XioNoX> Krenair: possibly, can you do it now?
<Krenair> I can't but andrewbogott could probably
<andrewbogott> I'm not 100% sure I know what you mean but let me look...
<andrewbogott> um… ok, now I officially don't know how to do that.  Is it something in our dns repo?
<Krenair> no, this would be a setting in designate somewhere I think
<Krenair> it might be referred to as a pool
<andrewbogott> hm...
<andrewbogott> I can't find that set anywhere in designate, but I do need to update this pool anyway
<andrewbogott> so will do that and see what shakes out
<Krenair> At some point you presumably did it to add ns2 and ns3 andrewbogott 
<Krenair> possibly with novaadmin credentials, 'designate server-list', 'designate server-update', or the openstackclient equivalents?
<Krenair> andrewbogott, possibly something under https://docs.openstack.org/designate/pike/admin/designate-manage.html
<andrewbogott> yep, I upgraded that just now
<andrewbogott> although I failed to check if ptr records were working properly before the change :(
<Krenair> mitaka docs: https://docs.openstack.org/designate/mitaka/pools.html#designate-manage-pools-command-reference
<andrewbogott> Krenair: do things look any different on your end?
<andrewbogott> I changed the cloud-ns0 servers but the labs-ns0/ns1 servers don't know about it so if you have those cached you'll still see the old results
<Krenair> yes:
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS cloud-ns0.wikimedia.org.
<Krenair> 56.15.185.in-addr.arpa. 3599 IN NS cloud-ns1.wikimedia.org.
<Krenair> that looks righght
<Krenair> right* excuse my keyboard
<andrewbogott> oh, great
<andrewbogott> ok, so now… XioNoX did that warning go away?
<Krenair> It also looks fine when I query labs-ns*
<andrewbogott> oh, that's right, they share a db
<andrewbogott> so I updated it everywhere
<XioNoX> Your object has been successfully modified
Tue, Apr 23, 9:01 PM · cloud-services-team (Kanban), Operations, Cloud-VPS, Traffic, DNS
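
The RIPE error in the log above comes down to comparing the nameservers requested at the parent (cloud-ns0/cloud-ns1, per the error message) with the NS records the child zone actually serves. A minimal sketch of that comparison, assuming records in the "owner TTL class type target" layout pasted in the log; `ns_targets` and `delegation_report` are hypothetical helpers, not RIPE's actual checker.

```python
# Hypothetical helpers illustrating a parent/child NS consistency check.
# They are not RIPE's implementation; they only compare hostname sets.


def ns_targets(records: str) -> set:
    """Collect NS target hostnames from zone-file style lines."""
    targets = set()
    for line in records.strip().splitlines():
        fields = line.split()
        # Expected layout: owner, TTL, class, type, target
        if len(fields) == 5 and fields[3].upper() == "NS":
            targets.add(fields[4].rstrip(".").lower())
    return targets


def delegation_report(parent: set, child_records: str) -> dict:
    """Split nameservers into parent-only, child-only and shared sets."""
    p = {host.rstrip(".").lower() for host in parent}
    c = ns_targets(child_records)
    return {"parent_only": p - c, "child_only": c - p, "shared": p & c}


# Nameservers named in the RIPE error message
REQUESTED_AT_PARENT = {"cloud-ns0.wikimedia.org", "cloud-ns1.wikimedia.org"}

# What the child zone served before the Designate pool was updated
CHILD_BEFORE = """
56.15.185.in-addr.arpa. 3599 IN NS labs-ns1.wikimedia.org.
56.15.185.in-addr.arpa. 3599 IN NS labs-ns2.wikimedia.org.
56.15.185.in-addr.arpa. 3599 IN NS labs-ns0.wikimedia.org.
56.15.185.in-addr.arpa. 3599 IN NS labs-ns3.wikimedia.org.
"""

# What the child zone served afterwards
CHILD_AFTER = """
56.15.185.in-addr.arpa. 3599 IN NS cloud-ns0.wikimedia.org.
56.15.185.in-addr.arpa. 3599 IN NS cloud-ns1.wikimedia.org.
"""
```

With CHILD_BEFORE the "shared" set is empty, which corresponds to "None of the nameservers listed at the parent are listed at the child"; with CHILD_AFTER both one-sided sets are empty.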
Krenair added a comment to T204762: On deployment-prep scap cache_git_info takes 12 minutes (that is too slow).

Takes a while on beta because of all the extensions (plus disks are slower than in production, where it takes about 20 seconds). IIRC we haven't done much to parallelize any of this; it serially walks the extensions directory.

Tue, Apr 23, 6:29 PM · Release-Engineering-Team (Kanban), Scap, Beta-Cluster-Infrastructure
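
The comment above notes that cache_git_info walks the extensions directory serially. A minimal sketch of a parallel alternative, assuming each extension's info can be gathered independently; `collect_info` and `read_head` are hypothetical names, not scap's API, and reading `.git/HEAD` is a simplified stand-in for whatever scap really collects. A thread pool is used because the per-directory work is mostly filesystem I/O.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def collect_info(root: str, per_dir: Callable[[str], str], workers: int = 8) -> Dict[str, str]:
    """Apply per_dir to every immediate subdirectory of root using a thread pool.

    Hypothetical helper; results are keyed by subdirectory name.
    """
    names = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each name with its result
        results = pool.map(per_dir, (os.path.join(root, n) for n in names))
        return dict(zip(names, results))


def read_head(path: str) -> str:
    """Hypothetical per-extension step: return .git/HEAD contents, or "" if absent."""
    try:
        with open(os.path.join(path, ".git", "HEAD")) as f:
            return f.read().strip()
    except FileNotFoundError:
        return ""
```

Usage would look like `collect_info("/path/to/extensions", read_head)`, where the path is a placeholder. Whether this helps much on beta depends on how disk-bound the walk is.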
Krenair updated the task description for T218729: Migrate away from Debian Jessie to Debian Stretch.
Tue, Apr 23, 2:45 PM · Beta-Cluster-Infrastructure
Krenair updated the task description for T218729: Migrate away from Debian Jessie to Debian Stretch.
Tue, Apr 23, 2:42 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T178014: Update android-builder cloud instance.

Its image is a bit outdated, and it suffers from T135033 (run Puppet manually to reproduce).

FWIW the age of the image used to build an instance is not usually an issue. As long as it has been running package updates (which should be done by unattended-upgrades), puppet is working, and the distribution it's using is still supported (jessie is expected to be okay for another year or so based on https://wikitech.wikimedia.org/wiki/Operating_system_upgrade_policy), it should be fine. Images in Wikimedia Cloud VPS get marked as deprecated very quickly.

Tue, Apr 23, 2:10 PM · Wikipedia-Android-App-Backlog
Krenair added a parent task for T221620: Editing Hiera namespace broken on wikitech for project admins: T218729: Migrate away from Debian Jessie to Debian Stretch.
Tue, Apr 23, 12:53 PM · wikitech.wikimedia.org, MediaWiki-extensions-OpenStackManager
Krenair added a subtask for T218729: Migrate away from Debian Jessie to Debian Stretch: T221620: Editing Hiera namespace broken on wikitech for project admins.
Tue, Apr 23, 12:53 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T221620: Editing Hiera namespace broken on wikitech for project admins.

This blocks me from removing deployment-ms-fe02 and deployment-poolcounter04 because of the references to those hosts on that page.

Tue, Apr 23, 12:52 PM · wikitech.wikimedia.org, MediaWiki-extensions-OpenStackManager
Krenair updated subscribers of T221620: Editing Hiera namespace broken on wikitech for project admins.
Tue, Apr 23, 12:25 PM · wikitech.wikimedia.org, MediaWiki-extensions-OpenStackManager
Krenair added a project to T221620: Editing Hiera namespace broken on wikitech for project admins: wikitech.wikimedia.org.
Tue, Apr 23, 12:25 PM · wikitech.wikimedia.org, MediaWiki-extensions-OpenStackManager
Krenair created T221620: Editing Hiera namespace broken on wikitech for project admins.
Tue, Apr 23, 12:25 PM · wikitech.wikimedia.org, MediaWiki-extensions-OpenStackManager
Krenair edited P8428 runas-tools.isa.
Tue, Apr 23, 10:33 AM
Krenair created P8428 runas-tools.isa.
Tue, Apr 23, 10:25 AM
Krenair updated the task description for T218729: Migrate away from Debian Jessie to Debian Stretch.
Tue, Apr 23, 9:31 AM · Beta-Cluster-Infrastructure
Krenair closed T199387: Beta eswikibooks certificate issues, a subtask of T182927: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains, as Resolved.
Tue, Apr 23, 8:36 AM · Patch-For-Review, Release-Engineering-Team (Watching / External), Beta-Cluster-Infrastructure
Krenair closed T199387: Beta eswikibooks certificate issues as Resolved.
Tue, Apr 23, 8:36 AM · Beta-Cluster-Infrastructure
Krenair closed T182927: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains as Resolved.
Tue, Apr 23, 8:36 AM · Patch-For-Review, Release-Engineering-Team (Watching / External), Beta-Cluster-Infrastructure
Krenair closed T206922: Write designate integration script for certcentral DNS challenges as Resolved.
Tue, Apr 23, 8:29 AM · Patch-For-Review, Beta-Cluster-Infrastructure, Acme-chief
Krenair closed T206922: Write designate integration script for certcentral DNS challenges, a subtask of T182927: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains, as Resolved.
Tue, Apr 23, 8:29 AM · Patch-For-Review, Release-Engineering-Team (Watching / External), Beta-Cluster-Infrastructure
Krenair updated the task description for T221268: Remove old letsencrypt puppet module.
Tue, Apr 23, 8:28 AM · Puppet, Patch-For-Review, Operations, Traffic

Mon, Apr 22

Krenair updated the task description for T220894: Replacement of network::constant's special_hosts.
Mon, Apr 22, 12:58 PM · Patch-For-Review, Operations
Krenair updated the task description for T221531: Update RIPE about changes in WMCS auth servers.
Mon, Apr 22, 2:25 AM · Traffic, Operations, cloud-services-team (Kanban)
Krenair updated the task description for T221531: Update RIPE about changes in WMCS auth servers.
Mon, Apr 22, 2:22 AM · Traffic, Operations, cloud-services-team (Kanban)

Sun, Apr 21

Krenair renamed T221527: DNS WARNING - 1.018 seconds response time - cloud-ns names advertise IPv6 addresses but do not accept DNS requests on those IPs from DNS WARNING - 1.018 seconds response time (tools-sgegrid-master.tools.eqiad.wmflabs. to DNS WARNING - 1.018 seconds response time - cloud-ns names advertise IPv6 addresses but do not accept DNS requests on those IPs.
Sun, Apr 21, 9:35 PM · Patch-For-Review, cloud-services-team (Kanban)
Krenair added a comment to T221527: DNS WARNING - 1.018 seconds response time - cloud-ns names advertise IPv6 addresses but do not accept DNS requests on those IPs.

cloud-ns0.wikimedia.org has address 208.80.154.135
cloud-ns0.wikimedia.org has IPv6 address 2620:0:861:2:208:80:154:135

Sun, Apr 21, 9:32 PM · Patch-For-Review, cloud-services-team (Kanban)
Krenair added a comment to T221527: DNS WARNING - 1.018 seconds response time - cloud-ns names advertise IPv6 addresses but do not accept DNS requests on those IPs.

I don't think it's the resolution of the nameserver's own name. Based on a very quick check with strace, I think it's timing out trying to connect to the nameserver over IPv6.

Sun, Apr 21, 9:29 PM · Patch-For-Review, cloud-services-team (Kanban)

Sat, Apr 20

Krenair renamed T221499: Disable TLS 1.0 and 1.1 in apache for gerrit.wikimedia.org from Disable SSLv2, SSLv3, TLS 1.0 and TLS 1.1 in apache for gerrit.wikimedia.org to Disable TLS 1.0 and 1.1 in apache for gerrit.wikimedia.org.
Sat, Apr 20, 4:19 PM · Patch-For-Review, Gerrit, Security
Krenair added a comment to T221499: Disable TLS 1.0 and 1.1 in apache for gerrit.wikimedia.org.

I'm guessing this is a default for built-in TLS support that does not apply with our apache proxy in front?

Sat, Apr 20, 4:18 PM · Patch-For-Review, Gerrit, Security
Krenair updated the task description for T220894: Replacement of network::constant's special_hosts.
Sat, Apr 20, 3:54 PM · Patch-For-Review, Operations
Krenair updated the task description for T220894: Replacement of network::constant's special_hosts.
Sat, Apr 20, 4:23 AM · Patch-For-Review, Operations
Krenair updated the task description for T220894: Replacement of network::constant's special_hosts.
Sat, Apr 20, 1:23 AM · Patch-For-Review, Operations

Fri, Apr 19

Krenair added a comment to T221463: questions about standalone wmf-mariadb103.

For the record, I ran into this at T219087. My way around it was to run find /tmp -name mysql.sock, which returned something like /tmp/systemd-private-d6c71da3465641b3aa68e8390a2cc75c-mariadb.service-HPflMl/tmp/mysql.sock, and then run mysql -S <path>.

Fri, Apr 19, 4:30 PM · Patch-For-Review, DBA
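
The workaround above (finding mysql.sock inside the service's systemd PrivateTmp directory, then passing it to mysql -S) can be scripted. A minimal sketch, assuming the directory naming seen in the comment, systemd-private-<id>-mariadb.service-<suffix>/tmp/mysql.sock; `find_private_socket` is a hypothetical helper. The systemd-private-* directories are usually readable only by root, so this likely needs to run as root.

```python
import glob
import os
from typing import List


def find_private_socket(unit: str = "mariadb.service", tmp_root: str = "/tmp",
                        sock_name: str = "mysql.sock") -> List[str]:
    """Return socket paths found in systemd PrivateTmp directories for one unit.

    Assumed layout: <tmp_root>/systemd-private-<id>-<unit>-<suffix>/tmp/<sock_name>
    """
    # Escape the literal parts so unusual characters are not treated as wildcards
    pattern = os.path.join(
        glob.escape(tmp_root),
        "systemd-private-*-" + glob.escape(unit) + "-*",
        "tmp",
        glob.escape(sock_name),
    )
    return sorted(glob.glob(pattern))
```

Typical use is to take the first returned path and pass it to mysql -S.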
Krenair updated the task description for T221290: wiki-mail DKIM failing.
Fri, Apr 19, 3:53 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
Krenair updated the task description for T221288: Phabricator SPF record contains internal addressing for phab[12]001.
Fri, Apr 19, 3:52 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
Krenair added a comment to T221288: Phabricator SPF record contains internal addressing for phab[12]001.

I am the "User in #2019041710004636" mentioned in the first comment and a very novice subscriber to Phabricator.

Fri, Apr 19, 3:52 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
Krenair added a comment to T171188: Move the main WMCS puppetmaster into the Labs realm.

The number of puppet.git cherry-picks on cloudinfra-internal-puppetmaster is now 0; there are just the two secret commits to labs/private that are pretty much the purpose of that instance.

Fri, Apr 19, 4:56 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-Services, Puppet, Operations
Krenair updated the task description for T218729: Migrate away from Debian Jessie to Debian Stretch.
Fri, Apr 19, 3:13 AM · Beta-Cluster-Infrastructure
Krenair closed T204745: cloudvps: migrate projects from main to eqiad1 as Resolved.

The old region is gone.

Fri, Apr 19, 3:01 AM · cloud-services-team (Kanban), Patch-For-Review, Cloud-Services
Krenair closed T204745: cloudvps: migrate projects from main to eqiad1, a subtask of T167293: Nova-network to Neutron migration, as Resolved.
Fri, Apr 19, 3:01 AM · Patch-For-Review, Epic, Cloud-Services
Krenair archived Cloud-VPS (Ubuntu Trusty Deprecation).
Fri, Apr 19, 2:55 AM
Krenair closed T186029: Remove support for Ubuntu Trusty prior to upstream End Of Life for release as Resolved.

No trusty VMs left, all subtasks closed.

Fri, Apr 19, 2:53 AM · Cloud-VPS (Ubuntu Trusty Deprecation), Epic

Thu, Apr 18

Krenair added a comment to T221290: wiki-mail DKIM failing.

How did it work until now?

I wonder the same thing. Looking through old personal emails I have a message from wiki@wikimedia.org dated Sep 18 2018 which has the same issue:

Date: Tue, 18 Sep 2018 17:52:18 +0000
    dkim=invalid (public key: granularity mismatch, 1024-bit rsa key sha256)
      header.d=wikimedia.org header.i=@wikimedia.org header.b=yW/crbag
      header.a=rsa-sha256 header.s=wiki-mail x-bits=1024;

Maybe someone has old archived messages from wiki@wikimedia.org and could chime in with the date and dkim headers.

Thu, Apr 18, 8:22 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
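
"granularity mismatch" refers to the g= (key granularity) tag that RFC 4871 allowed in DKIM key records; RFC 6376 later removed it. One possible explanation, which is purely an assumption here: if the wiki-mail key record carries a g= value that does not match the empty local-part in header.i=@wikimedia.org, an RFC 4871-style verifier could fail in this way. A minimal sketch of that check; the function names are hypothetical and the key records in the tests are made up, not the real wiki-mail record.

```python
# Sketch of the RFC 4871 "g=" granularity check (dropped in RFC 6376).
# Hypothetical helpers; not taken from any particular DKIM verifier.


def parse_tag_list(record: str) -> dict:
    """Parse a DKIM "tag=value; ..." list (simplified whitespace handling)."""
    tags = {}
    for part in record.split(";"):
        # Split on the first "=" only, so base64 padding in p= survives
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = value.strip()
    return tags


def granularity_matches(g: str, local_part: str) -> bool:
    """Match g= against a local-part; a single "*" matches any run of characters."""
    if g == "":
        return False  # RFC 4871: an empty g= never matches
    if "*" not in g:
        return g == local_part
    prefix, _, suffix = g.partition("*")
    return (len(local_part) >= len(prefix) + len(suffix)
            and local_part.startswith(prefix)
            and local_part.endswith(suffix))


def key_allows_identity(key_record: str, i_tag: str) -> bool:
    """Apply the key's g= (default "*") to the local-part of the signature's i=."""
    g = parse_tag_list(key_record).get("g", "*")
    local_part = i_tag.rpartition("@")[0]  # "@wikimedia.org" gives ""
    return granularity_matches(g, local_part)
```

With the default g=* any identity passes, including the empty local-part of @wikimedia.org; a record with, say, g=wiki would only accept wiki@wikimedia.org.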
Krenair added a comment to T221389: setup/install WMF7426 as phab1003.eqiad.wmnet.

Did you mean phab1003.eqiad.wmnet? Existing phab* hosts are internal

Thu, Apr 18, 4:27 PM · Patch-For-Review, serviceops, Operations
Krenair added a project to T221372: mw.Title.getNamespacePrefix() does not work with namespaceGenderAliases: MediaWiki-General-or-Unknown.
Thu, Apr 18, 2:01 PM · MediaWiki-General-or-Unknown, JavaScript
Krenair updated the task description for T218729: Migrate away from Debian Jessie to Debian Stretch.
Thu, Apr 18, 1:41 PM · Beta-Cluster-Infrastructure
Krenair closed T220895: Udev (?) problems using modified swift puppet classes on deployment-ms-be0[56] as Resolved.
Thu, Apr 18, 1:40 PM · Patch-For-Review, Beta-Cluster-Infrastructure
Krenair closed T220895: Udev (?) problems using modified swift puppet classes on deployment-ms-be0[56], a subtask of T218729: Migrate away from Debian Jessie to Debian Stretch, as Resolved.
Thu, Apr 18, 1:40 PM · Beta-Cluster-Infrastructure
Krenair closed T221171: Some Java clients unable to handle beta cluster TLS as Resolved.

Thanks @Vgutierrez

Thu, Apr 18, 1:34 PM · Patch-For-Review, Beta-Cluster-reproducible, Wikipedia-Android-App-Backlog
Krenair added a comment to T221339: Missing index on revision_userindex.rev_actor.

I wonder if this is due to it being defined as coalesce(revactor_actor,0) in https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/templates/labs/db/views/maintain-views.yaml$675
ar_actor down on line 282 has no such thing.

Thu, Apr 18, 3:00 AM · Data-Services
Krenair added a comment to T219390: Have puppet-merge on puppetmaster1001 publish the official sha1 after merging.

I think for this to make sense we should require the labs/private repository to also exist on prod puppetmasters and have its trusted sha1 published in this manner too.

Thu, Apr 18, 12:57 AM · Patch-For-Review, cloud-services-team (Kanban), Puppet
Krenair added a comment to T133548: Create a secure redirect service for large count of non-canonical / junk domains.

@Dzahn Assuming that with Let's Encrypt, HTTPS will work in modern browsers for all redirects - do we need any of the redirect domains in SAN? Perhaps we don't need any.

Thu, Apr 18, 12:39 AM · Goal, Patch-For-Review, HTTPS, Operations, Traffic

Wed, Apr 17

Krenair added a comment to T200832: remove mathoid from scb.

deployment-mathoid still exists and has been failing puppet runs since December 3rd when profile::mathoid got removed.

I'd delete the VM; profile::mathoid isn't coming back. If anything, the VMs with the role::beta::docker_services role applied can probably handle the service now.

Wed, Apr 17, 9:36 PM · Beta-Cluster-Infrastructure, Core Platform Team Backlog (Watching / External), Services (watching), SCB, Mathoid, Operations
Krenair added a comment to T220235: Migrate Beta cluster services to use Kubernetes .

It's not just going to become a problem once T198901 is done; it's already a problem. Due to the roles that have been removed in favour of k8s in T200832 and T213194, puppet is already failing on deployment-mathoid, deployment-sca01, and deployment-sca02, meaning these servers/services are already going to be out of date and will eventually break.

Wed, Apr 17, 9:30 PM · Kubernetes, Release Pipeline, serviceops, Services (later), Core Platform Team Backlog (Later), Beta-Cluster-Infrastructure
Krenair closed T221285: deployment-snapshot01 puppet error due to nginx-apache2 conflict as Resolved.
krenair@deployment-snapshot01:~$ sudo rm /etc/nginx/sites-available/default
krenair@deployment-snapshot01:~$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-snapshot01.deployment-prep.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1555534419'
Notice: openstack::clientpackages::mitaka::stretch: no special configuration yet
Notice: /Stage[main]/Openstack::Clientpackages::Mitaka::Stretch/Notify[openstack::clientpackages::mitaka::stretch: no special configuration yet]/message: defined 'message' as 'openstack::clientpackages::mitaka::stretch: no special configuration yet'
Notice: The LDAP client stack for this host is: classic
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: classic'
Notice: /Stage[main]/Nginx/Package[nginx-full]/ensure: created
Notice: /Stage[main]/Nginx/File[/etc/nginx/sites-enabled/default]/ensure: removed
Info: /etc/nginx/sites-enabled: Scheduling refresh of Service[nginx]
Notice: /Stage[main]/Nginx/Service[nginx]: Triggered 'refresh' from 1 events
Notice: Applied catalog in 9.06 seconds
Wed, Apr 17, 8:56 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T221285: deployment-snapshot01 puppet error due to nginx-apache2 conflict.

I think T216164#4963388 explains why it still tries to install nginx despite the ensure: absent. It should be enough to just delete the default site.

Wed, Apr 17, 8:53 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T221288: Phabricator SPF record contains internal addressing for phab[12]001.

T216714: gmail considers all Phabricator email to be spam due to missing SPF record may be related here

Wed, Apr 17, 8:43 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
Krenair created T221290: wiki-mail DKIM failing.
Wed, Apr 17, 8:42 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
Krenair created P8416 Email headers for DKIM failure task.
Wed, Apr 17, 8:41 PM
Krenair created T221288: Phabricator SPF record contains internal addressing for phab[12]001.
Wed, Apr 17, 8:34 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
Krenair moved T217941: deployment-deploy02 missing PHP cURL extension? from Puppet errors to To Triage on the Beta-Cluster-Infrastructure board.
Wed, Apr 17, 8:22 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T216164: Puppet failures on deployment-deploy01.deployment-prep.eqiad.wmflabs.

T221285: deployment-snapshot01 puppet error due to nginx-apache2 conflict is also about issues involving the services proxy's use of nginx

Wed, Apr 17, 8:21 PM · Patch-For-Review, Beta-Cluster-Infrastructure
Krenair closed T205672: Elasticsearch puppet config changes broke puppet in various instances as Resolved.

I'm going to go ahead and assume tools-elastic* is fine.

Wed, Apr 17, 8:17 PM · Discovery-Search, Beta-Cluster-Infrastructure, Patch-For-Review, Beta-Cluster-reproducible, Puppet
Krenair created T221285: deployment-snapshot01 puppet error due to nginx-apache2 conflict.
Wed, Apr 17, 8:14 PM · Beta-Cluster-Infrastructure
Krenair renamed T219764: Upgrade jessie hosts to rsyslog 8.1901.0-1 from Some jessie instances upset about rsyslog package to jessie rsyslog upgrade problems.
Wed, Apr 17, 8:04 PM · User-fgiunchedi, Operations
Krenair added a comment to T221171: Some Java clients unable to handle beta cluster TLS.

Excellent. Thanks for reporting this and all the help debugging @Dbrant.

Wed, Apr 17, 8:02 PM · Patch-For-Review, Beta-Cluster-reproducible, Wikipedia-Android-App-Backlog
Krenair added a comment to T218609: Figure out future for newly created deployment-prep jessie instances.

Jessie creation is now disabled in most projects (including deployment-prep). I'd prefer to leave it that way in order to provide some mild resistance to new Jessie VMs showing up in the cloud.

That said, enabling it for creation of select special VMs is easy -- just ping me and remind me that I documented how to do it in T218119.

Wed, Apr 17, 7:59 PM · Beta-Cluster-Infrastructure
Krenair created T221277: Puppet errors on deployment-maps04 due to node.js package problems.
Wed, Apr 17, 7:28 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T220235: Migrate Beta cluster services to use Kubernetes .

Is this related to T218609 and T200832?

Wed, Apr 17, 7:24 PM · Kubernetes, Release Pipeline, serviceops, Services (later), Core Platform Team Backlog (Later), Beta-Cluster-Infrastructure
Krenair updated the task description for T221268: Remove old letsencrypt puppet module.
Wed, Apr 17, 7:00 PM · Puppet, Patch-For-Review, Operations, Traffic
Krenair added a comment to T221268: Remove old letsencrypt puppet module.

It does work in WMCS, with some puppet cherry-picks and some credentials generated by WMCS admins to allow modification of designate DNS records from within instances.

Wed, Apr 17, 5:38 PM · Puppet, Patch-For-Review, Operations, Traffic
Krenair updated subscribers of T221268: Remove old letsencrypt puppet module.
Wed, Apr 17, 5:37 PM · Puppet, Patch-For-Review, Operations, Traffic
Krenair added a comment to T221268: Remove old letsencrypt puppet module.
Wed, Apr 17, 5:34 PM · Puppet, Patch-For-Review, Operations, Traffic
Krenair triaged T221268: Remove old letsencrypt puppet module as Low priority.
Wed, Apr 17, 5:33 PM · Puppet, Patch-For-Review, Operations, Traffic
Krenair created T221268: Remove old letsencrypt puppet module.
Wed, Apr 17, 5:33 PM · Puppet, Patch-For-Review, Operations, Traffic
Krenair updated subscribers of T220867: Gerrit: Cannot assign user name "vladi2016" to account XXXX; name already in use..
Wed, Apr 17, 4:36 PM · LDAP, Gerrit
Krenair updated subscribers of T220867: Gerrit: Cannot assign user name "vladi2016" to account XXXX; name already in use..

@Tulsi_Bhagat also has this: https://imgur.com/BdFe9TH

Wed, Apr 17, 4:35 PM · LDAP, Gerrit
Krenair updated subscribers of T221257: 2FA broken on mediawiki.org.

There are 4 OATHAuth changes in this deployment, of which these two sound the most likely to be involved:
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/OATHAuth/+/471217/
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/OATHAuth/+/502973/

Wed, Apr 17, 4:17 PM · MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), MediaWiki-extensions-OATHAuth
Krenair added a subtask for T220726: 1.34.0-wmf.1 deployment blockers: T221257: 2FA broken on mediawiki.org.
Wed, Apr 17, 4:09 PM · Patch-For-Review, Release-Engineering-Team (Kanban), Release, Train Deployments
Krenair added a parent task for T221257: 2FA broken on mediawiki.org: T220726: 1.34.0-wmf.1 deployment blockers.
Wed, Apr 17, 4:09 PM · MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), MediaWiki-extensions-OATHAuth
Krenair added a comment to T221257: 2FA broken on mediawiki.org.

I can log in just fine on enwiki but not mediawiki. Marking as possible deployment blocker for 1.34.0-wmf.1

Wed, Apr 17, 4:08 PM · MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), MediaWiki-extensions-OATHAuth
Krenair triaged T221257: 2FA broken on mediawiki.org as Unbreak Now! priority.

Oh, yeah, broken for me too.

Wed, Apr 17, 4:07 PM · MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), MediaWiki-extensions-OATHAuth
Krenair added a comment to T221257: 2FA broken on mediawiki.org.

@Skizzerz: Is the clock on your device accurate?

Wed, Apr 17, 4:06 PM · MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), MediaWiki-extensions-OATHAuth
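
The clock question matters because, assuming mediawiki.org's 2FA uses standard time-based one-time passwords (TOTP, RFC 6238) as OATHAuth is generally understood to, the device and the server each derive the code from the current 30-second time step, so a device whose clock has drifted produces codes the server rejects. A minimal sketch of generic RFC 6238 TOTP (HMAC-SHA1, 6 digits) with a verifier that tolerates a configurable number of steps of skew; this is not OATHAuth's code, and the `window` parameter is illustrative.

```python
import hashlib
import hmac
import struct


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with counter = floor(unix_time / step)."""
    return hotp(key, int(unix_time // step), digits)


def verify(key: bytes, code: str, server_time: float, window: int = 1, step: int = 30) -> bool:
    """Accept a code from up to `window` time steps either side of the server's step."""
    counter = int(server_time // step)
    for delta in range(-window, window + 1):
        candidate = counter + delta
        if candidate >= 0 and hmac.compare_digest(hotp(key, candidate, len(code)), code):
            return True
    return False
```

With the RFC test key b"12345678901234567890", totp(key, 59) returns "287082", the RFC 4226 value for counter 1. A device clock one step ahead is still accepted with window=1, while one several steps off is rejected.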
Krenair created P8415 Git fatal errors trying to rebase puppet patch.
Wed, Apr 17, 3:57 PM
Krenair added a comment to T221171: Some Java clients unable to handle beta cluster TLS.

Has it had any effect on the other clients?

Wed, Apr 17, 2:58 PM · Patch-For-Review, Beta-Cluster-reproducible, Wikipedia-Android-App-Backlog
Krenair updated subscribers of T221171: Some Java clients unable to handle beta cluster TLS.

@Dbrant: @Vgutierrez pointed out that the OCSP stapling response served by nginx was out of date. We found that the puppet manifest had not deployed the file that causes nginx to reload when the OCSP file changes. Try now, at least on the clients showing the revocation check failure.

Wed, Apr 17, 2:51 PM · Patch-For-Review, Beta-Cluster-reproducible, Wikipedia-Android-App-Backlog
Krenair renamed T221171: Some Java clients unable to handle beta cluster TLS from SSL connections to beta cluster not working to Some Java clients unable to handle beta cluster TLS.
Wed, Apr 17, 2:23 PM · Patch-For-Review, Beta-Cluster-reproducible, Wikipedia-Android-App-Backlog
Krenair added a comment to T221171: Some Java clients unable to handle beta cluster TLS.

Does it happen with plain Java URLConnection then?

Wed, Apr 17, 2:15 PM · Patch-For-Review, Beta-Cluster-reproducible, Wikipedia-Android-App-Backlog
Krenair added a project to T221171: Some Java clients unable to handle beta cluster TLS: Beta-Cluster-reproducible.

I haven't been able to track down anything going wrong on the server-side yet.
<Krenair> beta is testing getting the unified cert via acme-chief
<Krenair> we've discovered that certain java installs do not trust the site anymore
<Krenair> at least we assume it to be caused by this
<Krenair> but that we still don't fully understand what's going on because the same client seems able to talk fine to a prod misc site which also uses certs issued this way, and also some clients complain about different things - android seemed upset about revocation checking

Wed, Apr 17, 12:28 AM · Patch-For-Review, Beta-Cluster-reproducible, Wikipedia-Android-App-Backlog

Tue, Apr 16

Krenair created P8407 instance metadata problem.
Tue, Apr 16, 4:45 PM

Mon, Apr 15

Krenair closed T220990: wmcs cumin broken as Resolved.

Fixed with a lot of help from @Volans and @aborrero

Mon, Apr 15, 4:13 PM · Operations-Software-Development, cloud-services-team (Kanban)
Krenair added a comment to T220860: access for foks to labweb (in one way or another) (or make changePassword.php work on mwmaint hosts).

Well from hieradata/role/common/mediawiki/maintenance.yaml:

  • restricted - doesn't have as many different ways to break things as deployment but could still do a lot of damage and cause DBA headaches
  • deployment - can and occasionally does break wikipedia
  • ldap-admins - presumably we already trust this group to be able to do everything this could do, check
  • maintenance-log-readers - this group can't do much. The only current member of this group is also a member of deployment though.
  • perf-roots - root on varnish and application servers
Mon, Apr 15, 2:59 AM · Operations, SRE-Access-Requests
Krenair merged T142615: nova-network deprecated, for real this time, as of Openstack N into T167293: Nova-network to Neutron migration.
Mon, Apr 15, 2:55 AM · Patch-For-Review, Epic, Cloud-Services
Krenair merged task T142615: nova-network deprecated, for real this time, as of Openstack N into T167293: Nova-network to Neutron migration.
Mon, Apr 15, 2:55 AM · Cloud-VPS, cloud-services-team (Kanban)

Sun, Apr 14

Krenair added a comment to T220912: Some Kubernetes tools were stopped on 2019-04-13 19:31 and can’t be restarted.

That timing would put it right around the cloudvirt1015 reboot

Sun, Apr 14, 4:19 PM · cloud-services-team, Kubernetes, Toolforge
Krenair updated the task description for T220894: Replacement of network::constant's special_hosts.
Sun, Apr 14, 2:37 PM · Patch-For-Review, Operations