Krenair (Alex Monk)
Wikimedia volunteer

User Details

User Since
Oct 3 2014, 2:34 PM (262 w, 3 d)
Availability
Available
IRC Nick
Krenair
LDAP User
Alex Monk
MediaWiki User
Krenair

I am a Wikimedia volunteer helping in various technical ways. These days it's usually Beta Cluster, Cloud VPS, or Operations related. Since 2012 I've spent significant amounts of time involved in MediaWiki development, software deployments to the Wikimedia cluster, OTRS (email response to e.g. info-en@wikimedia.org addresses), and various other things.

Some of my old VisualEditor and other work (2014-2016) can be found under @AlexMonk-WMF instead.

I have opinions on things, which do not necessarily represent those of any organisation I am, have previously been, or will in the future be affiliated with.

Recent Activity

Today

Krenair added a comment to T235252: Toolforge: SSL support for new domain toolforge.org.

Yep.

toolforge:
  CN: toolforge.org
  SNI:
  - toolforge.org
  - '*.toolforge.org'
  - tools.wmflabs.org
  - '*.tools.wmflabs.org'
  authorized_regexes:
  - ^tools-proxy-[0-9]+\.tools\.eqiad\.wmflabs$
  challenge: dns-01

Currently the toolforge front proxy is using *.wmflabs.org in the certificate. I believe the 2 you mentioned (tools.wmflabs.org and *.tools.wmflabs.org) are a good replacement, but we should double-check that we aren't leaving anything uncovered.
Here are some docs https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/SSL_certificates for additional context.
Specifically, the same current *.wmflabs.org cert is used also in tools-static* and in novaproxy-* servers (mind the cross-project usage).

Mon, Oct 14, 8:11 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
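To help with the "double-check that we aren't leaving anything uncovered" step, here is a rough sketch that tests hostnames against the SNI list and authorized_regexes from the config above. It assumes the usual TLS rule that a leading `*.` covers exactly one extra label. The hostnames passed in at the end (e.g. tools-static.wmflabs.org, tools-proxy-05, tools-static-13) are hypothetical examples, so substitute the real list of names currently served under the *.wmflabs.org cert.

```python
from __future__ import annotations

import re

# SNI names and authorized_regexes copied from the "toolforge" entry above.
SNI = ["toolforge.org", "*.toolforge.org", "tools.wmflabs.org", "*.tools.wmflabs.org"]
AUTHORIZED = [re.compile(r"^tools-proxy-[0-9]+\.tools\.eqiad\.wmflabs$")]


def name_covered(hostname: str, sni: list[str]) -> bool:
    """True if hostname matches an SNI entry.

    A leading '*.' wildcard matches exactly one extra label: it covers
    'x.toolforge.org' but neither 'toolforge.org' nor 'a.b.toolforge.org'.
    """
    host = hostname.lower().rstrip(".")
    for entry in (e.lower() for e in sni):
        if entry.startswith("*."):
            suffix = entry[1:]  # e.g. ".toolforge.org"
            if host.endswith(suffix):
                label = host[: -len(suffix)]
                if label and "." not in label:
                    return True
        elif host == entry:
            return True
    return False


def uncovered(hostnames: list[str], sni: list[str]) -> list[str]:
    """Hostnames that no SNI entry would cover."""
    return [h for h in hostnames if not name_covered(h, sni)]


def may_fetch(client: str) -> bool:
    """True if a client hostname matches one of the authorized_regexes."""
    return any(rx.match(client) for rx in AUTHORIZED)


# Hypothetical names served today under *.wmflabs.org; replace with the real list.
print(uncovered(["tools.wmflabs.org", "mytool.toolforge.org", "tools-static.wmflabs.org"], SNI))
# ['tools-static.wmflabs.org']
```

Anything this reports (for instance the tools-static* names mentioned above) would still need the old cert, or an extra SNI entry.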
Krenair added a comment to T235252: Toolforge: SSL support for new domain toolforge.org.

Mon, Oct 14, 7:14 PM · Toolforge, cloud-services-team (Kanban), Kubernetes

Yesterday

Krenair added a comment to T235382: MariaDB User s54171, access denied on replicas..

It probably needs someone with wmcs-admin/ops production rights.

Sun, Oct 13, 3:22 PM · cloud-services-team (Kanban), Data-Services
Krenair added a comment to T235321: Block renaming of certain users.

Am I missing something here? Why are there local wiki processes for handling renaming of users? I assumed this was made a global matter when SULF was done. How would a local wiki have any authority to approve or reject renames of global users?

Sun, Oct 13, 6:13 AM · MediaWiki-extensions-CentralAuth, WorkType-NewFunctionality, GlobalRename

Sat, Oct 12

Krenair added a comment to T235252: Toolforge: SSL support for new domain toolforge.org.

I've updated my comment above to reflect the current status; at some point I should turn it into proper docs for anyone else who needs to set acme-chief up with Wikimedia puppetisation.
The remaining issues here are that:

  • toolforge.org. still has NS records in the org. zone pointing to prod - that's T235303: Update authoratiative nameservers for the toolforge.org domain to point to Designate - until this is done we can only get certs for tools.wmflabs.org.
  • we need to decide whether prod's setup of two nodes - active and passive, with a 'cert-sync' mechanism to keep passive up to date - makes sense here.
    • If it does then we should generate the cert-sync SSH key, cherry-pick that into labs/private on tools-puppetmaster, add the role::acme_chief::cloud role to tools-acme-chief-02 and run puppet there.
    • If not then we should introduce some flag to disable the sync mechanism to stop puppet on the active host from attempting to sync.
Sat, Oct 12, 9:01 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Krenair added a comment to T235346: Edits which has been reverted and revision deleted over 40 hours ago were visible on page previews.

Looks like the page preview just pulls the extract from https://en.wikipedia.org/api/rest_v1/page/summary/{title} - did something go wrong with RB cache purging? Theoretically the reverting edit should've triggered RB to update this extract, theoretically more recent purges of the edited page should've done it too.

Sat, Oct 12, 8:12 PM · RESTBase-API, User-Josve05a, Page-Previews
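For checking whether the cached summary actually got refreshed, here's a small sketch that builds the page/summary URL mentioned above and pulls out the `extract` field. It assumes titles use underscores for spaces and that `/` must be percent-encoded so subpages stay a single path segment. The helper names are made up for illustration.

```python
import json
from urllib.parse import quote

REST_SUMMARY = "https://{host}/api/rest_v1/page/summary/{title}"


def summary_url(title: str, host: str = "en.wikipedia.org") -> str:
    """URL of the REST summary that page previews take their extract from."""
    # Spaces become underscores, as in normal MediaWiki URLs.
    path_title = title.strip().replace(" ", "_")
    # safe="" also encodes "/" as %2F, so a subpage title stays one path segment.
    return REST_SUMMARY.format(host=host, title=quote(path_title, safe=""))


def extract_of(response_body: str) -> str:
    """Pull the plain-text 'extract' field out of a summary JSON response."""
    return json.loads(response_body).get("extract", "")
```

If you fetch `summary_url(title)` (e.g. with `urllib.request.urlopen`) and `extract_of(...)` still shows the revision-deleted text after the revert and later purges, the cached summary wasn't updated.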
Krenair added a comment to T232486: Unable to ssh to ws-web.wikistream.eqiad.wmflabs as user edsu.

I'm sorry to have to reopen this, but I've been notified that wikistream.wmflabs.org is down again, and I no longer seem to be able to ssh with the ProxyJump configuration. Here's the log of my ssh attempt:

kaizen:~ edsu$ ssh -vvv edsu@ws-web.wmflabs
Sat, Oct 12, 2:36 PM · VPS-Projects, Documentation, cloud-services-team (Kanban), Cloud-VPS

Fri, Oct 11

Krenair added a comment to T235252: Toolforge: SSL support for new domain toolforge.org.

Yep.

toolforge:
  CN: toolforge.org
  SNI:
  - toolforge.org
  - '*.toolforge.org'
  - tools.wmflabs.org
  - '*.tools.wmflabs.org'
  authorized_regexes:
  - ^tools-proxy-[0-9]+\.tools\.eqiad\.wmflabs$
  challenge: dns-01
Fri, Oct 11, 11:06 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Krenair added a comment to T235252: Toolforge: SSL support for new domain toolforge.org.

I've done some of the initial setup and begun thinking about what the rest of the process will be:

  • Created tools-acme-chief-0[12] as buster instances (acme-chief will not work on the version of python packaged in stretch or below) and done the usual dance with puppet to get them signed puppet certs.
  • Created a puppet prefix config in Horizon for tools-acme-chief:
profile::acme_chief::accounts: {}
profile::acme_chief::active: tools-acme-chief-01.tools.eqiad.wmflabs
profile::acme_chief::passive: tools-acme-chief-02.tools.eqiad.wmflabs
profile::acme_chief::certificates: {}
profile::acme_chief::challenges:
  dns-01:
    issuing_ca: letsencrypt.org
    ns_records:
    - cloud-ns0.wikimedia.org.
    - cloud-ns1.wikimedia.org.
    zone_update_cmd: /usr/local/bin/acme-chief-designate-sync.py
profile::acme_chief::cloud::designate_sync_auth_url: http://cloudcontrol1003.wikimedia.org:5000/v3
profile::acme_chief::cloud::designate_sync_project_name: tools
profile::acme_chief::cloud::designate_sync_region_name: eqiad1-r
profile::acme_chief::cloud::designate_sync_tidyup_enabled: true
profile::acme_chief::cloud::designate_sync_username: tools-dns-manager
  • Inserted tools-dns-manager password into puppet through a cherry-pick on tools-puppetmaster-01 adding profile::acme_chief::cloud::designate_sync_password to labs/private - done in commit 9847c18edea198412b4a124e540989e3cbfc4032
  • TODO: If we're going to have the active-passive set up (I made the second host for it but am now wondering whether we think we need this outside prod/beta), generate the SSH public/private key pair for cert-sync, replace modules/secret/secrets/keyholder/authdns_acmechief and modules/secret/secrets/keyholder/authdns_acmechief.pub in tools-puppetmaster-01 cherry-picks. Any thoughts on whether we should bother with the passive node for tools @Vgutierrez?
  • TODO: Apply the role::acme_chief::cloud role on each of the instances individually (in my experience roles in prefix/project config can be problematic) and run puppet.
  • TODO: Run the account creation code (I should probably get around to T207372: Add simple script for account creation) and insert into accounts dict in hiera. It should look something like this:
{hash}:
  directory: https://acme-v02.api.letsencrypt.org/directory
  regr: '{"body": {}, "uri": "https://acme-v02.api.letsencrypt.org/acme/acct/{number}"}'
toolforge:
  CN: toolforge.org
  SNI:
  - toolforge.org
  - '*.toolforge.org'
  authorized_regexes:
  - ^tools-proxy-[0-9]+\.tools\.eqiad\.wmflabs$
  challenge: dns-01
Fri, Oct 11, 11:00 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
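Until there's a proper account creation script (T207372), here's a rough sketch of turning an account hash and number into the accounts hiera entry shown above. The hash and number used here are placeholders, not real values, and this is not acme-chief's own code. The one detail worth noting is that `regr` is a JSON string inside a single-quoted YAML scalar, not a nested mapping.

```python
import json

DIRECTORY = "https://acme-v02.api.letsencrypt.org/directory"
ACCT_URI = "https://acme-v02.api.letsencrypt.org/acme/acct/{number}"


def account_entry(account_hash: str, account_number: int) -> dict:
    """Build one accounts-dict entry in the shape shown above."""
    regr = {"body": {}, "uri": ACCT_URI.format(number=account_number)}
    # regr is stored as a JSON string, not as a nested YAML mapping.
    return {account_hash: {"directory": DIRECTORY, "regr": json.dumps(regr)}}


def to_hiera_yaml(entry: dict) -> str:
    """Render the entry as YAML text, single-quoting the regr JSON."""
    lines = []
    for key, fields in entry.items():
        lines.append(f"{key}:")
        lines.append(f"  directory: {fields['directory']}")
        # In YAML single-quoted scalars a literal ' is written as ''.
        quoted = fields["regr"].replace("'", "''")
        lines.append(f"  regr: '{quoted}'")
    return "\n".join(lines)


# Placeholder hash and account number, for illustration only.
print(to_hiera_yaml(account_entry("0123abcd", 12345)))
```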
Krenair closed T235304: Create a service account to manage toolforge.org. from acme-chief as Resolved.

Thank you!

Fri, Oct 11, 9:30 PM · Toolforge, cloud-services-team (Kanban)
Krenair closed T235304: Create a service account to manage toolforge.org. from acme-chief, a subtask of T235252: Toolforge: SSL support for new domain toolforge.org, as Resolved.
Fri, Oct 11, 9:30 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Krenair placed T235304: Create a service account to manage toolforge.org. from acme-chief up for grabs.
Fri, Oct 11, 9:02 PM · Toolforge, cloud-services-team (Kanban)
Krenair reopened T235304: Create a service account to manage toolforge.org. from acme-chief, a subtask of T235252: Toolforge: SSL support for new domain toolforge.org, as Open.
Fri, Oct 11, 9:02 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Krenair reopened T235304: Create a service account to manage toolforge.org. from acme-chief as "Open".

<Krenair> Well the next step is for someone with novaadmin access to give it the designateadmin role (and maybe observer?) in the tools project.
<Krenair> Don't think it's something us mortal projectadmins can grant from the horizon UI
<Krenair> actually the ticket said 'with enough access', guess I'll leave it open for that

Fri, Oct 11, 9:02 PM · Toolforge, cloud-services-team (Kanban)
Krenair closed T235304: Create a service account to manage toolforge.org. from acme-chief, a subtask of T235252: Toolforge: SSL support for new domain toolforge.org, as Resolved.
Fri, Oct 11, 8:54 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Krenair closed T235304: Create a service account to manage toolforge.org. from acme-chief as Resolved.

https://wikitech.wikimedia.org/wiki/Special:Log?type=newusers&page=User%3ATools-dns-manager

Fri, Oct 11, 8:54 PM · Toolforge, cloud-services-team (Kanban)
Krenair claimed T235304: Create a service account to manage toolforge.org. from acme-chief.
Fri, Oct 11, 8:52 PM · Toolforge, cloud-services-team (Kanban)
Krenair added a project to T235303: Update authoratiative nameservers for the toolforge.org domain to point to Designate: DNS.

This will need to be communicated to MarkMonitor who register domains on WMF's behalf... Is that via the foundation legal team or can ops do it directly?

Fri, Oct 11, 8:29 PM · Traffic, Operations, DNS, Toolforge, cloud-services-team (Kanban)
Krenair added a comment to T235252: Toolforge: SSL support for new domain toolforge.org.

Actually the parent also links to https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/DNS_domain_usage#Resolution which says DNS for this domain will be managed via OpenStack Designate.

Fri, Oct 11, 8:20 PM · Toolforge, cloud-services-team (Kanban), Kubernetes
Krenair added a comment to T235252: Toolforge: SSL support for new domain toolforge.org.

ping @Krenair would you like to help me in this project? also T234617: Toolforge. introduce new domain toolforge.org

Fri, Oct 11, 8:08 PM · Toolforge, cloud-services-team (Kanban), Kubernetes

Thu, Oct 10

Krenair added a comment to T235218: Catch cloud-puppetmasters up with production puppetmaster versions.

I imagine this is a case of replacing them with buster instances?

Thu, Oct 10, 8:58 PM · cloud-services-team (Kanban)

Tue, Oct 8

Krenair created P9271 Keystone v3 API working with debug.
Tue, Oct 8, 10:01 PM
Krenair created P9270 Keystone v2 API not working?.
Tue, Oct 8, 9:22 PM
Krenair awarded T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API a The World Burns token.
Tue, Oct 8, 8:12 PM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), cloud-services-team (Kanban), wikitech.wikimedia.org, Operations

Sat, Oct 5

Krenair added a parent task for T212302: CloudVPS: upgrade: jessie -> stretch & mitaka -> newton: T224708: Drop most of mwopenstackclients.DnsManager in favour of designateclient.
Sat, Oct 5, 7:17 PM · Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban)
Krenair added a subtask for T224708: Drop most of mwopenstackclients.DnsManager in favour of designateclient: T212302: CloudVPS: upgrade: jessie -> stretch & mitaka -> newton.
Sat, Oct 5, 7:17 PM · Patch-For-Review, Cloud-VPS
Krenair added a comment to T224708: Drop most of mwopenstackclients.DnsManager in favour of designateclient.

https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/522196/ needs newton

Sat, Oct 5, 7:17 PM · Patch-For-Review, Cloud-VPS
Krenair merged T87519: Migrate as much as possible from network::constants from network.pp to hiera into T220894: Replacement of network::constant's special_hosts.
Sat, Oct 5, 7:16 PM · Operations
Krenair merged task T87519: Migrate as much as possible from network::constants from network.pp to hiera into T220894: Replacement of network::constant's special_hosts.
Sat, Oct 5, 7:16 PM · Operations, Patch-For-Review, Puppet
Krenair added a comment to T87519: Migrate as much as possible from network::constants from network.pp to hiera.

Reverse-duping this against T220894: Replacement of network::constant's special_hosts; anyone should feel free to reopen if they disagree.

Sat, Oct 5, 7:15 PM · Operations, Patch-For-Review, Puppet
Krenair removed a parent task for T220894: Replacement of network::constant's special_hosts: T87519: Migrate as much as possible from network::constants from network.pp to hiera.
Sat, Oct 5, 7:14 PM · Operations
Krenair removed a subtask for T87519: Migrate as much as possible from network::constants from network.pp to hiera: T220894: Replacement of network::constant's special_hosts.
Sat, Oct 5, 7:14 PM · Operations, Patch-For-Review, Puppet
Krenair closed T234723: PHP fatal error: syntax error, unexpected T_CONST, expecting T_VARIABLE as Resolved.
Sat, Oct 5, 5:47 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T234723: PHP fatal error: syntax error, unexpected T_CONST, expecting T_VARIABLE.

shinken seems happy: <shinken-wm> RECOVERY - App Server Main HTTP Response on deployment-mediawiki-09 is OK: HTTP OK: HTTP/1.1 200 OK - 48091 bytes in 0.798 second response time

Sat, Oct 5, 5:04 PM · Beta-Cluster-Infrastructure
Krenair updated subscribers of T234723: PHP fatal error: syntax error, unexpected T_CONST, expecting T_VARIABLE.

added

profile::mediawiki::vhost_feature_flags:
  php72_only: true

to deployment-mediawiki-09, like we did with deployment-mediawiki-07 the other day

Sat, Oct 5, 5:03 PM · Beta-Cluster-Infrastructure
Krenair claimed T234723: PHP fatal error: syntax error, unexpected T_CONST, expecting T_VARIABLE.
Sat, Oct 5, 4:58 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T234723: PHP fatal error: syntax error, unexpected T_CONST, expecting T_VARIABLE.

It's definitely culprit, but... WTF? That error can only happen on HHVM, and we're not running on HHVM anymore, right???

Sat, Oct 5, 4:57 PM · Beta-Cluster-Infrastructure

Sun, Sep 29

Krenair added a comment to T234159: Wikipedia app creates a new IP address when using anonymously.

Okay so basically what we've seen then is from the browser you always edit from that IPv6 address, but from the app it will do either?

Sun, Sep 29, 4:51 PM · Wikipedia-Android-App-Backlog, Android-app-Bugs
Krenair added a comment to T234159: Wikipedia app creates a new IP address when using anonymously.

Maybe, but I don't think we've proven that yet. Did you check whatismyip.com and whether that matches any IP seen when you edited Wikipedia?

Sun, Sep 29, 4:41 PM · Wikipedia-Android-App-Backlog, Android-app-Bugs
Krenair added a comment to T234159: Wikipedia app creates a new IP address when using anonymously.

2607:fb90:64e8:663d:563d:f18d:ce4:9f27 is a T-Mobile IPv6 address. I don't see anything wrong here...?

Sun, Sep 29, 4:17 PM · Wikipedia-Android-App-Backlog, Android-app-Bugs
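A quick way to sort out the IPv4 vs IPv6 question is Python's standard `ipaddress` module. This sketch classifies an editing IP and checks it against an IPv6 block. The 2607:fb90::/32 prefix is an assumption for illustration only, so confirm the carrier's real allocation via WHOIS before relying on it.

```python
import ipaddress

# ASSUMPTION for illustration: treat 2607:fb90::/32 as the carrier's IPv6 block.
# Confirm the real allocation with a WHOIS/RIR lookup before relying on it.
ASSUMED_CARRIER_V6 = ipaddress.ip_network("2607:fb90::/32")


def describe(ip: str) -> str:
    """Say whether an editing IP is IPv4 or IPv6, and whether it's in the assumed block."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return "IPv4"
    where = "inside" if addr in ASSUMED_CARRIER_V6 else "outside"
    return f"IPv6, {where} {ASSUMED_CARRIER_V6}"


print(describe("2607:fb90:64e8:663d:563d:f18d:ce4:9f27"))  # IPv6, inside 2607:fb90::/32
```

Running it on each IP seen in the edit history would show whether the app and browser edits really come from different address families.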
Krenair added a comment to T234159: Wikipedia app creates a new IP address when using anonymously.

https://en.wikipedia.org/wiki/Wikipedia:Advice_to_T-Mobile_IPv6_users
When you access Wikipedia from an IPv6-compatible device on the T-Mobile network, a proxy server (with its own IPv6 address) retrieves and sends data to Wikipedia's servers on your behalf rather than your mobile device doing so directly.

Sun, Sep 29, 4:02 PM · Wikipedia-Android-App-Backlog, Android-app-Bugs
Krenair added a comment to T234159: Wikipedia app creates a new IP address when using anonymously.

I looked at your diff, it has a reference to https://en.wikipedia.org/wiki/Template:TMOblock which talks about T-Mobile IPv6 IPs. Yet the IP used on the edit is a T-Mobile IPv4 IP. Is it just that the wikipedia app forces IPv4 or something?

Sun, Sep 29, 3:48 PM · Wikipedia-Android-App-Backlog, Android-app-Bugs

Thu, Sep 26

Krenair added projects to T233991: Vendor's Emails Not Coming Through: Mail, Operations.
Thu, Sep 26, 9:58 PM · Operations, Mail
Krenair added a comment to T153163: Set up and use exported resources for Tool Labs's shared knowledge.

I figured I'd clean up puppet problems in the toolsbeta project before going near tools proper.

Thu, Sep 26, 1:41 AM · cloud-services-team (Kanban), Patch-For-Review, Toolforge
Krenair claimed T153163: Set up and use exported resources for Tool Labs's shared knowledge.
Thu, Sep 26, 12:19 AM · cloud-services-team (Kanban), Patch-For-Review, Toolforge

Wed, Sep 25

Krenair added a comment to T128715: Add other Tools administrators to the Icinga notification group.

Where does the group membership list live? What does being added imply, icinga emails when things break?

Wed, Sep 25, 11:19 PM · observability, Cloud-Services, Operations, Toolforge
Krenair added a comment to T136225: Backup and/or puppetize @toolserver.org mail forwards.

I wonder if we should just shut down the toolserver.org mail forwards. It's been years.

Wed, Sep 25, 11:16 PM · Cloud-Services, Toolforge
Krenair added a comment to T122403: tool labs: provide custom domain proxy?.

I also don't think we should permit use of custom domains.

Wed, Sep 25, 11:12 PM · Cloud-Services, Toolforge
Krenair added a comment to T91619: Clean out unused security groups on toollabs.

Tricky to figure out which ones are in use or not due to T222414: Nova policy does not permit novaobserver to view an instance's security groups

Wed, Sep 25, 11:09 PM · cloud-services-team (Kanban), Toolforge
Krenair added a comment to T153163: Set up and use exported resources for Tool Labs's shared knowledge.

How's it going there? Are we still interested in using it in the tools project?

Wed, Sep 25, 10:52 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge
Krenair added a comment to T128716: Make icinga-wm report Tools homepage check at #wikimedia-cloud, too.

I've updated this from -labs to -cloud but I'm not convinced it's necessary. Shinken and Icinga things are going into #wikimedia-cloud-feed...

Wed, Sep 25, 10:48 PM · observability, Operations, Toolforge
Krenair renamed T128716: Make icinga-wm report Tools homepage check at #wikimedia-cloud, too from Make icinga-wm report Tools homepage check at #wikimedia-labs, too to Make icinga-wm report Tools homepage check at #wikimedia-cloud, too.
Wed, Sep 25, 10:47 PM · observability, Operations, Toolforge
Krenair merged T151675: Test that execution nodes have public IPs assigned into T151704: Freenode sometimes throttles bot connections from tools.
Wed, Sep 25, 10:46 PM · Patch-For-Review, cloud-services-team (Kanban), wikimedia-irc-freenode, Toolforge
Krenair merged task T151675: Test that execution nodes have public IPs assigned into T151704: Freenode sometimes throttles bot connections from tools.
Wed, Sep 25, 10:46 PM · Toolforge
Krenair closed T138182: User tools.admin is in group project-tools as Resolved.
krenair@tools-sgebastion-07:~$ ldapsearch -x cn=project-tools | grep 'tools\.'
krenair@tools-sgebastion-07:~$
Wed, Sep 25, 10:43 PM · LDAP, Toolforge, Cloud-Services
Krenair removed a project from T122583: Apply pretty 'banned' error page to user-agent bans: Cloud-Services.
Wed, Sep 25, 10:36 PM · Toolforge
Krenair closed T105059: exec hosts have apache2 running as Resolved.

Looking at cumin results, apache2 is only on tools-prometheus-[01-02].tools.eqiad.wmflabs,tools-puppetmaster-01.tools.eqiad.wmflabs and lighttpd is only on tools-sgewebgrid-lighttpd-[0902-0928].tools.eqiad.wmflabs. That seems fine.

Wed, Sep 25, 10:33 PM · Toolforge
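The host lists above use cumin's folded notation (the ClusterShell NodeSet syntax). This is a minimal stand-in, not ClusterShell's parser, that expands the simple `prefix[start-end]suffix` form with zero padding, so you can see exactly which instances are meant. It does not handle comma lists inside brackets such as `[1,3-5]`.

```python
from __future__ import annotations

import re

_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand(pattern: str) -> list[str]:
    """Expand 'prefix[start-end]suffix' ranges, keeping the start bound's zero padding."""
    m = _RANGE.search(pattern)
    if m is None:
        return [pattern]
    start, end = m.group(1), m.group(2)
    width = len(start)
    hosts = []
    for n in range(int(start), int(end) + 1):
        # Substitute this range, then recurse in case another range follows.
        hosts.extend(expand(pattern[: m.start()] + f"{n:0{width}d}" + pattern[m.end():]))
    return hosts


def expand_list(spec: str) -> list[str]:
    """Expand a comma-separated list, splitting only on commas outside brackets."""
    parts = re.split(r",(?![^\[]*\])", spec)
    return [host for part in parts if part for host in expand(part.strip())]
```

For example, `tools-sgewebgrid-lighttpd-[0902-0928].tools.eqiad.wmflabs` expands to 27 hosts, 0902 through 0928.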
Krenair closed T99072: Fix 'unknown's in shinken as Resolved.

not seeing anything on shinken for tools right now

Wed, Sep 25, 10:26 PM · Cloud-Services, Toolforge
Krenair edited projects for T97862: Add shinken admin accounts for tools ops, added: Shinken; removed Cloud-Services.

It's been just me and @valhallasw in shinken for several years, and it doesn't seem like we're committed to shinken long-term, so this task is unlikely to be done.

Wed, Sep 25, 10:16 PM · Shinken, Toolforge
Krenair edited projects for T86218: Make labsdb views fully column-whitelist based, added: Data-Services; removed Cloud-Services, Toolforge.
Wed, Sep 25, 10:12 PM · Data-Services, LabsDB-Auditor

Tue, Sep 24

Krenair added a subtask for T233534: db1075 (s3 master) crashed - BBU failure: T233684: Make primary DB masters page on HOST DOWN alert.
Tue, Sep 24, 8:08 AM · Wikimedia-Incident, ops-eqiad, Operations, DBA
Krenair added a parent task for T233684: Make primary DB masters page on HOST DOWN alert: T233534: db1075 (s3 master) crashed - BBU failure.
Tue, Sep 24, 8:08 AM · Wikimedia-Incident, observability, Icinga, DBA

Mon, Sep 23

Krenair added a comment to T233534: db1075 (s3 master) crashed - BBU failure.

I'm wondering if an entry should be added under "Where did we get lucky?" along the lines of "I/We noticed this incident before SMS paging began".

Mon, Sep 23, 7:14 PM · Wikimedia-Incident, ops-eqiad, Operations, DBA
Krenair added a comment to T233534: db1075 (s3 master) crashed - BBU failure.
Mon, Sep 23, 6:43 PM · Wikimedia-Incident, ops-eqiad, Operations, DBA

Sun, Sep 22

Krenair created T233533: Beta is serving CSP headers allowing prod rather than beta.
Sun, Sep 22, 6:45 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T233489: CentralAuth and local account creation are not working on beta cluster wikis.

So it looks like we're not properly changing the CSP header to beta.wmflabs.org addresses. That would be bad, except they're listed as Report Only?

Sun, Sep 22, 6:39 PM · MediaWiki-extensions-CentralAuth, Beta-Cluster-Infrastructure
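Here's a rough sketch for checking the headers beta serves. It reports whether the policy comes from the enforcing `Content-Security-Policy` header or from `Content-Security-Policy-Report-Only`, where the browser only reports violations and blocks nothing. It also checks whether any source mentions the expected beta domain. The sample header values are hypothetical. The enforcing header is checked first if both are present.

```python
from __future__ import annotations


def parse_csp(policy: str) -> dict[str, list[str]]:
    """Split a CSP header value into {directive: [sources]}."""
    directives: dict[str, list[str]] = {}
    for part in policy.split(";"):
        tokens = part.split()
        if tokens:
            # Browsers ignore repeated directives, so keep the first one.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def check_csp(headers: dict[str, str], expected: str) -> dict | None:
    """Report whether the policy is enforced and whether it mentions `expected`."""
    h = {k.lower(): v for k, v in headers.items()}
    if "content-security-policy" in h:
        policy, enforced = h["content-security-policy"], True
    elif "content-security-policy-report-only" in h:
        # Report-Only: violations are reported, nothing is blocked.
        policy, enforced = h["content-security-policy-report-only"], False
    else:
        return None
    sources = [s for srcs in parse_csp(policy).values() for s in srcs]
    return {"enforced": enforced, "mentions_expected": any(expected in s for s in sources)}
```

A result like `{"enforced": False, "mentions_expected": False}` for beta would match what's described above: wrong hosts, but not actually enforced.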
Krenair added a comment to T233530: When user create tool via toolsadmin, it doesn't create replica.my.cnf.

Yeah, unfortunately the script that does this runs on the NFS server, which lives in eqiad.wmnet.

Sun, Sep 22, 5:34 PM · cloud-services-team (Kanban), Data-Services, Toolforge
Krenair updated subscribers of T232676: af-nb-db-2.automation-framework.eqiad.wmflabs has broken network.

@aborrero it looks like arturo-k8s-test-3.openstack.eqiad.wmflabs has also got this issue

Sun, Sep 22, 12:07 PM · Cloud-VPS
Krenair changed the status of T218423: Add python 3 packages to openstack::clientpackages::common, a subtask of T218426: Upgrade various Cloud VPS Python 2 scripts to Python 3 , from Stalled to Open.
Sun, Sep 22, 11:58 AM · cloud-services-team (Kanban), patch-welcome, Cloud-VPS
Krenair changed the status of T218423: Add python 3 packages to openstack::clientpackages::common from Stalled to Open.

AFAIK no OpenStack hosts are left on Jessie anymore.

Sun, Sep 22, 11:58 AM · cloud-services-team (Kanban), Cloud-VPS
Krenair added a comment to T230657: Thanks extension is not shown on Minerva history page (AMC mode).

Wouldn't the same logic apply on desktop though?

I guess so.

Sun, Sep 22, 11:48 AM · MobileFrontend (MobileFrontend Special Pages), Readers-Web-Backlog (Design), MinervaNeue, Growth Design, Thanks, Growth-Team, Advanced Mobile Contributions
Krenair added a comment to T230657: Thanks extension is not shown on Minerva history page (AMC mode).

I can't find the conversation but I remember our rationale being something like: it doesn't seem super useful to be able to thank someone before seeing the diff.

Sun, Sep 22, 12:43 AM · MobileFrontend (MobileFrontend Special Pages), Readers-Web-Backlog (Design), MinervaNeue, Growth Design, Thanks, Growth-Team, Advanced Mobile Contributions
Krenair added a comment to T233372: Create a "novaobserver" equivalent for Toolforge Kubernetes cluster inspection.

The "k8sobserver" role/user/whatever should NOT be able to see "private" or "secret" things in a namespace. This certainly includes Secret objects, and also probably should extend to ConfigMap objects.

I wonder how many ConfigMaps we actually have; depending on that, maybe we could review them to check whether there's anything secret stored in there. We could have maintainers move secrets to Secret objects and then open up ConfigMap access.

Sun, Sep 22, 12:08 AM · Kubernetes, Toolforge
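One way the "k8sobserver" idea could look is sketched below: a generator for a read-only RBAC ClusterRole that never lists Secrets and only adds ConfigMaps behind a flag, for use once they've been reviewed. The role name and resource list are illustrative assumptions, not an agreed design. Because RBAC rules only grant and never deny, leaving a resource out is what keeps it hidden.

```python
import json

READ_ONLY = ["get", "list", "watch"]


def observer_cluster_role(name: str = "k8sobserver", include_configmaps: bool = False) -> dict:
    """Build a read-only ClusterRole manifest that never mentions Secrets.

    RBAC rules only grant access, so leaving "secrets" out of every rule is
    what keeps them hidden; there is no explicit deny.
    """
    core = ["pods", "pods/log", "services", "endpoints", "events",
            "namespaces", "persistentvolumeclaims", "replicationcontrollers"]
    if include_configmaps:
        # Only after ConfigMaps have been reviewed and secrets moved out of them.
        core.append("configmaps")
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {"apiGroups": [""], "resources": core, "verbs": list(READ_ONLY)},
            {"apiGroups": ["apps"],
             "resources": ["deployments", "replicasets", "statefulsets", "daemonsets"],
             "verbs": list(READ_ONLY)},
            {"apiGroups": ["batch"], "resources": ["jobs", "cronjobs"], "verbs": list(READ_ONLY)},
        ],
    }


print(json.dumps(observer_cluster_role(), indent=2))
```

The JSON output could be fed to `kubectl apply -f -` and then bound to the observer user with a ClusterRoleBinding (not shown).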
Krenair added a comment to T233372: Create a "novaobserver" equivalent for Toolforge Kubernetes cluster inspection.

The "k8sobserver" role/user/whatever should NOT be able to see "private" or "secret" things in a namespace. This certainly includes Secret objects, and also probably should extend to ConfigMap objects.

Sun, Sep 22, 12:02 AM · Kubernetes, Toolforge
Krenair updated subscribers of T232924: Decide what to do with labpuppetmaster100[12].wikimedia.org.

(@Andrew marked them spare in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/537111/)

Sun, Sep 22, 12:00 AM · cloud-services-team, Cloud-VPS

Sat, Sep 21

Krenair closed T139011: puppet function ipresolve unable to look up instance on labs-puppetmaster as Invalid.

Presumed no longer relevant following T171188: Move the main WMCS puppetmaster into the Labs realm

Sat, Sep 21, 11:58 PM · cloud-services-team (Kanban), Phabricator, Puppet, Cloud-Services
Krenair updated the task description for T220359: Benefit from acme-chief features in acme-chief clients.
Sat, Sep 21, 5:26 PM · Operations, Traffic, Acme-chief
Krenair added a comment to T233483: Login at labtestwikitech.wikimedia.org.

On a semi-related note, https://labtesttoolsadmin.wikimedia.org/ returns 502s too.

Sat, Sep 21, 3:02 PM · Cloud-Services
Krenair added a comment to T233483: Login at labtestwikitech.wikimedia.org.

Ah, yeah, I'm on my phone, but it looks like it's serving Horizon, not MW or Striker.
Can't log in as labtestalex either.

Sat, Sep 21, 3:00 PM · Cloud-Services
Krenair added a comment to T233483: Login at labtestwikitech.wikimedia.org.

Labtestwikitech has a separate user database IIRC; I don't recall if it's open for public registration.

Sat, Sep 21, 2:20 PM · Cloud-Services

Thu, Sep 19

Krenair awarded Blog Post: Wikipedia's JavaScript initialisation on a budget a Like token.
Thu, Sep 19, 9:36 PM
Krenair added a comment to T233281: Check/remove unused databases following labpuppetmaster deprecation.

Sounds like a good idea, let's do it.

Thu, Sep 19, 8:08 AM · DBA, Operations
Krenair added a comment to T233281: Check/remove unused databases following labpuppetmaster deprecation.

It's the labspuppet database, yes. Note that the toolforge project has its own puppetmasters and only they were talking to the central puppetmaster.

Thu, Sep 19, 8:06 AM · DBA, Operations

Wed, Sep 18

Krenair awarded Blog Post: Cloud-vps Puppetmasters Moved to VMs, thanks to Krenair a Love token.
Wed, Sep 18, 9:58 PM
Krenair added a comment to T222820: Experiment with hosted kubernetes solutions for Beta.

I would like to say that if we are considering external clouds to integrate into deployment-prep we should ensure we have access to those sorted out for existing deployment-prep members and new ones going forward, before committing to anything. I don't want to end up in a situation where part of deployment-prep is only administer-able from inside the wikimedia.org google domain or something.

Wed, Sep 18, 9:55 PM · Release-Engineering-Team-TODO, Beta-Cluster-Infrastructure, Release Pipeline
Krenair added a comment to T231684: Request access to deployment-prep and beta-cluster logstash.

The reason beta-cluster Logstash no longer uses regular LDAP I believe has to do with privacy and sensitive nature of the data there. It is currently limited to Beta Cluster roots, and presumably requires an NDA on-file.

Wed, Sep 18, 9:50 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T231684: Request access to deployment-prep and beta-cluster logstash.

help in reporting, fixing and cleaning up errors

Could you be a bit more specific, maybe? This feels a bit like "I want to help anywhere and everywhere" ticket (which is very kind)

I understand, @Aklapper, and you have a point there. The thing is, even though I may have access to beta-cluster LS etc., I won't be helping everywhere or anywhere, as I'm not capable of understanding all the logs I see.
Rather, I'll be making contributions from my end by tackling what I'm capable of (for example on mediawiki/*, I merge only what I fully understand and have tested). It's something like that, but let me try to be a bit more specific.

  1. Access to beta-cluster Logstash - for viewing staged error logs and fixing issues I find (within my reach) before they hit production.

I guess that's possible without doing anything specific - just create an account or share the shared account PWD with you. @Krenair, what do you think?

Wed, Sep 18, 9:28 PM · Beta-Cluster-Infrastructure
Krenair added a comment to T233176: ssl renewal: *.wmflabs.org expires 2019-11-16.

@Bstorm, it might need procurement if we've ruled out LE, but AFAIK that is still a perfectly valid option?

I would rather not mess with LE replacement of this relatively inexpensive cert near the expiration date. I'm very interested in using LE and acme-chief for toolforge.org when we introduce that into Toolforge (hopefully in the next 6 months). That seems like a better time to figure out all the things that would need to be done than now.

Wed, Sep 18, 8:38 PM · procurement, cloud-services-team (Kanban), Cloud-Services, Operations
Krenair added a comment to T232676: af-nb-db-2.automation-framework.eqiad.wmflabs has broken network.

@crusnov is this instance working at all? If not please could you try deleting it, and if needed, re-create?

Wed, Sep 18, 8:20 PM · Cloud-VPS
Krenair updated the task description for T232676: af-nb-db-2.automation-framework.eqiad.wmflabs has broken network.
Wed, Sep 18, 8:13 PM · Cloud-VPS
Krenair added a comment to T231616: Request access to Analytics cluster for Urbanecm.

The analytics team does not run a platform intended for wide community access, to do so we will need many times the resources we have in terms of people and infrastructure. Sorry this answer is disappointing but analytics serves the community by making publicly accessible as much data as we can about the movement (and this is a goal towards which we work every day). We do not provide a publicly accessible computation platform.

Wed, Sep 18, 7:15 PM · Patch-For-Review, Operations, SRE-Access-Requests
Krenair updated subscribers of T233176: ssl renewal: *.wmflabs.org expires 2019-11-16.
Wed, Sep 18, 6:31 PM · procurement, cloud-services-team (Kanban), Cloud-Services, Operations
Krenair added a comment to T233176: ssl renewal: *.wmflabs.org expires 2019-11-16.

@Bstorm, it might need procurement if we've ruled out LE, but AFAIK that is still a perfectly valid option?

Wed, Sep 18, 6:29 PM · procurement, cloud-services-team (Kanban), Cloud-Services, Operations
Krenair added a comment to T231616: Request access to Analytics cluster for Urbanecm.

Sorry this is disappointing, but given our very limited resources we really cannot support ad-hoc data access for community members; the best way we have found to have a policy around granting access has to do with employment or active collaborations with the research team.

Wed, Sep 18, 6:20 PM · Patch-For-Review, Operations, SRE-Access-Requests

Tue, Sep 17

Krenair added a comment to T233134: logstash-beta.wmflabs.org does not receive any mediawiki events.

In fact according to openstack-browser, logstash-beta.wmflabs.org still points at deployment-logstash2 and not deployment-logstash03 at all?

Tue, Sep 17, 11:07 PM · Release-Engineering-Team-TODO, observability, Wikimedia-Logstash, Beta-Cluster-Infrastructure
Krenair added a comment to T233134: logstash-beta.wmflabs.org does not receive any mediawiki events.

The details of where we got to with this logstash03 instance are in the thread from T218729#5153739, I'm slightly surprised this host is logstash-beta.wmflabs.org?

Tue, Sep 17, 11:05 PM · Release-Engineering-Team-TODO, observability, Wikimedia-Logstash, Beta-Cluster-Infrastructure
Krenair added a comment to T233158: clarification of cloud terms of use regarding LDAP servers.

I don't see how having read-only replicas changes the real problems involved, maybe we should just add the missing 's' to 'servers'? Labs instances should never *see* (i.e. process in any manner) a password that can log people into LDAP, particularly if that user has, or might ever have in future, privileged groups like nda, wmf, or ops. If people want to make separate test users that can't do anything, and process those credentials via a labs machine, just to test the configuration of an app that normally runs in production, that sounds like something we should consider granting an exemption for on a case-by-case basis?

Tue, Sep 17, 10:53 PM · cloud-services-team (Kanban), Documentation, Cloud-Services

Sun, Sep 15

Krenair added a comment to T232946: Google group for #Wikimedia-Site-requests.

That was my understanding too until T215940: Mailing list migration for Arbitration Committee to Google Group happened.

Sun, Sep 15, 1:09 PM · Office-IT

Sep 14 2019

Krenair added a comment to T232921: Have ORES-web[01,02] been removed?.

https://tools.wmflabs.org/openstack-browser/project/ores does not list them, but does list ores-web0[456], which suggests they were deleted.

Sep 14 2019, 7:47 PM · User-Zppix, VPS-project-icinga2, Scoring-platform-team
Krenair added a comment to T232538: Make the parsoid server on the beta cluster a mediawiki app server.

Fixed ferm on deployment-mediawiki-parsoid10 and deployment-mediawiki-jhuneidi by restarting those machines (they were stuck on a ferm error about /sbin/iptables-restore and /sbin/ip6tables-restore not working)
This seems to have had a positive impact on puppet on deployment-mediawiki-parsoid10: it has now installed some missing things like /run/hhvm and /tmp/heaps.

Sep 14 2019, 7:21 PM · Patch-For-Review, Beta-Cluster-Infrastructure, Core Platform Team Workboards (Purple), RESTBase, Parsoid-PHP