Dzahn (Daniel Zahn)
SREAdministrator

Projects (28)

User Details

User Since
Sep 30 2014, 4:39 PM (403 w, 3 d)
Roles
Administrator
Availability
Available
IRC Nick
mutante
LDAP User
Dzahn
MediaWiki User
Unknown

Recent Activity

Yesterday

Dzahn added a comment to T311314: Shell access request for @demon.

Welcome back!

Fri, Jun 24, 10:50 PM · SRE, SRE-Access-Requests
Dzahn awarded T311314: Shell access request for @demon a Party Time token.
Fri, Jun 24, 7:44 PM · SRE, SRE-Access-Requests
Dzahn added a comment to T311133: Site: eqiad : 2 VMs requested for DSE Kubernetes Cluster control plane servers.

T311290 has been named as the reason for that issue with the cookbook. Should be fixed already.

Fri, Jun 24, 6:56 PM · DSE-Kubernetes-Cluster, vm-requests, Infrastructure-Foundations, SRE

Thu, Jun 23

Dzahn closed T230178: Install wrk, siege and lua-cjson packages on deploy1001, a subtask of T229697: Investigate Kask request latency, as Resolved.
Thu, Jun 23, 9:24 PM · User-Eevans, Platform Team Workboards (Clinic Duty Team), Platform Team Initiatives (Session Management Service (CDP2)), Performance-Team (Radar)
Dzahn closed T230178: Install wrk, siege and lua-cjson packages on deploy1001 as Resolved.
Thu, Jun 23, 9:24 PM · SRE, Platform Team Workboards (Green)
Dzahn added a comment to T230178: Install wrk, siege and lua-cjson packages on deploy1001.

Notice: /Stage[main]/Profile::Mediawiki::Deployment::Server/Package[siege]/ensure: removed
Notice: /Stage[main]/Profile::Mediawiki::Deployment::Server/Package[wrk]/ensure: removed
Notice: /Stage[main]/Profile::Mediawiki::Deployment::Server/Package[lua-cjson]/ensure: removed

Thu, Jun 23, 9:21 PM · SRE, Platform Team Workboards (Green)
Dzahn created P30044 (An Untitled Masterwork).
Thu, Jun 23, 8:54 PM
Dzahn added a comment to T311133: Site: eqiad : 2 VMs requested for DSE Kubernetes Cluster control plane servers.

@BTullis I tried to create one for you but the cookbook failed at the DNS update step:

Thu, Jun 23, 8:49 PM · DSE-Kubernetes-Cluster, vm-requests, Infrastructure-Foundations, SRE
Dzahn added a comment to T311133: Site: eqiad : 2 VMs requested for DSE Kubernetes Cluster control plane servers.
dzahn@cumin1001:~$ sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 4 --disk 20 --network private eqiad_C dse-k8s-ctrl1001
Ready to create Ganeti VM dse-k8s-ctrl1001.eqiad.wmnet in the ganeti01.svc.eqiad.wmnet cluster on row C with 2 vCPUs, 4GB of RAM, 20GB of disk in the private network.
Thu, Jun 23, 8:31 PM · DSE-Kubernetes-Cluster, vm-requests, Infrastructure-Foundations, SRE
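
The cookbook invocation quoted above can be rebuilt from its parts. This is only a sketch: the helper name `makevm_command` and its parameter names are hypothetical, and the only flags used are the ones that appear in the quoted command. It prints the command and does not run it.

```python
import shlex


def makevm_command(vcpus, memory_gb, disk_gb, network, location, hostname):
    """Return the argv for the sre.ganeti.makevm call quoted in the comment.

    Hypothetical helper: it only mirrors the flags visible in the quoted
    command line and makes no claim about other options of the cookbook.
    """
    return [
        "sudo", "cookbook", "sre.ganeti.makevm",
        "--vcpus", str(vcpus),
        "--memory", str(memory_gb),
        "--disk", str(disk_gb),
        "--network", network,
        location,   # cluster and row, as written in the comment (eqiad_C)
        hostname,   # short hostname, as written in the comment
    ]


argv = makevm_command(2, 4, 20, "private", "eqiad_C", "dse-k8s-ctrl1001")
print(shlex.join(argv))
```

Keeping the arguments as a list rather than one string avoids quoting mistakes if the list is later passed to `subprocess.run`.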
Dzahn created P30043 (An Untitled Masterwork).
Thu, Jun 23, 7:41 PM
Dzahn committed rLPRI2959dd131111: add missing fake keys for keyholder/trainbranchbot (authored by Dzahn).
add missing fake keys for keyholder/trainbranchbot
Thu, Jun 23, 7:31 PM
Dzahn added a comment to T310620: Requesting SSH keypair for deployment server keyholder to push to Gerrit.

fake secrets were needed to be able to puppet compile scap changes such as https://gerrit.wikimedia.org/r/c/operations/puppet/+/806397

Thu, Jun 23, 7:25 PM · serviceops, SRE
Dzahn added a comment to T230178: Install wrk, siege and lua-cjson packages on deploy1001.

@hashar here you go :) https://gerrit.wikimedia.org/r/808052

Thu, Jun 23, 6:54 PM · SRE, Platform Team Workboards (Green)
Dzahn reopened T230178: Install wrk, siege and lua-cjson packages on deploy1001, a subtask of T229697: Investigate Kask request latency, as Open.
Thu, Jun 23, 6:45 PM · User-Eevans, Platform Team Workboards (Clinic Duty Team), Platform Team Initiatives (Session Management Service (CDP2)), Performance-Team (Radar)
Dzahn reopened T230178: Install wrk, siege and lua-cjson packages on deploy1001 as Open.
Thu, Jun 23, 6:45 PM · SRE, Platform Team Workboards (Green)
Dzahn added a comment to T311264: SSH on cp5012.mgmt is flapping (CRITICAL).

See T283582

Thu, Jun 23, 6:23 PM · SRE, ops-eqsin, Traffic
Dzahn added a comment to T311264: SSH on cp5012.mgmt is flapping (CRITICAL).

We have had the "mgmt flapping"-issue in other DCs. In codfw a bunch of them were fixed after Papaul did firmware upgrades on the DRACs.

Thu, Jun 23, 6:21 PM · SRE, ops-eqsin, Traffic
Dzahn added a comment to T311133: Site: eqiad : 2 VMs requested for DSE Kubernetes Cluster control plane servers.

Thank you @BTullis for all the details. Now I know what DSE means. If the doc could be public, even better. The project description at https://phabricator.wikimedia.org/project/profile/5959/ is also helpful though for the casual observer.

Thu, Jun 23, 6:02 PM · DSE-Kubernetes-Cluster, vm-requests, Infrastructure-Foundations, SRE
Dzahn added a comment to T302870: Grant cn=nda some sort of read only access to Netbox.

Thank you for the examples. That makes sense to me. Especially if Dell advises to keep them secret.

Thu, Jun 23, 5:57 PM · SRE, Infrastructure-Foundations, netbox

Wed, Jun 22

Dzahn added a comment to T310738: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website.

We are "closing" this site on the VIP site. So, essentially whenever we want on our side - we can bring that back to our DNS setup...

Wed, Jun 22, 11:17 PM · Patch-For-Review, Traffic, wikimediafoundation.org, SRE, serviceops, DNS, WMF-Legal
Dzahn added a comment to T302870: Grant cn=nda some sort of read only access to Netbox.

Before we talk about technical implementation and putting this on ice, I am wondering: has anyone even had specific concerns or data fields in mind that should be hidden?

Wed, Jun 22, 10:23 PM · SRE, Infrastructure-Foundations, netbox
Dzahn added a comment to T311133: Site: eqiad : 2 VMs requested for DSE Kubernetes Cluster control plane servers.

fyi: The design document isn't accessible and from the tickets alone it's unclear what this is about.

Wed, Jun 22, 10:02 PM · DSE-Kubernetes-Cluster, vm-requests, Infrastructure-Foundations, SRE
Dzahn added a comment to T110203: migrate policy.wikimedia.org from WMF cluster to Wordpress.

In T310738 there is a request to revert this and move the domain back to WMF infra.

Wed, Jun 22, 9:47 PM · WMF-Annual-Report (Policy site), SRE

Tue, Jun 21

Dzahn added a comment to T310884: Add pcmwiki to wikistats.

@Zabe I see. thank you for that!

Tue, Jun 21, 9:21 PM · VPS-project-Wikistats
Dzahn added a comment to T310738: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website.

@Dzahn - is that doable? I am not sure if we have redirected to web.archive.org before - although I think the 9/11 Wiki archive link does.

Tue, Jun 21, 9:18 PM · Patch-For-Review, Traffic, wikimediafoundation.org, SRE, serviceops, DNS, WMF-Legal
Dzahn added a comment to T310884: Add pcmwiki to wikistats.

P.S. What would be actually useful for me is if those only actually got created _after_ the wiki has been created. Or ideally if it was first "stalled" and the actual wiki creation changed that to "open". I guess. Currently I see them but still need to manually watch the "newprojects" list or so to know when they are _really_ ready to go.

Tue, Jun 21, 7:14 PM · VPS-project-Wikistats
Dzahn updated subscribers of T310884: Add pcmwiki to wikistats.

@Zabe Curious why those tickets start with a custom policy in the first place. Is that something we should try to change in the bot creating those?

Tue, Jun 21, 7:13 PM · VPS-project-Wikistats
Dzahn added a comment to T308271: Deploy buildkitd to trusted GitLab runners.

@dduvall Well, you could try to use it to build a docker image from a dockerfile. So far it's just "the buildkitd service is running" and follow-ups are about making sure it survives reboots and that it works more automatically the next time we set up a gitlab-runner. Don't worry about that part. But I don't think anyone has actually let it build an image yet.

Tue, Jun 21, 5:33 PM · Patch-For-Review, User-brennen, GitLab (CI & Job Runners), Release-Engineering-Team (GitLab-a-thon 🦊)

Sat, Jun 18

Aklapper awarded T122144: Move most (all?) exim personal aliases to WMF ITS a Yellow Medal token.
Sat, Jun 18, 10:31 AM · Infrastructure-Foundations, Epic, Mail, SRE

Fri, Jun 17

Dzahn moved T309375: Requesting access to contint-admins for taavi from Untriaged to Ready To Go on the SRE-Access-Requests board.
Fri, Jun 17, 11:33 PM · SRE, SRE-Access-Requests
Dzahn updated the task description for T309375: Requesting access to contint-admins for taavi.
Fri, Jun 17, 11:33 PM · SRE, SRE-Access-Requests
Dzahn added a comment to T122144: Move most (all?) exim personal aliases to WMF ITS.

The remaining SRE aliases in the file can now be separated into:

Fri, Jun 17, 10:18 PM · Infrastructure-Foundations, Epic, Mail, SRE
Dzahn closed T122144: Move most (all?) exim personal aliases to WMF ITS as Resolved.
Fri, Jun 17, 10:15 PM · Infrastructure-Foundations, Epic, Mail, SRE
Dzahn added a comment to T122144: Move most (all?) exim personal aliases to WMF ITS.
  • deleted store@ and merchandise@ after they were created in Google; coordinated with Brendan of ITS and Sandra Hust, store manager
Fri, Jun 17, 10:15 PM · Infrastructure-Foundations, Epic, Mail, SRE

Thu, Jun 16

Dzahn added a comment to T310777: Create Wikipedia Pa'O.

added to DNS

Thu, Jun 16, 10:59 PM · MW-1.39-notes (1.39.0-wmf.17; 2022-06-20), Wiki-Setup (Create), User-Urbanecm
Dzahn added a comment to T310776: Create Wikipedia Nigerian Pidgin.

added to DNS:

Thu, Jun 16, 10:59 PM · MW-1.39-notes (1.39.0-wmf.17; 2022-06-20), User-Urbanecm, Wiki-Setup (Create)
Dzahn added a comment to T310831: DNS cookbook failed syncing with netbox - 403 from netbox1002.

Thank you for the very quick response!

Thu, Jun 16, 10:50 PM · netbox, Infrastructure-Foundations, SRE
Dzahn lowered the priority of T308271: Deploy buildkitd to trusted GitLab runners from High to Medium.

It's deployed but we have some follow-ups. I guess lowering the prio a bit is appropriate for this state.

Thu, Jun 16, 10:08 PM · Patch-For-Review, User-brennen, GitLab (CI & Job Runners), Release-Engineering-Team (GitLab-a-thon 🦊)
Dzahn added a comment to T308271: Deploy buildkitd to trusted GitLab runners.

buildkitd is now running on all (6) gitlab-runners. It's 6 because the VMs 1001 and 2001 have been decom'ed earlier today and then there are 3 physical hosts per DC.

Thu, Jun 16, 9:50 PM · Patch-For-Review, User-brennen, GitLab (CI & Job Runners), Release-Engineering-Team (GitLab-a-thon 🦊)
Dzahn added a comment to T308271: Deploy buildkitd to trusted GitLab runners.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/806250

Thu, Jun 16, 8:53 PM · Patch-For-Review, User-brennen, GitLab (CI & Job Runners), Release-Engineering-Team (GitLab-a-thon 🦊)
Dzahn added a comment to T310831: DNS cookbook failed syncing with netbox - 403 from netbox1002.

The run of the decom book was at:

Thu, Jun 16, 7:38 PM · netbox, Infrastructure-Foundations, SRE
Dzahn added a comment to T310831: DNS cookbook failed syncing with netbox - 403 from netbox1002.

After this I ran only the DNS cookbook directly and this time it finished without such an error. I am not sure if it tried though because it said "nothing to sync".

Thu, Jun 16, 7:23 PM · netbox, Infrastructure-Foundations, SRE
Dzahn added a project to T310831: DNS cookbook failed syncing with netbox - 403 from netbox1002: netbox.
Thu, Jun 16, 7:11 PM · netbox, Infrastructure-Foundations, SRE
Dzahn updated the task description for T310831: DNS cookbook failed syncing with netbox - 403 from netbox1002.
Thu, Jun 16, 7:11 PM · netbox, Infrastructure-Foundations, SRE
Dzahn created T310831: DNS cookbook failed syncing with netbox - 403 from netbox1002.
Thu, Jun 16, 7:08 PM · netbox, Infrastructure-Foundations, SRE
Dzahn updated the task description for T307142: bring new gitlab hardware servers into production.
Thu, Jun 16, 6:51 PM · Patch-For-Review, GitLab (Infrastructure), serviceops
Dzahn awarded T310742: Editing tasks results in "You cannot add more than 0 objects to the relationship" error a Mountain of Wealth token.
Thu, Jun 16, 6:18 PM · Regression, Release-Engineering-Team, Phabricator

Wed, Jun 15

Dzahn added a comment to T310738: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website.

There are incoming redirects into policy.wikimedia.org:

Wed, Jun 15, 7:28 PM · Patch-For-Review, Traffic, wikimediafoundation.org, SRE, serviceops, DNS, WMF-Legal
Dzahn added a comment to T132104: Consider moving policy.wikimedia.org away from WordPress.com .

Looks like T310738 would make this obsolete.

Wed, Jun 15, 7:23 PM · Patch-For-Review, Privacy Engineering, WMF-Legal, SRE, Privacy
Dzahn added a comment to T310738: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website.

just a note for serviceops: policy.wikimedia.org is not currently under the control of SRE/prod servers at WMF. It's hosted at Wordpress VIP.

Wed, Jun 15, 7:20 PM · Patch-For-Review, Traffic, wikimediafoundation.org, SRE, serviceops, DNS, WMF-Legal
Dzahn added a comment to T122144: Move most (all?) exim personal aliases to WMF ITS.
  • deleted aql-sms@ not needed anymore
Wed, Jun 15, 6:26 PM · Infrastructure-Foundations, Epic, Mail, SRE
Dzahn added a comment to T122144: Move most (all?) exim personal aliases to WMF ITS.
  • deleted order@, orders@, return@ and returns@ after Sandra Hust, manager of store.wikimedia.org, confirmed they aren't public knowledge on the store page and was not even aware of them. They only use merchandise@ and store@, which both go to a single Zendesk email. So first simplify and then move the remaining redirects to ITS (in progress)
Wed, Jun 15, 5:37 PM · Infrastructure-Foundations, Epic, Mail, SRE

Tue, Jun 14

Dzahn added a comment to T122144: Move most (all?) exim personal aliases to WMF ITS.

there is always moar:)

Tue, Jun 14, 10:44 PM · Infrastructure-Foundations, Epic, Mail, SRE
Dzahn added a comment to T122144: Move most (all?) exim personal aliases to WMF ITS.

The other day I deleted cpt-leads@ (after Tim told me it's ok and it has not been used for a while) and techcom@ (after asking ITS to create it on the Google side and agreeing with Timo that he is the new admin of that Google group).

Tue, Jun 14, 9:04 PM · Infrastructure-Foundations, Epic, Mail, SRE

Mon, Jun 13

Dzahn added a comment to T310303: pontoon.traffic.eqiad1.wikimedia.cloud unable to run puppet agent due to certificate mismatch.

I also saw certificate errors pop up in a different project that uses a local puppetmaster. And we felt like we had not touched anything. I did not get to look yet, but this seemed similar enough and I was already suspecting some change related to self-puppetmaster.

Mon, Jun 13, 10:37 PM · SRE, Traffic
Dzahn added a comment to T310385: Grant Access to wmf for Xcollazo.

done! added @XCollazo-WMF to https://phabricator.wikimedia.org/tag/wmf-nda/

Mon, Jun 13, 10:33 PM · SRE, LDAP-Access-Requests
Dzahn added a member for WMF-NDA: XCollazo-WMF.
Mon, Jun 13, 10:33 PM
Dzahn added a comment to T310555: Requesting access to Analytics for xcollazo.

Confirming @XCollazo-WMF exists and was introduced in SRE meeting today :) welcome to WMF. Confirmed signature and checked all other boxes. Just one is open for clinic duty.

Mon, Jun 13, 10:32 PM · SRE, SRE-Access-Requests
Dzahn updated the task description for T310555: Requesting access to Analytics for xcollazo.
Mon, Jun 13, 10:31 PM · SRE, SRE-Access-Requests
Dzahn updated the task description for T310555: Requesting access to Analytics for xcollazo.
Mon, Jun 13, 10:30 PM · SRE, SRE-Access-Requests
Dzahn added a comment to T309957: (Need By:TBD) rack/setup/install row A new PDUs.

ah ACK, ok, in that case we will just move forward as planned. Thanks Papaul

Mon, Jun 13, 9:22 PM · SRE, ops-codfw
Dzahn added a comment to T309957: (Need By:TBD) rack/setup/install row A new PDUs.

A1: serviceops: gitlab2002 is still in state "in setup". While we were going to change that we will hold back until this is done.

Mon, Jun 13, 4:24 PM · SRE, ops-codfw
Dzahn added a comment to T310455: thumbor2004 is down.

Unfortunately the purchase date is 2016-12-12, so it probably can't be fixed.

Mon, Jun 13, 4:00 PM · ops-codfw, Thumbor, SRE
Dzahn added a comment to T310455: thumbor2004 is down.

/admin1-> racadm serveraction powercycle

Mon, Jun 13, 4:51 AM · ops-codfw, Thumbor, SRE
Dzahn updated the task description for T310455: thumbor2004 is down.
Mon, Jun 13, 4:46 AM · ops-codfw, Thumbor, SRE
Dzahn added projects to T310455: thumbor2004 is down: Thumbor, ops-codfw.
Mon, Jun 13, 4:37 AM · ops-codfw, Thumbor, SRE
Dzahn updated the task description for T310455: thumbor2004 is down.
Mon, Jun 13, 4:35 AM · ops-codfw, Thumbor, SRE
Dzahn added a comment to T310455: thumbor2004 is down.

04:32 <+logmsgbot> !log dzahn@cumin2002 conftool action : set/pooled=no; selector: dc=codfw,name=thumbor2004.codfw.wmnet

Mon, Jun 13, 4:35 AM · ops-codfw, Thumbor, SRE
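
The `!log` line quoted above has a regular shape, so its fields can be pulled out mechanically. The following is a minimal sketch that assumes other conftool log lines follow the same layout as this one example. The pattern and the function name `parse_conftool_log` are illustrative and are not part of conftool or the log bot.

```python
import re

# The line exactly as quoted in the comment above.
LOG_LINE = (
    "04:32 <+logmsgbot> !log dzahn@cumin2002 conftool action : "
    "set/pooled=no; selector: dc=codfw,name=thumbor2004.codfw.wmnet"
)

# Assumed layout: "!log USER@HOST conftool action : ACTION; selector: K=V,K=V"
PATTERN = re.compile(
    r"!log (?P<user>[^@\s]+)@(?P<host>\S+) conftool action : "
    r"(?P<action>[^;]+); selector: (?P<selector>\S+)"
)


def parse_conftool_log(line):
    """Return the user, host, action and selector of a conftool log line.

    Returns None when the line does not match the assumed layout.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Turn "dc=codfw,name=..." into a dict of selector keys and values.
    fields["selector"] = dict(
        pair.split("=", 1) for pair in fields["selector"].split(",")
    )
    return fields


parsed = parse_conftool_log(LOG_LINE)
print(parsed)
```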
Dzahn created T310455: thumbor2004 is down.
Mon, Jun 13, 4:34 AM · ops-codfw, Thumbor, SRE

Sun, Jun 12

Dzahn removed a member for acl*security_team: Dsharpe.
Sun, Jun 12, 3:45 AM

Fri, Jun 10

Dzahn added a comment to T309648: Restore lost index in cloudelastic.

I ACKed the Icinga alerts with a link to this so they are not in "unhandled CRIT" anymore.

Fri, Jun 10, 9:47 PM · Patch-For-Review, Discovery-Search (Current work)
Dzahn merged task T310400: cloudelastic1001 through cloudelastic1006: CRITICAL - unassigned shard / commonswiki_file into T309648: Restore lost index in cloudelastic.
Fri, Jun 10, 9:46 PM · SRE
Dzahn merged T310400: cloudelastic1001 through cloudelastic1006: CRITICAL - unassigned shard / commonswiki_file into T309648: Restore lost index in cloudelastic.
Fri, Jun 10, 9:45 PM · Patch-For-Review, Discovery-Search (Current work)
Dzahn renamed T310400: cloudelastic1001 through cloudelastic1006: CRITICAL - unassigned shard / commonswiki_file from cloudelastic1001 through cloudelastic1006: CRITICAL - commonswiki_file to cloudelastic1001 through cloudelastic1006: CRITICAL - unassigned shard / commonswiki_file.
Fri, Jun 10, 9:45 PM · SRE
Dzahn updated the task description for T310400: cloudelastic1001 through cloudelastic1006: CRITICAL - unassigned shard / commonswiki_file.
Fri, Jun 10, 9:44 PM · SRE
Dzahn added a project to T310400: cloudelastic1001 through cloudelastic1006: CRITICAL - unassigned shard / commonswiki_file: SRE.
Fri, Jun 10, 9:43 PM · SRE
Dzahn created T310400: cloudelastic1001 through cloudelastic1006: CRITICAL - unassigned shard / commonswiki_file.
Fri, Jun 10, 9:42 PM · SRE
Dzahn added a comment to T293942: refactor OTRS role/module/cumin aliases.

@Arnoldokoth The last change we had uploaded in our meeting the other day is now merged. I would say we can call this resolved and close the ticket (but also create a new one for migration to bullseye sometime in the future, which mentions that we should _then_ also do the remaining rename of:

Fri, Jun 10, 5:23 PM · Patch-For-Review, Znuny, serviceops, SRE

Thu, Jun 9

Dzahn changed the status of T122144: Move most (all?) exim personal aliases to WMF ITS from Open to In Progress.
Thu, Jun 9, 9:50 PM · Infrastructure-Foundations, Epic, Mail, SRE
Dzahn added a comment to T122144: Move most (all?) exim personal aliases to WMF ITS.

I talked with Jesse about all this. We agreed I will follow up on the last few things that you, Faidon, also mentioned in our mail: cpt-leads@, techcom@ and the remaining fr-tech ones. I just sent mails about these. After that is done I'll close this ticket as resolved and tell ITS that everything related to wikiPedia.org (ongoing discussion about dropping things like jimmy@, personal aliases in wikiPedia.org etc) should be seen as a separate task and I will hand that over.

Thu, Jun 9, 9:50 PM · Infrastructure-Foundations, Epic, Mail, SRE
Dzahn changed the status of T122144: Move most (all?) exim personal aliases to WMF ITS from Stalled to Open.
Thu, Jun 9, 9:34 PM · Infrastructure-Foundations, Epic, Mail, SRE
Dzahn added a comment to T247653: replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent.

@Krinkle Yep, that summary sounds right to me. That's what we had in mind. It's just that some time ago you had said it's not ready yet to be switched on that change https://gerrit.wikimedia.org/r/c/operations/dns/+/650625/. I don't recall what the specific reasons were for it not being ready. But if there is no concern anymore now then this should be ready to go anytime. Feel free to invite me via calendar to make this happen. I can deal with reasonably early time in my timezone.

Thu, Jun 9, 7:46 PM · Release-Engineering-Team (Seen), Patch-For-Review, Continuous-Integration-Infrastructure, serviceops, SRE
Dzahn added a comment to T310265: Reduce usage of public IPv4 addresses on GitLab hosts.

First and foremost though, the reason why gitlab has all public IPs is because we were trying to emulate the gerrit setup. And gerrit has public IPs and is not behind LVS because we wanted it that way. We wanted to be able to still use Gerrit and merge changes even if the caching layer is down for some reason. For the same reason icinga has a public IP. Certain services were not supposed to rely on loadbalancers.

Thu, Jun 9, 7:23 PM · GitLab (Infrastructure), serviceops
Dzahn added a comment to T310265: Reduce usage of public IPv4 addresses on GitLab hosts.

moving gitlab1001.wikimedia.org to gitlab1001.eqiad.wmnet

Thu, Jun 9, 7:16 PM · GitLab (Infrastructure), serviceops

Wed, Jun 8

Dzahn added a comment to T310238: Create new GitLab project group: <name>.

@Sabrecalyx If this is a legit request, please replace <groupname> with the actual group name requested and fill out the rationale section.

Wed, Jun 8, 11:16 PM · Trash
Dzahn added a comment to T310225: mw1415 fatals due to serving responses from 1.39.0-wmf.10 (was DBQueryError: Unknown column page_restrictions).

What happened here is:

Wed, Jun 8, 9:50 PM · serviceops, Deployments, Wikimedia-production-error
Dzahn added a comment to T310225: mw1415 fatals due to serving responses from 1.39.0-wmf.10 (was DBQueryError: Unknown column page_restrictions).

mw1415 does not serve 500s anymore. T307755#7990623

Wed, Jun 8, 9:46 PM · serviceops, Deployments, Wikimedia-production-error
Dzahn closed T307755: mw1415 (canary appserver) is down, incl. mgmt as Resolved.
Wed, Jun 8, 9:46 PM · ops-eqiad, serviceops, SRE
Dzahn added a comment to T307755: mw1415 (canary appserver) is down, incl. mgmt.

This caused T310225 because setting it to pooled=inactive does not mean monitoring will stop checking it and when this came back unexpectedly it caused new alerts for 500s on this box, which had not received scap updates. But setting it to pooled=no would have meant deployers would have gotten warnings about an unreachable host for a month. The deeper issue is there is no right status to set hosts to while they are waiting for hardware repair.

Wed, Jun 8, 9:45 PM · ops-eqiad, serviceops, SRE
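
The trade-off described in the comment above can be written down as a small lookup. This is a hypothetical model built only from that comment, not a description of how conftool behaves. The value `False` for deployer warnings under `inactive` is inferred from the contrast the comment draws, and `None` marks what the comment does not state.

```python
from typing import Dict, List, Optional

# Hypothetical summary of the two pooled states discussed in the comment.
# None means the comment does not say.
EFFECTS: Dict[str, Dict[str, Optional[bool]]] = {
    "inactive": {
        "monitoring_checks_host": True,   # stated: monitoring keeps checking
        "deployers_warned": False,        # inferred from the contrast with "no"
    },
    "no": {
        "monitoring_checks_host": None,   # not stated in the comment
        "deployers_warned": True,         # stated: unreachable-host warnings
    },
}


def drawbacks(state: str) -> List[str]:
    """List the problems the comment attributes to a pooled state."""
    effect = EFFECTS[state]
    found = []
    if effect["monitoring_checks_host"]:
        found.append("alerts fire if the host comes back without scap updates")
    if effect["deployers_warned"]:
        found.append("deployers get unreachable-host warnings")
    return found


for state in EFFECTS:
    print(state, drawbacks(state))
```

Both states end up with at least one drawback, which is the comment's point: neither one fits a host that is waiting for hardware repair.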
Dzahn added a comment to T307755: mw1415 (canary appserver) is down, incl. mgmt.

21:13 < mutante> !log mw1415 - scap pull, restart apache, /usr/local/sbin/restart-php7.2-fpm (INFO: The server is depooled from all services. Restarting the service directly)

Wed, Jun 8, 9:15 PM · ops-eqiad, serviceops, SRE
Dzahn closed T185644: Switch phabricator from using apache to nginx as Declined.

something between resolved and declined. please feel free to reopen though if you feel differently about it.

Wed, Jun 8, 3:38 PM · serviceops-radar, SRE, Phabricator
Dzahn added a comment to T297411: Migrate gitlab-test instance to puppet.

Does this only affect this instance or maybe all users who have a local puppetmaster in their VPS project? It seems like we haven't touched anything and it was working before, and the error makes me think something changed somewhere upstream or, alternatively, someone tried to switch between the local project puppetmaster and the regular global puppet master.

Wed, Jun 8, 3:18 PM · Patch-For-Review, serviceops, GitLab (Infrastructure)

Mon, Jun 6

Dzahn claimed T305979: allow certain users to disable puppet on mwdebug hosts.

ok, thank you IF team! Assigning back to me for the moment to follow up. Yes, there was a specific person. I will re-add this with a specific group after discussion.

Mon, Jun 6, 10:24 PM · Infrastructure-Foundations, serviceops, SRE
Dzahn changed the status of T308952: get a legend for haproxy "anomalous session termination states" from In Progress to Open.
Mon, Jun 6, 7:37 PM · SRE, Sustainability (Incident Followup)
Dzahn created P29443 ruby gems.
Mon, Jun 6, 7:33 PM
Dzahn added a comment to T308952: get a legend for haproxy "anomalous session termination states" .

This is a better link since it's directly upstream and latest docs from 2022:

Mon, Jun 6, 7:23 PM · SRE, Sustainability (Incident Followup)
Dzahn changed the status of T307755: mw1415 (canary appserver) is down, incl. mgmt from Open to In Progress.
Mon, Jun 6, 6:58 PM · ops-eqiad, serviceops, SRE
Dzahn added a comment to T307755: mw1415 (canary appserver) is down, incl. mgmt.

@Cmjohnson Alright, gotcha! Thanks for the updates and Dell request.

Mon, Jun 6, 6:58 PM · ops-eqiad, serviceops, SRE

Fri, Jun 3

Dzahn added a comment to T308013: Assign SPDX headers to puppet.git.

bundle exec rake 'spdx:convert:module[MODULENAME]'

Fri, Jun 3, 8:20 PM · Patch-For-Review, Infrastructure-Foundations, SRE
Dzahn created T309886: an-tool1005 - memcached Connection refused.
Fri, Jun 3, 6:55 PM · SRE