
Dzahn (Daniel Zahn)
Operations Engineer, Administrator

Projects (18)

User Details

User Since
Sep 30 2014, 4:39 PM (263 w, 1 d)
Roles
Administrator
Availability
Available
IRC Nick
mutante
LDAP User
Dzahn
MediaWiki User
Unknown

Recent Activity

Yesterday

Dzahn lowered the priority of T230245: Mediawiki maintenance job "generate-fancycaptcha" - fatal error when trying to copy new captchas to storage from High to Normal.

Workaround merged and deployed on prod maintenance server(s). Lowering priority from High to Normal because we have new captchas and a workaround for now.

Wed, Oct 16, 10:59 PM · Performance-Team, Operations, Core Platform Team, media-storage, Editing-team, Wikimedia-production-error, ConfirmEdit (CAPTCHA extension)
Dzahn added a comment to T230245: Mediawiki maintenance job "generate-fancycaptcha" - fatal error when trying to copy new captchas to storage.

I manually ran the script by @Reedy (thanks!) from https://gerrit.wikimedia.org/r/c/operations/puppet/+/543707/2/modules/mediawiki/files/captchaloop on mwmaint1002.

Wed, Oct 16, 10:45 PM · Performance-Team, Operations, Core Platform Team, media-storage, Editing-team, Wikimedia-production-error, ConfirmEdit (CAPTCHA extension)
sbassett awarded T230245: Mediawiki maintenance job "generate-fancycaptcha" - fatal error when trying to copy new captchas to storage a The World Burns token.
Wed, Oct 16, 9:09 PM · Performance-Team, Operations, Core Platform Team, media-storage, Editing-team, Wikimedia-production-error, ConfirmEdit (CAPTCHA extension)
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Wed, Oct 16, 7:06 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn added a comment to T235677: Automatic pickup of Gerrit clone master doesn't happen.

The changes made in T235013 added a requirement to have git-lfs installed and use a different command to pull data.

Wed, Oct 16, 6:39 PM · Gerrit, Release-Engineering-Team, Operations, Wikimedia Design Style Guide
Dzahn added a project to T235013: Use `git lfs` for large binary files of Design Style Guide: Release-Engineering-Team.
Wed, Oct 16, 5:49 PM · Release-Engineering-Team, Patch-For-Review, User-Ladsgroup, Wikimedia Design Style Guide
Dzahn added a comment to T235013: Use `git lfs` for large binary files of Design Style Guide.

@Ladsgroup git-lfs is not installed on the prod servers that clone from this repo, and the puppet git::clone class also does not support changing the command yet. So this breaks cloning on the prod servers.

Wed, Oct 16, 5:29 PM · Release-Engineering-Team, Patch-For-Review, User-Ladsgroup, Wikimedia Design Style Guide
Dzahn added a project to T235677: Automatic pickup of Gerrit clone master doesn't happen: Gerrit.
Wed, Oct 16, 5:11 PM · Gerrit, Release-Engineering-Team, Operations, Wikimedia Design Style Guide
Dzahn updated the task description for T233654: Make the parsoid cluster support parsoid/PHP.
Wed, Oct 16, 3:02 PM · Patch-For-Review, Operations, serviceops
Dzahn updated the task description for T233654: Make the parsoid cluster support parsoid/PHP.
Wed, Oct 16, 2:48 PM · Patch-For-Review, Operations, serviceops

Tue, Oct 15

Dzahn updated the task description for T233654: Make the parsoid cluster support parsoid/PHP.
Tue, Oct 15, 11:35 PM · Patch-For-Review, Operations, serviceops
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Tue, Oct 15, 10:10 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn changed the status of T86541: setup wifi in codfw from Resolved to Declined.
Tue, Oct 15, 8:18 PM · DC-Ops, Operations, ops-codfw, netops
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Tue, Oct 15, 7:50 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Tue, Oct 15, 7:19 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Tue, Oct 15, 6:50 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn added a comment to T235215: Onboarding Reuven Lazarus.
  • added to special groups in Phabricator to see private tickets (acl*SRE and WMF/NDA)
Tue, Oct 15, 6:21 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Tue, Oct 15, 6:20 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn added a member for WMF-NDA: RLazarus.
Tue, Oct 15, 6:20 PM
Dzahn added a member for acl*sre-team: RLazarus.
Tue, Oct 15, 6:19 PM
Dzahn added a comment to T235215: Onboarding Reuven Lazarus.

Very nice. Welcome @RLazarus! I'll upload a change to code review to create your shell account. Could you create an SSH key pair and paste the public part here on the ticket? Also, feel free to come to IRC and ping us so we can add you to some public and private channels. Cheers, Daniel

Tue, Oct 15, 4:42 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
CDanis awarded T235215: Onboarding Reuven Lazarus a Like token.
Tue, Oct 15, 4:26 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations

Mon, Oct 14

Dzahn added a comment to T235215: Onboarding Reuven Lazarus.
  • added to maint-announce shared inbox / Google group
  • added to "Ops vendor maintenance" calendar and permissions
Mon, Oct 14, 6:42 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Mon, Oct 14, 6:41 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Mon, Oct 14, 6:36 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Mon, Oct 14, 6:31 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn added a comment to T235215: Onboarding Reuven Lazarus.

Hello Reuven and welcome to the team!

Mon, Oct 14, 6:29 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn updated the task description for T235215: Onboarding Reuven Lazarus.
Mon, Oct 14, 6:24 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn closed T234775: Add banwiki to wikistats, a subtask of T234768: Create Balinese Wikipedia, as Resolved.
Mon, Oct 14, 6:19 PM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), Patch-For-Review, User-Ladsgroup, Wiki-Setup (Create), User-Urbanecm
Dzahn closed T234775: Add banwiki to wikistats as Resolved.

Added as the 306th Wikipedia.

Mon, Oct 14, 6:19 PM · VPS-project-Wikistats
Dzahn added a comment to T235425: webperf*002 running out of disk space (arc lamp, xhgui).

Sorry, I was on 1001 and 2001 instead of 1002 and 2002, and was wondering why I didn't even see /srv mounted on a separate device. Yes, ACK. On 1002/2002 it's the xenon logs.

Mon, Oct 14, 5:53 PM · Arc-Lamp, Patch-For-Review, serviceops, Operations, Performance-Team
Dzahn added a comment to T235425: webperf*002 running out of disk space (arc lamp, xhgui).

Looking at them now, I see they are only using 14% and 8% of /. I ran "apt-get clean" and now usage is down to 12% and 6%. Alerting would trigger at 95% by default, so it looks like somebody (or something like a cron job?) already deleted files.

Mon, Oct 14, 5:41 PM · Arc-Lamp, Patch-For-Review, serviceops, Operations, Performance-Team

Sat, Oct 12

Dzahn added a comment to T233654: Make the parsoid cluster support parsoid/PHP.

@mobrovac Yes, I agree. Creating two new LVS and DNS services, one parsoid-php and one parsoid-js, and then first switching from the old parsoid to parsoid-js seems like the best plan to resolve the conflict. My latest patch attempts to add that config for a new parsoid-php service, so I could more or less copy it to create parsoid-js first. ACK.

Sat, Oct 12, 2:18 PM · Patch-For-Review, Operations, serviceops

Fri, Oct 11

Dzahn created P9316 devtools cloud instance list.
Fri, Oct 11, 8:10 PM
Dzahn added a comment to T235215: Onboarding Reuven Lazarus.

OIT reports that the email account has been created. We can now start with some of these.

Fri, Oct 11, 7:50 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn closed T232961: Creation of Wikispore mailing list as Resolved.

List has been created

Fri, Oct 11, 7:18 PM · Wikispore, Wikimedia-Mailing-lists, Operations
Dzahn assigned T235291: Please create engprod-mgt@ mailing list to greg.
Fri, Oct 11, 7:11 PM · User-greg, Operations, Wikimedia-Mailing-lists
Dzahn added a comment to T235291: Please create engprod-mgt@ mailing list.

@greg List created. I let it create a random password, then added the secondary admins and ran a "reset password" command.

Fri, Oct 11, 7:07 PM · User-greg, Operations, Wikimedia-Mailing-lists
Dzahn added a comment to T191956: Document how to fix IPMI issues on Wikitech .

redirected to Management Interfaces

Fri, Oct 11, 6:36 PM · Operations, Documentation
Dzahn updated subscribers of T191956: Document how to fix IPMI issues on Wikitech .

Wikitech has the following list of IPMI-related pages:

..

Fri, Oct 11, 6:11 PM · Operations, Documentation
Dzahn updated subscribers of T191956: Document how to fix IPMI issues on Wikitech .

@RobH there is a wikitech page you made back in 2012 about the ipmi_mgmt script at https://wikitech.wikimedia.org/wiki/Systems_management.

Fri, Oct 11, 5:50 PM · Operations, Documentation
Dzahn renamed T234698: ms-be1020 - firmware upgrade: (was: host went down) from ms-be1020 - host went down to ms-be1020 - firmware upgrade: (was: host went down).
Fri, Oct 11, 5:31 PM · ops-eqiad, User-fgiunchedi, media-storage, Operations
Dzahn added a project to T234698: ms-be1020 - firmware upgrade: (was: host went down): ops-eqiad.
Fri, Oct 11, 5:31 PM · ops-eqiad, User-fgiunchedi, media-storage, Operations
Dzahn triaged T234768: Create Balinese Wikipedia as High priority.
Fri, Oct 11, 5:28 PM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), Patch-For-Review, User-Ladsgroup, Wiki-Setup (Create), User-Urbanecm
Dzahn awarded T234768: Create Balinese Wikipedia a Pterodactyl token.
Fri, Oct 11, 5:28 PM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), Patch-For-Review, User-Ladsgroup, Wiki-Setup (Create), User-Urbanecm
Dzahn added a comment to T191956: Document how to fix IPMI issues on Wikitech .

see https://wikitech.wikimedia.org/wiki/Management_Interfaces

Fri, Oct 11, 5:01 AM · Operations, Documentation
Dzahn awarded T234890: Upgrade OTRS to 5.0.38 an Orange Medal token.
Fri, Oct 11, 4:01 AM · serviceops, OTRS, Security
Dzahn updated the task description for T221244: decommission astatine.
Fri, Oct 11, 2:37 AM · ops-eqiad, DC-Ops, decommission, Operations
Dzahn added a comment to T221244: decommission astatine.

The "production DNS removed" box is checked, but looking at the DNS repo, it's still there:

Fri, Oct 11, 2:36 AM · ops-eqiad, DC-Ops, decommission, Operations
Dzahn placed T235234: fix IPMI over LAN on certain HP hosts up for grabs.
Fri, Oct 11, 2:17 AM · DC-Ops, Operations
Dzahn updated the task description for T235234: fix IPMI over LAN on certain HP hosts.
Fri, Oct 11, 2:02 AM · DC-Ops, Operations
Dzahn assigned T235234: fix IPMI over LAN on certain HP hosts to Papaul.

assigning to Papaul per IRC chat (thanks!)

Fri, Oct 11, 12:39 AM · DC-Ops, Operations
Dzahn added a parent task for T235234: fix IPMI over LAN on certain HP hosts: T193155: IPMI Audit 2018-04.
Fri, Oct 11, 12:38 AM · DC-Ops, Operations
Dzahn added a subtask for T193155: IPMI Audit 2018-04: T235234: fix IPMI over LAN on certain HP hosts.
Fri, Oct 11, 12:38 AM · Operations
Dzahn added a subtask for T150160: Remote IPMI doesn't work for ~2% of the fleet: T235234: fix IPMI over LAN on certain HP hosts.
Fri, Oct 11, 12:38 AM · observability, Operations
Dzahn added parent tasks for T235234: fix IPMI over LAN on certain HP hosts: T150160: Remote IPMI doesn't work for ~2% of the fleet, T191956: Document how to fix IPMI issues on Wikitech .
Fri, Oct 11, 12:38 AM · DC-Ops, Operations
Dzahn added a subtask for T191956: Document how to fix IPMI issues on Wikitech : T235234: fix IPMI over LAN on certain HP hosts.
Fri, Oct 11, 12:38 AM · Operations, Documentation
Dzahn updated subscribers of T235234: fix IPMI over LAN on certain HP hosts.
Fri, Oct 11, 12:37 AM · DC-Ops, Operations
Dzahn updated subscribers of T235234: fix IPMI over LAN on certain HP hosts.

codfw db hosts - fixed

Fri, Oct 11, 12:35 AM · DC-Ops, Operations
Dzahn updated the task description for T235234: fix IPMI over LAN on certain HP hosts.
Fri, Oct 11, 12:30 AM · DC-Ops, Operations
Dzahn updated subscribers of T235234: fix IPMI over LAN on certain HP hosts.
Fri, Oct 11, 12:28 AM · DC-Ops, Operations
Dzahn updated the task description for T235234: fix IPMI over LAN on certain HP hosts.
Fri, Oct 11, 12:16 AM · DC-Ops, Operations
Dzahn added a parent task for T235234: fix IPMI over LAN on certain HP hosts: Unknown Object (Task).
Fri, Oct 11, 12:13 AM · DC-Ops, Operations
Dzahn created T235234: fix IPMI over LAN on certain HP hosts.
Fri, Oct 11, 12:12 AM · DC-Ops, Operations

Thu, Oct 10

Dzahn added projects to T235215: Onboarding Reuven Lazarus: SRE-Access-Requests, LDAP-Access-Requests.
Thu, Oct 10, 10:17 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn triaged T235215: Onboarding Reuven Lazarus as High priority.
Thu, Oct 10, 10:17 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn renamed T235215: Onboarding Reuven Lazarus from Onboarding Reuven to Onboarding Reuven Lazarus.
Thu, Oct 10, 10:17 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn added a comment to T235140: package wikimedia-lvs-realserver for buster.

Oh, that was quicker and easier than I thought. Thank you!

Thu, Oct 10, 9:23 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Phabricator, Operations
Dzahn created T235215: Onboarding Reuven Lazarus.
Thu, Oct 10, 8:19 PM · LDAP-Access-Requests, SRE-Access-Requests, Operations
Dzahn set Due Date to Mon, Oct 21, 7:00 PM on T222391: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster).
Thu, Oct 10, 8:11 PM · Patch-For-Review, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Operations, Gerrit
Dzahn updated the task description for T222391: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster).
Thu, Oct 10, 8:11 PM · Patch-For-Review, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Operations, Gerrit
Dzahn updated the task description for T222391: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster).
Thu, Oct 10, 8:10 PM · Patch-For-Review, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Operations, Gerrit
Dzahn added a subtask for T222391: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster): T234866: Set gerrit1001 master switch date.
Thu, Oct 10, 8:09 PM · Patch-For-Review, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Operations, Gerrit
Dzahn added a parent task for T234866: Set gerrit1001 master switch date: T222391: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster).
Thu, Oct 10, 8:09 PM · serviceops, Release-Engineering-Team, Gerrit
Dzahn assigned T234866: Set gerrit1001 master switch date to Paladox.

Announcement text as agreed on in P9309. Paladox is sending mail to wikitech :)

Thu, Oct 10, 7:35 PM · serviceops, Release-Engineering-Team, Gerrit
Dzahn edited P9309 Gerrit: Server maintenance.
Thu, Oct 10, 7:27 PM
Dzahn edited P9309 Gerrit: Server maintenance.
Thu, Oct 10, 7:21 PM
Dzahn added a comment to P9309 Gerrit: Server maintenance.

https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1840717&oldid=1840640

Thu, Oct 10, 6:55 PM
Dzahn added a comment to T234866: Set gerrit1001 master switch date.

We agreed on Monday, October 21st.

Thu, Oct 10, 6:46 PM · serviceops, Release-Engineering-Team, Gerrit
Dzahn added a comment to T234866: Set gerrit1001 master switch date.

@thcipriani Sounds good, and Mondays work for me (from around 10am PST). This coming Monday is "Wikimedia holiday email / Monday, October 14 US holiday", though, unless you specifically want to use the WMF holiday to do it for less impact?

Thu, Oct 10, 6:35 PM · serviceops, Release-Engineering-Team, Gerrit
Dzahn added a comment to T234153: Can't SSH to mw1290.mgmt.

@Jclark-ctr checked on this (thanks!), but this still needs to happen. One minute I could SSH to it just fine, and 12 minutes later it was alerting in Icinga again. So it is still happening "from time to time", and Chris' comment "we will need to power off the host for 10-30secs" still stands.

Thu, Oct 10, 6:16 PM · ops-eqiad, Operations
Dzahn added a comment to T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API.

I could confirm yesterday that I can log in again with the hotfix. Thanks!

Thu, Oct 10, 2:51 PM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), cloud-services-team (Kanban), wikitech.wikimedia.org, Operations
mmodell awarded T235140: package wikimedia-lvs-realserver for buster a Like token.
Thu, Oct 10, 7:19 AM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Phabricator, Operations
Krinkle awarded T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API an Orange Medal token.
Thu, Oct 10, 2:57 AM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), cloud-services-team (Kanban), wikitech.wikimedia.org, Operations
Dzahn added a comment to T231525: cp1085 - IPMI not working.

Updated the mgmt password using the cookbook.

Thu, Oct 10, 12:41 AM · ops-eqiad, Traffic, Operations

Wed, Oct 9

Dzahn updated subscribers of T190568: Reimage both phab1001 and phab2001 to stretch / buster.

@Muehlenhoff Currently moving to buster is blocked by T235140

Wed, Oct 9, 11:53 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Phabricator, Operations
Dzahn added a comment to T190568: Reimage both phab1001 and phab2001 to stretch / buster.

Next we need to decide whether we keep phab1003 as the prod host permanently (why not, I guess?).

What's the current procedure to switch over the active Phab server, just a DNS name change?

Wed, Oct 9, 11:52 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Phabricator, Operations
Dzahn placed T235140: package wikimedia-lvs-realserver for buster up for grabs.
Wed, Oct 9, 11:49 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Phabricator, Operations
Dzahn created T235140: package wikimedia-lvs-realserver for buster.
Wed, Oct 9, 11:49 PM · Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, serviceops, Phabricator, Operations
Dzahn closed T223393: switch wikitech to PHP 7.2, a subtask of T208433: Package and install php 7.2 in place of php 7.0, as Resolved.
Wed, Oct 9, 11:39 PM · User-Joe, Operations
Dzahn closed T223393: switch wikitech to PHP 7.2, a subtask of T219127: SRE FY19-20 Q1 goal: complete the transition to PHP7, as Resolved.
Wed, Oct 9, 11:39 PM · Operations, serviceops
Dzahn closed T223393: switch wikitech to PHP 7.2 as Resolved.

switched over by @Andrew and @bd808

Wed, Oct 9, 11:39 PM · cloud-services-team (Kanban), Release-Engineering-Team-TODO, wikitech.wikimedia.org, PHP 7.2 support, serviceops, Operations
Dzahn closed T223393: switch wikitech to PHP 7.2, a subtask of T233849: 1.35.0-wmf.1 deployment blockers, as Resolved.
Wed, Oct 9, 11:39 PM · Patch-For-Review, Release, Train Deployments
Dzahn added a comment to T235135: replication/gerrit2001 issues.

replication.log shows it is replicating again and working on the backlog queue right now.

Wed, Oct 9, 10:10 PM · Gerrit, Operations
Dzahn added a comment to T235135: replication/gerrit2001 issues.

Broken by https://gerrit.wikimedia.org/r/c/operations/puppet/+/541386 when we renamed the replication target yesterday.

Wed, Oct 9, 9:48 PM · Gerrit, Operations

Tue, Oct 8

Dzahn added a comment to T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API.

Enabled the debug log as suggested by Krenair.

Tue, Oct 8, 8:28 PM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), cloud-services-team (Kanban), wikitech.wikimedia.org, Operations
Dzahn claimed T234775: Add banwiki to wikistats.
Tue, Oct 8, 8:15 PM · VPS-project-Wikistats
Dzahn added a project to T234866: Set gerrit1001 master switch date: serviceops.
Tue, Oct 8, 8:14 PM · serviceops, Release-Engineering-Team, Gerrit
Krenair awarded T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API a The World Burns token.
Tue, Oct 8, 8:12 PM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), cloud-services-team (Kanban), wikitech.wikimedia.org, Operations
Dzahn edited projects for T234996: Login on wikitech wiki fails after OpenStack upgrade removed v2 identity API, added: Operations, Cloud-Services; removed wikitech.wikimedia.org.
Tue, Oct 8, 8:09 PM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), cloud-services-team (Kanban), wikitech.wikimedia.org, Operations