jhathaway (Jesse Hathaway)
User

User Details

User Since
Nov 22 2021, 10:00 PM (228 w, 6 d)
Availability
Available
LDAP User
JHathaway
MediaWiki User
JHathaway (WMF)

Recent Activity

Fri, Apr 10

jhathaway added a comment to T422945: Create project tag for corto.

@Peachey88 awesome, thanks

Fri, Apr 10, 8:43 PM · Project-Admins
jhathaway created T422945: Create project tag for corto.
Fri, Apr 10, 2:25 PM · Project-Admins

Wed, Mar 25

jhathaway closed T367399: Default to the Puppet 7 PCC CI test, make it voting and eventually remove the Puppet 5 one as Resolved.

instances have been deleted.

Wed, Mar 25, 9:20 PM · Puppet CI, Patch-For-Review, Puppet-Infrastructure, SRE, Infrastructure-Foundations
jhathaway closed T367399: Default to the Puppet 7 PCC CI test, make it voting and eventually remove the Puppet 5 one, a subtask of T365798: Shutdown of Puppet 5 servers, as Resolved.
Wed, Mar 25, 9:20 PM · Patch-For-Review, Puppet-Infrastructure, SRE, Infrastructure-Foundations

Fri, Mar 20

jhathaway updated the task description for T419906: Allow IT Services to view inbound email logs.
Fri, Mar 20, 7:15 PM · Observability-Logging, Mail, Infrastructure-Foundations
jhathaway added a comment to T419906: Allow IT Services to view inbound email logs.

Added a few alternatives after discussing with other folks; happy to hear of others.

Fri, Mar 20, 7:05 PM · Observability-Logging, Mail, Infrastructure-Foundations
jhathaway updated the task description for T419906: Allow IT Services to view inbound email logs.
Fri, Mar 20, 7:05 PM · Observability-Logging, Mail, Infrastructure-Foundations

Tue, Mar 17

jhathaway updated the task description for T419906: Allow IT Services to view inbound email logs.
Tue, Mar 17, 9:11 PM · Observability-Logging, Mail, Infrastructure-Foundations
jhathaway added a comment to T419906: Allow IT Services to view inbound email logs.

good catch, should be that a developer account is required, amending

This is still not correct, access to Logstash requires membership in the nda, logstash-access or ops groups (see T376790).

Tue, Mar 17, 9:09 PM · Observability-Logging, Mail, Infrastructure-Foundations

Mon, Mar 16

jhathaway added a project to T419906: Allow IT Services to view inbound email logs: Observability-Logging.
Mon, Mar 16, 3:48 PM · Observability-Logging, Mail, Infrastructure-Foundations
jhathaway triaged T419906: Allow IT Services to view inbound email logs as Medium priority.
Mon, Mar 16, 2:19 PM · Observability-Logging, Mail, Infrastructure-Foundations
jhathaway updated the task description for T419906: Allow IT Services to view inbound email logs.
Mon, Mar 16, 2:08 PM · Observability-Logging, Mail, Infrastructure-Foundations
jhathaway added a comment to T419906: Allow IT Services to view inbound email logs.

If this data is too sensitive for our normal Logstash instance, why are we fine shipping it to yet another third party?

Mon, Mar 16, 2:07 PM · Observability-Logging, Mail, Infrastructure-Foundations

Mar 13 2026

jhathaway updated the task description for T419906: Allow IT Services to view inbound email logs.
Mar 13 2026, 4:31 PM · Observability-Logging, Mail, Infrastructure-Foundations

Mar 12 2026

jhathaway created T419906: Allow IT Services to view inbound email logs.
Mar 12 2026, 9:08 PM · Observability-Logging, Mail, Infrastructure-Foundations

Mar 10 2026

jhathaway added a comment to T386559: X-spam-score header missing on obvious spam delivered to multiple Mailman3 lists via HyperKitty web ui.

Okay, the posting flow is a bit different than I understood it at first glance. A user can only use the web form if they are logged in with an account. Otherwise the buttons appear, but they are mailto links which open the user's email client.

Mar 10 2026, 9:43 PM · Patch-For-Review, collaboration-services, SRE, Wikimedia-Mailing-lists

Mar 9 2026

jhathaway added a comment to T386559: X-spam-score header missing on obvious spam delivered to multiple Mailman3 lists via HyperKitty web ui.

Ok we have two options now:

Mar 9 2026, 9:49 PM · Patch-For-Review, collaboration-services, SRE, Wikimedia-Mailing-lists

Mar 6 2026

jhathaway added a comment to T386559: X-spam-score header missing on obvious spam delivered to multiple Mailman3 lists via HyperKitty web ui.

if /message/new is the correct route [...]

FWIW, if I'm understanding correctly -- from the <!-- Reply form --> in the source of this Hyperkitty page, it seems like /hyperkitty/list/[mailing-list]/message/[id]/reply might be a route that's used for replies via the web UI.

Mar 6 2026, 10:40 PM · Patch-For-Review, collaboration-services, SRE, Wikimedia-Mailing-lists
jhathaway added a comment to T386559: X-spam-score header missing on obvious spam delivered to multiple Mailman3 lists via HyperKitty web ui.

if /message/new is the correct route, here is the count of usage from 03-05:

Mar 6 2026, 10:23 PM · Patch-For-Review, collaboration-services, SRE, Wikimedia-Mailing-lists
jhathaway added a comment to T386559: X-spam-score header missing on obvious spam delivered to multiple Mailman3 lists via HyperKitty web ui.

As briefly discussed, I think the only users of the web UI for posting are spammers […]

FWIW this is not true, I have used the web UI each time I posted a message to wikitech-l.

Mar 6 2026, 7:35 PM · Patch-For-Review, collaboration-services, SRE, Wikimedia-Mailing-lists
jhathaway added a comment to T386559: X-spam-score header missing on obvious spam delivered to multiple Mailman3 lists via HyperKitty web ui.

@bd808 based on the User-Agent, User-Agent: HyperKitty on https://lists.wikimedia.org/, and looking in the logs, this message appears to have been posted from the web UI. Messages posted from the web UI are sent via SMTP to exim4, but the source IP is localhost, so our exim4 config skips spam checking. As briefly discussed, I think the only users of the web UI for posting are spammers, https://gitlab.com/mailman/hyperkitty/-/issues/264, so let's disable it.

Mar 6 2026, 5:50 PM · Patch-For-Review, collaboration-services, SRE, Wikimedia-Mailing-lists
jhathaway closed T418700: VRT replies to Hotmail/Outlook bounce as Resolved.

great, thanks again for bringing this to my attention @Xaosflux

Mar 6 2026, 2:59 PM · Infrastructure-Foundations, Mail, vrts

Mar 5 2026

jhathaway claimed T418700: VRT replies to Hotmail/Outlook bounce.
Mar 5 2026, 10:00 PM · Infrastructure-Foundations, Mail, vrts
jhathaway added a comment to T418700: VRT replies to Hotmail/Outlook bounce.

@Xaosflux I have added a DKIM key and tested that Microsoft is able to verify the key correctly. If you could resend the failed email and confirm that it is sent successfully, that would be appreciated.

Mar 5 2026, 9:59 PM · Infrastructure-Foundations, Mail, vrts
jhathaway added a comment to T418803: DKIM/SPF.

sounds good, please let me know if you need help in any way

Mar 5 2026, 4:44 PM · Wikiportrait

Mar 4 2026

jhathaway added a comment to T418803: DKIM/SPF.

@Mbch331 thanks for reporting this issue. What is the expected mail flow? Does appel.wikimedia.nl originate the emails?

Mar 4 2026, 10:31 PM · Wikiportrait
jhathaway added a comment to T418700: VRT replies to Hotmail/Outlook bounce.

@Xaosflux thanks for reporting this issue. I did some tests with the info-en queue, but I wasn't able to reproduce the issue.

Mar 4 2026, 10:30 PM · Infrastructure-Foundations, Mail, vrts

Feb 27 2026

jhathaway updated the task description for T381919: Supermicro: unable to set boot order after using Redfish to boot once.
Feb 27 2026, 5:53 PM · Infrastructure-Foundations

Feb 26 2026

jhathaway added a comment to T409137: lists.wikimedia.org subscription email rejected by DKIM.

@DamianZaremba I tried with a couple of my test accounts, but I was unable to duplicate your results. The list manager confirmation emails all passed dkim. Would it be possible to forward me the entire email, ideally as an eml attachment? jhathaway@wikimedia.org

Feb 26 2026, 10:33 PM · collaboration-services, Wikimedia-Mailing-lists, SRE, Infrastructure-Foundations

Feb 24 2026

jhathaway closed T417410: Suggestion: email forwarding from wikipedia.org to wikimedia.org as Declined.

@Elli though this is technically possible, outside of some existing corner cases, I am not sure the added confusion of having multiple domains is worth the occasional bounced email.

Feb 24 2026, 10:17 PM · Mail, Infrastructure-Foundations, English-Arbitration-Committee
jhathaway renamed T418282: learn.wiki add SPF and DMARC DNS records from learn.wiki add SPF and DKIM DNS records to learn.wiki add SPF and DMARC DNS records.
Feb 24 2026, 5:39 PM · WikiLearn
jhathaway created T418282: learn.wiki add SPF and DMARC DNS records.
Feb 24 2026, 5:38 PM · WikiLearn
jhathaway added a comment to T417941: Remove mail alias/fork from dmarc-rua@wikimedia.org to dmarc@donate.wikimedia.org.

@Jgreen I removed the dmarc@donate.wikimedia.org line from that alias.

I did not find any "dmarc-ruf@" alias in that postfix alias file though.

It is being mentioned in T167337 and I guess it means this is in Google. But to be sure let me add @jhathaway

Feb 24 2026, 4:27 PM · SRE, Mail, Infrastructure-Foundations

Feb 23 2026

jhathaway closed T166291: Exim panics when spamd reaches maxchildren as Declined.

We only run rspamd in combination with postfix now

Feb 23 2026, 3:55 PM · Infrastructure-Foundations, Mail, SRE
jhathaway added a comment to T329158: systemd-timer puppet code triggers an execution when applying a schedule change.

@jcrespo is this still ongoing, or are you okay with closing?

Feb 23 2026, 3:48 PM · Puppet-Core, Infrastructure-Foundations
jhathaway closed T417771: puppetboard loads fonts from google as Resolved.

Thanks @taavi

Feb 23 2026, 3:36 PM · Infrastructure-Foundations, Privacy, Puppet-Infrastructure
jhathaway triaged T417941: Remove mail alias/fork from dmarc-rua@wikimedia.org to dmarc@donate.wikimedia.org as Medium priority.
Feb 23 2026, 3:29 PM · SRE, Mail, Infrastructure-Foundations

Jan 24 2026

jhathaway closed T415265: Phabricator task notification emails not delivered (Microsoft email; DMARC) as Resolved.

great, I'll resolve then, let me know if new problems arise.

Jan 24 2026, 4:21 PM · Infrastructure-Foundations, Mail, Phabricator

Jan 23 2026

jhathaway added a comment to T415265: Phabricator task notification emails not delivered (Microsoft email; DMARC).

@Nardog Microsoft appears to have resolved the issue, are you receiving ticket updates now?

Jan 23 2026, 3:39 PM · Infrastructure-Foundations, Mail, Phabricator

Jan 22 2026

jhathaway claimed T415265: Phabricator task notification emails not delivered (Microsoft email; DMARC).
Jan 22 2026, 11:27 PM · Infrastructure-Foundations, Mail, Phabricator
jhathaway triaged T415265: Phabricator task notification emails not delivered (Microsoft email; DMARC) as High priority.
Jan 22 2026, 11:26 PM · Infrastructure-Foundations, Mail, Phabricator
jhathaway added a comment to T415265: Phabricator task notification emails not delivered (Microsoft email; DMARC).

@Nardog this appears to be an issue with Microsoft's email platform, they are throttling our mail server. I have opened a ticket with the support desk to try and get the issue resolved.

Jan 22 2026, 10:15 PM · Infrastructure-Foundations, Mail, Phabricator
jhathaway updated subscribers of T415189: DHCP failing for at least 2 ms-be servers in codfw.

@jhathaway I did ms-be2077 today, and see the same failure mode - it failed entirely to DHCP. I killed the cookbook, re-ran it ( sudo cookbook sre.hosts.reimage --os bullseye -t T354872 --move-vlan --new ms-be2077) and it reimaged just fine. Obviously the re-run doesn't make any DNS changes because the first run does that, which does make me think the --move-vlan bit is what's going wrong.

Jan 22 2026, 9:57 PM · SRE, SRE-swift-storage, ops-codfw, DC-Ops, Infrastructure-Foundations
jhathaway added a comment to T415265: Phabricator task notification emails not delivered (Microsoft email; DMARC).

@Nardog we changed our DMARC policy on wikimedia.org to quarantine on 2026-01-20, so that is probably the cause of the deliverability issue. I'll look into why the emails are getting rejected and revert the change if necessary.

Jan 22 2026, 4:12 PM · Infrastructure-Foundations, Mail, Phabricator
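
For context on the policy change mentioned in this comment: a DMARC policy is published as a DNS TXT record whose `p=` tag tells receivers how to treat mail that fails authentication. A minimal Python sketch of reading that tag follows. The record string and the `parse_dmarc` helper are illustrative assumptions, not the actual wikimedia.org record or any tooling used in the task.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record into a tag -> value mapping."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        # Skip empty segments and anything that is not a tag=value pair
        if not part or "=" not in part:
            continue
        key, _, value = part.partition("=")
        tags[key.strip().lower()] = value.strip()
    return tags


# Hypothetical record, for illustration only
example = "v=DMARC1; p=quarantine; rua=mailto:dmarc-rua@example.org"
policy = parse_dmarc(example).get("p", "none")
print(policy)
```

A `quarantine` policy asks receivers to treat failing mail as suspicious (typically by filing it as spam), which is why a policy change can surface as a deliverability problem.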

Jan 21 2026

jhathaway added a comment to T415189: DHCP failing for at least 2 ms-be servers in codfw.

Strangely I re-imaged both servers from cumin2002 and ran into no issues. Perhaps when you ran the first re-images @MatthewVernon, though they failed, they set up the conditions for the following re-images to succeed? Was there any interesting output from the move-vlan cookbook? I ran the following commands:

Jan 21 2026, 7:14 PM · SRE, SRE-swift-storage, ops-codfw, DC-Ops, Infrastructure-Foundations
jhathaway claimed T415189: DHCP failing for at least 2 ms-be servers in codfw.

@MatthewVernon looking...

Jan 21 2026, 4:38 PM · SRE, SRE-swift-storage, ops-codfw, DC-Ops, Infrastructure-Foundations

Jan 9 2026

jhathaway added a comment to T412451: 4 failed reimages on wdqs1029, 1030, 1031, 1032.

@jhathaway yes please! You can use wdqs1029 or 1030 :)

Jan 9 2026, 8:47 PM · DC-Ops, Essential-Work, Data-Platform-SRE (2026.01.05 - 2026.01.23), Infrastructure-Foundations

Jan 8 2026

jhathaway added a comment to T412451: 4 failed reimages on wdqs1029, 1030, 1031, 1032.

I tried to reimage wdqs1029 today trying to test https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1214488, but the second try led me to:

Booting from Embedded SATA Port Disk B: debian
error: disk `mduuid/2884f27b943ef5a81ae46a251fc7960c' not found.
grub rescue>

@jhathaway this is really weird, it seems like T404356 but I can't pinpoint the issue. Any ideas?

Jan 8 2026, 9:32 PM · DC-Ops, Essential-Work, Data-Platform-SRE (2026.01.05 - 2026.01.23), Infrastructure-Foundations
jhathaway added a comment to T367399: Default to the Puppet 7 PCC CI test, make it voting and eventually remove the Puppet 5 one.

Something I forgot, the operations-puppet-catalog-compiler-test job (Puppet 5) was tied to the Jenkins label puppet5-compiler-node. The label is applied to three agents:

  • pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud
  • pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud
  • pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud

The Jenkins agents do not have any other labels, so I am guessing they can be removed from Jenkins and the underlying WMCS instances can be decommissioned? They were set up by @jhathaway :)

Jan 8 2026, 8:20 PM · Puppet CI, Patch-For-Review, Puppet-Infrastructure, SRE, Infrastructure-Foundations

Dec 15 2025

jhathaway lowered the priority of T191018: Provide an option menu when booting via PXE from Medium to Low.
Dec 15 2025, 3:49 PM · Infrastructure-Foundations, SRE

Dec 8 2025

jhathaway triaged T411102: Puppet types causing issues around nftables::service as Medium priority.
Dec 8 2025, 3:51 PM · cloud-services-team (FY2025/2026-Q3-Q4), Infrastructure-Foundations, Puppet-Core

Dec 1 2025

jhathaway added a comment to T411027: Emails to Google group no-reply@wikimedia.org are not being delivered - SMTP server issue?.

@jhathaway I did create a group called "noreply@wikimedia.org" to see if messages to that address were also dropped, and it seems they are not. Noah is ok with using the noreply@ address instead of no-reply@ - do you see any problems with this? Thanks for your help!

Dec 1 2025, 5:51 PM · Mail, SRE, Infrastructure-Foundations
jhathaway added a comment to T411027: Emails to Google group no-reply@wikimedia.org are not being delivered - SMTP server issue?.

@JKelsoteel-WMF the addresses no-reply or noreply are used to indicate that the sender does not expect replies to be sent to that address, and any replies will be discarded. Why is using no-reply@wikimedia.org necessary for this use case?

Dec 1 2025, 4:24 PM · Mail, SRE, Infrastructure-Foundations
jhathaway renamed T411102: Puppet types causing issues around nftables::service from Typing issues around nftables::service to Puppet types causing issues around nftables::service.
Dec 1 2025, 3:31 PM · cloud-services-team (FY2025/2026-Q3-Q4), Infrastructure-Foundations, Puppet-Core
jhathaway triaged T411027: Emails to Google group no-reply@wikimedia.org are not being delivered - SMTP server issue? as Medium priority.
Dec 1 2025, 3:29 PM · Mail, SRE, Infrastructure-Foundations

Nov 17 2025

jhathaway lowered the priority of T408632: VRTS is spammed with bounce e-mails and is going to break from High to Medium.
Nov 17 2025, 3:37 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny

Nov 5 2025

jhathaway added a comment to T408967: VRTS outbound emails not working.

I think I have changed the setting and deployed it, but it still shows the old value, and now cannot be edited. Please check what is wrong.
Please change it to: volunteers-vrt@wikimedia.org

Nov 5 2025, 3:15 PM · collaboration-services, SRE, Znuny

Nov 4 2025

jhathaway added a comment to T409135: VRT queue index shows incorrect value.

I'm not sure how to remedy this issue. I see we switched to StaticDB in T355979, perhaps we need to rebuild the StaticDB index?

Nov 4 2025, 8:06 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway added a comment to T408967: VRTS outbound emails not working.

I just heard from one user that password recovery still doesn't work for them.

Nov 4 2025, 8:02 PM · collaboration-services, SRE, Znuny
jhathaway added a comment to T408632: VRTS is spammed with bounce e-mails and is going to break.

@Krd we are still receiving bounces for that user as their email rate is still too high. Do they need to subscribe to the 77 remaining queues? Could we perhaps unsubscribe them from all, and pop them a note to resubscribe?

Nov 4 2025, 7:51 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway added a comment to T408632: VRTS is spammed with bounce e-mails and is going to break.

I have unsubscribed the mentioned user. This appears to be the only one, and I will monitor this from now on. It makes no sense to get copies of thousands of spam messages each day.

Nov 4 2025, 5:27 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway added a comment to T409135: VRT queue index shows incorrect value.

Is this still occurring?

Nov 4 2025, 5:24 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway added a comment to T408632: VRTS is spammed with bounce e-mails and is going to break.

Please provide in private who that is and how you found the information.

Nov 4 2025, 3:44 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway added a comment to T408632: VRTS is spammed with bounce e-mails and is going to break.

If I checked correctly, nobody is subscribed to the Junk queue, so no notifications for that should have been created.
If there were any, I think that should be investigated further.

Nov 4 2025, 3:09 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway added a comment to T408632: VRTS is spammed with bounce e-mails and is going to break.

After some analysis today, I think the causes of the bounces were as follows:

Nov 4 2025, 4:52 AM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway added a comment to T408967: VRTS outbound emails not working.

@Xaosflux the outbound queue has now been cleared of all backscatter bounce emails, so delivery times should be back to normal.

Nov 4 2025, 4:44 AM · collaboration-services, SRE, Znuny

Nov 3 2025

jhathaway added a comment to T408967: VRTS outbound emails not working.

@Xaosflux I assume it is related, but I have not been able to confirm it yet.

Nov 3 2025, 4:37 PM · collaboration-services, SRE, Znuny
jhathaway added a comment to T408632: VRTS is spammed with bounce e-mails and is going to break.

@Krd I see the junk mail queue is now at 600k. How can I help clear it out? I saw some of the scheduled jobs were run, but that does not seem to be enough. Also, feel free to contact me on IRC for some real-time triaging.

Nov 3 2025, 4:12 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway claimed T407723: Sendmail network error (deployment).
Nov 3 2025, 2:44 PM · ServiceOps new, Infrastructure-Foundations, Mail, SRE
jhathaway claimed T408967: VRTS outbound emails not working.
Nov 3 2025, 2:41 PM · collaboration-services, SRE, Znuny
jhathaway claimed T408632: VRTS is spammed with bounce e-mails and is going to break.
Nov 3 2025, 2:41 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny

Oct 28 2025

jhathaway added a comment to T408632: VRTS is spammed with bounce e-mails and is going to break.

@Krd how else can I help?

Oct 28 2025, 10:22 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway added a comment to T408632: VRTS is spammed with bounce e-mails and is going to break.

The 219.240.37.89 looks like a common factor. Can we block this source IP for SMTP as a first measure?

Oct 28 2025, 8:49 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway triaged T408632: VRTS is spammed with bounce e-mails and is going to break as High priority.
Oct 28 2025, 8:20 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny
jhathaway added a comment to T408632: VRTS is spammed with bounce e-mails and is going to break.

@Krd thanks, I'm investigating, not sure of the cause either.

Oct 28 2025, 8:20 PM · collaboration-services, Infrastructure-Foundations, SRE, vrts, Znuny

Oct 20 2025

jhathaway updated subscribers of T407726: Increase net.nf_conntrack_max on kerberos hosts if needed.

From a brief look, most of these conntrack entries are from an-coord1003.eqiad.wmnet, along with log entries of the form:

Oct 20 2025, 5:03 PM · Patch-For-Review, Infrastructure-Foundations, SRE

Oct 17 2025

jhathaway added a comment to T407557: OpenSSH 10.1+ warns that Wikimedia SSH does not use post-quantum key exchange algorithm.

debug1: Remote protocol version 2.0, remote software version GerritCodeReview_3.10.6 (APACHE-SSHD-2.12.0)
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,curve448-sha512,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256,diffie-hellman-group18-sha512,diffie-hellman-group17-sha512,diffie-hellman-group16-sha512,diffie-hellman-group15-sha512,diffie-hellman-group14-sha256,ext-info-s,kex-strict-s-v00@openssh.com

Oct 17 2025, 8:16 PM · Release-Engineering-Team, Infrastructure-Foundations, GitLab
jhathaway added a comment to T407557: OpenSSH 10.1+ warns that Wikimedia SSH does not use post-quantum key exchange algorithm.

deploy2002 is running bullseye, which has ssh 1:8.4p1-5+deb11u5, so it does not have any of the post quantum algorithms that were first added in 9.0.

Oct 17 2025, 4:21 PM · Release-Engineering-Team, Infrastructure-Foundations, GitLab
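
A server's KEX proposal, like the one quoted in the earlier comment on this task, can be checked for post-quantum algorithms with a short Python sketch. The set of algorithm names below is an assumption for illustration and may not be complete, and `post_quantum_kex` is a hypothetical helper, not part of any tool referenced in the task.

```python
# Hybrid post-quantum key exchange names known from OpenSSH.
# Assumption: this set is illustrative and may be incomplete.
POST_QUANTUM_KEX = {
    "sntrup761x25519-sha512@openssh.com",
    "sntrup761x25519-sha512",
    "mlkem768x25519-sha256",
}


def post_quantum_kex(kex_line: str) -> list[str]:
    """Return the post-quantum algorithms found in a comma-separated KEX list."""
    offered = [name.strip() for name in kex_line.split(",")]
    return [name for name in offered if name in POST_QUANTUM_KEX]


# Shortened form of the server proposal quoted in the task comment
gerrit_kex = "curve25519-sha256,curve25519-sha256@libssh.org,curve448-sha512,ecdh-sha2-nistp521"
print(post_quantum_kex(gerrit_kex) or "no post-quantum KEX offered")
```

An empty result means the client and server can only agree on a classical key exchange, which is the condition newer OpenSSH clients warn about.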

Oct 6 2025

jhathaway lowered the priority of T370006: Investigate options for outbound email redundancy for mediawiki on kubernetes from High to Medium.
Oct 6 2025, 2:42 PM · Infrastructure-Foundations, Mail

Oct 3 2025

jhathaway added a comment to T371416: Q1:rack/setup/install backup1012.

The https interface has a terrible GUI a bit hidden between submenus.

Oct 3 2025, 2:07 PM · SRE, Data-Persistence-Backup, Data-Persistence, ops-eqiad, DC-Ops
jhathaway added a comment to T371416: Q1:rack/setup/install backup1012.

what steps did you take to re-image it

I had to redo the HW RAID setup, which was missing from the configuration of the host (it was there before) through the mgmt interface. Then I did a manual partitioning (which may not have been needed, but I did based on your feedback that the partitioning was correct). All of that was quite painful.

Oct 3 2025, 1:59 PM · SRE, Data-Persistence-Backup, Data-Persistence, ops-eqiad, DC-Ops
jhathaway added a comment to T371416: Q1:rack/setup/install backup1012.

It is normal that the recipe was not working: not only was the logical configuration destroyed, the RAID was not set up either, so it had no virtual RAID configuration.

Oct 3 2025, 1:48 PM · SRE, Data-Persistence-Backup, Data-Persistence, ops-eqiad, DC-Ops
jhathaway added a comment to T371416: Q1:rack/setup/install backup1012.

@jcrespo it took me a bit of time to coerce the box back into bios mode. I then tried reimaging with bookworm, but the raid step failed, due to the existence of the raid6 volume. After trying a couple of efforts, which failed, I booted off a rescue image and removed the raid6 volume with storcli.

Oct 3 2025, 4:47 AM · SRE, Data-Persistence-Backup, Data-Persistence, ops-eqiad, DC-Ops

Oct 2 2025

jhathaway updated subscribers of T341095: Puppet7: Update documentation.

As @jcrespo pointed out on IRC, there is also quite a bit of Puppet 5 documentation which needs to be removed or updated as part of this task.

Oct 2 2025, 2:57 PM · Documentation, Puppet-Infrastructure, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

Oct 1 2025

jhathaway updated subscribers of T364511: Suggestion on how to setup acme-chief in the dcl testing environment.

  1. Working hacks!

Oct 1 2025, 3:50 PM · Acme-chief

Sep 29 2025

jhathaway added a comment to T376949: UEFI and software RAID.

Thanks @CDanis, I happened upon that post as well. I don't think their approach is unreasonable; there are different trade-offs between complexity and adherence to spec. My preference is to try the sync script, but if that fails I'm happy to look at their approach.

Sep 29 2025, 5:12 PM · Patch-For-Review, Infrastructure-Foundations

Sep 25 2025

jhathaway added a comment to T404356: UEFI installer not installing grub correctly (at least on systems where / is RAID).

The host doesn't PXE/HTTP boot for some reason, I reopened the provision task in T394357#11184292.

Sep 25 2025, 9:05 PM · SRE-swift-storage, Infrastructure-Foundations
jhathaway added a comment to T404356: UEFI installer not installing grub correctly (at least on systems where / is RAID).

Does /boot even need to be on a separate partition for UEFI booting?

Sep 25 2025, 9:01 PM · SRE-swift-storage, Infrastructure-Foundations

Sep 18 2025

jhathaway added a comment to T404888: Parse DMARC reports and create a dashboard from data.

Would it be acceptable to store the data from the parsed DMARC reports in OpenSearch? My initial estimate is that it would require about 50MiB of data per day.

Sep 18 2025, 4:33 PM · Patch-For-Review, SRE Observability, Epic, Infrastructure-Foundations, Mail
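
To put the 50 MiB per day estimate in perspective, a minimal Python sketch of the cumulative storage follows. The retention periods are hypothetical, and the calculation ignores replication and index overhead, neither of which is stated in the comment.

```python
MIB_PER_DAY = 50  # estimate quoted in the comment


def storage_gib(days: int, mib_per_day: float = MIB_PER_DAY) -> float:
    """Total storage in GiB after `days` of ingest, ignoring replication and index overhead."""
    return days * mib_per_day / 1024


for days in (30, 90, 365):  # hypothetical retention periods
    print(f"{days:>3} days: {storage_gib(days):.1f} GiB")
```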
jhathaway added a project to T404888: Parse DMARC reports and create a dashboard from data: SRE Observability.
Sep 18 2025, 4:16 PM · Patch-For-Review, SRE Observability, Epic, Infrastructure-Foundations, Mail

Sep 17 2025

jhathaway triaged T404891: Parse DMARC reports and expose Prometheus metrics as Medium priority.
Sep 17 2025, 4:27 PM · Epic, Infrastructure-Foundations, Mail
jhathaway created T404891: Parse DMARC reports and expose Prometheus metrics.
Sep 17 2025, 4:27 PM · Epic, Infrastructure-Foundations, Mail
jhathaway triaged T404888: Parse DMARC reports and create a dashboard from data as Medium priority.
Sep 17 2025, 4:18 PM · Patch-For-Review, SRE Observability, Epic, Infrastructure-Foundations, Mail
jhathaway created T404888: Parse DMARC reports and create a dashboard from data.
Sep 17 2025, 4:18 PM · Patch-For-Review, SRE Observability, Epic, Infrastructure-Foundations, Mail
jhathaway triaged T404884: DMARC improvements as Medium priority.
Sep 17 2025, 3:55 PM · Epic, Infrastructure-Foundations, Mail
jhathaway created T404884: DMARC improvements.
Sep 17 2025, 3:55 PM · Epic, Infrastructure-Foundations, Mail

Sep 15 2025

jhathaway triaged T403986: Donations@ doesn't forward to donate@ as Medium priority.
Sep 15 2025, 2:23 PM · Mail, Infrastructure-Foundations, FR-donorrelations, SRE
jhathaway triaged T404005: dcl: create a Trixie image as Medium priority.
Sep 15 2025, 2:22 PM · Infrastructure-Foundations

Sep 9 2025

jhathaway added a comment to P82917 provisioning for dse-k8s-worker1014.

I was able to reproduce on dse-k8s-worker1014 by flipping VT on and then re-running the cookbook. How does this patch look: https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1186619? It tests okay for me on dse-k8s-worker1014.

Sep 9 2025, 10:26 PM
jhathaway added a comment to P82917 provisioning for dse-k8s-worker1014.

because of the error on line 22?

Sep 9 2025, 3:28 PM