User Details
- User Since
- Nov 22 2021, 10:00 PM (211 w, 5 h)
- Availability
- Available
- LDAP User
- JHathaway
- MediaWiki User
- JHathaway (WMF)
Yesterday
Mon, Dec 1
@JKelsoteel-WMF the addresses no-reply or noreply are used to indicate that the sender does not expect replies to be sent to that address, and any replies will be discarded. Why is using no-reply@wikimedia.org necessary for this use case?
Mon, Nov 17
Nov 5 2025
Nov 4 2025
I'm not sure how to remedy this issue. I see we switched to StaticDB in T355979; perhaps we need to rebuild the StaticDB index?
@Krd we are still receiving bounces for that user as their email rate is still too high. Do they need to subscribe to the 77 remaining queues? Could we perhaps unsubscribe them from all, and pop them a note to resubscribe?
Is this still occurring?
After some analysis today, I think the causes of the bounces were as follows:
@Xaosflux the outbound queue has now been cleared of all backscatter bounce emails, so delivery times should be back to normal.
Nov 3 2025
@Xaosflux I assume it is related, but I have not been able to confirm it yet.
@Krd I see the junk mail queue is now at 600k. How can I help clear it out? I saw that some of the scheduled jobs were run, but that does not seem to be enough. Also, feel free to contact me on IRC for some real-time triaging.
Oct 28 2025
@Krd how else can I help?
@Krd thanks, I'm investigating, not sure of the cause either.
Oct 20 2025
From a brief look, most of these conntrack entries are from an-coord1003.eqiad.wmnet, along with log entries of the form:
Oct 17 2025
debug1: Remote protocol version 2.0, remote software version GerritCodeReview_3.10.6 (APACHE-SSHD-2.12.0)
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,curve448-sha512,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256,diffie-hellman-group18-sha512,diffie-hellman-group17-sha512,diffie-hellman-group16-sha512,diffie-hellman-group15-sha512,diffie-hellman-group14-sha256,ext-info-s,kex-strict-s-v00@openssh.com
deploy2002 is running bullseye, which has ssh 1:8.4p1-5+deb11u5, so it does not have any of the post-quantum algorithms that were first added in OpenSSH 9.0.
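A minimal sketch of how a KEX line from `ssh -vv` output could be checked for post-quantum key exchange methods. The set of algorithm names below is an assumption for illustration (names I believe OpenSSH uses for its hybrid methods), and the helper names are hypothetical; this is not a tool used in the task.

```python
# Assumed list of post-quantum hybrid KEX names, for illustration only.
POST_QUANTUM_KEX = {
    "sntrup761x25519-sha512@openssh.com",
    "sntrup761x25519-sha512",
    "mlkem768x25519-sha256",
}


def parse_kex_line(line: str) -> list[str]:
    """Return the algorithm names that follow 'KEX algorithms:' in a debug line."""
    _, _, rest = line.partition("KEX algorithms:")
    return [name.strip() for name in rest.split(",") if name.strip()]


def post_quantum_offered(line: str) -> list[str]:
    """Return the sorted post-quantum methods present in the line, if any."""
    return sorted(set(parse_kex_line(line)) & POST_QUANTUM_KEX)


# The KEX list quoted in the debug output above.
server_line = (
    "debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,"
    "curve448-sha512,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,"
    "diffie-hellman-group-exchange-sha256,diffie-hellman-group18-sha512,"
    "diffie-hellman-group17-sha512,diffie-hellman-group16-sha512,"
    "diffie-hellman-group15-sha512,diffie-hellman-group14-sha256,"
    "ext-info-s,kex-strict-s-v00@openssh.com"
)

print(post_quantum_offered(server_line))  # prints []
```

The same function can be pointed at the client's own proposal line to compare both sides of the negotiation.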
Oct 6 2025
Oct 3 2025
@jcrespo it took me a bit of time to coerce the box back into BIOS mode. I then tried reimaging with bookworm, but the RAID step failed due to the existing raid6 volume. After a couple of failed attempts, I booted off a rescue image and removed the raid6 volume with storcli.
Oct 2 2025
As @jcrespo pointed out on IRC, there is also quite a bit of puppet 5 documentation which needs to be removed or updated as part of this task.
Oct 1 2025
- Working hacks!
Sep 29 2025
Thanks @CDanis, I happened upon that post as well. I don't think their approach is unreasonable; there are different trade-offs between complexity and adherence to spec. My preference is to try the sync script, but if that fails I'm happy to look at their approach.
Sep 25 2025
Sep 18 2025
Would it be acceptable to store the data from the parsed DMARC reports in OpenSearch? My initial estimate is that it would require about 50MiB of data per day.
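As a rough sizing aid, the 50MiB per day estimate can be projected over a retention window. The retention periods and the replica count below are assumptions for illustration only; they are not settings taken from the task.

```python
DAILY_MIB = 50  # initial estimate from the comment above


def retained_gib(days: int, daily_mib: float = DAILY_MIB, replicas: int = 0) -> float:
    """Approximate stored size in GiB after `days`, counting primary plus replica copies."""
    return daily_mib * days * (1 + replicas) / 1024


# Hypothetical retention windows; index overhead and compression are ignored.
for days in (30, 90, 365):
    print(f"{days:>3} days: {retained_gib(days):.1f} GiB primary, "
          f"{retained_gib(days, replicas=1):.1f} GiB with one replica")
```

At this rate a full year of reports comes to a little under 18 GiB before replication.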
Sep 17 2025
Sep 15 2025
Sep 9 2025
I was able to reproduce on dse-k8s-worker1014 by flipping VT on and then re-running the cookbook. How does this patch look: https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1186619 ? It tests okay for me on dse-k8s-worker1014.
because of the error on line 22?
Sep 8 2025
On the mx-in servers you can obtain routing information via sendmail -bv; however, it is a bit more annoying to work with than exim -bt.
Thanks @CDanis that is also worth looking into as a redundancy option.
Aug 25 2025
Aug 13 2025
Aug 11 2025
We get the correct exit codes now, which is fabulous. However, what is less fabulous, as @Tgr notes, is that the majority of the error codes are 74, which msmtp uses to indicate either a disk I/O error or a network I/O error, both of which are broad categories. So we really need more debug info. How do we obtain it?
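For reference, 74 is EX_IOERR in the BSD sysexits.h convention. A minimal sketch of tallying collected exit codes by name follows; the sample data is made up, and the helper names are hypothetical.

```python
from collections import Counter

# Exit codes as defined in sysexits.h.
SYSEXITS = {
    0: "EX_OK",
    64: "EX_USAGE",
    65: "EX_DATAERR",
    66: "EX_NOINPUT",
    67: "EX_NOUSER",
    68: "EX_NOHOST",
    69: "EX_UNAVAILABLE",
    70: "EX_SOFTWARE",
    71: "EX_OSERR",
    72: "EX_OSFILE",
    73: "EX_CANTCREAT",
    74: "EX_IOERR",
    75: "EX_TEMPFAIL",
    76: "EX_PROTOCOL",
    77: "EX_NOPERM",
    78: "EX_CONFIG",
}


def describe(code: int) -> str:
    """Map a numeric exit code to its sysexits name, or mark it as unknown."""
    return SYSEXITS.get(code, f"unknown({code})")


def tally(codes: list[int]) -> Counter:
    """Count how often each named exit code occurs."""
    return Counter(describe(code) for code in codes)


# Made-up sample, for illustration only.
sample = [74, 74, 75, 0, 74]
for name, count in tally(sample).most_common():
    print(f"{name}: {count}")
```

A tally like this only confirms which category dominates; it does not say whether a given EX_IOERR came from disk or network, which is the missing detail the comment asks about.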
Aug 5 2025
It looks like puppet 5 is failing, while puppet 7 is running successfully?
Jul 31 2025
It appears there were two issues:
@TAndic I ran an initial test and our setup appears to be working correctly. Could we perhaps jump on a call and review the configuration together and send some test emails?
Jul 28 2025
great, please re-open if you have any issues
Jul 25 2025
@ttaylor access should be set up, and you should receive an email about setting up your Kerberos credentials. Please try everything out and report back.
Supermicro indicated that the debug output from the supplied BIOS is only sent to COM2. From a brief look at the manual, COM2 is only available as an onboard header: "One serial port on the rear I/O panel (COM1) and one onboard header (COM2)." By my read, that means we will have to purchase the PCIe bracket and cable.
Jul 24 2025
@ttaylor when you have a moment, please review and sign the L3 server access document.
@HCoplin-WMF happy to help grant you access. You may be able to request access through our new IDM tool, https://idm.wikimedia.org. Can you try logging in and requesting access?
@KFrancis would you kindly confirm that @Novem_Linguae has signed the NDA?
Jul 23 2025
@nisrael I sent you an invite, let me know if you can get in.
Based on our discussion with Supermicro on the call today, my understanding is that we will need to support model-specific quirks.
Jul 22 2025
ysuu9wx7@ag.us.dmarcian.com has been added to the rua record, and should start receiving aggregate reports
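A minimal sketch of checking which addresses a DMARC record lists under rua. The record string below is a hypothetical example: the policy value and the second address are assumptions, and only the dmarcian address comes from the comment above.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record into a tag -> value mapping."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, _, value = part.partition("=")
            tags[key.strip().lower()] = value.strip()
    return tags


def rua_addresses(record: str) -> list[str]:
    """Return the mailto addresses listed in the rua tag, without size limits."""
    addresses = []
    for uri in parse_dmarc(record).get("rua", "").split(","):
        uri = uri.strip()
        if uri.lower().startswith("mailto:"):
            # A URI may carry a '!size' suffix, which is dropped here.
            addresses.append(uri[len("mailto:"):].split("!", 1)[0])
    return addresses


# Hypothetical record; policy and first address are placeholders.
example = (
    "v=DMARC1; p=none; "
    "rua=mailto:dmarc@example.org,mailto:ysuu9wx7@ag.us.dmarcian.com"
)
print(rua_addresses(example))
```

Running the same function against the published TXT record is one way to confirm the new address is present.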
Closing for now, as the testing of the intel card is complete for the moment