User Details
- User Since: Dec 11 2018, 9:39 PM (364 w, 6 d)
- Availability: Available
- IRC Nick: sukhe
- LDAP User: Unknown
- MediaWiki User: SSingh (WMF) [ Global Accounts ]
Wed, Dec 3
Tue, Dec 2
@Dzahn has freed up some inodes. We were not out of disk space; we were out of inodes. We are trying to free up some more, but for now we should be running again. As @AntiCompositeNumber mentioned, there has been a steady rise for a while now, so we should look into that.
Thu, Nov 27
And while there is that fallback mechanism to the old system, this is something to keep in mind.
Yeah, I think that makes sense if we want to exert control over upstream issues and how they are reflected in the proxy itself. We can approach this in two ways:
Wed, Nov 26
Tue, Nov 25
Fri, Nov 21
Yeah, there are certainly other files. I think you can remove your comment that has the SVG file contents otherwise it makes it difficult to read the task.
Thu, Nov 20
Once https://gerrit.wikimedia.org/r/c/operations/software/homer/deploy/+/1207915 is merged tomorrow or Monday, we will enable BGP, test the anycast address and then switch the backend to use that. But since the VMs are up and running with the desired role, marking this as resolved.
Wed, Nov 19
Hi @RoyZuo: we have tried to debug this on the CDN side and can't find anything there that points us to the problem. Can you upload any file at all, or is it just this file? Given that it is an SVG and 36.7 MB, it may be causing issues at the app layer.
Oh wow, thanks @MoritzMuehlenhoff! But what was the issue for my understanding?
Tue, Nov 18
Related: T410411.
Mon, Nov 17
Tagging Traffic for this is perfectly fine, thanks @A_smart_kitten.
Thu, Nov 13
hcaptcha-proxy3001 worked just fine, but hcaptcha-proxy3002 does not come up after a reboot (tried twice). A manual start also did not work, and sudo gnt-instance info hcaptcha-proxy3002.wikimedia.org on ganeti3005.esams.wmnet wasn't really helpful either.
Wed, Nov 12
sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 2 --disk 20 --network public --os trixie -t T409860 --cluster <site> --group <group> <hostname>
Hi @Monneyboi: Thanks for reporting! Setting a user-agent will fix the error, as your own example above shows (thanks for trying that!). Additionally, as you mentioned, the reason this policy is now being enforced is to ensure fair use of the infrastructure; user-agents are one of the ways of identifying or classifying a request, so we now require them to be set.
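For anyone landing here with the same error, a minimal sketch of setting a descriptive user-agent from Python follows. The bot name and contact address are placeholders, not real values; substitute something that identifies you:

```python
# Sketch: send a descriptive User-Agent per the policy discussed above.
# "ExampleBot" and the contact details are placeholders.
import urllib.request

HEADERS = {
    "User-Agent": "ExampleBot/1.0 (https://example.org/bot; ops@example.org)"
}

def fetch(url: str) -> bytes:
    """Fetch a URL with an identifying User-Agent header set."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

The same idea applies to any HTTP client: set the header once on the session or request object rather than per call.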
Tue, Nov 11
Once the VMs are up, we will need to enable BGP for all of them in Netbox and then run homer.
Initial role can be insetup::traffic_nftables. We will reimage to the hcaptcha::proxy role later, on Debian bookworm, since the routed Ganeti setups (magru/esams) do not have the patched bird package for trixie yet and we don't want to wait. (We can reimage to trixie later.)
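For reference, announcing an anycast service address from bird2 looks roughly like the sketch below. All values (ASNs, neighbor address, anycast prefix) are placeholders, not the actual production configuration:

```
# /etc/bird/bird.conf (sketch; all numbers are placeholders)
protocol static anycast_routes {
    ipv4;
    # The anycast service address to announce:
    route 192.0.2.53/32 blackhole;
}

protocol bgp uplink {
    local as 64605;                  # placeholder private ASN
    neighbor 198.51.100.1 as 14907;  # placeholder upstream router/ASN
    ipv4 {
        import none;
        export where proto = "anycast_routes";
    };
}
```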
Hi @Jdrewniak: Daniel has already commented on the questions from Traffic's end (and as it related to the CDN and DNS) and what he has mentioned is correct per our understanding as well. We also set up some redirects under T408168 so we can work with those domains if required (wikipedia25.org).
Mon, Nov 10
Thanks for filing this task @cmooney! The geofeed link above is very helpful. So it seems from the above (57.141.8.0/24), we are missing the entries in the geo-maps file, so they default to codfw. (We have 57.141.4.0/24 and 57.141.5.0/24 in the geo-maps.)
Nov 6 2025
Ah interesting, that explains why we couldn't see the verification option on the Search Console. So just to confirm, you are set for both?
Thanks @akosiaris, that sounds good. We would like to get this done in Q3 to resolve this blocker and to deploy Liberica everywhere, so please do factor that in for your planning. Thank you!
@JKelsoteel-WMF: Can you please try to log in to wikibooks.org as well so we can see the text of the DNS record that needs to be verified?
Thanks @JKelsoteel-WMF, we will be picking this up today.
Nov 5 2025
Nov 3 2025
Oct 29 2025
Oct 28 2025
Thanks for the help @Jhancock.wm. Marking this as resolved for now.
Oct 27 2025
Brett pointed out that the regression was introduced in https://gerrit.wikimedia.org/r/q/I9fab3e43a39456432eb148df91faffba54b1926e.
Sorry this took a while but this should now be resolved. Thanks to Giuseppe for taking care of it in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1198424, which we will build on further as required, with Traffic responsible for ensuring parity.
Oct 24 2025
Oct 23 2025
Yeah, I missed this in the previous fix. I am going to take this up tomorrow, since essentially we now have to guard the includes as well.
Oct 22 2025
Sorry about this, this should now be fixed. And glad to see that Traffic was added automatically, thanks to @bd808 and @Ladsgroup for their work on this!
This is because of:
Oct 21 2025
Hi @CRoslof: This is another ticket that we would like to take up and will need your help with, so that we can reflect it in downstream services as well. Let me know if I should create a Zendesk thread for tracking by other Legal members. Thanks a lot for bearing with us and helping us clean up the ownership.
It looks like spicerack should check that alerts for the downtimed host have been resolved (i.e. not in a firing state) before deleting the silence/downtime, e.g. with ALERTS{alertstate="firing", instance=~"cp5018:.*"}.
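A minimal sketch of what that pre-deletion check could query; the function name is hypothetical and the host is just the example from above:

```python
# Sketch: build the PromQL expression a tool might run against
# Prometheus before removing a downtime, to confirm no alerts are
# still firing for the host. firing_alerts_query is a hypothetical name.
def firing_alerts_query(host: str) -> str:
    """Return a PromQL expression matching firing alerts for `host`."""
    return 'ALERTS{alertstate="firing", instance=~"%s:.*"}' % host

print(firing_alerts_query("cp5018"))
```

The tool would send this to the Prometheus query API and only delete the silence if the result set is empty.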
Oct 20 2025
Thanks for filing this task! I think this is a good idea to reduce the manual updates to this list, and something we have failed to keep updated. We will triage this after discussion in Traffic.
Oct 17 2025
Nice job indeed in pursuing this over the years, Brett!
[Adding Raine @kamila as well.]
Thanks for filing the task, @JVanderhoop-WMF. As per the discussion on Slack, the above sounds good.
Oct 16 2025
Thanks to @Jhancock.wm for the help with this!
FWIW doing one or two hosts is more than enough. We will reimage them again anyway so it doesn't make sense IMO for you both to spend time upgrading all of them to trixie. If one or two reimage fine, please leave the rest to us.
Oct 15 2025
I was also looped into a new request today. As part of the birthday initiative, the Fundraising team is developing a customized donation portal under the donate.wiki domain. Would it be possible to set up a redirect for this new portal as well? I don’t have the final destination URL yet, but we’d like to create the domain donate.wikipedia25.org to redirect to the donation portal once it’s ready. Is this something you could help with too?
Oct 14 2025
Thanks for working on this! We will try our best to follow up on our end in making sure that Puppet is not broken on the cache hosts in Beta.
FWIW we have typically reimaged for this in the past. I am not suggesting, just sharing! And given that this is lvs1020, that might be OK? (Leaving to you both for the final decision.)

