Find a way for pywikibot GitHub Actions to avoid IP range blocks of Microsoft Azure hosted runners
Open, Needs Triage, Public, Feature

Description

Pywikibot runs a relatively large set of post-merge tests via GitHub Actions. Some of these tests use Beta Cluster wikis as their target for end-to-end testing of various features.

The efforts to exclude unwanted bots from {T393487} have recently blocked some of the IP addresses used by GitHub Actions. GitHub Actions uses Microsoft Azure to host many (all?) of its runners. There are over 5000 (!) IP ranges listed at https://api.github.com/meta that GitHub Actions might make requests from.
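
To get a feel for the size of that list, here is a minimal sketch that tallies the CIDR ranges under the `actions` key of the https://api.github.com/meta response by IP version. The helper names `fetch_meta` and `summarize_action_ranges` are made up for illustration and are not part of pywikibot or of any GitHub tooling.

```python
import ipaddress
import json
import urllib.request

META_URL = "https://api.github.com/meta"


def fetch_meta(url: str = META_URL) -> dict:
    """Download and decode the GitHub meta document (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def summarize_action_ranges(meta: dict) -> dict:
    """Count the CIDR ranges listed under the 'actions' key, per IP version."""
    counts = {"ipv4": 0, "ipv6": 0}
    for cidr in meta.get("actions", []):
        network = ipaddress.ip_network(cidr, strict=False)
        counts["ipv4" if network.version == 4 else "ipv6"] += 1
    counts["total"] = counts["ipv4"] + counts["ipv6"]
    return counts
```

Calling `summarize_action_ranges(fetch_meta())` on a machine with network access would report the current totals. The list changes over time, so any count is only a snapshot.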

Some potential options:

  • Allowlist 5000+ CIDR ranges and keep that list updated.
  • Set up self-hosted GitHub runners for use by https://github.com/wikimedia/ organization projects.
  • Add a SOCKS5 proxy to the appropriate test suites to tunnel traffic to an exit that is unlikely to be blocked.
  • Migrate all of these tests to a CI platform that is "in-house" (Zuul or GitLab CI) and unlikely to be blocked.
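
For the allowlist option, a rough sketch of how a long CIDR list could be shrunk before it is maintained anywhere, using only the Python standard library. The function names `collapse_ranges` and `is_allowlisted` are hypothetical, and nothing here reflects how Beta Cluster blocks or allowlists are actually configured.

```python
import ipaddress


def collapse_ranges(cidrs):
    """Merge adjacent or overlapping CIDR ranges into the smallest equivalent list."""
    networks = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    collapsed = []
    # collapse_addresses() rejects mixed IPv4/IPv6 input, so handle each version separately.
    for version in (4, 6):
        same_version = [n for n in networks if n.version == version]
        collapsed.extend(ipaddress.collapse_addresses(same_version))
    return [str(n) for n in collapsed]


def is_allowlisted(ip, cidrs):
    """Return True if the address falls inside any of the given ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(c, strict=False) for c in cidrs)
```

Collapsing only helps where the published ranges are adjacent or nested. It does not remove the need to refresh the list whenever GitHub changes it.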

See also:

Steps to implement SOCKS proxy from GitHub action to WMCS

  • @Xqt makes a Developer account specifically to act as the credential holder that can build an ssh SOCKS5 tunnel from GitHub Actions to bastion.wmcloud.org.
  • @Xqt adds an ssh public key to that new Developer account and keeps track of the associated public key for the GitHub Actions configuration.
  • @Xqt asks @bd808 to make the new Developer account a member of the bastion project so it can ssh in.
  • @bd808 does the needful
  • @Xqt figures out how to add config to the GitHub Actions to establish an ssh tunnel from the Action runner to bastion.wmcloud.org. A pure cli way to do this would be something like ssh -o StrictHostKeyChecking=accept-new -f -N -D 127.0.0.1:1080 -i $PRIVATE_KEY_FILE $USER@bastion.wmcloud.org
    • -o StrictHostKeyChecking=accept-new: Accept offered host key for any host not already in the known hosts file
    • -f: Background ssh process after connecting
    • -N: Do not exec a remote command
    • -D 127.0.0.1:1080: Create a SOCKS5 proxy listening on 127.0.0.1:1080 and terminated on the ssh connected host
    • -i $PRIVATE_KEY_FILE: Use the private key in $PRIVATE_KEY_FILE
  • @Xqt adds the equivalent of export HTTPS_PROXY="socks5h://127.0.0.1:1080" to the GitHub Actions configuration. This tells requests to proxy traffic through the tunnel and to do DNS resolution on the proxy termination side, so that the internal network IPs are contacted when traffic flows over the tunnel. Weird things might happen if DNS resolution is done outside the Cloud VPS network, because public IPv4 addresses in Cloud VPS work in ways that are sometimes confusing.
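
The ssh invocation from the steps above could be assembled from a script roughly as follows. This is only a sketch: `build_tunnel_command` and `open_tunnel` are hypothetical helper names, and the user name and key path are placeholders that a real workflow would have to supply from its own secrets handling.

```python
import subprocess


def build_tunnel_command(user, key_file, host="bastion.wmcloud.org",
                         listen="127.0.0.1:1080"):
    """Return the ssh argument list for a backgrounded SOCKS5 tunnel."""
    return [
        "ssh",
        "-o", "StrictHostKeyChecking=accept-new",  # accept host keys not yet in known_hosts
        "-f",            # go to the background after connecting
        "-N",            # do not run a remote command
        "-D", listen,    # local SOCKS listener, traffic exits on the ssh host
        "-i", key_file,  # private key used to authenticate
        f"{user}@{host}",
    ]


def open_tunnel(user, key_file):
    """Start the tunnel; raises CalledProcessError if ssh exits with an error."""
    subprocess.run(build_tunnel_command(user, key_file), check=True)
```

Keeping the command construction separate from the call that runs it makes the flags easy to inspect or log without opening a connection.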

(Copied from 399485#11011491 to keep track of the steps)

Event Timeline

I guess my first question is if these tests could run from Wikimedia infrastructure rather than GitHub Actions.

We could probably use self-hosted runners on WMF infrastructure:

I guess I am not able to set this up myself, but I am willing to support it as much as I can if this is an appropriate solution. Background: Previously we ran these tests on Travis and AppVeyor. Both test matrices were ported to GitHub Actions due to T296371 and T368192. These tests run after Jenkins CI and cover a wider variety of sites, Python releases, operating systems, and test users (the Jenkins tests cover en-wiki and IP users only). They help to verify that the code is ready to be published as the next stable release if the tests pass. There are 128 jobs running on GitHub.

To port these tests to Jenkins looks much more difficult to me and I have no idea if and how this would be possible.

The fundamental challenge today is that we only have IP range based blocking set up for the Beta Cluster, without any currently documented way to bypass a range block by using a request header, authentication, etc.

I found out that the tests need five times as much time running on beta than before or on other sites (if we are lucky and the runner's IP is not blocked), and I understand the measure. But it is cumbersome to restart the failing jobs every time in the hope of reaching an unblocked IP. Blocking IPs cannot be a long-term solution, and you also have to ask yourself what to do if sites other than beta are affected. So there should be some bypass mechanism for trusted CI traffic through headers or tokens or maxlag-like throttling. But you know that better than I do.

I guess my first question is if these tests could run from Wikimedia infrastructure rather than GitHub Actions.

We could probably use self-hosted runners on WMF infrastructure:

That might be possible. One of the challenges would be finding folks to monitor and keep these runners working. This is probably not an impossible challenge, but it won't be trivial either.

To port these tests to Jenkins looks much more difficult to me and I have no idea if and how this would be possible.

Moving to tests run by zuul + jenkins would probably be possible, but also annoying at the current moment. The Continuous-Integration-Infrastructure (Zuul upgrade) project is working towards changing a lot of things in that CI pipeline so the work would likely turn out to need an initial implementation and then a follow up project to move from Jenkins Job Builder described tests to the ansible replacement.

Yet another option might be figuring out how to mirror the pywikibot code to gitlab.wikimedia.org and then using the self-service CI pipelines there to run your tests. We currently have both locally hosted and externally hosted gitlab-runners. We do not, however, have Windows or macOS runners, which are things I see at least a few of the pywikibot GitHub Actions using.

Blocking IPs cannot be a long-term solution, and you also have to ask yourself what to do if sites other than beta are affected. So there should be some bypass mechanism for trusted CI traffic through headers or tokens or maxlag-like throttling. But you know that better than I do.

IP blocking is likely here to stay. We are fundamentally having the same problem as production wikis trying to block LTA type vandals. The compounding issue here is that it is not just edit traffic that is causing us problems, but read traffic as well. The production wikis are having the same core problem with aggressive scraper bots (https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/), but they are taken care of by more people and are also getting a focus project to work on adding more automated traffic management. Unfortunately I have doubts that much of that work will be applicable to the beta cluster wikis due to staffing and technology constraints.

The proxy idea started with @thcipriani while discussing the self-hosted runner concept:

[16:55]  <thcipriani> I think keeping it alive shouldn't be too bad, but it'll be yet another thing to keep up-to-date for a strange one-off. I wonder if there's some kind of opensource cloudflare tunnel kinda thing that would be simpler here. Something to make the request look like it's coming from inside the house when folks hit a specific url. That would be less surface area than a github runner.
[16:57]  <thcipriani> maybe not tho
[17:02]  <    bd808> In theory a SOCKS5 proxy or similar would be possible. that might even be something that there is an existing Action for.
[17:04]  <    bd808> https://github.com/marketplace/actions/ssh-socks-action

This is probably the quickest option to test as a possible solution. It would basically need a new Developer account to act as the service account for the ssh access, an ssh keypair, and the new Developer account being added to the Cloud VPS bastion project to give it a place to terminate the SOCKS5 tunnel.

Proxy usage sounds promising. It is already available through the requests package:

Nice. The steps to test this out then are probably something like:

  • @Xqt makes a Developer account specifically to act as the credential holder that can build an ssh SOCKS5 tunnel from GitHub Actions to bastion.wmcloud.org.
  • @Xqt adds an ssh public key to that new Developer account and keeps track of the associated public key for the GitHub Actions configuration.
  • @Xqt asks @bd808 to make the new Developer account a member of the bastion project so it can ssh in.
  • @bd808 does the needful
  • @Xqt figures out how to add config to the GitHub Actions to establish an ssh tunnel from the Action runner to bastion.wmcloud.org. A pure cli way to do this would be something like ssh -o StrictHostKeyChecking=accept-new -f -N -D 127.0.0.1:1080 -i $PRIVATE_KEY_FILE $USER@bastion.wmcloud.org
    • -o StrictHostKeyChecking=accept-new: Accept offered host key for any host not already in the known hosts file
    • -f: Background ssh process after connecting
    • -N: Do not exec a remote command
    • -D 127.0.0.1:1080: Create a SOCKS5 proxy listening on 127.0.0.1:1080 and terminated on the ssh connected host
    • -i $PRIVATE_KEY_FILE: Use the private key in $PRIVATE_KEY_FILE
  • @Xqt adds the equivalent of export HTTPS_PROXY="socks5h://127.0.0.1:1080" to the GitHub Actions configuration. This tells requests to proxy traffic through the tunnel and to do DNS resolution on the proxy termination side, so that the internal network IPs are contacted when traffic flows over the tunnel. Weird things might happen if DNS resolution is done outside the Cloud VPS network, because public IPv4 addresses in Cloud VPS work in ways that are sometimes confusing.
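
On the Python side, the proxy setting from the last step could look roughly like this. It is a sketch under assumptions: the helper names are invented for illustration, `fetch_through_tunnel` needs both `requests[socks]` and a running tunnel, and pywikibot's own HTTP layer may need the setting applied in a different place.

```python
import os

TUNNEL = "127.0.0.1:1080"


def proxy_url(remote_dns=True, endpoint=TUNNEL):
    """Build the proxy URL; 'socks5h' asks the proxy side to resolve hostnames."""
    scheme = "socks5h" if remote_dns else "socks5"
    return f"{scheme}://{endpoint}"


def proxied_environment(base=None):
    """Return a copy of the environment with HTTPS_PROXY pointing at the tunnel."""
    env = dict(os.environ if base is None else base)
    env["HTTPS_PROXY"] = proxy_url()
    return env


def fetch_through_tunnel(url):
    """Fetch a URL through the tunnel; requires requests[socks] and a live tunnel."""
    import requests  # imported lazily so the helpers above work without it

    session = requests.Session()
    session.proxies = {"https": proxy_url()}
    return session.get(url, timeout=30)
```

The `socks5h` scheme is the part that moves DNS resolution to the far end of the tunnel. With plain `socks5` the runner would resolve hostnames itself before handing the connection to the proxy.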

Safely storing and using the ssh private key from GitHub Actions is something that @Xqt should research as part of this too. This is

Ah, this is not an HTTP/HTTPS proxy but SOCKS5, and requests needs requests[socks], which is PySocks. I hope this still works, because the package has been unsupported for 8 years.
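
A small pre-flight check along these lines could tell a CI job early whether the SOCKS extra is present. It is only a sketch: the function names are hypothetical, and it relies on PySocks installing a top-level module named `socks`.

```python
import importlib.util


def socks_support_available():
    """Return True if the 'socks' module provided by PySocks can be imported."""
    return importlib.util.find_spec("socks") is not None


def describe_socks_support():
    """Give a short human-readable status line for CI logs."""
    if socks_support_available():
        return "PySocks found: socks5:// and socks5h:// proxies should be usable"
    return "PySocks missing: install requests[socks] before using a SOCKS proxy"
```

Printing `describe_socks_support()` at the start of a job would make a missing dependency visible before any test tries to reach a wiki.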

@Xqt I see passing tests upstream marked as using "wpbeta" (e.g. https://github.com/wikimedia/pywikibot/actions/runs/18721896853/job/53396228584). Does this mean that things are working now? Can this task be updated with info about what y'all had to change and resolved if so?

@bd808 Some of those failing tests are marked to be skipped (1) due to T399367 but code coverage obviously shows that they pass (2). So yes, it is working again now. So I think this issue can be closed.

In T426036#11915281, @bd808 wrote:

We have blocked Microsoft Azure deliberately. It has been implicated multiple times in traffic overload bursts for Beta Cluster. This feels like a duplicate of T399415: Unable to generate family for wpbeta:zh with github action (ClientError: (403) Request forbidden) and T399485: Find a way for pywikibot GitHub Actions to avoid IP range blocks of Microsoft Azure hosted runners.

Opening up Beta Cluster to an easy crawler source like Azure does not feel like a viable option with the current lack of resources to make fine-grained blocking attempts for scraper bot traffic.

Is it possible to use test.wikipedia.org for your pywikibot tests? If not, is there some other environment that we could make available for your test suites that need a live wiki?

Add a SOCKS5 proxy to the appropriate test suites to tunnel traffic to an exit that is unlikely to be blocked

I do not have a deep understanding of the pywikibot tests, but assuming that generally all pywikibot traffic runs through requests it should be possible to do something like ssh -f -N -D 127.0.0.1:1080 serviceuser@bastion.wmcloud.org; export ALL_PROXY="socks5://127.0.0.1:1080" to set up a SOCKS5 proxy that tunnels into the Cloud VPS network via ssh and configure requests to use it. It would be prudent to use a specially created Developer account for this that we can ensure only has basic ssh access and no more elevated rights in case the ssh credentials leak from the CI runner somehow.

Eh. I forgot that requests made SOCKS support an optional library (requests[socks]). It is not clear in the upstream docs, as they literally show an export ALL_PROXY="socks5://10.10.1.10:3434" example, but then have a separate section for SOCKS support.

One possible workaround for the requests SOCKS issue would be to still use ssh to create a SOCKS tunnel and then run an additional HTTP(S) proxy like Privoxy between requests and the SOCKS proxy.
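
As a sketch of that workaround, the snippet below renders a minimal Privoxy configuration that forwards all traffic to the ssh SOCKS listener. The function names are hypothetical, and the `127.0.0.1:8118` listen address is an assumption chosen for illustration. This has not been tried against Beta Cluster.

```python
def render_privoxy_config(socks_endpoint="127.0.0.1:1080",
                          listen_address="127.0.0.1:8118"):
    """Render a minimal Privoxy config that forwards everything to a SOCKS5 proxy."""
    lines = [
        f"listen-address {listen_address}",
        # '/' matches every URL; the trailing '.' means no further HTTP parent proxy.
        f"forward-socks5 / {socks_endpoint} .",
    ]
    return "\n".join(lines) + "\n"


def http_proxy_url(listen_address="127.0.0.1:8118"):
    """Proxy URL that requests could use without the SOCKS extra."""
    return f"http://{listen_address}"
```

With Privoxy running on that config, `HTTPS_PROXY` would point at `http_proxy_url()` instead of a `socks5h://` address, so requests would only need its built-in HTTP proxy support.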

@bd808: I made some tests with SOCKS and it looks like this is the way to go.
Refer to my tests at https://github.com/xqt/pwb/actions/runs/25926322872/job/76208431713 vs. the failing tests at https://github.com/wikimedia/pywikibot/actions/runs/25914790840/job/76168653767, especially the tests on beta sites.

Xqt updated the task description.