
Unable to generate family for wpbeta:zh with github action (ClientError: (403) Request forbidden)
Open, Needs Triage, Public

Description

The generate family file step on GitHub Actions fails with ClientError: (403) Request forbidden, e.g.

Generating family file from http://zh.wikipedia.beta.wmcloud.org/
WARNING: Http response status 403
Traceback (most recent call last):
  File "pwb.py", line 40, in <module>
    sys.exit(main())
  File "pwb.py", line 36, in main
    runpy.run_path(str(path), run_name='__main__')
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "pywikibot/scripts/wrapper.py", line 557, in <module>
    main()
  File "pywikibot/scripts/wrapper.py", line 541, in main
    if not execute():
  File "pywikibot/scripts/wrapper.py", line 528, in execute
    run_python_file(filename, script_args, module)
  File "pywikibot/scripts/wrapper.py", line 154, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File "pywikibot/scripts/generate_family_file.py", line 335, in <module>
    main()
  File "pywikibot/scripts/generate_family_file.py", line 331, in main
    FamilyFileGenerator(*sys.argv[1:]).run()
  File "pywikibot/scripts/generate_family_file.py", line 151, in run
    w, verify = self.get_wiki()
  File "pywikibot/scripts/generate_family_file.py", line 133, in get_wiki
    w = self.Wiki(self.base_url, verify=verify)
  File "/home/runner/work/pywikibot/pywikibot/pywikibot/site_detect.py", line 48, in __init__
    check_response(r)
  File "/home/runner/work/pywikibot/pywikibot/pywikibot/site_detect.py", line 301, in check_response
    raise err_class(msg)
pywikibot.exceptions.ClientError: (403) Request forbidden -- authorization will not help
CRITICAL: Exiting due to uncaught exception ClientError: (403) Request forbidden -- authorization will not help
Error: Process completed with exit code 1.
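The failure above is raised by `check_response()` in pywikibot's `site_detect.py`, which maps failing HTTP status codes to exception classes. A minimal sketch of that idea (assumed for illustration; not pywikibot's actual implementation):

```python
# Sketch (assumption, not pywikibot's real code) of mapping HTTP status
# codes to error classes the way site_detect.check_response() does.

class ServerError(Exception):
    """5xx: the server failed; retrying may help."""

class ClientError(Exception):
    """4xx: the request was rejected; authorization will not help."""

def check_status(status: int) -> None:
    """Raise the matching error class for a failing HTTP status code."""
    if 500 <= status < 600:
        raise ServerError(f'({status}) Server error')
    if status == 403:
        raise ClientError('(403) Request forbidden -- '
                          'authorization will not help')
    if 400 <= status < 500:
        raise ClientError(f'({status}) Client error')
    # 2xx/3xx responses pass through unchanged

try:
    check_status(403)
except ClientError as err:
    print(err)  # (403) Request forbidden -- authorization will not help
```

The key point for this task: a 403 is classified as a client error, so pywikibot aborts immediately rather than retrying, which is why the whole workflow job fails as soon as the runner's IP falls inside a blocked range.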


Event Timeline

The blocks made for T399329: 2025-07-11 traffic overload have caught some of the Microsoft Azure network range, which is affecting GitHub Actions. I guess my first question is whether these tests could run from Wikimedia infrastructure rather than GitHub Actions. The fundamental challenge today is that we only have IP-range-based blocking set up for the Beta Cluster, without any currently documented way to bypass a range block via a request header, authentication, etc.

I guess my first question is if these tests could run from Wikimedia infrastructure rather than GitHub Actions.

We could probably use self-hosted runners on WMF infrastructure:

I don't think I am able to set it up myself, but I am willing to support as much as I can if this is an appropriate solution. Background: previously we had these tests at Travis and AppVeyor. Both test matrices were ported to GitHub Actions due to T296371 and T368192. Unlike the Jenkins CI tests, these cover a wider variety of sites, Python releases, operating systems, and test users (the Jenkins tests run against en-wiki with an IP user only) and help verify that the code is ready to be published as the next stable release if the tests pass. There are 128 jobs running on GitHub.

Porting these tests to Jenkins looks much more difficult to me, and I have no idea if or how this would be possible.

The fundamental challenge today is that we only have IP-range-based blocking set up for the Beta Cluster, without any currently documented way to bypass a range block via a request header, authentication, etc.

I found that the tests need five times as much time when running on beta as before, or as on other sites (when we were lucky and the runner's IP was not blocked), and I understand the measure. But it is cumbersome to restart the failing jobs every time in the hope of reaching an unblocked IP. Blocking IPs cannot be a long-term solution, and you also have to ask yourself what to do if sites other than beta are affected. So there should be some bypass mechanism for trusted CI traffic, e.g. via headers, tokens, or maxlag-style throttling. But you know that better than I do.
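The manual restarts described above could in principle be automated with a retry wrapper around the failing step. A hypothetical sketch (the function names and retry counts are assumptions, not part of pywikibot or the existing workflow):

```python
# Hypothetical sketch: re-run a job a few times, mimicking the manual
# restarts done today in the hope that a rescheduled runner lands on an
# unblocked IP. Names and defaults are assumptions for illustration.
import time

class ClientError(Exception):
    """Stand-in for pywikibot.exceptions.ClientError (403 from a blocked IP)."""

def run_with_retries(job, attempts=3, delay=0.0):
    """Call job() up to `attempts` times, re-raising the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            return job()
        except ClientError as err:
            last_error = err
            time.sleep(delay)  # back off before the next attempt
    raise last_error
```

This is only a mitigation, of course: it burns runner minutes and still fails whenever every rescheduled attempt draws a blocked Azure IP, which is why a header- or token-based bypass for trusted CI traffic would be the more robust fix.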

I guess my first question is if these tests could run from Wikimedia infrastructure rather than GitHub Actions.

We could probably use self-hosted runners on WMF infrastructure:

That might be possible. One of the challenges would be finding folks to monitor and keep these runners working. This is probably not an impossible challenge, but it won't be trivial either.

To port these tests to Jenkins looks much more difficult to me and I have no idea if and how this would be possible.

Moving to tests run by zuul + jenkins would probably be possible, but also annoying at the current moment. The Continuous-Integration-Infrastructure (Zuul upgrade) project is working towards changing a lot of things in that CI pipeline, so the work would likely need an initial implementation and then a follow-up project to move from tests described with Jenkins Job Builder to the Ansible replacement.

Yet another option might be figuring out how to mirror the pywikibot code to gitlab.wikimedia.org and then using the self-service CI pipelines there to run your tests. We currently have both locally hosted and externally hosted gitlab-runners. We do not, however, have Windows or macOS runners, which I see at least a few of the pywikibot GitHub Actions using.

Blocking IPs cannot be a long-term solution, and you also have to ask yourself what to do if sites other than beta are affected. So there should be some bypass mechanism for trusted CI traffic, e.g. via headers, tokens, or maxlag-style throttling. But you know that better than I do.

IP blocking is likely here to stay. We are fundamentally having the same problem as production wikis trying to block LTA-type vandals. The compounding issue here is that it is not just edit traffic that is causing us problems, but read traffic as well. The production wikis are having the same core problem with aggressive scraper bots (https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/), but they are taken care of by more people and are also getting a focused project to work on adding more automated traffic management. Unfortunately I have doubts that much of that work will be applicable to the Beta Cluster wikis due to staffing and technology constraints.

Let's take the conversation about what we might be able to do to T399485: Find a way for pywikibot GitHub Actions to avoid IP range blocks of Microsoft Azure hosted runners. I think that task will be easier for everyone to keep track of over time.