
Rate limiting/status code 429 for mwclient?
Closed, Invalid | Public | BUG REPORT

Description

Hi, I'm the maintainer of the WP 1.0 bot. In the past few weeks, we've been seeing a large number of exceptions originating in our calls to mwclient. They look like this:

2025-07-21 00:01:13,738:ERROR:rq.worker:[Job 1311575e-fbdd-4ad1-829a-4fba843ba6e8]: exception raised while executing (wp1.logic.project.update_project_by_name)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/rq/worker.py", line 1639, in perform_job
    return_value = job.perform()
                   ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/rq/job.py", line 1331, in perform
    self._result = self._execute()
                   ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/rq/job.py", line 1365, in _execute
    result = self.func(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/wp1/logic/project.py", line 83, in update_project_by_name
    update_project(wikidb,
  File "/usr/src/app/wp1/logic/project.py", line 639, in update_project
    extra_assessments = api_project.get_extra_assessments(project.p_project)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/wp1/logic/api/project.py", line 25, in get_extra_assessments
    page = api.get_page(page_name)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/wp1/api.py", line 62, in get_page
    if not login():
           ^^^^^^^
  File "/usr/src/app/wp1/api.py", line 46, in login
    site = mwclient.Site('en.wikipedia.org',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mwclient/client.py", line 129, in __init__
    self.site_init()
  File "/usr/local/lib/python3.12/site-packages/mwclient/client.py", line 149, in site_init
    meta = self.get('query', meta='siteinfo|userinfo',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mwclient/client.py", line 233, in get
    return self.api(action, 'GET', *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mwclient/client.py", line 284, in api
    info = self.raw_api(action, http_method, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mwclient/client.py", line 424, in raw_api
    res = self.raw_call('api', data, retry_on_error=retry_on_error,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mwclient/client.py", line 396, in raw_call
    stream.raise_for_status()
  File "/usr/local/lib/python3.12/site-packages/requests/models.py", line 1026, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: 4914820 for url: https://en.wikipedia.org/w/api.php?meta=siteinfo%7Cuserinfo%7Cuserinfo&siprop=general%7Cnamespaces&uiprop=groups%7Crights%7Cblockinfo%7Chasmsg&continue=&action=query&format=json

It looks like mwclient makes a meta=siteinfo request every time a Site object is created. My first thought was that this might be generating a large number of unnecessary API requests and getting us rate limited.

However, when I tried to reproduce the issue locally with a test, I got the same exception, even though my local dev server had made only a handful of requests in total.

This leads me to wonder: is the User-Agent "mwclient/*" being rate limited particularly aggressively?

Thanks!

Event Timeline

This is relatively high priority because the bot is currently offline pending resolution.

Thanks for sharing; we are looking into it.

@Audiodude - At least at the version of mwclient that you appear to be using (0.9.3 per your Pipfile), it looks like mwclient.Site only uses the provided clients_useragent when pool is None.

If that's the case, then it would seem you'll need to set the User-Agent header on the connection pool you provide to Site beforehand, in order to align with the User-Agent policy.

Thanks for taking the time and looking into what mwclient version we use in production. Upgrading to 0.11.0 was the first thing I did when attempting to run the test code, but I immediately ran into the issue described above.

Thanks for the follow-up @Audiodude. So, from a quick look at 0.11.0 it looks like the same is true - i.e., in order to set a custom User-Agent when also providing a pool to Site, one needs to do so "manually" on the pool (i.e., clients_useragent does nothing).
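Setting the header "manually" on the pool could look roughly like the sketch below. It assumes the pool is a requests.Session, as is usual with mwclient. The User-Agent string and the make_pool helper are hypothetical placeholders, not values from the wp1 code.

```python
import requests

# Hypothetical value: substitute your own bot name, version and contact details.
BOT_USER_AGENT = "ExampleBot/1.0 (https://example.org/bot; bot-owner@example.org)"


def make_pool(user_agent: str = BOT_USER_AGENT) -> requests.Session:
    """Return a requests.Session that sends an explicit User-Agent header."""
    session = requests.Session()
    # Overrides the library default that would otherwise be sent.
    session.headers["User-Agent"] = user_agent
    return session


pool = make_pool()
# The session would then be passed to mwclient, for example:
#   site = mwclient.Site("en.wikipedia.org", pool=pool)
```

Because the header lives on the session itself, every request made through the pool carries it, including the siteinfo request issued when the Site is created.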

Thanks for that. I think you might have incorrect assumptions about the wp1 code, though. We do not attempt to set any custom "WP 1.0 Bot" user agent, and we are not using a connection pool. Since 2018, we have relied on the mwclient/0.* UA, which has worked.

If setting a "WP 1.0 Bot/Audiodude <audiodude@gmail.com>" User Agent would fix our problem, I'd be happy to do so!

Oh, never mind, we do use a connection pool in order to re-use the login cookies! So according to your analysis (which I just confirmed), because we are setting the pool, we end up with a null UA? And that is causing us to get rate limited, presumably because of new policy changes?

Thanks for taking a closer look.

Indeed, what's likely happening is that, in the absence of User-Agent being explicitly set, the requests library simply defaults to its generic python-requests/x.y.z User-Agent. While that can work, it's not guaranteed to per the policy (especially if it collides with abusive workloads that might be present at the same time).
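A quick way to check whether a given session would still send that generic default is sketched below. The helper name uses_generic_user_agent and the example User-Agent string are assumptions made for illustration; the check relies only on the python-requests/ prefix that the requests library uses for its default.

```python
import requests


def uses_generic_user_agent(session: requests.Session) -> bool:
    """Return True if the session has no User-Agent or only the requests default."""
    ua = session.headers.get("User-Agent") or ""
    return ua == "" or ua.startswith("python-requests/")


# A fresh session carries the library default, e.g. python-requests/x.y.z.
default_session = requests.Session()

# A session with an explicit, identifying User-Agent (hypothetical value).
labelled_session = requests.Session()
labelled_session.headers["User-Agent"] = "ExampleBot/0.1 (bot-owner@example.org)"
```

Running a check like this on the pool before handing it to Site would flag the situation described above early, rather than after a 429 in production.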

In any case, making the change to wire the custom User-Agent you've already intended to set into your connection pool would be greatly appreciated, and should help avoid scenarios like the above.

Joe subscribed.

The task is invalid, as the bot was indeed using a user-agent that doesn't respect our UA policy, which has been in place since 2010.

This means that at any given moment, we may block generic UAs for operational reasons.

Our plan, though, is to move from episodic to systematic enforcement (see T400119), so I'd urge you to fix your client as soon as possible.

Thanks again @Scott_French for the extremely helpful analysis! I plan to submit a PR to mwclient to update the docs for that method to indicate which parameters are ignored when pool is set.