
User-agent policy failures should produce better diagnostics
Closed, Invalid · Public · BUG REPORT

Description

To make a long story short, my OAuth app started failing today when run from my laptop with the error:

403 Client Error: Forbidden for url: https://meta.wikimedia.org/w/index.php?title=Special%3AOAuth%2Finitiate

I assume this is related to T400119. I don't yet fully understand everything that is going on, but it was working yesterday and works now when run inside toolforge. I get the "it works inside toolforge" part, based on what T400119 says, but I don't understand why it worked yesterday on my laptop.

I am aware of the user-agent policy and was setting an appropriate user-agent string in my app. But apparently there's a different code path through the python-social-core library that didn't pick that up, so it was sending the default "User-Agent: python-requests/2.32.4" in the OAuth flow.

Anyway, what should happen is the 403 error should be more explicit. It's good that it tells you what URL it failed on, but it should also tell you that the failure was due to a user-agent policy violation, and include a link to T400119 or some other place which describes this policy. It should also include the user-agent string that was received. Just saying "Forbidden" doesn't give the user any idea what actually went wrong. I spent most of this afternoon trying to figure out what I had done wrong with my consumer credentials, which was totally unrelated to the real problem.

Event Timeline

Restricted Application added a subscriber: Aklapper.

I think security best practices for 403s are to return as little information as possible. You don't really want to advise bad actors how to get around the 403.

As for it working on your laptop previously: you mentioned in your email that you fixed it by overriding the UA directly in the requests library. First, speaking as someone who has used many libraries that interact with requests, you might instead be able to use a "Session" (https://requests.readthedocs.io/en/latest/user/advanced/) and encourage your OAuth library to use that rather than calling requests directly, which would avoid the hack.

Also, depending on what other changes you made between when it worked and when it stopped working, you may have activated completely different networking code, which had the UA propagation problem, while the previous version didn't. It's really impossible for anyone to know without seeing your code, both before and after. Occam's razor says it wasn't any network or code changes on the Cloud services side.
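For anyone landing here later, the Session suggestion above might look roughly like the following. This is only a sketch: the tool name, URL, and contact address in the user-agent string are made up, and whether a given OAuth library can be handed a session like this depends on that library. The request is only prepared, not sent, so nothing here touches the network.

```python
import requests

# Hypothetical tool name and contact details; substitute your own.
USER_AGENT = "my-oauth-tool/0.1 (https://example.org/my-oauth-tool; maintainer@example.org)"


def make_session(user_agent: str = USER_AGENT) -> requests.Session:
    """Return a Session whose default headers carry a descriptive User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()

# Prepare (but do not send) a request to see which headers would go out.
prepared = session.prepare_request(
    requests.Request("GET", "https://meta.wikimedia.org/w/index.php")
)
print(prepared.headers["User-Agent"])
```

Every request made through that session then carries the header, instead of the library default "python-requests/x.y.z".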

You don't really want to advise bad actors how to get around the 403.

That's true in general, but in this case, of course we do. We want people to set user agents.

Just for future reference, the fix ended up being to upgrade to social-auth-core 4.8.0 and set SOCIAL_AUTH_USER_AGENT in my django settings file.

But, I still think this needs a more user-friendly 403 message :-)
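For reference, the settings change described above would be something like this. The setting name and the 4.8.0 version are as reported in this task; the value is a made-up example, so use your own tool name and contact details.

```python
# settings.py (fragment)
# Assumes social-auth-core >= 4.8.0, as reported in this task.
# The value below is hypothetical; replace it with your own tool name and contact.
SOCIAL_AUTH_USER_AGENT = "my-oauth-tool/0.1 (https://example.org/my-oauth-tool; maintainer@example.org)"
```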

According to this python-social-auth issue log, we apparently used to have a better (perfectly reasonable) 403 message:

Please set a user-agent and respect our robot policy https://w.wiki/4wJS. See also T400119 (link)

That was written in August 2025, so fairly recently. Was this deliberately obfuscated?

taavi subscribed.

403 Client Error: Forbidden for url: https://meta.wikimedia.org/w/index.php?title=Special%3AOAuth%2Finitiate

This sounds like an error message generated by the client library, and only consists of the request URL and the numeric HTTP status code. We do serve a helpful HTTP body for u-a policy blocks but of course can't force clients to display it.

Interesting. Just for the sake of future phab archeologists, yeah, it turns out if I hit the WMF server directly from the command line, I get a nice, useful, error message:

% curl -v -H 'Authorization: OAuth oauth_nonce="xxx", oauth_timestamp="xxx", oauth_version="1.0", oauth_signature_method="HMAC-SHA1", oauth_consumer_key="xxx", oauth_callback="http%3A%2F%2F127.0.0.1%3A8000%2Foauth%2Fcomplete%2Fmediaw\
iki%2F", oauth_signature="xxx"\r\n\r\n'  'https://meta.wikimedia.org//w/index.php?title=Special%3AOAuth%2Finitiate' -H 'User-Agent: python-requests/2.32.5'

...

< HTTP/2 403
< content-length: 126
< content-type: text/plain
< x-request-id: 36fa61a9-9c0c-40ab-85b1-0214a7673f87
< server: HAProxy
< x-cache: cp1112 int
< x-cache-status: int-tls
< x-analytics:
<
Please set a user-agent and respect our robot policy https://w.wiki/4wJS. See also https://phabricator.wikimedia.org/T400119.
* Connection #0 to host meta.wikimedia.org left intact

I'm not sure how much further I want to go down this rabbit hole, but from reading through the code, it does look like that original error string is getting dropped somewhere in the chain of urllib3/requests/social-auth catching and re-raising exceptions and not preserving all the information. Or possibly it even makes it all the way up into the django core exception middleware, which just fails to confess all it knows.
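In case it helps someone debugging the same thing: when requests raises HTTPError from raise_for_status(), the exception message only has the status line and URL, but the response object (body included) is still attached as exc.response. A rough sketch of surfacing it follows. The helper names are made up, and the 403 is built offline by setting the private _content attribute, which is a demo-only shortcut and not something to do in real code. The body text is copied from the curl output above.

```python
import requests

# Body text copied from the curl output in this task.
POLICY_BODY = (
    "Please set a user-agent and respect our robot policy https://w.wiki/4wJS. "
    "See also https://phabricator.wikimedia.org/T400119."
)


def describe_http_error(exc: requests.HTTPError) -> str:
    """Return the exception message plus the server's response body, if any."""
    message = str(exc)
    response = exc.response
    if response is not None and response.text.strip():
        message = f"{message}\nServer said: {response.text.strip()}"
    return message


def fake_policy_block() -> requests.Response:
    """Build a 403 response without any network access (demo-only shortcut)."""
    response = requests.Response()
    response.status_code = 403
    response.reason = "Forbidden"
    response.url = "https://meta.wikimedia.org/w/index.php?title=Special%3AOAuth%2Finitiate"
    response.encoding = "utf-8"
    response._content = POLICY_BODY.encode("utf-8")  # private attribute, demo only
    return response


report = ""
try:
    fake_policy_block().raise_for_status()
except requests.HTTPError as exc:
    report = describe_http_error(exc)

print(report)
```

Any layer that catches the exception and re-raises only str(exc) loses the body, which would match what I was seeing.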

Audiodude changed the task status from Invalid to Resolved. · Sun, Feb 15, 8:14 PM

Glad you were able to figure it out, at least somewhat. I think your intuition is correct. There are a lot of layers/middleware in your stack and one of them is probably dropping the response body and just re-wrapping it as a generic 403.

Aklapper changed the task status from Resolved to Invalid. · Sun, Feb 15, 8:30 PM