
Pageviews API: Problems accessing data from python (requests)
Closed, Resolved · Public

Description

The Pageviews API works correctly when accessed through a web browser. However, when I try to access it from Python code using the requests library, I get this error:

import requests
url = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top-by-country/eo.wikipedia.org/all-access/2016/08"

In [4]: a = requests.get(url)

In [5]: a.content
Out[5]: b'<!DOCTYPE html>\n<html lang="en">\n<meta charset="utf-8">\n<title>Wikimedia Error</title>\n<style>\n* { margin: 0; padding: 0; }\nbody { background: #fff; font: 15px/1.6 sans-serif; color: #333; }\n.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }\n.footer { clear: both; margin-top: 14%; border-top: 1px solid #e5e5e5; background: #f9f9f9; padding: 2em 0; font-size: 0.8em; text-align: center; }\nimg { float: left; margin: 0 2em 2em 0; }\na img { border: 0; }\nh1 { margin-top: 1em; font-size: 1.2em; }\n.content-text { overflow: hidden; overflow-wrap: break-word; word-wrap: break-word; -webkit-hyphens: auto; -moz-hyphens: auto; -ms-hyphens: auto; hyphens: auto; }\np { margin: 0.7em 0 1em 0; }\na { color: #0645ad; text-decoration: none; }\na:hover { text-decoration: underline; }\ncode { font-family: sans-serif; }\n.text-muted { color: #777; }\n</style>\n<div class="content" role="main">\n<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf-logo.png" srcset="https://www.wikimedia.org/static/images/wmf-logo-2x.png 2x" alt="Wikimedia" width="135" height="101">\n</a>\n<h1>Error</h1>\n<div class="content-text">\n<p>Our servers are currently under maintenance or experiencing a technical problem.\n\nPlease <a href="" title="Reload this page" onclick="window.location.reload(false); return false">try again</a> in a few&nbsp;minutes.</p>\n\n<p>See the error message at the bottom of this page for more&nbsp;information.</p>\n</div>\n</div>\n<div class="footer"><p>If you report this error to the Wikimedia System Administrators, please include the details below.</p><p class="text-muted"><code>Request from 139.47.116.164 via cp6012 cp6012, Varnish XID 1061981502<br>Upstream caches: cp6012 int<br>Error: 403, Scripted requests from your IP have been blocked, please see https://meta.wikimedia.org/wiki/User-Agent_policy. In case of further questions, please contact noc@wikimedia.org. 
at Mon, 03 Oct 2022 18:28:35 GMT</code></p>\n</div>\n</html>\n'

I have tried this from different computers, IPs, and Python environments (notebooks, CLI), and keep getting the same error. I guess it has something to do with the headers, but I could not figure out what the actual problem is.
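One way to check the guess about headers is to look at what requests sends when no headers are passed. This is only a minimal diagnostic sketch and makes no network call. It relies on `requests.utils.default_headers()`, and the exact version string printed depends on the installed release.

```python
import requests

# Headers that requests attaches to a request when none are given explicitly.
default_headers = requests.utils.default_headers()

# The User-Agent is a generic "python-requests/<version>" string,
# which does not identify the script or its author in any way.
print(default_headers["User-Agent"])
```

The error text above points to the User-Agent policy, so a generic User-Agent like this one is a plausible cause.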

Event Timeline

Restricted Application added a subscriber: Aklapper.

I see that the error says:

Scripted requests from your IP have been blocked.

However, the error persists from different IPs.

Sorry, I've read the documentation here: https://meta.wikimedia.org/wiki/User-Agent_policy and everything is clear. I'm going to close this ticket.

Just for the record, the solution is described in the documentation linked above:

import requests

url = 'https://example/...'
headers = {'User-Agent': 'CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)'}

response = requests.get(url, headers=headers)
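Applied to the Pageviews URL from the description, the same idea might look like the sketch below. The helper names `build_session` and `fetch_json` and the 10-second timeout are assumptions for illustration. The User-Agent string is the placeholder from the policy page and should be replaced with your own tool name and contact details.

```python
import requests

# Placeholder User-Agent from the policy page: replace with your own
# tool name, project URL and contact email.
USER_AGENT = "CoolBot/0.0 (https://example.org/coolbot/; coolbot@example.org)"

# URL taken from the task description.
URL = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/"
    "top-by-country/eo.wikipedia.org/all-access/2016/08"
)


def build_session(user_agent: str) -> requests.Session:
    """Return a session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_json(session: requests.Session, url: str, timeout: float = 10.0):
    """GET the URL and return the parsed JSON body.

    Raises requests.HTTPError on a 4xx/5xx status (such as the 403 above)
    instead of silently returning the HTML error page.
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

Calling `fetch_json(build_session(USER_AGENT), URL)` should then return the parsed JSON rather than the HTML error page, assuming the requests are not blocked for some other reason.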
taavi claimed this task.