
Wikidata updater from wikidata-query-rdf is blocked
Open, Needs Triage, Public

Description

At geneea.com we run a mirror of the wikidata.org triple store and want to keep it in sync. Recently the updater has become subject to the restrictions that block abusive traffic, and we have found no workaround.

For a few years we have maintained the mirror by populating it from the published dumps and then retrieving updates. For this purpose we run the updater from https://github.com/wikimedia/wikidata-query-rdf (currently version 0.3.159). Sometime around the end of 2025 or early January 2026 we noticed that the updater application is now blocked.
We set the User-Agent string of our requests according to Wikidata policy (Geneea.WikidataMirrorBot/1.1 (https://geneea.com/; support@geneea.com)), but it does not help. We are able to update items, but steady traffic is blocked.
The errors from the updater look like this:

Cannot fetch entity at https://www.wikidata.org/wiki/Special:EntityData/Q38277710.ttl?flavor=dump&nocache=1769731607219: UNEXPECTED_RESPONSE  status 429

After a series of these 429 responses there is also a more elaborate version from /w/api.php, returned when the updater asks for the next batch of changes:

Caused by: java.io.IOException: HTTP request to https://www.wikidata.org/w/api.php?format=json&action=query&list=recentchanges&rcdir=newer&rcprop=title%7Cids%7Ctimestamp&rcnamespace=0%7C120&rclimit=100&continue=-%7C%7C&rccontinue=20260105123747%7C2527180622 failed: 429 response:<!DOCTYPE html>
...
<h1>Error</h1>

<p>Your bot is making too many requests. Please reduce your request rate or contact bot-traffic@wikimedia.org (f263c81)</p>
</div>
</div>
<div class="footer"><p>If you report this error to the Wikimedia System Administrators, please include the details below.</p><p class="text-muted"><code>Request served via cp3072 cp3072, Varnish XID 350969705<br>Upstream caches: cp3072 int<br>Error: 429, Your bot is making too many requests. Please reduce your request rate or contact bot-traffic@wikimedia.org (f263c81) at Thu, 29 Jan 2026 23:46:49 GMT<br><details><summary>Sensitive client information</summary>IP address: 144.76.100.140</details></code></p>
</div>

How can we ask for an increased request rate so that we can run our Wikidata mirror? Or what options do we have to keep it in sync while staying under the request quota? We have written to bot-traffic@wikimedia.org several times, to no avail. If needed, we can also supply the list of IP addresses the traffic comes from. We are willing to discuss what needs to be done to keep the updates running.
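
To make the "stay under the request quota" option concrete, here is a minimal sketch of client-side backoff on 429 responses. It is not code from wikidata-query-rdf (which is Java); the function names, the retry limits, and the delay values are hypothetical, and it is only an assumption that the 429 responses carry a Retry-After header, so the sketch falls back to exponential backoff with jitter when the header is absent.

```python
import random
import time
import urllib.error
import urllib.request

# User-Agent taken from the description above.
USER_AGENT = "Geneea.WikidataMirrorBot/1.1 (https://geneea.com/; support@geneea.com)"


def backoff_delay(attempt, retry_after=None, base=1.0, cap=300.0):
    """Return the number of seconds to wait before retry `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Assumption: the server sends Retry-After as a number of seconds.
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled here
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def fetch_with_backoff(url, max_attempts=6,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, retrying only on HTTP 429 and pausing between attempts."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    for attempt in range(max_attempts):
        try:
            with opener(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on other errors, or once the attempts are exhausted.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            header = err.headers.get("Retry-After") if err.headers else None
            sleep(backoff_delay(attempt, header))
```

Backing off like this would only slow the updater down when it is throttled; whether the mirror can still keep up with the change rate under the current limit is exactly the open question of this task.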

For a while we were also hit by https://phabricator.wikimedia.org/T402959 when testing simple queries directly against https://query.wikidata.org/, but this is now fixed. On the other hand, we want to run heavy queries regularly, which is why we need the mirror.

Event Timeline

Hi @Radim.kubacki, it seems you might have already chatted with my colleagues at the WMF and your request is being handled by a different team. Could you then please close this ticket if that's the case?