
Cannot download logs from https://ws-export.wmcloud.org/logs with wget, as Anubis blocks bots
Closed, Resolved · Public · Feature

Description

Feature summary (what you would like to be able to do and where):

I would like to create a report of book downloads in various formats for Indic languages.
A long time ago, I did the same with the previous version of ws-export, which provided a dump of its SQLite database.

https://github.com/KaniyamFoundation/WikisourceEbooksReport

Now, the logs are provided as SQL dumps here:
https://ws-export.wmcloud.org/logs/

I cannot download the logs using wget, as Anubis blocks it.

Please provide an easy way to download the logs from wmcloud, or upload the same logs to archive.org as well, so that we can download and parse them and create reports.
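
Once a dump is downloaded, a quick first look at it could be sketched as below. This assumes a mysqldump-style file whose lines begin with INSERT INTO followed by a backquoted table name; the actual schema of the ws-export logs is not described in this task, and summarize_dump is a hypothetical helper name.

```shell
# Hypothetical first-look helper for a downloaded dump such as 2025.sql.gz.
# Assumes mysqldump-style lines starting with INSERT INTO `table`;
# the real table names and schema are not documented in this task.
summarize_dump() {
  local dump="$1"
  # Decompress to stdout, keep only the "INSERT INTO `name`" prefix,
  # then count INSERT statements per table.
  gzip -dc "$dump" \
    | grep -o '^INSERT INTO `[^`]*`' \
    | sort \
    | uniq -c
}
```

Usage: summarize_dump 2025.sql.gz. Note that this counts statements, not rows; a single extended INSERT statement can carry many rows.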

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

 wget https://ws-export.wmcloud.org/logs/2025.sql.gz
--2025-11-17 21:34:35--  https://ws-export.wmcloud.org/logs/2025.sql.gz
Resolving ws-export.wmcloud.org (ws-export.wmcloud.org)... 2a02:ec80:a000:1::1d, 185.15.56.49
Connecting to ws-export.wmcloud.org (ws-export.wmcloud.org)|2a02:ec80:a000:1::1d|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2025-11-17 21:35:11 ERROR 403: Forbidden.

Anubis blocks downloads made with wget.

Benefits (why should this be implemented?):

If the logs are open for download, we can build one big dashboard reporting downloads for all languages, or each language community can have its own reporting tools.

Event Timeline

Restricted Application added a subscriber: Aklapper.

You need to set a user agent; any value seems to be sufficient, e.g. wget --user-agent='bob' https://ws-export.wmcloud.org/logs/2025.sql.gz (but if you're scripting this, it's best to use a sensible user agent!).
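
A minimal shell sketch of this suggestion for scripted use, sending a descriptive User-Agent instead of a browser-like one. The tool name, version, and the use of the report repository URL as contact information are assumptions; substitute your own values.

```shell
# Minimal sketch: fetch a ws-export log dump with a descriptive,
# non-browser User-Agent. TOOL_NAME and TOOL_VERSION are placeholders.
TOOL_NAME="WikisourceEbooksReport"
TOOL_VERSION="0.1"
CONTACT="https://github.com/KaniyamFoundation/WikisourceEbooksReport"
USER_AGENT="${TOOL_NAME}/${TOOL_VERSION} (${CONTACT})"
BASE_URL="https://ws-export.wmcloud.org/logs"

# Download one dump by its year-based file name, e.g. fetch_log 2025
# saves 2025.sql.gz in the current directory.
fetch_log() {
  local year="$1"
  wget --user-agent="$USER_AGENT" "${BASE_URL}/${year}.sql.gz"
}
```

Usage: fetch_log 2025. Only 2025.sql.gz is named in this task; other file names under /logs/ are not confirmed here.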

Tshrinivasan claimed this task.

Anubis blocks wget, curl, etc. I tried adding the user agent, but it is still not working.

I think your deleted comment mentioned that you were using a User Agent that looked like a web browser's. This will be caught by Anubis. You need to specify any other User Agent, preferably one that identifies what is running these requests (e.g. bob in my example above is silly, but it works).
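
One way to act on this advice with curl, sketched under assumptions: looks_like_browser and fetch_log_curl are hypothetical helpers, and the browser patterns they check are a rough heuristic, not Anubis's actual rules.

```shell
# Hypothetical guard: treat common browser tokens as "browser-like",
# since (per this thread) Anubis catches browser-looking User-Agents.
looks_like_browser() {
  case "$1" in
    Mozilla/*|*Chrome/*|*Safari/*|*Firefox/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Download a dump with curl, refusing browser-like User-Agents up front.
fetch_log_curl() {
  local year="$1" ua="$2"
  if looks_like_browser "$ua"; then
    echo "refusing browser-like User-Agent: $ua" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP errors such as 403.
  curl --fail --user-agent "$ua" --output "${year}.sql.gz" \
    "https://ws-export.wmcloud.org/logs/${year}.sql.gz"
}
```

Usage, for example: fetch_log_curl 2025 "WikisourceEbooksReport/0.1 (you@example.org)", with the contact replaced by your own.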

Thanks @Samwilson, the user agent bob works like magic. I had set a browser user agent as usual, but Anubis is kind enough to allow bob as a user agent. It works for me too. Thanks for the info.

Samwilson claimed this task.

No, no, that was just an example! :-) You should use one that conforms to the WMF User-Agent Policy.
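
A hypothetical helper that assembles a User-Agent in the general shape the WMF User-Agent Policy recommends: client name and version, contact information in parentheses, then the underlying library and its version. The policy text itself is authoritative; the values below are placeholders.

```shell
# Hypothetical helper: build a User-Agent shaped like
#   <client name>/<version> (<contact information>) <library>/<version>
make_user_agent() {
  local name="$1" version="$2" contact="$3" lib="$4"
  printf '%s/%s (%s) %s\n' "$name" "$version" "$contact" "$lib"
}
```

Usage, for example: UA="$(make_user_agent ExampleTool 1.0 'https://example.org/tool' 'Wget/1.21')" and then wget --user-agent="$UA" https://ws-export.wmcloud.org/logs/2025.sql.gz.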