
Survey the third-party library market for UA policy compliance
Open, Low, Public

Description

There are several popular third-party libraries for accessing Wikipedia (by scraping or by API) that don't comply with the User-Agent policy. Some don't set any UA, so they come in as python-requests or node-fetch or whatever. In other cases, the library sets a default UA that isn't descriptive enough, and some libraries don't give users any option to override it. Users of those libraries sometimes get throttled for violating the UA policy, and then contact us for help.
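For illustration, here is a minimal sketch of what a compliant default could look like in a client library: a client name and version, contact information in parentheses, and the underlying HTTP library last. The helper `build_user_agent`, the `ExampleBot` name, and the example.org contact details are all hypothetical, and the request is only constructed, never sent.

```python
from urllib.request import Request


def build_user_agent(client, version, contact, library=None):
    """Compose 'client/version (contact) library/version'.

    Hypothetical helper; the layout is an assumption modelled on the
    general shape the User-Agent policy asks for.
    """
    if not contact:
        # A UA without contact info is the "not descriptive enough" case.
        raise ValueError("contact information is required")
    ua = f"{client}/{version} ({contact})"
    if library:
        ua += f" {library}"
    return ua


# Placeholder values: a real tool would use its own name and contact details.
ua = build_user_agent(
    "ExampleBot",
    "0.1",
    "https://example.org/bot; bot@example.org",
    library="python-urllib/3",
)

# Build (but do not send) a request that carries the descriptive UA.
req = Request(
    "https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json",
    headers={"User-Agent": ua},
)
```

A library that exposes something like the `contact` argument to its callers, rather than hard-coding a default, also covers the "no option to override" case.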

Obviously we can't be responsible for the correctness of every library out there. But in a best-effort sort of way, we could make a list of the most popular libraries associated with non-compliant UA strings in our logs, go find those projects (assuming they're open source), and send pull requests to their code and documentation. That would help us by increasing the proportion of useful UA strings, and it would help users to not suddenly get 429s and have to go on a Learning Journey about UA headers.
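As a rough sketch of the log-diving step, something like the following could tally UA strings that are empty or that lead with a generic HTTP library name. The prefix list, the function names, and the sample strings are assumptions for illustration only; they are not taken from our logs, and a real pass would need a better heuristic for mapping UAs back to specific libraries.

```python
import re
from collections import Counter

# Illustrative (not exhaustive) list of generic HTTP client names.
# A compliant UA puts the tool's own name first, so a UA that *starts*
# with one of these is treated as a library default here.
GENERIC_PREFIX = re.compile(
    r"^(python-requests|python-urllib|node-fetch|go-http-client|curl|okhttp|axios)"
    r"(?=/|\s|$)",
    re.IGNORECASE,
)


def classify(ua):
    """Return a label for a non-compliant UA, or None if it looks fine."""
    ua = (ua or "").strip()
    if not ua:
        return "(empty)"
    match = GENERIC_PREFIX.match(ua)
    return match.group(1).lower() if match else None


def tally_noncompliant(user_agents):
    """Count non-compliant UA strings by label, most frequent first."""
    counts = Counter()
    for ua in user_agents:
        label = classify(ua)
        if label is not None:
            counts[label] += 1
    return counts.most_common()


# Made-up sample input, standing in for one UA string per request.
sample = [
    "python-requests/2.31.0",
    "ExampleBot/0.1 (bot@example.org) python-requests/2.31.0",
    "",
    "node-fetch",
    "python-requests/2.28.1",
]
ranking = tally_noncompliant(sample)
```

The second sample entry is not counted, because the library name follows a descriptive client name and contact address rather than leading the string.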

By request, assigning to @CDanis initially for that logdiving expedition.