Weblinkchecker should throttle connections to the same host
Closed, Resolved · Public · Feature

Description

(and probably listen to robots.txt in most cases as well)

In November 2016, a weblinkchecker bot on Tool Labs requested many pages from http://www.minorplanetcenter.net/ at a rate of more than 100 requests per minute. This caused excessive load on the web server and resulted in Tool Labs being blocked from accessing the site.

Weblinkchecker should limit the rate of requests to any single site, preferably to something along the lines of one request every 10 seconds or so. There are a large number of links to check, and pages on different websites can reasonably be requested in parallel, so the overall slowdown shouldn't be too significant.
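To illustrate the idea, here is a minimal sketch of a per-host throttle. It is not the code from the Pywikibot patch; the class name `HostThrottle`, its methods, and the 10-second default are assumptions made for this example. Each host gets its own "next allowed" time, so requests to the same host are spaced out while requests to different hosts are not delayed by each other.

```python
import threading
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Hypothetical helper: enforce a minimum delay between requests per host."""

    def __init__(self, min_interval=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests to one host
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._next_allowed = {}           # host -> earliest time of next request
        self._lock = threading.Lock()     # link checks may run in several threads

    def reserve(self, url):
        """Book the next free slot for the URL's host; return seconds to wait."""
        host = urlsplit(url).hostname or ''
        with self._lock:
            now = self._clock()
            # Take the later of "now" and the slot left by the previous request.
            slot = max(now, self._next_allowed.get(host, now))
            self._next_allowed[host] = slot + self.min_interval
        return slot - now

    def wait(self, url):
        """Block the calling thread until a request to the URL's host is allowed."""
        delay = self.reserve(url)
        if delay > 0:
            self._sleep(delay)
```

A checker thread would call `throttle.wait(url)` immediately before fetching `url`. Because the lock is held only while booking the slot and not while sleeping, threads working on other hosts proceed without waiting.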

Access log from external site: F4978348 (only visible to selected users)

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.
Framawiki subscribed.
This comment was removed by Framawiki.

Change 511334 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [IMPR] Weblinkchecker: throttle connections to the same host

https://gerrit.wikimedia.org/r/511334

Xqt triaged this task as Low priority.
Xqt changed the subtype of this task from "Task" to "Feature Request".
Aklapper removed Xqt as the assignee of this task. · Jul 2 2021, 5:25 AM
Aklapper added a subscriber: Xqt.

Removing task assignee due to inactivity, as this open task has been assigned for more than two years (see emails sent to assignee on May 26 and Jun 17, and T270544). Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be very welcome!

(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator.)

Change 511334 merged by jenkins-bot:

[pywikibot/core@master] [IMPR] Weblinkchecker: throttle connections to the same host

https://gerrit.wikimedia.org/r/511334