Page MenuHomePhabricator

Add per user concurrent search request limiting
Closed, ResolvedPublic

Description

Per user concurrent search request limiting helps us with performance. There are a small handful of users who are hammering us with search queries, and with a really high limit this will help with performance.

Event Timeline

Manybubbles raised the priority of this task from to Needs Triage.
Manybubbles updated the task description. (Show Details)
Manybubbles moved this task to In Dev/Progress on the MediaWiki-Core-Team board.
Manybubbles changed Security from none to None.
Manybubbles added a subscriber: Manybubbles.

No longer blocked. Can be worked on if we have time. All we have to do is configure the config in Cirrus to do it. I suggest doing this only on group0 for a few days before rolling it to more. OTOH I don't really have time to work on it now.

Stalled for lack of time?

Any objection to moving this to the core backlog (from needs review), to be consistent with its state in the cirrus project?

Change 209802 had a related patch set uploaded (by EBernhardson):
Enable CirrusSearch-PerUser PoolCounter on group0 only

https://gerrit.wikimedia.org/r/209802

Change 209802 merged by jenkins-bot:
Enable CirrusSearch-PerUser PoolCounter on group0 only

https://gerrit.wikimedia.org/r/209802

Deployed, but after a quick test using nodejs to spit out 20 concurrent search requests to test.wikipedia.org, i'm not able to trigger an api request that returns the "cirrussearch-backend-error" that should be output. Not sure why yet. It could simply be that i'm not getting 5 requests inside the critical section at the same time, or it might not be working.

Verified this can trigger rate limiting, via a small nodejs test script. Seems safe to promote to all wikis.

var i,
    request = require('request'),
    handler = function(error, response, data) {
        if (error) {
                console.log(error);
                return;
        }
        var decoded = JSON.parse(data);
        if (decoded['error'] && decoded['error']['info'] === 'Search is currently too busy.  Please try again later.' ) {
                console.log('rate limited');
        }
    },
    options = {
        url: "http://test.wikipedia.org/w/api.php?action=query&format=json&list=search&srsearch=foo",
        pool: { maxSockets: Infinity },
    };

for (i=0; i<20; i++) {
        request(options, handler);
}

Change 210622 had a related patch set uploaded (by EBernhardson):
Enable the CirrusSearch per-user pool counter everywhere

https://gerrit.wikimedia.org/r/210622

Change 210622 merged by jenkins-bot:
Enable the CirrusSearch per-user pool counter everywhere

https://gerrit.wikimedia.org/r/210622

Change 214039 had a related patch set uploaded (by EBernhardson):
New option to log but not fail per-user pool counter failures

https://gerrit.wikimedia.org/r/214039

Change 214039 merged by jenkins-bot:
New option to log but not fail per-user pool counter failures

https://gerrit.wikimedia.org/r/214039

Stakeholders: Cirrus operators (and maybe more)
Benefits: Makes it more difficult for people that would abuse the system to do so. Ultimately part of some security recommendations.
Estimate: Its complicated. Code is simple - took a day. But turning it on safely is more complex. Its taking lots of effort.

Deskana added a subscriber: Deskana.

Backlogging this because I don't think it's important. Scream at me if you disagree and we can talk. :-)

Could go either way, this doesn't need much work anymore all the relevant code has been merged and is deploying today. For future reference what remains to be done: Turn on $wgCirrusSearchBypassPerUserFailure and restore the config changes in https://gerrit.wikimedia.org/r/#/c/210622 to prod. Last time we turned that patch on there was a wave of rejected queries. These will now report but not block the query. Monitor fluorine.eqiad.wmnet:/a/mw-log/poolcounter.log.

The hard part is trying to determine what to do with this data, if some ip's need to be whitelisted (proxys, such as qatar) or if some ip's are being correctly blocked.

I feel like this is one of those things that seems simple on the surface but stands a high chance of blowing up into a much, much larger time sink in order to get it exactly right for the reasons you mention (e.g. countries like Qatar going through two IPs, as you say).. Unless there's some significant problem that not doing this is causing, putting it in the freezer seems prudent for now to me.

Change 232650 had a related patch set uploaded (by EBernhardson):
Allow CIDR ranges to be opted into per-user poolcounter

https://gerrit.wikimedia.org/r/232650

Change 232650 merged by jenkins-bot:
Allow CIDR ranges to be opted into per-user poolcounter

https://gerrit.wikimedia.org/r/232650

Change 235274 had a related patch set uploaded (by EBernhardson):
Enable CirrusSearch per-user rate limiting

https://gerrit.wikimedia.org/r/235274

Change 235274 merged by jenkins-bot:
Enable CirrusSearch per-user rate limiting

https://gerrit.wikimedia.org/r/235274