Expand poolcounter heuristics to better capture automated requests
Closed, Resolved, Public

Description

We're seeing some odd patterns in our search traffic that consume significant resources. We would like to move these requests into the CirrusSearch-ExpensiveFullText or the CirrusSearch-Automated buckets.

Decide on additional heuristics and implement them.
Here are the daily counts of web requests that carry no cookies and request an offset, for September 2025:

day	count
___	_________
1 	313236
2 	101544
3 	105899
4 	168357
5 	164263
6 	157980
7 	225061
8 	336701
9 	457321
10 	1038800
11 	1700177
12 	2406403
13 	1774566
14 	1553928
15 	2291088
16 	2197842
17 	2351362
18 	3208001
19 	3264179
20 	3433068
21 	3257574
22 	2818838
23 	4163360
24 	3410490 <-- only a partial day so far

Event Timeline

If a request is a web request and contains no cookies and contains an offset -> Automated
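
The rule above could be sketched roughly as follows. This is only an illustration in Python, not the CirrusSearch implementation: `SearchRequest`, `looks_automated`, `choose_pool`, and the `"default-pool"` name are hypothetical. It also assumes "contains an offset" means the offset parameter is present at all, which the task does not spell out.

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Bucket name taken from the task description.
AUTOMATED_POOL = "CirrusSearch-Automated"


@dataclass(frozen=True)
class SearchRequest:
    """Hypothetical, simplified view of an incoming search request."""
    is_web: bool                                   # True for web UI requests
    cookies: Mapping[str, str] = field(default_factory=dict)
    offset: Optional[int] = None                   # None when no offset was sent


def looks_automated(req: SearchRequest) -> bool:
    """Web request, no cookies, and an offset present (assumed reading of the rule)."""
    return req.is_web and not req.cookies and req.offset is not None


def choose_pool(req: SearchRequest, default_pool: str) -> str:
    """Route matching requests to the Automated bucket, otherwise keep the default."""
    return AUTOMATED_POOL if looks_automated(req) else default_pool


if __name__ == "__main__":
    paged_no_cookies = SearchRequest(is_web=True, offset=20)
    paged_with_cookies = SearchRequest(is_web=True, cookies={"session": "x"}, offset=20)
    print(choose_pool(paged_no_cookies, "default-pool"))
    print(choose_pool(paged_with_cookies, "default-pool"))
```

Whether the index being queried (for example commonswiki_file) should also be part of the condition is left out here, since that is still an open question in this task.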

Open question: Do requests also have to query commonswiki_file to end up in the Automated bucket?

Should all commonswiki_file requests with offset go to ExpensiveFullText?

Change #1191157 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Expand poolcounter heuristics for automated requests

https://gerrit.wikimedia.org/r/1191157

Change #1191157 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Expand poolcounter heuristics for automated requests

https://gerrit.wikimedia.org/r/1191157