
Use uap-core browser-family for bot detection
Open, Needs Triage, Public, 5 Estimated Story Points

Description

Currently, uap-core output is used for bot detection only by checking whether the parsed "device family" is "Spider".

In addition, more bots are caught by a global pattern here: https://github.com/wikimedia/analytics-refinery-source/blob/58af9b25c0c1cc8503879ae55450b4ef8838e55a/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L129

But some bots, like "Feedbin", may be detected neither by our "spiderPattern" nor by looking for "Spider" in "device family".

Let's detect more potential bots using other regexes provided by uap-core, e.g. https://github.com/ua-parser/uap-core/blob/master/regexes.yaml#L148
The value parsed by these regexes is already available in user_agent_map['browser_family'].
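
A minimal sketch of the combined check, written in Python for brevity rather than the Java used in refinery-source. It is not the actual implementation: `SPIDER_PATTERN` is a simplified, hypothetical stand-in for the real spiderPattern, `BOT_BROWSER_FAMILIES` is an illustrative list that would need to be built from the uap-core regexes, and the `device_family` key is assumed by analogy with `browser_family`.

```python
import re

# Hypothetical, simplified stand-in for the global spiderPattern
# defined in Webrequest.java (the real pattern is more extensive).
SPIDER_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)

# Hypothetical list of uap-core browser families to treat as bots.
# "Feedbin" is the example from this task; the real list is an open question.
BOT_BROWSER_FAMILIES = frozenset({"Feedbin"})


def is_spider(user_agent, user_agent_map):
    """Return True if the request looks like it comes from a bot."""
    # Existing check 1: uap-core parsed device family.
    if user_agent_map.get("device_family") == "Spider":
        return True
    # Existing check 2: global pattern applied to the raw user agent.
    if SPIDER_PATTERN.search(user_agent or ""):
        return True
    # Proposed check: uap-core parsed browser family.
    return user_agent_map.get("browser_family") in BOT_BROWSER_FAMILIES
```

With these assumptions, a user agent parsed as `{"device_family": "Other", "browser_family": "Feedbin"}` is flagged only by the third check, which is the gap described above.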

Here is a POC that detects some bots using the browser family: https://phabricator.wikimedia.org/P42881

Event Timeline

EChetty set the point value for this task to 5. Jan 16 2023, 4:27 PM