
Add ORES-like threshold support in LW for revscoring models
Open, Needs Triage, Public

Description

While examining the access logs on logstash, I saw quite a lot of threshold-related requests that are not made by the ORES extension (they come from user_agent python-requests).

select Time, level, host, user_agent, uri, return_code, response_size, method, duration from logstash-default-1-7.0.0-1-2023.07.08
where host like '%ores%'
and uri like '%thresholds%'
limit 100;

This should be investigated further and, if required, we should do the following:

    • Check whether the requested thresholds are the default (hardcoded) ones that we use in the ORES extension. If so, we can do something similar to what the extension does and serve them from a file.
    • If the thresholds are different and we want to support this, we will need to work on the revscoring model server so that the model can be queried directly when an extra payload is included in the POST body of the request. We could then handle these requests in ores-legacy by translating them into the corresponding LW requests.
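
The two options above could be combined in one lookup step: serve whatever is in the default file and report the rest as needing a model query. This is a minimal sketch, not the ores-legacy implementation. The file layout, the function names (load_defaults, lookup_thresholds) and the threshold value are hypothetical and only illustrate the idea.

```python
import json

# Hypothetical contents of a defaults file, keyed by wiki and model.
# The expression string and the value 0.5 are invented for illustration;
# they are not real ORES thresholds.
DEFAULT_THRESHOLDS_JSON = """
{
  "enwiki": {
    "damaging": {
      "maximum recall @ precision >= 0.9": {"threshold": 0.5}
    }
  }
}
"""


def load_defaults(raw):
    """Parse the (hypothetical) defaults file into nested dicts."""
    return json.loads(raw)


def lookup_thresholds(defaults, wiki, model, requested):
    """Split requested threshold expressions into two groups.

    Returns (served, missing):
      served  - expressions found in the defaults, with their stored values
      missing - expressions that would need a direct query to the model
    """
    known = defaults.get(wiki, {}).get(model, {})
    served = {expr: known[expr] for expr in requested if expr in known}
    missing = [expr for expr in requested if expr not in known]
    return served, missing
```

If missing comes back empty, the request can be answered from the file alone; otherwise those expressions are the ones that would have to be translated into an LW request.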

Event Timeline

After running this query on logstash:

select Time, level, host, user_agent, uri, return_code, response_size, method, duration from logstash-default-1-7.0.0-1-2023.07.21
where host like '%ores%'
and uri like '%thresholds%'
and user_agent NOT IN ('ChangePropagation/WMF', 'MediaWiki/1.41.0-wmf.17', 'MediaWiki/1.41.0-wmf.18', 'MediaWiki/1.41.0-alpha')
limit 100;

I found that the following bot makes threshold requests:

IABot/2.0 (+https://meta.wikimedia.org/wiki/InternetArchiveBot/FAQ_for_sysadmins) (Checking if link from Wikipedia is broken and needs removal)
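
To see which other clients make threshold requests and how often, the same filter can be applied to exported log rows and grouped by user agent. This is a rough sketch that assumes the rows are available as dicts with user_agent and uri keys. The sample rows and their URIs are invented for illustration and are not taken from the real logs.

```python
from collections import Counter

# User agents excluded in the query above (ORES extension and related traffic).
KNOWN_AGENTS = {
    "ChangePropagation/WMF",
    "MediaWiki/1.41.0-wmf.17",
    "MediaWiki/1.41.0-wmf.18",
    "MediaWiki/1.41.0-alpha",
}

# Invented sample rows; the URIs are placeholders, not real request paths.
SAMPLE_ROWS = [
    {"user_agent": "python-requests", "uri": "/hypothetical/a?thresholds"},
    {"user_agent": "python-requests", "uri": "/hypothetical/b?thresholds"},
    {"user_agent": "IABot/2.0", "uri": "/hypothetical/c?thresholds"},
    {"user_agent": "ChangePropagation/WMF", "uri": "/hypothetical/d?thresholds"},
    {"user_agent": "python-requests", "uri": "/hypothetical/e?scores"},
]


def external_threshold_clients(rows):
    """Count threshold requests per user agent, skipping known agents.

    Returns a list of (user_agent, count) pairs, most frequent first.
    """
    counts = Counter(
        row["user_agent"]
        for row in rows
        if "thresholds" in row["uri"] and row["user_agent"] not in KNOWN_AGENTS
    )
    return counts.most_common()
```

Calling external_threshold_clients(SAMPLE_ROWS) gives a ranked list of the clients that would be affected if threshold support were not carried over.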