T157206: ORES Overloaded (particularly 2017-02-05 02:25-02:30) describes overloading of the ORES subsystem by a remote Python bot prior to a broader announcement of ORES Action API support, and T159753: Concerns about ores_classification table size on enwiki describes one of the consequences of that overloading.
This task is to define a simple and reasonable strategy for rate limiting Action API calls for ORES and avoiding unnecessary storage in MariaDB, in order to restore ORES score retrieval to the Action API.
Is it possible to do the following?
- Allow scores to be returned in Action API responses provided the revisions have corresponding records in the recent changes table. Rely on existing limits for number of revisions.
- When revisions exist but are not in the recent changes table, don't allow more than X unavailable revision scores to be fetched at a time. Use API continuation in batches of X revisions for some small X, but don't store the scores upon fetch; instead, leave the decision on whether to store or otherwise cache them to the ORES backend.
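
To make the proposal concrete, here is a minimal Python sketch of the batching logic, not actual extension code. The names `fetch_scores`, `ScoreBatch`, `stored_scores`, `fetch_from_ores`, and `MAX_UNSTORED_PER_REQUEST` are all hypothetical, a plain dict stands in for scores already stored alongside recent changes, and the value chosen for X is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical value for the "small X" in the proposal.
MAX_UNSTORED_PER_REQUEST = 2


@dataclass
class ScoreBatch:
    scores: Dict[int, float]       # rev_id -> score returned in this response
    continue_from: Optional[int]   # offset to resume at, or None when finished


def fetch_scores(
    rev_ids: List[int],
    stored_scores: Dict[int, float],
    fetch_from_ores: Callable[[int], float],
    start: int = 0,
    limit: int = MAX_UNSTORED_PER_REQUEST,
) -> ScoreBatch:
    """Return scores for rev_ids[start:], fetching at most `limit` unstored ones."""
    scores: Dict[int, float] = {}
    fetched = 0
    for index in range(start, len(rev_ids)):
        rev_id = rev_ids[index]
        if rev_id in stored_scores:
            # Already stored (revision is in recent changes): no extra cost.
            scores[rev_id] = stored_scores[rev_id]
            continue
        if fetched >= limit:
            # Budget for unstored revisions is spent: ask the client to continue.
            return ScoreBatch(scores, continue_from=index)
        # Fetch on demand but do NOT write to stored_scores; any caching
        # is left to the ORES backend.
        scores[rev_id] = fetch_from_ores(rev_id)
        fetched += 1
    return ScoreBatch(scores, continue_from=None)
```

A client would call `fetch_scores` repeatedly, passing the returned `continue_from` as `start`, until it comes back as `None`. Revisions that already have stored scores never count against X, so a request made up entirely of recent changes is bounded only by the existing revision limits.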
As a follow-on to this task, we'd like to consider storing additional scored model output in MariaDB (e.g., wp10). This raises the question of whether to normalize the response into MariaDB columns or to simplify assumptions and store the scored model output as a blob per revision. This additional model output would likely be shown or used while a user is reading an individual article, or when filtering a modestly sized result set (imagine subsorting the top X viewed articles or the top X geographically closest articles).
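
The trade-off between the two storage shapes can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for MariaDB, and the table names (`score_normalized`, `score_blob`), column names, and helper functions are hypothetical rather than an existing or proposed schema.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding both candidate layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Option 1: one row per (revision, model, label).
        CREATE TABLE score_normalized (
            rev_id INTEGER, model TEXT, label TEXT, probability REAL,
            PRIMARY KEY (rev_id, model, label)
        );
        -- Option 2: one JSON blob per (revision, model).
        CREATE TABLE score_blob (
            rev_id INTEGER, model TEXT, output TEXT,
            PRIMARY KEY (rev_id, model)
        );
    """)
    return conn


def store_normalized(conn, rev_id, model, probabilities):
    conn.executemany(
        "INSERT INTO score_normalized VALUES (?, ?, ?, ?)",
        [(rev_id, model, label, p) for label, p in probabilities.items()],
    )


def store_blob(conn, rev_id, model, probabilities):
    conn.execute(
        "INSERT INTO score_blob VALUES (?, ?, ?)",
        (rev_id, model, json.dumps(probabilities)),
    )


def rank_normalized(conn, model, label, rev_ids):
    """Subsort a result set in SQL using the normalized layout."""
    marks = ",".join("?" * len(rev_ids))
    rows = conn.execute(
        "SELECT rev_id FROM score_normalized "
        f"WHERE model = ? AND label = ? AND rev_id IN ({marks}) "
        "ORDER BY probability DESC",
        [model, label, *rev_ids],
    )
    return [row[0] for row in rows]


def rank_blob(conn, model, label, rev_ids):
    """Subsort the same result set by decoding blobs in application code."""
    marks = ",".join("?" * len(rev_ids))
    rows = conn.execute(
        f"SELECT rev_id, output FROM score_blob WHERE model = ? AND rev_id IN ({marks})",
        [model, *rev_ids],
    ).fetchall()
    decoded = [(rev_id, json.loads(output)[label]) for rev_id, output in rows]
    decoded.sort(key=lambda pair: pair[1], reverse=True)
    return [rev_id for rev_id, _ in decoded]
```

With normalized columns the database can sort and filter directly. With a blob per revision, sorting happens in application code after decoding, which may be acceptable for the modestly sized result sets described above, but that depends on how large X is allowed to get.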