The Stats API does not filter on ptrp_deleted in the query it runs in getUnreviewedArticleStat(): https://quarry.wmflabs.org/query/30020. The list API does: https://quarry.wmflabs.org/query/30019.
This leads to scenarios like this:
I've selected "Unreviewed articles" and am sorting by oldest, and the oldest article in the list appears to be from May 2018. But the stats say the oldest article is from Jan 28, 2006 (4,626 days ago).
A possible fix would be to include ptrp_deleted = 0 in getUnreviewedArticleStat():
$conds = [ 'ptrp_reviewed' => 0, 'page_id = ptrp_page_id', 'ptrp_deleted' => 0, 'page_is_redirect' => 0, 'page_namespace' => $namespace ];
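To illustrate why the extra condition matters, here is a minimal, self-contained sketch that applies both sets of conditions to a few in-memory rows. The sample rows, the `oldestUnreviewed()` helper, and the use of `ptrp_created` as the "oldest" timestamp are assumptions made for illustration; this is not the actual PageTriage query code, and the join condition is omitted because the rows are already joined.

```php
<?php
// Hypothetical rows standing in for the page / pagetriage_page join.
$rows = [
    [ 'page_id' => 1, 'ptrp_reviewed' => 0, 'ptrp_deleted' => 1, 'page_is_redirect' => 0, 'page_namespace' => 0, 'ptrp_created' => '20060128000000' ],
    [ 'page_id' => 2, 'ptrp_reviewed' => 0, 'ptrp_deleted' => 0, 'page_is_redirect' => 0, 'page_namespace' => 0, 'ptrp_created' => '20180503000000' ],
    [ 'page_id' => 3, 'ptrp_reviewed' => 1, 'ptrp_deleted' => 0, 'page_is_redirect' => 0, 'page_namespace' => 0, 'ptrp_created' => '20170101000000' ],
];

// Hypothetical helper: return the oldest ptrp_created among rows whose
// fields equal every value in $conds, or null if no row matches.
function oldestUnreviewed( array $rows, array $conds ): ?string {
    $matching = array_filter( $rows, function ( array $row ) use ( $conds ): bool {
        foreach ( $conds as $field => $value ) {
            if ( $row[$field] !== $value ) {
                return false;
            }
        }
        return true;
    } );
    if ( $matching === [] ) {
        return null;
    }
    return min( array_column( $matching, 'ptrp_created' ) );
}

// Conditions without the ptrp_deleted filter (the behaviour described above).
$statsConds = [ 'ptrp_reviewed' => 0, 'page_is_redirect' => 0, 'page_namespace' => 0 ];
// Conditions with the proposed ptrp_deleted = 0 filter added.
$fixedConds = $statsConds + [ 'ptrp_deleted' => 0 ];

$before = oldestUnreviewed( $rows, $statsConds );
$after = oldestUnreviewed( $rows, $fixedConds );

echo "without ptrp_deleted filter: $before\n";
echo "with ptrp_deleted filter:    $after\n";
```

With these sample rows, the unfiltered conditions pick up the 2006 row, while the filtered conditions agree with what the list would show.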
More generally, there is a disconnect between the code paths for the stats API and the list API:
- Unreviewed stats come from `PageTriageUtil::getUnreviewedArticleStat()`, which has its own query.
- Reviewed stats are generated by `getReviewedArticleStat()`, also with its own query.
- But filtered stats come from `getArticleFilterStat()`, which then calls the list API with additional params.
I wonder if we could simplify things with a single call to the PageTriage list API whose response also includes the relevant stats. The filtered stats, which are more expensive to compute, would already be part of the list API response. The unreviewed/reviewed stats would tack on two additional queries to each list API request, but we'd halve the number of GET requests, and the unreviewed/reviewed info is stored in Memcache as well. We'd be able to remove a bunch of code and simplify the logic for both list and stats.
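A rough sketch of what such a combined response could look like, using in-memory rows and a plain array as a stand-in for the cache. Everything here is hypothetical: `listWithStats()`, `cachedStat()`, `countAndOldest()`, the response keys, and the sample rows are invented for illustration and do not reflect the existing PageTriage API. Namespace and redirect handling are left out to keep it short.

```php
<?php
// Hypothetical cache wrapper: compute a stat once, then reuse it.
function cachedStat( array &$cache, string $key, callable $compute ): array {
    if ( !array_key_exists( $key, $cache ) ) {
        $cache[$key] = $compute();
    }
    return $cache[$key];
}

// Count the rows and find the oldest ptrp_created (null when empty).
function countAndOldest( array $rows ): array {
    $created = array_column( $rows, 'ptrp_created' );
    return [
        'count' => count( $rows ),
        'oldest' => $created === [] ? null : min( $created ),
    ];
}

// Hypothetical single entry point: the list and all stats are derived
// from the same base set, so the ptrp_deleted filter is applied once.
function listWithStats( array $rows, callable $filter, array &$cache ): array {
    $live = array_values( array_filter( $rows, fn ( array $r ): bool => $r['ptrp_deleted'] === 0 ) );
    $pages = array_values( array_filter( $live, $filter ) );
    $byReviewed = fn ( int $flag ): array => countAndOldest(
        array_values( array_filter( $live, fn ( array $r ): bool => $r['ptrp_reviewed'] === $flag ) )
    );
    return [
        'pages' => $pages,
        'stats' => [
            // The filtered stat falls out of the list result itself.
            'filtered' => count( $pages ),
            // The two extra lookups, served from the cache when present.
            'unreviewed' => cachedStat( $cache, 'unreviewed', fn (): array => $byReviewed( 0 ) ),
            'reviewed' => cachedStat( $cache, 'reviewed', fn (): array => $byReviewed( 1 ) ),
        ],
    ];
}

// Usage with made-up rows.
$sampleRows = [
    [ 'page_id' => 1, 'ptrp_reviewed' => 0, 'ptrp_deleted' => 1, 'ptrp_created' => '20060128000000' ],
    [ 'page_id' => 2, 'ptrp_reviewed' => 0, 'ptrp_deleted' => 0, 'ptrp_created' => '20180503000000' ],
    [ 'page_id' => 3, 'ptrp_reviewed' => 1, 'ptrp_deleted' => 0, 'ptrp_created' => '20170101000000' ],
];
$statCache = [];
$response = listWithStats( $sampleRows, fn ( array $r ): bool => $r['ptrp_reviewed'] === 0, $statCache );
echo json_encode( $response['stats'] ), "\n";
```

The point of the sketch is only the shape: one request, one shared base condition, and stats that cannot drift from the list because they are computed from the same filtered set.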