Investigate raising max edit count above 500,000
Open, Needs Triage | Public | Feature

Description

Feature summary (what you would like to be able to do and where):

I have made too many edits for XTools to show anymore - when this happened previously somebody at WMF Labs tweaked something. Can the same happen again please? @GiantSnowman

Event Timeline

I sometimes wonder whether a graceful fallback might be possible for high-volume editors. Maybe everything in the last year, rather than everything? Or the last 100,000 edits, rather than everything?

That's basically what T182182 (Analyze most recent 400,000 edits instead of rejecting query in user-related tools) is asking for. The issue is the ordering of revisions, which effectively (or at least last I checked) means we have the same query plan no matter how many edits we process. Going by a date range is much more feasible and likely to speed things up in most cases. That feature is tracked at T202552 (only the Edit Counter is left). I can try to prioritize that, but what if there are a million edits in the past year? The query is then still very likely to time out.

What I would like to do is have two thresholds. One is the actual limit that can never be exceeded, and that's something really high like 800K edits. The other is a lower provisional limit that only requires you to log in to proceed. The reason for this is that web crawlers and the like will pound away at XTools, and the only real way to stop them is with a login wall. Generally speaking, for analytical tools it's totally fine for a query to take a long time to run, so long as the person truly wants and cares about that data and is not just needlessly consuming resources (as is the case with bots).
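
To make the two-threshold idea concrete, here is a minimal sketch in Python (XTools itself is not written in Python; this is illustration only). The 800,000 hard limit is the figure floated above; the provisional value, the `Access` enum and the `check_edit_count` function are hypothetical names and numbers chosen for the example, not anything that exists in XTools.

```python
from enum import Enum

# Assumption: 800,000 is the "really high" hard limit mentioned above.
HARD_LIMIT = 800_000
# Assumption: placeholder value; the thread does not state the provisional limit.
PROVISIONAL_LIMIT = 400_000


class Access(Enum):
    ALLOW = "allow"                    # run the queries
    LOGIN_REQUIRED = "login_required"  # show the login wall first
    REJECT = "reject"                  # never run, even when logged in


def check_edit_count(edit_count: int, logged_in: bool) -> Access:
    """Decide how to handle a request for a user with `edit_count` edits."""
    if edit_count > HARD_LIMIT:
        return Access.REJECT
    if edit_count > PROVISIONAL_LIMIT and not logged_in:
        return Access.LOGIN_REQUIRED
    return Access.ALLOW
```

With these placeholder values, an anonymous request for an account with 500,000 edits would hit the login wall, the same request from a logged-in user would run, and an account with 900,000 edits would be rejected either way.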

I will add, however, that the user in question, GiantSnowman, is not actually subject to the current limit, which is 600,000 edits. When I tried to run them in the Edit Counter, most of the queries were automatically killed. I can increase the query timeout as well, but that's a risky thing to do with all the web crawler traffic we receive. Implementing the provisional limit as described above will likely alleviate that, so I'll start with that and go from there.

Old edits do not change often. Perhaps there could be an option to cache stats based on the older edits and have more recent ones added to the cached ones dynamically? (And potentially an option to invalidate the cache every now and then, perhaps just manually but with a throttle.)
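
A rough sketch of that caching idea, in Python for illustration only. Everything named here is hypothetical: `count_edits(user, after, upto)` stands in for whatever expensive database query counts a user's edits with `after < timestamp <= upto`, and `EditStatsCache` is not an existing XTools class. Only a plain edit count is cached; real statistics would need each cached figure to be something that can be added up across two time ranges.

```python
import time
from dataclasses import dataclass


@dataclass
class Snapshot:
    count: int   # edits counted up to and including `as_of`
    as_of: int   # timestamp marking the boundary of the cached part


class EditStatsCache:
    def __init__(self, count_edits, min_interval=3600.0, clock=time.monotonic):
        self._count_edits = count_edits    # hypothetical expensive query
        self._min_interval = min_interval  # throttle window, in seconds
        self._clock = clock
        self._snapshots = {}
        self._last_invalidated = {}

    def total(self, user, now):
        """Return the edit count up to `now`, reusing the cached older part."""
        snap = self._snapshots.get(user)
        if snap is None:
            # Cold cache: one full (slow) count, then remember it.
            snap = Snapshot(self._count_edits(user, None, now), now)
            self._snapshots[user] = snap
            return snap.count
        # Warm cache: only count edits made after the snapshot boundary.
        return snap.count + self._count_edits(user, snap.as_of, now)

    def invalidate(self, user):
        """Manually drop the snapshot; refused if done too recently."""
        t = self._clock()
        last = self._last_invalidated.get(user)
        if last is not None and t - last < self._min_interval:
            return False
        self._snapshots.pop(user, None)
        self._last_invalidated[user] = t
        return True
```

The manual `invalidate` is there because old edits only change rarely, not never (revisions can be deleted, for example), and the throttle keeps that button from becoming another way to trigger the slow full query repeatedly.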