
Fix very slow performance of Gadget-Cat-a-lot.js
Open, Needs Triage, Public

Description

Originally, Cat-a-lot.js submitted all edits in parallel. This caused load spikes, which led to outages (see T370304). The fix for that was to add a 1-second delay between edits, which made the tool very slow.

The basic idea of Cat-a-lot is that a user can select files in the category view and then add them to, remove them from, or move them between categories. In the typical usage profile, users select some files (the category view shows at most 200 files) and then move them. However, in edge cases a user could also make thousands of edits within minutes, averaging 16 successful edits per second.

The proposed fix is to limit concurrent edits to 5, and to 1 if maxlag is higher than 1.5 s. This would allow a reasonably fast user experience when load is low, while preventing large automated edit streaks from choking the system.
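A minimal TypeScript sketch of the proposed limit, assuming each edit is available as a function returning a promise: a pool that re-checks the current lag before starting each edit and allows 5 edits in flight, or 1 when lag exceeds 1.5 s. The names `runEdits`, `concurrencyFor` and the caller-supplied `getLag` are hypothetical; this is not the gadget's actual code, and how the lag value is obtained is deliberately left open.

```typescript
// Hypothetical sketch of the proposed limit, not Cat-a-lot's actual code.
const NORMAL_LIMIT = 5;      // parallel edits under normal load (from the proposal)
const HIGH_LAG_LIMIT = 1;    // parallel edits when lag is high (from the proposal)
const LAG_THRESHOLD_S = 1.5; // lag threshold in seconds (from the proposal)

// Allowed number of in-flight edits for the given lag in seconds.
function concurrencyFor(lagSeconds: number): number {
  return lagSeconds > LAG_THRESHOLD_S ? HIGH_LAG_LIMIT : NORMAL_LIMIT;
}

// Runs all tasks, never starting a new one while the in-flight count is at
// the limit for the current lag. getLag is an assumed caller-supplied hook.
function runEdits<T>(
  tasks: Array<() => Promise<T>>,
  getLag: () => number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;       // index of the next task to start
  let inFlight = 0;   // tasks started but not yet settled
  let done = 0;       // tasks that finished successfully
  let failed = false; // set after the first failure

  return new Promise<T[]>((resolve, reject) => {
    const launch = (): void => {
      while (!failed && next < tasks.length && inFlight < concurrencyFor(getLag())) {
        const i = next++;
        inFlight++;
        tasks[i]().then(
          (value) => {
            results[i] = value;
            inFlight--;
            done++;
            if (done === tasks.length) resolve(results);
            else launch(); // a slot is free: start more if the limit allows
          },
          (err) => {
            failed = true; // stop starting new edits after the first failure
            reject(err);
          },
        );
      }
    };
    if (tasks.length === 0) resolve(results);
    else launch();
  });
}
```

Usage would look like `runEdits(files.map((f) => () => editFile(f)), getLag)`, where `editFile` is likewise hypothetical. Because the limit is re-read on every launch, a lag spike does not cancel edits already in flight; it only holds back new ones until the in-flight count drops below the lower limit.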

Example code

Event Timeline

I suggest starting with a lower number of concurrent edits (let's say two) and then bumping it up slowly while we monitor the databases.

We can do so, but just as a note:

Based on the edits in the revision table, SDC bots continuously edit at a higher rate than two concurrent edits without any issues, and their total edit rate is several times higher than what was observed with Cat-a-lot. The system was also able to handle short bursts of 200 simultaneous edits (i.e., "select all" and then "add/remove cats"), and if that had been a problem, it would have been noticed more often.

Based on the edits, what caused the problems in August was a Cat-a-lot editing spike, five times larger than Cat-a-lot's daily baseline (180k edits in total that day), which overwhelmed Commons. However, SDC bots can make over 500k edits per day, with a combined total as high as 1M per day on Commons, so the number of edits itself was not the issue. The problem was that Cat-a-lot was coded to make requests continuously, as fast as possible, over an extended period of time. Most likely any limit would prevent this, and a higher limit, even above 5, would not cause any issues.
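As a rough check of this argument, averaging the daily totals quoted above over the 86,400 seconds in a day gives the following rates. Daily averages hide bursts, so this only supports the point that total volume was not the problem; it says nothing about peak rates.

```latex
\begin{aligned}
\frac{180\,000 \text{ edits}}{86\,400 \text{ s}} &\approx 2.08 \text{ edits/s} \quad \text{(Cat-a-lot, spike day)}\\
\frac{500\,000 \text{ edits}}{86\,400 \text{ s}} &\approx 5.79 \text{ edits/s} \quad \text{(SDC bots, 500k per day)}\\
\frac{1\,000\,000 \text{ edits}}{86\,400 \text{ s}} &\approx 11.57 \text{ edits/s} \quad \text{(combined total, 1M per day)}
\end{aligned}
```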

The biggest difference between Cat-a-lot and SDC is that Cat-a-lot makes edits that trigger more locks: they lock rows in the category table to update member counts (for both removals and additions), and they deadlock or cause lock contention.
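To make the locking argument concrete, here is a hypothetical model that counts how many edits in a batch write to each category's member-count row. The names `CategoryEdit` and `countRowWrites` are illustrative, not MediaWiki code. In a typical move batch every edit writes to the same two rows, so with N concurrent edits up to N transactions can contend for the same row.

```typescript
// Hypothetical model: each Cat-a-lot edit removes a page from some categories
// and adds it to others; each change updates that category's member-count row.
interface CategoryEdit {
  page: string;
  remove: string[]; // categories the page leaves (count decremented)
  add: string[];    // categories the page joins (count incremented)
}

// Count how many edits in the batch write to each category row.
function countRowWrites(edits: CategoryEdit[]): Map<string, number> {
  const writes = new Map<string, number>();
  for (const edit of edits) {
    for (const cat of [...edit.remove, ...edit.add]) {
      writes.set(cat, (writes.get(cat) ?? 0) + 1);
    }
  }
  return writes;
}

// Example: moving 200 files from category "A" to category "B".
const move: CategoryEdit[] = Array.from({ length: 200 }, (_, i) => ({
  page: `File:Example_${i}.jpg`,
  remove: ["A"],
  add: ["B"],
}));
const rowWrites = countRowWrites(move);
console.log(rowWrites.get("A"), rowWrites.get("B")); // 200 200
```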

Just FYI, the plan is to put this version into use next Monday with two concurrent edits.

Changelog for the current version

  • Update Cat-a-lot to use libAPI for editing, to manage the number of parallel edits.
  • Fix the Special:Search selection user interface
  • Fix the incorrect dialog height bug

Just an update: the new version is live now.

@Ladsgroup, if there are no problems, how should we handle gradually increasing the number of parallel edits? I.e., would it be too fast to just increase it by 1 per week until the number of parallel writes is 5?

Also, as another idea, in the meantime I will try to add some integration with QuickCategories so that Cat-a-lot users can offload large edit batches to it instead of running them inside Cat-a-lot.

Let's wait a bit and see how it goes.

A QuickCategories integration would be amazing. If we can get it out the door, that'd be awesome, especially if you can just dump the work on QuickCategories instead.

Based on edit counts, there is nothing suspicious in Cat-a-lot edits on Wikimedia Commons.

For instance, there were 2000 Cat-a-lot edits between 2024-10-29 23:00 and 24:00 (server database time).

There are no suspicious peaks in the edits-per-minute stats. Total editing speed has been about 1-2 edits per second at most.

There is, however, a peak in category changes on Wikimedia Commons between 2024-10-29 23:00 and 24:00, so that could be a possible cause.

There was a mass revert of bot edits, done by request on the admin noticeboard. It was done with some mass-reverting tool: the system tagged the edits as rollbacks by a non-admin user who had been temporarily added to the noratelimit user group for the task. This caused 35k edits and 60k category changes in an hour, a level similar to the Cat-a-lot edit volume that caused the problems.
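For comparison, the hourly figures in this comment convert to per-second rates as follows (3,600 s per hour):

```latex
\begin{aligned}
\frac{2\,000 \text{ edits}}{3\,600 \text{ s}} &\approx 0.56 \text{ edits/s} \quad \text{(Cat-a-lot, 2024-10-29 23:00 to 24:00)}\\
\frac{35\,000 \text{ edits}}{3\,600 \text{ s}} &\approx 9.72 \text{ edits/s} \quad \text{(mass revert)}\\
\frac{60\,000 \text{ category changes}}{3\,600 \text{ s}} &\approx 16.67 \text{ changes/s} \quad \text{(mass revert)}
\end{aligned}
```

In that hour the mass revert produced 17.5 times as many edits as Cat-a-lot (35,000 / 2,000), and its rate of about 16.7 category changes per second is numerically close to the 16 edits per second given in the description as Cat-a-lot's edge-case rate.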

Thanks for the investigation! 35K edits in an hour is a lot. It didn't cause issues, but it came close. T365303 (Move update of category members count to CategoryMembershipChangeJob) really should happen soon. I'll try to ask for some people's time on this.

@Ladsgroup I think it would be a good idea to increase the number of parallel edits from 2 to 4. It wouldn't make a difference in terms of server load, but doubling the speed would make a significant difference for people who are moving a small number of files (i.e., selections under 200 files).
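A back-of-the-envelope sketch of why doubling parallelism roughly halves the wait for a full category page, under a simplifying assumption that every edit takes the same fixed latency and the client always keeps the pool full. The function name and the 1-second latency are hypothetical, not measurements.

```typescript
// Rough model (an assumption, not a measurement): each edit takes the same
// latency and the client always keeps `concurrency` edits in flight.
function batchSeconds(files: number, concurrency: number, secondsPerEdit: number): number {
  const rounds = Math.ceil(files / concurrency); // "waves" of parallel edits
  return rounds * secondsPerEdit;
}

// Hypothetical 1-second edit latency, 200-file selection:
console.log(batchSeconds(200, 2, 1)); // 100
console.log(batchSeconds(200, 4, 1)); // 50
```

In practice the gain could be smaller if per-edit latency grows under lock contention, which is exactly the concern raised in the reply.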

It can still cause issues, since it would lead to deadlocks on the same rows. That being said, we can do two things:

Just FYI, the proof-of-concept Cat-a-lot + QuickCategories integration is available for testing in T397849.

@Ladsgroup, as T365303 is now done, I think we could increase parallel edits to 4 for selections under 200 edits, and at the same time put T397849 (QuickCategories integration) live for larger batches.
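The proposed split can be sketched as a small routing rule: selections under 200 edits run inside Cat-a-lot with 4 parallel edits, and larger batches are handed to QuickCategories. `planBatch` and `BatchPlan` are hypothetical names, not the gadget's API, and treating exactly 200 as "large" is a guess based on the wording "under 200".

```typescript
// Hypothetical routing rule based on the proposal, not the gadget's API.
type BatchPlan =
  | { mode: "cat-a-lot"; parallelEdits: number }
  | { mode: "quickcategories" };

const SMALL_BATCH_LIMIT = 200;  // "selections under 200 edits"
const SMALL_BATCH_PARALLEL = 4; // proposed parallel edits for small batches

function planBatch(editCount: number): BatchPlan {
  if (editCount < SMALL_BATCH_LIMIT) {
    // Small selection: edit directly, with the higher parallel limit.
    return { mode: "cat-a-lot", parallelEdits: SMALL_BATCH_PARALLEL };
  }
  // Large batch: offload to QuickCategories instead of editing in the browser.
  return { mode: "quickcategories" };
}

console.log(JSON.stringify(planBatch(150)));  // {"mode":"cat-a-lot","parallelEdits":4}
console.log(JSON.stringify(planBatch(5000))); // {"mode":"quickcategories"}
```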

Yeah, but let's wait a bit first (a week) in case we have to revert the job change. There will be lock contention on user_editcount too, but hopefully much less pronounced.