Description
This has contributed to the outages we have had in the past couple of weeks (see the parent ticket). The concurrency of these jobs should be reduced to avoid overwhelming the primary database with too many writes.
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | Security | Ladsgroup | T370304 Bursts of occasional severe contention on s4 (commonswiki) primary mariadb causing recurrent user-facing outages on all wikis |
| Duplicate | | cscott | T370624 Reduce concurrency of RecordLintJobs or shard it per section |
Event Timeline
Should this be done in the job queue? Or is there something we can do inside RecordLintJob? Is there an example of other jobs that are sharded by section?
OK, I found https://gerrit.wikimedia.org/g/mediawiki/services/change-propagation/jobqueue-deploy/+/05420ad000caa34a9351de4774d0196a860ca869/scap/vars.yaml#88. I think this is probably a bit past what I feel comfortable doing, so I'll leave it for someone else.
I'll note that T330036#9791309 will also address it by moving the updates into refreshLinks rather than having a separate job.
This seems harder to do given the joins we're already doing in queries. I don't want to make it more difficult for editors to get access to the linter data :/
@Ladsgroup what's the status of this task? Is there something that still needs to be done?
This still needs to be implemented. Lint jobs do a lot of DB writes, and it would be much better if they were sharded per section, but someone needs to decide on a direction (shard per section, merge into refreshLinks, etc.).
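For illustration only, here is a minimal Python sketch of what "sharded per section" could mean: jobs are grouped by the database section of their wiki, and each section gets its own concurrency cap, so a burst of commonswiki jobs can only ever occupy the s4 slots. This is not the change-propagation implementation. The names `SECTION_OF`, `DEFAULT_SECTION`, `partition_by_section` and `next_batch` are hypothetical, and apart from commonswiki being on s4 (stated in the parent ticket), the wiki-to-section mapping is an assumption.

```python
from collections import defaultdict

# Hypothetical wiki-to-section mapping. Only commonswiki -> s4 comes from the
# parent ticket; the other entries are illustrative assumptions.
SECTION_OF = {"commonswiki": "s4", "enwiki": "s1", "dewiki": "s5"}
DEFAULT_SECTION = "s3"  # assumption: unlisted wikis share one default section


def partition_by_section(jobs):
    """Group jobs (dicts with a 'wiki' key) by the DB section of their wiki."""
    shards = defaultdict(list)
    for job in jobs:
        section = SECTION_OF.get(job["wiki"], DEFAULT_SECTION)
        shards[section].append(job)
    return dict(shards)


def next_batch(shards, running, limit_per_section):
    """Pick jobs to start so that no section exceeds its concurrency limit.

    `running` maps a section name to the number of jobs currently executing
    against that section's primary database.
    """
    batch = []
    for section, queue in shards.items():
        free_slots = max(0, limit_per_section - running.get(section, 0))
        batch.extend(queue[:free_slots])
    return batch


if __name__ == "__main__":
    jobs = [{"wiki": "commonswiki", "page": i} for i in range(5)]
    jobs.append({"wiki": "enwiki", "page": 0})
    shards = partition_by_section(jobs)
    # With one s4 job already running and a limit of 2, only one more
    # commonswiki job is started; the enwiki job is unaffected.
    for job in next_batch(shards, running={"s4": 1}, limit_per_section=2):
        print(job)
```

The point of the per-section cap is that write pressure on one primary is bounded independently of how busy the other sections are, which a single global concurrency setting cannot guarantee.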
We decided at the last engineering offsite that lints are going to be put into ParserOutput, and that we'll move the DB maintenance to RefreshLinksJob (RLJ), although not until RLJ is powered by Parsoid (T393716). I'm going to close this as a duplicate of T393717: Put lints in ParserOutput/RefreshLinksJob to reflect this consensus.