At the moment we run a scheduled maintenance script to send image-suggestions notifications to experienced users.
We need to change this approach before we start sending section-image-suggestions notifications:
From @Ladsgroup:
The architecture of "let's update data from services by introducing regular cron maint scripts" is okay for small cases or a small number of wikis, but it has been creeping into many places, including GrowthExperiments, and is unsustainable in many ways:
- It's not distributed: all of our mw crons run on mwmaint1002, which is basically a single point of failure. Any noisy neighbor can cause wide-scale disruption.
- It's quite wasteful: the updates usually happen by scanning the whole wiki, or something like that. It needs a more robust event-driven architecture: backfill the data once, then trigger a job to update a page whenever that page changes.
- Timing is problematic. We don't yet have a central catalog of mw crons and their schedules. They put different levels of pressure on our systems, and if this way of doing things continues, before long we will have outages caused by concurrent maintenance scripts bringing down a database or something similar. The distribution of such jobs must be automatic, not done by guessing, picking "low-load" times, and crossing our fingers.
- There are no criticality levels for mw maintenance scripts. High-priority scripts are run in the same place as low-priority ones, so it is quite possible for a low-priority script to cause issues for high-priority ones (manual or automatic), e.g. the scripts that clean up old private data so we can comply with data retention policies.
- In short, this takes a system that is already fragile and makes it even more fragile.
Generally I'm okay with having crons that clean up data, but regular updates from services seem wrong: services should build pipelines that update the database (mostly through MediaWiki jobs), and can then have monthly "update everything" crons on top.
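The proposed pattern ("backfill once, then keep data fresh with per-page jobs triggered by change events, instead of rescanning everything on a cron") can be sketched as follows. This is a minimal illustration, not MediaWiki code: `JobQueue`, `update_suggestions`, and the `service` callable are all hypothetical stand-ins for the real job runner and suggestions service.

```python
from collections import deque

class JobQueue:
    """Minimal stand-in for a job runner such as MediaWiki's job queue."""
    def __init__(self):
        self._jobs = deque()

    def push(self, job):
        self._jobs.append(job)

    def run_all(self):
        # A real runner would execute jobs asynchronously on job servers.
        while self._jobs:
            self._jobs.popleft()()

def update_suggestions(store, page_id, service):
    """Refresh the stored suggestions for a single page."""
    store[page_id] = service(page_id)

def backfill(store, all_page_ids, service):
    """One-time pass over every page; after this, only events drive updates."""
    for page_id in all_page_ids:
        update_suggestions(store, page_id, service)

def on_page_changed(queue, store, page_id, service):
    """Event hook: a page changed, so enqueue a job for just that page,
    rather than waiting for a cron to rescan the whole wiki."""
    queue.push(lambda: update_suggestions(store, page_id, service))

# Usage sketch (service here is a dummy that fabricates a suggestion string)
service = lambda page_id: f"suggestions-for-{page_id}"
store = {}
queue = JobQueue()

backfill(store, [1, 2, 3], service)        # initial load, done once
on_page_changed(queue, store, 2, service)  # later: only page 2 changed
queue.run_all()                            # runner updates just that page
```

The point of the sketch is the shape of the work: the expensive full scan happens exactly once, and steady-state cost is proportional to the number of changes, not the size of the wiki.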