I discussed this with @hashar as a potential way of discovering maintenance scripts that interact with each other, and of getting a better picture of what is going on on MediaWiki and its database servers. The spark for this proposal was T136150, which was started with little awareness and created some light issues due to its long-running nature. In particular, I would not be confident running a schema change on enwiki while that particular maintenance is running. While schema changes cannot interact with deployments (unless, obviously, one of them breaks something), they interact a lot with the increased activity caused by long-running scripts, and they make things like DC and master DB failovers really difficult.
So the proposal is to have some way of announcing long-running ongoing changes (especially those related to the database), such as:
- Schema changes: they do not block deployments, but can interact with long-running batch jobs. They can take weeks to be applied, such as T139090, so they cannot really be reduced to a single deployment slot (and it wouldn't be fair to block regular deployments). No schema change I have made has gone wrong badly enough to create large problems, but there can always be a first time.
- Long-running maintenance jobs: e.g. the update collation job in T136150. I am not talking about "I deploy and then run this script that takes 10 minutes", "i18n updates", or anything that takes less than a deployment window. I am talking about those that happen in the background and can take hours or days to run.
- High-impact operations tasks such as DC failovers, network maintenance and rolling upgrades of application servers.
@hashar mentioned the possibility of adding a section at the top of Deployments where developers and I can list those ongoing tasks. I would like to hear the opinion of @greg on that. I would also appreciate help communicating this and trying to enforce it (even if the only thing we can do is update the written policies and send an email asking for this). Releng would not maintain the section: each individual would add their own entries, and I would keep an eye on it and try to keep it up to date.
[https://wikitech.wikimedia.org/wiki/Deployments/Inclusion_criteria | The inclusion criteria] say that "Database schema changes" should be included on the Deployment calendar, but that doesn't take into account that most schema changes are no longer high-impact (requiring read-only); on the other hand, they can take weeks to be deployed. I want to fix that.