Follow-up from incident T241251.
Proposal: An Icinga alert of some sort that fires if any server in a production app server cluster is serving a MediaWiki version other than one of the current version(s), as defined by the deployment server or a mwmaint server (e.g. noc.wm.o).
This is meant to catch a wide range of possible failure scenarios such as:
- a server missing from dsh.
- scap syncs failing consistently for a prolonged period of time, to the point that a server is more than a week behind.
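The comparison at the heart of such a check could look roughly like the sketch below. It is only an illustration, not an existing plugin: `check_versions`, the host names and the version strings are hypothetical, and how the expected versions and the per-server versions get collected is left open. To catch a server missing from dsh, the server list would presumably have to come from a source other than dsh. The exit codes follow the usual Nagios/Icinga plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
from typing import Dict, Set, Tuple

# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_versions(expected: Set[str], served: Dict[str, str]) -> Tuple[int, str]:
    """Compare the version each server reports against the expected set.

    `expected` holds the current version(s); `served` maps a host name to
    the version that host is serving. Both are assumed to be gathered
    elsewhere (hypothetical inputs for this sketch).
    """
    if not expected:
        # Without a reference there is nothing to compare against.
        return UNKNOWN, "UNKNOWN: no expected versions available"

    # Hosts whose version is not one of the current versions.
    stale = {host: ver for host, ver in sorted(served.items()) if ver not in expected}
    if stale:
        detail = ", ".join(f"{host}={ver}" for host, ver in stale.items())
        return CRITICAL, f"CRITICAL: {len(stale)} server(s) on unexpected version: {detail}"

    return OK, f"OK: {len(served)} server(s) on {', '.join(sorted(expected))}"


if __name__ == "__main__":
    # Made-up example data: mw0002 is several branches behind.
    code, message = check_versions(
        {"1.35.0-wmf.10", "1.35.0-wmf.11"},
        {"mw0001": "1.35.0-wmf.11", "mw0002": "1.35.0-wmf.8"},
    )
    print(message)
    raise SystemExit(code)
```

Keeping the comparison a pure function makes it easy to test separately from whatever ends up fetching the versions.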
Why: It is difficult to reason about the integrity and security of production if an app server could be significantly behind. In particular, if a server is able to talk to one or more shared services like Memcached, session store, job queue, external store, Swift, or Graphite, then a server out there with an outdated copy of MediaWiki could behave in ways developers do not account for.
This is because, for internally breaking changes, we generally keep compatibility only until the branch is fully deployed and the next one starts. After that, we assume production to be at that state and never more than 2 versions behind. Violating this assumption could cause corruption or other damage.