
Audit alerting for `wmf/next` branch creation/update failures to ensure that failures are visible to RelEng and other interested parties
Closed, Resolved · Public

Description

Work for T398873 (Move nightly image build from releases-jenkins to deployment.eqiad.wmnet) will separate the container build job from the branch cut job. @bd808 is concerned that this separation may reduce the visibility of branch cut failures by creating a separate success/failure signal that seems more directly connected to eventual deployment. An audit of the current failure alerting will either show that the alerting is sufficiently visible and durable or help us focus on ways to improve that signal.

Event Timeline

bd808 triaged this task as Medium priority.

@dancy and I looked at this during a project meeting this week. The https://releases-jenkins.wikimedia.org/job/MediaWiki%20branch%20and%20publish%20WMF%20single-version%20image/ job records failures on the Jenkins instance. It also posts a status message to the releng@lists.wikimedia.org mailing list. These signals seem like sufficient error reporting for now.

A related question that deserves some attention is who should generally take action when the job fails and what that action should typically entail.

Closing per the T399970#11071083 update. The job failure response question will sort itself out inside the RelEng team.