Work for T398873: Move nightly image build from releases-jenkins to deployment.eqiad.wmnet will result in the container build job being separated from the branch cut job. @bd808 has some concern that this detachment may decrease visibility of branch cut failures by creating a separate success/failure signal that seems more directly connected to eventual deployment. An audit of the current failure alerting will either show that the alerting is sufficiently visible and durable or help us focus on ways to improve that signal.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Lferreira | T215217 deployment-prep (beta cluster): Code stewardship request |
| Open | Goal | None | T369112 Pretrain (née Group -1) QTE validation environment |
| Resolved | Goal | dancy | T398868 [FY25-26 WE6.1.1] Move image build to deployment server and update for backports |
| Resolved | | bd808 | T399970 Audit alerting for `wmf/next` branch creation/update failures to ensure that failures are visible to RelEng and other interested parties |
Event Timeline
@dancy and I looked at this during a project meeting this week. The https://releases-jenkins.wikimedia.org/job/MediaWiki%20branch%20and%20publish%20WMF%20single-version%20image/ job records failures on the Jenkins instance. It also posts a status message on the releng@lists.wikimedia.org mailing list. These signals seem like enough error reporting at this point.
A related question that could use a bit of attention is determining who generally should be taking action when the job fails and what that should typically entail.
Closing per the T399970#11071083 update. The job failure response question will sort itself out inside the RelEng team.