[EPIC] Enforce "no increase in log errors" during deployments
Closed, Resolved (Public)

Description

Essentially:

  • If we deploy and the log errors increase, revert immediately.
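The rule above could be sketched roughly as follows. This is only an illustration: `ErrorWindow`, `should_revert`, and the `tolerance` parameter are hypothetical names, and it assumes error counts are sampled over a fixed window before and after the deploy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorWindow:
    """Error count observed over a fixed time window (hypothetical structure)."""
    errors: int
    seconds: float

    def __post_init__(self) -> None:
        # Guard against a zero-length window, which would make the rate undefined.
        if self.seconds <= 0:
            raise ValueError("window length must be positive")

    @property
    def rate(self) -> float:
        """Errors per second over the window."""
        return self.errors / self.seconds


def should_revert(before: ErrorWindow, after: ErrorWindow,
                  tolerance: float = 0.0) -> bool:
    """Return True when the post-deploy error rate exceeds the pre-deploy rate.

    `tolerance` is an assumed relative slack; 0.0 means any increase
    triggers a revert.
    """
    return after.rate > before.rate * (1.0 + tolerance)


if __name__ == "__main__":
    before = ErrorWindow(errors=10, seconds=600)
    after = ErrorWindow(errors=25, seconds=600)
    print("revert" if should_revert(before, after) else "keep")
```

Comparing rates rather than raw counts keeps the check meaningful when the two windows differ in length.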

Event Timeline

greg raised the priority of this task to Needs Triage.
greg updated the task description.
greg added a project: Release-Engineering-Team.
greg subscribed.
greg triaged this task as Medium priority. Dec 3 2015, 9:07 PM

@greg: The problem is the new branch cut on Tuesday. Since it incorporates a lot of new code, it's difficult to keep new errors from sneaking in.

That's why we need to shoot these errors while they're still on beta.

The errors need to be a lot more visible, honestly. If mediawiki-vagrant and the beta cluster surfaced the errors in a way that's not easily ignored, they would be a lot more likely to be fixed before they hold up a deployment.

I'd like to see something like http://phpdebugbar.com/ enabled by default on vagrant and the beta cluster. Perhaps it could even be offered as a per-user preference on production.

hashar subscribed.

The task was filed back in 2015, when release engineering had plans to improve the overall quality of deployments. After several years of effort we have collectively improved our logging system (Monolog, ELK), we have logging dashboards that we track closely, and we have a process to triage all those errors (e.g. Wikimedia-production-error).

We now enforce Zero* log by blocking the train whenever there are new log errors, so I am claiming this goal as a success.
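For illustration only, blocking on new log errors could be approximated by comparing error signatures seen before and after a deploy. This is not the actual train tooling: `new_error_signatures`, `train_blocked`, and the idea of a normalized "signature" string are assumptions made for the sketch.

```python
from typing import Iterable, List


def new_error_signatures(baseline: Iterable[str],
                         candidate: Iterable[str]) -> List[str]:
    """Return signatures present in `candidate` but absent from `baseline`.

    A "signature" here is assumed to be a normalized error message
    (for example, with request IDs and timestamps stripped out).
    """
    return sorted(set(candidate) - set(baseline))


def train_blocked(baseline: Iterable[str], candidate: Iterable[str]) -> bool:
    """Block when at least one previously unseen error signature appears."""
    return bool(new_error_signatures(baseline, candidate))


if __name__ == "__main__":
    # Made-up example signatures, not real production errors.
    old = ["DBQueryError: lock wait timeout", "Notice: undefined index"]
    new = ["Notice: undefined index", "TypeError: argument 1 must be string"]
    for signature in new_error_signatures(old, new):
        print("new error:", signature)
    print("blocked" if train_blocked(old, new) else "clear")
```

Comparing sets of signatures instead of volumes means a single new kind of error blocks the train even if the overall error count goes down.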