
Proposal: Force any WARNINGs on Beta Cluster to fail completely
Closed, Declined (Public)


Reasoning: This will force developers to fix their log spamming code before it hits production (as much of it as Beta Cluster can catch).

Event Timeline

greg raised the priority of this task to Needs Triage.
greg updated the task description.
greg added subscribers: MaxSem, Aklapper, greg, Jdforrester-WMF.

We could use set_error_handler() to turn PHP notices and warnings into exceptions :-}
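A minimal sketch of the set_error_handler() idea, assuming a plain PHP entry point rather than MediaWiki's own error handling. The function name installStrictErrorHandler() is hypothetical and only for illustration; set_error_handler() and ErrorException are standard PHP.

```php
<?php
// Hypothetical helper: promote PHP notices and warnings to exceptions.
// Not an existing MediaWiki function; a sketch only.
function installStrictErrorHandler() {
    set_error_handler( function ( $severity, $message, $file, $line ) {
        // Skip errors masked by the current error_reporting level
        // (for example, calls silenced with the @ operator).
        if ( !( error_reporting() & $severity ) ) {
            return false; // fall through to PHP's standard handler
        }
        // ErrorException carries the original severity, file and line.
        throw new ErrorException( $message, 0, $severity, $file, $line );
    } );
}

installStrictErrorHandler();
```

With this in place, something like trigger_error( 'spam', E_USER_WARNING ) would throw instead of only writing a log line, so the request fails where the warning is raised.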

I had an idea to introduce a config setting for that into MW proper, so developers can easily see problems regardless of the environment they're developing in.

We should build an error console that integrates into the page so that errors surface instead of getting buried in the logs. Breaking Beta Cluster isn't really the solution; that will just lead to more interruptions in everyone's workflows.

hashar triaged this task as Medium priority. Oct 21 2015, 7:50 PM
hashar moved this task from To Triage to Backlog on the Beta-Cluster-Infrastructure board.
hashar set Security to None.

MediaWiki has an integrated debug toolbar that shows a bunch of logs:

$wgDebugToolbar = true;

But IIRC enabling it prevents the page from being cached.

In order to test anything effectively we would need to bypass the cache, right? You won't see an error, even a fatal, if you're just loading a cached page from Varnish.

The thing I would like to see is a floating red box on the page that is always visible when there are errors in the console. In Phabricator, they change the header color to red and you type ` to bring up an error console. I think we should do something similar to surface errors on testwiki and beta. See the Phabricator "Dark Console" documentation for more details about how Phabricator does it.

We had something similar at deviantART, and we had a social contract that we would not sync anything that significantly increased error counts or page-generation times on staging. We had tech debt weeks where everyone concentrated on killing warnings and notices until eventually there were none.

Beta-Cluster-Infrastructure is now maintained on a best-effort basis. The logging stack does not really work and we do not actively triage error logs there. The intent was to block potential issues ahead of time; nowadays that is done by blocking the train whenever an alarm fires, notably in the early stages (testwiki or group0).