
Enhance scap-pull with swagger and error level checks
Open, Needs Triage, Public

Description

There was a minor deployment incident today, where a large spike of PHP errors made it to prod traffic via canary servers for several minutes (ref T314286).

PHP Warning: in_array() expects parameter 2 to be array, string given

The issue would have been caught during staging on mwdebug if humans had known and remembered to test the "right" URLs. In this case, the change was tested on enwiki only, but failed on all other wikis.

Objective

Catch these kinds of trivial issues that deterministically affect all pageviews (incl. Special:Blankpage), either on the deployment server or on the staging server.

Proposal

  • During scap pull on a mwdebug server and/or during scap backport, run the swagger checks.
  • Run something like logstash_checker after this. Unlike for canary servers, on mwdebug servers we can make it warn with zero tolerance, since mwdebug receives no traffic other than the swagger checks (which produce no known warnings by default) and requests from the person verifying their deployment.
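
The zero-tolerance rule in the second bullet could be sketched roughly as follows. This is only an illustration and not how logstash_checker actually works: the record fields (`host`, `level`, `timestamp`, `message`), the function name `zero_tolerance_failures`, the hostname `mwdebug-example`, and the timestamp are all hypothetical, and the records are assumed to have already been fetched from logstash.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical severity ranking; anything at or above WARNING fails the check.
SEVERITY = {"DEBUG": 0, "INFO": 1, "NOTICE": 2, "WARNING": 3, "ERROR": 4, "CRITICAL": 5}


def zero_tolerance_failures(records, host, since, threshold="WARNING"):
    """Return every record from `host` at or after `since` whose level is at
    or above `threshold`. An empty list means the check passes."""
    floor = SEVERITY[threshold]
    return [
        r for r in records
        if r["host"] == host
        and r["timestamp"] >= since
        # Unknown levels are treated as failures, to stay on the safe side.
        and SEVERITY.get(r["level"].upper(), floor) >= floor
    ]


# Arbitrary example data (not taken from the real incident logs).
deploy_start = datetime(2022, 8, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"host": "mwdebug-example", "level": "WARNING",
     "timestamp": deploy_start + timedelta(seconds=30),
     "message": "PHP Warning: in_array() expects parameter 2 to be array, string given"},
    {"host": "mwdebug-example", "level": "INFO",
     "timestamp": deploy_start + timedelta(seconds=31), "message": "routine notice"},
    {"host": "some-other-host", "level": "ERROR",
     "timestamp": deploy_start + timedelta(seconds=32), "message": "unrelated host"},
    {"host": "mwdebug-example", "level": "ERROR",
     "timestamp": deploy_start - timedelta(minutes=5), "message": "before this deployment"},
]

failures = zero_tolerance_failures(records, "mwdebug-example", deploy_start)
for r in failures:
    print(f"{r['level']}: {r['message']}")
```

Because the threshold is a single match rather than an error-rate comparison, this only makes sense on a host with no background traffic, which is the property of mwdebug that the bullet relies on.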

In addition to giving us the benefit of the swagger checks themselves (which assert HTTP status codes on those requests), these also provide value by giving the mwdebug server some example traffic, which covers our bases more widely. This in turn increases the benefit of checking logstash, as we would then check logstash not only for what was manually tested but also for the results of the swagger requests, including any non-fatal errors and warnings they might have emitted, whether during the response or in post-send/deferred work.
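
As a rough sketch of the status-code half of this idea, the snippet below requests a list of paths and reports any response whose status differs from what was expected. It does not reflect the real swagger checks: the `Check` type, `run_status_checks`, the example paths, and the stubbed `fake_fetch` are hypothetical, and a real run would pass in a function that performs HTTP requests against the mwdebug host.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Check(NamedTuple):
    path: str
    expected_status: int


def run_status_checks(
    checks: Iterable[Check], fetch_status: Callable[[str], int]
) -> List[Tuple[Check, int]]:
    """Request each path via `fetch_status` and return a (check, actual_status)
    pair for every response that does not match the expected status."""
    failures = []
    for check in checks:
        actual = fetch_status(check.path)
        if actual != check.expected_status:
            failures.append((check, actual))
    return failures


def fake_fetch(path: str) -> int:
    """Stub standing in for a real HTTP request: one path succeeds,
    every other path returns a server error."""
    return {"/wiki/Special:Blankpage": 200}.get(path, 500)


checks = [Check("/wiki/Special:Blankpage", 200), Check("/w/api.php", 200)]
status_failures = run_status_checks(checks, fake_fetch)
for check, actual in status_failures:
    print(f"{check.path}: expected {check.expected_status}, got {actual}")
```

Even when every status matches, the requests have still exercised the new code on the server, so a log check run afterwards covers more than the manually tested URLs.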