The problem: the beta cluster keeps switching into read-only mode, even though the upgrade work has finished.
Note the downward spike while the work was being carried out, and how the errors continue long afterwards at a rate of 300 events every 3 hours: https://logstash-beta.wmflabs.org/goto/3939eafc6573af921c4231766d81982f
Implications: the entire mobile site (and various other extensions) is without reliable browser test coverage.
= More background
Over the last few weeks the beta cluster was in read-only mode. I am told this is no longer the case, but we are consistently seeing failures [[ https://integration.wikimedia.org/ci/view/Reading-Web/job/selenium-MinervaNeue/ | in our browser tests ]] stating "The wiki is currently in read-only mode. (readonly) (MediawikiApi::ApiError)". I've spent much of the day trying to find the cause without success, so I'm turning to Phabricator and, hopefully, a wider audience to track down the source of this problem. The strange thing is that the failure is inconsistent: sometimes the same scenarios, making exactly the same API queries, pass!
It's been impossible to replicate this issue locally so far, but it can be traced to the mediawiki_selenium gem (we are using 1.7.3). The HTTP API request that triggers the MediawikiApi::ApiError is a simple edit action, which looks a little like this:
The wiki is currently in read-only mode. (readonly) (MediawikiApi::ApiError)
Given that repeating the same request can succeed without the error, the only explanation I can think of is that the read-only state has been cached somewhere in the beta cluster stack and only applies to the Selenium user that runs the browser tests.
This is high priority from the web side: every day this remains broken, we are without test coverage for a key part of the site, and we are out of our depth here.
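One way to put numbers on this inconsistency is to repeat the same edit request several times as the Selenium user and count how often it hits the read-only error. The sketch below is hypothetical: `probe_readonly` is not part of mediawiki_selenium, and it assumes read-only failures can be recognised by the "(readonly)" marker that appears in the error message quoted above. Any other error is re-raised rather than counted.

```ruby
# Hypothetical diagnostic helper (not part of mediawiki_selenium).
# Assumption: read-only failures carry the "(readonly)" marker seen in
# the MediawikiApi::ApiError message quoted in this task.
READONLY_MARKER = "(readonly)".freeze

# Runs the given block `attempts` times and tallies how many runs
# succeeded and how many failed with a read-only error.
def probe_readonly(attempts)
  tally = { ok: 0, readonly: 0 }
  attempts.times do
    begin
      yield
      tally[:ok] += 1
    rescue StandardError => e
      # Only read-only failures are counted; anything else is re-raised.
      raise unless e.message.include?(READONLY_MARKER)
      tally[:readonly] += 1
    end
  end
  tally
end
```

Against the beta cluster, the block would wrap the same edit call the browser tests make (for example an edit through a logged-in `MediawikiApi::Client`; the exact call is an assumption here). A mix of successes and read-only failures for the same user and request would fit the cached-state theory better than a wiki that is genuinely read-only.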