In the context of T387509, I noticed that a routine test of read-only mode (via $wgReadOnly or $wgLBFactoryConf) resulted in DBReadOnlyError exceptions for writes to the MainStash.
Status quo
Read-only mode in MediaWiki is specific to, and propagated by, the LBFactory service class. That class is in charge of connections and queries to "MediaWiki databases".
As such, MediaWiki read-only mode should not (and does not) affect local services like Memcached, Cassandra (SessionStore), Swift, or Kafka (EventBus).
The same applies to ParserCache at WMF. In wmf-config, we configure ParserCache as an instance of SqlBagOStuff that is directly given a list of hostnames. It does not use the LBFactory service class to establish database connections.
The MainStash at WMF (wmf-config source) is also an instance of SqlBagOStuff. However, it uses the cluster and dbDomain options to fetch a list of x2 hostnames from the LBFactory service. I suspect this helps certain DBA tasks, because it means the full range of dbctl features for MediaWiki databases is also offered to the MainStash/x2 cluster (unlike for ParserCache).
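To make the difference between the two wiring styles concrete, here is a toy model in Python. It is not MediaWiki code: `ToyLBFactory`, `ToySqlStore`, the host names, and the read-only reason are all hypothetical, and the real SqlBagOStuff and LBFactory classes are more involved. The sketch only assumes what is described above: a store given hostnames directly never sees the read-only flag, while a store that gets its hosts from the LBFactory does.

```python
class DBReadOnlyError(Exception):
    """Toy stand-in for MediaWiki's DBReadOnlyError."""


class ToyLBFactory:
    """Hypothetical model: owns the read-only flag and the per-cluster host lists."""

    def __init__(self, clusters, read_only_reason=None):
        self.clusters = clusters
        self.read_only_reason = read_only_reason

    def hosts_for(self, cluster):
        return self.clusters[cluster]


class ToySqlStore:
    """Hypothetical model of a SQL-backed key-value store."""

    def __init__(self, hosts=None, lbfactory=None, cluster=None):
        self.lbfactory = lbfactory
        # Either use the hosts given directly, or ask the LBFactory for them.
        self.hosts = hosts if hosts is not None else lbfactory.hosts_for(cluster)
        self.data = {}

    def write(self, key, value):
        # Read-only mode is only visible to stores wired through the LBFactory.
        if self.lbfactory is not None and self.lbfactory.read_only_reason:
            raise DBReadOnlyError(self.lbfactory.read_only_reason)
        self.data[key] = value


# Hypothetical wiring: ParserCache-style (direct hosts) vs MainStash-style (via LBFactory).
lb = ToyLBFactory({"x2": ["x2-host-a"]}, read_only_reason="switchover test")
parser_cache_like = ToySqlStore(hosts=["pc-host-a"])
main_stash_like = ToySqlStore(lbfactory=lb, cluster="x2")
```

In this model, `parser_cache_like.write(...)` succeeds while `main_stash_like.write(...)` raises `DBReadOnlyError`, which mirrors the behaviour observed in the read-only test.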
Evaluate
This task is to answer these questions:
- Is this expected and useful from an SRE ServiceOps perspective? (context: Routine switchover tests)
Afaik when a datacenter is placed into MW read-only mode, the intent is to make sure there are no cross-datacenter connections being initiated. That is, if for some reason read-write requests were routed here, they will fail. And any rare use of optional/deferred/best-effort writes on read requests is proactively skipped.
As such, MW read-only mode does not prevent writes to php-apcu, Memcached, Cassandra, or ParserCache. Yet, it does currently prevent writes to MainStash, even though it would be writing to a DC-local service (same as ParserCache).
- Is this expected and useful from a DBA perspective? (context: DB maintenance)
I don't know if it's common to use MW's read-only mode during DBA work. If it is, is this difference considered useful?
- Is this supported in MediaWiki from a developer perspective?
The MainStash service catches database failures so that they don't crash the web request. It instead returns false, and lets the caller decide whether this write is a functional requirement.
During actions thought of as "writes by users", the MainStash may be functionally relied upon in this way.
However, during "read" requests it is (afaik) not functionally relied on. (By "read" requests I mean requests routed to the secondary DC; not strictly GET/POST, per Multi-DC T91820.)
This means, while perhaps not ideal, the current situation is supported by and compatible with MediaWiki.
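The "return false and let the caller decide" contract can be sketched as a toy model in Python. This is not the SqlBagOStuff implementation: `ToyStash`, `handle_user_write`, `handle_read`, and the keys and messages are hypothetical names chosen for illustration, assuming only the behaviour described above.

```python
class DBReadOnlyError(Exception):
    """Toy stand-in for MediaWiki's DBReadOnlyError."""


class ToyStash:
    """Hypothetical stash whose set() reports database failures as False."""

    def __init__(self, read_only=False):
        self.read_only = read_only
        self.data = {}

    def _db_write(self, key, value):
        if self.read_only:
            raise DBReadOnlyError("database is read-only")
        self.data[key] = value

    def set(self, key, value):
        # Catch the database failure so it does not crash the request.
        try:
            self._db_write(key, value)
        except DBReadOnlyError:
            return False
        return True


def handle_user_write(stash):
    # A "write by user" action: the stash write is a functional requirement.
    if not stash.set("token", "abc"):
        return "error: could not save"
    return "ok"


def handle_read(stash):
    # A "read" request: the stash write is best-effort, so the result is ignored.
    stash.set("last-seen", "now")
    return "page rendered"
```

With `ToyStash(read_only=True)`, `handle_user_write` reports an error while `handle_read` still renders the page, which is the sense in which the current situation is compatible with MediaWiki.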