Error
- mwversion: 1.44.0-wmf.18
- reqId: c1f2ae6c-7eef-4b84-b341-26707ade4f7e
Expectation (masterConns <= 0) by MediaWiki\MediaWikiEntryPoint::restInPeace not met (actual: 1): [connect to db2220 (metawiki)]
Notes
The code has been this way since 2012 (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralNotice/+/23628).
The code has a comment suggesting that the author expected it to be called from the JobQueue, where reading from the primary may be necessary to inform a database write in that same job. However, today it is clearly also called from web requests, e.g. to Special:Translate, where this violates connection profile requirements.
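To make the JobQueue vs. web request distinction concrete, here is a toy Python model of a per-request connection profile. Everything in it (`RequestProfile`, `get_connection`, `check`, and the per-entry-point limits) is hypothetical. It only loosely mimics the expectation in the log line above and is not MediaWiki's TransactionProfiler API. A job that reads from the primary to inform its own write stays within its budget. A read-only web request that opens a primary connection "just in case" trips the `masterConns <= 0` expectation.

```python
from dataclasses import dataclass


@dataclass
class RequestProfile:
    """Hypothetical per-request connection profile (not MediaWiki's TransactionProfiler)."""

    entry_point: str        # label for the request, e.g. "web" or "job"
    max_primary_conns: int  # assumed budget: 0 for read-only web requests
    primary_conns: int = 0

    def get_connection(self, use_primary: bool) -> str:
        """Return which DB would be queried, counting primary connections."""
        if use_primary:
            # Legitimate only when the read informs a write in the same request/job.
            self.primary_conns += 1
            return "primary"
        return "replica"

    def check(self) -> list:
        """Report violations in a format resembling the log line in this task."""
        if self.primary_conns > self.max_primary_conns:
            return [
                f"Expectation (masterConns <= {self.max_primary_conns}) "
                f"by {self.entry_point} not met (actual: {self.primary_conns})"
            ]
        return []


web = RequestProfile("web", max_primary_conns=0)
web.get_connection(use_primary=False)  # "replica": fine for pure reads
web.get_connection(use_primary=True)   # primary read "just in case"
print(web.check())
# ['Expectation (masterConns <= 0) by web not met (actual: 1)']

job = RequestProfile("job", max_primary_conns=1)  # hypothetical job budget
job.get_connection(use_primary=True)   # read that informs a write in the same job
print(job.check())
# []
```

The same call is acceptable or not depending only on the entry point's budget, which is why code that assumes it only runs from the JobQueue becomes a problem once web requests reach it.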
Impact
WMF's scaling strategy requires that we read from replicas unless the read is part of a write transaction (i.e. low frequency). We avoid primary connections during read traffic because there is only a single primary DB host per cluster. This also avoids distractions and false alarms during switchover exercises (T387509), and ensures that read requests in the secondary datacenter are not slowed down or disrupted by a dependency on a cross-datacenter connection.
The CentralNotice code in question does not appear to directly inform a database write, and it is unclear why it would need to read from a primary database at all. It seems likely that the code was written with incorrect assumptions about databases and caching. See also the high-level overview of WANObjectCache.
In the past, when developers queried data from a primary database "just in case", it was often motivated by an assumption that does not hold in practice once there is more than one parallel request (e.g. a single localhost wiki with 2 Apache threads is often enough to break it). Nowadays, reliability concerns are generally addressed automatically at the cache level, so you can safely read from a replica.
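A toy Python model of why reading from the primary does not prevent a stale cache once two requests overlap. It assumes a simplified cache in which `delete()` leaves a tombstone, and values written during a hold-off period after that delete are rejected. This is loosely inspired by the hold-off behaviour of WANObjectCache's delete, but `ToyCache`, `simulate`, the method signatures and the 10-second value are hypothetical and are not WANObjectCache's actual API or configuration.

```python
class ToyCache:
    """Hypothetical cache: delete() leaves a tombstone that rejects set() for hold_off seconds."""

    def __init__(self, hold_off: float):
        self.hold_off = hold_off
        self._values = {}
        self._tombstone_until = {}  # key -> time at which the tombstone expires

    def get(self, key):
        return self._values.get(key)

    def set(self, key, value, now: float) -> bool:
        # Refuse to store a regenerated value while a recent delete's tombstone is active.
        if now < self._tombstone_until.get(key, float("-inf")):
            return False
        self._values[key] = value
        return True

    def delete(self, key, now: float) -> None:
        self._values.pop(key, None)
        self._tombstone_until[key] = now + self.hold_off


def simulate(hold_off: float):
    db = {"banner": "v1"}
    cache = ToyCache(hold_off)

    # t=0: request A misses the cache and reads the DB (even if from the primary).
    a_value = db["banner"]
    # t=1: request B writes a new value and purges the cache key.
    db["banner"] = "v2"
    cache.delete("banner", now=1)
    # t=2: request A finishes and populates the cache with what it read at t=0.
    cache.set("banner", a_value, now=2)
    return cache.get("banner")


print(simulate(hold_off=0))   # v1 (stale, despite A reading from the primary)
print(simulate(hold_off=10))  # None (A's value is rejected; the next reader regenerates "v2")
```

Without a hold-off, request A caches "v1" even though it read from the primary, because B's purge landed between A's read and A's cache write. With the hold-off, the cache layer rejects A's stale value, so whether A read from the primary or from a replica makes no difference to correctness.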