
Phabricator: Unable to view tasks in DB read-only mode
Open, High, Public

Description

During a network partition last week, Phabricator's DB proxy failed over to a read-only replica. Although the database remained available in read-only mode, Phab was unavailable even for read-only operations such as viewing a task:

Unhandled Exception ("AphrontQuery Exception")
#1290: The MariaDB server is running with the --read-only option so it cannot execute this statement

After the proxy fails over, it has to be restored manually (this is by design, in order to prevent flapping) which means that Phabricator won't even partially self-heal from the network partition: it will be completely unavailable until fixed by human intervention. That in turn can make disaster recovery more difficult.

It would be better if read-only operations remained possible while the database is in read-only mode.
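
To illustrate the behaviour being asked for, here is a minimal Python sketch (not Phabricator's actual code, which is PHP). It assumes that viewing a task triggers some optional bookkeeping write, and that the application can tell error 1290 apart from other failures. `QueryError`, `DegradingStore`, `ReadOnlyBackend` and `view_task` are all hypothetical names invented for this example.

```python
READ_ONLY_ERRNO = 1290  # MariaDB/MySQL: server is running with --read-only


class QueryError(Exception):
    """Hypothetical stand-in for a driver exception carrying an error number."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno


class DegradingStore:
    """Wraps a backend; after the first read-only error, skips optional writes."""

    def __init__(self, backend):
        self.backend = backend
        self.read_only = False

    def read(self, key):
        return self.backend.read(key)

    def write(self, key, value, essential=True):
        if self.read_only:
            if essential:
                raise QueryError(READ_ONLY_ERRNO, "store is read-only")
            return False  # optional write skipped without touching the DB
        try:
            self.backend.write(key, value)
            return True
        except QueryError as exc:
            if exc.errno != READ_ONLY_ERRNO:
                raise  # unrelated failure: do not hide it
            self.read_only = True  # remember, so later writes are not attempted
            if essential:
                raise
            return False


class ReadOnlyBackend:
    """Fake backend that serves reads but rejects every write, like a replica."""

    def __init__(self, data):
        self.data = dict(data)

    def read(self, key):
        return self.data[key]

    def write(self, key, value):
        raise QueryError(READ_ONLY_ERRNO, "running with the --read-only option")


def view_task(store, task_id, viewer):
    # Optional bookkeeping write (assumed); it must not block the read.
    store.write(("viewed", task_id, viewer), True, essential=False)
    return store.read(task_id)
```

With this shape, `view_task` still returns the task when every write is rejected, while writes marked essential continue to fail loudly.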

(Previously: It looks like this was noticed and declined in the context of an eqiad-codfw switchover, T232883.)

Event Timeline

Aklapper renamed this task from Unable to view tasks in read-only mode to Unable to view tasks in DB read-only mode. Jul 27 2022, 9:26 AM
Dzahn renamed this task from Unable to view tasks in DB read-only mode to Phabricator: Unable to view tasks in DB read-only mode. Jul 29 2022, 5:59 PM
Dzahn triaged this task as High priority.

It looks like this was noticed and declined in the context of an eqiad-codfw switchover, T232883

Note that what was declined there was specifically running the alternative Phabricator app, not using the alternative database.

However, the last time I asked, I was told (please take this with a grain of salt, I am not a Phab expert) that Phabricator requires read-write access even for basic read-only functionality, because of how it handles its session data. So I am not sure how to prevent this, other than either letting it split-brain or adding some kind of redundant, consensus-making automation layer. Another option would be moving session data outside of the DB, so that Phabricator can still read data from the DB in some kind of "emergency" mode.
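
As a rough illustration of what "session data outside of the DB" could look like, here is a Python sketch of a signed, stateless session token that can be verified without any database read or write. This is a generic technique, not how Phabricator handles sessions; `SECRET`, `issue_token` and `verify_token` are hypothetical names, and a real deployment would also need revocation and key rotation.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"example-secret"  # hypothetical; would be loaded from configuration


def issue_token(user, ttl_seconds, now=None):
    """Return a token encoding the user and an expiry time, signed with HMAC."""
    now = time.time() if now is None else now
    payload = json.dumps({"user": user, "exp": now + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload)
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()


def verify_token(token, now=None):
    """Return the user name if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        body, sig = token.encode().rsplit(b".", 1)
    except ValueError:
        return None  # no separator: not a token we issued
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch: tampered or foreign token
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < now:
        return None  # expired
    return claims["user"]
```

Because verification only needs the shared secret, a request could be authenticated while the database rejects writes, at the cost of not being able to revoke a single session before it expires.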

Phabricator has a read-only config flag, which can be set when read-only operation is expected (for example, planned maintenance). What it handles very poorly is degrading gracefully to read-only on its own in an emergency.
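
One conceivable way to bridge that gap would be for something to probe the database and decide whether the read-only flag should be on. The Python sketch below only shows the probe. It assumes a DB-API style cursor and relies on the `@@global.read_only` system variable that MariaDB and MySQL expose; `server_is_read_only`, `choose_mode` and `FakeCursor` are hypothetical names, and nothing here is part of Phabricator.

```python
def server_is_read_only(cursor):
    """Ask the server whether it is running in read-only mode."""
    cursor.execute("SELECT @@global.read_only")
    (value,) = cursor.fetchone()
    return int(value) == 1


def choose_mode(cursor):
    """Map the probe result to the mode the application should run in."""
    return "read-only" if server_is_read_only(cursor) else "read-write"


class FakeCursor:
    """Stand-in for a real DB-API cursor so the sketch runs without a server."""

    def __init__(self, read_only):
        self._read_only = read_only
        self.last_query = None

    def execute(self, query):
        self.last_query = query

    def fetchone(self):
        return (1 if self._read_only else 0,)
```

Whether acting on such a probe automatically is wise is a separate question, given that the proxy failover is deliberately restored by hand to prevent flapping.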

If someone wants me to do some testing on my Phorge instance, I can.