Page MenuHomePhabricator

Make sure all GrowthExperiments DB writes handle readonly mode well
Closed, ResolvedPublic

Description

When the database used by GrowthExperiments is in readonly mode, any write attempts result in an exception being thrown. For user-facing features this is not ideal; also it might interrupt something that would otherwise work (e.g. user registration). We should ensure everything is behind a wfReadOnly() check or prepared to handle DBReadOnlyError.

This is time-sensitive because of T281212: Restart x1 database master (db1103).

Event Timeline

kostajh added a subscriber: kostajh.

@Tgr do you want to take this one?

We do wfReadonly() checks in a few places but that's actually not the right thing to check since we use a non-default cluster and the default one will be writable. At a glance the most robust check is LoadBalancer::getReadOnlyReason() which will handle configuration-level readonly settings, DB-level readonly settings and temporary read blocks due to replication lag.

Code that does writes:

  • LinkRecommendationStore::insert()
    • RefreshLinkRecommendations. Probably easiest to just abort the script if the DB is readonly, it will be re-run in an hour. It's a long-running script so checking for readonly at start won't suffice.
  • LinkRecommendationStore::deleteByPageIds()
    • FixLinkRecommendationData: manual script, not worth the effort fixing.
    • HomepageHooks::onSearchDataForIndex(): probably just suppress the exception. We'll still want to delete the recommendation from the search index; leaving it in the DB doesn't affect anything.
    • AddLinkSubmissionHandler: same as above. In this case not deleting the DB record means the page will be considered unavailable for future task generation, but that fixes itself the next time it is edited so not such a big deal.
  • LinkRecommendationStore::deleteByLinkTarget() - not used
  • LinkRecommendationStore::recordSubmission()
    • AddLinkSubmissionHandler: when it's called, the edit is already saved so not much we can do, just accept it won't be recorded in the link_submission table.
  • DatabaseMentorStore::setMentorForUserReal()
    • ApiSetMentor: should be disabled in readonly mode
    • SpecialClaimMentee: should be disabled in readonly mode
    • MentorManager::getMentorForUser(): called from many places, selects a mentor if one isn't already selected. Should gracefully return null in readonly mode.
    • SetUserMentorDatabaseJob: called from the one above on GET requests. Fail the job and hopefully get rescheduled outside the maintenance window.
    • MigrateMentorMenteeRelationship: one-time script, not worth the effort fixing.
  • ApiSetMentor: should be disabled in readonly mode
  • SpecialClaimMentee: should be disabled in readonly mode

But that would require injecting the LoadBalancer in the API module / special page. Since all it would do is replace an error with a slightly nicer error, and the error is already not terrible (MWExceptionRenderer special-cases readonly DB exceptions), probably not worth the added complexity.

  • SetUserMentorDatabaseJob: called from the one above on GET requests. Fail the job and hopefully get rescheduled outside the maintenance window.

Throwing an exception and manually signalling failure are handled the same way by the job queue, so again not really worth the effort.

Change 684029 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Handle DB readonly errors

https://gerrit.wikimedia.org/r/684029

Change 684029 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Handle DB readonly errors

https://gerrit.wikimedia.org/r/684029

Done but will need a backport.

Change 684078 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.3] Handle DB readonly errors

https://gerrit.wikimedia.org/r/684078

Change 684078 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.3] Handle DB readonly errors

https://gerrit.wikimedia.org/r/684078

Mentioned in SAL (#wikimedia-operations) [2021-05-03T11:56:45Z] <kharlan@deploy1002> Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:684080|refreshLinkRecommendations.php: Use per-wiki locks]] [[gerrit:684078|Handle DB readonly errors (T281382)]] (duration: 00m 58s)