Page MenuHomePhabricator

Frequent Echo DB_MASTER write queries on HTTP GET
Open, Needs TriagePublic

Description

The main two offenders are:

Expectation (writes <= 0) by MediaWiki::restInPeace not met (actual: 5):
query-m: DELETE FROM `echo_unread_wikis` WHERE euw_user = 'X'
#0 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/TransactionProfiler.php(245): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated()
#1 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1318): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion()
#2 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1198): Wikimedia\Rdbms\Database->attemptQuery()
#3 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(3004): Wikimedia\Rdbms\Database->query()
#4 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/UnreadWikis.php(141): Wikimedia\Rdbms\Database->delete()
#5 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/NotifUser.php(440): EchoUnreadWikis->updateCount()
#6 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/NotifUser.php(249): MWEchoNotifUser->resetNotificationCount()
#7 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/EchoHooks.php(940): MWEchoNotifUser->markRead()
#8 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/MWCallableUpdate.php(34): Closure$EchoHooks::processMarkAsRead#2()
#9 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(273): MWCallableUpdate->doUpdate()
#10 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(219): DeferredUpdates::runUpdate()
#11 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(143): DeferredUpdates::execute()
#12 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(909): DeferredUpdates::doUpdates()
#13 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(733): MediaWiki->restInPeace()
#14 (): Closure$MediaWiki::doPostOutputShutdown()
#15 {main}
Expectation (writes <= 0) by MediaWiki::restInPeace not met (actual: 13):
query-m: INSERT INTO `echo_event` (event_type,event_variant,event_deleted,event_extra,event_agent_id,event_page_id) VALUES ('X')
#0 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/TransactionProfiler.php(245): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated()
#1 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1318): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion()
#2 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1198): Wikimedia\Rdbms\Database->attemptQuery()
#3 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(2123): Wikimedia\Rdbms\Database->query()
#4 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/mapper/EventMapper.php(22): Wikimedia\Rdbms\Database->insert()
#5 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/model/Event.php(232): EchoEventMapper->insert()
#6 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/model/Event.php(167): EchoEvent->insert()
#7 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/DiscussionParser.php(95): EchoEvent::create()
#8 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/EchoHooks.php(561): EchoDiscussionParser::generateEventsForRevision()
#9 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/MWCallableUpdate.php(34): Closure$EchoHooks::onPageContentSaveComplete()
#10 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(273): MWCallableUpdate->doUpdate()
#11 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(219): DeferredUpdates::runUpdate()
#12 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(143): DeferredUpdates::execute()
#13 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(909): DeferredUpdates::doUpdates()
#14 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(733): MediaWiki->restInPeace()
#15 (): Closure$MediaWiki::doPostOutputShutdown()
#16 {main}

Event Timeline

aaron created this task.Mar 29 2019, 8:53 AM
aaron renamed this task from Frequent Echo queries on HTTP GET to Frequent Echo DB_MASTER write queries on HTTP GET.Mar 29 2019, 5:32 PM
aaron removed aaron as the assignee of this task.May 1 2019, 5:29 PM

The first stack trace is from a deferred update scheduled from EchoHooks::processMarkAsRead() on read requests with a ?markasread=123 query parameter. If writing from deferred updates on GET requests is not allowed, how are we supposed to schedule a write when we notice that it's needed in a GET request? Do we submit a job to the job queue?

The second stack trace confuses me. It comes from a deferred update scheduled from EchoHooks::onPageContentSaveComplete(). How is it possible for the PageContentSaveComplete hook to be fired in a GET request? It seems like a reasonable assumption that a hook with "SaveComplete" in the name will only be fired in POST requests.

Catrope moved this task from Inbox to Upcoming Work on the Growth-Team board.Jul 31 2019, 2:10 AM
aaron added a comment.Jul 31 2019, 6:16 AM

Jobs are fine...though this case is complicated since people want their "latest views" to be immediately reflected...so it would have to do something like WatchedItemStore.

I did a bit of thinking about what a WatchedItemStore-like solution could look like. Here's my best idea so far:

  • When marking a notification as read:
    • Check if the notification is already marked as read in the DB; if it is, bail
    • Add its ID to the cached going-to-be-marked-as-read list in MainStash (using merge())
    • Enqueue a job to mark it as read in the DB
    • Purge the cached notification count in WANCache
  • When computing the notification count (i.e. the getWithSetCallback calback for the cached notification count in WANCache):
    • Get the going-to-be-marked-as-read list from cache
    • Query the number of unread notifications from the DB
    • Subtract the length of the going-to-be-marked-as-read list from the DB query result, and return that
  • In the job that marks things as read:
    • Update the DB to mark the notification as read
    • Remove the notification from the going-to-be-marked-as-read list (using merge())
    • Purge the cached notification count in WANCache (needed for convergence after race conditions, see below)

Complications/details:

  • Also need to support "mark as unread"
    • Have a cached going-to-be-marked-as-unread list that works analogously
    • When computing notification count, add the length of this list (in addition to subtracting the length of the going-to-be-marked-as-read list)
  • Global notification counts
    • Every wiki needs to be able to correctly compute the global notification count, even if another wiki has a pending mark-read/mark-unread operation
    • Put the cached mark-read/mark-unread lists in the global cache, so that every wiki knows every other wiki's pending operations and can adjust for them
    • When writing notification counts to the echo_unread_wikis table, don't account for pending operations, but reflect the state of the DB (otherwise pending operations are double-counted)
    • Only write to the echo_unread_wikis table from the job
  • Race condition considerations
    • The following race condition is possible:
      • Notification is added to the mark-read list
      • Job runs, marks notification as read in the master DB, removes it from the mark-read list
      • Try to mark the notification as read a second time; it still seems unread in the DB due to replication lag, so we'll add it to the mark-read list again
      • At this point notification count computations will be wrong, because we'll subtract a pending action we've already taken
      • The second job runs, marks the notification as read in the DB (no-op), and removes it from the mark-read list
      • At this point the count will be computed correctly again
    • Purging the cached count from the job seems unnecessary, but it's needed to make sure this race condition is short-lived and the wrong count doesn't get stuck in the cache

though this case is complicated since people want their "latest views" to be immediately reflected

This is offtopic, but interesting this is brought up, because I normally do not get acknowledgement as a wiki user of viewed notifications, when I click them or mark them as read, and get alerted multiple times on multiple wikis for some amount of time.

marcella added a subscriber: marcella.

Moving this to our Q3 column so we can assess with our other maintenance tasks for the quarter and decide whether we have time in balance with our feature work to prioritize this task.