In T390510#10754750, @Ladsgroup noted intermittent spikes in queried rows in s1 consistent with full table scans.
Through a combination of rdbms channel mediawiki logs in logstash [0] and setting the slow-query log threshold to 3s on db1218, it became clear that these are due to IP block queries with stack traces such as:
from /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/TransactionProfiler.php(558)
#0 /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/TransactionProfiler.php(363): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated(string, Wikimedia\Rdbms\GeneralizedSql, float, string, string)
#1 /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/database/TransactionManager.php(574): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion(Wikimedia\Rdbms\GeneralizedSql, float, bool, int, string, string)
#2 /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/database/Database.php(858): Wikimedia\Rdbms\TransactionManager->recordQueryCompletion(Wikimedia\Rdbms\GeneralizedSql, float, bool, int, string)
#3 /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/database/Database.php(711): Wikimedia\Rdbms\Database->attemptQuery(Wikimedia\Rdbms\Query, string, bool)
#4 /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/database/Database.php(638): Wikimedia\Rdbms\Database->executeQuery(Wikimedia\Rdbms\Query, string, int)
#5 /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/database/Database.php(1367): Wikimedia\Rdbms\Database->query(Wikimedia\Rdbms\Query, string)
#6 /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/database/DBConnRef.php(127): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#7 /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/database/DBConnRef.php(351): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#8 /srv/mediawiki/php-1.44.0-wmf.25/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(762): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
#9 /srv/mediawiki/php-1.44.0-wmf.25/includes/block/DatabaseBlockStore.php(562): Wikimedia\Rdbms\SelectQueryBuilder->fetchResultSet()
#10 /srv/mediawiki/php-1.44.0-wmf.25/includes/block/BlockManager.php(522): MediaWiki\Block\DatabaseBlockStore->newListFromIPs(array, bool, bool)
#11 /srv/mediawiki/php-1.44.0-wmf.25/includes/block/BlockManager.php(475): MediaWiki\Block\BlockManager->getBlocksForIPList(array, bool, bool)
#12 /srv/mediawiki/php-1.44.0-wmf.25/includes/block/BlockManager.php(212): MediaWiki\Block\BlockManager->getXffBlocks(string, array, bool, bool)
#13 /srv/mediawiki/php-1.44.0-wmf.25/includes/user/User.php(1489): MediaWiki\Block\BlockManager->getBlock(MediaWiki\User\User, MediaWiki\Request\WebRequest, bool)
#14 /srv/mediawiki/php-1.44.0-wmf.25/includes/block/BlockManager.php(764): MediaWiki\User\User->getBlock()
#15 /srv/mediawiki/php-1.44.0-wmf.25/includes/MediaWikiEntryPoint.php(408): MediaWiki\Block\BlockManager->trackBlockWithCookie(MediaWiki\User\User, MediaWiki\Request\WebResponse)
#16 /srv/mediawiki/php-1.44.0-wmf.25/includes/MediaWikiEntryPoint.php(191): MediaWiki\MediaWikiEntryPoint->commitMainTransaction()
#17 /srv/mediawiki/php-1.44.0-wmf.25/includes/actions/ActionEntryPoint.php(206): MediaWiki\MediaWikiEntryPoint->doPrepareForOutput()
#18 /srv/mediawiki/php-1.44.0-wmf.25/includes/MediaWikiEntryPoint.php(174): MediaWiki\Actions\ActionEntryPoint->doPrepareForOutput()
#19 /srv/mediawiki/php-1.44.0-wmf.25/includes/actions/ActionEntryPoint.php(153): MediaWiki\MediaWikiEntryPoint->prepareForOutput()
#20 /srv/mediawiki/php-1.44.0-wmf.25/includes/MediaWikiEntryPoint.php(202): MediaWiki\Actions\ActionEntryPoint->execute()
#21 /srv/mediawiki/php-1.44.0-wmf.25/index.php(58): MediaWiki\MediaWikiEntryPoint->run()
#22 /srv/mediawiki/w/index.php(3): require(string)
#23 {main}Looking specifically at the queries, we can see a large number of terms in the predicate of the WHERE clause, for example [1]:
SELECT bl_id,bt_address,bt_user,bt_user_text,bl_timestamp,bt_auto,bl_anon_only,bl_create_account,bl_enable_autoblock,bl_expiry,bl_deleted,bl_block_email,bl_allow_usertalk,bl_parent_block_id,bl_sitewide,bl_by_actor,block_by_actor.actor_user AS `bl_by`,block_by_actor.actor_name AS `bl_by_text`,comment_bl_reason.comment_text AS `bl_reason_text`,comment_bl_reason.comment_data AS `bl_reason_data`,comment_bl_reason.comment_id AS `bl_reason_cid`,bt_range_start,bt_range_end FROM `block` JOIN `block_target` ON ((bt_id=bl_target)) JOIN `actor` `block_by_actor` ON ((actor_id=bl_by_actor)) JOIN `comment` `comment_bl_reason` ON ((comment_bl_reason.comment_id = bl_reason_id)) WHERE ((((bt_ip_hex = '2F4FD5F9' AND bt_range_start IS NULL) OR (bt_range_start LIKE '2F4F%' ESCAPE '`' AND bt_range_start <= '2F4FD5F9' AND bt_range_end >= '2F4FD5F9'))) OR (((bt_ip_hex = 'AC44A453' AND bt_range_start IS NULL) OR (bt_range_start LIKE 'AC44%' ESCAPE '`' AND bt_range_start <= 'AC44A453' AND bt_range_end >= 'AC44A453'))) OR (((bt_ip_hex = '7F000001' AND bt_range_start IS NULL) OR (bt_range_start LIKE '7F00%' ESCAPE '`' AND bt_range_start <= '7F000001' AND bt_range_end >= '7F000001')))
In short, we're seeing queries with complex WHERE clauses due to unusually long inbound X-Forwarded-For headers (note the getXffBlocks in the stack trace). Correlating these with samples httpd access logs (e.g., by reqId), these appear to be short-lived spikes of action=edit calls on enwiki from a specific client IP [2], which in turn pile up.
@Daimona had the insight that these kinds of queries will result in full table scans - i.e., paring down to one predicate avoids that, but it may also be possible to take advantage of the index with some changes to the structure of the combined predicate.
We have not seen a repeat of these events since ~ 15:20 UTC.
[0] https://logstash.wikimedia.org/goto/cce5e059226c57232c4ebc97aa7be0be
[2] https://logstash.wikimedia.org/goto/e28d44590c3262d208148291d7881d96