Page MenuHomePhabricator

Add caching (back) to SqlBlobStore::getBlobBatch (revert I94c6f9ba7b9caeeb)
Open, Needs TriagePublic

Description

The function SqlBlobStore::getBlobBatch has a fixme:

	public function getBlobBatch( $blobAddresses, $queryFlags = 0 ) {
		// FIXME: All caching has temporarily been removed in I94c6f9ba7b9caeeb due to T235188.
		//        Caching behavior should be restored by reverting I94c6f9ba7b9caeeb as soon as
		//        the root cause of T235188 has been resolved.

Linking: T235188 / e1f8a8148d7329e01116ccf9133e58c6c0377f6b

The same comment was removed from WANObjectCache in 527fd0109fd62725aaaa80d9979b5f2f1348bf09

Please verify if it would be safe now to add caching back to that function (by reverting the hot fix)

The function is used by RevisionStore for batch requests of revisions records with the 'content' option and used to fetch big file metadata stored in the text blobs

Event Timeline

Krinkle subscribed.

As I understand it, the non-batch version of this RevisionStore method already has caching. The batched method is afaik only used outside the criticial path, e.g. by CLI maintenance scripts only. As such lack of caching isn't a concern from our team's perspective.

The perf team was briefly involved with this part of the RevisionStore code two years ago, because we found that it exposed a bug in WANObjectCache (which we maintain). This bug is fixed as of T235188, thus we now support using WANObjectCache to easily add caching to complex batch-methods like this one.

My suggestion to RevisionStore code owners would be to either adopt this here, or to remove the method if it isn't considered worth implementing/maintaining. The method feels like a footgun in its current state, as this kind of method in an API tends to signal to developers that it is optimised for batching. If removed, the one consumer (CLI script) can perform the same loop that the method currently performs, that way its cost is more apparent.

I want to use RevisionStore::newRevisionsFromBatch in api's revision modules (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/910808), at the moment without preloading the content as that is not cached. It would give a benefit on batching the queries to slots table.
So the function should/could stay, just preloading with the uncached batch lookup needs remove from RevisionStore in that case.

RevisionStore::newRevisionsFromBatch is used with the content option on MessageCache since f6404ebf38be490bf500204e67a86e476248cc19
The usage in findBadBlobs.php is without the content option.