
[SPIKE] Attribution API Citation count followup: Citation # in Flagged Revs wikis
Closed, Resolved · Public · 5 Estimated Story Points · Spike

Description

During the work on the citation counts for the Attribution API (T418499: Attribution API MVP: Provide initial-pass base reference count), we found that the method we're using (through Parsoid parser cache HTML) will not be available in wikis that have Flagged Revs enabled.

This ticket is meant to verify the problem and see whether we have options:

  • Validate what we actually get in Flagged Revs wikis (dewiki/ruwiki) and whether we get anything useful out of the box
  • Check if there is a way to get a performant cached HTML of the *visible version* as a fallback from these wikis
  • There are many wikis with FlaggedRevs installed, but not all of them have this problem; if we do need to create some fallback operation, see if there is a way to identify the relevant instance (through config, perhaps?)

If any of the above is found to be problematic, we will need to re-assess how to approach the citation count on those wikis and whether we can serve them directly.

Details

Other Assignee
pmiazga

Event Timeline

HCoplin-WMF renamed this task from Attribution API Citation count followup: Citation # in Flagged Revs wikis to [SPIKE] Attribution API Citation count followup: Citation # in Flagged Revs wikis. · Mar 17 2026, 5:22 PM
HCoplin-WMF assigned this task to Mooeypoo.
HCoplin-WMF triaged this task as High priority.
HCoplin-WMF updated Other Assignee, added: pmiazga.
HCoplin-WMF set the point value for this task to 5.
Restricted Application changed the subtype of this task from "Task" to "Spike". · Mar 17 2026, 5:22 PM

Investigation questions

What do we get in FlaggedRevs wikis

Right now: Nothing unless we change it to read out of the FlaggedRevs cache.

The simplified answer is that the Parsoid HTML cache is not available on wikis that use FlaggedRevs for "stable" article views (like German and Russian Wikipedias). The Content Transform team is working on enabling Parsoid for reads on those wikis, but the cached parsed HTML will likely always need to come from the FlaggedRevs cache, which is a different operation from reading the parser cache.

Can we get relevant cached HTML in FlaggedRevs wikis

Partially.

There is a way to request the HTML cache from FlaggedRevs (see this operation in Kartographer for FlaggedRevs wikis). However, for pages that do not have cached HTML (cache misses), no parse is triggered (unlike when asking for the Parsoid HTML cache) unless something is specifically done about it. Intentionally triggering a parse through FlaggedRevs is not trivial, and we should really consider whether we want to go there.

What this means is that for the relevant wikis, we could request the FlaggedRevs HTML and produce the information from it, and if it doesn't exist (a cache miss), we output no information for the citation count (null).
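A minimal, self-contained sketch of that "count from stable HTML, null on a miss" behaviour follows. Everything in it is hypothetical and for illustration only: `StableHtmlCache`, `ArrayStableHtmlCache`, `countCiteNotes()` and `getStableReferenceCount()` are not MediaWiki or FlaggedRevs APIs, and the counting rule (looking for `id="cite_note-` in the HTML) is a placeholder assumption rather than the logic the Attribution API actually uses.

```php
<?php
declare( strict_types=1 );

/** Hypothetical stand-in for a "stable" HTML cache: returns HTML, or null on a miss. */
interface StableHtmlCache {
	public function get( string $pageTitle ): ?string;
}

/** In-memory implementation, used here only to make the sketch runnable. */
final class ArrayStableHtmlCache implements StableHtmlCache {
	/** @var array<string,string> */
	private array $entries;

	public function __construct( array $entries ) {
		$this->entries = $entries;
	}

	public function get( string $pageTitle ): ?string {
		return $this->entries[$pageTitle] ?? null;
	}
}

/**
 * Placeholder counting rule (an assumption, not the Attribution API's logic):
 * counts occurrences of the attribute prefix id="cite_note- in the HTML.
 */
function countCiteNotes( string $html ): int {
	return substr_count( $html, 'id="cite_note-' );
}

/** Returns the reference count, or null when the stable HTML is not cached. */
function getStableReferenceCount( StableHtmlCache $cache, string $pageTitle ): ?int {
	$html = $cache->get( $pageTitle );
	if ( $html === null ) {
		// Cache miss: nothing triggers a parse here, so report "unknown" rather than 0.
		return null;
	}
	return countCiteNotes( $html );
}
```

Returning null rather than 0 keeps "we don't know" distinguishable from "this page has no references".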

How do we recognize the relevant FlaggedRevs wikis for fallback

Not easily.

This is a tougher question than anticipated. FlaggedRevs is installed on multiple wikis (including English Wikipedia), but having it installed is not the problem. English Wikipedia, for example, has FlaggedRevs installed under a specific configuration that does not produce the problem described in this ticket, meaning we can get our HTML from Parsoid.

Some wikis (German, Russian, and a few others) use FlaggedRevs *and* let it take over the views (through a combination of config and templates); those are the ones we need to trigger the fallback for. However, recognizing them programmatically is not easy at all, and it's not as straightforward as "is FlaggedRevs installed on this instance."

Summary

  • Wikis that have FlaggedRevs installed *and* take over the views will not give us HTML cache through Parsoid.
  • We will need to create a fallback for those cases to fetch the relevant "stable" HTML from FlaggedRevs cache.
  • For those cases, when cache does not exist, we get nothing, and cannot provide number of references.
    • We should make sure we measure to see how many of the requests we get end up with a cache miss for FlaggedRevs to keep on top of this and see if it's an actual problem.
  • There is no easy way to recognize the affected wikis. We might need to go by a hard-coded list.
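
If we do end up going by a hard-coded list, the check itself could be as small as the sketch below. The constant and function names are hypothetical, and the list contains only the two wikis named in this task (dewiki and ruwiki); the real set of affected wikis still has to be identified.

```php
<?php
declare( strict_types=1 );

// Hypothetical hard-coded list. Only dewiki and ruwiki are named in this task;
// the full set of affected wikis is still unknown.
const STABLE_VIEW_WIKIS = [ 'dewiki', 'ruwiki' ];

/**
 * Whether the given wiki should read from the FlaggedRevs cache instead of
 * the Parsoid parser cache. The list is injectable so it can later come from config.
 */
function needsFlaggedRevsFallback( string $wikiId, array $affectedWikis = STABLE_VIEW_WIKIS ): bool {
	return in_array( $wikiId, $affectedWikis, true );
}
```

Keeping the list as a parameter means a later switch to a config-driven or programmatic check would not change the callers.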

Next steps / recommendations

  • Check more deeply whether there's a performant programmatic way to identify the relevant FlaggedRevs wikis that require the fallback. If there is none, we will have to compile a list of wikis and run this fallback for the instances on that list.
  • For the immediate/medium future:
    • Implement a fallback for those cases -- read cached HTML from FlaggedRevsParserCacheFactory (see this patch for reference)
    • In cases where we get no HTML (cache-misses for FlaggedRevs cache)
      • output an empty result (likely null)
      • implement some observability counter so we can follow up and keep an eye on this
  • For long term:
    • If this information (# of citations) is something that we want to support better for the long term, we might want to think about more sustainable systemic ways to calculate it rather than pulling it off of the parser HTML.
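
For the observability point, here is a rough sketch of the kind of counter meant above. `CacheOutcomeCounter` is a hypothetical in-memory class written only to show what we would want to measure; in production this would presumably go through the existing metrics tooling rather than a class like this.

```php
<?php
declare( strict_types=1 );

/** Hypothetical in-memory counter for FlaggedRevs cache hits and misses. */
final class CacheOutcomeCounter {
	private int $hits = 0;
	private int $misses = 0;

	/** Record one lookup; $hit is true when cached HTML was found. */
	public function record( bool $hit ): void {
		if ( $hit ) {
			$this->hits++;
		} else {
			$this->misses++;
		}
	}

	/** Fraction of lookups that missed, or null if nothing was recorded yet. */
	public function missRatio(): ?float {
		$total = $this->hits + $this->misses;
		if ( $total === 0 ) {
			return null;
		}
		return $this->misses / $total;
	}
}
```

The miss ratio is the share of requests for which we would return a null citation count, which is the number to watch when deciding whether the cache-only fallback is good enough.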

So, assuming the parsoid variant of the HTML for the stable version was likely to be in cache, you think you could do something like:

$span = $this->tracer->createSpan( 'Attribution GetReferenceCount' )->start();

$this->parserOptions->setUseParsoid( true );
$this->parserOptions->setRenderReason( 'attribution' );

$html = null;
if ( ExtensionRegistry::getInstance()->isLoaded( 'FlaggedRevs' ) && $this->flaggedRevsParserCacheFactory ) {
	$fwp = FlaggableWikiPage::getInstance( $title );

	if ( $fwp->getStable() && $fwp->isStableShownByDefault() && !$fwp->revsArePending() ) {
		$parserCache = $this->flaggedRevsParserCacheFactory->getParserCache( $this->parserOptions );
		$parserOutput = $parserCache->get( $page, $this->parserOptions );
		if ( $parserOutput ) {
			$html = $parserOutput->getContentHolderText();
		}
	}
}

if ( $html === null ) {
	// Note: on wikis using FlaggedRevisions (e.g. dewiki, ruwiki), this returns the latest
	// revision's output rather than the stable (reader-visible) one. The count may therefore
	// differ from what readers see if there are pending unreviewed edits. See T414359, T322426.
	$status = $this->parserOutputAccess->getParserOutput( $page, $this->parserOptions );
	if ( !$status->isOK() ) {
		return null;
	}

	$html = $status->getValue()->getContentHolderText();
}

...however, neither Article.php nor FlaggablePageView uses the Parsoid-style output; T419048 hasn't got there yet. This means that the special code would only be reused among AttributionHandler calls. It seems like it would be better to just stick to the current-version Parsoid HTML (at least until Article and FlaggablePageView use Parsoid).

Article::generateContentOutput() supports Parsoid cache warming via ParsoidCachePrewarmJob, which is enabled on WMF wikis except commonswiki/wikidata. However, this only happens when people view the current version and get a cache miss (FlaggablePageView doesn't set $outputDone, so all of this warming still happens).

Let me just drop the Grafana dashboard for ParserCache hits/misses - https://grafana.wikimedia.org/d/000000106/parser-cache?orgId=1&from=now-24h&to=now&timezone=utc&var-dc=eqiad&var-Content_Model=$__all&var-Cache=postproc_parsoid_pcache

It also has a section with the FlaggedRevsParserCache hit rate, which gives a rough idea of the hit/miss ratio.

@pmiazga -- thanks for sharing the dashboards! Very helpful. To summarize what I'm seeing there, it looks like:

  • Normal parser hit rate is ~60%
  • FlaggedRevs parser hit rate fluctuates more (it looks like they do a daily purge, based on the cycles?), between ~30% and ~80%

With those numbers in mind, the fallback plan is valid. I also agree with Moriel, though, that it's worth exploring something a little more robust to fill the space. I'll proceed with filling out another DPE prep pantry request to keep that moving.

@Mooeypoo is this good to close out since the research section is complete?

Note: there's a (very) significant chance that the observed daily purge on flaggedrevs is T416616 - so, not FlaggedRevs-related.

Closing out MW-Interfaces-Team (MWI-Sprint-29 (2026-03-10 to 2026-03-24)); marking everything from that sprint as resolved.

https://phabricator.wikimedia.org/project/board/8573/