Provide getUsageAccumulatorFactory (or equivalent) for Parsoid?
Open, Needs Triage, Public

Description

For the Wikifunctions integration into MediaWiki pages, we were using \Wikibase\Client\WikibaseClient::getUsageAccumulatorFactory()->newFromParser( $parser ) to get a usage tracker so we could add our usage of Wikidata items into the mix. However, we don't have a legacy Parser instance, as we're (exclusively) in Parsoid page-parsing.

Is there a plan to migrate this code any time soon? Alternatively, I suppose we could spin up a new Parser instance, but that feels wrong?

Event Timeline

DVrandecic triaged this task as Medium priority. Jul 2 2025, 3:36 PM
DVrandecic raised the priority of this task from Medium to Needs Triage.
DSantamaria added a project: OKR-Work.
DSantamaria raised the priority of this task from High to Needs Triage. Jul 14 2025, 2:29 PM
DSantamaria removed a project: OKR-Work.

Change #1173405 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/extensions/WikiLambda@master] Provide a wrapper for using Parsoid with Wikibase's getUsageAccumulatorFactory()

https://gerrit.wikimedia.org/r/1173405

For now, we're just going to do our own thing inside our codebase, so this no longer blocks us, but this probably needs a proper solution at the Wikibase level.

The problem as I see it, in very basic terms:

  • Wikidata items and other kinds of entity are used in rendering many Wikimedia source pages (wikitext) into human-visible pages (HTML).
  • This usage takes several forms; mainly it's through Lua/Scribunto, but also directly, and most recently, through embedded Wikifunctions calls.
  • When Wikidata items get updated, we need to (a) make sure the pages are re-rendered so the readers get the up-to-date content, and (b) alert editors that a page they care about uses a Wikidata item that has just changed, so they can check if the change was good or they need to react in some way.
  • The system that Wikidata uses for this (Wikibase change propagation) relies on the legacy MediaWiki parser triggering the render of the bit of wikitext that calls out to Wikidata with a full, synchronous page context.
  • This is a problem because the legacy parser is being replaced by Parsoid, and the Wikidata system doesn't yet let code running inside Parsoid register that it's Wikidata-affected.
  • This is especially a problem because some content now isn't ever triggered in a synchronous page context (like embedded Wikifunctions calls, but more content types over time), so Wikidata's model of how to register and track usage needs re-designing.

JoelyRooke-WMDE moved this task from In Review (ext. tickets) to Done (ext.tickets) on the Wikidata Integration in Wikimedia projects (Kanban Board) board.

This isn't done.

Hey James, that's just our way of tracking that we've reviewed your change - Done (ext.tickets) means that the review was submitted on external changes. Maybe we'll change the title of that column for clarity. With regards to any work we need to do, the WIT team has a meeting with Content Transform this afternoon, so we'll make action points after that. Thanks!

Yes, I understand, but the external-code-review-is-done task is T398993. This is the Wikibase-team-should-do-a-proper-replacement task.

Ok! Gerritbot links to both and we were tagged on this one, so that's why I registered it as reviewed. I'll put it back into our backlog for clarity.

Small update from the WIT side after discussing with Content Transform: they are also aware of the changes, but have no imminent plans to work on a broader Parsoid fix, since all other extensions consuming WD data can still use the legacy parser in conjunction with Parsoid. We will have further meetings to make a technical plan and will share the details as we decide them :)

I'd suggest that the usage tracking be done in a ParserOutput/ContentMetadataCollector. There is already a ParserOutputUsageAccumulator, which can probably be made into a ContentMetadataCollectorUsageAccumulator fairly easily. It seems the ParserOutputProvider, which is the argument to that, is the main sticking point?

One main difference is that ParserOutputUsageAccumulator conflates the "read" and the "write" aspects, whereas ContentMetadataCollector is explicitly "write only", with the "read" step occurring later, at the end of the parse. Concretely, this would mean splitting ::addUsage() and ::getUsage() into separate interfaces, instead of combining them in the same UsageAccumulator class.
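
To make the proposed split concrete, here is a rough, self-contained sketch. Everything in it is hypothetical: UsageWriter, UsageReader, InMemoryUsageStore and renderFragment() are made-up names, the string-based signatures are simplified for illustration, and none of it is the actual Wikibase or Parsoid API. The store is just an in-memory stand-in for whatever would sit on top of ContentMetadataCollector.

```php
<?php
declare( strict_types=1 );

// Hypothetical write-only half: the only thing parse-time code would see.
interface UsageWriter {
	public function addUsage( string $entityId, string $aspect ): void;
}

// Hypothetical read-only half: consulted once, after the parse has finished.
interface UsageReader {
	/** @return string[] usage keys of the form "entityId#aspect" */
	public function getUsage(): array;
}

// Stand-in for a collector-backed store; NOT a real MediaWiki class.
final class InMemoryUsageStore implements UsageWriter, UsageReader {
	/** @var array<string,true> */
	private array $usages = [];

	public function addUsage( string $entityId, string $aspect ): void {
		// Keyed by "id#aspect" so repeated registrations are deduplicated.
		$this->usages["$entityId#$aspect"] = true;
	}

	public function getUsage(): array {
		return array_keys( $this->usages );
	}
}

// Parse-time code is typed against the writer only, so it cannot read back.
function renderFragment( UsageWriter $writer ): string {
	$writer->addUsage( 'Q42', 'L' );
	$writer->addUsage( 'Q42', 'L' ); // duplicate, recorded once
	$writer->addUsage( 'Q64', 'S' );
	return '<p>rendered</p>';
}

$store = new InMemoryUsageStore();
renderFragment( $store );

// End of parse: only now is the read side consulted.
$usages = $store->getUsage();
```

The point of the type hint on renderFragment() is that code running during the parse has no way to call getUsage(), which matches the "write only" contract; whether a single object implements both halves, as here, or the read side lives somewhere else entirely is a separate design question.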