
Make RDF dumps more resilient against state transition errors
Open, Needs Triage, Public

Description

As a Wikidata data reuser, I want the RDF dumps to be available even if a few individual items can’t be dumped due to localized issues, in order to keep using Wikidata data.

Problem:
The Wikibase dumper includes support for skipping errors in individual entities and continuing with the next entity. However, as seen in T384625, this isn’t always enough for the RDF dumps. If a bug leaves the RdfWriter in an unexpected state (e.g. RdfWriterBase::STATE_SUBJECT), the writer stays in that state when the dumper moves on to the next entity. As a result, the next entity and all subsequent entities fail to dump as well.

To fix this, we need to reset the RDF writer to a sensible state again (or replace it with a fresh writer, but I think that’s less feasible). This POC change implements one way to do it, though a cleaner way would be to add a suitable method to Purtle.
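The failure mode and the proposed reset can be illustrated with a small toy model. This is only a sketch in Python, not Purtle or Wikibase code: `ToyWriter`, its `reset()` method, the state names, and the `dump()` helper are all hypothetical stand-ins, and the simulated bug simply raises after the subject has been opened.

```python
from enum import Enum


class State(Enum):
    DOCUMENT = "document"  # between subjects; a new subject may start
    SUBJECT = "subject"    # a subject is open; predicates may follow


class ToyWriter:
    """Toy stand-in for an RDF writer with a strict state machine."""

    def __init__(self):
        self.state = State.DOCUMENT
        self.pending = []  # lines of the subject currently being written
        self.output = []   # lines of completed subjects

    def about(self, subject):
        if self.state is not State.DOCUMENT:
            raise RuntimeError(f"Bad transition: {self.state.name} -> SUBJECT")
        self.state = State.SUBJECT
        self.pending.append(subject)

    def say(self, predicate, value):
        if self.state is not State.SUBJECT:
            raise RuntimeError(f"Bad transition: {self.state.name} -> PREDICATE")
        self.pending.append(f"  {predicate} {value}")

    def finish_subject(self):
        self.output.extend(self.pending)
        self.pending = []
        self.state = State.DOCUMENT

    def reset(self):
        """Hypothetical recovery hook: drop partial output, allow a new subject."""
        self.pending = []
        self.state = State.DOCUMENT


def write_entity(writer, entity):
    writer.about(entity["id"])
    if entity.get("broken"):
        # Simulated bug: fails while the writer is still in the SUBJECT state.
        raise ValueError("simulated bug while serializing " + entity["id"])
    writer.say("label", entity["label"])
    writer.finish_subject()


def dump(entities, reset_on_error):
    """Dump all entities, skipping failures; optionally reset the writer."""
    writer = ToyWriter()
    dumped, failed = [], []
    for entity in entities:
        try:
            write_entity(writer, entity)
            dumped.append(entity["id"])
        except Exception:
            failed.append(entity["id"])
            if reset_on_error:
                writer.reset()
    return dumped, failed


entities = [
    {"id": "Q1", "label": "one"},
    {"id": "Q2", "label": "two", "broken": True},
    {"id": "Q3", "label": "three"},
    {"id": "Q4", "label": "four"},
]

print(dump(entities, reset_on_error=False))
print(dump(entities, reset_on_error=True))
```

Without the reset, the toy writer is still in its subject state when the next entity starts, so every later `about()` call is rejected. With the reset, only the broken entity is lost. Whether the real writer can be reset this cheaply depends on what other internal state it keeps, which this sketch does not model.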

Example:
T384625: Special:EntityData, dump creation: LogicException: Bad transition: 10 -> 10

Acceptance criteria:

  • In local testing, if the fix for T384625 is reverted and the reproduction steps (T384625#10567303) are followed, items after the affected item can still be dumped successfully.