Page MenuHomePhabricator

Do not use Content::prepareSave when importing or undeleting (drop hard global constraints)
Open, Needs TriagePublic

Description

The documentation of Content::prepareSave() currently states:

This may be used to check the content's consistency with global state. This function should
NOT write any information to the database.
Note that this method will usually be called inside the same transaction
bracket that will be used to save the new revision.

This implies that prepareSave() may check that some aspect of the page content is globally unique, and can prevent the revision from being saved if this is not the case. Wikibase currently uses this to enforce uniqueness of sitelinks.

MediaWiki's behavior is currently inconsistent with respect to this: when undeleting, prepareSave() is called for the latest restorable row, preventing undeletion of pages if they would cause an inconsistency. However, no such check is performed by ImportableOldRevisionImporter when importing revisions.

Observation: strong guarantees about global consistency are not consistently enforced, and are hard to reconcile with undeletion and import, and seem incompatible with the wiki way of working.

Proposal: only call prepareSave() for explicit edits (via UI or API), but not for undeletion, import, or other programmatic creation of revisions. This still allows constraints to be enforced for user edits, while removing the risk of preventing undeletion due to conflicts. This technically doesn't weaken any guarantees, since prepareSave() was never calls during import. This adjustment firs well with the refactoring work for T174038: Initial implementation of MCR page update interface.

NOTE: This is arguably a breaking change for Wikibase, as it may cause conflicting sitelinks to surface in the current revision of a page through import or undeletion. This would cause a database error when running the respective DataUpdate. Wikibase would have to be changed to ignore this kind of failure, and instead show a warning on pages that have conflicting sitelinks, asking users to resolve the issue.

Event Timeline

Or just allow multiple sitelinks and treat them like constraint violations. That would fix T54564: Allow sitelinks to redirect pages to fix the 'Bonnie and Clyde problem' too, in a way. Enforcing one sitelink per language has always been somewhat of an outlier from Wikidata's generic "assume users know what they are doing and don't force rigid schemas on them" approach.

@Tgr wrote

Or just allow multiple sitelinks and treat them like constraint violations.

Yes, but currently, we have a unique key on that, and the table isn't small. A schema change is an option, sure.

But we'll have that discussion if it's clear that we actually want to do this... do you agree that prepareSave() should conceptually be bound to edits, not revision creation?

I think there are two separate issues here:

  • Should prepareSave be allowed to check for global consistency? I'd agree that global state requirements do not mix well with the conceptual model of a wiki; implementers should be discouraged from relying on state outside the content object itself. (At a glance, current extensions don't do anything like that, apart from Wikibase.)
  • Should checks happen on import / undelete? I suppose both approaches are justifiable; on one hand, it would be nice to offer a consistency guarantee when content is created (e.g. undeleting something after the code or configuration has been changed in a backwards-incompatble way should trigger an error). On the other hand, when an edit save fails, the problem can be surfaced to the user, who can fix it; but when an undelete or import fails, there usually isn't any feasible way to recover. So it would definitely be good to avoid errors in those cases. (Same goes for a rollback, somewhat, although in that case the user could be asked to do a manual edit instead.) Only checking after edits is maybe the less disruptive option.

(Also prepareSave seems like a misnomer; everything uses it for validation, not preprocessing or such.)

prepareSave() is indeed a misnomer, but this is hard to fix. There is an isValid() method which we could call when performing an edit, but that returns a boolean. It does not provide a way to report anything helpful to the user.

We could introduce a new method, but adding methods to the Content interface is always problematic, since there is no way to provide backwards compatibility for extensions that implement the interface directly.

In T143761: Fix Content validation - something to replace Content::isValid() I outlined why the current usage of both prepareSave() and isValid() are problematic, and we should either fix prepareSave() or introduce a new message and deprecate the other two.

The task needs some rewording since T287159 / https://gerrit.wikimedia.org/r/c/mediawiki/core/+/732962 moved/renamed the function and replaced the old usages.