Page MenuHomePhabricator

Clarify semantics of "base revision" and "parent revision" in EditPage, WikiPage, and PageUpdater
Closed, ResolvedPublic

Description

There is considerable confusion about what "parent revision" and "base revision" means in EditPage, WikiPage, and PageUpdater.

Status Quo

EditPage:

  • editRevId and getBaseRevision() return the latest current revision as known to the client. This can be used to detect edit conflicts.
  • oldId is the revision the edit was based on. When editing based on an old revision, it's the old revision's ID.
  • parentRevId and getParentRevId() - The documentation says The "parent" revision is the ancestor that should be recorded in this page's revision history. It is either the revision ID of the in-memory article content, or in the case of a 3-way merge in order to rebase across a recoverable edit conflict, the ID of the newer revision to which we have rebased this page. However, this value does not to be recorded or in fact be used for anything at all. The semantics as described matches the rev_parent_id, but there is no need for EditPage to track this or set it explicitly. WikiPage (resp PageUpdater) just do this automatically.
  • A CAS check is performed based on rev_id, ensuring the that current revision did not change after it was compared against editRevId to detect an edit conflict.
  • see also T58849: Edit conflict detection by timestamp should be deprecated for how editRevId and parentRevId where introduced.

WikiPage:

  • $baseRevId of doEditContent: The revision ID this edit reverts to, if any. Not stored, but passed to some hooks. Doesn't seems to be used for anything by core, but some extensions, like TimedMediaHandler, use it to detect reverts.
  • rev_parent_id field: The ID of the revision that was current immediately before an edit. Used to display size differences in the page history (compare T193690)

NewRevisionFromEditComplete hook:

  • $baseID: this hook parameter contains the value of $baseRevId from WikiPage::doEditContent. It's also set to the previous revision ID by MovePage. So the itnended semantics is "ID of an earlier revision that is restored or repeated by the edit".

PageUpdater:

  • parent revision: the revision that is current before the edit is performed. Used for CAS checks.
  • baseRevisionId: the latest current revision as known to the client. This can be used to detect edit conflicts. However, it is currently called set by doEditContent() with WikiPage's semantics in mind, causing confusion. Should probably be renamed to revertsToRevisionId.

Proposed Solution

  • Improve documentation (and field names?) in EditPage to avoid confusion with the concepts as used in WikiPage.
  • Improve documentation and parameter name in WikiPage::doEditContent to specify semantics to by "ID of an earlier revision that is restored or repeated by the edit"".
  • Change PageUpdater::hasEditConflict() to not be based on the revision provided by setBaseRevision, but take a parameter and check against that instead.
  • Change name of PageUpdater::setBaseRevisionId() to setOriginalRevisionId().
  • Remove parentRevId logic from EditPage. getParentRevId() could just always return the current revision, if needed for B/C. But it does not seem like anything calls this.

Event Timeline

daniel created this task.Jun 19 2018, 1:35 PM
daniel triaged this task as High priority.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 19 2018, 1:35 PM

Change 441045 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Change PageUpdater::hasEditConflict to take a parameter.

https://gerrit.wikimedia.org/r/441045

Change 441046 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Pass EditPage->oldid to WikiPage::doEditContent as $baseRevId.

https://gerrit.wikimedia.org/r/441046

Change 441047 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Kill unused and confusing EditPage::getParentRevId().

https://gerrit.wikimedia.org/r/441047

Change 441065 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Improve documentation of fields in EditPage

https://gerrit.wikimedia.org/r/441065

Change 441065 merged by jenkins-bot:
[mediawiki/core@master] Improve documentation of fields in EditPage

https://gerrit.wikimedia.org/r/441065

It looks like EditPage editRevId/getBaseRevision() and parentRevId/getParentRevId() should be equivalent until the edit is actually saved. They differ after EditPage successfully does merge3() during the save, at which point editRevId continues to represent the parent of the user's submission while parentRevId represents the parent of the content actually being saved.

It looks like EditPage's oldid and WikiPage's $baseRevId should be equivalent when oldid != 0. When oldid == 0, I believe $baseRevId should be editRevId. It more or less indicates the revision that was used to initialize the textarea.

The rev_parent_id saved should match EditPage's parentRevId. It looks like there's a possible race where edits A→B→C could have C's rev_parent_id as A even though merge3() used B's content.[1]

PageUpdater's baseRevId seems to be used as both parentRevId and oldid in different places. It's used like parentRevId to detect conflicts (and maybe in checkFlags() too), and like oldid in being passed to the 'NewRevisionFromEditComplete' and 'PageContentSaveComplete' hooks.

[1]: EditPage::internalAttemptSave() calls $this->page->getLatest(), which caches inside WikiPage. 🏎 Then mergeChangesIntoContent() loads the current revision via Revision::loadFromTitle() instead of getting it from WikiPage. Pre-PageUpdater, WikiPage::doEditContent() reuses that cached ->getLatest() value for rev_parent_id. Post-PageUpdater, DerivedPageDataUpdater::grabCurrentRevision() calls WikiPage::getRevision() which uses that cached ->getLatest(). If edit B gets saved at the 🏎, that's the race.

Tgr added a subscriber: Tgr.Jun 21 2018, 2:45 PM

It might help with the terminology to differentiate between the history parent (the preceding row in the page history, ie. the current verion of the page before saving the edit) and the logical parent (the revision the user based their edit on, e.g. the one pointed to by the oldid parameter, or the rollback target). There are then four possible kinds of parents:

  • history parent as seen by the client (last at the time of starting the edit) - this is EditPage::$editRevId / EditPage::getBaseRevision()
  • actual history parent (differs from the previous one if there were concurrent edits) - this is rev_parent_id
  • logical parent as seen by the client (oldid, or last version at the time of starting the edit if the user was not editing an oldid, or the conflicting version, if the user is doing manual edit conflict resolution) - this is EditPage::$oldId (except it's only set when it's different from the current revision at time of starting the edit)
  • actual logical parent (same as previous one except after a successful automatic edit conflict resolution, in which case it points to the (non)conflicting revision) - this is EditPage::getParentRevId() (and EditPage::$parentRevId when set)

$baseRevId in WikiPage::doEditContent and PageContentSaveComplete and $baseId in NewRevisionFromEditComplete are probably meant to be the logical parent (actual, since they are only called after automatic edit conflict resolution), although given the limited documentation and that the paramterer is not actually passed in the one case where it would be relevant, it's hard to tell. At least FlaggedRevs seems to assume instead that it is the revision the edit is identical to (rollback target or null revision parent), which is the current behavior.

PageUpdater::$baseRevId is documented as "the last revision known to the client" but is set after automatic edit conflict resolution, at which point that is not relevant anymore, so it probably should be the actual logical parent. PageUpdater::grabParentRevision() is the history parent as seen in the early phase of the edit submission logic (same as the actual history parent if the edit actually gets saved; might differ from the latest version of the page at the time of attempting the DB write, but in that case the save will fail).

The semantics [of parentRevId and getParentRevId()] as described matches the rev_parent_id

Not really, when editing an old version, parentRevId points to that version (the code is confusingly written; $out->addHTML( Html::hidden( 'parentRevId', $this->getParentRevId() ) ) does the real work), while rev_parent_id (in its current incarnation) always points to the previous DB row for the same title.

$baseRevId of doEditContent: The revision ID this edit was based off, if any. Not stored, but passed to some hooks. Doesn't seems to be used for anything by core.

It seems that some hook implementations interpret it as "the ID of the revision this edit is identical to" (ie. rollback target or null edit parent). Which is the current behavior but probably not the intent (although the hook documentation is minimal so it's hard to tell).

rev_parent_id field: The ID of the revision that was current immediately before an edit. Used to display size differences in the page history (compare T193690)

Might be used for other things as well as the revision object is passed to some hooks.

It looks like EditPage editRevId/getBaseRevision() and parentRevId/getParentRevId() should be equivalent until the edit is actually saved.

getBaseRevision() is the last revision at the time of starting the edit. getParentRevId() is the logical parent. They are only equivalent if the edit was based on the latest revision.

It looks like EditPage's oldid and WikiPage's $baseRevId should be equivalent when oldid != 0. When oldid == 0, I believe $baseRevId should be editRevId.

See the FlaggedRevs part of this comment for why that's not necessarily a good idea.

It looks like there's a possible race where edits A→B→C could have C's rev_parent_id as A even though merge3() used B's content.[1]

rev_parent_id is set in PageUpdater::makeNewRevision() based on PageUpdater::grabParentRevision(), which is frozen early in saveRevision() (possibly earlier, due to in-process caching of WikiPage::getRevision()). There cannot be any race after that; PageUpdater::doModify() uses WikiPage::lockAndGetLatest() which gets the current DB value and locks it against later races, and that gets compared to the frozen value. So that part is OK.

PageUpdater::hasEditConflict() would compare the frozen value with PageUpdater::getBaseRevisionId(), which prevents conflicts in theory. It is never actually called though, so as long as the concurrent edit happens after the edit conflict detection in EditPage and before PageUpdater::saveRevision() freezing the parent, nothing will stop it. Also, for that to work, the check would have to be against EditPage::$parentRevId but that does not match how getBaseRevisionId() is documented (as the task description already notes).

Change 441046 abandoned by Daniel Kinzler:
Pass EditPage->oldid to WikiPage::doEditContent as $baseRevId.

Reason:
Changed Ib78257d4d6ee7c4e to reflect current semantics

https://gerrit.wikimedia.org/r/441046

daniel updated the task description. (Show Details)Jun 21 2018, 4:46 PM
  • logical parent as seen by the client (oldid, or last version at the time of starting the edit if the user was not editing an oldid, or the conflicting version, if the user is doing manual edit conflict resolution) - this is EditPage::$oldId (except it's only set when it's different from the current revision at time of starting the edit)

I think you may be giving too much weight to this "logical parent". Nothing should be using this value except the 'NewRevisionFromEditComplete' and 'PageContentSaveComplete' hooks. Anything else that is using it probably should be using editRevId instead.

daniel updated the task description. (Show Details)Jun 22 2018, 11:19 AM

Change 441045 merged by jenkins-bot:
[mediawiki/core@master] MCR: rename $baseRevId paramter to match actual semantics.

https://gerrit.wikimedia.org/r/441045

daniel closed this task as Resolved.Jun 22 2018, 3:40 PM

With the updated documentation in EditPage and the clarified semantics in WikiPage and PageUpdater, this is now resolved. The remaining patch is not crucial.

Change 441824 had a related patch set uploaded (by WMDE-Fisch; owner: WMDE-Fisch):
[mediawiki/extensions/TwoColConflict@master] Replace parentRevisionId in event log

https://gerrit.wikimedia.org/r/441824

Change 441824 merged by jenkins-bot:
[mediawiki/extensions/TwoColConflict@master] Replace parentRevisionId in event log

https://gerrit.wikimedia.org/r/441824

Vvjjkkii renamed this task from Clarify semantics of "base revision" and "parent revision" in EditPage, WikiPage, and PageUpdater to 2naaaaaaaa.Jul 1 2018, 1:02 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed daniel as the assignee of this task.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot assigned this task to daniel.
CommunityTechBot closed this task as Resolved.
CommunityTechBot renamed this task from 2naaaaaaaa to Clarify semantics of "base revision" and "parent revision" in EditPage, WikiPage, and PageUpdater.
CommunityTechBot added subscribers: gerritbot, Aklapper.

Change 441047 abandoned by Daniel Kinzler:
Kill unused and confusing EditPage::getParentRevId().

https://gerrit.wikimedia.org/r/441047