Page MenuHomePhabricator

Factor out a backend from EditPage
Open, MediumPublic

Description

Currently, EditPage contains the web UI tightly integrated with edit authorisation and edit conflict merging. ApiEditPage wraps EditPage by making a fake web UI request. And VisualEditor and LiquidThreads wrap ApiEditPage by making a second fake request.

I propose introducing a new backend containing edit authorisation, automatic conflict merging, and the call through to WikiPage::doEditContent(), separated from the web UI. EditPage, ApiEditPage, VisualEditor, SemanticForms and LiquidThreads would call this backend.

The interface would be similar to EditPage::internalAttemptSave(), except with relevant request state injected instead of being fetched from global or class static data. updateWatchlist() would be moved to the UI layer.

A code comment notes:

@todo FIXME: This interface is TERRIBLE, but hard to get rid of due to various error display idiosyncrasies. There are also lots of cases where error metadata is set in the object and retrieved later instead of being returned, e.g. AS_CONTENT_TOO_BIG and AS_BLOCKED_PAGE_FOR_USER. All that stuff needs to be cleaned up some

So we would try to fix that.

The rationale for this work is to encourage further development of editing customisations and extensions. It may be possible to make MCR editing (T107595) easier to implement.

Usage survey in WMF deployed extensions

SemanticForms (a.k.a. PageForms) creates an EditPage object in order to execute the backend EditPage::internalAttemptSave().

JsonConfig and LiquidThreads create an EditPage object in order to execute the combined frontend and backend EditPage::edit(), with hooks to customise its operation.

ProofreadPage subclasses EditPage in order to override the UI and content object creation.

VisualEditor and LiquidThreads invoke ApiEditPage to perform an edit.

The following extensions hook the backend part of EditPage (EditFilter, EditFilterMergedContent, EditPage::attemptSave, EditPage::attemptSave:after):

  • AbuseFilter
  • ConfirmEdit
  • EventLogging
  • Gadgets
  • JsonConfig
  • ProofreadPage
  • Scribunto
  • SpamBlacklist
  • TitleBlacklist
  • Translate
  • UploadWizard
  • WikiEditor
  • Wikidata

FlaggedRevs and VisualEditor create an EditPage object in order to extract UI components that they wish to reuse. MassMessage calls an EditPage static method for the same reason.

TwoColConflict subclasses EditPage in order to simulate conflicts, and couples tightly to the form parameters and internal EditPage fields when implementing a new conflict workflow.

Request access deeper in the stack

The proposed backend is not useful if the caller still has to set up a fake request to fool deeper parts of the stack. As such, the following methods that access RequestContext or WebRequest are concerning:

  • The context title is accessed from WikitextContent::isCountable(), which is called during edit with $title = null
  • Various things access the IP address from the request context, such as User::getBlockedStatus(), User::pingLimiter(), RecentChange::checkIPAddress() and AbuseFilter.
  • User::getBlockedStatus() also accesses the request cookies and X-Forwarded-For header
  • Probably other things

Event Timeline

tstarling created this task.Feb 9 2017, 3:48 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptFeb 9 2017, 3:48 AM

I would love to see this happen. BTW, the "FlaggedRevs and VisualEditor create an EditPage object in order to extract UI components that they wish to reuse." is one of the most disgusting pieces of our architecture and I would love for the ability to get the displayed checkboxes and their labels as data, not HTML strings – but I imagine that would involve breaking the current EditPageBeforeEditChecks hook.

I would love to see this happen. BTW, the "FlaggedRevs and VisualEditor create an EditPage object in order to extract UI components that they wish to reuse." is one of the most disgusting pieces of our architecture and I would love for the ability to get the displayed checkboxes and their labels as data, not HTML strings – but I imagine that would involve breaking the current EditPageBeforeEditChecks hook.

For additional context: VE does this because FlaggedRevs uses a hook to inject an extra checkbox. If it was just the minor edit and watchlist checkboxes from core, we'd be happy to duplicate a little bit of code in VE (which we in fact did until we found the FlaggedRevs problem).

TTO awarded a token.Feb 13 2017, 2:05 AM
Ebe123 added a subscriber: Ebe123.Feb 17 2017, 3:01 AM

I've made an attempt to refactor EditPage in https://gerrit.wikimedia.org/r/#/c/338276/.
This goes a bit beyond factoring out a backend as I've also rewritten the UI part based on an MVC pattern.

The backend for saving is the EditAttempt class.
The Edit class models an edit, the Editor class controls the editing and the EditView class contains the UI elements.
The EditFacilitator class transforms the form data into an Edit object and mediates the call to EditAttempt for an Editor object.

The API only needs to make an Edit object and inject it into an EditAttempt object to make the save. Same for extensions needing only backend access.

I've also:

  • passed the hook status as edit status rather than having a hook error
  • required hooks to pass an error when returning a fatal status
  • removed the wpUltimateParam cutover.
tstarling triaged this task as Medium priority.Apr 20 2017, 5:22 AM
WDoranWMF added a subscriber: WDoranWMF.

This is strongly related to T208801 and should be taken into account when working on that task.

The big unsolved question here was what to do about request context access from hooks, AbuseFilter being the most intractable example. I think the answer is to just let them call RequestContext::getMain(), without any FauxRequest. Most access to RequestContext is just for the user and IP address, which is correct for all web callers. If we need to have editing from the job queue or some other CLI request, the context user can be faked as necessary. The session manager documentation suggests using SessionManager::getSessionById() for authenticated jobs.

awight awarded a token.Mar 4 2020, 9:29 PM
awight updated the task description. (Show Details)
awight added a subscriber: awight.
awight updated the task description. (Show Details)Mar 4 2020, 9:32 PM
awight updated the task description. (Show Details)Mar 4 2020, 9:34 PM
DannyS712 claimed this task.EditedApr 22 2020, 12:13 AM
DannyS712 added a subscriber: DannyS712.

I'll try to tackle this - sounds fun :)

There are a number of public methods that are unused outside of the EditPage class, or are only used in mediawiki core. I propose officially making them deprecated for public use in 1.35, so that breaking changes can be made in 1.36 (eg remove or move to separate class, etc) without needing to maintain backwards compatibility. The methods are:[1]

Only in the class, public
* initialiseForm
* getParentRevId
* matchSpamRegex
* extractSectionTitle
* getSummaryInputWidget
* getCancelLink
* noSuchSectionPage

Only in the class, protected[2]
* getEditPermissionErrors
* displayPermissionsError
* displayViewSourcePage
* isWrongCaseUserConfigPage
* getPreloadedContent
* setPostEditCookie
* runPostMergeFilters
* addContentModelChangeLogEntry
* updateWatchlist
* matchSpamRegexInternal
* showIntro
* showCustomIntro
* toEditText
* showHeader
* showSummaryInput
* getSummaryPreview
* showFormBeforeText
* showFormAfterText
* showTextbox1
* showTextbox2
* displayPreviewArea
* showHeaderCopyrightWarning
* showTosSummary
* showEditTools
* getCopywarn
* showStandardInputs
* showConflict
* incrementConflictStats
* wasDeletedSinceLastEdit
* getLastDelete
* getSubmitButtonLabel
* addTalkPageText
* addLongPageWarningHeader
* addPageProtectionWarningHeaders
* addExplainConflictHeader

Only in core, outside of the class:
* setApiEditOverride

No known uses:[3]
* setPreloadedContent
* buildTextboxAttribs
* addNewLineAtEnd

Additionally, the following variables are unused outside of the class:

Public:
* $action
* $isNew
* $deletedSinceEdit
* $lastDelete
* $mTokenOk
* $mTokenOkExceptSuffix
* $mTriedSave
* $incompleteForm
* $tooBig
* $missingComment
* $missingSummary
* $allowBlankSummary
* $autoSumm
* $hasPresetSummary
* $diff
* $recreate
* $nosummary
* $parentRevId
* $editintro
* $markAsBot
* $mPreloadContent


Protected:
* $edit
* $contentLength
* $blankArticle
* $allowBlankArticle
* $selfRedirect
* $allowSelfRedirect

These can also be deprecated for external use

[1] Methods already deprecated not listed. Uses in core tests are not counted.
[2] Yes, protected methods aren't bound by the stable interface policy, but it would be nice to include a note in the release notes.
[3] Can be hard deprecated entirely?

Restricted Application added a project: User-DannyS712. · View Herald TranscriptApr 22 2020, 12:13 AM
DannyS712 moved this task from Unsorted to Next on the User-DannyS712 board.Apr 22 2020, 12:13 AM

Question: While trying to separate out what is and isn't used by what, I found that EditPage::showEditForm accepts an optional parameter, and that passing that parameter is deprecated but only logs wfWarn rather than wfDeprecated. Given that the deprecation was in 1.25, is it okay to remove the support for the parameter? It is unused in repositories indexed by codesearch

matmarex removed a subscriber: matmarex.Apr 25 2020, 4:10 PM
DannyS712 moved this task from Next to In progress on the User-DannyS712 board.Apr 29 2020, 5:50 AM
DannyS712 added a comment.EditedJun 4 2020, 3:14 AM

Proposed architecture for moving everything and splitting it up:

Overall architecture

Internal attempt save currently includes a number of specific condition checks that can cause the edit to fail, each returning a different status. These checks (termed Constraints) all

  • Either prevent the edit or do nothing
  • Have a unique status associated with them
  • Should not all be applied to all edits[1]

Proposal for constraints:

IEditConstraint
interface IEditConstraint extends IEditObject {

	/** @var string - Constraint passed, no error */
	public const CONSTRAINT_PASSED = 'constraint-passed';

	/** @var string - Constraint failed, use getLegacyStatus to see the failure */
	public const CONSTRAINT_FAILED = 'constraint-failed';

	/**
	 * @return string whether the constraint passed or not, as a CONSTRAINT_* value
	 */
	public function checkConstraint() : string;

	/**
	 * Get the legacy status for failure (or success)
	 *
	 * Called "legacy" status because this part of the interface should probably be redone;
	 * Currently Status objects have a value of an IEditObject constant, as well as a fatal
	 * message
	 *
	 * @return Status
	 */
	public function getLegacyStatus() : Status;

}

new EditConstraintRunner is given constraints, and goes through them until one fails. If one failed, EditPage returns that constraint's status. If none failed, the edit proceeds

EditConstraintRunner
class EditConstraintRunner {

	/** @var array */
	private $constraints = [];

	/** @var IEditConstraint|bool */
	private $failedConstraint = false;

	/**
	 * Add a constraint to check
	 *
	 * Not all constraints are applicable to the action api or other methods of submitting
	 * an edit
	 *
	 * @param IEditConstraint $constraint
	 */
	public function addConstraint( IEditConstraint $constraint ) {
		$this->constraints[] = $constraint;
	}

	/**
	 * Run constraint checks
	 *
	 * Returns true if all constraints pass, false otherwise
	 * Use getFailedConstraint for the reason
	 *
	 * @return bool
	 */
	public function checkConstraints() : bool {
		foreach ( $this->constraints as $index => $constraint ) {
			// Allow checking more constraints later, remove constraint checked
			unset( $this->constraints[$index] );
			if ( $constraint->checkConstraint() !== IEditConstraint::CONSTRAINT_PASSED ) {
				$this->failedConstraint = $constraint;
				return false;
			}
		}
		return true;
	}

	/**
	 * Get the constraint that failed
	 *
	 * @return IEditConstraint
	 */
	public function getFailedConstraint() : IEditConstraint {
		return $this->failedConstraint;
	}

}

My current WIP patch for migrating everything to a new backend is over 5000 lines, and unlikely to land as one piece. Thus:

Phase 1 - Constraint migration

Migrate to using EditConstraints while still doing everything withing EditPage::internalAttemptSave. This would allow each individual check to be moved one at a time to a constraint object.

For example, the easiest constraint both to implement and to review would be UnicodeConstraint, ensuring that the browser (and the api, for now, because everything still uses the same interface) supports unicode.

Current code
if ( $this->unicodeCheck !== self::UNICODE_CHECK ) {
	$status->fatal( 'unicode-support-fail' );
	$status->value = self::AS_UNICODE_NOT_SUPPORTED;
	return $status;
}
New code
$constraintRunner = new EditConstraintRunner();
$constraintRunner->addConstraint(
	new UnicodeConstraint( $this->unicodeCheck )
);
if ( $constraintRunner->checkConstraints() === false ) {
	$failed = $constraintRunner->getFailedConstraint();
	return $failed->getLegacyStatus();
}
New UnicodeConstraint
class UnicodeConstraint implements IEditConstraint {

	/**
	 * @var string
	 * Correct unicode
	 */
	public const VALID_UNICODE = 'ℳ𝒲♥𝓊𝓃𝒾𝒸ℴ𝒹ℯ';

	/**
	 * @var string
	 * Unicode string provided, to compare
	 */
	private $input;

	/**
	 * @param string $inputUnicode
	 */
	public function __construct( string $inputUnicode ) {
		$this->input = $inputUnicode;
	}

	/**
	 * @inheritDoc
	 */
	public function checkConstraint() : string {
		if ( $this->input === self::VALID_UNICODE ) {
			return self::CONSTRAINT_PASSED;
		}
		return self::CONSTRAINT_FAILED;
	}

	/**
	 * @inheritDoc
	 */
	public function getLegacyStatus() : Status {
		$status = Status::newGood();
		if ( $this->input !== self::VALID_UNICODE ) {
			$status->fatal( 'unicode-support-fail' );
			$status->value = self::AS_UNICODE_NOT_SUPPORTED;
		}
		return $status;
	}

}

Each constraint has all of its dependencies injected, including configuration values, services, loggers, etc.[2] This means that they can have 100% test coverage with pure unit tests[3]
During the migration, there will be some code duplication, since the overhead of converting a single check to a Constraint, and using the ConstraintRunner, will be more than the few lines to manually check the condition. However, this will go await as more are converted. The new constraint system also makes it easier to add new conditions in the future.

Phase 2 - hooks

Before the reliance on EditPage can be entirely removed, the hooks that pass EditPage instances and are used to prevent edits must be removed - T251588: EditPage save hooks pass an entire `EditPage` object

Phase 3 - EditManager

Once everything is done in constraints, and the EditPage class is no longer passed in hooks, the entire internalAttemptSave method and its logic can move to a new service, with dependencies injected. At this point, everything that used to modify EditPage values during the attempted edit (eg setting EditPage::$tooBig) is returned from the service call; if a constraint called the failure, that constraint is included:

// $failedConstraint was retrieved from the result returned from the service
if ( $failedConstraint instanceof PageSizeConstraint ) {
	// Error will be displayed by showEditForm()
	$this->tooBig = true;
} elseif ( $failed instanceof SelfRedirectConstraint ) {
	$this->selfRedirect = true;
}

and so on for each specific constraint that EditPage needs to handle specially. Some of these will simply not apply to other callers, eg SelfRedirectConstraint (unless the user confirms their intention, prevent edits from creating self redirects) because the api already says to allow self redirects.

Phase 4 - Caller migration

Once EditPage::internalAttemptSave is a wrapper for the new service, ApiEditPage and others can be converted to use the service directly

Other

As part of the splitting, a new EditConflictManager service should be added to handle edits to existing pages, including checking if there is a conflict, merging the content, logging, etc. This would move the ::mergeChangesIntoContent method out of EditPage, along with the code currently in internalAttemptSave that checks for and resolves EditConflicts. The addition and use of the new service can also be split and done separately.[4]

[1] Currently, the checks for browsers supporting Unicode, and spambots not filling in the empty hidden form field meant to cache them, also apply to edits made via the api that calls the EditPage class. The same is true of the checks against self redirects, blank articles, and blank summaries, all of which require the api to define those options as the correct Unicode and to ignore the self redirect/blank article/blank summary. See https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/12a3883a7b73ad6cc25d1fcff43a61ea6fe8de27/includes/api/ApiEditPage.php#289
[2] For Constraints that require services or loggers, there is a common EditConstraintFactory with a method for each constraint needed. For those that do not require services, like the UnicodeConstraint, EditPage (at first, and the backend eventually) can construct those manually.
[3] With the exception of (currently) the check to ChangeTags::canAddTagsAccompanyingChange, which should be available via dependency injection following T245964: Split up and servicify ChangeTags class
[4] And probably should be done separately, to reduce the scope of individual patches that are merged.

DannyS712 added a comment.EditedJul 8 2020, 9:10 AM

@daniel @Pchelolo is the proposal above approved / should I start sending patches?